OCFS2-Devel Archive on lore.kernel.org
* [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI)
@ 2020-05-19 10:48 Pierre Dinh-van
  2020-05-20  2:52 ` Changwei Ge
  2020-05-20 22:01 ` herbert.van.den.bergh at oracle.com
  0 siblings, 2 replies; 8+ messages in thread
From: Pierre Dinh-van @ 2020-05-19 10:48 UTC (permalink / raw)
  To: ocfs2-devel


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi,

I'm experiencing quite reproducible crashes on my OCFS2 cluster over the
last few weeks, and I think I have found part of the problem. But before
I file a bug report (I'm not sure where), I thought I would post my
research here in case someone understands it better than I do.

When it happens, the kernel gives me a 'general protection fault:
0000 [#1] SMP PTI' or 'BUG: unable to handle page fault for address:'
message, local access to the OCFS2 volumes hangs forever, and the load
gets so high that only a hard reset "solves" the problem. It happened
the same way on both nodes (always on the active node serving NFS). It
happens whether both nodes are up or only one is active.

First, a few words about my setup:

- storage is Fibre Channel LUNs served by an EMC VNX5200 (4x 5TB).

- LUNs are encrypted with LUKS

- LUKS volumes are formatted with OCFS2

- heartbeat=local

- The OCFS2 cluster communicates on a dedicated IP over an InfiniBand
direct link (actually a fail-over bond of the two IB interfaces).

- nodes are 2 Debian buster hardware servers (24 cores, 128GB RAM).

- I use corosync for cluster management, but I don't think it's related

- The cluster serves 4 NFS exports (3 are NFSv3, and one is NFSv4 with
Kerberos) and Samba shares

- The Samba shares are served in parallel by the 2 nodes, managed by ctdb

- Each NFS export is served by whichever node holds the dedicated IP
for that export.

- When corosync moves the IP of an NFS export to the other node, it
restarts nfsd to trigger a 10 s grace period on the new active node.

- The bug appeared with the default Debian buster kernel 4.19.0-9-amd64
as well as with the backported kernel 5.5.0-0.bpo.2-amd64


My cluster went live a few weeks ago, and it first worked with both
nodes active. During some maintenance work where the active NFS node had
to switch, I hit a bug that I have been able to reproduce, against my
will, at least 20 times over the last few nights:

1) around 100 NFS clients are connected to the exports and working
happily with them

2) The IPs of the NFS shares switch from nodeA to nodeB

3) nodeB's kernel reports "general protection fault: 0000 [#1] SMP PTI"

4) nodeB's load gets high

5) access to the OCFS2 volumes hangs (a clean reboot never ends; the
system cannot unmount the OCFS2 volumes)

6) I power-cycle nodeB

7) I start nodeA again

8) nodeA shows the same symptoms as soon as some NFS clients make
certain requests


Most of the time, when the IP jumps to the other node, I get some of
these messages:


[  501.707566] (nfsd,2133,14):ocfs2_test_inode_bit:2841 ERROR: unable to
get alloc inode in slot 65535
[  501.716660] (nfsd,2133,14):ocfs2_test_inode_bit:2867 ERROR: status = -22
[  501.726585] (nfsd,2133,6):ocfs2_test_inode_bit:2841 ERROR: unable to
get alloc inode in slot 65535
[  501.735579] (nfsd,2133,6):ocfs2_test_inode_bit:2867 ERROR: status = -22

But this also happens when the node does not crash, so I think it's a
separate problem.

Last night, I saved some of the kernel output while the servers were
crashing one after the other, like in a ping-pong game:


-----8< on nodeB 8<-----

[  502.070431] nfsd: nfsv4 idmapping failing: has idmapd not been started?
[  502.475747] general protection fault: 0000 [#1] SMP PTI
[  502.481027] CPU: 6 PID: 2104 Comm: nfsd Not tainted
5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[  502.490016] Hardware name: IBM System x3650 M3
-[7945UHV]-/94Y7614 , BIOS -[D6E150CUS-1.11]- 02/08/2011
[  502.499821] RIP: 0010:_raw_spin_lock+0xc/0x20
[  502.504206] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
[  502.523018] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
[  502.528277] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
0000000000000000
[  502.535433] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
0fdbbeded053c25e
[  502.542588] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
000000000002c340
[  502.549743] R10: 00000184d03a2115 R11: 0000000000000000 R12:
ffff9672eff43fd0
[  502.556899] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
ffff9672fff660c8
[  502.564055] FS:  0000000000000000(0000) GS:ffff968203a80000(0000)
knlGS:0000000000000000
[  502.572166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  502.577928] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
00000000000206e0
[  502.585082] Call Trace:
[  502.587546]  igrab+0x19/0x50
[  502.590484]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
[  502.596094]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
[  502.601534]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
[  502.606536]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
[  502.612321]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
[  502.616960]  reconnect_path+0xa1/0x2c0
[  502.620742]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
[  502.625636]  exportfs_decode_fh+0x111/0x2d0
[  502.629844]  ? exp_find_key+0xcd/0x160 [nfsd]
[  502.634218]  ? __kmalloc+0x180/0x270
[  502.637810]  ? security_prepare_creds+0x6f/0xa0
[  502.642356]  ? tomoyo_write_self+0x1b0/0x1b0
[  502.646642]  ? security_prepare_creds+0x49/0xa0
[  502.651995]  fh_verify+0x3e5/0x5f0 [nfsd]
[  502.656737]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
[  502.662152]  nfsd_dispatch+0x9e/0x210 [nfsd]
[  502.667136]  svc_process_common+0x386/0x6f0 [sunrpc]
[  502.672791]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  502.678628]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
[  502.683597]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
[  502.688395]  ? nfsd_destroy+0x50/0x50 [nfsd]
[  502.693377]  svc_process+0xd1/0x110 [sunrpc]
[  502.698361]  nfsd+0xe3/0x140 [nfsd]
[  502.702561]  kthread+0x112/0x130
[  502.706504]  ? kthread_park+0x80/0x80
[  502.710886]  ret_from_fork+0x35/0x40
[  502.715181] Modules linked in: rpcsec_gss_krb5 nfsd nfs_acl lockd
grace ocfs2 quota_tree sctp libcrc32c nft_counter ipt_REJECT
nf_reject_ipv4 xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
cpufreq_userspace ocfs2_stackglue cpufreq_powersave cpufreq_conservative
configfs bonding dm_round_robin dm_multipath scsi_dh_rdac scsi_dh_emc
scsi_dh_alua intel_powerclamp coretemp kvm_intel ipmi_ssif kvm cdc_ether
irqbypass usbnet mii tpm_tis intel_cstate tpm_tis_core intel_uncore
joydev tpm ipmi_si ioatdma sg iTCO_wdt ib_mthca iTCO_vendor_support
watchdog dca pcspkr ipmi_devintf rng_core ib_uverbs ipmi_msghandler
i5500_temp i7core_edac evdev ib_umad ib_ipoib ib_cm ib_core auth_rpcgss
sunrpc ip_tables x_tables autofs4 algif_skcipher af_alg ext4 crc16
mbcache jbd2 crc32c_generic dm_crypt dm_mod sd_mod hid_generic usbhid
hid sr_mod cdrom crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel mgag200 drm_vram_helper drm_ttm_helper
[  502.715218]  i2c_algo_bit ttm qla2xxx drm_kms_helper drm ata_generic
ata_piix aesni_intel libata nvme_fc ehci_pci uhci_hcd crypto_simd
ehci_hcd nvme_fabrics cryptd glue_helper lpc_ich nvme_core megaraid_sas
mfd_core usbcore scsi_transport_fc e1000e scsi_mod i2c_i801 usb_common
ptp pps_core bnx2 button
[  502.837917] ---[ end trace 6e1a8e507ac30d8d ]---
[  502.843379] RIP: 0010:_raw_spin_lock+0xc/0x20
[  502.848584] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
[  502.869295] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
[  502.875404] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
0000000000000000
[  502.883423] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
0fdbbeded053c25e
[  502.891436] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
000000000002c340
[  502.899446] R10: 00000184d03a2115 R11: 0000000000000000 R12:
ffff9672eff43fd0
[  502.907455] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
ffff9672fff660c8
[  502.915473] FS:  0000000000000000(0000) GS:ffff968203a80000(0000)
knlGS:0000000000000000
[  502.924496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  502.931147] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
00000000000206e0

-----8< cut here 8<-----


Shortly after, when trying to start nodeA while nodeB was down:

-----8< nodeA 8<-----

[  728.927750] general protection fault: 0000 [#1] SMP PTI
[  728.934647] CPU: 8 PID: 12156 Comm: nfsd Not tainted
5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[  728.944800] Hardware name: IBM System x3650 M3 -[7945H2G]-/69Y4438,
BIOS -[D6E158AUS-1.16]- 11/26/2012
[  728.955176] RIP: 0010:_raw_spin_lock+0xc/0x20
[  728.960595] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
[  728.981559] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
[  728.987885] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
0000000000000000
[  728.996135] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
625f6563697665ec
[  729.004384] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
000000000002c340
[  729.012628] R10: 000002114c00d22b R11: 0000000000000000 R12:
ffff91e13858bbd0
[  729.020877] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
ffff91f13c03b0c8
[  729.029108] FS:  0000000000000000(0000) GS:ffff91f13fa80000(0000)
knlGS:0000000000000000
[  729.038308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  729.045161] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
00000000000206e0
[  729.053415] Call Trace:
[  729.056978]  igrab+0x19/0x50
[  729.061023]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
[  729.067736]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
[  729.074284]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
[  729.080386]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
[  729.087268]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
[  729.092995]  reconnect_path+0xa1/0x2c0
[  729.097863]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
[  729.103840]  exportfs_decode_fh+0x111/0x2d0
[  729.109133]  ? exp_find_key+0xcd/0x160 [nfsd]
[  729.114574]  ? dbs_update_util_handler+0x16/0x90
[  729.120273]  ? __kmalloc+0x180/0x270
[  729.124920]  ? security_prepare_creds+0x6f/0xa0
[  729.130521]  ? tomoyo_write_self+0x1b0/0x1b0
[  729.135858]  ? security_prepare_creds+0x49/0xa0
[  729.141464]  fh_verify+0x3e5/0x5f0 [nfsd]
[  729.146536]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
[  729.152297]  nfsd_dispatch+0x9e/0x210 [nfsd]
[  729.157647]  svc_process_common+0x386/0x6f0 [sunrpc]
[  729.163675]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[  729.169873]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
[  729.175190]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
[  729.180323]  ? nfsd_destroy+0x50/0x50 [nfsd]
[  729.185631]  svc_process+0xd1/0x110 [sunrpc]
[  729.190925]  nfsd+0xe3/0x140 [nfsd]
[  729.195421]  kthread+0x112/0x130
[  729.199636]  ? kthread_park+0x80/0x80
[  729.204279]  ret_from_fork+0x35/0x40
[  729.208818] Modules linked in: ocfs2 quota_tree dm_crypt crypto_simd
cryptd glue_helper algif_skcipher af_alg nfsd nfs_acl lockd grace sctp
libcrc32c nft_counter xt_tcpudp nft_compat nf_tables nfnetlink
binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
ocfs2_stackglue configfs bonding dm_round_robin dm_multipath
scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp ipmi_ssif
coretemp kvm_intel kvm irqbypass cdc_ether intel_cstate usbnet ib_mthca
mii joydev intel_uncore ib_uverbs iTCO_wdt sg ioatdma
iTCO_vendor_support i7core_edac watchdog i5500_temp pcspkr ipmi_si
tpm_tis tpm_tis_core tpm ipmi_devintf ipmi_msghandler rng_core evdev
acpi_cpufreq ib_umad auth_rpcgss ib_ipoib ib_cm sunrpc ib_core ip_tables
x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod
hid_generic sd_mod usbhid hid sr_mod cdrom uas usb_storage mgag200
drm_vram_helper drm_ttm_helper qla2xxx ata_generic i2c_algo_bit ttm
nvme_fc ixgbe nvme_fabrics ata_piix xfrm_algo ehci_pci uhci_hcd
[  729.208868]  drm_kms_helper nvme_core ehci_hcd dca libata
megaraid_sas scsi_transport_fc libphy ptp drm usbcore scsi_mod
crc32c_intel lpc_ich i2c_i801 pps_core mfd_core usb_common bnx2 mdio button
[  729.323900] ---[ end trace 212cefe75207e17f ]---
[  729.329644] RIP: 0010:_raw_spin_lock+0xc/0x20
[  729.335121] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
[  729.356192] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
[  729.362584] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
0000000000000000
[  729.370889] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
625f6563697665ec
[  729.379184] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
000000000002c340
[  729.387475] R10: 000002114c00d22b R11: 0000000000000000 R12:
ffff91e13858bbd0
[  729.395767] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
ffff91f13c03b0c8
[  729.404066] FS:  0000000000000000(0000) GS:ffff91f13fa80000(0000)
knlGS:0000000000000000
[  729.413336] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  729.420262] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
00000000000206e0

-----8< cut here 8<-----

Since the backtrace mentions nfsd, I tried to isolate the problem by
dropping all NFS packets entering the active node before starting the
services:

iptables -I INPUT -p tcp --dport 2049 -j DROP

Everything was fine; access over Samba worked normally.

I then started allowing NFS for parts of my network with iptables, until
the crash occurred.

In this instance, the crash occurred after allowing only a single NFSv3
client.

-----8< nodeB 8<-----

[ 2494.893697] BUG: unable to handle page fault for address:
00000000000fab7d
[ 2494.900626] #PF: supervisor write access in kernel mode
[ 2494.905873] #PF: error_code(0x0002) - not-present page
[ 2494.911042] PGD 0 P4D 0
[ 2494.913596] Oops: 0002 [#1] SMP PTI
[ 2494.917106] CPU: 11 PID: 21342 Comm: nfsd Not tainted
5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
[ 2494.926285] Hardware name: IBM System x3650 M3
-[7945UHV]-/94Y7614 , BIOS -[D6E150CUS-1.11]- 02/08/2011
[ 2494.936077] RIP: 0010:_raw_spin_lock+0xc/0x20
[ 2494.940461] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
[ 2494.959269] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
[ 2494.964511] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
0000000000000000
[ 2494.971664] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
00000000000fab7d
[ 2494.978815] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
000000000002c340
[ 2494.985968] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
ffff99c903066bd0
[ 2494.993119] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
ffff99c9014c70c8
[ 2495.000271] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
knlGS:0000000000000000
[ 2495.008382] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2495.014141] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
00000000000206e0
[ 2495.021291] Call Trace:
[ 2495.023753]  igrab+0x19/0x50
[ 2495.026693]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
[ 2495.032301]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
[ 2495.037741]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
[ 2495.042742]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
[ 2495.048525]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
[ 2495.053163]  reconnect_path+0xa1/0x2c0
[ 2495.056943]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
[ 2495.061838]  exportfs_decode_fh+0x111/0x2d0
[ 2495.066042]  ? exp_find_key+0xcd/0x160 [nfsd]
[ 2495.070417]  ? __kmalloc+0x180/0x270
[ 2495.074007]  ? security_prepare_creds+0x6f/0xa0
[ 2495.078551]  ? tomoyo_write_self+0x1b0/0x1b0
[ 2495.082834]  ? security_prepare_creds+0x49/0xa0
[ 2495.087387]  fh_verify+0x3e5/0x5f0 [nfsd]
[ 2495.092125]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
[ 2495.097517]  nfsd_dispatch+0x9e/0x210 [nfsd]
[ 2495.102489]  svc_process_common+0x386/0x6f0 [sunrpc]
[ 2495.108134]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
[ 2495.113946]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
[ 2495.118883]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
[ 2495.123651]  ? nfsd_destroy+0x50/0x50 [nfsd]
[ 2495.128604]  svc_process+0xd1/0x110 [sunrpc]
[ 2495.133563]  nfsd+0xe3/0x140 [nfsd]
[ 2495.137733]  kthread+0x112/0x130
[ 2495.141643]  ? kthread_park+0x80/0x80
[ 2495.145993]  ret_from_fork+0x35/0x40
[ 2495.150262] Modules linked in: ocfs2 quota_tree nfsd nfs_acl lockd
grace sctp libcrc32c nft_counter xt_tcpudp nft_compat nf_tables
nfnetlink binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
ocfs2_nodemanager ocfs2_stackglue cpufreq_userspace cpufreq_powersave
cpufreq_conservative configfs bonding dm_round_robin dm_multipath
scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp coretemp
kvm_intel kvm ipmi_ssif irqbypass cdc_ether usbnet intel_cstate mii
ib_mthca intel_uncore evdev iTCO_wdt joydev ipmi_si tpm_tis sg
tpm_tis_core tpm ipmi_devintf iTCO_vendor_support ib_uverbs pcspkr
watchdog ioatdma rng_core i7core_edac dca i5500_temp ipmi_msghandler
ib_umad ib_ipoib auth_rpcgss ib_cm sunrpc ib_core ip_tables x_tables
autofs4 algif_skcipher af_alg ext4 crc16 mbcache jbd2 crc32c_generic
dm_crypt dm_mod sr_mod cdrom sd_mod hid_generic usbhid hid
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200
drm_vram_helper drm_ttm_helper i2c_algo_bit ttm qla2xxx drm_kms_helper
[ 2495.150298]  ata_generic drm ata_piix aesni_intel libata nvme_fc
uhci_hcd crypto_simd nvme_fabrics ehci_pci ehci_hcd megaraid_sas cryptd
nvme_core glue_helper usbcore scsi_transport_fc lpc_ich mfd_core e1000e
scsi_mod i2c_i801 usb_common bnx2 ptp pps_core button
[ 2495.269073] CR2: 00000000000fab7d
[ 2495.273205] ---[ end trace 44f000505f987296 ]---
[ 2495.278647] RIP: 0010:_raw_spin_lock+0xc/0x20
[ 2495.283825] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
[ 2495.304287] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
[ 2495.310368] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
0000000000000000
[ 2495.318358] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
00000000000fab7d
[ 2495.326339] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
000000000002c340
[ 2495.334314] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
ffff99c903066bd0
[ 2495.342297] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
ffff99c9014c70c8
[ 2495.350279] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
knlGS:0000000000000000
[ 2495.359224] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2495.365825] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
00000000000206e0

-----8< cut here 8<-----


I also recorded some NFS traffic with tcpdump while the crashes were
occurring, but I haven't had time to analyse it yet. I can provide the
captures if that helps (but I might have to clean them up first to avoid
data leaks :)

After this had happened a few times, I had one case where one of the few
NFSv4 clients was involved, so as a workaround I migrated the NFSv4
share to XFS, since I thought the problem could be NFSv4-related. Now
that I have looked at the kernel dumps, I think it also affects NFSv3.

Is this a kernel bug, or am I missing something?

For now, I will switch my cluster back to XFS with a failover setup,
because I cannot afford this kind of crash on a central file server.



Cheers


Pierre


PS: I'll follow up with what I find in the PCAP files in a few days



-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEjXhlMuC3bFbPA3S/qC81yulnAywFAl7DuYUACgkQqC81yuln
AyyWHg/+NUMgiysN+QesGedzDjfKiUSTKqPvbg3pUklq3bXfidLdIcM6HGZt31Gy
/k33m83+675c3vQeez8uvSVAh7uNzQW2m7BGrzEv/ZkpwIZxqIwiV+gi/zc7lnGP
vqkjAqVjAapk254W3aTonZKpExKCbp6h1g1eVWKk43Mu7/UaLkQcjDPq9AyheSt+
PAPpiQeQ9ox1+OVvEnpawJmcV9QUluHPwsar/yYa87ET2LIlBswtjOHr+Vw27kJU
qwak16rTDWqVQo5PYIi5N1jdCAEAPOV6EalNNwIQ9/u+lCdbVHYRV7Ze5nE58DDz
zP7Ai9Vaj/oBr/znjbaoE2VR0iWRVqbztBxts0hNv1IxAqJivZnaaJx6bXp51NeV
kXwTz0tHqCoy8RVlUXEGP8FBuaWgSLZyfN5IOk1KoocsLUuZh/BVxh+snDj4T52k
u/SRBFV4lW4RVYWHBItik8oqfMwYaJuwp2hdmU1hHsm1UnHQTQjum1L9ZfLDI7XU
NxHMGedneJePXdWH0eTqvHAPJ1U59gEL/cMPnOJF57q+eYoCs0svnRxCwQzviBok
Zak2uzIIXr3jc9bY3qiQlZivik5b+l6LK7ccoS7eEIqxOkKivOjwijuXc0EHOUR2
c0QjFUDwJ1gR8rkBE/rOiKLXhMGnv2lcIZWiu7JlxMq+ms7f/1k=
=6VH2
-----END PGP SIGNATURE-----


* [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI)
  2020-05-19 10:48 [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI) Pierre Dinh-van
@ 2020-05-20  2:52 ` Changwei Ge
  2020-05-21 17:02   ` Pierre Dinh-van
  2020-05-20 22:01 ` herbert.van.den.bergh at oracle.com
  1 sibling, 1 reply; 8+ messages in thread
From: Changwei Ge @ 2020-05-20  2:52 UTC (permalink / raw)
  To: ocfs2-devel

Hi Pierre,

You have provided detailed information on this problem, and the research
looks very good.

On 5/19/20 6:48 PM, Pierre Dinh-van wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> Hi,
> 
> I'm experiencing quite reproducible crashes on my OCFS2 cluster over the
> last few weeks, and I think I have found part of the problem. But before
> I file a bug report (I'm not sure where), I thought I would post my
> research here in case someone understands it better than I do.
> 
> When it happens, the kernel gives me a 'general protection fault:
> 0000 [#1] SMP PTI' or 'BUG: unable to handle page fault for address:'
> message, local access to the OCFS2 volumes hangs forever, and the load
> gets so high that only a hard reset "solves" the problem. It happened
> the same way on both nodes (always on the active node serving NFS). It
> happens whether both nodes are up or only one is active.

This makes me a little puzzled: do you mean you crash the kernel
manually, or does ocfs2 crash the kernel by itself? From the backtrace
you posted, I assume that you crash the kernel manually to migrate the
NFS export IP to the other standby node, right?

> 
> First, a few words about my setup:
> 
> - storage is Fibre Channel LUNs served by an EMC VNX5200 (4x 5TB).
> 
> - LUNs are encrypted with LUKS
> 
> - LUKS volumes are formatted with OCFS2
> 
> - heartbeat=local
> 
> - The OCFS2 cluster communicates on a dedicated IP over an InfiniBand
> direct link (actually a fail-over bond of the two IB interfaces).
> 
> - nodes are 2 Debian buster hardware servers (24 cores, 128GB RAM).
> 
> - I use corosync for cluster management, but I don't think it's related
> 
> - The cluster serves 4 NFS exports (3 are NFSv3, and one is NFSv4 with
> Kerberos) and Samba shares
> 
> - The Samba shares are served in parallel by the 2 nodes, managed by ctdb
> 
> - Each NFS export is served by whichever node holds the dedicated IP
> for that export.

In fact, I am not very experienced with this kind of setup, where ocfs2
serves as the backend filesystem for NFS/Samba, but the use case seems
fine to me. Considering that you have hundreds of clients connected to a
single NFSv3/v4 server, though, do you run into any performance
limitations? This has nothing to do with the problem; it's just my
personal curiosity.

> 
> - When corosync moves the IP of an NFS export to the other node, it
> restarts nfsd to trigger a 10 s grace period on the new active node.

Sure, that looks like a normal step.

> 
> - The bug appeared with the default Debian buster kernel 4.19.0-9-amd64
> as well as with the backported kernel 5.5.0-0.bpo.2-amd64
> 
> 
> My cluster went live a few weeks ago, and it first worked with both
> nodes active. During some maintenance work where the active NFS node had
> to switch, I hit a bug that I have been able to reproduce, against my
> will, at least 20 times over the last few nights:
> 
> 1) around 100 NFS clients are connected to the exports and working
> happily with them
> 
> 2) The IPs of the NFS shares switch from nodeA to nodeB
> 
> 3) nodeB's kernel reports "general protection fault: 0000 [#1] SMP PTI"
> 
> 4) nodeB's load gets high

I suppose the reason is that ocfs2 can't successfully acquire the
spin_lock of a system inode (an ocfs2-internal structure used to store
metadata, not visible to end users). As you know, the locking path keeps
trying until it gets the lock, so it consumes a lot of CPU cycles; that
is probably the high system load you observed.

> 
> 5) access to the OCFS2 volumes hangs (a clean reboot never ends; the
> system cannot unmount the OCFS2 volumes)
> 
> 6) I power-cycle nodeB
> 
> 7) I start nodeA again
> 
> 8) nodeA shows the same symptoms as soon as some NFS clients make
> certain requests
> 
> 
> Most of the time, when the IP jumps to the other node, I get some of
> these messages:
> 
> 
> [  501.707566] (nfsd,2133,14):ocfs2_test_inode_bit:2841 ERROR: unable to
> get alloc inode in slot 65535

This is very strange. I think ocfs2 passes the invalid slot number to a
subsequent consumer function, which makes it use a dangling pointer to a
fake/illegal system inode. In other words, that is why ocfs2 can't
acquire the spin_lock of the system inode.


Can you try to fix the logic in `ocfs2_get_suballoc_slot_bit` so that it
checks whether the slot number is valid, and then run your test case
once more?

Feel free to ask my help if you need. :-)


> [  501.716660] (nfsd,2133,14):ocfs2_test_inode_bit:2867 ERROR: status = -22

Why this happens, and whether it is a separate issue, needs further
investigation.

> [  501.726585] (nfsd,2133,6):ocfs2_test_inode_bit:2841 ERROR: unable to
> get alloc inode in slot 65535
> [  501.735579] (nfsd,2133,6):ocfs2_test_inode_bit:2867 ERROR: status = -22
> 
> But this also happens when the node does not crash, so I think it's a
> separate problem.
> 
> Last night, I saved some of the kernel output while the servers were
> crashing one after the other, like in a ping-pong game:
> 
> 
> [kernel traces quoted in full snipped]
> [? 729.175190]? ? nfsd_svc+0x2c0/0x2c0 [nfsd]
> [? 729.180323]? ? nfsd_destroy+0x50/0x50 [nfsd]
> [? 729.185631]? svc_process+0xd1/0x110 [sunrpc]
> [? 729.190925]? nfsd+0xe3/0x140 [nfsd]
> [? 729.195421]? kthread+0x112/0x130
> [? 729.199636]? ? kthread_park+0x80/0x80
> [? 729.204279]? ret_from_fork+0x35/0x40
> [? 729.208818] Modules linked in: ocfs2 quota_tree dm_crypt crypto_simd
> cryptd glue_helper algif_skcipher af_alg nfsd nfs_acl lockd grace sctp
> libcrc32c nft_counter xt_tcpudp nft_compat nf_tables nfnetlink
> binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
> ocfs2_stackglue configfs bonding dm_round_robin dm_multipath
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp ipmi_ssif
> coretemp kvm_intel kvm irqbypass cdc_ether intel_cstate usbnet ib_mthca
> mii joydev intel_uncore ib_uverbs iTCO_wdt sg ioatdma
> iTCO_vendor_support i7core_edac watchdog i5500_temp pcspkr ipmi_si
> tpm_tis tpm_tis_core tpm ipmi_devintf ipmi_msghandler rng_core evdev
> acpi_cpufreq ib_umad auth_rpcgss ib_ipoib ib_cm sunrpc ib_core ip_tables
> x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod
> hid_generic sd_mod usbhid hid sr_mod cdrom uas usb_storage mgag200
> drm_vram_helper drm_ttm_helper qla2xxx ata_generic i2c_algo_bit ttm
> nvme_fc ixgbe nvme_fabrics ata_piix xfrm_algo ehci_pci uhci_hcd
> [? 729.208868]? drm_kms_helper nvme_core ehci_hcd dca libata
> megaraid_sas scsi_transport_fc libphy ptp drm usbcore scsi_mod
> crc32c_intel lpc_ich i2c_i801 pps_core mfd_core usb_common bnx2 mdio button
> [? 729.323900] ---[ end trace 212cefe75207e17f ]---
> [? 729.329644] RIP: 0010:_raw_spin_lock+0xc/0x20
> [? 729.335121] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [? 729.356192] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
> [? 729.362584] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
> 0000000000000000
> [? 729.370889] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 625f6563697665ec
> [? 729.379184] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
> 000000000002c340
> [? 729.387475] R10: 000002114c00d22b R11: 0000000000000000 R12:
> ffff91e13858bbd0
> [? 729.395767] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
> ffff91f13c03b0c8
> [? 729.404066] FS:? 0000000000000000(0000) GS:ffff91f13fa80000(0000)
> knlGS:0000000000000000
> [? 729.413336] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [? 729.420262] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
> 00000000000206e0
> 
> -----8< cut here 8<-----
> 
> Since the backtrace mentions nfsd, I tried to isolate the problem by
> dropping all incoming NFS packets on the active node before starting
> the services:
> 
> iptables -I INPUT -p tcp --dport 2049 -j DROP
> 
> Everything was fine; access over Samba worked fine.
> 
> I then started re-allowing NFS for parts of my network with iptables
> until the crash occurred.
> 
> On this one, the crash occurred after allowing only a single NFSv3 client.
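
The bisection approach described above (block everything, then re-admit clients until the fault reproduces) can be sketched roughly as follows. The client addresses are hypothetical placeholders, and the script only prints the rules instead of applying them, so they can be reviewed before running as root:

```python
# Sketch of the firewall bisection described above (my assumption of the
# procedure, not a verbatim script from the thread). Because
# "iptables -I" inserts at the head of the INPUT chain, each ACCEPT
# added later is evaluated before the initial DROP.
clients = ["192.0.2.10", "192.0.2.11", "198.51.100.0/24"]  # hypothetical

rules = ["iptables -I INPUT -p tcp --dport 2049 -j DROP"]
rules += [f"iptables -I INPUT -s {c} -p tcp --dport 2049 -j ACCEPT"
          for c in clients]

# Print instead of executing; pipe into "sudo sh" only once reviewed.
print("\n".join(rules))
```

Re-enabling one client (or subnet) per iteration narrows the trigger down to a single NFS consumer, which is how the NFSv3 client above was identified.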
> 
> -----8< nodeB 8<-----
> 
> [ 2494.893697] BUG: unable to handle page fault for address:
> 00000000000fab7d
> [ 2494.900626] #PF: supervisor write access in kernel mode
> [ 2494.905873] #PF: error_code(0x0002) - not-present page
> [ 2494.911042] PGD 0 P4D 0
> [ 2494.913596] Oops: 0002 [#1] SMP PTI
> [ 2494.917106] CPU: 11 PID: 21342 Comm: nfsd Not tainted
> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
> [ 2494.926285] Hardware name: IBM System x3650 M3
> - -[7945UHV]-/94Y7614???? , BIOS -[D6E150CUS-1.11]- 02/08/2011
> [ 2494.936077] RIP: 0010:_raw_spin_lock+0xc/0x20
> [ 2494.940461] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [ 2494.959269] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
> [ 2494.964511] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
> 0000000000000000
> [ 2494.971664] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 00000000000fab7d
> [ 2494.978815] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
> 000000000002c340
> [ 2494.985968] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
> ffff99c903066bd0
> [ 2494.993119] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
> ffff99c9014c70c8
> [ 2495.000271] FS:? 0000000000000000(0000) GS:ffff99ba03bc0000(0000)
> knlGS:0000000000000000
> [ 2495.008382] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2495.014141] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
> 00000000000206e0
> [ 2495.021291] Call Trace:
> [ 2495.023753]? igrab+0x19/0x50
> [ 2495.026693]? ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
> [ 2495.032301]? ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
> [ 2495.037741]? ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
> [ 2495.042742]? ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
> [ 2495.048525]? ocfs2_get_parent+0xa3/0x300 [ocfs2]
> [ 2495.053163]? reconnect_path+0xa1/0x2c0
> [ 2495.056943]? ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
> [ 2495.061838]? exportfs_decode_fh+0x111/0x2d0
> [ 2495.066042]? ? exp_find_key+0xcd/0x160 [nfsd]
> [ 2495.070417]? ? __kmalloc+0x180/0x270
> [ 2495.074007]? ? security_prepare_creds+0x6f/0xa0
> [ 2495.078551]? ? tomoyo_write_self+0x1b0/0x1b0
> [ 2495.082834]? ? security_prepare_creds+0x49/0xa0
> [ 2495.087387]? fh_verify+0x3e5/0x5f0 [nfsd]
> [ 2495.092125]? nfsd3_proc_getattr+0x6b/0x100 [nfsd]
> [ 2495.097517]? nfsd_dispatch+0x9e/0x210 [nfsd]
> [ 2495.102489]? svc_process_common+0x386/0x6f0 [sunrpc]
> [ 2495.108134]? ? svc_sock_secure_port+0x12/0x30 [sunrpc]
> [ 2495.113946]? ? svc_recv+0x2ff/0x9c0 [sunrpc]
> [ 2495.118883]? ? nfsd_svc+0x2c0/0x2c0 [nfsd]
> [ 2495.123651]? ? nfsd_destroy+0x50/0x50 [nfsd]
> [ 2495.128604]? svc_process+0xd1/0x110 [sunrpc]
> [ 2495.133563]? nfsd+0xe3/0x140 [nfsd]
> [ 2495.137733]? kthread+0x112/0x130
> [ 2495.141643]? ? kthread_park+0x80/0x80
> [ 2495.145993]? ret_from_fork+0x35/0x40
> [ 2495.150262] Modules linked in: ocfs2 quota_tree nfsd nfs_acl lockd
> grace sctp libcrc32c nft_counter xt_tcpudp nft_compat nf_tables
> nfnetlink binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
> ocfs2_nodemanager ocfs2_stackglue cpufreq_userspace cpufreq_powersave
> cpufreq_conservative configfs bonding dm_round_robin dm_multipath
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp coretemp
> kvm_intel kvm ipmi_ssif irqbypass cdc_ether usbnet intel_cstate mii
> ib_mthca intel_uncore evdev iTCO_wdt joydev ipmi_si tpm_tis sg
> tpm_tis_core tpm ipmi_devintf iTCO_vendor_support ib_uverbs pcspkr
> watchdog ioatdma rng_core i7core_edac dca i5500_temp ipmi_msghandler
> ib_umad ib_ipoib auth_rpcgss ib_cm sunrpc ib_core ip_tables x_tables
> autofs4 algif_skcipher af_alg ext4 crc16 mbcache jbd2 crc32c_generic
> dm_crypt dm_mod sr_mod cdrom sd_mod hid_generic usbhid hid
> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200
> drm_vram_helper drm_ttm_helper i2c_algo_bit ttm qla2xxx drm_kms_helper
> [ 2495.150298]? ata_generic drm ata_piix aesni_intel libata nvme_fc
> uhci_hcd crypto_simd nvme_fabrics ehci_pci ehci_hcd megaraid_sas cryptd
> nvme_core glue_helper usbcore scsi_transport_fc lpc_ich mfd_core e1000e
> scsi_mod i2c_i801 usb_common bnx2 ptp pps_core button
> [ 2495.269073] CR2: 00000000000fab7d
> [ 2495.273205] ---[ end trace 44f000505f987296 ]---
> [ 2495.278647] RIP: 0010:_raw_spin_lock+0xc/0x20
> [ 2495.283825] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [ 2495.304287] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
> [ 2495.310368] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
> 0000000000000000
> [ 2495.318358] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 00000000000fab7d
> [ 2495.326339] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
> 000000000002c340
> [ 2495.334314] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
> ffff99c903066bd0
> [ 2495.342297] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
> ffff99c9014c70c8
> [ 2495.350279] FS:? 0000000000000000(0000) GS:ffff99ba03bc0000(0000)
> knlGS:0000000000000000
> [ 2495.359224] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2495.365825] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
> 00000000000206e0
> 
> -----8< cut here 8<-----
> 
> 
> I also recorded some NFS traffic with tcpdump while the crashes were
> occurring, but I haven't had time to analyse it yet. I can provide the
> captures if it helps (but I might have to clean them up first to avoid
> data leaks :)
> 
> After this had happened a few times, I had one case where one of the
> few NFSv4 clients was involved, so as a workaround I migrated the
> NFSv4 share to XFS, since I thought the problem could be NFSv4-related.
> Now that I have looked at the kernel dumps, I think it affects NFSv3 as
> well.
> 
> Is this a kernel bug, or am I missing something?

Probably an ocfs2 internal error-handling bug. :-(
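
A side observation supporting a memory-corruption reading (my own, not stated elsewhere in the thread): the bogus pointer in RBX/RBP of the nodeA oops is entirely printable ASCII, and the faulting RDI sits at a fixed offset from it, as if a spinlock inside a freed, string-overwritten structure were being taken:

```python
# Decode the "pointer" registers from the nodeA oops as raw bytes.
# Printable ASCII in a pointer register usually means string data has
# overwritten a freed or reused object (use-after-free).
rbx = 0x625f656369766564
rdi = 0x625f6563697665ec

print(rbx.to_bytes(8, "little").decode("ascii"))  # -> device_b
print(hex(rdi - rbx))  # -> 0x88, a plausible lock offset in the struct
```

The nodeB oops shows the same 0x88 delta (0x...c25e - 0x...c1d6), which is consistent with `igrab()` taking `i_lock` on a stale inode pointer returned by `ocfs2_get_system_file_inode()`.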

> 
> For now, I will switch back to XFS with a failover setup for my
> cluster, because I cannot afford this kind of crash on a central file
> server.
> 
> 

Interesting. Do you mean you can use XFS as an NFSv4 backend filesystem
formatted on a shared SCSI LUN with more than one host attached, without
filesystem corruption?


Thanks,
Changwei


> 
> Cheers
> 
> 
> Pierre
> 
> 
> PS: I'll follow up in a few days with what I find in the PCAP files
> 
> 
> 
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI)
  2020-05-19 10:48 [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI) Pierre Dinh-van
  2020-05-20  2:52 ` Changwei Ge
@ 2020-05-20 22:01 ` herbert.van.den.bergh at oracle.com
  2020-06-02 10:10   ` Pierre Dinh-van
  1 sibling, 1 reply; 8+ messages in thread
From: herbert.van.den.bergh at oracle.com @ 2020-05-20 22:01 UTC (permalink / raw)
  To: ocfs2-devel

Pierre,

A similar, if not identical, bug was reported against Oracle Linux. The
fix is still being developed. I will update that bug to reference this
thread and ask the developer to submit the fix upstream and/or to the
ocfs2-devel list once it's ready.

Thanks,
Herbert.

On 5/19/20 3:48 AM, Pierre Dinh-van wrote:
> [...]
> - - The bug appeared with the default Debian buster kernel
> 4.19.0-9-amd64 and with the backported kernel 5.5.0-0.bpo.2-amd64 as
> well
>
>
> My cluster went live a few weeks ago and at first worked fine with both
> nodes active. During some maintenance work in which the active NFS node
> had to switch, I hit a bug that I have since been able to reproduce,
> against my will, at least 20 times over the last few nights:
>
> 1) around 100 NFS clients are connected to the exports and working
> happily with them
>
> 2) The IPs of the NFS shares are switching from nodeA to nodeB
>
> 3) nodeB's kernel says "general protection fault: 0000 [#1] SMP PTI"
>
> 4) nodeB's load is getting high
>
> 5) access to the OCFS2 volumes hangs (a clean reboot never ends; the
> system cannot unmount the OCFS2 volumes)
>
> 6) I power cycle nodeB
>
> 7) I start nodeA again
>
> 8) nodeA shows the same symptoms as soon as some NFS clients make
> certain requests
>
>
> Most of the time, when the IP jumps to the other node, I get some of
> these messages:
>
>
> [? 501.707566] (nfsd,2133,14):ocfs2_test_inode_bit:2841 ERROR: unable to
> get alloc inode in slot 65535
> [? 501.716660] (nfsd,2133,14):ocfs2_test_inode_bit:2867 ERROR: status = -22
> [? 501.726585] (nfsd,2133,6):ocfs2_test_inode_bit:2841 ERROR: unable to
> get alloc inode in slot 65535
> [? 501.735579] (nfsd,2133,6):ocfs2_test_inode_bit:2867 ERROR: status = -22
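
For reference (my reading, not confirmed in the thread): "status = -22" is -EINVAL, and slot 65535 is 0xffff, i.e. (u16)-1, which looks like an invalid-slot sentinel; note that R14 in the oopses below is also 000000000000ffff:

```python
import errno

# "status = -22" from ocfs2_test_inode_bit maps to EINVAL, and slot
# 65535 is 0xffff, i.e. (u16)-1 -- plausibly an "invalid slot" marker
# (interpretation mine, not confirmed by the thread).
print(errno.errorcode[22])  # -> EINVAL
print(hex(65535))           # -> 0xffff
```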
>
> But these messages also appear when the node is not crashing, so I
> think they are a separate problem.
>
> Last night, I saved some of the kernel output while the servers were
> crashing one after the other like in a ping-pong game:
>
>
> -----8< on nodeB 8<-----
>
> [? 502.070431] nfsd: nfsv4 idmapping failing: has idmapd not been started?
> [? 502.475747] general protection fault: 0000 [#1] SMP PTI
> [? 502.481027] CPU: 6 PID: 2104 Comm: nfsd Not tainted
> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
> [? 502.490016] Hardware name: IBM System x3650 M3
> - -[7945UHV]-/94Y7614???? , BIOS -[D6E150CUS-1.11]- 02/08/2011
> [? 502.499821] RIP: 0010:_raw_spin_lock+0xc/0x20
> [? 502.504206] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [? 502.523018] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
> [? 502.528277] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
> 0000000000000000
> [? 502.535433] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 0fdbbeded053c25e
> [? 502.542588] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
> 000000000002c340
> [? 502.549743] R10: 00000184d03a2115 R11: 0000000000000000 R12:
> ffff9672eff43fd0
> [? 502.556899] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
> ffff9672fff660c8
> [? 502.564055] FS:? 0000000000000000(0000) GS:ffff968203a80000(0000)
> knlGS:0000000000000000
> [? 502.572166] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [? 502.577928] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
> 00000000000206e0
> [? 502.585082] Call Trace:
> [? 502.587546]? igrab+0x19/0x50
> [? 502.590484]? ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
> [? 502.596094]? ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
> [? 502.601534]? ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
> [? 502.606536]? ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
> [? 502.612321]? ocfs2_get_parent+0xa3/0x300 [ocfs2]
> [? 502.616960]? reconnect_path+0xa1/0x2c0
> [? 502.620742]? ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
> [? 502.625636]? exportfs_decode_fh+0x111/0x2d0
> [? 502.629844]? ? exp_find_key+0xcd/0x160 [nfsd]
> [? 502.634218]? ? __kmalloc+0x180/0x270
> [? 502.637810]? ? security_prepare_creds+0x6f/0xa0
> [? 502.642356]? ? tomoyo_write_self+0x1b0/0x1b0
> [? 502.646642]? ? security_prepare_creds+0x49/0xa0
> [? 502.651995]? fh_verify+0x3e5/0x5f0 [nfsd]
> [? 502.656737]? nfsd3_proc_getattr+0x6b/0x100 [nfsd]
> [? 502.662152]? nfsd_dispatch+0x9e/0x210 [nfsd]
> [? 502.667136]? svc_process_common+0x386/0x6f0 [sunrpc]
> [? 502.672791]? ? svc_sock_secure_port+0x12/0x30 [sunrpc]
> [? 502.678628]? ? svc_recv+0x2ff/0x9c0 [sunrpc]
> [? 502.683597]? ? nfsd_svc+0x2c0/0x2c0 [nfsd]
> [? 502.688395]? ? nfsd_destroy+0x50/0x50 [nfsd]
> [? 502.693377]? svc_process+0xd1/0x110 [sunrpc]
> [? 502.698361]? nfsd+0xe3/0x140 [nfsd]
> [? 502.702561]? kthread+0x112/0x130
> [? 502.706504]? ? kthread_park+0x80/0x80
> [? 502.710886]? ret_from_fork+0x35/0x40
> [? 502.715181] Modules linked in: rpcsec_gss_krb5 nfsd nfs_acl lockd
> grace ocfs2 quota_tree sctp libcrc32c nft_counter ipt_REJECT
> nf_reject_ipv4 xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc
> ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
> cpufreq_userspace ocfs2_stackglue cpufreq_powersave cpufreq_conservative
> configfs bonding dm_round_robin dm_multipath scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua intel_powerclamp coretemp kvm_intel ipmi_ssif kvm cdc_ether
> irqbypass usbnet mii tpm_tis intel_cstate tpm_tis_core intel_uncore
> joydev tpm ipmi_si ioatdma sg iTCO_wdt ib_mthca iTCO_vendor_support
> watchdog dca pcspkr ipmi_devintf rng_core ib_uverbs ipmi_msghandler
> i5500_temp i7core_edac evdev ib_umad ib_ipoib ib_cm ib_core auth_rpcgss
> sunrpc ip_tables x_tables autofs4 algif_skcipher af_alg ext4 crc16
> mbcache jbd2 crc32c_generic dm_crypt dm_mod sd_mod hid_generic usbhid
> hid sr_mod cdrom crct10dif_pclmul crc32_pclmul crc32c_intel
> ghash_clmulni_intel mgag200 drm_vram_helper drm_ttm_helper
> [? 502.715218]? i2c_algo_bit ttm qla2xxx drm_kms_helper drm ata_generic
> ata_piix aesni_intel libata nvme_fc ehci_pci uhci_hcd crypto_simd
> ehci_hcd nvme_fabrics cryptd glue_helper lpc_ich nvme_core megaraid_sas
> mfd_core usbcore scsi_transport_fc e1000e scsi_mod i2c_i801 usb_common
> ptp pps_core bnx2 button
> [? 502.837917] ---[ end trace 6e1a8e507ac30d8d ]---
> [? 502.843379] RIP: 0010:_raw_spin_lock+0xc/0x20
> [? 502.848584] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [? 502.869295] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
> [? 502.875404] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
> 0000000000000000
> [? 502.883423] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 0fdbbeded053c25e
> [? 502.891436] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
> 000000000002c340
> [? 502.899446] R10: 00000184d03a2115 R11: 0000000000000000 R12:
> ffff9672eff43fd0
> [? 502.907455] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
> ffff9672fff660c8
> [? 502.915473] FS:? 0000000000000000(0000) GS:ffff968203a80000(0000)
> knlGS:0000000000000000
> [? 502.924496] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [? 502.931147] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
> 00000000000206e0
>
> - -----8< cut here 8<-----
>
>
> shortly after when trying to start nodeA while nodeB was down :
>
> - -----8< nodeA 8<-----
>
> [? 728.927750] general protection fault: 0000 [#1] SMP PTI
> [? 728.934647] CPU: 8 PID: 12156 Comm: nfsd Not tainted
> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
> [? 728.944800] Hardware name: IBM System x3650 M3 -[7945H2G]-/69Y4438,
> BIOS -[D6E158AUS-1.16]- 11/26/2012
> [? 728.955176] RIP: 0010:_raw_spin_lock+0xc/0x20
> [? 728.960595] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [? 728.981559] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
> [? 728.987885] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
> 0000000000000000
> [? 728.996135] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 625f6563697665ec
> [? 729.004384] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
> 000000000002c340
> [? 729.012628] R10: 000002114c00d22b R11: 0000000000000000 R12:
> ffff91e13858bbd0
> [? 729.020877] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
> ffff91f13c03b0c8
> [? 729.029108] FS:? 0000000000000000(0000) GS:ffff91f13fa80000(0000)
> knlGS:0000000000000000
> [? 729.038308] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [? 729.045161] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
> 00000000000206e0
> [? 729.053415] Call Trace:
> [? 729.056978]? igrab+0x19/0x50
> [? 729.061023]? ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
> [? 729.067736]? ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
> [? 729.074284]? ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
> [? 729.080386]? ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
> [? 729.087268]? ocfs2_get_parent+0xa3/0x300 [ocfs2]
> [? 729.092995]? reconnect_path+0xa1/0x2c0
> [? 729.097863]? ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
> [? 729.103840]? exportfs_decode_fh+0x111/0x2d0
> [? 729.109133]? ? exp_find_key+0xcd/0x160 [nfsd]
> [? 729.114574]? ? dbs_update_util_handler+0x16/0x90
> [? 729.120273]? ? __kmalloc+0x180/0x270
> [? 729.124920]? ? security_prepare_creds+0x6f/0xa0
> [? 729.130521]? ? tomoyo_write_self+0x1b0/0x1b0
> [? 729.135858]? ? security_prepare_creds+0x49/0xa0
> [? 729.141464]? fh_verify+0x3e5/0x5f0 [nfsd]
> [? 729.146536]? nfsd3_proc_getattr+0x6b/0x100 [nfsd]
> [? 729.152297]? nfsd_dispatch+0x9e/0x210 [nfsd]
> [? 729.157647]? svc_process_common+0x386/0x6f0 [sunrpc]
> [? 729.163675]? ? svc_sock_secure_port+0x12/0x30 [sunrpc]
> [? 729.169873]? ? svc_recv+0x2ff/0x9c0 [sunrpc]
> [? 729.175190]? ? nfsd_svc+0x2c0/0x2c0 [nfsd]
> [? 729.180323]? ? nfsd_destroy+0x50/0x50 [nfsd]
> [? 729.185631]? svc_process+0xd1/0x110 [sunrpc]
> [? 729.190925]? nfsd+0xe3/0x140 [nfsd]
> [? 729.195421]? kthread+0x112/0x130
> [? 729.199636]? ? kthread_park+0x80/0x80
> [? 729.204279]? ret_from_fork+0x35/0x40
> [? 729.208818] Modules linked in: ocfs2 quota_tree dm_crypt crypto_simd
> cryptd glue_helper algif_skcipher af_alg nfsd nfs_acl lockd grace sctp
> libcrc32c nft_counter xt_tcpudp nft_compat nf_tables nfnetlink
> binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
> ocfs2_stackglue configfs bonding dm_round_robin dm_multipath
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp ipmi_ssif
> coretemp kvm_intel kvm irqbypass cdc_ether intel_cstate usbnet ib_mthca
> mii joydev intel_uncore ib_uverbs iTCO_wdt sg ioatdma
> iTCO_vendor_support i7core_edac watchdog i5500_temp pcspkr ipmi_si
> tpm_tis tpm_tis_core tpm ipmi_devintf ipmi_msghandler rng_core evdev
> acpi_cpufreq ib_umad auth_rpcgss ib_ipoib ib_cm sunrpc ib_core ip_tables
> x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod
> hid_generic sd_mod usbhid hid sr_mod cdrom uas usb_storage mgag200
> drm_vram_helper drm_ttm_helper qla2xxx ata_generic i2c_algo_bit ttm
> nvme_fc ixgbe nvme_fabrics ata_piix xfrm_algo ehci_pci uhci_hcd
> [? 729.208868]? drm_kms_helper nvme_core ehci_hcd dca libata
> megaraid_sas scsi_transport_fc libphy ptp drm usbcore scsi_mod
> crc32c_intel lpc_ich i2c_i801 pps_core mfd_core usb_common bnx2 mdio button
> [? 729.323900] ---[ end trace 212cefe75207e17f ]---
> [? 729.329644] RIP: 0010:_raw_spin_lock+0xc/0x20
> [? 729.335121] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [? 729.356192] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
> [? 729.362584] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
> 0000000000000000
> [? 729.370889] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 625f6563697665ec
> [? 729.379184] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
> 000000000002c340
> [? 729.387475] R10: 000002114c00d22b R11: 0000000000000000 R12:
> ffff91e13858bbd0
> [? 729.395767] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
> ffff91f13c03b0c8
> [? 729.404066] FS:? 0000000000000000(0000) GS:ffff91f13fa80000(0000)
> knlGS:0000000000000000
> [? 729.413336] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [? 729.420262] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
> 00000000000206e0
>
> - -----8< cut here 8<-----
>
> Since the backtrace is speaking about nfsd, I tried to isolate the
> Problem by droping all NFS packet entering to the active node before
> starting the services :
>
> iptables -I INPUT -p tcp --dport 2049 -j DROP
>
> Everything fine, access over samba were doing fine.
>
> I started to allow NFS for part of my network with iptables, until the
> crash occurs.
>
> On this one, the crash occured after allowing only a NFSv3 client.
>
> - -----8< nodeB 8<-----
>
> [ 2494.893697] BUG: unable to handle page fault for address:
> 00000000000fab7d
> [ 2494.900626] #PF: supervisor write access in kernel mode
> [ 2494.905873] #PF: error_code(0x0002) - not-present page
> [ 2494.911042] PGD 0 P4D 0
> [ 2494.913596] Oops: 0002 [#1] SMP PTI
> [ 2494.917106] CPU: 11 PID: 21342 Comm: nfsd Not tainted
> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
> [ 2494.926285] Hardware name: IBM System x3650 M3
> - -[7945UHV]-/94Y7614???? , BIOS -[D6E150CUS-1.11]- 02/08/2011
> [ 2494.936077] RIP: 0010:_raw_spin_lock+0xc/0x20
> [ 2494.940461] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [ 2494.959269] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
> [ 2494.964511] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
> 0000000000000000
> [ 2494.971664] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 00000000000fab7d
> [ 2494.978815] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
> 000000000002c340
> [ 2494.985968] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
> ffff99c903066bd0
> [ 2494.993119] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
> ffff99c9014c70c8
> [ 2495.000271] FS:? 0000000000000000(0000) GS:ffff99ba03bc0000(0000)
> knlGS:0000000000000000
> [ 2495.008382] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2495.014141] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
> 00000000000206e0
> [ 2495.021291] Call Trace:
> [ 2495.023753]? igrab+0x19/0x50
> [ 2495.026693]? ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
> [ 2495.032301]? ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
> [ 2495.037741]? ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
> [ 2495.042742]? ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
> [ 2495.048525]? ocfs2_get_parent+0xa3/0x300 [ocfs2]
> [ 2495.053163]? reconnect_path+0xa1/0x2c0
> [ 2495.056943]? ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
> [ 2495.061838]? exportfs_decode_fh+0x111/0x2d0
> [ 2495.066042]? ? exp_find_key+0xcd/0x160 [nfsd]
> [ 2495.070417]? ? __kmalloc+0x180/0x270
> [ 2495.074007]? ? security_prepare_creds+0x6f/0xa0
> [ 2495.078551]? ? tomoyo_write_self+0x1b0/0x1b0
> [ 2495.082834]? ? security_prepare_creds+0x49/0xa0
> [ 2495.087387]? fh_verify+0x3e5/0x5f0 [nfsd]
> [ 2495.092125]? nfsd3_proc_getattr+0x6b/0x100 [nfsd]
> [ 2495.097517]? nfsd_dispatch+0x9e/0x210 [nfsd]
> [ 2495.102489]? svc_process_common+0x386/0x6f0 [sunrpc]
> [ 2495.108134]? ? svc_sock_secure_port+0x12/0x30 [sunrpc]
> [ 2495.113946]? ? svc_recv+0x2ff/0x9c0 [sunrpc]
> [ 2495.118883]? ? nfsd_svc+0x2c0/0x2c0 [nfsd]
> [ 2495.123651]? ? nfsd_destroy+0x50/0x50 [nfsd]
> [ 2495.128604]? svc_process+0xd1/0x110 [sunrpc]
> [ 2495.133563]? nfsd+0xe3/0x140 [nfsd]
> [ 2495.137733]? kthread+0x112/0x130
> [ 2495.141643]? ? kthread_park+0x80/0x80
> [ 2495.145993]? ret_from_fork+0x35/0x40
> [ 2495.150262] Modules linked in: ocfs2 quota_tree nfsd nfs_acl lockd
> grace sctp libcrc32c nft_counter xt_tcpudp nft_compat nf_tables
> nfnetlink binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
> ocfs2_nodemanager ocfs2_stackglue cpufreq_userspace cpufreq_powersave
> cpufreq_conservative configfs bonding dm_round_robin dm_multipath
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp coretemp
> kvm_intel kvm ipmi_ssif irqbypass cdc_ether usbnet intel_cstate mii
> ib_mthca intel_uncore evdev iTCO_wdt joydev ipmi_si tpm_tis sg
> tpm_tis_core tpm ipmi_devintf iTCO_vendor_support ib_uverbs pcspkr
> watchdog ioatdma rng_core i7core_edac dca i5500_temp ipmi_msghandler
> ib_umad ib_ipoib auth_rpcgss ib_cm sunrpc ib_core ip_tables x_tables
> autofs4 algif_skcipher af_alg ext4 crc16 mbcache jbd2 crc32c_generic
> dm_crypt dm_mod sr_mod cdrom sd_mod hid_generic usbhid hid
> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200
> drm_vram_helper drm_ttm_helper i2c_algo_bit ttm qla2xxx drm_kms_helper
> [ 2495.150298]? ata_generic drm ata_piix aesni_intel libata nvme_fc
> uhci_hcd crypto_simd nvme_fabrics ehci_pci ehci_hcd megaraid_sas cryptd
> nvme_core glue_helper usbcore scsi_transport_fc lpc_ich mfd_core e1000e
> scsi_mod i2c_i801 usb_common bnx2 ptp pps_core button
> [ 2495.269073] CR2: 00000000000fab7d
> [ 2495.273205] ---[ end trace 44f000505f987296 ]---
> [ 2495.278647] RIP: 0010:_raw_spin_lock+0xc/0x20
> [ 2495.283825] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [ 2495.304287] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
> [ 2495.310368] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
> 0000000000000000
> [ 2495.318358] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 00000000000fab7d
> [ 2495.326339] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
> 000000000002c340
> [ 2495.334314] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
> ffff99c903066bd0
> [ 2495.342297] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
> ffff99c9014c70c8
> [ 2495.350279] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
> knlGS:0000000000000000
> [ 2495.359224] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2495.365825] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
> 00000000000206e0
>
> - -----8< cut here 8<-----
>
>
> I also made some recordings of NFS traffic with tcpdump when the crashes
> are occurring, but I didn't have time to analyse them. I can provide them
> if it helps (but I might have to clean them up first to avoid some data leaks :)
>
> After this happened a few times, I had one case where one of the few
> NFSv4 clients was involved, and as a workaround I migrated the
> NFSv4 share to XFS since I thought it could be NFSv4-related. Now that I
> have looked at the kernel dumps, I think it's also NFSv3-related.
>
> Is it a kernel bug or am I missing something?
>
> For now, I will switch back to XFS with a failover setup of my cluster,
> because I cannot afford this kind of crash of a central file server.
>
>
>
> Cheers
>
>
> Pierre
>
>
> PS : I'll try to update with what I found in the PCAP files in some days
>
>
>
> -----BEGIN PGP SIGNATURE-----
>
> iQIzBAEBCgAdFiEEjXhlMuC3bFbPA3S/qC81yulnAywFAl7DuYUACgkQqC81yuln
> AyyWHg/+NUMgiysN+QesGedzDjfKiUSTKqPvbg3pUklq3bXfidLdIcM6HGZt31Gy
> /k33m83+675c3vQeez8uvSVAh7uNzQW2m7BGrzEv/ZkpwIZxqIwiV+gi/zc7lnGP
> vqkjAqVjAapk254W3aTonZKpExKCbp6h1g1eVWKk43Mu7/UaLkQcjDPq9AyheSt+
> PAPpiQeQ9ox1+OVvEnpawJmcV9QUluHPwsar/yYa87ET2LIlBswtjOHr+Vw27kJU
> qwak16rTDWqVQo5PYIi5N1jdCAEAPOV6EalNNwIQ9/u+lCdbVHYRV7Ze5nE58DDz
> zP7Ai9Vaj/oBr/znjbaoE2VR0iWRVqbztBxts0hNv1IxAqJivZnaaJx6bXp51NeV
> kXwTz0tHqCoy8RVlUXEGP8FBuaWgSLZyfN5IOk1KoocsLUuZh/BVxh+snDj4T52k
> u/SRBFV4lW4RVYWHBItik8oqfMwYaJuwp2hdmU1hHsm1UnHQTQjum1L9ZfLDI7XU
> NxHMGedneJePXdWH0eTqvHAPJ1U59gEL/cMPnOJF57q+eYoCs0svnRxCwQzviBok
> Zak2uzIIXr3jc9bY3qiQlZivik5b+l6LK7ccoS7eEIqxOkKivOjwijuXc0EHOUR2
> c0QjFUDwJ1gR8rkBE/rOiKLXhMGnv2lcIZWiu7JlxMq+ms7f/1k=
> =6VH2
> -----END PGP SIGNATURE-----
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI)
  2020-05-20  2:52 ` Changwei Ge
@ 2020-05-21 17:02   ` Pierre Dinh-van
  2020-05-22  1:35     ` Changwei Ge
  0 siblings, 1 reply; 8+ messages in thread
From: Pierre Dinh-van @ 2020-05-21 17:02 UTC (permalink / raw)
  To: ocfs2-devel

On 5/20/20 4:52 AM, Changwei Ge wrote:
> Hi Pierre,

Hi Changwei,

> You indeed provide detailed information on this problem and the
research looks very nice.

Thank you

> On 5/19/20 6:48 PM, Pierre Dinh-van wrote:
>>
> Hi,
>
> I'm experiencing quite reproducable crashs on my OCFS2 cluster the last
> weeks, and I think I found out part of the Problem. But before I make a
> bug report (I'm not sure where), I though I could post my research here
> in case someone understands it better than I.
>
> When it happens, the kernel is giving me a 'general protection fault:
> 0000 [#1] SMP PTI' or ' BUG: unable to handle page fault for address:'
> message, local access to the OCFS2 volumes are all hanging forever, and
> the load is getting so high that only a hard reset "solves" the problem.
> It happened the same way on both nodes (always on the active node
> serving NFS). It happens when both nodes are up, or when only one is
> active.
>
> > This makes me a little puzzled: do you mean you crash the kernel
> manually, or does ocfs2 crash the kernel itself? From the backtrace you
> posted, I assume that you crash the kernel manually to migrate the NFS
> export IP to the other standby node, right?
>
I'm not crashing the kernel manually (I wouldn't know how to). Ocfs2 is
crashing the kernel as soon as NFS requests come in, after the node has
started serving NFS.

Maybe I'm using the term "crash the kernel" the wrong way. Actually, only
the OCFS2 part of the system stops responding after this crash. I can
still log in with SSH and try to do things.


> First, a few words about my setup :

> In fact, I am not quite experienced in this kind of setup where ocfs2
serves as NFS/Samba backend filesystem, but the use case seems no
problem to me. But considering that you have hundreds of clients
connected to a single NFS/v4 server, do you have a performance
limitation? This has nothing to do with the problem, just my personal curiosity.

Most of the NFS clients are Linux desktop clients, some Mac OS X too. The
accesses are not really I/O intensive, so I don't have performance issues
from having only one active node. Actually, the first performance problem
we had was o2cb speaking over a 1Gb Ethernet link, which increased the
lstat() time a lot compared to the XFS we had before. The cluster setup
was mostly designed to give us an easy failover system, and OCFS2 gave
the impression of reducing the risk of a corrupt file system (and
allowed samba to be load balanced too)


> 4) nodeB's load is getting high
>
> > I suppose the reason is that ocfs2 can't successfully acquire the
> spin_lock of a system inode (an ocfs2-internal mechanism to store its
> metadata, not visible to end users). As you know, the locking path won't
> stop trying until it gets the lock, so it consumes a lot of CPU cycles;
> you probably observed a high system workload.


So a possible path would be something like this? (I don't know much
about the internals of NFS locking and POSIX locking, so it's a guess):

NFS client has a lock -> server is rebooted and forgets about the locks
-> NFS client requests the lock again (maybe ignoring the grace time or
racing with the nfsd restart). nfsd asks the FS about an unknown lock
and that somehow crashes the OCFS2 module.


> Most of the time, when the IP jumps to the other node, I get some of
> these messages:
>
>
> [  501.707566] (nfsd,2133,14):ocfs2_test_inode_bit:2841 ERROR: unable to
> get alloc inode in slot 65535
>
> > This is very strange, I think ocfs2 passes the invalid slot number
> to a subsequent consumer function, which makes it use a dangling pointer
> to a fake/illegal system inode. In other words, this is why ocfs2 can't
> acquire the spin_lock of the system inode.
>
>
> > Can you try to fix the logic in `ocfs2_get_suballoc_slot_bit`
> to judge whether the slot number is valid? And run your case once more?
>
> > Feel free to ask my help if you need. :-)
>
I never looked at the code of OCFS2 and I'm not familiar with filesystem
internals. I actually wrote to ocfs2-devel because ocfs2-users is not
letting my mails through :).

For testing the case, I will try something on the standby node with the
original ocfs2 volumes, as soon as I'm sure that it's safe for the
production server. But as for checking the logic in
ocfs2_get_suballoc_slot_bit: I looked at the code and don't really
understand it.


>
> [  501.716660] (nfsd,2133,14):ocfs2_test_inode_bit:2867 ERROR: status
> = -22
>
> > Why this happens, and whether it is another issue, needs further investigation.
>
> [  501.726585] (nfsd,2133,6):ocfs2_test_inode_bit:2841 ERROR: unable to
> get alloc inode in slot 65535
> [  501.735579] (nfsd,2133,6):ocfs2_test_inode_bit:2867 ERROR: status = -22
>
> But it also happens when the node is not crashing, so I think it's
> another problem.
>
> Last night, I saved some of the kernel output while the servers were
> crashing one after the other like in a ping-pong game:
>
>
> -----8< on nodeB  8<-----
>
> [  502.070431] nfsd: nfsv4 idmapping failing: has idmapd not been started?
> [  502.475747] general protection fault: 0000 [#1] SMP PTI
> [  502.481027] CPU: 6 PID: 2104 Comm: nfsd Not tainted
> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
> []...
> I also made some recordings of NFS traffic with tcpdump when the crashes
> are occurring, but I didn't have time to analyse them. I can provide them
> if it helps (but I might have to clean them up first to avoid some data leaks :)
>
> After this happened a few times, I had one case where one of the few
> NFSv4 clients was involved, and as a workaround I migrated the
> NFSv4 share to XFS since I thought it could be NFSv4-related. Now that I
> have looked at the kernel dumps, I think it's also NFSv3-related.
>
> Is it a kernel bug or am I missing something?
>
> > Probably an ocfs2 internal error-handling bug. :-(

I'll check if I can reproduce it in a safe testing environment.

Is there anything I should also do while reproducing to get a better
understanding of the failure path? I never did kernel debugging
before.


>
> For now, I will switch back to XFS with a failover setup of my cluster,
> because I cannot afford this kind of crash of a central file server.
>
>
>
> > Interesting, do you mean you can use XFS as a NFSv4 backend
> filesystem which is formatted on a shared SCSI LUN with more than one
> host attached without filesystem corruption?
>
>
I switched back to a single-active-node setup. The second node is in
standby and doesn't touch the XFS volumes as long as the first node is
still using them.

Should be safe for now.


Cheers


Pierre



-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20200521/5cc7d0b6/attachment.bin 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI)
  2020-05-21 17:02   ` Pierre Dinh-van
@ 2020-05-22  1:35     ` Changwei Ge
  0 siblings, 0 replies; 8+ messages in thread
From: Changwei Ge @ 2020-05-22  1:35 UTC (permalink / raw)
  To: ocfs2-devel



On 5/22/20 1:02 AM, Pierre Dinh-van wrote:
> On 5/20/20 4:52 AM, Changwei Ge wrote:
>> Hi Pierre,
> 
> Hi Changwei,
> 
>> You indeed provide detailed information on this problem and the
> research looks very nice.
> 
> Thank you
> 
>> On 5/19/20 6:48 PM, Pierre Dinh-van wrote:
>>>
>> Hi,
>>
>> I'm experiencing quite reproducable crashs on my OCFS2 cluster the last
>> weeks, and I think I found out part of the Problem. But before I make a
>> bug report (I'm not sure where), I though I could post my research here
>> in case someone understands it better than I.
>>
>> When it happens, the kernel is giving me a 'general protection fault:
>> 0000 [#1] SMP PTI' or ' BUG: unable to handle page fault for address:'
>> message, local access to the OCFS2 volumes are all hanging forever, and
>> the load is getting so high that only a hard reset "solves" the problem.
>> It happened the same way on both nodes (always on the active node
>> serving NFS). It happens when both nodes are up, or when only one is
>> active.
>>
>>> This makes me a little puzzled: do you mean you crash the kernel
>> manually, or does ocfs2 crash the kernel itself? From the backtrace you
>> posted, I assume that you crash the kernel manually to migrate the NFS
>> export IP to the other standby node, right?
>>
> I'm not crashing the kernel manually (I wouldn't know how to). Ocfs2 is
> crashing the kernel as soon as NFS requests come in, after the node has
> started serving NFS.
> 
> Maybe I'm using the term "crash the kernel" the wrong way. Actually, only
> the OCFS2 part of the system stops responding after this crash. I can
> still log in with SSH and try to do things.
> 
> 
>> First, a few words about my setup :
> 
>> In fact, I am not quite experienced in this kind of setup where ocfs2
> serves as NFS/Samba backend filesystem, but the use case seems no
> problem to me. But considering that you have hundreds of clients
> connected to a single NFS/v4 server, do you have a performance
> limitation? This has nothing to do with the problem, just my personal curiosity.
> 
> Most of the NFS clients are Linux desktop clients, some Mac OS X too. The
> accesses are not really I/O intensive, so I don't have performance issues
> from having only one active node. Actually, the first performance problem
> we had was o2cb speaking over a 1Gb Ethernet link, which increased the
> lstat() time a lot compared to the XFS we had before. The cluster setup
> was mostly designed to give us an easy failover system, and OCFS2 gave
> the impression of reducing the risk of a corrupt file system (and
> allowed samba to be load balanced too)

For the load-balancing use case, it might have an effect on ocfs2
performance. IMO, it works well for the high-availability scenario.

> 
> 
>> 4) nodeB's load is getting high
>>
>>> I suppose the reason is that ocfs2 can't successfully acquire the
>> spin_lock of a system inode (an ocfs2-internal mechanism to store its
>> metadata, not visible to end users). As you know, the locking path won't
>> stop trying until it gets the lock, so it consumes a lot of CPU cycles;
>> you probably observed a high system workload.
> 
> 
> So a possible path would be something like this? (I don't know much
> about the internals of NFS locking and POSIX locking, so it's a guess):
> 
> NFS client has a lock -> server is rebooted and forgets about the locks
> -> NFS client requests the lock again (maybe ignoring the grace time or
> racing with the nfsd restart). nfsd asks the FS about an unknown lock
> and that somehow crashes the OCFS2 module.
> 

Not really; the lock I mentioned is neither a POSIX lock nor part of the
NFSv4 lock service. It's a kernel-internal synchronization mechanism.

> 
>> Most of the time, when the IP jumps to the other node, I get some of
>> these messages:
>>
>>
>> [  501.707566] (nfsd,2133,14):ocfs2_test_inode_bit:2841 ERROR: unable to
>> get alloc inode in slot 65535
>>
>>> This is very strange, I think ocfs2 passes the invalid slot number
>> to a subsequent consumer function, which makes it use a dangling pointer
>> to a fake/illegal system inode. In other words, this is why ocfs2 can't
>> acquire the spin_lock of the system inode.
>>
>>
>>> Can you try to fix the logic in `ocfs2_get_suballoc_slot_bit`
>> to judge whether the slot number is valid? And run your case once more?
>>
>>> Feel free to ask my help if you need. :-)
>>
> I never looked at the code of OCFS2 and I'm not familiar with filesystem
> internals. I actually wrote to ocfs2-devel because ocfs2-users is not
> letting my mails through :).
> 
> For testing the case, I will try something on the standby node with the
> original ocfs2 volumes, as soon as I'm sure that it's safe for the
> production server. But as for checking the logic in
> ocfs2_get_suballoc_slot_bit: I looked at the code and don't really
> understand it.
> 

Not a problem. I will see if I can allocate some bandwidth to fix the
error-checking logic.

Thanks,
Changwei

> 
>>
>> [  501.716660] (nfsd,2133,14):ocfs2_test_inode_bit:2867 ERROR: status
>> = -22
>>
>>> Why this happens, and whether it is another issue, needs further investigation.
>>
>> [  501.726585] (nfsd,2133,6):ocfs2_test_inode_bit:2841 ERROR: unable to
>> get alloc inode in slot 65535
>> [  501.735579] (nfsd,2133,6):ocfs2_test_inode_bit:2867 ERROR: status = -22
>>
>> But it also happens when the node is not crashing, so I think it's
>> another problem.
>>
>> Last night, I saved some of the kernel output while the servers were
>> crashing one after the other like in a ping-pong game:
>>
>>
>> -----8< on nodeB  8<-----
>>
>> [  502.070431] nfsd: nfsv4 idmapping failing: has idmapd not been started?
>> [  502.475747] general protection fault: 0000 [#1] SMP PTI
>> [  502.481027] CPU: 6 PID: 2104 Comm: nfsd Not tainted
>> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
>> []...
>> I also made some recordings of NFS traffic with tcpdump when the crashes
>> are occurring, but I didn't have time to analyse them. I can provide them
>> if it helps (but I might have to clean them up first to avoid some data leaks :)
>>
>> After this happened a few times, I had one case where one of the few
>> NFSv4 clients was involved, and as a workaround I migrated the
>> NFSv4 share to XFS since I thought it could be NFSv4-related. Now that I
>> have looked at the kernel dumps, I think it's also NFSv3-related.
>>
>> Is it a kernel bug or am I missing something?
>>
>>> Probably an ocfs2 internal error-handling bug. :-(
> 
> I'll check if I can reproduce it in a safe testing environment.
> 
> Is there anything I should also do while reproducing to get a better
> understanding of the failure path? I never did kernel debugging
> before.
> 
> 
>>
>> For now, I will switch back to XFS with a failover setup of my cluster,
>> because I cannot afford this kind of crash of a central file server.
>>
>>
>>
>>> Interesting, do you mean you can use XFS as a NFSv4 backend
>> filesystem which is formatted on a shared SCSI LUN with more than one
>> host attached without filesystem corruption?
>>
>>
> I switched back to a single-active-node setup. The second node is in
> standby and doesn't touch the XFS volumes as long as the first node is
> still using them.
> 
> Should be safe for now.
> 
> 
> Cheers
> 
> 
> Pierre
> 
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI)
  2020-05-20 22:01 ` herbert.van.den.bergh at oracle.com
@ 2020-06-02 10:10   ` Pierre Dinh-van
  2020-06-02 15:56     ` herbert.van.den.bergh at oracle.com
  0 siblings, 1 reply; 8+ messages in thread
From: Pierre Dinh-van @ 2020-06-02 10:10 UTC (permalink / raw)
  To: ocfs2-devel

Hi Herbert,

Is there a bug tracker for it somewhere? Or are the Oracle bugs not
public anymore?

Browsing/searching bugzilla doesn't show me any bugs:
https://bugzilla.oracle.com/bugzilla/describecomponents.cgi

Cheers

Pierre

On 5/21/20 12:01 AM, herbert.van.den.bergh at oracle.com wrote:
> Pierre,
>
> A similar if not identical bug was reported against Oracle Linux. The
> fix is still being developed. I will update that bug to reference this
> thread, and ask the developer to submit the fix upstream and/or to the
> ocfs2-devel list once it's ready.
>
> Thanks,
> Herbert.
>
> On 5/19/20 3:48 AM, Pierre Dinh-van wrote:
> Hi,
> 
> I'm experiencing quite reproducible crashes on my OCFS2 cluster in the
> last weeks, and I think I found part of the problem. But before I file a
> bug report (I'm not sure where), I thought I could post my research here
> in case someone understands it better than I do.
> 
> When it happens, the kernel gives me a 'general protection fault:
> 0000 [#1] SMP PTI' or 'BUG: unable to handle page fault for address:'
> message, local accesses to the OCFS2 volumes all hang forever, and
> the load gets so high that only a hard reset "solves" the problem.
> It happened the same way on both nodes (always on the active node
> serving NFS). It happens when both nodes are up, or when only one is
> active.
> 
> First, a few words about my setup :
> 
> - storage is Fibre Channel LUNs served by an EMC VNX5200 (4x 5TB).
> 
> - LUNs are encrypted with LUKS
> 
> - LUKS volumes are formatted with OCFS2
> 
> - heartbeat=local
> 
> - The OCFS2 cluster is communicating on a dedicated IP over an InfiniBand
> direct link (actually a fail-over bonding of the 2 ib interfaces).
> 
> - nodes are 2 Debian buster hardware servers (24 cores, 128GB RAM).
> 
> - I use corosync for cluster management, but I don't think it's related
> 
> - The cluster serves 4 NFS exports (3 are NFSv3, and one is NFSv4 with
> kerberos) and samba shares
> 
> - The samba shares are served in parallel by the 2 nodes, managed by
> ctdb
> 
> - The NFS exports are each served by the node holding the dedicated IP
> for that export.
> 
> - When corosync moves the IP of an NFS export to the other node, it
> restarts nfsd to issue a grace time of 10s on the new active node.
> 
> - The bug appeared with the default Debian buster kernel 4.19.0-9-amd64
> and with the backported kernel 5.5.0-0.bpo.2-amd64 as well
> 
> 
> My cluster went live a few weeks ago, and it first worked with both
> nodes active. During some maintenance work where the active NFS node had
> to switch, I hit a bug that I was able to reproduce against my will
> at least 20 times in the last nights:
> 
> 1) around 100 NFS clients are connected with the exports and happy
> working with it
> 
> 2) The IPs of the NFS shares are switching from nodeA to nodeB
> 
> 3) nodeB's kernel says "general protection fault: 0000 [#1] SMP PTI"
> 
> 4) nodeB's load is getting high
> 
> 5) access to the OCFS2 is hanging (clean reboot never end, system cannot
> unmount OCFS volumes)
> 
> 6) I power cycle nodeB
> 
> 7) I start nodeA again
> 
> 8) nodeA give the same symptoms, as soon as some NFS clients do some
> special requests
> 
> 
> Most of the time, when the IP jumps to the other node, I get some of
> this messages :
> 
> 
> [  501.707566] (nfsd,2133,14):ocfs2_test_inode_bit:2841 ERROR: unable to
> get alloc inode in slot 65535
> [  501.716660] (nfsd,2133,14):ocfs2_test_inode_bit:2867 ERROR: status
> = -22
> [  501.726585] (nfsd,2133,6):ocfs2_test_inode_bit:2841 ERROR: unable to
> get alloc inode in slot 65535
> [  501.735579] (nfsd,2133,6):ocfs2_test_inode_bit:2867 ERROR: status =
> -22
> 
> But it also happens when the node is not crashing, so I think it's
> another problem.
> 
> Last night, I saved some of the kernel output while the servers were
> crashing one after the other like in a ping-pong game:
> 
> 
> -----8< on nodeB  8<-----
> 
> [  502.070431] nfsd: nfsv4 idmapping failing: has idmapd not been
> started?
> [  502.475747] general protection fault: 0000 [#1] SMP PTI
> [  502.481027] CPU: 6 PID: 2104 Comm: nfsd Not tainted
> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
> [  502.490016] Hardware name: IBM System x3650 M3
> -[7945UHV]-/94Y7614     , BIOS -[D6E150CUS-1.11]- 02/08/2011
> [  502.499821] RIP: 0010:_raw_spin_lock+0xc/0x20
> [  502.504206] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [  502.523018] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
> [  502.528277] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
> 0000000000000000
> [  502.535433] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 0fdbbeded053c25e
> [  502.542588] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
> 000000000002c340
> [  502.549743] R10: 00000184d03a2115 R11: 0000000000000000 R12:
> ffff9672eff43fd0
> [  502.556899] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
> ffff9672fff660c8
> [  502.564055] FS:  0000000000000000(0000) GS:ffff968203a80000(0000)
> knlGS:0000000000000000
> [  502.572166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  502.577928] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
> 00000000000206e0
> [  502.585082] Call Trace:
> [  502.587546]  igrab+0x19/0x50
> [  502.590484]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
> [  502.596094]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
> [  502.601534]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
> [  502.606536]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
> [  502.612321]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
> [  502.616960]  reconnect_path+0xa1/0x2c0
> [  502.620742]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
> [  502.625636]  exportfs_decode_fh+0x111/0x2d0
> [  502.629844]  ? exp_find_key+0xcd/0x160 [nfsd]
> [  502.634218]  ? __kmalloc+0x180/0x270
> [  502.637810]  ? security_prepare_creds+0x6f/0xa0
> [  502.642356]  ? tomoyo_write_self+0x1b0/0x1b0
> [  502.646642]  ? security_prepare_creds+0x49/0xa0
> [  502.651995]  fh_verify+0x3e5/0x5f0 [nfsd]
> [  502.656737]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
> [  502.662152]  nfsd_dispatch+0x9e/0x210 [nfsd]
> [  502.667136]  svc_process_common+0x386/0x6f0 [sunrpc]
> [  502.672791]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
> [  502.678628]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
> [  502.683597]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
> [  502.688395]  ? nfsd_destroy+0x50/0x50 [nfsd]
> [  502.693377]  svc_process+0xd1/0x110 [sunrpc]
> [  502.698361]  nfsd+0xe3/0x140 [nfsd]
> [  502.702561]  kthread+0x112/0x130
> [  502.706504]  ? kthread_park+0x80/0x80
> [  502.710886]  ret_from_fork+0x35/0x40
> [  502.715181] Modules linked in: rpcsec_gss_krb5 nfsd nfs_acl lockd
> grace ocfs2 quota_tree sctp libcrc32c nft_counter ipt_REJECT
> nf_reject_ipv4 xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc
> ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
> cpufreq_userspace ocfs2_stackglue cpufreq_powersave cpufreq_conservative
> configfs bonding dm_round_robin dm_multipath scsi_dh_rdac scsi_dh_emc
> scsi_dh_alua intel_powerclamp coretemp kvm_intel ipmi_ssif kvm cdc_ether
> irqbypass usbnet mii tpm_tis intel_cstate tpm_tis_core intel_uncore
> joydev tpm ipmi_si ioatdma sg iTCO_wdt ib_mthca iTCO_vendor_support
> watchdog dca pcspkr ipmi_devintf rng_core ib_uverbs ipmi_msghandler
> i5500_temp i7core_edac evdev ib_umad ib_ipoib ib_cm ib_core auth_rpcgss
> sunrpc ip_tables x_tables autofs4 algif_skcipher af_alg ext4 crc16
> mbcache jbd2 crc32c_generic dm_crypt dm_mod sd_mod hid_generic usbhid
> hid sr_mod cdrom crct10dif_pclmul crc32_pclmul crc32c_intel
> ghash_clmulni_intel mgag200 drm_vram_helper drm_ttm_helper
> [  502.715218]  i2c_algo_bit ttm qla2xxx drm_kms_helper drm ata_generic
> ata_piix aesni_intel libata nvme_fc ehci_pci uhci_hcd crypto_simd
> ehci_hcd nvme_fabrics cryptd glue_helper lpc_ich nvme_core megaraid_sas
> mfd_core usbcore scsi_transport_fc e1000e scsi_mod i2c_i801 usb_common
> ptp pps_core bnx2 button
> [  502.837917] ---[ end trace 6e1a8e507ac30d8d ]---
> [  502.843379] RIP: 0010:_raw_spin_lock+0xc/0x20
> [  502.848584] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [  502.869295] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
> [  502.875404] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
> 0000000000000000
> [  502.883423] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 0fdbbeded053c25e
> [  502.891436] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
> 000000000002c340
> [  502.899446] R10: 00000184d03a2115 R11: 0000000000000000 R12:
> ffff9672eff43fd0
> [  502.907455] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
> ffff9672fff660c8
> [  502.915473] FS:  0000000000000000(0000) GS:ffff968203a80000(0000)
> knlGS:0000000000000000
> [  502.924496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  502.931147] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
> 00000000000206e0
> 
> -----8< cut here 8<-----
> 
> 
> shortly after when trying to start nodeA while nodeB was down :
> 
> -----8< nodeA 8<-----
> 
> [  728.927750] general protection fault: 0000 [#1] SMP PTI
> [  728.934647] CPU: 8 PID: 12156 Comm: nfsd Not tainted
> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
> [  728.944800] Hardware name: IBM System x3650 M3 -[7945H2G]-/69Y4438,
> BIOS -[D6E158AUS-1.16]- 11/26/2012
> [  728.955176] RIP: 0010:_raw_spin_lock+0xc/0x20
> [  728.960595] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [  728.981559] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
> [  728.987885] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
> 0000000000000000
> [  728.996135] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 625f6563697665ec
> [  729.004384] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
> 000000000002c340
> [  729.012628] R10: 000002114c00d22b R11: 0000000000000000 R12:
> ffff91e13858bbd0
> [  729.020877] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
> ffff91f13c03b0c8
> [  729.029108] FS:  0000000000000000(0000) GS:ffff91f13fa80000(0000)
> knlGS:0000000000000000
> [  729.038308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  729.045161] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
> 00000000000206e0
> [  729.053415] Call Trace:
> [  729.056978]  igrab+0x19/0x50
> [  729.061023]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
> [  729.067736]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
> [  729.074284]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
> [  729.080386]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
> [  729.087268]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
> [  729.092995]  reconnect_path+0xa1/0x2c0
> [  729.097863]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
> [  729.103840]  exportfs_decode_fh+0x111/0x2d0
> [  729.109133]  ? exp_find_key+0xcd/0x160 [nfsd]
> [  729.114574]  ? dbs_update_util_handler+0x16/0x90
> [  729.120273]  ? __kmalloc+0x180/0x270
> [  729.124920]  ? security_prepare_creds+0x6f/0xa0
> [  729.130521]  ? tomoyo_write_self+0x1b0/0x1b0
> [  729.135858]  ? security_prepare_creds+0x49/0xa0
> [  729.141464]  fh_verify+0x3e5/0x5f0 [nfsd]
> [  729.146536]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
> [  729.152297]  nfsd_dispatch+0x9e/0x210 [nfsd]
> [  729.157647]  svc_process_common+0x386/0x6f0 [sunrpc]
> [  729.163675]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
> [  729.169873]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
> [  729.175190]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
> [  729.180323]  ? nfsd_destroy+0x50/0x50 [nfsd]
> [  729.185631]  svc_process+0xd1/0x110 [sunrpc]
> [  729.190925]  nfsd+0xe3/0x140 [nfsd]
> [  729.195421]  kthread+0x112/0x130
> [  729.199636]  ? kthread_park+0x80/0x80
> [  729.204279]  ret_from_fork+0x35/0x40
> [  729.208818] Modules linked in: ocfs2 quota_tree dm_crypt crypto_simd
> cryptd glue_helper algif_skcipher af_alg nfsd nfs_acl lockd grace sctp
> libcrc32c nft_counter xt_tcpudp nft_compat nf_tables nfnetlink
> binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
> ocfs2_stackglue configfs bonding dm_round_robin dm_multipath
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp ipmi_ssif
> coretemp kvm_intel kvm irqbypass cdc_ether intel_cstate usbnet ib_mthca
> mii joydev intel_uncore ib_uverbs iTCO_wdt sg ioatdma
> iTCO_vendor_support i7core_edac watchdog i5500_temp pcspkr ipmi_si
> tpm_tis tpm_tis_core tpm ipmi_devintf ipmi_msghandler rng_core evdev
> acpi_cpufreq ib_umad auth_rpcgss ib_ipoib ib_cm sunrpc ib_core ip_tables
> x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod
> hid_generic sd_mod usbhid hid sr_mod cdrom uas usb_storage mgag200
> drm_vram_helper drm_ttm_helper qla2xxx ata_generic i2c_algo_bit ttm
> nvme_fc ixgbe nvme_fabrics ata_piix xfrm_algo ehci_pci uhci_hcd
> [  729.208868]  drm_kms_helper nvme_core ehci_hcd dca libata
> megaraid_sas scsi_transport_fc libphy ptp drm usbcore scsi_mod
> crc32c_intel lpc_ich i2c_i801 pps_core mfd_core usb_common bnx2 mdio
> button
> [  729.323900] ---[ end trace 212cefe75207e17f ]---
> [  729.329644] RIP: 0010:_raw_spin_lock+0xc/0x20
> [  729.335121] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [  729.356192] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
> [  729.362584] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
> 0000000000000000
> [  729.370889] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 625f6563697665ec
> [  729.379184] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
> 000000000002c340
> [  729.387475] R10: 000002114c00d22b R11: 0000000000000000 R12:
> ffff91e13858bbd0
> [  729.395767] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
> ffff91f13c03b0c8
> [  729.404066] FS:  0000000000000000(0000) GS:ffff91f13fa80000(0000)
> knlGS:0000000000000000
> [  729.413336] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  729.420262] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
> 00000000000206e0
> 
> -----8< cut here 8<-----
> 
> Since the backtrace mentions nfsd, I tried to isolate the problem by
> dropping all NFS packets destined for the active node before starting
> the services:
> 
> iptables -I INPUT -p tcp --dport 2049 -j DROP
> 
> Everything was fine; access over Samba worked without issues.
> 
> I then allowed NFS for one part of my network at a time with iptables,
> until the crash occurred.
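The bisection described above can be sketched as a dry run; the subnet values below are hypothetical and not taken from the original report:

```shell
# Dry run of the isolation approach: keep the blanket DROP on TCP port 2049
# in place and print the ACCEPT rules that would re-admit NFS clients one
# subnet at a time (subnet values are made up for illustration).
for net in 192.0.2.0/26 192.0.2.64/26 192.0.2.128/26; do
  echo "iptables -I INPUT -s $net -p tcp --dport 2049 -j ACCEPT"
done
```

Each printed rule would then be run (as root) between client tests until the crash reappears, narrowing down the offending client group.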
> 
> This time, the crash occurred after allowing only a single NFSv3 client.
> 
> -----8< nodeB 8<-----
> 
> [ 2494.893697] BUG: unable to handle page fault for address:
> 00000000000fab7d
> [ 2494.900626] #PF: supervisor write access in kernel mode
> [ 2494.905873] #PF: error_code(0x0002) - not-present page
> [ 2494.911042] PGD 0 P4D 0
> [ 2494.913596] Oops: 0002 [#1] SMP PTI
> [ 2494.917106] CPU: 11 PID: 21342 Comm: nfsd Not tainted
> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
> [ 2494.926285] Hardware name: IBM System x3650 M3
> -[7945UHV]-/94Y7614     , BIOS -[D6E150CUS-1.11]- 02/08/2011
> [ 2494.936077] RIP: 0010:_raw_spin_lock+0xc/0x20
> [ 2494.940461] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [ 2494.959269] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
> [ 2494.964511] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
> 0000000000000000
> [ 2494.971664] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 00000000000fab7d
> [ 2494.978815] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
> 000000000002c340
> [ 2494.985968] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
> ffff99c903066bd0
> [ 2494.993119] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
> ffff99c9014c70c8
> [ 2495.000271] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
> knlGS:0000000000000000
> [ 2495.008382] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2495.014141] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
> 00000000000206e0
> [ 2495.021291] Call Trace:
> [ 2495.023753]  igrab+0x19/0x50
> [ 2495.026693]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
> [ 2495.032301]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
> [ 2495.037741]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
> [ 2495.042742]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
> [ 2495.048525]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
> [ 2495.053163]  reconnect_path+0xa1/0x2c0
> [ 2495.056943]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
> [ 2495.061838]  exportfs_decode_fh+0x111/0x2d0
> [ 2495.066042]  ? exp_find_key+0xcd/0x160 [nfsd]
> [ 2495.070417]  ? __kmalloc+0x180/0x270
> [ 2495.074007]  ? security_prepare_creds+0x6f/0xa0
> [ 2495.078551]  ? tomoyo_write_self+0x1b0/0x1b0
> [ 2495.082834]  ? security_prepare_creds+0x49/0xa0
> [ 2495.087387]  fh_verify+0x3e5/0x5f0 [nfsd]
> [ 2495.092125]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
> [ 2495.097517]  nfsd_dispatch+0x9e/0x210 [nfsd]
> [ 2495.102489]  svc_process_common+0x386/0x6f0 [sunrpc]
> [ 2495.108134]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
> [ 2495.113946]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
> [ 2495.118883]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
> [ 2495.123651]  ? nfsd_destroy+0x50/0x50 [nfsd]
> [ 2495.128604]  svc_process+0xd1/0x110 [sunrpc]
> [ 2495.133563]  nfsd+0xe3/0x140 [nfsd]
> [ 2495.137733]  kthread+0x112/0x130
> [ 2495.141643]  ? kthread_park+0x80/0x80
> [ 2495.145993]  ret_from_fork+0x35/0x40
> [ 2495.150262] Modules linked in: ocfs2 quota_tree nfsd nfs_acl lockd
> grace sctp libcrc32c nft_counter xt_tcpudp nft_compat nf_tables
> nfnetlink binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
> ocfs2_nodemanager ocfs2_stackglue cpufreq_userspace cpufreq_powersave
> cpufreq_conservative configfs bonding dm_round_robin dm_multipath
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp coretemp
> kvm_intel kvm ipmi_ssif irqbypass cdc_ether usbnet intel_cstate mii
> ib_mthca intel_uncore evdev iTCO_wdt joydev ipmi_si tpm_tis sg
> tpm_tis_core tpm ipmi_devintf iTCO_vendor_support ib_uverbs pcspkr
> watchdog ioatdma rng_core i7core_edac dca i5500_temp ipmi_msghandler
> ib_umad ib_ipoib auth_rpcgss ib_cm sunrpc ib_core ip_tables x_tables
> autofs4 algif_skcipher af_alg ext4 crc16 mbcache jbd2 crc32c_generic
> dm_crypt dm_mod sr_mod cdrom sd_mod hid_generic usbhid hid
> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200
> drm_vram_helper drm_ttm_helper i2c_algo_bit ttm qla2xxx drm_kms_helper
> [ 2495.150298]  ata_generic drm ata_piix aesni_intel libata nvme_fc
> uhci_hcd crypto_simd nvme_fabrics ehci_pci ehci_hcd megaraid_sas cryptd
> nvme_core glue_helper usbcore scsi_transport_fc lpc_ich mfd_core e1000e
> scsi_mod i2c_i801 usb_common bnx2 ptp pps_core button
> [ 2495.269073] CR2: 00000000000fab7d
> [ 2495.273205] ---[ end trace 44f000505f987296 ]---
> [ 2495.278647] RIP: 0010:_raw_spin_lock+0xc/0x20
> [ 2495.283825] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
> [ 2495.304287] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
> [ 2495.310368] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
> 0000000000000000
> [ 2495.318358] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
> 00000000000fab7d
> [ 2495.326339] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
> 000000000002c340
> [ 2495.334314] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
> ffff99c903066bd0
> [ 2495.342297] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
> ffff99c9014c70c8
> [ 2495.350279] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
> knlGS:0000000000000000
> [ 2495.359224] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2495.365825] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
> 00000000000206e0
> 
> -----8< cut here 8<-----
> 
> 
> I also recorded some NFS traffic with tcpdump while the crashes were
> occurring, but I haven't had time to analyse it yet. I can provide the
> captures if it helps (but I might have to clean them up first to avoid
> some data leaks :)
> 
> After this happened a few times, I had one case where one of the few
> NFSv4 clients was involved, and as a workaround I migrated the NFSv4
> share to XFS since I thought the issue could be NFSv4-related. Now that
> I have looked at the kernel dumps, I think NFSv3 is affected as well.
> 
> Is it a kernel bug, or am I missing something?
> 
> For now, I will switch back to XFS with a failover setup for my cluster,
> because I cannot afford this kind of crash on a central file server.
> 
> 
> 
> Cheers
> 
> 
> Pierre
> 
> 
> PS: I'll follow up with what I find in the PCAP files in a few days.
> 
> 
> 
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


* [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI)
  2020-06-02 10:10   ` Pierre Dinh-van
@ 2020-06-02 15:56     ` herbert.van.den.bergh at oracle.com
  2020-06-12  1:36       ` Junxiao Bi
  0 siblings, 1 reply; 8+ messages in thread
From: herbert.van.den.bergh at oracle.com @ 2020-06-02 15:56 UTC (permalink / raw)
  To: ocfs2-devel

Hi Pierre,

This is tracked in an internal bug tracking system, which is not 
available to the public or to customers. We'll make sure to post an 
announcement to this list when the bug is fixed.

Thanks,
Herbert.

On 6/2/20 3:10 AM, Pierre Dinh-van wrote:
> Hi Herbert,
>
> Is there a bug tracker for it somewhere? Or are the bugs at Oracle no
> longer public?
>
> Browsing/searching Bugzilla doesn't show me any matching bug:
> https://bugzilla.oracle.com/bugzilla/describecomponents.cgi
>
> Cheers
>
> Pierre
>
> On 5/21/20 12:01 AM, herbert.van.den.bergh at oracle.com wrote:
>> Pierre,
>>
>> A similar if not identical bug was reported against Oracle Linux. The
>> fix is still being developed. I will update that bug to reference this
>> thread, and ask the developer to submit the fix upstream and/or to the
>> ocfs2-devel list once it's ready.
>>
>> Thanks,
>> Herbert.
>>
>> On 5/19/20 3:48 AM, Pierre Dinh-van wrote:
>> Hi,
>>
>> I've been experiencing quite reproducible crashes on my OCFS2 cluster
>> over the last few weeks, and I think I have found part of the problem.
>> But before I file a bug report (I'm not sure where), I thought I would
>> post my research here in case someone understands it better than I do.
>>
>> When it happens, the kernel gives me a 'general protection fault:
>> 0000 [#1] SMP PTI' or 'BUG: unable to handle page fault for address:'
>> message, local accesses to the OCFS2 volumes all hang forever, and
>> the load gets so high that only a hard reset "solves" the problem.
>> It happened the same way on both nodes (always on the active node
>> serving NFS). It happens whether both nodes are up or only one is
>> active.
>>
>> First, a few words about my setup :
>>
>> - storage is Fibre Channel LUNs served by an EMC VNX5200 (4x 5TB).
>>
>> - LUNs are encrypted with LUKS
>>
>> - LUKS volumes are formatted with OCFS2
>>
>> - heartbeat=local
>>
>> - The OCFS2 cluster communicates on a dedicated IP over an InfiniBand
>> direct link (actually a fail-over bonding of the 2 ib interfaces).
>>
>> - nodes are 2 Debian buster hardware servers (24 cores, 128 GB RAM).
>>
>> - I use corosync for cluster management, but I don't think it's related
>>
>> - The cluster serves 4 NFS exports (3 are NFSv3, and one is NFSv4 with
>> kerberos) and samba shares
>>
>> - The samba shares are served in parallel by the 2 nodes managed by
>> ctdb
>>
>> - Each NFS export is served by the node holding the dedicated IP
>> for that export.
>>
>> - When corosync moves the IP of an NFS export to the other node, it
>> restarts nfsd to start a 10 s grace period on the new active node.
>>
>> - The bug appeared with the default Debian buster kernel 4.19.0-9-amd64
>> and with the backported kernel 5.5.0-0.bpo.2-amd64 as well
>>
>>
>> My cluster went live a few weeks ago, and it initially worked with both
>> nodes active. During some maintenance work where the active NFS node had
>> to switch, I hit a bug that I was able to reproduce against my will
>> at least 20 times over the last few nights:
>>
>> 1) around 100 NFS clients are connected to the exports and happily
>> working with them
>>
>> 2) The IPs of the NFS shares are switching from nodeA to nodeB
>>
>> 3) nodeB's kernel says "general protection fault: 0000 [#1] SMP PTI"
>>
>> 4) nodeB's load is getting high
>>
>> 5) access to the OCFS2 volumes hangs (a clean reboot never finishes;
>> the system cannot unmount the OCFS2 volumes)
>>
>> 6) I power cycle nodeB
>>
>> 7) I start nodeA again
>>
>> 8) nodeA shows the same symptoms as soon as some NFS clients issue
>> certain requests
>>
>>
>> Most of the time, when the IP jumps to the other node, I get messages
>> like these:
>>
>>
>> [  501.707566] (nfsd,2133,14):ocfs2_test_inode_bit:2841 ERROR: unable to
>> get alloc inode in slot 65535
>> [  501.716660] (nfsd,2133,14):ocfs2_test_inode_bit:2867 ERROR: status
>> = -22
>> [  501.726585] (nfsd,2133,6):ocfs2_test_inode_bit:2841 ERROR: unable to
>> get alloc inode in slot 65535
>> [  501.735579] (nfsd,2133,6):ocfs2_test_inode_bit:2867 ERROR: status =
>> -22
>>
>> But these also appear when the node does not crash, so I think they
>> are a separate problem.
>>
>> Last night, I saved some of the kernel output while the servers were
>> crashing one after the other like a ping-pong game:
>>
>>
>> -----8< on nodeB  8<-----
>>
>> [  502.070431] nfsd: nfsv4 idmapping failing: has idmapd not been
>> started?
>> [  502.475747] general protection fault: 0000 [#1] SMP PTI
>> [  502.481027] CPU: 6 PID: 2104 Comm: nfsd Not tainted
>> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
>> [  502.490016] Hardware name: IBM System x3650 M3
>> -[7945UHV]-/94Y7614     , BIOS -[D6E150CUS-1.11]- 02/08/2011
>> [  502.499821] RIP: 0010:_raw_spin_lock+0xc/0x20
>> [  502.504206] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>> [  502.523018] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
>> [  502.528277] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
>> 0000000000000000
>> [  502.535433] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>> 0fdbbeded053c25e
>> [  502.542588] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
>> 000000000002c340
>> [  502.549743] R10: 00000184d03a2115 R11: 0000000000000000 R12:
>> ffff9672eff43fd0
>> [  502.556899] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
>> ffff9672fff660c8
>> [  502.564055] FS:  0000000000000000(0000) GS:ffff968203a80000(0000)
>> knlGS:0000000000000000
>> [  502.572166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  502.577928] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
>> 00000000000206e0
>> [  502.585082] Call Trace:
>> [  502.587546]  igrab+0x19/0x50
>> [  502.590484]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
>> [  502.596094]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
>> [  502.601534]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
>> [  502.606536]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
>> [  502.612321]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
>> [  502.616960]  reconnect_path+0xa1/0x2c0
>> [  502.620742]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
>> [  502.625636]  exportfs_decode_fh+0x111/0x2d0
>> [  502.629844]  ? exp_find_key+0xcd/0x160 [nfsd]
>> [  502.634218]  ? __kmalloc+0x180/0x270
>> [  502.637810]  ? security_prepare_creds+0x6f/0xa0
>> [  502.642356]  ? tomoyo_write_self+0x1b0/0x1b0
>> [  502.646642]  ? security_prepare_creds+0x49/0xa0
>> [  502.651995]  fh_verify+0x3e5/0x5f0 [nfsd]
>> [  502.656737]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
>> [  502.662152]  nfsd_dispatch+0x9e/0x210 [nfsd]
>> [  502.667136]  svc_process_common+0x386/0x6f0 [sunrpc]
>> [  502.672791]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
>> [  502.678628]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
>> [  502.683597]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
>> [  502.688395]  ? nfsd_destroy+0x50/0x50 [nfsd]
>> [  502.693377]  svc_process+0xd1/0x110 [sunrpc]
>> [  502.698361]  nfsd+0xe3/0x140 [nfsd]
>> [  502.702561]  kthread+0x112/0x130
>> [  502.706504]  ? kthread_park+0x80/0x80
>> [  502.710886]  ret_from_fork+0x35/0x40
>> [  502.715181] Modules linked in: rpcsec_gss_krb5 nfsd nfs_acl lockd
>> grace ocfs2 quota_tree sctp libcrc32c nft_counter ipt_REJECT
>> nf_reject_ipv4 xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc
>> ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
>> cpufreq_userspace ocfs2_stackglue cpufreq_powersave cpufreq_conservative
>> configfs bonding dm_round_robin dm_multipath scsi_dh_rdac scsi_dh_emc
>> scsi_dh_alua intel_powerclamp coretemp kvm_intel ipmi_ssif kvm cdc_ether
>> irqbypass usbnet mii tpm_tis intel_cstate tpm_tis_core intel_uncore
>> joydev tpm ipmi_si ioatdma sg iTCO_wdt ib_mthca iTCO_vendor_support
>> watchdog dca pcspkr ipmi_devintf rng_core ib_uverbs ipmi_msghandler
>> i5500_temp i7core_edac evdev ib_umad ib_ipoib ib_cm ib_core auth_rpcgss
>> sunrpc ip_tables x_tables autofs4 algif_skcipher af_alg ext4 crc16
>> mbcache jbd2 crc32c_generic dm_crypt dm_mod sd_mod hid_generic usbhid
>> hid sr_mod cdrom crct10dif_pclmul crc32_pclmul crc32c_intel
>> ghash_clmulni_intel mgag200 drm_vram_helper drm_ttm_helper
>> [  502.715218]  i2c_algo_bit ttm qla2xxx drm_kms_helper drm ata_generic
>> ata_piix aesni_intel libata nvme_fc ehci_pci uhci_hcd crypto_simd
>> ehci_hcd nvme_fabrics cryptd glue_helper lpc_ich nvme_core megaraid_sas
>> mfd_core usbcore scsi_transport_fc e1000e scsi_mod i2c_i801 usb_common
>> ptp pps_core bnx2 button
>> [  502.837917] ---[ end trace 6e1a8e507ac30d8d ]---
>> [  502.843379] RIP: 0010:_raw_spin_lock+0xc/0x20
>> [  502.848584] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>> [  502.869295] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
>> [  502.875404] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
>> 0000000000000000
>> [  502.883423] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>> 0fdbbeded053c25e
>> [  502.891436] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
>> 000000000002c340
>> [  502.899446] R10: 00000184d03a2115 R11: 0000000000000000 R12:
>> ffff9672eff43fd0
>> [  502.907455] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
>> ffff9672fff660c8
>> [  502.915473] FS:  0000000000000000(0000) GS:ffff968203a80000(0000)
>> knlGS:0000000000000000
>> [  502.924496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  502.931147] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
>> 00000000000206e0
>>
>> -----8< cut here 8<-----
>>
>>
>> Shortly after, when trying to start nodeA while nodeB was down:
>>
>> -----8< nodeA 8<-----
>>
>> [  728.927750] general protection fault: 0000 [#1] SMP PTI
>> [  728.934647] CPU: 8 PID: 12156 Comm: nfsd Not tainted
>> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
>> [  728.944800] Hardware name: IBM System x3650 M3 -[7945H2G]-/69Y4438,
>> BIOS -[D6E158AUS-1.16]- 11/26/2012
>> [  728.955176] RIP: 0010:_raw_spin_lock+0xc/0x20
>> [  728.960595] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>> [  728.981559] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
>> [  728.987885] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
>> 0000000000000000
>> [  728.996135] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>> 625f6563697665ec
>> [  729.004384] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
>> 000000000002c340
>> [  729.012628] R10: 000002114c00d22b R11: 0000000000000000 R12:
>> ffff91e13858bbd0
>> [  729.020877] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
>> ffff91f13c03b0c8
>> [  729.029108] FS:  0000000000000000(0000) GS:ffff91f13fa80000(0000)
>> knlGS:0000000000000000
>> [  729.038308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  729.045161] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
>> 00000000000206e0
>> [  729.053415] Call Trace:
>> [  729.056978]  igrab+0x19/0x50
>> [  729.061023]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
>> [  729.067736]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
>> [  729.074284]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
>> [  729.080386]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
>> [  729.087268]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
>> [  729.092995]  reconnect_path+0xa1/0x2c0
>> [  729.097863]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
>> [  729.103840]  exportfs_decode_fh+0x111/0x2d0
>> [  729.109133]  ? exp_find_key+0xcd/0x160 [nfsd]
>> [  729.114574]  ? dbs_update_util_handler+0x16/0x90
>> [  729.120273]  ? __kmalloc+0x180/0x270
>> [  729.124920]  ? security_prepare_creds+0x6f/0xa0
>> [  729.130521]  ? tomoyo_write_self+0x1b0/0x1b0
>> [  729.135858]  ? security_prepare_creds+0x49/0xa0
>> [  729.141464]  fh_verify+0x3e5/0x5f0 [nfsd]
>> [  729.146536]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
>> [  729.152297]  nfsd_dispatch+0x9e/0x210 [nfsd]
>> [  729.157647]  svc_process_common+0x386/0x6f0 [sunrpc]
>> [  729.163675]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
>> [  729.169873]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
>> [  729.175190]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
>> [  729.180323]  ? nfsd_destroy+0x50/0x50 [nfsd]
>> [  729.185631]  svc_process+0xd1/0x110 [sunrpc]
>> [  729.190925]  nfsd+0xe3/0x140 [nfsd]
>> [  729.195421]  kthread+0x112/0x130
>> [  729.199636]  ? kthread_park+0x80/0x80
>> [  729.204279]  ret_from_fork+0x35/0x40
>> [  729.208818] Modules linked in: ocfs2 quota_tree dm_crypt crypto_simd
>> cryptd glue_helper algif_skcipher af_alg nfsd nfs_acl lockd grace sctp
>> libcrc32c nft_counter xt_tcpudp nft_compat nf_tables nfnetlink
>> binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
>> ocfs2_stackglue configfs bonding dm_round_robin dm_multipath
>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp ipmi_ssif
>> coretemp kvm_intel kvm irqbypass cdc_ether intel_cstate usbnet ib_mthca
>> mii joydev intel_uncore ib_uverbs iTCO_wdt sg ioatdma
>> iTCO_vendor_support i7core_edac watchdog i5500_temp pcspkr ipmi_si
>> tpm_tis tpm_tis_core tpm ipmi_devintf ipmi_msghandler rng_core evdev
>> acpi_cpufreq ib_umad auth_rpcgss ib_ipoib ib_cm sunrpc ib_core ip_tables
>> x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod
>> hid_generic sd_mod usbhid hid sr_mod cdrom uas usb_storage mgag200
>> drm_vram_helper drm_ttm_helper qla2xxx ata_generic i2c_algo_bit ttm
>> nvme_fc ixgbe nvme_fabrics ata_piix xfrm_algo ehci_pci uhci_hcd
>> [  729.208868]  drm_kms_helper nvme_core ehci_hcd dca libata
>> megaraid_sas scsi_transport_fc libphy ptp drm usbcore scsi_mod
>> crc32c_intel lpc_ich i2c_i801 pps_core mfd_core usb_common bnx2 mdio
>> button
>> [  729.323900] ---[ end trace 212cefe75207e17f ]---
>> [  729.329644] RIP: 0010:_raw_spin_lock+0xc/0x20
>> [  729.335121] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>> [  729.356192] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
>> [  729.362584] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
>> 0000000000000000
>> [  729.370889] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>> 625f6563697665ec
>> [  729.379184] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
>> 000000000002c340
>> [  729.387475] R10: 000002114c00d22b R11: 0000000000000000 R12:
>> ffff91e13858bbd0
>> [  729.395767] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
>> ffff91f13c03b0c8
>> [  729.404066] FS:  0000000000000000(0000) GS:ffff91f13fa80000(0000)
>> knlGS:0000000000000000
>> [  729.413336] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  729.420262] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
>> 00000000000206e0
>>
>> -----8< cut here 8<-----
>>
>> Since the backtrace mentions nfsd, I tried to isolate the problem by
>> dropping all NFS packets destined for the active node before starting
>> the services:
>>
>> iptables -I INPUT -p tcp --dport 2049 -j DROP
>>
>> Everything was fine; access over Samba worked without issues.
>>
>> I then allowed NFS for one part of my network at a time with iptables,
>> until the crash occurred.
>>
>> This time, the crash occurred after allowing only a single NFSv3 client.
>>
>> -----8< nodeB 8<-----
>>
>> [ 2494.893697] BUG: unable to handle page fault for address:
>> 00000000000fab7d
>> [ 2494.900626] #PF: supervisor write access in kernel mode
>> [ 2494.905873] #PF: error_code(0x0002) - not-present page
>> [ 2494.911042] PGD 0 P4D 0
>> [ 2494.913596] Oops: 0002 [#1] SMP PTI
>> [ 2494.917106] CPU: 11 PID: 21342 Comm: nfsd Not tainted
>> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
>> [ 2494.926285] Hardware name: IBM System x3650 M3
>> -[7945UHV]-/94Y7614     , BIOS -[D6E150CUS-1.11]- 02/08/2011
>> [ 2494.936077] RIP: 0010:_raw_spin_lock+0xc/0x20
>> [ 2494.940461] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>> [ 2494.959269] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
>> [ 2494.964511] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
>> 0000000000000000
>> [ 2494.971664] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>> 00000000000fab7d
>> [ 2494.978815] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
>> 000000000002c340
>> [ 2494.985968] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
>> ffff99c903066bd0
>> [ 2494.993119] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
>> ffff99c9014c70c8
>> [ 2495.000271] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
>> knlGS:0000000000000000
>> [ 2495.008382] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2495.014141] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
>> 00000000000206e0
>> [ 2495.021291] Call Trace:
>> [ 2495.023753]  igrab+0x19/0x50
>> [ 2495.026693]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
>> [ 2495.032301]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
>> [ 2495.037741]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
>> [ 2495.042742]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
>> [ 2495.048525]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
>> [ 2495.053163]  reconnect_path+0xa1/0x2c0
>> [ 2495.056943]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
>> [ 2495.061838]  exportfs_decode_fh+0x111/0x2d0
>> [ 2495.066042]  ? exp_find_key+0xcd/0x160 [nfsd]
>> [ 2495.070417]  ? __kmalloc+0x180/0x270
>> [ 2495.074007]  ? security_prepare_creds+0x6f/0xa0
>> [ 2495.078551]  ? tomoyo_write_self+0x1b0/0x1b0
>> [ 2495.082834]  ? security_prepare_creds+0x49/0xa0
>> [ 2495.087387]  fh_verify+0x3e5/0x5f0 [nfsd]
>> [ 2495.092125]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
>> [ 2495.097517]  nfsd_dispatch+0x9e/0x210 [nfsd]
>> [ 2495.102489]  svc_process_common+0x386/0x6f0 [sunrpc]
>> [ 2495.108134]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
>> [ 2495.113946]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
>> [ 2495.118883]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
>> [ 2495.123651]  ? nfsd_destroy+0x50/0x50 [nfsd]
>> [ 2495.128604]  svc_process+0xd1/0x110 [sunrpc]
>> [ 2495.133563]  nfsd+0xe3/0x140 [nfsd]
>> [ 2495.137733]  kthread+0x112/0x130
>> [ 2495.141643]  ? kthread_park+0x80/0x80
>> [ 2495.145993]  ret_from_fork+0x35/0x40
>> [ 2495.150262] Modules linked in: ocfs2 quota_tree nfsd nfs_acl lockd
>> grace sctp libcrc32c nft_counter xt_tcpudp nft_compat nf_tables
>> nfnetlink binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
>> ocfs2_nodemanager ocfs2_stackglue cpufreq_userspace cpufreq_powersave
>> cpufreq_conservative configfs bonding dm_round_robin dm_multipath
>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp coretemp
>> kvm_intel kvm ipmi_ssif irqbypass cdc_ether usbnet intel_cstate mii
>> ib_mthca intel_uncore evdev iTCO_wdt joydev ipmi_si tpm_tis sg
>> tpm_tis_core tpm ipmi_devintf iTCO_vendor_support ib_uverbs pcspkr
>> watchdog ioatdma rng_core i7core_edac dca i5500_temp ipmi_msghandler
>> ib_umad ib_ipoib auth_rpcgss ib_cm sunrpc ib_core ip_tables x_tables
>> autofs4 algif_skcipher af_alg ext4 crc16 mbcache jbd2 crc32c_generic
>> dm_crypt dm_mod sr_mod cdrom sd_mod hid_generic usbhid hid
>> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200
>> drm_vram_helper drm_ttm_helper i2c_algo_bit ttm qla2xxx drm_kms_helper
>> [ 2495.150298]  ata_generic drm ata_piix aesni_intel libata nvme_fc
>> uhci_hcd crypto_simd nvme_fabrics ehci_pci ehci_hcd megaraid_sas cryptd
>> nvme_core glue_helper usbcore scsi_transport_fc lpc_ich mfd_core e1000e
>> scsi_mod i2c_i801 usb_common bnx2 ptp pps_core button
>> [ 2495.269073] CR2: 00000000000fab7d
>> [ 2495.273205] ---[ end trace 44f000505f987296 ]---
>> [ 2495.278647] RIP: 0010:_raw_spin_lock+0xc/0x20
>> [ 2495.283825] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>> [ 2495.304287] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
>> [ 2495.310368] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
>> 0000000000000000
>> [ 2495.318358] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>> 00000000000fab7d
>> [ 2495.326339] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
>> 000000000002c340
>> [ 2495.334314] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
>> ffff99c903066bd0
>> [ 2495.342297] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
>> ffff99c9014c70c8
>> [ 2495.350279] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
>> knlGS:0000000000000000
>> [ 2495.359224] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2495.365825] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
>> 00000000000206e0
>>
>> -----8< cut here 8<-----
>>
>>
>> I also recorded some NFS traffic with tcpdump while the crashes were
>> occurring, but I haven't had time to analyse it yet. I can provide the
>> captures if it helps (but I might have to clean them up first to avoid
>> some data leaks :)
>>
>> After this happened a few times, I had one case where one of the few
>> NFSv4 clients was involved, and as a workaround I migrated the NFSv4
>> share to XFS since I thought the issue could be NFSv4-related. Now that
>> I have looked at the kernel dumps, I think NFSv3 is affected as well.
>>
>> Is it a kernel bug, or am I missing something?
>>
>> For now, I will switch back to XFS with a failover setup for my cluster,
>> because I cannot afford this kind of crash on a central file server.
>>
>>
>>
>> Cheers
>>
>>
>> Pierre
>>
>>
>> PS: I'll follow up with what I find in the PCAP files in a few days.
>>
>>
>>
>>>
>>> _______________________________________________
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel at oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


* [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI)
  2020-06-02 15:56     ` herbert.van.den.bergh at oracle.com
@ 2020-06-12  1:36       ` Junxiao Bi
  0 siblings, 0 replies; 8+ messages in thread
From: Junxiao Bi @ 2020-06-12  1:36 UTC (permalink / raw)
  To: ocfs2-devel

Hi Pierre,

Please try the three patches I just posted to the mailing list. They may
fix your issue.

Thanks,

Junxiao.

On 6/2/20 8:56 AM, herbert.van.den.bergh at oracle.com wrote:
> Hi Pierre,
>
> This is tracked in an internal bug tracking system, which is not
> available to the public or to customers. We'll make sure to post an
> announcement to this list when the bug is fixed.
>
> Thanks,
> Herbert.
>
> On 6/2/20 3:10 AM, Pierre Dinh-van wrote:
>> Hi Herbert,
>>
>> Is there a bug tracker for it somewhere? Or are the bugs at Oracle no
>> longer public?
>>
>> Browsing/searching Bugzilla doesn't show me any matching bug:
>> https://bugzilla.oracle.com/bugzilla/describecomponents.cgi
>>
>> Cheers
>>
>> Pierre
>>
>> On 5/21/20 12:01 AM, herbert.van.den.bergh at oracle.com wrote:
>>> Pierre,
>>>
>>> A similar if not identical bug was reported against Oracle Linux. The
>>> fix is still being developed. I will update that bug to reference this
>>> thread, and ask the developer to submit the fix upstream and/or to the
>>> ocfs2-devel list once it's ready.
>>>
>>> Thanks,
>>> Herbert.
>>>
>>> On 5/19/20 3:48 AM, Pierre Dinh-van wrote:
>>> Hi,
>>>
>>> I've been experiencing quite reproducible crashes on my OCFS2 cluster
>>> over the last few weeks, and I think I have found part of the problem.
>>> But before I file a bug report (I'm not sure where), I thought I would
>>> post my research here in case someone understands it better than I do.
>>>
>>> When it happens, the kernel gives me a 'general protection fault:
>>> 0000 [#1] SMP PTI' or 'BUG: unable to handle page fault for address:'
>>> message, local accesses to the OCFS2 volumes all hang forever, and
>>> the load gets so high that only a hard reset "solves" the problem.
>>> It happened the same way on both nodes (always on the active node
>>> serving NFS). It happens whether both nodes are up or only one is
>>> active.
>>>
>>> First, a few words about my setup :
>>>
>>> - storage is Fiberchannel LUNs served by a EMC VNX5200 (4x 5TB).
>>>
>>> - LUNs are encrypted with LUKS
>>>
>>> - LUKS volumes are formated with OCFS2
>>>
>>> - heartbeat=local
>>>
>>> - The OCFS2 cluster is communicating on a dedictated IP over InfinyBand
>>> direct link (actually a fail-over bonding of the 2 ib interfaces).
>>>
>>> - nodes are 2 Debian buster hardware servers (24 cores, 128 GB RAM).
>>>
>>> - I use corosync for cluster management, but I don't think it's related
>>>
>>> - The cluster serves 4 NFS exports (3 are NFSv3, one is NFSv4 with
>>> Kerberos) and Samba shares
>>>
>>> - The Samba shares are served in parallel by the 2 nodes, managed by
>>> CTDB
>>>
>>> - Each NFS export is served by whichever node holds the dedicated IP
>>> for that export.
>>>
>>> - When corosync moves the IP of an NFS export to the other node, it
>>> restarts nfsd to trigger a 10 s grace period on the new active node.
>>>
>>> - The bug appeared with the default Debian buster kernel 4.19.0-9-amd64
>>> as well as with the backported kernel 5.5.0-0.bpo.2-amd64
>>>
>>>
>>> My cluster went live a few weeks ago and at first worked with both
>>> nodes active. During some maintenance work where the active NFS node
>>> had to switch, I hit a bug that I have since reproduced, against my
>>> will, at least 20 times over the last few nights:
>>>
>>> 1) around 100 NFS clients are connected to the exports and happily
>>> working with them
>>>
>>> 2) the IPs of the NFS shares switch from nodeA to nodeB
>>>
>>> 3) nodeB's kernel says "general protection fault: 0000 [#1] SMP PTI"
>>>
>>> 4) nodeB's load is getting high
>>>
>>> 5) access to the OCFS2 volumes hangs (a clean reboot never finishes;
>>> the system cannot unmount the OCFS2 volumes)
>>>
>>> 6) I power cycle nodeB
>>>
>>> 7) I start nodeA again
>>>
>>> 8) nodeA shows the same symptoms as soon as some NFS clients make
>>> certain requests
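
For what it's worth, steps 2) through 7) can be driven by hand with something like the sketch below. This is only an assumption about the tooling: the resource name `ip-nfs-export1`, the host names, and the BMC credentials are placeholders, and it presumes a pacemaker/corosync cluster managed with crmsh plus IPMI for the out-of-band power cycle.

```shell
# 2) move the NFS service IP from nodeA to nodeB
#    (resource name is a placeholder)
crm resource move ip-nfs-export1 nodeB

# 3)-4) watch nodeB for the fault and the climbing load
ssh nodeB 'dmesg --follow | grep -m1 "general protection fault"'
ssh nodeB uptime

# 6) power-cycle nodeB out of band once it stops responding
#    (BMC host and credentials are placeholders)
ipmitool -H nodeB-bmc -U admin -P secret chassis power cycle

# 7) boot nodeA again and let it rejoin the cluster
```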
>>>
>>>
>>> Most of the time, when the IP jumps to the other node, I get some of
>>> these messages:
>>>
>>>
>>> [  501.707566] (nfsd,2133,14):ocfs2_test_inode_bit:2841 ERROR: unable to
>>> get alloc inode in slot 65535
>>> [  501.716660] (nfsd,2133,14):ocfs2_test_inode_bit:2867 ERROR: status
>>> = -22
>>> [  501.726585] (nfsd,2133,6):ocfs2_test_inode_bit:2841 ERROR: unable to
>>> get alloc inode in slot 65535
>>> [  501.735579] (nfsd,2133,6):ocfs2_test_inode_bit:2867 ERROR: status =
>>> -22
>>>
>>> But it also happens when the node is not crashing, so I think it's a
>>> separate problem.
>>>
>>> Last night, I saved some of the kernel output while the servers were
>>> crashing one after the other, like in a ping-pong game:
>>>
>>>
>>> -----8< on nodeB  8<-----
>>>
>>> [  502.070431] nfsd: nfsv4 idmapping failing: has idmapd not been
>>> started?
>>> [  502.475747] general protection fault: 0000 [#1] SMP PTI
>>> [  502.481027] CPU: 6 PID: 2104 Comm: nfsd Not tainted
>>> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
>>> [  502.490016] Hardware name: IBM System x3650 M3
>>> -[7945UHV]-/94Y7614     , BIOS -[D6E150CUS-1.11]- 02/08/2011
>>> [  502.499821] RIP: 0010:_raw_spin_lock+0xc/0x20
>>> [  502.504206] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>>> [  502.523018] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
>>> [  502.528277] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
>>> 0000000000000000
>>> [  502.535433] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>>> 0fdbbeded053c25e
>>> [  502.542588] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
>>> 000000000002c340
>>> [  502.549743] R10: 00000184d03a2115 R11: 0000000000000000 R12:
>>> ffff9672eff43fd0
>>> [  502.556899] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
>>> ffff9672fff660c8
>>> [  502.564055] FS:  0000000000000000(0000) GS:ffff968203a80000(0000)
>>> knlGS:0000000000000000
>>> [  502.572166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  502.577928] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
>>> 00000000000206e0
>>> [  502.585082] Call Trace:
>>> [  502.587546]  igrab+0x19/0x50
>>> [  502.590484]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
>>> [  502.596094]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
>>> [  502.601534]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
>>> [  502.606536]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
>>> [  502.612321]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
>>> [  502.616960]  reconnect_path+0xa1/0x2c0
>>> [  502.620742]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
>>> [  502.625636]  exportfs_decode_fh+0x111/0x2d0
>>> [  502.629844]  ? exp_find_key+0xcd/0x160 [nfsd]
>>> [  502.634218]  ? __kmalloc+0x180/0x270
>>> [  502.637810]  ? security_prepare_creds+0x6f/0xa0
>>> [  502.642356]  ? tomoyo_write_self+0x1b0/0x1b0
>>> [  502.646642]  ? security_prepare_creds+0x49/0xa0
>>> [  502.651995]  fh_verify+0x3e5/0x5f0 [nfsd]
>>> [  502.656737]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
>>> [  502.662152]  nfsd_dispatch+0x9e/0x210 [nfsd]
>>> [  502.667136]  svc_process_common+0x386/0x6f0 [sunrpc]
>>> [  502.672791]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
>>> [  502.678628]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
>>> [  502.683597]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
>>> [  502.688395]  ? nfsd_destroy+0x50/0x50 [nfsd]
>>> [  502.693377]  svc_process+0xd1/0x110 [sunrpc]
>>> [  502.698361]  nfsd+0xe3/0x140 [nfsd]
>>> [  502.702561]  kthread+0x112/0x130
>>> [  502.706504]  ? kthread_park+0x80/0x80
>>> [  502.710886]  ret_from_fork+0x35/0x40
>>> [  502.715181] Modules linked in: rpcsec_gss_krb5 nfsd nfs_acl lockd
>>> grace ocfs2 quota_tree sctp libcrc32c nft_counter ipt_REJECT
>>> nf_reject_ipv4 xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc
>>> ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
>>> cpufreq_userspace ocfs2_stackglue cpufreq_powersave cpufreq_conservative
>>> configfs bonding dm_round_robin dm_multipath scsi_dh_rdac scsi_dh_emc
>>> scsi_dh_alua intel_powerclamp coretemp kvm_intel ipmi_ssif kvm cdc_ether
>>> irqbypass usbnet mii tpm_tis intel_cstate tpm_tis_core intel_uncore
>>> joydev tpm ipmi_si ioatdma sg iTCO_wdt ib_mthca iTCO_vendor_support
>>> watchdog dca pcspkr ipmi_devintf rng_core ib_uverbs ipmi_msghandler
>>> i5500_temp i7core_edac evdev ib_umad ib_ipoib ib_cm ib_core auth_rpcgss
>>> sunrpc ip_tables x_tables autofs4 algif_skcipher af_alg ext4 crc16
>>> mbcache jbd2 crc32c_generic dm_crypt dm_mod sd_mod hid_generic usbhid
>>> hid sr_mod cdrom crct10dif_pclmul crc32_pclmul crc32c_intel
>>> ghash_clmulni_intel mgag200 drm_vram_helper drm_ttm_helper
>>> [  502.715218]  i2c_algo_bit ttm qla2xxx drm_kms_helper drm ata_generic
>>> ata_piix aesni_intel libata nvme_fc ehci_pci uhci_hcd crypto_simd
>>> ehci_hcd nvme_fabrics cryptd glue_helper lpc_ich nvme_core megaraid_sas
>>> mfd_core usbcore scsi_transport_fc e1000e scsi_mod i2c_i801 usb_common
>>> ptp pps_core bnx2 button
>>> [  502.837917] ---[ end trace 6e1a8e507ac30d8d ]---
>>> [  502.843379] RIP: 0010:_raw_spin_lock+0xc/0x20
>>> [  502.848584] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>>> [  502.869295] RSP: 0018:ffffa5af4838bac0 EFLAGS: 00010246
>>> [  502.875404] RAX: 0000000000000000 RBX: 0fdbbeded053c1d6 RCX:
>>> 0000000000000000
>>> [  502.883423] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>>> 0fdbbeded053c25e
>>> [  502.891436] RBP: 0fdbbeded053c25e R08: ffff9672ffa73068 R09:
>>> 000000000002c340
>>> [  502.899446] R10: 00000184d03a2115 R11: 0000000000000000 R12:
>>> ffff9672eff43fd0
>>> [  502.907455] R13: ffff9672fff66bc0 R14: 000000000000ffff R15:
>>> ffff9672fff660c8
>>> [  502.915473] FS:  0000000000000000(0000) GS:ffff968203a80000(0000)
>>> knlGS:0000000000000000
>>> [  502.924496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  502.931147] CR2: 00007f0a4bb5df50 CR3: 0000001e4169a004 CR4:
>>> 00000000000206e0
>>>
>>> -----8< cut here 8<-----
>>>
>>>
>>> shortly after when trying to start nodeA while nodeB was down :
>>>
>>> -----8< nodeA 8<-----
>>>
>>> [  728.927750] general protection fault: 0000 [#1] SMP PTI
>>> [  728.934647] CPU: 8 PID: 12156 Comm: nfsd Not tainted
>>> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
>>> [  728.944800] Hardware name: IBM System x3650 M3 -[7945H2G]-/69Y4438,
>>> BIOS -[D6E158AUS-1.16]- 11/26/2012
>>> [  728.955176] RIP: 0010:_raw_spin_lock+0xc/0x20
>>> [  728.960595] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>>> [  728.981559] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
>>> [  728.987885] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
>>> 0000000000000000
>>> [  728.996135] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>>> 625f6563697665ec
>>> [  729.004384] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
>>> 000000000002c340
>>> [  729.012628] R10: 000002114c00d22b R11: 0000000000000000 R12:
>>> ffff91e13858bbd0
>>> [  729.020877] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
>>> ffff91f13c03b0c8
>>> [  729.029108] FS:  0000000000000000(0000) GS:ffff91f13fa80000(0000)
>>> knlGS:0000000000000000
>>> [  729.038308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  729.045161] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
>>> 00000000000206e0
>>> [  729.053415] Call Trace:
>>> [  729.056978]  igrab+0x19/0x50
>>> [  729.061023]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
>>> [  729.067736]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
>>> [  729.074284]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
>>> [  729.080386]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
>>> [  729.087268]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
>>> [  729.092995]  reconnect_path+0xa1/0x2c0
>>> [  729.097863]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
>>> [  729.103840]  exportfs_decode_fh+0x111/0x2d0
>>> [  729.109133]  ? exp_find_key+0xcd/0x160 [nfsd]
>>> [  729.114574]  ? dbs_update_util_handler+0x16/0x90
>>> [  729.120273]  ? __kmalloc+0x180/0x270
>>> [  729.124920]  ? security_prepare_creds+0x6f/0xa0
>>> [  729.130521]  ? tomoyo_write_self+0x1b0/0x1b0
>>> [  729.135858]  ? security_prepare_creds+0x49/0xa0
>>> [  729.141464]  fh_verify+0x3e5/0x5f0 [nfsd]
>>> [  729.146536]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
>>> [  729.152297]  nfsd_dispatch+0x9e/0x210 [nfsd]
>>> [  729.157647]  svc_process_common+0x386/0x6f0 [sunrpc]
>>> [  729.163675]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
>>> [  729.169873]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
>>> [  729.175190]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
>>> [  729.180323]  ? nfsd_destroy+0x50/0x50 [nfsd]
>>> [  729.185631]  svc_process+0xd1/0x110 [sunrpc]
>>> [  729.190925]  nfsd+0xe3/0x140 [nfsd]
>>> [  729.195421]  kthread+0x112/0x130
>>> [  729.199636]  ? kthread_park+0x80/0x80
>>> [  729.204279]  ret_from_fork+0x35/0x40
>>> [  729.208818] Modules linked in: ocfs2 quota_tree dm_crypt crypto_simd
>>> cryptd glue_helper algif_skcipher af_alg nfsd nfs_acl lockd grace sctp
>>> libcrc32c nft_counter xt_tcpudp nft_compat nf_tables nfnetlink
>>> binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
>>> ocfs2_stackglue configfs bonding dm_round_robin dm_multipath
>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp ipmi_ssif
>>> coretemp kvm_intel kvm irqbypass cdc_ether intel_cstate usbnet ib_mthca
>>> mii joydev intel_uncore ib_uverbs iTCO_wdt sg ioatdma
>>> iTCO_vendor_support i7core_edac watchdog i5500_temp pcspkr ipmi_si
>>> tpm_tis tpm_tis_core tpm ipmi_devintf ipmi_msghandler rng_core evdev
>>> acpi_cpufreq ib_umad auth_rpcgss ib_ipoib ib_cm sunrpc ib_core ip_tables
>>> x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_mod
>>> hid_generic sd_mod usbhid hid sr_mod cdrom uas usb_storage mgag200
>>> drm_vram_helper drm_ttm_helper qla2xxx ata_generic i2c_algo_bit ttm
>>> nvme_fc ixgbe nvme_fabrics ata_piix xfrm_algo ehci_pci uhci_hcd
>>> [  729.208868]  drm_kms_helper nvme_core ehci_hcd dca libata
>>> megaraid_sas scsi_transport_fc libphy ptp drm usbcore scsi_mod
>>> crc32c_intel lpc_ich i2c_i801 pps_core mfd_core usb_common bnx2 mdio
>>> button
>>> [  729.323900] ---[ end trace 212cefe75207e17f ]---
>>> [  729.329644] RIP: 0010:_raw_spin_lock+0xc/0x20
>>> [  729.335121] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>>> [  729.356192] RSP: 0018:ffffa24d09187ac0 EFLAGS: 00010246
>>> [  729.362584] RAX: 0000000000000000 RBX: 625f656369766564 RCX:
>>> 0000000000000000
>>> [  729.370889] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>>> 625f6563697665ec
>>> [  729.379184] RBP: 625f6563697665ec R08: ffff91f1272e3680 R09:
>>> 000000000002c340
>>> [  729.387475] R10: 000002114c00d22b R11: 0000000000000000 R12:
>>> ffff91e13858bbd0
>>> [  729.395767] R13: ffff91f13c03bbc0 R14: 000000000000ffff R15:
>>> ffff91f13c03b0c8
>>> [  729.404066] FS:  0000000000000000(0000) GS:ffff91f13fa80000(0000)
>>> knlGS:0000000000000000
>>> [  729.413336] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  729.420262] CR2: 000055fe95d1d9a8 CR3: 0000001c8420a003 CR4:
>>> 00000000000206e0
>>>
>>> -----8< cut here 8<-----
>>>
>>> Since the backtrace mentions nfsd, I tried to isolate the problem by
>>> dropping all NFS packets arriving at the active node before starting
>>> the services:
>>>
>>> iptables -I INPUT -p tcp --dport 2049 -j DROP
>>>
>>> Everything was fine; access over Samba worked without problems.
>>>
>>> I then started allowing NFS for parts of my network with iptables,
>>> until the crash occurred.
>>>
>>> In this instance, the crash occurred after allowing only a single
>>> NFSv3 client.
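
The isolation procedure above amounts to something like the following configuration sketch (run as root; the 192.0.2.0/24 client network is a placeholder, and note that both rules are inserted at the head of the INPUT chain, so the ACCEPT ends up before the DROP):

```shell
# Block all inbound NFS over TCP before starting the services
# (port 2049 covers NFSv3-over-TCP and NFSv4).
iptables -I INPUT -p tcp --dport 2049 -j DROP

# ... start nfsd and verify that Samba access still works ...

# Re-admit clients one network (or one host) at a time until
# the crash reproduces; -s restricts the rule to that source.
iptables -I INPUT -p tcp --dport 2049 -s 192.0.2.0/24 -j ACCEPT

# Check the resulting rule order.
iptables -L INPUT -n --line-numbers
```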
>>>
>>> -----8< nodeB 8<-----
>>>
>>> [ 2494.893697] BUG: unable to handle page fault for address:
>>> 00000000000fab7d
>>> [ 2494.900626] #PF: supervisor write access in kernel mode
>>> [ 2494.905873] #PF: error_code(0x0002) - not-present page
>>> [ 2494.911042] PGD 0 P4D 0
>>> [ 2494.913596] Oops: 0002 [#1] SMP PTI
>>> [ 2494.917106] CPU: 11 PID: 21342 Comm: nfsd Not tainted
>>> 5.5.0-0.bpo.2-amd64 #1 Debian 5.5.17-1~bpo10+1
>>> [ 2494.926285] Hardware name: IBM System x3650 M3
>>> -[7945UHV]-/94Y7614     , BIOS -[D6E150CUS-1.11]- 02/08/2011
>>> [ 2494.936077] RIP: 0010:_raw_spin_lock+0xc/0x20
>>> [ 2494.940461] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>>> [ 2494.959269] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
>>> [ 2494.964511] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
>>> 0000000000000000
>>> [ 2494.971664] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>>> 00000000000fab7d
>>> [ 2494.978815] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
>>> 000000000002c340
>>> [ 2494.985968] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
>>> ffff99c903066bd0
>>> [ 2494.993119] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
>>> ffff99c9014c70c8
>>> [ 2495.000271] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
>>> knlGS:0000000000000000
>>> [ 2495.008382] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 2495.014141] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
>>> 00000000000206e0
>>> [ 2495.021291] Call Trace:
>>> [ 2495.023753]  igrab+0x19/0x50
>>> [ 2495.026693]  ocfs2_get_system_file_inode+0x65/0x2c0 [ocfs2]
>>> [ 2495.032301]  ? ocfs2_read_blocks_sync+0x159/0x330 [ocfs2]
>>> [ 2495.037741]  ocfs2_test_inode_bit+0xe8/0x900 [ocfs2]
>>> [ 2495.042742]  ? ocfs2_free_dir_lookup_result+0x24/0x50 [ocfs2]
>>> [ 2495.048525]  ocfs2_get_parent+0xa3/0x300 [ocfs2]
>>> [ 2495.053163]  reconnect_path+0xa1/0x2c0
>>> [ 2495.056943]  ? nfsd_proc_setattr+0x1b0/0x1b0 [nfsd]
>>> [ 2495.061838]  exportfs_decode_fh+0x111/0x2d0
>>> [ 2495.066042]  ? exp_find_key+0xcd/0x160 [nfsd]
>>> [ 2495.070417]  ? __kmalloc+0x180/0x270
>>> [ 2495.074007]  ? security_prepare_creds+0x6f/0xa0
>>> [ 2495.078551]  ? tomoyo_write_self+0x1b0/0x1b0
>>> [ 2495.082834]  ? security_prepare_creds+0x49/0xa0
>>> [ 2495.087387]  fh_verify+0x3e5/0x5f0 [nfsd]
>>> [ 2495.092125]  nfsd3_proc_getattr+0x6b/0x100 [nfsd]
>>> [ 2495.097517]  nfsd_dispatch+0x9e/0x210 [nfsd]
>>> [ 2495.102489]  svc_process_common+0x386/0x6f0 [sunrpc]
>>> [ 2495.108134]  ? svc_sock_secure_port+0x12/0x30 [sunrpc]
>>> [ 2495.113946]  ? svc_recv+0x2ff/0x9c0 [sunrpc]
>>> [ 2495.118883]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
>>> [ 2495.123651]  ? nfsd_destroy+0x50/0x50 [nfsd]
>>> [ 2495.128604]  svc_process+0xd1/0x110 [sunrpc]
>>> [ 2495.133563]  nfsd+0xe3/0x140 [nfsd]
>>> [ 2495.137733]  kthread+0x112/0x130
>>> [ 2495.141643]  ? kthread_park+0x80/0x80
>>> [ 2495.145993]  ret_from_fork+0x35/0x40
>>> [ 2495.150262] Modules linked in: ocfs2 quota_tree nfsd nfs_acl lockd
>>> grace sctp libcrc32c nft_counter xt_tcpudp nft_compat nf_tables
>>> nfnetlink binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
>>> ocfs2_nodemanager ocfs2_stackglue cpufreq_userspace cpufreq_powersave
>>> cpufreq_conservative configfs bonding dm_round_robin dm_multipath
>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_powerclamp coretemp
>>> kvm_intel kvm ipmi_ssif irqbypass cdc_ether usbnet intel_cstate mii
>>> ib_mthca intel_uncore evdev iTCO_wdt joydev ipmi_si tpm_tis sg
>>> tpm_tis_core tpm ipmi_devintf iTCO_vendor_support ib_uverbs pcspkr
>>> watchdog ioatdma rng_core i7core_edac dca i5500_temp ipmi_msghandler
>>> ib_umad ib_ipoib auth_rpcgss ib_cm sunrpc ib_core ip_tables x_tables
>>> autofs4 algif_skcipher af_alg ext4 crc16 mbcache jbd2 crc32c_generic
>>> dm_crypt dm_mod sr_mod cdrom sd_mod hid_generic usbhid hid
>>> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200
>>> drm_vram_helper drm_ttm_helper i2c_algo_bit ttm qla2xxx drm_kms_helper
>>> [ 2495.150298]  ata_generic drm ata_piix aesni_intel libata nvme_fc
>>> uhci_hcd crypto_simd nvme_fabrics ehci_pci ehci_hcd megaraid_sas cryptd
>>> nvme_core glue_helper usbcore scsi_transport_fc lpc_ich mfd_core e1000e
>>> scsi_mod i2c_i801 usb_common bnx2 ptp pps_core button
>>> [ 2495.269073] CR2: 00000000000fab7d
>>> [ 2495.273205] ---[ end trace 44f000505f987296 ]---
>>> [ 2495.278647] RIP: 0010:_raw_spin_lock+0xc/0x20
>>> [ 2495.283825] Code: 66 90 66 66 90 31 c0 ba ff 00 00 00 f0 0f b1 17 75
>>> 05 48 89 d8 5b c3 e8 12 5f 91 ff eb f4 66 66 66 66 90 31 c0 ba 01 00 00
>>> 00 <f0> 0f b1 17 75 01 c3 89 c6 e8 76 46 91 ff 66 90 c3 0f 1f 00 66 66
>>> [ 2495.304287] RSP: 0018:ffffb949cca7fac0 EFLAGS: 00010246
>>> [ 2495.310368] RAX: 0000000000000000 RBX: 00000000000faaf5 RCX:
>>> 0000000000000000
>>> [ 2495.318358] RDX: 0000000000000001 RSI: 0000000000000009 RDI:
>>> 00000000000fab7d
>>> [ 2495.326339] RBP: 00000000000fab7d R08: ffff99c8f12737b8 R09:
>>> 000000000002c340
>>> [ 2495.334314] R10: 000005dc16e333d8 R11: 0000000000000000 R12:
>>> ffff99c903066bd0
>>> [ 2495.342297] R13: ffff99c9014c7bc0 R14: 000000000000ffff R15:
>>> ffff99c9014c70c8
>>> [ 2495.350279] FS:  0000000000000000(0000) GS:ffff99ba03bc0000(0000)
>>> knlGS:0000000000000000
>>> [ 2495.359224] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 2495.365825] CR2: 00000000000fab7d CR3: 0000000f38df2002 CR4:
>>> 00000000000206e0
>>>
>>> -----8< cut here 8<-----
>>>
>>>
>>> I also recorded some NFS traffic with tcpdump while the crashes were
>>> occurring, but I haven't had time to analyse it yet. I can provide the
>>> captures if that helps (though I might have to clean them up first to
>>> avoid some data leaks :)
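
For reference, a capture along these lines can be made with stock tcpdump (the interface name `eth0` and the file paths are placeholders; rotating the output keeps each file small while waiting for a crash):

```shell
# Rotate to a new pcap every 300 s; %s in the file name is
# expanded by strftime to the epoch seconds of the rotation.
tcpdump -i eth0 -s 0 -G 300 -w /var/tmp/nfs-crash-%s.pcap 'port 2049'

# Afterwards, narrow one capture (file name is an example) down
# to the NFS RPCs, e.g. with tshark from the wireshark packages:
tshark -r /var/tmp/nfs-crash-1589882000.pcap -Y nfs
```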
>>>
>>> After this happened a few times, I had one case where one of the few
>>> NFSv4 clients was involved, and as a workaround I migrated the NFSv4
>>> share to XFS, since I thought the problem could be NFSv4-related. Now
>>> that I have looked at the kernel dumps, I think NFSv3 is affected as
>>> well.
>>>
>>> Is this a kernel bug, or am I missing something?
>>>
>>> For now, I will switch my cluster back to XFS in a failover setup,
>>> because I cannot afford this kind of crash on a central file server.
>>>
>>>
>>>
>>> Cheers
>>>
>>>
>>> Pierre
>>>
>>>
>>> PS: I'll try to follow up with what I find in the PCAP files in a few
>>> days
>>>
>>>
>>>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
2020-05-19 10:48 [Ocfs2-devel] NFS clients crash OCFS2 nodes (general protection fault: 0000 [#1] SMP PTI) Pierre Dinh-van
2020-05-20  2:52 ` Changwei Ge
2020-05-21 17:02   ` Pierre Dinh-van
2020-05-22  1:35     ` Changwei Ge
2020-05-20 22:01 ` herbert.van.den.bergh at oracle.com
2020-06-02 10:10   ` Pierre Dinh-van
2020-06-02 15:56     ` herbert.van.den.bergh at oracle.com
2020-06-12  1:36       ` Junxiao Bi
