All of lore.kernel.org
 help / color / mirror / Atom feed
From: Salvatore Bonaccorso <carnil@debian.org>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work
Date: Sun, 18 Oct 2020 11:39:03 +0200	[thread overview]
Message-ID: <20201018093903.GA364695@eldamar.lan> (raw)
In-Reply-To: <20201012163355.GF26571@fieldses.org>

Hi,

On Mon, Oct 12, 2020 at 12:33:55PM -0400, J. Bruce Fields wrote:
> On Mon, Oct 12, 2020 at 05:41:59PM +0200, Salvatore Bonaccorso wrote:
> > Hi Bruce,
> > 
> > Thanks a lot for your reply, much appreciated.
> > 
> > On Mon, Oct 12, 2020 at 10:26:02AM -0400, J. Bruce Fields wrote:
> > > On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
> > > > Hi
> > > > 
> > > > On a system running 4.19.146-1 in Debian buster an issue got hit,
> > > > while the server was under some slight load, but it does not seem
> > > > easily reproducible, so asking if some more information can be
> > > > provided to track/narrow this down. On the console the following was
> > > > caught:
> > > 
> > > Worth checking git logs of fs/nfsd/nfs4state.c and
> > > fs/nfsd/nfs4callback.c.  It might be
> > > 2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races between
> > > nfsd4_cb_release() and nfsd4_shutdown_callback()" ?
> > 
> > That might be possible. As it was not possible to simply trigger the
> > issue, do you know if it is possible to simply reproduce the issue
> > fixed in the above?
> 
> I don't have a reproducer.

I stil could not find a way to controlled trigger the issue, but on
the same server the following was caused as well, which might be
releated as well (altough different backtrace, but maybe gives
additional hints what can be looked for):

[ 4390.059004] ------------[ cut here ]------------
[ 4390.063780] WARNING: CPU: 14 PID: 24184 at fs/nfsd/nfs4state.c:4778 laundromat_main.cold.125+0x31/0x7a [nfsd]
[ 4390.073763] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding intel_rapl skx_edac nfit libnv
dimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm drm_kms_helper intel_uncore mei_me drm jo
ydev iTCO_wdt evdev pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss
nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc
32c_generic raid1
[ 4390.144313]  raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64 xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod
glue_helper lpc_ich i2c_i801 mfd_core usb_common
[ 4390.165906] CPU: 14 PID: 24184 Comm: kworker/u42:2 Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
[ 4390.174969] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 4390.184654] Workqueue: nfsd4 laundromat_main [nfsd]
[ 4390.189550] RIP: 0010:laundromat_main.cold.125+0x31/0x7a [nfsd]
[ 4390.195484] Code: f6 6b c0 e8 24 ae 62 d7 e9 6f 21 ff ff 48 c7 c7 38 f6 6b c0 e8 13 ae 62 d7 e9 1f 21 ff ff 48 c7 c7 40 f3 6b c0 e8 02 ae 62 d7 <0f> 0b e9 34 23 ff ff 48 c7 c7 40 f3 6b c0 e8 ef a
d 62 d7 0f 0b e9
[ 4390.214280] RSP: 0018:ffffb0568a6d7e20 EFLAGS: 00010246
[ 4390.219523] RAX: 0000000000000024 RBX: ffffb0568a6d7e50 RCX: 0000000000000000
[ 4390.226678] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
[ 4390.233826] RBP: ffffb0568a6d7e50 R08: 0000000000000556 R09: 0000000000aaaaaa
[ 4390.240976] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
[ 4390.248126] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
[ 4390.255277] FS:  0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
[ 4390.263382] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.269147] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
[ 4390.276302] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4390.283448] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4390.290598] PKRU: 55555554
[ 4390.293316] Call Trace:
[ 4390.295790]  process_one_work+0x1a7/0x3a0
[ 4390.299815]  worker_thread+0x30/0x390
[ 4390.303487]  ? create_worker+0x1a0/0x1a0
[ 4390.307422]  kthread+0x112/0x130
[ 4390.310661]  ? kthread_bind+0x30/0x30
[ 4390.314340]  ret_from_fork+0x35/0x40
[ 4390.317936] ---[ end trace d0209d068f8583fd ]---
[ 4390.322577] list_add double add: new=ffff9de31d44c798, prev=ffffb0568a6d7e50, next=ffff9de31d44c798.
[ 4390.331736] ------------[ cut here ]------------
[ 4390.336368] kernel BUG at lib/list_debug.c:31!
[ 4390.340829] invalid opcode: 0000 [#1] SMP PTI
[ 4390.345200] CPU: 14 PID: 24184 Comm: kworker/u42:2 Tainted: G        W         4.19.0-11-amd64 #1 Debian 4.19.146-1
[ 4390.355648] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 4390.365327] Workqueue: nfsd4 laundromat_main [nfsd]
[ 4390.371091] RIP: 0010:__list_add_valid+0x41/0x50
[ 4390.376551] Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8b 07 48 8b 57 08
[ 4390.397015] RSP: 0018:ffffb0568a6d7e18 EFLAGS: 00010246
[ 4390.403077] RAX: 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000
[ 4390.411045] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
[ 4390.418996] RBP: ffffb0568a6d7e50 R08: 0000000000000573 R09: 0000000000aaaaaa
[ 4390.426937] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
[ 4390.434859] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
[ 4390.442761] FS:  0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
[ 4390.451610] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.458100] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
[ 4390.465970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4390.473820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4390.481650] PKRU: 55555554
[ 4390.485032] Call Trace:
[ 4390.488145]  laundromat_main+0x27a/0x610 [nfsd]
[ 4390.493333]  process_one_work+0x1a7/0x3a0
[ 4390.497987]  worker_thread+0x30/0x390
[ 4390.502276]  ? create_worker+0x1a0/0x1a0
[ 4390.506808]  kthread+0x112/0x130
[ 4390.510636]  ? kthread_bind+0x30/0x30
[ 4390.514892]  ret_from_fork+0x35/0x40
[ 4390.519056] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm drm_kms_helper intel_uncore mei_me drm joydev iTCO_wdt evdev pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
[ 4390.593983]  raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64 xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
[ 4390.616982] ---[ end trace d0209d068f8583fe ]---
[ 4390.704972] RIP: 0010:__list_add_valid+0x41/0x50
[ 4390.710342] Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8b 07 48 8b 57 08
[ 4390.730656] RSP: 0018:ffffb0568a6d7e18 EFLAGS: 00010246
[ 4390.736676] RAX: 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000
[ 4390.744609] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
[ 4390.752555] RBP: ffffb0568a6d7e50 R08: 0000000000000573 R09: 0000000000aaaaaa
[ 4390.760497] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
[ 4390.768433] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
[ 4390.776361] FS:  0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
[ 4390.785244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.791789] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
[ 4390.799725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4390.807642] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4390.815544] PKRU: 55555554
[ 4390.819023] Kernel panic - not syncing: Fatal exception
[ 4390.825022] Kernel Offset: 0x16c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Thanks for your time in any case!

Regards,
Salvatore

  reply	other threads:[~2020-10-18  9:39 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-11  7:59 Kernel panic / list_add corruption when in nfsd4_run_cb_work Salvatore Bonaccorso
2020-10-12 14:26 ` J. Bruce Fields
2020-10-12 15:41   ` Salvatore Bonaccorso
2020-10-12 16:33     ` J. Bruce Fields
2020-10-18  9:39       ` Salvatore Bonaccorso [this message]
2021-10-06 18:46         ` Salvatore Bonaccorso
2021-11-22  9:15           ` Olivier Monaco
2021-11-22 15:17             ` Chuck Lever III
2021-11-24 15:29               ` Bruce Fields
2021-11-24 15:59                 ` Trond Myklebust
2021-11-24 16:10                   ` Trond Myklebust
2021-11-24 16:10                   ` bfields
2021-11-24 17:14                     ` Trond Myklebust
2021-11-24 22:06                       ` bfields
2021-11-24 22:17                         ` Trond Myklebust
2021-12-01 22:33                           ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201018093903.GA364695@eldamar.lan \
    --to=carnil@debian.org \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.