From: Yi Zhang <yi.zhang@redhat.com>
To: Bruno Goncalves <bgoncalv@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block <linux-block@vger.kernel.org>,
	CKI Project <cki-project@redhat.com>
Subject: Re: kernel BUG at lib/list_debug.c:30! (list_add corruption. prev->next should be nex)
Date: Fri, 25 Nov 2022 16:38:58 +0800
Message-ID: <CAHj4cs9uLczHhbO+SRmbBGPu3WZ_HntiCi4sxettXCnjuV8ZXQ@mail.gmail.com>
In-Reply-To: <CA+QYu4qDcYJf3WKAmuFcFGX273t4Yi0WG+eF6oQGiRyKeXejWw@mail.gmail.com>

I can reproduce this issue, even at system boot, with the latest
linux-block/for-next. I will try to bisect it later.

43f3ae1898c9 (HEAD -> for-next, origin/for-next) Merge branch
'for-6.2/writeback' into for-next
d6798bc243fa writeback: Add asserts for adding freed inode to lists

[   24.183829] list_add corruption. prev->next should be next
(ffff9a1d9f337f68), but was ffff9a1a02119e70. (prev=ffff9a1a02119e70).
[   24.195478] ------------[ cut here ]------------
[   24.200088] kernel BUG at lib/list_debug.c:30!
[   24.204532] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   24.209751] CPU: 4 PID: 167 Comm: kworker/4:1 Not tainted 6.1.0-rc6+ #1
[   24.216365] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS
2.8.5 08/18/2022
[   24.223930] Workqueue: cgwb_release cgwb_release_workfn
[   24.229157] RIP: 0010:__list_add_valid.cold+0x3a/0x5b
[   24.234208] Code: f2 4c 89 c1 48 89 fe 48 c7 c7 20 23 65 a8 e8 d2
a2 fe ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 c8 22 65 a8 e8 bb
a2 fe ff <0f> 0b 4c 89 c1 48 c7 c7 70 22 65 a8 e8 aa a2 fe ff 0f 0b 48
c7 c7
[   24.252953] RSP: 0018:ffffb035407e7da8 EFLAGS: 00010046
[   24.258172] RAX: 0000000000000075 RBX: ffff9a1a02119e68 RCX: 0000000000000000
[   24.265303] RDX: 0000000000000000 RSI: ffff9a1d9f31f840 RDI: ffff9a1d9f31f840
[   24.272428] RBP: ffff9a1d9f337f00 R08: 0000000000000000 R09: 00000000ffff7fff
[   24.279560] R10: ffffb035407e7c50 R11: ffffffffa8be75e8 R12: ffff9a1d9f337f68
[   24.286683] R13: ffff9a1a02119e70 R14: ffff9a1a02119e70 R15: ffff9a1d9f330340
[   24.293808] FS:  0000000000000000(0000) GS:ffff9a1d9f300000(0000)
knlGS:0000000000000000
[   24.301894] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.307641] CR2: 000055b5f1f28050 CR3: 0000000104e38000 CR4: 0000000000350ee0
[   24.314772] Call Trace:
[   24.314774]  <TASK>
[   24.314774]  insert_work+0x46/0xc0
[   24.314780]  __queue_work+0x1d5/0x380
[   24.326376]  queue_work_on+0x24/0x30
[   24.329955]  blkcg_unpin_online+0x1b5/0x1c0
[   24.334143]  cgwb_release_workfn+0x6a/0x200
[   24.338327]  process_one_work+0x1e5/0x3b0
[   24.342342]  ? rescuer_thread+0x390/0x390
[   24.346352]  worker_thread+0x50/0x3a0
[   24.350019]  ? rescuer_thread+0x390/0x390
[   24.354030]  kthread+0xd9/0x100
[   24.357177]  ? kthread_complete_and_exit+0x20/0x20
[   24.361970]  ret_from_fork+0x22/0x30
[   24.365550]  </TASK>
[   24.367742] Modules linked in: sunrpc intel_rapl_msr
intel_rapl_common amd64_edac edac_mce_amd ipmi_ssif kvm_amd kvm
mgag200 ledtrig_audio rfkill video i2c_algo_bit drm_shmem_helper
dcdbas drm_kms_helper irqbypass dell_smbios rapl dell_wmi_descriptor
wmi_bmof pcspkr syscopyarea acpi_ipmi sysfillrect sysimgblt
fb_sys_fops ipmi_si ipmi_devintf ptdma i2c_piix4 k10temp
ipmi_msghandler acpi_power_meter vfat fat acpi_cpufreq drm fuse xfs
libcrc32c sd_mod sg ahci crct10dif_pclmul crc32_pclmul libahci
crc32c_intel ghash_clmulni_intel mpt3sas nvme tg3 libata nvme_core ccp
raid_class nvme_common t10_pi sp5100_tco scsi_transport_sas wmi
dm_mirror dm_region_hash dm_log dm_mod
[   24.426475] ---[ end trace 0000000000000000 ]---
[   24.505278] RIP: 0010:__list_add_valid.cold+0x3a/0x5b
[   24.510331] Code: f2 4c 89 c1 48 89 fe 48 c7 c7 20 23 65 a8 e8 d2
a2 fe ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 c8 22 65 a8 e8 bb
a2 fe ff <0f> 0b 4c 89 c1 48 c7 c7 70 22 65 a8 e8 aa a2 fe ff 0f 0b 48
c7 c7
[   24.510332] RSP: 0018:ffffb035407e7da8 EFLAGS: 00010046
[   24.510333] RAX: 0000000000000075 RBX: ffff9a1a02119e68 RCX: 0000000000000000
[   24.510334] RDX: 0000000000000000 RSI: ffff9a1d9f31f840 RDI: ffff9a1d9f31f840
[   24.510335] RBP: ffff9a1d9f337f00 R08: 0000000000000000 R09: 00000000ffff7fff
[   24.510337] R10: ffffb035407e7c50 R11: ffffffffa8be75e8 R12: ffff9a1d9f337f68
[   24.562805] R13: ffff9a1a02119e70 R14: ffff9a1a02119e70 R15: ffff9a1d9f330340
[   24.569929] FS:  0000000000000000(0000) GS:ffff9a1d9f300000(0000)
knlGS:0000000000000000
[   24.578017] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   24.578018] CR2: 000055b5f1f28050 CR3: 0000000104e38000 CR4: 0000000000350ee0
[   24.578019] Kernel panic - not syncing: Fatal exception
[   24.578653] Kernel Offset: 0x26200000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   24.682013] ---[ end Kernel panic - not syncing: Fatal exception ]---
[   24.339396] r[-- MARK -- Fri Nov 25 06:25:00 2022]
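
As a reading aid for the "list_add corruption" line above: with list debugging enabled, the kernel checks the neighbours of the insertion point before linking a new entry, and the message fires when prev->next does not point at next. In both traces the reported value of prev->next equals prev itself, which is what a node looks like right after it has been re-initialized. The user-space sketch below reproduces that pattern under one assumed scenario (an entry re-initialized while still queued, then queued again). It is not a diagnosis of this bug; the type, function names, and result codes are illustrative stand-ins, not the kernel's code.

```c
#include <stdio.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized (empty or unlinked) node points at itself. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Result codes are hypothetical and exist only for this sketch. */
enum { ADD_OK = 0, ADD_BAD_NEXT_PREV = 1, ADD_BAD_PREV_NEXT = 2, ADD_DOUBLE = 3 };

/* Sanity-check the insertion point before linking node between prev and next. */
static int list_add_check(struct list_head *node, struct list_head *prev,
                          struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return ADD_BAD_NEXT_PREV;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return ADD_BAD_PREV_NEXT;
    }
    if (node == prev || node == next) {
        fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p.\n",
                (void *)node, (void *)prev, (void *)next);
        return ADD_DOUBLE;
    }
    return ADD_OK;
}

/* Append node at the tail of the list, refusing to link if the check fails. */
static int list_add_tail_checked(struct list_head *node, struct list_head *head)
{
    struct list_head *prev = head->prev;
    int rc = list_add_check(node, prev, head);

    if (rc != ADD_OK)
        return rc;
    head->prev = node;
    node->next = head;
    node->prev = prev;
    prev->next = node;
    return ADD_OK;
}

/* Assumed scenario: re-initialize an entry that is still queued, then queue it again. */
int demo_reinit_while_queued(void)
{
    struct list_head worklist, entry;

    init_list_head(&worklist);
    init_list_head(&entry);
    if (list_add_tail_checked(&entry, &worklist) != ADD_OK)
        return -1;

    init_list_head(&entry);   /* entry now points at itself, but worklist still links to it */
    return list_add_tail_checked(&entry, &worklist);
}
```

Calling demo_reinit_while_queued() from a main() prints a "prev->next should be next" message in which the "was" address and the "prev" address are identical, the same shape as in the logs here. Other kinds of memory corruption could produce the same message, so this only narrows down what to look for.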


On Thu, Nov 24, 2022 at 11:00 PM Bruno Goncalves <bgoncalv@redhat.com> wrote:
>
> On Wed, 23 Nov 2022 at 14:46, Jens Axboe <axboe@kernel.dk> wrote:
> >
> > On 11/23/22 1:48 AM, Bruno Goncalves wrote:
> > > Hello,
> > >
> > > We recently started to hit the following panic when testing the block
> > > tree (for-next branch).
> > >
> > > [ 5076.172749] list_add corruption. prev->next should be next
> > > (ffff91cd6f7fa568), but was ffff91c991ca6670. (prev=ffff91c991ca6670).
> > > [ 5076.173863] ------------[ cut here ]------------
> > > [ 5076.174853] kernel BUG at lib/list_debug.c:30!
> > > [ 5076.175523] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> > > [ 5076.175853] CPU: 15 PID: 16415 Comm: kworker/15:13 Tainted: G
> > >    I        6.1.0-rc6 #1
> > > [ 5076.176799] Hardware name: HP ProLiant DL360p Gen8, BIOS P71 05/24/2019
> > > [ 5076.177198] Workqueue: cgwb_release cgwb_release_workfn
> > > [ 5076.177497] RIP: 0010:__list_add_valid.cold+0x3a/0x5b
> > > [ 5076.177788] Code: f2 48 89 c1 48 89 fe 48 c7 c7 48 d8 76 ad e8 5a
> > > 8f fd ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 f0 d7 76 ad e8 43
> > > 8f fd ff <0f> 0b 48 89 c1 48 c7 c7 98 d7 76 ad e8 32 8f fd ff 0f 0b 48
> > > c7 c7
> > > [ 5076.179173] RSP: 0018:ffffa1c98a6afdb0 EFLAGS: 00010082
> > > [ 5076.179472] RAX: 0000000000000075 RBX: ffff91c991ca6668 RCX: 0000000000000000
> > > [ 5076.180241] RDX: 0000000000000002 RSI: ffffffffad752ad3 RDI: 00000000ffffffff
> > > [ 5076.181069] RBP: ffff91cd6f7fa500 R08: 0000000000000000 R09: ffffa1c98a6afc60
> > > [ 5076.182209] R10: 0000000000000003 R11: ffff91cd7ff42fe8 R12: ffff91cd6f7fa568
> > > [ 5076.183002] R13: ffff91c991ca6670 R14: ffff91c991ca6670 R15: ffff91cd6f7f1440
> > > [ 5076.183902] FS:  0000000000000000(0000) GS:ffff91cd6f7c0000(0000)
> > > knlGS:0000000000000000
> > > [ 5076.184377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 5076.185084] CR2: 0000560ff67e11b8 CR3: 000000020d010005 CR4: 00000000000606e0
> > > [ 5076.185945] Call Trace:
> > > [ 5076.186110]  <TASK>
> > > [ 5076.186916]  insert_work+0x46/0xc0
> > > [ 5076.187533]  __queue_work+0x1d4/0x460
> > > [ 5076.187788]  queue_work_on+0x37/0x40
> > > [ 5076.187993]  blkcg_unpin_online+0x1ad/0x1b0
> > > [ 5076.188244]  cgwb_release_workfn+0x6a/0x200
> > > [ 5076.188464]  process_one_work+0x1c7/0x380
> > > [ 5076.188675]  worker_thread+0x4d/0x380
> > > [ 5076.188881]  ? rescuer_thread+0x380/0x380
> > > [ 5076.189089]  kthread+0xe9/0x110
> > > [ 5076.189716]  ? kthread_complete_and_exit+0x20/0x20
> > > [ 5076.190407]  ret_from_fork+0x22/0x30
> > > [ 5076.190677]  </TASK>
> > > [ 5076.190816] Modules linked in: nvme nvme_core nvme_common loop tls
> > > rfkill intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal
> > > intel_powerclamp coretemp sunrpc kvm_intel kvm iTCO_wdt iapl
> > > intel_cstate intel_uncore pcspkr lpc_ich ipmi_ssif hpilo tg3 acpi_ipmi
> > > ioatdma ipmi_si ipmi_devintf dca ipmi_msghandler acpi_power_meter fuse
> > > zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni
> > > polyval_generic ghash_clmulni_intel sha512_ssse3 serio_raw hpsa
> > > mgag200 scsi_transport_sas [last unloaded: scsi_debug]
> > > [ 5076.293315] ---[ end trace 0000000000000000 ]---
> > > [ 5076.295226] RIP: 0010:__list_add_valid.cold+0x3a/0x5b
> > > [ 5076.295587] Code: f2 48 89 c1 48 89 fe 48 c7 c7 48 d8 76 ad e8 5a
> > > 8f fd ff 0f 0b 48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 f0 d7 76 ad e8 43
> > > 8f fd ff <0f> 0b 48 89 c1 48 c7 c7 98 d7 76 ad e8 32 8f fd ff 0f 0b 48
> > > c7 c7
> > > [ 5076.296921] RSP: 0018:ffffa1c98a6afdb0 EFLAGS: 00010082
> > > [ 5076.297239] RAX: 0000000000000075 RBX: ffff91c991ca6668 RCX: 0000000000000000
> > > [ 5076.297983] RDX: 0000000000000002 RSI: ffffffffad752ad3 RDI: 00000000ffffffff
> > > [ 5076.298768] RBP: ffff91cd6f7fa500 R08: 0000000000000000 R09: ffffa1c98a6afc60
> > > [ 5076.299525] R10: 0000S:  0000000000000000(0000)
> > > GS:ffff91cd6f7c0000(0000) knlGS:0000000000000000
> > > [ 5076.700351] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 5076.701046] CR2: 0000560ff67e11b8 CR3: 000000020d010005 CR4: 00000000000606e0
> > > [ 5076ernel panic - not syncing: Fatal exception
> > > [ 5077.924713] Shutting down cpus with NMI
> > > [ 5077.924986] Kernel Offset: 0x2b000000 from 0xffffffff81000000
> > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > [ 5077.927946] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > >
> > > It seems to happen often during different tests.
> > >
> > > full console.log:
> > > https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/11/21/redhat:700955106/build_x86_64_redhat:700955106_x86_64/tests/1/results_0001/console.log/console.log
> > >
> > > kernel tarball:
> > > https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/700955106/publish%20x86_64/3356091217/artifacts/kernel-block-redhat_700955106_x86_64.tar.gz
> > >
> > > kernel config: https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/700955106/build%20x86_64/3356091207/artifacts/kernel-block-redhat_700955106_x86_64.config
> > >
> > > test logs: https://datawarehouse.cki-project.org/kcidb/tests/6061677
> > >
> > > We didn't bisect, but the first commit we hit the problem was
> > > "f65d92c600fe6eecdbd6e7fab7893c9c094dfcbf
> > > (io_uring-6.1-2022-11-18-2180-gf65d92c600fe)" and the last one where
> > > we didn't hit the problem was
> > > "40fa774af7fd04d06014ac74947c351649b6f64f
> > > (io_uring-6.1-2022-11-11-1843-g40fa774af7fd)"
> > >
> > > cki issue tracker: https://datawarehouse.cki-project.org/issue/1732
> >
> > Please just try and clone for-6.2/block from the block tree and bisect
> > it?
> >
>
> Hi,
> I've tried with commit 93c68cc46a070775cc6675e3543dd909eb9f6c9e (drbd:
> use consistent license), but I was not able to hit the panic with it.
>
>
> Bruno
>
> > --
> > Jens Axboe
> >
> >
>


-- 
Best Regards,
  Yi Zhang



Thread overview: 9+ messages
2022-11-23  8:48 kernel BUG at lib/list_debug.c:30! (list_add corruption. prev->next should be nex) Bruno Goncalves
2022-11-23 13:46 ` Jens Axboe
2022-11-24 14:57   ` Bruno Goncalves
2022-11-25  8:38     ` Yi Zhang [this message]
2022-11-26 14:29       ` [bisected]kernel " Yi Zhang
2022-11-26 15:53         ` Jens Axboe
2022-11-26 22:54           ` Waiman Long
2022-11-27  4:13             ` Waiman Long
2022-11-28 18:55         ` Bart Van Assche
