linux-block.vger.kernel.org archive mirror
* RIP: 0010:blk_mq_start_request when [none] elevator is used with kernel 5.19.y
@ 2022-09-19  7:03 Jaroslav Pulchart
  2022-09-19 13:40 ` Suwan Kim
  0 siblings, 1 reply; 3+ messages in thread
From: Jaroslav Pulchart @ 2022-09-19  7:03 UTC (permalink / raw)
  To: linux-block

Hello,

I'm observing an issue in the block layer when the [none] elevator is
used with the 5.19.y kernel; the issue is not observed with the 5.18.y
kernel line.

The issue is reproducible 100% of the time with the [none] I/O
scheduler (there is no issue with mq-deadline).

The command named in the dmesg trace depends on when the [none] I/O
scheduler was selected via:
   echo "none" > /sys/block/vda/queue/scheduler
It can be mkfs, systemd-logind, flush, ...; whichever command is the
first to do some I/O after the scheduler change to [none] triggers it.
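
For clarity, the trigger can also be scripted. Below is a hypothetical
repro sketch (the sysfs path is the one above; DATA_PATH is an
assumption and must be a file on a filesystem backed by vda; run as
root and then check dmesg):

/*
 * Hypothetical repro helper: switch vda to the [none] scheduler, then
 * do a small synced write so the next issued request goes through the
 * no-elevator path.  DATA_PATH is an assumption, not from the report.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define DATA_PATH "/root/repro.dat"	/* assumed to be vda-backed */

int main(void)
{
	char buf[4096];
	int fd;

	fd = open("/sys/block/vda/queue/scheduler", O_WRONLY);
	if (fd < 0) {
		perror("open scheduler");
		return 1;
	}
	if (write(fd, "none", 4) != 4)
		perror("write scheduler");
	close(fd);

	/* Any subsequent I/O can be the trigger; do a synced write. */
	fd = open(DATA_PATH, O_CREAT | O_WRONLY | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open data file");
		return 1;
	}
	memset(buf, 0xab, sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		perror("write data");
	fsync(fd);
	close(fd);
	return 0;
}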

PS: the system keeps working after the warning is reported.

Example of the warning reported while booting a firecracker micro VM:

[  671.000798] ------------[ cut here ]------------
[  671.002062] WARNING: CPU: 3 PID: 1926 at block/blk-mq.c:1143
blk_mq_start_request+0x127/0x130
[  671.004722] Modules linked in: intel_rapl_msr(E)
intel_rapl_common(E) libnvdimm(E) kvm_intel(E) kvm(E) irqbypass(E)
rapl(E) virtio_balloon(E) ext4(E) mbcache(E) jbd2(E)
crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) virtio_net(E)
net_failover(E) failover(E) virtio_blk(E) ghash_clmulni_intel(E)
serio_raw(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[  671.004746] Unloaded tainted modules: nfit(E):4 intel_cstate(E):4
intel_uncore(E):4
[  671.016268] CPU: 3 PID: 1926 Comm: kworker/u8:1 Tainted: G
  E     5.19.6-2.gdc.el8.x86_64 #1
[  671.019020] Workqueue: writeback wb_workfn (flush-252:0)
[  671.020642] RIP: 0010:blk_mq_start_request+0x127/0x130
[  671.022228] Code: c1 e8 09 66 89 43 7a 48 8b 7d 28 48 85 ff 0f 84
10 ff ff ff 48 89 de e8 f7 40 01 00 8b 83 94 00 00 00 85 c0 0f 84 08
ff ff ff <0f> 0b e9 01 ff ff ff 66 90 0f 1f 44 00 00 48 8b bf 18 01 00
00 40
[  671.027634] RSP: 0018:ffffc90000f1b6f8 EFLAGS: 00010202
[  671.029263] RAX: 0000000000000001 RBX: ffff88817d6a1a40 RCX: 0000000000000017
[  671.031385] RDX: 000000000080002a RSI: ffff88817d6a1a40 RDI: ffff88817e9c1058
[  671.033479] RBP: ffff88817d6b2a00 R08: 00000000047eb630 R09: ffff88817d6a1b60
[  671.035631] R10: 000000017d6a2000 R11: 0000000000000002 R12: ffff88817d6a1b60
[  671.037749] R13: ffff88817d6a1b60 R14: ffff88817fb08e00 R15: ffff88817fb08200
[  671.039881] FS:  0000000000000000(0000) GS:ffff888259780000(0000)
knlGS:0000000000000000
[  671.042304] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  671.044126] CR2: 00007f38b4001d88 CR3: 000000017fa1e003 CR4: 0000000000770ee0
[  671.046267] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  671.048432] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  671.050533] PKRU: 55555554
[  671.051388] Call Trace:
[  671.052137]  <TASK>
[  671.052821]  virtblk_prep_rq.isra.19+0xb4/0x290 [virtio_blk]
[  671.054626]  virtio_queue_rq+0x47/0x194 [virtio_blk]
[  671.056121]  __blk_mq_try_issue_directly+0x153/0x1c0
[  671.057587]  blk_mq_plug_issue_direct.constprop.75+0x85/0x120
[  671.059385]  blk_mq_flush_plug_list+0x287/0x2f0
[  671.060842]  ? blk_mq_rq_ctx_init.isra.47+0x177/0x190
[  671.062517]  blk_add_rq_to_plug+0x62/0x110
[  671.063754]  blk_mq_submit_bio+0x20e/0x540
[  671.065043]  __submit_bio+0xf5/0x180
[  671.066244]  submit_bio_noacct_nocheck+0x25a/0x2b0
[  671.067737]  submit_bio+0x3e/0xd0
[  671.068805]  submit_bh_wbc+0x117/0x140
[  671.070050]  __block_write_full_page+0x231/0x550
[  671.071519]  ? create_page_buffers+0x90/0x90
[  671.072807]  ? blkdev_bio_end_io_async+0x80/0x80
[  671.074161]  __writepage+0x16/0x70
[  671.075209]  write_cache_pages+0x187/0x4d0
[  671.076518]  ? dirty_background_bytes_handler+0x30/0x30
[  671.078171]  generic_writepages+0x4f/0x80
[  671.079355]  do_writepages+0xd2/0x1b0
[  671.080452]  __writeback_single_inode+0x41/0x360
[  671.081871]  writeback_sb_inodes+0x1f0/0x460
[  671.083192]  __writeback_inodes_wb+0x5f/0xd0
[  671.084407]  wb_writeback+0x235/0x2d0
[  671.085387]  wb_workfn+0x348/0x4a0
[  671.086392]  ? put_prev_task_fair+0x1b/0x40
[  671.087670]  ? __update_idle_core+0x1b/0xb0
[  671.088862]  process_one_work+0x1c5/0x390
[  671.090041]  ? process_one_work+0x390/0x390
[  671.091326]  worker_thread+0x30/0x360
[  671.092401]  ? process_one_work+0x390/0x390
[  671.093639]  kthread+0xd7/0x100
[  671.094578]  ? kthread_complete_and_exit+0x20/0x20
[  671.095996]  ret_from_fork+0x1f/0x30
[  671.097128]  </TASK>
[  671.097768] ---[ end trace 0000000000000000 ]---
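
For reference, my understanding is that the WARN at block/blk-mq.c:1143
is the state check in blk_mq_start_request(): a request is expected to
still be idle when it is started, so the warning means some request got
started twice. A toy userspace model of just that check (illustrative
only, not kernel code; the names merely mirror the kernel's):

#include <stdio.h>

/*
 * Toy model of the blk-mq request state check.  The real kernel code
 * differs; this only illustrates what the warning means.
 */
enum mq_rq_state { MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT };

struct request { enum mq_rq_state state; };

static void blk_mq_start_request(struct request *rq)
{
	/* The kernel's WARN_ON_ONCE fires for a non-idle request. */
	if (rq->state != MQ_RQ_IDLE)
		fprintf(stderr, "WARNING: request started twice\n");
	rq->state = MQ_RQ_IN_FLIGHT;
}

int main(void)
{
	struct request rq = { .state = MQ_RQ_IDLE };

	blk_mq_start_request(&rq);	/* first start: fine */
	blk_mq_start_request(&rq);	/* second start: warns */
	return 0;
}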

Does somebody have any suggestions on what is wrong and how to fix it?

Best regards,
Jaroslav Pulchart


* Re: RIP: 0010:blk_mq_start_request when [none] elevator is used with kernel 5.19.y
  2022-09-19  7:03 RIP: 0010:blk_mq_start_request when [none] elevator is used with kernel 5.19.y Jaroslav Pulchart
@ 2022-09-19 13:40 ` Suwan Kim
  2022-09-19 16:26   ` Jaroslav Pulchart
  0 siblings, 1 reply; 3+ messages in thread
From: Suwan Kim @ 2022-09-19 13:40 UTC (permalink / raw)
  To: Jaroslav Pulchart
  Cc: linux-block, virtualization, Michael S. Tsirkin, Stefan Hajnoczi,
	Christoph Hellwig

Hi Jaroslav,

On Mon, Sep 19, 2022 at 4:27 PM Jaroslav Pulchart
<jaroslav.pulchart@gooddata.com> wrote:
>
> Hello,
>
> I'm observing an issue in the block layer when the [none] elevator is
> used with the 5.19.y kernel; the issue is not observed with the 5.18.y
> kernel line.
>
> The issue is reproducible 100% of the time with the [none] I/O
> scheduler (there is no issue with mq-deadline).
>
> The command named in the dmesg trace depends on when the [none] I/O
> scheduler was selected via:
>    echo "none" > /sys/block/vda/queue/scheduler
> It can be mkfs, systemd-logind, flush, ...; whichever command is the
> first to do some I/O after the scheduler change to [none] triggers it.
>
> PS: the system keeps working after the warning is reported.
>
> Example of the warning reported while booting a firecracker micro VM:
>
> [... dmesg trace snipped; see the original report above ...]
>
> Does somebody have any suggestions on what is wrong and how to fix it?
>
> Best regards,
> Jaroslav Pulchart

This issue looks similar to the one in the link below.

https://lore.kernel.org/all/CAPBb6MW0Ou8CcWCD-n+9-B0daiviP1Uk9A9C0QBp=B2oFECF3w@mail.gmail.com/t/

We have a fix for this issue. Could you test the patch below?

https://patchwork.kernel.org/project/linux-block/patch/20220830150153.12627-1-suwan.kim027@gmail.com/
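
In short (see the patch description for the authoritative details):
blk_mq_start_request() was called before a preparation step that can
still fail, so a failed request was requeued while already marked
in-flight, and re-issuing it trips the WARN. A toy userspace sketch of
the two orderings (illustrative only; none of these names are the real
virtio-blk code):

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the ordering issue; the real change is in the patch. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT };
struct request { enum rq_state state; };

static void start_request(struct request *rq)
{
	if (rq->state != RQ_IDLE)	/* the blk-mq WARN condition */
		fprintf(stderr, "WARNING: double start\n");
	rq->state = RQ_IN_FLIGHT;
}

/* Buggy ordering: start first, then a prep step that may fail. */
static bool issue_buggy(struct request *rq, bool prep_fails)
{
	start_request(rq);
	if (prep_fails)
		return false;	/* requeued while already IN_FLIGHT */
	return true;
}

/* Fixed ordering: mark the request started only after prep succeeds. */
static bool issue_fixed(struct request *rq, bool prep_fails)
{
	if (prep_fails)
		return false;	/* still IDLE, safe to re-issue later */
	start_request(rq);
	return true;
}

int main(void)
{
	struct request rq = { .state = RQ_IDLE };

	issue_buggy(&rq, true);		/* prep fails, request requeued */
	issue_buggy(&rq, false);	/* re-issue: prints the warning */

	rq.state = RQ_IDLE;
	issue_fixed(&rq, true);		/* prep fails, still idle */
	issue_fixed(&rq, false);	/* re-issue: clean */
	return 0;
}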

Regards,
Suwan Kim


* Re: RIP: 0010:blk_mq_start_request when [none] elevator is used with kernel 5.19.y
  2022-09-19 13:40 ` Suwan Kim
@ 2022-09-19 16:26   ` Jaroslav Pulchart
  0 siblings, 0 replies; 3+ messages in thread
From: Jaroslav Pulchart @ 2022-09-19 16:26 UTC (permalink / raw)
  To: Suwan Kim
  Cc: linux-block, virtualization, Michael S. Tsirkin, Stefan Hajnoczi,
	Christoph Hellwig

Hi

I tested the "[v2] virtio-blk: Fix WARN_ON_ONCE in virtio_queue_rq()"
patch, and it fixed the issue, so it looks to be the same problem.

Best,
Jaroslav Pulchart


On Mon, Sep 19, 2022 at 3:40 PM Suwan Kim <suwan.kim027@gmail.com> wrote:
>
> Hi Jaroslav,
>
> On Mon, Sep 19, 2022 at 4:27 PM Jaroslav Pulchart
> <jaroslav.pulchart@gooddata.com> wrote:
> >
> > [... original report and dmesg trace snipped; see above ...]
>
> This issue looks similar to the one in the link below.
>
> https://lore.kernel.org/all/CAPBb6MW0Ou8CcWCD-n+9-B0daiviP1Uk9A9C0QBp=B2oFECF3w@mail.gmail.com/t/
>
> We have a fix for this issue. Could you test the patch below?
>
> https://patchwork.kernel.org/project/linux-block/patch/20220830150153.12627-1-suwan.kim027@gmail.com/
>
> Regards,
> Suwan Kim

