* blk-mq crash under KVM  in multiqueue block code (with virtio-blk and ext4)
From: Christian Borntraeger @ 2014-09-11 10:26 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin, Jens Axboe, KVM list
  Cc: Virtualization List,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>

Folks,

we have seen the following bug with 3.16 as a KVM guest. I suspect the blk-mq rework that happened between 3.15 and 3.16, but it could be something completely different.


[   65.992022] Unable to handle kernel pointer dereference in virtual kernel address space
[   65.992187] failing address: ccccccccccccd000 TEID: ccccccccccccd803
[   65.992363] Fault in home space mode while using kernel ASCE.
[   65.992365] AS:0000000000a7c007 R3:0000000000000024 
[   65.993754] Oops: 0038 [#1] SMP 
[   65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm dm_multipath virtio_net virtio_blk sunrpc
[   65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
[   65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
[   65.996222] task: 0000000002250000 ti: 0000000002258000 task.ti: 0000000002258000
[   65.996228] Krnl PSW : 0704f00180000000 00000000003ed114 (blk_mq_tag_to_rq+0x20/0x38)
[   65.997299]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 EA:3
               Krnl GPRS: 0000000000000040 cccccccccccccccc 0000000001619000 000000000000004e
[   65.997301]            000000000000004e 0000000000000000 0000000000000001 0000000000a0de18
[   65.997302]            0000000077ffbe18 0000000077ffbd50 000000006d72d620 000000000000004f
[   65.997304]            0000000001a99400 0000000000000080 00000000003eddee 0000000077ffbc28
[   65.997864] Krnl Code: 00000000003ed106: e31020300004        lg      %r1,48(%r2)
                          00000000003ed10c: 91082044            tm      68(%r2),8
                         #00000000003ed110: a7840009            brc     8,3ed122
                         >00000000003ed114: e34016880004        lg      %r4,1672(%r1)
                          00000000003ed11a: 59304100            c       %r3,256(%r4)
                          00000000003ed11e: a7840003            brc     8,3ed124
                          00000000003ed122: 07fe                bcr     15,%r14
                          00000000003ed124: b9040024            lgr     %r2,%r4
[   65.998221] Call Trace:
[   65.998224] ([<0000000000000001>] 0x1)
[   65.998227]  [<00000000003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
[   65.998228]  [<00000000003edcd6>] blk_mq_rq_timer+0x96/0x13c
[   65.999226]  [<000000000013ee60>] call_timer_fn+0x40/0x110
[   65.999230]  [<000000000013f642>] run_timer_softirq+0x2de/0x3d0
[   65.999238]  [<0000000000135b70>] __do_softirq+0x124/0x2ac
[   65.999241]  [<0000000000136000>] irq_exit+0xc4/0xe4
[   65.999435]  [<000000000010bc08>] do_IRQ+0x64/0x84
[   66.437533]  [<000000000067ccd8>] ext_skip+0x42/0x46
[   66.437541]  [<00000000003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
[   66.437544] ([<00000000003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
[   66.437547]  [<00000000003eef82>] blk_mq_map_request+0xc2/0x208
[   66.437549]  [<00000000003ef860>] blk_sq_make_request+0xac/0x350
[   66.437721]  [<00000000003e2d6c>] generic_make_request+0xc4/0xfc
[   66.437723]  [<00000000003e2e56>] submit_bio+0xb2/0x1a8
[   66.438373]  [<000000000031e8aa>] ext4_io_submit+0x52/0x80
[   66.438375]  [<000000000031ccfa>] ext4_writepages+0x7c6/0xd0c
[   66.438378]  [<00000000002aea20>] __writeback_single_inode+0x54/0x274
[   66.438379]  [<00000000002b0134>] writeback_sb_inodes+0x28c/0x4ec
[   66.438380]  [<00000000002b042e>] __writeback_inodes_wb+0x9a/0xe4
[   66.438382]  [<00000000002b06a2>] wb_writeback+0x22a/0x358
[   66.438383]  [<00000000002b0cd0>] bdi_writeback_workfn+0x354/0x538
[   66.438618]  [<000000000014e3aa>] process_one_work+0x1aa/0x418
[   66.438621]  [<000000000014ef94>] worker_thread+0x48/0x524
[   66.438625]  [<00000000001560ca>] kthread+0xee/0x108
[   66.438627]  [<000000000067c76e>] kernel_thread_starter+0x6/0xc
[   66.438628]  [<000000000067c768>] kernel_thread_starter+0x0/0xc
[   66.438629] Last Breaking-Event-Address:
[   66.438631]  [<00000000003edde8>] blk_mq_timeout_check+0x6c/0xb8

I looked into the dump; the full function is as follows (annotated by me to match the source code):
r2 = tags
r3 = tag (0x4e)
Dump of assembler code for function blk_mq_tag_to_rq:
   0x00000000003ed0f4 <+0>:     lg      %r1,96(%r2)			# r1 has now tags->rqs
   0x00000000003ed0fa <+6>:     sllg    %r2,%r3,3			# r2 has tag*8
   0x00000000003ed100 <+12>:    lg      %r2,0(%r2,%r1)			# r2 now has rq (=tags->rqs[tag])
   0x00000000003ed106 <+18>:    lg      %r1,48(%r2)			# r1 now has rq->q
   0x00000000003ed10c <+24>:    tm      68(%r2),8			# test for rq->cmd_flags & REQ_FLUSH_SEQ
   0x00000000003ed110 <+28>:    je      0x3ed122 <blk_mq_tag_to_rq+46>  #  if not goto 3ed122
   0x00000000003ed114 <+32>:    lg      %r4,1672(%r1)			# r4 = rq->q->flush_rq  <-------- CRASHES as rq->q points to 0xcccccccccccccccc
   0x00000000003ed11a <+38>:    c       %r3,256(%r4)			# compare tag with rq->q->flush_rq->tag
   0x00000000003ed11e <+42>:    je      0x3ed124 <blk_mq_tag_to_rq+48>  # if equal goto ..124
   0x00000000003ed122 <+46>:    br      %r14				# return (with return value == r2)
   0x00000000003ed124 <+48>:    lgr     %r2,%r4				# return value = r4
   0x00000000003ed128 <+52>:    br      %r14				# return
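The disassembly above corresponds to roughly the following C (a stand-in sketch reconstructed from my annotations, with simplified struct layouts; the field offsets and the REQ_FLUSH_SEQ bit value differ from the real kernel):

```c
#include <assert.h>

#define REQ_FLUSH_SEQ (1u << 3)  /* stand-in bit; the real kernel value differs */

struct request_queue;

struct request {
	struct request_queue *q;  /* loaded at <+18>; 0xcccc... in the dump */
	unsigned int cmd_flags;   /* tested at <+24> */
	int tag;
};

struct request_queue {
	struct request *flush_rq; /* loaded at <+32>, where the crash hits */
};

struct blk_mq_tags {
	struct request **rqs;
};

/* Reconstruction of blk_mq_tag_to_rq() as annotated above: look the
 * request up in tags->rqs[], and if it is part of a flush sequence
 * whose tag matches, return the queue's flush_rq instead. */
static struct request *tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
{
	struct request *rq = tags->rqs[tag];       /* <+0>..<+12> */

	if ((rq->cmd_flags & REQ_FLUSH_SEQ) &&     /* <+24> */
	    rq->q->flush_rq->tag == (int)tag)      /* <+32>: faults here */
		return rq->q->flush_rq;

	return rq;
}
```

The fault at <+32> means rq itself was still readable but its contents were the 0xcc pattern, so the rq->q load produced a poisoned pointer.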

Does anyone have an idea?
The request itself is completely filled with 0xcc.


Christian




* Re: blk-mq crash under KVM  in multiqueue block code (with virtio-blk and ext4)
From: Christian Borntraeger @ 2014-09-12 10:56 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin, Jens Axboe, KVM list
  Cc: Virtualization List,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>

On 09/11/2014 12:26 PM, Christian Borntraeger wrote:
> Folks,
> 
> we have seen the following bug with 3.16 as a KVM guest. I suspect the blk-mq rework that happened between 3.15 and 3.16, but it could be something completely different.
> 
> 
> [   65.992022] Unable to handle kernel pointer dereference in virtual kernel address space
> [   65.992187] failing address: ccccccccccccd000 TEID: ccccccccccccd803
> [   65.992363] Fault in home space mode while using kernel ASCE.
> [   65.992365] AS:0000000000a7c007 R3:0000000000000024 
> [   65.993754] Oops: 0038 [#1] SMP 
> [   65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm dm_multipath virtio_net virtio_blk sunrpc
> [   65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
> [   65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
> [   65.996222] task: 0000000002250000 ti: 0000000002258000 task.ti: 0000000002258000
> [   65.996228] Krnl PSW : 0704f00180000000 00000000003ed114 (blk_mq_tag_to_rq+0x20/0x38)
> [   65.997299]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 EA:3
>                Krnl GPRS: 0000000000000040 cccccccccccccccc 0000000001619000 000000000000004e
> [   65.997301]            000000000000004e 0000000000000000 0000000000000001 0000000000a0de18
> [   65.997302]            0000000077ffbe18 0000000077ffbd50 000000006d72d620 000000000000004f
> [   65.997304]            0000000001a99400 0000000000000080 00000000003eddee 0000000077ffbc28
> [   65.997864] Krnl Code: 00000000003ed106: e31020300004        lg      %r1,48(%r2)
>                           00000000003ed10c: 91082044            tm      68(%r2),8
>                          #00000000003ed110: a7840009            brc     8,3ed122
>                          >00000000003ed114: e34016880004        lg      %r4,1672(%r1)
>                           00000000003ed11a: 59304100            c       %r3,256(%r4)
>                           00000000003ed11e: a7840003            brc     8,3ed124
>                           00000000003ed122: 07fe                bcr     15,%r14
>                           00000000003ed124: b9040024            lgr     %r2,%r4
> [   65.998221] Call Trace:
> [   65.998224] ([<0000000000000001>] 0x1)
> [   65.998227]  [<00000000003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
> [   65.998228]  [<00000000003edcd6>] blk_mq_rq_timer+0x96/0x13c
> [   65.999226]  [<000000000013ee60>] call_timer_fn+0x40/0x110
> [   65.999230]  [<000000000013f642>] run_timer_softirq+0x2de/0x3d0
> [   65.999238]  [<0000000000135b70>] __do_softirq+0x124/0x2ac
> [   65.999241]  [<0000000000136000>] irq_exit+0xc4/0xe4
> [   65.999435]  [<000000000010bc08>] do_IRQ+0x64/0x84
> [   66.437533]  [<000000000067ccd8>] ext_skip+0x42/0x46
> [   66.437541]  [<00000000003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
> [   66.437544] ([<00000000003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
> [   66.437547]  [<00000000003eef82>] blk_mq_map_request+0xc2/0x208

I am currently asking myself whether blk_mq_map_request should protect against softirqs here, but I can't say for sure, as I have never looked into that code before.

Christian

> [   66.437549]  [<00000000003ef860>] blk_sq_make_request+0xac/0x350
> [   66.437721]  [<00000000003e2d6c>] generic_make_request+0xc4/0xfc
> [   66.437723]  [<00000000003e2e56>] submit_bio+0xb2/0x1a8
> [   66.438373]  [<000000000031e8aa>] ext4_io_submit+0x52/0x80
> [   66.438375]  [<000000000031ccfa>] ext4_writepages+0x7c6/0xd0c
> [   66.438378]  [<00000000002aea20>] __writeback_single_inode+0x54/0x274
> [   66.438379]  [<00000000002b0134>] writeback_sb_inodes+0x28c/0x4ec
> [   66.438380]  [<00000000002b042e>] __writeback_inodes_wb+0x9a/0xe4
> [   66.438382]  [<00000000002b06a2>] wb_writeback+0x22a/0x358
> [   66.438383]  [<00000000002b0cd0>] bdi_writeback_workfn+0x354/0x538
> [   66.438618]  [<000000000014e3aa>] process_one_work+0x1aa/0x418
> [   66.438621]  [<000000000014ef94>] worker_thread+0x48/0x524
> [   66.438625]  [<00000000001560ca>] kthread+0xee/0x108
> [   66.438627]  [<000000000067c76e>] kernel_thread_starter+0x6/0xc
> [   66.438628]  [<000000000067c768>] kernel_thread_starter+0x0/0xc
> [   66.438629] Last Breaking-Event-Address:
> [   66.438631]  [<00000000003edde8>] blk_mq_timeout_check+0x6c/0xb8
> 
> I looked into the dump; the full function is as follows (annotated by me to match the source code):
> r2 = tags
> r3 = tag (0x4e)
> Dump of assembler code for function blk_mq_tag_to_rq:
>    0x00000000003ed0f4 <+0>:     lg      %r1,96(%r2)			# r1 has now tags->rqs
>    0x00000000003ed0fa <+6>:     sllg    %r2,%r3,3			# r2 has tag*8
>    0x00000000003ed100 <+12>:    lg      %r2,0(%r2,%r1)			# r2 now has rq (=tags->rqs[tag])
>    0x00000000003ed106 <+18>:    lg      %r1,48(%r2)			# r1 now has rq->q
>    0x00000000003ed10c <+24>:    tm      68(%r2),8			# test for rq->cmd_flags & REQ_FLUSH_SEQ
>    0x00000000003ed110 <+28>:    je      0x3ed122 <blk_mq_tag_to_rq+46>  #  if not goto 3ed122
>    0x00000000003ed114 <+32>:    lg      %r4,1672(%r1)			# r4 = rq->q->flush_rq  <-------- CRASHES as rq->q points to cccccccccccc
>    0x00000000003ed11a <+38>:    c       %r3,256(%r4)			# compare tag with rq->q->flush_rq->tag
>    0x00000000003ed11e <+42>:    je      0x3ed124 <blk_mq_tag_to_rq+48>  # if equal goto ..124
>    0x00000000003ed122 <+46>:    br      %r14				# return (with return value == r2)
>    0x00000000003ed124 <+48>:    lgr     %r2,%r4				# return value = r4
>    0x00000000003ed128 <+52>:    br      %r14				# return
> 
> Does anyone have an idea?
> The request itself is completely filled with 0xcc.
> 
> 
> Christian
> 




* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
From: Ming Lei @ 2014-09-12 11:54 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Rusty Russell, Michael S. Tsirkin, Jens Axboe,
	KVM list, Virtualization List,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>

On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> Folks,
>
> we have seen the following bug with 3.16 as a KVM guest. I suspect the blk-mq rework that happened between 3.15 and 3.16, but it could be something completely different.
>

Care to share how you reproduce the issue?

> [   65.992022] Unable to handle kernel pointer dereference in virtual kernel address space
> [   65.992187] failing address: ccccccccccccd000 TEID: ccccccccccccd803
> [   65.992363] Fault in home space mode while using kernel ASCE.
> [   65.992365] AS:0000000000a7c007 R3:0000000000000024
> [   65.993754] Oops: 0038 [#1] SMP
> [   65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm dm_multipath virtio_net virtio_blk sunrpc
> [   65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
> [   65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
> [   65.996222] task: 0000000002250000 ti: 0000000002258000 task.ti: 0000000002258000
> [   65.996228] Krnl PSW : 0704f00180000000 00000000003ed114 (blk_mq_tag_to_rq+0x20/0x38)
> [   65.997299]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 EA:3
>                Krnl GPRS: 0000000000000040 cccccccccccccccc 0000000001619000 000000000000004e
> [   65.997301]            000000000000004e 0000000000000000 0000000000000001 0000000000a0de18
> [   65.997302]            0000000077ffbe18 0000000077ffbd50 000000006d72d620 000000000000004f
> [   65.997304]            0000000001a99400 0000000000000080 00000000003eddee 0000000077ffbc28
> [   65.997864] Krnl Code: 00000000003ed106: e31020300004        lg      %r1,48(%r2)
>                           00000000003ed10c: 91082044            tm      68(%r2),8
>                          #00000000003ed110: a7840009            brc     8,3ed122
>                          >00000000003ed114: e34016880004        lg      %r4,1672(%r1)
>                           00000000003ed11a: 59304100            c       %r3,256(%r4)
>                           00000000003ed11e: a7840003            brc     8,3ed124
>                           00000000003ed122: 07fe                bcr     15,%r14
>                           00000000003ed124: b9040024            lgr     %r2,%r4
> [   65.998221] Call Trace:
> [   65.998224] ([<0000000000000001>] 0x1)
> [   65.998227]  [<00000000003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
> [   65.998228]  [<00000000003edcd6>] blk_mq_rq_timer+0x96/0x13c
> [   65.999226]  [<000000000013ee60>] call_timer_fn+0x40/0x110
> [   65.999230]  [<000000000013f642>] run_timer_softirq+0x2de/0x3d0
> [   65.999238]  [<0000000000135b70>] __do_softirq+0x124/0x2ac
> [   65.999241]  [<0000000000136000>] irq_exit+0xc4/0xe4
> [   65.999435]  [<000000000010bc08>] do_IRQ+0x64/0x84
> [   66.437533]  [<000000000067ccd8>] ext_skip+0x42/0x46
> [   66.437541]  [<00000000003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
> [   66.437544] ([<00000000003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
> [   66.437547]  [<00000000003eef82>] blk_mq_map_request+0xc2/0x208
> [   66.437549]  [<00000000003ef860>] blk_sq_make_request+0xac/0x350
> [   66.437721]  [<00000000003e2d6c>] generic_make_request+0xc4/0xfc
> [   66.437723]  [<00000000003e2e56>] submit_bio+0xb2/0x1a8
> [   66.438373]  [<000000000031e8aa>] ext4_io_submit+0x52/0x80
> [   66.438375]  [<000000000031ccfa>] ext4_writepages+0x7c6/0xd0c
> [   66.438378]  [<00000000002aea20>] __writeback_single_inode+0x54/0x274
> [   66.438379]  [<00000000002b0134>] writeback_sb_inodes+0x28c/0x4ec
> [   66.438380]  [<00000000002b042e>] __writeback_inodes_wb+0x9a/0xe4
> [   66.438382]  [<00000000002b06a2>] wb_writeback+0x22a/0x358
> [   66.438383]  [<00000000002b0cd0>] bdi_writeback_workfn+0x354/0x538
> [   66.438618]  [<000000000014e3aa>] process_one_work+0x1aa/0x418
> [   66.438621]  [<000000000014ef94>] worker_thread+0x48/0x524
> [   66.438625]  [<00000000001560ca>] kthread+0xee/0x108
> [   66.438627]  [<000000000067c76e>] kernel_thread_starter+0x6/0xc
> [   66.438628]  [<000000000067c768>] kernel_thread_starter+0x0/0xc
> [   66.438629] Last Breaking-Event-Address:
> [   66.438631]  [<00000000003edde8>] blk_mq_timeout_check+0x6c/0xb8
>
> I looked into the dump; the full function is as follows (annotated by me to match the source code):
> r2 = tags
> r3 = tag (0x4e)
> Dump of assembler code for function blk_mq_tag_to_rq:
>    0x00000000003ed0f4 <+0>:     lg      %r1,96(%r2)                     # r1 has now tags->rqs
>    0x00000000003ed0fa <+6>:     sllg    %r2,%r3,3                       # r2 has tag*8
>    0x00000000003ed100 <+12>:    lg      %r2,0(%r2,%r1)                  # r2 now has rq (=tags->rqs[tag])
>    0x00000000003ed106 <+18>:    lg      %r1,48(%r2)                     # r1 now has rq->q
>    0x00000000003ed10c <+24>:    tm      68(%r2),8                       # test for rq->cmd_flags & REQ_FLUSH_SEQ
>    0x00000000003ed110 <+28>:    je      0x3ed122 <blk_mq_tag_to_rq+46>  #  if not goto 3ed122
>    0x00000000003ed114 <+32>:    lg      %r4,1672(%r1)                   # r4 = rq->q->flush_rq  <-------- CRASHES as rq->q points to cccccccccccc
>    0x00000000003ed11a <+38>:    c       %r3,256(%r4)                    # compare tag with rq->q->flush_rq->tag
>    0x00000000003ed11e <+42>:    je      0x3ed124 <blk_mq_tag_to_rq+48>  # if equal goto ..124
>    0x00000000003ed122 <+46>:    br      %r14                            # return (with return value == r2)
>    0x00000000003ed124 <+48>:    lgr     %r2,%r4                         # return value = r4
>    0x00000000003ed128 <+52>:    br      %r14                            # return
>
> Does anyone have an idea?
> The request itself is completely filled with 0xcc.

That is very weird: the 'rq' is taken from hctx->tags, so rq should be valid, and rq->q shouldn't have changed even in the case of a double free or a double allocation.
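For context, the invariant this reply relies on can be sketched as follows (stand-in types and names, not the actual kernel code): the tag-to-request array is filled in once when the tag set is initialized and is only read afterwards, so the timeout-side lookup should never observe freed memory while the queue exists.

```c
#include <assert.h>
#include <stdlib.h>

struct request { int tag; };

struct blk_mq_tags {
	unsigned int nr_tags;
	struct request **rqs;  /* tag -> preallocated request, fixed at init */
};

/* Populate the tag map once, at queue-initialization time. */
static struct blk_mq_tags *tags_init(unsigned int nr_tags)
{
	struct blk_mq_tags *tags = malloc(sizeof(*tags));
	unsigned int i;

	tags->nr_tags = nr_tags;
	tags->rqs = calloc(nr_tags, sizeof(*tags->rqs));
	for (i = 0; i < nr_tags; i++) {
		tags->rqs[i] = malloc(sizeof(struct request));
		tags->rqs[i]->tag = (int)i;
	}
	return tags;
}

/* Timeout-side lookup: a plain array read. Entries are never freed
 * while the queue is alive, which is why a 0xcc-filled entry suggests
 * the backing memory was freed (or overwritten) out from under us. */
static struct request *tags_lookup(struct blk_mq_tags *tags, unsigned int tag)
{
	return tags->rqs[tag];
}
```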

> I am currently asking myself whether blk_mq_map_request should protect against softirqs here, but I can't say for sure, as I have never looked into that code before.

No, it doesn't need that protection.

Thanks,



* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-12 11:54   ` Ming Lei
@ 2014-09-12 20:09     ` Christian Borntraeger
  -1 siblings, 0 replies; 31+ messages in thread
From: Christian Borntraeger @ 2014-09-12 20:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: rusty Russell, Michael S. Tsirkin, Jens Axboe, KVM list,
	Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

On 09/12/2014 01:54 PM, Ming Lei wrote:
> On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>> Folks,
>>
>> we have seen the following bug with 3.16 as a KVM guest. I suspect the blk-mq rework that happened between 3.15 and 3.16, but it could be something completely different.
>>
> 
> Care to share how you reproduce the issue?

Host with 16 GB RAM and 32 GB swap. 15 guests, each with 2 GB RAM (and a varying number of CPUs). All do heavy file I/O.
It did not happen with 3.15/3.15 in guest/host and does happen with 3.16/3.16. So our next step is to check
3.15/3.16 and 3.16/3.15 to identify whether it's host memory management or the guest block layer.

Christian

> 
>> [   65.992022] Unable to handle kernel pointer dereference in virtual kernel address space
>> [   65.992187] failing address: ccccccccccccd000 TEID: ccccccccccccd803
>> [   65.992363] Fault in home space mode while using kernel ASCE.
>> [   65.992365] AS:0000000000a7c007 R3:0000000000000024
>> [   65.993754] Oops: 0038 [#1] SMP
>> [   65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm dm_multipath virtio_net virtio_blk sunrpc
>> [   65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
>> [   65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
>> [   65.996222] task: 0000000002250000 ti: 0000000002258000 task.ti: 0000000002258000
>> [   65.996228] Krnl PSW : 0704f00180000000 00000000003ed114 (blk_mq_tag_to_rq+0x20/0x38)
>> [   65.997299]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 EA:3
>>                Krnl GPRS: 0000000000000040 cccccccccccccccc 0000000001619000 000000000000004e
>> [   65.997301]            000000000000004e 0000000000000000 0000000000000001 0000000000a0de18
>> [   65.997302]            0000000077ffbe18 0000000077ffbd50 000000006d72d620 000000000000004f
>> [   65.997304]            0000000001a99400 0000000000000080 00000000003eddee 0000000077ffbc28
>> [   65.997864] Krnl Code: 00000000003ed106: e31020300004        lg      %r1,48(%r2)
>>                           00000000003ed10c: 91082044            tm      68(%r2),8
>>                          #00000000003ed110: a7840009            brc     8,3ed122
>>                          >00000000003ed114: e34016880004        lg      %r4,1672(%r1)
>>                           00000000003ed11a: 59304100            c       %r3,256(%r4)
>>                           00000000003ed11e: a7840003            brc     8,3ed124
>>                           00000000003ed122: 07fe                bcr     15,%r14
>>                           00000000003ed124: b9040024            lgr     %r2,%r4
>> [   65.998221] Call Trace:
>> [   65.998224] ([<0000000000000001>] 0x1)
>> [   65.998227]  [<00000000003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
>> [   65.998228]  [<00000000003edcd6>] blk_mq_rq_timer+0x96/0x13c
>> [   65.999226]  [<000000000013ee60>] call_timer_fn+0x40/0x110
>> [   65.999230]  [<000000000013f642>] run_timer_softirq+0x2de/0x3d0
>> [   65.999238]  [<0000000000135b70>] __do_softirq+0x124/0x2ac
>> [   65.999241]  [<0000000000136000>] irq_exit+0xc4/0xe4
>> [   65.999435]  [<000000000010bc08>] do_IRQ+0x64/0x84
>> [   66.437533]  [<000000000067ccd8>] ext_skip+0x42/0x46
>> [   66.437541]  [<00000000003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
>> [   66.437544] ([<00000000003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
>> [   66.437547]  [<00000000003eef82>] blk_mq_map_request+0xc2/0x208
>> [   66.437549]  [<00000000003ef860>] blk_sq_make_request+0xac/0x350
>> [   66.437721]  [<00000000003e2d6c>] generic_make_request+0xc4/0xfc
>> [   66.437723]  [<00000000003e2e56>] submit_bio+0xb2/0x1a8
>> [   66.438373]  [<000000000031e8aa>] ext4_io_submit+0x52/0x80
>> [   66.438375]  [<000000000031ccfa>] ext4_writepages+0x7c6/0xd0c
>> [   66.438378]  [<00000000002aea20>] __writeback_single_inode+0x54/0x274
>> [   66.438379]  [<00000000002b0134>] writeback_sb_inodes+0x28c/0x4ec
>> [   66.438380]  [<00000000002b042e>] __writeback_inodes_wb+0x9a/0xe4
>> [   66.438382]  [<00000000002b06a2>] wb_writeback+0x22a/0x358
>> [   66.438383]  [<00000000002b0cd0>] bdi_writeback_workfn+0x354/0x538
>> [   66.438618]  [<000000000014e3aa>] process_one_work+0x1aa/0x418
>> [   66.438621]  [<000000000014ef94>] worker_thread+0x48/0x524
>> [   66.438625]  [<00000000001560ca>] kthread+0xee/0x108
>> [   66.438627]  [<000000000067c76e>] kernel_thread_starter+0x6/0xc
>> [   66.438628]  [<000000000067c768>] kernel_thread_starter+0x0/0xc
>> [   66.438629] Last Breaking-Event-Address:
>> [   66.438631]  [<00000000003edde8>] blk_mq_timeout_check+0x6c/0xb8
>>
>> I looked into the dump, and the full function is  (annotated by me to match the source code)
>> r2= tags
>> r3= tag (4e)
>> Dump of assembler code for function blk_mq_tag_to_rq:
>>    0x00000000003ed0f4 <+0>:     lg      %r1,96(%r2)                     # r1 has now tags->rqs
>>    0x00000000003ed0fa <+6>:     sllg    %r2,%r3,3                       # r2 has tag*8
>>    0x00000000003ed100 <+12>:    lg      %r2,0(%r2,%r1)                  # r2 now has rq (=tags->rqs[tag])
>>    0x00000000003ed106 <+18>:    lg      %r1,48(%r2)                     # r1 now has rq->q
>>    0x00000000003ed10c <+24>:    tm      68(%r2),8                       # test for rq->cmd_flags & REQ_FLUSH_SEQ
>>    0x00000000003ed110 <+28>:    je      0x3ed122 <blk_mq_tag_to_rq+46>  #  if not goto 3ed122
>>    0x00000000003ed114 <+32>:    lg      %r4,1672(%r1)                   # r4 = rq->q->flush_rq  <-------- CRASHES as rq->q points to cccccccccccc
>>    0x00000000003ed11a <+38>:    c       %r3,256(%r4)                    # compare tag with rq->q->flush_rq->tag
>>    0x00000000003ed11e <+42>:    je      0x3ed124 <blk_mq_tag_to_rq+48>  # if equal goto ..124
>>    0x00000000003ed122 <+46>:    br      %r14                            # return (with return value == r2)
>>    0x00000000003ed124 <+48>:    lgr     %r2,%r4                         # return value = r4
>>    0x00000000003ed128 <+52>:    br      %r14                            # return
>>
>> Does anyone have an idea?
>> The request itself is completely filled with cc
> 
> That is very weird. The 'rq' is taken from hctx->tags, so it should be
> valid, and rq->q shouldn't have changed even in the case of a
> double free or a double allocation.
> 
>> I am currently asking myself if blk_mq_map_request should protect against softirq here, but I can't say for sure, as I have never looked into that code before.
> 
> No, it doesn't need that protection.
> 
> Thanks,
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread


* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-12 20:09     ` Christian Borntraeger
@ 2014-09-17  7:59       ` Christian Borntraeger
  -1 siblings, 0 replies; 31+ messages in thread
From: Christian Borntraeger @ 2014-09-17  7:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: rusty Russell, Michael S. Tsirkin, Jens Axboe, KVM list,
	Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List,
	David Hildenbrand

On 09/12/2014 10:09 PM, Christian Borntraeger wrote:
> On 09/12/2014 01:54 PM, Ming Lei wrote:
>> On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger
>> <borntraeger@de.ibm.com> wrote:
>>> Folks,
>>>
>>> we have seen the following bug with 3.16 as a KVM guest. I suspect the blk-mq rework that happened between 3.15 and 3.16, but it could be something completely different.
>>>
>>
>> Care to share how you reproduce the issue?
> 
> Host with 16 GB RAM and 32 GB swap. 15 guests, each with 2 GB RAM (and a varying number of CPUs). All do heavy file I/O.
> It did not happen with 3.15/3.15 in guest/host and does happen with 3.16/3.16. So our next step is to check
> 3.15/3.16 and 3.16/3.15 to identify whether it's host memory management or the guest block layer.

The crashes happen pretty randomly, but when they do it seems to be the same trace as below. This makes memory corruption by the host VM less likely, and something wrong in blk-mq more likely, I guess.


> 
> Christian
> 
>>
>>> [   65.992022] Unable to handle kernel pointer dereference in virtual kernel address space
>>> [   65.992187] failing address: ccccccccccccd000 TEID: ccccccccccccd803
>>> [   65.992363] Fault in home space mode while using kernel ASCE.
>>> [   65.992365] AS:0000000000a7c007 R3:0000000000000024
>>> [   65.993754] Oops: 0038 [#1] SMP
>>> [   65.993923] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_balloon vhost_net vhost macvtap macvlan kvm dm_multipath virtio_net virtio_blk sunrpc
>>> [   65.994274] CPU: 0 PID: 44 Comm: kworker/u6:2 Not tainted 3.16.0-20140814.0.c66c84c.fc18-s390xfrob #1
>>> [   65.996043] Workqueue: writeback bdi_writeback_workfn (flush-251:32)
>>> [   65.996222] task: 0000000002250000 ti: 0000000002258000 task.ti: 0000000002258000
>>> [   65.996228] Krnl PSW : 0704f00180000000 00000000003ed114 (blk_mq_tag_to_rq+0x20/0x38)
>>> [   65.997299]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 EA:3
>>>                Krnl GPRS: 0000000000000040 cccccccccccccccc 0000000001619000 000000000000004e
>>> [   65.997301]            000000000000004e 0000000000000000 0000000000000001 0000000000a0de18
>>> [   65.997302]            0000000077ffbe18 0000000077ffbd50 000000006d72d620 000000000000004f
>>> [   65.997304]            0000000001a99400 0000000000000080 00000000003eddee 0000000077ffbc28
>>> [   65.997864] Krnl Code: 00000000003ed106: e31020300004        lg      %r1,48(%r2)
>>>                           00000000003ed10c: 91082044            tm      68(%r2),8
>>>                          #00000000003ed110: a7840009            brc     8,3ed122
>>>                          >00000000003ed114: e34016880004        lg      %r4,1672(%r1)
>>>                           00000000003ed11a: 59304100            c       %r3,256(%r4)
>>>                           00000000003ed11e: a7840003            brc     8,3ed124
>>>                           00000000003ed122: 07fe                bcr     15,%r14
>>>                           00000000003ed124: b9040024            lgr     %r2,%r4
>>> [   65.998221] Call Trace:
>>> [   65.998224] ([<0000000000000001>] 0x1)
>>> [   65.998227]  [<00000000003f17b6>] blk_mq_tag_busy_iter+0x7a/0xc4
>>> [   65.998228]  [<00000000003edcd6>] blk_mq_rq_timer+0x96/0x13c
>>> [   65.999226]  [<000000000013ee60>] call_timer_fn+0x40/0x110
>>> [   65.999230]  [<000000000013f642>] run_timer_softirq+0x2de/0x3d0
>>> [   65.999238]  [<0000000000135b70>] __do_softirq+0x124/0x2ac
>>> [   65.999241]  [<0000000000136000>] irq_exit+0xc4/0xe4
>>> [   65.999435]  [<000000000010bc08>] do_IRQ+0x64/0x84
>>> [   66.437533]  [<000000000067ccd8>] ext_skip+0x42/0x46
>>> [   66.437541]  [<00000000003ed7b4>] __blk_mq_alloc_request+0x58/0x1e8
>>> [   66.437544] ([<00000000003ed788>] __blk_mq_alloc_request+0x2c/0x1e8)
>>> [   66.437547]  [<00000000003eef82>] blk_mq_map_request+0xc2/0x208
>>> [   66.437549]  [<00000000003ef860>] blk_sq_make_request+0xac/0x350
>>> [   66.437721]  [<00000000003e2d6c>] generic_make_request+0xc4/0xfc
>>> [   66.437723]  [<00000000003e2e56>] submit_bio+0xb2/0x1a8
>>> [   66.438373]  [<000000000031e8aa>] ext4_io_submit+0x52/0x80
>>> [   66.438375]  [<000000000031ccfa>] ext4_writepages+0x7c6/0xd0c
>>> [   66.438378]  [<00000000002aea20>] __writeback_single_inode+0x54/0x274
>>> [   66.438379]  [<00000000002b0134>] writeback_sb_inodes+0x28c/0x4ec
>>> [   66.438380]  [<00000000002b042e>] __writeback_inodes_wb+0x9a/0xe4
>>> [   66.438382]  [<00000000002b06a2>] wb_writeback+0x22a/0x358
>>> [   66.438383]  [<00000000002b0cd0>] bdi_writeback_workfn+0x354/0x538
>>> [   66.438618]  [<000000000014e3aa>] process_one_work+0x1aa/0x418
>>> [   66.438621]  [<000000000014ef94>] worker_thread+0x48/0x524
>>> [   66.438625]  [<00000000001560ca>] kthread+0xee/0x108
>>> [   66.438627]  [<000000000067c76e>] kernel_thread_starter+0x6/0xc
>>> [   66.438628]  [<000000000067c768>] kernel_thread_starter+0x0/0xc
>>> [   66.438629] Last Breaking-Event-Address:
>>> [   66.438631]  [<00000000003edde8>] blk_mq_timeout_check+0x6c/0xb8
>>>
>>> I looked into the dump, and the full function is  (annotated by me to match the source code)
>>> r2= tags
>>> r3= tag (4e)
>>> Dump of assembler code for function blk_mq_tag_to_rq:
>>>    0x00000000003ed0f4 <+0>:     lg      %r1,96(%r2)                     # r1 has now tags->rqs
>>>    0x00000000003ed0fa <+6>:     sllg    %r2,%r3,3                       # r2 has tag*8
>>>    0x00000000003ed100 <+12>:    lg      %r2,0(%r2,%r1)                  # r2 now has rq (=tags->rqs[tag])
>>>    0x00000000003ed106 <+18>:    lg      %r1,48(%r2)                     # r1 now has rq->q
>>>    0x00000000003ed10c <+24>:    tm      68(%r2),8                       # test for rq->cmd_flags & REQ_FLUSH_SEQ
>>>    0x00000000003ed110 <+28>:    je      0x3ed122 <blk_mq_tag_to_rq+46>  #  if not goto 3ed122
>>>    0x00000000003ed114 <+32>:    lg      %r4,1672(%r1)                   # r4 = rq->q->flush_rq  <-------- CRASHES as rq->q points to cccccccccccc
>>>    0x00000000003ed11a <+38>:    c       %r3,256(%r4)                    # compare tag with rq->q->flush_rq->tag
>>>    0x00000000003ed11e <+42>:    je      0x3ed124 <blk_mq_tag_to_rq+48>  # if equal goto ..124
>>>    0x00000000003ed122 <+46>:    br      %r14                            # return (with return value == r2)
>>>    0x00000000003ed124 <+48>:    lgr     %r2,%r4                         # return value = r4
>>>    0x00000000003ed128 <+52>:    br      %r14                            # return
>>>
>>> Does anyone have an idea?
>>> The request itself is completely filled with cc
>>
>> That is very weird. The 'rq' is taken from hctx->tags, so it should be
>> valid, and rq->q shouldn't have changed even in the case of a
>> double free or a double allocation.
>>
>>> I am currently asking myself if blk_mq_map_request should protect against softirq here, but I can't say for sure, as I have never looked into that code before.
>>
>> No, it doesn't need that protection.
>>
>> Thanks,
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

>>> The request itself is completely filled with cc
>>
>> That is very weird, the 'rq' is got from hctx->tags,  and rq should be
>> valid, and rq->q shouldn't have been changed even though it was
>> double free or double allocation.
>>
>>> I am currently asking myself if blk_mq_map_request should protect against softirq here but I cant say for sure,as I have never looked into that code before.
>>
>> No, it needn't the protection.
>>
>> Thanks,
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17  7:59       ` Christian Borntraeger
@ 2014-09-17 10:01         ` Ming Lei
  -1 siblings, 0 replies; 31+ messages in thread
From: Ming Lei @ 2014-09-17 10:01 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: rusty Russell, Michael S. Tsirkin, Jens Axboe, KVM list,
	Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List,
	David Hildenbrand

On Wed, Sep 17, 2014 at 3:59 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> On 09/12/2014 10:09 PM, Christian Borntraeger wrote:
>> On 09/12/2014 01:54 PM, Ming Lei wrote:
>>> On Thu, Sep 11, 2014 at 6:26 PM, Christian Borntraeger
>>> <borntraeger@de.ibm.com> wrote:
>>>> Folks,
>>>>
>>>> we have seen the following bug with 3.16 as a KVM guest. It suspect the blk-mq rework that happened between 3.15 and 3.16, but it can be something completely different.
>>>>
>>>
>>> Care to share how you reproduce the issue?
>>
>> Host with 16GB RAM 32GB swap. 15 guest all with 2 GB RAM (and varying amount of CPUs). All do heavy file I/O.
>> It did not happen with 3.15/3.15 in guest/host and does happen with 3.16/3.16. So our next step is to check
>> 3.15/3.16 and 3.16/3.15 to identify if its host memory mgmt or guest block layer.
>
> The crashes happen pretty randomly, but when they happen it seems that it's the same trace as below. This makes memory corruption by the host VM less likely and something wrong in blk-mq more likely, I guess
>

Maybe you can try these patches, because atomic ops
can be reordered on S390:

http://marc.info/?l=linux-kernel&m=141094730828533&w=2
http://marc.info/?l=linux-kernel&m=141094730828534&w=2
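The concern can be sketched in user-space C11 (illustrative only: the single-slot layout and all names here are invented for the example, and are not blk-mq code). The point is that the store which marks a tag busy must not become visible before the stores that initialize the request it publishes:

```c
#include <assert.h>
#include <stdatomic.h>

/* Invented single-slot model of "publish a request, then mark its tag
 * busy"; NOT actual blk-mq code. */
struct pub_request {
    int data;
};

static struct pub_request slot;
static atomic_int slot_busy; /* 0 = free, 1 = busy (request published) */

/* Producer: fill in the request first, then set the busy flag with
 * release semantics so the two stores cannot be reordered. */
static void publish(int data)
{
    slot.data = data;
    atomic_store_explicit(&slot_busy, 1, memory_order_release);
}

/* Consumer (think: timeout scan): the acquire load pairs with the
 * release store above, so seeing busy == 1 guarantees slot.data is
 * visible as well. */
static int try_read(int *out)
{
    if (!atomic_load_explicit(&slot_busy, memory_order_acquire))
        return 0;
    *out = slot.data;
    return 1;
}
```

Without the release/acquire pairing, an architecture that reorders plain stores around atomic ones could let the scan observe busy == 1 while slot.data still holds stale garbage, which is roughly the shape of failure the linked patches are concerned with.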

Thanks
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17  7:59       ` Christian Borntraeger
@ 2014-09-17 12:00         ` David Hildenbrand
  -1 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2014-09-17 12:00 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Ming Lei, rusty Russell, Michael S. Tsirkin, Jens Axboe,
	KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

> >>> Does anyone have an idea?
> >>> The request itself is completely filled with cc
> >>
> >> That is very weird, the 'rq' is got from hctx->tags,  and rq should be
> >> valid, and rq->q shouldn't have been changed even though it was
> >> double free or double allocation.
> >>
> >>> I am currently asking myself if blk_mq_map_request should protect against softirq here but I cant say for sure,as I have never looked into that code before.
> >>
> >> No, it needn't the protection.
> >>
> >> Thanks,
> >>
> > 
> 

Digging through the code, I think I found a possible cause:

tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
blk-mq.c:blk_mq_init_rq_map()).

When a request is created:

1. __blk_mq_alloc_request() gets a free tag (thus e.g. removing it from
bitmap_tags)

2. __blk_mq_alloc_request() initializes it via blk_mq_rq_ctx_init(). The struct
is filled in and rq->q is set.


When blk_mq_hw_ctx_check_timeout() is called:

1. blk_mq_tag_busy_iter() is used to call blk_mq_timeout_check() on all busy
tags.

2. This is done by collecting all free tags using bt_for_each_free() and
handing them to blk_mq_timeout_check(). This uses bitmap_tags.

3. blk_mq_timeout_check() calls blk_mq_tag_to_rq() to get the rq.


Could we have a race between

- getting the tag (turning it busy) and initializing it and
- detecting a tag to be busy and trying to access it?


I haven't looked at the details yet. If so, we might either do some locking
(if there is existing infrastructure), or somehow mark a request as not being
initialized prior to accessing the data.
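The "mark a request as not being initialized" idea can be sketched in plain C (a user-space model with invented names, not kernel code): reset sentinel fields before an entry becomes reachable, and have the lookup side treat those sentinels as "do not touch yet".

```c
#include <assert.h>
#include <stddef.h>

/* Invented user-space model of a tag-indexed request table; the field
 * names echo struct request, but this is NOT kernel code. */
struct mock_queue {
    int id;
};

struct mock_request {
    struct mock_queue *q; /* NULL until initialization has run */
    int tag;              /* -1 until initialization has run */
};

/* Allocation-time reset: give freshly allocated (non-zeroed) entries a
 * well-defined "uninitialized" state instead of garbage. */
static void mock_rq_reset(struct mock_request *rq)
{
    rq->q = NULL;
    rq->tag = -1;
}

/* Lookup-side guard, e.g. for a timeout scan: refuse to dereference
 * rq->q while the sentinels are still in place. */
static int mock_rq_initialized(const struct mock_request *rq)
{
    return rq->q != NULL && rq->tag != -1;
}
```

A scan that checks mock_rq_initialized() before touching rq->q would return early instead of dereferencing the 0xcccc... garbage seen in the oops above.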


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17 12:00         ` David Hildenbrand
  (?)
@ 2014-09-17 13:52           ` Ming Lei
  -1 siblings, 0 replies; 31+ messages in thread
From: Ming Lei @ 2014-09-17 13:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christian Borntraeger, rusty Russell, Michael S. Tsirkin,
	Jens Axboe, KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

On Wed, 17 Sep 2014 14:00:34 +0200
David Hildenbrand <dahi@linux.vnet.ibm.com> wrote:

> > >>> Does anyone have an idea?
> > >>> The request itself is completely filled with cc
> > >>
> > >> That is very weird, the 'rq' is got from hctx->tags,  and rq should be
> > >> valid, and rq->q shouldn't have been changed even though it was
> > >> double free or double allocation.
> > >>
> > >>> I am currently asking myself if blk_mq_map_request should protect against softirq here but I cant say for sure,as I have never looked into that code before.
> > >>
> > >> No, it needn't the protection.
> > >>
> > >> Thanks,
> > >>
> > > 
> > 
> 
> Digging through the code, I think I found a possible cause:
> 
> tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
> blk-mq.c:blk_mq_init_rq_map()).

Yes, it may cause a problem when the request is allocated for the first time:
the timeout handler may come just after the allocation and before its
initialization, and then an oops is triggered because of garbage data in the request.

--
From ffd0824b7b686074c2d5d70bc4e6bba3ba56a30c Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Wed, 17 Sep 2014 21:00:34 +0800
Subject: [PATCH] blk-mq: initialize request before the 1st allocation

Otherwise the request can be accessed from timeout handler
just after its 1st allocation from tag pool and before
initialization in blk_mq_rq_ctx_init(), so cause oops since
the request is filled up with garbage data.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-mq.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4aac826..d24673f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -514,6 +514,10 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
 	struct request *rq = tags->rqs[tag];
 
+	/* uninitialized request */
+	if (!rq->q || rq->tag == -1)
+		return rq;
+
 	if (!is_flush_request(rq, tag))
 		return rq;
 
@@ -1401,6 +1405,12 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
 		left -= to_do * rq_size;
 		for (j = 0; j < to_do; j++) {
 			tags->rqs[i] = p;
+
+			/* Avoiding early access from timeout handler */
+			tags->rqs[i]->tag = -1;
+			tags->rqs[i]->q = NULL;
+			tags->rqs[i]->cmd_flags = 0;
+
 			if (set->ops->init_request) {
 				if (set->ops->init_request(set->driver_data,
 						tags->rqs[i], hctx_idx, i,
-- 
1.7.9.5





-- 
Ming Lei

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17 13:52           ` Ming Lei
@ 2014-09-17 14:11             ` David Hildenbrand
  -1 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2014-09-17 14:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christian Borntraeger, rusty Russell, Michael S. Tsirkin,
	Jens Axboe, KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

> On Wed, 17 Sep 2014 14:00:34 +0200
> David Hildenbrand <dahi@linux.vnet.ibm.com> wrote:
> 
> > > >>> Does anyone have an idea?
> > > >>> The request itself is completely filled with cc
> > > >>
> > > >> That is very weird, the 'rq' is got from hctx->tags,  and rq should be
> > > >> valid, and rq->q shouldn't have been changed even though it was
> > > >> double free or double allocation.
> > > >>
> > > >>> I am currently asking myself if blk_mq_map_request should protect against softirq here but I cant say for sure,as I have never looked into that code before.
> > > >>
> > > >> No, it needn't the protection.
> > > >>
> > > >> Thanks,
> > > >>
> > > > 
> > > 
> > 
> > Digging through the code, I think I found a possible cause:
> > 
> > tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
> > blk-mq.c:blk_mq_init_rq_map()).
> 
> Yes, it may cause problem when the request is allocated at the 1st time,
> and timeout handler may comes just after the allocation and before its
> initialization, then oops triggered because of garbage data in the request. 
> 
> --
> From ffd0824b7b686074c2d5d70bc4e6bba3ba56a30c Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Wed, 17 Sep 2014 21:00:34 +0800
> Subject: [PATCH] blk-mq: initialize request before the 1st allocation
> 
> Otherwise the request can be accessed from timeout handler
> just after its 1st allocation from tag pool and before
> initialization in blk_mq_rq_ctx_init(), so cause oops since
> the request is filled up with garbage data.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  block/blk-mq.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4aac826..d24673f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -514,6 +514,10 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
>  	struct request *rq = tags->rqs[tag];
>  
> +	/* uninitialized request */
> +	if (!rq->q || rq->tag == -1)
> +		return rq;
> +
>  	if (!is_flush_request(rq, tag))
>  		return rq;
>  
> @@ -1401,6 +1405,12 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
>  		left -= to_do * rq_size;
>  		for (j = 0; j < to_do; j++) {
>  			tags->rqs[i] = p;
> +
> +			/* Avoiding early access from timeout handler */
> +			tags->rqs[i]->tag = -1;
> +			tags->rqs[i]->q = NULL;
> +			tags->rqs[i]->cmd_flags = 0;

I was playing with a simple patch that just sets cmd_flags and action_flags to
0. That should already be sufficient to keep blk_mq_tag_to_rq() and its caller
from doing the wrong thing.

Will see the result tomorrow after testing.

> +
>  			if (set->ops->init_request) {
>  				if (set->ops->init_request(set->driver_data,
>  						tags->rqs[i], hctx_idx, i,


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
@ 2014-09-17 14:11             ` David Hildenbrand
  0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2014-09-17 14:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christian Borntraeger, rusty Russell, Michael S. Tsirkin,
	Jens Axboe, KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

> On Wed, 17 Sep 2014 14:00:34 +0200
> David Hildenbrand <dahi@linux.vnet.ibm.com> wrote:
> 
> > > >>> Does anyone have an idea?
> > > >>> The request itself is completely filled with cc
> > > >>
> > > >> That is very weird, the 'rq' is got from hctx->tags,  and rq should be
> > > >> valid, and rq->q shouldn't have been changed even though it was
> > > >> double free or double allocation.
> > > >>
> > > >>> I am currently asking myself if blk_mq_map_request should protect against softirq here but I cant say for sure,as I have never looked into that code before.
> > > >>
> > > >> No, it needn't the protection.
> > > >>
> > > >> Thanks,
> > > >>
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > 
> > 
> > Digging through the code, I think I found a possible cause:
> > 
> > tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
> > blk-mq.c:blk_mq_init_rq_map()).
> 
> Yes, it may cause problem when the request is allocated at the 1st time,
> and timeout handler may comes just after the allocation and before its
> initialization, then oops triggered because of garbage data in the request. 
> 
> --
> From ffd0824b7b686074c2d5d70bc4e6bba3ba56a30c Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Wed, 17 Sep 2014 21:00:34 +0800
> Subject: [PATCH] blk-mq: initialize request before the 1st allocation
> 
> Otherwise the request can be accessed from timeout handler
> just after its 1st allocation from tag pool and before
> initialization in blk_mq_rq_ctx_init(), so cause oops since
> the request is filled up with garbage data.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  block/blk-mq.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4aac826..d24673f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -514,6 +514,10 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
>  	struct request *rq = tags->rqs[tag];
>  
> +	/* uninitialized request */
> +	if (!rq->q || rq->tag == -1)
> +		return rq;
> +
>  	if (!is_flush_request(rq, tag))
>  		return rq;
>  
> @@ -1401,6 +1405,12 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
>  		left -= to_do * rq_size;
>  		for (j = 0; j < to_do; j++) {
>  			tags->rqs[i] = p;
> +
> +			/* Avoiding early access from timeout handler */
> +			tags->rqs[i]->tag = -1;
> +			tags->rqs[i]->q = NULL;
> +			tags->rqs[i]->cmd_flags = 0;

I was playing with a simple patch that just sets cmd_flags and action_flags to
0. That should already be sufficient to hinder blk_mq_tag_to_rq and the calling
method to do the wrong thing.

Will see the result tomorrow after testing.

> +
>  			if (set->ops->init_request) {
>  				if (set->ops->init_request(set->driver_data,
>  						tags->rqs[i], hctx_idx, i,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17 13:52           ` Ming Lei
  (?)
  (?)
@ 2014-09-17 14:11           ` David Hildenbrand
  -1 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2014-09-17 14:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, KVM list, Michael S. Tsirkin,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List,
	Virtualization List, Christian Borntraeger

> On Wed, 17 Sep 2014 14:00:34 +0200
> David Hildenbrand <dahi@linux.vnet.ibm.com> wrote:
> 
> > > >>> Does anyone have an idea?
> > > >>> The request itself is completely filled with cc
> > > >>
> > > >> That is very weird, the 'rq' is got from hctx->tags,  and rq should be
> > > >> valid, and rq->q shouldn't have been changed even though it was
> > > >> double free or double allocation.
> > > >>
> > > >>> I am currently asking myself if blk_mq_map_request should protect against softirq here but I cant say for sure,as I have never looked into that code before.
> > > >>
> > > >> No, it needn't the protection.
> > > >>
> > > >> Thanks,
> > > >>
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > 
> > 
> > Digging through the code, I think I found a possible cause:
> > 
> > tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
> > blk-mq.c:blk_mq_init_rq_map()).
> 
> Yes, it may cause problem when the request is allocated at the 1st time,
> and timeout handler may comes just after the allocation and before its
> initialization, then oops triggered because of garbage data in the request. 
> 
> --
> From ffd0824b7b686074c2d5d70bc4e6bba3ba56a30c Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Wed, 17 Sep 2014 21:00:34 +0800
> Subject: [PATCH] blk-mq: initialize request before the 1st allocation
> 
> Otherwise the request can be accessed from timeout handler
> just after its 1st allocation from tag pool and before
> initialization in blk_mq_rq_ctx_init(), so cause oops since
> the request is filled up with garbage data.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  block/blk-mq.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4aac826..d24673f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -514,6 +514,10 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>  {
>  	struct request *rq = tags->rqs[tag];
>  
> +	/* uninitialized request */
> +	if (!rq->q || rq->tag == -1)
> +		return rq;
> +
>  	if (!is_flush_request(rq, tag))
>  		return rq;
>  
> @@ -1401,6 +1405,12 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
>  		left -= to_do * rq_size;
>  		for (j = 0; j < to_do; j++) {
>  			tags->rqs[i] = p;
> +
> +			/* Avoiding early access from timeout handler */
> +			tags->rqs[i]->tag = -1;
> +			tags->rqs[i]->q = NULL;
> +			tags->rqs[i]->cmd_flags = 0;

I was playing with a simple patch that just sets cmd_flags and action_flags to
0. That should already be sufficient to keep blk_mq_tag_to_rq and the calling
method from doing the wrong thing.

Will see the result tomorrow after testing.

> +
>  			if (set->ops->init_request) {
>  				if (set->ops->init_request(set->driver_data,
>  						tags->rqs[i], hctx_idx, i,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17 13:52           ` Ming Lei
@ 2014-09-17 14:22             ` Jens Axboe
  -1 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2014-09-17 14:22 UTC (permalink / raw)
  To: Ming Lei, David Hildenbrand
  Cc: Christian Borntraeger, rusty Russell, Michael S. Tsirkin,
	KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

On 2014-09-17 07:52, Ming Lei wrote:
> On Wed, 17 Sep 2014 14:00:34 +0200
> David Hildenbrand <dahi@linux.vnet.ibm.com> wrote:
>
>>>>>> Does anyone have an idea?
>>>>>> The request itself is completely filled with cc
>>>>>
>>>>> That is very weird; 'rq' is taken from hctx->tags, so rq should be
>>>>> valid, and rq->q shouldn't have changed even if it was
>>>>> double freed or double allocated.
>>>>>
>>>>>> I am currently asking myself if blk_mq_map_request should protect against softirq here, but I can't say for sure, as I have never looked into that code before.
>>>>>
>>>>> No, it doesn't need the protection.
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>> Digging through the code, I think I found a possible cause:
>>
>> tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
>> blk-mq.c:blk_mq_init_rq_map()).
>
> Yes, it may cause a problem when the request is allocated for the first time:
> the timeout handler may come just after the allocation and before its
> initialization, and then an oops is triggered by the garbage data in the request.
>
> --
>  From ffd0824b7b686074c2d5d70bc4e6bba3ba56a30c Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Wed, 17 Sep 2014 21:00:34 +0800
> Subject: [PATCH] blk-mq: initialize request before the 1st allocation
>
> Otherwise the request can be accessed from timeout handler
> just after its 1st allocation from tag pool and before
> initialization in blk_mq_rq_ctx_init(), so cause oops since
> the request is filled up with garbage data.
>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>   block/blk-mq.c |   10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4aac826..d24673f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -514,6 +514,10 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
>   {
>   	struct request *rq = tags->rqs[tag];
>
> +	/* uninitialized request */
> +	if (!rq->q || rq->tag == -1)
> +		return rq;
> +
>   	if (!is_flush_request(rq, tag))
>   		return rq;
>
> @@ -1401,6 +1405,12 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
>   		left -= to_do * rq_size;
>   		for (j = 0; j < to_do; j++) {
>   			tags->rqs[i] = p;
> +
> +			/* Avoiding early access from timeout handler */
> +			tags->rqs[i]->tag = -1;
> +			tags->rqs[i]->q = NULL;
> +			tags->rqs[i]->cmd_flags = 0;
> +
>   			if (set->ops->init_request) {
>   				if (set->ops->init_request(set->driver_data,
>   						tags->rqs[i], hctx_idx, i,

Another way would be to ensure that the timeout handler doesn't touch 
hw_ctx or tag_sets that aren't fully initialized yet. But I think this 
is safer/cleaner.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17 14:22             ` Jens Axboe
@ 2014-09-17 15:24               ` Ming Lei
  -1 siblings, 0 replies; 31+ messages in thread
From: Ming Lei @ 2014-09-17 15:24 UTC (permalink / raw)
  To: Jens Axboe
  Cc: David Hildenbrand, Christian Borntraeger, rusty Russell,
	Michael S. Tsirkin, KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

On Wed, Sep 17, 2014 at 10:22 PM, Jens Axboe <axboe@kernel.dk> wrote:
>
> Another way would be to ensure that the timeout handler doesn't touch hw_ctx
> or tag_sets that aren't fully initialized yet. But I think this is
> safer/cleaner.

That may not be easy, or even enough, to check whether hw_ctx/tag_sets are
fully initialized, if you mean that all requests have been used at least once.

On Wed, Sep 17, 2014 at 10:11 PM, David Hildenbrand
> I was playing with a simple patch that just sets cmd_flags and action_flags to

What is action_flags?

> 0. That should already be sufficient to keep blk_mq_tag_to_rq and the calling
> method from doing the wrong thing.

Yes, clearing rq->cmd_flags should be enough.

And it looks better to move the rq initialization to __blk_mq_free_request()
too; otherwise the timeout handler may still see the old cmd_flags and rq->q
before the request's new initialization.


Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17 15:24               ` Ming Lei
@ 2014-09-17 19:09                 ` David Hildenbrand
  -1 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2014-09-17 19:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christian Borntraeger, rusty Russell,
	Michael S. Tsirkin, KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

> On Wed, Sep 17, 2014 at 10:22 PM, Jens Axboe <axboe@kernel.dk> wrote:
> >
> > Another way would be to ensure that the timeout handler doesn't touch hw_ctx
> > or tag_sets that aren't fully initialized yet. But I think this is
> > safer/cleaner.
> 
> That may not be easy, or even enough, to check whether hw_ctx/tag_sets are
> fully initialized, if you mean that all requests have been used at least once.
> 
> On Wed, Sep 17, 2014 at 10:11 PM, David Hildenbrand
> > I was playing with a simple patch that just sets cmd_flags and action_flags to
> 
> What is action_flags?

atomic_flags, sorry :)

Otherwise e.g. REQ_ATOM_STARTED could already be set due to the randomness. I
am not sure if this is really necessary, or if it is completely shielded by the
tag-handling code, but it seemed cleaner to me to do it (and I remember that it
is not set within blk_mq_rq_ctx_init).

> 
> > 0. That should already be sufficient to keep blk_mq_tag_to_rq and the calling
> > method from doing the wrong thing.
> 
> Yes, clearing rq->cmd_flags should be enough.
> 
> And it looks better to move the rq initialization to __blk_mq_free_request()
> too; otherwise the timeout handler may still see the old cmd_flags and rq->q
> before the request's new initialization.

Yes, __blk_mq_free_request() should also reset at least rq->cmd_flags, and I
think we can remove the initialization from __blk_mq_alloc_request().

David

> 
> 
> Thanks,


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17 19:09                 ` David Hildenbrand
@ 2014-09-17 20:16                   ` Jens Axboe
  -1 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2014-09-17 20:16 UTC (permalink / raw)
  To: David Hildenbrand, Ming Lei
  Cc: Christian Borntraeger, rusty Russell, Michael S. Tsirkin,
	KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

On 09/17/2014 01:09 PM, David Hildenbrand wrote:
>>> 0. That should already be sufficient to keep blk_mq_tag_to_rq and the calling
>>> method from doing the wrong thing.
>>
>> Yes, clearing rq->cmd_flags should be enough.
>>
>> And it looks better to move the rq initialization to __blk_mq_free_request()
>> too; otherwise the timeout handler may still see the old cmd_flags and rq->q
>> before the request's new initialization.
> 
> Yes, __blk_mq_free_request() should also reset at least rq->cmd_flags, and I
> think we can remove the initialization from  __blk_mq_alloc_request().

And then we come full circle, that's how the code originally started out
(and it is the saner way to do things). So yes, I'd greatly applaud that.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)
  2014-09-17 19:09                 ` David Hildenbrand
@ 2014-09-18  2:13                   ` Ming Lei
  -1 siblings, 0 replies; 31+ messages in thread
From: Ming Lei @ 2014-09-18  2:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Jens Axboe, Christian Borntraeger, rusty Russell,
	Michael S. Tsirkin, KVM list, Virtualization List,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List

On Thu, Sep 18, 2014 at 3:09 AM, David Hildenbrand
<dahi@linux.vnet.ibm.com> wrote:
>> On Wed, Sep 17, 2014 at 10:22 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> >
>> > Another way would be to ensure that the timeout handler doesn't touch hw_ctx
>> > or tag_sets that aren't fully initialized yet. But I think this is
>> > safer/cleaner.
>>
>> That may not be easy, or even enough, to check whether hw_ctx/tag_sets are
>> fully initialized, if you mean that all requests have been used at least once.
>>
>> On Wed, Sep 17, 2014 at 10:11 PM, David Hildenbrand
>> > I was playing with a simple patch that just sets cmd_flags and action_flags to
>>
>> What is action_flags?
>
> atomic_flags, sorry :)
>
> Otherwise e.g. REQ_ATOM_STARTED could already be set due to the randomness. I
> am not sure if this is really necessary, or if it is completely shielded by the
> tag-handling code, but it seemed cleaner to me to do it (and I remember that it
> is not set within blk_mq_rq_ctx_init).

You are right, both cmd_flags and atomic_flags should be cleared
in blk_mq_init_rq_map().

>
>>
>> > 0. That should already be sufficient to keep blk_mq_tag_to_rq and the calling
>> > method from doing the wrong thing.
>>
>> Yes, clearing rq->cmd_flags should be enough.
>>
>> And it looks better to move the rq initialization to __blk_mq_free_request()
>> too; otherwise the timeout handler may still see the old cmd_flags and rq->q
>> before the request's new initialization.
>
> Yes, __blk_mq_free_request() should also reset at least rq->cmd_flags, and I
> think we can remove the initialization from  __blk_mq_alloc_request().
>
> David
>
>>
>>
>> Thanks,
>



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2014-09-18  2:13 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-11 10:26 blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4) Christian Borntraeger
2014-09-12 10:56 ` Christian Borntraeger
2014-09-12 11:54 ` Ming Lei
2014-09-12 20:09   ` Christian Borntraeger
2014-09-17  7:59     ` Christian Borntraeger
2014-09-17 10:01       ` Ming Lei
2014-09-17 12:00       ` David Hildenbrand
2014-09-17 13:52         ` Ming Lei
2014-09-17 14:11           ` David Hildenbrand
2014-09-17 14:22           ` Jens Axboe
2014-09-17 15:24             ` Ming Lei
2014-09-17 19:09               ` David Hildenbrand
2014-09-17 20:16                 ` Jens Axboe
2014-09-18  2:13                 ` Ming Lei
