All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] question: I found a qemu crash when attach virtio-scsi disk
@ 2017-11-01  6:42 lizhengui
  2017-11-03 10:26 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  0 siblings, 1 reply; 10+ messages in thread
From: lizhengui @ 2017-11-01  6:42 UTC (permalink / raw)
  To: kwolf, jcody, mreitz, pbonzini
  Cc: lizhengui, qemu-block, qemu-devel, Fangyi (C)

Hi, when I attach virtio-scsi disk to VM, the qemu crash happened at very low probability. 

The qemu crash bt is below:

#0  0x00007f2be3ada1d7 in raise () from /usr/lib64/libc.so.6
#1  0x00007f2be3adb8c8 in abort () from /usr/lib64/libc.so.6
#2  0x000000000084fe49 in PAT_abort ()
#3  0x000000000084ce8d in patchIllInsHandler ()
#4  <signal handler called>
#5  0x00000000008228bb in qemu_strnlen ()
#6  0x0000000000822934 in strpadcpy ()
#7  0x0000000000684a88 in scsi_disk_emulate_inquiry ()
#8  0x000000000068744b in scsi_disk_emulate_command ()
#9  0x000000000068c481 in scsi_req_enqueue ()
#10 0x00000000004b1f00 in virtio_scsi_handle_cmd_req_submit ()
#11 0x00000000004b2e9e in virtio_scsi_handle_cmd_vq ()
#12 0x000000000076dba7 in aio_dispatch ()
#13 0x000000000076dd96 in aio_poll ()
#14 0x00000000007a8673 in blk_prw ()
#15 0x00000000007a922c in blk_pread ()
#16 0x00000000007a9cd0 in blk_pread_unthrottled ()
#17 0x00000000005cb404 in guess_disk_lchs ()
#18 0x00000000005cb5b4 in hd_geometry_guess ()
#19 0x00000000005cad56 in blkconf_geometry ()
#20 0x0000000000685956 in scsi_realize ()
#21 0x000000000068d3e3 in scsi_qdev_realize ()
#22 0x00000000005e3938 in device_set_realized ()
#23 0x000000000075bced in property_set_bool ()
#24 0x0000000000760205 in object_property_set_qobject ()
#25 0x000000000075df64 in object_property_set_bool ()
#26 0x00000000005580ad in qdev_device_add ()
#27 0x000000000055850b in qmp_device_add ()
#28 0x0000000000818b37 in do_qmp_dispatch.constprop.1 ()
#29 0x0000000000818d8b in qmp_dispatch ()
#30 0x000000000045d212 in handle_qmp_command ()
#31 0x000000000081f819 in json_message_process_token ()
#32 0x00000000008434d0 in json_lexer_feed_char ()
#33 0x00000000008435e6 in json_lexer_feed ()
#34 0x000000000045bd72 in monitor_qmp_read ()
#35 0x000000000055ecf3 in tcp_chr_read ()
#36 0x00007f2be4cf899a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#37 0x000000000076b86b in os_host_main_loop_wait ()
#38 0x000000000076b995 in main_loop_wait ()
#39 0x0000000000569c51 in main_loop ()
#40 0x0000000000420665 in main ()

From the qemu crash bt, we can see that the scsi_realize has not completed yet. Some fields sush as vendor, version in SCSIDiskState is 
Null at this moment. If qemu handles scsi request from this scsi disk at this moment, the qemu will access some null pointers and cause crash.
How can I solve this problem? Should we add a check that whether the scsi disk has realized or not in scsi_disk_emulate_command before
Handling scsi requests? 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
  2017-11-01  6:42 [Qemu-devel] question: I found a qemu crash when attach virtio-scsi disk lizhengui
@ 2017-11-03 10:26 ` Stefan Hajnoczi
  2017-11-06 16:11   ` Kevin Wolf
       [not found]   ` <68B56AECEFB25A418ADB9417F6178A531091E80A@dggemi507-mbx.china.huawei.com>
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2017-11-03 10:26 UTC (permalink / raw)
  To: lizhengui
  Cc: kwolf, jcody, mreitz, pbonzini, qemu-devel, qemu-block, Fangyi (C)

[-- Attachment #1: Type: text/plain, Size: 3644 bytes --]

On Wed, Nov 01, 2017 at 06:42:33AM +0000, lizhengui wrote:
> Hi, when I attach virtio-scsi disk to VM, the qemu crash happened at very low probability. 
> 
> The qemu crash bt is below:
> 
> #0  0x00007f2be3ada1d7 in raise () from /usr/lib64/libc.so.6
> #1  0x00007f2be3adb8c8 in abort () from /usr/lib64/libc.so.6
> #2  0x000000000084fe49 in PAT_abort ()
> #3  0x000000000084ce8d in patchIllInsHandler ()
> #4  <signal handler called>
> #5  0x00000000008228bb in qemu_strnlen ()
> #6  0x0000000000822934 in strpadcpy ()
> #7  0x0000000000684a88 in scsi_disk_emulate_inquiry ()
> #8  0x000000000068744b in scsi_disk_emulate_command ()
> #9  0x000000000068c481 in scsi_req_enqueue ()
> #10 0x00000000004b1f00 in virtio_scsi_handle_cmd_req_submit ()
> #11 0x00000000004b2e9e in virtio_scsi_handle_cmd_vq ()
> #12 0x000000000076dba7 in aio_dispatch ()
> #13 0x000000000076dd96 in aio_poll ()
> #14 0x00000000007a8673 in blk_prw ()
> #15 0x00000000007a922c in blk_pread ()
> #16 0x00000000007a9cd0 in blk_pread_unthrottled ()
> #17 0x00000000005cb404 in guess_disk_lchs ()
> #18 0x00000000005cb5b4 in hd_geometry_guess ()
> #19 0x00000000005cad56 in blkconf_geometry ()
> #20 0x0000000000685956 in scsi_realize ()
> #21 0x000000000068d3e3 in scsi_qdev_realize ()
> #22 0x00000000005e3938 in device_set_realized ()
> #23 0x000000000075bced in property_set_bool ()
> #24 0x0000000000760205 in object_property_set_qobject ()
> #25 0x000000000075df64 in object_property_set_bool ()
> #26 0x00000000005580ad in qdev_device_add ()
> #27 0x000000000055850b in qmp_device_add ()
> #28 0x0000000000818b37 in do_qmp_dispatch.constprop.1 ()
> #29 0x0000000000818d8b in qmp_dispatch ()
> #30 0x000000000045d212 in handle_qmp_command ()
> #31 0x000000000081f819 in json_message_process_token ()
> #32 0x00000000008434d0 in json_lexer_feed_char ()
> #33 0x00000000008435e6 in json_lexer_feed ()
> #34 0x000000000045bd72 in monitor_qmp_read ()
> #35 0x000000000055ecf3 in tcp_chr_read ()
> #36 0x00007f2be4cf899a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> #37 0x000000000076b86b in os_host_main_loop_wait ()
> #38 0x000000000076b995 in main_loop_wait ()
> #39 0x0000000000569c51 in main_loop ()
> #40 0x0000000000420665 in main ()
> 
> From the qemu crash bt, we can see that the scsi_realize has not completed yet. Some fields sush as vendor, version in SCSIDiskState is 
> Null at this moment. If qemu handles scsi request from this scsi disk at this moment, the qemu will access some null pointers and cause crash.
> How can I solve this problem? Should we add a check that whether the scsi disk has realized or not in scsi_disk_emulate_command before
> Handling scsi requests? 

Please try this patch:

diff --git a/hw/block/block.c b/hw/block/block.c
index 27878d0087..df99ddb899 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -120,9 +120,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
         }
     }
     if (!conf->cyls && !conf->heads && !conf->secs) {
+        AioContext *ctx = blk_get_aio_context(conf->blk);
+
+        /* Callers may not expect this function to dispatch aio handlers, so
+         * disable external aio such as guest device emulation.
+         */
+        aio_disable_external(ctx);
         hd_geometry_guess(conf->blk,
                           &conf->cyls, &conf->heads, &conf->secs,
                           ptrans);
+        aio_enable_external(ctx);
     } else if (ptrans && *ptrans == BIOS_ATA_TRANSLATION_AUTO) {
         *ptrans = hd_bios_chs_auto_trans(conf->cyls, conf->heads, conf->secs);
     }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
  2017-11-03 10:26 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2017-11-06 16:11   ` Kevin Wolf
  2017-11-06 16:33     ` Paolo Bonzini
       [not found]   ` <68B56AECEFB25A418ADB9417F6178A531091E80A@dggemi507-mbx.china.huawei.com>
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Wolf @ 2017-11-06 16:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: lizhengui, jcody, mreitz, pbonzini, qemu-devel, qemu-block, Fangyi (C)

[-- Attachment #1: Type: text/plain, Size: 4084 bytes --]

Am 03.11.2017 um 11:26 hat Stefan Hajnoczi geschrieben:
> On Wed, Nov 01, 2017 at 06:42:33AM +0000, lizhengui wrote:
> > Hi, when I attach virtio-scsi disk to VM, the qemu crash happened at very low probability. 
> > 
> > The qemu crash bt is below:
> > 
> > #0  0x00007f2be3ada1d7 in raise () from /usr/lib64/libc.so.6
> > #1  0x00007f2be3adb8c8 in abort () from /usr/lib64/libc.so.6
> > #2  0x000000000084fe49 in PAT_abort ()
> > #3  0x000000000084ce8d in patchIllInsHandler ()
> > #4  <signal handler called>
> > #5  0x00000000008228bb in qemu_strnlen ()
> > #6  0x0000000000822934 in strpadcpy ()
> > #7  0x0000000000684a88 in scsi_disk_emulate_inquiry ()
> > #8  0x000000000068744b in scsi_disk_emulate_command ()
> > #9  0x000000000068c481 in scsi_req_enqueue ()
> > #10 0x00000000004b1f00 in virtio_scsi_handle_cmd_req_submit ()
> > #11 0x00000000004b2e9e in virtio_scsi_handle_cmd_vq ()
> > #12 0x000000000076dba7 in aio_dispatch ()
> > #13 0x000000000076dd96 in aio_poll ()
> > #14 0x00000000007a8673 in blk_prw ()
> > #15 0x00000000007a922c in blk_pread ()
> > #16 0x00000000007a9cd0 in blk_pread_unthrottled ()
> > #17 0x00000000005cb404 in guess_disk_lchs ()
> > #18 0x00000000005cb5b4 in hd_geometry_guess ()
> > #19 0x00000000005cad56 in blkconf_geometry ()
> > #20 0x0000000000685956 in scsi_realize ()
> > #21 0x000000000068d3e3 in scsi_qdev_realize ()
> > #22 0x00000000005e3938 in device_set_realized ()
> > #23 0x000000000075bced in property_set_bool ()
> > #24 0x0000000000760205 in object_property_set_qobject ()
> > #25 0x000000000075df64 in object_property_set_bool ()
> > #26 0x00000000005580ad in qdev_device_add ()
> > #27 0x000000000055850b in qmp_device_add ()
> > #28 0x0000000000818b37 in do_qmp_dispatch.constprop.1 ()
> > #29 0x0000000000818d8b in qmp_dispatch ()
> > #30 0x000000000045d212 in handle_qmp_command ()
> > #31 0x000000000081f819 in json_message_process_token ()
> > #32 0x00000000008434d0 in json_lexer_feed_char ()
> > #33 0x00000000008435e6 in json_lexer_feed ()
> > #34 0x000000000045bd72 in monitor_qmp_read ()
> > #35 0x000000000055ecf3 in tcp_chr_read ()
> > #36 0x00007f2be4cf899a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> > #37 0x000000000076b86b in os_host_main_loop_wait ()
> > #38 0x000000000076b995 in main_loop_wait ()
> > #39 0x0000000000569c51 in main_loop ()
> > #40 0x0000000000420665 in main ()
> > 
> > From the qemu crash bt, we can see that the scsi_realize has not completed yet. Some fields sush as vendor, version in SCSIDiskState is 
> > Null at this moment. If qemu handles scsi request from this scsi disk at this moment, the qemu will access some null pointers and cause crash.
> > How can I solve this problem? Should we add a check that whether the scsi disk has realized or not in scsi_disk_emulate_command before
> > Handling scsi requests? 
> 
> Please try this patch:
> 
> diff --git a/hw/block/block.c b/hw/block/block.c
> index 27878d0087..df99ddb899 100644
> --- a/hw/block/block.c
> +++ b/hw/block/block.c
> @@ -120,9 +120,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
>          }
>      }
>      if (!conf->cyls && !conf->heads && !conf->secs) {
> +        AioContext *ctx = blk_get_aio_context(conf->blk);
> +
> +        /* Callers may not expect this function to dispatch aio handlers, so
> +         * disable external aio such as guest device emulation.
> +         */
> +        aio_disable_external(ctx);
>          hd_geometry_guess(conf->blk,
>                            &conf->cyls, &conf->heads, &conf->secs,
>                            ptrans);
> +        aio_enable_external(ctx);
>      } else if (ptrans && *ptrans == BIOS_ATA_TRANSLATION_AUTO) {
>          *ptrans = hd_bios_chs_auto_trans(conf->cyls, conf->heads, conf->secs);
>      }

But why is the new disk even attached to the controller and visible to
the guest at this point when it hasn't completed its initialisation yet?
Isn't the root problem that we're initialising things in the wrong
order?

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
  2017-11-06 16:11   ` Kevin Wolf
@ 2017-11-06 16:33     ` Paolo Bonzini
  2017-11-07  1:59       ` Fam Zheng
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-11-06 16:33 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi
  Cc: lizhengui, jcody, mreitz, qemu-devel, qemu-block, Fangyi (C)

[-- Attachment #1: Type: text/plain, Size: 4658 bytes --]

On 06/11/2017 17:11, Kevin Wolf wrote:
> Am 03.11.2017 um 11:26 hat Stefan Hajnoczi geschrieben:
>> On Wed, Nov 01, 2017 at 06:42:33AM +0000, lizhengui wrote:
>>> Hi, when I attach virtio-scsi disk to VM, the qemu crash happened at very low probability. 
>>>
>>> The qemu crash bt is below:
>>>
>>> #0  0x00007f2be3ada1d7 in raise () from /usr/lib64/libc.so.6
>>> #1  0x00007f2be3adb8c8 in abort () from /usr/lib64/libc.so.6
>>> #2  0x000000000084fe49 in PAT_abort ()
>>> #3  0x000000000084ce8d in patchIllInsHandler ()
>>> #4  <signal handler called>
>>> #5  0x00000000008228bb in qemu_strnlen ()
>>> #6  0x0000000000822934 in strpadcpy ()
>>> #7  0x0000000000684a88 in scsi_disk_emulate_inquiry ()
>>> #8  0x000000000068744b in scsi_disk_emulate_command ()
>>> #9  0x000000000068c481 in scsi_req_enqueue ()
>>> #10 0x00000000004b1f00 in virtio_scsi_handle_cmd_req_submit ()
>>> #11 0x00000000004b2e9e in virtio_scsi_handle_cmd_vq ()
>>> #12 0x000000000076dba7 in aio_dispatch ()
>>> #13 0x000000000076dd96 in aio_poll ()
>>> #14 0x00000000007a8673 in blk_prw ()
>>> #15 0x00000000007a922c in blk_pread ()
>>> #16 0x00000000007a9cd0 in blk_pread_unthrottled ()
>>> #17 0x00000000005cb404 in guess_disk_lchs ()
>>> #18 0x00000000005cb5b4 in hd_geometry_guess ()
>>> #19 0x00000000005cad56 in blkconf_geometry ()
>>> #20 0x0000000000685956 in scsi_realize ()
>>> #21 0x000000000068d3e3 in scsi_qdev_realize ()
>>> #22 0x00000000005e3938 in device_set_realized ()
>>> #23 0x000000000075bced in property_set_bool ()
>>> #24 0x0000000000760205 in object_property_set_qobject ()
>>> #25 0x000000000075df64 in object_property_set_bool ()
>>> #26 0x00000000005580ad in qdev_device_add ()
>>> #27 0x000000000055850b in qmp_device_add ()
>>> #28 0x0000000000818b37 in do_qmp_dispatch.constprop.1 ()
>>> #29 0x0000000000818d8b in qmp_dispatch ()
>>> #30 0x000000000045d212 in handle_qmp_command ()
>>> #31 0x000000000081f819 in json_message_process_token ()
>>> #32 0x00000000008434d0 in json_lexer_feed_char ()
>>> #33 0x00000000008435e6 in json_lexer_feed ()
>>> #34 0x000000000045bd72 in monitor_qmp_read ()
>>> #35 0x000000000055ecf3 in tcp_chr_read ()
>>> #36 0x00007f2be4cf899a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>>> #37 0x000000000076b86b in os_host_main_loop_wait ()
>>> #38 0x000000000076b995 in main_loop_wait ()
>>> #39 0x0000000000569c51 in main_loop ()
>>> #40 0x0000000000420665 in main ()
>>>
>>> From the qemu crash bt, we can see that the scsi_realize has not completed yet. Some fields sush as vendor, version in SCSIDiskState is 
>>> Null at this moment. If qemu handles scsi request from this scsi disk at this moment, the qemu will access some null pointers and cause crash.
>>> How can I solve this problem? Should we add a check that whether the scsi disk has realized or not in scsi_disk_emulate_command before
>>> Handling scsi requests? 
>>
>> Please try this patch:
>>
>> diff --git a/hw/block/block.c b/hw/block/block.c
>> index 27878d0087..df99ddb899 100644
>> --- a/hw/block/block.c
>> +++ b/hw/block/block.c
>> @@ -120,9 +120,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
>>          }
>>      }
>>      if (!conf->cyls && !conf->heads && !conf->secs) {
>> +        AioContext *ctx = blk_get_aio_context(conf->blk);
>> +
>> +        /* Callers may not expect this function to dispatch aio handlers, so
>> +         * disable external aio such as guest device emulation.
>> +         */
>> +        aio_disable_external(ctx);
>>          hd_geometry_guess(conf->blk,
>>                            &conf->cyls, &conf->heads, &conf->secs,
>>                            ptrans);
>> +        aio_enable_external(ctx);
>>      } else if (ptrans && *ptrans == BIOS_ATA_TRANSLATION_AUTO) {
>>          *ptrans = hd_bios_chs_auto_trans(conf->cyls, conf->heads, conf->secs);
>>      }
> 
> But why is the new disk even attached to the controller and visible to
> the guest at this point when it hasn't completed its initialisation yet?
> Isn't the root problem that we're initialising things in the wrong
> order?

Well, the root cause then is that scsi_device_find is just reusing the
list of devices on the SCSI bus.  Devices are added to that list very
early by qdev_set_parent_bus.

Stefan's patch could make the issue even harder to hit, but I think that
with iothreads you could hit it anyway.

The solution is probably to add an "online" flag to the device, and set
it in scsi_device_realize.  But even that has the issue that access to
the list is not protected with a lock.  What do you think?

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
       [not found]   ` <68B56AECEFB25A418ADB9417F6178A531091E80A@dggemi507-mbx.china.huawei.com>
@ 2017-11-06 19:06     ` Stefan Hajnoczi
  2017-11-07  2:19       ` [Qemu-devel] Re " lizhengui
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Hajnoczi @ 2017-11-06 19:06 UTC (permalink / raw)
  To: lizhengui
  Cc: kwolf, jcody, mreitz, pbonzini, qemu-devel, qemu-block, Fangyi (C)

[-- Attachment #1: Type: text/plain, Size: 227 bytes --]

On Sun, Nov 05, 2017 at 02:51:40PM +0000, lizhengui wrote:
> Ok, thanks for your reply so mush..

Did the patch solve the problem?

If it works for you I will send a proper patch to the mailing list.

Thanks,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
  2017-11-06 16:33     ` Paolo Bonzini
@ 2017-11-07  1:59       ` Fam Zheng
  2017-11-07 10:39         ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Fam Zheng @ 2017-11-07  1:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, lizhengui, qemu-block, Fangyi (C),
	jcody, qemu-devel, mreitz

On Mon, 11/06 17:33, Paolo Bonzini wrote:
> On 06/11/2017 17:11, Kevin Wolf wrote:
> > Am 03.11.2017 um 11:26 hat Stefan Hajnoczi geschrieben:
> >> On Wed, Nov 01, 2017 at 06:42:33AM +0000, lizhengui wrote:
> >>> Hi, when I attach virtio-scsi disk to VM, the qemu crash happened at very low probability. 
> >>>
> >>> The qemu crash bt is below:
> >>>
> >>> #0  0x00007f2be3ada1d7 in raise () from /usr/lib64/libc.so.6
> >>> #1  0x00007f2be3adb8c8 in abort () from /usr/lib64/libc.so.6
> >>> #2  0x000000000084fe49 in PAT_abort ()
> >>> #3  0x000000000084ce8d in patchIllInsHandler ()
> >>> #4  <signal handler called>
> >>> #5  0x00000000008228bb in qemu_strnlen ()
> >>> #6  0x0000000000822934 in strpadcpy ()
> >>> #7  0x0000000000684a88 in scsi_disk_emulate_inquiry ()
> >>> #8  0x000000000068744b in scsi_disk_emulate_command ()
> >>> #9  0x000000000068c481 in scsi_req_enqueue ()
> >>> #10 0x00000000004b1f00 in virtio_scsi_handle_cmd_req_submit ()
> >>> #11 0x00000000004b2e9e in virtio_scsi_handle_cmd_vq ()
> >>> #12 0x000000000076dba7 in aio_dispatch ()
> >>> #13 0x000000000076dd96 in aio_poll ()
> >>> #14 0x00000000007a8673 in blk_prw ()
> >>> #15 0x00000000007a922c in blk_pread ()
> >>> #16 0x00000000007a9cd0 in blk_pread_unthrottled ()
> >>> #17 0x00000000005cb404 in guess_disk_lchs ()
> >>> #18 0x00000000005cb5b4 in hd_geometry_guess ()
> >>> #19 0x00000000005cad56 in blkconf_geometry ()
> >>> #20 0x0000000000685956 in scsi_realize ()
> >>> #21 0x000000000068d3e3 in scsi_qdev_realize ()
> >>> #22 0x00000000005e3938 in device_set_realized ()
> >>> #23 0x000000000075bced in property_set_bool ()
> >>> #24 0x0000000000760205 in object_property_set_qobject ()
> >>> #25 0x000000000075df64 in object_property_set_bool ()
> >>> #26 0x00000000005580ad in qdev_device_add ()
> >>> #27 0x000000000055850b in qmp_device_add ()
> >>> #28 0x0000000000818b37 in do_qmp_dispatch.constprop.1 ()
> >>> #29 0x0000000000818d8b in qmp_dispatch ()
> >>> #30 0x000000000045d212 in handle_qmp_command ()
> >>> #31 0x000000000081f819 in json_message_process_token ()
> >>> #32 0x00000000008434d0 in json_lexer_feed_char ()
> >>> #33 0x00000000008435e6 in json_lexer_feed ()
> >>> #34 0x000000000045bd72 in monitor_qmp_read ()
> >>> #35 0x000000000055ecf3 in tcp_chr_read ()
> >>> #36 0x00007f2be4cf899a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> >>> #37 0x000000000076b86b in os_host_main_loop_wait ()
> >>> #38 0x000000000076b995 in main_loop_wait ()
> >>> #39 0x0000000000569c51 in main_loop ()
> >>> #40 0x0000000000420665 in main ()
> >>>
> >>> From the qemu crash bt, we can see that the scsi_realize has not completed yet. Some fields sush as vendor, version in SCSIDiskState is 
> >>> Null at this moment. If qemu handles scsi request from this scsi disk at this moment, the qemu will access some null pointers and cause crash.
> >>> How can I solve this problem? Should we add a check that whether the scsi disk has realized or not in scsi_disk_emulate_command before
> >>> Handling scsi requests? 
> >>
> >> Please try this patch:
> >>
> >> diff --git a/hw/block/block.c b/hw/block/block.c
> >> index 27878d0087..df99ddb899 100644
> >> --- a/hw/block/block.c
> >> +++ b/hw/block/block.c
> >> @@ -120,9 +120,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
> >>          }
> >>      }
> >>      if (!conf->cyls && !conf->heads && !conf->secs) {
> >> +        AioContext *ctx = blk_get_aio_context(conf->blk);
> >> +
> >> +        /* Callers may not expect this function to dispatch aio handlers, so
> >> +         * disable external aio such as guest device emulation.
> >> +         */
> >> +        aio_disable_external(ctx);
> >>          hd_geometry_guess(conf->blk,
> >>                            &conf->cyls, &conf->heads, &conf->secs,
> >>                            ptrans);
> >> +        aio_enable_external(ctx);
> >>      } else if (ptrans && *ptrans == BIOS_ATA_TRANSLATION_AUTO) {
> >>          *ptrans = hd_bios_chs_auto_trans(conf->cyls, conf->heads, conf->secs);
> >>      }
> > 
> > But why is the new disk even attached to the controller and visible to
> > the guest at this point when it hasn't completed its initialisation yet?
> > Isn't the root problem that we're initialising things in the wrong
> > order?
> 
> Well, the root cause then is that scsi_device_find is just reusing the
> list of devices on the SCSI bus.  Devices are added to that list very
> early by qdev_set_parent_bus.
> 
> Stefan's patch could make the issue even harder to hit, but I think that
> with iothreads you could hit it anyway.
> 
> The solution is probably to add an "online" flag to the device, and set
> it in scsi_device_realize.  But even that has the issue that access to
> the list is not protected with a lock.  What do you think?
> 

Can main thread somehow call aio_context_acquire(vs->ctx) (and release) around
qdev_set_parent_bus()? virtio_scsi_device_find() takes the lock.

Fam

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] Re [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
  2017-11-06 19:06     ` Stefan Hajnoczi
@ 2017-11-07  2:19       ` lizhengui
  0 siblings, 0 replies; 10+ messages in thread
From: lizhengui @ 2017-11-07  2:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, jcody, mreitz, pbonzini, qemu-devel, qemu-block, Fangyi (C)

The qemu works good by now after taking the patch when I attach virtio-scsi disk repeatedly. But 
the probability of the problem is very low,I only met this problem once.

-----邮件原件-----
发件人: Stefan Hajnoczi [mailto:stefanha@gmail.com] 
发送时间: 2017年11月7日 3:06
收件人: lizhengui
抄送: kwolf@redhat.com; jcody@redhat.com; mreitz@redhat.com; pbonzini@redhat.com; qemu-devel@nongnu.org; qemu-block@nongnu.org; Fangyi (C)
主题: Re: [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk

On Sun, Nov 05, 2017 at 02:51:40PM +0000, lizhengui wrote:
> Ok, thanks for your reply so mush..

Did the patch solve the problem?

If it works for you I will send a proper patch to the mailing list.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
  2017-11-07  1:59       ` Fam Zheng
@ 2017-11-07 10:39         ` Paolo Bonzini
  2017-11-09 11:33           ` Stefan Hajnoczi
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-11-07 10:39 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Stefan Hajnoczi, lizhengui, qemu-block, Fangyi (C),
	jcody, qemu-devel, mreitz

On 07/11/2017 02:59, Fam Zheng wrote:
> On Mon, 11/06 17:33, Paolo Bonzini wrote:
>> On 06/11/2017 17:11, Kevin Wolf wrote:
>>> Am 03.11.2017 um 11:26 hat Stefan Hajnoczi geschrieben:
>>>> On Wed, Nov 01, 2017 at 06:42:33AM +0000, lizhengui wrote:
>>>>> Hi, when I attach virtio-scsi disk to VM, the qemu crash happened at very low probability. 
>>>>>
>>>>> The qemu crash bt is below:
>>>>>
>>>>> #0  0x00007f2be3ada1d7 in raise () from /usr/lib64/libc.so.6
>>>>> #1  0x00007f2be3adb8c8 in abort () from /usr/lib64/libc.so.6
>>>>> #2  0x000000000084fe49 in PAT_abort ()
>>>>> #3  0x000000000084ce8d in patchIllInsHandler ()
>>>>> #4  <signal handler called>
>>>>> #5  0x00000000008228bb in qemu_strnlen ()
>>>>> #6  0x0000000000822934 in strpadcpy ()
>>>>> #7  0x0000000000684a88 in scsi_disk_emulate_inquiry ()
>>>>> #8  0x000000000068744b in scsi_disk_emulate_command ()
>>>>> #9  0x000000000068c481 in scsi_req_enqueue ()
>>>>> #10 0x00000000004b1f00 in virtio_scsi_handle_cmd_req_submit ()
>>>>> #11 0x00000000004b2e9e in virtio_scsi_handle_cmd_vq ()
>>>>> #12 0x000000000076dba7 in aio_dispatch ()
>>>>> #13 0x000000000076dd96 in aio_poll ()
>>>>> #14 0x00000000007a8673 in blk_prw ()
>>>>> #15 0x00000000007a922c in blk_pread ()
>>>>> #16 0x00000000007a9cd0 in blk_pread_unthrottled ()
>>>>> #17 0x00000000005cb404 in guess_disk_lchs ()
>>>>> #18 0x00000000005cb5b4 in hd_geometry_guess ()
>>>>> #19 0x00000000005cad56 in blkconf_geometry ()
>>>>> #20 0x0000000000685956 in scsi_realize ()
>>>>> #21 0x000000000068d3e3 in scsi_qdev_realize ()
>>>>> #22 0x00000000005e3938 in device_set_realized ()
>>>>> #23 0x000000000075bced in property_set_bool ()
>>>>> #24 0x0000000000760205 in object_property_set_qobject ()
>>>>> #25 0x000000000075df64 in object_property_set_bool ()
>>>>> #26 0x00000000005580ad in qdev_device_add ()
>>>>> #27 0x000000000055850b in qmp_device_add ()
>>>>> #28 0x0000000000818b37 in do_qmp_dispatch.constprop.1 ()
>>>>> #29 0x0000000000818d8b in qmp_dispatch ()
>>>>> #30 0x000000000045d212 in handle_qmp_command ()
>>>>> #31 0x000000000081f819 in json_message_process_token ()
>>>>> #32 0x00000000008434d0 in json_lexer_feed_char ()
>>>>> #33 0x00000000008435e6 in json_lexer_feed ()
>>>>> #34 0x000000000045bd72 in monitor_qmp_read ()
>>>>> #35 0x000000000055ecf3 in tcp_chr_read ()
>>>>> #36 0x00007f2be4cf899a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>>>>> #37 0x000000000076b86b in os_host_main_loop_wait ()
>>>>> #38 0x000000000076b995 in main_loop_wait ()
>>>>> #39 0x0000000000569c51 in main_loop ()
>>>>> #40 0x0000000000420665 in main ()
>>>>>
>>>>> From the qemu crash bt, we can see that the scsi_realize has not completed yet. Some fields sush as vendor, version in SCSIDiskState is 
>>>>> Null at this moment. If qemu handles scsi request from this scsi disk at this moment, the qemu will access some null pointers and cause crash.
>>>>> How can I solve this problem? Should we add a check that whether the scsi disk has realized or not in scsi_disk_emulate_command before
>>>>> Handling scsi requests? 
>>>>
>>>> Please try this patch:
>>>>
>>>> diff --git a/hw/block/block.c b/hw/block/block.c
>>>> index 27878d0087..df99ddb899 100644
>>>> --- a/hw/block/block.c
>>>> +++ b/hw/block/block.c
>>>> @@ -120,9 +120,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
>>>>          }
>>>>      }
>>>>      if (!conf->cyls && !conf->heads && !conf->secs) {
>>>> +        AioContext *ctx = blk_get_aio_context(conf->blk);
>>>> +
>>>> +        /* Callers may not expect this function to dispatch aio handlers, so
>>>> +         * disable external aio such as guest device emulation.
>>>> +         */
>>>> +        aio_disable_external(ctx);
>>>>          hd_geometry_guess(conf->blk,
>>>>                            &conf->cyls, &conf->heads, &conf->secs,
>>>>                            ptrans);
>>>> +        aio_enable_external(ctx);
>>>>      } else if (ptrans && *ptrans == BIOS_ATA_TRANSLATION_AUTO) {
>>>>          *ptrans = hd_bios_chs_auto_trans(conf->cyls, conf->heads, conf->secs);
>>>>      }
>>>
>>> But why is the new disk even attached to the controller and visible to
>>> the guest at this point when it hasn't completed its initialisation yet?
>>> Isn't the root problem that we're initialising things in the wrong
>>> order?
>>
>> Well, the root cause then is that scsi_device_find is just reusing the
>> list of devices on the SCSI bus.  Devices are added to that list very
>> early by qdev_set_parent_bus.
>>
>> Stefan's patch could make the issue even harder to hit, but I think that
>> with iothreads you could hit it anyway.
>>
>> The solution is probably to add an "online" flag to the device, and set
>> it in scsi_device_realize.  But even that has the issue that access to
>> the list is not protected with a lock.  What do you think?
> 
> Can main thread somehow call aio_context_acquire(vs->ctx) (and release) around
> qdev_set_parent_bus()? virtio_scsi_device_find() takes the lock.

No, the context is not set yet.  But the locking is easy to add,
separately from the bug that Zhengui is reporting.

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
  2017-11-07 10:39         ` Paolo Bonzini
@ 2017-11-09 11:33           ` Stefan Hajnoczi
  2017-11-09 11:43             ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Hajnoczi @ 2017-11-09 11:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, Kevin Wolf, lizhengui, qemu-block, Fangyi (C),
	jcody, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 5652 bytes --]

On Tue, Nov 07, 2017 at 11:39:44AM +0100, Paolo Bonzini wrote:
> On 07/11/2017 02:59, Fam Zheng wrote:
> > On Mon, 11/06 17:33, Paolo Bonzini wrote:
> >> On 06/11/2017 17:11, Kevin Wolf wrote:
> >>> Am 03.11.2017 um 11:26 hat Stefan Hajnoczi geschrieben:
> >>>> On Wed, Nov 01, 2017 at 06:42:33AM +0000, lizhengui wrote:
> >>>>> Hi, when I attach virtio-scsi disk to VM, the qemu crash happened at very low probability. 
> >>>>>
> >>>>> The qemu crash bt is below:
> >>>>>
> >>>>> #0  0x00007f2be3ada1d7 in raise () from /usr/lib64/libc.so.6
> >>>>> #1  0x00007f2be3adb8c8 in abort () from /usr/lib64/libc.so.6
> >>>>> #2  0x000000000084fe49 in PAT_abort ()
> >>>>> #3  0x000000000084ce8d in patchIllInsHandler ()
> >>>>> #4  <signal handler called>
> >>>>> #5  0x00000000008228bb in qemu_strnlen ()
> >>>>> #6  0x0000000000822934 in strpadcpy ()
> >>>>> #7  0x0000000000684a88 in scsi_disk_emulate_inquiry ()
> >>>>> #8  0x000000000068744b in scsi_disk_emulate_command ()
> >>>>> #9  0x000000000068c481 in scsi_req_enqueue ()
> >>>>> #10 0x00000000004b1f00 in virtio_scsi_handle_cmd_req_submit ()
> >>>>> #11 0x00000000004b2e9e in virtio_scsi_handle_cmd_vq ()
> >>>>> #12 0x000000000076dba7 in aio_dispatch ()
> >>>>> #13 0x000000000076dd96 in aio_poll ()
> >>>>> #14 0x00000000007a8673 in blk_prw ()
> >>>>> #15 0x00000000007a922c in blk_pread ()
> >>>>> #16 0x00000000007a9cd0 in blk_pread_unthrottled ()
> >>>>> #17 0x00000000005cb404 in guess_disk_lchs ()
> >>>>> #18 0x00000000005cb5b4 in hd_geometry_guess ()
> >>>>> #19 0x00000000005cad56 in blkconf_geometry ()
> >>>>> #20 0x0000000000685956 in scsi_realize ()
> >>>>> #21 0x000000000068d3e3 in scsi_qdev_realize ()
> >>>>> #22 0x00000000005e3938 in device_set_realized ()
> >>>>> #23 0x000000000075bced in property_set_bool ()
> >>>>> #24 0x0000000000760205 in object_property_set_qobject ()
> >>>>> #25 0x000000000075df64 in object_property_set_bool ()
> >>>>> #26 0x00000000005580ad in qdev_device_add ()
> >>>>> #27 0x000000000055850b in qmp_device_add ()
> >>>>> #28 0x0000000000818b37 in do_qmp_dispatch.constprop.1 ()
> >>>>> #29 0x0000000000818d8b in qmp_dispatch ()
> >>>>> #30 0x000000000045d212 in handle_qmp_command ()
> >>>>> #31 0x000000000081f819 in json_message_process_token ()
> >>>>> #32 0x00000000008434d0 in json_lexer_feed_char ()
> >>>>> #33 0x00000000008435e6 in json_lexer_feed ()
> >>>>> #34 0x000000000045bd72 in monitor_qmp_read ()
> >>>>> #35 0x000000000055ecf3 in tcp_chr_read ()
> >>>>> #36 0x00007f2be4cf899a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> >>>>> #37 0x000000000076b86b in os_host_main_loop_wait ()
> >>>>> #38 0x000000000076b995 in main_loop_wait ()
> >>>>> #39 0x0000000000569c51 in main_loop ()
> >>>>> #40 0x0000000000420665 in main ()
> >>>>>
> >>>>> From the qemu crash bt, we can see that the scsi_realize has not completed yet. Some fields sush as vendor, version in SCSIDiskState is 
> >>>>> Null at this moment. If qemu handles scsi request from this scsi disk at this moment, the qemu will access some null pointers and cause crash.
> >>>>> How can I solve this problem? Should we add a check that whether the scsi disk has realized or not in scsi_disk_emulate_command before
> >>>>> Handling scsi requests? 
> >>>>
> >>>> Please try this patch:
> >>>>
> >>>> diff --git a/hw/block/block.c b/hw/block/block.c
> >>>> index 27878d0087..df99ddb899 100644
> >>>> --- a/hw/block/block.c
> >>>> +++ b/hw/block/block.c
> >>>> @@ -120,9 +120,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
> >>>>          }
> >>>>      }
> >>>>      if (!conf->cyls && !conf->heads && !conf->secs) {
> >>>> +        AioContext *ctx = blk_get_aio_context(conf->blk);
> >>>> +
> >>>> +        /* Callers may not expect this function to dispatch aio handlers, so
> >>>> +         * disable external aio such as guest device emulation.
> >>>> +         */
> >>>> +        aio_disable_external(ctx);
> >>>>          hd_geometry_guess(conf->blk,
> >>>>                            &conf->cyls, &conf->heads, &conf->secs,
> >>>>                            ptrans);
> >>>> +        aio_enable_external(ctx);
> >>>>      } else if (ptrans && *ptrans == BIOS_ATA_TRANSLATION_AUTO) {
> >>>>          *ptrans = hd_bios_chs_auto_trans(conf->cyls, conf->heads, conf->secs);
> >>>>      }
> >>>
> >>> But why is the new disk even attached to the controller and visible to
> >>> the guest at this point when it hasn't completed its initialisation yet?
> >>> Isn't the root problem that we're initialising things in the wrong
> >>> order?
> >>
> >> Well, the root cause then is that scsi_device_find is just reusing the
> >> list of devices on the SCSI bus.  Devices are added to that list very
> >> early by qdev_set_parent_bus.
> >>
> >> Stefan's patch could make the issue even harder to hit, but I think that
> >> with iothreads you could hit it anyway.
> >>
> >> The solution is probably to add an "online" flag to the device, and set
> >> it in scsi_device_realize.  But even that has the issue that access to
> >> the list is not protected with a lock.  What do you think?
> > 
> > Can main thread somehow call aio_context_acquire(vs->ctx) (and release) around
> > qdev_set_parent_bus()? virtio_scsi_device_find() takes the lock.
> 
> No, the context is not set yet.  But the locking is easy to add,
> separately from the bug that Zhengui is reporting.

Do you want to submit a patch for this instead of the
aio_disable_external() approach I posted?  I think your idea is cleaner
than modifying the geometry probing function.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [Qemu-block] question: I found a qemu crash when attach virtio-scsi disk
  2017-11-09 11:33           ` Stefan Hajnoczi
@ 2017-11-09 11:43             ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2017-11-09 11:43 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, Kevin Wolf, lizhengui, qemu-block, Fangyi (C),
	jcody, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

On 09/11/2017 12:33, Stefan Hajnoczi wrote:
>>> Can main thread somehow call aio_context_acquire(vs->ctx) (and release) around
>>> qdev_set_parent_bus()? virtio_scsi_device_find() takes the lock.
>> No, the context is not set yet.  But the locking is easy to add,
>> separately from the bug that Zhengui is reporting.
> Do you want to submit a patch for this instead of the
> aio_disable_external() approach I posted?  I think your idea is cleaner
> than modifying the geometry probing function.

Yes, I will.  Both the locking and the "online" status.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-11-09 11:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-01  6:42 [Qemu-devel] question: I found a qemu crash when attach virtio-scsi disk lizhengui
2017-11-03 10:26 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-11-06 16:11   ` Kevin Wolf
2017-11-06 16:33     ` Paolo Bonzini
2017-11-07  1:59       ` Fam Zheng
2017-11-07 10:39         ` Paolo Bonzini
2017-11-09 11:33           ` Stefan Hajnoczi
2017-11-09 11:43             ` Paolo Bonzini
     [not found]   ` <68B56AECEFB25A418ADB9417F6178A531091E80A@dggemi507-mbx.china.huawei.com>
2017-11-06 19:06     ` Stefan Hajnoczi
2017-11-07  2:19       ` [Qemu-devel] Re " lizhengui

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.