qemu-devel.nongnu.org archive mirror
From: Zhenyu Ye <yezhenyu2@huawei.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: fam@euphon.net, qemu-block@nongnu.org, armbru@redhat.com,
	xiexiangyou@huawei.com, qemu-devel@nongnu.org,
	stefanha@redhat.com, pbonzini@redhat.com, mreitz@redhat.com
Subject: Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions
Date: Tue, 11 Aug 2020 21:54:08 +0800	[thread overview]
Message-ID: <c6d75e49-3e36-6a76-fdc8-cdf09e7c3393@huawei.com> (raw)
In-Reply-To: <20200810153811.GF14538@linux.fritz.box>

Hi Kevin,

On 2020/8/10 23:38, Kevin Wolf wrote:
> On 10.08.2020 at 16:52, Zhenyu Ye wrote:
>> Before doing QMP actions, we need to lock the qemu_global_mutex,
>> so the QMP actions should not take too long.
>>
>> Unfortunately, some QMP actions need to acquire an AioContext, and
>> this may take a long time.  The VM will soft lockup if it takes
>> too long.
> 
> Do you have a specific situation in mind where getting the lock of an
> AioContext can take a long time? I know that the main thread can
> block for considerable time, but QMP commands run in the main thread, so
> this patch doesn't change anything for this case. It would be effective
> if an iothread blocks, but shouldn't everything running in an iothread
> be asynchronous and therefore keep the AioContext lock only for a short
> time?
> 

Theoretically, everything running in an iothread is asynchronous. However,
some 'asynchronous' actions are not entirely non-blocking, such as
io_submit().  It can block when the iodepth is too large and the I/O pressure
is too high.  If we perform QMP actions, such as 'info block', at such a time,
the VM may soft lockup.  This series makes these QMP actions safer.
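
To make the failure mode concrete, the pattern involved looks roughly like the
sketch below.  This is a simplified illustration of how a monitor command
reaches into an iothread's AioContext, not the actual QEMU code; 'bs' stands
for the BlockDriverState being queried:

	/* Simplified sketch; not the real code path.  The QMP handler runs
	 * in the main thread and already holds the BQL (qemu_global_mutex)
	 * when it gets here. */
	AioContext *ctx = bdrv_get_aio_context(bs);

	aio_context_acquire(ctx);   /* blocks while the owning iothread is
	                             * stuck inside a long io_submit() */
	/* ... collect the block info while holding the lock ... */
	aio_context_release(ctx);

While the main thread waits here it keeps holding the BQL, so everything else
that needs that lock stalls as well, which is presumably what the guest
watchdog ends up reporting below.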

I constructed the scenario as follows:
1. create a VM with 4 disks, using iothreads.
2. put CPU pressure on the host.  In my test, the host CPU usage exceeds 95%.
3. put I/O pressure on the 4 disks in the VM at the same time.  I used fio;
the relevant parameters are (a complete command line is sketched after step 4):

	 fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4

4. perform block query actions, for example via virsh:

	virsh qemu-monitor-command [vm name] --hmp info block
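
For reference, a complete fio invocation along those lines might look like the
following.  The job name and target device are placeholders I added, and
-direct=1 is added because libaio jobs are usually only truly asynchronous
with O_DIRECT; only the parameters from step 3 come from the original test:

	# job name and -filename target are placeholders, not from the original test
	fio -name=press -filename=/dev/vdb -direct=1 -rw=randrw -bs=1M \
	    -size=1G -iodepth=512 -ioengine=libaio -numjobs=4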

Then the VM soft locks up; the call trace is:

[  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! [kworker/1:1:33]
[  192.314241] Kernel panic - not syncing: softlockup: hung tasks
[  192.316370] CPU: 1 PID: 33 Comm: kworker/1:1 Kdump: loaded Tainted: G           OEL    4.19.36+ #16
[  192.318765] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  192.325638] Workqueue: events drm_fb_helper_dirty_work
[  192.327238] Call trace:
[  192.331528]  dump_backtrace+0x0/0x198
[  192.332695]  show_stack+0x24/0x30
[  192.334219]  dump_stack+0xa4/0xcc
[  192.335578]  panic+0x12c/0x314
[  192.336565]  watchdog_timer_fn+0x3e4/0x3e8
[  192.339984]  __hrtimer_run_queues+0x114/0x358
[  192.341576]  hrtimer_interrupt+0x104/0x2d8
[  192.343247]  arch_timer_handler_virt+0x38/0x58
[  192.345074]  handle_percpu_devid_irq+0x90/0x248
[  192.347238]  generic_handle_irq+0x34/0x50
[  192.349536]  __handle_domain_irq+0x68/0xc0
[  192.351193]  gic_handle_irq+0x6c/0x150
[  192.352639]  el1_irq+0xb8/0x140
[  192.353855]  vp_notify+0x28/0x38 [virtio_pci]
[  192.355646]  virtqueue_kick+0x3c/0x78 [virtio_ring]
[  192.357539]  virtio_gpu_queue_ctrl_buffer_locked+0x180/0x248 [virtio_gpu]
[  192.359869]  virtio_gpu_queue_ctrl_buffer+0x50/0x78 [virtio_gpu]
[  192.361456]  virtio_gpu_cmd_resource_flush+0x8c/0xb0 [virtio_gpu]
[  192.363422]  virtio_gpu_surface_dirty+0x60/0x110 [virtio_gpu]
[  192.365215]  virtio_gpu_framebuffer_surface_dirty+0x34/0x48 [virtio_gpu]
[  192.367452]  drm_fb_helper_dirty_work+0x178/0x1c0
[  192.368912]  process_one_work+0x1b4/0x3f8
[  192.370192]  worker_thread+0x54/0x470
[  192.371370]  kthread+0x134/0x138
[  192.379241]  ret_from_fork+0x10/0x18
[  192.382688] kernel fault(0x5) notification starting on CPU 1
[  192.385059] kernel fault(0x5) notification finished on CPU 1
[  192.387311] SMP: stopping secondary CPUs
[  192.391024] Kernel Offset: disabled
[  192.392111] CPU features: 0x0,a1806008
[  192.393306] Memory Limit: none
[  192.396701] Starting crashdump kernel...
[  192.397821] Bye!

This problem can be avoided after this series is applied.
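
As a rough sketch of what the series is meant to enable: the exact signature
and semantics of aio_context_acquire_timeout() below are my assumption based
on the patch subjects, not taken from the patch code.  A QMP handler could
then fail the command instead of hanging:

	/* Assumed interface; see patches 1/2 and 2/2 for the real one. */
	if (!aio_context_acquire_timeout(ctx, 5 * NANOSECONDS_PER_SECOND)) {
	    error_setg(errp, "timed out acquiring the device's AioContext");
	    return;
	}
	/* ... run the block query as before ... */
	aio_context_release(ctx);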

Thanks,
Zhenyu



Thread overview: 28+ messages
2020-08-10 14:52 [PATCH v1 0/2] Add timeout mechanism to qmp actions Zhenyu Ye
2020-08-10 14:52 ` [PATCH v1 1/2] util: introduce aio_context_acquire_timeout Zhenyu Ye
2020-08-10 14:52 ` [PATCH v1 2/2] qmp: use aio_context_acquire_timeout replace aio_context_acquire Zhenyu Ye
2020-08-10 15:38 ` [PATCH v1 0/2] Add timeout mechanism to qmp actions Kevin Wolf
2020-08-11 13:54   ` Zhenyu Ye [this message]
2020-08-21 12:52     ` Stefan Hajnoczi
2020-09-14 13:27     ` Stefan Hajnoczi
2020-09-17  7:36       ` Zhenyu Ye
2020-09-17 10:10         ` Fam Zheng
2020-09-17 15:44         ` Stefan Hajnoczi
2020-09-17 16:01           ` Fam Zheng
2020-09-18 11:23             ` Zhenyu Ye
2020-09-18 14:06               ` Fam Zheng
2020-09-19  2:22                 ` Zhenyu Ye
2020-09-21 11:14                   ` Fam Zheng
2020-10-13 10:00                     ` Stefan Hajnoczi
2020-10-19 12:40                       ` Zhenyu Ye
2020-10-19 13:25                         ` Paolo Bonzini
2020-10-20  1:34                           ` Zhenyu Ye
2020-10-22 16:29                             ` Fam Zheng
2020-12-08 13:10                               ` Stefan Hajnoczi
2020-12-08 13:47                                 ` Glauber Costa
2020-12-14 16:33                                   ` Stefan Hajnoczi
2020-12-21 11:30                                     ` Zhenyu Ye
2020-09-14 14:42     ` Daniel P. Berrangé
2020-09-17  8:12       ` Zhenyu Ye
2020-08-12 13:51 ` Stefan Hajnoczi
2020-08-13  1:51   ` Zhenyu Ye
