From: Fiona Ebner <f.ebner@proxmox.com>
To: qemu-block@nongnu.org, Kevin Wolf <kwolf@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Cc: richard.henderson@linaro.org, qemu-devel@nongnu.org,
	Thomas Lamprecht <t.lamprecht@proxmox.com>,
	Hanna Reitz <hreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <fam@euphon.net>,
	"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [PULL 29/32] virtio-blk: implement BlockDevOps->drained_begin()
Date: Mon, 13 Nov 2023 15:38:24 +0100
Message-ID: <a8de02ee-913d-425a-8c08-e103e153ed39@proxmox.com>
In-Reply-To: <ee6374dc-c644-449f-b5d1-788695e1a83e@proxmox.com>

On 03.11.23 at 14:12, Fiona Ebner wrote:
> Hi,
> 
> On 30.05.23 at 18:32, Kevin Wolf wrote:
>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>
>> Detach ioeventfds during drained sections to stop I/O submission from
>> the guest. virtio-blk is no longer reliant on aio_disable_external()
>> after this patch. This will allow us to remove the
>> aio_disable_external() API once all other code that relies on it is
>> converted.
>>
>> Take extra care to avoid attaching/detaching ioeventfds if the data
>> plane is started/stopped during a drained section. This should be rare,
>> but maybe the mirror block job can trigger it.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Message-Id: <20230516190238.8401-18-stefanha@redhat.com>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> 
> A while ago, I ran into a strange issue where guest IO would get
> completely stuck during certain block jobs, and I finally managed to
> find a small reproducer [0]. I'm using a VM with virtio-blk-pci (or
> virtio-scsi-pci) with an iothread and running
> 
> fio --name=file --size=100M --direct=1 --rw=randwrite --bs=4k
> --ioengine=psync --numjobs=5 --runtime=1200 --time_based
> 
> in the guest. Then I'm issuing the QMP command with the reproducer in a
> loop. Usually, the guest IO will get stuck after about 1-3 minutes;
> sometimes fio manages to continue at a lower speed for a while (but
> trying to Ctrl+C it or doing other IO in the guest will already be
> broken), which I guess could be a hint that it's an issue with notifiers?
> 
> Bisecting (to declare a commit good, I waited 10 minutes) led me to this
> patch, i.e. commit 1665d9326f ("virtio-blk: implement
> BlockDevOps->drained_begin()") and for SCSI, I verified that the issue
> similarly starts happening after 766aa2de0f ("virtio-scsi: implement
> BlockDevOps->drained_begin()").
> 
> Both issues are still present on current master (i.e. 1c98a821a2
> ("tests/qtest: Introduce tests for AMD/Xilinx Versal TRNG device")).
> 
> Happy to provide more information and hints about how to debug the issue
> further.
> 

Of course, I meant "and for hints" ;)

I should also mention that when IO is stuck, for the two
BlockDriverStates (i.e. bdrv_raw and bdrv_file) and the BlockBackend,
in_flight and quiesce_counter are 0, tracked_requests (respectively
queued_requests for the BlockBackend) is empty, and quiesced_parent is
false for the parents.
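
In case it is useful, this is roughly how that state can be inspected
on the stuck process with gdb (just a sketch; the convenience variables
need to be set to the actual BlockDriverState/BlockBackend pointers
first, e.g. from a breakpoint in bdrv_drain_all_begin()):

  (gdb) set $bs = (BlockDriverState *)0x...   # e.g. the bdrv_file node
  (gdb) set $blk = (BlockBackend *)0x...      # the virtio-blk backend
  (gdb) p $bs->in_flight
  (gdb) p $bs->quiesce_counter
  (gdb) p $bs->tracked_requests
  (gdb) p $blk->in_flight
  (gdb) p $blk->quiesce_counter
  (gdb) p $bs->parents.lh_first->quiesced_parent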

Two observations:

1. I found that issuing QMP 'stop' and 'cont' allows guest IO to get
unstuck. I'm pretty sure it's the virtio_blk_data_plane_stop/start
calls they trigger that do it.
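
For reference, a minimal sketch of how 'stop'/'cont' can be issued over
QMP, assuming the VM was started with e.g.
'-qmp unix:/tmp/qmp.sock,server=on,wait=off' (the socket path is just
an example):

  $ ./scripts/qmp/qmp-shell /tmp/qmp.sock
  (QEMU) stop
  (QEMU) cont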

2. While experimenting, I found that with the change [1] below in
aio_poll(), I wasn't able to trigger the issue anymore (I let my
reproducer run for 40 minutes).

Best Regards,
Fiona

[1]:

> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 7f2c99729d..dff9ad4148 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -655,7 +655,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      /* If polling is allowed, non-blocking aio_poll does not need the
>       * system call---a single round of run_poll_handlers_once suffices.
>       */
> -    if (timeout || ctx->fdmon_ops->need_wait(ctx)) {
> +    if (1) { //timeout || ctx->fdmon_ops->need_wait(ctx)) {
>          /*
>           * Disable poll mode. poll mode should be disabled before the call
>           * of ctx->fdmon_ops->wait() so that guest's notification can wake


> [0]:
> 
>> diff --git a/blockdev.c b/blockdev.c
>> index db2725fe74..bf2e0fc22c 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -2986,6 +2986,11 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
>>      bool zero_target;
>>      int ret;
>>  
>> +    bdrv_drain_all_begin();
>> +    bdrv_drain_all_end();
>> +    return;
>> +
>> +
>>      bs = qmp_get_root_bs(arg->device, errp);
>>      if (!bs) {
>>          return;
> 
> 
> 
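
For illustration, a loop along these lines can be used to issue the QMP
command repeatedly (just a sketch; drive name, target path and socket
path are assumptions, and with the reproducer applied the target is
never actually used):

  $ while true; do
        printf '%s\n' '{"execute": "qmp_capabilities"}' \
            '{"execute": "drive-mirror", "arguments": {"device": "drive0", "target": "/tmp/mirror.raw", "sync": "full", "format": "raw"}}' \
            | socat - UNIX-CONNECT:/tmp/qmp.sock
        sleep 1
    done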



