All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v0 0/2] Postponed actions
@ 2018-06-29 12:40 Denis Plotnikov
  2018-06-29 12:40 ` [Qemu-devel] [PATCH v0 1/2] async: add infrastructure for postponed actions Denis Plotnikov
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Denis Plotnikov @ 2018-06-29 12:40 UTC (permalink / raw)
  To: kwolf, reitz, stefanha, famz, qemu-stable; +Cc: qemu-block, qemu-devel

There are cases when a request to a block driver state shouldn't have
appeared producing dangerous race conditions.
This misbehaviour is usually happens with storage devices emulated
without eventfd for guest to host notifications like IDE.

The issue arises when the context is in the "drained" section
and doesn't expect the request to come, but request comes from the
device not using iothread and which context is processed by the main loop.

The main loop apart of the iothread event loop isn't blocked by the
"drained" section.
The request coming and processing while in "drained" section can spoil the
block driver state consistency.

This behavior can be observed in the following KVM-based case:

1. Setup a VM with an IDE disk.
2. Inside a VM start a disk writing load for the IDE device
  e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
3. On the host create a mirroring block job for the IDE device
  e.g: drive_mirror <your_IDE> <your_path>
4. On the host finish the block job
  e.g: block_job_complete <your_IDE>
 
Having done the 4th action, you could get an assert:
assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
On my setup, the assert is 1/3 reproducible.

The patch series introduces the mechanism to postpone the requests
until the BDS leaves "drained" section for the devices not using iothreads.
Also, it modifies the asynchronous block backend infrastructure to use
that mechanism to release the assert bug for IDE devices.

Denis Plotnikov (2):
  async: add infrastructure for postponed actions
  block: postpone the coroutine executing if the BDS's is drained

 block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
 include/block/aio.h   | 63 +++++++++++++++++++++++++++++++++++++++++++
 util/async.c          | 33 +++++++++++++++++++++++
 3 files changed, 142 insertions(+), 12 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v0 1/2] async: add infrastructure for postponed actions
  2018-06-29 12:40 [Qemu-devel] [PATCH v0 0/2] Postponed actions Denis Plotnikov
@ 2018-06-29 12:40 ` Denis Plotnikov
  2018-06-29 12:40 ` [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained Denis Plotnikov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Denis Plotnikov @ 2018-06-29 12:40 UTC (permalink / raw)
  To: kwolf, reitz, stefanha, famz, qemu-stable; +Cc: qemu-block, qemu-devel

There is the concept of iothreads servicing Block Driver States
via processing events on the corresponding aio context. Also,
there is a mechanism called "drained section" which is kind of
critical section, preventing an instance of Block Driver State
from processing external requests.

The "drained section" is respected by iothreads. While processing the
the event loop, the iothread stops its event loop from running until
the current context has no Block Driver States in drained section.

This scheme works for devices are able to work with iothread only.
There are other devices, e.g. ide, which don't work with iothreads.
The requests to those devices are processed by the main event loop
which doesn't stop processing the events on its context when there are
in "drained section" BlockDriverState-s.

Thereby, there is a case when the request can be processed when
the BDS don't expect that, for example, ide controller makes a request
when the VM finalizes the drive mirroring. This could lead to spoiling
the data consistency on those BDS'es.

To prevent this situation, the patch introduces the infrastructure
for postponing actions. The infrastructure allows postponing the
actions for the BDS to the moment when BDS's context becomes enabled
for external requests. When the context becomes enabled for external
requests all the postponed action are executed.

The not-iothread-friendly devices can use the infrastructure to
postpone the execution of the requests appeared when the context was
in "drained section".

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/block/aio.h | 63 +++++++++++++++++++++++++++++++++++++++++++++
 util/async.c        | 33 ++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index ae6f354e6c..ca61009e57 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -46,11 +46,24 @@ typedef struct AioHandler AioHandler;
 typedef void QEMUBHFunc(void *opaque);
 typedef bool AioPollFn(void *opaque);
 typedef void IOHandler(void *opaque);
+typedef void AioPostponedFunc(void *opaque);
 
 struct Coroutine;
 struct ThreadPool;
 struct LinuxAioState;
 
+/**
+ * Struct for postponing the actions
+ */
+typedef struct AioPostponedAction {
+    /* A function to run on context enabling */
+    AioPostponedFunc *func;
+    /* Param to be passed to the function */
+    void *func_param;
+    /**/
+    QSLIST_ENTRY(AioPostponedAction) next_action;
+} AioPostponedAction;
+
 struct AioContext {
     GSource source;
 
@@ -110,6 +123,16 @@ struct AioContext {
     EventNotifier notifier;
 
     QSLIST_HEAD(, Coroutine) scheduled_coroutines;
+
+    /* list of postponed actions
+     * The actions (might be thought as requests) which have been postponed
+     * because of the drained section was entered at the moment of their
+     * appearing.
+     * All these actions have to be run wheen the context is enabled for
+     * external requests.
+     */
+    QSLIST_HEAD(, AioPostponedAction) postponed_actions;
+
     QEMUBH *co_schedule_bh;
 
     /* Thread pool for performing work and receiving completion callbacks.
@@ -435,6 +458,41 @@ static inline void aio_timer_init(AioContext *ctx,
  */
 int64_t aio_compute_timeout(AioContext *ctx);
 
+/**
+ * aio_create_postponed_action:
+ * @ctx: the aio context
+ * @func: the function to postpone
+ * @param: the parameter to passed to the function
+ *
+ * Create a postponed action.
+ */
+AioPostponedAction *aio_create_postponed_action(
+                        AioPostponedFunc func, void *func_param);
+
+/**
+ * aio_postpone_action:
+ * @ctx: the aio context
+ * @action: the function and the parameter to passed to the function
+ *
+ * Queue an ation to the queue. The queue are processed and the actions
+ * are executed when the context becomes available to the external
+ * requests.
+ *
+ * Should be invoked under aio_context_acquire/release.
+ */
+void aio_postpone_action(AioContext *ctx, AioPostponedAction *action);
+
+/**
+ * aio_run_postponed_actions:
+ * @ctx: the aio context
+ *
+ * The function invokes all the actions queued to postponed action
+ * context queue.
+ *
+ * Should be invoked under aio_context_acquire/release.
+ */
+void aio_run_postponed_actions(AioContext *ctx);
+
 /**
  * aio_disable_external:
  * @ctx: the aio context
@@ -443,7 +501,9 @@ int64_t aio_compute_timeout(AioContext *ctx);
  */
 static inline void aio_disable_external(AioContext *ctx)
 {
+    aio_context_acquire(ctx);
     atomic_inc(&ctx->external_disable_cnt);
+    aio_context_release(ctx);
 }
 
 /**
@@ -456,12 +516,15 @@ static inline void aio_enable_external(AioContext *ctx)
 {
     int old;
 
+    aio_context_acquire(ctx);
     old = atomic_fetch_dec(&ctx->external_disable_cnt);
     assert(old > 0);
     if (old == 1) {
+        aio_run_postponed_actions(ctx);
         /* Kick event loop so it re-arms file descriptors */
         aio_notify(ctx);
     }
+    aio_context_release(ctx);
 }
 
 /**
diff --git a/util/async.c b/util/async.c
index 03f62787f2..e5fa35972e 100644
--- a/util/async.c
+++ b/util/async.c
@@ -278,6 +278,7 @@ aio_ctx_finalize(GSource     *source)
 #endif
 
     assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
+    assert(QSLIST_EMPTY(&ctx->postponed_actions));
     qemu_bh_delete(ctx->co_schedule_bh);
 
     qemu_lockcnt_lock(&ctx->list_lock);
@@ -415,6 +416,7 @@ AioContext *aio_context_new(Error **errp)
 
     ctx->co_schedule_bh = aio_bh_new(ctx, co_schedule_bh_cb, ctx);
     QSLIST_INIT(&ctx->scheduled_coroutines);
+    QSLIST_INIT(&ctx->postponed_actions);
 
     aio_set_event_notifier(ctx, &ctx->notifier,
                            false,
@@ -507,3 +509,34 @@ void aio_context_release(AioContext *ctx)
 {
     qemu_rec_mutex_unlock(&ctx->lock);
 }
+
+AioPostponedAction *aio_create_postponed_action(
+                        AioPostponedFunc func, void *func_param)
+{
+    AioPostponedAction *action = g_malloc(sizeof(AioPostponedAction));
+    action->func = func;
+    action->func_param = func_param;
+    return action;
+}
+
+static void aio_destroy_postponed_action(AioPostponedAction *action)
+{
+    g_free(action);
+}
+
+/* should be run under aio_context_aquire/release */
+void aio_postpone_action(AioContext *ctx, AioPostponedAction *action)
+{
+    QSLIST_INSERT_HEAD(&ctx->postponed_actions, action, next_action);
+}
+
+/* should be run under aio_context_aquire/release */
+void aio_run_postponed_actions(AioContext *ctx)
+{
+    while(!QSLIST_EMPTY(&ctx->postponed_actions)) {
+        AioPostponedAction *action = QSLIST_FIRST(&ctx->postponed_actions);
+        QSLIST_REMOVE_HEAD(&ctx->postponed_actions, next_action);
+        action->func(action->func_param);
+        aio_destroy_postponed_action(action);
+    }
+}
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained
  2018-06-29 12:40 [Qemu-devel] [PATCH v0 0/2] Postponed actions Denis Plotnikov
  2018-06-29 12:40 ` [Qemu-devel] [PATCH v0 1/2] async: add infrastructure for postponed actions Denis Plotnikov
@ 2018-06-29 12:40 ` Denis Plotnikov
  2018-09-10 12:41   ` Kevin Wolf
  2018-07-02  1:47 ` [Qemu-devel] [PATCH v0 0/2] Postponed actions no-reply
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Denis Plotnikov @ 2018-06-29 12:40 UTC (permalink / raw)
  To: kwolf, reitz, stefanha, famz, qemu-stable; +Cc: qemu-block, qemu-devel

Fixes the problem of ide request appearing when the BDS is in
the "drained section".

Without the patch the request can come and be processed by the main
event loop, as the ide requests are processed by the main event loop
and the main event loop doesn't stop when its context is in the
"drained section".
The request execution is postponed until the end of "drained section".

The patch doesn't modify ide specific code, as well as any other
device code. Instead, it modifies the infrastructure of asynchronous
Block Backend requests, in favor of postponing the requests arisen
when in "drained section" to remove the possibility of request appearing
for all the infrastructure clients.

This approach doesn't make vCPU processing the request wait untill
the end of request processing.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 block/block-backend.c | 58 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 46 insertions(+), 12 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d55c328736..68dcd704d2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1318,6 +1318,7 @@ typedef struct BlkAioEmAIOCB {
     BlkRwCo rwco;
     int bytes;
     bool has_returned;
+    CoroutineEntry *co_entry;
 } BlkAioEmAIOCB;
 
 static const AIOCBInfo blk_aio_em_aiocb_info = {
@@ -1340,16 +1341,55 @@ static void blk_aio_complete_bh(void *opaque)
     blk_aio_complete(acb);
 }
 
+static void blk_aio_create_co(void *opaque)
+{
+    BlockDriverState *current_bs;
+    AioContext *ctx;
+    BlkAioEmAIOCB *acb = (BlkAioEmAIOCB *) opaque;
+    BlockBackend *blk = acb->rwco.blk;
+
+    /* The check makes sense if the action was postponed until the context
+     * is enabled for external requests: if a BlockDriverState of a BlockBackend
+     * was changed, for example on making a new snapshot, update BlockDriverState
+     * in ACB and try to run the coroutine in the changed BDS context
+     */
+    current_bs = blk_bs(blk);
+
+    if (current_bs != acb->common.bs) {
+        acb->common.bs = current_bs;
+    }
+
+    ctx = blk_get_aio_context(blk);
+    /* If a request comes from device (e.g. ide controller) when
+     * the context disabled, postpone the request until the context is
+     * enabled for external requests.
+     * Otherwise, create a couroutine and enter it right now
+     */
+    aio_context_acquire(ctx);
+    if (aio_external_disabled(ctx)) {
+        AioPostponedAction *action = aio_create_postponed_action(
+                                                blk_aio_create_co, acb);
+        aio_postpone_action(ctx, action);
+    } else {
+        Coroutine *co = qemu_coroutine_create(acb->co_entry, acb);
+        blk_inc_in_flight(blk);
+        bdrv_coroutine_enter(acb->common.bs, co);
+
+        acb->has_returned = true;
+        if (acb->rwco.ret != NOT_DONE) {
+            aio_bh_schedule_oneshot(ctx, blk_aio_complete_bh, acb);
+        }
+    }
+    aio_context_release(ctx);
+}
+
 static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
                                 void *iobuf, CoroutineEntry co_entry,
                                 BdrvRequestFlags flags,
                                 BlockCompletionFunc *cb, void *opaque)
 {
-    BlkAioEmAIOCB *acb;
-    Coroutine *co;
 
-    blk_inc_in_flight(blk);
-    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+    BlkAioEmAIOCB *acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
         .offset = offset,
@@ -1359,15 +1399,9 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     };
     acb->bytes = bytes;
     acb->has_returned = false;
+    acb->co_entry = co_entry;
 
-    co = qemu_coroutine_create(co_entry, acb);
-    bdrv_coroutine_enter(blk_bs(blk), co);
-
-    acb->has_returned = true;
-    if (acb->rwco.ret != NOT_DONE) {
-        aio_bh_schedule_oneshot(blk_get_aio_context(blk),
-                                blk_aio_complete_bh, acb);
-    }
+    blk_aio_create_co(acb);
 
     return &acb->common;
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 0/2] Postponed actions
  2018-06-29 12:40 [Qemu-devel] [PATCH v0 0/2] Postponed actions Denis Plotnikov
  2018-06-29 12:40 ` [Qemu-devel] [PATCH v0 1/2] async: add infrastructure for postponed actions Denis Plotnikov
  2018-06-29 12:40 ` [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained Denis Plotnikov
@ 2018-07-02  1:47 ` no-reply
  2018-07-02 15:18 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-07-16 15:01 ` [Qemu-devel] " Denis Plotnikov
  4 siblings, 0 replies; 25+ messages in thread
From: no-reply @ 2018-07-02  1:47 UTC (permalink / raw)
  To: dplotnikov; +Cc: famz, kwolf, reitz, stefanha

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180629124052.331406-1-dplotnikov@virtuozzo.com
Subject: [Qemu-devel] [PATCH v0 0/2] Postponed actions

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
1c24f9f035 block: postpone the coroutine executing if the BDS's is drained
0d33eb4378 async: add infrastructure for postponed actions

=== OUTPUT BEGIN ===
Checking PATCH 1/2: async: add infrastructure for postponed actions...
ERROR: space required before the open parenthesis '('
#202: FILE: util/async.c:544:
+    while(!QSLIST_EMPTY(&ctx->postponed_actions)) {

total: 1 errors, 0 warnings, 153 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/2: block: postpone the coroutine executing if the BDS's is drained...
WARNING: line over 80 characters
#53: FILE: block/block-backend.c:1357:
+     * was changed, for example on making a new snapshot, update BlockDriverState

total: 0 errors, 1 warnings, 83 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
  2018-06-29 12:40 [Qemu-devel] [PATCH v0 0/2] Postponed actions Denis Plotnikov
                   ` (2 preceding siblings ...)
  2018-07-02  1:47 ` [Qemu-devel] [PATCH v0 0/2] Postponed actions no-reply
@ 2018-07-02 15:18 ` Stefan Hajnoczi
  2018-07-17 10:31   ` Stefan Hajnoczi
  2018-07-16 15:01 ` [Qemu-devel] " Denis Plotnikov
  4 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2018-07-02 15:18 UTC (permalink / raw)
  To: Denis Plotnikov
  Cc: kwolf, reitz, stefanha, famz, qemu-stable, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2117 bytes --]

On Fri, Jun 29, 2018 at 03:40:50PM +0300, Denis Plotnikov wrote:
> There are cases when a request to a block driver state shouldn't have
> appeared producing dangerous race conditions.
> This misbehaviour is usually happens with storage devices emulated
> without eventfd for guest to host notifications like IDE.
> 
> The issue arises when the context is in the "drained" section
> and doesn't expect the request to come, but request comes from the
> device not using iothread and which context is processed by the main loop.
> 
> The main loop apart of the iothread event loop isn't blocked by the
> "drained" section.
> The request coming and processing while in "drained" section can spoil the
> block driver state consistency.
> 
> This behavior can be observed in the following KVM-based case:
> 
> 1. Setup a VM with an IDE disk.
> 2. Inside a VM start a disk writing load for the IDE device
>   e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
> 3. On the host create a mirroring block job for the IDE device
>   e.g: drive_mirror <your_IDE> <your_path>
> 4. On the host finish the block job
>   e.g: block_job_complete <your_IDE>
>  
> Having done the 4th action, you could get an assert:
> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
> On my setup, the assert is 1/3 reproducible.
> 
> The patch series introduces the mechanism to postpone the requests
> until the BDS leaves "drained" section for the devices not using iothreads.
> Also, it modifies the asynchronous block backend infrastructure to use
> that mechanism to release the assert bug for IDE devices.

I don't understand the scenario.  IDE emulation runs in the vcpu and
main loop threads.  These threads hold the global mutex when executing
QEMU code.  If thread A is in a drained region with the global mutex,
then thread B cannot run QEMU code since it would need to global mutex.

So I guess the problem is not that thread B will submit new requests,
but maybe that the IDE DMA code will run a completion in thread A and
submit another request in the drained region?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 0/2] Postponed actions
  2018-06-29 12:40 [Qemu-devel] [PATCH v0 0/2] Postponed actions Denis Plotnikov
                   ` (3 preceding siblings ...)
  2018-07-02 15:18 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-07-16 15:01 ` Denis Plotnikov
  2018-07-16 18:59   ` [Qemu-devel] [Qemu-block] " John Snow
  4 siblings, 1 reply; 25+ messages in thread
From: Denis Plotnikov @ 2018-07-16 15:01 UTC (permalink / raw)
  To: kwolf, reitz, stefanha, famz, qemu-stable; +Cc: qemu-devel, qemu-block

Ping!

On 29.06.2018 15:40, Denis Plotnikov wrote:
> There are cases when a request to a block driver state shouldn't have
> appeared producing dangerous race conditions.
> This misbehaviour is usually happens with storage devices emulated
> without eventfd for guest to host notifications like IDE.
> 
> The issue arises when the context is in the "drained" section
> and doesn't expect the request to come, but request comes from the
> device not using iothread and which context is processed by the main loop.
> 
> The main loop apart of the iothread event loop isn't blocked by the
> "drained" section.
> The request coming and processing while in "drained" section can spoil the
> block driver state consistency.
> 
> This behavior can be observed in the following KVM-based case:
> 
> 1. Setup a VM with an IDE disk.
> 2. Inside a VM start a disk writing load for the IDE device
>    e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
> 3. On the host create a mirroring block job for the IDE device
>    e.g: drive_mirror <your_IDE> <your_path>
> 4. On the host finish the block job
>    e.g: block_job_complete <your_IDE>
>   
> Having done the 4th action, you could get an assert:
> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
> On my setup, the assert is 1/3 reproducible.
> 
> The patch series introduces the mechanism to postpone the requests
> until the BDS leaves "drained" section for the devices not using iothreads.
> Also, it modifies the asynchronous block backend infrastructure to use
> that mechanism to release the assert bug for IDE devices.
> 
> Denis Plotnikov (2):
>    async: add infrastructure for postponed actions
>    block: postpone the coroutine executing if the BDS's is drained
> 
>   block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
>   include/block/aio.h   | 63 +++++++++++++++++++++++++++++++++++++++++++
>   util/async.c          | 33 +++++++++++++++++++++++
>   3 files changed, 142 insertions(+), 12 deletions(-)
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  [PATCH v0 0/2] Postponed actions
  2018-07-16 15:01 ` [Qemu-devel] " Denis Plotnikov
@ 2018-07-16 18:59   ` John Snow
  2018-07-18  7:53     ` Denis Plotnikov
  2018-08-13  8:32     ` Denis Plotnikov
  0 siblings, 2 replies; 25+ messages in thread
From: John Snow @ 2018-07-16 18:59 UTC (permalink / raw)
  To: Denis Plotnikov, kwolf, mreitz, stefanha, famz, qemu-stable
  Cc: qemu-devel, qemu-block



On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
> Ping!
> 

I never saw a reply to Stefan's question on July 2nd, did you reply
off-list?

--js

> On 29.06.2018 15:40, Denis Plotnikov wrote:
>> There are cases when a request to a block driver state shouldn't have
>> appeared producing dangerous race conditions.
>> This misbehaviour is usually happens with storage devices emulated
>> without eventfd for guest to host notifications like IDE.
>>
>> The issue arises when the context is in the "drained" section
>> and doesn't expect the request to come, but request comes from the
>> device not using iothread and which context is processed by the main
>> loop.
>>
>> The main loop apart of the iothread event loop isn't blocked by the
>> "drained" section.
>> The request coming and processing while in "drained" section can spoil
>> the
>> block driver state consistency.
>>
>> This behavior can be observed in the following KVM-based case:
>>
>> 1. Setup a VM with an IDE disk.
>> 2. Inside a VM start a disk writing load for the IDE device
>>    e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>> 3. On the host create a mirroring block job for the IDE device
>>    e.g: drive_mirror <your_IDE> <your_path>
>> 4. On the host finish the block job
>>    e.g: block_job_complete <your_IDE>
>>   Having done the 4th action, you could get an assert:
>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>> On my setup, the assert is 1/3 reproducible.
>>
>> The patch series introduces the mechanism to postpone the requests
>> until the BDS leaves "drained" section for the devices not using
>> iothreads.
>> Also, it modifies the asynchronous block backend infrastructure to use
>> that mechanism to release the assert bug for IDE devices.
>>
>> Denis Plotnikov (2):
>>    async: add infrastructure for postponed actions
>>    block: postpone the coroutine executing if the BDS's is drained
>>
>>   block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
>>   include/block/aio.h   | 63 +++++++++++++++++++++++++++++++++++++++++++
>>   util/async.c          | 33 +++++++++++++++++++++++
>>   3 files changed, 142 insertions(+), 12 deletions(-)
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
  2018-07-02 15:18 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-07-17 10:31   ` Stefan Hajnoczi
  0 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2018-07-17 10:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Denis Plotnikov, kwolf, mreitz, famz, qemu-stable, qemu-devel,
	qemu-block

[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]

On Mon, Jul 02, 2018 at 04:18:43PM +0100, Stefan Hajnoczi wrote:
> On Fri, Jun 29, 2018 at 03:40:50PM +0300, Denis Plotnikov wrote:
> > There are cases when a request to a block driver state shouldn't have
> > appeared producing dangerous race conditions.
> > This misbehaviour is usually happens with storage devices emulated
> > without eventfd for guest to host notifications like IDE.
> > 
> > The issue arises when the context is in the "drained" section
> > and doesn't expect the request to come, but request comes from the
> > device not using iothread and which context is processed by the main loop.
> > 
> > The main loop apart of the iothread event loop isn't blocked by the
> > "drained" section.
> > The request coming and processing while in "drained" section can spoil the
> > block driver state consistency.
> > 
> > This behavior can be observed in the following KVM-based case:
> > 
> > 1. Setup a VM with an IDE disk.
> > 2. Inside a VM start a disk writing load for the IDE device
> >   e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
> > 3. On the host create a mirroring block job for the IDE device
> >   e.g: drive_mirror <your_IDE> <your_path>
> > 4. On the host finish the block job
> >   e.g: block_job_complete <your_IDE>
> >  
> > Having done the 4th action, you could get an assert:
> > assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
> > On my setup, the assert is 1/3 reproducible.
> > 
> > The patch series introduces the mechanism to postpone the requests
> > until the BDS leaves "drained" section for the devices not using iothreads.
> > Also, it modifies the asynchronous block backend infrastructure to use
> > that mechanism to release the assert bug for IDE devices.
> 
> I don't understand the scenario.  IDE emulation runs in the vcpu and
> main loop threads.  These threads hold the global mutex when executing
> QEMU code.  If thread A is in a drained region with the global mutex,
> then thread B cannot run QEMU code since it would need to global mutex.
> 
> So I guess the problem is not that thread B will submit new requests,
> but maybe that the IDE DMA code will run a completion in thread A and
> submit another request in the drained region?

Ping! :)

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  [PATCH v0 0/2] Postponed actions
  2018-07-16 18:59   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2018-07-18  7:53     ` Denis Plotnikov
  2018-08-13  8:32     ` Denis Plotnikov
  1 sibling, 0 replies; 25+ messages in thread
From: Denis Plotnikov @ 2018-07-18  7:53 UTC (permalink / raw)
  To: John Snow, kwolf, mreitz, stefanha, famz, qemu-stable
  Cc: qemu-devel, qemu-block



On 16.07.2018 21:59, John Snow wrote:
> 
> 
> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>> Ping!
>>
> 
> I never saw a reply to Stefan's question on July 2nd, did you reply
> off-list?
For some reason, there are no Stefan's replies on my server. Found it in 
the web. Will respond to it shortly.

Thanks!

Denis
> 
> --js
> 
>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>> There are cases when a request to a block driver state shouldn't have
>>> appeared producing dangerous race conditions.
>>> This misbehaviour is usually happens with storage devices emulated
>>> without eventfd for guest to host notifications like IDE.
>>>
>>> The issue arises when the context is in the "drained" section
>>> and doesn't expect the request to come, but request comes from the
>>> device not using iothread and which context is processed by the main
>>> loop.
>>>
>>> The main loop apart of the iothread event loop isn't blocked by the
>>> "drained" section.
>>> The request coming and processing while in "drained" section can spoil
>>> the
>>> block driver state consistency.
>>>
>>> This behavior can be observed in the following KVM-based case:
>>>
>>> 1. Setup a VM with an IDE disk.
>>> 2. Inside a VM start a disk writing load for the IDE device
>>>     e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>> 3. On the host create a mirroring block job for the IDE device
>>>     e.g: drive_mirror <your_IDE> <your_path>
>>> 4. On the host finish the block job
>>>     e.g: block_job_complete <your_IDE>
>>>    Having done the 4th action, you could get an assert:
>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>> On my setup, the assert is 1/3 reproducible.
>>>
>>> The patch series introduces the mechanism to postpone the requests
>>> until the BDS leaves "drained" section for the devices not using
>>> iothreads.
>>> Also, it modifies the asynchronous block backend infrastructure to use
>>> that mechanism to release the assert bug for IDE devices.
>>>
>>> Denis Plotnikov (2):
>>>     async: add infrastructure for postponed actions
>>>     block: postpone the coroutine executing if the BDS's is drained
>>>
>>>    block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
>>>    include/block/aio.h   | 63 +++++++++++++++++++++++++++++++++++++++++++
>>>    util/async.c          | 33 +++++++++++++++++++++++
>>>    3 files changed, 142 insertions(+), 12 deletions(-)
>>>
>>

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  [PATCH v0 0/2] Postponed actions
  2018-07-16 18:59   ` [Qemu-devel] [Qemu-block] " John Snow
  2018-07-18  7:53     ` Denis Plotnikov
@ 2018-08-13  8:32     ` Denis Plotnikov
  2018-08-13 16:30       ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Denis Plotnikov @ 2018-08-13  8:32 UTC (permalink / raw)
  To: John Snow, kwolf, mreitz, stefanha, famz, qemu-stable
  Cc: qemu-devel, qemu-block

Ping ping!

On 16.07.2018 21:59, John Snow wrote:
> 
> 
> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>> Ping!
>>
> 
> I never saw a reply to Stefan's question on July 2nd, did you reply
> off-list?
> 
> --js
Yes, I did. I talked to Stefan why the patch set appeared.
> 
>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>> There are cases when a request to a block driver state shouldn't have
>>> appeared producing dangerous race conditions.
>>> This misbehaviour is usually happens with storage devices emulated
>>> without eventfd for guest to host notifications like IDE.
>>>
>>> The issue arises when the context is in the "drained" section
>>> and doesn't expect the request to come, but request comes from the
>>> device not using iothread and which context is processed by the main
>>> loop.
>>>
>>> The main loop apart of the iothread event loop isn't blocked by the
>>> "drained" section.
>>> The request coming and processing while in "drained" section can spoil
>>> the
>>> block driver state consistency.
>>>
>>> This behavior can be observed in the following KVM-based case:
>>>
>>> 1. Setup a VM with an IDE disk.
>>> 2. Inside a VM start a disk writing load for the IDE device
>>>     e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>> 3. On the host create a mirroring block job for the IDE device
>>>     e.g: drive_mirror <your_IDE> <your_path>
>>> 4. On the host finish the block job
>>>     e.g: block_job_complete <your_IDE>
>>>    Having done the 4th action, you could get an assert:
>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>> On my setup, the assert is 1/3 reproducible.
>>>
>>> The patch series introduces the mechanism to postpone the requests
>>> until the BDS leaves "drained" section for the devices not using
>>> iothreads.
>>> Also, it modifies the asynchronous block backend infrastructure to use
>>> that mechanism to release the assert bug for IDE devices.
>>>
>>> Denis Plotnikov (2):
>>>     async: add infrastructure for postponed actions
>>>     block: postpone the coroutine executing if the BDS's is drained
>>>
>>>    block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
>>>    include/block/aio.h   | 63 +++++++++++++++++++++++++++++++++++++++++++
>>>    util/async.c          | 33 +++++++++++++++++++++++
>>>    3 files changed, 142 insertions(+), 12 deletions(-)
>>>
>>

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  [PATCH v0 0/2] Postponed actions
  2018-08-13  8:32     ` Denis Plotnikov
@ 2018-08-13 16:30       ` Kevin Wolf
  2018-08-14  7:08         ` Denis Plotnikov
  0 siblings, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2018-08-13 16:30 UTC (permalink / raw)
  To: Denis Plotnikov
  Cc: John Snow, mreitz, stefanha, famz, qemu-stable, qemu-devel, qemu-block

Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
> Ping ping!
> 
> On 16.07.2018 21:59, John Snow wrote:
> > 
> > 
> > On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
> > > Ping!
> > > 
> > 
> > I never saw a reply to Stefan's question on July 2nd, did you reply
> > off-list?
> > 
> > --js
> Yes, I did. I talked to Stefan why the patch set appeared.

The rest of us still don't know the answer. I had the same question.

Kevin

> > > On 29.06.2018 15:40, Denis Plotnikov wrote:
> > > > There are cases when a request to a block driver state shouldn't have
> > > > appeared producing dangerous race conditions.
> > > > This misbehaviour is usually happens with storage devices emulated
> > > > without eventfd for guest to host notifications like IDE.
> > > > 
> > > > The issue arises when the context is in the "drained" section
> > > > and doesn't expect the request to come, but request comes from the
> > > > device not using iothread and which context is processed by the main
> > > > loop.
> > > > 
> > > > The main loop apart of the iothread event loop isn't blocked by the
> > > > "drained" section.
> > > > The request coming and processing while in "drained" section can spoil
> > > > the
> > > > block driver state consistency.
> > > > 
> > > > This behavior can be observed in the following KVM-based case:
> > > > 
> > > > 1. Setup a VM with an IDE disk.
> > > > 2. Inside a VM start a disk writing load for the IDE device
> > > >     e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
> > > > 3. On the host create a mirroring block job for the IDE device
> > > >     e.g: drive_mirror <your_IDE> <your_path>
> > > > 4. On the host finish the block job
> > > >     e.g: block_job_complete <your_IDE>
> > > >    Having done the 4th action, you could get an assert:
> > > > assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
> > > > On my setup, the assert is 1/3 reproducible.
> > > > 
> > > > The patch series introduces the mechanism to postpone the requests
> > > > until the BDS leaves "drained" section for the devices not using
> > > > iothreads.
> > > > Also, it modifies the asynchronous block backend infrastructure to use
> > > > that mechanism to release the assert bug for IDE devices.
> > > > 
> > > > Denis Plotnikov (2):
> > > >     async: add infrastructure for postponed actions
> > > >     block: postpone the coroutine executing if the BDS's is drained
> > > > 
> > > >    block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
> > > >    include/block/aio.h   | 63 +++++++++++++++++++++++++++++++++++++++++++
> > > >    util/async.c          | 33 +++++++++++++++++++++++
> > > >    3 files changed, 142 insertions(+), 12 deletions(-)
> > > > 
> > > 
> 
> -- 
> Best,
> Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block]  [PATCH v0 0/2] Postponed actions
  2018-08-13 16:30       ` Kevin Wolf
@ 2018-08-14  7:08         ` Denis Plotnikov
  2018-08-20  7:40           ` Denis Plotnikov
                             ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Denis Plotnikov @ 2018-08-14  7:08 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: John Snow, mreitz, stefanha, famz, qemu-stable, qemu-devel, qemu-block



On 13.08.2018 19:30, Kevin Wolf wrote:
> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
>> Ping ping!
>>
>> On 16.07.2018 21:59, John Snow wrote:
>>>
>>>
>>> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>>>> Ping!
>>>>
>>>
>>> I never saw a reply to Stefan's question on July 2nd, did you reply
>>> off-list?
>>>
>>> --js
>> Yes, I did. I talked to Stefan why the patch set appeared.
> 
> The rest of us still don't know the answer. I had the same question.
> 
> Kevin
Yes, that's my fault. I should have post it earlier.

I reviewed the problem once again and come up with the following 
explanation.
Indeed, if the global lock has been taken by the main thread the vCPU 
threads won't be able to execute mmio ide.
But, if the main thread will release the lock then nothing will prevent
vCPU threads form execution what they want, e.g writing to the block device.

In case of running the mirroring it is possible. Let's take a look
at the following snippet of mirror_run. This is a part the mirroring 
completion part.

             bdrv_drained_begin(bs);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
 >>>>>>      if (cnt > 0 || mirror_flush(s) < 0) {
                 bdrv_drained_end(bs);
                 continue;
             }

(X) >>>>    assert(QLIST_EMPTY(&bs->tracked_requests));

mirror_flush here can yield the current coroutine so nothing more can be 
executed.
We could end up with the situation when the main loop have to revolve to 
poll for another timer/bh to process. While revolving it releases the 
global lock. If the global lock is waited for by a vCPU (any other) 
thread, the waiting thread will get the lock and make what it intends.

This is something that I can observe:

mirror_flush yields coroutine, the main thread revolves and locks 
because a vCPU was waiting for the lock. Now the vCPU thread owns the 
lock and the main thread waits for the lock releasing.
The vCPU thread does cmd_write_dma and releases the lock. Then, the main
thread gets the lock and continues to run eventually proceeding with the 
coroutine yeiled.
If the vCPU requests aren't completed by the moment we will assert at 
(X). If the vCPU requests are completed we won't even notice that we had 
some writes while in the drained section.

Denis
> 
>>>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>>>> There are cases when a request to a block driver state shouldn't have
>>>>> appeared producing dangerous race conditions.
>>>>> This misbehaviour is usually happens with storage devices emulated
>>>>> without eventfd for guest to host notifications like IDE.
>>>>>
>>>>> The issue arises when the context is in the "drained" section
>>>>> and doesn't expect the request to come, but request comes from the
>>>>> device not using iothread and which context is processed by the main
>>>>> loop.
>>>>>
>>>>> The main loop apart of the iothread event loop isn't blocked by the
>>>>> "drained" section.
>>>>> The request coming and processing while in "drained" section can spoil
>>>>> the
>>>>> block driver state consistency.
>>>>>
>>>>> This behavior can be observed in the following KVM-based case:
>>>>>
>>>>> 1. Setup a VM with an IDE disk.
>>>>> 2. Inside a VM start a disk writing load for the IDE device
>>>>>      e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>>>> 3. On the host create a mirroring block job for the IDE device
>>>>>      e.g: drive_mirror <your_IDE> <your_path>
>>>>> 4. On the host finish the block job
>>>>>      e.g: block_job_complete <your_IDE>
>>>>>     Having done the 4th action, you could get an assert:
>>>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>>>> On my setup, the assert is 1/3 reproducible.
>>>>>
>>>>> The patch series introduces the mechanism to postpone the requests
>>>>> until the BDS leaves "drained" section for the devices not using
>>>>> iothreads.
>>>>> Also, it modifies the asynchronous block backend infrastructure to use
>>>>> that mechanism to release the assert bug for IDE devices.
>>>>>
>>>>> Denis Plotnikov (2):
>>>>>      async: add infrastructure for postponed actions
>>>>>      block: postpone the coroutine executing if the BDS's is drained
>>>>>
>>>>>     block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
>>>>>     include/block/aio.h   | 63 +++++++++++++++++++++++++++++++++++++++++++
>>>>>     util/async.c          | 33 +++++++++++++++++++++++
>>>>>     3 files changed, 142 insertions(+), 12 deletions(-)
>>>>>
>>>>
>>
>> -- 
>> Best,
>> Denis

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
  2018-08-14  7:08         ` Denis Plotnikov
@ 2018-08-20  7:40           ` Denis Plotnikov
  2018-08-20  7:42           ` Denis Plotnikov
  2018-08-27  7:05           ` Denis Plotnikov
  2 siblings, 0 replies; 25+ messages in thread
From: Denis Plotnikov @ 2018-08-20  7:40 UTC (permalink / raw)
  To: qemu-devel

ping ping!

On 14.08.2018 10:08, Denis Plotnikov wrote:
> 
> 
> On 13.08.2018 19:30, Kevin Wolf wrote:
>> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
>>> Ping ping!
>>>
>>> On 16.07.2018 21:59, John Snow wrote:
>>>>
>>>>
>>>> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>>>>> Ping!
>>>>>
>>>>
>>>> I never saw a reply to Stefan's question on July 2nd, did you reply
>>>> off-list?
>>>>
>>>> --js
>>> Yes, I did. I talked to Stefan why the patch set appeared.
>>
>> The rest of us still don't know the answer. I had the same question.
>>
>> Kevin
> Yes, that's my fault. I should have post it earlier.
> 
> I reviewed the problem once again and come up with the following 
> explanation.
> Indeed, if the global lock has been taken by the main thread the vCPU 
> threads won't be able to execute mmio ide.
> But, if the main thread will release the lock then nothing will prevent
> vCPU threads form execution what they want, e.g writing to the block 
> device.
> 
> In case of running the mirroring it is possible. Let's take a look
> at the following snippet of mirror_run. This is a part the mirroring 
> completion part.
> 
>              bdrv_drained_begin(bs);
>              cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>  >>>>>>      if (cnt > 0 || mirror_flush(s) < 0) {
>                  bdrv_drained_end(bs);
>                  continue;
>              }
> 
> (X) >>>>    assert(QLIST_EMPTY(&bs->tracked_requests));
> 
> mirror_flush here can yield the current coroutine so nothing more can be 
> executed.
> We could end up with the situation when the main loop have to revolve to 
> poll for another timer/bh to process. While revolving it releases the 
> global lock. If the global lock is waited for by a vCPU (any other) 
> thread, the waiting thread will get the lock and make what it intends.
> 
> This is something that I can observe:
> 
> mirror_flush yields coroutine, the main thread revolves and locks 
> because a vCPU was waiting for the lock. Now the vCPU thread owns the 
> lock and the main thread waits for the lock releasing.
> The vCPU thread does cmd_write_dma and releases the lock. Then, the main
> thread gets the lock and continues to run eventually proceeding with the 
> coroutine yeiled.
> If the vCPU requests aren't completed by the moment we will assert at 
> (X). If the vCPU requests are completed we won't even notice that we had 
> some writes while in the drained section.
> 
> Denis
>>
>>>>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>>>>> There are cases when a request to a block driver state shouldn't have
>>>>>> appeared producing dangerous race conditions.
>>>>>> This misbehaviour is usually happens with storage devices emulated
>>>>>> without eventfd for guest to host notifications like IDE.
>>>>>>
>>>>>> The issue arises when the context is in the "drained" section
>>>>>> and doesn't expect the request to come, but request comes from the
>>>>>> device not using iothread and which context is processed by the main
>>>>>> loop.
>>>>>>
>>>>>> The main loop apart of the iothread event loop isn't blocked by the
>>>>>> "drained" section.
>>>>>> The request coming and processing while in "drained" section can 
>>>>>> spoil
>>>>>> the
>>>>>> block driver state consistency.
>>>>>>
>>>>>> This behavior can be observed in the following KVM-based case:
>>>>>>
>>>>>> 1. Setup a VM with an IDE disk.
>>>>>> 2. Inside a VM start a disk writing load for the IDE device
>>>>>>      e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>>>>> 3. On the host create a mirroring block job for the IDE device
>>>>>>      e.g: drive_mirror <your_IDE> <your_path>
>>>>>> 4. On the host finish the block job
>>>>>>      e.g: block_job_complete <your_IDE>
>>>>>>     Having done the 4th action, you could get an assert:
>>>>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>>>>> On my setup, the assert is 1/3 reproducible.
>>>>>>
>>>>>> The patch series introduces the mechanism to postpone the requests
>>>>>> until the BDS leaves "drained" section for the devices not using
>>>>>> iothreads.
>>>>>> Also, it modifies the asynchronous block backend infrastructure to 
>>>>>> use
>>>>>> that mechanism to release the assert bug for IDE devices.
>>>>>>
>>>>>> Denis Plotnikov (2):
>>>>>>      async: add infrastructure for postponed actions
>>>>>>      block: postpone the coroutine executing if the BDS's is drained
>>>>>>
>>>>>>     block/block-backend.c | 58 
>>>>>> ++++++++++++++++++++++++++++++---------
>>>>>>     include/block/aio.h   | 63 
>>>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>>     util/async.c          | 33 +++++++++++++++++++++++
>>>>>>     3 files changed, 142 insertions(+), 12 deletions(-)
>>>>>>
>>>>>
>>>
>>> -- 
>>> Best,
>>> Denis
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
  2018-08-14  7:08         ` Denis Plotnikov
  2018-08-20  7:40           ` Denis Plotnikov
@ 2018-08-20  7:42           ` Denis Plotnikov
  2018-08-27  7:05           ` Denis Plotnikov
  2 siblings, 0 replies; 25+ messages in thread
From: Denis Plotnikov @ 2018-08-20  7:42 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: famz, qemu-block, qemu-devel, qemu-stable, stefanha, mreitz, John Snow

ping ping!

On 14.08.2018 10:08, Denis Plotnikov wrote:
> 
> 
> On 13.08.2018 19:30, Kevin Wolf wrote:
>> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
>>> Ping ping!
>>>
>>> On 16.07.2018 21:59, John Snow wrote:
>>>>
>>>>
>>>> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>>>>> Ping!
>>>>>
>>>>
>>>> I never saw a reply to Stefan's question on July 2nd, did you reply
>>>> off-list?
>>>>
>>>> --js
>>> Yes, I did. I talked to Stefan why the patch set appeared.
>>
>> The rest of us still don't know the answer. I had the same question.
>>
>> Kevin
> Yes, that's my fault. I should have post it earlier.
> 
> I reviewed the problem once again and come up with the following 
> explanation.
> Indeed, if the global lock has been taken by the main thread the vCPU 
> threads won't be able to execute mmio ide.
> But, if the main thread will release the lock then nothing will prevent
> vCPU threads form execution what they want, e.g writing to the block 
> device.
> 
> In case of running the mirroring it is possible. Let's take a look
> at the following snippet of mirror_run. This is a part the mirroring 
> completion part.
> 
>              bdrv_drained_begin(bs);
>              cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>  >>>>>>      if (cnt > 0 || mirror_flush(s) < 0) {
>                  bdrv_drained_end(bs);
>                  continue;
>              }
> 
> (X) >>>>    assert(QLIST_EMPTY(&bs->tracked_requests));
> 
> mirror_flush here can yield the current coroutine so nothing more can be 
> executed.
> We could end up with the situation when the main loop have to revolve to 
> poll for another timer/bh to process. While revolving it releases the 
> global lock. If the global lock is waited for by a vCPU (any other) 
> thread, the waiting thread will get the lock and make what it intends.
> 
> This is something that I can observe:
> 
> mirror_flush yields coroutine, the main thread revolves and locks 
> because a vCPU was waiting for the lock. Now the vCPU thread owns the 
> lock and the main thread waits for the lock releasing.
> The vCPU thread does cmd_write_dma and releases the lock. Then, the main
> thread gets the lock and continues to run eventually proceeding with the 
> coroutine yeiled.
> If the vCPU requests aren't completed by the moment we will assert at 
> (X). If the vCPU requests are completed we won't even notice that we had 
> some writes while in the drained section.
> 
> Denis
>>
>>>>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>>>>> There are cases when a request to a block driver state shouldn't have
>>>>>> appeared producing dangerous race conditions.
>>>>>> This misbehaviour is usually happens with storage devices emulated
>>>>>> without eventfd for guest to host notifications like IDE.
>>>>>>
>>>>>> The issue arises when the context is in the "drained" section
>>>>>> and doesn't expect the request to come, but request comes from the
>>>>>> device not using iothread and which context is processed by the main
>>>>>> loop.
>>>>>>
>>>>>> The main loop apart of the iothread event loop isn't blocked by the
>>>>>> "drained" section.
>>>>>> The request coming and processing while in "drained" section can 
>>>>>> spoil
>>>>>> the
>>>>>> block driver state consistency.
>>>>>>
>>>>>> This behavior can be observed in the following KVM-based case:
>>>>>>
>>>>>> 1. Setup a VM with an IDE disk.
>>>>>> 2. Inside a VM start a disk writing load for the IDE device
>>>>>>      e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>>>>> 3. On the host create a mirroring block job for the IDE device
>>>>>>      e.g: drive_mirror <your_IDE> <your_path>
>>>>>> 4. On the host finish the block job
>>>>>>      e.g: block_job_complete <your_IDE>
>>>>>>     Having done the 4th action, you could get an assert:
>>>>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>>>>> On my setup, the assert is 1/3 reproducible.
>>>>>>
>>>>>> The patch series introduces the mechanism to postpone the requests
>>>>>> until the BDS leaves "drained" section for the devices not using
>>>>>> iothreads.
>>>>>> Also, it modifies the asynchronous block backend infrastructure to 
>>>>>> use
>>>>>> that mechanism to release the assert bug for IDE devices.
>>>>>>
>>>>>> Denis Plotnikov (2):
>>>>>>      async: add infrastructure for postponed actions
>>>>>>      block: postpone the coroutine executing if the BDS's is drained
>>>>>>
>>>>>>     block/block-backend.c | 58 
>>>>>> ++++++++++++++++++++++++++++++---------
>>>>>>     include/block/aio.h   | 63 
>>>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>>     util/async.c          | 33 +++++++++++++++++++++++
>>>>>>     3 files changed, 142 insertions(+), 12 deletions(-)
>>>>>>
>>>>>
>>>
>>> -- 
>>> Best,
>>> Denis
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
  2018-08-14  7:08         ` Denis Plotnikov
  2018-08-20  7:40           ` Denis Plotnikov
  2018-08-20  7:42           ` Denis Plotnikov
@ 2018-08-27  7:05           ` Denis Plotnikov
  2018-08-27 16:05             ` John Snow
  2 siblings, 1 reply; 25+ messages in thread
From: Denis Plotnikov @ 2018-08-27  7:05 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: famz, qemu-block, qemu-devel, qemu-stable, stefanha, mreitz, John Snow

PING! PING!

On 14.08.2018 10:08, Denis Plotnikov wrote:
> 
> 
> On 13.08.2018 19:30, Kevin Wolf wrote:
>> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
>>> Ping ping!
>>>
>>> On 16.07.2018 21:59, John Snow wrote:
>>>>
>>>>
>>>> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>>>>> Ping!
>>>>>
>>>>
>>>> I never saw a reply to Stefan's question on July 2nd, did you reply
>>>> off-list?
>>>>
>>>> --js
>>> Yes, I did. I talked to Stefan why the patch set appeared.
>>
>> The rest of us still don't know the answer. I had the same question.
>>
>> Kevin
> Yes, that's my fault. I should have post it earlier.
> 
> I reviewed the problem once again and come up with the following 
> explanation.
> Indeed, if the global lock has been taken by the main thread the vCPU 
> threads won't be able to execute mmio ide.
> But, if the main thread will release the lock then nothing will prevent
> vCPU threads form execution what they want, e.g writing to the block 
> device.
> 
> In case of running the mirroring it is possible. Let's take a look
> at the following snippet of mirror_run. This is a part the mirroring 
> completion part.
> 
>              bdrv_drained_begin(bs);
>              cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>  >>>>>>      if (cnt > 0 || mirror_flush(s) < 0) {
>                  bdrv_drained_end(bs);
>                  continue;
>              }
> 
> (X) >>>>    assert(QLIST_EMPTY(&bs->tracked_requests));
> 
> mirror_flush here can yield the current coroutine so nothing more can be 
> executed.
> We could end up with the situation when the main loop have to revolve to 
> poll for another timer/bh to process. While revolving it releases the 
> global lock. If the global lock is waited for by a vCPU (any other) 
> thread, the waiting thread will get the lock and make what it intends.
> 
> This is something that I can observe:
> 
> mirror_flush yields coroutine, the main thread revolves and locks 
> because a vCPU was waiting for the lock. Now the vCPU thread owns the 
> lock and the main thread waits for the lock releasing.
> The vCPU thread does cmd_write_dma and releases the lock. Then, the main
> thread gets the lock and continues to run eventually proceeding with the 
> coroutine yeiled.
> If the vCPU requests aren't completed by the moment we will assert at 
> (X). If the vCPU requests are completed we won't even notice that we had 
> some writes while in the drained section.
> 
> Denis
>>
>>>>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>>>>> There are cases when a request to a block driver state shouldn't have
>>>>>> appeared producing dangerous race conditions.
>>>>>> This misbehaviour is usually happens with storage devices emulated
>>>>>> without eventfd for guest to host notifications like IDE.
>>>>>>
>>>>>> The issue arises when the context is in the "drained" section
>>>>>> and doesn't expect the request to come, but request comes from the
>>>>>> device not using iothread and which context is processed by the main
>>>>>> loop.
>>>>>>
>>>>>> The main loop apart of the iothread event loop isn't blocked by the
>>>>>> "drained" section.
>>>>>> The request coming and processing while in "drained" section can 
>>>>>> spoil
>>>>>> the
>>>>>> block driver state consistency.
>>>>>>
>>>>>> This behavior can be observed in the following KVM-based case:
>>>>>>
>>>>>> 1. Setup a VM with an IDE disk.
>>>>>> 2. Inside a VM start a disk writing load for the IDE device
>>>>>>      e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>>>>> 3. On the host create a mirroring block job for the IDE device
>>>>>>      e.g: drive_mirror <your_IDE> <your_path>
>>>>>> 4. On the host finish the block job
>>>>>>      e.g: block_job_complete <your_IDE>
>>>>>>     Having done the 4th action, you could get an assert:
>>>>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>>>>> On my setup, the assert is 1/3 reproducible.
>>>>>>
>>>>>> The patch series introduces the mechanism to postpone the requests
>>>>>> until the BDS leaves "drained" section for the devices not using
>>>>>> iothreads.
>>>>>> Also, it modifies the asynchronous block backend infrastructure to 
>>>>>> use
>>>>>> that mechanism to release the assert bug for IDE devices.
>>>>>>
>>>>>> Denis Plotnikov (2):
>>>>>>      async: add infrastructure for postponed actions
>>>>>>      block: postpone the coroutine executing if the BDS's is drained
>>>>>>
>>>>>>     block/block-backend.c | 58 
>>>>>> ++++++++++++++++++++++++++++++---------
>>>>>>     include/block/aio.h   | 63 
>>>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>>     util/async.c          | 33 +++++++++++++++++++++++
>>>>>>     3 files changed, 142 insertions(+), 12 deletions(-)
>>>>>>
>>>>>
>>>
>>> -- 
>>> Best,
>>> Denis
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
  2018-08-27  7:05           ` Denis Plotnikov
@ 2018-08-27 16:05             ` John Snow
  2018-08-28 10:23               ` Denis Plotnikov
  0 siblings, 1 reply; 25+ messages in thread
From: John Snow @ 2018-08-27 16:05 UTC (permalink / raw)
  To: Denis Plotnikov, Kevin Wolf
  Cc: famz, qemu-block, qemu-devel, qemu-stable, stefanha, mreitz



On 08/27/2018 03:05 AM, Denis Plotnikov wrote:
> PING! PING!
> 

Sorry, Kevin and Stefan are both on PTO right now, I think. I can't
promise I have the time to look soon, but you at least deserve an answer
for the radio silence the last week.

--js

> On 14.08.2018 10:08, Denis Plotnikov wrote:
>>
>>
>> On 13.08.2018 19:30, Kevin Wolf wrote:
>>> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
>>>> Ping ping!
>>>>
>>>> On 16.07.2018 21:59, John Snow wrote:
>>>>>
>>>>>
>>>>> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>>>>>> Ping!
>>>>>>
>>>>>
>>>>> I never saw a reply to Stefan's question on July 2nd, did you reply
>>>>> off-list?
>>>>>
>>>>> --js
>>>> Yes, I did. I talked to Stefan why the patch set appeared.
>>>
>>> The rest of us still don't know the answer. I had the same question.
>>>
>>> Kevin
>> Yes, that's my fault. I should have post it earlier.
>>
>> I reviewed the problem once again and come up with the following
>> explanation.
>> Indeed, if the global lock has been taken by the main thread the vCPU
>> threads won't be able to execute mmio ide.
>> But, if the main thread will release the lock then nothing will prevent
>> vCPU threads form execution what they want, e.g writing to the block
>> device.
>>
>> In case of running the mirroring it is possible. Let's take a look
>> at the following snippet of mirror_run. This is a part the mirroring
>> completion part.
>>
>>              bdrv_drained_begin(bs);
>>              cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>  >>>>>>      if (cnt > 0 || mirror_flush(s) < 0) {
>>                  bdrv_drained_end(bs);
>>                  continue;
>>              }
>>
>> (X) >>>>    assert(QLIST_EMPTY(&bs->tracked_requests));
>>
>> mirror_flush here can yield the current coroutine so nothing more can
>> be executed.
>> We could end up with the situation when the main loop have to revolve
>> to poll for another timer/bh to process. While revolving it releases
>> the global lock. If the global lock is waited for by a vCPU (any
>> other) thread, the waiting thread will get the lock and make what it
>> intends.
>>
>> This is something that I can observe:
>>
>> mirror_flush yields coroutine, the main thread revolves and locks
>> because a vCPU was waiting for the lock. Now the vCPU thread owns the
>> lock and the main thread waits for the lock releasing.
>> The vCPU thread does cmd_write_dma and releases the lock. Then, the main
>> thread gets the lock and continues to run eventually proceeding with
>> the coroutine yeiled.
>> If the vCPU requests aren't completed by the moment we will assert at
>> (X). If the vCPU requests are completed we won't even notice that we
>> had some writes while in the drained section.
>>
>> Denis
>>>
>>>>>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>>>>>> There are cases when a request to a block driver state shouldn't
>>>>>>> have
>>>>>>> appeared producing dangerous race conditions.
>>>>>>> This misbehaviour is usually happens with storage devices emulated
>>>>>>> without eventfd for guest to host notifications like IDE.
>>>>>>>
>>>>>>> The issue arises when the context is in the "drained" section
>>>>>>> and doesn't expect the request to come, but request comes from the
>>>>>>> device not using iothread and which context is processed by the main
>>>>>>> loop.
>>>>>>>
>>>>>>> The main loop apart of the iothread event loop isn't blocked by the
>>>>>>> "drained" section.
>>>>>>> The request coming and processing while in "drained" section can
>>>>>>> spoil
>>>>>>> the
>>>>>>> block driver state consistency.
>>>>>>>
>>>>>>> This behavior can be observed in the following KVM-based case:
>>>>>>>
>>>>>>> 1. Setup a VM with an IDE disk.
>>>>>>> 2. Inside a VM start a disk writing load for the IDE device
>>>>>>>      e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>>>>>> 3. On the host create a mirroring block job for the IDE device
>>>>>>>      e.g: drive_mirror <your_IDE> <your_path>
>>>>>>> 4. On the host finish the block job
>>>>>>>      e.g: block_job_complete <your_IDE>
>>>>>>>     Having done the 4th action, you could get an assert:
>>>>>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>>>>>> On my setup, the assert is 1/3 reproducible.
>>>>>>>
>>>>>>> The patch series introduces the mechanism to postpone the requests
>>>>>>> until the BDS leaves "drained" section for the devices not using
>>>>>>> iothreads.
>>>>>>> Also, it modifies the asynchronous block backend infrastructure
>>>>>>> to use
>>>>>>> that mechanism to release the assert bug for IDE devices.
>>>>>>>
>>>>>>> Denis Plotnikov (2):
>>>>>>>      async: add infrastructure for postponed actions
>>>>>>>      block: postpone the coroutine executing if the BDS's is drained
>>>>>>>
>>>>>>>     block/block-backend.c | 58
>>>>>>> ++++++++++++++++++++++++++++++---------
>>>>>>>     include/block/aio.h   | 63
>>>>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>>>     util/async.c          | 33 +++++++++++++++++++++++
>>>>>>>     3 files changed, 142 insertions(+), 12 deletions(-)
>>>>>>>
>>>>>>
>>>>
>>>> -- 
>>>> Best,
>>>> Denis
>>
> 

-- 
—js

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
  2018-08-27 16:05             ` John Snow
@ 2018-08-28 10:23               ` Denis Plotnikov
  2018-09-10 10:11                 ` Denis Plotnikov
  0 siblings, 1 reply; 25+ messages in thread
From: Denis Plotnikov @ 2018-08-28 10:23 UTC (permalink / raw)
  To: John Snow, Kevin Wolf
  Cc: famz, qemu-block, qemu-devel, qemu-stable, stefanha, mreitz



On 27.08.2018 19:05, John Snow wrote:
> 
> 
> On 08/27/2018 03:05 AM, Denis Plotnikov wrote:
>> PING! PING!
>>
> 
> Sorry, Kevin and Stefan are both on PTO right now, I think. I can't
> promise I have the time to look soon, but you at least deserve an answer
> for the radio silence the last week.
> 
> --js
Thanks for the response!
I'll be waiting for some comments!

Denis
> 
>> On 14.08.2018 10:08, Denis Plotnikov wrote:
>>>
>>>
>>> On 13.08.2018 19:30, Kevin Wolf wrote:
>>>> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
>>>>> Ping ping!
>>>>>
>>>>> On 16.07.2018 21:59, John Snow wrote:
>>>>>>
>>>>>>
>>>>>> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>>>>>>> Ping!
>>>>>>>
>>>>>>
>>>>>> I never saw a reply to Stefan's question on July 2nd, did you reply
>>>>>> off-list?
>>>>>>
>>>>>> --js
>>>>> Yes, I did. I talked to Stefan why the patch set appeared.
>>>>
>>>> The rest of us still don't know the answer. I had the same question.
>>>>
>>>> Kevin
>>> Yes, that's my fault. I should have post it earlier.
>>>
>>> I reviewed the problem once again and come up with the following
>>> explanation.
>>> Indeed, if the global lock has been taken by the main thread the vCPU
>>> threads won't be able to execute mmio ide.
>>> But, if the main thread will release the lock then nothing will prevent
>>> vCPU threads form execution what they want, e.g writing to the block
>>> device.
>>>
>>> In case of running the mirroring it is possible. Let's take a look
>>> at the following snippet of mirror_run. This is a part the mirroring
>>> completion part.
>>>
>>>               bdrv_drained_begin(bs);
>>>               cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>>   >>>>>>      if (cnt > 0 || mirror_flush(s) < 0) {
>>>                   bdrv_drained_end(bs);
>>>                   continue;
>>>               }
>>>
>>> (X) >>>>    assert(QLIST_EMPTY(&bs->tracked_requests));
>>>
>>> mirror_flush here can yield the current coroutine so nothing more can
>>> be executed.
>>> We could end up with the situation when the main loop have to revolve
>>> to poll for another timer/bh to process. While revolving it releases
>>> the global lock. If the global lock is waited for by a vCPU (any
>>> other) thread, the waiting thread will get the lock and make what it
>>> intends.
>>>
>>> This is something that I can observe:
>>>
>>> mirror_flush yields coroutine, the main thread revolves and locks
>>> because a vCPU was waiting for the lock. Now the vCPU thread owns the
>>> lock and the main thread waits for the lock releasing.
>>> The vCPU thread does cmd_write_dma and releases the lock. Then, the main
>>> thread gets the lock and continues to run eventually proceeding with
>>> the coroutine yeiled.
>>> If the vCPU requests aren't completed by the moment we will assert at
>>> (X). If the vCPU requests are completed we won't even notice that we
>>> had some writes while in the drained section.
>>>
>>> Denis
>>>>
>>>>>>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>>>>>>> There are cases when a request to a block driver state shouldn't
>>>>>>>> have
>>>>>>>> appeared producing dangerous race conditions.
>>>>>>>> This misbehaviour is usually happens with storage devices emulated
>>>>>>>> without eventfd for guest to host notifications like IDE.
>>>>>>>>
>>>>>>>> The issue arises when the context is in the "drained" section
>>>>>>>> and doesn't expect the request to come, but request comes from the
>>>>>>>> device not using iothread and which context is processed by the main
>>>>>>>> loop.
>>>>>>>>
>>>>>>>> The main loop apart of the iothread event loop isn't blocked by the
>>>>>>>> "drained" section.
>>>>>>>> The request coming and processing while in "drained" section can
>>>>>>>> spoil
>>>>>>>> the
>>>>>>>> block driver state consistency.
>>>>>>>>
>>>>>>>> This behavior can be observed in the following KVM-based case:
>>>>>>>>
>>>>>>>> 1. Setup a VM with an IDE disk.
>>>>>>>> 2. Inside a VM start a disk writing load for the IDE device
>>>>>>>>       e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>>>>>>> 3. On the host create a mirroring block job for the IDE device
>>>>>>>>       e.g: drive_mirror <your_IDE> <your_path>
>>>>>>>> 4. On the host finish the block job
>>>>>>>>       e.g: block_job_complete <your_IDE>
>>>>>>>>      Having done the 4th action, you could get an assert:
>>>>>>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>>>>>>> On my setup, the assert is 1/3 reproducible.
>>>>>>>>
>>>>>>>> The patch series introduces the mechanism to postpone the requests
>>>>>>>> until the BDS leaves "drained" section for the devices not using
>>>>>>>> iothreads.
>>>>>>>> Also, it modifies the asynchronous block backend infrastructure
>>>>>>>> to use
>>>>>>>> that mechanism to release the assert bug for IDE devices.
>>>>>>>>
>>>>>>>> Denis Plotnikov (2):
>>>>>>>>       async: add infrastructure for postponed actions
>>>>>>>>       block: postpone the coroutine executing if the BDS's is drained
>>>>>>>>
>>>>>>>>      block/block-backend.c | 58
>>>>>>>> ++++++++++++++++++++++++++++++---------
>>>>>>>>      include/block/aio.h   | 63
>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>      util/async.c          | 33 +++++++++++++++++++++++
>>>>>>>>      3 files changed, 142 insertions(+), 12 deletions(-)
>>>>>>>>
>>>>>>>
>>>>>
>>>>> -- 
>>>>> Best,
>>>>> Denis
>>>
>>
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions
  2018-08-28 10:23               ` Denis Plotnikov
@ 2018-09-10 10:11                 ` Denis Plotnikov
  0 siblings, 0 replies; 25+ messages in thread
From: Denis Plotnikov @ 2018-09-10 10:11 UTC (permalink / raw)
  To: John Snow, Kevin Wolf
  Cc: famz, qemu-block, qemu-stable, qemu-devel, mreitz, stefanha

PING PING!

On 28.08.2018 13:23, Denis Plotnikov wrote:
> 
> 
> On 27.08.2018 19:05, John Snow wrote:
>>
>>
>> On 08/27/2018 03:05 AM, Denis Plotnikov wrote:
>>> PING! PING!
>>>
>>
>> Sorry, Kevin and Stefan are both on PTO right now, I think. I can't
>> promise I have the time to look soon, but you at least deserve an answer
>> for the radio silence the last week.
>>
>> --js
> Thanks for the response!
> I'll be waiting for some comments!
> 
> Denis
>>
>>> On 14.08.2018 10:08, Denis Plotnikov wrote:
>>>>
>>>>
>>>> On 13.08.2018 19:30, Kevin Wolf wrote:
>>>>> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
>>>>>> Ping ping!
>>>>>>
>>>>>> On 16.07.2018 21:59, John Snow wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
>>>>>>>> Ping!
>>>>>>>>
>>>>>>>
>>>>>>> I never saw a reply to Stefan's question on July 2nd, did you reply
>>>>>>> off-list?
>>>>>>>
>>>>>>> --js
>>>>>> Yes, I did. I talked to Stefan why the patch set appeared.
>>>>>
>>>>> The rest of us still don't know the answer. I had the same question.
>>>>>
>>>>> Kevin
>>>> Yes, that's my fault. I should have post it earlier.
>>>>
>>>> I reviewed the problem once again and come up with the following
>>>> explanation.
>>>> Indeed, if the global lock has been taken by the main thread the vCPU
>>>> threads won't be able to execute mmio ide.
>>>> But, if the main thread will release the lock then nothing will prevent
>>>> vCPU threads form execution what they want, e.g writing to the block
>>>> device.
>>>>
>>>> In case of running the mirroring it is possible. Let's take a look
>>>> at the following snippet of mirror_run. This is a part the mirroring
>>>> completion part.
>>>>
>>>>               bdrv_drained_begin(bs);
>>>>               cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>>>   >>>>>>      if (cnt > 0 || mirror_flush(s) < 0) {
>>>>                   bdrv_drained_end(bs);
>>>>                   continue;
>>>>               }
>>>>
>>>> (X) >>>>    assert(QLIST_EMPTY(&bs->tracked_requests));
>>>>
>>>> mirror_flush here can yield the current coroutine so nothing more can
>>>> be executed.
>>>> We could end up with the situation when the main loop have to revolve
>>>> to poll for another timer/bh to process. While revolving it releases
>>>> the global lock. If the global lock is waited for by a vCPU (any
>>>> other) thread, the waiting thread will get the lock and make what it
>>>> intends.
>>>>
>>>> This is something that I can observe:
>>>>
>>>> mirror_flush yields coroutine, the main thread revolves and locks
>>>> because a vCPU was waiting for the lock. Now the vCPU thread owns the
>>>> lock and the main thread waits for the lock releasing.
>>>> The vCPU thread does cmd_write_dma and releases the lock. Then, the 
>>>> main
>>>> thread gets the lock and continues to run eventually proceeding with
>>>> the coroutine yeiled.
>>>> If the vCPU requests aren't completed by the moment we will assert at
>>>> (X). If the vCPU requests are completed we won't even notice that we
>>>> had some writes while in the drained section.
>>>>
>>>> Denis
>>>>>
>>>>>>>> On 29.06.2018 15:40, Denis Plotnikov wrote:
>>>>>>>>> There are cases when a request to a block driver state shouldn't
>>>>>>>>> have
>>>>>>>>> appeared producing dangerous race conditions.
>>>>>>>>> This misbehaviour is usually happens with storage devices emulated
>>>>>>>>> without eventfd for guest to host notifications like IDE.
>>>>>>>>>
>>>>>>>>> The issue arises when the context is in the "drained" section
>>>>>>>>> and doesn't expect the request to come, but request comes from the
>>>>>>>>> device not using iothread and which context is processed by the 
>>>>>>>>> main
>>>>>>>>> loop.
>>>>>>>>>
>>>>>>>>> The main loop apart of the iothread event loop isn't blocked by 
>>>>>>>>> the
>>>>>>>>> "drained" section.
>>>>>>>>> The request coming and processing while in "drained" section can
>>>>>>>>> spoil
>>>>>>>>> the
>>>>>>>>> block driver state consistency.
>>>>>>>>>
>>>>>>>>> This behavior can be observed in the following KVM-based case:
>>>>>>>>>
>>>>>>>>> 1. Setup a VM with an IDE disk.
>>>>>>>>> 2. Inside a VM start a disk writing load for the IDE device
>>>>>>>>>       e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
>>>>>>>>> 3. On the host create a mirroring block job for the IDE device
>>>>>>>>>       e.g: drive_mirror <your_IDE> <your_path>
>>>>>>>>> 4. On the host finish the block job
>>>>>>>>>       e.g: block_job_complete <your_IDE>
>>>>>>>>>      Having done the 4th action, you could get an assert:
>>>>>>>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
>>>>>>>>> On my setup, the assert is 1/3 reproducible.
>>>>>>>>>
>>>>>>>>> The patch series introduces the mechanism to postpone the requests
>>>>>>>>> until the BDS leaves "drained" section for the devices not using
>>>>>>>>> iothreads.
>>>>>>>>> Also, it modifies the asynchronous block backend infrastructure
>>>>>>>>> to use
>>>>>>>>> that mechanism to release the assert bug for IDE devices.
>>>>>>>>>
>>>>>>>>> Denis Plotnikov (2):
>>>>>>>>>       async: add infrastructure for postponed actions
>>>>>>>>>       block: postpone the coroutine executing if the BDS's is 
>>>>>>>>> drained
>>>>>>>>>
>>>>>>>>>      block/block-backend.c | 58
>>>>>>>>> ++++++++++++++++++++++++++++++---------
>>>>>>>>>      include/block/aio.h   | 63
>>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>      util/async.c          | 33 +++++++++++++++++++++++
>>>>>>>>>      3 files changed, 142 insertions(+), 12 deletions(-)
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Best,
>>>>>> Denis
>>>>
>>>
>>
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained
  2018-06-29 12:40 ` [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained Denis Plotnikov
@ 2018-09-10 12:41   ` Kevin Wolf
  2018-09-12 12:03     ` Denis Plotnikov
  0 siblings, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2018-09-10 12:41 UTC (permalink / raw)
  To: Denis Plotnikov
  Cc: mreitz, stefanha, famz, qemu-stable, qemu-block, qemu-devel

Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:
> Fixes the problem of ide request appearing when the BDS is in
> the "drained section".
> 
> Without the patch the request can come and be processed by the main
> event loop, as the ide requests are processed by the main event loop
> and the main event loop doesn't stop when its context is in the
> "drained section".
> The request execution is postponed until the end of "drained section".
> 
> The patch doesn't modify ide specific code, as well as any other
> device code. Instead, it modifies the infrastructure of asynchronous
> Block Backend requests, in favor of postponing the requests arisen
> when in "drained section" to remove the possibility of request appearing
> for all the infrastructure clients.
> 
> This approach doesn't make vCPU processing the request wait untill
> the end of request processing.
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>

I generally agree with the idea that requests should be queued during a
drained section. However, I think there are a few fundamental problems
with the implementation in this series:

1) aio_disable_external() is already a layering violation and we'd like
   to get rid of it (by replacing it with a BlockDevOps callback from
   BlockBackend to the devices), so adding more functionality there
   feels like a step in the wrong direction.

2) Only blk_aio_* are fixed, while we also have synchronous public
   interfaces (blk_pread/pwrite) as well as coroutine-based ones
   (blk_co_*). They need to be postponed as well.

   blk_co_preadv/pwritev() are the common point in the call chain for
   all of these variants, so this is where the fix needs to live.

3) Within a drained section, you want requests from other users to be
   blocked, but not your own ones (essentially you want exclusive
   access). We don't have blk_drained_begin/end() yet, so this is not
   something to implement right now, but let's keep this requirement in
   mind and choose a design that allows this.

I believe the whole logic should be kept local to BlockBackend, and
blk_root_drained_begin/end() should be the functions that start queuing
requests or let queued requests resume.

As we are already in coroutine context in blk_co_preadv/pwritev(), after
checking that blk->quiesce_counter > 0, we can enter the coroutine
object into a list and yield. blk_root_drained_end() calls aio_co_wake()
for each of the queued coroutines. This should be all that we need to
manage.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained
  2018-09-10 12:41   ` Kevin Wolf
@ 2018-09-12 12:03     ` Denis Plotnikov
  2018-09-12 13:15       ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Denis Plotnikov @ 2018-09-12 12:03 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: mreitz, stefanha, den, vsementsov, famz, qemu-stable, qemu-block,
	qemu-devel



On 10.09.2018 15:41, Kevin Wolf wrote:
> Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:
>> Fixes the problem of ide request appearing when the BDS is in
>> the "drained section".
>>
>> Without the patch the request can come and be processed by the main
>> event loop, as the ide requests are processed by the main event loop
>> and the main event loop doesn't stop when its context is in the
>> "drained section".
>> The request execution is postponed until the end of "drained section".
>>
>> The patch doesn't modify ide specific code, as well as any other
>> device code. Instead, it modifies the infrastructure of asynchronous
>> Block Backend requests, in favor of postponing the requests arisen
>> when in "drained section" to remove the possibility of request appearing
>> for all the infrastructure clients.
>>
>> This approach doesn't make vCPU processing the request wait untill
>> the end of request processing.
>>
>> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> 
> I generally agree with the idea that requests should be queued during a
> drained section. However, I think there are a few fundamental problems
> with the implementation in this series:
> 
> 1) aio_disable_external() is already a layering violation and we'd like
>     to get rid of it (by replacing it with a BlockDevOps callback from
>     BlockBackend to the devices), so adding more functionality there
>     feels like a step in the wrong direction.
> 
> 2) Only blk_aio_* are fixed, while we also have synchronous public
>     interfaces (blk_pread/pwrite) as well as coroutine-based ones
>     (blk_co_*). They need to be postponed as well.
Good point! Thanks!
> 
>     blk_co_preadv/pwritev() are the common point in the call chain for
>     all of these variants, so this is where the fix needs to live.
Using the common point might be a good idea, but in case aio requests we 
also have to mane completions which out of the scope of 
blk_co_p(read|write)v:

static void blk_aio_write_entry(void *opaque) {
     ...
     rwco->ret = blk_co_pwritev(...);

     blk_aio_complete(acb);
     ...
}

This makes the difference.
I would suggest adding waiting until "drained_end" is done on the 
synchronous read/write at blk_prw
 
                               >
> 3) Within a drained section, you want requests from other users to be
>     blocked, but not your own ones (essentially you want exclusive
>     access). We don't have blk_drained_begin/end() yet, so this is not
>     something to implement right now, but let's keep this requirement in
>     mind and choose a design that allows this.
There is an idea to distinguish the requests that should be done without 
respect to "drained section" by using a flag in BdrvRequestFlags. The 
requests with a flag set should be processed anyway.
> 
> I believe the whole logic should be kept local to BlockBackend, and
> blk_root_drained_begin/end() should be the functions that start queuing
> requests or let queued requests resume.
> 
> As we are already in coroutine context in blk_co_preadv/pwritev(), after
> checking that blk->quiesce_counter > 0, we can enter the coroutine
> object into a list and yield. blk_root_drained_end() calls aio_co_wake()
> for each of the queued coroutines. This should be all that we need to
> manage.
In my understanding by using brdv_drained_begin/end we want to protect a 
certain BlockDriverState from external access but not the whole 
BlockBackend which may involve using a number of BlockDriverState-s.
I though it because we could possibly change a backing file for some 
BlockDriverState. And for the time of changing we need to prevent 
external access to it but keep the io going.
By using blk_root_drained_begin/end() we put to "drained section" all 
the BlockDriverState-s linked to that root.
Does it have to be so?

Denis

> 
> Kevin
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained
  2018-09-12 12:03     ` Denis Plotnikov
@ 2018-09-12 13:15       ` Kevin Wolf
  2018-09-12 14:53         ` Denis Plotnikov
  2018-09-12 17:03         ` Denis V. Lunev
  0 siblings, 2 replies; 25+ messages in thread
From: Kevin Wolf @ 2018-09-12 13:15 UTC (permalink / raw)
  To: Denis Plotnikov
  Cc: mreitz, stefanha, den, vsementsov, famz, qemu-stable, qemu-block,
	qemu-devel

Am 12.09.2018 um 14:03 hat Denis Plotnikov geschrieben:
> On 10.09.2018 15:41, Kevin Wolf wrote:
> > Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:
> > > Fixes the problem of ide request appearing when the BDS is in
> > > the "drained section".
> > > 
> > > Without the patch the request can come and be processed by the main
> > > event loop, as the ide requests are processed by the main event loop
> > > and the main event loop doesn't stop when its context is in the
> > > "drained section".
> > > The request execution is postponed until the end of "drained section".
> > > 
> > > The patch doesn't modify ide specific code, as well as any other
> > > device code. Instead, it modifies the infrastructure of asynchronous
> > > Block Backend requests, in favor of postponing the requests arisen
> > > when in "drained section" to remove the possibility of request appearing
> > > for all the infrastructure clients.
> > > 
> > > This approach doesn't make vCPU processing the request wait untill
> > > the end of request processing.
> > > 
> > > Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> > 
> > I generally agree with the idea that requests should be queued during a
> > drained section. However, I think there are a few fundamental problems
> > with the implementation in this series:
> > 
> > 1) aio_disable_external() is already a layering violation and we'd like
> >     to get rid of it (by replacing it with a BlockDevOps callback from
> >     BlockBackend to the devices), so adding more functionality there
> >     feels like a step in the wrong direction.
> > 
> > 2) Only blk_aio_* are fixed, while we also have synchronous public
> >     interfaces (blk_pread/pwrite) as well as coroutine-based ones
> >     (blk_co_*). They need to be postponed as well.
> Good point! Thanks!
> > 
> >     blk_co_preadv/pwritev() are the common point in the call chain for
> >     all of these variants, so this is where the fix needs to live.
> Using the common point might be a good idea, but in case aio requests we
> also have to mane completions which out of the scope of
> blk_co_p(read|write)v:

I don't understand what you mean here (possibly because I fail to
understand the word "mane") and what completions have to do with
queueing of requests.

Just to clarify, we are talking about the following situation, right?
bdrv_drain_all_begin() has returned, so all the old requests have
already been drained and their completion callback has already been
called. For any new requests that come in, we need to queue them until
the drained section ends. In other words, they won't reach the point
where they could possibly complete before .drained_end.

> static void blk_aio_write_entry(void *opaque) {
>     ...
>     rwco->ret = blk_co_pwritev(...);
> 
>     blk_aio_complete(acb);
>     ...
> }
> 
> This makes the difference.
> I would suggest adding waiting until "drained_end" is done on the
> synchronous read/write at blk_prw

It is possible, but then the management becomes a bit more complicated
because you have more than just a list of Coroutines that you need to
wake up.

One thing that could be problematic in blk_co_preadv/pwritev is that
blk->in_flight would count even requests that are queued if we're not
careful. Then a nested drain would deadlock because the BlockBackend
would never say that it's quiesced.

>                               >
> > 3) Within a drained section, you want requests from other users to be
> >     blocked, but not your own ones (essentially you want exclusive
> >     access). We don't have blk_drained_begin/end() yet, so this is not
> >     something to implement right now, but let's keep this requirement in
> >     mind and choose a design that allows this.
> There is an idea to distinguish the requests that should be done without
> respect to "drained section" by using a flag in BdrvRequestFlags. The
> requests with a flag set should be processed anyway.

I don't think that would work because the accesses can be nested quite
deeply in functions that can be called from anywhere.

But possibly all of the interesting cases are directly calling BDS
functions anyway and not BlockBackend.

> > I believe the whole logic should be kept local to BlockBackend, and
> > blk_root_drained_begin/end() should be the functions that start queuing
> > requests or let queued requests resume.
> > 
> > As we are already in coroutine context in blk_co_preadv/pwritev(), after
> > checking that blk->quiesce_counter > 0, we can enter the coroutine
> > object into a list and yield. blk_root_drained_end() calls aio_co_wake()
> > for each of the queued coroutines. This should be all that we need to
> > manage.
> In my understanding by using brdv_drained_begin/end we want to protect a
> certain BlockDriverState from external access but not the whole BlockBackend
> which may involve using a number of BlockDriverState-s.
> I though it because we could possibly change a backing file for some
> BlockDriverState. And for the time of changing we need to prevent external
> access to it but keep the io going.
> By using blk_root_drained_begin/end() we put to "drained section" all the
> BlockDriverState-s linked to that root.
> Does it have to be so?

It's the other way round, actually.

In order for a BDS to be fully drained, it must make sure that it
doesn't get new requests from its parents any more. So drain propagates
towards the parents, not towards the children.

blk_root_drained_begin/end() are functions that are called when
blk->root.bs is drained.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained
  2018-09-12 13:15       ` Kevin Wolf
@ 2018-09-12 14:53         ` Denis Plotnikov
  2018-09-12 15:09           ` Kevin Wolf
  2018-09-12 17:03         ` Denis V. Lunev
  1 sibling, 1 reply; 25+ messages in thread
From: Denis Plotnikov @ 2018-09-12 14:53 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: mreitz, stefanha, den, vsementsov, famz, qemu-stable, qemu-block,
	qemu-devel



On 12.09.2018 16:15, Kevin Wolf wrote:
> Am 12.09.2018 um 14:03 hat Denis Plotnikov geschrieben:
>> On 10.09.2018 15:41, Kevin Wolf wrote:
>>> Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:
>>>> Fixes the problem of ide request appearing when the BDS is in
>>>> the "drained section".
>>>>
>>>> Without the patch the request can come and be processed by the main
>>>> event loop, as the ide requests are processed by the main event loop
>>>> and the main event loop doesn't stop when its context is in the
>>>> "drained section".
>>>> The request execution is postponed until the end of "drained section".
>>>>
>>>> The patch doesn't modify ide specific code, as well as any other
>>>> device code. Instead, it modifies the infrastructure of asynchronous
>>>> Block Backend requests, in favor of postponing the requests arisen
>>>> when in "drained section" to remove the possibility of request appearing
>>>> for all the infrastructure clients.
>>>>
>>>> This approach doesn't make vCPU processing the request wait untill
>>>> the end of request processing.
>>>>
>>>> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
>>>
>>> I generally agree with the idea that requests should be queued during a
>>> drained section. However, I think there are a few fundamental problems
>>> with the implementation in this series:
>>>
>>> 1) aio_disable_external() is already a layering violation and we'd like
>>>      to get rid of it (by replacing it with a BlockDevOps callback from
>>>      BlockBackend to the devices), so adding more functionality there
>>>      feels like a step in the wrong direction.
>>>
>>> 2) Only blk_aio_* are fixed, while we also have synchronous public
>>>      interfaces (blk_pread/pwrite) as well as coroutine-based ones
>>>      (blk_co_*). They need to be postponed as well.
>> Good point! Thanks!
>>>
>>>      blk_co_preadv/pwritev() are the common point in the call chain for
>>>      all of these variants, so this is where the fix needs to live.
>> Using the common point might be a good idea, but in case aio requests we
>> also have to mane completions which out of the scope of
>> blk_co_p(read|write)v:
> 
> I don't understand what you mean here (possibly because I fail to
> understand the word "mane") and what completions have to do with
mane = make
> queueing of requests.
> 
> Just to clarify, we are talking about the following situation, right?
> bdrv_drain_all_begin() has returned, so all the old requests have
> already been drained and their completion callback has already been
> called. For any new requests that come in, we need to queue them until
> the drained section ends. In other words, they won't reach the point
> where they could possibly complete before .drained_end.
Yes

To make it clear: I'm trying to defend the idea that putting the 
postponing routine in blk_co_preadv/pwritev is not the best choice and 
that's why:

If I understood your idea correctly, if we do the postponing inside
blk_co_p(write|read)v we don't know whether we do synchronous or 
asynchronous request.
We need to know this because if we postpone an async request then, 
later, on the postponed requests processing, we must to make "a 
completion" for that request stating that it's finally "done".

Furthermore, for sync requests if we postpone them, we must block the 
clients issued them until the requests postponed have been processed on 
drained section leaving. This would ask an additional notification 
mechanism. Instead, we can just check whether we could proceed in 
blk_p(write|read) and if not (we're in drained) to wait there.

We avoid the things above if we postponing in blk_aio_prwv and waiting 
in blk_prw without postponing.

What do you think?

> 
>> static void blk_aio_write_entry(void *opaque) {
>>      ...
>>      rwco->ret = blk_co_pwritev(...);
>>
>>      blk_aio_complete(acb);
>>      ...
>> }
>>
>> This makes the difference.
>> I would suggest adding waiting until "drained_end" is done on the
>> synchronous read/write at blk_prw
> 
> It is possible, but then the management becomes a bit more complicated
> because you have more than just a list of Coroutines that you need to
> wake up.
> 
> One thing that could be problematic in blk_co_preadv/pwritev is that
> blk->in_flight would count even requests that are queued if we're not
> careful. Then a nested drain would deadlock because the BlockBackend
> would never say that it's quiesced.
> 
>>                                >
>>> 3) Within a drained section, you want requests from other users to be
>>>      blocked, but not your own ones (essentially you want exclusive
>>>      access). We don't have blk_drained_begin/end() yet, so this is not
>>>      something to implement right now, but let's keep this requirement in
>>>      mind and choose a design that allows this.
>> There is an idea to distinguish the requests that should be done without
>> respect to "drained section" by using a flag in BdrvRequestFlags. The
>> requests with a flag set should be processed anyway.
> 
> I don't think that would work because the accesses can be nested quite
> deeply in functions that can be called from anywhere.
> 
> But possibly all of the interesting cases are directly calling BDS
> functions anyway and not BlockBackend.
I hope it's so but what If not, fixing everywhere?
> 
>>> I believe the whole logic should be kept local to BlockBackend, and
>>> blk_root_drained_begin/end() should be the functions that start queuing
>>> requests or let queued requests resume.
>>>
>>> As we are already in coroutine context in blk_co_preadv/pwritev(), after
>>> checking that blk->quiesce_counter > 0, we can enter the coroutine
>>> object into a list and yield. blk_root_drained_end() calls aio_co_wake()
>>> for each of the queued coroutines. This should be all that we need to
>>> manage.
>> In my understanding by using brdv_drained_begin/end we want to protect a
>> certain BlockDriverState from external access but not the whole BlockBackend
>> which may involve using a number of BlockDriverState-s.
>> I though it because we could possibly change a backing file for some
>> BlockDriverState. And for the time of changing we need to prevent external
>> access to it but keep the io going.
>> By using blk_root_drained_begin/end() we put to "drained section" all the
>> BlockDriverState-s linked to that root.
>> Does it have to be so?
> 
> It's the other way round, actually.
> 
> In order for a BDS to be fully drained, it must make sure that it
> doesn't get new requests from its parents any more. So drain propagates
> towards the parents, not towards the children.
> 
> blk_root_drained_begin/end() are functions that are called when
> blk->root.bs is drained.
Make sense. Now I understand.

Denis
> 
> Kevin
> 

-- 
Best,
Denis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained
  2018-09-12 14:53         ` Denis Plotnikov
@ 2018-09-12 15:09           ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2018-09-12 15:09 UTC (permalink / raw)
  To: Denis Plotnikov
  Cc: mreitz, stefanha, den, vsementsov, famz, qemu-stable, qemu-block,
	qemu-devel

Am 12.09.2018 um 16:53 hat Denis Plotnikov geschrieben:
> On 12.09.2018 16:15, Kevin Wolf wrote:
> > Am 12.09.2018 um 14:03 hat Denis Plotnikov geschrieben:
> > > On 10.09.2018 15:41, Kevin Wolf wrote:
> > > > Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:
> > > > > Fixes the problem of ide request appearing when the BDS is in
> > > > > the "drained section".
> > > > > 
> > > > > Without the patch the request can come and be processed by the main
> > > > > event loop, as the ide requests are processed by the main event loop
> > > > > and the main event loop doesn't stop when its context is in the
> > > > > "drained section".
> > > > > The request execution is postponed until the end of "drained section".
> > > > > 
> > > > > The patch doesn't modify ide specific code, as well as any other
> > > > > device code. Instead, it modifies the infrastructure of asynchronous
> > > > > Block Backend requests, in favor of postponing the requests arisen
> > > > > when in "drained section" to remove the possibility of request appearing
> > > > > for all the infrastructure clients.
> > > > > 
> > > > > This approach doesn't make vCPU processing the request wait untill
> > > > > the end of request processing.
> > > > > 
> > > > > Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> > > > 
> > > > I generally agree with the idea that requests should be queued during a
> > > > drained section. However, I think there are a few fundamental problems
> > > > with the implementation in this series:
> > > > 
> > > > 1) aio_disable_external() is already a layering violation and we'd like
> > > >      to get rid of it (by replacing it with a BlockDevOps callback from
> > > >      BlockBackend to the devices), so adding more functionality there
> > > >      feels like a step in the wrong direction.
> > > > 
> > > > 2) Only blk_aio_* are fixed, while we also have synchronous public
> > > >      interfaces (blk_pread/pwrite) as well as coroutine-based ones
> > > >      (blk_co_*). They need to be postponed as well.
> > > Good point! Thanks!
> > > > 
> > > >      blk_co_preadv/pwritev() are the common point in the call chain for
> > > >      all of these variants, so this is where the fix needs to live.
> > > Using the common point might be a good idea, but in case aio requests we
> > > also have to mane completions which out of the scope of
> > > blk_co_p(read|write)v:
> > 
> > I don't understand what you mean here (possibly because I fail to
> > understand the word "mane") and what completions have to do with
> mane = make
> > queueing of requests.
> > 
> > Just to clarify, we are talking about the following situation, right?
> > bdrv_drain_all_begin() has returned, so all the old requests have
> > already been drained and their completion callback has already been
> > called. For any new requests that come in, we need to queue them until
> > the drained section ends. In other words, they won't reach the point
> > where they could possibly complete before .drained_end.
> Yes
> 
> To make it clear: I'm trying to defend the idea that putting the postponing
> routine in blk_co_preadv/pwritev is not the best choice and that's why:
> 
> If I understood your idea correctly, if we do the postponing inside
> blk_co_p(write|read)v we don't know whether we do synchronous or
> asynchronous request.
> We need to know this because if we postpone an async request then, later, on
> the postponed requests processing, we must to make "a completion" for that
> request stating that it's finally "done".

Yes, for AIO requests, the completion callback must be called
eventually. This is not different between normal and postponed requests,
though. This is why blk_aio_read/write_entry() call blk_aio_complete()
before they return. This call will be made for postponed requests, too,
so there is nothing that you would need to do additionally inside
blk_co_preadv/pwritev().

> Furthermore, for sync requests if we postpone them, we must block the
> clients issued them until the requests postponed have been processed on
> drained section leaving. This would ask an additional notification
> mechanism. Instead, we can just check whether we could proceed in
> blk_p(write|read) and if not (we're in drained) to wait there.

Again, this is the same for normal requests. The BDRV_POLL_WHILE() in
blk_prw() already implements the waiting. You don't need another
mechanism.

> We avoid the things above if we postponing in blk_aio_prwv and waiting in
> blk_prw without postponing.
> 
> What do you think?
> 
> > 
> > > static void blk_aio_write_entry(void *opaque) {
> > >      ...
> > >      rwco->ret = blk_co_pwritev(...);
> > > 
> > >      blk_aio_complete(acb);
> > >      ...
> > > }
> > > 
> > > This makes the difference.
> > > I would suggest adding waiting until "drained_end" is done on the
> > > synchronous read/write at blk_prw
> > 
> > It is possible, but then the management becomes a bit more complicated
> > because you have more than just a list of Coroutines that you need to
> > wake up.
> > 
> > One thing that could be problematic in blk_co_preadv/pwritev is that
> > blk->in_flight would count even requests that are queued if we're not
> > careful. Then a nested drain would deadlock because the BlockBackend
> > would never say that it's quiesced.
> > 
> > >                                >
> > > > 3) Within a drained section, you want requests from other users to be
> > > >      blocked, but not your own ones (essentially you want exclusive
> > > >      access). We don't have blk_drained_begin/end() yet, so this is not
> > > >      something to implement right now, but let's keep this requirement in
> > > >      mind and choose a design that allows this.
> > > There is an idea to distinguish the requests that should be done without
> > > respect to "drained section" by using a flag in BdrvRequestFlags. The
> > > requests with a flag set should be processed anyway.
> > 
> > I don't think that would work because the accesses can be nested quite
> > deeply in functions that can be called from anywhere.
> > 
> > But possibly all of the interesting cases are directly calling BDS
> > functions anyway and not BlockBackend.
> I hope it's so but what If not, fixing everywhere?

If you keep things local to the BlockBackend (instead of involving the
AioContext), you can block requests for all other BlockBackends, but
still allow them on the BlockBackend whose user requested draining
(i.e. exclusive access).

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained
  2018-09-12 13:15       ` Kevin Wolf
  2018-09-12 14:53         ` Denis Plotnikov
@ 2018-09-12 17:03         ` Denis V. Lunev
  2018-09-13  8:44           ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Denis V. Lunev @ 2018-09-12 17:03 UTC (permalink / raw)
  To: Kevin Wolf, Denis Plotnikov
  Cc: mreitz, stefanha, vsementsov, famz, qemu-stable, qemu-block, qemu-devel

On 09/12/2018 04:15 PM, Kevin Wolf wrote:
> Am 12.09.2018 um 14:03 hat Denis Plotnikov geschrieben:
>> On 10.09.2018 15:41, Kevin Wolf wrote:
>>> Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:
>>>> Fixes the problem of ide request appearing when the BDS is in
>>>> the "drained section".
>>>>
>>>> Without the patch the request can come and be processed by the main
>>>> event loop, as the ide requests are processed by the main event loop
>>>> and the main event loop doesn't stop when its context is in the
>>>> "drained section".
>>>> The request execution is postponed until the end of "drained section".
>>>>
>>>> The patch doesn't modify ide specific code, as well as any other
>>>> device code. Instead, it modifies the infrastructure of asynchronous
>>>> Block Backend requests, in favor of postponing the requests arisen
>>>> when in "drained section" to remove the possibility of request appearing
>>>> for all the infrastructure clients.
>>>>
>>>> This approach doesn't make vCPU processing the request wait untill
>>>> the end of request processing.
>>>>
>>>> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
>>> I generally agree with the idea that requests should be queued during a
>>> drained section. However, I think there are a few fundamental problems
>>> with the implementation in this series:
>>>
>>> 1) aio_disable_external() is already a layering violation and we'd like
>>>     to get rid of it (by replacing it with a BlockDevOps callback from
>>>     BlockBackend to the devices), so adding more functionality there
>>>     feels like a step in the wrong direction.
>>>
>>> 2) Only blk_aio_* are fixed, while we also have synchronous public
>>>     interfaces (blk_pread/pwrite) as well as coroutine-based ones
>>>     (blk_co_*). They need to be postponed as well.
>> Good point! Thanks!

Should we really prohibit all public interfaces, as they are reused
inside block
level?

There is also a problem which is not stated in the clear words yet.
We have potential deadlock in the code under the following
conditions, which should be also taken into the consideration.

<path from the controller>
bdrv_co_pwritev
    bdrv_inc_in_flight
    bdrv_aligned_pwritev
        notifier_list_with_return_notify
             backup_before_write_notify
                 backup_do_cow
                     backup_cow_with_bounce_buffer
                         blk_co_preadv

Here blk_co_preadv() must finish its work before we
will release the notifier and finish request initiated
from the controller and which has incremented
in-fligh counter.

Thus we should differentiate requests initiated at the
controller level and requests initiated in the block layer.
This is sad but true.

The idea to touch only these interfaces was to avoid
interference with block jobs code. It is revealed that
the approach is a mistake and we should have a
segregation by request kinds. Thus the idea of the
flag for use in the controller code should not be that
awful.


>>>     blk_co_preadv/pwritev() are the common point in the call chain for
>>>     all of these variants, so this is where the fix needs to live.
>> Using the common point might be a good idea, but in case aio requests we
>> also have to mane completions which out of the scope of
>> blk_co_p(read|write)v:
> I don't understand what you mean here (possibly because I fail to
> understand the word "mane") and what completions have to do with
> queueing of requests.
>
> Just to clarify, we are talking about the following situation, right?
> bdrv_drain_all_begin() has returned, so all the old requests have
> already been drained and their completion callback has already been
> called. For any new requests that come in, we need to queue them until
> the drained section ends. In other words, they won't reach the point
> where they could possibly complete before .drained_end.

Such requests should not reach the point once they will start to
execute EXCEPT notifiers. There is a big problem with synchronous
which can queue new requests and that requests are to be finished.

Den

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained
  2018-09-12 17:03         ` Denis V. Lunev
@ 2018-09-13  8:44           ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2018-09-13  8:44 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Denis Plotnikov, mreitz, stefanha, vsementsov, famz, qemu-stable,
	qemu-block, qemu-devel

Am 12.09.2018 um 19:03 hat Denis V. Lunev geschrieben:
> On 09/12/2018 04:15 PM, Kevin Wolf wrote:
> > Am 12.09.2018 um 14:03 hat Denis Plotnikov geschrieben:
> >> On 10.09.2018 15:41, Kevin Wolf wrote:
> >>> Am 29.06.2018 um 14:40 hat Denis Plotnikov geschrieben:
> >>>> Fixes the problem of ide request appearing when the BDS is in
> >>>> the "drained section".
> >>>>
> >>>> Without the patch the request can come and be processed by the main
> >>>> event loop, as the ide requests are processed by the main event loop
> >>>> and the main event loop doesn't stop when its context is in the
> >>>> "drained section".
> >>>> The request execution is postponed until the end of "drained section".
> >>>>
> >>>> The patch doesn't modify ide specific code, as well as any other
> >>>> device code. Instead, it modifies the infrastructure of asynchronous
> >>>> Block Backend requests, in favor of postponing the requests arisen
> >>>> when in "drained section" to remove the possibility of request appearing
> >>>> for all the infrastructure clients.
> >>>>
> >>>> This approach doesn't make vCPU processing the request wait untill
> >>>> the end of request processing.
> >>>>
> >>>> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> >>> I generally agree with the idea that requests should be queued during a
> >>> drained section. However, I think there are a few fundamental problems
> >>> with the implementation in this series:
> >>>
> >>> 1) aio_disable_external() is already a layering violation and we'd like
> >>>     to get rid of it (by replacing it with a BlockDevOps callback from
> >>>     BlockBackend to the devices), so adding more functionality there
> >>>     feels like a step in the wrong direction.
> >>>
> >>> 2) Only blk_aio_* are fixed, while we also have synchronous public
> >>>     interfaces (blk_pread/pwrite) as well as coroutine-based ones
> >>>     (blk_co_*). They need to be postponed as well.
> >> Good point! Thanks!
> 
> Should we really prohibit all public interfaces, as they are reused
> inside block level?

We need to fix that. blk_*() should never be called from inside the BDS
layer.

> There is also a problem which is not stated in the clear words yet.
> We have potential deadlock in the code under the following
> conditions, which should be also taken into the consideration.
> 
> <path from the controller>
> bdrv_co_pwritev
>     bdrv_inc_in_flight
>     bdrv_aligned_pwritev
>         notifier_list_with_return_notify
>              backup_before_write_notify
>                  backup_do_cow
>                      backup_cow_with_bounce_buffer
>                          blk_co_preadv
> 
> Here blk_co_preadv() must finish its work before we will release the
> notifier and finish request initiated from the controller and which
> has incremented in-fligh counter.

Yes, before_write notifiers are evil. I've objected to them since the
day they were introduced and I'm surprised it's becoming a problem only
now.

We should probably change the backup job to insert a job node rather
sooner than later. Then it doesn't need to call blk_*() any more.

> Thus we should differentiate requests initiated at the controller
> level and requests initiated in the block layer.  This is sad but
> true.

The difference is supposed to be whether a request goes through a
BlockBackend or not.

Kevin

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-09-13  8:45 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-29 12:40 [Qemu-devel] [PATCH v0 0/2] Postponed actions Denis Plotnikov
2018-06-29 12:40 ` [Qemu-devel] [PATCH v0 1/2] async: add infrastructure for postponed actions Denis Plotnikov
2018-06-29 12:40 ` [Qemu-devel] [PATCH v0 2/2] block: postpone the coroutine executing if the BDS's is drained Denis Plotnikov
2018-09-10 12:41   ` Kevin Wolf
2018-09-12 12:03     ` Denis Plotnikov
2018-09-12 13:15       ` Kevin Wolf
2018-09-12 14:53         ` Denis Plotnikov
2018-09-12 15:09           ` Kevin Wolf
2018-09-12 17:03         ` Denis V. Lunev
2018-09-13  8:44           ` Kevin Wolf
2018-07-02  1:47 ` [Qemu-devel] [PATCH v0 0/2] Postponed actions no-reply
2018-07-02 15:18 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-07-17 10:31   ` Stefan Hajnoczi
2018-07-16 15:01 ` [Qemu-devel] " Denis Plotnikov
2018-07-16 18:59   ` [Qemu-devel] [Qemu-block] " John Snow
2018-07-18  7:53     ` Denis Plotnikov
2018-08-13  8:32     ` Denis Plotnikov
2018-08-13 16:30       ` Kevin Wolf
2018-08-14  7:08         ` Denis Plotnikov
2018-08-20  7:40           ` Denis Plotnikov
2018-08-20  7:42           ` Denis Plotnikov
2018-08-27  7:05           ` Denis Plotnikov
2018-08-27 16:05             ` John Snow
2018-08-28 10:23               ` Denis Plotnikov
2018-09-10 10:11                 ` Denis Plotnikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.