All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH RFC 0/3] Backup/blockjob fixes
@ 2016-07-27 10:49 Vladimir Sementsov-Ogievskiy
  2016-07-27 10:49 ` [Qemu-devel] [PATCH 1/3] blockjob: fix dead pointer in txn list Vladimir Sementsov-Ogievskiy
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-07-27 10:49 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: vsementsov, stefanha, famz, mreitz, jcody, kwolf, den

Hi all!

There are some fixes, related to previously posted test
    [PATCH] backup: block-job error BUG

I'm not sure that proposed solution is a true way to fix the problem,
but it just works for me.

Vladimir Sementsov-Ogievskiy (3):
  blockjob: fix dead pointer in txn list
  backup: fix transaction fail scenario
  blockjob: fix transaction cancel

 block/backup.c           | 12 ++++++++++++
 blockdev.c               | 13 +++++++++++++
 blockjob.c               | 34 ++++++++++++++++++++++++++++++++++
 include/block/blockjob.h | 10 ++++++++++
 4 files changed, 69 insertions(+)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] [PATCH 1/3] blockjob: fix dead pointer in txn list
  2016-07-27 10:49 [Qemu-devel] [PATCH RFC 0/3] Backup/blockjob fixes Vladimir Sementsov-Ogievskiy
@ 2016-07-27 10:49 ` Vladimir Sementsov-Ogievskiy
  2016-08-01 22:39   ` [Qemu-devel] [Qemu-block] " John Snow
  2016-07-27 10:49 ` [Qemu-devel] [PATCH 2/3] backup: fix transaction fail scenario Vladimir Sementsov-Ogievskiy
  2016-07-27 10:49 ` [Qemu-devel] [PATCH 3/3] blockjob: fix transaction cancel Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 7+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-07-27 10:49 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: vsementsov, stefanha, famz, mreitz, jcody, kwolf, den

Job may be freed in block_job_unref and in this case this would break
transaction QLIST.

Fix this by removing job from this list before unref.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockjob.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/blockjob.c b/blockjob.c
index a5ba3be..e045091 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -216,6 +216,7 @@ static void block_job_completed_single(BlockJob *job)
     }
     job->cb(job->opaque, job->ret);
     if (job->txn) {
+        QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
     }
     block_job_unref(job);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Qemu-devel] [PATCH 2/3] backup: fix transaction fail scenario
  2016-07-27 10:49 [Qemu-devel] [PATCH RFC 0/3] Backup/blockjob fixes Vladimir Sementsov-Ogievskiy
  2016-07-27 10:49 ` [Qemu-devel] [PATCH 1/3] blockjob: fix dead pointer in txn list Vladimir Sementsov-Ogievskiy
@ 2016-07-27 10:49 ` Vladimir Sementsov-Ogievskiy
  2016-07-27 10:49 ` [Qemu-devel] [PATCH 3/3] blockjob: fix transaction cancel Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 7+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-07-27 10:49 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: vsementsov, stefanha, famz, mreitz, jcody, kwolf, den

When there are several backups in one transaction, successed block job
must wait for possible cancelling by other block job (if it fails).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c           | 12 ++++++++++++
 blockdev.c               |  2 ++
 blockjob.c               | 33 +++++++++++++++++++++++++++++++++
 include/block/blockjob.h | 10 ++++++++++
 4 files changed, 57 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 2c05323..9198a40 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -467,6 +467,18 @@ static void coroutine_fn backup_run(void *opaque)
     qemu_co_rwlock_unlock(&job->flush_rwlock);
     g_free(job->done_bitmap);
 
+    if (ret == 0) {
+        job->common.success = true;
+        while (!block_job_txn_all_success(&job->common)) {
+            if (block_job_is_cancelled(&job->common)) {
+                break;
+            }
+
+            block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, SLICE_TIME);
+        }
+    }
+
+
     bdrv_op_unblock_all(blk_bs(target), job->common.blocker);
 
     data = g_malloc(sizeof(*data));
diff --git a/blockdev.c b/blockdev.c
index 384ad3b..59ae9e4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2250,6 +2250,8 @@ void qmp_transaction(TransactionActionList *dev_list,
         }
     }
 
+    block_job_txn_set_prepared(block_job_txn);
+
     QSIMPLEQ_FOREACH(state, &snap_bdrv_states, entry) {
         if (state->ops->commit) {
             state->ops->commit(state);
diff --git a/blockjob.c b/blockjob.c
index e045091..a85844d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -49,6 +49,8 @@ struct BlockJobTxn {
 
     /* Reference count */
     int refcnt;
+
+    bool prepared;
 };
 
 static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
@@ -222,6 +224,28 @@ static void block_job_completed_single(BlockJob *job)
     block_job_unref(job);
 }
 
+bool block_job_txn_all_success(BlockJob *job)
+{
+    BlockJobTxn *txn = job->txn;
+    BlockJob *other_job;
+
+    if (txn == NULL) {
+        return job->success;
+    }
+
+    if (!txn->prepared) {
+        return false;
+    }
+
+    QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
+        if (!(other_job->success)) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static void block_job_completed_txn_abort(BlockJob *job)
 {
     AioContext *ctx;
@@ -662,3 +686,12 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
     QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
     block_job_txn_ref(txn);
 }
+
+void block_job_txn_set_prepared(BlockJobTxn *txn)
+{
+    if (!txn) {
+        return;
+    }
+
+    txn->prepared = true;
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 4ddb4ae..41f56b4 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -156,6 +156,13 @@ struct BlockJob {
     bool ready;
 
     /**
+     * All work is done with success, job is ready to be completed
+     * successfully, but waiting for cancelling by other job from transaction,
+     * if it (other job) fails.
+     */
+    bool success;
+
+    /**
      * Set to true when the job has deferred work to the main loop.
      */
     bool deferred_to_main_loop;
@@ -504,4 +511,7 @@ void block_job_txn_unref(BlockJobTxn *txn);
  */
 void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job);
 
+void block_job_txn_set_prepared(BlockJobTxn *txn);
+bool block_job_txn_all_success(BlockJob *job);
+
 #endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Qemu-devel] [PATCH 3/3] blockjob: fix transaction cancel
  2016-07-27 10:49 [Qemu-devel] [PATCH RFC 0/3] Backup/blockjob fixes Vladimir Sementsov-Ogievskiy
  2016-07-27 10:49 ` [Qemu-devel] [PATCH 1/3] blockjob: fix dead pointer in txn list Vladimir Sementsov-Ogievskiy
  2016-07-27 10:49 ` [Qemu-devel] [PATCH 2/3] backup: fix transaction fail scenario Vladimir Sementsov-Ogievskiy
@ 2016-07-27 10:49 ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 7+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-07-27 10:49 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: vsementsov, stefanha, famz, mreitz, jcody, kwolf, den

Prevent a situation, when some jobs from transaction are already
finished and user tries to cancel a job from this transaction.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 59ae9e4..d1818c2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3772,6 +3772,17 @@ void qmp_block_job_cancel(const char *device,
         goto out;
     }
 
+    if (block_job_txn_all_success(job)) {
+        if (job->txn != NULL) {
+            error_setg(errp, "All block jobs in transaction for device '%s' are"
+                             "already successed", device);
+        } else {
+            error_setg(errp, "The block job for device '%s' is"
+                             "already successed", device);
+        }
+        goto out;
+    }
+
     trace_qmp_block_job_cancel(job);
     block_job_cancel(job);
 out:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/3] blockjob: fix dead pointer in txn list
  2016-07-27 10:49 ` [Qemu-devel] [PATCH 1/3] blockjob: fix dead pointer in txn list Vladimir Sementsov-Ogievskiy
@ 2016-08-01 22:39   ` John Snow
  2016-08-02 11:05     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 7+ messages in thread
From: John Snow @ 2016-08-01 22:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, famz, mreitz, stefanha, den

On 07/27/2016 06:49 AM, Vladimir Sementsov-Ogievskiy wrote:
> Job may be freed in block_job_unref and in this case this would break
> transaction QLIST.
>
> Fix this by removing job from this list before unref.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  blockjob.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/blockjob.c b/blockjob.c
> index a5ba3be..e045091 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -216,6 +216,7 @@ static void block_job_completed_single(BlockJob *job)
>      }
>      job->cb(job->opaque, job->ret);
>      if (job->txn) {
> +        QLIST_REMOVE(job, txn_list);
>          block_job_txn_unref(job->txn);
>      }
>      block_job_unref(job);
>

Has this caused actual problems for you?

This function is only ever called in a transactional context if the 
transaction is over -- so we're not likely to use the pointers ever 
again anyway.

Still, it's good practice, and the caller uses a safe iteration of the 
list, so I think this should be safe.

But I don't think this SHOULD fix an actual bug. If it does, I think 
something else is wrong.

Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/3] blockjob: fix dead pointer in txn list
  2016-08-01 22:39   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2016-08-02 11:05     ` Vladimir Sementsov-Ogievskiy
  2016-08-02 16:50       ` John Snow
  0 siblings, 1 reply; 7+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-08-02 11:05 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: kwolf, famz, mreitz, stefanha, den

On 02.08.2016 01:39, John Snow wrote:
> On 07/27/2016 06:49 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Job may be freed in block_job_unref and in this case this would break
>> transaction QLIST.
>>
>> Fix this by removing job from this list before unref.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  blockjob.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index a5ba3be..e045091 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -216,6 +216,7 @@ static void block_job_completed_single(BlockJob 
>> *job)
>>      }
>>      job->cb(job->opaque, job->ret);
>>      if (job->txn) {
>> +        QLIST_REMOVE(job, txn_list);
>>          block_job_txn_unref(job->txn);
>>      }
>>      block_job_unref(job);
>>
>
> Has this caused actual problems for you?

Yes, with the same changed test 124 (my parallel thread). Backup job can 
finish  too early (if dirty bitmap is empty) and then we use this 
transaction job list with dead pointer.

>
> This function is only ever called in a transactional context if the 
> transaction is over -- so we're not likely to use the pointers ever 
> again anyway.

Backup job may finish even earlier than all jobs are added to the list. 
(same case, empty dirty bitmap for one of drives).

>
> Still, it's good practice, and the caller uses a safe iteration of the 
> list, so I think this should be safe.
>
> But I don't think this SHOULD fix an actual bug. If it does, I think 
> something else is wrong.
>
> Tested-by: John Snow <jsnow@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/3] blockjob: fix dead pointer in txn list
  2016-08-02 11:05     ` Vladimir Sementsov-Ogievskiy
@ 2016-08-02 16:50       ` John Snow
  0 siblings, 0 replies; 7+ messages in thread
From: John Snow @ 2016-08-02 16:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, den, famz, stefanha, mreitz



On 08/02/2016 07:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.08.2016 01:39, John Snow wrote:
>> On 07/27/2016 06:49 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Job may be freed in block_job_unref and in this case this would break
>>> transaction QLIST.
>>>
>>> Fix this by removing job from this list before unref.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>  blockjob.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/blockjob.c b/blockjob.c
>>> index a5ba3be..e045091 100644
>>> --- a/blockjob.c
>>> +++ b/blockjob.c
>>> @@ -216,6 +216,7 @@ static void block_job_completed_single(BlockJob
>>> *job)
>>>      }
>>>      job->cb(job->opaque, job->ret);
>>>      if (job->txn) {
>>> +        QLIST_REMOVE(job, txn_list);
>>>          block_job_txn_unref(job->txn);
>>>      }
>>>      block_job_unref(job);
>>>
>>
>> Has this caused actual problems for you?
>
> Yes, with the same changed test 124 (my parallel thread). Backup job can
> finish  too early (if dirty bitmap is empty) and then we use this
> transaction job list with dead pointer.
>
>>
>> This function is only ever called in a transactional context if the
>> transaction is over -- so we're not likely to use the pointers ever
>> again anyway.
>
> Backup job may finish even earlier than all jobs are added to the list.
> (same case, empty dirty bitmap for one of drives).
>

AHA, I get it now.

I think the right solution will be a general mechanism at the 
transactional level, not backup-specific hacks, but thank you for 
explaining this to me.

>>
>> Still, it's good practice, and the caller uses a safe iteration of the
>> list, so I think this should be safe.
>>
>> But I don't think this SHOULD fix an actual bug. If it does, I think
>> something else is wrong.
>>
>> Tested-by: John Snow <jsnow@redhat.com>
>> Reviewed-by: John Snow <jsnow@redhat.com>
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-08-02 16:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-27 10:49 [Qemu-devel] [PATCH RFC 0/3] Backup/blockjob fixes Vladimir Sementsov-Ogievskiy
2016-07-27 10:49 ` [Qemu-devel] [PATCH 1/3] blockjob: fix dead pointer in txn list Vladimir Sementsov-Ogievskiy
2016-08-01 22:39   ` [Qemu-devel] [Qemu-block] " John Snow
2016-08-02 11:05     ` Vladimir Sementsov-Ogievskiy
2016-08-02 16:50       ` John Snow
2016-07-27 10:49 ` [Qemu-devel] [PATCH 2/3] backup: fix transaction fail scenario Vladimir Sementsov-Ogievskiy
2016-07-27 10:49 ` [Qemu-devel] [PATCH 3/3] blockjob: fix transaction cancel Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.