* [Qemu-devel] [PATCH 0/2] fix image fleecing
@ 2018-07-03 18:07 Vladimir Sementsov-Ogievskiy
  2018-07-03 18:07 ` [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
  2018-07-03 18:07 ` [Qemu-devel] [PATCH 2/2] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-03 18:07 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: famz, stefanha, mreitz, kwolf, jcody, eblake, jsnow, den, vsementsov

Hi all.

This is a continuation of the discussion under
"[PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing" [1].

Here is my attempt to implement Kevin's idea that all backup writes (in
the fleecing case) should be serialized. (However, I have skipped fixing
the related permissions for now.)

It looks like these patches can replace patch [1] and make the fleecing
scheme safe, but I am not sure; a review from Kevin is needed.

A test is still needed to prove that this patch is necessary and that it
works.

Vladimir Sementsov-Ogievskiy (2):
  block: add BDRV_REQ_SERIALISING flag
  block/backup: fix fleecing scheme: use serialized writes

 include/block/block.h |  5 ++++-
 block/backup.c        | 21 ++++++++++++++++++---
 block/io.c            |  4 ++++
 3 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.11.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag
  2018-07-03 18:07 [Qemu-devel] [PATCH 0/2] fix image fleecing Vladimir Sementsov-Ogievskiy
@ 2018-07-03 18:07 ` Vladimir Sementsov-Ogievskiy
  2018-07-04 14:44   ` Vladimir Sementsov-Ogievskiy
  2018-07-03 18:07 ` [Qemu-devel] [PATCH 2/2] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-03 18:07 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: famz, stefanha, mreitz, kwolf, jcody, eblake, jsnow, den, vsementsov

Serialized writes should be used in the copy-on-write path of
backup(sync=none) for the image fleecing scheme.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block.h | 5 ++++-
 block/io.c            | 4 ++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/block/block.h b/include/block/block.h
index e5c7759a0c..107113aad5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -58,8 +58,11 @@ typedef enum {
      * content. */
     BDRV_REQ_WRITE_UNCHANGED    = 0x40,
 
+    /* Force request serializing. Only for writes. */
+    BDRV_REQ_SERIALISING        = 0x80,
+
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x7f,
+    BDRV_REQ_MASK               = 0xff,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/block/io.c b/block/io.c
index 1a2272fad3..d5ba078514 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1572,6 +1572,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
                                    align);
 
+    if (flags & BDRV_REQ_SERIALISING) {
+        mark_request_serialising(req, bdrv_get_cluster_size(bs));
+    }
+
     waited = wait_serialising_requests(req);
     assert(!waited || !req->serialising);
     assert(req->overlap_offset <= offset);
-- 
2.11.1


* [Qemu-devel] [PATCH 2/2] block/backup: fix fleecing scheme: use serialized writes
  2018-07-03 18:07 [Qemu-devel] [PATCH 0/2] fix image fleecing Vladimir Sementsov-Ogievskiy
  2018-07-03 18:07 ` [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
@ 2018-07-03 18:07 ` Vladimir Sementsov-Ogievskiy
  2018-07-04 10:39   ` Kevin Wolf
  1 sibling, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-03 18:07 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: famz, stefanha, mreitz, kwolf, jcody, eblake, jsnow, den, vsementsov

The fleecing scheme works as follows: we want a kind of temporary
snapshot of the active drive A. We create a temporary image B, with
B->backing = A. Then we start backup(sync=none) from A to B. From this
point on, B reads as a point-in-time snapshot of A (A continues to be
the active drive, accepting guest I/O).

This scheme needs some additional synchronization between reads from B
and backup COW operations; otherwise the following situation is
theoretically possible:

(assume B is qcow2 and the client is an NBD client reading from B)

1. the client starts reading, takes the qcow2 mutex in qcow2_co_preadv,
   and gets as far as L2 table loading (assume a cache miss)

2. guest write => backup COW => qcow2 write =>
   tries to take the qcow2 mutex => waits

3. the L2 table is loaded; we see that the cluster is UNALLOCATED, go to
   "case QCOW2_CLUSTER_UNALLOCATED", and unlock the mutex before
   bdrv_co_preadv(bs->backing, ...)

4. with the mutex unlocked, the backup COW continues, and we finally
   finish the guest write and change the cluster in our active disk A

5. only now do we call bdrv_co_preadv(bs->backing, ...) and read the
   _new, updated_ data

To avoid this, make all COW writes serializing, so that they cannot
intersect with reads from B.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 81895ddbe2..ab46a7d43d 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,6 +47,8 @@ typedef struct BackupBlockJob {
     HBitmap *copy_bitmap;
     bool use_copy_range;
     int64_t copy_range_size;
+
+    bool fleecing;
 } BackupBlockJob;
 
 static const BlockJobDriver backup_job_driver;
@@ -102,6 +104,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     QEMUIOVector qiov;
     BlockBackend *blk = job->common.blk;
     int nbytes;
+    int sflags = job->fleecing ? BDRV_REQ_SERIALISING : 0;
 
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);
     nbytes = MIN(job->cluster_size, job->len - start);
@@ -124,11 +127,12 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
 
     if (qemu_iovec_is_zero(&qiov)) {
         ret = blk_co_pwrite_zeroes(job->target, start,
-                                   qiov.size, BDRV_REQ_MAY_UNMAP);
+                                   qiov.size, sflags | BDRV_REQ_MAY_UNMAP);
     } else {
         ret = blk_co_pwritev(job->target, start,
                              qiov.size, &qiov,
-                             job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+                             job->compress ? BDRV_REQ_WRITE_COMPRESSED :
+                                             sflags);
     }
     if (ret < 0) {
         trace_backup_do_cow_write_fail(job, start, ret);
@@ -614,6 +618,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     BlockDriverInfo bdi;
     BackupBlockJob *job = NULL;
     int ret;
+    bool fleecing;
 
     assert(bs);
     assert(target);
@@ -668,6 +673,15 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         return NULL;
     }
 
+    /* Detect image fleecing */
+    fleecing = sync_mode == MIRROR_SYNC_MODE_NONE && target->backing->bs == bs;
+    if (fleecing) {
+        if (compress) {
+            error_setg(errp, "Image fleecing doesn't support compressed mode.");
+            return NULL;
+        }
+    }
+
     len = bdrv_getlength(bs);
     if (len < 0) {
         error_setg_errno(errp, -len, "unable to get length for '%s'",
@@ -700,6 +714,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
                        sync_bitmap : NULL;
     job->compress = compress;
+    job->fleecing = fleecing;
 
     /* If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
@@ -727,7 +742,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     } else {
         job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
     }
-    job->use_copy_range = true;
+    job->use_copy_range = !fleecing;
     job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
                                         blk_get_max_transfer(job->target));
     job->copy_range_size = MAX(job->cluster_size,
-- 
2.11.1


* Re: [Qemu-devel] [PATCH 2/2] block/backup: fix fleecing scheme: use serialized writes
  2018-07-03 18:07 ` [Qemu-devel] [PATCH 2/2] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
@ 2018-07-04 10:39   ` Kevin Wolf
  2018-07-04 12:31     ` Vladimir Sementsov-Ogievskiy
  2018-07-04 13:20     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 13+ messages in thread
From: Kevin Wolf @ 2018-07-04 10:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

Am 03.07.2018 um 20:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Fleecing scheme works as follows: we want a kind of temporary snapshot
> of active drive A. We create temporary image B, with B->backing = A.
> Then we start backup(sync=none) from A to B. From this point, B reads
> as point-in-time snapshot of A (A continues to be active drive,
> accepting guest IO).
> 
> This scheme needs some additional synchronization between reads from B
> and backup COW operations, otherwise, the following situation is
> theoretically possible:
> 
> (assume B is qcow2, client is NBD client, reading from B)
> 
> 1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
>    goes up to l2 table loading (assume cache miss)
> 
> 2) guest write => backup COW => qcow2 write =>
>    try to take qcow2 mutex => waiting
> 
> 3. l2 table loaded, we see that cluster is UNALLOCATED, go to
>    "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
>    bdrv_co_preadv(bs->backing, ...)
> 
> 4) aha, mutex unlocked, backup COW continues, and we finally finish
>    guest write and change cluster in our active disk A
> 
> 5. actually, do bdrv_co_preadv(bs->backing, ...) and read
>    _new updated_ data.
> 
> To avoid this, let's make all COW writes serializing, to not intersect
> with reads from B.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

I think this should work, even though we can still improve it a bit.

First, I don't think we need to disable copy offloading as long as we
add BDRV_REQ_SERIALISING support to bdrv_co_copy_range_internal(). The
thing that's a bit strange there is that there is only a single flags
parameter for both source and target. We may need to split that so
that we can pass BDRV_REQ_NO_SERIALISING for the source and
BDRV_REQ_SERIALISING for the target.

> +    /* Detect image fleecing */
> +    fleecing = sync_mode == MIRROR_SYNC_MODE_NONE && target->backing->bs == bs;

The other thing is that I'm not convinced of the condition here. This is
the narrowest thinkable condition to recognise the exact fleecing setup
we have in mind right now. However, it is not the condition to flag all
setups that are affected by the same problem.

I don't think sync_mode actually makes a meaningful difference here. If
someone wants to create a full backup and already access it while the
job is still in progress, they will still want it to be consistent.

It also doesn't make a difference whether the fleecing overlay has the
source directly attached as a backing node or whether there is a filter
node between them.

So while I'm not sure whether it really covers all interesting cases,
this condition would already be a lot better:

    fleecing = bdrv_chain_contains(target, bs);

Maybe rename the variable to serialise_target_writes because it's no
longer restricted to the exact fleecing case.

> +    if (fleecing) {
> +        if (compress) {
> +            error_setg(errp, "Image fleecing doesn't support compressed mode.");
> +            return NULL;
> +        }
> +    }

Why not fleecing && compress instead of a nested if? And why is
fleecing even at odds with compression?

Kevin


* Re: [Qemu-devel] [PATCH 2/2] block/backup: fix fleecing scheme: use serialized writes
  2018-07-04 10:39   ` Kevin Wolf
@ 2018-07-04 12:31     ` Vladimir Sementsov-Ogievskiy
  2018-07-04 13:20     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-04 12:31 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

04.07.2018 13:39, Kevin Wolf wrote:
> Am 03.07.2018 um 20:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Fleecing scheme works as follows: we want a kind of temporary snapshot
>> of active drive A. We create temporary image B, with B->backing = A.
>> Then we start backup(sync=none) from A to B. From this point, B reads
>> as point-in-time snapshot of A (A continues to be active drive,
>> accepting guest IO).
>>
>> This scheme needs some additional synchronization between reads from B
>> and backup COW operations, otherwise, the following situation is
>> theoretically possible:
>>
>> (assume B is qcow2, client is NBD client, reading from B)
>>
>> 1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
>>     goes up to l2 table loading (assume cache miss)
>>
>> 2) guest write => backup COW => qcow2 write =>
>>     try to take qcow2 mutex => waiting
>>
>> 3. l2 table loaded, we see that cluster is UNALLOCATED, go to
>>     "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
>>     bdrv_co_preadv(bs->backing, ...)
>>
>> 4) aha, mutex unlocked, backup COW continues, and we finally finish
>>     guest write and change cluster in our active disk A
>>
>> 5. actually, do bdrv_co_preadv(bs->backing, ...) and read
>>     _new updated_ data.
>>
>> To avoid this, let's make all COW writes serializing, to not intersect
>> with reads from B.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> I think this should work, even though we can still improve it a bit.
>
> First, I don't think we need to disable copy offloading as long as we
> add BDRV_REQ_SERIALISING support to bdrv_co_copy_range_internal(). The
> thing that's a bit strange there is that there is only a single flags
> parameter for both source and target. We may need to split that so
> that we can pass BDRV_REQ_NO_SERIALISING for the source and
> BDRV_REQ_SERIALISING for the target.

Ok

>
>> +    /* Detect image fleecing */
>> +    fleecing = sync_mode == MIRROR_SYNC_MODE_NONE && target->backing->bs == bs;
> The other thing is that I'm not convinced of the condition here. This is
> the narrowest thinkable condition to recognise the exact fleecing setup
> we have in mind right now. However, it is not the condition to flag all
> setups that are affected by the same problem.
>
> I don't think sync_mode actually makes a meaningful difference here. If
> someone wants to create a full backup and already access it while the
> job is still in progress, they will still want it to be consistent.
>
> It also doesn't make a difference whether the fleecing overlay has the
> source directly attached as a backing node or whether there is a filter
> node between them.
>
> So while I'm not sure whether it really covers all interesting cases,
> this condition would already be a lot better:
>
>      fleecing = bdrv_chain_contains(target, bs);
>
> Maybe rename the variable to serialise_target_writes because it's no
> longer restricted to the exact fleecing case.

Ok

>
>> +    if (fleecing) {
>> +        if (compress) {
>> +            error_setg(errp, "Image fleecing doesn't support compressed mode.");
>> +            return NULL;
>> +        }
>> +    }
> Why not fleecing && compress instead of a nested if?

hmm, really strange

>   And why is
> fleecing even at odds with compression?

I thought that compressed writes went some other way; now I see that
they don't. I will drop the restriction.

>
> Kevin

Thank you for the review, I'll resend soon.

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH 2/2] block/backup: fix fleecing scheme: use serialized writes
  2018-07-04 10:39   ` Kevin Wolf
  2018-07-04 12:31     ` Vladimir Sementsov-Ogievskiy
@ 2018-07-04 13:20     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-04 13:20 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

04.07.2018 13:39, Kevin Wolf wrote:
> Am 03.07.2018 um 20:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Fleecing scheme works as follows: we want a kind of temporary snapshot
>> of active drive A. We create temporary image B, with B->backing = A.
>> Then we start backup(sync=none) from A to B. From this point, B reads
>> as point-in-time snapshot of A (A continues to be active drive,
>> accepting guest IO).
>>
>> This scheme needs some additional synchronization between reads from B
>> and backup COW operations, otherwise, the following situation is
>> theoretically possible:
>>
>> (assume B is qcow2, client is NBD client, reading from B)
>>
>> 1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and
>>     goes up to l2 table loading (assume cache miss)
>>
>> 2) guest write => backup COW => qcow2 write =>
>>     try to take qcow2 mutex => waiting
>>
>> 3. l2 table loaded, we see that cluster is UNALLOCATED, go to
>>     "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
>>     bdrv_co_preadv(bs->backing, ...)
>>
>> 4) aha, mutex unlocked, backup COW continues, and we finally finish
>>     guest write and change cluster in our active disk A
>>
>> 5. actually, do bdrv_co_preadv(bs->backing, ...) and read
>>     _new updated_ data.
>>
>> To avoid this, let's make all COW writes serializing, to not intersect
>> with reads from B.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> I think this should work, even though we can still improve it a bit.
>
> First, I don't think we need to disable copy offloading as long as we
> add BDRV_REQ_SERIALISING support to bdrv_co_copy_range_internal(). The
> thing that's a bit strange there is that there is only a single flags
> parameter for both source and target. We may need to split that so
> that we can pass BDRV_REQ_NO_SERIALISING for the source and
> BDRV_REQ_SERIALISING for the target.

Here is an interesting point: why does copy_range skip waiting for
serializing requests on the write side? I think it should only affect
the read side, shouldn't it? Is it a bug?

Another question: I don't see where request alignment is handled in the
copy_range path, as it is in bdrv_co_pwritev/preadv.

>
>> +    /* Detect image fleecing */
>> +    fleecing = sync_mode == MIRROR_SYNC_MODE_NONE && target->backing->bs == bs;
> The other thing is that I'm not convinced of the condition here. This is
> the narrowest thinkable condition to recognise the exact fleecing setup
> we have in mind right now. However, it is not the condition to flag all
> setups that are affected by the same problem.
>
> I don't think sync_mode actually makes a meaningful difference here. If
> someone wants to create a full backup and already access it while the
> job is still in progress, they will still want it to be consistent.
>
> It also doesn't make a difference whether the fleecing overlay has the
> source directly attached as a backing node or whether there is a filter
> node between them.
>
> So while I'm not sure whether it really covers all interesting cases,
> this condition would already be a lot better:
>
>      fleecing = bdrv_chain_contains(target, bs);
>
> Maybe rename the variable to serialise_target_writes because it's no
> longer restricted to the exact fleecing case.
>
>> +    if (fleecing) {
>> +        if (compress) {
>> +            error_setg(errp, "Image fleecing doesn't support compressed mode.");
>> +            return NULL;
>> +        }
>> +    }
> Why not fleecing && compress instead of a nested if? And why is
> fleecing even at odds with compression?
>
> Kevin


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag
  2018-07-03 18:07 ` [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
@ 2018-07-04 14:44   ` Vladimir Sementsov-Ogievskiy
  2018-07-04 15:08     ` Kevin Wolf
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-04 14:44 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: famz, stefanha, mreitz, kwolf, jcody, eblake, jsnow, den

03.07.2018 21:07, Vladimir Sementsov-Ogievskiy wrote:
> Serialized writes should be used in copy-on-write of backup(sync=none)
> for image fleecing scheme.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   include/block/block.h | 5 ++++-
>   block/io.c            | 4 ++++
>   2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/include/block/block.h b/include/block/block.h
> index e5c7759a0c..107113aad5 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -58,8 +58,11 @@ typedef enum {
>        * content. */
>       BDRV_REQ_WRITE_UNCHANGED    = 0x40,
>   
> +    /* Force request serializing. Only for writes. */
> +    BDRV_REQ_SERIALISING        = 0x80,
> +
>       /* Mask of valid flags */
> -    BDRV_REQ_MASK               = 0x7f,
> +    BDRV_REQ_MASK               = 0xff,
>   } BdrvRequestFlags;
>   
>   typedef struct BlockSizes {
> diff --git a/block/io.c b/block/io.c
> index 1a2272fad3..d5ba078514 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1572,6 +1572,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>       max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
>                                      align);
>   
> +    if (flags & BDRV_REQ_SERIALISING) {
> +        mark_request_serialising(req, bdrv_get_cluster_size(bs));
> +    }
> +
>       waited = wait_serialising_requests(req);
>       assert(!waited || !req->serialising);

Kevin, what about this assertion, introduced in 28de2dcd88de ("block:
Assert serialisation assumptions in pwritev")? Won't it fail with the
fleecing scheme? I'm afraid it will, when we wait for a client read
with our request, which was marked serializing a moment ago...

Can we just switch it to assert(!waited || !req->partial);, setting
req->partial in bdrv_co_pwritev for the parts of unaligned requests,
and allow the new flag only for aligned requests?

Other ideas?

>       assert(req->overlap_offset <= offset);


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag
  2018-07-04 14:44   ` Vladimir Sementsov-Ogievskiy
@ 2018-07-04 15:08     ` Kevin Wolf
  2018-07-04 16:11       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2018-07-04 15:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

Am 04.07.2018 um 16:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 03.07.2018 21:07, Vladimir Sementsov-Ogievskiy wrote:
> > Serialized writes should be used in copy-on-write of backup(sync=none)
> > for image fleecing scheme.
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > ---
> >   include/block/block.h | 5 ++++-
> >   block/io.c            | 4 ++++
> >   2 files changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/block/block.h b/include/block/block.h
> > index e5c7759a0c..107113aad5 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -58,8 +58,11 @@ typedef enum {
> >        * content. */
> >       BDRV_REQ_WRITE_UNCHANGED    = 0x40,
> > +    /* Force request serializing. Only for writes. */
> > +    BDRV_REQ_SERIALISING        = 0x80,
> > +
> >       /* Mask of valid flags */
> > -    BDRV_REQ_MASK               = 0x7f,
> > +    BDRV_REQ_MASK               = 0xff,
> >   } BdrvRequestFlags;
> >   typedef struct BlockSizes {
> > diff --git a/block/io.c b/block/io.c
> > index 1a2272fad3..d5ba078514 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -1572,6 +1572,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
> >       max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
> >                                      align);
> > +    if (flags & BDRV_REQ_SERIALISING) {
> > +        mark_request_serialising(req, bdrv_get_cluster_size(bs));
> > +    }
> > +
> >       waited = wait_serialising_requests(req);
> >       assert(!waited || !req->serialising);
> 
> Kevin, about this assertion, introduced in 28de2dcd88de "block: Assert
> serialisation assumptions in pwritev"? Will not it fail with fleecing
> scheme? I'm afraid it will, when we will wait for client read with our
> request, marked serializing a moment ago...

Hm, yes, it looks like it will.

> Can we just switch it to assert(!waited || !req->partial);, setting
> req->partial in bdrv_co_pwritev for parts of unaligned requests? And allow
> new flag only for aligned requests?
> 
> Other ideas?

The commit message of 28de2dcd88de tells you what we need to do (and
that just changing the assertion is wrong):

    If a request calls wait_serialising_requests() and actually has to wait
    in this function (i.e. a coroutine yield), other requests can run and
    previously read data (like the head or tail buffer) could become
    outdated. In this case, we would have to restart from the beginning to
    read in the updated data.

    However, we're lucky and don't actually need to do that: A request can
    only wait in the first call of wait_serialising_requests() because we
    mark it as serialising before that call, so any later requests would
    wait. So as we don't wait in practice, we don't have to reload the data.

    This is an important assumption that may not be broken or data
    corruption will happen. Document it with some assertions.

So we may need to return -EAGAIN here, check that in the caller and
repeat the write request from the very start.

Kevin


* Re: [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag
  2018-07-04 15:08     ` Kevin Wolf
@ 2018-07-04 16:11       ` Vladimir Sementsov-Ogievskiy
  2018-07-04 16:20         ` Kevin Wolf
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-04 16:11 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

04.07.2018 18:08, Kevin Wolf wrote:
> Am 04.07.2018 um 16:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 03.07.2018 21:07, Vladimir Sementsov-Ogievskiy wrote:
>>> Serialized writes should be used in copy-on-write of backup(sync=none)
>>> for image fleecing scheme.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>    include/block/block.h | 5 ++++-
>>>    block/io.c            | 4 ++++
>>>    2 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index e5c7759a0c..107113aad5 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -58,8 +58,11 @@ typedef enum {
>>>         * content. */
>>>        BDRV_REQ_WRITE_UNCHANGED    = 0x40,
>>> +    /* Force request serializing. Only for writes. */
>>> +    BDRV_REQ_SERIALISING        = 0x80,
>>> +
>>>        /* Mask of valid flags */
>>> -    BDRV_REQ_MASK               = 0x7f,
>>> +    BDRV_REQ_MASK               = 0xff,
>>>    } BdrvRequestFlags;
>>>    typedef struct BlockSizes {
>>> diff --git a/block/io.c b/block/io.c
>>> index 1a2272fad3..d5ba078514 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1572,6 +1572,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>>>        max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
>>>                                       align);
>>> +    if (flags & BDRV_REQ_SERIALISING) {
>>> +        mark_request_serialising(req, bdrv_get_cluster_size(bs));
>>> +    }
>>> +
>>>        waited = wait_serialising_requests(req);
>>>        assert(!waited || !req->serialising);
>> Kevin, about this assertion, introduced in 28de2dcd88de "block: Assert
>> serialisation assumptions in pwritev"? Will not it fail with fleecing
>> scheme? I'm afraid it will, when we will wait for client read with our
>> request, marked serializing a moment ago...
> Hm, looks like it yes.
>
>> Can we just switch it to assert(!waited || !req->partial);, setting
>> req->partial in bdrv_co_pwritev for parts of unaligned requests? And allow
>> new flag only for aligned requests?
>>
>> Other ideas?
> The commit message of 28de2dcd88de tells you what we need to do (and
> that just changing the assertion is wrong):
>
>      If a request calls wait_serialising_requests() and actually has to wait
>      in this function (i.e. a coroutine yield), other requests can run and
>      previously read data (like the head or tail buffer) could become
>      outdated. In this case, we would have to restart from the beginning to
>      read in the updated data.
>
>      However, we're lucky and don't actually need to do that: A request can
>      only wait in the first call of wait_serialising_requests() because we
>      mark it as serialising before that call, so any later requests would
>      wait. So as we don't wait in practice, we don't have to reload the data.
>
>      This is an important assumption that may not be broken or data
>      corruption will happen. Document it with some assertions.
>
> So we may need to return -EAGAIN here, check that in the caller and
> repeat the write request from the very start.

But in the case of an aligned request there is no previously read data,
and we can safely continue. And that is actually our case (backup
writes are aligned).

>
> Kevin


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag
  2018-07-04 16:11       ` Vladimir Sementsov-Ogievskiy
@ 2018-07-04 16:20         ` Kevin Wolf
  2018-07-04 16:36           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2018-07-04 16:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

Am 04.07.2018 um 18:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 04.07.2018 18:08, Kevin Wolf wrote:
> > Am 04.07.2018 um 16:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 03.07.2018 21:07, Vladimir Sementsov-Ogievskiy wrote:
> > > > Serialized writes should be used in copy-on-write of backup(sync=none)
> > > > for image fleecing scheme.
> > > > 
> > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > ---
> > > >    include/block/block.h | 5 ++++-
> > > >    block/io.c            | 4 ++++
> > > >    2 files changed, 8 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/include/block/block.h b/include/block/block.h
> > > > index e5c7759a0c..107113aad5 100644
> > > > --- a/include/block/block.h
> > > > +++ b/include/block/block.h
> > > > @@ -58,8 +58,11 @@ typedef enum {
> > > >         * content. */
> > > >        BDRV_REQ_WRITE_UNCHANGED    = 0x40,
> > > > +    /* Force request serializing. Only for writes. */
> > > > +    BDRV_REQ_SERIALISING        = 0x80,
> > > > +
> > > >        /* Mask of valid flags */
> > > > -    BDRV_REQ_MASK               = 0x7f,
> > > > +    BDRV_REQ_MASK               = 0xff,
> > > >    } BdrvRequestFlags;
> > > >    typedef struct BlockSizes {
> > > > diff --git a/block/io.c b/block/io.c
> > > > index 1a2272fad3..d5ba078514 100644
> > > > --- a/block/io.c
> > > > +++ b/block/io.c
> > > > @@ -1572,6 +1572,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
> > > >        max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
> > > >                                       align);
> > > > +    if (flags & BDRV_REQ_SERIALISING) {
> > > > +        mark_request_serialising(req, bdrv_get_cluster_size(bs));
> > > > +    }
> > > > +
> > > >        waited = wait_serialising_requests(req);
> > > >        assert(!waited || !req->serialising);
> > > Kevin, about this assertion, introduced in 28de2dcd88de "block: Assert
> > > serialisation assumptions in pwritev"? Will not it fail with fleecing
> > > scheme? I'm afraid it will, when we will wait for client read with our
> > > request, marked serializing a moment ago...
> > Hm, looks like it yes.
> > 
> > > Can we just switch it to assert(!waited || !req->partial);, setting
> > > req->partial in bdrv_co_pwritev for parts of unaligned requests? And allow
> > > new flag only for aligned requests?
> > > 
> > > Other ideas?
> > The commit message of 28de2dcd88de tells you what we need to do (and
> > that just changing the assertion is wrong):
> > 
> >      If a request calls wait_serialising_requests() and actually has to wait
> >      in this function (i.e. a coroutine yield), other requests can run and
> >      previously read data (like the head or tail buffer) could become
> >      outdated. In this case, we would have to restart from the beginning to
> >      read in the updated data.
> > 
> >      However, we're lucky and don't actually need to do that: A request can
> >      only wait in the first call of wait_serialising_requests() because we
> >      mark it as serialising before that call, so any later requests would
> >      wait. So as we don't wait in practice, we don't have to reload the data.
> > 
> >      This is an important assumption that may not be broken or data
> >      corruption will happen. Document it with some assertions.
> > 
> > So we may need to return -EAGAIN here, check that in the caller and
> > repeat the write request from the very start.
> 
> But in case of aligned request, there no previously read data, and we can
> safely continue. And actually it's our case (backup writes are aligned).

Hm, right. I don't particularly like req->partial because it's easy to
forget to set it to false when you do something that would need to be
repeated, but I don't have a better idea.

Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag
  2018-07-04 16:20         ` Kevin Wolf
@ 2018-07-04 16:36           ` Vladimir Sementsov-Ogievskiy
  2018-07-04 17:06             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-04 16:36 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

04.07.2018 19:20, Kevin Wolf wrote:
> Am 04.07.2018 um 18:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 04.07.2018 18:08, Kevin Wolf wrote:
>>> Am 04.07.2018 um 16:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 03.07.2018 21:07, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Serialized writes should be used in copy-on-write of backup(sync=none)
>>>>> for image fleecing scheme.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>     include/block/block.h | 5 ++++-
>>>>>     block/io.c            | 4 ++++
>>>>>     2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/block/block.h b/include/block/block.h
>>>>> index e5c7759a0c..107113aad5 100644
>>>>> --- a/include/block/block.h
>>>>> +++ b/include/block/block.h
>>>>> @@ -58,8 +58,11 @@ typedef enum {
>>>>>          * content. */
>>>>>         BDRV_REQ_WRITE_UNCHANGED    = 0x40,
>>>>> +    /* Force request serializing. Only for writes. */
>>>>> +    BDRV_REQ_SERIALISING        = 0x80,
>>>>> +
>>>>>         /* Mask of valid flags */
>>>>> -    BDRV_REQ_MASK               = 0x7f,
>>>>> +    BDRV_REQ_MASK               = 0xff,
>>>>>     } BdrvRequestFlags;
>>>>>     typedef struct BlockSizes {
>>>>> diff --git a/block/io.c b/block/io.c
>>>>> index 1a2272fad3..d5ba078514 100644
>>>>> --- a/block/io.c
>>>>> +++ b/block/io.c
>>>>> @@ -1572,6 +1572,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>>>>>         max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
>>>>>                                        align);
>>>>> +    if (flags & BDRV_REQ_SERIALISING) {
>>>>> +        mark_request_serialising(req, bdrv_get_cluster_size(bs));
>>>>> +    }
>>>>> +
>>>>>         waited = wait_serialising_requests(req);
>>>>>         assert(!waited || !req->serialising);
>>>> Kevin, about this assertion, introduced in 28de2dcd88de "block: Assert
>>>> serialisation assumptions in pwritev"? Will not it fail with fleecing
>>>> scheme? I'm afraid it will, when we will wait for client read with our
>>>> request, marked serializing a moment ago...
>>> Hm, looks like it yes.
>>>
>>>> Can we just switch it to assert(!waited || !req->partial);, setting
>>>> req->partial in bdrv_co_pwritev for parts of unaligned requests? And allow
>>>> new flag only for aligned requests?
>>>>
>>>> Other ideas?
>>> The commit message of 28de2dcd88de tells you what we need to do (and
>>> that just changing the assertion is wrong):
>>>
>>>       If a request calls wait_serialising_requests() and actually has to wait
>>>       in this function (i.e. a coroutine yield), other requests can run and
>>>       previously read data (like the head or tail buffer) could become
>>>       outdated. In this case, we would have to restart from the beginning to
>>>       read in the updated data.
>>>
>>>       However, we're lucky and don't actually need to do that: A request can
>>>       only wait in the first call of wait_serialising_requests() because we
>>>       mark it as serialising before that call, so any later requests would
>>>       wait. So as we don't wait in practice, we don't have to reload the data.
>>>
>>>       This is an important assumption that may not be broken or data
>>>       corruption will happen. Document it with some assertions.
>>>
>>> So we may need to return -EAGAIN here, check that in the caller and
>>> repeat the write request from the very start.
>> But in case of aligned request, there no previously read data, and we can
>> safely continue. And actually it's our case (backup writes are aligned).
> Hm, right. I don't particularly like req->partial because it's easy to
> forget to set it to false when you do something that would need to be
> repeated, but I don't have a better idea.
>
> Kevin

I said "partial" because I imagined an unaligned request split into parts 
for separate writing, but this is wrong; req->unaligned sounds better to me now.

So, for aligned requests all is OK.

But for unaligned requests all is OK too, because they are marked serialising 
and waited on at the first call to wait_serialising_requests(), before reading 
the tails and before the place in bdrv_aligned_pwritev() considered above.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag
  2018-07-04 16:36           ` Vladimir Sementsov-Ogievskiy
@ 2018-07-04 17:06             ` Vladimir Sementsov-Ogievskiy
  2018-07-05  8:34               ` Kevin Wolf
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-07-04 17:06 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

04.07.2018 19:36, Vladimir Sementsov-Ogievskiy wrote:
> 04.07.2018 19:20, Kevin Wolf wrote:
>> Am 04.07.2018 um 18:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> 04.07.2018 18:08, Kevin Wolf wrote:
>>>> Am 04.07.2018 um 16:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>> 03.07.2018 21:07, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Serialized writes should be used in copy-on-write of 
>>>>>> backup(sync=none)
>>>>>> for image fleecing scheme.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>>>>> <vsementsov@virtuozzo.com>
>>>>>> ---
>>>>>>     include/block/block.h | 5 ++++-
>>>>>>     block/io.c            | 4 ++++
>>>>>>     2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/block/block.h b/include/block/block.h
>>>>>> index e5c7759a0c..107113aad5 100644
>>>>>> --- a/include/block/block.h
>>>>>> +++ b/include/block/block.h
>>>>>> @@ -58,8 +58,11 @@ typedef enum {
>>>>>>          * content. */
>>>>>>         BDRV_REQ_WRITE_UNCHANGED    = 0x40,
>>>>>> +    /* Force request serializing. Only for writes. */
>>>>>> +    BDRV_REQ_SERIALISING        = 0x80,
>>>>>> +
>>>>>>         /* Mask of valid flags */
>>>>>> -    BDRV_REQ_MASK               = 0x7f,
>>>>>> +    BDRV_REQ_MASK               = 0xff,
>>>>>>     } BdrvRequestFlags;
>>>>>>     typedef struct BlockSizes {
>>>>>> diff --git a/block/io.c b/block/io.c
>>>>>> index 1a2272fad3..d5ba078514 100644
>>>>>> --- a/block/io.c
>>>>>> +++ b/block/io.c
>>>>>> @@ -1572,6 +1572,10 @@ static int coroutine_fn 
>>>>>> bdrv_aligned_pwritev(BdrvChild *child,
>>>>>>         max_transfer = 
>>>>>> QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
>>>>>>                                        align);
>>>>>> +    if (flags & BDRV_REQ_SERIALISING) {
>>>>>> +        mark_request_serialising(req, bdrv_get_cluster_size(bs));
>>>>>> +    }
>>>>>> +
>>>>>>         waited = wait_serialising_requests(req);
>>>>>>         assert(!waited || !req->serialising);
>>>>> Kevin, about this assertion, introduced in 28de2dcd88de "block: 
>>>>> Assert
>>>>> serialisation assumptions in pwritev"? Will not it fail with fleecing
>>>>> scheme? I'm afraid it will, when we will wait for client read with 
>>>>> our
>>>>> request, marked serializing a moment ago...
>>>> Hm, looks like it yes.
>>>>
>>>>> Can we just switch it to assert(!waited || !req->partial);, setting
>>>>> req->partial in bdrv_co_pwritev for parts of unaligned requests? 
>>>>> And allow
>>>>> new flag only for aligned requests?
>>>>>
>>>>> Other ideas?
>>>> The commit message of 28de2dcd88de tells you what we need to do (and
>>>> that just changing the assertion is wrong):
>>>>
>>>>       If a request calls wait_serialising_requests() and actually 
>>>> has to wait
>>>>       in this function (i.e. a coroutine yield), other requests can 
>>>> run and
>>>>       previously read data (like the head or tail buffer) could become
>>>>       outdated. In this case, we would have to restart from the 
>>>> beginning to
>>>>       read in the updated data.
>>>>
>>>>       However, we're lucky and don't actually need to do that: A 
>>>> request can
>>>>       only wait in the first call of wait_serialising_requests() 
>>>> because we
>>>>       mark it as serialising before that call, so any later 
>>>> requests would
>>>>       wait. So as we don't wait in practice, we don't have to 
>>>> reload the data.
>>>>
>>>>       This is an important assumption that may not be broken or data
>>>>       corruption will happen. Document it with some assertions.
>>>>
>>>> So we may need to return -EAGAIN here, check that in the caller and
>>>> repeat the write request from the very start.
>>> But in case of aligned request, there no previously read data, and 
>>> we can
>>> safely continue. And actually it's our case (backup writes are 
>>> aligned).
>> Hm, right. I don't particularly like req->partial because it's easy to
>> forget to set it to false when you do something that would need to be
>> repeated, but I don't have a better idea.
>>
>> Kevin
>
> I said partial, because I imagined unaligned request split to parts 
> for separate writing, but this is wrong, req->unaligned sound better 
> for me now.
>
> So, for aligned requests all is ok.
>
> But for unaligned all is ok too, because they are marked serializing 
> and waited on first call to wait_for_serializing, before reading tails 
> and before considered place in bdrv_aligned_pwritev.
>

Is "serialiSing" correct? Google and Thunderbird both correct me to 
serialiZing.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag
  2018-07-04 17:06             ` Vladimir Sementsov-Ogievskiy
@ 2018-07-05  8:34               ` Kevin Wolf
  0 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2018-07-05  8:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, famz, stefanha, mreitz, jcody, eblake,
	jsnow, den

Am 04.07.2018 um 19:06 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 04.07.2018 19:36, Vladimir Sementsov-Ogievskiy wrote:
> > 04.07.2018 19:20, Kevin Wolf wrote:
> > > Am 04.07.2018 um 18:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > 04.07.2018 18:08, Kevin Wolf wrote:
> > > > > Am 04.07.2018 um 16:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > > 03.07.2018 21:07, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > Serialized writes should be used in copy-on-write of
> > > > > > > backup(sync=none)
> > > > > > > for image fleecing scheme.
> > > > > > > 
> > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy
> > > > > > > <vsementsov@virtuozzo.com>
> > > > > > > ---
> > > > > > >     include/block/block.h | 5 ++++-
> > > > > > >     block/io.c            | 4 ++++
> > > > > > >     2 files changed, 8 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/include/block/block.h b/include/block/block.h
> > > > > > > index e5c7759a0c..107113aad5 100644
> > > > > > > --- a/include/block/block.h
> > > > > > > +++ b/include/block/block.h
> > > > > > > @@ -58,8 +58,11 @@ typedef enum {
> > > > > > >          * content. */
> > > > > > >         BDRV_REQ_WRITE_UNCHANGED    = 0x40,
> > > > > > > +    /* Force request serializing. Only for writes. */
> > > > > > > +    BDRV_REQ_SERIALISING        = 0x80,
> > > > > > > +
> > > > > > >         /* Mask of valid flags */
> > > > > > > -    BDRV_REQ_MASK               = 0x7f,
> > > > > > > +    BDRV_REQ_MASK               = 0xff,
> > > > > > >     } BdrvRequestFlags;
> > > > > > >     typedef struct BlockSizes {
> > > > > > > diff --git a/block/io.c b/block/io.c
> > > > > > > index 1a2272fad3..d5ba078514 100644
> > > > > > > --- a/block/io.c
> > > > > > > +++ b/block/io.c
> > > > > > > @@ -1572,6 +1572,10 @@ static int coroutine_fn
> > > > > > > bdrv_aligned_pwritev(BdrvChild *child,
> > > > > > >         max_transfer =
> > > > > > > QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer,
> > > > > > > INT_MAX),
> > > > > > >                                        align);
> > > > > > > +    if (flags & BDRV_REQ_SERIALISING) {
> > > > > > > +        mark_request_serialising(req, bdrv_get_cluster_size(bs));
> > > > > > > +    }
> > > > > > > +
> > > > > > >         waited = wait_serialising_requests(req);
> > > > > > >         assert(!waited || !req->serialising);
> > > > > > Kevin, about this assertion, introduced in 28de2dcd88de
> > > > > > "block: Assert
> > > > > > serialisation assumptions in pwritev"? Will not it fail with fleecing
> > > > > > scheme? I'm afraid it will, when we will wait for client
> > > > > > read with our
> > > > > > request, marked serializing a moment ago...
> > > > > Hm, looks like it yes.
> > > > > 
> > > > > > Can we just switch it to assert(!waited || !req->partial);, setting
> > > > > > req->partial in bdrv_co_pwritev for parts of unaligned
> > > > > > requests? And allow
> > > > > > new flag only for aligned requests?
> > > > > > 
> > > > > > Other ideas?
> > > > > The commit message of 28de2dcd88de tells you what we need to do (and
> > > > > that just changing the assertion is wrong):
> > > > > 
> > > > >       If a request calls wait_serialising_requests() and
> > > > > actually has to wait
> > > > >       in this function (i.e. a coroutine yield), other
> > > > > requests can run and
> > > > >       previously read data (like the head or tail buffer) could become
> > > > >       outdated. In this case, we would have to restart from
> > > > > the beginning to
> > > > >       read in the updated data.
> > > > > 
> > > > >       However, we're lucky and don't actually need to do
> > > > > that: A request can
> > > > >       only wait in the first call of
> > > > > wait_serialising_requests() because we
> > > > >       mark it as serialising before that call, so any later
> > > > > requests would
> > > > >       wait. So as we don't wait in practice, we don't have
> > > > > to reload the data.
> > > > > 
> > > > >       This is an important assumption that may not be broken or data
> > > > >       corruption will happen. Document it with some assertions.
> > > > > 
> > > > > So we may need to return -EAGAIN here, check that in the caller and
> > > > > repeat the write request from the very start.
> > > > But in case of aligned request, there no previously read data,
> > > > and we can
> > > > safely continue. And actually it's our case (backup writes are
> > > > aligned).
> > > Hm, right. I don't particularly like req->partial because it's easy to
> > > forget to set it to false when you do something that would need to be
> > > repeated, but I don't have a better idea.
> > > 
> > > Kevin
> > 
> > I said partial, because I imagined unaligned request split to parts for
> > separate writing, but this is wrong, req->unaligned sound better for me
> > now.
> > 
> > So, for aligned requests all is ok.
> > 
> > But for unaligned all is ok too, because they are marked serializing and
> > waited on first call to wait_for_serializing, before reading tails and
> > before considered place in bdrv_aligned_pwritev.
> > 
> 
> Is it correct "serialiSing" ? Google and Thunderbird both correcting me to
> serialiZing

Both are correct, it's just British vs. American spelling. In the
context of request serialisation, we seem to use the spelling with S, so
it would be more consistent to stay with it.

Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-07-05  8:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-03 18:07 [Qemu-devel] [PATCH 0/2] fix image fleecing Vladimir Sementsov-Ogievskiy
2018-07-03 18:07 ` [Qemu-devel] [PATCH 1/2] block: add BDRV_REQ_SERIALISING flag Vladimir Sementsov-Ogievskiy
2018-07-04 14:44   ` Vladimir Sementsov-Ogievskiy
2018-07-04 15:08     ` Kevin Wolf
2018-07-04 16:11       ` Vladimir Sementsov-Ogievskiy
2018-07-04 16:20         ` Kevin Wolf
2018-07-04 16:36           ` Vladimir Sementsov-Ogievskiy
2018-07-04 17:06             ` Vladimir Sementsov-Ogievskiy
2018-07-05  8:34               ` Kevin Wolf
2018-07-03 18:07 ` [Qemu-devel] [PATCH 2/2] block/backup: fix fleecing scheme: use serialized writes Vladimir Sementsov-Ogievskiy
2018-07-04 10:39   ` Kevin Wolf
2018-07-04 12:31     ` Vladimir Sementsov-Ogievskiy
2018-07-04 13:20     ` Vladimir Sementsov-Ogievskiy
