* [Qemu-devel] [PATCH 0/4] introduce max_transfer_length
@ 2014-09-05 16:51 Peter Lieven
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 1/4] BlockLimits: " Peter Lieven
                   ` (4 more replies)
  0 siblings, 5 replies; 44+ messages in thread
From: Peter Lieven @ 2014-09-05 16:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, Peter Lieven, mreitz, ronniesahlberg, pbonzini

This series adds the basics for introducing a maximum transfer length
to the block layer. Its main purpose is currently to avoid having a
multiwrite_merge exceed the max_xfer_len of an attached iSCSI LUN.
This is a required bug fix.

The previously discussed reporting of this maximum in the SCSI disk inquiry
emulation of the Block Limits VPD page is not added here, as we do not import
any of the other limits there. This has to be addressed in a separate series.

Peter Lieven (4):
  BlockLimits: introduce max_transfer_length
  block: immediately cancel oversized read/write requests
  block/iscsi: set max_transfer_length
  block: avoid creating oversized writes in multiwrite_merge

 block.c                   |   23 +++++++++++++++++++++++
 block/iscsi.c             |   12 ++++++++++--
 include/block/block_int.h |    3 +++
 3 files changed, 36 insertions(+), 2 deletions(-)

-- 
1.7.9.5


* [Qemu-devel] [PATCH 1/4] BlockLimits: introduce max_transfer_length
  2014-09-05 16:51 [Qemu-devel] [PATCH 0/4] introduce max_transfer_length Peter Lieven
@ 2014-09-05 16:51 ` Peter Lieven
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests Peter Lieven
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2014-09-05 16:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, Peter Lieven, mreitz, ronniesahlberg, pbonzini

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block.c                   |    4 ++++
 include/block/block_int.h |    3 +++
 2 files changed, 7 insertions(+)

diff --git a/block.c b/block.c
index cb670fd..2c4a5de 100644
--- a/block.c
+++ b/block.c
@@ -529,6 +529,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
             return;
         }
         bs->bl.opt_transfer_length = bs->file->bl.opt_transfer_length;
+        bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
         bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
     } else {
         bs->bl.opt_mem_alignment = 512;
@@ -543,6 +544,9 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.opt_transfer_length =
             MAX(bs->bl.opt_transfer_length,
                 bs->backing_hd->bl.opt_transfer_length);
+        bs->bl.max_transfer_length =
+            MIN(bs->bl.max_transfer_length,
+                bs->backing_hd->bl.max_transfer_length);
         bs->bl.opt_mem_alignment =
             MAX(bs->bl.opt_mem_alignment,
                 bs->backing_hd->bl.opt_mem_alignment);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8a61215..e178782 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -288,6 +288,9 @@ typedef struct BlockLimits {
     /* optimal transfer length in sectors */
     int opt_transfer_length;
 
+    /* maximal transfer length in sectors */
+    int max_transfer_length;
+
     /* memory alignment so that no bounce buffer is needed */
     size_t opt_mem_alignment;
 } BlockLimits;
-- 
1.7.9.5

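A note on the semantics here: throughout the series a max_transfer_length of 0
means "no limit" (see the bs->bl.max_transfer_length && guards in patch 2), so
combining a limited and an unlimited layer with a bare MIN() would keep the 0
and drop the real limit. A minimal sketch of a combiner that respects the
0 = unlimited convention (the helper is hypothetical, not part of the series):

    /* Combine two limits where 0 means "unlimited".  A bare MIN() would
     * return 0 whenever either side is 0, losing the other side's limit. */
    static int merge_transfer_limit(int a, int b)
    {
        if (a == 0) {
            return b;
        }
        if (b == 0) {
            return a;
        }
        return MIN(a, b);
    }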

* [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-05 16:51 [Qemu-devel] [PATCH 0/4] introduce max_transfer_length Peter Lieven
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 1/4] BlockLimits: " Peter Lieven
@ 2014-09-05 16:51 ` Peter Lieven
  2014-09-08 13:44   ` Benoît Canet
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 3/4] block/iscsi: set max_transfer_length Peter Lieven
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-05 16:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, Peter Lieven, mreitz, ronniesahlberg, pbonzini

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/block.c b/block.c
index 2c4a5de..fa4c34b 100644
--- a/block.c
+++ b/block.c
@@ -3215,6 +3215,13 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         return -EINVAL;
     }
 
+    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
+        error_report("read of %d sectors at sector %ld exceeds device max"
+                     " transfer length of %d sectors.", nb_sectors, sector_num,
+                     bs->bl.max_transfer_length);
+        return -EINVAL;
+    }
+
     return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
                              nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
 }
@@ -3507,6 +3514,13 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         return -EINVAL;
     }
 
+    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
+        error_report("write of %d sectors at sector %ld exceeds device max"
+                     " transfer length of %d sectors.", nb_sectors, sector_num,
+                     bs->bl.max_transfer_length);
+        return -EINVAL;
+    }
+
     return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
                               nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
 }
-- 
1.7.9.5


* [Qemu-devel] [PATCH 3/4] block/iscsi: set max_transfer_length
  2014-09-05 16:51 [Qemu-devel] [PATCH 0/4] introduce max_transfer_length Peter Lieven
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 1/4] BlockLimits: " Peter Lieven
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests Peter Lieven
@ 2014-09-05 16:51 ` Peter Lieven
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge Peter Lieven
  2014-09-05 17:05 ` [Qemu-devel] [PATCH 0/4] introduce max_transfer_length ronnie sahlberg
  4 siblings, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2014-09-05 16:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, Peter Lieven, mreitz, ronniesahlberg, pbonzini

The limit of 0xffffff for 16-byte CDBs is intentional, to
avoid overflows on 32-bit architectures.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 3e19202..a4b625c 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1455,10 +1455,18 @@ static void iscsi_close(BlockDriverState *bs)
 
 static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-    IscsiLun *iscsilun = bs->opaque;
-
     /* We don't actually refresh here, but just return data queried in
      * iscsi_open(): iscsi targets don't change their limits. */
+
+    IscsiLun *iscsilun = bs->opaque;
+    uint32_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffff : 0xffff;
+
+    if (iscsilun->bl.max_xfer_len) {
+        max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
+    }
+
+    bs->bl.max_transfer_length = sector_lun2qemu(max_xfer_len, iscsilun);
+
     if (iscsilun->lbp.lbpu) {
         if (iscsilun->bl.max_unmap < 0xffffffff) {
             bs->bl.max_discard = sector_lun2qemu(iscsilun->bl.max_unmap,
-- 
1.7.9.5

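A note on the numbers in this patch: READ(10) CDBs carry a 16-bit transfer
length (hence the 0xffff default), while READ(16) CDBs carry a 32-bit one.
max_transfer_length, however, is a signed 32-bit int counted in 512-byte
sectors, and sector_lun2qemu() scales the value up for LUNs with blocks larger
than 512 bytes. A worked example of the overflow the 0xffffff cap avoids
(illustrative numbers only, assuming 4 KiB LUN blocks):

    /* Illustrative arithmetic, assuming a LUN with 4 KiB blocks, so that
     * sector_lun2qemu() multiplies by 4096 / 512 = 8:
     *
     *   full READ(16) range: 0xffffffffULL * 8 = 0x7fffffff8
     *                        -> does not fit in a 32-bit int
     *   capped at 0xffffff:  0x00ffffffULL * 8 = 0x007fffff8
     *                        -> fits comfortably
     */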

* [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-05 16:51 [Qemu-devel] [PATCH 0/4] introduce max_transfer_length Peter Lieven
                   ` (2 preceding siblings ...)
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 3/4] block/iscsi: set max_transfer_length Peter Lieven
@ 2014-09-05 16:51 ` Peter Lieven
  2014-09-18 14:13   ` Paolo Bonzini
  2014-09-05 17:05 ` [Qemu-devel] [PATCH 0/4] introduce max_transfer_length ronnie sahlberg
  4 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-05 16:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, stefanha, Peter Lieven, mreitz, ronniesahlberg, pbonzini

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block.c b/block.c
index fa4c34b..db3f842 100644
--- a/block.c
+++ b/block.c
@@ -4554,6 +4554,11 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
             merge = 0;
         }
 
+        if (bs->bl.max_transfer_length && reqs[outidx].nb_sectors +
+            reqs[i].nb_sectors > bs->bl.max_transfer_length) {
+            merge = 0;
+        }
+
         if (merge) {
             size_t size;
             QEMUIOVector *qiov = g_malloc0(sizeof(*qiov));
-- 
1.7.9.5


* Re: [Qemu-devel] [PATCH 0/4] introduce max_transfer_length
  2014-09-05 16:51 [Qemu-devel] [PATCH 0/4] introduce max_transfer_length Peter Lieven
                   ` (3 preceding siblings ...)
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge Peter Lieven
@ 2014-09-05 17:05 ` ronnie sahlberg
  2014-09-05 19:52   ` Peter Lieven
  4 siblings, 1 reply; 44+ messages in thread
From: ronnie sahlberg @ 2014-09-05 17:05 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, Max Reitz

Looks good to me.

(minor question is just why not let default max be 0xffff for both 10
and 16 CDBs ?)

On Fri, Sep 5, 2014 at 9:51 AM, Peter Lieven <pl@kamp.de> wrote:
> This series adds the basics for introducing a maximum transfer length
> to the block layer. Its main purpose is currently to avoid having a
> multiwrite_merge exceed the max_xfer_len of an attached iSCSI LUN.
> This is a required bug fix.
>
> The previously discussed reporting of this maximum in the SCSI disk inquiry
> emulation of the Block Limits VPD page is not added here, as we do not import
> any of the other limits there. This has to be addressed in a separate series.
>
> Peter Lieven (4):
>   BlockLimits: introduce max_transfer_length
>   block: immediately cancel oversized read/write requests
>   block/iscsi: set max_transfer_length
>   block: avoid creating oversized writes in multiwrite_merge
>
>  block.c                   |   23 +++++++++++++++++++++++
>  block/iscsi.c             |   12 ++++++++++--
>  include/block/block_int.h |    3 +++
>  3 files changed, 36 insertions(+), 2 deletions(-)
>
> --
> 1.7.9.5
>


* Re: [Qemu-devel] [PATCH 0/4] introduce max_transfer_length
  2014-09-05 17:05 ` [Qemu-devel] [PATCH 0/4] introduce max_transfer_length ronnie sahlberg
@ 2014-09-05 19:52   ` Peter Lieven
  2014-09-05 21:22     ` ronnie sahlberg
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-05 19:52 UTC (permalink / raw)
  To: ronnie sahlberg
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, Max Reitz

On 05.09.2014 at 19:05, ronnie sahlberg wrote:
> Looks good to me.
>
> (minor question is just why not let default max be 0xffff for both 10
> and 16 CDBs ?)

You are right. I was looking at the technical limit, but in fact it doesn't
make sense to have different limits. It's ridiculous to say: if you want to
do big I/O then you need a target that's bigger than 2TB ;-)

Peter

>
> On Fri, Sep 5, 2014 at 9:51 AM, Peter Lieven <pl@kamp.de> wrote:
>> This series adds the basics for introducing a maximum transfer length
>> to the block layer. Its main purpose is currently to avoid having a
>> multiwrite_merge exceed the max_xfer_len of an attached iSCSI LUN.
>> This is a required bug fix.
>>
>> The previously discussed reporting of this maximum in the SCSI disk inquiry
>> emulation of the Block Limits VPD page is not added here, as we do not import
>> any of the other limits there. This has to be addressed in a separate series.
>>
>> Peter Lieven (4):
>>   BlockLimits: introduce max_transfer_length
>>   block: immediately cancel oversized read/write requests
>>   block/iscsi: set max_transfer_length
>>   block: avoid creating oversized writes in multiwrite_merge
>>
>>  block.c                   |   23 +++++++++++++++++++++++
>>  block/iscsi.c             |   12 ++++++++++--
>>  include/block/block_int.h |    3 +++
>>  3 files changed, 36 insertions(+), 2 deletions(-)
>>
>> --
>> 1.7.9.5
>>


* Re: [Qemu-devel] [PATCH 0/4] introduce max_transfer_length
  2014-09-05 19:52   ` Peter Lieven
@ 2014-09-05 21:22     ` ronnie sahlberg
  0 siblings, 0 replies; 44+ messages in thread
From: ronnie sahlberg @ 2014-09-05 21:22 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, Max Reitz

Feel free to add a

Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

to the patches.

On Fri, Sep 5, 2014 at 12:52 PM, Peter Lieven <pl@kamp.de> wrote:
> On 05.09.2014 at 19:05, ronnie sahlberg wrote:
>> Looks good to me.
>>
>> (minor question is just why not let default max be 0xffff for both 10
>> and 16 CDBs ?)
>
> You are right. I was looking at the technical limit, but in fact it doesn't
> make sense to have different limits. It's ridiculous to say: if you want to
> do big I/O then you need a target that's bigger than 2TB ;-)
>
> Peter
>
>>
>> On Fri, Sep 5, 2014 at 9:51 AM, Peter Lieven <pl@kamp.de> wrote:
>>> This series adds the basics for introducing a maximum transfer length
>>> to the block layer. Its main purpose is currently to avoid having a
>>> multiwrite_merge exceed the max_xfer_len of an attached iSCSI LUN.
>>> This is a required bug fix.
>>>
>>> The previously discussed reporting of this maximum in the SCSI disk inquiry
>>> emulation of the Block Limits VPD page is not added here, as we do not import
>>> any of the other limits there. This has to be addressed in a separate series.
>>>
>>> Peter Lieven (4):
>>>   BlockLimits: introduce max_transfer_length
>>>   block: immediately cancel oversized read/write requests
>>>   block/iscsi: set max_transfer_length
>>>   block: avoid creating oversized writes in multiwrite_merge
>>>
>>>  block.c                   |   23 +++++++++++++++++++++++
>>>  block/iscsi.c             |   12 ++++++++++--
>>>  include/block/block_int.h |    3 +++
>>>  3 files changed, 36 insertions(+), 2 deletions(-)
>>>
>>> --
>>> 1.7.9.5
>>>
>


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests Peter Lieven
@ 2014-09-08 13:44   ` Benoît Canet
  2014-09-08 13:49     ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Benoît Canet @ 2014-09-08 13:44 UTC (permalink / raw)
  To: Peter Lieven
  Cc: kwolf, stefanha, qemu-devel, mreitz, ronniesahlberg, pbonzini

On Friday 05 Sep 2014 at 18:51:26 (+0200), Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 2c4a5de..fa4c34b 100644
> --- a/block.c
> +++ b/block.c
> @@ -3215,6 +3215,13 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>          return -EINVAL;
>      }
>  
> +    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
> +        error_report("read of %d sectors at sector %ld exceeds device max"
> +                     " transfer length of %d sectors.", nb_sectors, sector_num,
> +                     bs->bl.max_transfer_length);
> +        return -EINVAL;
> +    }
> +
>      return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>                               nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>  }
> @@ -3507,6 +3514,13 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>          return -EINVAL;
>      }
>  
> +    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
> +        error_report("write of %d sectors at sector %ld exceeds device max"
> +                     " transfer length of %d sectors.", nb_sectors, sector_num,
> +                     bs->bl.max_transfer_length);
> +        return -EINVAL;
> +    }
> +
>      return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
>                                nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>  }
> -- 
> 1.7.9.5
> 
> 

Looks like you are changing the coroutine version.

Some hardware like virtio-blk uses the AIO version of reads and writes.
What would happen if all the block drivers down the chain are AIO enabled?

Best regards

Benoît


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 13:44   ` Benoît Canet
@ 2014-09-08 13:49     ` Paolo Bonzini
  2014-09-08 13:56       ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-08 13:49 UTC (permalink / raw)
  To: Benoît Canet, Peter Lieven
  Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

On 08/09/2014 15:44, Benoît Canet wrote:
>> > +    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
>> > +        error_report("read of %d sectors at sector %ld exceeds device max"
>> > +                     " transfer length of %d sectors.", nb_sectors, sector_num,
>> > +                     bs->bl.max_transfer_length);
>> > +        return -EINVAL;
>> > +    }
>> > +
>> >      return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>> >                               nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>> >  }
>> > @@ -3507,6 +3514,13 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>> >          return -EINVAL;
>> >      }
>> >  
>> > +    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
>> > +        error_report("write of %d sectors at sector %ld exceeds device max"
>> > +                     " transfer length of %d sectors.", nb_sectors, sector_num,
>> > +                     bs->bl.max_transfer_length);
>> > +        return -EINVAL;
>> > +    }
>> > +
>> >      return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
>> >                                nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>> >  }
>> > -- 
>> > 1.7.9.5
>> > 
>> > 
> Looks like you are changing the coroutine version.
> 
> Some hardware like virtio-blk uses the AIO version of reads and writes.
> What would happen if all the block drivers down the chain are AIO enabled?

The AIO version still goes through bdrv_co_do_readv/writev.

However, error_report is not something you can use for guest-accessible
error messages, unless you want your logs to fill up with error messages. :)

Paolo

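For readers wondering about the code path Paolo refers to: in block.c of this
period, the AIO entry points spawn a coroutine that ends up in the same
functions the patch touches. Roughly (a paraphrased and abridged sketch, not
the verbatim source):

    BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
                                     QEMUIOVector *qiov, int nb_sectors,
                                     BlockDriverCompletionFunc *cb, void *opaque)
    {
        /* Spawns a coroutine that runs bdrv_co_do_rw(), which calls
         * bdrv_co_do_readv()/bdrv_co_do_writev() -- so the checks added by
         * this patch cover AIO callers such as virtio-blk as well. */
        return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
                                     cb, opaque, false /* is_write */);
    }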

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 13:49     ` Paolo Bonzini
@ 2014-09-08 13:56       ` Peter Lieven
  2014-09-08 13:58         ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-08 13:56 UTC (permalink / raw)
  To: Paolo Bonzini, Benoît Canet
  Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

On 08.09.2014 15:49, Paolo Bonzini wrote:
> On 08/09/2014 15:44, Benoît Canet wrote:
>>>> +    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
>>>> +        error_report("read of %d sectors at sector %ld exceeds device max"
>>>> +                     " transfer length of %d sectors.", nb_sectors, sector_num,
>>>> +                     bs->bl.max_transfer_length);
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>>       return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>>>>                                nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>>>>   }
>>>> @@ -3507,6 +3514,13 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>>>>           return -EINVAL;
>>>>       }
>>>>   
>>>> +    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
>>>> +        error_report("write of %d sectors at sector %ld exceeds device max"
>>>> +                     " transfer length of %d sectors.", nb_sectors, sector_num,
>>>> +                     bs->bl.max_transfer_length);
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>>       return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
>>>>                                 nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
>>>>   }
>>>> -- 
>>>> 1.7.9.5
>>>>
>>>>
>> Looks like you are changing the coroutine version.
>>
>> Some hardware like virtio-blk uses the AIO version of reads and writes.
>> What would happen if all the block drivers down the chain are AIO enabled?
> The AIO version still goes through bdrv_co_do_readv/writev.
>
> However, error_report is not something you can use for guest-accessible
> error messages, unless you want your logs to fill up with error messages. :)

So you would not throw an error msg here?

Peter


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 13:56       ` Peter Lieven
@ 2014-09-08 13:58         ` Paolo Bonzini
  2014-09-08 14:35           ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-08 13:58 UTC (permalink / raw)
  To: Peter Lieven, Benoît Canet
  Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

On 08/09/2014 15:56, Peter Lieven wrote:
>>>>>
>>> Looks like you are changing the coroutine version.
>>>
>>> Some hardware like virtio-blk uses the AIO version of reads and writes.
>>> What would happen if all the block drivers down the chain are AIO
>>> enabled?
>> The AIO version still goes through bdrv_co_do_readv/writev.
>>
>> However, error_report is not something you can use for guest-accessible
>> error messages, unless you want your logs to fill up with error
>> messages. :)
> 
> So you would not throw an error msg here?

No, though a trace could be useful.

Paolo


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 13:58         ` Paolo Bonzini
@ 2014-09-08 14:35           ` Peter Lieven
  2014-09-08 14:42             ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-08 14:35 UTC (permalink / raw)
  To: Paolo Bonzini, Benoît Canet
  Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

On 08.09.2014 15:58, Paolo Bonzini wrote:
> On 08/09/2014 15:56, Peter Lieven wrote:
>>>> Looks like you are changing the coroutine version.
>>>>
>>>> Some hardware like virtio-blk uses the AIO version of reads and writes.
>>>> What would happen if all the block drivers down the chain are AIO
>>>> enabled?
>>> The AIO version still goes through bdrv_co_do_readv/writev.
>>>
>>> However, error_report is not something you can use for guest-accessible
>>> error messages, unless you want your logs to fill up with error
>>> messages. :)
>> So you would not throw an error msg here?
> No, though a trace could be useful.

Is there a howto somewhere on how to implement that?

Then I would send a v2 if there are no other complaints.

What's your opinion on changing the max_xfer_len to 0xffff regardless
of use_16_for_rw in iSCSI?

Peter


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 14:35           ` Peter Lieven
@ 2014-09-08 14:42             ` Paolo Bonzini
  2014-09-08 14:54               ` Peter Lieven
  2014-09-08 15:13               ` ronnie sahlberg
  0 siblings, 2 replies; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-08 14:42 UTC (permalink / raw)
  To: Peter Lieven, Benoît Canet
  Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

On 08/09/2014 16:35, Peter Lieven wrote:
>>>>
>>>> messages. :)
>>> So you would not throw an error msg here?
>> No, though a trace could be useful.
> 
> Is there a howto somewhere on how to implement that?

Try commit 4ac4458076e1aaf3b01a6361154117df20e22215.

> What's your opinion on changing the max_xfer_len to 0xffff regardless
> of use_16_for_rw in iSCSI?

If you implemented request splitting in the block layer, it would be
okay to force max_xfer_len to 0xffff.

Paolo

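For reference, the commit Paolo points to wires up a trace event in the usual
two steps: declare the event in the top-level trace-events file, then call the
generated trace_*() helper from the code. A minimal sketch for this case (the
event name is made up for illustration):

    # trace-events
    bdrv_co_oversized_request(void *bs, int64_t sector_num, int nb_sectors) "bs %p sector_num %"PRId64" nb_sectors %d"

    /* block.c -- needs #include "trace.h"; replaces the error_report() */
    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
        trace_bdrv_co_oversized_request(bs, sector_num, nb_sectors);
        return -EINVAL;
    }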

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 14:42             ` Paolo Bonzini
@ 2014-09-08 14:54               ` Peter Lieven
  2014-09-23  8:47                 ` Kevin Wolf
  2014-09-08 15:13               ` ronnie sahlberg
  1 sibling, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-08 14:54 UTC (permalink / raw)
  To: Paolo Bonzini, Benoît Canet
  Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

On 08.09.2014 16:42, Paolo Bonzini wrote:
> On 08/09/2014 16:35, Peter Lieven wrote:
>>>>> messages. :)
>>>> So you would not throw an error msg here?
>>> No, though a trace could be useful.
>> Is there a howto somewhere on how to implement that?
> Try commit 4ac4458076e1aaf3b01a6361154117df20e22215.

Thanks for the pointer.

>
>> What's your opinion on changing the max_xfer_len to 0xffff regardless
>> of use_16_for_rw in iSCSI?
> If you implemented request splitting in the block layer, it would be
> okay to force max_xfer_len to 0xffff.

Unfortunately, I currently have no time for that. It would involve some
shuffling of qiovs that has to be properly tested.

Regarding iSCSI: in fact, the limit is currently 0xffff for all iSCSI targets < 2TB.
So I thought it's not obvious at all why a > 2TB target should be able to handle bigger requests.

Regarding the root cause of this patch, multiwrite_merge, I still have some thoughts:
  - why are we merging requests for raw (especially host devices and/or iSCSI)?
    The original patch from Kevin was to mitigate a QCOW2 performance regression.
    For iSCSI the qiov concats are destroying all the zerocopy efforts we made.
  - should we only merge requests within the same cluster?
  - why is there no multiread_merge?

Peter

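On the zerocopy remark above: multiwrite_merge glues two requests together by
allocating a fresh QEMUIOVector and concatenating the iovec entries of both
source vectors into it, conceptually like this (a self-contained sketch of the
mechanism, not the exact block.c code):

    /* qemu_iovec_concat() copies iovec *entries*, not payload bytes, so no
     * data is duplicated here -- but the transport below now receives one
     * long scatter/gather list instead of the guest's original requests,
     * which can defeat zerocopy handling further down, as noted above. */
    static QEMUIOVector *merge_two_qiovs(QEMUIOVector *a, QEMUIOVector *b)
    {
        QEMUIOVector *qiov = g_malloc0(sizeof(*qiov));

        qemu_iovec_init(qiov, a->niov + b->niov);
        qemu_iovec_concat(qiov, a, 0, a->size);
        qemu_iovec_concat(qiov, b, 0, b->size);
        return qiov;
    }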

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 14:42             ` Paolo Bonzini
  2014-09-08 14:54               ` Peter Lieven
@ 2014-09-08 15:13               ` ronnie sahlberg
  2014-09-08 15:15                 ` Paolo Bonzini
  2014-09-08 15:16                 ` Peter Lieven
  1 sibling, 2 replies; 44+ messages in thread
From: ronnie sahlberg @ 2014-09-08 15:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benoît Canet, Kevin Wolf, Peter Lieven, qemu-devel,
	Max Reitz, Stefan Hajnoczi

On Mon, Sep 8, 2014 at 7:42 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/09/2014 16:35, Peter Lieven wrote:
>>>>>
>>>>> messages. :)
>>>> So you would not throw an error msg here?
>>> No, though a trace could be useful.
>>
>> Is there a howto somewhere on how to implement that?
>
> Try commit 4ac4458076e1aaf3b01a6361154117df20e22215.
>
>> What's your opinion on changing the max_xfer_len to 0xffff regardless
>> of use_16_for_rw in iSCSI?
>
> If you implemented request splitting in the block layer, it would be
> okay to force max_xfer_len to 0xffff.

I think it should be OK if that is a different series. This only
affects the iSCSI transport since it is the only transport so far to
record or report a max transfer length.
If a guest generates these big requests (i.e. not multiwrite) we
already fail them due to the READ10 limitations. We would just fail
the request at a different layer in the stack with these patches.


What I would also like to see is reporting these limitations to
the guest itself, to prevent it from generating too large I/Os.
I am willing to do that part once the initial max_xfer_len patches are finished.


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 15:13               ` ronnie sahlberg
@ 2014-09-08 15:15                 ` Paolo Bonzini
  2014-09-08 15:18                   ` Peter Lieven
  2014-09-08 15:16                 ` Peter Lieven
  1 sibling, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-08 15:15 UTC (permalink / raw)
  To: ronnie sahlberg
  Cc: Benoît Canet, Kevin Wolf, Peter Lieven, qemu-devel,
	Max Reitz, Stefan Hajnoczi

On 08/09/2014 17:13, ronnie sahlberg wrote:
> 
> What I would also like to see is reporting these limitations to
> the guest itself, to prevent it from generating too large I/Os.

That's difficult because you don't want a backend change (e.g. from
local storage to iSCSI) to change the vital product data in the guest.

That's why we have splitting code for discard, and why we would have to
add it for read/write too.

Paolo


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 15:13               ` ronnie sahlberg
  2014-09-08 15:15                 ` Paolo Bonzini
@ 2014-09-08 15:16                 ` Peter Lieven
  1 sibling, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2014-09-08 15:16 UTC (permalink / raw)
  To: ronnie sahlberg, Paolo Bonzini
  Cc: Benoît Canet, Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On 08.09.2014 17:13, ronnie sahlberg wrote:
> On Mon, Sep 8, 2014 at 7:42 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 08/09/2014 16:35, Peter Lieven wrote:
>>>>>> messages. :)
>>>>> So you would not throw an error msg here?
>>>> No, though a trace could be useful.
>>> Is there a howto somewhere on how to implement that?
>> Try commit 4ac4458076e1aaf3b01a6361154117df20e22215.
>>
>>> What's your opinion on changing the max_xfer_len to 0xffff regardless
>>> of use_16_for_rw in iSCSI?
>> If you implemented request splitting in the block layer, it would be
>> okay to force max_xfer_len to 0xffff.
> I think it should be OK if that is a different series. This only
> affects the iSCSI transport since it is the only transport so far to
> record or report a max transfer length.
> If a guest generates these big requests (i.e. not multiwrite) we
> already fail them due to the READ10 limitations. We would just fail
> the request at a different layer in the stack with these patches.
>
>
> What I would also like to see is reporting these limitations to
> the guest itself, to prevent it from generating too large I/Os.
> I am willing to do that part once the initial max_xfer_len patches are finished.

Yes, I also think this approach is straightforward. We should avoid
big requests at the beginning and not fix them at the end.
I had a look; it shouldn't be too hard. There are default values for
the virtio-scsi HD inquiry emulation which are set if no cmdline values
are specified. It should be possible to feed them from bs->bl if set.

Peter

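For the record, the Block Limits VPD page (0xb0) reports the MAXIMUM TRANSFER
LENGTH in logical blocks at bytes 8-11 (SBC-3). Feeding it from bs->bl could
look roughly like this; the helper and its integration into the scsi-disk
emulation are assumptions, only the field layout follows the spec:

    /* Hypothetical sketch: fill the MAXIMUM TRANSFER LENGTH field of the
     * Block Limits VPD page (0xb0) from the block layer limit.  With
     * 512-byte logical blocks the two units coincide; other block sizes
     * would need a conversion. */
    static void vpd_b0_set_max_xfer_len(uint8_t *outbuf, BlockDriverState *bs)
    {
        uint32_t max_xfer = bs->bl.max_transfer_length;

        if (max_xfer) {
            outbuf[8]  = (max_xfer >> 24) & 0xff;
            outbuf[9]  = (max_xfer >> 16) & 0xff;
            outbuf[10] = (max_xfer >> 8)  & 0xff;
            outbuf[11] = max_xfer & 0xff;
        }
    }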

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 15:15                 ` Paolo Bonzini
@ 2014-09-08 15:18                   ` Peter Lieven
  2014-09-08 15:27                     ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-08 15:18 UTC (permalink / raw)
  To: Paolo Bonzini, ronnie sahlberg
  Cc: Benoît Canet, Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On 08.09.2014 17:15, Paolo Bonzini wrote:
> On 08/09/2014 17:13, ronnie sahlberg wrote:
>> What I would also like to see is reporting these limitations to
>> the guest itself, to prevent it from generating too large I/Os.
> That's difficult because you don't want a backend change (e.g. from
> local storage to iSCSI) to change the vital product data in the guest.

I didn't have that in mind....

>
> That's why we have splitting code for discard, and why we would have to
> add it for read/write too.

Why should a guest generate such big requests? Afaik the reported
limit for e.g. virtio-blk is 1024 sectors (reported through blockdev --getmaxsect /dev/vda).
I think it was only the multiwrite_merge code causing trouble here.

Peter


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 15:18                   ` Peter Lieven
@ 2014-09-08 15:27                     ` Paolo Bonzini
  2014-09-08 16:18                       ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-08 15:27 UTC (permalink / raw)
  To: Peter Lieven, ronnie sahlberg
  Cc: Benoît Canet, Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On 08/09/2014 17:18, Peter Lieven wrote:
>>
>> That's why we have splitting code for discard, and why we would have to
>> add it for read/write too.
> 
> Why should a guest generate such big requests?

When copying data, gparted will try using very large I/O sizes.  Of
course if something breaks it will just use a smaller size, but it would
make performance worse.

I tried now (with local storage, not virtual---but with such large block
sizes it's disk bound anyway, one request can take 0.1 seconds to
execute) and a 2 MB block size is 20% slower than 16 MB block size on
your usual 3.5" rotational SATA disk.

Paolo

> Afaik the reported
> limit for e.g. virtio-blk is 1024 sectors (reported through blockdev
> --getmaxsect /dev/vda).
> I think it was only the multiwrite_merge code causing trouble here.


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 15:27                     ` Paolo Bonzini
@ 2014-09-08 16:18                       ` Peter Lieven
  2014-09-08 16:29                         ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-08 16:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benoît Canet, Kevin Wolf, ronnie sahlberg, qemu-devel,
	Max Reitz, Stefan Hajnoczi



> On 08.09.2014 at 17:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 08/09/2014 17:18, Peter Lieven wrote:
>>> 
>>> That's why we have splitting code for discard, and why we would have to
>>> add it for read/write too.
>> 
>> Why should a guest generate such big requests?
> 
> When copying data, gparted will try using very large I/O sizes.  Of
> course if something breaks it will just use a smaller size, but it would
> make performance worse.
> 
> I tried now (with local storage, not virtual---but with such large block
> sizes it's disk bound anyway, one request can take 0.1 seconds to
> execute) and a 2 MB block size is 20% slower than 16 MB block size on
> your usual 3.5" rotational SATA disk.
> 

Can you share what command exactly you ran for these tests?

I tried myself and found that without multiwrite_merge I was not able to create a request bigger than 0xffff sectors from inside Linux.

Peter

> Paolo
> 
>> Afaik the reported
>> limit for e.g. virtio-blk is 1024 sectors (reported through blockdev
>> --getmaxsect /dev/vda).
>> I think it was only the multiwrite_merge code causing trouble here.
> 
> 


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 16:18                       ` Peter Lieven
@ 2014-09-08 16:29                         ` Paolo Bonzini
  2014-09-12 11:43                           ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-08 16:29 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Benoît Canet, Kevin Wolf, ronnie sahlberg, qemu-devel,
	Max Reitz, Stefan Hajnoczi

On 08/09/2014 18:18, Peter Lieven wrote:
>> > When copying data, gparted will try using very large I/O sizes.  Of
>> > course if something breaks it will just use a smaller size, but it would
>> > make performance worse.
>> > 
>> > I tried now (with local storage, not virtual---but with such large block
>> > sizes it's disk bound anyway, one request can take 0.1 seconds to
>> > execute) and a 2 MB block size is 20% slower than 16 MB block size on
>> > your usual 3.5" rotational SATA disk.
>> > 
> Can you share what command exactly you ran for these tests?
> 
> I tried myself and found that without multiwrite_merge I was not able to create a request bigger than 0xffff sectors from inside Linux.

On a different machine:

$ time dd if=/dev/zero of=test bs=16777216 count=30 oflag=direct
real	0m13.497s
user	0m0.001s
sys	0m0.541s

$ time dd if=/dev/zero of=test2 bs=1048576 count=480 oflag=direct
real	0m15.835s
user	0m0.005s
sys	0m0.770s

The bigger block size is 17% faster; for disk-to-disk copy:

$ time dd if=test of=test3 bs=16777216 count=30 iflag=direct oflag=direct
real	0m26.075s
user	0m0.001s
sys	0m0.678s

$ time dd if=test2 of=test4 bs=1048576 count=480 iflag=direct oflag=direct
real	0m45.210s
user	0m0.005s
sys	0m1.145s

The bigger block size is 73% faster.

Paolo


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 16:29                         ` Paolo Bonzini
@ 2014-09-12 11:43                           ` Peter Lieven
  2014-09-18 14:12                             ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-12 11:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benoît Canet, Kevin Wolf, ronnie sahlberg, qemu-devel,
	Max Reitz, Stefan Hajnoczi

On 08.09.2014 at 18:29, Paolo Bonzini wrote:
> On 08/09/2014 18:18, Peter Lieven wrote:
>>>> When copying data, gparted will try using very large I/O sizes.  Of
>>>> course if something breaks it will just use a smaller size, but it would
>>>> make performance worse.
>>>>
>>>> I tried now (with local storage, not virtual---but with such large block
>>>> sizes it's disk bound anyway, one request can take 0.1 seconds to
>>>> execute) and a 2 MB block size is 20% slower than 16 MB block size on
>>>> your usual 3.5" rotational SATA disk.
>>>>
>> Can you share what command exactly you ran for these tests?
>>
>> I tried myself and found that without multiwrite_merge I was not able to create a request bigger than 0xffff sectors from inside Linux.
> On a different machine:
>
> $ time dd if=/dev/zero of=test bs=16777216 count=30 oflag=direct
> real	0m13.497s
> user	0m0.001s
> sys	0m0.541s
>
> $ time dd if=/dev/zero of=test2 bs=1048576 count=480 oflag=direct
> real	0m15.835s
> user	0m0.005s
> sys	0m0.770s
>
> The bigger block size is 17% faster; for disk-to-disk copy:
>
> $ time dd if=test of=test3 bs=16777216 count=30 iflag=direct oflag=direct
> real	0m26.075s
> user	0m0.001s
> sys	0m0.678s
>
> $ time dd if=test2 of=test4 bs=1048576 count=480 iflag=direct oflag=direct
> real	0m45.210s
> user	0m0.005s
> sys	0m1.145s
>
> The bigger block size is 73% faster.

I can well believe that a 16MB block size is faster than 2MB. That is true for iSCSI as well.

However, I do not see requests of this size coming in via virtio-blk when I ran:

dd if=/dev/zero of=test bs=16777216 count=30 oflag=direct.

As you can see from the multiwrite_merge trace the merging has never been stopped because of
the max_transfer_length. The question is, why are the I/O requests not coming in as specified?

multiwrite_merge: num_reqs 15 -> 1
iscsi_co_writev: sector_num 0 nb_sectors 15360 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 17 -> 1
iscsi_co_writev: sector_num 15360 nb_sectors 17408 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 32768 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 44032 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 63488 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 12 -> 1
iscsi_co_writev: sector_num 65536 nb_sectors 12288 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 18 -> 1
iscsi_co_writev: sector_num 77824 nb_sectors 18432 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 96256 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 12 -> 1
iscsi_co_writev: sector_num 98304 nb_sectors 12288 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 110592 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 1 -> 1
iscsi_co_writev: sector_num 130048 nb_sectors 1024 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 131072 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 142336 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 161792 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 12 -> 1
iscsi_co_writev: sector_num 163840 nb_sectors 12288 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 176128 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 1 -> 1
iscsi_co_writev: sector_num 195584 nb_sectors 1024 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 7 -> 1
iscsi_co_writev: sector_num 196608 nb_sectors 7168 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 18 -> 1
iscsi_co_writev: sector_num 203776 nb_sectors 18432 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 7 -> 1
iscsi_co_writev: sector_num 222208 nb_sectors 7168 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 229376 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 240640 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 260096 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 262144 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 18 -> 1
iscsi_co_writev: sector_num 273408 nb_sectors 18432 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 3 -> 1
iscsi_co_writev: sector_num 291840 nb_sectors 3072 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 12 -> 1
iscsi_co_writev: sector_num 294912 nb_sectors 12288 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 307200 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 1 -> 1
iscsi_co_writev: sector_num 326656 nb_sectors 1024 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 327680 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 338944 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 358400 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 360448 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 371712 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 391168 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 393216 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 14 -> 1
iscsi_co_writev: sector_num 404480 nb_sectors 14336 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 7 -> 1
iscsi_co_writev: sector_num 418816 nb_sectors 7168 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 425984 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 20 -> 1
iscsi_co_writev: sector_num 437248 nb_sectors 20480 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 1 -> 1
iscsi_co_writev: sector_num 457728 nb_sectors 1024 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 458752 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 18 -> 1
iscsi_co_writev: sector_num 470016 nb_sectors 18432 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 3 -> 1
iscsi_co_writev: sector_num 488448 nb_sectors 3072 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 491520 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 502784 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 522240 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 524288 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 18 -> 1
iscsi_co_writev: sector_num 535552 nb_sectors 18432 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 3 -> 1
iscsi_co_writev: sector_num 553984 nb_sectors 3072 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 557056 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 568320 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 587776 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 589824 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 601088 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 620544 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 12 -> 1
iscsi_co_writev: sector_num 622592 nb_sectors 12288 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 18 -> 1
iscsi_co_writev: sector_num 634880 nb_sectors 18432 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 653312 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 655360 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 666624 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 686080 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 688128 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 19 -> 1
iscsi_co_writev: sector_num 699392 nb_sectors 19456 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 718848 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 9 -> 1
iscsi_co_writev: sector_num 720896 nb_sectors 9216 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 18 -> 1
iscsi_co_writev: sector_num 730112 nb_sectors 18432 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 5 -> 1
iscsi_co_writev: sector_num 748544 nb_sectors 5120 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 12 -> 1
iscsi_co_writev: sector_num 753664 nb_sectors 12288 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 7 -> 1
iscsi_co_writev: sector_num 765952 nb_sectors 7168 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 6 -> 1
iscsi_co_writev: sector_num 773120 nb_sectors 6144 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 7 -> 1
iscsi_co_writev: sector_num 779264 nb_sectors 7168 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 786432 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 14 -> 1
iscsi_co_writev: sector_num 797696 nb_sectors 14336 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 5 -> 1
iscsi_co_writev: sector_num 812032 nb_sectors 5120 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 817152 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 8 -> 1
iscsi_co_writev: sector_num 819200 nb_sectors 8192 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 13 -> 1
iscsi_co_writev: sector_num 827392 nb_sectors 13312 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 6 -> 1
iscsi_co_writev: sector_num 840704 nb_sectors 6144 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 5 -> 1
iscsi_co_writev: sector_num 846848 nb_sectors 5120 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 851968 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 14 -> 1
iscsi_co_writev: sector_num 863232 nb_sectors 14336 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 6 -> 1
iscsi_co_writev: sector_num 877568 nb_sectors 6144 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 1 -> 1
iscsi_co_writev: sector_num 883712 nb_sectors 1024 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 884736 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 896000 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 5 -> 1
iscsi_co_writev: sector_num 907264 nb_sectors 5120 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 5 -> 1
iscsi_co_writev: sector_num 912384 nb_sectors 5120 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 11 -> 1
iscsi_co_writev: sector_num 917504 nb_sectors 11264 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 14 -> 1
iscsi_co_writev: sector_num 928768 nb_sectors 14336 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 5 -> 1
iscsi_co_writev: sector_num 943104 nb_sectors 5120 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 2 -> 1
iscsi_co_writev: sector_num 948224 nb_sectors 2048 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 12 -> 1
iscsi_co_writev: sector_num 950272 nb_sectors 12288 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 15 -> 1
iscsi_co_writev: sector_num 962560 nb_sectors 15360 bs->bl.max_transfer_length 65535
multiwrite_merge: num_reqs 5 -> 1
iscsi_co_writev: sector_num 977920 nb_sectors 5120 bs->bl.max_transfer_length 65535

Peter

>
> Paolo


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-12 11:43                           ` Peter Lieven
@ 2014-09-18 14:12                             ` Paolo Bonzini
  2014-09-18 14:16                               ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-18 14:12 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Benoît Canet, Kevin Wolf, ronnie sahlberg, qemu-devel,
	Max Reitz, Stefan Hajnoczi

On 12/09/2014 13:43, Peter Lieven wrote:
> 
> As you can see from the multiwrite_merge trace the merging has never been stopped because of
> the max_transfer_length. The question is, why are the I/O requests not coming in as specified?

I think that's because of the filesystem.  Try writing directly to a device.

Paolo


* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-05 16:51 ` [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge Peter Lieven
@ 2014-09-18 14:13   ` Paolo Bonzini
  2014-09-18 22:56     ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-18 14:13 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel; +Cc: kwolf, ronniesahlberg, stefanha, mreitz

On 05/09/2014 18:51, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block.c b/block.c
> index fa4c34b..db3f842 100644
> --- a/block.c
> +++ b/block.c
> @@ -4554,6 +4554,11 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
>              merge = 0;
>          }
>  
> +        if (bs->bl.max_transfer_length && reqs[outidx].nb_sectors +
> +            reqs[i].nb_sectors > bs->bl.max_transfer_length) {
> +            merge = 0;
> +        }
> +
>          if (merge) {
>              size_t size;
>              QEMUIOVector *qiov = g_malloc0(sizeof(*qiov));
> 

So I think if we treat it just as a hint for multiwrite, we can avoid
writing code to split oversized requests.  They always worked so far, we
can certainly wait until we have a real bugfix.

Can you drop patch 2 and resend the rest?

Thanks,

Paolo

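For context, the request splitting Paolo keeps referring to would mean looping
inside the block layer instead of rejecting oversized requests, along these
lines (a rough sketch under the series' conventions; the function is
hypothetical and error handling is simplified):

    /* Hypothetical sketch: serve an oversized read as a sequence of
     * max_transfer_length-sized pieces, borrowing iovec entries from the
     * caller's vector for each piece. */
    static int coroutine_fn bdrv_co_readv_split(BlockDriverState *bs,
                                                int64_t sector_num,
                                                int nb_sectors,
                                                QEMUIOVector *qiov)
    {
        size_t offset = 0;  /* byte offset into the caller's qiov */

        while (nb_sectors > 0) {
            int n = MIN(nb_sectors, bs->bl.max_transfer_length);
            QEMUIOVector part;
            int ret;

            qemu_iovec_init(&part, qiov->niov);
            qemu_iovec_concat(&part, qiov, offset,
                              (size_t)n << BDRV_SECTOR_BITS);
            ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
                                    n << BDRV_SECTOR_BITS, &part, 0);
            qemu_iovec_destroy(&part);
            if (ret < 0) {
                return ret;
            }
            sector_num += n;
            nb_sectors -= n;
            offset += (size_t)n << BDRV_SECTOR_BITS;
        }
        return 0;
    }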

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-18 14:12                             ` Paolo Bonzini
@ 2014-09-18 14:16                               ` Peter Lieven
  2014-09-18 14:17                                 ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-18 14:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benoît Canet, Kevin Wolf, ronnie sahlberg, qemu-devel,
	Max Reitz, Stefan Hajnoczi

On 18.09.2014 16:12, Paolo Bonzini wrote:
> On 12/09/2014 13:43, Peter Lieven wrote:
>> As you can see from the multiwrite_merge trace the merging has never been stopped because of
>> the max_transfer_length. The question is, why are the I/O requests not coming in as specified?
> I think that's because of the filesystem.  Try writing directly to a device.

that's what I did...


* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-18 14:16                               ` Peter Lieven
@ 2014-09-18 14:17                                 ` Paolo Bonzini
  2014-09-18 22:57                                   ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-18 14:17 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Benoît Canet, Kevin Wolf, ronnie sahlberg, qemu-devel,
	Max Reitz, Stefan Hajnoczi

On 18/09/2014 16:16, Peter Lieven wrote:
>>
>>> As you can see from the multiwrite_merge trace the merging has never
>>> been stopped because of
>>> the max_transfer_length. The question is, why are the I/O requests
>>> not coming in as specified?
>> I think that's because of the filesystem.  Try writing directly to a
>> device.
> 
> that's what I did...

Ah, I saw of=test.

Paolo


* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-18 14:13   ` Paolo Bonzini
@ 2014-09-18 22:56     ` Peter Lieven
  2014-09-19 13:33       ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-18 22:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz


On 18.09.2014 at 16:13, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 05/09/2014 18:51, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> block.c |    5 +++++
>> 1 file changed, 5 insertions(+)
>> 
>> diff --git a/block.c b/block.c
>> index fa4c34b..db3f842 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -4554,6 +4554,11 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
>>             merge = 0;
>>         }
>> 
>> +        if (bs->bl.max_transfer_length && reqs[outidx].nb_sectors +
>> +            reqs[i].nb_sectors > bs->bl.max_transfer_length) {
>> +            merge = 0;
>> +        }
>> +
>>         if (merge) {
>>             size_t size;
>>             QEMUIOVector *qiov = g_malloc0(sizeof(*qiov));
>> 
> 
> So I think if we treat it just as a hint for multiwrite, we can avoid
> writing code to split oversized requests.  They always worked so far, we
> can certainly wait until we have a real bug fix.

I would not treat this as a hint. I would use it in cases where we definitely
know an absolute hard limit for I/O request size. Otherwise the value for
bs->bl.max_transfer_length should be 0.

If an oversized request comes in, we fail it as early as possible, and in the
multiwrite code we avoid accidentally generating an oversized request.
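
To make this concrete, the early rejection boils down to a check of roughly
this shape in the read/write entry points (a sketch of the idea, not the
literal patch):

    if (bs->bl.max_transfer_length &&
        nb_sectors > bs->bl.max_transfer_length) {
        /* the request exceeds the device's hard limit; fail it before
         * anything is submitted to the driver */
        return -EINVAL;
    }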

Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-18 14:17                                 ` Paolo Bonzini
@ 2014-09-18 22:57                                   ` Peter Lieven
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2014-09-18 22:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Benoît Canet, Kevin Wolf, ronnie sahlberg, qemu-devel,
	Max Reitz, Stefan Hajnoczi


Am 18.09.2014 um 16:17 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Il 18/09/2014 16:16, Peter Lieven ha scritto:
>>> 
>>>> As you can see from the multiwrite_merge trace the merging has never
>>>> been stopped because of
>>>> the max_transfer_length. The question is, why are the I/O requests
>>>> not coming in as specified?
>>> I think that's because of the filesystem.  Try writing directly to a
>>> device.
>> 
>> thats what I did...
> 
> Ah, I saw of=test.

It was definitely /dev/vda. Any ideas? Maybe something in the virtio drivers?

Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-18 22:56     ` Peter Lieven
@ 2014-09-19 13:33       ` Paolo Bonzini
  2014-09-22  9:43         ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-19 13:33 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

Il 19/09/2014 00:56, Peter Lieven ha scritto:
>> > So I think if we treat it just as a hint for multiwrite, we can avoid
>> > writing code to split oversized requests.  They always worked so far, we
>> > can certainly wait until we have a real bug fix.
> I would not treat this as a hint. I would use it in cases where we definitely
> know an absolute hard limit for I/O request size. Otherwise the value for
> bs->bl.max_transfer_length should be 0.
> 
> If there comes in an oversized request we fail it as early as possible

That's the part that I'd rather not touch, at least not without doing
request splitting.

Paolo

> and regarding
> the multi write code we avoid that it accidentally generates an oversized request.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-19 13:33       ` Paolo Bonzini
@ 2014-09-22  9:43         ` Peter Lieven
  2014-09-22 19:06           ` Paolo Bonzini
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-22  9:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

On 19.09.2014 15:33, Paolo Bonzini wrote:
> Il 19/09/2014 00:56, Peter Lieven ha scritto:
>>>> So I think if we treat it just as a hint for multiwrite, we can avoid
>>>> writing code to split oversized requests.  They always worked so far, we
>>>> can certainly wait until we have a real bug fix.
>> I would not treat this as a hint. I would use it in cases where we definitely
>> know an absolute hard limit for I/O request size. Otherwise the value for
>> bs->bl.max_transfer_length should be 0.
>>
>> If there comes in an oversized request we fail it as early as possible
> That's the part that I'd rather not touch, at least not without doing
> request splitting.

This series does not aim to touch default behaviour. The default for max_transfer_length
is 0 (no limit). max_transfer_length is a limit that MUST be satisfied, otherwise the request
will fail. And Patch 2 aims at catching this failure earlier in the stack. Currently, we only
have a limit for iSCSI. Without Patch 2 such a request would fail only after we have sent the
command to the target. And without Patch 4 it may happen that multiwrite_merge runs into the limit.
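
For reference, the iSCSI side (Patch 3) amounts to deriving the limit from the
maximum transfer length the target reports; roughly, assuming the
sector_lun2qemu conversion helper from block/iscsi.c:

    if (iscsilun->bl.max_xfer_len) {
        /* convert the LUN's limit (in target blocks) to qemu's
         * 512-byte sectors */
        bs->bl.max_transfer_length = sector_lun2qemu(iscsilun->bl.max_xfer_len,
                                                     iscsilun);
    }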

Maybe I should adjust the description of max_transfer_length?

Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-22  9:43         ` Peter Lieven
@ 2014-09-22 19:06           ` Paolo Bonzini
  2014-09-23  6:15             ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Paolo Bonzini @ 2014-09-22 19:06 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

Il 22/09/2014 11:43, Peter Lieven ha scritto:
> This series aims not at touching default behaviour. The default for max_transfer_length
> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
> will fail. And Patch 2 aims at catching this fail earlier in the stack.

Understood.  But the right fix is to avoid letting backend limits leak
into the guest ABI, not to catch the limits earlier.  So the right fix would
be to implement request splitting.

Since we never had a bug report about this, I'm not pushing to implement
splitting.  However, patch 2 is still wrong.

Patch 4 instead is fixing a real bug, so it's very much welcome.

Paolo

> Currently, we only
> have a limit for iSCSI. Without Patch 2 it would fail after we have send
> the command to
> the target. And without Patch 4 it may happen that multiwrite_merge
> traps the into the limit.
> 
> Maybe I should adjust the description of max_transfer_length?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-22 19:06           ` Paolo Bonzini
@ 2014-09-23  6:15             ` Peter Lieven
  2014-09-23  8:59               ` Kevin Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-23  6:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha, mreitz

On 22.09.2014 21:06, Paolo Bonzini wrote:
> Il 22/09/2014 11:43, Peter Lieven ha scritto:
>> This series aims not at touching default behaviour. The default for max_transfer_length
>> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
>> will fail. And Patch 2 aims at catching this fail earlier in the stack.
> Understood.  But the right fix is to avoid that backend limits transpire
> into guest ABI, not to catch the limits earlier.  So the right fix would
> be to implement request splitting.

Since you proposed to add traces for this, would you leave those in?
And since iSCSI is the only user of this at the moment, would you
go for implementing this check in the iSCSI block layer?

As for the split logic, do you think it is enough to build new qiovs
out of the too-big iov without copying the contents? This would work
as long as a single iov inside the qiov is not bigger than the max_transfer_length.
Otherwise I would need to allocate temporary buffers and copy around.

Peter

>
> Since we never had a bug report about this, I'm not pushing to implement
> splitting.  However, patch 2 is still wrong.
>
> Patch 4 instead is fixing a real bug, so it's very much welcome.
>
> Paolo
>
>> Currently, we only
>> have a limit for iSCSI. Without Patch 2 it would fail after we have send
>> the command to
>> the target. And without Patch 4 it may happen that multiwrite_merge
>> traps the into the limit.
>>
>> Maybe I should adjust the description of max_transfer_length?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-08 14:54               ` Peter Lieven
@ 2014-09-23  8:47                 ` Kevin Wolf
  2014-09-23  8:55                   ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Kevin Wolf @ 2014-09-23  8:47 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Benoît Canet, stefanha, qemu-devel, mreitz, ronniesahlberg,
	Paolo Bonzini

Am 08.09.2014 um 16:54 hat Peter Lieven geschrieben:
> On 08.09.2014 16:42, Paolo Bonzini wrote:
> >Il 08/09/2014 16:35, Peter Lieven ha scritto:
> >>What's your opinion on changing the max_xfer_len to 0xffff regardless
> >>of use_16_for_rw in iSCSI?
> >If you implemented request splitting in the block layer, it would be
> >okay to force max_xfer_len to 0xffff.
> 
> Unfortunately, I currently have no time for that. It will include some shuffling with
> qiovs that has to be properly tested.
> 
> Regarding iSCSI: In fact currently the limit is 0xffff for all iSCSI Targets < 2TB.
> So I thought that it's not at all obvious why a >2TB target can handle bigger requests.
> 
> On the root cause of this patch, multiwrite_merge, I still have some thoughts:
>  - why are we merging requests for raw (especially host devices and/or iSCSI?)
>    The original patch from Kevin was to mitigate a QCOW2 performance regression.

The problem wasn't in qcow2, though; it just became more apparent there
because lots of small requests are deadly to performance during cluster
allocation. Installation simply took ages compared to IDE. If you do a
real benchmark, you'll probably see (smaller) differences with raw, too.

The core problem is virtio-blk getting much smaller requests. IIRC, I
got an average of 128k-512k per request for IDE and something as poor as
4k-32k for virtio-blk. If I read this thread correctly, you're saying
that this is still the case. I suspect there is something wrong with the
guest driver, but somebody would have to dig into that.

>    For iSCSI the qiov concats are destroying all the zerocopy efforts we made.

If this is true, it just means that the iscsi driver sucks for vectored
I/O and needs to be fixed.

>  - should we only merge requests within the same cluster?

Does it hurt to merge everything we can? The block driver needs to be
able to take things apart anyway; the large request could come from
somewhere else (guest, block job, built-in NBD server, etc.).

>  - why is there no multiread_merge?

Because nobody implemented it. :-)

As I said above, writes hurt a lot because of qcow2 cluster allocation.
Reads are probably losing some performance as well (someone would need
to check this), but processing them separately isn't quite as painful.

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-23  8:47                 ` Kevin Wolf
@ 2014-09-23  8:55                   ` Peter Lieven
  2014-09-23  9:09                     ` Kevin Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-23  8:55 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Benoît Canet, stefanha, qemu-devel, mreitz, ronniesahlberg,
	Paolo Bonzini

On 23.09.2014 10:47, Kevin Wolf wrote:
> Am 08.09.2014 um 16:54 hat Peter Lieven geschrieben:
>> On 08.09.2014 16:42, Paolo Bonzini wrote:
>>> Il 08/09/2014 16:35, Peter Lieven ha scritto:
>>>> What's your opinion on changing the max_xfer_len to 0xffff regardless
>>>> of use_16_for_rw in iSCSI?
>>> If you implemented request splitting in the block layer, it would be
>>> okay to force max_xfer_len to 0xffff.
>> Unfortunately, I currently have no time for that. It will include some shuffling with
>> qiovs that has to be properly tested.
>>
>> Regarding iSCSI: In fact currently the limit is 0xffff for all iSCSI Targets < 2TB.
>> So I thought that it's not at all obvious why a >2TB target can handle bigger requests.
>>
>> On the root cause of this patch, multiwrite_merge, I still have some thoughts:
>>   - why are we merging requests for raw (especially host devices and/or iSCSI?)
>>     The original patch from Kevin was to mitigate a QCOW2 performance regression.
> The problem wasn't in qcow2, though, it just became more apparent there
> because lots of small requests are deadly to performance during cluster
> allocation. Installation simply took ages compared to IDE. If you do a
> real benchmark, you'll probably see (smaller) differences with raw, too.
>
> The core problem is virtio-blk getting much smaller requests. IIRC, I
> got an average of 128k-512k per request for IDE and something as poor as
> 4k-32k for virtio-blk. If I read this thread correctly, you're saying
> that this is still the case. I suspect there is something wrong with the
> guest driver, but somebody would have to dig into that.

This seems to be the case. I will check whether this goes away with the
virtio-blk driver in the most recent kernels.

>
>>     For iSCSI the qiov concats are destroying all the zerocopy efforts we made.
> If this is true, it just means that the iscsi driver sucks for vectored
> I/O and needs to be fixed.

It works quite well; I misunderstood the iovec_concat routine at first. I thought
it was copying payload data around. The overhead of building the qiov structure
alone should be negligible.


>
>>   - should we only merge requests within the same cluster?
> Does it hurt to merge everything we can? The block driver needs to be
> able to take things apart anyway, the large request could come from
> somewhere else (guest, block job, builtin NBD server, etc.)

I was just thinking. It was unclear what the original intention was.

My concern was that merging too much will increase latency for an
individual I/O even if it increases throughput.

>
>>   - why is there no multiread_merge?
> Because nobody implemented it. :-)

Maybe it's worth looking at this. Taking the multiwrite_merge stuff as
a basis, it shouldn't be too hard.

>
> As I said above, writes hurt a lot because of qcow2 cluster allocation.
> Reads are probably losing some performance as well (someone would need
> to check this), but processing them separately isn't quite as painful.

I will try to sort that out.

Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-23  6:15             ` Peter Lieven
@ 2014-09-23  8:59               ` Kevin Wolf
  2014-09-23  9:04                 ` Peter Lieven
  2014-09-23  9:32                 ` Peter Lieven
  0 siblings, 2 replies; 44+ messages in thread
From: Kevin Wolf @ 2014-09-23  8:59 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Paolo Bonzini, ronniesahlberg, qemu-devel, stefanha, mreitz

Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
> On 22.09.2014 21:06, Paolo Bonzini wrote:
> >Il 22/09/2014 11:43, Peter Lieven ha scritto:
> >>This series aims not at touching default behaviour. The default for max_transfer_length
> >>is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
> >>will fail. And Patch 2 aims at catching this fail earlier in the stack.
> >Understood.  But the right fix is to avoid that backend limits transpire
> >into guest ABI, not to catch the limits earlier.  So the right fix would
> >be to implement request splitting.
> 
> Since you proposed to add traces for this would you leave those in?
> And since iSCSI is the only user of this at the moment would you
> go for implementing this check in the iSCSI block layer?
> 
> As for the split logic would you think it is enough to build new qiov's
> out of the too big iov without copying the contents? This would work
> as long as a single iov inside the qiov is not bigger the max_transfer_length.
> Otherwise I would need to allocate temporary buffers and copy around.

You can split single iovs, too. There are functions that make this very
easy; they copy a sub-qiov with a byte-granularity offset and length
(qemu_iovec_concat and friends). qcow2 uses the same to split requests
at (fragmented) cluster boundaries.
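
For illustration, taking a byte range out of an existing qiov without touching
the payload looks roughly like this (untested sketch; soffset and sbytes are
the byte offset and length you want):

    QEMUIOVector sub;

    qemu_iovec_init(&sub, qiov->niov);
    /* reference bytes [soffset, soffset + sbytes) of qiov; only the
     * iovec entries are copied, never the data they point to */
    qemu_iovec_concat(&sub, qiov, soffset, sbytes);
    /* ... submit sub instead of qiov ... */
    qemu_iovec_destroy(&sub);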

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-23  8:59               ` Kevin Wolf
@ 2014-09-23  9:04                 ` Peter Lieven
  2014-09-23  9:32                 ` Peter Lieven
  1 sibling, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2014-09-23  9:04 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, ronniesahlberg, qemu-devel, stefanha, mreitz

On 23.09.2014 10:59, Kevin Wolf wrote:
> Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
>> On 22.09.2014 21:06, Paolo Bonzini wrote:
>>> Il 22/09/2014 11:43, Peter Lieven ha scritto:
>>>> This series aims not at touching default behaviour. The default for max_transfer_length
>>>> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
>>>> will fail. And Patch 2 aims at catching this fail earlier in the stack.
>>> Understood.  But the right fix is to avoid that backend limits transpire
>>> into guest ABI, not to catch the limits earlier.  So the right fix would
>>> be to implement request splitting.
>> Since you proposed to add traces for this would you leave those in?
>> And since iSCSI is the only user of this at the moment would you
>> go for implementing this check in the iSCSI block layer?
>>
>> As for the split logic would you think it is enough to build new qiov's
>> out of the too big iov without copying the contents? This would work
>> as long as a single iov inside the qiov is not bigger the max_transfer_length.
>> Otherwise I would need to allocate temporary buffers and copy around.
> You can split single iovs, too. There are functions that make this very
> easy, they copy a sub-qiov with a byte granularity offset and length
> (qemu_iovec_concat and friends). qcow2 uses the same to split requests
> at (fragmented) cluster boundaries.

Thanks for that pointer. Looking at qemu_iovec_concat, this seems to be a
no-brainer.

I will first send an updated series dropping Patch 2 and will then send a follow-up
that splits requests.

Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests
  2014-09-23  8:55                   ` Peter Lieven
@ 2014-09-23  9:09                     ` Kevin Wolf
  0 siblings, 0 replies; 44+ messages in thread
From: Kevin Wolf @ 2014-09-23  9:09 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Benoît Canet, stefanha, qemu-devel, mreitz, ronniesahlberg,
	Paolo Bonzini

Am 23.09.2014 um 10:55 hat Peter Lieven geschrieben:
> >>  - should we only merge requests within the same cluster?
> >Does it hurt to merge everything we can? The block driver needs to be
> >able to take things apart anyway, the large request could come from
> >somewhere else (guest, block job, builtin NBD server, etc.)
> 
> I was just thinking. It was unclear what the original intention was.

I hope it's clearer now. Anyway, for qcow2, merging even across clusters
does help a bit if those clusters are physically contiguous.

> My concern was that merging too much will increase latency for
> the specific I/O even if it increases throughput.

That's probably a valid concern. Perhaps we should add a set of options
to enable or disable certain request manipulations that the block layer
can do.  This includes request merging, but also alignment adjustments,
zero detection (which already has an option) etc.

> >>  - why is there no multiread_merge?
> >Because nobody implemented it. :-)
> 
> Maybe its worth looking at this. Taking the multiwrite_merge stuff as
> a basis it shouldn't be too hard.

Yes, it should be doable. We need numbers, though, before merging
something like this.

> >As I said above, writes hurt a lot because of qcow2 cluster allocation.
> >Reads are probably losing some performance as well (someone would need
> >to check this), but processing them separately isn't quite as painful.
> 
> I will try to sort that out.

Great, thanks!

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-23  8:59               ` Kevin Wolf
  2014-09-23  9:04                 ` Peter Lieven
@ 2014-09-23  9:32                 ` Peter Lieven
  2014-09-23  9:47                   ` Kevin Wolf
  1 sibling, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-23  9:32 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, ronniesahlberg, qemu-devel, stefanha, mreitz

On 23.09.2014 10:59, Kevin Wolf wrote:
> Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
>> On 22.09.2014 21:06, Paolo Bonzini wrote:
>>> Il 22/09/2014 11:43, Peter Lieven ha scritto:
>>>> This series aims not at touching default behaviour. The default for max_transfer_length
>>>> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
>>>> will fail. And Patch 2 aims at catching this fail earlier in the stack.
>>> Understood.  But the right fix is to avoid that backend limits transpire
>>> into guest ABI, not to catch the limits earlier.  So the right fix would
>>> be to implement request splitting.
>> Since you proposed to add traces for this would you leave those in?
>> And since iSCSI is the only user of this at the moment would you
>> go for implementing this check in the iSCSI block layer?
>>
>> As for the split logic would you think it is enough to build new qiov's
>> out of the too big iov without copying the contents? This would work
>> as long as a single iov inside the qiov is not bigger the max_transfer_length.
>> Otherwise I would need to allocate temporary buffers and copy around.
> You can split single iovs, too. There are functions that make this very
> easy, they copy a sub-qiov with a byte granularity offset and length
> (qemu_iovec_concat and friends). qcow2 uses the same to split requests
> at (fragmented) cluster boundaries.

Might it be as easy as this?

static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
     BdrvRequestFlags flags)
{
     if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
         return -EINVAL;
     }

     if (bs->bl.max_transfer_length &&
         nb_sectors > bs->bl.max_transfer_length) {
         int ret = 0;
         QEMUIOVector *qiov2 = NULL;
         size_t soffset = 0;

         trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
bs->bl.max_transfer_length);

         qemu_iovec_init(qiov2, qiov->niov);
         while (nb_sectors > bs->bl.max_transfer_length && !ret) {
             qemu_iovec_reset(qiov2);
             qemu_iovec_concat(qiov2, qiov, soffset,
                               bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
             ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
                                     bs->bl.max_transfer_length << BDRV_SECTOR_BITS,
                                     qiov2, flags);
             soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
             sector_num += bs->bl.max_transfer_length;
             nb_sectors -= bs->bl.max_transfer_length;
         }
         qemu_iovec_destroy(qiov2);
         if (ret) {
             return ret;
         }
     }

     return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
                              nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
}

>
> Kevin


-- 

Kind regards

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Managing Directors: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-23  9:32                 ` Peter Lieven
@ 2014-09-23  9:47                   ` Kevin Wolf
  2014-09-23  9:52                     ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Kevin Wolf @ 2014-09-23  9:47 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Paolo Bonzini, ronniesahlberg, qemu-devel, stefanha, mreitz

Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:
> On 23.09.2014 10:59, Kevin Wolf wrote:
> >Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
> >>On 22.09.2014 21:06, Paolo Bonzini wrote:
> >>>Il 22/09/2014 11:43, Peter Lieven ha scritto:
> >>>>This series aims not at touching default behaviour. The default for max_transfer_length
> >>>>is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
> >>>>will fail. And Patch 2 aims at catching this fail earlier in the stack.
> >>>Understood.  But the right fix is to avoid that backend limits transpire
> >>>into guest ABI, not to catch the limits earlier.  So the right fix would
> >>>be to implement request splitting.
> >>Since you proposed to add traces for this would you leave those in?
> >>And since iSCSI is the only user of this at the moment would you
> >>go for implementing this check in the iSCSI block layer?
> >>
> >>As for the split logic would you think it is enough to build new qiov's
> >>out of the too big iov without copying the contents? This would work
> >>as long as a single iov inside the qiov is not bigger the max_transfer_length.
> >>Otherwise I would need to allocate temporary buffers and copy around.
> >You can split single iovs, too. There are functions that make this very
> >easy, they copy a sub-qiov with a byte granularity offset and length
> >(qemu_iovec_concat and friends). qcow2 uses the same to split requests
> >at (fragmented) cluster boundaries.
> 
> Might it be as easy as this?

This is completely untested, right? :-)

But ignoring bugs, the principle looks right.

> static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
>     BdrvRequestFlags flags)
> {
>     if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
>         return -EINVAL;
>     }
> 
>     if (bs->bl.max_transfer_length &&
>         nb_sectors > bs->bl.max_transfer_length) {
>         int ret = 0;
>         QEMUIOVector *qiov2 = NULL;

Make it "QEMUIOVector qiov2;" on the stack.

>         size_t soffset = 0;
> 
>         trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
> bs->bl.max_transfer_length);
> 
>         qemu_iovec_init(qiov2, qiov->niov);

And &qiov2 here, then this doesn't crash with a NULL dereference.

>         while (nb_sectors > bs->bl.max_transfer_length && !ret) {
>             qemu_iovec_reset(qiov2);
>             qemu_iovec_concat(qiov2, qiov, soffset,
>                               bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
>             ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>                                     bs->bl.max_transfer_length << BDRV_SECTOR_BITS,
>                                     qiov2, flags);
>             soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
>             sector_num += bs->bl.max_transfer_length;
>             nb_sectors -= bs->bl.max_transfer_length;
>         }
>         qemu_iovec_destroy(qiov2);
>         if (ret) {
>             return ret;
>         }

The error check needs to be immediately after the assignment of ret,
otherwise the next loop iteration can overwrite an error with a success
(and if it didn't, it would still do useless I/O because the request as
a whole would fail anyway).

>     }
> 
>     return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>                              nb_sectors << BDRV_SECTOR_BITS, qiov, flags);

qiov doesn't work here for the splitting case. You need the remaining
part, not the whole original qiov.
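
Folding those three fixes in, the sketch would look roughly like this (still
completely untested, same assumptions as above):

    static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
        BdrvRequestFlags flags)
    {
        if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
            return -EINVAL;
        }

        if (bs->bl.max_transfer_length &&
            nb_sectors > bs->bl.max_transfer_length) {
            int ret = 0;
            QEMUIOVector qiov2;      /* on the stack, not a NULL pointer */
            size_t soffset = 0;

            qemu_iovec_init(&qiov2, qiov->niov);
            while (nb_sectors > bs->bl.max_transfer_length) {
                qemu_iovec_reset(&qiov2);
                qemu_iovec_concat(&qiov2, qiov, soffset,
                                  bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
                ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
                                        bs->bl.max_transfer_length << BDRV_SECTOR_BITS,
                                        &qiov2, flags);
                if (ret < 0) {
                    /* check immediately so an error is not overwritten and
                     * no useless I/O is issued for the remaining parts */
                    qemu_iovec_destroy(&qiov2);
                    return ret;
                }
                soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
                sector_num += bs->bl.max_transfer_length;
                nb_sectors -= bs->bl.max_transfer_length;
            }
            /* tail: only the remaining part of the original qiov */
            qemu_iovec_reset(&qiov2);
            qemu_iovec_concat(&qiov2, qiov, soffset,
                              nb_sectors << BDRV_SECTOR_BITS);
            ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
                                    nb_sectors << BDRV_SECTOR_BITS, &qiov2, flags);
            qemu_iovec_destroy(&qiov2);
            return ret;
        }

        return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
                                 nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
    }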

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-23  9:47                   ` Kevin Wolf
@ 2014-09-23  9:52                     ` Peter Lieven
  2014-09-23 10:05                       ` Kevin Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-23  9:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, ronniesahlberg, qemu-devel, stefanha, mreitz

On 23.09.2014 11:47, Kevin Wolf wrote:
> Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:
>> On 23.09.2014 10:59, Kevin Wolf wrote:
>>> Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
>>>> On 22.09.2014 21:06, Paolo Bonzini wrote:
>>>>> Il 22/09/2014 11:43, Peter Lieven ha scritto:
>>>>>> This series aims not at touching default behaviour. The default for max_transfer_length
>>>>>> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
>>>>>> will fail. And Patch 2 aims at catching this fail earlier in the stack.
>>>>> Understood.  But the right fix is to avoid that backend limits transpire
>>>>> into guest ABI, not to catch the limits earlier.  So the right fix would
>>>>> be to implement request splitting.
>>>> Since you proposed to add traces for this would you leave those in?
>>>> And since iSCSI is the only user of this at the moment would you
>>>> go for implementing this check in the iSCSI block layer?
>>>>
>>>> As for the split logic would you think it is enough to build new qiov's
>>>> out of the too big iov without copying the contents? This would work
>>>> as long as a single iov inside the qiov is not bigger the max_transfer_length.
>>>> Otherwise I would need to allocate temporary buffers and copy around.
>>> You can split single iovs, too. There are functions that make this very
>>> easy, they copy a sub-qiov with a byte granularity offset and length
>>> (qemu_iovec_concat and friends). qcow2 uses the same to split requests
>>> at (fragmented) cluster boundaries.
>> Might it be as easy as this?
> This is completely untested, right? :-)

Yes :-)
I was just unsure if the general approach is right.

>
> But ignoring bugs, the principle looks right.
>
>> static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
>>      BdrvRequestFlags flags)
>> {
>>      if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
>>          return -EINVAL;
>>      }
>>
>>      if (bs->bl.max_transfer_length &&
>>          nb_sectors > bs->bl.max_transfer_length) {
>>          int ret = 0;
>>          QEMUIOVector *qiov2 = NULL;
> Make it "QEMUIOVector qiov2;" on the stack.
>
>>          size_t soffset = 0;
>>
>>          trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
>> bs->bl.max_transfer_length);
>>
>>          qemu_iovec_init(qiov2, qiov->niov);
> And &qiov2 here, then this doesn't crash with a NULL dereference.
>
>>          while (nb_sectors > bs->bl.max_transfer_length && !ret) {
>>              qemu_iovec_reset(qiov2);
>>              qemu_iovec_concat(qiov2, qiov, soffset,
>>                                bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
>>              ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>>                                      bs->bl.max_transfer_length << BDRV_SECTOR_BITS,
>>                                      qiov2, flags);
>>              soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
>>              sector_num += bs->bl.max_transfer_length;
>>              nb_sectors -= bs->bl.max_transfer_length;
>>          }
>>          qemu_iovec_destroy(qiov2);
>>          if (ret) {
>>              return ret;
>>          }
> The error check needs to be immediately after the assignment of ret,
> otherwise the next loop iteration can overwrite an error with a success
> (and if it didn't, it would still do useless I/O because the request as
> a whole would fail anyway).

There is a && !ret in the loop condition. I wanted to avoid copying the destroy part.

BTW, is it !ret or ret < 0 ?

>
>>      }
>>
>>      return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>>                               nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
> qiov doesn't work here for the splitting case. You need the remaining
> part, not the whole original qiov.

Right ;-)

Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-23  9:52                     ` Peter Lieven
@ 2014-09-23 10:05                       ` Kevin Wolf
  2014-09-30  7:26                         ` Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Kevin Wolf @ 2014-09-23 10:05 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Paolo Bonzini, ronniesahlberg, qemu-devel, stefanha, mreitz

Am 23.09.2014 um 11:52 hat Peter Lieven geschrieben:
> On 23.09.2014 11:47, Kevin Wolf wrote:
> >Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:
> >>On 23.09.2014 10:59, Kevin Wolf wrote:
> >>>Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
> >>>>On 22.09.2014 21:06, Paolo Bonzini wrote:
> >>>>>Il 22/09/2014 11:43, Peter Lieven ha scritto:
> >>>>>>This series aims not at touching default behaviour. The default for max_transfer_length
> >>>>>>is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
> >>>>>>will fail. And Patch 2 aims at catching this fail earlier in the stack.
> >>>>>Understood.  But the right fix is to avoid that backend limits transpire
> >>>>>into guest ABI, not to catch the limits earlier.  So the right fix would
> >>>>>be to implement request splitting.
> >>>>Since you proposed to add traces for this would you leave those in?
> >>>>And since iSCSI is the only user of this at the moment would you
> >>>>go for implementing this check in the iSCSI block layer?
> >>>>
> >>>>As for the split logic would you think it is enough to build new qiov's
> >>>>out of the too big iov without copying the contents? This would work
> >>>>as long as a single iov inside the qiov is not bigger the max_transfer_length.
> >>>>Otherwise I would need to allocate temporary buffers and copy around.
> >>>You can split single iovs, too. There are functions that make this very
> >>>easy, they copy a sub-qiov with a byte granularity offset and length
> >>>(qemu_iovec_concat and friends). qcow2 uses the same to split requests
> >>>at (fragmented) cluster boundaries.
> >>Might it be as easy as this?
> >This is completely untested, right? :-)
> 
> Yes :-)
> I was just unsure if the general approach is right.

Looks alright to me.

> >But ignoring bugs, the principle looks right.
> >
> >>static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
> >>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
> >>     BdrvRequestFlags flags)
> >>{
> >>     if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
> >>         return -EINVAL;
> >>     }
> >>
> >>     if (bs->bl.max_transfer_length &&
> >>         nb_sectors > bs->bl.max_transfer_length) {
> >>         int ret = 0;
> >>         QEMUIOVector *qiov2 = NULL;
> >Make it "QEMUIOVector qiov2;" on the stack.
> >
> >>         size_t soffset = 0;
> >>
> >>         trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
> >>bs->bl.max_transfer_length);
> >>
> >>         qemu_iovec_init(qiov2, qiov->niov);
> >And &qiov2 here, then this doesn't crash with a NULL dereference.
> >
> >>         while (nb_sectors > bs->bl.max_transfer_length && !ret) {
> >>             qemu_iovec_reset(qiov2);
> >>             qemu_iovec_concat(qiov2, qiov, soffset,
> >>                               bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
> >>             ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
> >>                                     bs->bl.max_transfer_length << BDRV_SECTOR_BITS,
> >>                                     qiov2, flags);
> >>             soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
> >>             sector_num += bs->bl.max_transfer_length;
> >>             nb_sectors -= bs->bl.max_transfer_length;
> >>         }
> >>         qemu_iovec_destroy(qiov2);
> >>         if (ret) {
> >>             return ret;
> >>         }
> >The error check needs to be immediately after the assignment of ret,
> >otherwise the next loop iteration can overwrite an error with a success
> >(and if it didn't, it would still do useless I/O because the request as
> >a whole would fail anyway).
> 
> There is a && !ret in the loop condition. I wanted to avoid copying the destroy part.

Ah, yes, clever. I missed that. Maybe too clever then. ;-)

> BTW, is it !ret or ret < 0 ?

It only ever returns 0 or negative, so both are equivalent. I
prefer checks for ret < 0, but that's a matter of style rather than
correctness.

Another problem I just noticed is that this is not the only caller of
bdrv_co_do_preadv(), so you're not splitting all requests. None of the
synchronous bdrv_read/write/pread/pwrite/pwritev functions get the
functionality this way.

Perhaps you should be doing it inside bdrv_co_do_preadv(), before the
call to bdrv_aligned_preadv(). Might even be more correct if it can
happen that the alignment adjustment increases a request too much to fit
in bl.max_transfer_length.

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-23 10:05                       ` Kevin Wolf
@ 2014-09-30  7:26                         ` Peter Lieven
  2014-09-30  8:03                           ` Kevin Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2014-09-30  7:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, ronniesahlberg, qemu-devel, stefanha, mreitz

On 23.09.2014 12:05, Kevin Wolf wrote:
> Am 23.09.2014 um 11:52 hat Peter Lieven geschrieben:
>> On 23.09.2014 11:47, Kevin Wolf wrote:
>>> Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:
>>>> On 23.09.2014 10:59, Kevin Wolf wrote:
>>>>> Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
>>>>>> On 22.09.2014 21:06, Paolo Bonzini wrote:
>>>>>>> Il 22/09/2014 11:43, Peter Lieven ha scritto:
>>>>>>>> This series aims not at touching default behaviour. The default for max_transfer_length
>>>>>>>> is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
>>>>>>>> will fail. And Patch 2 aims at catching this fail earlier in the stack.
>>>>>>> Understood.  But the right fix is to avoid that backend limits transpire
>>>>>>> into guest ABI, not to catch the limits earlier.  So the right fix would
>>>>>>> be to implement request splitting.
>>>>>> Since you proposed to add traces for this would you leave those in?
>>>>>> And since iSCSI is the only user of this at the moment would you
>>>>>> go for implementing this check in the iSCSI block layer?
>>>>>>
>>>>>> As for the split logic would you think it is enough to build new qiov's
>>>>>> out of the too big iov without copying the contents? This would work
>>>>>> as long as a single iov inside the qiov is not bigger the max_transfer_length.
>>>>>> Otherwise I would need to allocate temporary buffers and copy around.
>>>>> You can split single iovs, too. There are functions that make this very
>>>>> easy, they copy a sub-qiov with a byte granularity offset and length
>>>>> (qemu_iovec_concat and friends). qcow2 uses the same to split requests
>>>>> at (fragmented) cluster boundaries.
>>>> Might it be as easy as this?
>>> This is completely untested, right? :-)
>> Yes :-)
>> I was just unsure if the general approach is right.
> Looks alright to me.
>
>>> But ignoring bugs, the principle looks right.
>>>
>>>> static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>>>>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
>>>>      BdrvRequestFlags flags)
>>>> {
>>>>      if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
>>>>          return -EINVAL;
>>>>      }
>>>>
>>>>      if (bs->bl.max_transfer_length &&
>>>>          nb_sectors > bs->bl.max_transfer_length) {
>>>>          int ret = 0;
>>>>          QEMUIOVector *qiov2 = NULL;
>>> Make it "QEMUIOVector qiov2;" on the stack.
>>>
>>>>          size_t soffset = 0;
>>>>
>>>>          trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
>>>> bs->bl.max_transfer_length);
>>>>
>>>>          qemu_iovec_init(qiov2, qiov->niov);
>>> And &qiov2 here, then this doesn't crash with a NULL dereference.
>>>
>>>>          while (nb_sectors > bs->bl.max_transfer_length && !ret) {
>>>>              qemu_iovec_reset(qiov2);
>>>>              qemu_iovec_concat(qiov2, qiov, soffset,
>>>>                                bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
>>>>              ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>>>>                                      bs->bl.max_transfer_length << BDRV_SECTOR_BITS,
>>>>                                      qiov2, flags);
>>>>              soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
>>>>              sector_num += bs->bl.max_transfer_length;
>>>>              nb_sectors -= bs->bl.max_transfer_length;
>>>>          }
>>>>          qemu_iovec_destroy(qiov2);
>>>>          if (ret) {
>>>>              return ret;
>>>>          }
>>> The error check needs to be immediately after the assignment of ret,
>>> otherwise the next loop iteration can overwrite an error with a success
>>> (and if it didn't, it would still do useless I/O because the request as
>>> a whole would fail anyway).
>> There is a && !ret in the loop condition. I wanted to avoid copying the destroy part.
> Ah, yes, clever. I missed that. Maybe too clever then. ;-)
>
>> BTW, is it !ret or ret < 0 ?
> It only ever returns 0 or negative, so both are equivalent. I
> prefer checks for ret < 0, but that's a matter of style rather than
> correctness.
>
> Another problem I just noticed is that this is not the only caller of
> bdrv_co_do_preadv(), so you're not splitting all requests. The
> synchronous bdrv_read/write/pread/pwrite/pwritev functions all don't get
> the functionality this way.
>
> Perhaps you should be doing it inside bdrv_co_do_preadv(), before the
> call to bdrv_aligned_preadv(). Might even be more correct if it can
> happen that the alignment adjustment increases a request too much to fit
> in bl.max_transfer_length.

If I do it this way, can I use the same req object for all split
requests?

Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge
  2014-09-30  7:26                         ` Peter Lieven
@ 2014-09-30  8:03                           ` Kevin Wolf
  0 siblings, 0 replies; 44+ messages in thread
From: Kevin Wolf @ 2014-09-30  8:03 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Paolo Bonzini, ronniesahlberg, qemu-devel, stefanha, mreitz

Am 30.09.2014 um 09:26 hat Peter Lieven geschrieben:
> On 23.09.2014 12:05, Kevin Wolf wrote:
> >Am 23.09.2014 um 11:52 hat Peter Lieven geschrieben:
> >>On 23.09.2014 11:47, Kevin Wolf wrote:
> >>>Am 23.09.2014 um 11:32 hat Peter Lieven geschrieben:
> >>>>On 23.09.2014 10:59, Kevin Wolf wrote:
> >>>>>Am 23.09.2014 um 08:15 hat Peter Lieven geschrieben:
> >>>>>>On 22.09.2014 21:06, Paolo Bonzini wrote:
> >>>>>>>Il 22/09/2014 11:43, Peter Lieven ha scritto:
> >>>>>>>>This series aims not at touching default behaviour. The default for max_transfer_length
> >>>>>>>>is 0 (no limit). max_transfer_length is a limit that MUST be satisfied otherwise the request
> >>>>>>>>will fail. And Patch 2 aims at catching this fail earlier in the stack.
> >>>>>>>Understood.  But the right fix is to avoid that backend limits transpire
> >>>>>>>into guest ABI, not to catch the limits earlier.  So the right fix would
> >>>>>>>be to implement request splitting.
> >>>>>>Since you proposed to add traces for this would you leave those in?
> >>>>>>And since iSCSI is the only user of this at the moment would you
> >>>>>>go for implementing this check in the iSCSI block layer?
> >>>>>>
> >>>>>>As for the split logic would you think it is enough to build new qiov's
> >>>>>>out of the too big iov without copying the contents? This would work
> >>>>>>as long as a single iov inside the qiov is not bigger the max_transfer_length.
> >>>>>>Otherwise I would need to allocate temporary buffers and copy around.
> >>>>>You can split single iovs, too. There are functions that make this very
> >>>>>easy, they copy a sub-qiov with a byte granularity offset and length
> >>>>>(qemu_iovec_concat and friends). qcow2 uses the same to split requests
> >>>>>at (fragmented) cluster boundaries.
> >>>>Might it be as easy as this?
> >>>This is completely untested, right? :-)
> >>Yes :-)
> >>I was just unsure if the general approach is right.
> >Looks alright to me.
> >
> >>>But ignoring bugs, the principle looks right.
> >>>
> >>>>static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
> >>>>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
> >>>>     BdrvRequestFlags flags)
> >>>>{
> >>>>     if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
> >>>>         return -EINVAL;
> >>>>     }
> >>>>
> >>>>     if (bs->bl.max_transfer_length &&
> >>>>         nb_sectors > bs->bl.max_transfer_length) {
> >>>>         int ret = 0;
> >>>>         QEMUIOVector *qiov2 = NULL;
> >>>Make it "QEMUIOVector qiov2;" on the stack.
> >>>
> >>>>         size_t soffset = 0;
> >>>>
> >>>>         trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
> >>>>bs->bl.max_transfer_length);
> >>>>
> >>>>         qemu_iovec_init(qiov2, qiov->niov);
> >>>And &qiov2 here, then this doesn't crash with a NULL dereference.
> >>>
> >>>>         while (nb_sectors > bs->bl.max_transfer_length && !ret) {
> >>>>             qemu_iovec_reset(qiov2);
> >>>>             qemu_iovec_concat(qiov2, qiov, soffset,
> >>>>                               bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
> >>>>             ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
> >>>>                                     bs->bl.max_transfer_length << BDRV_SECTOR_BITS,
> >>>>                                     qiov2, flags);
> >>>>             soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
> >>>>             sector_num += bs->bl.max_transfer_length;
> >>>>             nb_sectors -= bs->bl.max_transfer_length;
> >>>>         }
> >>>>         qemu_iovec_destroy(qiov2);
> >>>>         if (ret) {
> >>>>             return ret;
> >>>>         }
> >>>The error check needs to be immediately after the assignment of ret,
> >>>otherwise the next loop iteration can overwrite an error with a success
> >>>(and if it didn't, it would still do useless I/O because the request as
> >>>a whole would fail anyway).
> >>There is a && !ret in the loop condition. I wanted to avoid copying the destroy part.
> >Ah, yes, clever. I missed that. Maybe too clever then. ;-)
> >
> >>BTW, is it !ret or ret < 0 ?
> >It only ever returns 0 or negative, so both are equivalent. I
> >prefer checks for ret < 0, but that's a matter of style rather than
> >correctness.
> >
> >Another problem I just noticed is that this is not the only caller of
> >bdrv_co_do_preadv(), so you're not splitting all requests. The
> >synchronous bdrv_read/write/pread/pwrite/pwritev functions all don't get
> >the functionality this way.
> >
> >Perhaps you should be doing it inside bdrv_co_do_preadv(), before the
> >call to bdrv_aligned_preadv(). Might even be more correct if it can
> >happen that the alignment adjustment increases a request too much to fit
> >in bl.max_transfer_length.
> 
> If I do it this way can I use the same req Object for all splitted
> requests?

That's a good question. I think as long as you process the parts of the
split request one after another, reusing the same req object should be
safe. If you were to process them in parallel, though, I wouldn't be as
sure about it (which you probably don't want because it complicates
things :-)).

Probably the most obviously correct way to handle things would be to
have one tracked_request_begin/end() for the whole request and then call
bdrv_aligned_preadv() multiple times in between. Otherwise you'd have to
serialise each part individually, etc.
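
In code, that structure might look something like this inside
bdrv_co_do_preadv(), after the alignment handling, reusing the offset, bytes,
align, qiov and flags already available there (a rough sketch; the
tracked-request and bdrv_aligned_preadv() signatures are from memory and may
need adjusting):

    BdrvTrackedRequest req;
    QEMUIOVector sub;
    uint64_t max_bytes = (uint64_t)bs->bl.max_transfer_length
                         << BDRV_SECTOR_BITS;
    size_t soffset = 0;
    int ret = 0;

    /* one tracked request covering the whole range */
    tracked_request_begin(&req, bs, offset, bytes, false);
    qemu_iovec_init(&sub, qiov->niov);
    while (bytes > 0 && ret == 0) {
        uint64_t num = MIN(bytes, max_bytes);
        qemu_iovec_reset(&sub);
        qemu_iovec_concat(&sub, qiov, soffset, num);
        ret = bdrv_aligned_preadv(bs, &req, offset, num, align, &sub, flags);
        offset += num;
        soffset += num;
        bytes -= num;
    }
    qemu_iovec_destroy(&sub);
    tracked_request_end(&req);
    return ret;

(The real code would of course bypass the loop when the request already fits
within max_transfer_length.)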

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread

Thread overview: 44+ messages
2014-09-05 16:51 [Qemu-devel] [PATCH 0/4] introduce max_transfer_length Peter Lieven
2014-09-05 16:51 ` [Qemu-devel] [PATCH 1/4] BlockLimits: " Peter Lieven
2014-09-05 16:51 ` [Qemu-devel] [PATCH 2/4] block: immediately cancel oversized read/write requests Peter Lieven
2014-09-08 13:44   ` Benoît Canet
2014-09-08 13:49     ` Paolo Bonzini
2014-09-08 13:56       ` Peter Lieven
2014-09-08 13:58         ` Paolo Bonzini
2014-09-08 14:35           ` Peter Lieven
2014-09-08 14:42             ` Paolo Bonzini
2014-09-08 14:54               ` Peter Lieven
2014-09-23  8:47                 ` Kevin Wolf
2014-09-23  8:55                   ` Peter Lieven
2014-09-23  9:09                     ` Kevin Wolf
2014-09-08 15:13               ` ronnie sahlberg
2014-09-08 15:15                 ` Paolo Bonzini
2014-09-08 15:18                   ` Peter Lieven
2014-09-08 15:27                     ` Paolo Bonzini
2014-09-08 16:18                       ` Peter Lieven
2014-09-08 16:29                         ` Paolo Bonzini
2014-09-12 11:43                           ` Peter Lieven
2014-09-18 14:12                             ` Paolo Bonzini
2014-09-18 14:16                               ` Peter Lieven
2014-09-18 14:17                                 ` Paolo Bonzini
2014-09-18 22:57                                   ` Peter Lieven
2014-09-08 15:16                 ` Peter Lieven
2014-09-05 16:51 ` [Qemu-devel] [PATCH 3/4] block/iscsi: set max_transfer_length Peter Lieven
2014-09-05 16:51 ` [Qemu-devel] [PATCH 4/4] block: avoid creating oversized writes in multiwrite_merge Peter Lieven
2014-09-18 14:13   ` Paolo Bonzini
2014-09-18 22:56     ` Peter Lieven
2014-09-19 13:33       ` Paolo Bonzini
2014-09-22  9:43         ` Peter Lieven
2014-09-22 19:06           ` Paolo Bonzini
2014-09-23  6:15             ` Peter Lieven
2014-09-23  8:59               ` Kevin Wolf
2014-09-23  9:04                 ` Peter Lieven
2014-09-23  9:32                 ` Peter Lieven
2014-09-23  9:47                   ` Kevin Wolf
2014-09-23  9:52                     ` Peter Lieven
2014-09-23 10:05                       ` Kevin Wolf
2014-09-30  7:26                         ` Peter Lieven
2014-09-30  8:03                           ` Kevin Wolf
2014-09-05 17:05 ` [Qemu-devel] [PATCH 0/4] introduce max_transfer_length ronnie sahlberg
2014-09-05 19:52   ` Peter Lieven
2014-09-05 21:22     ` ronnie sahlberg
