All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] block: split large discard requests from block frontend
@ 2016-04-01 12:22 Olaf Hering
  2016-04-01 17:19 ` [Qemu-devel] [Qemu-block] " Max Reitz
  0 siblings, 1 reply; 6+ messages in thread
From: Olaf Hering @ 2016-04-01 12:22 UTC (permalink / raw)
  To: qemu-block, qemu-devel; +Cc: Kevin Wolf, Olaf Hering, Stefan Hajnoczi

Large discard requests lead to sign expansion errors in qemu.
Since there is no API to tell a guest about the limitations qmeu
has to split a large request itself.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index c4869b9..5b6ed58 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2442,7 +2442,7 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque)
     rwco->ret = bdrv_co_discard(rwco->bs, rwco->sector_num, rwco->nb_sectors);
 }
 
-int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
+static int __bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
                                  int nb_sectors)
 {
     BdrvTrackedRequest req;
@@ -2524,6 +2524,26 @@ out:
     return ret;
 }
 
+int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
+                                 int nb_sectors)
+{
+    int num, ret;
+    int limit = BDRV_REQUEST_MAX_SECTORS;
+    int remaining = nb_sectors;
+    int64_t sector_offset = sector_num;
+
+    do {
+        num = remaining > limit ? limit : remaining;
+        ret = __bdrv_co_discard(bs, sector_offset, num);
+        if (ret < 0)
+            break;
+        remaining -= num;
+        sector_offset += num;
+    } while (remaining > 0);
+
+    return ret;
+}
+
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
 {
     Coroutine *co;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block: split large discard requests from block frontend
  2016-04-01 12:22 [Qemu-devel] [PATCH] block: split large discard requests from block frontend Olaf Hering
@ 2016-04-01 17:19 ` Max Reitz
  2016-04-01 17:49   ` Olaf Hering
  0 siblings, 1 reply; 6+ messages in thread
From: Max Reitz @ 2016-04-01 17:19 UTC (permalink / raw)
  To: Olaf Hering, qemu-block, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi


[-- Attachment #1.1: Type: text/plain, Size: 3769 bytes --]

On 01.04.2016 14:22, Olaf Hering wrote:
> Large discard requests lead to sign expansion errors in qemu.
> Since there is no API to tell a guest about the limitations qmeu
> has to split a large request itself.
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/io.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)

Hi!

Let me first nag about some style problems, and then I'll come to the
technical/design aspect. :-)

> diff --git a/block/io.c b/block/io.c
> index c4869b9..5b6ed58 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2442,7 +2442,7 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque)
>      rwco->ret = bdrv_co_discard(rwco->bs, rwco->sector_num, rwco->nb_sectors);
>  }
>  
> -int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
> +static int __bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,

Two things: First, I think we do not like identifiers to start with
underscores, especially not with two underscores or with underscore and
a capital letter, because in C those combinations are generally reserved
for compiler/language extensions or for the operating system (think of
__attribute__(), __asm__(), or _Bool).

Second, the coroutine_fn should stay. This is just an empty macro (I
think) that signifies that a certain function is run inside of a
coroutine. That fact will not change if you put a wrapper around it.

>                                   int nb_sectors)

This should be aligned to the opening paranthesis of the line above.

>  {
>      BdrvTrackedRequest req;
> @@ -2524,6 +2524,26 @@ out:
>      return ret;
>  }
>  
> +int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
> +                                 int nb_sectors)
> +{
> +    int num, ret;
> +    int limit = BDRV_REQUEST_MAX_SECTORS;
> +    int remaining = nb_sectors;
> +    int64_t sector_offset = sector_num;
> +
> +    do {
> +        num = remaining > limit ? limit : remaining;

num = MIN(limit, remaining); would work just as fine and maybe look a
bit better.

> +        ret = __bdrv_co_discard(bs, sector_offset, num);
> +        if (ret < 0)
> +            break;

In qemu, every block is enclosed by {} braces, even if it contains only
a single statement.

This is something that the checkpatch script (scripts/checkpatch.pl in
the qemu tree) can detect. It is advisible to run that scripts on
patches before they are sent to the mailing list.

> +        remaining -= num;
> +        sector_offset += num;
> +    } while (remaining > 0);
> +
> +    return ret;
> +}
> +
>  int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
>  {
>      Coroutine *co;
> 

So, now about the technical aspect:

Guest requests are not issued to BlockDriverStates directly, they pass
through a BlockBackend first. The check whether the request is too large
is done there (in blk_check_request() called by blk_co_discard() and
blk_aio_discard()).

Thus, if a discard request exceeds BDRV_REQUEST_MAX_SECTORS, it will be
denied at the BlockBackend level and not reach bdrv_co_discard() at all.
At least that is how it's supposed to be. If it isn't, that's a bug. ;-)

Finally, I'm not sure whether this is something that should be done in
the block layer. I personally feel like it is the device models'
responsibility to only submit requests that adhere to the limits
established by the block layer.

In any case, do you have a test case where a guest was able to submit a
request that led to the overflow error you described in the commit message?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block: split large discard requests from block frontend
  2016-04-01 17:19 ` [Qemu-devel] [Qemu-block] " Max Reitz
@ 2016-04-01 17:49   ` Olaf Hering
  2016-05-06 16:44     ` Max Reitz
  0 siblings, 1 reply; 6+ messages in thread
From: Olaf Hering @ 2016-04-01 17:49 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu-block

On Fri, Apr 01, Max Reitz wrote:

> In any case, do you have a test case where a guest was able to submit a
> request that led to the overflow error you described in the commit message?

mkfs -t ext4 /dev/sdb1 in a xen guest with qcow2 as backing device.
When I added discard support to libxl I worked with raw images, so I did
not notice this. Not sure why it happens to work in kvm guests. I assume
the frontend driver just works around the qemu bug by limiting its
request size.

Olaf

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block: split large discard requests from block frontend
  2016-04-01 17:49   ` Olaf Hering
@ 2016-05-06 16:44     ` Max Reitz
  2016-11-17 13:54       ` Olaf Hering
  0 siblings, 1 reply; 6+ messages in thread
From: Max Reitz @ 2016-05-06 16:44 UTC (permalink / raw)
  To: Olaf Hering; +Cc: qemu-block, qemu-devel, Kevin Wolf, Stefan Hajnoczi


[-- Attachment #1.1: Type: text/plain, Size: 1918 bytes --]

On 01.04.2016 19:49, Olaf Hering wrote:
> On Fri, Apr 01, Max Reitz wrote:
> 
>> In any case, do you have a test case where a guest was able to submit a
>> request that led to the overflow error you described in the commit message?
> 
> mkfs -t ext4 /dev/sdb1 in a xen guest with qcow2 as backing device.
> When I added discard support to libxl I worked with raw images, so I did
> not notice this. Not sure why it happens to work in kvm guests. I assume
> the frontend driver just works around the qemu bug by limiting its
> request size.

Sorry for not having replied in so long.

I know next to nothing about Xen, but I'm very much inclined to think
the Xen block driver (hw/block/xen_disk.c) is at fault here. The
blkif_request_discard structure it uses for accepting discard requests
apparently has a uint64_t nr_sectors field.

So I think using the Xen block device, it is possible for a guest to
submit discard requests with more then INT_MAX sectors involved, and
it's the Xen's block device emulation code job to split those requests
into smaller ones.

That said, I'm not sure this is ideal. It doesn't really look like other
block drivers care so much about splitting requests either. So we're a
bit in a pickle there.

Anyway, for your issue at hand I think there is a simpler solution.
bdrv_co_discard() can and already does split requests. So if we remove
the calls to blk_check_request() in blk_discard() and
bdrv_check_request() in bdrv_co_discard(), everything should already
work just fine.

Therefore, I think what could work for now is for blk_discard() and
bdrv_co_discard() to modify the checks they are doing on the incoming
request.

In theory, all we need to do is skip the length check for both, i.e.
accept requests even if they are longer than INT_MAX / BDRV_SECTOR_SIZE
sectors. I'm not sure how we can do that nicely in practice, though.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block: split large discard requests from block frontend
  2016-05-06 16:44     ` Max Reitz
@ 2016-11-17 13:54       ` Olaf Hering
  2016-11-17 16:27         ` Olaf Hering
  0 siblings, 1 reply; 6+ messages in thread
From: Olaf Hering @ 2016-11-17 13:54 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, Stefan Hajnoczi

On Fri, May 06, Max Reitz wrote:

> On 01.04.2016 19:49, Olaf Hering wrote:
> > On Fri, Apr 01, Max Reitz wrote:
> > 
> >> In any case, do you have a test case where a guest was able to submit a
> >> request that led to the overflow error you described in the commit message?
> > 
> > mkfs -t ext4 /dev/sdb1 in a xen guest with qcow2 as backing device.
> > When I added discard support to libxl I worked with raw images, so I did
> > not notice this. Not sure why it happens to work in kvm guests. I assume
> > the frontend driver just works around the qemu bug by limiting its
> > request size.
> 
> Sorry for not having replied in so long.
> 
> I know next to nothing about Xen, but I'm very much inclined to think
> the Xen block driver (hw/block/xen_disk.c) is at fault here. The
> blkif_request_discard structure it uses for accepting discard requests
> apparently has a uint64_t nr_sectors field.

Thanks for the pointer.

Looking at current master, BLKIF_OP_DISCARD is indeed broken. The values
passed from the guest are u64 and get stashed into signed values. I will
add a loop to repeatedly call blk_aio_pdiscard with small chunks of
BDRV_REQUEST_MAX_SECTORS.

We quickly checked other users of blk_aio_pdiscard and it appears they
are not affected because they notify the guest abuilt the limits.

Olaf

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH] block: split large discard requests from block frontend
  2016-11-17 13:54       ` Olaf Hering
@ 2016-11-17 16:27         ` Olaf Hering
  0 siblings, 0 replies; 6+ messages in thread
From: Olaf Hering @ 2016-11-17 16:27 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2577 bytes --]

On Thu, Nov 17, Olaf Hering wrote:

> Looking at current master, BLKIF_OP_DISCARD is indeed broken. The values
> passed from the guest are u64 and get stashed into signed values. I will
> add a loop to repeatedly call blk_aio_pdiscard with small chunks of
> BDRV_REQUEST_MAX_SECTORS.

I have not compile tested the change below, but thats the way how I
would deal with it:

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 3a7dc19..5962c4b 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -660,6 +660,38 @@ static void qemu_aio_complete(void *opaque, int ret)
     qemu_bh_schedule(ioreq->blkdev->bh);
 }
 
+static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t sector_number, uint64_t nr_sectors)
+{
+    struct XenBlkDev *blkdev = ioreq->blkdev;
+    int64_t byte_offset;
+    int bytes;
+    uint64_t byte_limit = INT64_MAX + INT_MAX;
+    uint64_t sec_start = sector_number;
+    uint64_t sec_count = nr_sectors;
+    uint64_t byte_remaining;
+    uint64_t limit = BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS;
+
+    /* Wrap around? */
+    if ((sec_start + sec_count) < sec_count)
+        return false;
+    /* Overflowing byte limit? */
+    if ((sec_start + sec_count) > (byte_limit >> BDRV_SECTOR_BITS))
+        return false;
+
+    byte_offset = sec_start << BDRV_SECTOR_BITS;
+    byte_remaining = sec_count << BDRV_SECTOR_BITS;
+    do {
+        bytes = byte_remaining > limit ? limit : byte_remaining;
+        ioreq->aio_inflight++;
+        blk_aio_pdiscard(blkdev->blk, byte_offset, bytes,
+                         qemu_aio_complete, ioreq);
+        byte_remaining -= bytes;
+        byte_offset += bytes;
+    } while (byte_remaining > 0);
+
+    return true;
+}
+
 static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
 {
     struct XenBlkDev *blkdev = ioreq->blkdev;
@@ -708,12 +740,9 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
         break;
     case BLKIF_OP_DISCARD:
     {
-        struct blkif_request_discard *discard_req = (void *)&ioreq->req;
-        ioreq->aio_inflight++;
-        blk_aio_pdiscard(blkdev->blk,
-                         discard_req->sector_number << BDRV_SECTOR_BITS,
-                         discard_req->nr_sectors << BDRV_SECTOR_BITS,
-                         qemu_aio_complete, ioreq);
+        struct blkif_request_discard *req = (void *)&ioreq->req;
+        if (blk_split_discard(ioreq, req->sector_number, req->nr_sectors))
+            goto err;
         break;
     }
     default:


Olaf

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-11-17 16:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-01 12:22 [Qemu-devel] [PATCH] block: split large discard requests from block frontend Olaf Hering
2016-04-01 17:19 ` [Qemu-devel] [Qemu-block] " Max Reitz
2016-04-01 17:49   ` Olaf Hering
2016-05-06 16:44     ` Max Reitz
2016-11-17 13:54       ` Olaf Hering
2016-11-17 16:27         ` Olaf Hering

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.