From: Eric Blake <eblake@redhat.com>
To: Olaf Hering <olaf@aepfle.de>, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	"open list:All patches CC here" <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>,
	"open list:X86" <xen-devel@lists.xensource.com>,
	Anthony Perard <anthony.perard@citrix.com>
Subject: Re: [Qemu-devel] [PATCH] xen_disk: convert discard input to byte ranges
Date: Fri, 18 Nov 2016 10:39:28 -0600	[thread overview]
Message-ID: <42ca6186-ca47-3c63-d2e0-54f2ed9f4be7@redhat.com> (raw)
In-Reply-To: <20161118102452.5779-1-olaf@aepfle.de>


On 11/18/2016 04:24 AM, Olaf Hering wrote:
> The guest sends discard requests as u64 sector/count pairs, but the
> block layer operates internally with s64/s32 pairs. The conversion
> leads to IO errors in the guest, the discard request is not processed.
> 
>   domU.cfg:
>   'vdev=xvda, format=qcow2, backendtype=qdisk, target=/x.qcow2'
>   domU:
>   mkfs.ext4 -F /dev/xvda
>   Discarding device blocks: failed - Input/output error
> 
> Fix this by splitting the request into chunks of BDRV_REQUEST_MAX_SECTORS.
> Add input range checking to avoid overflow.
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> ---
>  hw/block/xen_disk.c | 45 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a7dc19..c3f572f 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -660,6 +660,41 @@ static void qemu_aio_complete(void *opaque, int ret)
>      qemu_bh_schedule(ioreq->blkdev->bh);
>  }
>  
> +static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t sector_number,
> +                              uint64_t nr_sectors)
> +{
> +    struct XenBlkDev *blkdev = ioreq->blkdev;
> +    int64_t byte_offset;
> +    int byte_chunk;
> +    uint64_t sec_start = sector_number;
> +    uint64_t sec_count = nr_sectors;
> +    uint64_t byte_remaining;
> +    uint64_t limit = BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS;

[For reference, this limit is the same as rounding INT32_MAX down to the
nearest multiple of 512 bytes, or 0x7ffffe00]
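A minimal sketch to check that arithmetic. It assumes 512-byte sectors (BDRV_SECTOR_BITS of 9) and a per-request cap of INT_MAX >> BDRV_SECTOR_BITS sectors, which is what BDRV_REQUEST_MAX_SECTORS works out to on common 64-bit hosts. The macros are local stand-ins rather than QEMU's own headers, and the helper names are hypothetical:

```c
#include <limits.h>
#include <stdint.h>

/* Local stand-ins (assumption): 512-byte sectors and a per-request cap
 * of INT_MAX >> BDRV_SECTOR_BITS sectors. */
#define BDRV_SECTOR_BITS 9
#define BDRV_REQUEST_MAX_SECTORS (INT_MAX >> BDRV_SECTOR_BITS)

/* Largest discard, in bytes, that one int-sized request can carry. */
static uint64_t max_request_bytes(void)
{
    return (uint64_t)BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS;
}

/* INT32_MAX rounded down to a multiple of the sector size. */
static uint64_t int32_max_rounded_down(void)
{
    const uint64_t sector_mask = (UINT64_C(1) << BDRV_SECTOR_BITS) - 1;

    return (uint64_t)INT32_MAX & ~sector_mask;
}
```

On a platform where int is 32 bits, both helpers return 0x7ffffe00.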

> +
> +    /* Wrap around? */
> +    if ((sec_start + sec_count) < sec_count) {
> +        return false;
> +    }
> +    /* Overflowing byte limit? */
> +    if ((sec_start + sec_count) > ((INT64_MAX + INT_MAX) >> BDRV_SECTOR_BITS)) {

This is undefined.  INT64_MAX + anything positive overflows int64_t,
and even if you treat overflow as wrapping per two's-complement
representation (which creates a negative number), right-shifting a
negative number is implementation-defined, so the result is not
portable either.
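For illustration only: the overflow that expression runs into can be detected without undefined behavior. This sketch uses __builtin_add_overflow, a GCC/Clang extension rather than ISO C, and the helper name is hypothetical:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether a + b would overflow int64_t.
 * __builtin_add_overflow (GCC >= 5, Clang) computes the sum as if with
 * infinite precision and returns true if it did not fit in *sum, so no
 * undefined behavior occurs even when the addition overflows. */
static bool int64_add_overflows(int64_t a, int64_t b)
{
    int64_t sum;

    return __builtin_add_overflow(a, b, &sum);
}
```

Calling int64_add_overflows(INT64_MAX, INT_MAX), the exact sum from the patch, reports an overflow.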

If you are trying to detect guests that make a request that would cover
more than INT64_MAX bytes, you can simplify.  Besides, for as much
storage as there is out there, I seriously doubt ANYONE will ever have
2^63 bytes addressable through a single device.  Why not just write it as:

if (sec_count > (INT64_MAX >> BDRV_SECTOR_BITS) ||
    (INT64_MAX >> BDRV_SECTOR_BITS) - sec_count < sec_start) {
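A sketch of the range check as a standalone helper, assuming 512-byte sectors; discard_range_ok() is a hypothetical name. It tests sec_count on its own before subtracting, because with a uint64_t sec_count the subtraction is carried out in unsigned arithmetic and would wrap for counts larger than INT64_MAX >> BDRV_SECTOR_BITS:

```c
#include <stdbool.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9  /* assumption: 512-byte sectors, as in QEMU */

/* Hypothetical helper: true if the sector range [sec_start,
 * sec_start + sec_count) converts to a byte range whose end fits
 * in int64_t. */
static bool discard_range_ok(uint64_t sec_start, uint64_t sec_count)
{
    const uint64_t max_sectors = (uint64_t)(INT64_MAX >> BDRV_SECTOR_BITS);

    /* Reject oversized counts first so the subtraction below cannot wrap. */
    if (sec_count > max_sectors) {
        return false;
    }
    return sec_start <= max_sectors - sec_count;
}
```

When it returns true, (sec_start + sec_count) << BDRV_SECTOR_BITS cannot exceed INT64_MAX, so the later shifts into byte_offset and byte_remaining stay in range.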

> +        return false;
> +    }
> +
> +    byte_offset = sec_start << BDRV_SECTOR_BITS;
> +    byte_remaining = sec_count << BDRV_SECTOR_BITS;
> +
> +    do {
> +        byte_chunk = byte_remaining > limit ? limit : byte_remaining;
> +        ioreq->aio_inflight++;
> +        blk_aio_pdiscard(blkdev->blk, byte_offset, byte_chunk,
> +                         qemu_aio_complete, ioreq);
> +        byte_remaining -= byte_chunk;
> +        byte_offset += byte_chunk;
> +    } while (byte_remaining > 0);

This part looks reasonable.
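To see the splitting in action, here is a standalone sketch of the same loop. blk_aio_pdiscard() is replaced by a hypothetical callback that only records each chunk, and the limit assumes a 32-bit int as above:

```c
#include <limits.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9                                     /* assumption */
#define BDRV_REQUEST_MAX_SECTORS (INT_MAX >> BDRV_SECTOR_BITS) /* assumption */

/* Hypothetical stand-in for blk_aio_pdiscard(): receives one chunk. */
typedef void (*discard_fn)(int64_t offset, int bytes, void *opaque);

/* Records what the callback saw, for inspection. */
struct chunk_log {
    unsigned count;
    uint64_t total;
    int largest;
};

static void log_chunk(int64_t offset, int bytes, void *opaque)
{
    struct chunk_log *rec = opaque;

    (void)offset;
    rec->count++;
    rec->total += (uint64_t)bytes;
    if (bytes > rec->largest) {
        rec->largest = bytes;
    }
}

/* Issues [byte_offset, byte_offset + byte_remaining) in chunks no larger
 * than the per-request limit. Assumes the range was already validated.
 * Like the patch's do/while, a zero-length range still issues one
 * empty request. */
static void split_discard(int64_t byte_offset, uint64_t byte_remaining,
                          discard_fn fn, void *opaque)
{
    const uint64_t limit =
        (uint64_t)BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS;

    do {
        int chunk = (int)(byte_remaining > limit ? limit : byte_remaining);

        fn(byte_offset, chunk, opaque);
        byte_remaining -= (uint64_t)chunk;
        byte_offset += chunk;
    } while (byte_remaining > 0);
}
```

A 5 GiB discard, for example, comes out as two chunks of 0x7ffffe00 bytes followed by one of 0x40000400 bytes.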

> +
> +    return true;
> +}
> +
>  static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
>  {
>      struct XenBlkDev *blkdev = ioreq->blkdev;
> @@ -708,12 +743,10 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
>          break;
>      case BLKIF_OP_DISCARD:
>      {
> -        struct blkif_request_discard *discard_req = (void *)&ioreq->req;

The old code had it...

> -        ioreq->aio_inflight++;
> -        blk_aio_pdiscard(blkdev->blk,
> -                         discard_req->sector_number << BDRV_SECTOR_BITS,
> -                         discard_req->nr_sectors << BDRV_SECTOR_BITS,
> -                         qemu_aio_complete, ioreq);
> +        struct blkif_request_discard *req = (void *)&ioreq->req;

...but C doesn't require a cast to void*. As long as you are touching
this, you could remove the cast (unless I'm missing something, and the
cast is also there to cast away const).

> +        if (!blk_split_discard(ioreq, req->sector_number, req->nr_sectors)) {
> +            goto err;
> +        }
>          break;
>      }
>      default:
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


