From: Eric Blake <eblake@redhat.com>
To: Olaf Hering <olaf@aepfle.de>, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Stefano Stabellini <sstabellini@kernel.org>,
	"open list:All patches CC here" <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>,
	"open list:X86" <xen-devel@lists.xensource.com>,
	Anthony Perard <anthony.perard@citrix.com>
Subject: Re: [Qemu-devel] [PATCH] xen_disk: convert discard input to byte ranges
Date: Fri, 18 Nov 2016 10:39:28 -0600
Message-ID: <42ca6186-ca47-3c63-d2e0-54f2ed9f4be7@redhat.com>
In-Reply-To: <20161118102452.5779-1-olaf@aepfle.de>

On 11/18/2016 04:24 AM, Olaf Hering wrote:
> The guest sends discard requests as u64 sector/count pairs, but the
> block layer operates internally with s64/s32 pairs. The conversion
> leads to IO errors in the guest, the discard request is not processed.
>
> domU.cfg:
> 'vdev=xvda, format=qcow2, backendtype=qdisk, target=/x.qcow2'
> domU:
> mkfs.ext4 -F /dev/xvda
>   Discarding device blocks: failed - Input/output error
>
> Fix this by splitting the request into chunks of BDRV_REQUEST_MAX_SECTORS.
> Add input range checking to avoid overflow.
>
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> ---
>  hw/block/xen_disk.c | 45 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 39 insertions(+), 6 deletions(-)
>
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a7dc19..c3f572f 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -660,6 +660,41 @@ static void qemu_aio_complete(void *opaque, int ret)
>      qemu_bh_schedule(ioreq->blkdev->bh);
>  }
>
> +static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t sector_number,
> +                              uint64_t nr_sectors)
> +{
> +    struct XenBlkDev *blkdev = ioreq->blkdev;
> +    int64_t byte_offset;
> +    int byte_chunk;
> +    uint64_t sec_start = sector_number;
> +    uint64_t sec_count = nr_sectors;
> +    uint64_t byte_remaining;
> +    uint64_t limit = BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS;

[For reference, this limit is the same as rounding INT32_MAX down to the
nearest 512-byte boundary, or 0x7ffffe00]

> +
> +    /* Wrap around? */
> +    if ((sec_start + sec_count) < sec_count) {
> +        return false;
> +    }
> +    /* Overflowing byte limit? */
> +    if ((sec_start + sec_count) > ((INT64_MAX + INT_MAX) >> BDRV_SECTOR_BITS)) {

This is undefined. INT64_MAX + anything non-negative overflows int64,
and even if you treat overflow as defined by twos-complement
representation (which creates a negative number), shifting a negative
number is also undefined.

If you are trying to detect guests that make a request that would cover
more than INT64_MAX bytes, you can simplify. Besides, for as much
storage as there is out there, I seriously doubt ANYONE will ever have
2^63 bytes addressable through a single device. Why not just write it as:

    if ((INT64_MAX >> BDRV_SECTOR_BITS) - sec_count < sec_start) {

> +        return false;
> +    }
> +
> +    byte_offset = sec_start << BDRV_SECTOR_BITS;
> +    byte_remaining = sec_count << BDRV_SECTOR_BITS;
> +
> +    do {
> +        byte_chunk = byte_remaining > limit ? limit : byte_remaining;
> +        ioreq->aio_inflight++;
> +        blk_aio_pdiscard(blkdev->blk, byte_offset, byte_chunk,
> +                         qemu_aio_complete, ioreq);
> +        byte_remaining -= byte_chunk;
> +        byte_offset += byte_chunk;
> +    } while (byte_remaining > 0);

This part looks reasonable.

> +
> +    return true;
> +}
> +
>  static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
>  {
>      struct XenBlkDev *blkdev = ioreq->blkdev;
> @@ -708,12 +743,10 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
>          break;
>      case BLKIF_OP_DISCARD:
>      {
> -        struct blkif_request_discard *discard_req = (void *)&ioreq->req;

The old code had it...

> -        ioreq->aio_inflight++;
> -        blk_aio_pdiscard(blkdev->blk,
> -                         discard_req->sector_number << BDRV_SECTOR_BITS,
> -                         discard_req->nr_sectors << BDRV_SECTOR_BITS,
> -                         qemu_aio_complete, ioreq);
> +        struct blkif_request_discard *req = (void *)&ioreq->req;

...but C doesn't require a cast to void*. As long as you are touching
this, you could remove the cast (unless I'm missing something, and the
cast is also there to cast away const).

> +        if (!blk_split_discard(ioreq, req->sector_number, req->nr_sectors)) {
> +            goto err;
> +        }
>          break;
>      }
>      default:
>

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library   http://libvirt.org
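[Editor's note: for readers following the overflow discussion, the safe bounds check can be sketched in isolation. This is an illustrative standalone sketch, not QEMU's actual code: `range_fits_int64_bytes` and the bare `BDRV_SECTOR_BITS` definition are stand-ins. The key idea from the review is that once the unsigned wrap-around test has run, the sector sum is exact, so shifting the limit *down* (instead of shifting the guest-controlled sum up, or adding to INT64_MAX) keeps every operation well defined.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QEMU's constant: the block layer uses 512-byte sectors. */
#define BDRV_SECTOR_BITS 9

/* Illustrative helper: returns true iff the sector range, expressed in
 * bytes, fits in a non-negative int64_t. The wrap check runs first, so
 * the subsequent sum cannot wrap and the comparison is well defined. */
static bool range_fits_int64_bytes(uint64_t sec_start, uint64_t sec_count)
{
    /* Unsigned wrap-around of the sector sum? */
    if (sec_start + sec_count < sec_count) {
        return false;
    }
    /* More than INT64_MAX bytes? Shift the limit down rather than the
     * request up, so no intermediate value can overflow. */
    if (sec_start + sec_count > (uint64_t)(INT64_MAX >> BDRV_SECTOR_BITS)) {
        return false;
    }
    return true;
}
```

[Either the subtraction form suggested above or this sum form works; the point in both is that `INT64_MAX` is scaled down before any arithmetic touches the guest-supplied values.]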
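[Editor's note: the chunking loop the review calls "reasonable" can likewise be exercised on its own. The sketch below is illustrative, not QEMU code: the real patch submits each chunk with `blk_aio_pdiscard`, which is factored out here through a callback so the splitting arithmetic is testable; `DISCARD_LIMIT`, `record`, and `split_discard` are assumed names.]

```c
#include <stdint.h>

/* BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS from the patch:
 * INT32_MAX rounded down to a 512-byte boundary. */
#define DISCARD_LIMIT 0x7ffffe00u

/* Recording callback standing in for the blk_aio_pdiscard() submission. */
static uint64_t submitted_bytes;
static int32_t largest_chunk;

static void record(int64_t offset, int32_t len)
{
    (void)offset;
    submitted_bytes += (uint64_t)len;
    if (len > largest_chunk) {
        largest_chunk = len;
    }
}

/* The patch's splitting loop with submission factored out. Returns how
 * many chunks were issued; each chunk is at most DISCARD_LIMIT bytes. */
static unsigned split_discard(uint64_t byte_offset, uint64_t byte_remaining,
                              void (*submit)(int64_t, int32_t))
{
    unsigned chunks = 0;

    do {
        int32_t byte_chunk = byte_remaining > DISCARD_LIMIT
                           ? (int32_t)DISCARD_LIMIT
                           : (int32_t)byte_remaining;
        submit((int64_t)byte_offset, byte_chunk);
        byte_remaining -= (uint64_t)byte_chunk;
        byte_offset += (uint64_t)byte_chunk;
        chunks++;
    } while (byte_remaining > 0);

    return chunks;
}
```

[A 4 GiB discard, for instance, splits into two maximal 0x7ffffe00-byte chunks plus a 1024-byte tail, since the limit sits 512 bytes below 2 GiB.]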