All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] block: file-posix queue
@ 2021-06-03 13:37 Paolo Bonzini
  2021-06-03 13:37 ` [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-03 13:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, qemu-block

Hi Kevin,

this is a combination of two series that both affect host block device
support in block/file-posix.c.  Joelle's series is unchanged, while
mine was adjusted according to your review of v2.

v1->v2: add missing patch

v2->v3: add max_hw_transfer to BlockLimits


Joelle van Dyne (3):
  block: feature detection for host block support
  block: check for sys/disk.h
  block: detect DKIOCGETBLOCKCOUNT/SIZE before use

Paolo Bonzini (4):
  file-posix: fix max_iov for /dev/sg devices
  scsi-generic: pass max_segments via max_iov field in BlockLimits
  block: add max_hw_transfer to BlockLimits
  file-posix: try BLKSECTGET on block devices too, do not round to power
    of 2

 block.c                        |   2 +-
 block/block-backend.c          |  12 ++++
 block/file-posix.c             | 104 ++++++++++++++++++++-------------
 block/io.c                     |   1 +
 hw/scsi/scsi-generic.c         |   6 +-
 include/block/block_int.h      |   7 +++
 include/sysemu/block-backend.h |   1 +
 meson.build                    |   7 ++-
 qapi/block-core.json           |  10 +++-
 9 files changed, 102 insertions(+), 48 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices
  2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
@ 2021-06-03 13:37 ` Paolo Bonzini
  2021-06-07  5:39   ` Vladimir Sementsov-Ogievskiy
                     ` (2 more replies)
  2021-06-03 13:37 ` [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-03 13:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, qemu-block

Even though it was only called for devices that have bs->sg set (which
must be character devices),
sg_get_max_segments looked at /sys/dev/block which only works for
block devices.

On Linux the sg driver has its own way to provide the maximum number of
iovecs in a scatter/gather list.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index f37dfc10b3..58db526cc2 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1180,6 +1180,17 @@ static int sg_get_max_segments(int fd)
         goto out;
     }
 
+    if (S_ISCHR(st->st_mode)) {
+        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
+            return ret;
+        }
+        return -EIO;
+    }
+
+    if (!S_ISBLK(st->st_mode)) {
+        return -ENOTSUP;
+    }
+
     sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
                                 major(st.st_rdev), minor(st.st_rdev));
     sysfd = open(sysfspath, O_RDONLY);
-- 
2.31.1




^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits
  2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
  2021-06-03 13:37 ` [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
@ 2021-06-03 13:37 ` Paolo Bonzini
  2021-06-15 16:06   ` Max Reitz
  2021-06-15 16:20   ` Max Reitz
  2021-06-03 13:37 ` [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits Paolo Bonzini
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-03 13:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, qemu-block

I/O to a disk via read/write is not limited by the number of segments allowed
by the host adapter; the kernel can split requests if needed, and the limit
imposed by the host adapter can be very low (256k or so) to avoid that SG_IO
returns EINVAL if memory is heavily fragmented.

Since this value is only interesting for SG_IO-based I/O, do not include
it in the max_transfer and only take it into account when patching the
block limits VPD page in the scsi-generic device.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c     | 3 +--
 hw/scsi/scsi-generic.c | 6 ++++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 58db526cc2..e3241a0dd3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1239,8 +1239,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
         ret = sg_get_max_segments(s->fd);
         if (ret > 0) {
-            bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-                                      ret * qemu_real_host_page_size);
+            bs->bl.max_iov = ret;
         }
     }
 
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 98c30c5d5c..82e1e2ee79 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -179,10 +179,12 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
         (r->req.cmd.buf[1] & 0x01)) {
         page = r->req.cmd.buf[2];
         if (page == 0xb0) {
-            uint32_t max_transfer =
-                blk_get_max_transfer(s->conf.blk) / s->blocksize;
+            uint32_t max_transfer = blk_get_max_transfer(s->conf.blk);
+            uint32_t max_iov = blk_get_max_iov(s->conf.blk);
 
             assert(max_transfer);
+            max_transfer = MIN_NON_ZERO(max_transfer, max_iov * qemu_real_host_page_size)
+                / s->blocksize;
             stl_be_p(&r->buf[8], max_transfer);
             /* Also take care of the opt xfer len. */
             stl_be_p(&r->buf[12],
-- 
2.31.1




^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits
  2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
  2021-06-03 13:37 ` [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
  2021-06-03 13:37 ` [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
@ 2021-06-03 13:37 ` Paolo Bonzini
  2021-06-03 17:33   ` Eric Blake
  2021-06-15 16:18   ` Max Reitz
  2021-06-03 13:37 ` [PATCH v3 4/7] file-posix: try BLKSECTGET on block devices too, do not round to power of 2 Paolo Bonzini
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-03 13:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, qemu-block

For block host devices, I/O can happen through either the kernel file
descriptor I/O system calls (preadv/pwritev, io_submit, io_uring)
or the SCSI passthrough ioctl SG_IO.

In the latter case, the size of each transfer can be limited by the
HBA, while for file descriptor I/O the kernel is able to split and
merge I/O in smaller pieces as needed.  Applying the HBA limits to
file descriptor I/O results in more system calls and suboptimal
performance, so this patch splits the max_transfer limit in two:
max_transfer remains valid and is used in general, while max_hw_transfer
is limited to the maximum hardware size.  max_hw_transfer can then be
included by the scsi-generic driver in the block limits page, to ensure
that the stricter hardware limit is used.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/block-backend.c          | 12 ++++++++++++
 block/file-posix.c             |  2 +-
 block/io.c                     |  1 +
 hw/scsi/scsi-generic.c         |  2 +-
 include/block/block_int.h      |  7 +++++++
 include/sysemu/block-backend.h |  1 +
 6 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 15f1ea4288..2ea1412a54 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1953,6 +1953,18 @@ uint32_t blk_get_request_alignment(BlockBackend *blk)
     return bs ? bs->bl.request_alignment : BDRV_SECTOR_SIZE;
 }
 
+/* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
+uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    uint64_t max = INT_MAX;
+
+    if (bs) {
+        max = MIN_NON_ZERO(bs->bl.max_hw_transfer, bs->bl.max_transfer);
+    }
+    return max;
+}
+
 /* Returns the maximum transfer length, in bytes; guaranteed nonzero */
 uint32_t blk_get_max_transfer(BlockBackend *blk)
 {
diff --git a/block/file-posix.c b/block/file-posix.c
index e3241a0dd3..f55f92d0f5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1234,7 +1234,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
         int ret = sg_get_max_transfer_length(s->fd);
 
         if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-            bs->bl.max_transfer = pow2floor(ret);
+            bs->bl.max_hw_transfer = pow2floor(ret);
         }
 
         ret = sg_get_max_segments(s->fd);
diff --git a/block/io.c b/block/io.c
index 323854d063..089b99bb0c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -127,6 +127,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 {
     dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
     dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
+    dst->max_hw_transfer = MIN_NON_ZERO(dst->max_hw_transfer, src->max_hw_transfer);
     dst->opt_mem_alignment = MAX(dst->opt_mem_alignment,
                                  src->opt_mem_alignment);
     dst->min_mem_alignment = MAX(dst->min_mem_alignment,
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 82e1e2ee79..3762dce749 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -179,7 +179,7 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
         (r->req.cmd.buf[1] & 0x01)) {
         page = r->req.cmd.buf[2];
         if (page == 0xb0) {
-            uint32_t max_transfer = blk_get_max_transfer(s->conf.blk);
+            uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
             uint32_t max_iov = blk_get_max_iov(s->conf.blk);
 
             assert(max_transfer);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 057d88b1fc..f1a54db0f8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -695,6 +695,13 @@ typedef struct BlockLimits {
      * clamped down. */
     uint32_t max_transfer;
 
+    /* Maximal hardware transfer length in bytes.  Applies whenever
+     * transfers to the device bypass the kernel I/O scheduler, for
+     * example with SG_IO.  If larger than max_transfer or if zero,
+     * blk_get_max_hw_transfer will fall back to max_transfer.
+     */
+    uint64_t max_hw_transfer;
+
     /* memory alignment, in bytes so that no bounce buffer is needed */
     size_t min_mem_alignment;
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 5423e3d9c6..9ac5f7bbd3 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -208,6 +208,7 @@ void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
 uint32_t blk_get_request_alignment(BlockBackend *blk);
 uint32_t blk_get_max_transfer(BlockBackend *blk);
+uint64_t blk_get_max_hw_transfer(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
-- 
2.31.1




^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 4/7] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
  2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
                   ` (2 preceding siblings ...)
  2021-06-03 13:37 ` [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits Paolo Bonzini
@ 2021-06-03 13:37 ` Paolo Bonzini
  2021-06-15 16:32   ` Max Reitz
  2021-06-03 13:37 ` [PATCH v3 5/7] block: feature detection for host block support Paolo Bonzini
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-03 13:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, qemu-block

bs->sg is only true for character devices, but block devices can also
be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
returns bytes in an int for /dev/sgN devices, and sectors in a short
for block devices, so account for that in the code.

The maximum transfer also need not be a power of 2 (for example I have
seen disks with 1280 KiB maximum transfer) so there's no need to pass
the result through pow2floor.

This patch reintroduces the logic that was removed in commit 867eccfed8
("file-posix: Use max transfer length/segment count only for SCSI
passthrough", 2019-07-12).  The removal was motivated by the performance
regression when using a block device's max_transfer with kernel I/O;
the new, separate max_hw_transfer limit avoids reintroducing the same
regression.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index f55f92d0f5..1439293f63 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1149,22 +1149,27 @@ static void raw_reopen_abort(BDRVReopenState *state)
     s->reopen_state = NULL;
 }
 
-static int sg_get_max_transfer_length(int fd)
+static int hdev_get_max_hw_transfer(int fd, struct stat *st)
 {
 #ifdef BLKSECTGET
-    int max_bytes = 0;
-
-    if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
-        return max_bytes;
+    if (S_ISBLK(st->st_mode)) {
+        unsigned short max_sectors = 0;
+        if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
+            return max_sectors * 512;
+        }
     } else {
-        return -errno;
+        int max_bytes = 0;
+        if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
+            return max_bytes;
+        }
     }
+    return -errno;
 #else
     return -ENOSYS;
 #endif
 }
 
-static int sg_get_max_segments(int fd)
+static int hdev_get_max_segments(int fd, struct stat *st)
 {
 #ifdef CONFIG_LINUX
     char buf[32];
@@ -1173,12 +1178,6 @@ static int sg_get_max_segments(int fd)
     int ret;
     int sysfd = -1;
     long max_segments;
-    struct stat st;
-
-    if (fstat(fd, &st)) {
-        ret = -errno;
-        goto out;
-    }
 
     if (S_ISCHR(st->st_mode)) {
         if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
@@ -1192,7 +1191,7 @@ static int sg_get_max_segments(int fd)
     }
 
     sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-                                major(st.st_rdev), minor(st.st_rdev));
+                                major(st->st_rdev), minor(st->st_rdev));
     sysfd = open(sysfspath, O_RDONLY);
     if (sysfd == -1) {
         ret = -errno;
@@ -1229,15 +1228,20 @@ out:
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
+    struct stat st;
+
+    if (fstat(s->fd, &st)) {
+        return;
+    }
 
-    if (bs->sg) {
-        int ret = sg_get_max_transfer_length(s->fd);
+    if (bs->sg || S_ISBLK(st.st_mode)) {
+        int ret = hdev_get_max_hw_transfer(s->fd, &st);
 
         if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-            bs->bl.max_hw_transfer = pow2floor(ret);
+            bs->bl.max_hw_transfer = ret;
         }
 
-        ret = sg_get_max_segments(s->fd);
+        ret = hdev_get_max_segments(s->fd, &st);
         if (ret > 0) {
             bs->bl.max_iov = ret;
         }
-- 
2.31.1




^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 5/7] block: feature detection for host block support
  2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
                   ` (3 preceding siblings ...)
  2021-06-03 13:37 ` [PATCH v3 4/7] file-posix: try BLKSECTGET on block devices too, do not round to power of 2 Paolo Bonzini
@ 2021-06-03 13:37 ` Paolo Bonzini
  2021-06-15 16:40   ` Max Reitz
  2021-06-03 13:37 ` [PATCH v3 6/7] block: check for sys/disk.h Paolo Bonzini
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-03 13:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Joelle van Dyne, qemu-block

From: Joelle van Dyne <j@getutm.app>

On Darwin (iOS), there are no system level APIs for directly accessing
host block devices. We detect this at configure time.

Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20210315180341.31638-2-j@getutm.app>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c   | 33 ++++++++++++++++++++++-----------
 meson.build          |  6 +++++-
 qapi/block-core.json | 10 +++++++---
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 1439293f63..5821e1afed 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -42,6 +42,8 @@
 #include "scsi/constants.h"
 
 #if defined(__APPLE__) && (__MACH__)
+#include <sys/ioctl.h>
+#if defined(HAVE_HOST_BLOCK_DEVICE)
 #include <paths.h>
 #include <sys/param.h>
 #include <IOKit/IOKitLib.h>
@@ -52,6 +54,7 @@
 //#include <IOKit/storage/IOCDTypes.h>
 #include <IOKit/storage/IODVDMedia.h>
 #include <CoreFoundation/CoreFoundation.h>
+#endif /* defined(HAVE_HOST_BLOCK_DEVICE) */
 #endif
 
 #ifdef __sun__
@@ -180,7 +183,17 @@ typedef struct BDRVRawReopenState {
     bool check_cache_dropped;
 } BDRVRawReopenState;
 
-static int fd_open(BlockDriverState *bs);
+static int fd_open(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+
+    /* this is just to ensure s->fd is sane (its called by io ops) */
+    if (s->fd >= 0) {
+        return 0;
+    }
+    return -EIO;
+}
+
 static int64_t raw_getlength(BlockDriverState *bs);
 
 typedef struct RawPosixAIOData {
@@ -3028,6 +3041,7 @@ static BlockStatsSpecific *raw_get_specific_stats(BlockDriverState *bs)
     return stats;
 }
 
+#if defined(HAVE_HOST_BLOCK_DEVICE)
 static BlockStatsSpecific *hdev_get_specific_stats(BlockDriverState *bs)
 {
     BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
@@ -3037,6 +3051,7 @@ static BlockStatsSpecific *hdev_get_specific_stats(BlockDriverState *bs)
 
     return stats;
 }
+#endif /* HAVE_HOST_BLOCK_DEVICE */
 
 static QemuOptsList raw_create_opts = {
     .name = "raw-create-opts",
@@ -3252,6 +3267,8 @@ BlockDriver bdrv_file = {
 /***********************************************/
 /* host device */
 
+#if defined(HAVE_HOST_BLOCK_DEVICE)
+
 #if defined(__APPLE__) && defined(__MACH__)
 static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
                                 CFIndex maxPathSize, int flags);
@@ -3544,16 +3561,6 @@ hdev_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
 }
 #endif /* linux */
 
-static int fd_open(BlockDriverState *bs)
-{
-    BDRVRawState *s = bs->opaque;
-
-    /* this is just to ensure s->fd is sane (its called by io ops) */
-    if (s->fd >= 0)
-        return 0;
-    return -EIO;
-}
-
 static coroutine_fn int
 hdev_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
 {
@@ -3877,6 +3884,8 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#endif /* HAVE_HOST_BLOCK_DEVICE */
+
 static void bdrv_file_init(void)
 {
     /*
@@ -3884,6 +3893,7 @@ static void bdrv_file_init(void)
      * registered last will get probed first.
      */
     bdrv_register(&bdrv_file);
+#if defined(HAVE_HOST_BLOCK_DEVICE)
     bdrv_register(&bdrv_host_device);
 #ifdef __linux__
     bdrv_register(&bdrv_host_cdrom);
@@ -3891,6 +3901,7 @@ static void bdrv_file_init(void)
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
     bdrv_register(&bdrv_host_cdrom);
 #endif
+#endif /* HAVE_HOST_BLOCK_DEVICE */
 }
 
 block_init(bdrv_file_init);
diff --git a/meson.build b/meson.build
index 913cf2a41a..9387fc5799 100644
--- a/meson.build
+++ b/meson.build
@@ -183,7 +183,7 @@ if targetos == 'windows'
                                       include_directories: include_directories('.'))
 elif targetos == 'darwin'
   coref = dependency('appleframeworks', modules: 'CoreFoundation')
-  iokit = dependency('appleframeworks', modules: 'IOKit')
+  iokit = dependency('appleframeworks', modules: 'IOKit', required: false)
 elif targetos == 'sunos'
   socket = [cc.find_library('socket'),
             cc.find_library('nsl'),
@@ -1072,6 +1072,9 @@ if get_option('cfi')
   add_global_link_arguments(cfi_flags, native: false, language: ['c', 'cpp', 'objc'])
 endif
 
+have_host_block_device = (targetos != 'darwin' or
+    cc.has_header('IOKit/storage/IOMedia.h'))
+
 #################
 # config-host.h #
 #################
@@ -1165,6 +1168,7 @@ config_host_data.set('HAVE_PTY_H', cc.has_header('pty.h'))
 config_host_data.set('HAVE_SYS_IOCCOM_H', cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
 config_host_data.set('HAVE_SYSTEM_FUNCTION', cc.has_function('system', prefix: '#include <stdlib.h>'))
+config_host_data.set('HAVE_HOST_BLOCK_DEVICE', have_host_block_device)
 
 config_host_data.set('CONFIG_PREADV', cc.has_function('preadv', prefix: '#include <sys/uio.h>'))
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2ea294129e..2dd41be156 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -897,7 +897,8 @@
   'discriminator': 'driver',
   'data': {
       'file': 'BlockStatsSpecificFile',
-      'host_device': 'BlockStatsSpecificFile',
+      'host_device': { 'type': 'BlockStatsSpecificFile',
+                       'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
       'nvme': 'BlockStatsSpecificNvme' } }
 
 ##
@@ -2814,7 +2815,9 @@
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
             'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
-            'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
+            'gluster', 'host_cdrom',
+            {'name': 'host_device', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
+            'http', 'https', 'iscsi',
             'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
             'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
             { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
@@ -3996,7 +3999,8 @@
       'ftps':       'BlockdevOptionsCurlFtps',
       'gluster':    'BlockdevOptionsGluster',
       'host_cdrom': 'BlockdevOptionsFile',
-      'host_device':'BlockdevOptionsFile',
+      'host_device': { 'type': 'BlockdevOptionsFile',
+                       'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
       'http':       'BlockdevOptionsCurlHttp',
       'https':      'BlockdevOptionsCurlHttps',
       'iscsi':      'BlockdevOptionsIscsi',
-- 
2.31.1




^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 6/7] block: check for sys/disk.h
  2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
                   ` (4 preceding siblings ...)
  2021-06-03 13:37 ` [PATCH v3 5/7] block: feature detection for host block support Paolo Bonzini
@ 2021-06-03 13:37 ` Paolo Bonzini
  2021-06-15 16:42   ` Max Reitz
  2021-06-03 13:37 ` [PATCH v3 7/7] block: detect DKIOCGETBLOCKCOUNT/SIZE before use Paolo Bonzini
  2021-06-07 13:52 ` [PATCH v3 0/7] block: file-posix queue Maxim Levitsky
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-03 13:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Peter Maydell, Philippe Mathieu-Daudé,
	Joelle van Dyne, qemu-block

From: Joelle van Dyne <j@getutm.app>

Some BSD platforms do not have this header.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20210315180341.31638-3-j@getutm.app>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c     | 2 +-
 meson.build | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 3f456892d0..1d37f133a8 100644
--- a/block.c
+++ b/block.c
@@ -54,7 +54,7 @@
 #ifdef CONFIG_BSD
 #include <sys/ioctl.h>
 #include <sys/queue.h>
-#ifndef __DragonFly__
+#if defined(HAVE_SYS_DISK_H)
 #include <sys/disk.h>
 #endif
 #endif
diff --git a/meson.build b/meson.build
index 9387fc5799..8c44c37f9a 100644
--- a/meson.build
+++ b/meson.build
@@ -1169,6 +1169,7 @@ config_host_data.set('HAVE_SYS_IOCCOM_H', cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
 config_host_data.set('HAVE_SYSTEM_FUNCTION', cc.has_function('system', prefix: '#include <stdlib.h>'))
 config_host_data.set('HAVE_HOST_BLOCK_DEVICE', have_host_block_device)
+config_host_data.set('HAVE_SYS_DISK_H', cc.has_header('sys/disk.h'))
 
 config_host_data.set('CONFIG_PREADV', cc.has_function('preadv', prefix: '#include <sys/uio.h>'))
 
-- 
2.31.1




^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 7/7] block: detect DKIOCGETBLOCKCOUNT/SIZE before use
  2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
                   ` (5 preceding siblings ...)
  2021-06-03 13:37 ` [PATCH v3 6/7] block: check for sys/disk.h Paolo Bonzini
@ 2021-06-03 13:37 ` Paolo Bonzini
  2021-06-15 16:50   ` Max Reitz
  2021-06-07 13:52 ` [PATCH v3 0/7] block: file-posix queue Maxim Levitsky
  7 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-03 13:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Warner Losh, Joelle van Dyne, qemu-block, Peter Maydell

From: Joelle van Dyne <j@getutm.app>

iOS hosts do not have these defined so we fallback to the
default behaviour.

Co-authored-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20210315180341.31638-4-j@getutm.app>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 5821e1afed..4e2f7cf508 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2322,8 +2322,11 @@ static int64_t raw_getlength(BlockDriverState *bs)
 again:
 #endif
     if (!fstat(fd, &sb) && (S_IFCHR & sb.st_mode)) {
+        size = 0;
 #ifdef DIOCGMEDIASIZE
-        if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size))
+        if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size)) {
+            size = 0;
+        }
 #elif defined(DIOCGPART)
         {
                 struct partinfo pi;
@@ -2332,9 +2335,7 @@ again:
                 else
                         size = 0;
         }
-        if (size == 0)
-#endif
-#if defined(__APPLE__) && defined(__MACH__)
+#elif defined(DKIOCGETBLOCKCOUNT) && defined(DKIOCGETBLOCKSIZE)
         {
             uint64_t sectors = 0;
             uint32_t sector_size = 0;
@@ -2342,19 +2343,15 @@ again:
             if (ioctl(fd, DKIOCGETBLOCKCOUNT, &sectors) == 0
                && ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) == 0) {
                 size = sectors * sector_size;
-            } else {
-                size = lseek(fd, 0LL, SEEK_END);
-                if (size < 0) {
-                    return -errno;
-                }
             }
         }
-#else
-        size = lseek(fd, 0LL, SEEK_END);
+#endif
+        if (size == 0) {
+            size = lseek(fd, 0LL, SEEK_END);
+        }
         if (size < 0) {
             return -errno;
         }
-#endif
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
         switch(s->type) {
         case FTYPE_CD:
-- 
2.31.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits
  2021-06-03 13:37 ` [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits Paolo Bonzini
@ 2021-06-03 17:33   ` Eric Blake
  2021-06-04  7:21     ` Paolo Bonzini
  2021-06-15 16:18   ` Max Reitz
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Blake @ 2021-06-03 17:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, qemu-block

On Thu, Jun 03, 2021 at 03:37:18PM +0200, Paolo Bonzini wrote:
> For block host devices, I/O can happen through either the kernel file
> descriptor I/O system calls (preadv/pwritev, io_submit, io_uring)
> or the SCSI passthrough ioctl SG_IO.
> 
> In the latter case, the size of each transfer can be limited by the
> HBA, while for file descriptor I/O the kernel is able to split and
> merge I/O in smaller pieces as needed.  Applying the HBA limits to
> file descriptor I/O results in more system calls and suboptimal
> performance, so this patch splits the max_transfer limit in two:
> max_transfer remains valid and is used in general, while max_hw_transfer
> is limited to the maximum hardware size.  max_hw_transfer can then be
> included by the scsi-generic driver in the block limits page, to ensure
> that the stricter hardware limit is used.
> 

> +/* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
> +{
> +    BlockDriverState *bs = blk_bs(blk);
> +    uint64_t max = INT_MAX;

This is an unaligned value; should we instead round it down to the
request_alignment granularity?

> +
> +    if (bs) {
> +        max = MIN_NON_ZERO(bs->bl.max_hw_transfer, bs->bl.max_transfer);
> +    }
> +    return max;
> +}
> +
>  /* Returns the maximum transfer length, in bytes; guaranteed nonzero */
>  uint32_t blk_get_max_transfer(BlockBackend *blk)
>  {

> +++ b/include/block/block_int.h
> @@ -695,6 +695,13 @@ typedef struct BlockLimits {
>       * clamped down. */
>      uint32_t max_transfer;
>  
> +    /* Maximal hardware transfer length in bytes.  Applies whenever

Leading /* on its own line, per our style.

> +     * transfers to the device bypass the kernel I/O scheduler, for
> +     * example with SG_IO.  If larger than max_transfer or if zero,
> +     * blk_get_max_hw_transfer will fall back to max_transfer.
> +     */

Should we mandate any additional requirements on this value such as
multiple of request_alignment or even power-of-2?

> +    uint64_t max_hw_transfer;
> +
>      /* memory alignment, in bytes so that no bounce buffer is needed */
>      size_t min_mem_alignment;
>  

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits
  2021-06-03 17:33   ` Eric Blake
@ 2021-06-04  7:21     ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-04  7:21 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, qemu-devel, qemu-block

On 03/06/21 19:33, Eric Blake wrote:
>> +/* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
>> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
>> +{
>> +    BlockDriverState *bs = blk_bs(blk);
>> +    uint64_t max = INT_MAX;
> 
> This is an unaligned value; should we instead round it down to the
> request_alignment granularity?

See below...

>> +++ b/include/block/block_int.h
>> @@ -695,6 +695,13 @@ typedef struct BlockLimits {
>>        * clamped down. */
>>       uint32_t max_transfer;
>>   
>> +    /* Maximal hardware transfer length in bytes.  Applies whenever
> 
> Leading /* on its own line, per our style.

The whole file still uses this style, I can change it if desired or do 
it later for the whole file or even the whole block subsystem.

>> +     * transfers to the device bypass the kernel I/O scheduler, for
>> +     * example with SG_IO.  If larger than max_transfer or if zero,
>> +     * blk_get_max_hw_transfer will fall back to max_transfer.
>> +     */
> 
> Should we mandate any additional requirements on this value such as
> multiple of request_alignment or even power-of-2?

Certainly not power of 2.  Multiple of request_alignment probably makes 
sense, but max_transfer doesn't have that limit.

Paolo

>> +    uint64_t max_hw_transfer;
>> +
>>       /* memory alignment, in bytes so that no bounce buffer is needed */
>>       size_t min_mem_alignment;
>>   
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices
  2021-06-03 13:37 ` [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
@ 2021-06-07  5:39   ` Vladimir Sementsov-Ogievskiy
  2021-06-07  6:16     ` Vladimir Sementsov-Ogievskiy
  2021-06-07  7:23   ` Vladimir Sementsov-Ogievskiy
  2021-06-15 15:58   ` Max Reitz
  2 siblings, 1 reply; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-07  5:39 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

03.06.2021 16:37, Paolo Bonzini wrote:
> Even though it was only called for devices that have bs->sg set (which
> must be character devices),
> sg_get_max_segments looked at /sys/dev/block which only works for
> block devices.
> 
> On Linux the sg driver has its own way to provide the maximum number of
> iovecs in a scatter/gather list.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/file-posix.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index f37dfc10b3..58db526cc2 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1180,6 +1180,17 @@ static int sg_get_max_segments(int fd)
>           goto out;
>       }
>   
> +    if (S_ISCHR(st->st_mode)) {
> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +            return ret;

So, this is a new most-probable path for char-dev, yes?

Is it better do it at start of the function to avoid extra fstat() ?

> +        }
> +        return -EIO;
> +    }
> +
> +    if (!S_ISBLK(st->st_mode)) {
> +        return -ENOTSUP;
> +    }
> +
>       sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
>                                   major(st.st_rdev), minor(st.st_rdev));
>       sysfd = open(sysfspath, O_RDONLY);
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices
  2021-06-07  5:39   ` Vladimir Sementsov-Ogievskiy
@ 2021-06-07  6:16     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-07  6:16 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

07.06.2021 08:39, Vladimir Sementsov-Ogievskiy wrote:
> 03.06.2021 16:37, Paolo Bonzini wrote:
>> Even though it was only called for devices that have bs->sg set (which
>> must be character devices),
>> sg_get_max_segments looked at /sys/dev/block which only works for
>> block devices.
>>
>> On Linux the sg driver has its own way to provide the maximum number of
>> iovecs in a scatter/gather list.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   block/file-posix.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index f37dfc10b3..58db526cc2 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -1180,6 +1180,17 @@ static int sg_get_max_segments(int fd)
>>           goto out;
>>       }
>> +    if (S_ISCHR(st->st_mode)) {
>> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
>> +            return ret;
> 
> So, this is a new most-probable path for char-dev, yes?
> 
> Is it better do it at start of the function to avoid extra fstat() ?

Haha, stupid question, sorry my morning sleepy eyes. fstat is used for outer if condition.

> 
>> +        }
>> +        return -EIO;
>> +    }
>> +
>> +    if (!S_ISBLK(st->st_mode)) {
>> +        return -ENOTSUP;
>> +    }
>> +
>>       sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
>>                                   major(st.st_rdev), minor(st.st_rdev));
>>       sysfd = open(sysfspath, O_RDONLY);
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices
  2021-06-03 13:37 ` [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
  2021-06-07  5:39   ` Vladimir Sementsov-Ogievskiy
@ 2021-06-07  7:23   ` Vladimir Sementsov-Ogievskiy
  2021-06-15 15:58   ` Max Reitz
  2 siblings, 0 replies; 28+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-06-07  7:23 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

03.06.2021 16:37, Paolo Bonzini wrote:
> Even though it was only called for devices that have bs->sg set (which
> must be character devices),
> sg_get_max_segments looked at /sys/dev/block which only works for
> block devices.

I assume, you keep /sys/dev/block code branch here only for following converting of the function to be hdev_get_max_hw_transfer()? Worth mentioning here?

> 
> On Linux the sg driver has its own way to provide the maximum number of
> iovecs in a scatter/gather list.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/file-posix.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index f37dfc10b3..58db526cc2 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1180,6 +1180,17 @@ static int sg_get_max_segments(int fd)
>           goto out;
>       }
>   
> +    if (S_ISCHR(st->st_mode)) {

Should it be instead "if (bs->sg) {" ? As SG_GET_SG_TABLESIZE looks like sg-specific, not for any chardev.

Also, st is not a pointer, so here and in the next if condition it should be s/st->/st./

> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +            return ret;
> +        }
> +        return -EIO;
> +    }
> +
> +    if (!S_ISBLK(st->st_mode)) {
> +        return -ENOTSUP;
> +    }
> +
>       sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
>                                   major(st.st_rdev), minor(st.st_rdev));
>       sysfd = open(sysfspath, O_RDONLY);
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/7] block: file-posix queue
  2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
                   ` (6 preceding siblings ...)
  2021-06-03 13:37 ` [PATCH v3 7/7] block: detect DKIOCGETBLOCKCOUNT/SIZE before use Paolo Bonzini
@ 2021-06-07 13:52 ` Maxim Levitsky
  2021-06-10 13:40   ` Paolo Bonzini
  7 siblings, 1 reply; 28+ messages in thread
From: Maxim Levitsky @ 2021-06-07 13:52 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block, Max Reitz

On Thu, 2021-06-03 at 15:37 +0200, Paolo Bonzini wrote:
> Hi Kevin,
> 
> this is a combination of two series that both affect host block device
> support in block/file-posix.c.  Joelle's series is unchanged, while
> mine was adjusted according to your review of v2.
> 
> v1->v2: add missing patch
> 
> v2->v3: add max_hw_transfer to BlockLimits
> 
> 
> Joelle van Dyne (3):
>   block: feature detection for host block support
>   block: check for sys/disk.h
>   block: detect DKIOCGETBLOCKCOUNT/SIZE before use
> 
> Paolo Bonzini (4):
>   file-posix: fix max_iov for /dev/sg devices
>   scsi-generic: pass max_segments via max_iov field in BlockLimits
>   block: add max_hw_transfer to BlockLimits
>   file-posix: try BLKSECTGET on block devices too, do not round to power
>     of 2
> 
>  block.c                        |   2 +-
>  block/block-backend.c          |  12 ++++
>  block/file-posix.c             | 104 ++++++++++++++++++++-------------
>  block/io.c                     |   1 +
>  hw/scsi/scsi-generic.c         |   6 +-
>  include/block/block_int.h      |   7 +++
>  include/sysemu/block-backend.h |   1 +
>  meson.build                    |   7 ++-
>  qapi/block-core.json           |  10 +++-
>  9 files changed, 102 insertions(+), 48 deletions(-)
> 
Hi Paolo and everyone!
 
I used to have a patch series that was about to fix the block limits of the scsi-block,
which I think is similar to this patch series.
 
Sorry that I kind of forgot about it for too much time.
 
https://www.mail-archive.com/qemu-devel@nongnu.org/msg768261.html
 
I'll need some time to swap-in this area so that I could compare our
patches to see if we missed something.
 
Best regards,
	Maxim Levitsky
 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/7] block: file-posix queue
  2021-06-07 13:52 ` [PATCH v3 0/7] block: file-posix queue Maxim Levitsky
@ 2021-06-10 13:40   ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-10 13:40 UTC (permalink / raw)
  To: Maxim Levitsky, qemu-devel; +Cc: kwolf, qemu-block, Max Reitz

On 07/06/21 15:52, Maxim Levitsky wrote:
> I used to have a patch series that was about to fix the block limits of the scsi-block,
> which I think is similar to this patch series.
>   
> Sorry that I kind of forgot about it for too much time.
>   
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg768261.html
>   
> I'll need some time to swap-in this area so that I could compare our
> patches to see if we missed something.

They are indeed very similar; the only substantial change is that my 
patches also clamp max_hw_transfer to max_transfer.

I picked up patch 5 from your old submission and queued it, since it's a 
SCSI change.

Paolo



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices
  2021-06-03 13:37 ` [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
  2021-06-07  5:39   ` Vladimir Sementsov-Ogievskiy
  2021-06-07  7:23   ` Vladimir Sementsov-Ogievskiy
@ 2021-06-15 15:58   ` Max Reitz
  2 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2021-06-15 15:58 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On 03.06.21 15:37, Paolo Bonzini wrote:
> Even though it was only called for devices that have bs->sg set (which
> must be character devices),
> sg_get_max_segments looked at /sys/dev/block which only works for
> block devices.
>
> On Linux the sg driver has its own way to provide the maximum number of
> iovecs in a scatter/gather list.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/file-posix.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index f37dfc10b3..58db526cc2 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1180,6 +1180,17 @@ static int sg_get_max_segments(int fd)
>           goto out;
>       }
>   
> +    if (S_ISCHR(st->st_mode)) {
> +        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +            return ret;
> +        }
> +        return -EIO;
> +    }
> +
> +    if (!S_ISBLK(st->st_mode)) {
> +        return -ENOTSUP;
> +    }
> +

With %s/->/./:

Reviewed-by: Max Reitz <mreitz@redhat.com>

(To answer Vladimir’s question, I don’t believe the condition should be 
bs->sg, because bs->sg == true is a given for this function anyway. 
Therefore, there’s no need to check whether the char device is an SG 
device.)



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits
  2021-06-03 13:37 ` [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
@ 2021-06-15 16:06   ` Max Reitz
  2021-06-15 16:20   ` Max Reitz
  1 sibling, 0 replies; 28+ messages in thread
From: Max Reitz @ 2021-06-15 16:06 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On 03.06.21 15:37, Paolo Bonzini wrote:
> I/O to a disk via read/write is not limited by the number of segments allowed
> by the host adapter; the kernel can split requests if needed, and the limit
> imposed by the host adapter can be very low (256k or so) to avoid that SG_IO
> returns EINVAL if memory is heavily fragmented.
>
> Since this value is only interesting for SG_IO-based I/O, do not include
> it in the max_transfer and only take it into account when patching the
> block limits VPD page in the scsi-generic device.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/file-posix.c     | 3 +--
>   hw/scsi/scsi-generic.c | 6 ++++--
>   2 files changed, 5 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits
  2021-06-03 13:37 ` [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits Paolo Bonzini
  2021-06-03 17:33   ` Eric Blake
@ 2021-06-15 16:18   ` Max Reitz
  2021-06-16 13:18     ` Paolo Bonzini
  1 sibling, 1 reply; 28+ messages in thread
From: Max Reitz @ 2021-06-15 16:18 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On 03.06.21 15:37, Paolo Bonzini wrote:
> For block host devices, I/O can happen through either the kernel file
> descriptor I/O system calls (preadv/pwritev, io_submit, io_uring)
> or the SCSI passthrough ioctl SG_IO.
>
> In the latter case, the size of each transfer can be limited by the
> HBA, while for file descriptor I/O the kernel is able to split and
> merge I/O in smaller pieces as needed.  Applying the HBA limits to
> file descriptor I/O results in more system calls and suboptimal
> performance, so this patch splits the max_transfer limit in two:
> max_transfer remains valid and is used in general, while max_hw_transfer
> is limited to the maximum hardware size.  max_hw_transfer can then be
> included by the scsi-generic driver in the block limits page, to ensure
> that the stricter hardware limit is used.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/block-backend.c          | 12 ++++++++++++
>   block/file-posix.c             |  2 +-
>   block/io.c                     |  1 +
>   hw/scsi/scsi-generic.c         |  2 +-
>   include/block/block_int.h      |  7 +++++++
>   include/sysemu/block-backend.h |  1 +
>   6 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 15f1ea4288..2ea1412a54 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1953,6 +1953,18 @@ uint32_t blk_get_request_alignment(BlockBackend *blk)
>       return bs ? bs->bl.request_alignment : BDRV_SECTOR_SIZE;
>   }
>   
> +/* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
> +{
> +    BlockDriverState *bs = blk_bs(blk);
> +    uint64_t max = INT_MAX;
> +
> +    if (bs) {
> +        max = MIN_NON_ZERO(bs->bl.max_hw_transfer, bs->bl.max_transfer);
> +    }
> +    return max;

Both `max_hw_transfer` and `max_transfer` can be 0, so this could return 
0, contrary to what the comment above promises.

Should `max` be initialized to 0 with a `MIN_NON_ZERO(max, INT_MAX)` 
here (like `blk_get_max_transfer()` does it)?

(As for the rest, I think aligning to the request alignment makes sense, 
but then again we don’t do that for max_transfer either, so... this at 
least wouldn’t be a new bug.

Regarding the comment, checkpatch complains about it, so it should be 
fixed so that /* is on its own line.

Speaking of checkpatch, now that I ran it, it also complains about the 
new line in bdrv_merge_limits() exceeding 80 characters, so that should 
be fixed, too.)

Max



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits
  2021-06-03 13:37 ` [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
  2021-06-15 16:06   ` Max Reitz
@ 2021-06-15 16:20   ` Max Reitz
  2021-06-15 17:41     ` Paolo Bonzini
  1 sibling, 1 reply; 28+ messages in thread
From: Max Reitz @ 2021-06-15 16:20 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On 03.06.21 15:37, Paolo Bonzini wrote:
> I/O to a disk via read/write is not limited by the number of segments allowed
> by the host adapter; the kernel can split requests if needed, and the limit
> imposed by the host adapter can be very low (256k or so) to avoid that SG_IO
> returns EINVAL if memory is heavily fragmented.
>
> Since this value is only interesting for SG_IO-based I/O, do not include
> it in the max_transfer and only take it into account when patching the
> block limits VPD page in the scsi-generic device.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/file-posix.c     | 3 +--
>   hw/scsi/scsi-generic.c | 6 ++++--
>   2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 58db526cc2..e3241a0dd3 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1239,8 +1239,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>   
>           ret = sg_get_max_segments(s->fd);
>           if (ret > 0) {
> -            bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> -                                      ret * qemu_real_host_page_size);
> +            bs->bl.max_iov = ret;
>           }
>       }
>   
> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> index 98c30c5d5c..82e1e2ee79 100644
> --- a/hw/scsi/scsi-generic.c
> +++ b/hw/scsi/scsi-generic.c
> @@ -179,10 +179,12 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
>           (r->req.cmd.buf[1] & 0x01)) {
>           page = r->req.cmd.buf[2];
>           if (page == 0xb0) {
> -            uint32_t max_transfer =
> -                blk_get_max_transfer(s->conf.blk) / s->blocksize;
> +            uint32_t max_transfer = blk_get_max_transfer(s->conf.blk);
> +            uint32_t max_iov = blk_get_max_iov(s->conf.blk);
>   
>               assert(max_transfer);
> +            max_transfer = MIN_NON_ZERO(max_transfer, max_iov * qemu_real_host_page_size)
> +                / s->blocksize;

Now that I ran checkpatch for patch 3, I saw that it complains about 
this line being longer than 80 characters. I think it could be split so 
it doesn’t exceed that limit. It looks a bit like you intentionally 
exceeded the warning limit, but didn’t exceed the error limit (of 90). 
Is that so?

Max



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/7] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
  2021-06-03 13:37 ` [PATCH v3 4/7] file-posix: try BLKSECTGET on block devices too, do not round to power of 2 Paolo Bonzini
@ 2021-06-15 16:32   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2021-06-15 16:32 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On 03.06.21 15:37, Paolo Bonzini wrote:
> bs->sg is only true for character devices, but block devices can also
> be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
> returns bytes in an int for /dev/sgN devices, and sectors in a short
> for block devices, so account for that in the code.
>
> The maximum transfer also need not be a power of 2 (for example I have
> seen disks with 1280 KiB maximum transfer) so there's no need to pass
> the result through pow2floor.
>
> This patch reintroduces the logic that was removed in commit 867eccfed8
> ("file-posix: Use max transfer length/segment count only for SCSI
> passthrough", 2019-07-12).  The removal was motivated by the performance
> regression when using a block device's max_transfer with kernel I/O;
> the new, separate max_hw_transfer limit avoids reintroducing the same
> regression.

(And the fact that the result of hdev_get_max_segments() is no longer 
used to cap max_transfer, but is just stored in max_iov instead.)

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/file-posix.c | 40 ++++++++++++++++++++++------------------
>   1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index f55f92d0f5..1439293f63 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1149,22 +1149,27 @@ static void raw_reopen_abort(BDRVReopenState *state)
>       s->reopen_state = NULL;
>   }
>   
> -static int sg_get_max_transfer_length(int fd)
> +static int hdev_get_max_hw_transfer(int fd, struct stat *st)
>   {
>   #ifdef BLKSECTGET
> -    int max_bytes = 0;
> -
> -    if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> -        return max_bytes;
> +    if (S_ISBLK(st->st_mode)) {
> +        unsigned short max_sectors = 0;
> +        if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
> +            return max_sectors * 512;
> +        }
>       } else {
> -        return -errno;
> +        int max_bytes = 0;
> +        if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> +            return max_bytes;
> +        }
>       }
> +    return -errno;
>   #else
>       return -ENOSYS;
>   #endif
>   }
>   
> -static int sg_get_max_segments(int fd)
> +static int hdev_get_max_segments(int fd, struct stat *st)

OK, now I see why in patch 1 you treated `st` as a pointer. Still, needs 
to be `st.` in patch 1, and changed to `st->` in this patch only.

>   {
>   #ifdef CONFIG_LINUX
>       char buf[32];
> @@ -1173,12 +1178,6 @@ static int sg_get_max_segments(int fd)
>       int ret;
>       int sysfd = -1;
>       long max_segments;
> -    struct stat st;
> -
> -    if (fstat(fd, &st)) {
> -        ret = -errno;
> -        goto out;
> -    }
>   
>       if (S_ISCHR(st->st_mode)) {
>           if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> @@ -1192,7 +1191,7 @@ static int sg_get_max_segments(int fd)
>       }
>   
>       sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -                                major(st.st_rdev), minor(st.st_rdev));
> +                                major(st->st_rdev), minor(st->st_rdev));
>       sysfd = open(sysfspath, O_RDONLY);
>       if (sysfd == -1) {
>           ret = -errno;
> @@ -1229,15 +1228,20 @@ out:
>   static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>   {
>       BDRVRawState *s = bs->opaque;
> +    struct stat st;
> +
> +    if (fstat(s->fd, &st)) {
> +        return;
> +    }

First, I think if we ignore an error, there should be a comment 
explaining why that’s OK.

It is OK here because inquiring transfer limits is best-effort.

However, the alignment probes below the following block are completely 
independent of `st`. Therefore, I don’t think we should skip them if 
`fstat()` failed here; only the `if (bs->sg || ...)` block should be 
skipped.

Max

>   
> -    if (bs->sg) {
> -        int ret = sg_get_max_transfer_length(s->fd);
> +    if (bs->sg || S_ISBLK(st.st_mode)) {
> +        int ret = hdev_get_max_hw_transfer(s->fd, &st);
>   
>           if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> -            bs->bl.max_hw_transfer = pow2floor(ret);
> +            bs->bl.max_hw_transfer = ret;
>           }
>   
> -        ret = sg_get_max_segments(s->fd);
> +        ret = hdev_get_max_segments(s->fd, &st);
>           if (ret > 0) {
>               bs->bl.max_iov = ret;
>           }



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 5/7] block: feature detection for host block support
  2021-06-03 13:37 ` [PATCH v3 5/7] block: feature detection for host block support Paolo Bonzini
@ 2021-06-15 16:40   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2021-06-15 16:40 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, Joelle van Dyne, qemu-block

On 03.06.21 15:37, Paolo Bonzini wrote:
> From: Joelle van Dyne <j@getutm.app>
>
> On Darwin (iOS), there are no system level APIs for directly accessing
> host block devices. We detect this at configure time.
>
> Signed-off-by: Joelle van Dyne <j@getutm.app>
> Message-Id: <20210315180341.31638-2-j@getutm.app>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/file-posix.c   | 33 ++++++++++++++++++++++-----------
>   meson.build          |  6 +++++-
>   qapi/block-core.json | 10 +++++++---
>   3 files changed, 34 insertions(+), 15 deletions(-)

[...]

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 2ea294129e..2dd41be156 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -897,7 +897,8 @@
>     'discriminator': 'driver',
>     'data': {
>         'file': 'BlockStatsSpecificFile',
> -      'host_device': 'BlockStatsSpecificFile',
> +      'host_device': { 'type': 'BlockStatsSpecificFile',
> +                       'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
>         'nvme': 'BlockStatsSpecificNvme' } }
>   
>   ##
> @@ -2814,7 +2815,9 @@
>   { 'enum': 'BlockdevDriver',
>     'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
>               'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
> -            'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
> +            'gluster', 'host_cdrom',
> +            {'name': 'host_device', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },

Shouldn’t this be done for host_cdrom, too? (and below)

Apart from that, looks good to me.

> +            'http', 'https', 'iscsi',
>               'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
>               'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>               { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
> @@ -3996,7 +3999,8 @@
>         'ftps':       'BlockdevOptionsCurlFtps',
>         'gluster':    'BlockdevOptionsGluster',
>         'host_cdrom': 'BlockdevOptionsFile',
> -      'host_device':'BlockdevOptionsFile',
> +      'host_device': { 'type': 'BlockdevOptionsFile',
> +                       'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
>         'http':       'BlockdevOptionsCurlHttp',
>         'https':      'BlockdevOptionsCurlHttps',
>         'iscsi':      'BlockdevOptionsIscsi',



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 6/7] block: check for sys/disk.h
  2021-06-03 13:37 ` [PATCH v3 6/7] block: check for sys/disk.h Paolo Bonzini
@ 2021-06-15 16:42   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2021-06-15 16:42 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, Peter Maydell, Philippe Mathieu-Daudé,
	Joelle van Dyne, qemu-block

On 03.06.21 15:37, Paolo Bonzini wrote:
> From: Joelle van Dyne <j@getutm.app>
>
> Some BSD platforms do not have this header.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Signed-off-by: Joelle van Dyne <j@getutm.app>
> Message-Id: <20210315180341.31638-3-j@getutm.app>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block.c     | 2 +-
>   meson.build | 1 +
>   2 files changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] block: detect DKIOCGETBLOCKCOUNT/SIZE before use
  2021-06-03 13:37 ` [PATCH v3 7/7] block: detect DKIOCGETBLOCKCOUNT/SIZE before use Paolo Bonzini
@ 2021-06-15 16:50   ` Max Reitz
  2021-06-15 16:57     ` Daniel P. Berrangé
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2021-06-15 16:50 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, Peter Maydell, qemu-block, Joelle van Dyne, Warner Losh

On 03.06.21 15:37, Paolo Bonzini wrote:
> From: Joelle van Dyne <j@getutm.app>
>
> iOS hosts do not have these defined so we fallback to the
> default behaviour.
>
> Co-authored-by: Warner Losh <imp@bsdimp.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Joelle van Dyne <j@getutm.app>
> Message-Id: <20210315180341.31638-4-j@getutm.app>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   block/file-posix.c | 21 +++++++++------------
>   1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 5821e1afed..4e2f7cf508 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2322,8 +2322,11 @@ static int64_t raw_getlength(BlockDriverState *bs)
>   again:
>   #endif
>       if (!fstat(fd, &sb) && (S_IFCHR & sb.st_mode)) {
> +        size = 0;
>   #ifdef DIOCGMEDIASIZE
> -        if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size))
> +        if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size)) {

Pre-existing, but I feel compelled to express my unease about this cast.

> +            size = 0;
> +        }
>   #elif defined(DIOCGPART)
>           {
>                   struct partinfo pi;
> @@ -2332,9 +2335,7 @@ again:
>                   else
>                           size = 0;
>           }
> -        if (size == 0)
> -#endif
> -#if defined(__APPLE__) && defined(__MACH__)
> +#elif defined(DKIOCGETBLOCKCOUNT) && defined(DKIOCGETBLOCKSIZE)

As far a I can tell, before this patch, if the DIOCGMEDIASIZE ioctl 
failed, we fell back to this DKIOCGETBLOCKCOUNT/SIZE block (if compiled 
in). Now this is an #elif and so will not be used if DIOCGMEDIASIZE was 
defined. Is that intentional?

This may be fine, and apart from that, this patch looks good to me, but 
this change in behavior wasn’t mentioned in the commit message, hence me 
asking.

>           {
>               uint64_t sectors = 0;
>               uint32_t sector_size = 0;
> @@ -2342,19 +2343,15 @@ again:
>               if (ioctl(fd, DKIOCGETBLOCKCOUNT, &sectors) == 0
>                  && ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) == 0) {
>                   size = sectors * sector_size;
> -            } else {
> -                size = lseek(fd, 0LL, SEEK_END);
> -                if (size < 0) {
> -                    return -errno;
> -                }
>               }
>           }
> -#else
> -        size = lseek(fd, 0LL, SEEK_END);
> +#endif
> +        if (size == 0) {
> +            size = lseek(fd, 0LL, SEEK_END);
> +        }
>           if (size < 0) {
>               return -errno;
>           }
> -#endif
>   #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>           switch(s->type) {
>           case FTYPE_CD:



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] block: detect DKIOCGETBLOCKCOUNT/SIZE before use
  2021-06-15 16:50   ` Max Reitz
@ 2021-06-15 16:57     ` Daniel P. Berrangé
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel P. Berrangé @ 2021-06-15 16:57 UTC (permalink / raw)
  To: Max Reitz
  Cc: kwolf, Peter Maydell, qemu-block, qemu-devel, Joelle van Dyne,
	Paolo Bonzini, Warner Losh

On Tue, Jun 15, 2021 at 06:50:57PM +0200, Max Reitz wrote:
> On 03.06.21 15:37, Paolo Bonzini wrote:
> > From: Joelle van Dyne <j@getutm.app>
> > 
> > iOS hosts do not have these defined so we fallback to the
> > default behaviour.
> > 
> > Co-authored-by: Warner Losh <imp@bsdimp.com>
> > Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Joelle van Dyne <j@getutm.app>
> > Message-Id: <20210315180341.31638-4-j@getutm.app>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   block/file-posix.c | 21 +++++++++------------
> >   1 file changed, 9 insertions(+), 12 deletions(-)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 5821e1afed..4e2f7cf508 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2322,8 +2322,11 @@ static int64_t raw_getlength(BlockDriverState *bs)
> >   again:
> >   #endif
> >       if (!fstat(fd, &sb) && (S_IFCHR & sb.st_mode)) {
> > +        size = 0;
> >   #ifdef DIOCGMEDIASIZE
> > -        if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size))
> > +        if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size)) {
> 
> Pre-existing, but I feel compelled to express my unease about this cast.

We set -D_FILE_OFFSET_BITS=64, so IIUC,  off_t should be 64-bits
on both 32 and 64 bit build hosts. IIUC, it is defined to be a
signed integer.  So while off_t may not have the same typedef
as int64_t, it should be the same size and signedness. I expect
we have other code with this same assumption about off-t/int64_t
interchangeability.

We could assert sizeof(int64_t) == sizeof(off_t) in a header
somewhere if we want to be super paranoid.

> 
> > +            size = 0;
> > +        }
> >   #elif defined(DIOCGPART)
> >           {
> >                   struct partinfo pi;
> > @@ -2332,9 +2335,7 @@ again:
> >                   else
> >                           size = 0;
> >           }
> > -        if (size == 0)
> > -#endif
> > -#if defined(__APPLE__) && defined(__MACH__)
> > +#elif defined(DKIOCGETBLOCKCOUNT) && defined(DKIOCGETBLOCKSIZE)
> 
> As far a I can tell, before this patch, if the DIOCGMEDIASIZE ioctl failed,
> we fell back to this DKIOCGETBLOCKCOUNT/SIZE block (if compiled in). Now
> this is an #elif and so will not be used if DIOCGMEDIASIZE was defined. Is
> that intentional?
> 
> This may be fine, and apart from that, this patch looks good to me, but this
> change in behavior wasn’t mentioned in the commit message, hence me asking.
> 
> >           {
> >               uint64_t sectors = 0;
> >               uint32_t sector_size = 0;
> > @@ -2342,19 +2343,15 @@ again:
> >               if (ioctl(fd, DKIOCGETBLOCKCOUNT, &sectors) == 0
> >                  && ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) == 0) {
> >                   size = sectors * sector_size;
> > -            } else {
> > -                size = lseek(fd, 0LL, SEEK_END);
> > -                if (size < 0) {
> > -                    return -errno;
> > -                }
> >               }
> >           }
> > -#else
> > -        size = lseek(fd, 0LL, SEEK_END);
> > +#endif
> > +        if (size == 0) {
> > +            size = lseek(fd, 0LL, SEEK_END);
> > +        }
> >           if (size < 0) {
> >               return -errno;
> >           }
> > -#endif
> >   #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
> >           switch(s->type) {
> >           case FTYPE_CD:
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits
  2021-06-15 16:20   ` Max Reitz
@ 2021-06-15 17:41     ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-15 17:41 UTC (permalink / raw)
  To: Max Reitz; +Cc: Wolf, Kevin, qemu-devel, open list:Block layer core

[-- Attachment #1: Type: text/plain, Size: 2715 bytes --]

Yes, I try to avoid the 90 character limit (though sometimes that's hard
too, hello QOM) but I don't think much of the lower one.

Paolo

Il mar 15 giu 2021, 18:20 Max Reitz <mreitz@redhat.com> ha scritto:

> On 03.06.21 15:37, Paolo Bonzini wrote:
> > I/O to a disk via read/write is not limited by the number of segments
> allowed
> > by the host adapter; the kernel can split requests if needed, and the
> limit
> > imposed by the host adapter can be very low (256k or so) to avoid that
> SG_IO
> > returns EINVAL if memory is heavily fragmented.
> >
> > Since this value is only interesting for SG_IO-based I/O, do not include
> > it in the max_transfer and only take it into account when patching the
> > block limits VPD page in the scsi-generic device.
> >
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   block/file-posix.c     | 3 +--
> >   hw/scsi/scsi-generic.c | 6 ++++--
> >   2 files changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 58db526cc2..e3241a0dd3 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1239,8 +1239,7 @@ static void raw_refresh_limits(BlockDriverState
> *bs, Error **errp)
> >
> >           ret = sg_get_max_segments(s->fd);
> >           if (ret > 0) {
> > -            bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> > -                                      ret * qemu_real_host_page_size);
> > +            bs->bl.max_iov = ret;
> >           }
> >       }
> >
> > diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> > index 98c30c5d5c..82e1e2ee79 100644
> > --- a/hw/scsi/scsi-generic.c
> > +++ b/hw/scsi/scsi-generic.c
> > @@ -179,10 +179,12 @@ static void
> scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
> >           (r->req.cmd.buf[1] & 0x01)) {
> >           page = r->req.cmd.buf[2];
> >           if (page == 0xb0) {
> > -            uint32_t max_transfer =
> > -                blk_get_max_transfer(s->conf.blk) / s->blocksize;
> > +            uint32_t max_transfer = blk_get_max_transfer(s->conf.blk);
> > +            uint32_t max_iov = blk_get_max_iov(s->conf.blk);
> >
> >               assert(max_transfer);
> > +            max_transfer = MIN_NON_ZERO(max_transfer, max_iov *
> qemu_real_host_page_size)
> > +                / s->blocksize;
>
> Now that I ran checkpatch for patch 3, I saw that it complains about
> this line being longer than 80 characters. I think it could be split so
> it doesn’t exceed that limit. It looks a bit like you intentionally
> exceeded the warning limit, but didn’t exceed the error limit (of 90).
> Is that so?
>
> Max
>
>

[-- Attachment #2: Type: text/html, Size: 3515 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits
  2021-06-15 16:18   ` Max Reitz
@ 2021-06-16 13:18     ` Paolo Bonzini
  2021-06-16 13:46       ` Max Reitz
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-16 13:18 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

On 15/06/21 18:18, Max Reitz wrote:
>>   }
>> +/* Returns the maximum hardware transfer length, in bytes; guaranteed 
>> nonzero */
>> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
>> +{
>> +    BlockDriverState *bs = blk_bs(blk);
>> +    uint64_t max = INT_MAX;
>> +
>> +    if (bs) {
>> +        max = MIN_NON_ZERO(bs->bl.max_hw_transfer, bs->bl.max_transfer);
>> +    }
>> +    return max;
> 
> Both `max_hw_transfer` and `max_transfer` can be 0, so this could return 
> 0, contrary to what the comment above promises.
> 
> Should `max` be initialized to 0 with a `MIN_NON_ZERO(max, INT_MAX)` 
> here (like `blk_get_max_transfer()` does it)?

Yes, something to that effect.

> (As for the rest, I think aligning to the request alignment makes sense, 
> but then again we don’t do that for max_transfer either, so... this at 
> least wouldn’t be a new bug.

Ok, will do.  I will also add a new patch to align max_transfer to the 
request alignment.

> Regarding the comment, checkpatch complains about it, so it should be 
> fixed so that /* is on its own line.

That makes it different from every other comment in block_int.h though. 
  Is it okay to fix all of them in a follow-up?

Paolo

> Speaking of checkpatch, now that I ran it, it also complains about the 
> new line in bdrv_merge_limits() exceeding 80 characters, so that should 
> be fixed, too.)



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits
  2021-06-16 13:18     ` Paolo Bonzini
@ 2021-06-16 13:46       ` Max Reitz
  2021-06-17  9:59         ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2021-06-16 13:46 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: kwolf, qemu-block

On 16.06.21 15:18, Paolo Bonzini wrote:
> On 15/06/21 18:18, Max Reitz wrote:
>>>   }
>>> +/* Returns the maximum hardware transfer length, in bytes; 
>>> guaranteed nonzero */
>>> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
>>> +{
>>> +    BlockDriverState *bs = blk_bs(blk);
>>> +    uint64_t max = INT_MAX;
>>> +
>>> +    if (bs) {
>>> +        max = MIN_NON_ZERO(bs->bl.max_hw_transfer, 
>>> bs->bl.max_transfer);
>>> +    }
>>> +    return max;
>>
>> Both `max_hw_transfer` and `max_transfer` can be 0, so this could 
>> return 0, contrary to what the comment above promises.
>>
>> Should `max` be initialized to 0 with a `MIN_NON_ZERO(max, INT_MAX)` 
>> here (like `blk_get_max_transfer()` does it)?
>
> Yes, something to that effect.
>
>> (As for the rest, I think aligning to the request alignment makes 
>> sense, but then again we don’t do that for max_transfer either, so... 
>> this at least wouldn’t be a new bug.
>
> Ok, will do.  I will also add a new patch to align max_transfer to the 
> request alignment.
>
>> Regarding the comment, checkpatch complains about it, so it should be 
>> fixed so that /* is on its own line.
>
> That makes it different from every other comment in block_int.h 
> though.  Is it okay to fix all of them in a follow-up?

The reason it’s different is that the comment style in question was 
added to checkpatch only relatively recently. I can’t speak for others, 
but I’m a simple person. I just do what makes checkpatch happy. :)

Given that the checkpatch complaint is only a warning, I think it’s OK 
to keep the comment as it is here, and perhaps optionally fix all 
comments in block_int.h in a follow-up. I don’t think we need to fix 
existing comments, but, well, it wouldn’t be wrong.

Max



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits
  2021-06-16 13:46       ` Max Reitz
@ 2021-06-17  9:59         ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-06-17  9:59 UTC (permalink / raw)
  To: Max Reitz, qemu-devel

On 16/06/21 15:46, Max Reitz wrote:
> Given that the checkpatch complaint is only a warning, I think it’s OK 
> to keep the comment as it is here, and perhaps optionally fix all 
> comments in block_int.h in a follow-up. I don’t think we need to fix 
> existing comments, but, well, it wouldn’t be wrong.

We don't need to, but sometimes the benefit (of not discussing 
checkpatch over and over) does seem to outweigh the churn.  Thanks for 
the review!

Paolo



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-06-17 10:12 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-03 13:37 [PATCH v3 0/7] block: file-posix queue Paolo Bonzini
2021-06-03 13:37 ` [PATCH v3 1/7] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
2021-06-07  5:39   ` Vladimir Sementsov-Ogievskiy
2021-06-07  6:16     ` Vladimir Sementsov-Ogievskiy
2021-06-07  7:23   ` Vladimir Sementsov-Ogievskiy
2021-06-15 15:58   ` Max Reitz
2021-06-03 13:37 ` [PATCH v3 2/7] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
2021-06-15 16:06   ` Max Reitz
2021-06-15 16:20   ` Max Reitz
2021-06-15 17:41     ` Paolo Bonzini
2021-06-03 13:37 ` [PATCH v3 3/7] block: add max_hw_transfer to BlockLimits Paolo Bonzini
2021-06-03 17:33   ` Eric Blake
2021-06-04  7:21     ` Paolo Bonzini
2021-06-15 16:18   ` Max Reitz
2021-06-16 13:18     ` Paolo Bonzini
2021-06-16 13:46       ` Max Reitz
2021-06-17  9:59         ` Paolo Bonzini
2021-06-03 13:37 ` [PATCH v3 4/7] file-posix: try BLKSECTGET on block devices too, do not round to power of 2 Paolo Bonzini
2021-06-15 16:32   ` Max Reitz
2021-06-03 13:37 ` [PATCH v3 5/7] block: feature detection for host block support Paolo Bonzini
2021-06-15 16:40   ` Max Reitz
2021-06-03 13:37 ` [PATCH v3 6/7] block: check for sys/disk.h Paolo Bonzini
2021-06-15 16:42   ` Max Reitz
2021-06-03 13:37 ` [PATCH v3 7/7] block: detect DKIOCGETBLOCKCOUNT/SIZE before use Paolo Bonzini
2021-06-15 16:50   ` Max Reitz
2021-06-15 16:57     ` Daniel P. Berrangé
2021-06-07 13:52 ` [PATCH v3 0/7] block: file-posix queue Maxim Levitsky
2021-06-10 13:40   ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.