All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] block: file-posix queue
@ 2021-05-24 16:36 Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 1/6] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

Hi Max/Kevin,

this is a combination of two series that both affect host block device
support in block/file-posix.c.  I suspect both of them evaded your radar,
so I'm grouping them together and sending them out again.

v1->v2: add missing patch

Joelle van Dyne (3):
  block: feature detection for host block support
  block: check for sys/disk.h
  block: detect DKIOCGETBLOCKCOUNT/SIZE before use

Paolo Bonzini (3):
  scsi-generic: pass max_segments via max_iov field in BlockLimits
  file-posix: try BLKSECTGET on block devices too, do not round to power
    of 2
  file-posix: fix max_iov for /dev/sg devices

 block.c                |   2 +-
 block/file-posix.c     | 104 +++++++++++++++++++++++++----------------
 hw/scsi/scsi-generic.c |   6 ++-
 meson.build            |   7 ++-
 qapi/block-core.json   |  10 ++--
 5 files changed, 81 insertions(+), 48 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/6] scsi-generic: pass max_segments via max_iov field in BlockLimits
  2021-05-24 16:36 [PATCH v2 0/6] block: file-posix queue Paolo Bonzini
@ 2021-05-24 16:36 ` Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2 Paolo Bonzini
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

I/O to a disk via read/write is not limited by the number of segments allowed
by the host adapter; the kernel can split requests if needed, and the limit
imposed by the host adapter can be very low (256k or so) to avoid that SG_IO
returns EINVAL if memory is heavily fragmented.

Since this value is only interesting for SG_IO-based I/O, do not include
it in the max_transfer and only take it into account when patching the
block limits VPD page in the scsi-generic device.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c     | 3 +--
 hw/scsi/scsi-generic.c | 6 ++++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 10b71d9a13..59c889d5a7 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1228,8 +1228,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
         ret = sg_get_max_segments(s->fd);
         if (ret > 0) {
-            bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-                                      ret * qemu_real_host_page_size);
+            bs->bl.max_iov = ret;
         }
     }
 
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 98c30c5d5c..82e1e2ee79 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -179,10 +179,12 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
         (r->req.cmd.buf[1] & 0x01)) {
         page = r->req.cmd.buf[2];
         if (page == 0xb0) {
-            uint32_t max_transfer =
-                blk_get_max_transfer(s->conf.blk) / s->blocksize;
+            uint32_t max_transfer = blk_get_max_transfer(s->conf.blk);
+            uint32_t max_iov = blk_get_max_iov(s->conf.blk);
 
             assert(max_transfer);
+            max_transfer = MIN_NON_ZERO(max_transfer, max_iov * qemu_real_host_page_size)
+                / s->blocksize;
             stl_be_p(&r->buf[8], max_transfer);
             /* Also take care of the opt xfer len. */
             stl_be_p(&r->buf[12],
-- 
2.31.1




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
  2021-05-24 16:36 [PATCH v2 0/6] block: file-posix queue Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 1/6] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
@ 2021-05-24 16:36 ` Paolo Bonzini
  2021-05-27 15:51   ` Kevin Wolf
  2021-05-24 16:36 ` [PATCH v2 3/6] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

bs->sg is only true for character devices, but block devices can also
be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
returns bytes in an int for /dev/sgN devices, and sectors in a short
for block devices, so account for that in the code.

The maximum transfer also need not be a power of 2 (for example I have
seen disks with 1280 KiB maximum transfer) so there's no need to pass
the result through pow2floor.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 59c889d5a7..e5ef006aee 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1149,22 +1149,27 @@ static void raw_reopen_abort(BDRVReopenState *state)
     s->reopen_state = NULL;
 }
 
-static int sg_get_max_transfer_length(int fd)
+static int sg_get_max_transfer_length(int fd, struct stat *st)
 {
 #ifdef BLKSECTGET
-    int max_bytes = 0;
-
-    if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
-        return max_bytes;
+    if (S_ISBLK(st->st_mode)) {
+        unsigned short max_sectors = 0;
+        if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
+            return max_sectors * 512;
+        }
     } else {
-        return -errno;
+        int max_bytes = 0;
+        if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
+            return max_bytes;
+        }
     }
+    return -errno;
 #else
     return -ENOSYS;
 #endif
 }
 
-static int sg_get_max_segments(int fd)
+static int sg_get_max_segments(int fd, struct stat *st)
 {
 #ifdef CONFIG_LINUX
     char buf[32];
@@ -1173,15 +1178,9 @@ static int sg_get_max_segments(int fd)
     int ret;
     int sysfd = -1;
     long max_segments;
-    struct stat st;
-
-    if (fstat(fd, &st)) {
-        ret = -errno;
-        goto out;
-    }
 
     sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-                                major(st.st_rdev), minor(st.st_rdev));
+                                major(st->st_rdev), minor(st->st_rdev));
     sysfd = open(sysfspath, O_RDONLY);
     if (sysfd == -1) {
         ret = -errno;
@@ -1218,15 +1217,20 @@ out:
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
+    struct stat st;
+
+    if (fstat(s->fd, &st)) {
+        return;
+    }
 
-    if (bs->sg) {
-        int ret = sg_get_max_transfer_length(s->fd);
+    if (bs->sg || S_ISBLK(st.st_mode)) {
+        int ret = sg_get_max_transfer_length(s->fd, &st);
 
         if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-            bs->bl.max_transfer = pow2floor(ret);
+            bs->bl.max_transfer = ret;
         }
 
-        ret = sg_get_max_segments(s->fd);
+        ret = sg_get_max_segments(s->fd, &st);
         if (ret > 0) {
             bs->bl.max_iov = ret;
         }
-- 
2.31.1




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/6] file-posix: fix max_iov for /dev/sg devices
  2021-05-24 16:36 [PATCH v2 0/6] block: file-posix queue Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 1/6] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2 Paolo Bonzini
@ 2021-05-24 16:36 ` Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 4/6] block: feature detection for host block support Paolo Bonzini
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block

Even though it was only called for devices that have bs->sg set (which
must be character devices),
sg_get_max_segments looked at /sys/dev/block which only works for
block devices.

On Linux the sg driver has its own way to provide the maximum number of
iovecs in a scatter/gather list.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index e5ef006aee..77a45083a6 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1179,6 +1179,17 @@ static int sg_get_max_segments(int fd, struct stat *st)
     int sysfd = -1;
     long max_segments;
 
+    if (S_ISCHR(st->st_mode)) {
+        if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
+            return ret;
+        }
+        return -EIO;
+    }
+
+    if (!S_ISBLK(st->st_mode)) {
+        return -ENOTSUP;
+    }
+
     sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
                                 major(st->st_rdev), minor(st->st_rdev));
     sysfd = open(sysfspath, O_RDONLY);
-- 
2.31.1




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 4/6] block: feature detection for host block support
  2021-05-24 16:36 [PATCH v2 0/6] block: file-posix queue Paolo Bonzini
                   ` (2 preceding siblings ...)
  2021-05-24 16:36 ` [PATCH v2 3/6] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
@ 2021-05-24 16:36 ` Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 5/6] block: check for sys/disk.h Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 6/6] block: detect DKIOCGETBLOCKCOUNT/SIZE before use Paolo Bonzini
  5 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Joelle van Dyne, qemu-block

From: Joelle van Dyne <j@getutm.app>

On Darwin (iOS), there are no system level APIs for directly accessing
host block devices. We detect this at configure time.

Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20210315180341.31638-2-j@getutm.app>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c   | 33 ++++++++++++++++++++++-----------
 meson.build          |  6 +++++-
 qapi/block-core.json | 10 +++++++---
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 77a45083a6..e83f34a960 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -42,6 +42,8 @@
 #include "scsi/constants.h"
 
 #if defined(__APPLE__) && (__MACH__)
+#include <sys/ioctl.h>
+#if defined(HAVE_HOST_BLOCK_DEVICE)
 #include <paths.h>
 #include <sys/param.h>
 #include <IOKit/IOKitLib.h>
@@ -52,6 +54,7 @@
 //#include <IOKit/storage/IOCDTypes.h>
 #include <IOKit/storage/IODVDMedia.h>
 #include <CoreFoundation/CoreFoundation.h>
+#endif /* defined(HAVE_HOST_BLOCK_DEVICE) */
 #endif
 
 #ifdef __sun__
@@ -180,7 +183,17 @@ typedef struct BDRVRawReopenState {
     bool check_cache_dropped;
 } BDRVRawReopenState;
 
-static int fd_open(BlockDriverState *bs);
+static int fd_open(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+
+    /* this is just to ensure s->fd is sane (its called by io ops) */
+    if (s->fd >= 0) {
+        return 0;
+    }
+    return -EIO;
+}
+
 static int64_t raw_getlength(BlockDriverState *bs);
 
 typedef struct RawPosixAIOData {
@@ -3017,6 +3030,7 @@ static BlockStatsSpecific *raw_get_specific_stats(BlockDriverState *bs)
     return stats;
 }
 
+#if defined(HAVE_HOST_BLOCK_DEVICE)
 static BlockStatsSpecific *hdev_get_specific_stats(BlockDriverState *bs)
 {
     BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
@@ -3026,6 +3040,7 @@ static BlockStatsSpecific *hdev_get_specific_stats(BlockDriverState *bs)
 
     return stats;
 }
+#endif /* HAVE_HOST_BLOCK_DEVICE */
 
 static QemuOptsList raw_create_opts = {
     .name = "raw-create-opts",
@@ -3241,6 +3256,8 @@ BlockDriver bdrv_file = {
 /***********************************************/
 /* host device */
 
+#if defined(HAVE_HOST_BLOCK_DEVICE)
+
 #if defined(__APPLE__) && defined(__MACH__)
 static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
                                 CFIndex maxPathSize, int flags);
@@ -3533,16 +3550,6 @@ hdev_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
 }
 #endif /* linux */
 
-static int fd_open(BlockDriverState *bs)
-{
-    BDRVRawState *s = bs->opaque;
-
-    /* this is just to ensure s->fd is sane (its called by io ops) */
-    if (s->fd >= 0)
-        return 0;
-    return -EIO;
-}
-
 static coroutine_fn int
 hdev_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
 {
@@ -3866,6 +3873,8 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#endif /* HAVE_HOST_BLOCK_DEVICE */
+
 static void bdrv_file_init(void)
 {
     /*
@@ -3873,6 +3882,7 @@ static void bdrv_file_init(void)
      * registered last will get probed first.
      */
     bdrv_register(&bdrv_file);
+#if defined(HAVE_HOST_BLOCK_DEVICE)
     bdrv_register(&bdrv_host_device);
 #ifdef __linux__
     bdrv_register(&bdrv_host_cdrom);
@@ -3880,6 +3890,7 @@ static void bdrv_file_init(void)
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
     bdrv_register(&bdrv_host_cdrom);
 #endif
+#endif /* HAVE_HOST_BLOCK_DEVICE */
 }
 
 block_init(bdrv_file_init);
diff --git a/meson.build b/meson.build
index 890d7cf6b3..338ea279ee 100644
--- a/meson.build
+++ b/meson.build
@@ -182,7 +182,7 @@ if targetos == 'windows'
                                       include_directories: include_directories('.'))
 elif targetos == 'darwin'
   coref = dependency('appleframeworks', modules: 'CoreFoundation')
-  iokit = dependency('appleframeworks', modules: 'IOKit')
+  iokit = dependency('appleframeworks', modules: 'IOKit', required: false)
 elif targetos == 'sunos'
   socket = [cc.find_library('socket'),
             cc.find_library('nsl'),
@@ -1071,6 +1071,9 @@ if get_option('cfi')
   add_global_link_arguments(cfi_flags, native: false, language: ['c', 'cpp', 'objc'])
 endif
 
+have_host_block_device = (targetos != 'darwin' or
+    cc.has_header('IOKit/storage/IOMedia.h'))
+
 #################
 # config-host.h #
 #################
@@ -1164,6 +1167,7 @@ config_host_data.set('HAVE_PTY_H', cc.has_header('pty.h'))
 config_host_data.set('HAVE_SYS_IOCCOM_H', cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
 config_host_data.set('HAVE_SYSTEM_FUNCTION', cc.has_function('system', prefix: '#include <stdlib.h>'))
+config_host_data.set('HAVE_HOST_BLOCK_DEVICE', have_host_block_device)
 
 config_host_data.set('CONFIG_PREADV', cc.has_function('preadv', prefix: '#include <sys/uio.h>'))
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2ea294129e..2dd41be156 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -897,7 +897,8 @@
   'discriminator': 'driver',
   'data': {
       'file': 'BlockStatsSpecificFile',
-      'host_device': 'BlockStatsSpecificFile',
+      'host_device': { 'type': 'BlockStatsSpecificFile',
+                       'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
       'nvme': 'BlockStatsSpecificNvme' } }
 
 ##
@@ -2814,7 +2815,9 @@
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
             'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
-            'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
+            'gluster', 'host_cdrom',
+            {'name': 'host_device', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
+            'http', 'https', 'iscsi',
             'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
             'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
             { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
@@ -3996,7 +3999,8 @@
       'ftps':       'BlockdevOptionsCurlFtps',
       'gluster':    'BlockdevOptionsGluster',
       'host_cdrom': 'BlockdevOptionsFile',
-      'host_device':'BlockdevOptionsFile',
+      'host_device': { 'type': 'BlockdevOptionsFile',
+                       'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
       'http':       'BlockdevOptionsCurlHttp',
       'https':      'BlockdevOptionsCurlHttps',
       'iscsi':      'BlockdevOptionsIscsi',
-- 
2.31.1




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 5/6] block: check for sys/disk.h
  2021-05-24 16:36 [PATCH v2 0/6] block: file-posix queue Paolo Bonzini
                   ` (3 preceding siblings ...)
  2021-05-24 16:36 ` [PATCH v2 4/6] block: feature detection for host block support Paolo Bonzini
@ 2021-05-24 16:36 ` Paolo Bonzini
  2021-05-24 16:36 ` [PATCH v2 6/6] block: detect DKIOCGETBLOCKCOUNT/SIZE before use Paolo Bonzini
  5 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Philippe Mathieu-Daudé, Joelle van Dyne, qemu-block

From: Joelle van Dyne <j@getutm.app>

Some BSD platforms do not have this header.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20210315180341.31638-3-j@getutm.app>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c     | 2 +-
 meson.build | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 0dc97281dc..a16a2840d3 100644
--- a/block.c
+++ b/block.c
@@ -54,7 +54,7 @@
 #ifdef CONFIG_BSD
 #include <sys/ioctl.h>
 #include <sys/queue.h>
-#ifndef __DragonFly__
+#if defined(HAVE_SYS_DISK_H)
 #include <sys/disk.h>
 #endif
 #endif
diff --git a/meson.build b/meson.build
index 338ea279ee..82d0e6a224 100644
--- a/meson.build
+++ b/meson.build
@@ -1168,6 +1168,7 @@ config_host_data.set('HAVE_SYS_IOCCOM_H', cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
 config_host_data.set('HAVE_SYSTEM_FUNCTION', cc.has_function('system', prefix: '#include <stdlib.h>'))
 config_host_data.set('HAVE_HOST_BLOCK_DEVICE', have_host_block_device)
+config_host_data.set('HAVE_SYS_DISK_H', cc.has_header('sys/disk.h'))
 
 config_host_data.set('CONFIG_PREADV', cc.has_function('preadv', prefix: '#include <sys/uio.h>'))
 
-- 
2.31.1




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 6/6] block: detect DKIOCGETBLOCKCOUNT/SIZE before use
  2021-05-24 16:36 [PATCH v2 0/6] block: file-posix queue Paolo Bonzini
                   ` (4 preceding siblings ...)
  2021-05-24 16:36 ` [PATCH v2 5/6] block: check for sys/disk.h Paolo Bonzini
@ 2021-05-24 16:36 ` Paolo Bonzini
  5 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 16:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Warner Losh, Joelle van Dyne, qemu-block, Peter Maydell

From: Joelle van Dyne <j@getutm.app>

iOS hosts do not have these defined so we fallback to the
default behaviour.

Co-authored-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20210315180341.31638-4-j@getutm.app>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/file-posix.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index e83f34a960..34bd413553 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2311,8 +2311,11 @@ static int64_t raw_getlength(BlockDriverState *bs)
 again:
 #endif
     if (!fstat(fd, &sb) && (S_IFCHR & sb.st_mode)) {
+        size = 0;
 #ifdef DIOCGMEDIASIZE
-        if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size))
+        if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size)) {
+            size = 0;
+        }
 #elif defined(DIOCGPART)
         {
                 struct partinfo pi;
@@ -2321,9 +2324,7 @@ again:
                 else
                         size = 0;
         }
-        if (size == 0)
-#endif
-#if defined(__APPLE__) && defined(__MACH__)
+#elif defined(DKIOCGETBLOCKCOUNT) && defined(DKIOCGETBLOCKSIZE)
         {
             uint64_t sectors = 0;
             uint32_t sector_size = 0;
@@ -2331,19 +2332,15 @@ again:
             if (ioctl(fd, DKIOCGETBLOCKCOUNT, &sectors) == 0
                && ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) == 0) {
                 size = sectors * sector_size;
-            } else {
-                size = lseek(fd, 0LL, SEEK_END);
-                if (size < 0) {
-                    return -errno;
-                }
             }
         }
-#else
-        size = lseek(fd, 0LL, SEEK_END);
+#endif
+        if (size == 0) {
+            size = lseek(fd, 0LL, SEEK_END);
+        }
         if (size < 0) {
             return -errno;
         }
-#endif
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
         switch(s->type) {
         case FTYPE_CD:
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
  2021-05-24 16:36 ` [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2 Paolo Bonzini
@ 2021-05-27 15:51   ` Kevin Wolf
  2021-05-27 20:14     ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2021-05-27 15:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: nsoffer, qemu-devel, qemu-block, mlevitsk

Am 24.05.2021 um 18:36 hat Paolo Bonzini geschrieben:
> bs->sg is only true for character devices, but block devices can also
> be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
> returns bytes in an int for /dev/sgN devices, and sectors in a short
> for block devices, so account for that in the code.
> 
> The maximum transfer also need not be a power of 2 (for example I have
> seen disks with 1280 KiB maximum transfer) so there's no need to pass
> the result through pow2floor.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Looks like this is more or less a revert of Maxim's commit 867eccfe. If
this is what we want, should this old commit be mentioned in one way or
another in the commit message?

Apparently the motivation for Maxim's patch was, if I'm reading the
description correctly, that it affected non-sg cases by imposing
unnecessary restrictions. I see that patch 1 changed the max_iov part so
that it won't affect non-sg cases any more, but max_transfer could still
be more restricted than necessary, no?

For convenience, the bug report fixed with that patch is here:
https://bugzilla.redhat.com/show_bug.cgi?id=1647104

Are we really trying to describe different things (limits for SG_IO and
for normal I/O) in one value with max_transfer, even though it could be
two different numbers for the same block device?

> diff --git a/block/file-posix.c b/block/file-posix.c
> index 59c889d5a7..e5ef006aee 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1149,22 +1149,27 @@ static void raw_reopen_abort(BDRVReopenState *state)
>      s->reopen_state = NULL;
>  }
>  
> -static int sg_get_max_transfer_length(int fd)
> +static int sg_get_max_transfer_length(int fd, struct stat *st)

This is now a misnomer. Should we revert to the pre-867eccfe name
hdev_get_max_transfer_length()?

>  {
>  #ifdef BLKSECTGET
> -    int max_bytes = 0;
> -
> -    if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> -        return max_bytes;
> +    if (S_ISBLK(st->st_mode)) {
> +        unsigned short max_sectors = 0;
> +        if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
> +            return max_sectors * 512;
> +        }
>      } else {
> -        return -errno;
> +        int max_bytes = 0;
> +        if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> +            return max_bytes;
> +        }
>      }
> +    return -errno;
>  #else
>      return -ENOSYS;
>  #endif
>  }
>  
> -static int sg_get_max_segments(int fd)
> +static int sg_get_max_segments(int fd, struct stat *st)

Same for this one.

>  {
>  #ifdef CONFIG_LINUX
>      char buf[32];
> @@ -1173,15 +1178,9 @@ static int sg_get_max_segments(int fd)
>      int ret;
>      int sysfd = -1;
>      long max_segments;
> -    struct stat st;
> -
> -    if (fstat(fd, &st)) {
> -        ret = -errno;
> -        goto out;
> -    }
>  
>      sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -                                major(st.st_rdev), minor(st.st_rdev));
> +                                major(st->st_rdev), minor(st->st_rdev));
>      sysfd = open(sysfspath, O_RDONLY);
>      if (sysfd == -1) {
>          ret = -errno;
> @@ -1218,15 +1217,20 @@ out:
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      BDRVRawState *s = bs->opaque;
> +    struct stat st;
> +
> +    if (fstat(s->fd, &st)) {
> +        return;

Don't we want to set errp? Or do you intentionally ignore the error?

> +    }
>  
> -    if (bs->sg) {
> -        int ret = sg_get_max_transfer_length(s->fd);
> +    if (bs->sg || S_ISBLK(st.st_mode)) {
> +        int ret = sg_get_max_transfer_length(s->fd, &st);
>  
>          if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> -            bs->bl.max_transfer = pow2floor(ret);
> +            bs->bl.max_transfer = ret;
>          }
>  
> -        ret = sg_get_max_segments(s->fd);
> +        ret = sg_get_max_segments(s->fd, &st);
>          if (ret > 0) {
>              bs->bl.max_iov = ret;
>          }

Kevin



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
  2021-05-27 15:51   ` Kevin Wolf
@ 2021-05-27 20:14     ` Paolo Bonzini
  2021-05-31 13:59       ` Kevin Wolf
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-27 20:14 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: nsoffer, qemu-devel, qemu-block, mlevitsk

On 27/05/21 17:51, Kevin Wolf wrote:
> Am 24.05.2021 um 18:36 hat Paolo Bonzini geschrieben:
>> bs->sg is only true for character devices, but block devices can also
>> be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
>> returns bytes in an int for /dev/sgN devices, and sectors in a short
>> for block devices, so account for that in the code.
>>
>> The maximum transfer also need not be a power of 2 (for example I have
>> seen disks with 1280 KiB maximum transfer) so there's no need to pass
>> the result through pow2floor.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> Looks like this is more or less a revert of Maxim's commit 867eccfe. If
> this is what we want, should this old commit be mentioned in one way or
> another in the commit message?

It is (but it is not intentional).

> Apparently the motivation for Maxim's patch was, if I'm reading the
> description correctly, that it affected non-sg cases by imposing
> unnecessary restrictions. I see that patch 1 changed the max_iov part so
> that it won't affect non-sg cases any more, but max_transfer could still
> be more restricted than necessary, no?

Indeed the kernel puts no limit at all, but especially with O_DIRECT we 
probably benefit from avoiding the moral equivalent of "bufferbloat".

> For convenience, the bug report fixed with that patch is here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1647104
> 
> Are we really trying to describe different things (limits for SG_IO and
> for normal I/O) in one value with max_transfer, even though it could be
> two different numbers for the same block device?

>> -static int sg_get_max_transfer_length(int fd)
>> +static int sg_get_max_transfer_length(int fd, struct stat *st)
> 
> This is now a misnomer. Should we revert to the pre-867eccfe name
> hdev_get_max_transfer_length()?

Yes.

>>   static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>>   {
>>       BDRVRawState *s = bs->opaque;
>> +    struct stat st;
>> +
>> +    if (fstat(s->fd, &st)) {
>> +        return;
> 
> Don't we want to set errp? Or do you intentionally ignore the error?

Yes, since we ignore errors from the ioctl I figured it's the same for 
fstat (just do not do the ioctls).

However, skipping raw_probe_alignment is wrong.

Thanks for the review!  Should I wait for you to go through the other 
patches?

Paolo



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
  2021-05-27 20:14     ` Paolo Bonzini
@ 2021-05-31 13:59       ` Kevin Wolf
  2021-05-31 16:36         ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2021-05-31 13:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: nsoffer, qemu-devel, qemu-block, mlevitsk

Am 27.05.2021 um 22:14 hat Paolo Bonzini geschrieben:
> On 27/05/21 17:51, Kevin Wolf wrote:
> > Am 24.05.2021 um 18:36 hat Paolo Bonzini geschrieben:
> > > bs->sg is only true for character devices, but block devices can also
> > > be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
> > > returns bytes in an int for /dev/sgN devices, and sectors in a short
> > > for block devices, so account for that in the code.
> > > 
> > > The maximum transfer also need not be a power of 2 (for example I have
> > > seen disks with 1280 KiB maximum transfer) so there's no need to pass
> > > the result through pow2floor.
> > > 
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > Looks like this is more or less a revert of Maxim's commit 867eccfe. If
> > this is what we want, should this old commit be mentioned in one way or
> > another in the commit message?
> 
> It is (but it is not intentional).
> 
> > Apparently the motivation for Maxim's patch was, if I'm reading the
> > description correctly, that it affected non-sg cases by imposing
> > unnecessary restrictions. I see that patch 1 changed the max_iov part so
> > that it won't affect non-sg cases any more, but max_transfer could still
> > be more restricted than necessary, no?
> 
> Indeed the kernel puts no limit at all, but especially with O_DIRECT we
> probably benefit from avoiding the moral equivalent of "bufferbloat".

Yeah, that sounds plausible, but on the other hand the bug report Maxim
addressed was about performance issues related to buffer sizes being too
small. So even if we want to have some limit, max_transfer of the host
device is probably not the right one for the general case.

> > For convenience, the bug report fixed with that patch is here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1647104
> > 
> > Are we really trying to describe different things (limits for SG_IO and
> > for normal I/O) in one value with max_transfer, even though it could be
> > two different numbers for the same block device?
> 
> > > -static int sg_get_max_transfer_length(int fd)
> > > +static int sg_get_max_transfer_length(int fd, struct stat *st)
> > 
> > This is now a misnomer. Should we revert to the pre-867eccfe name
> > hdev_get_max_transfer_length()?
> 
> Yes.
> 
> > >   static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> > >   {
> > >       BDRVRawState *s = bs->opaque;
> > > +    struct stat st;
> > > +
> > > +    if (fstat(s->fd, &st)) {
> > > +        return;
> > 
> > Don't we want to set errp? Or do you intentionally ignore the error?
> 
> Yes, since we ignore errors from the ioctl I figured it's the same for fstat
> (just do not do the ioctls).
> 
> However, skipping raw_probe_alignment is wrong.
> 
> Thanks for the review!  Should I wait for you to go through the other
> patches?

I went through the whole series, but had no comments for the other
patches, so the rest should be good.

Kevin



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
  2021-05-31 13:59       ` Kevin Wolf
@ 2021-05-31 16:36         ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-31 16:36 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: nsoffer, qemu-devel, qemu-block, mlevitsk

On 31/05/21 15:59, Kevin Wolf wrote:
>>> Apparently the motivation for Maxim's patch was, if I'm reading the
>>> description correctly, that it affected non-sg cases by imposing
>>> unnecessary restrictions. I see that patch 1 changed the max_iov part so
>>> that it won't affect non-sg cases any more, but max_transfer could still
>>> be more restricted than necessary, no?
>>
>> Indeed the kernel puts no limit at all, but especially with O_DIRECT we
>> probably benefit from avoiding the moral equivalent of "bufferbloat".
> 
> Yeah, that sounds plausible, but on the other hand the bug report Maxim
> addressed was about performance issues related to buffer sizes being too
> small. So even if we want to have some limit, max_transfer of the host
> device is probably not the right one for the general case.

Yeah, for a simple dd with O_DIRECT there is no real max_transfer, and 
if you are willing to allocate a big enough buffer.  Quick test on my 
laptop, reading 12.5 GiB:

    163840       9.46777s
    327680       9.41480s
    520192       9.39520s (max_iov * 4K)
    614400       9.06289s
    655360	8.85762s
    1310720      8.75502s
    2621440	8.26522s
    5242880	7.88319s
    10485760	7.66751s
    20971520 	7.42627s

In practice using blktrace shows that virtual address space is 
fragmented enough that the cap for I/O operations is not max_transfer 
but max_iov * 4096 (as was before the series)...  and yet the benefit 
effectively *begins* there because it's where the cost of the system 
calls is amortized over multiple kernel<->disk communications.

Things are probably more complicated if more than one I/O is in flight, 
and with async I/O instead of read/write, but still a huge part of 
performance is seemingly the cost of system calls (not just the context 
switch, also pinning the I/O buffer and all other ancillary costs).

So the solution is probably to add a max_hw_transfer limit in addition 
to max_transfer, and have max_hw_iov instead of max_iov to match.

Paolo



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-05-31 16:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-24 16:36 [PATCH v2 0/6] block: file-posix queue Paolo Bonzini
2021-05-24 16:36 ` [PATCH v2 1/6] scsi-generic: pass max_segments via max_iov field in BlockLimits Paolo Bonzini
2021-05-24 16:36 ` [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2 Paolo Bonzini
2021-05-27 15:51   ` Kevin Wolf
2021-05-27 20:14     ` Paolo Bonzini
2021-05-31 13:59       ` Kevin Wolf
2021-05-31 16:36         ` Paolo Bonzini
2021-05-24 16:36 ` [PATCH v2 3/6] file-posix: fix max_iov for /dev/sg devices Paolo Bonzini
2021-05-24 16:36 ` [PATCH v2 4/6] block: feature detection for host block support Paolo Bonzini
2021-05-24 16:36 ` [PATCH v2 5/6] block: check for sys/disk.h Paolo Bonzini
2021-05-24 16:36 ` [PATCH v2 6/6] block: detect DKIOCGETBLOCKCOUNT/SIZE before use Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.