All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
@ 2014-12-18 11:17 Ekaterina Tumanova
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 1/5] block: add bdrv functions for geometry and blocksize Ekaterina Tumanova
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Ekaterina Tumanova @ 2014-12-18 11:17 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: kwolf, thuth, dahi, Ekaterina Tumanova, armbru, mihajlov,
	borntraeger, stefanha, cornelia.huck, pbonzini

Updates v4 -> v5:

Minor Updates according the last review from Markus:
1. update commit message for patch 2
2. fix comment typos
3. fix check_for_dasd to return -1 instead of -ENOTSUP

Thanks,
Kate.

---------------------------------------------------------------
   Patchset Description (didn't change):

Proper geometry and blocksize information is vital for support of
DASD/ECKD drives in Linux guests. Otherwise things will fail in
certain cases.

The existing geometry and blocksize qemu defaults have no sense
for DASD drives (hd_geometry_guess detection and 512 for sizes).
Setting this information manually in XML file is far from user-friendly,
as the admin has to manually look up the properties of the
host disks and then modify the guest definition accordingly.

Since Linux uses DASDs as normal block devices, we actually
want to use virtio-blk to pass those to KVM guests.

In order to avoid any change in behavior of other drives, the DASD
special casing was advised. We call ioctl BIODASDINFO2 on the block
device, which will only succeed if the device is really a DASD.

In order to retrieve the underlying device geometry and blocksizes
a new block-backend functions and underlying driver functions were
introduced (blk_probe_blocksizes anf blk_probe_geometry wrappers
and corresponding bdrv_xxxxxx functions).

As for now only "host_device" driver received new detection methods.
For "raw" we call childs method as usual. In future one may update
other drivers to add some other detection heuristics.

If the host_device appears to be a DASD, the driver functions
(hdev_probe_blocksizes and hdev_probe_geometry) will call certain
ioctls in order to detect geometry and blocksizes of the underlying device.
if probing failed bdrv_probe_blocksizes caller will set defaults,
and bdrv_probe_geometry will fail to allow fallback to old detection logic.

The front-end (BlockConf API) was updated:
1. a new blkconf_blocksizes function was added. It doesn't
change user-defined blocksize values. If properties are unset, it will
set values, returned by blk_probe_backend. In order to allow this logic,
blocksize properties were initialized with 0. (driver will return 512 if
backing device probing didn't succeed or if driver method is not defined).
2. hd_geometry guess was changed to firstly try to retrieve values via
blk_probe_geometry and if it fails, fallback to the old logic.

Ekaterina Tumanova (5):
  block: add bdrv functions for geometry and blocksize
  raw-posix: Refactor logical block size detection.
  block: Add driver methods to probe blocksizes and geometry
  block-backend: Add wrappers for blocksizes and geometry probing
  BlockConf: Call backend functions to detect geometry and blocksizes

 block.c                        |  35 +++++++++++
 block/block-backend.c          |  10 +++
 block/raw-posix.c              | 138 ++++++++++++++++++++++++++++++++++++-----
 block/raw_bsd.c                |  14 +++++
 hw/block/block.c               |  15 +++++
 hw/block/hd-geometry.c         |  10 ++-
 hw/block/nvme.c                |   1 +
 hw/block/virtio-blk.c          |   1 +
 hw/core/qdev-properties.c      |   3 +-
 hw/ide/qdev.c                  |   1 +
 hw/scsi/scsi-disk.c            |   1 +
 hw/usb/dev-storage.c           |   1 +
 include/block/block.h          |  13 ++++
 include/block/block_int.h      |  15 +++++
 include/hw/block/block.h       |   5 +-
 include/sysemu/block-backend.h |   2 +
 16 files changed, 244 insertions(+), 21 deletions(-)

-- 
1.8.5.5

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 1/5] block: add bdrv functions for geometry and blocksize
  2014-12-18 11:17 [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Ekaterina Tumanova
@ 2014-12-18 11:18 ` Ekaterina Tumanova
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 2/5] raw-posix: Refactor logical block size detection Ekaterina Tumanova
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Ekaterina Tumanova @ 2014-12-18 11:18 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: kwolf, thuth, dahi, Ekaterina Tumanova, armbru, mihajlov,
	borntraeger, stefanha, cornelia.huck, pbonzini

Add driver functions for geometry and blocksize detection

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 block.c                   | 35 +++++++++++++++++++++++++++++++++++
 include/block/block.h     | 13 +++++++++++++
 include/block/block_int.h | 15 +++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/block.c b/block.c
index 4165d42..0586785 100644
--- a/block.c
+++ b/block.c
@@ -548,6 +548,41 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
     }
 }
 
+/**
+ * Get @bs's logical and physical block size, store them in @bsz.
+ * @bs must not be empty.
+ */
+void bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
+{
+    BlockDriver *drv = bs->drv;
+
+    assert(drv != NULL);
+    if (drv->bdrv_probe_blocksizes &&
+        !drv->bdrv_probe_blocksizes(bs, bsz)) {
+            return;
+    }
+    bsz->log = BDRV_SECTOR_SIZE;
+    bsz->phys = BDRV_SECTOR_SIZE;
+}
+
+/**
+ * Try to get @bs's geometry (cyls, heads, sectors)
+ * On success, store them in @geo struct and return 0.
+ * On failure return -errno.
+ * @bs must not be empty.
+ */
+int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
+{
+    BlockDriver *drv = bs->drv;
+
+    assert(drv != NULL);
+    if (drv->bdrv_probe_geometry) {
+        return drv->bdrv_probe_geometry(bs, geo);
+    }
+
+    return -ENOTSUP;
+}
+
 /*
  * Create a uniquely-named empty temporary file.
  * Return 0 upon success, otherwise a negative errno value.
diff --git a/include/block/block.h b/include/block/block.h
index 6e7275d..14ac3b1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -60,6 +60,17 @@ typedef enum {
     BDRV_REQ_MAY_UNMAP    = 0x4,
 } BdrvRequestFlags;
 
+typedef struct BlockSizes {
+    uint32_t phys;
+    uint32_t log;
+} BlockSizes;
+
+typedef struct HDGeometry {
+    uint32_t heads;
+    uint32_t sectors;
+    uint32_t cylinders;
+} HDGeometry;
+
 #define BDRV_O_RDWR        0x0002
 #define BDRV_O_SNAPSHOT    0x0008 /* open the file read only and save writes in a snapshot */
 #define BDRV_O_TEMPORARY   0x0010 /* delete the file after use */
@@ -539,6 +550,8 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs);
  * the old #AioContext is not executing.
  */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
+void bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz);
+int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
 
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 06a21dd..b216156 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -273,6 +273,21 @@ struct BlockDriver {
     void (*bdrv_io_unplug)(BlockDriverState *bs);
     void (*bdrv_flush_io_queue)(BlockDriverState *bs);
 
+    /**
+     * Try to get @bs's logical and physical block size.
+     * On success, store them in @bsz and return zero.
+     * On failure, return negative errno.
+     */
+    int (*bdrv_probe_blocksizes)(BlockDriverState *bs, BlockSizes *bsz);
+    /**
+     * Try to get @bs's geometry (cyls, heads, sectors)
+     * On success, store them in @geo and return 0.
+     * On failure return -errno.
+     * Only drivers that want to override guest geometry implement this
+     * callback; see hd_geometry_guess().
+     */
+    int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 2/5] raw-posix: Refactor logical block size detection.
  2014-12-18 11:17 [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Ekaterina Tumanova
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 1/5] block: add bdrv functions for geometry and blocksize Ekaterina Tumanova
@ 2014-12-18 11:18 ` Ekaterina Tumanova
  2015-01-02 11:52   ` Stefan Hajnoczi
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 3/5] block: Add driver methods to probe blocksizes and geometry Ekaterina Tumanova
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Ekaterina Tumanova @ 2014-12-18 11:18 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: kwolf, thuth, dahi, Ekaterina Tumanova, armbru, mihajlov,
	borntraeger, stefanha, cornelia.huck, pbonzini

1. Move block size detection into dedicated function.
2. Select exactly one IOCTL that detects blocksize, specific to the host OS.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 block/raw-posix.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e51293a..38172ca 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -217,11 +217,31 @@ static int raw_normalize_devicepath(const char **filename)
 }
 #endif
 
+/*
+ * Get logical block size via ioctl. On success return 0. Otherwise -errno.
+ */
+static int probe_logical_blocksize(int fd, unsigned int *sector_size)
+{
+#if defined(BLKSSZGET)
+#  define SECTOR_SIZE BLKSSZGET
+#elif defined(DKIOCGETBLOCKSIZE)
+#  define SECTOR_SIZE DKIOCGETBLOCKSIZE
+#elif defined(DIOCGSECTORSIZE)
+#  define SECTOR_SIZE DIOCGSECTORSIZE
+#else
+    return -ENOTSUP
+#endif
+    if (ioctl(fd, SECTOR_SIZE, sector_size) < 0) {
+        return -errno;
+    }
+    return 0;
+#undef SECTOR_SIZE
+}
+
 static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
     char *buf;
-    unsigned int sector_size;
 
     /* For /dev/sg devices the alignment is not really used.
        With buffered I/O, we don't have any restrictions. */
@@ -231,25 +251,12 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
         return;
     }
 
-    /* Try a few ioctls to get the right size */
     bs->request_alignment = 0;
     s->buf_align = 0;
-
-#ifdef BLKSSZGET
-    if (ioctl(fd, BLKSSZGET, &sector_size) >= 0) {
-        bs->request_alignment = sector_size;
-    }
-#endif
-#ifdef DKIOCGETBLOCKSIZE
-    if (ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) >= 0) {
-        bs->request_alignment = sector_size;
+    /* Let's try to use the logical blocksize for the alignment. */
+    if (probe_logical_blocksize(fd, &bs->request_alignment) < 0) {
+        bs->request_alignment = 0;
     }
-#endif
-#ifdef DIOCGSECTORSIZE
-    if (ioctl(fd, DIOCGSECTORSIZE, &sector_size) >= 0) {
-        bs->request_alignment = sector_size;
-    }
-#endif
 #ifdef CONFIG_XFS
     if (s->is_xfs) {
         struct dioattr da;
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 3/5] block: Add driver methods to probe blocksizes and geometry
  2014-12-18 11:17 [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Ekaterina Tumanova
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 1/5] block: add bdrv functions for geometry and blocksize Ekaterina Tumanova
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 2/5] raw-posix: Refactor logical block size detection Ekaterina Tumanova
@ 2014-12-18 11:18 ` Ekaterina Tumanova
  2014-12-18 12:43   ` Thomas Huth
  2015-01-02 12:11   ` Stefan Hajnoczi
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 4/5] block-backend: Add wrappers for blocksizes and geometry probing Ekaterina Tumanova
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Ekaterina Tumanova @ 2014-12-18 11:18 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: kwolf, thuth, dahi, Ekaterina Tumanova, armbru, mihajlov,
	borntraeger, stefanha, cornelia.huck, pbonzini

Introduce driver methods of defining disk blocksizes (physical and
logical) and hard drive geometry.
Methods are only implemented for "host_device". For "raw" devices
driver calls child's method.

For now geometry detection will only work for DASD devices. To check
that a local check_for_dasd function was introduced. It calls BIODASDINFO2
ioctl and returns its rc.

Blocksizes detection function will probe sizes for DASD devices and
set default for other devices.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 block/raw-posix.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/raw_bsd.c   | 14 ++++++++
 2 files changed, 111 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 38172ca..1783acf 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -56,6 +56,7 @@
 #include <linux/cdrom.h>
 #include <linux/fd.h>
 #include <linux/fs.h>
+#include <linux/hdreg.h>
 #ifndef FS_NOCOW_FL
 #define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
 #endif
@@ -90,6 +91,10 @@
 #include <xfs/xfs.h>
 #endif
 
+#ifdef __s390__
+#include <asm/dasd.h>
+#endif
+
 //#define DEBUG_FLOPPY
 
 //#define DEBUG_BLOCK
@@ -238,6 +243,23 @@ static int probe_logical_blocksize(int fd, unsigned int *sector_size)
 #undef SECTOR_SIZE
 }
 
+/**
+ * Get physical block size of @fd.
+ * On success, store it in @blk_size and return 0.
+ * On failure, return -errno.
+ */
+static int probe_physical_blocksize(int fd, unsigned int *blk_size)
+{
+#ifdef BLKPBSZGET
+    if (ioctl(fd, BLKPBSZGET, blk_size) < 0) {
+        return -errno;
+    }
+    return 0;
+#else
+    return -ENOTSUP;
+#endif
+}
+
 static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
@@ -660,6 +682,79 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.opt_mem_alignment = s->buf_align;
 }
 
+static int check_for_dasd(int fd)
+{
+#ifdef BIODASDINFO2
+    struct dasd_information2_t info = {0};
+
+    return ioctl(fd, BIODASDINFO2, &info);
+#else
+    return -1;
+#endif
+}
+
+/**
+ * Try to get @bs's logical and physical block size.
+ * On success, store them in @bsz and return zero.
+ * On failure, return negative errno.
+ */
+static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
+{
+    BDRVRawState *s = bs->opaque;
+    int ret;
+
+    /* If DASD, get blocksizes */
+    if (check_for_dasd(s->fd) < 0) {
+        return -ENOTSUP;
+    }
+    ret = probe_logical_blocksize(s->fd, &bsz->log);
+    if (ret < 0) {
+        return ret;
+    }
+    return probe_physical_blocksize(s->fd, &bsz->phys);
+}
+
+/**
+ * Try to get @bs's geometry (cyls, heads, sectos)
+ * On success, store them in @geo and return 0.
+ * On failure return -errno.
+ * (as for 12/2014 only DASDs are supported)
+ */
+static int hdev_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
+{
+    BDRVRawState *s = bs->opaque;
+    struct hd_geometry ioctl_geo = {0};
+    uint32_t blksize;
+
+    /* If DASD, get it's geometry */
+    if (check_for_dasd(s->fd) < 0) {
+        return -ENOTSUP;
+    }
+    if (ioctl(s->fd, HDIO_GETGEO, &ioctl_geo) < 0) {
+        return -errno;
+    }
+    /* HDIO_GETGEO may return success even though geo contains zeros
+       (e.g. certain multipath setups) */
+    if (!ioctl_geo.heads || !ioctl_geo.sectors || !ioctl_geo.cylinders) {
+        return -ENOTSUP;
+    }
+    /* Do not return a geometry for partition */
+    if (ioctl_geo.start != 0) {
+        return -ENOTSUP;
+    }
+    geo->heads = ioctl_geo.heads;
+    geo->sectors = ioctl_geo.sectors;
+    if (!probe_physical_blocksize(s->fd, &blksize)) {
+        /* overwrite cyls: HDIO_GETGEO result is incorrect for big drives */
+        geo->cylinders = bs->total_sectors / (blksize / BDRV_SECTOR_SIZE)
+                                           / (geo->heads * geo->sectors);
+        return 0;
+    }
+    geo->cylinders = ioctl_geo.cylinders;
+
+    return 0;
+}
+
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
 {
     int ret;
@@ -2125,6 +2220,8 @@ static BlockDriver bdrv_host_device = {
     .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
+    .bdrv_probe_blocksizes = hdev_probe_blocksizes,
+    .bdrv_probe_geometry = hdev_probe_geometry,
 
     .bdrv_detach_aio_context = raw_detach_aio_context,
     .bdrv_attach_aio_context = raw_attach_aio_context,
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 05b02c7..f3d532b 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -235,6 +235,18 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
     return 1;
 }
 
+static int raw_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
+{
+    bdrv_probe_blocksizes(bs->file, bsz);
+
+    return 0;
+}
+
+static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
+{
+    return bdrv_probe_geometry(bs->file, geo);
+}
+
 BlockDriver bdrv_raw = {
     .format_name          = "raw",
     .bdrv_probe           = &raw_probe,
@@ -252,6 +264,8 @@ BlockDriver bdrv_raw = {
     .has_variable_length  = true,
     .bdrv_get_info        = &raw_get_info,
     .bdrv_refresh_limits  = &raw_refresh_limits,
+    .bdrv_probe_blocksizes = &raw_probe_blocksizes,
+    .bdrv_probe_geometry  = &raw_probe_geometry,
     .bdrv_is_inserted     = &raw_is_inserted,
     .bdrv_media_changed   = &raw_media_changed,
     .bdrv_eject           = &raw_eject,
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 4/5] block-backend: Add wrappers for blocksizes and geometry probing
  2014-12-18 11:17 [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Ekaterina Tumanova
                   ` (2 preceding siblings ...)
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 3/5] block: Add driver methods to probe blocksizes and geometry Ekaterina Tumanova
@ 2014-12-18 11:18 ` Ekaterina Tumanova
  2014-12-18 14:45   ` Thomas Huth
  2015-01-02 12:34   ` Stefan Hajnoczi
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 5/5] BlockConf: Call backend functions to detect geometry and blocksizes Ekaterina Tumanova
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Ekaterina Tumanova @ 2014-12-18 11:18 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: kwolf, thuth, dahi, Ekaterina Tumanova, armbru, mihajlov,
	borntraeger, stefanha, cornelia.huck, pbonzini

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 block/block-backend.c          | 10 ++++++++++
 include/sysemu/block-backend.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index ef16d73..4b9ed85 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -667,3 +667,13 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
 {
     return qemu_aio_get(aiocb_info, blk_bs(blk), cb, opaque);
 }
+
+void blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz)
+{
+    bdrv_probe_blocksizes(blk->bs, bsz);
+}
+
+int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo)
+{
+    return bdrv_probe_geometry(blk->bs, geo);
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8871a02..aecdc53 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -150,5 +150,7 @@ BlockAcctStats *blk_get_stats(BlockBackend *blk);
 
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
                   BlockCompletionFunc *cb, void *opaque);
+void blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz);
+int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo);
 
 #endif
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 5/5] BlockConf: Call backend functions to detect geometry and blocksizes
  2014-12-18 11:17 [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Ekaterina Tumanova
                   ` (3 preceding siblings ...)
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 4/5] block-backend: Add wrappers for blocksizes and geometry probing Ekaterina Tumanova
@ 2014-12-18 11:18 ` Ekaterina Tumanova
  2014-12-18 14:55   ` Thomas Huth
  2015-01-02 12:46   ` Stefan Hajnoczi
  2014-12-18 15:59 ` [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Christian Borntraeger
  2015-01-02 11:30 ` Stefan Hajnoczi
  6 siblings, 2 replies; 25+ messages in thread
From: Ekaterina Tumanova @ 2014-12-18 11:18 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: kwolf, thuth, dahi, Ekaterina Tumanova, armbru, mihajlov,
	borntraeger, stefanha, cornelia.huck, pbonzini

geometry: hd_geometry_guess function autodetects the drive geometry.
This patch adds a block backend call, that probes the backing device
geometry. If the inner driver method is implemented and succeeds
(currently only for DASDs), the blkconf_geometry will pass-through
the backing device geometry. Otherwise will fallback to old logic.

blocksize: This patch initializes blocksize properties to 0.
In order to set the properly a blkconf_blocksizes was introduced.
If user didn't set physical or logical blocksize, it will
retrieve it's value from a driver (which will return a default 512 for
any backend but DASD).

The blkconf_blocksizes call was added to all users of BlkConf.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 hw/block/block.c          | 15 +++++++++++++++
 hw/block/hd-geometry.c    | 10 +++++++++-
 hw/block/nvme.c           |  1 +
 hw/block/virtio-blk.c     |  1 +
 hw/core/qdev-properties.c |  3 ++-
 hw/ide/qdev.c             |  1 +
 hw/scsi/scsi-disk.c       |  1 +
 hw/usb/dev-storage.c      |  1 +
 include/hw/block/block.h  |  5 +++--
 9 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index a625773..a4f4f06 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -25,6 +25,21 @@ void blkconf_serial(BlockConf *conf, char **serial)
     }
 }
 
+void blkconf_blocksizes(BlockConf *conf)
+{
+    BlockBackend *blk = conf->blk;
+    BlockSizes blocksizes;
+
+    blk_probe_blocksizes(blk, &blocksizes);
+    /* fill in detected values if they are not defined via qemu command line */
+    if (!conf->physical_block_size) {
+        conf->physical_block_size = blocksizes.phys;
+    }
+    if (!conf->logical_block_size) {
+        conf->logical_block_size = blocksizes.log;
+    }
+}
+
 void blkconf_geometry(BlockConf *conf, int *ptrans,
                       unsigned cyls_max, unsigned heads_max, unsigned secs_max,
                       Error **errp)
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 6fcf74d..b187878 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -121,8 +121,16 @@ void hd_geometry_guess(BlockBackend *blk,
                        int *ptrans)
 {
     int cylinders, heads, secs, translation;
+    HDGeometry geo;
 
-    if (guess_disk_lchs(blk, &cylinders, &heads, &secs) < 0) {
+    /* Try to probe the backing device geometry, otherwise fallback
+       to the old logic. (as of 12/2014 probing only succeeds on DASDs) */
+    if (blk_probe_geometry(blk, &geo) == 0) {
+        *pcyls = geo.cylinders;
+        *psecs = geo.sectors;
+        *pheads = geo.heads;
+        translation = BIOS_ATA_TRANSLATION_NONE;
+    } else if (guess_disk_lchs(blk, &cylinders, &heads, &secs) < 0) {
         /* no LCHS guess: use a standard physical disk geometry  */
         guess_chs_for_size(blk, pcyls, pheads, psecs);
         translation = hd_bios_chs_auto_trans(*pcyls, *pheads, *psecs);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index aa1ed98..2c630bc 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -763,6 +763,7 @@ static int nvme_init(PCIDevice *pci_dev)
     if (!n->serial) {
         return -1;
     }
+    blkconf_blocksizes(&n->conf);
 
     pci_conf = pci_dev->config;
     pci_conf[PCI_INTERRUPT_PIN] = 1;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b19b102..6f01565 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -745,6 +745,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
         error_propagate(errp, err);
         return;
     }
+    blkconf_blocksizes(&conf->conf);
 
     virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
                 sizeof(struct virtio_blk_config));
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 2e47f70..ba81709 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -580,7 +580,8 @@ static void set_blocksize(Object *obj, Visitor *v, void *opaque,
         error_propagate(errp, local_err);
         return;
     }
-    if (value < min || value > max) {
+    /* value of 0 means "unset" */
+    if (value && (value < min || value > max)) {
         error_set(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
                   dev->id?:"", name, (int64_t)value, min, max);
         return;
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 1ebb58d..353854c 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -169,6 +169,7 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind)
     }
 
     blkconf_serial(&dev->conf, &dev->serial);
+    blkconf_blocksizes(&dev->conf);
     if (kind != IDE_CD) {
         blkconf_geometry(&dev->conf, &dev->chs_trans, 65536, 16, 255, &err);
         if (err) {
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index f65618d..e7244e6 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2251,6 +2251,7 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
     }
 
     blkconf_serial(&s->qdev.conf, &s->serial);
+    blkconf_blocksizes(&s->qdev.conf);
     if (dev->type == TYPE_DISK) {
         blkconf_geometry(&dev->conf, NULL, 65535, 255, 255, &err);
         if (err) {
diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 4539733..cc02dfd 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -611,6 +611,7 @@ static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
     }
 
     blkconf_serial(&s->conf, &dev->serial);
+    blkconf_blocksizes(&s->conf);
 
     /*
      * Hack alert: this pretends to be a block device, but it's really
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 0d0ce9a..3e502a8 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -44,9 +44,9 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)                          \
     DEFINE_PROP_DRIVE("drive", _state, _conf.blk),                      \
     DEFINE_PROP_BLOCKSIZE("logical_block_size", _state,                 \
-                          _conf.logical_block_size, 512),               \
+                          _conf.logical_block_size, 0),               \
     DEFINE_PROP_BLOCKSIZE("physical_block_size", _state,                \
-                          _conf.physical_block_size, 512),              \
+                          _conf.physical_block_size, 0),              \
     DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
     DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0),    \
     DEFINE_PROP_UINT32("discard_granularity", _state, \
@@ -63,6 +63,7 @@ void blkconf_serial(BlockConf *conf, char **serial);
 void blkconf_geometry(BlockConf *conf, int *trans,
                       unsigned cyls_max, unsigned heads_max, unsigned secs_max,
                       Error **errp);
+void blkconf_blocksizes(BlockConf *conf);
 
 /* Hard disk geometry */
 
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/5] block: Add driver methods to probe blocksizes and geometry
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 3/5] block: Add driver methods to probe blocksizes and geometry Ekaterina Tumanova
@ 2014-12-18 12:43   ` Thomas Huth
  2015-01-02 12:11   ` Stefan Hajnoczi
  1 sibling, 0 replies; 25+ messages in thread
From: Thomas Huth @ 2014-12-18 12:43 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, borntraeger, armbru, Public KVM Mailing List, mihajlov,
	dahi, stefanha, cornelia.huck, pbonzini

On Thu, 18 Dec 2014 12:18:02 +0100
Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> wrote:

> Introduce driver methods of defining disk blocksizes (physical and
> logical) and hard drive geometry.
> Methods are only implemented for "host_device". For "raw" devices
> driver calls child's method.
> 
> For now geometry detection will only work for DASD devices. To check
> that a local check_for_dasd function was introduced. It calls BIODASDINFO2
> ioctl and returns its rc.
> 
> Blocksizes detection function will probe sizes for DASD devices and
> set default for other devices.
> 
> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/raw-posix.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/raw_bsd.c   | 14 ++++++++
>  2 files changed, 111 insertions(+)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 38172ca..1783acf 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -56,6 +56,7 @@
>  #include <linux/cdrom.h>
>  #include <linux/fd.h>
>  #include <linux/fs.h>
> +#include <linux/hdreg.h>
>  #ifndef FS_NOCOW_FL
>  #define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
>  #endif
> @@ -90,6 +91,10 @@
>  #include <xfs/xfs.h>
>  #endif
> 
> +#ifdef __s390__
> +#include <asm/dasd.h>
> +#endif
> +
>  //#define DEBUG_FLOPPY
> 
>  //#define DEBUG_BLOCK
> @@ -238,6 +243,23 @@ static int probe_logical_blocksize(int fd, unsigned int *sector_size)
>  #undef SECTOR_SIZE
>  }
> 
> +/**
> + * Get physical block size of @fd.
> + * On success, store it in @blk_size and return 0.
> + * On failure, return -errno.
> + */
> +static int probe_physical_blocksize(int fd, unsigned int *blk_size)
> +{
> +#ifdef BLKPBSZGET
> +    if (ioctl(fd, BLKPBSZGET, blk_size) < 0) {
> +        return -errno;
> +    }
> +    return 0;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>  {
>      BDRVRawState *s = bs->opaque;
> @@ -660,6 +682,79 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>      bs->bl.opt_mem_alignment = s->buf_align;
>  }
> 
> +static int check_for_dasd(int fd)
> +{
> +#ifdef BIODASDINFO2
> +    struct dasd_information2_t info = {0};
> +
> +    return ioctl(fd, BIODASDINFO2, &info);
> +#else
> +    return -1;
> +#endif
> +}
> +
> +/**
> + * Try to get @bs's logical and physical block size.
> + * On success, store them in @bsz and return zero.
> + * On failure, return negative errno.
> + */
> +static int hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    int ret;
> +
> +    /* If DASD, get blocksizes */
> +    if (check_for_dasd(s->fd) < 0) {
> +        return -ENOTSUP;
> +    }
> +    ret = probe_logical_blocksize(s->fd, &bsz->log);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return probe_physical_blocksize(s->fd, &bsz->phys);
> +}
> +
> +/**
> + * Try to get @bs's geometry (cyls, heads, sectos)
> + * On success, store them in @geo and return 0.
> + * On failure return -errno.
> + * (as for 12/2014 only DASDs are supported)
> + */
> +static int hdev_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    struct hd_geometry ioctl_geo = {0};
> +    uint32_t blksize;
> +
> +    /* If DASD, get it's geometry */

s/it's/its/

> +    if (check_for_dasd(s->fd) < 0) {
> +        return -ENOTSUP;
> +    }
> +    if (ioctl(s->fd, HDIO_GETGEO, &ioctl_geo) < 0) {
> +        return -errno;
> +    }
> +    /* HDIO_GETGEO may return success even though geo contains zeros
> +       (e.g. certain multipath setups) */
> +    if (!ioctl_geo.heads || !ioctl_geo.sectors || !ioctl_geo.cylinders) {
> +        return -ENOTSUP;
> +    }
> +    /* Do not return a geometry for partition */
> +    if (ioctl_geo.start != 0) {
> +        return -ENOTSUP;
> +    }
> +    geo->heads = ioctl_geo.heads;
> +    geo->sectors = ioctl_geo.sectors;
> +    if (!probe_physical_blocksize(s->fd, &blksize)) {
> +        /* overwrite cyls: HDIO_GETGEO result is incorrect for big drives */
> +        geo->cylinders = bs->total_sectors / (blksize / BDRV_SECTOR_SIZE)
> +                                           / (geo->heads * geo->sectors);
> +        return 0;
> +    }
> +    geo->cylinders = ioctl_geo.cylinders;
> +
> +    return 0;
> +}
> +
>  static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
>  {
>      int ret;
> @@ -2125,6 +2220,8 @@ static BlockDriver bdrv_host_device = {
>      .bdrv_get_info = raw_get_info,
>      .bdrv_get_allocated_file_size
>                          = raw_get_allocated_file_size,
> +    .bdrv_probe_blocksizes = hdev_probe_blocksizes,
> +    .bdrv_probe_geometry = hdev_probe_geometry,
> 
>      .bdrv_detach_aio_context = raw_detach_aio_context,
>      .bdrv_attach_aio_context = raw_attach_aio_context,
> diff --git a/block/raw_bsd.c b/block/raw_bsd.c
> index 05b02c7..f3d532b 100644
> --- a/block/raw_bsd.c
> +++ b/block/raw_bsd.c
> @@ -235,6 +235,18 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
>      return 1;
>  }
> 
> +static int raw_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
> +{
> +    bdrv_probe_blocksizes(bs->file, bsz);
> +
> +    return 0;
> +}
> +
> +static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
> +{
> +    return bdrv_probe_geometry(bs->file, geo);
> +}
> +
>  BlockDriver bdrv_raw = {
>      .format_name          = "raw",
>      .bdrv_probe           = &raw_probe,
> @@ -252,6 +264,8 @@ BlockDriver bdrv_raw = {
>      .has_variable_length  = true,
>      .bdrv_get_info        = &raw_get_info,
>      .bdrv_refresh_limits  = &raw_refresh_limits,
> +    .bdrv_probe_blocksizes = &raw_probe_blocksizes,
> +    .bdrv_probe_geometry  = &raw_probe_geometry,
>      .bdrv_is_inserted     = &raw_is_inserted,
>      .bdrv_media_changed   = &raw_media_changed,
>      .bdrv_eject           = &raw_eject,

Looks good to me now (apart from the very minor typo).

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/5] block-backend: Add wrappers for blocksizes and geometry probing
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 4/5] block-backend: Add wrappers for blocksizes and geometry probing Ekaterina Tumanova
@ 2014-12-18 14:45   ` Thomas Huth
  2015-01-02 12:34   ` Stefan Hajnoczi
  1 sibling, 0 replies; 25+ messages in thread
From: Thomas Huth @ 2014-12-18 14:45 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, borntraeger, armbru, Public KVM Mailing List, mihajlov,
	dahi, stefanha, cornelia.huck, pbonzini

On Thu, 18 Dec 2014 12:18:03 +0100
Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> wrote:

> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/block-backend.c          | 10 ++++++++++
>  include/sysemu/block-backend.h |  2 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index ef16d73..4b9ed85 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -667,3 +667,13 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
>  {
>      return qemu_aio_get(aiocb_info, blk_bs(blk), cb, opaque);
>  }
> +
> +void blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz)
> +{
> +    bdrv_probe_blocksizes(blk->bs, bsz);
> +}
> +
> +int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo)
> +{
> +    return bdrv_probe_geometry(blk->bs, geo);
> +}
> diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
> index 8871a02..aecdc53 100644
> --- a/include/sysemu/block-backend.h
> +++ b/include/sysemu/block-backend.h
> @@ -150,5 +150,7 @@ BlockAcctStats *blk_get_stats(BlockBackend *blk);
> 
>  void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
>                    BlockCompletionFunc *cb, void *opaque);
> +void blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz);
> +int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo);
> 
>  #endif

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 5/5] BlockConf: Call backend functions to detect geometry and blocksizes
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 5/5] BlockConf: Call backend functions to detect geometry and blocksizes Ekaterina Tumanova
@ 2014-12-18 14:55   ` Thomas Huth
  2015-01-02 12:46   ` Stefan Hajnoczi
  1 sibling, 0 replies; 25+ messages in thread
From: Thomas Huth @ 2014-12-18 14:55 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, borntraeger, armbru, Public KVM Mailing List, mihajlov,
	dahi, stefanha, cornelia.huck, pbonzini

On Thu, 18 Dec 2014 12:18:04 +0100
Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> wrote:

> geometry: hd_geometry_guess function autodetects the drive geometry.
> This patch adds a block backend call, that probes the backing device
> geometry. If the inner driver method is implemented and succeeds
> (currently only for DASDs), the blkconf_geometry will pass-through
> the backing device geometry. Otherwise will fallback to old logic.
> 
> blocksize: This patch initializes blocksize properties to 0.
> In order to set the properly a blkconf_blocksizes was introduced.
> If user didn't set physical or logical blocksize, it will
> retrieve it's value from a driver (which will return a default 512 for
> any backend but DASD).
> 
> The blkconf_blocksizes call was added to all users of BlkConf.
> 
> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/block/block.c          | 15 +++++++++++++++
>  hw/block/hd-geometry.c    | 10 +++++++++-
>  hw/block/nvme.c           |  1 +
>  hw/block/virtio-blk.c     |  1 +
>  hw/core/qdev-properties.c |  3 ++-
>  hw/ide/qdev.c             |  1 +
>  hw/scsi/scsi-disk.c       |  1 +
>  hw/usb/dev-storage.c      |  1 +
>  include/hw/block/block.h  |  5 +++--
>  9 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/block/block.c b/hw/block/block.c
> index a625773..a4f4f06 100644
> --- a/hw/block/block.c
> +++ b/hw/block/block.c
> @@ -25,6 +25,21 @@ void blkconf_serial(BlockConf *conf, char **serial)
>      }
>  }
> 
> +void blkconf_blocksizes(BlockConf *conf)
> +{
> +    BlockBackend *blk = conf->blk;
> +    BlockSizes blocksizes;
> +
> +    blk_probe_blocksizes(blk, &blocksizes);
> +    /* fill in detected values if they are not defined via qemu command line */
> +    if (!conf->physical_block_size) {
> +        conf->physical_block_size = blocksizes.phys;
> +    }
> +    if (!conf->logical_block_size) {
> +        conf->logical_block_size = blocksizes.log;
> +    }
> +}
> +
>  void blkconf_geometry(BlockConf *conf, int *ptrans,
>                        unsigned cyls_max, unsigned heads_max, unsigned secs_max,
>                        Error **errp)
> diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
> index 6fcf74d..b187878 100644
> --- a/hw/block/hd-geometry.c
> +++ b/hw/block/hd-geometry.c
> @@ -121,8 +121,16 @@ void hd_geometry_guess(BlockBackend *blk,
>                         int *ptrans)
>  {
>      int cylinders, heads, secs, translation;
> +    HDGeometry geo;
> 
> -    if (guess_disk_lchs(blk, &cylinders, &heads, &secs) < 0) {
> +    /* Try to probe the backing device geometry, otherwise fallback
> +       to the old logic. (as of 12/2014 probing only succeeds on DASDs) */
> +    if (blk_probe_geometry(blk, &geo) == 0) {
> +        *pcyls = geo.cylinders;
> +        *psecs = geo.sectors;
> +        *pheads = geo.heads;
> +        translation = BIOS_ATA_TRANSLATION_NONE;
> +    } else if (guess_disk_lchs(blk, &cylinders, &heads, &secs) < 0) {
>          /* no LCHS guess: use a standard physical disk geometry  */
>          guess_chs_for_size(blk, pcyls, pheads, psecs);
>          translation = hd_bios_chs_auto_trans(*pcyls, *pheads, *psecs);
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index aa1ed98..2c630bc 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -763,6 +763,7 @@ static int nvme_init(PCIDevice *pci_dev)
>      if (!n->serial) {
>          return -1;
>      }
> +    blkconf_blocksizes(&n->conf);
> 
>      pci_conf = pci_dev->config;
>      pci_conf[PCI_INTERRUPT_PIN] = 1;
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index b19b102..6f01565 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -745,6 +745,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
>          error_propagate(errp, err);
>          return;
>      }
> +    blkconf_blocksizes(&conf->conf);
> 
>      virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
>                  sizeof(struct virtio_blk_config));
> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index 2e47f70..ba81709 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -580,7 +580,8 @@ static void set_blocksize(Object *obj, Visitor *v, void *opaque,
>          error_propagate(errp, local_err);
>          return;
>      }
> -    if (value < min || value > max) {
> +    /* value of 0 means "unset" */
> +    if (value && (value < min || value > max)) {
>          error_set(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
>                    dev->id?:"", name, (int64_t)value, min, max);
>          return;
> diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
> index 1ebb58d..353854c 100644
> --- a/hw/ide/qdev.c
> +++ b/hw/ide/qdev.c
> @@ -169,6 +169,7 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind)
>      }
> 
>      blkconf_serial(&dev->conf, &dev->serial);
> +    blkconf_blocksizes(&dev->conf);
>      if (kind != IDE_CD) {
>          blkconf_geometry(&dev->conf, &dev->chs_trans, 65536, 16, 255, &err);
>          if (err) {
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index f65618d..e7244e6 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -2251,6 +2251,7 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
>      }
> 
>      blkconf_serial(&s->qdev.conf, &s->serial);
> +    blkconf_blocksizes(&s->qdev.conf);
>      if (dev->type == TYPE_DISK) {
>          blkconf_geometry(&dev->conf, NULL, 65535, 255, 255, &err);
>          if (err) {
> diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
> index 4539733..cc02dfd 100644
> --- a/hw/usb/dev-storage.c
> +++ b/hw/usb/dev-storage.c
> @@ -611,6 +611,7 @@ static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
>      }
> 
>      blkconf_serial(&s->conf, &dev->serial);
> +    blkconf_blocksizes(&s->conf);
> 
>      /*
>       * Hack alert: this pretends to be a block device, but it's really
> diff --git a/include/hw/block/block.h b/include/hw/block/block.h
> index 0d0ce9a..3e502a8 100644
> --- a/include/hw/block/block.h
> +++ b/include/hw/block/block.h
> @@ -44,9 +44,9 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
>  #define DEFINE_BLOCK_PROPERTIES(_state, _conf)                          \
>      DEFINE_PROP_DRIVE("drive", _state, _conf.blk),                      \
>      DEFINE_PROP_BLOCKSIZE("logical_block_size", _state,                 \
> -                          _conf.logical_block_size, 512),               \
> +                          _conf.logical_block_size, 0),               \
>      DEFINE_PROP_BLOCKSIZE("physical_block_size", _state,                \
> -                          _conf.physical_block_size, 512),              \
> +                          _conf.physical_block_size, 0),              \
>      DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
>      DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0),    \
>      DEFINE_PROP_UINT32("discard_granularity", _state, \
> @@ -63,6 +63,7 @@ void blkconf_serial(BlockConf *conf, char **serial);
>  void blkconf_geometry(BlockConf *conf, int *trans,
>                        unsigned cyls_max, unsigned heads_max, unsigned secs_max,
>                        Error **errp);
> +void blkconf_blocksizes(BlockConf *conf);
> 
>  /* Hard disk geometry */
> 

Patch series looks fine to me now.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2014-12-18 11:17 [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Ekaterina Tumanova
                   ` (4 preceding siblings ...)
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 5/5] BlockConf: Call backend functions to detect geometry and blocksizes Ekaterina Tumanova
@ 2014-12-18 15:59 ` Christian Borntraeger
  2015-01-02 12:57   ` Stefan Hajnoczi
  2015-01-02 11:30 ` Stefan Hajnoczi
  6 siblings, 1 reply; 25+ messages in thread
From: Christian Borntraeger @ 2014-12-18 15:59 UTC (permalink / raw)
  To: Ekaterina Tumanova, Public KVM Mailing List
  Cc: kwolf, thuth, armbru, mihajlov, dahi, stefanha, cornelia.huck, pbonzini

Am 18.12.2014 um 12:17 schrieb Ekaterina Tumanova:
> Updates v4 -> v5:
> 
> Minor Updates according the last review from Markus:
> 1. update commit message for patch 2
> 2. fix comment typos
> 3. fix check_for_dasd to return -1 instead of -ENOTSUP
> 
> Thanks,
> Kate.
> 
> ---------------------------------------------------------------
>    Patchset Description (didn't change):
> 
> Proper geometry and blocksize information is vital for support of
> DASD/ECKD drives in Linux guests. Otherwise things will fail in
> certain cases.
> 
> The existing geometry and blocksize qemu defaults have no sense
> for DASD drives (hd_geometry_guess detection and 512 for sizes).
> Setting this information manually in XML file is far from user-friendly,
> as the admin has to manually look up the properties of the
> host disks and then modify the guest definition accordingly.
> 
> Since Linux uses DASDs as normal block devices, we actually
> want to use virtio-blk to pass those to KVM guests.
> 
> In order to avoid any change in behavior of other drives, the DASD
> special casing was advised. We call ioctl BIODASDINFO2 on the block
> device, which will only succeed if the device is really a DASD.
> 
> In order to retrieve the underlying device geometry and blocksizes
> a new block-backend functions and underlying driver functions were
> introduced (blk_probe_blocksizes anf blk_probe_geometry wrappers
> and corresponding bdrv_xxxxxx functions).
> 
> As for now only "host_device" driver received new detection methods.
> For "raw" we call childs method as usual. In future one may update
> other drivers to add some other detection heuristics.
> 
> If the host_device appears to be a DASD, the driver functions
> (hdev_probe_blocksizes and hdev_probe_geometry) will call certain
> ioctls in order to detect geometry and blocksizes of the underlying device.
> if probing failed bdrv_probe_blocksizes caller will set defaults,
> and bdrv_probe_geometry will fail to allow fallback to old detection logic.
> 
> The front-end (BlockConf API) was updated:
> 1. a new blkconf_blocksizes function was added. It doesn't
> change user-defined blocksize values. If properties are unset, it will
> set values, returned by blk_probe_backend. In order to allow this logic,
> blocksize properties were initialized with 0. (driver will return 512 if
> backing device probing didn't succeed or if driver method is not defined).
> 2. hd_geometry guess was changed to firstly try to retrieve values via
> blk_probe_geometry and if it fails, fallback to the old logic.
> 
> Ekaterina Tumanova (5):
>   block: add bdrv functions for geometry and blocksize
>   raw-posix: Refactor logical block size detection.
>   block: Add driver methods to probe blocksizes and geometry
>   block-backend: Add wrappers for blocksizes and geometry probing
>   BlockConf: Call backend functions to detect geometry and blocksizes
> 
>  block.c                        |  35 +++++++++++
>  block/block-backend.c          |  10 +++
>  block/raw-posix.c              | 138 ++++++++++++++++++++++++++++++++++++-----
>  block/raw_bsd.c                |  14 +++++
>  hw/block/block.c               |  15 +++++
>  hw/block/hd-geometry.c         |  10 ++-
>  hw/block/nvme.c                |   1 +
>  hw/block/virtio-blk.c          |   1 +
>  hw/core/qdev-properties.c      |   3 +-
>  hw/ide/qdev.c                  |   1 +
>  hw/scsi/scsi-disk.c            |   1 +
>  hw/usb/dev-storage.c           |   1 +
>  include/block/block.h          |  13 ++++
>  include/block/block_int.h      |  15 +++++
>  include/hw/block/block.h       |   5 +-
>  include/sysemu/block-backend.h |   2 +
>  16 files changed, 244 insertions(+), 21 deletions(-)
> 

Kevin, Stefan,

Are you ok with the patches? If yes, can you take care of these patches in the block tree?

Christian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2014-12-18 11:17 [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Ekaterina Tumanova
                   ` (5 preceding siblings ...)
  2014-12-18 15:59 ` [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Christian Borntraeger
@ 2015-01-02 11:30 ` Stefan Hajnoczi
  2015-01-13  9:59   ` Ekaterina Tumanova
  6 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-02 11:30 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, thuth, borntraeger, armbru, Public KVM Mailing List,
	mihajlov, dahi, stefanha, cornelia.huck, pbonzini

[-- Attachment #1: Type: text/plain, Size: 3196 bytes --]

On Thu, Dec 18, 2014 at 12:17:59PM +0100, Ekaterina Tumanova wrote:
> Updates v4 -> v5:
> 
> Minor Updates according the last review from Markus:
> 1. update commit message for patch 2
> 2. fix comment typos
> 3. fix check_for_dasd to return -1 instead of -ENOTSUP
> 
> Thanks,
> Kate.
> 
> ---------------------------------------------------------------
>    Patchset Description (didn't change):
> 
> Proper geometry and blocksize information is vital for support of
> DASD/ECKD drives in Linux guests. Otherwise things will fail in
> certain cases.
> 
> The existing geometry and blocksize qemu defaults have no sense
> for DASD drives (hd_geometry_guess detection and 512 for sizes).
> Setting this information manually in XML file is far from user-friendly,
> as the admin has to manually look up the properties of the
> host disks and then modify the guest definition accordingly.

What is the expected behavior when you dd the disk image from the DASD
to a raw image file on an NFS file system on the host?

It seems you'd lose the geometry detection and the disk image could be
unusable from inside the guest.

> Since Linux uses DASDs as normal block devices, we actually
> want to use virtio-blk to pass those to KVM guests.
> 
> In order to avoid any change in behavior of other drives, the DASD
> special casing was advised. We call ioctl BIODASDINFO2 on the block
> device, which will only succeed if the device is really a DASD.
> 
> In order to retrieve the underlying device geometry and blocksizes
> a new block-backend functions and underlying driver functions were
> introduced (blk_probe_blocksizes anf blk_probe_geometry wrappers
> and corresponding bdrv_xxxxxx functions).
> 
> As for now only "host_device" driver received new detection methods.
> For "raw" we call childs method as usual. In future one may update
> other drivers to add some other detection heuristics.
> 
> If the host_device appears to be a DASD, the driver functions
> (hdev_probe_blocksizes and hdev_probe_geometry) will call certain
> ioctls in order to detect geometry and blocksizes of the underlying device.
> if probing failed bdrv_probe_blocksizes caller will set defaults,
> and bdrv_probe_geometry will fail to allow fallback to old detection logic.
> 
> The front-end (BlockConf API) was updated:
> 1. a new blkconf_blocksizes function was added. It doesn't
> change user-defined blocksize values. If properties are unset, it will
> set values, returned by blk_probe_backend. In order to allow this logic,
> blocksize properties were initialized with 0. (driver will return 512 if
> backing device probing didn't succeed or if driver method is not defined).
> 2. hd_geometry guess was changed to firstly try to retrieve values via
> blk_probe_geometry and if it fails, fallback to the old logic.

Thanks for the detailed cover letter, it's much more than I had in mind
when I requested a cover letter.  I'm looking for an overview of the
problem this series fixes and the rationale for the solution.  It's not
necessary to explain the individual code changes in the cover letter
since they are already covered by the patches.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/5] raw-posix: Refactor logical block size detection.
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 2/5] raw-posix: Refactor logical block size detection Ekaterina Tumanova
@ 2015-01-02 11:52   ` Stefan Hajnoczi
  2015-01-13 10:03     ` Ekaterina Tumanova
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-02 11:52 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, thuth, borntraeger, armbru, Public KVM Mailing List,
	mihajlov, dahi, stefanha, cornelia.huck, pbonzini

[-- Attachment #1: Type: text/plain, Size: 958 bytes --]

On Thu, Dec 18, 2014 at 12:18:01PM +0100, Ekaterina Tumanova wrote:
> +#if defined(BLKSSZGET)
> +#  define SECTOR_SIZE BLKSSZGET
> +#elif defined(DKIOCGETBLOCKSIZE)
> +#  define SECTOR_SIZE DKIOCGETBLOCKSIZE
> +#elif defined(DIOCGSECTORSIZE)
> +#  define SECTOR_SIZE DIOCGSECTORSIZE
> +#else
> +    return -ENOTSUP
> +#endif
> +    if (ioctl(fd, SECTOR_SIZE, sector_size) < 0) {
> +        return -errno;
> +    }
> +    return 0;
> +#undef SECTOR_SIZE

Not a reason to respin, but I would have preferred simply moving the old
code.

I think the new code works because BLKSSZGET is Linux, DKIOCGETBLOCKSIZE
is Mac OS, and DIOCGSECTORSIZE is FreeBSD.

If there is a host OS where more than one ioctl is available and the
first one fails then the new code is broken.  The old code didn't use
#elif so each ioctl had a chance to run.

Also, the name SECTOR_SIZE is misleading.  It's not a sector size value
but the "get sector size" ioctl request code.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 3/5] block: Add driver methods to probe blocksizes and geometry
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 3/5] block: Add driver methods to probe blocksizes and geometry Ekaterina Tumanova
  2014-12-18 12:43   ` Thomas Huth
@ 2015-01-02 12:11   ` Stefan Hajnoczi
  1 sibling, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-02 12:11 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, thuth, borntraeger, armbru, Public KVM Mailing List,
	mihajlov, dahi, stefanha, cornelia.huck, pbonzini

[-- Attachment #1: Type: text/plain, Size: 2161 bytes --]

On Thu, Dec 18, 2014 at 12:18:02PM +0100, Ekaterina Tumanova wrote:
> @@ -90,6 +91,10 @@
>  #include <xfs/xfs.h>
>  #endif
>  
> +#ifdef __s390__
> +#include <asm/dasd.h>
> +#endif

#if defined(__linux__) && defined(__s390__)

Or you could move it inside #ifdef __linux__ earlier in the file.

> +/**
> + * Try to get @bs's geometry (cyls, heads, sectos)
> + * On success, store them in @geo and return 0.
> + * On failure return -errno.
> + * (as for 12/2014 only DASDs are supported)

This comment isn't useful because it's apparent from the code and it
could get outdated.  It might be better to remove it and instead
document that this function allows the block driver to assign default
geometry values that the guest sees.

> + */
> +static int hdev_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    struct hd_geometry ioctl_geo = {0};
> +    uint32_t blksize;
> +
> +    /* If DASD, get it's geometry */
> +    if (check_for_dasd(s->fd) < 0) {
> +        return -ENOTSUP;
> +    }
> +    if (ioctl(s->fd, HDIO_GETGEO, &ioctl_geo) < 0) {

This is a Linux ioctl, won't this break compilation on non-Linux hosts?

> +        return -errno;
> +    }
> +    /* HDIO_GETGEO may return success even though geo contains zeros
> +       (e.g. certain multipath setups) */
> +    if (!ioctl_geo.heads || !ioctl_geo.sectors || !ioctl_geo.cylinders) {
> +        return -ENOTSUP;
> +    }
> +    /* Do not return a geometry for partition */
> +    if (ioctl_geo.start != 0) {
> +        return -ENOTSUP;
> +    }
> +    geo->heads = ioctl_geo.heads;
> +    geo->sectors = ioctl_geo.sectors;
> +    if (!probe_physical_blocksize(s->fd, &blksize)) {
> +        /* overwrite cyls: HDIO_GETGEO result is incorrect for big drives */
> +        geo->cylinders = bs->total_sectors / (blksize / BDRV_SECTOR_SIZE)
> +                                           / (geo->heads * geo->sectors);

Please use bdrv_nb_sectors(bs) instead of bs->total_sectors.
bs->total_sectors is a cached value that is basically private to
block.c and should not be accessed directly.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/5] block-backend: Add wrappers for blocksizes and geometry probing
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 4/5] block-backend: Add wrappers for blocksizes and geometry probing Ekaterina Tumanova
  2014-12-18 14:45   ` Thomas Huth
@ 2015-01-02 12:34   ` Stefan Hajnoczi
  1 sibling, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-02 12:34 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, thuth, borntraeger, armbru, Public KVM Mailing List,
	mihajlov, dahi, stefanha, cornelia.huck, pbonzini

[-- Attachment #1: Type: text/plain, Size: 374 bytes --]

On Thu, Dec 18, 2014 at 12:18:03PM +0100, Ekaterina Tumanova wrote:
> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/block-backend.c          | 10 ++++++++++
>  include/sysemu/block-backend.h |  2 ++
>  2 files changed, 12 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 5/5] BlockConf: Call backend functions to detect geometry and blocksizes
  2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 5/5] BlockConf: Call backend functions to detect geometry and blocksizes Ekaterina Tumanova
  2014-12-18 14:55   ` Thomas Huth
@ 2015-01-02 12:46   ` Stefan Hajnoczi
  1 sibling, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-02 12:46 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, thuth, borntraeger, armbru, Public KVM Mailing List,
	mihajlov, dahi, stefanha, cornelia.huck, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1870 bytes --]

On Thu, Dec 18, 2014 at 12:18:04PM +0100, Ekaterina Tumanova wrote:
> +void blkconf_blocksizes(BlockConf *conf)
> +{
> +    BlockBackend *blk = conf->blk;
> +    BlockSizes blocksizes;
> +
> +    blk_probe_blocksizes(blk, &blocksizes);
> +    /* fill in detected values if they are not defined via qemu command line */
> +    if (!conf->physical_block_size) {
> +        conf->physical_block_size = blocksizes.phys;
> +    }
> +    if (!conf->logical_block_size) {
> +        conf->logical_block_size = blocksizes.log;
> +    }
> +}

Please put the 512-byte default value in this function instead of in
blk_probe_blocksizes().  That way the .bdrv_probe_blocksizes() error
return is not discarded.

It also makes .bdrv_probe_blocksizes() and .bdrv_probe_geometry() APIs
consistent with each other.  Probing geometry doesn't set default
geometry either when then block driver does not support probing.

> diff --git a/include/hw/block/block.h b/include/hw/block/block.h
> index 0d0ce9a..3e502a8 100644
> --- a/include/hw/block/block.h
> +++ b/include/hw/block/block.h
> @@ -44,9 +44,9 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
>  #define DEFINE_BLOCK_PROPERTIES(_state, _conf)                          \
>      DEFINE_PROP_DRIVE("drive", _state, _conf.blk),                      \
>      DEFINE_PROP_BLOCKSIZE("logical_block_size", _state,                 \
> -                          _conf.logical_block_size, 512),               \
> +                          _conf.logical_block_size, 0),               \
>      DEFINE_PROP_BLOCKSIZE("physical_block_size", _state,                \
> -                          _conf.physical_block_size, 512),              \
> +                          _conf.physical_block_size, 0),              \

Please drop the default parameter from the DEFINE_PROP_BLOCKSIZE() macro
since it must always be 0 now.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2014-12-18 15:59 ` [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Christian Borntraeger
@ 2015-01-02 12:57   ` Stefan Hajnoczi
  2015-01-13  8:32     ` Christian Borntraeger
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-02 12:57 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: kwolf, thuth, Public KVM Mailing List, Ekaterina Tumanova,
	armbru, mihajlov, dahi, stefanha, cornelia.huck, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1096 bytes --]

On Thu, Dec 18, 2014 at 04:59:50PM +0100, Christian Borntraeger wrote:
> Are you ok with the patches? If yes, can you take care of these patches in the block tree?

This series looks close, I've left comments on the patches.

The series is fine for command-line QEMU users where probing makes the
command-line more convenient, so we can merge it.  But the approach is
fundamentally wrong for stacks where libvirt is in use.

Libvirt is unaware of the guest geometry and block sizes that are probed
in QEMU by this patch series.  This breaks non-shared storage migration
and also means libvirt-based tools that manipulate drives on a guest may
inadvertently change the guest-visible geometry and cause disk problems.

For example, what happens when you copy the disk image off a host DASD
and onto NFS?  QEMU no longer probes the geometry and the disk geometry
has changed.

The right place to tackle guest-visible geometry is in libvirt, not in
QEMU, because it is guest state the needs to be captured in domain XML
so that migration and tooling can preserve it when manipulating guests.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2015-01-02 12:57   ` Stefan Hajnoczi
@ 2015-01-13  8:32     ` Christian Borntraeger
  2015-01-13 10:51       ` Markus Armbruster
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Borntraeger @ 2015-01-13  8:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, thuth, Public KVM Mailing List, Ekaterina Tumanova,
	armbru, mihajlov, dahi, stefanha, cornelia.huck, pbonzini

Am 02.01.2015 um 13:57 schrieb Stefan Hajnoczi:
> On Thu, Dec 18, 2014 at 04:59:50PM +0100, Christian Borntraeger wrote:
>> Are you ok with the patches? If yes, can you take care of these patches in the block tree?
> 
> This series looks close, I've left comments on the patches.

OK, so you would take care of the series in your tree if the next spin addresses your comments, right?
> 
> The series is fine for command-line QEMU users where probing makes the
> command-line more convenient, so we can merge it.  But the approach is
> fundamentally wrong for stacks where libvirt is in use.
> Libvirt is unaware of the guest geometry and block sizes that are probed
> in QEMU by this patch series.  This breaks non-shared storage migration
> and also means libvirt-based tools that manipulate drives on a guest may
> inadvertently change the guest-visible geometry and cause disk problems.
> 
> For example, what happens when you copy the disk image off a host DASD
> and onto NFS?  QEMU no longer probes the geometry and the disk geometry
> has changed.
> 
> The right place to tackle guest-visible geometry is in libvirt, not in
> QEMU, because it is guest state the needs to be captured in domain XML
> so that migration and tooling can preserve it when manipulating guests.
> 
> Stefan
> 

I agree that this is not perfect and has obvious holes, but  it works 
just fine with libvirt stacks that are typical on s390 (*). scsi disks
(emulated and real) are not affected and work just fine. DASD disks are
special anyway - please consider this as some kind of real HW pass-through
even when we use virtio-blk. We can assume that admins can provide
shared access between source and target. If you look at the real HW (LPAR
and z/VM) DASDs are always attached via fibre channel (FICON protocol)
(often with SAN switches) so shared access is quite common even over longer
distances.

And yes, using NFS or image file will break unless the user specifies the
geometry in libvirt. Setting these values is possible as of today in libvirts
XML. But what programmatic way should libvirt use to detect that information 
itself? On an image file libvirt doesnt know either. This would be somewhat
possible on an upper level like open stack and that upper level would then
need to talk with the DS8000 storage subsystems and the system z hardware but
even then it would fail when creating new DASD like images.  (This would boil
down to provide an interface like "create an image file that looks like as
DASD of type 3390/xx formatted with dasdfmt and the following parameters).


If we talk about cloud/openstack etc we do not have pass through devices anyway
and only image files. If somebody asks me, I would create all image files as 
SCSI images anyway, all the DASD stuff makes sense if you have DASD hardware 
(or if you really know what you are going to do).

What is your opinion on 
a) migrating disk properties like geometry
b) comparing the detected disk properties and reject migration if they
   dont match?
Both changes  should provide a safety net and could be added at a later
point in time.

I hope that clarifies some of the aspects and why I think that this patch series would be "good enough" for most cases. Makes sense?

Christian

PS: Proper DASD support might require a DASD emulation and/or passthrough
without virtio-blk. There are some very special operations (like low-level
format, raw-track access, reserve/release) which are pretty hard to 
virtualize with virtio-blk.

(*) Well, thats my guess,  as there are not many stacks on s390 yet in production 
as far as I know

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2015-01-02 11:30 ` Stefan Hajnoczi
@ 2015-01-13  9:59   ` Ekaterina Tumanova
  0 siblings, 0 replies; 25+ messages in thread
From: Ekaterina Tumanova @ 2015-01-13  9:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, thuth, dahi, armbru, Public KVM Mailing List, mihajlov,
	borntraeger, stefanha, cornelia.huck, pbonzini

On 01/02/2015 02:30 PM, Stefan Hajnoczi wrote:
> On Thu, Dec 18, 2014 at 12:17:59PM +0100, Ekaterina Tumanova wrote:
>> Updates v4 -> v5:
>>
>> Minor Updates according the last review from Markus:
>> 1. update commit message for patch 2
>> 2. fix comment typos
>> 3. fix check_for_dasd to return -1 instead of -ENOTSUP
>>
>> Thanks,
>> Kate.
>>
>> ---------------------------------------------------------------
>>     Patchset Description (didn't change):
>>
>> Proper geometry and blocksize information is vital for support of
>> DASD/ECKD drives in Linux guests. Otherwise things will fail in
>> certain cases.
>>
>> The existing geometry and blocksize qemu defaults have no sense
>> for DASD drives (hd_geometry_guess detection and 512 for sizes).
>> Setting this information manually in XML file is far from user-friendly,
>> as the admin has to manually look up the properties of the
>> host disks and then modify the guest definition accordingly.
>
> What is the expected behavior when you dd the disk image from the DASD
> to a raw image file on an NFS file system on the host?
>
> It seems you'd lose the geometry detection and the disk image could be
> unusable from inside the guest.
>

This patch set doesn't change the handling of the images, dd'ed from
DASDs. Please refer to Christian's reply for details.

Thanks a lot for you review!

Kate.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/5] raw-posix: Refactor logical block size detection.
  2015-01-02 11:52   ` Stefan Hajnoczi
@ 2015-01-13 10:03     ` Ekaterina Tumanova
  2015-01-13 15:24       ` Stefan Hajnoczi
  0 siblings, 1 reply; 25+ messages in thread
From: Ekaterina Tumanova @ 2015-01-13 10:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, thuth, dahi, armbru, Public KVM Mailing List, mihajlov,
	borntraeger, stefanha, cornelia.huck, pbonzini

On 01/02/2015 02:52 PM, Stefan Hajnoczi wrote:
> On Thu, Dec 18, 2014 at 12:18:01PM +0100, Ekaterina Tumanova wrote:
>> +#if defined(BLKSSZGET)
>> +#  define SECTOR_SIZE BLKSSZGET
>> +#elif defined(DKIOCGETBLOCKSIZE)
>> +#  define SECTOR_SIZE DKIOCGETBLOCKSIZE
>> +#elif defined(DIOCGSECTORSIZE)
>> +#  define SECTOR_SIZE DIOCGSECTORSIZE
>> +#else
>> +    return -ENOTSUP
>> +#endif
>> +    if (ioctl(fd, SECTOR_SIZE, sector_size) < 0) {
>> +        return -errno;
>> +    }
>> +    return 0;
>> +#undef SECTOR_SIZE
>
> Not a reason to respin, but I would have preferred simply moving the old
> code.
>
> I think the new code works because BLKSSZGET is Linux, DKIOCGETBLOCKSIZE
> is Mac OS, and DIOCGSECTORSIZE is FreeBSD.
>
> If there is a host OS where more than one ioctl is available and the
> first one fails then the new code is broken.  The old code didn't use
> #elif so each ioctl had a chance to run.
>

In this case, why should it have a chance to run, if we only use one
result at a time? (Old code overwrites first result with the second)

Plus as far as I understand, in this hypothetical case of 2 ioctls
defined, one will most probably will be a redefinition of another.

p.s. ok to all further comments to patches.

Thanks!
Kate.


> Also, the name SECTOR_SIZE is misleading.  It's not a sector size value
> but the "get sector size" ioctl request code.
>
> Stefan
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2015-01-13  8:32     ` Christian Borntraeger
@ 2015-01-13 10:51       ` Markus Armbruster
  2015-01-13 16:04         ` Stefan Hajnoczi
  0 siblings, 1 reply; 25+ messages in thread
From: Markus Armbruster @ 2015-01-13 10:51 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: kwolf, thuth, Stefan Hajnoczi, Ekaterina Tumanova,
	Public KVM Mailing List, mihajlov, dahi, stefanha, cornelia.huck,
	pbonzini

Christian Borntraeger <borntraeger@de.ibm.com> writes:

> Am 02.01.2015 um 13:57 schrieb Stefan Hajnoczi:
>> On Thu, Dec 18, 2014 at 04:59:50PM +0100, Christian Borntraeger wrote:
>>> Are you ok with the patches? If yes, can you take care of these
>>> patches in the block tree?
>> 
>> This series looks close, I've left comments on the patches.
>
> OK, so you would take care of the series in your tree if the next spin
> addresses your comments, right?
>> 
>> The series is fine for command-line QEMU users where probing makes the
>> command-line more convenient, so we can merge it.  But the approach is
>> fundamentally wrong for stacks where libvirt is in use.
>> Libvirt is unaware of the guest geometry and block sizes that are probed
>> in QEMU by this patch series.  This breaks non-shared storage migration
>> and also means libvirt-based tools that manipulate drives on a guest may
>> inadvertently change the guest-visible geometry and cause disk problems.
>> 
>> For example, what happens when you copy the disk image off a host DASD
>> and onto NFS?  QEMU no longer probes the geometry and the disk geometry
>> has changed.
>> 
>> The right place to tackle guest-visible geometry is in libvirt, not in
>> QEMU, because it is guest state the needs to be captured in domain XML
>> so that migration and tooling can preserve it when manipulating guests.
>> 
>> Stefan
>> 
>
> I agree that this is not perfect and has obvious holes, but  it works 
> just fine with libvirt stacks that are typical on s390 (*). scsi disks
> (emulated and real) are not affected and work just fine. DASD disks are
> special anyway - please consider this as some kind of real HW pass-through
> even when we use virtio-blk. We can assume that admins can provide
> shared access between source and target. If you look at the real HW (LPAR
> and z/VM) DASDs are always attached via fibre channel (FICON protocol)
> (often with SAN switches) so shared access is quite common even over longer
> distances.

I wouldn't quite call it pass-through, but I agree that DASDs are
special.

> And yes, using NFS or image file will break unless the user specifies the
> geometry in libvirt. Setting these values is possible as of today in libvirts
> XML. But what programmatic way should libvirt use to detect that information 
> itself? On an image file libvirt doesnt know either. This would be somewhat
> possible on an upper level like open stack and that upper level would then
> need to talk with the DS8000 storage subsystems and the system z hardware but
> even then it would fail when creating new DASD like images.  (This would boil
> down to provide an interface like "create an image file that looks like as
> DASD of type 3390/xx formatted with dasdfmt and the following parameters).

Disk geometry is really a device model property.  For historical
reasons, we conjure up default geometries in the block layer.  For
convenience, we special-case DASDs (this series).

Very few guests care about device geometry.

If your guest does, relying on default geometry has always exposed you
to the risk of an unwanted geometry change.  So far largely theoretical,
as the default geometry guessing algorithm has been unlikely to change
incompatibily.  To eliminate the risk, specify the geometry explicitly.
With libvirt, that means putting it into domain XML.

This series changes *does* change the guessing algorithm, but for DASDs
only.  Thus, the risk becomes real, but for users of DASDs only.  Since
said users of DASDs ask for it...  let them have rope, I say.

Libvirt could default the geometry to whatever QEMU comes up with, then
fix it forever in domain XML.  Doubtful whether it's worth the bother,
as very few guests care, and their users are probably (painfully) aware
of geometry pitfalls anyway.

> If we talk about cloud/openstack etc we do not have pass through devices anyway
> and only image files. If somebody asks me, I would create all image files as 
> SCSI images anyway, all the DASD stuff makes sense if you have DASD hardware 
> (or if you really know what you are going to do).
>
> What is your opinion on 
> a) migrating disk properties like geometry
> b) comparing the detected disk properties and reject migration if they
>    dont match?
> Both changes  should provide a safety net and could be added at a later
> point in time.

If I remember correctly I suggested to investigate in that direction,
but then we concluded it wasn't worth the bother.

> I hope that clarifies some of the aspects and why I think that this
> patch series would be "good enough" for most cases. Makes sense?

My R-by stands.

> Christian
>
> PS: Proper DASD support might require a DASD emulation and/or passthrough
> without virtio-blk. There are some very special operations (like low-level
> format, raw-track access, reserve/release) which are pretty hard to 
> virtualize with virtio-blk.

That's what I'd call pass-through.

> (*) Well, thats my guess, as there are not many stacks on s390 yet in
> production
> as far as I know

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/5] raw-posix: Refactor logical block size detection.
  2015-01-13 10:03     ` Ekaterina Tumanova
@ 2015-01-13 15:24       ` Stefan Hajnoczi
  0 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-13 15:24 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: kwolf, thuth, dahi, Stefan Hajnoczi, Public KVM Mailing List,
	armbru, borntraeger, cornelia.huck, pbonzini, mihajlov

[-- Attachment #1: Type: text/plain, Size: 1746 bytes --]

On Tue, Jan 13, 2015 at 01:03:03PM +0300, Ekaterina Tumanova wrote:
> On 01/02/2015 02:52 PM, Stefan Hajnoczi wrote:
> >On Thu, Dec 18, 2014 at 12:18:01PM +0100, Ekaterina Tumanova wrote:
> >>+#if defined(BLKSSZGET)
> >>+#  define SECTOR_SIZE BLKSSZGET
> >>+#elif defined(DKIOCGETBLOCKSIZE)
> >>+#  define SECTOR_SIZE DKIOCGETBLOCKSIZE
> >>+#elif defined(DIOCGSECTORSIZE)
> >>+#  define SECTOR_SIZE DIOCGSECTORSIZE
> >>+#else
> >>+    return -ENOTSUP
> >>+#endif
> >>+    if (ioctl(fd, SECTOR_SIZE, sector_size) < 0) {
> >>+        return -errno;
> >>+    }
> >>+    return 0;
> >>+#undef SECTOR_SIZE
> >
> >Not a reason to respin, but I would have preferred simply moving the old
> >code.
> >
> >I think the new code works because BLKSSZGET is Linux, DKIOCGETBLOCKSIZE
> >is Mac OS, and DIOCGSECTORSIZE is FreeBSD.
> >
> >If there is a host OS where more than one ioctl is available and the
> >first one fails then the new code is broken.  The old code didn't use
> >#elif so each ioctl had a chance to run.
> >
> 
> In this case, why should it have a chance to run, if we only use one
> result at a time? (Old code overwrites first result with the second)
> 
> Plus as far as I understand, in this hypothetical case of 2 ioctls
> defined, one will most probably will be a redefinition of another.

As code reviewer, I don't know exactly what all supported host OSes do
(Linux, *BSD, Solaris, AIX, etc).

If you leave the control flow unchanged then I'm confident that this
patch doesn't introduce a bug.

If you rewrite the control flow, then the semantics are different but I
can't verify that they are correct in all cases.

Spurious code changes make the life of code reviewers harder.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2015-01-13 10:51       ` Markus Armbruster
@ 2015-01-13 16:04         ` Stefan Hajnoczi
  2015-01-13 17:27           ` Markus Armbruster
  2015-01-13 19:07           ` Christian Borntraeger
  0 siblings, 2 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-13 16:04 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: kwolf, thuth, Stefan Hajnoczi, Ekaterina Tumanova,
	Public KVM Mailing List, Markus Armbruster, dahi, cornelia.huck,
	pbonzini, mihajlov

[-- Attachment #1: Type: text/plain, Size: 6340 bytes --]

On Tue, Jan 13, 2015 at 11:51:51AM +0100, Markus Armbruster wrote:
> Christian Borntraeger <borntraeger@de.ibm.com> writes:
> 
> > Am 02.01.2015 um 13:57 schrieb Stefan Hajnoczi:
> >> On Thu, Dec 18, 2014 at 04:59:50PM +0100, Christian Borntraeger wrote:
> >>> Are you ok with the patches? If yes, can you take care of these
> >>> patches in the block tree?
> >> 
> >> This series looks close, I've left comments on the patches.
> >
> > OK, so you would take care of the series in your tree if the next spin
> > addresses your comments, right?
> >> 
> >> The series is fine for command-line QEMU users where probing makes the
> >> command-line more convenient, so we can merge it.  But the approach is
> >> fundamentally wrong for stacks where libvirt is in use.
> >> Libvirt is unaware of the guest geometry and block sizes that are probed
> >> in QEMU by this patch series.  This breaks non-shared storage migration
> >> and also means libvirt-based tools that manipulate drives on a guest may
> >> inadvertently change the guest-visible geometry and cause disk problems.
> >> 
> >> For example, what happens when you copy the disk image off a host DASD
> >> and onto NFS?  QEMU no longer probes the geometry and the disk geometry
> >> has changed.
> >> 
> >> The right place to tackle guest-visible geometry is in libvirt, not in
> >> QEMU, because it is guest state the needs to be captured in domain XML
> >> so that migration and tooling can preserve it when manipulating guests.
> >> 
> >> Stefan
> >> 
> >
> > I agree that this is not perfect and has obvious holes, but  it works 
> > just fine with libvirt stacks that are typical on s390 (*). scsi disks
> > (emulated and real) are not affected and work just fine. DASD disks are
> > special anyway - please consider this as some kind of real HW pass-through
> > even when we use virtio-blk. We can assume that admins can provide
> > shared access between source and target. If you look at the real HW (LPAR
> > and z/VM) DASDs are always attached via fibre channel (FICON protocol)
> > (often with SAN switches) so shared access is quite common even over longer
> > distances.
> 
> I wouldn't quite call it pass-through, but I agree that DASDs are
> special.
> 
> > And yes, using NFS or image file will break unless the user specifies the
> > geometry in libvirt. Setting these values is possible as of today in libvirts
> > XML. But what programmatic way should libvirt use to detect that information 
> > itself? On an image file libvirt doesnt know either. This would be somewhat
> > possible on an upper level like open stack and that upper level would then
> > need to talk with the DS8000 storage subsystems and the system z hardware but
> > even then it would fail when creating new DASD like images.  (This would boil
> > down to provide an interface like "create an image file that looks like as
> > DASD of type 3390/xx formatted with dasdfmt and the following parameters).
> 
> Disk geometry is really a device model property.  For historical
> reasons, we conjure up default geometries in the block layer.  For
> convenience, we special-case DASDs (this series).
> 
> Very few guests care about device geometry.
> 
> If your guest does, relying on default geometry has always exposed you
> to the risk of an unwanted geometry change.  So far largely theoretical,
> as the default geometry guessing algorithm has been unlikely to change
> incompatibily.  To eliminate the risk, specify the geometry explicitly.
> With libvirt, that means putting it into domain XML.
> 
> This series changes *does* change the guessing algorithm, but for DASDs
> only.  Thus, the risk becomes real, but for users of DASDs only.  Since
> said users of DASDs ask for it...  let them have rope, I say.
> 
> Libvirt could default the geometry to whatever QEMU comes up with, then
> fix it forever in domain XML.  Doubtful whether it's worth the bother,
> as very few guests care, and their users are probably (painfully) aware
> of geometry pitfalls anyway.
> 
> > If we talk about cloud/openstack etc we do not have pass through devices anyway
> > and only image files. If somebody asks me, I would create all image files as 
> > SCSI images anyway, all the DASD stuff makes sense if you have DASD hardware 
> > (or if you really know what you are going to do).
> >
> > What is your opinion on 
> > a) migrating disk properties like geometry
> > b) comparing the detected disk properties and reject migration if they
> >    dont match?
> > Both changes  should provide a safety net and could be added at a later
> > point in time.
> 
> If I remember correctly I suggested to investigate in that direction,
> but then we concluded it wasn't worth the bother.

Migrating disk geometry violates how live migration works:

1. Run-time device state like device registers and memory contents are
   live-migrated by QEMU.

2. Machine configuration like the list of emulated devices and
   read-only properties on those devices are not live migrated by QEMU.

Disk geometry is a property of the emulated storage device, so it is not
live migrated.

Why is migrating disk geometry incorrect?  Because if you live migrate
it in QEMU there is no way to persist it on the destination host.  You
will be unable to cold boot the guest on the destination because you
lose the machine configuration when the guest shuts down.

The machine configuration is persisted by the management tool or
manually by the user.  That is where the disk geometry needs to go
during migration.


I'm really starting to get worried that you are going to break things.
This DASD hack is a layering violation but okay, go ahead if you want.
But now you are also thinking about breaking live migration.

The proper thing to do is to introduce libvirt XML syntax for DASD.
That way the geometry can be handled as part of the machine
configuration.  Then live migration and storage management tools can do
the right thing.

I've said this should be done in libvirt repeatedly but you keep wanting
to hack QEMU instead of doing this cleanly :(.

If you have plans to expand on this hack, please scrap this series and
introduce libvirt XML syntax instead.

Does what I'm saying make sense?

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2015-01-13 16:04         ` Stefan Hajnoczi
@ 2015-01-13 17:27           ` Markus Armbruster
  2015-01-13 19:07           ` Christian Borntraeger
  1 sibling, 0 replies; 25+ messages in thread
From: Markus Armbruster @ 2015-01-13 17:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, thuth, dahi, Stefan Hajnoczi, Ekaterina Tumanova,
	Public KVM Mailing List, mihajlov, Christian Borntraeger,
	cornelia.huck, pbonzini

Stefan Hajnoczi <stefanha@redhat.com> writes:

> On Tue, Jan 13, 2015 at 11:51:51AM +0100, Markus Armbruster wrote:
>> Christian Borntraeger <borntraeger@de.ibm.com> writes:
>> 
>> > Am 02.01.2015 um 13:57 schrieb Stefan Hajnoczi:
>> >> On Thu, Dec 18, 2014 at 04:59:50PM +0100, Christian Borntraeger wrote:
>> >>> Are you ok with the patches? If yes, can you take care of these
>> >>> patches in the block tree?
>> >> 
>> >> This series looks close, I've left comments on the patches.
>> >
>> > OK, so you would take care of the series in your tree if the next spin
>> > addresses your comments, right?
>> >> 
>> >> The series is fine for command-line QEMU users where probing makes the
>> >> command-line more convenient, so we can merge it.  But the approach is
>> >> fundamentally wrong for stacks where libvirt is in use.
>> >> Libvirt is unaware of the guest geometry and block sizes that are probed
>> >> in QEMU by this patch series.  This breaks non-shared storage migration
>> >> and also means libvirt-based tools that manipulate drives on a guest may
>> >> inadvertently change the guest-visible geometry and cause disk problems.
>> >> 
>> >> For example, what happens when you copy the disk image off a host DASD
>> >> and onto NFS?  QEMU no longer probes the geometry and the disk geometry
>> >> has changed.
>> >> 
>> >> The right place to tackle guest-visible geometry is in libvirt, not in
>> >> QEMU, because it is guest state the needs to be captured in domain XML
>> >> so that migration and tooling can preserve it when manipulating guests.
>> >> 
>> >> Stefan
>> >> 
>> >
>> > I agree that this is not perfect and has obvious holes, but  it works 
>> > just fine with libvirt stacks that are typical on s390 (*). scsi disks
>> > (emulated and real) are not affected and work just fine. DASD disks are
>> > special anyway - please consider this as some kind of real HW pass-through
>> > even when we use virtio-blk. We can assume that admins can provide
>> > shared access between source and target. If you look at the real HW (LPAR
>> > and z/VM) DASDs are always attached via fibre channel (FICON protocol)
>> > (often with SAN switches) so shared access is quite common even over longer
>> > distances.
>> 
>> I wouldn't quite call it pass-through, but I agree that DASDs are
>> special.
>> 
>> > And yes, using NFS or image file will break unless the user specifies the
>> > geometry in libvirt. Setting these values is possible as of today
>> > in libvirts
>> > XML. But what programmatic way should libvirt use to detect that
>> > information
>> > itself? On an image file libvirt doesnt know either. This would be somewhat
>> > possible on an upper level like open stack and that upper level would then
>> > need to talk with the DS8000 storage subsystems and the system z
>> > hardware but
>> > even then it would fail when creating new DASD like images.  (This
>> > would boil
>> > down to provide an interface like "create an image file that looks like as
>> > DASD of type 3390/xx formatted with dasdfmt and the following parameters).
>> 
>> Disk geometry is really a device model property.  For historical
>> reasons, we conjure up default geometries in the block layer.  For
>> convenience, we special-case DASDs (this series).
>> 
>> Very few guests care about device geometry.
>> 
>> If your guest does, relying on default geometry has always exposed you
>> to the risk of an unwanted geometry change.  So far largely theoretical,
>> as the default geometry guessing algorithm has been unlikely to change
>> incompatibily.  To eliminate the risk, specify the geometry explicitly.
>> With libvirt, that means putting it into domain XML.
>> 
>> This series changes *does* change the guessing algorithm, but for DASDs
>> only.  Thus, the risk becomes real, but for users of DASDs only.  Since
>> said users of DASDs ask for it...  let them have rope, I say.
>> 
>> Libvirt could default the geometry to whatever QEMU comes up with, then
>> fix it forever in domain XML.  Doubtful whether it's worth the bother,
>> as very few guests care, and their users are probably (painfully) aware
>> of geometry pitfalls anyway.
>> 
>> > If we talk about cloud/openstack etc we do not have pass through
>> > devices anyway
>> > and only image files. If somebody asks me, I would create all
>> > image files as
>> > SCSI images anyway, all the DASD stuff makes sense if you have
>> > DASD hardware
>> > (or if you really know what you are going to do).
>> >
>> > What is your opinion on 
>> > a) migrating disk properties like geometry
>> > b) comparing the detected disk properties and reject migration if they
>> >    dont match?
>> > Both changes  should provide a safety net and could be added at a later
>> > point in time.
>> 
>> If I remember correctly I suggested to investigate in that direction,
>> but then we concluded it wasn't worth the bother.
>
> Migrating disk geometry violates how live migration works:
>
> 1. Run-time device state like device registers and memory contents are
>    live-migrated by QEMU.
>
> 2. Machine configuration like the list of emulated devices and
>    read-only properties on those devices are not live migrated by QEMU.

Correct.  Frontend configuration could be migrated, but we don't.

> Disk geometry is a property of the emulated storage device, so it is not
> live migrated.

Yes.

> Why is migrating disk geometry incorrect?  Because if you live migrate
> it in QEMU there is no way to persist it on the destination host.  You
> will be unable to cold boot the guest on the destination because you
> lose the machine configuration when the guest shuts down.
>
> The machine configuration is persisted by the management tool or
> manually by the user.  That is where the disk geometry needs to go
> during migration.

Yes, it's actually a bad idea, not just not worth the bother.

> I'm really starting to get worried that you are going to break things.
> This DASD hack is a layering violation but okay, go ahead if you want.

The DASD hack is on top of ancient and equally unclean hacks: geometry
is guessed from the MS-DOS partition table if present and sane, else
from the device size.  A blank disk obviously gets the latter.  Have the
guest write a partition table, then reboot, and you get the former.  If
it's different, and the guest cares, poof!  Fortunately, very few guests
care.

> But now you are also thinking about breaking live migration.

I suspect he's merely thinking about ways to get this series committed,
and now regrets having mentioned migration again ;)

> The proper thing to do is to introduce libvirt XML syntax for DASD.
> That way the geometry can be handled as part of the machine
> configuration.  Then live migration and storage management tools can do
> the right thing.
>
> I've said this should be done in libvirt repeatedly but you keep wanting
> to hack QEMU instead of doing this cleanly :(.

As I explained, the problem isn't new.  This series merely makes it
bigger for DASD.  If that's what DASD users want... let them have rope.

For pretty much everyone else, the problem has been and remains
insignificant.  Does geometry really change when there's no observer?
Hard to get excited about doing geometry right in libvirt...

> If you have plans to expand on this hack, please scrap this series and
> introduce libvirt XML syntax instead.

I wouldn't object to "instead", but I wouldn't object to "in addition
to", either.

> Does what I'm saying make sense?

I don't think you're wrong, we're mostly weighing pros and cons
differently.  But you're the maintainer, so it's ultimately your call.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2015-01-13 16:04         ` Stefan Hajnoczi
  2015-01-13 17:27           ` Markus Armbruster
@ 2015-01-13 19:07           ` Christian Borntraeger
  2015-01-14 13:57             ` Stefan Hajnoczi
  1 sibling, 1 reply; 25+ messages in thread
From: Christian Borntraeger @ 2015-01-13 19:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, thuth, Stefan Hajnoczi, Ekaterina Tumanova,
	Public KVM Mailing List, Markus Armbruster, dahi, cornelia.huck,
	pbonzini, mihajlov

Am 13.01.2015 um 17:04 schrieb Stefan Hajnoczi:
[...]
> I'm really starting to get worried that you are going to break things.
> This DASD hack is a layering violation but okay, go ahead if you want.
> But now you are also thinking about breaking live migration.
> 
> The proper thing to do is to introduce libvirt XML syntax for DASD.
> That way the geometry can be handled as part of the machine
> configuration.  Then live migration and storage management tools can do
> the right thing.
> 
> I've said this should be done in libvirt repeatedly but you keep wanting
> to hack QEMU instead of doing this cleanly :(.
> 
> If you have plans to expand on this hack, please scrap this series and
> introduce libvirt XML syntax instead.

We had no plans to expand on this band-aid. I just tried to come up with ideas beyond this hack to make this more acceptible to you (because I though that
you want something on top). Seems that I misunderstood you, so if you  prefer
to just keep this hack and not do anything further in that direction, this is
totally fine with me.

So lets just fix up the small nits and go ahead, ok?


Christian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices.
  2015-01-13 19:07           ` Christian Borntraeger
@ 2015-01-14 13:57             ` Stefan Hajnoczi
  0 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2015-01-14 13:57 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: kwolf, thuth, Stefan Hajnoczi, Ekaterina Tumanova,
	Public KVM Mailing List, Markus Armbruster, dahi, cornelia.huck,
	pbonzini, mihajlov

[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]

On Tue, Jan 13, 2015 at 08:07:15PM +0100, Christian Borntraeger wrote:
> Am 13.01.2015 um 17:04 schrieb Stefan Hajnoczi:
> [...]
> > I'm really starting to get worried that you are going to break things.
> > This DASD hack is a layering violation but okay, go ahead if you want.
> > But now you are also thinking about breaking live migration.
> > 
> > The proper thing to do is to introduce libvirt XML syntax for DASD.
> > That way the geometry can be handled as part of the machine
> > configuration.  Then live migration and storage management tools can do
> > the right thing.
> > 
> > I've said this should be done in libvirt repeatedly but you keep wanting
> > to hack QEMU instead of doing this cleanly :(.
> > 
> > If you have plans to expand on this hack, please scrap this series and
> > introduce libvirt XML syntax instead.
> 
> We had no plans to expand on this band-aid. I just tried to come up with ideas beyond this hack to make this more acceptible to you (because I though that
> you want something on top). Seems that I misunderstood you, so if you  prefer
> to just keep this hack and not do anything further in that direction, this is
> totally fine with me.
> 
> So lets just fix up the small nits and go ahead, ok?

Yes, I'm fine with this series (modulo the review comments which require
a v6).

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-01-14 13:58 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-18 11:17 [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Ekaterina Tumanova
2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 1/5] block: add bdrv functions for geometry and blocksize Ekaterina Tumanova
2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 2/5] raw-posix: Refactor logical block size detection Ekaterina Tumanova
2015-01-02 11:52   ` Stefan Hajnoczi
2015-01-13 10:03     ` Ekaterina Tumanova
2015-01-13 15:24       ` Stefan Hajnoczi
2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 3/5] block: Add driver methods to probe blocksizes and geometry Ekaterina Tumanova
2014-12-18 12:43   ` Thomas Huth
2015-01-02 12:11   ` Stefan Hajnoczi
2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 4/5] block-backend: Add wrappers for blocksizes and geometry probing Ekaterina Tumanova
2014-12-18 14:45   ` Thomas Huth
2015-01-02 12:34   ` Stefan Hajnoczi
2014-12-18 11:18 ` [Qemu-devel] [PATCH v5 5/5] BlockConf: Call backend functions to detect geometry and blocksizes Ekaterina Tumanova
2014-12-18 14:55   ` Thomas Huth
2015-01-02 12:46   ` Stefan Hajnoczi
2014-12-18 15:59 ` [Qemu-devel] [PATCH v5 0/5] Geometry and blocksize detection for backing devices Christian Borntraeger
2015-01-02 12:57   ` Stefan Hajnoczi
2015-01-13  8:32     ` Christian Borntraeger
2015-01-13 10:51       ` Markus Armbruster
2015-01-13 16:04         ` Stefan Hajnoczi
2015-01-13 17:27           ` Markus Armbruster
2015-01-13 19:07           ` Christian Borntraeger
2015-01-14 13:57             ` Stefan Hajnoczi
2015-01-02 11:30 ` Stefan Hajnoczi
2015-01-13  9:59   ` Ekaterina Tumanova

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.