All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices
@ 2014-07-29 12:27 Ekaterina Tumanova
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x Ekaterina Tumanova
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Ekaterina Tumanova @ 2014-07-29 12:27 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: cornelia.huck, dahi, Ekaterina Tumanova, borntraeger

This patch set is based on a patch suggested by Einar Lueck
on Feb 08, 2013.

This patch set introduces:
1. s390x specific geometry detection:
Add s390 specific version of hd_geometry_guess function,
which uses HDIO_GETGEO ioctl.

2. A set of blocksize patches for autodetection of logical and
physical blocksizes. Change history:
2.1
Original blocksize patch only configured autolookup for
virtio-blk devices. There was a request from Stefan Hajnoczi
to make this architecture-independent. Now autolookup is
configured by default for all block devices.
2.2
Add driver method to probe blocksizes for "raw" and
"host_device" drivers.
(also requested by the reviewers of the original patch)

Ekaterina Tumanova (4):
  hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
  blocksize: support auto-sensing of blocksizes
  blocksize: Add driver method to get the blocksizes
  blocksize: add blkconf_blocksize call to all block devices

 block.c                   | 12 +++++++++
 block/raw-posix.c         | 69 ++++++++++++++++++++++++++++++++++-------------
 block/raw_bsd.c           | 14 ++++++++++
 hw/block/Makefile.objs    |  6 ++++-
 hw/block/block.c          | 25 +++++++++++++++++
 hw/block/hd-geometry.c    | 56 ++++++++++++++++++++++++++++++++++++++
 hw/block/nvme.c           |  1 +
 hw/block/virtio-blk.c     |  1 +
 hw/core/qdev-properties.c |  4 ++-
 hw/ide/qdev.c             |  1 +
 hw/scsi/scsi-disk.c       |  1 +
 hw/usb/dev-storage.c      |  1 +
 include/block/block.h     |  1 +
 include/block/block_int.h |  5 ++++
 include/hw/block/block.h  |  6 +++--
 15 files changed, 180 insertions(+), 23 deletions(-)

-- 
1.8.5.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
  2014-07-29 12:27 [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Ekaterina Tumanova
@ 2014-07-29 12:27 ` Ekaterina Tumanova
  2014-08-20  8:19   ` Paolo Bonzini
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 2/4] blocksize: support auto-sensing of blocksizes Ekaterina Tumanova
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Ekaterina Tumanova @ 2014-07-29 12:27 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: cornelia.huck, dahi, Ekaterina Tumanova, borntraeger

This patch extends the function hd_geometry_guess. It introduces a
target specific hook. The default implementation for this target
specific hook is empty, has therefore no effect and the existing logic
works as before.

For target-s390x, the behaviour is chosen as follows:
If no geo could be guessed via guess_disk_lchs, a new function called
guess_disk_pchs is called. The latter utilizes HDIO_GET_GEO ioctl to ask
the underlying disk for geometry.
If this is not successful (e.g. image files) geometry is derived from the
size of the disk (as before).

The new HDIO_GETGEO logic is required for two use cases:
a) Support for geometries of Direct Attached Storage Disks (DASD)
on s390x configured as backing of virtio block devices.
b) Support for FCP attached SCSI disks that do not yet have a
partition table. Without this patch, fdisk -l on the host would
return different results then fdisk -l in the guest.

Based on 2013 patch from Einar Lueck <elelueck@de.ibm.com>

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 hw/block/Makefile.objs |  6 +++++-
 hw/block/hd-geometry.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index bf46f03..e4f6205 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += block.o cdrom.o hd-geometry.o
+common-obj-y += block.o cdrom.o
 common-obj-$(CONFIG_FDC) += fdc.o
 common-obj-$(CONFIG_SSI_M25P80) += m25p80.o
 common-obj-$(CONFIG_NAND) += nand.o
@@ -13,3 +13,7 @@ obj-$(CONFIG_SH4) += tc58128.o
 
 obj-$(CONFIG_VIRTIO) += virtio-blk.o
 obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += dataplane/
+
+# geometry is target/architecture dependent and therefore needs to be built
+# for every target platform
+obj-y += hd-geometry.o
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 6feb4f8..7988264 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -33,6 +33,10 @@
 #include "block/block.h"
 #include "hw/block/block.h"
 #include "trace.h"
+#ifdef __linux__
+#include <linux/fs.h>
+#include <linux/hdreg.h>
+#endif
 
 struct partition {
         uint8_t boot_ind;           /* 0x80 - active */
@@ -47,6 +51,37 @@ struct partition {
         uint32_t nr_sects;          /* nr of sectors in partition */
 } QEMU_PACKED;
 
+static void guess_chs_for_size(BlockDriverState *bs, uint32_t *pcyls,
+                               uint32_t *pheads, uint32_t *psecs);
+
+/* try to get geometry from disk via HDIO_GETGEO ioctl
+   Return 0 if OK, -1 if ioctl does not work (e.g. image file) */
+static inline int guess_disk_pchs(BlockDriverState *bs, uint32_t *pcylinders,
+                                  uint32_t *pheads, uint32_t *psectors)
+{
+#ifdef __linux__
+    struct hd_geometry geo;
+
+    if (bdrv_ioctl(bs, HDIO_GETGEO, &geo)) {
+        return -1;
+    }
+
+    /* HDIO_GETGEO may return success even though geo contains zeros
+       (e.g. certain multipath setups) */
+    if (!geo.heads || !geo.sectors || !geo.cylinders) {
+        return -1;
+    }
+
+    *pheads = geo.heads;
+    *psectors = geo.sectors;
+    *pcylinders = geo.cylinders;
+    return 0;
+#else
+    return -1;
+#endif
+}
+
+
 /* try to guess the disk logical geometry from the MSDOS partition table.
    Return 0 if OK, -1 if could not guess */
 static int guess_disk_lchs(BlockDriverState *bs,
@@ -116,6 +151,26 @@ static void guess_chs_for_size(BlockDriverState *bs,
     *psecs = 63;
 }
 
+#ifdef TARGET_S390X
+void hd_geometry_guess(BlockDriverState *bs, uint32_t *pcyls, uint32_t *pheads,
+                       uint32_t *psecs, int *ptrans)
+{
+    if (guess_disk_lchs(bs, (int *)pcyls, (int *)pheads, (int *)psecs) == 0) {
+        goto done;
+    }
+    if (guess_disk_pchs(bs, pcyls, pheads, psecs) == 0) {
+        goto done;
+    }
+    /* still no geometry - try to guess from disk size */
+    guess_chs_for_size(bs, pcyls, pheads, psecs);
+done:
+    if (ptrans) {
+        *ptrans = BIOS_ATA_TRANSLATION_NONE;
+    }
+    trace_hd_geometry_guess(bs, *pcyls, *pheads, *psecs,
+                            BIOS_ATA_TRANSLATION_NONE);
+}
+#else
 void hd_geometry_guess(BlockDriverState *bs,
                        uint32_t *pcyls, uint32_t *pheads, uint32_t *psecs,
                        int *ptrans)
@@ -148,6 +203,7 @@ void hd_geometry_guess(BlockDriverState *bs,
     }
     trace_hd_geometry_guess(bs, *pcyls, *pheads, *psecs, translation);
 }
+#endif
 
 int hd_bios_chs_auto_trans(uint32_t cyls, uint32_t heads, uint32_t secs)
 {
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 2/4] blocksize: support auto-sensing of blocksizes
  2014-07-29 12:27 [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Ekaterina Tumanova
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x Ekaterina Tumanova
@ 2014-07-29 12:27 ` Ekaterina Tumanova
  2014-09-03 15:37   ` Stefan Hajnoczi
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 3/4] blocksize: Add driver method to get the blocksizes Ekaterina Tumanova
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Ekaterina Tumanova @ 2014-07-29 12:27 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: cornelia.huck, dahi, Ekaterina Tumanova, borntraeger

The block device model does not impose fixed block sizes for
access to backing devices. This patch introduces support for
auto lookup of the block sizes of the backing block device.

To achieve this, a new function blkconf_blocksizes is
implemented. This function tries to get values
from the block driver. If this does not work 512 is used,
so the default excecution logic is not changed.

Based on 2013 patch from Einar Lueck <elelueck@de.ibm.com>

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 block.c                   | 12 ++++++++++++
 hw/block/block.c          | 25 +++++++++++++++++++++++++
 hw/core/qdev-properties.c |  4 +++-
 include/block/block.h     |  1 +
 include/block/block_int.h |  5 +++++
 include/hw/block/block.h  |  2 ++
 6 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 8cf519b..67de6e9 100644
--- a/block.c
+++ b/block.c
@@ -552,6 +552,18 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
     }
 }
 
+int bdrv_probe_blocksizes(BlockDriverState *bs)
+{
+    BlockDriver *drv = bs->drv;
+
+    assert(drv != NULL);
+    if (drv->bdrv_probe_blocksizes) {
+        return drv->bdrv_probe_blocksizes(bs);
+    }
+
+    return -1;
+}
+
 /*
  * Create a uniquely-named empty temporary file.
  * Return 0 upon success, otherwise a negative errno value.
diff --git a/hw/block/block.c b/hw/block/block.c
index 33dd3f3..29a0227 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -10,6 +10,10 @@
 #include "sysemu/blockdev.h"
 #include "hw/block/block.h"
 #include "qemu/error-report.h"
+#include "block/block_int.h"
+#ifdef __linux__
+#include <linux/fs.h>
+#endif
 
 void blkconf_serial(BlockConf *conf, char **serial)
 {
@@ -22,6 +26,27 @@ void blkconf_serial(BlockConf *conf, char **serial)
     }
 }
 
+void blkconf_blocksizes(BlockConf *conf)
+{
+    BlockDriverState *bs = conf->bs;
+
+    /* default values as a basis - if probing fails */
+    conf->physical_block_size = BLOCK_PROPERTY_STD_BLKSIZE;
+    conf->logical_block_size = BLOCK_PROPERTY_STD_BLKSIZE;
+    if (bdrv_probe_blocksizes(conf->bs) < 0) {
+        return;
+    }
+    if (bs->logical_block_size) {
+        conf->logical_block_size = (uint16_t)(bs->logical_block_size);
+    }
+    if (bs->physical_block_size) {
+        conf->physical_block_size = (uint16_t)(bs->physical_block_size);
+    } else if (conf->logical_block_size) {
+        /* if driver sets no physical size, try to use logical size */
+        conf->physical_block_size = conf->logical_block_size;
+    }
+}
+
 int blkconf_geometry(BlockConf *conf, int *ptrans,
                      unsigned cyls_max, unsigned heads_max, unsigned secs_max)
 {
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 3d12560..49fb1e3 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -579,7 +579,9 @@ static void set_blocksize(Object *obj, Visitor *v, void *opaque,
         error_propagate(errp, local_err);
         return;
     }
-    if (value < min || value > max) {
+
+    /* value == 0 indicates that block size should be sensed later on */
+    if ((value < min || value > max) && value > 0) {
         error_set(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
                   dev->id?:"", name, (int64_t)value, min, max);
         return;
diff --git a/include/block/block.h b/include/block/block.h
index f08471d..43af424 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -583,6 +583,7 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs);
  * the old #AioContext is not executing.
  */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
+int bdrv_probe_blocksizes(BlockDriverState *bs);
 
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7b541a0..572e954 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -266,6 +266,8 @@ struct BlockDriver {
     void (*bdrv_io_unplug)(BlockDriverState *bs);
     void (*bdrv_flush_io_queue)(BlockDriverState *bs);
 
+    int (*bdrv_probe_blocksizes)(BlockDriverState *bs);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
@@ -360,6 +362,9 @@ struct BlockDriverState {
     /* the block size for which the guest device expects atomicity */
     int guest_block_size;
 
+    unsigned int physical_block_size;
+    unsigned int logical_block_size;
+
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
 
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 7c3d6c8..7a0092e 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -40,6 +40,7 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
     return exp;
 }
 
+#define BLOCK_PROPERTY_STD_BLKSIZE 512
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)                          \
     DEFINE_PROP_DRIVE("drive", _state, _conf.bs),                       \
     DEFINE_PROP_BLOCKSIZE("logical_block_size", _state,                 \
@@ -60,6 +61,7 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
 /* Configuration helpers */
 
 void blkconf_serial(BlockConf *conf, char **serial);
+void blkconf_blocksizes(BlockConf *conf);
 int blkconf_geometry(BlockConf *conf, int *trans,
                      unsigned cyls_max, unsigned heads_max, unsigned secs_max);
 
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 3/4] blocksize: Add driver method to get the blocksizes
  2014-07-29 12:27 [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Ekaterina Tumanova
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x Ekaterina Tumanova
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 2/4] blocksize: support auto-sensing of blocksizes Ekaterina Tumanova
@ 2014-07-29 12:27 ` Ekaterina Tumanova
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices Ekaterina Tumanova
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Ekaterina Tumanova @ 2014-07-29 12:27 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: cornelia.huck, dahi, Ekaterina Tumanova, borntraeger

This patch introduces a new method of defining the physical
and logical blocksizes for "raw" and "host_device" drivers.
It uses various ioctls to determine the logical and physical
blocksizes.

For detecting the logical size it uses ioctl calls, that
were previously coded in raw_probe_alignment (now moved into
separate function probe_logical_blocksize). For detecting the
physical blocksize it uses BLKPBSZGET ioctl.

For "raw" devices driver calls the child's method.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 block/raw-posix.c | 69 ++++++++++++++++++++++++++++++++++++++++---------------
 block/raw_bsd.c   | 14 +++++++++++
 2 files changed, 64 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 8e9758e..6e0d80c 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -221,50 +221,70 @@ static int raw_normalize_devicepath(const char **filename)
 }
 #endif
 
-static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
+static unsigned int probe_logical_blocksize(BlockDriverState *bs, int fd)
 {
-    BDRVRawState *s = bs->opaque;
-    char *buf;
-    unsigned int sector_size;
-
-    /* For /dev/sg devices the alignment is not really used.
-       With buffered I/O, we don't have any restrictions. */
-    if (bs->sg || !(s->open_flags & O_DIRECT)) {
-        bs->request_alignment = 1;
-        s->buf_align = 1;
-        return;
-    }
+    unsigned int sector_size = 0;
 
     /* Try a few ioctls to get the right size */
-    bs->request_alignment = 0;
-    s->buf_align = 0;
-
 #ifdef BLKSSZGET
     if (ioctl(fd, BLKSSZGET, &sector_size) >= 0) {
-        bs->request_alignment = sector_size;
+        return sector_size;
     }
 #endif
 #ifdef DKIOCGETBLOCKSIZE
     if (ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) >= 0) {
-        bs->request_alignment = sector_size;
+        return sector_size;
     }
 #endif
 #ifdef DIOCGSECTORSIZE
     if (ioctl(fd, DIOCGSECTORSIZE, &sector_size) >= 0) {
-        bs->request_alignment = sector_size;
+        return sector_size;
     }
 #endif
 #ifdef CONFIG_XFS
     if (s->is_xfs) {
         struct dioattr da;
         if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, &da) >= 0) {
-            bs->request_alignment = da.d_miniosz;
+            sector_size = da.d_miniosz;
             /* The kernel returns wrong information for d_mem */
             /* s->buf_align = da.d_mem; */
+            return sector_size;
         }
     }
 #endif
 
+    return 0;
+}
+
+static unsigned int probe_physical_blocksize(BlockDriverState *bs, int fd)
+{
+    unsigned int blk_size = 0;
+#ifdef BLKPBSZGET
+    if (ioctl(fd, BLKPBSZGET, &blk_size) >= 0) {
+        return blk_size;
+    }
+#endif
+
+    return 0;
+}
+
+static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
+{
+    BDRVRawState *s = bs->opaque;
+    char *buf;
+
+    /* For /dev/sg devices the alignment is not really used.
+       With buffered I/O, we don't have any restrictions. */
+    if (bs->sg || !(s->open_flags & O_DIRECT)) {
+        bs->request_alignment = 1;
+        s->buf_align = 1;
+        return;
+    }
+
+    s->buf_align = 0;
+    /* Let's try to use the logical blocksize for the alignment. */
+    bs->request_alignment = probe_logical_blocksize(bs, fd);
+
     /* If we could not get the sizes so far, we can only guess them */
     if (!s->buf_align) {
         size_t align;
@@ -642,6 +662,16 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
     bs->bl.opt_mem_alignment = s->buf_align;
 }
 
+static int hdev_probe_blocksizes(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+
+    bs->logical_block_size = probe_logical_blocksize(bs, s->fd);
+    bs->physical_block_size = probe_physical_blocksize(bs, s->fd);
+
+    return 0;
+}
+
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
 {
     int ret;
@@ -2009,6 +2039,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
+    .bdrv_probe_blocksizes = hdev_probe_blocksizes,
 
     .bdrv_detach_aio_context = raw_detach_aio_context,
     .bdrv_attach_aio_context = raw_attach_aio_context,
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index f82f4c2..ae400a6 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -173,6 +173,19 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
     return 1;
 }
 
+static int raw_probe_blocksizes(BlockDriverState *bs)
+{
+    int ret;
+
+    ret = bdrv_probe_blocksizes(bs->file);
+    if (ret == 0) {
+        bs->logical_block_size = bs->file->logical_block_size;
+        bs->physical_block_size = bs->file->physical_block_size;
+    }
+
+    return ret;
+}
+
 static BlockDriver bdrv_raw = {
     .format_name          = "raw",
     .bdrv_probe           = &raw_probe,
@@ -190,6 +203,7 @@ static BlockDriver bdrv_raw = {
     .has_variable_length  = true,
     .bdrv_get_info        = &raw_get_info,
     .bdrv_refresh_limits  = &raw_refresh_limits,
+    .bdrv_probe_blocksizes = &raw_probe_blocksizes,
     .bdrv_is_inserted     = &raw_is_inserted,
     .bdrv_media_changed   = &raw_media_changed,
     .bdrv_eject           = &raw_eject,
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices
  2014-07-29 12:27 [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Ekaterina Tumanova
                   ` (2 preceding siblings ...)
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 3/4] blocksize: Add driver method to get the blocksizes Ekaterina Tumanova
@ 2014-07-29 12:27 ` Ekaterina Tumanova
  2014-09-03 15:46   ` Stefan Hajnoczi
  2014-07-29 12:37 ` [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Christian Borntraeger
  2014-11-06 15:58 ` [Qemu-devel] " Christian Borntraeger
  5 siblings, 1 reply; 23+ messages in thread
From: Ekaterina Tumanova @ 2014-07-29 12:27 UTC (permalink / raw)
  To: Public KVM Mailing List
  Cc: cornelia.huck, dahi, Ekaterina Tumanova, borntraeger

This patch add the blkconf_blocksize call to all
devices, which use DEFINE_BLOCK_PROPERTIES.
If the underlying driver function fails, blkconf_blocksizes
will set blocksizes to default (512) value.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
 hw/block/nvme.c          | 1 +
 hw/block/virtio-blk.c    | 1 +
 hw/ide/qdev.c            | 1 +
 hw/scsi/scsi-disk.c      | 1 +
 hw/usb/dev-storage.c     | 1 +
 include/hw/block/block.h | 4 ++--
 6 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5fd8f89..50fe769 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -761,6 +761,7 @@ static int nvme_init(PCIDevice *pci_dev)
     if (!n->serial) {
         return -1;
     }
+    blkconf_blocksizes(&n->conf);
 
     pci_conf = pci_dev->config;
     pci_conf[PCI_INTERRUPT_PIN] = 1;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index c241c50..b5027b1 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -743,6 +743,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     }
 
     blkconf_serial(&blk->conf, &blk->serial);
+    blkconf_blocksizes(&blk->conf);
     s->original_wce = bdrv_enable_write_cache(blk->conf.bs);
     if (blkconf_geometry(&blk->conf, NULL, 65535, 255, 255) < 0) {
         error_setg(errp, "Error setting geometry");
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 6e475e6..6d94d8f 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -161,6 +161,7 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind)
     }
 
     blkconf_serial(&dev->conf, &dev->serial);
+    blkconf_blocksizes(&dev->conf);
     if (kind != IDE_CD
         && blkconf_geometry(&dev->conf, &dev->chs_trans, 65536, 16, 255) < 0) {
         return -1;
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index d47ecd6..bfae48b 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2250,6 +2250,7 @@ static int scsi_initfn(SCSIDevice *dev)
     }
 
     blkconf_serial(&s->qdev.conf, &s->serial);
+    blkconf_blocksizes(&s->qdev.conf);
     if (dev->type == TYPE_DISK
         && blkconf_geometry(&dev->conf, NULL, 65535, 255, 255) < 0) {
         return -1;
diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index ae4efcb..cf50ac1 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -603,6 +603,7 @@ static int usb_msd_initfn_storage(USBDevice *dev)
     }
 
     blkconf_serial(&s->conf, &dev->serial);
+    blkconf_blocksizes(&s->conf);
 
     /*
      * Hack alert: this pretends to be a block device, but it's really
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 7a0092e..8560bb2 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -44,9 +44,9 @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)                          \
     DEFINE_PROP_DRIVE("drive", _state, _conf.bs),                       \
     DEFINE_PROP_BLOCKSIZE("logical_block_size", _state,                 \
-                          _conf.logical_block_size, 512),               \
+                          _conf.logical_block_size, 0),                 \
     DEFINE_PROP_BLOCKSIZE("physical_block_size", _state,                \
-                          _conf.physical_block_size, 512),              \
+                          _conf.physical_block_size, 0),                \
     DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
     DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0),    \
     DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1),        \
-- 
1.8.5.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices
  2014-07-29 12:27 [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Ekaterina Tumanova
                   ` (3 preceding siblings ...)
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices Ekaterina Tumanova
@ 2014-07-29 12:37 ` Christian Borntraeger
  2014-11-06 15:58 ` [Qemu-devel] " Christian Borntraeger
  5 siblings, 0 replies; 23+ messages in thread
From: Christian Borntraeger @ 2014-07-29 12:37 UTC (permalink / raw)
  To: Ekaterina Tumanova, Public KVM Mailing List
  Cc: cornelia.huck, dahi, Stefan Hajnoczi, Kevin Wolf

On 29/07/14 14:27, Ekaterina Tumanova wrote:
> This patch set is based on a patch suggested by Einar Lueck
> on Feb 08, 2013.
> 
> This patch set introduces:
> 1. s390x specific geometry detection:
> Add s390 specific version of hd_geometry_guess function,
> which uses HDIO_GETGEO ioctl.
> 
> 2. A set of blocksize patches for autodetection of logical and
> physical blocksizes. Change history:
> 2.1
> Original blocksize patch only configured autolookup for
> virtio-blk devices. There was a request from Stefan Hajnoczi
> to make this architecture-independent. Now autolookup is
> configured by default for all block devices.
> 2.2
> Add driver method to probe blocksizes for "raw" and
> "host_device" drivers.
> (also requested by the reviewers of the original patch)
> 
> Ekaterina Tumanova (4):
>   hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
>   blocksize: support auto-sensing of blocksizes
>   blocksize: Add driver method to get the blocksizes
>   blocksize: add blkconf_blocksize call to all block devices
> 
>  block.c                   | 12 +++++++++
>  block/raw-posix.c         | 69 ++++++++++++++++++++++++++++++++++-------------
>  block/raw_bsd.c           | 14 ++++++++++
>  hw/block/Makefile.objs    |  6 ++++-
>  hw/block/block.c          | 25 +++++++++++++++++
>  hw/block/hd-geometry.c    | 56 ++++++++++++++++++++++++++++++++++++++
>  hw/block/nvme.c           |  1 +
>  hw/block/virtio-blk.c     |  1 +
>  hw/core/qdev-properties.c |  4 ++-
>  hw/ide/qdev.c             |  1 +
>  hw/scsi/scsi-disk.c       |  1 +
>  hw/usb/dev-storage.c      |  1 +
>  include/block/block.h     |  1 +
>  include/block/block_int.h |  5 ++++
>  include/hw/block/block.h  |  6 +++--
>  15 files changed, 180 insertions(+), 23 deletions(-)
> 

CCing Kevin and Stefan.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x Ekaterina Tumanova
@ 2014-08-20  8:19   ` Paolo Bonzini
  2014-08-20  9:35     ` Christian Borntraeger
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2014-08-20  8:19 UTC (permalink / raw)
  To: Ekaterina Tumanova, Public KVM Mailing List
  Cc: cornelia.huck, dahi, borntraeger

Il 29/07/2014 14:27, Ekaterina Tumanova ha scritto:
> The new HDIO_GETGEO logic is required for two use cases:
> a) Support for geometries of Direct Attached Storage Disks (DASD)
> on s390x configured as backing of virtio block devices.

Is this still relevant now that QEMU can emulate 512-byte sectors on top
of host devices with 4k sectors?

> b) Support for FCP attached SCSI disks that do not yet have a
> partition table. Without this patch, fdisk -l on the host would
> return different results then fdisk -l in the guest.

Can you provide an example?

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
  2014-08-20  8:19   ` Paolo Bonzini
@ 2014-08-20  9:35     ` Christian Borntraeger
  2014-08-20 13:10       ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Christian Borntraeger @ 2014-08-20  9:35 UTC (permalink / raw)
  To: Paolo Bonzini, Ekaterina Tumanova, Public KVM Mailing List
  Cc: cornelia.huck, dahi

On 20/08/14 10:19, Paolo Bonzini wrote:
> Il 29/07/2014 14:27, Ekaterina Tumanova ha scritto:
>> The new HDIO_GETGEO logic is required for two use cases:
>> a) Support for geometries of Direct Attached Storage Disks (DASD)
>> on s390x configured as backing of virtio block devices.
> 
> Is this still relevant now that QEMU can emulate 512-byte sectors on top
> of host devices with 4k sectors?

Yes, the geometry and the block size define the layout of the DASD disk format. This defines how the ibm disk layout partition table looks like. So partition detection of the IBM layout only works if values are correct. (see linux block/partitions/ibm.c. it needs these values to correctly locate the data structures).

Furthermore, if you do an mkfs in the host and the guest sees a different block size all kind of strange things will happen when mounting, no?

> 
>> b) Support for FCP attached SCSI disks that do not yet have a
>> partition table. Without this patch, fdisk -l on the host would
>> return different results then fdisk -l in the guest.
> 
> Can you provide an example?

scsi disk in the host:
[root@host ~]#  sfdisk -g /dev/disk/by-id/wwn-0x6005076305ffc1ae0000000000002593
/dev/disk/by-id/wwn-0x6005076305ffc1ae0000000000002593: 1011 cylinders, 34 heads, 61 sectors/track

same scsi disk in the guest as virtio-blk:
scsi disk in the guest:
[root@guest ~]# sfdisk -g /dev/disk/by-id/virtio-d20
/dev/disk/by-id/virtio-d20: 2080 cylinders, 16 heads, 63 sectors/track

16/63 is currently hardcoded by QEMU, no matter what the host thinks. This gets overridden as soon as there is a partition table.

command line was:
-drive file=/dev/disk/by-id/scsi-36005076305ffc1ae0000000000002593,if=none,id=drive-virtio-disk20,format=raw,serial=d20,cache=none,aio=native

> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
  2014-08-20  9:35     ` Christian Borntraeger
@ 2014-08-20 13:10       ` Paolo Bonzini
  2014-08-25  8:21         ` Christian Borntraeger
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2014-08-20 13:10 UTC (permalink / raw)
  To: Christian Borntraeger, Ekaterina Tumanova, Public KVM Mailing List
  Cc: cornelia.huck, dahi

Il 20/08/2014 11:35, Christian Borntraeger ha scritto:
> On 20/08/14 10:19, Paolo Bonzini wrote:
>> Il 29/07/2014 14:27, Ekaterina Tumanova ha scritto:
>>> The new HDIO_GETGEO logic is required for two use cases:
>>> a) Support for geometries of Direct Attached Storage Disks (DASD)
>>> on s390x configured as backing of virtio block devices.
>>
>> Is this still relevant now that QEMU can emulate 512-byte sectors on top
>> of host devices with 4k sectors?
> 
> Yes, the geometry and the block size define the layout of the DASD disk format. This defines how the ibm disk layout partition table looks like. So partition detection of the IBM layout only works if values are correct. (see linux block/partitions/ibm.c. it needs these values to correctly locate the data structures).
> 
> Furthermore, if you do an mkfs in the host and the guest sees a different block size all kind of strange things will happen when mounting, no?
> 
>>
>>> b) Support for FCP attached SCSI disks that do not yet have a
>>> partition table. Without this patch, fdisk -l on the host would
>>> return different results then fdisk -l in the guest.
>>
>> Can you provide an example?
> 
> scsi disk in the host:
> [root@host ~]#  sfdisk -g /dev/disk/by-id/wwn-0x6005076305ffc1ae0000000000002593
> /dev/disk/by-id/wwn-0x6005076305ffc1ae0000000000002593: 1011 cylinders, 34 heads, 61 sectors/track
> 
> same scsi disk in the guest as virtio-blk:
> scsi disk in the guest:
> [root@guest ~]# sfdisk -g /dev/disk/by-id/virtio-d20
> /dev/disk/by-id/virtio-d20: 2080 cylinders, 16 heads, 63 sectors/track
> 
> 16/63 is currently hardcoded by QEMU, no matter what the host thinks. This gets overridden as soon as there is a partition table.
> 
> command line was:
> -drive file=/dev/disk/by-id/scsi-36005076305ffc1ae0000000000002593,if=none,id=drive-virtio-disk20,format=raw,serial=d20,cache=none,aio=native

But this risks changing whenever you move data from a disk to another
disk, or if you move a virtual DASD disk from physical DASD to physical
SCSI.

If s390 is so sensitive to the head count and number of sectors/track
(on x86 everything is done with LBAs nowadays, even in the firmware), I
think whatever management layer you use should always specify them
explicitly.

I'm not saying this patch shouldn't be included---but it should be
treated as a handy thing for developers rather than as a definitive fix.

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
  2014-08-20 13:10       ` Paolo Bonzini
@ 2014-08-25  8:21         ` Christian Borntraeger
  0 siblings, 0 replies; 23+ messages in thread
From: Christian Borntraeger @ 2014-08-25  8:21 UTC (permalink / raw)
  To: Paolo Bonzini, Ekaterina Tumanova, Public KVM Mailing List
  Cc: cornelia.huck, dahi, Stefan Hajnoczi, Kevin Wolf

On 20/08/14 15:10, Paolo Bonzini wrote:
> Il 20/08/2014 11:35, Christian Borntraeger ha scritto:
>> On 20/08/14 10:19, Paolo Bonzini wrote:
>>> Il 29/07/2014 14:27, Ekaterina Tumanova ha scritto:
>>>> The new HDIO_GETGEO logic is required for two use cases:
>>>> a) Support for geometries of Direct Attached Storage Disks (DASD)
>>>> on s390x configured as backing of virtio block devices.
>>>
>>> Is this still relevant now that QEMU can emulate 512-byte sectors on top
>>> of host devices with 4k sectors?
>>
>> Yes, the geometry and the block size define the layout of the DASD disk format. This defines how the ibm disk layout partition table looks like. So partition detection of the IBM layout only works if values are correct. (see linux block/partitions/ibm.c. it needs these values to correctly locate the data structures).
>>
>> Furthermore, if you do an mkfs in the host and the guest sees a different block size all kind of strange things will happen when mounting, no?
>>
>>>
>>>> b) Support for FCP attached SCSI disks that do not yet have a
>>>> partition table. Without this patch, fdisk -l on the host would
>>>> return different results then fdisk -l in the guest.
>>>
>>> Can you provide an example?
>>
>> scsi disk in the host:
>> [root@host ~]#  sfdisk -g /dev/disk/by-id/wwn-0x6005076305ffc1ae0000000000002593
>> /dev/disk/by-id/wwn-0x6005076305ffc1ae0000000000002593: 1011 cylinders, 34 heads, 61 sectors/track
>>
>> same scsi disk in the guest as virtio-blk:
>> scsi disk in the guest:
>> [root@guest ~]# sfdisk -g /dev/disk/by-id/virtio-d20
>> /dev/disk/by-id/virtio-d20: 2080 cylinders, 16 heads, 63 sectors/track
>>
>> 16/63 is currently hardcoded by QEMU, no matter what the host thinks. This gets overridden as soon as there is a partition table.
>>
>> command line was:
>> -drive file=/dev/disk/by-id/scsi-36005076305ffc1ae0000000000002593,if=none,id=drive-virtio-disk20,format=raw,serial=d20,cache=none,aio=native
> 
> But this risks changing whenever you move data from a disk to another
> disk, or if you move a virtual DASD disk from physical DASD to physical
> SCSI.
> 
> If s390 is so sensitive to the head count and number of sectors/track
> (on x86 everything is done with LBAs nowadays, even in the firmware), I
> think whatever management layer you use should always specify them
> explicitly.

Only the DASD disks are so sensitive. The SCSI geometry is just a cosmetic thing, but it doesnt hurt. So for anything that comes via FCP SCSI we dont have a series issue besides the cosmetic thing. (Well, unless you have a storage server with 4k scsi disks, then it also becomes an issue on x86 - already seen on Flash Systems and other storage servers).
> 
> I'm not saying this patch shouldn't be included---but it should be
> treated as a handy thing for developers rather than as a definitive fix.

Yes. I would even suggest, that for image files you better use a SCSI disk image (which then has the normal partition layout as x86). 
Of course, there are reasons to have image files to hold DASD images, but then you have to manually specify geo/ss. In fact, libvirt always supported to specify the geometry and Viktor from our team did the sector size support in libvirt with:

commit 5cc50ad7a4e139261079a5848859c84cab3b0c7c
Author:     Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
AuthorDate: Wed Aug 29 17:48:30 2012 +0200
Commit:     Eric Blake <eblake@redhat.com>
CommitDate: Fri Aug 31 11:27:27 2012 -0700

    conf: Support for Block Device IO Limits
    
    Introducing a new iolimits element allowing to override certain
    properties of a guest block device like the physical and logical
    block size.
    This can be useful for platforms with 'non-standard' disk formats
    like S390 DASD with its 4K block size.
    
    Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>


commit 277a49bce73da908c965e466b03f5fc97f04cae1
Author:     Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
AuthorDate: Wed Aug 29 17:48:31 2012 +0200
Commit:     Eric Blake <eblake@redhat.com>
CommitDate: Fri Aug 31 11:27:47 2012 -0700

    qemu: Support for Block Device IO Limits.
    
    Implementation of iolimits for the qemu driver with
    capability probing for block size attribute and
    command line generation for block sizes.
    Including testcase for qemuxml2argvtest.
    
    Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>

some time ago.

So this detection is mostly important for DASD disk passthrough as the defaults are plainly wrong for those disks. (Under LPAR and z/VM the disks work out of the box, so a proper detection really helps).
As far as I can see the patches only affect disks, that are "raw" or "host_device". So it would be good, if Kevin and Stefan could take a look at the overall design. If they/you come up with a totally different approach, that is also fine.
Kate was away last week, so hopefully she can continue to drive the discussion with you folks, when she is back.

Christian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] blocksize: support auto-sensing of blocksizes
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 2/4] blocksize: support auto-sensing of blocksizes Ekaterina Tumanova
@ 2014-09-03 15:37   ` Stefan Hajnoczi
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2014-09-03 15:37 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: cornelia.huck, dahi, Public KVM Mailing List, borntraeger

[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]

On Tue, Jul 29, 2014 at 02:27:17PM +0200, Ekaterina Tumanova wrote:
> The block device model does not impose fixed block sizes for
> access to backing devices. This patch introduces support for
> auto lookup of the block sizes of the backing block device.
> 
> To achieve this, a new function blkconf_blocksizes is
> implemented. This function tries to get values
> from the block driver. If this does not work 512 is used,
> so the default excecution logic is not changed.
> 
> Based on 2013 patch from Einar Lueck <elelueck@de.ibm.com>
> 
> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> ---
>  block.c                   | 12 ++++++++++++
>  hw/block/block.c          | 25 +++++++++++++++++++++++++
>  hw/core/qdev-properties.c |  4 +++-
>  include/block/block.h     |  1 +
>  include/block/block_int.h |  5 +++++
>  include/hw/block/block.h  |  2 ++
>  6 files changed, 48 insertions(+), 1 deletion(-)

This should be implemented as part of BlockLimits and
bdrv_refresh_limits().

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices
  2014-07-29 12:27 ` [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices Ekaterina Tumanova
@ 2014-09-03 15:46   ` Stefan Hajnoczi
  2014-09-04 10:28     ` Ekaterina Tumanova
  2014-09-04 14:15     ` Christian Borntraeger
  0 siblings, 2 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2014-09-03 15:46 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: cornelia.huck, dahi, Public KVM Mailing List, borntraeger

[-- Attachment #1: Type: text/plain, Size: 1358 bytes --]

On Tue, Jul 29, 2014 at 02:27:19PM +0200, Ekaterina Tumanova wrote:
> This patch add the blkconf_blocksize call to all
> devices, which use DEFINE_BLOCK_PROPERTIES.
> If the underlying driver function fails, blkconf_blocksizes
> will set blocksizes to default (512) value.
> 
> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> ---
>  hw/block/nvme.c          | 1 +
>  hw/block/virtio-blk.c    | 1 +
>  hw/ide/qdev.c            | 1 +
>  hw/scsi/scsi-disk.c      | 1 +
>  hw/usb/dev-storage.c     | 1 +
>  include/hw/block/block.h | 4 ++--
>  6 files changed, 7 insertions(+), 2 deletions(-)

Wasn't this NACKed before on the grounds that it is likely to upset the
guest after live migration?  QEMU doesn't automatically query the
storage because these parameters must be preserved across migration.

The knowledge of these fields belongs in the management tool that
orchestrates migration, not QEMU.

If you want specific parameters, please put them in your guest
configuration.  QEMU and libvirt support that.

I'm concerned that this patch serious is likely to break things and
autodetection doesn't add much value since the management tool needs to
be aware of this information anyway.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices
  2014-09-03 15:46   ` Stefan Hajnoczi
@ 2014-09-04 10:28     ` Ekaterina Tumanova
  2014-09-17 14:00       ` Stefan Hajnoczi
  2014-09-04 14:15     ` Christian Borntraeger
  1 sibling, 1 reply; 23+ messages in thread
From: Ekaterina Tumanova @ 2014-09-04 10:28 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: cornelia.huck, dahi, Public KVM Mailing List, borntraeger

On 09/03/2014 07:46 PM, Stefan Hajnoczi wrote:
> On Tue, Jul 29, 2014 at 02:27:19PM +0200, Ekaterina Tumanova wrote:
>> This patch add the blkconf_blocksize call to all
>> devices, which use DEFINE_BLOCK_PROPERTIES.
>> If the underlying driver function fails, blkconf_blocksizes
>> will set blocksizes to default (512) value.
>>
>> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
>> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
>> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>> ---
>>   hw/block/nvme.c          | 1 +
>>   hw/block/virtio-blk.c    | 1 +
>>   hw/ide/qdev.c            | 1 +
>>   hw/scsi/scsi-disk.c      | 1 +
>>   hw/usb/dev-storage.c     | 1 +
>>   include/hw/block/block.h | 4 ++--
>>   6 files changed, 7 insertions(+), 2 deletions(-)
>
> Wasn't this NACKed before on the grounds that it is likely to upset the
> guest after live migration?  QEMU doesn't automatically query the
> storage because these parameters must be preserved across migration.
>

Sorry, haven't found this discussion in the mail list. Do you have a link?

As far as I understand, the xxxxxx_init functions of the qemu block 
devices, which contain blkconf_blocksize calls, will
be called again on the destination host before the guest is resumed. And 
since migration requests qemu to be brought on the same disk, the 
configuration will receive the same block size from the ioctl, as
before. What do I miss?

> The knowledge of these fields belongs in the management tool that
> orchestrates migration, not QEMU.
>

For case of DASDs we need QEMU to know the underlying blocksize.
And you mentioned in your review comment to the Einar's initial patch
that you request this to be implemented for all architectures:

"Detecting the underlying block size is a generally useful configuration
option.  This should not be s390-specific, so no need to rename
DEFINE_BLOCK_PROPERTIES()."

http://qemu.11.n7.nabble.com/Qemu-devel-PATCH-V3-0-2-hd-geometry-c-Integrate-HDIO-GETGEO-in-guessing-tt185124.html#none

> If you want specific parameters, please put them in your guest
> configuration.  QEMU and libvirt support that.
>
> I'm concerned that this patch serious is likely to break things and
> autodetection doesn't add much value since the management tool needs to
> be aware of this information anyway.
>

Can you please explain what do you mean by "AUTOdetection"?
Do you simply mean "detection by ioctl" or "detection performed without
guest request"?

> Stefan
>

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices
  2014-09-03 15:46   ` Stefan Hajnoczi
  2014-09-04 10:28     ` Ekaterina Tumanova
@ 2014-09-04 14:15     ` Christian Borntraeger
  2014-09-17 13:56       ` Stefan Hajnoczi
  1 sibling, 1 reply; 23+ messages in thread
From: Christian Borntraeger @ 2014-09-04 14:15 UTC (permalink / raw)
  To: Stefan Hajnoczi, Ekaterina Tumanova
  Cc: cornelia.huck, dahi, Public KVM Mailing List

On 03/09/14 17:46, Stefan Hajnoczi wrote:
> On Tue, Jul 29, 2014 at 02:27:19PM +0200, Ekaterina Tumanova wrote:
>> This patch add the blkconf_blocksize call to all
>> devices, which use DEFINE_BLOCK_PROPERTIES.
>> If the underlying driver function fails, blkconf_blocksizes
>> will set blocksizes to default (512) value.
>>
>> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
>> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
>> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>> ---
>>  hw/block/nvme.c          | 1 +
>>  hw/block/virtio-blk.c    | 1 +
>>  hw/ide/qdev.c            | 1 +
>>  hw/scsi/scsi-disk.c      | 1 +
>>  hw/usb/dev-storage.c     | 1 +
>>  include/hw/block/block.h | 4 ++--
>>  6 files changed, 7 insertions(+), 2 deletions(-)
> 
> Wasn't this NACKed before on the grounds that it is likely to upset the

Not yet ;-) (just other other comments)

> guest after live migration?  QEMU doesn't automatically query the
> storage because these parameters must be preserved across migration.

Would it be more acceptable if this auto detection is only running on init,
but on migration the target will use the block size of the source?

> 
> The knowledge of these fields belongs in the management tool that
> orchestrates migration, not QEMU.

I think the main problem that this patch tries to address is, is not migration. 
It tries to make the whole guest work an pass-through dasd. It guess that this problem
did not happen on x86 yet as most disks are 512 and most file system will cope with
sector size changes.
Maybe this will change as soon as you run a guest on a real disk (no image file) and
the disk happens to be a 4Kn disk.

Question is: Do we want all users to specify the block size of the underlying disk,
just because its a 4Kn-type disk?

Or in other words:
shall we force everybody to change
qemu-system-x86_64 -hda /dev/sdc

into 
qemu-system-x86_64 -drive file=/dev/sdc,if=none,id=mydisk -device ide-drive,bus=ide.0,drive=mydisk,physical_block_size=4096
for a 4Kn device?

Or for the libvirt case, we need
   <blockio logical_block_size='4096' physical_block_size='4096'/>

Or is your suggestion to let libvirt detect the block size for host devices?


> If you want specific parameters, please put them in your guest
> configuration.  QEMU and libvirt support that.
> 
> I'm concerned that this patch serious is likely to break things and
> autodetection doesn't add much value since the management tool needs to
> be aware of this information anyway.

I am totally with you, that we should make this patch in a way to not break anything on other platforms.

Christian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices
  2014-09-04 14:15     ` Christian Borntraeger
@ 2014-09-17 13:56       ` Stefan Hajnoczi
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2014-09-17 13:56 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: cornelia.huck, dahi, Ekaterina Tumanova, Public KVM Mailing List

[-- Attachment #1: Type: text/plain, Size: 3204 bytes --]

On Thu, Sep 04, 2014 at 04:15:21PM +0200, Christian Borntraeger wrote:
> On 03/09/14 17:46, Stefan Hajnoczi wrote:
> > On Tue, Jul 29, 2014 at 02:27:19PM +0200, Ekaterina Tumanova wrote:
> > guest after live migration?  QEMU doesn't automatically query the
> > storage because these parameters must be preserved across migration.
> 
> Would it be more acceptable if this auto detection is only running on init,
> but on migration the target will use the block size of the source?

Detection is only useful when installing a new guest from scratch.

> > The knowledge of these fields belongs in the management tool that
> > orchestrates migration, not QEMU.
> 
> I think the main problem that this patch tries to address is, is not migration. 
> It tries to make the whole guest work an pass-through dasd. It guess that this problem
> did not happen on x86 yet as most disks are 512 and most file system will cope with
> sector size changes.
> Maybe this will change as soon as you run a guest on a real disk (no image file) and
> the disk happens to be a 4Kn disk.
> 
> Question is: Do we want all users to specify the block size of the underlying disk,
> just because its a 4Kn-type disk?
> 
> Or in other words:
> shall we force everybody to change
> qemu-system-x86_64 -hda /dev/sdc
> 
> into 
> qemu-system-x86_64 -drive file=/dev/sdc,if=none,id=mydisk -device ide-drive,bus=ide.0,drive=mydisk,physical_block_size=4096
> for a 4Kn device?
> 
> Or for the libvirt case, we need
>    <blockio logical_block_size='4096' physical_block_size='4096'/>
> 
> Or is your suggestion to let libvirt detect the block size for host devices?

No, QEMU already detects alignment requirements and uses bounce buffers.
So the guest can run under the illusion that 512 byte requests are okay
even on a 4 KB disk.

Please see block.c:bdrv_co_do_preadv()

This means existing guests that were installed on a 512 byte disk keep
running when you move them onto a 4 KB sector host block device.

If you want to pass the 4 KB requirement through, then set it explicitly
in the guest configuration.

> > If you want specific parameters, please put them in your guest
> > configuration.  QEMU and libvirt support that.
> > 
> > I'm concerned that this patch serious is likely to break things and
> > autodetection doesn't add much value since the management tool needs to
> > be aware of this information anyway.
> 
> I am totally with you, that we should make this patch in a way to not break anything on other platforms.

This is not architecture-specific.  If the current policy doesn't work
on s390 then the policy needs to be extended for all architectures.

My point is that QEMU cannot pass values from the host through to the
guest automatically since it changes what hardware configuration the
guest sees.  Whether the guest is running (live migration) or across
boot (moving disk image to another machine) doesn't really matter, the
point is that the guest hardware configuration should be constant to
avoid upsetting guests.

For these reasons, I think that I/O alignment constraints should be
explicitly configured at the QEMU level.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices
  2014-09-04 10:28     ` Ekaterina Tumanova
@ 2014-09-17 14:00       ` Stefan Hajnoczi
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2014-09-17 14:00 UTC (permalink / raw)
  To: Ekaterina Tumanova
  Cc: cornelia.huck, dahi, Public KVM Mailing List, borntraeger

[-- Attachment #1: Type: text/plain, Size: 2398 bytes --]

On Thu, Sep 04, 2014 at 02:28:26PM +0400, Ekaterina Tumanova wrote:
> On 09/03/2014 07:46 PM, Stefan Hajnoczi wrote:
> >On Tue, Jul 29, 2014 at 02:27:19PM +0200, Ekaterina Tumanova wrote:
> >>This patch add the blkconf_blocksize call to all
> >>devices, which use DEFINE_BLOCK_PROPERTIES.
> >>If the underlying driver function fails, blkconf_blocksizes
> >>will set blocksizes to default (512) value.
> >>
> >>Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
> >>Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
> >>Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> >>---
> >>  hw/block/nvme.c          | 1 +
> >>  hw/block/virtio-blk.c    | 1 +
> >>  hw/ide/qdev.c            | 1 +
> >>  hw/scsi/scsi-disk.c      | 1 +
> >>  hw/usb/dev-storage.c     | 1 +
> >>  include/hw/block/block.h | 4 ++--
> >>  6 files changed, 7 insertions(+), 2 deletions(-)
> >
> >Wasn't this NACKed before on the grounds that it is likely to upset the
> >guest after live migration?  QEMU doesn't automatically query the
> >storage because these parameters must be preserved across migration.
> >
> 
> Sorry, haven't found this discussion in the mail list. Do you have a link?

I don't have the link but am referring to the last time these patches
were discussed - ~1 year ago?

> As far as I understand, the xxxxxx_init functions of the qemu block devices,
> which contain blkconf_blocksize calls, will
> be called again on the destination host before the guest is resumed. And
> since migration requests qemu to be brought on the same disk, the
> configuration will receive the same block size from the ioctl, as
> before. What do I miss?

Live storage migration is supported.  This means non-shared storage is
possible.

Therefore the host block device that a guest runs on can change.

> >If you want specific parameters, please put them in your guest
> >configuration.  QEMU and libvirt support that.
> >
> >I'm concerned that this patch serious is likely to break things and
> >autodetection doesn't add much value since the management tool needs to
> >be aware of this information anyway.
> >
> 
> Can you please explain what do you mean by "AUTOdetection"?
> Do you simply mean "detection by ioctl" or "detection performed without
> guest request"?

I mean passing through the host block device attributes to the guest.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Qemu-devel] Geometry and blocksize support for backing devices
  2014-07-29 12:27 [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Ekaterina Tumanova
                   ` (4 preceding siblings ...)
  2014-07-29 12:37 ` [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Christian Borntraeger
@ 2014-11-06 15:58 ` Christian Borntraeger
  2014-11-06 17:24   ` Paolo Bonzini
  2014-11-07  9:17   ` Markus Armbruster
  5 siblings, 2 replies; 23+ messages in thread
From: Christian Borntraeger @ 2014-11-06 15:58 UTC (permalink / raw)
  To: Markus Armbruster, Stefan Hajnoczi, Kevin Wolf
  Cc: Ekaterina Tumanova, qemu-devel, Viktor Mihajlovski, dahi,
	cornelia.huck, Paolo Bonzini

Markus, Kevin, Stefan,

here is a (somewhat late) followup of some KVM forum discussions regarding
block size and geometry of pass-through block devices. Let's just do a quick
wrap-up (as of my understanding) and a proposal at the end of the mail





- DASD/ECKD disk devices have several special properties, e.g. the geometry 
  still has an influence on the partitions and each track can have a different
  format and the format of the disk actually follows what z/OS has as basic
  data structures. Linux does a low-level format of the disk to look like a
  block device (most common case is formatted with 4K blocks). There are 
  still some smalls warts for z/OS compatibility (block0 has 28 bytes of data,
  block1 has 148 bytes and block2 has 84 bytes of data. everything else has
  4k. the dasd device driver will fake 4k blocks for all blocks by ignoring 
  writes beyond the data for blocks 0-2 and filling the gaps with 0xe5).

- Since Linux uses DASDs as normal block device, we actually want to use 
  virtio-blk to pass those to KVM guests. Due to the warts mentioned above,
  we have to have the proper block size and geometry in the guest. Otherwise 
  things will fail in certain cases (linux partition detection for example)

- we have libvirt support to provide this information in the XML. This is far 
  from user friendly, though, as the admin has to manually look up the
  properties of the host disks and then modify the guest definition accordingly.

- Kate came up with patches (based on initial patches from Einar Lueck) for
  auto-detection of geometry and block size for host block devices

- Stefan and Paolo had some concerns:
1. if the geometry etc is important, then make it part of the guest definition
2. what about migration and the target disk differs
3. is that issue system z specific or generic?

Regarding 1: this does work as of today, but it is pretty complicated for an
admin to do so

Regarding 2: System z system do not have built-in disks, they are always
accessed via fibre channel (either FICON protocol for DASDs or FCP protocol
for scsi). So its quite common to have shared access to the same disk from 
different System z boxes. system z admins should be able to setup this 
properly. Question is, is it ok to assume that and fail if not?

Regarding 3: No idea. 

At KVM forum I talked to several people regarding a solution:

a) Stefan suggested to make the auto detection explicit, e.g.: provide a
"autodetect" tag for the secs, cylc, heads and logical_sector_size properties
This would require changes in qemu, but also in libvirt and its domain
configuration
b) Markus suggested that there are already some cases in QEMU, where we rely
on the admin to provide a proper setup on the target, e.g. an .iso file as image.
If the target has a different content in the .iso file things will break.

Markus said, there are two classes of this.
a) problems that can be detected by QEMU. Here QEMU will abort migration
b) problems that cannot be detected by QEMU (e.g. different iso content). this
will trigger failures later on

a is preferred

Now here comes my proposal:
Markus statement brought up an idea of special casing DASDs support. We can
call an ioctl BIODASDINFO on the block device that will only succeed if the host
disk is really a dasd. We could enable the auto detection for that case.
In addition, QEMU will check if geometry and block size match during migration, if
not, migration will fail. That would work with the following cases

(manual override == secs, cyls, headers, blocksize given by admin)

HOST A                          HOST B
dasd (auto)            ----->   dasd (auto)
dasd (auto)            ----->   image file (manual override)
image file (manual)    ----->   dasd (auto)
image file (manual)    ----->   image file (manual)
dasd (auto)            ----->   other host block device with manual override

if there are different dasds or different value migration will fail (and that is
what we want)

In addition we could also implement Stefans proposal to add a "autodetect" statement
for secs, cyls, head.... but I am not sure about libvirt support, though

So 3 parts:
1. autodetect on real DASDs 
2. geometry and sector size checking in generic code
3. maybe an autodetect flag

makes sense? Any guidance how to proceed?

Christian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] Geometry and blocksize support for backing devices
  2014-11-06 15:58 ` [Qemu-devel] " Christian Borntraeger
@ 2014-11-06 17:24   ` Paolo Bonzini
  2014-11-06 19:05     ` Christian Borntraeger
  2014-11-07  9:17   ` Markus Armbruster
  1 sibling, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2014-11-06 17:24 UTC (permalink / raw)
  To: Christian Borntraeger, Markus Armbruster, Stefan Hajnoczi, Kevin Wolf
  Cc: cornelia.huck, dahi, Ekaterina Tumanova, qemu-devel, Viktor Mihajlovski


On 06/11/2014 16:58, Christian Borntraeger wrote:
> Now here comes my proposal:
> Markus statement brought up an idea of special casing DASDs support. We can
> call an ioctl BIODASDINFO on the block device that will only succeed if the host
> disk is really a dasd. We could enable the auto detection for that case.

That would be enough for me, actually.

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] Geometry and blocksize support for backing devices
  2014-11-06 17:24   ` Paolo Bonzini
@ 2014-11-06 19:05     ` Christian Borntraeger
  0 siblings, 0 replies; 23+ messages in thread
From: Christian Borntraeger @ 2014-11-06 19:05 UTC (permalink / raw)
  To: Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Kevin Wolf
  Cc: cornelia.huck, dahi, Ekaterina Tumanova, qemu-devel, Viktor Mihajlovski

Am 06.11.2014 18:24, schrieb Paolo Bonzini:
> 
> On 06/11/2014 16:58, Christian Borntraeger wrote:
>> Now here comes my proposal:
>> Markus statement brought up an idea of special casing DASDs support. We can
>> call an ioctl BIODASDINFO on the block device that will only succeed if the host
>> disk is really a dasd. We could enable the auto detection for that case.
> 
> That would be enough for me, actually.

Its certainly enough for us as well.

Stefan, Markus, Kevin, can Kate go on with this direction?

Christian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] Geometry and blocksize support for backing devices
  2014-11-06 15:58 ` [Qemu-devel] " Christian Borntraeger
  2014-11-06 17:24   ` Paolo Bonzini
@ 2014-11-07  9:17   ` Markus Armbruster
  2014-11-07  9:48     ` Christian Borntraeger
  2014-11-07 13:39     ` Ekaterina Tumanova
  1 sibling, 2 replies; 23+ messages in thread
From: Markus Armbruster @ 2014-11-07  9:17 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, Stefan Hajnoczi, Ekaterina Tumanova, qemu-devel,
	Viktor Mihajlovski, dahi, cornelia.huck, Paolo Bonzini

Christian Borntraeger <borntraeger@de.ibm.com> writes:

> Markus, Kevin, Stefan,
>
> here is a (somewhat late) followup of some KVM forum discussions regarding
> block size and geometry of pass-through block devices. Let's just do a quick
> wrap-up (as of my understanding) and a proposal at the end of the mail
>
>
>
>
>
> - DASD/ECKD disk devices have several special properties, e.g. the geometry 
>   still has an influence on the partitions and each track can have a different
>   format and the format of the disk actually follows what z/OS has as basic
>   data structures. Linux does a low-level format of the disk to look like a
>   block device (most common case is formatted with 4K blocks). There are 
>   still some smalls warts for z/OS compatibility (block0 has 28 bytes of data,
>   block1 has 148 bytes and block2 has 84 bytes of data. everything else has
>   4k. the dasd device driver will fake 4k blocks for all blocks by ignoring 
>   writes beyond the data for blocks 0-2 and filling the gaps with 0xe5).
>
> - Since Linux uses DASDs as normal block device, we actually want to use 
>   virtio-blk to pass those to KVM guests. Due to the warts mentioned above,
>   we have to have the proper block size and geometry in the guest. Otherwise 
>   things will fail in certain cases (linux partition detection for example)
>
> - we have libvirt support to provide this information in the XML. This is far 
>   from user friendly, though, as the admin has to manually look up the
>   properties of the host disks and then modify the guest definition accordingly.
>
> - Kate came up with patches (based on initial patches from Einar Lueck) for
>   auto-detection of geometry and block size for host block devices
>
> - Stefan and Paolo had some concerns:
> 1. if the geometry etc is important, then make it part of the guest definition
> 2. what about migration and the target disk differs
> 3. is that issue system z specific or generic?
>
> Regarding 1: this does work as of today, but it is pretty complicated for an
> admin to do so
>
> Regarding 2: System z system do not have built-in disks, they are always
> accessed via fibre channel (either FICON protocol for DASDs or FCP protocol
> for scsi). So its quite common to have shared access to the same disk from 
> different System z boxes. system z admins should be able to setup this 
> properly. Question is, is it ok to assume that and fail if not?
>
> Regarding 3: No idea. 
>
> At KVM forum I talked to several people regarding a solution:
>
> a) Stefan suggested to make the auto detection explicit, e.g.: provide a
> "autodetect" tag for the secs, cylc, heads and logical_sector_size properties
> This would require changes in qemu, but also in libvirt and its domain
> configuration
> b) Markus suggested that there are already some cases in QEMU, where we rely
> on the admin to provide a proper setup on the target, e.g. an .iso
> file as image.
> If the target has a different content in the .iso file things will break.
>
> Markus said, there are two classes of this.
> a) problems that can be detected by QEMU. Here QEMU will abort migration

Example: device missing on target, target rejects migration when it
receives the device's state.

> b) problems that cannot be detected by QEMU (e.g. different iso content). this
> will trigger failures later on
>
> a is preferred

Note that the geometry is currently in class (b).  Configuration
generally is.

A perpetual long-term goal of migration is embedding configuration in
the migration stream, to move it from (b) to (a).  Just frontend
configuration, because backend configuration is generally host-specific.

Geometry is a property of the device, thus frontend configuration.

For historical reasons, device geometry properties default to backend
values set with -drive cyls=... or -hdachs.  This is deprecated.

If the user doesn't specify geometry, QEMU makes one up, usually based
on device size.  In certain circumstances, it bases on a DOS partition
table instead.  Misfeature, in my opinion.  Partitioning a disk can give
you a different geometry on the next restart, including migration,
unless you specify the geometry explicitly.  Fortunately, most guests
don't care for geometry at all.  This is entirely undocumented, as far
as I can tell.

> Now here comes my proposal:
> Markus statement brought up an idea of special casing DASDs support. We can
> call an ioctl BIODASDINFO on the block device that will only succeed if the host
> disk is really a dasd. We could enable the auto detection for that case.

If BIODASDINFO succeeds, QEMU uses that instead of making up device
geometry as described above.  Correct?

Let's spell out when exactly BIODASDINFO succeeds, to avoid
misunderstandings.  It does when the backend is a DASD (/dev/dasd*).
What about a partition on a DASD?  A file in a filesystem on a DASD?

Auto-detecting geometry on DASDs adds an irregularity to the user
interface.  I guess DASDs are special enough to tolerate that.

> In addition, QEMU will check if geometry and block size match during
> migration, if
> not, migration will fail. That would work with the following cases
>
> (manual override == secs, cyls, headers, blocksize given by admin)
>
> HOST A                          HOST B
> dasd (auto)            ----->   dasd (auto)
> dasd (auto)            ----->   image file (manual override)
> image file (manual)    ----->   dasd (auto)
> image file (manual)    ----->   image file (manual)
> dasd (auto)            ----->   other host block device with manual override
>
> if there are different dasds or different value migration will fail (and that is
> what we want)

I guess I helped plant this idea.  Thinking about it again, I fear doing
it just for geometry could be shortsighted.  When we do it for all
device configuration later on, a geometry special case could get in the
way.

While you're certainly welcome to take on one of migration's long-term
projects, I don't want to make it a prerequisite for getting DASDs more
usable ;)

Without this sanity check, we gain another way to mess up geometry by
changing the default geometry (see "DOS partition table" above for the
existing way).  We reuse the same answer: don't do that then.

Good enough for Paolo, if I understand him correctly.  I guess it's good
enough for me too, but please document it.  Covering the whole default
geometry machinery would be nice.

> In addition we could also implement Stefans proposal to add a
> "autodetect" statement
> for secs, cyls, head.... but I am not sure about libvirt support, though
>
> So 3 parts:
> 1. autodetect on real DASDs 
> 2. geometry and sector size checking in generic code
> 3. maybe an autodetect flag
>
> makes sense? Any guidance how to proceed?

Try posting a patch just for 1., and see if anyone screams :)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] Geometry and blocksize support for backing devices
  2014-11-07  9:17   ` Markus Armbruster
@ 2014-11-07  9:48     ` Christian Borntraeger
  2014-11-07 15:58       ` Markus Armbruster
  2014-11-07 13:39     ` Ekaterina Tumanova
  1 sibling, 1 reply; 23+ messages in thread
From: Christian Borntraeger @ 2014-11-07  9:48 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Stefan Hajnoczi, Ekaterina Tumanova, qemu-devel,
	Viktor Mihajlovski, dahi, cornelia.huck, Paolo Bonzini

Am 07.11.2014 10:17, schrieb Markus Armbruster:
> Christian Borntraeger <borntraeger@de.ibm.com> writes:
> 
>> Markus, Kevin, Stefan,
>>
>> here is a (somewhat late) followup of some KVM forum discussions regarding
>> block size and geometry of pass-through block devices. Let's just do a quick
>> wrap-up (as of my understanding) and a proposal at the end of the mail
>>
>>
>>
>>
>>
>> - DASD/ECKD disk devices have several special properties, e.g. the geometry 
>>   still has an influence on the partitions and each track can have a different
>>   format and the format of the disk actually follows what z/OS has as basic
>>   data structures. Linux does a low-level format of the disk to look like a
>>   block device (most common case is formatted with 4K blocks). There are 
>>   still some smalls warts for z/OS compatibility (block0 has 28 bytes of data,
>>   block1 has 148 bytes and block2 has 84 bytes of data. everything else has
>>   4k. the dasd device driver will fake 4k blocks for all blocks by ignoring 
>>   writes beyond the data for blocks 0-2 and filling the gaps with 0xe5).
>>
>> - Since Linux uses DASDs as normal block device, we actually want to use 
>>   virtio-blk to pass those to KVM guests. Due to the warts mentioned above,
>>   we have to have the proper block size and geometry in the guest. Otherwise 
>>   things will fail in certain cases (linux partition detection for example)
>>
>> - we have libvirt support to provide this information in the XML. This is far 
>>   from user friendly, though, as the admin has to manually look up the
>>   properties of the host disks and then modify the guest definition accordingly.
>>
>> - Kate came up with patches (based on initial patches from Einar Lueck) for
>>   auto-detection of geometry and block size for host block devices
>>
>> - Stefan and Paolo had some concerns:
>> 1. if the geometry etc is important, then make it part of the guest definition
>> 2. what about migration and the target disk differs
>> 3. is that issue system z specific or generic?
>>
>> Regarding 1: this does work as of today, but it is pretty complicated for an
>> admin to do so
>>
>> Regarding 2: System z system do not have built-in disks, they are always
>> accessed via fibre channel (either FICON protocol for DASDs or FCP protocol
>> for scsi). So its quite common to have shared access to the same disk from 
>> different System z boxes. system z admins should be able to setup this 
>> properly. Question is, is it ok to assume that and fail if not?
>>
>> Regarding 3: No idea. 
>>
>> At KVM forum I talked to several people regarding a solution:
>>
>> a) Stefan suggested to make the auto detection explicit, e.g.: provide a
>> "autodetect" tag for the secs, cylc, heads and logical_sector_size properties
>> This would require changes in qemu, but also in libvirt and its domain
>> configuration
>> b) Markus suggested that there are already some cases in QEMU, where we rely
>> on the admin to provide a proper setup on the target, e.g. an .iso
>> file as image.
>> If the target has a different content in the .iso file things will break.
>>
>> Markus said, there are two classes of this.
>> a) problems that can be detected by QEMU. Here QEMU will abort migration
> 
> Example: device missing on target, target rejects migration when it
> receives the device's state.
> 
>> b) problems that cannot be detected by QEMU (e.g. different iso content). this
>> will trigger failures later on
>>
>> a is preferred
> 
> Note that the geometry is currently in class (b).  Configuration
> generally is.
> 
> A perpetual long-term goal of migration is embedding configuration in
> the migration stream, to move it from (b) to (a).  Just frontend
> configuration, because backend configuration is generally host-specific.
> 
> Geometry is a property of the device, thus frontend configuration.
> 
> For historical reasons, device geometry properties default to backend
> values set with -drive cyls=... or -hdachs.  This is deprecated.
> 
> If the user doesn't specify geometry, QEMU makes one up, usually based
> on device size.  In certain circumstances, it bases on a DOS partition
> table instead.  Misfeature, in my opinion.  Partitioning a disk can give
> you a different geometry on the next restart, including migration,
> unless you specify the geometry explicitly.  Fortunately, most guests
> don't care for geometry at all.  This is entirely undocumented, as far
> as I can tell.
> 
>> Now here comes my proposal:
>> Markus statement brought up an idea of special casing DASDs support. We can
>> call an ioctl BIODASDINFO on the block device that will only succeed if the host
>> disk is really a dasd. We could enable the auto detection for that case.
> 
> If BIODASDINFO succeeds, QEMU uses that instead of making up device
> geometry as described above.  Correct?
> 
> Let's spell out when exactly BIODASDINFO succeeds, to avoid
> misunderstandings.  It does when the backend is a DASD (/dev/dasd*).
> What about a partition on a DASD?  A file in a filesystem on a DASD?

BIODASDINFO (we will propbably use BIODASDINFO2) will succeed only if the backend is backed by a dasd or its partitions. It will fail on other block devices and files on a dasd.

Now: when passing in only a dasd partition we cannot do any sane thing with the dasd oddities inside the guest (e.g. creating dasd partitions, reading volume label or TOC etc.)
So there is no need passing in geometry and block size. We could make the special case even stricter and check for start == 0 when doing the HDIO_GETGEO.
So pass-through of geometry and block size is only performed if BIODASDINFO2 succeeds, and the GETGEO call indicates that this is not a partions by having start == 0. 

Makes sense?

> 
> Auto-detecting geometry on DASDs adds an irregularity to the user
> interface.  I guess DASDs are special enough to tolerate that.
> 
>> In addition, QEMU will check if geometry and block size match during
>> migration, if
>> not, migration will fail. That would work with the following cases
>>
>> (manual override == secs, cyls, headers, blocksize given by admin)
>>
>> HOST A                          HOST B
>> dasd (auto)            ----->   dasd (auto)
>> dasd (auto)            ----->   image file (manual override)
>> image file (manual)    ----->   dasd (auto)
>> image file (manual)    ----->   image file (manual)
>> dasd (auto)            ----->   other host block device with manual override
>>
>> if there are different dasds or different value migration will fail (and that is
>> what we want)
> 
> I guess I helped plant this idea.  Thinking about it again, I fear doing
> it just for geometry could be shortsighted.  When we do it for all
> device configuration later on, a geometry special case could get in the
> way.
> 
> While you're certainly welcome to take on one of migration's long-term
> projects, I don't want to make it a prerequisite for getting DASDs more
> usable ;)
> 
> Without this sanity check, we gain another way to mess up geometry by
> changing the default geometry (see "DOS partition table" above for the
> existing way).  We reuse the same answer: don't do that then.
> 
> Good enough for Paolo, if I understand him correctly.  I guess it's good
> enough for me too, but please document it.  Covering the whole default
> geometry machinery would be nice.
> 
>> In addition we could also implement Stefans proposal to add a
>> "autodetect" statement
>> for secs, cyls, head.... but I am not sure about libvirt support, though
>>
>> So 3 parts:
>> 1. autodetect on real DASDs 
>> 2. geometry and sector size checking in generic code
>> 3. maybe an autodetect flag
>>
>> makes sense? Any guidance how to proceed?
> 
> Try posting a patch just for 1., and see if anyone screams :)

Kate, can you have a try?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] Geometry and blocksize support for backing devices
  2014-11-07  9:17   ` Markus Armbruster
  2014-11-07  9:48     ` Christian Borntraeger
@ 2014-11-07 13:39     ` Ekaterina Tumanova
  1 sibling, 0 replies; 23+ messages in thread
From: Ekaterina Tumanova @ 2014-11-07 13:39 UTC (permalink / raw)
  To: Markus Armbruster, Christian Borntraeger
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Viktor Mihajlovski,
	dahi, cornelia.huck, Paolo Bonzini

On 11/07/2014 12:17 PM, Markus Armbruster wrote:
> Christian Borntraeger <borntraeger@de.ibm.com> writes:
>
>> Markus, Kevin, Stefan,
>>
>> here is a (somewhat late) followup of some KVM forum discussions regarding
>> block size and geometry of pass-through block devices. Let's just do a quick
>> wrap-up (as of my understanding) and a proposal at the end of the mail
>>
>>
>>
>>
>>
>> - DASD/ECKD disk devices have several special properties, e.g. the geometry
>>    still has an influence on the partitions and each track can have a different
>>    format and the format of the disk actually follows what z/OS has as basic
>>    data structures. Linux does a low-level format of the disk to look like a
>>    block device (most common case is formatted with 4K blocks). There are
>>    still some smalls warts for z/OS compatibility (block0 has 28 bytes of data,
>>    block1 has 148 bytes and block2 has 84 bytes of data. everything else has
>>    4k. the dasd device driver will fake 4k blocks for all blocks by ignoring
>>    writes beyond the data for blocks 0-2 and filling the gaps with 0xe5).
>>
>> - Since Linux uses DASDs as normal block device, we actually want to use
>>    virtio-blk to pass those to KVM guests. Due to the warts mentioned above,
>>    we have to have the proper block size and geometry in the guest. Otherwise
>>    things will fail in certain cases (linux partition detection for example)
>>
>> - we have libvirt support to provide this information in the XML. This is far
>>    from user friendly, though, as the admin has to manually look up the
>>    properties of the host disks and then modify the guest definition accordingly.
>>
>> - Kate came up with patches (based on initial patches from Einar Lueck) for
>>    auto-detection of geometry and block size for host block devices
>>
>> - Stefan and Paolo had some concerns:
>> 1. if the geometry etc is important, then make it part of the guest definition
>> 2. what about migration and the target disk differs
>> 3. is that issue system z specific or generic?
>>
>> Regarding 1: this does work as of today, but it is pretty complicated for an
>> admin to do so
>>
>> Regarding 2: System z system do not have built-in disks, they are always
>> accessed via fibre channel (either FICON protocol for DASDs or FCP protocol
>> for scsi). So its quite common to have shared access to the same disk from
>> different System z boxes. system z admins should be able to setup this
>> properly. Question is, is it ok to assume that and fail if not?
>>
>> Regarding 3: No idea.
>>
>> At KVM forum I talked to several people regarding a solution:
>>
>> a) Stefan suggested to make the auto detection explicit, e.g.: provide a
>> "autodetect" tag for the secs, cylc, heads and logical_sector_size properties
>> This would require changes in qemu, but also in libvirt and its domain
>> configuration
>> b) Markus suggested that there are already some cases in QEMU, where we rely
>> on the admin to provide a proper setup on the target, e.g. an .iso
>> file as image.
>> If the target has a different content in the .iso file things will break.
>>
>> Markus said, there are two classes of this.
>> a) problems that can be detected by QEMU. Here QEMU will abort migration
>
> Example: device missing on target, target rejects migration when it
> receives the device's state.
>
>> b) problems that cannot be detected by QEMU (e.g. different iso content). this
>> will trigger failures later on
>>
>> a is preferred
>
> Note that the geometry is currently in class (b).  Configuration
> generally is.
>
> A perpetual long-term goal of migration is embedding configuration in
> the migration stream, to move it from (b) to (a).  Just frontend
> configuration, because backend configuration is generally host-specific.
>
> Geometry is a property of the device, thus frontend configuration.
>
> For historical reasons, device geometry properties default to backend
> values set with -drive cyls=... or -hdachs.  This is deprecated.
>
> If the user doesn't specify geometry, QEMU makes one up, usually based
> on device size.  In certain circumstances, it bases on a DOS partition
> table instead.  Misfeature, in my opinion.  Partitioning a disk can give
> you a different geometry on the next restart, including migration,
> unless you specify the geometry explicitly.  Fortunately, most guests
> don't care for geometry at all.  This is entirely undocumented, as far
> as I can tell.
>
>> Now here comes my proposal:
>> Markus statement brought up an idea of special casing DASDs support. We can
>> call an ioctl BIODASDINFO on the block device that will only succeed if the host
>> disk is really a dasd. We could enable the auto detection for that case.
>
> If BIODASDINFO succeeds, QEMU uses that instead of making up device
> geometry as described above.  Correct?
>
> Let's spell out when exactly BIODASDINFO succeeds, to avoid
> misunderstandings.  It does when the backend is a DASD (/dev/dasd*).
> What about a partition on a DASD?  A file in a filesystem on a DASD?
>
> Auto-detecting geometry on DASDs adds an irregularity to the user
> interface.  I guess DASDs are special enough to tolerate that.
>
>> In addition, QEMU will check if geometry and block size match during
>> migration, if
>> not, migration will fail. That would work with the following cases
>>
>> (manual override == secs, cyls, headers, blocksize given by admin)
>>
>> HOST A                          HOST B
>> dasd (auto)            ----->   dasd (auto)
>> dasd (auto)            ----->   image file (manual override)
>> image file (manual)    ----->   dasd (auto)
>> image file (manual)    ----->   image file (manual)
>> dasd (auto)            ----->   other host block device with manual override
>>
>> if there are different dasds or different value migration will fail (and that is
>> what we want)
>
> I guess I helped plant this idea.  Thinking about it again, I fear doing
> it just for geometry could be shortsighted.  When we do it for all
> device configuration later on, a geometry special case could get in the
> way.
>
> While you're certainly welcome to take on one of migration's long-term
> projects, I don't want to make it a prerequisite for getting DASDs more
> usable ;)
>
> Without this sanity check, we gain another way to mess up geometry by
> changing the default geometry (see "DOS partition table" above for the
> existing way).  We reuse the same answer: don't do that then.
>
> Good enough for Paolo, if I understand him correctly.  I guess it's good
> enough for me too, but please document it.  Covering the whole default
> geometry machinery would be nice.
>
>> In addition we could also implement Stefans proposal to add a
>> "autodetect" statement
>> for secs, cyls, head.... but I am not sure about libvirt support, though
>>
>> So 3 parts:
>> 1. autodetect on real DASDs
>> 2. geometry and sector size checking in generic code
>> 3. maybe an autodetect flag
>>
>> makes sense? Any guidance how to proceed?
>
> Try posting a patch just for 1., and see if anyone screams :)
>

Thanks a lot for the advise!
A small follow-up question...

Since the whole thing becomes arch-specific again (or even 
DASD-specific), I suppose I no longer need to change the driver code.

In my most recent patchset geometry was detected by HDIO_GETGEO inside
hw/block/hd-geometry.c (via arch-specific hook)
But... for the blocksize the control was passed to the driver functions 
(probe_logical_blocksize, probe_physical_blocksize) which called 
appropriate ioctls (BLKPBSZGET, BLKSSZGET) for "raw" and "host devices"

I suppose, that now there's no need for that anymore I can move the 
block ioctl calls back to hw/block/block.c and remove the introduced 
driver functions.

What do you think?

Kate.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] Geometry and blocksize support for backing devices
  2014-11-07  9:48     ` Christian Borntraeger
@ 2014-11-07 15:58       ` Markus Armbruster
  0 siblings, 0 replies; 23+ messages in thread
From: Markus Armbruster @ 2014-11-07 15:58 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Kevin Wolf, Stefan Hajnoczi, Ekaterina Tumanova, qemu-devel,
	Viktor Mihajlovski, dahi, cornelia.huck, Paolo Bonzini

Christian Borntraeger <borntraeger@de.ibm.com> writes:

> Am 07.11.2014 10:17, schrieb Markus Armbruster:
>> Christian Borntraeger <borntraeger@de.ibm.com> writes:
[...]
>>> Now here comes my proposal:
>>> Markus statement brought up an idea of special casing DASDs support. We can
>>> call an ioctl BIODASDINFO on the block device that will only
>>> succeed if the host
>>> disk is really a dasd. We could enable the auto detection for that case.
>> 
>> If BIODASDINFO succeeds, QEMU uses that instead of making up device
>> geometry as described above.  Correct?
>> 
>> Let's spell out when exactly BIODASDINFO succeeds, to avoid
>> misunderstandings.  It does when the backend is a DASD (/dev/dasd*).
>> What about a partition on a DASD?  A file in a filesystem on a DASD?
>
> BIODASDINFO (we will propbably use BIODASDINFO2) will succeed only if
> the backend is backed by a dasd or its partitions. It will fail on
> other block devices and files on a dasd.
>
> Now: when passing in only a dasd partition we cannot do any sane thing
> with the dasd oddities inside the guest (e.g. creating dasd
> partitions, reading volume label or TOC etc.)
> So there is no need passing in geometry and block size. We could make
> the special case even stricter and check for start == 0 when doing the
> HDIO_GETGEO.
> So pass-through of geometry and block size is only performed if
> BIODASDINFO2 succeeds, and the GETGEO call indicates that this is not
> a partions by having start == 0.
>
> Makes sense?

Yes.  Thank you!

[...]

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-11-07 15:58 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-29 12:27 [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Ekaterina Tumanova
2014-07-29 12:27 ` [Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x Ekaterina Tumanova
2014-08-20  8:19   ` Paolo Bonzini
2014-08-20  9:35     ` Christian Borntraeger
2014-08-20 13:10       ` Paolo Bonzini
2014-08-25  8:21         ` Christian Borntraeger
2014-07-29 12:27 ` [Qemu-devel] [PATCH 2/4] blocksize: support auto-sensing of blocksizes Ekaterina Tumanova
2014-09-03 15:37   ` Stefan Hajnoczi
2014-07-29 12:27 ` [Qemu-devel] [PATCH 3/4] blocksize: Add driver method to get the blocksizes Ekaterina Tumanova
2014-07-29 12:27 ` [Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices Ekaterina Tumanova
2014-09-03 15:46   ` Stefan Hajnoczi
2014-09-04 10:28     ` Ekaterina Tumanova
2014-09-17 14:00       ` Stefan Hajnoczi
2014-09-04 14:15     ` Christian Borntraeger
2014-09-17 13:56       ` Stefan Hajnoczi
2014-07-29 12:37 ` [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices Christian Borntraeger
2014-11-06 15:58 ` [Qemu-devel] " Christian Borntraeger
2014-11-06 17:24   ` Paolo Bonzini
2014-11-06 19:05     ` Christian Borntraeger
2014-11-07  9:17   ` Markus Armbruster
2014-11-07  9:48     ` Christian Borntraeger
2014-11-07 15:58       ` Markus Armbruster
2014-11-07 13:39     ` Ekaterina Tumanova

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.