[Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes
@ 2016-07-21 19:34 Eric Blake
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 1/4] nbd: Fix bad flag detection on server Eric Blake
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Eric Blake @ 2016-07-21 19:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pl, pbonzini

This series extracts a couple of bug fixes that should be included
in 2.7, out of my earlier v5 NBD series [1] that was deemed too
large and too late.  Then it tackles the promised regression fix
reported by Peter for Dell Equallogic iSCSI SANs with their unusual
non-power-of-2 unmap granularity.

[1] https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04053.html

The earlier series had a couple other patches that are
borderline bug fixes, but I think they can wait for 2.8,
as follows:

https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04054.html
8/14 nbd: Let server know when client gives up
Servers already have to deal with clients like qemu 2.6 that
don't give this notification, so it doesn't hurt to keep 2.7
in that same situation.

https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04064.html
12/14 nbd: Improve server handling of shutdown requests
Clients already have to deal with servers like qemu 2.6 that
don't reply to NBD_OPT_ABORT, so it doesn't hurt to keep 2.7
in that same situation.

Also available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-fixes-v1

Eric Blake (4):
  nbd: Fix bad flag detection on server
  nbd: Limit nbdflags to 16 bits
  osdep: Document differences in rounding macros
  block: Cater to iscsi with non-power-of-2 discard

 block/nbd-client.h        |  2 +-
 include/block/block_int.h | 37 ++++++++++++++++++++-----------------
 include/block/nbd.h       |  6 +++---
 include/qemu/osdep.h      |  6 +++++-
 block/io.c                | 15 +++++++++------
 nbd/client.c              | 28 +++++++++++++++-------------
 nbd/server.c              | 13 ++++++-------
 qemu-nbd.c                |  4 ++--
 8 files changed, 61 insertions(+), 50 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 1/4] nbd: Fix bad flag detection on server
  2016-07-21 19:34 [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Eric Blake
@ 2016-07-21 19:34 ` Eric Blake
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 2/4] nbd: Limit nbdflags to 16 bits Eric Blake
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2016-07-21 19:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pl, pbonzini

Commit ab7c548e added a check for invalid flags, but used an
early return on error instead of properly going through the
cleanup label.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v1: extract from larger series
previously 1/14 of v5 NBD write zeroes series
v4: new patch
---
 nbd/server.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index 29e2099..3c1e2b3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1057,7 +1057,8 @@ static ssize_t nbd_co_receive_request(NBDRequest *req,
     if (request->type & ~NBD_CMD_MASK_COMMAND & ~NBD_CMD_FLAG_FUA) {
         LOG("unsupported flags (got 0x%x)",
             request->type & ~NBD_CMD_MASK_COMMAND);
-        return -EINVAL;
+        rc = -EINVAL;
+        goto out;
     }

     rc = 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 2/4] nbd: Limit nbdflags to 16 bits
  2016-07-21 19:34 [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Eric Blake
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 1/4] nbd: Fix bad flag detection on server Eric Blake
@ 2016-07-21 19:34 ` Eric Blake
  2016-08-01  9:17   ` Paolo Bonzini
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 3/4] osdep: Document differences in rounding macros Eric Blake
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2016-07-21 19:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pl, pbonzini, qemu-stable, Kevin Wolf, Max Reitz

Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :)  nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.

Furthermore, upstream NBD has never passed the global flags to
the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
tried to OR the global flags with the transmission flags, with
the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
caused all earlier NBD 3.x clients to treat every export as
read-only; NBD 3.10 and later intentionally clip things to 16
bits to pass only transmission flags).  Qemu should follow suit,
since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
during transmission.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v1: extract from larger series
previously 3/14 of v5 NBD write zeroes series
v4: rebase, cc qemu-stable
v3: expand scope of patch
---
 block/nbd-client.h  |  2 +-
 include/block/nbd.h |  6 +++---
 nbd/client.c        | 28 +++++++++++++++-------------
 nbd/server.c        | 10 ++++------
 qemu-nbd.c          |  4 ++--
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index fa9817b..044aca4 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -20,7 +20,7 @@
 typedef struct NbdClientSession {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     off_t size;

     CoMutex send_mutex;
diff --git a/include/block/nbd.h b/include/block/nbd.h
index cb91820..1897557 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -90,11 +90,11 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
                      size_t niov,
                      size_t length,
                      bool do_read);
-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
                           off_t *size, Error **errp);
-int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size);
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
 ssize_t nbd_send_request(QIOChannel *ioc, struct nbd_request *request);
 ssize_t nbd_receive_reply(QIOChannel *ioc, struct nbd_reply *reply);
 int nbd_client(int fd);
@@ -104,7 +104,7 @@ typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;

 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
-                          uint32_t nbdflags, void (*close)(NBDExport *),
+                          uint16_t nbdflags, void (*close)(NBDExport *),
                           Error **errp);
 void nbd_export_close(NBDExport *exp);
 void nbd_export_get(NBDExport *exp);
diff --git a/nbd/client.c b/nbd/client.c
index 78a7195..a92f1e2 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -408,7 +408,7 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
 }


-int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
+int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
                           QCryptoTLSCreds *tlscreds, const char *hostname,
                           QIOChannel **outioc,
                           off_t *size, Error **errp)
@@ -468,7 +468,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         uint32_t opt;
         uint32_t namesize;
         uint16_t globalflags;
-        uint16_t exportflags;
         bool fixedNewStyle = false;

         if (read_sync(ioc, &globalflags, sizeof(globalflags)) !=
@@ -477,7 +476,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
             goto fail;
         }
         globalflags = be16_to_cpu(globalflags);
-        *flags = globalflags << 16;
         TRACE("Global flags are %" PRIx32, globalflags);
         if (globalflags & NBD_FLAG_FIXED_NEWSTYLE) {
             fixedNewStyle = true;
@@ -545,17 +543,15 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
             goto fail;
         }
         *size = be64_to_cpu(s);
-        TRACE("Size is %" PRIu64, *size);

-        if (read_sync(ioc, &exportflags, sizeof(exportflags)) !=
-            sizeof(exportflags)) {
+        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
             error_setg(errp, "Failed to read export flags");
             goto fail;
         }
-        exportflags = be16_to_cpu(exportflags);
-        *flags |= exportflags;
-        TRACE("Export flags are %" PRIx16, exportflags);
+        be16_to_cpus(flags);
     } else if (magic == NBD_CLIENT_MAGIC) {
+        uint32_t oldflags;
+
         if (name) {
             error_setg(errp, "Server does not support export names");
             goto fail;
@@ -572,16 +568,22 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint32_t *flags,
         *size = be64_to_cpu(s);
         TRACE("Size is %" PRIu64, *size);

-        if (read_sync(ioc, flags, sizeof(*flags)) != sizeof(*flags)) {
+        if (read_sync(ioc, &oldflags, sizeof(oldflags)) != sizeof(oldflags)) {
             error_setg(errp, "Failed to read export flags");
             goto fail;
         }
-        *flags = be32_to_cpu(*flags);
+        be32_to_cpus(&oldflags);
+        if (oldflags & ~0xffff) {
+            error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
+            goto fail;
+        }
+        *flags = oldflags;
     } else {
         error_setg(errp, "Bad magic received");
         goto fail;
     }

+    TRACE("Size is %" PRIu64 ", export flags %" PRIx16, *size, *flags);
     if (read_sync(ioc, &buf, 124) != 124) {
         error_setg(errp, "Failed to read reserved block");
         goto fail;
@@ -593,7 +595,7 @@ fail:
 }

 #ifdef __linux__
-int nbd_init(int fd, QIOChannelSocket *sioc, uint32_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
 {
     unsigned long sectors = size / BDRV_SECTOR_SIZE;
     if (size / BDRV_SECTOR_SIZE != sectors) {
@@ -689,7 +691,7 @@ int nbd_disconnect(int fd)
 }

 #else
-int nbd_init(int fd, QIOChannelSocket *ioc, uint32_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
 {
     return -ENOTSUP;
 }
diff --git a/nbd/server.c b/nbd/server.c
index 3c1e2b3..80fbb4d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -63,7 +63,7 @@ struct NBDExport {
     char *name;
     off_t dev_offset;
     off_t size;
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     QTAILQ_HEAD(, NBDClient) clients;
     QTAILQ_ENTRY(NBDExport) next;

@@ -544,8 +544,8 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
     NBDClient *client = data->client;
     char buf[8 + 8 + 8 + 128];
     int rc;
-    const int myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
-                         NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
+    const uint16_t myflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
+                              NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
     bool oldStyle;

     /* Old style negotiation header without options
@@ -575,7 +575,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)

     oldStyle = client->exp != NULL && !client->tlscreds;
     if (oldStyle) {
-        assert ((client->exp->nbdflags & ~65535) == 0);
         TRACE("advertising size %" PRIu64 " and flags %x",
               client->exp->size, client->exp->nbdflags | myflags);
         stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
@@ -606,7 +605,6 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
             goto fail;
         }

-        assert ((client->exp->nbdflags & ~65535) == 0);
         TRACE("advertising size %" PRIu64 " and flags %x",
               client->exp->size, client->exp->nbdflags | myflags);
         stq_be_p(buf + 18, client->exp->size);
@@ -810,7 +808,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
 }

 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
-                          uint32_t nbdflags, void (*close)(NBDExport *),
+                          uint16_t nbdflags, void (*close)(NBDExport *),
                           Error **errp)
 {
     NBDExport *exp = g_malloc0(sizeof(NBDExport));
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 321f02b..e3571c2 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -251,7 +251,7 @@ static void *nbd_client_thread(void *arg)
 {
     char *device = arg;
     off_t size;
-    uint32_t nbdflags;
+    uint16_t nbdflags;
     QIOChannelSocket *sioc;
     int fd;
     int ret;
@@ -465,7 +465,7 @@ int main(int argc, char **argv)
     BlockBackend *blk;
     BlockDriverState *bs;
     off_t dev_offset = 0;
-    uint32_t nbdflags = 0;
+    uint16_t nbdflags = 0;
     bool disconnect = false;
     const char *bindto = "0.0.0.0";
     const char *port = NULL;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 3/4] osdep: Document differences in rounding macros
  2016-07-21 19:34 [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Eric Blake
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 1/4] nbd: Fix bad flag detection on server Eric Blake
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 2/4] nbd: Limit nbdflags to 16 bits Eric Blake
@ 2016-07-21 19:34 ` Eric Blake
  2016-08-01  9:19   ` Paolo Bonzini
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard Eric Blake
  2016-07-27  7:26 ` [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Fam Zheng
  4 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2016-07-21 19:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, pl, pbonzini

Make it obvious which macros are safe in which situations.

Useful since QEMU_ALIGN_UP and ROUND_UP both purport to do
the same thing, but differ on whether the alignment must be
a power of 2.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 include/qemu/osdep.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index fbb8759..9991fb0 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -158,7 +158,8 @@ extern int daemon(int, int);
 /* Round number down to multiple */
 #define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))

-/* Round number up to multiple */
+/* Round number up to multiple. Safe when m is not a power of 2 (see
+ * ROUND_UP for a faster version when a power of 2 is guaranteed) */
 #define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))

 /* Check if n is a multiple of m */
@@ -175,6 +176,9 @@ extern int daemon(int, int);
 /* Check if pointer p is n-bytes aligned */
 #define QEMU_PTR_IS_ALIGNED(p, n) QEMU_IS_ALIGNED((uintptr_t)(p), (n))

+/* Round number up to multiple. Requires that d be a power of 2 (see
+ * QEMU_ALIGN_UP for a safer but slower version on arbitrary
+ * numbers) */
 #ifndef ROUND_UP
 #define ROUND_UP(n,d) (((n) + (d) - 1) & -(d))
 #endif
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-07-21 19:34 [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Eric Blake
                   ` (2 preceding siblings ...)
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 3/4] osdep: Document differences in rounding macros Eric Blake
@ 2016-07-21 19:34 ` Eric Blake
  2016-07-26 13:28   ` Stefan Hajnoczi
  2016-07-27  7:25   ` Fam Zheng
  2016-07-27  7:26 ` [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Fam Zheng
  4 siblings, 2 replies; 27+ messages in thread
From: Eric Blake @ 2016-07-21 19:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, pl, pbonzini, Stefan Hajnoczi, Fam Zheng, Kevin Wolf,
	Max Reitz

Dell Equallogic iSCSI SANs have a very unusual advertised geometry:

$ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
wsnz:0
maximum compare and write length:1
optimal transfer length granularity:0
maximum transfer length:0
optimal transfer length:0
maximum prefetch xdread xdwrite transfer length:0
maximum unmap lba count:30720
maximum unmap block descriptor count:2
optimal unmap granularity:30720
ugavalid:1
unmap granularity alignment:0
maximum write same length:30720

which says that both the maximum and the optimal discard size
is 15M.  It is not immediately apparent if the device allows
discard requests not aligned to the optimal size, nor if it
allows discards at a finer granularity than the optimal size.

I tried to find details in the SCSI Commands Reference Manual
Rev. A on what valid values of maximum and optimal sizes are
permitted, but while that document mentions a "Block Limits
VPD Page", I couldn't actually find documentation of that page
or what values it would have, or if a SCSI device has an
advertisement of its minimal unmap granularity.  So it is not
obvious to me whether the Dell Equallogic device is compliance
with the SCSI specification.

Fortunately, it is easy enough to support non-power-of-2 sizing,
even if it means we are less efficient than truly possible when
targetting that device (for example, it means that we refuse to
unmap anything that is not a multiple of 15M and aligned to a
15M boundary, even if the device truly does support a smaller
granularity where unmapping actually works).

Reported-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
Help in locating the actual specs on what SCSI requires for
page 0xb0 would be nice. But this should at least avoid the
assertion failures that Peter is hitting.  I was able to
test this patch using NBD on a hacked up qemu where I made
block/nbd.c report the same block limits, and could confirm
the assert under qemu-io 'w -z 0 40m' and 'discard 0 40m'
pre-patch, as well as the post-patch behavior of splitting
things to 15M alignment ('discard 1M 15M' becomes a no-op
because it is not aligned).  But obviously it needs to be
tested on the actual iscsi SAN that triggered the original
report.
---
 include/block/block_int.h | 37 ++++++++++++++++++++-----------------
 block/io.c                | 15 +++++++++------
 2 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1fe0fd9..47665be 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -330,36 +330,39 @@ typedef struct BlockLimits {
      * otherwise. */
     uint32_t request_alignment;

-    /* maximum number of bytes that can be discarded at once (since it
-     * is signed, it must be < 2G, if set), should be multiple of
+    /* Maximum number of bytes that can be discarded at once (since it
+     * is signed, it must be < 2G, if set). Must be multiple of
      * pdiscard_alignment, but need not be power of 2. May be 0 if no
      * inherent 32-bit limit */
     int32_t max_pdiscard;

-    /* optimal alignment for discard requests in bytes, must be power
-     * of 2, less than max_pdiscard if that is set, and multiple of
-     * bl.request_alignment. May be 0 if bl.request_alignment is good
-     * enough */
+    /* Optimal alignment for discard requests in bytes. A power of 2
+     * is best but not mandatory.  Must be a multiple of
+     * bl.request_alignment, and must be less than max_pdiscard if
+     * that is set. May be 0 if bl.request_alignment is good enough */
     uint32_t pdiscard_alignment;

-    /* maximum number of bytes that can zeroized at once (since it is
-     * signed, it must be < 2G, if set), should be multiple of
+    /* Maximum number of bytes that can zeroized at once (since it is
+     * signed, it must be < 2G, if set). Must be multiple of
      * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
     int32_t max_pwrite_zeroes;

-    /* optimal alignment for write zeroes requests in bytes, must be
-     * power of 2, less than max_pwrite_zeroes if that is set, and
-     * multiple of bl.request_alignment. May be 0 if
-     * bl.request_alignment is good enough */
+    /* Optimal alignment for write zeroes requests in bytes. A power
+     * of 2 is best but not mandatory.  Must be a multiple of
+     * bl.request_alignment, and must be less than max_pwrite_zeroes
+     * if that is set. May be 0 if bl.request_alignment is good
+     * enough */
     uint32_t pwrite_zeroes_alignment;

-    /* optimal transfer length in bytes (must be power of 2, and
-     * multiple of bl.request_alignment), or 0 if no preferred size */
+    /* Optimal transfer length in bytes.  A power of 2 is best but not
+     * mandatory.  Must be a multiple of bl.request_alignment, or 0 if
+     * no preferred size */
     uint32_t opt_transfer;

-    /* maximal transfer length in bytes (need not be power of 2, but
-     * should be multiple of opt_transfer), or 0 for no 32-bit limit.
-     * For now, anything larger than INT_MAX is clamped down. */
+    /* Maximal transfer length in bytes.  Need not be power of 2, but
+     * must be multiple of opt_transfer and bl.request_alignment, or 0
+     * for no 32-bit limit.  For now, anything larger than INT_MAX is
+     * clamped down. */
     uint32_t max_transfer;

     /* memory alignment, in bytes so that no bounce buffer is needed */
diff --git a/block/io.c b/block/io.c
index 7323f0f..0644a5d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1180,10 +1180,11 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
                         bs->bl.request_alignment);

-    assert(is_power_of_2(alignment));
-    head = offset & (alignment - 1);
-    tail = (offset + count) & (alignment - 1);
-    max_write_zeroes &= ~(alignment - 1);
+    assert(alignment % bs->bl.request_alignment == 0);
+    head = offset % alignment;
+    tail = (offset + count) % alignment;
+    max_write_zeroes = max_write_zeroes / alignment * alignment;
+    assert(max_write_zeroes >= bs->bl.request_alignment);

     while (count > 0 && !ret) {
         int num = count;
@@ -2429,9 +2430,10 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,

     /* Discard is advisory, so ignore any unaligned head or tail */
     align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment);
-    assert(is_power_of_2(align));
-    head = MIN(count, -offset & (align - 1));
+    assert(align % bs->bl.request_alignment == 0);
+    head = offset % align;
     if (head) {
+        head = MIN(count, align - head);
         count -= head;
         offset += head;
     }
@@ -2449,6 +2451,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,

     max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, INT_MAX),
                                    align);
+    assert(max_pdiscard);

     while (count > 0) {
         int ret;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard Eric Blake
@ 2016-07-26 13:28   ` Stefan Hajnoczi
  2016-07-27  7:25   ` Fam Zheng
  1 sibling, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2016-07-26 13:28 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, pl, pbonzini, Fam Zheng, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2677 bytes --]

On Thu, Jul 21, 2016 at 01:34:48PM -0600, Eric Blake wrote:
> Dell Equallogic iSCSI SANs have a very unusual advertised geometry:
> 
> $ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
> wsnz:0
> maximum compare and write length:1
> optimal transfer length granularity:0
> maximum transfer length:0
> optimal transfer length:0
> maximum prefetch xdread xdwrite transfer length:0
> maximum unmap lba count:30720
> maximum unmap block descriptor count:2
> optimal unmap granularity:30720
> ugavalid:1
> unmap granularity alignment:0
> maximum write same length:30720
> 
> which says that both the maximum and the optimal discard size
> is 15M.  It is not immediately apparent if the device allows
> discard requests not aligned to the optimal size, nor if it
> allows discards at a finer granularity than the optimal size.
> 
> I tried to find details in the SCSI Commands Reference Manual
> Rev. A on what valid values of maximum and optimal sizes are
> permitted, but while that document mentions a "Block Limits
> VPD Page", I couldn't actually find documentation of that page
> or what values it would have, or if a SCSI device has an
> advertisement of its minimal unmap granularity.  So it is not
> obvious to me whether the Dell Equallogic device is compliance
> with the SCSI specification.
> 
> Fortunately, it is easy enough to support non-power-of-2 sizing,
> even if it means we are less efficient than truly possible when
> targetting that device (for example, it means that we refuse to
> unmap anything that is not a multiple of 15M and aligned to a
> 15M boundary, even if the device truly does support a smaller
> granularity where unmapping actually works).
> 
> Reported-by: Peter Lieven <pl@kamp.de>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> Help in locating the actual specs on what SCSI requires for
> page 0xb0 would be nice. But this should at least avoid the
> assertion failures that Peter is hitting.  I was able to
> test this patch using NBD on a hacked up qemu where I made
> block/nbd.c report the same block limits, and could confirm
> the assert under qemu-io 'w -z 0 40m' and 'discard 0 40m'
> pre-patch, as well as the post-patch behavior of splitting
> things to 15M alignment ('discard 1M 15M' becomes a no-op
> because it is not aligned).  But obviously it needs to be
> tested on the actual iscsi SAN that triggered the original
> report.
> ---
>  include/block/block_int.h | 37 ++++++++++++++++++++-----------------
>  block/io.c                | 15 +++++++++------
>  2 files changed, 29 insertions(+), 23 deletions(-)

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard Eric Blake
  2016-07-26 13:28   ` Stefan Hajnoczi
@ 2016-07-27  7:25   ` Fam Zheng
  2016-07-28  2:39     ` Eric Blake
  1 sibling, 1 reply; 27+ messages in thread
From: Fam Zheng @ 2016-07-27  7:25 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, pl, pbonzini, Stefan Hajnoczi,
	Kevin Wolf, Max Reitz

On Thu, 07/21 13:34, Eric Blake wrote:
> +    max_write_zeroes = max_write_zeroes / alignment * alignment;

Not using QEMU_ALIGN_DOWN despite patch 3?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes
  2016-07-21 19:34 [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Eric Blake
                   ` (3 preceding siblings ...)
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard Eric Blake
@ 2016-07-27  7:26 ` Fam Zheng
  4 siblings, 0 replies; 27+ messages in thread
From: Fam Zheng @ 2016-07-27  7:26 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, pbonzini, pl, qemu-block

On Thu, 07/21 13:34, Eric Blake wrote:
> This series extracts a couple of bug fixes that should be included
> in 2.7, out of my earlier v5 NBD series [1] that was deemed too
> large and too late.  Then it tackles the promised regression fix
> reported by Peter for Dell Equallogic iSCSI SANs with their unusual
> non-power-of-2 unmap granularity.

Series:
Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-07-27  7:25   ` Fam Zheng
@ 2016-07-28  2:39     ` Eric Blake
  2016-08-01  9:22       ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2016-07-28  2:39 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, qemu-block, pl, pbonzini, Stefan Hajnoczi,
	Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 402 bytes --]

On 07/27/2016 01:25 AM, Fam Zheng wrote:
> On Thu, 07/21 13:34, Eric Blake wrote:
>> +    max_write_zeroes = max_write_zeroes / alignment * alignment;
> 
> Not using QEMU_ALIGN_DOWN despite patch 3?

Looks like I missed that on the rebase. Can fix if there is a reason for
a respin.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] nbd: Limit nbdflags to 16 bits
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 2/4] nbd: Limit nbdflags to 16 bits Eric Blake
@ 2016-08-01  9:17   ` Paolo Bonzini
  2016-08-01 11:43     ` Eric Blake
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-08-01  9:17 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, pl, qemu-stable, Kevin Wolf, Max Reitz



On 21/07/2016 21:34, Eric Blake wrote:
> Furthermore, upstream NBD has never passed the global flags to
> the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
> introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
> tried to OR the global flags with the transmission flags, with
> the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
> caused all earlier NBD 3.x clients to treat every export as
> read-only; NBD 3.10 and later intentionally clip things to 16
> bits to pass only transmission flags).  Qemu should follow suit,
> since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
> and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
> during transmission.

Should squash this in too:

diff --git a/nbd/server.c b/nbd/server.c
index 80fbb4d..6fa2f9c 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -575,7 +575,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
 
     oldStyle = client->exp != NULL && !client->tlscreds;
     if (oldStyle) {
-        TRACE("advertising size %" PRIu64 " and flags %x",
+        TRACE("advertising size %" PRIu64 " and flags %" PRIx16,
               client->exp->size, client->exp->nbdflags | myflags);
         stq_be_p(buf + 8, NBD_CLIENT_MAGIC);
         stq_be_p(buf + 16, client->exp->size);
@@ -605,7 +605,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
             goto fail;
         }
 
-        TRACE("advertising size %" PRIu64 " and flags %x",
+        TRACE("advertising size %" PRIu64 " and flags %" PRIx16,
               client->exp->size, client->exp->nbdflags | myflags);
         stq_be_p(buf + 18, client->exp->size);
         stw_be_p(buf + 26, client->exp->nbdflags | myflags);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] osdep: Document differences in rounding macros
  2016-07-21 19:34 ` [Qemu-devel] [PATCH 3/4] osdep: Document differences in rounding macros Eric Blake
@ 2016-08-01  9:19   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-08-01  9:19 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, pl



On 21/07/2016 21:34, Eric Blake wrote:
> Make it obvious which macros are safe in which situations.
> 
> Useful since QEMU_ALIGN_UP and ROUND_UP both purport to do
> the same thing, but differ on whether the alignment must be
> a power of 2.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  include/qemu/osdep.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index fbb8759..9991fb0 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -158,7 +158,8 @@ extern int daemon(int, int);
>  /* Round number down to multiple */
>  #define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
> 
> -/* Round number up to multiple */
> +/* Round number up to multiple. Safe when m is not a power of 2 (see
> + * ROUND_UP for a faster version when a power of 2 is guaranteed) */
>  #define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))
> 
>  /* Check if n is a multiple of m */
> @@ -175,6 +176,9 @@ extern int daemon(int, int);
>  /* Check if pointer p is n-bytes aligned */
>  #define QEMU_PTR_IS_ALIGNED(p, n) QEMU_IS_ALIGNED((uintptr_t)(p), (n))
> 
> +/* Round number up to multiple. Requires that d be a power of 2 (see
> + * QEMU_ALIGN_UP for a safer but slower version on arbitrary
> + * numbers) */
>  #ifndef ROUND_UP
>  #define ROUND_UP(n,d) (((n) + (d) - 1) & -(d))
>  #endif

Ouch, this is ugly, especially since DIV_ROUND_UP does not require
alignment!  Not your fault of course, and the patch is arguably an
improvement.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-07-28  2:39     ` Eric Blake
@ 2016-08-01  9:22       ` Paolo Bonzini
  2016-10-25 12:03         ` Peter Lieven
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-08-01  9:22 UTC (permalink / raw)
  To: Eric Blake, Fam Zheng
  Cc: qemu-devel, qemu-block, pl, Stefan Hajnoczi, Kevin Wolf, Max Reitz



On 28/07/2016 04:39, Eric Blake wrote:
> On 07/27/2016 01:25 AM, Fam Zheng wrote:
>> On Thu, 07/21 13:34, Eric Blake wrote:
>>> +    max_write_zeroes = max_write_zeroes / alignment * alignment;
>>
>> Not using QEMU_ALIGN_DOWN despite patch 3?
> 
> Looks like I missed that on the rebase. Can fix if there is a reason for
> a respin.

Since Stefan acked this, I'm applying the patch and fixing it to use
QEMU_ALIGN_DOWN.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] nbd: Limit nbdflags to 16 bits
  2016-08-01  9:17   ` Paolo Bonzini
@ 2016-08-01 11:43     ` Eric Blake
  2016-08-01 11:50       ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2016-08-01 11:43 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: qemu-block, pl, qemu-stable, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1741 bytes --]

On 08/01/2016 03:17 AM, Paolo Bonzini wrote:
> 
> 
> On 21/07/2016 21:34, Eric Blake wrote:
>> Furthermore, upstream NBD has never passed the global flags to
>> the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
>> introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
>> tried to OR the global flags with the transmission flags, with
>> the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
>> caused all earlier NBD 3.x clients to treat every export as
>> read-only; NBD 3.10 and later intentionally clip things to 16
>> bits to pass only transmission flags).  Qemu should follow suit,
>> since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
>> and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
>> during transmission.
> 
> Should squash this in too:
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 80fbb4d..6fa2f9c 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -575,7 +575,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>  
>      oldStyle = client->exp != NULL && !client->tlscreds;
>      if (oldStyle) {
> -        TRACE("advertising size %" PRIu64 " and flags %x",
> +        TRACE("advertising size %" PRIu64 " and flags %" PRIx16,
>                client->exp->size, client->exp->nbdflags | myflags);

No, we shouldn't. Last time I tried that, we tickled a clang bug where
%hx gripes when presented an 'int' argument, in spite of the int
argument being computed as 'short | short'. See commit 2cb34749, and
your discussion leading up to it:
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04663.html

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] nbd: Limit nbdflags to 16 bits
  2016-08-01 11:43     ` Eric Blake
@ 2016-08-01 11:50       ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2016-08-01 11:50 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, pl, qemu-stable, Kevin Wolf, Max Reitz



On 01/08/2016 13:43, Eric Blake wrote:
> On 08/01/2016 03:17 AM, Paolo Bonzini wrote:
>>
>>
>> On 21/07/2016 21:34, Eric Blake wrote:
>>> Furthermore, upstream NBD has never passed the global flags to
>>> the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
>>> introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
>>> tried to OR the global flags with the transmission flags, with
>>> the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
>>> caused all earlier NBD 3.x clients to treat every export as
>>> read-only; NBD 3.10 and later intentionally clip things to 16
>>> bits to pass only transmission flags).  Qemu should follow suit,
>>> since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
>>> and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
>>> during transmission.
>>
>> Should squash this in too:
>>
>> diff --git a/nbd/server.c b/nbd/server.c
>> index 80fbb4d..6fa2f9c 100644
>> --- a/nbd/server.c
>> +++ b/nbd/server.c
>> @@ -575,7 +575,7 @@ static coroutine_fn int nbd_negotiate(NBDClientNewData *data)
>>  
>>      oldStyle = client->exp != NULL && !client->tlscreds;
>>      if (oldStyle) {
>> -        TRACE("advertising size %" PRIu64 " and flags %x",
>> +        TRACE("advertising size %" PRIu64 " and flags %" PRIx16,
>>                client->exp->size, client->exp->nbdflags | myflags);
> 
> No, we shouldn't. Last time I tried that, we tickled a clang bug where
> %hx gripes when presented an 'int' argument, in spite of the int
> argument being computed as 'short | short'. See commit 2cb34749, and
> your discussion leading up to it:
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04663.html

Uff, you're right. :(  I remembered the discussion, but not the outcome.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-08-01  9:22       ` Paolo Bonzini
@ 2016-10-25 12:03         ` Peter Lieven
  2016-10-25 12:09           ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Lieven @ 2016-10-25 12:03 UTC (permalink / raw)
  To: Paolo Bonzini, Eric Blake, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:
>
> On 28/07/2016 04:39, Eric Blake wrote:
>> On 07/27/2016 01:25 AM, Fam Zheng wrote:
>>> On Thu, 07/21 13:34, Eric Blake wrote:
>>>> +    max_write_zeroes = max_write_zeroes / alignment * alignment;
>>> Not using QEMU_ALIGN_DOWN despite patch 3?
>> Looks like I missed that on the rebase. Can fix if there is a reason for
>> a respin.
> Since Stefan acked this, I'm applying the patch and fixing it to use
> QEMU_ALIGN_DOWN.
>
> Paolo

Hi,

I came across a sort of regression we introduced with the dropping of head and tail
of an unaligned discard.

The discard alignment that we use to trim the discard request is just a hint.

I learned on the equallogics that a page (which is this unusal 15MB large) is
unallocated even if the discard happens in pieces. E.g. in slices of 1MB requests.

 From my point of view I would like to restore the old behaviour. What do you think?

Thanks,
Peter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 12:03         ` Peter Lieven
@ 2016-10-25 12:09           ` Paolo Bonzini
  2016-10-25 12:12             ` Peter Lieven
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-10-25 12:09 UTC (permalink / raw)
  To: Peter Lieven, Eric Blake, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz



On 25/10/2016 14:03, Peter Lieven wrote:
> Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:
>>
>> On 28/07/2016 04:39, Eric Blake wrote:
>>> On 07/27/2016 01:25 AM, Fam Zheng wrote:
>>>> On Thu, 07/21 13:34, Eric Blake wrote:
>>>>> +    max_write_zeroes = max_write_zeroes / alignment * alignment;
>>>> Not using QEMU_ALIGN_DOWN despite patch 3?
>>> Looks like I missed that on the rebase. Can fix if there is a reason for
>>> a respin.
>> Since Stefan acked this, I'm applying the patch and fixing it to use
>> QEMU_ALIGN_DOWN.
>>
>> Paolo
> 
> Hi,
> 
> I came across a sort of regression we introduced with the dropping of
> head and tail
> of an unaligned discard.
> 
> The discard alignment that we use to trim the discard request is just a
> hint.
> 
> I learned on the equallogics that a page (which is this unusal 15MB
> large) is
> unallocated even if the discard happens in pieces. E.g. in slices of 1MB
> requests.
> 
> From my point of view I would like to restore the old behaviour. What do
> you think?

The right logic should be the one in Linux: if splitting a request, and
the next starting sector would be misaligned, stop the discard at the
previous aligned sector.  Otherwise leave everything alone.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 12:09           ` Paolo Bonzini
@ 2016-10-25 12:12             ` Peter Lieven
  2016-10-25 12:19               ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Lieven @ 2016-10-25 12:12 UTC (permalink / raw)
  To: Paolo Bonzini, Eric Blake, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

Am 25.10.2016 um 14:09 schrieb Paolo Bonzini:
>
> On 25/10/2016 14:03, Peter Lieven wrote:
>> Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:
>>> On 28/07/2016 04:39, Eric Blake wrote:
>>>> On 07/27/2016 01:25 AM, Fam Zheng wrote:
>>>>> On Thu, 07/21 13:34, Eric Blake wrote:
>>>>>> +    max_write_zeroes = max_write_zeroes / alignment * alignment;
>>>>> Not using QEMU_ALIGN_DOWN despite patch 3?
>>>> Looks like I missed that on the rebase. Can fix if there is a reason for
>>>> a respin.
>>> Since Stefan acked this, I'm applying the patch and fixing it to use
>>> QEMU_ALIGN_DOWN.
>>>
>>> Paolo
>> Hi,
>>
>> I came across a sort of regression we introduced with the dropping of
>> head and tail
>> of an unaligned discard.
>>
>> The discard alignment that we use to trim the discard request is just a
>> hint.
>>
>> I learned on the equallogics that a page (which is this unusal 15MB
>> large) is
>> unallocated even if the discard happens in pieces. E.g. in slices of 1MB
>> requests.
>>
>>  From my point of view I would like to restore the old behaviour. What do
>> you think?
> The right logic should be the one in Linux: if splitting a request, and
> the next starting sector would be misaligned, stop the discard at the
> previous aligned sector.  Otherwise leave everything alone.

Just to clarify. I mean the guest would send incremental 1MB discards
we would now drop all of them if the alignment is 15MB. Previously,
we have sent all of the 1MB requests.

Peter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 12:12             ` Peter Lieven
@ 2016-10-25 12:19               ` Paolo Bonzini
  2016-10-25 12:42                 ` Peter Lieven
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-10-25 12:19 UTC (permalink / raw)
  To: Peter Lieven, Eric Blake, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz



On 25/10/2016 14:12, Peter Lieven wrote:
> Am 25.10.2016 um 14:09 schrieb Paolo Bonzini:
>>
>> On 25/10/2016 14:03, Peter Lieven wrote:
>>> Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:
>>>> On 28/07/2016 04:39, Eric Blake wrote:
>>>>> On 07/27/2016 01:25 AM, Fam Zheng wrote:
>>>>>> On Thu, 07/21 13:34, Eric Blake wrote:
>>>>>>> +    max_write_zeroes = max_write_zeroes / alignment * alignment;
>>>>>> Not using QEMU_ALIGN_DOWN despite patch 3?
>>>>> Looks like I missed that on the rebase. Can fix if there is a
>>>>> reason for
>>>>> a respin.
>>>> Since Stefan acked this, I'm applying the patch and fixing it to use
>>>> QEMU_ALIGN_DOWN.
>>>>
>>>> Paolo
>>> Hi,
>>>
>>> I came across a sort of regression we introduced with the dropping of
>>> head and tail
>>> of an unaligned discard.
>>>
>>> The discard alignment that we use to trim the discard request is just a
>>> hint.
>>>
>>> I learned on the equallogics that a page (which is this unusal 15MB
>>> large) is
>>> unallocated even if the discard happens in pieces. E.g. in slices of 1MB
>>> requests.
>>>
>>>  From my point of view I would like to restore the old behaviour.
>>> What do
>>> you think?
>> The right logic should be the one in Linux: if splitting a request, and
>> the next starting sector would be misaligned, stop the discard at the
>> previous aligned sector.  Otherwise leave everything alone.
> 
> Just to clarify. I mean the guest would send incremental 1MB discards
> we would now drop all of them if the alignment is 15MB. Previously,
> we have sent all of the 1MB requests.

Yes.  In this case there would be no need at all to split the request,
so each request should be passed through.

But hey, that firmware is seriously weird. :)

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 12:19               ` Paolo Bonzini
@ 2016-10-25 12:42                 ` Peter Lieven
  2016-10-25 13:59                   ` Eric Blake
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Lieven @ 2016-10-25 12:42 UTC (permalink / raw)
  To: Paolo Bonzini, Eric Blake, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

Am 25.10.2016 um 14:19 schrieb Paolo Bonzini:
>
> On 25/10/2016 14:12, Peter Lieven wrote:
>> Am 25.10.2016 um 14:09 schrieb Paolo Bonzini:
>>> On 25/10/2016 14:03, Peter Lieven wrote:
>>>> Am 01.08.2016 um 11:22 schrieb Paolo Bonzini:
>>>>> On 28/07/2016 04:39, Eric Blake wrote:
>>>>>> On 07/27/2016 01:25 AM, Fam Zheng wrote:
>>>>>>> On Thu, 07/21 13:34, Eric Blake wrote:
>>>>>>>> +    max_write_zeroes = max_write_zeroes / alignment * alignment;
>>>>>>> Not using QEMU_ALIGN_DOWN despite patch 3?
>>>>>> Looks like I missed that on the rebase. Can fix if there is a
>>>>>> reason for
>>>>>> a respin.
>>>>> Since Stefan acked this, I'm applying the patch and fixing it to use
>>>>> QEMU_ALIGN_DOWN.
>>>>>
>>>>> Paolo
>>>> Hi,
>>>>
>>>> I came across a sort of regression we introduced with the dropping of
>>>> head and tail
>>>> of an unaligned discard.
>>>>
>>>> The discard alignment that we use to trim the discard request is just a
>>>> hint.
>>>>
>>>> I learned on the equallogics that a page (which is this unusal 15MB
>>>> large) is
>>>> unallocated even if the discard happens in pieces. E.g. in slices of 1MB
>>>> requests.
>>>>
>>>>   From my point of view I would like to restore the old behaviour.
>>>> What do
>>>> you think?
>>> The right logic should be the one in Linux: if splitting a request, and
>>> the next starting sector would be misaligned, stop the discard at the
>>> previous aligned sector.  Otherwise leave everything alone.
>> Just to clarify. I mean the guest would send incremental 1MB discards
>> we would now drop all of them if the alignment is 15MB. Previously,
>> we have sent all of the 1MB requests.
> Yes.  In this case there would be no need at all to split the request,
> so each request should be passed through.
>
> But hey, that firmware is seriously weird. :)

Yes, so you would not change the new implementation?

Even if the discard is e.g. 1MB it could theretically be that internally
the device has a finer granularity. Its an optimal discard alignment
not the minimum required discard size. I think thats a difference.
It does not say I can't handle smaller discards.

Peter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 12:42                 ` Peter Lieven
@ 2016-10-25 13:59                   ` Eric Blake
  2016-10-25 14:20                     ` Peter Lieven
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2016-10-25 13:59 UTC (permalink / raw)
  To: Peter Lieven, Paolo Bonzini, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 936 bytes --]

On 10/25/2016 07:42 AM, Peter Lieven wrote:
>>
>> But hey, that firmware is seriously weird. :)
> 
> Yes, so you would not change the new implementation?
> 
> Even if the discard is e.g. 1MB it could theretically be that internally
> the device has a finer granularity. Its an optimal discard alignment
> not the minimum required discard size. I think thats a difference.
> It does not say I can't handle smaller discards.

The firmware is probably technically buggy for advertising too large of
a minimum granularity, if it can piece together smaller requests into a
larger discard.  If discards need to happen at a smaller granularity,
the firmware (or kernel quirk system) should fix the advertisement to
the actual granularity that it will honor.  I don't see a reason to
change qemu's current behavior.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 13:59                   ` Eric Blake
@ 2016-10-25 14:20                     ` Peter Lieven
  2016-10-25 14:35                       ` Eric Blake
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Lieven @ 2016-10-25 14:20 UTC (permalink / raw)
  To: Eric Blake, Paolo Bonzini, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

Am 25.10.2016 um 15:59 schrieb Eric Blake:
> On 10/25/2016 07:42 AM, Peter Lieven wrote:
>>> But hey, that firmware is seriously weird. :)
>> Yes, so you would not change the new implementation?
>>
>> Even if the discard is e.g. 1MB it could theretically be that internally
>> the device has a finer granularity. Its an optimal discard alignment
>> not the minimum required discard size. I think thats a difference.
>> It does not say I can't handle smaller discards.
> The firmware is probably technically buggy for advertising too large of
> a minimum granularity, if it can piece together smaller requests into a
> larger discard.  If discards need to happen at a smaller granularity,
> the firmware (or kernel quirk system) should fix the advertisement to
> the actual granularity that it will honor.  I don't see a reason to
> change qemu's current behavior.
>

The issue is that the optimal unmap granularity is only a hint.
There is no evidence what happens with unaligned requests or requests
that are smaller. They could still lead to a discard operation in the
storage device. It just says if you can send me discards of that size
thats optimal for me. Thats not said that smaller or unaligned requests
have no effect.

Peter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 14:20                     ` Peter Lieven
@ 2016-10-25 14:35                       ` Eric Blake
  2016-10-25 14:36                         ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2016-10-25 14:35 UTC (permalink / raw)
  To: Peter Lieven, Paolo Bonzini, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1286 bytes --]

On 10/25/2016 09:20 AM, Peter Lieven wrote:
>> The firmware is probably technically buggy for advertising too large of
>> a minimum granularity, if it can piece together smaller requests into a
>> larger discard.  If discards need to happen at a smaller granularity,
>> the firmware (or kernel quirk system) should fix the advertisement to
>> the actual granularity that it will honor.  I don't see a reason to
>> change qemu's current behavior.
>>
> 
> The issue is that the optimal unmap granularity is only a hint.
> There is no evidence what happens with unaligned requests or requests
> that are smaller. They could still lead to a discard operation in the
> storage device. It just says if you can send me discards of that size
> thats optimal for me. Thats not said that smaller or unaligned requests
> have no effect.

So your argument is that we should always pass down every unaligned
less-than-optimum discard request all the way to the hardware, rather
than dropping it higher in the stack, even though discard requests are
already advisory, in order to leave the hardware as the ultimate
decision on whether to ignore the unaligned request?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 14:35                       ` Eric Blake
@ 2016-10-25 14:36                         ` Paolo Bonzini
  2016-10-25 16:12                           ` Eric Blake
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2016-10-25 14:36 UTC (permalink / raw)
  To: Eric Blake, Peter Lieven, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 436 bytes --]



On 25/10/2016 16:35, Eric Blake wrote:
> So your argument is that we should always pass down every unaligned
> less-than-optimum discard request all the way to the hardware, rather
> than dropping it higher in the stack, even though discard requests are
> already advisory, in order to leave the hardware as the ultimate
> decision on whether to ignore the unaligned request?

Yes, I agree with Peter as to this.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 14:36                         ` Paolo Bonzini
@ 2016-10-25 16:12                           ` Eric Blake
  2016-11-08 11:03                             ` Peter Lieven
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2016-10-25 16:12 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Lieven, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

On 10/25/2016 09:36 AM, Paolo Bonzini wrote:
> 
> 
> On 25/10/2016 16:35, Eric Blake wrote:
>> So your argument is that we should always pass down every unaligned
>> less-than-optimum discard request all the way to the hardware, rather
>> than dropping it higher in the stack, even though discard requests are
>> already advisory, in order to leave the hardware as the ultimate
>> decision on whether to ignore the unaligned request?
> 
> Yes, I agree with Peter as to this.

Okay, I'll work on patches. I think it counts as bug fix, so appropriate
even if I miss soft freeze (I'd still like to get NBD write zero support
into 2.8, since it already missed 2.7, but that one is still awaiting
review with not much time left).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-10-25 16:12                           ` Eric Blake
@ 2016-11-08 11:03                             ` Peter Lieven
  2016-11-08 16:43                               ` Eric Blake
  2016-11-11  4:02                               ` Eric Blake
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Lieven @ 2016-11-08 11:03 UTC (permalink / raw)
  To: Eric Blake, Paolo Bonzini, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

Am 25.10.2016 um 18:12 schrieb Eric Blake:
> On 10/25/2016 09:36 AM, Paolo Bonzini wrote:
>>
>> On 25/10/2016 16:35, Eric Blake wrote:
>>> So your argument is that we should always pass down every unaligned
>>> less-than-optimum discard request all the way to the hardware, rather
>>> than dropping it higher in the stack, even though discard requests are
>>> already advisory, in order to leave the hardware as the ultimate
>>> decision on whether to ignore the unaligned request?
>> Yes, I agree with Peter as to this.
> Okay, I'll work on patches. I think it counts as bug fix, so appropriate
> even if I miss soft freeze (I'd still like to get NBD write zero support
> into 2.8, since it already missed 2.7, but that one is still awaiting
> review with not much time left).
>

Hi Eric,

have you had time to look at this?
If you need help, let me know.

Peter

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-11-08 11:03                             ` Peter Lieven
@ 2016-11-08 16:43                               ` Eric Blake
  2016-11-11  4:02                               ` Eric Blake
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Blake @ 2016-11-08 16:43 UTC (permalink / raw)
  To: Peter Lieven, Paolo Bonzini, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1223 bytes --]

On 11/08/2016 05:03 AM, Peter Lieven wrote:
> Am 25.10.2016 um 18:12 schrieb Eric Blake:
>> On 10/25/2016 09:36 AM, Paolo Bonzini wrote:
>>>
>>> On 25/10/2016 16:35, Eric Blake wrote:
>>>> So your argument is that we should always pass down every unaligned
>>>> less-than-optimum discard request all the way to the hardware, rather
>>>> than dropping it higher in the stack, even though discard requests are
>>>> already advisory, in order to leave the hardware as the ultimate
>>>> decision on whether to ignore the unaligned request?
>>> Yes, I agree with Peter as to this.
>> Okay, I'll work on patches. I think it counts as bug fix, so appropriate
>> even if I miss soft freeze (I'd still like to get NBD write zero support
>> into 2.8, since it already missed 2.7, but that one is still awaiting
>> review with not much time left).
>>
> 
> Hi Eric,
> 
> have you had time to look at this?
> If you need help, let me know.

Still on my list. I'm not forgetting it, and it does count as a bug fix
so it is safe for inclusion, although I'm trying to get it in before
this week is out.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard
  2016-11-08 11:03                             ` Peter Lieven
  2016-11-08 16:43                               ` Eric Blake
@ 2016-11-11  4:02                               ` Eric Blake
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Blake @ 2016-11-11  4:02 UTC (permalink / raw)
  To: Peter Lieven, Paolo Bonzini, Fam Zheng
  Cc: qemu-devel, qemu-block, Stefan Hajnoczi, Kevin Wolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1352 bytes --]

On 11/08/2016 05:03 AM, Peter Lieven wrote:
> Am 25.10.2016 um 18:12 schrieb Eric Blake:
>> On 10/25/2016 09:36 AM, Paolo Bonzini wrote:
>>>
>>> On 25/10/2016 16:35, Eric Blake wrote:
>>>> So your argument is that we should always pass down every unaligned
>>>> less-than-optimum discard request all the way to the hardware, rather
>>>> than dropping it higher in the stack, even though discard requests are
>>>> already advisory, in order to leave the hardware as the ultimate
>>>> decision on whether to ignore the unaligned request?
>>> Yes, I agree with Peter as to this.
>> Okay, I'll work on patches. I think it counts as bug fix, so appropriate
>> even if I miss soft freeze (I'd still like to get NBD write zero support
>> into 2.8, since it already missed 2.7, but that one is still awaiting
>> review with not much time left).
>>
> 
> Hi Eric,
> 
> have you had time to look at this?
> If you need help, let me know.

Patches posted, but testing help would be appreciated since you have the
actual hardware that exhibits the issue.

I'm also trying to write a patch to extend the blkdebug driver to share
this "feature" of a 15M page, and write a qemu-iotest to make it harder
to regress in the future.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-11-11  4:02 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-21 19:34 [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Eric Blake
2016-07-21 19:34 ` [Qemu-devel] [PATCH 1/4] nbd: Fix bad flag detection on server Eric Blake
2016-07-21 19:34 ` [Qemu-devel] [PATCH 2/4] nbd: Limit nbdflags to 16 bits Eric Blake
2016-08-01  9:17   ` Paolo Bonzini
2016-08-01 11:43     ` Eric Blake
2016-08-01 11:50       ` Paolo Bonzini
2016-07-21 19:34 ` [Qemu-devel] [PATCH 3/4] osdep: Document differences in rounding macros Eric Blake
2016-08-01  9:19   ` Paolo Bonzini
2016-07-21 19:34 ` [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard Eric Blake
2016-07-26 13:28   ` Stefan Hajnoczi
2016-07-27  7:25   ` Fam Zheng
2016-07-28  2:39     ` Eric Blake
2016-08-01  9:22       ` Paolo Bonzini
2016-10-25 12:03         ` Peter Lieven
2016-10-25 12:09           ` Paolo Bonzini
2016-10-25 12:12             ` Peter Lieven
2016-10-25 12:19               ` Paolo Bonzini
2016-10-25 12:42                 ` Peter Lieven
2016-10-25 13:59                   ` Eric Blake
2016-10-25 14:20                     ` Peter Lieven
2016-10-25 14:35                       ` Eric Blake
2016-10-25 14:36                         ` Paolo Bonzini
2016-10-25 16:12                           ` Eric Blake
2016-11-08 11:03                             ` Peter Lieven
2016-11-08 16:43                               ` Eric Blake
2016-11-11  4:02                               ` Eric Blake
2016-07-27  7:26 ` [Qemu-devel] [PATCH for-2.7 0/4] NBD and block alignment fixes Fam Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.