All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup
@ 2021-02-22 20:29 Klaus Jensen
  2021-02-22 20:29 ` [PATCH 1/3] hw/block/nvme: document 'mdts' nvme device parameter Klaus Jensen
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Klaus Jensen @ 2021-02-22 20:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Klaus Jensen, Max Reitz, Klaus Jensen,
	Keith Busch

From: Klaus Jensen <k.jensen@samsung.com>

The gist of this series is about aligning the zoned.zasl parameter with
the mdts parameter. I complained about this back when I was reviewing
the zoned series but was shot down. I relented on the size/capacity
debate (and still fully support that), but I never really liked that
ZASL is different from MDTS. Changing the definition makes the
validation code much simpler and, well, it aligns perfectly with the
existing mdts parameter, which is the goal here.

While the current definition of zasl is in master, it has not yet been
released, so this is sort of our last chance to change this before v6.0.

I'll repeat the commit message of [3/3] here for context:

ZASL (Zone Append Size Limit) is defined exactly like MDTS (Maximum Data
Transfer Size), that is, it is a value in units of the minimum memory
page size (CAP.MPSMIN) and is reported as a power of two.

The 'mdts' nvme device parameter is specified as in the spec, but the
'zoned.append_size_limit' parameter is specified in bytes. This is
suboptimal for a number of reasons:

  1. It is just plain confusing wrt. the definition of mdts.
  2. There is a lot of complexity involved in validating the value; it
     must be a power of two, it should be larger than 4k, if it is zero
     we set it internally to mdts, but still report it as zero.
  3. While "hw/block/nvme: improve invalid zasl value reporting"
     slightly improved the handling of the parameter, the validation is
     still wrong; it does not depend on CC.MPS, it depends on
     CAP.MPSMIN. And we are not even checking that it is actually less
     than or equal to MDTS, which is kinda the *one* condition it must
     satisfy.

Fix this by defining zasl exactly like mdts and checking the one thing
that it must satisfy (that it is less than or equal to mdts). Also,
change the default value from 128KiB to 0 (aka, whatever mdts is).

Klaus Jensen (3):
  hw/block/nvme: document 'mdts' nvme device parameter
  hw/block/nvme: deduplicate bad mdts trace event
  hw/block/nvme: align zoned.zasl with mdts

 hw/block/nvme.h       |  4 +--
 hw/block/nvme.c       | 67 ++++++++++++++-----------------------------
 hw/block/trace-events |  4 +--
 3 files changed, 25 insertions(+), 50 deletions(-)

-- 
2.30.1



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] hw/block/nvme: document 'mdts' nvme device parameter
  2021-02-22 20:29 [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Klaus Jensen
@ 2021-02-22 20:29 ` Klaus Jensen
  2021-02-22 20:29 ` [PATCH 2/3] hw/block/nvme: deduplicate bad mdts trace event Klaus Jensen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Klaus Jensen @ 2021-02-22 20:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Klaus Jensen, Max Reitz, Klaus Jensen,
	Keith Busch

From: Klaus Jensen <k.jensen@samsung.com>

Document the 'mdts' nvme device parameter.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1cd82fa3c9fe..6a27b28f2c2d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -63,6 +63,12 @@
  *   completion when there are no outstanding AERs. When the maximum number of
  *   enqueued events are reached, subsequent events will be dropped.
  *
+ * - `mdts`
+ *   Indicates the maximum data transfer size for a command that transfers data
+ *   between host-accessible memory and the controller. The value is specified
+ *   as a power of two (2^n) and is in units of the minimum memory page size
+ *   (CAP.MPSMIN). The default value is 7 (i.e. 512 KiB).
+ *
  * - `zoned.append_size_limit`
  *   The maximum I/O size in bytes that is allowed in Zone Append command.
  *   The default is 128KiB. Since internally this this value is maintained as
-- 
2.30.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] hw/block/nvme: deduplicate bad mdts trace event
  2021-02-22 20:29 [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Klaus Jensen
  2021-02-22 20:29 ` [PATCH 1/3] hw/block/nvme: document 'mdts' nvme device parameter Klaus Jensen
@ 2021-02-22 20:29 ` Klaus Jensen
  2021-02-22 20:29 ` [PATCH 3/3] hw/block/nvme: align zoned.zasl with mdts Klaus Jensen
  2021-02-22 21:00 ` [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Keith Busch
  3 siblings, 0 replies; 6+ messages in thread
From: Klaus Jensen @ 2021-02-22 20:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Klaus Jensen, Max Reitz, Klaus Jensen,
	Keith Busch

From: Klaus Jensen <k.jensen@samsung.com>

If mdts is exceeded, trace it from a single place.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c       | 6 +-----
 hw/block/trace-events | 2 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6a27b28f2c2d..25a7726ca05b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1075,6 +1075,7 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, size_t len)
     uint8_t mdts = n->params.mdts;
 
     if (mdts && len > n->page_size << mdts) {
+        trace_pci_nvme_err_mdts(len);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
@@ -1945,7 +1946,6 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
 
     status = nvme_check_mdts(n, len);
     if (status) {
-        trace_pci_nvme_err_mdts(nvme_cid(req), len);
         return status;
     }
 
@@ -2048,7 +2048,6 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
 
     status = nvme_check_mdts(n, data_size);
     if (status) {
-        trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
         goto invalid;
     }
 
@@ -2116,7 +2115,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
     if (!wrz) {
         status = nvme_check_mdts(n, data_size);
         if (status) {
-            trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
             goto invalid;
         }
     }
@@ -2610,7 +2608,6 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
 
     status = nvme_check_mdts(n, data_size);
     if (status) {
-        trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
         return status;
     }
 
@@ -3052,7 +3049,6 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
 
     status = nvme_check_mdts(n, len);
     if (status) {
-        trace_pci_nvme_err_mdts(nvme_cid(req), len);
         return status;
     }
 
diff --git a/hw/block/trace-events b/hw/block/trace-events
index b04f7a3e1890..e1a85661cf3f 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -114,7 +114,7 @@ pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", sl
 pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state"
 
 # nvme traces for error conditions
-pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu"
+pci_nvme_err_mdts(size_t len) "len %zu"
 pci_nvme_err_req_status(uint16_t cid, uint32_t nsid, uint16_t status, uint8_t opc) "cid %"PRIu16" nsid %"PRIu32" status 0x%"PRIx16" opc 0x%"PRIx8""
 pci_nvme_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
 pci_nvme_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
-- 
2.30.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] hw/block/nvme: align zoned.zasl with mdts
  2021-02-22 20:29 [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Klaus Jensen
  2021-02-22 20:29 ` [PATCH 1/3] hw/block/nvme: document 'mdts' nvme device parameter Klaus Jensen
  2021-02-22 20:29 ` [PATCH 2/3] hw/block/nvme: deduplicate bad mdts trace event Klaus Jensen
@ 2021-02-22 20:29 ` Klaus Jensen
  2021-02-22 21:00 ` [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Keith Busch
  3 siblings, 0 replies; 6+ messages in thread
From: Klaus Jensen @ 2021-02-22 20:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Klaus Jensen, Max Reitz, Klaus Jensen,
	Keith Busch

From: Klaus Jensen <k.jensen@samsung.com>

ZASL (Zone Append Size Limit) is defined exactly like MDTS (Maximum Data
Transfer Size), that is, it is a value in units of the minimum memory
page size (CAP.MPSMIN) and is reported as a power of two.

The 'mdts' nvme device parameter is specified as in the spec, but the
'zoned.append_size_limit' parameter is specified in bytes. This is
suboptimal for a number of reasons:

  1. It is just plain confusing wrt. the definition of mdts.
  2. There is a lot of complexity involved in validating the value; it
     must be a power of two, it should be larger than 4k, if it is zero
     we set it internally to mdts, but still report it as zero.
  3. While "hw/block/nvme: improve invalid zasl value reporting"
     slightly improved the handling of the parameter, the validation is
     still wrong; it does not depend on CC.MPS, it depends on
     CAP.MPSMIN. And we are not even checking that it is actually less
     than or equal to MDTS, which is kinda the *one* condition it must
     satisfy.

Fix this by defining zasl exactly like mdts and checking the one thing
that it must satisfy (that it is less than or equal to mdts). Also,
change the default value from 128KiB to 0 (aka, whatever mdts is).

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.h       |  4 +---
 hw/block/nvme.c       | 55 ++++++++++++-------------------------------
 hw/block/trace-events |  2 +-
 3 files changed, 17 insertions(+), 44 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index cb2b5175f1a1..f45ace0cff5b 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -20,7 +20,7 @@ typedef struct NvmeParams {
     uint32_t aer_max_queued;
     uint8_t  mdts;
     bool     use_intel_id;
-    uint32_t zasl_bs;
+    uint8_t  zasl;
     bool     legacy_cmb;
 } NvmeParams;
 
@@ -171,8 +171,6 @@ typedef struct NvmeCtrl {
     QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue;
     int         aer_queued;
 
-    uint8_t     zasl;
-
     NvmeSubsystem   *subsys;
 
     NvmeNamespace   namespace;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 25a7726ca05b..edd0b85c10ce 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -69,13 +69,11 @@
  *   as a power of two (2^n) and is in units of the minimum memory page size
  *   (CAP.MPSMIN). The default value is 7 (i.e. 512 KiB).
  *
- * - `zoned.append_size_limit`
- *   The maximum I/O size in bytes that is allowed in Zone Append command.
- *   The default is 128KiB. Since internally this this value is maintained as
- *   ZASL = log2(<maximum append size> / <page size>), some values assigned
- *   to this property may be rounded down and result in a lower maximum ZA
- *   data size being in effect. By setting this property to 0, users can make
- *   ZASL to be equal to MDTS. This property only affects zoned namespaces.
+ * - `zoned.zasl`
+ *   Indicates the maximum data transfer size for the Zone Append command. Like
+ *   `mdts`, the value is specified as a power of two (2^n) and is in units of
+ *   the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e.
+ *   defaulting to the value of `mdts`).
  *
  * nvme namespace device parameters
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2135,10 +2133,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
                 goto invalid;
             }
 
-            if (nvme_l2b(ns, nlb) > (n->page_size << n->zasl)) {
-                trace_pci_nvme_err_append_too_large(slba, nlb, n->zasl);
-                status = NVME_INVALID_FIELD;
-                goto invalid;
+            if (n->params.zasl && data_size > n->page_size << n->params.zasl) {
+                trace_pci_nvme_err_zasl(data_size);
+                return NVME_INVALID_FIELD | NVME_DNR;
             }
 
             slba = zone->w_ptr;
@@ -3212,9 +3209,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
     if (c->csi == NVME_CSI_NVM) {
         return nvme_rpt_empty_id_struct(n, req);
     } else if (c->csi == NVME_CSI_ZONED) {
-        if (n->params.zasl_bs) {
-            id.zasl = n->zasl;
-        }
+        id.zasl = n->params.zasl;
+
         return nvme_dma(n, (uint8_t *)&id, sizeof(id),
                         DMA_DIRECTION_FROM_DEVICE, req);
     }
@@ -4088,19 +4084,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)
     nvme_init_sq(&n->admin_sq, n, n->bar.asq, 0, 0,
                  NVME_AQA_ASQS(n->bar.aqa) + 1);
 
-    if (!n->params.zasl_bs) {
-        n->zasl = n->params.mdts;
-    } else {
-        if (n->params.zasl_bs < n->page_size) {
-            NVME_GUEST_ERR(pci_nvme_err_startfail_zasl_too_small,
-                           "Zone Append Size Limit (ZASL) of %d bytes is too "
-                           "small; must be at least %d bytes",
-                           n->params.zasl_bs, n->page_size);
-            return -1;
-        }
-        n->zasl = 31 - clz32(n->params.zasl_bs / n->page_size);
-    }
-
     nvme_set_timestamp(n, 0ULL);
 
     QTAILQ_INIT(&n->aer_queue);
@@ -4609,17 +4592,10 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
         host_memory_backend_set_mapped(n->pmr.dev, true);
     }
 
-    if (n->params.zasl_bs) {
-        if (!is_power_of_2(n->params.zasl_bs)) {
-            error_setg(errp, "zone append size limit has to be a power of 2");
-            return;
-        }
-
-        if (n->params.zasl_bs < 4096) {
-            error_setg(errp, "zone append size limit must be at least "
-                       "4096 bytes");
-            return;
-        }
+    if (n->params.zasl > n->params.mdts) {
+        error_setg(errp, "zoned.zasl (Zone Append Size Limit) must be less "
+                   "than or equal to mdts (Maximum Data Transfer Size)");
+        return;
     }
 }
 
@@ -4988,8 +4964,7 @@ static Property nvme_props[] = {
     DEFINE_PROP_UINT8("mdts", NvmeCtrl, params.mdts, 7),
     DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
     DEFINE_PROP_BOOL("legacy-cmb", NvmeCtrl, params.legacy_cmb, false),
-    DEFINE_PROP_SIZE32("zoned.append_size_limit", NvmeCtrl, params.zasl_bs,
-                       NVME_DEFAULT_MAX_ZA_SIZE),
+    DEFINE_PROP_UINT8("zoned.zasl", NvmeCtrl, params.zasl, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/trace-events b/hw/block/trace-events
index e1a85661cf3f..25ba51ea5405 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -115,6 +115,7 @@ pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", sl
 
 # nvme traces for error conditions
 pci_nvme_err_mdts(size_t len) "len %zu"
+pci_nvme_err_zasl(size_t len) "len %zu"
 pci_nvme_err_req_status(uint16_t cid, uint32_t nsid, uint16_t status, uint8_t opc) "cid %"PRIu16" nsid %"PRIu32" status 0x%"PRIx16" opc 0x%"PRIx8""
 pci_nvme_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
 pci_nvme_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
@@ -144,7 +145,6 @@ pci_nvme_err_zone_boundary(uint64_t slba, uint32_t nlb, uint64_t zcap) "lba 0x%"
 pci_nvme_err_zone_invalid_write(uint64_t slba, uint64_t wp) "lba 0x%"PRIx64" wp 0x%"PRIx64""
 pci_nvme_err_zone_write_not_ok(uint64_t slba, uint32_t nlb, uint16_t status) "slba=%"PRIu64", nlb=%"PRIu32", status=0x%"PRIx16""
 pci_nvme_err_zone_read_not_ok(uint64_t slba, uint32_t nlb, uint16_t status) "slba=%"PRIu64", nlb=%"PRIu32", status=0x%"PRIx16""
-pci_nvme_err_append_too_large(uint64_t slba, uint32_t nlb, uint8_t zasl) "slba=%"PRIu64", nlb=%"PRIu32", zasl=%"PRIu8""
 pci_nvme_err_insuff_active_res(uint32_t max_active) "max_active=%"PRIu32" zone limit exceeded"
 pci_nvme_err_insuff_open_res(uint32_t max_open) "max_open=%"PRIu32" zone limit exceeded"
 pci_nvme_err_zd_extension_map_error(uint32_t zone_idx) "can't map descriptor extension for zone_idx=%"PRIu32""
-- 
2.30.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup
  2021-02-22 20:29 [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Klaus Jensen
                   ` (2 preceding siblings ...)
  2021-02-22 20:29 ` [PATCH 3/3] hw/block/nvme: align zoned.zasl with mdts Klaus Jensen
@ 2021-02-22 21:00 ` Keith Busch
  2021-02-25 10:43   ` Klaus Jensen
  3 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2021-02-22 21:00 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Kevin Wolf, Klaus Jensen, qemu-devel, qemu-block, Max Reitz

These look good.

Reviewed-by: Keith Busch <kbusch@kernel.org>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup
  2021-02-22 21:00 ` [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Keith Busch
@ 2021-02-25 10:43   ` Klaus Jensen
  0 siblings, 0 replies; 6+ messages in thread
From: Klaus Jensen @ 2021-02-25 10:43 UTC (permalink / raw)
  To: Keith Busch; +Cc: Kevin Wolf, Klaus Jensen, qemu-devel, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 142 bytes --]

On Feb 23 06:00, Keith Busch wrote:
> These look good.
> 
> Reviewed-by: Keith Busch <kbusch@kernel.org>

Thanks, applied to nvme-next.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-02-25 10:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-22 20:29 [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Klaus Jensen
2021-02-22 20:29 ` [PATCH 1/3] hw/block/nvme: document 'mdts' nvme device parameter Klaus Jensen
2021-02-22 20:29 ` [PATCH 2/3] hw/block/nvme: deduplicate bad mdts trace event Klaus Jensen
2021-02-22 20:29 ` [PATCH 3/3] hw/block/nvme: align zoned.zasl with mdts Klaus Jensen
2021-02-22 21:00 ` [PATCH 0/3] hw/block/nvme: mdts/zasl cleanup Keith Busch
2021-02-25 10:43   ` Klaus Jensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.