* [PATCH v1] nvme: allow cmb and pmr emulation on same device
@ 2020-06-05 18:10 Andrzej Jakowski
  2020-06-05 18:10 ` [PATCH v1 1/2] nvme: indicate CMB support through controller capabilities register Andrzej Jakowski
  2020-06-05 18:10 ` [PATCH v1 2/2] nvme: allow cmb and pmr to be enabled on same device Andrzej Jakowski
  0 siblings, 2 replies; 6+ messages in thread
From: Andrzej Jakowski @ 2020-06-05 18:10 UTC
  To: kbusch, kwolf, mreitz; +Cc: qemu-devel, qemu-block

This patch series does the following:
 - Fixes a problem where the CMBS bit was not set in the controller
   capabilities register, so CMB support was not correctly advertised
   to the guest. This is a resend of a patch that was submitted and
   reviewed by Klaus [1].
 - Introduces BAR4 sharing between the MSI-X vectors and the CMB, which
   allows CMB and PMR to be emulated on the same device. This extension
   was suggested by Keith [2].

[1]: https://lore.kernel.org/qemu-devel/20200408055607.g2ii7gwqbnv6cd3w@apples.localdomain/
[2]: https://lore.kernel.org/qemu-devel/20200330165518.GA8234@redsun51.ssa.fujisawa.hgst.com/
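
With both patches applied, CMB and PMR can be enabled together on one
device. A minimal sketch of such an invocation (IDs, paths and sizes
are illustrative):

    -object memory-backend-file,id=pmr0,share=on,mem-path=/tmp/pmr,size=16M \
    -drive id=nvm0,file=nvme.img,format=raw,if=none \
    -device nvme,drive=nvm0,serial=deadbeef,cmb_size_mb=64,pmrdev=pmr0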




* [PATCH v1 1/2] nvme: indicate CMB support through controller capabilities register
  2020-06-05 18:10 [PATCH v1] nvme: allow cmb and pmr emulation on same device Andrzej Jakowski
@ 2020-06-05 18:10 ` Andrzej Jakowski
  2020-06-05 18:10 ` [PATCH v1 2/2] nvme: allow cmb and pmr to be enabled on same device Andrzej Jakowski
  1 sibling, 0 replies; 6+ messages in thread
From: Andrzej Jakowski @ 2020-06-05 18:10 UTC
  To: kbusch, kwolf, mreitz
  Cc: Klaus Jensen, Andrzej Jakowski, qemu-devel, qemu-block

This patch sets the CMBS bit in the controller capabilities register
when the user configures the NVMe device with CMB support, so the
capabilities are correctly reported to the guest OS.
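
For context, CMBS is bit 57 of the 64-bit CAP register, as the mask and
shift added below encode. A guest could probe it roughly as follows (a
sketch; the register accessor is illustrative, not part of this patch):

    /* read the 64-bit Controller Capabilities register at offset 0x0 */
    uint64_t cap = nvme_reg_read64(regs, 0x0);  /* hypothetical accessor */
    bool cmb_supported = (cap >> 57) & 0x1;     /* CAP.CMBS */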

Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/block/nvme.c      | 2 ++
 include/block/nvme.h | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a21eeca2fb..f0b45704be 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1449,6 +1449,8 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     n->bar.intmc = n->bar.intms = 0;
 
     if (n->cmb_size_mb) {
+        /* Controller capabilities */
+        NVME_CAP_SET_CMBS(n->bar.cap, 1);
 
         NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
         NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 5525c8e343..b48349dbd0 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -35,6 +35,7 @@ enum NvmeCapShift {
     CAP_MPSMIN_SHIFT   = 48,
     CAP_MPSMAX_SHIFT   = 52,
     CAP_PMR_SHIFT      = 56,
+    CAP_CMB_SHIFT      = 57,
 };
 
 enum NvmeCapMask {
@@ -48,6 +49,7 @@ enum NvmeCapMask {
     CAP_MPSMIN_MASK    = 0xf,
     CAP_MPSMAX_MASK    = 0xf,
     CAP_PMR_MASK       = 0x1,
+    CAP_CMB_MASK       = 0x1,
 };
 
 #define NVME_CAP_MQES(cap)  (((cap) >> CAP_MQES_SHIFT)   & CAP_MQES_MASK)
@@ -78,8 +80,10 @@ enum NvmeCapMask {
                                                            << CAP_MPSMIN_SHIFT)
 #define NVME_CAP_SET_MPSMAX(cap, val) (cap |= (uint64_t)(val & CAP_MPSMAX_MASK)\
                                                             << CAP_MPSMAX_SHIFT)
-#define NVME_CAP_SET_PMRS(cap, val) (cap |= (uint64_t)(val & CAP_PMR_MASK)\
+#define NVME_CAP_SET_PMRS(cap, val)   (cap |= (uint64_t)(val & CAP_PMR_MASK)   \
                                                             << CAP_PMR_SHIFT)
+#define NVME_CAP_SET_CMBS(cap, val)   (cap |= (uint64_t)(val & CAP_CMB_MASK)   \
+                                                           << CAP_CMB_SHIFT)
 
 enum NvmeCcShift {
     CC_EN_SHIFT     = 0,
-- 
2.21.1




* [PATCH v1 2/2] nvme: allow cmb and pmr to be enabled on same device
  2020-06-05 18:10 [PATCH v1] nvme: allow cmb and pmr emulation on same device Andrzej Jakowski
  2020-06-05 18:10 ` [PATCH v1 1/2] nvme: indicate CMB support through controller capabilities register Andrzej Jakowski
@ 2020-06-05 18:10 ` Andrzej Jakowski
  2020-06-08  8:08   ` Klaus Jensen
  1 sibling, 1 reply; 6+ messages in thread
From: Andrzej Jakowski @ 2020-06-05 18:10 UTC
  To: kbusch, kwolf, mreitz; +Cc: Andrzej Jakowski, qemu-devel, qemu-block

So far it has not been possible to have CMB and PMR emulated on the same
device, because BAR2 was used exclusively for either the PMR or the CMB.
This patch places the CMB at an offset within BAR4 so that it does not
conflict with the MSI-X vectors.
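
The resulting BAR layout is then roughly as follows (a sketch; note that
with SZU=2 the CMB granularity is 1 << (12 + 4 * 2) = 1 MiB, so e.g.
cmb_size_mb=64 yields a 64 MiB CMB at a 1 MiB-aligned offset):

    BAR0/1  controller registers (64-bit)
    BAR2/3  PMR, when pmrdev= is configured (64-bit)
    BAR4/5  MSI-X table and PBA at offset 0, CMB at an aligned offset
            above them (64-bit, prefetchable)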

Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
---
 hw/block/nvme.c      | 127 +++++++++++++++++++++++++++++--------------
 hw/block/nvme.h      |   3 +-
 include/block/nvme.h |   4 +-
 3 files changed, 91 insertions(+), 43 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f0b45704be..353cf20e0a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -22,12 +22,12 @@
  *              [pmrdev=<mem_backend_file_id>,] \
  *              num_queues=<N[optional]>
  *
- * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
- * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
+ * Note cmb_size_mb denotes the size of the CMB in MB. When configured, the
+ * CMB is assumed to reside in BAR4 at a certain offset, because BAR4 also
+ * holds the MSI-X table, which lives at offset 0 of BAR4.
  *
- * cmb_size_mb= and pmrdev= options are mutually exclusive due to limitation
- * in available BAR's. cmb_size_mb= will take precedence over pmrdev= when
- * both provided.
+ * pmrdev is assumed to reside in BAR2. When configured, it consumes the
+ * whole of BAR2 exclusively.
  * Enabling pmr emulation can be achieved by pointing to memory-backend-file.
  * For example:
  * -object memory-backend-file,id=<mem_id>,share=on,mem-path=<file_path>, \
@@ -64,9 +64,10 @@ static void nvme_process_sq(void *opaque);
 
 static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
-    if (n->cmbsz && addr >= n->ctrl_mem.addr &&
-                addr < (n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size))) {
-        memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
+    hwaddr cmb_addr = n->bar4.addr + n->cmb_offset;
+    if (n->cmbsz && addr >= cmb_addr &&
+        (addr + size) <= (cmb_addr + NVME_CMBSZ_GETSIZE(n->bar.cmbsz))) {
+        memcpy(buf, (void *)&n->cmbuf[addr - cmb_addr], size);
     } else {
         pci_dma_read(&n->parent_obj, addr, buf, size);
     }
@@ -152,17 +153,18 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
                              uint64_t prp2, uint32_t len, NvmeCtrl *n)
 {
     hwaddr trans_len = n->page_size - (prp1 % n->page_size);
+    hwaddr cmb_addr = n->bar4.addr + n->cmb_offset;
     trans_len = MIN(len, trans_len);
     int num_prps = (len >> n->page_bits) + 1;
 
     if (unlikely(!prp1)) {
         trace_nvme_err_invalid_prp();
         return NVME_INVALID_FIELD | NVME_DNR;
-    } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
-               prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
+    } else if (n->cmbsz && prp1 >= cmb_addr &&
+               prp1 < cmb_addr + int128_get64(n->bar4.size)) {
         qsg->nsg = 0;
         qemu_iovec_init(iov, num_prps);
-        qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
+        qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - cmb_addr], trans_len);
     } else {
         pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
         qemu_sglist_add(qsg, prp1, trans_len);
@@ -207,7 +209,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
                 if (qsg->nsg){
                     qemu_sglist_add(qsg, prp_ent, trans_len);
                 } else {
-                    qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - n->ctrl_mem.addr], trans_len);
+                    qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - cmb_addr],
+                                   trans_len);
                 }
                 len -= trans_len;
                 i++;
@@ -220,7 +223,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
             if (qsg->nsg) {
                 qemu_sglist_add(qsg, prp2, len);
             } else {
-                qemu_iovec_add(iov, (void *)&n->cmbuf[prp2 - n->ctrl_mem.addr], trans_len);
+                qemu_iovec_add(iov, (void *)&n->cmbuf[prp2 - cmb_addr],
+                               trans_len);
             }
         }
     }
@@ -1342,6 +1346,71 @@ static const MemoryRegionOps nvme_cmb_ops = {
     },
 };
 
+#define NVME_MSIX_BIR (4)
+static void nvme_bar4_init(PCIDevice *pci_dev)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+    int status;
+    uint64_t bar_size = 4096;
+    uint32_t nvme_pba_offset = bar_size / 2;
+    uint32_t nvme_pba_size = QEMU_ALIGN_UP(n->num_queues, 64) / 8;
+    uint32_t cmb_size_units;
+
+    if (n->num_queues * PCI_MSIX_ENTRY_SIZE > nvme_pba_offset) {
+        nvme_pba_offset = n->num_queues * PCI_MSIX_ENTRY_SIZE;
+    }
+
+    if (nvme_pba_offset + nvme_pba_size > 4096) {
+        bar_size = nvme_pba_offset + nvme_pba_size;
+    }
+
+    if (n->cmb_size_mb) {
+        /* Controller capabilities */
+        NVME_CAP_SET_CMBS(n->bar.cap, 1);
+
+        NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
+        NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
+        NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
+        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
+        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
+        NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
+        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->cmb_size_mb);
+
+        cmb_size_units = NVME_CMBSZ_GETSIZEUNITS(n->bar.cmbsz);
+        n->cmb_offset = QEMU_ALIGN_UP(bar_size, cmb_size_units);
+
+        NVME_CMBLOC_SET_BIR(n->bar.cmbloc, NVME_MSIX_BIR);
+        NVME_CMBLOC_SET_OFST(n->bar.cmbloc, n->cmb_offset / cmb_size_units);
+
+        n->cmbloc = n->bar.cmbloc;
+        n->cmbsz = n->bar.cmbsz;
+
+        n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+
+        bar_size += n->cmb_offset;
+        bar_size += NVME_CMBSZ_GETSIZE(n->bar.cmbsz);
+    }
+
+    bar_size = pow2ceil(bar_size);
+
+    memory_region_init_io(&n->bar4, OBJECT(n), &nvme_cmb_ops, n,
+                          "nvme-bar4", bar_size);
+
+    status = msix_init(pci_dev, n->num_queues,
+              &n->bar4, NVME_MSIX_BIR, 0,
+              &n->bar4, NVME_MSIX_BIR, nvme_pba_offset,
+              0, NULL);
+
+    if (status) {
+        return;
+    }
+
+    pci_register_bar(pci_dev, NVME_MSIX_BIR,
+             PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64 |
+             PCI_BASE_ADDRESS_MEM_PREFETCH, &n->bar4);
+
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
     NvmeCtrl *n = NVME(pci_dev);
@@ -1372,7 +1441,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         return;
     }
 
-    if (!n->cmb_size_mb && n->pmrdev) {
+    if (n->pmrdev) {
         if (host_memory_backend_is_mapped(n->pmrdev)) {
             char *path = object_get_canonical_path_component(OBJECT(n->pmrdev));
             error_setg(errp, "can't use already busy memdev: %s", path);
@@ -1413,7 +1482,6 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     pci_register_bar(pci_dev, 0,
         PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
         &n->iomem);
-    msix_init_exclusive_bar(pci_dev, n->num_queues, 4, NULL);
 
     id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
     id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -1445,35 +1513,12 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     NVME_CAP_SET_CSS(n->bar.cap, 1);
     NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
 
+    nvme_bar4_init(pci_dev);
+
     n->bar.vs = 0x00010200;
     n->bar.intmc = n->bar.intms = 0;
 
-    if (n->cmb_size_mb) {
-        /* Controller capabilities */
-        NVME_CAP_SET_CMBS(n->bar.cap, 1);
-
-        NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
-        NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
-
-        NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
-        NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
-        NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
-        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
-        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
-        NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
-        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->cmb_size_mb);
-
-        n->cmbloc = n->bar.cmbloc;
-        n->cmbsz = n->bar.cmbsz;
-
-        n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
-        memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
-                              "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
-        pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
-            PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64 |
-            PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
-
-    } else if (n->pmrdev) {
+    if (n->pmrdev) {
         /* Controller Capabilities register */
         NVME_CAP_SET_PMRS(n->bar.cap, 1);
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 6520a9f0be..b983fc0005 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -60,7 +60,7 @@ typedef struct NvmeNamespace {
 typedef struct NvmeCtrl {
     PCIDevice    parent_obj;
     MemoryRegion iomem;
-    MemoryRegion ctrl_mem;
+    MemoryRegion bar4;
     NvmeBar      bar;
     BlockConf    conf;
 
@@ -77,6 +77,7 @@ typedef struct NvmeCtrl {
     uint32_t    cmb_size_mb;
     uint32_t    cmbsz;
     uint32_t    cmbloc;
+    uint32_t    cmb_offset;
     uint8_t     *cmbuf;
     uint64_t    irq_status;
     uint64_t    host_timestamp;                 /* Timestamp sent by the host */
diff --git a/include/block/nvme.h b/include/block/nvme.h
index b48349dbd0..47379e1a27 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -216,9 +216,11 @@ enum NvmeCmbszMask {
     (cmbsz |= (uint64_t)(val & CMBSZ_SZU_MASK) << CMBSZ_SZU_SHIFT)
 #define NVME_CMBSZ_SET_SZ(cmbsz, val)    \
     (cmbsz |= (uint64_t)(val & CMBSZ_SZ_MASK) << CMBSZ_SZ_SHIFT)
+#define NVME_CMBSZ_GETSIZEUNITS(cmbsz) \
+    (1 << (12 + 4 * NVME_CMBSZ_SZU(cmbsz)))
 
 #define NVME_CMBSZ_GETSIZE(cmbsz) \
-    (NVME_CMBSZ_SZ(cmbsz) * (1 << (12 + 4 * NVME_CMBSZ_SZU(cmbsz))))
+    (NVME_CMBSZ_SZ(cmbsz) * NVME_CMBSZ_GETSIZEUNITS(cmbsz))
 
 enum NvmePmrcapShift {
     PMRCAP_RDS_SHIFT      = 3,
-- 
2.21.1




* Re: [PATCH v1 2/2] nvme: allow cmb and pmr to be enabled on same device
  2020-06-05 18:10 ` [PATCH v1 2/2] nvme: allow cmb and pmr to be enabled on same device Andrzej Jakowski
@ 2020-06-08  8:08   ` Klaus Jensen
  2020-06-08 19:44     ` Andrzej Jakowski
  0 siblings, 1 reply; 6+ messages in thread
From: Klaus Jensen @ 2020-06-08  8:08 UTC
  To: Andrzej Jakowski; +Cc: kbusch, kwolf, qemu-devel, qemu-block, mreitz

On Jun  5 11:10, Andrzej Jakowski wrote:
> So far it has not been possible to have CMB and PMR emulated on the same
> device, because BAR2 was used exclusively for either the PMR or the CMB.
> This patch places the CMB at an offset within BAR4 so that it does not
> conflict with the MSI-X vectors.
> 

Hi Andrzej,

Thanks for doing this, it's a nice addition!

Though, I would prefer that the table and PBA were located in BAR0,
keeping BAR4 for exclusive CMB use. I'm no expert on this, but is it OK
to have the table and PBA in prefetchable memory? Having them "together"
with the other controller-level configuration memory just feels more
natural to me, but I'm not gonna put my foot down.

Using BAR0 would also slightly simplify the patch since no changes would
be required for the CMB path.

Also, can you rebase this on Kevin's block branch? There are a bunch of
refactoring patches that change the realization code, so this patch
doesn't apply at all.

> Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
> ---
>  hw/block/nvme.c      | 127 +++++++++++++++++++++++++++++--------------
>  hw/block/nvme.h      |   3 +-
>  include/block/nvme.h |   4 +-
>  3 files changed, 91 insertions(+), 43 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index f0b45704be..353cf20e0a 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -22,12 +22,12 @@
>   *              [pmrdev=<mem_backend_file_id>,] \
>   *              num_queues=<N[optional]>
>   *
> - * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
> - * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> + * Note cmb_size_mb denotes the size of the CMB in MB. When configured, the
> + * CMB is assumed to reside in BAR4 at a certain offset, because BAR4 also
> + * holds the MSI-X table, which lives at offset 0 of BAR4.
>   *
> - * cmb_size_mb= and pmrdev= options are mutually exclusive due to limitation
> - * in available BAR's. cmb_size_mb= will take precedence over pmrdev= when
> - * both provided.
> + * pmrdev is assumed to reside in BAR2. When configured, it consumes the
> + * whole of BAR2 exclusively.

Actually it uses both BAR2 and BAR3, since it's 64 bits.

> @@ -1342,6 +1346,71 @@ static const MemoryRegionOps nvme_cmb_ops = {
>      },
>  };
>  
> +#define NVME_MSIX_BIR (4)
> +static void nvme_bar4_init(PCIDevice *pci_dev)
> +{
> +    NvmeCtrl *n = NVME(pci_dev);
> +    int status;
> +    uint64_t bar_size = 4096;
> +    uint32_t nvme_pba_offset = bar_size / 2;
> +    uint32_t nvme_pba_size = QEMU_ALIGN_UP(n->num_queues, 64) / 8;
> +    uint32_t cmb_size_units;
> +
> +    if (n->num_queues * PCI_MSIX_ENTRY_SIZE > nvme_pba_offset) {
> +        nvme_pba_offset = n->num_queues * PCI_MSIX_ENTRY_SIZE;
> +    }
> +
> +    if (nvme_pba_offset + nvme_pba_size > 4096) {
> +        bar_size = nvme_pba_offset + nvme_pba_size;
> +    }
> +

This is migration compatibility stuff that is not needed because the
nvme device is unmigratable anyway.




* Re: [PATCH v1 2/2] nvme: allow cmb and pmr to be enabled on same device
  2020-06-08  8:08   ` Klaus Jensen
@ 2020-06-08 19:44     ` Andrzej Jakowski
  2020-06-09  9:49       ` Klaus Jensen
  0 siblings, 1 reply; 6+ messages in thread
From: Andrzej Jakowski @ 2020-06-08 19:44 UTC
  To: Klaus Jensen; +Cc: kbusch, kwolf, qemu-devel, qemu-block, mreitz

On 6/8/20 1:08 AM, Klaus Jensen wrote:
> On Jun  5 11:10, Andrzej Jakowski wrote:
>> So far it has not been possible to have CMB and PMR emulated on the same
>> device, because BAR2 was used exclusively for either the PMR or the CMB.
>> This patch places the CMB at an offset within BAR4 so that it does not
>> conflict with the MSI-X vectors.
>>
> 
> Hi Andrzej,
> 
> Thanks for doing this, it's a nice addition!
> 
> Though, I would prefer that the table and PBA were located in BAR0,
> keeping BAR4 for exclusive CMB use. I'm no expert on this, but is it OK
> to have the table and PBA in prefetchable memory? Having them "together"
> with the other controller-level configuration memory just feels more
> natural to me, but I'm not gonna put my foot down.
Hi Klaus,

Thx for your feedback!
I don't think it matters whether the MSI-X table is in prefetchable or
non-prefetchable memory.
My understanding is that the spec allows the MSI-X table and PBA to be in
any BAR, at any offset. I understand your preference, but since this is
not in violation of the spec, why don't we leave it as-is?
Does anybody know what the typical approach is for real devices?
> 
> Using BAR0 would also slightly simplify the patch since no changes would
> be required for the CMB path.
> 
> Also, can you rebase this on Kevin's block branch? There are a bunch of
> refactoring patches that change the realization code, so this patch
> doesn't apply at all.
Yep, will rebase it.
> 
>> Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
>> ---
>>  hw/block/nvme.c      | 127 +++++++++++++++++++++++++++++--------------
>>  hw/block/nvme.h      |   3 +-
>>  include/block/nvme.h |   4 +-
>>  3 files changed, 91 insertions(+), 43 deletions(-)
>>
>> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
>> index f0b45704be..353cf20e0a 100644
>> --- a/hw/block/nvme.c
>> +++ b/hw/block/nvme.c
>> @@ -22,12 +22,12 @@
>>   *              [pmrdev=<mem_backend_file_id>,] \
>>   *              num_queues=<N[optional]>
>>   *
>> - * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>> - * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
>> + * Note cmb_size_mb denotes the size of the CMB in MB. When configured, the
>> + * CMB is assumed to reside in BAR4 at a certain offset, because BAR4 also
>> + * holds the MSI-X table, which lives at offset 0 of BAR4.
>>   *
>> - * cmb_size_mb= and pmrdev= options are mutually exclusive due to limitation
>> - * in available BAR's. cmb_size_mb= will take precedence over pmrdev= when
>> - * both provided.
>> + * pmrdev is assumed to reside in BAR2. When configured, it consumes the
>> + * whole of BAR2 exclusively.
> 
> Actually it uses both BAR2 and BAR3 since its 64 bits.
Correct. That's what I implied here without spelling it out. I can extend
the comment to add that information.
> 
>> @@ -1342,6 +1346,71 @@ static const MemoryRegionOps nvme_cmb_ops = {
>>      },
>>  };
>>  
>> +#define NVME_MSIX_BIR (4)
>> +static void nvme_bar4_init(PCIDevice *pci_dev)
>> +{
>> +    NvmeCtrl *n = NVME(pci_dev);
>> +    int status;
>> +    uint64_t bar_size = 4096;
>> +    uint32_t nvme_pba_offset = bar_size / 2;
>> +    uint32_t nvme_pba_size = QEMU_ALIGN_UP(n->num_queues, 64) / 8;
>> +    uint32_t cmb_size_units;
>> +
>> +    if (n->num_queues * PCI_MSIX_ENTRY_SIZE > nvme_pba_offset) {
>> +        nvme_pba_offset = n->num_queues * PCI_MSIX_ENTRY_SIZE;
>> +    }
>> +
>> +    if (nvme_pba_offset + nvme_pba_size > 4096) {
>> +        bar_size = nvme_pba_offset + nvme_pba_size;
>> +    }
>> +
> 
> This is migration compatibility stuff that is not needed because the
> nvme device is unmigratable anyway.
I don't understand that comment. Could you please explain more?
 




* Re: [PATCH v1 2/2] nvme: allow cmb and pmr to be enabled on same device
  2020-06-08 19:44     ` Andrzej Jakowski
@ 2020-06-09  9:49       ` Klaus Jensen
  0 siblings, 0 replies; 6+ messages in thread
From: Klaus Jensen @ 2020-06-09  9:49 UTC
  To: Andrzej Jakowski; +Cc: kbusch, kwolf, qemu-devel, qemu-block, mreitz

On Jun  8 12:44, Andrzej Jakowski wrote:
> On 6/8/20 1:08 AM, Klaus Jensen wrote:
> > On Jun  5 11:10, Andrzej Jakowski wrote:
> >> So far it has not been possible to have CMB and PMR emulated on the same
> >> device, because BAR2 was used exclusively for either the PMR or the CMB.
> >> This patch places the CMB at an offset within BAR4 so that it does not
> >> conflict with the MSI-X vectors.
> >>
> > 
> > Hi Andrzej,
> > 
> > Thanks for doing this, it's a nice addition!
> > 
> > Though, I would prefer that the table and PBA were located in BAR0,
> > keeping BAR4 for exclusive CMB use. I'm no expert on this, but is it OK
> > to have the table and PBA in prefetchable memory? Having them "together"
> > with the other controller-level configuration memory just feels more
> > natural to me, but I'm not gonna put my foot down.
> Hi Klaus,
> 
> Thx for your feedback!
> I don't think it matters whether the MSI-X table is in prefetchable or
> non-prefetchable memory.
> My understanding is that the spec allows the MSI-X table and PBA to be in
> any BAR, at any offset. I understand your preference, but since this is
> not in violation of the spec, why don't we leave it as-is?
> Does anybody know what the typical approach is for real devices?

On the SSD in my system, lspci -vv shows:

        Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=4 offset=00000000

So, the table is in BAR0 but the PBA is in BAR4. Oh well.

> >> @@ -1342,6 +1346,71 @@ static const MemoryRegionOps nvme_cmb_ops = {
> >>      },
> >>  };
> >>  
> >> +#define NVME_MSIX_BIR (4)
> >> +static void nvme_bar4_init(PCIDevice *pci_dev)
> >> +{
> >> +    NvmeCtrl *n = NVME(pci_dev);
> >> +    int status;
> >> +    uint64_t bar_size = 4096;
> >> +    uint32_t nvme_pba_offset = bar_size / 2;
> >> +    uint32_t nvme_pba_size = QEMU_ALIGN_UP(n->num_queues, 64) / 8;
> >> +    uint32_t cmb_size_units;
> >> +
> >> +    if (n->num_queues * PCI_MSIX_ENTRY_SIZE > nvme_pba_offset) {
> >> +        nvme_pba_offset = n->num_queues * PCI_MSIX_ENTRY_SIZE;
> >> +    }
> >> +
> >> +    if (nvme_pba_offset + nvme_pba_size > 4096) {
> >> +        bar_size = nvme_pba_offset + nvme_pba_size;
> >> +    }
> >> +
> > 
> > This is migration compatibility stuff that is not needed because the
> > nvme device is unmigratable anyway.
> I don't understand that comment. Could you please explain more?
>  
> 

It looks like you cribbed this code from msix_init_exclusive_bar(). All
that fuss about the PBA starting in the upper half (nvme_pba_offset =
bar_size / 2) for nentries less than or equal to 128 is for migration
compatibility (migrating a VM from an old version of QEMU to a newer
one). It is something that was added when the location of the MSI-X table
and PBA became dynamic (it used to be static). But since the nvme device
is marked as unmigratable (VMStateDescription.unmigratable = 1), I
believe these special cases are irrelevant.
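
In other words, without those special cases the layout could be computed
directly, along these lines (a sketch, not code from the patch; the
variable names are illustrative):

    uint32_t table_size = n->num_queues * PCI_MSIX_ENTRY_SIZE;
    uint32_t pba_offset = QEMU_ALIGN_UP(table_size, 8);  /* PBA is QWORD aligned */
    uint32_t pba_size   = QEMU_ALIGN_UP(n->num_queues, 64) / 8;
    uint64_t bar_size   = pba_offset + pba_size;  /* table at 0, PBA right after */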




Thread overview: 6+ messages
2020-06-05 18:10 [PATCH v1] nvme: allow cmb and pmr emulation on same device Andrzej Jakowski
2020-06-05 18:10 ` [PATCH v1 1/2] nvme: indicate CMB support through controller capabilities register Andrzej Jakowski
2020-06-05 18:10 ` [PATCH v1 2/2] nvme: allow cmb and pmr to be enabled on same device Andrzej Jakowski
2020-06-08  8:08   ` Klaus Jensen
2020-06-08 19:44     ` Andrzej Jakowski
2020-06-09  9:49       ` Klaus Jensen
