* [Qemu-devel] [PATCH v2 0/5] Few fixes for userspace NVME driver
@ 2019-04-17 19:53 ` Maxim Levitsky
  0 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-04-17 19:53 UTC (permalink / raw)
  To: qemu-block; +Cc: Maxim Levitsky, Fam Zheng, Kevin Wolf, Max Reitz, qemu-devel

Hi!
These are a few assorted fixes and features for the userspace
NVMe driver.

Tested on my laptop with my Samsung X5 Thunderbolt drive, which
happens to have 4K sectors, including the discard and write zeros
support.

Also included are a bunch of fixes that had been sitting in my queue
from the period when I developed the nvme-mdev driver.

Best regards,
        Maxim Levitsky

Maxim Levitsky (5):
  block/nvme: don't flip CQ phase bits
  block/nvme: fix doorbell stride
  block/nvme: support larger than 512 bytes sector devices
  block/nvme: add support for write zeros
  block/nvme: add support for discard

 block/nvme.c         | 193 +++++++++++++++++++++++++++++++++++++++++--
 block/trace-events   |   3 +
 include/block/nvme.h |  19 ++++-
 3 files changed, 205 insertions(+), 10 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits
@ 2019-04-17 19:53   ` Maxim Levitsky
  0 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-04-17 19:53 UTC (permalink / raw)
  To: qemu-block; +Cc: Maxim Levitsky, Fam Zheng, Kevin Wolf, Max Reitz, qemu-devel

Phase bits are only set by the hardware to indicate new completions
and not by the device driver.
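
For context, a minimal host-side sketch of the Phase Tag convention this
relies on (illustrative only, not the QEMU code; the NvmeCqe type and
le16_to_cpu are the ones already used in block/nvme.c):

    /* The controller inverts the Phase Tag (bit 0 of the status field)
     * each time it posts a completion; the host only compares it against
     * its own expected phase and must never write it back. */
    static bool cqe_is_new(const NvmeCqe *c, int host_phase)
    {
        return (le16_to_cpu(c->status) & 0x1) == host_phase;
    }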

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/nvme.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 0684bbd077..2d208000df 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
         qemu_mutex_lock(&q->lock);
         c->cid = cpu_to_le16(0);
         q->inflight--;
-        /* Flip Phase Tag bit. */
-        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
         progress = true;
     }
     if (progress) {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v2 2/5] block/nvme: fix doorbell stride
@ 2019-04-17 19:53   ` Maxim Levitsky
  0 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-04-17 19:53 UTC (permalink / raw)
  To: qemu-block; +Cc: Maxim Levitsky, Fam Zheng, Kevin Wolf, Max Reitz, qemu-devel

Fix the math involving a non-standard doorbell stride.
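
For reference, the spec layout the fix follows (a sketch; doorbell_scale
is assumed to be derived from CAP.DSTRD, as block/nvme.c does elsewhere):
the completion queue head doorbell of queue idx sits at 32-bit word index
(idx * 2 + 1) * doorbell_scale, so the stride has to scale the whole
index, not only the submission-queue part:

    static inline unsigned nvme_cq_doorbell_index(unsigned idx,
                                                  unsigned doorbell_scale)
    {
        return (idx * 2 + 1) * doorbell_scale;
    }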

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 2d208000df..208242cf1f 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -216,7 +216,7 @@ static NVMeQueuePair *nvme_create_queue_pair(BlockDriverState *bs,
         error_propagate(errp, local_err);
         goto fail;
     }
-    q->cq.doorbell = &s->regs->doorbells[idx * 2 * s->doorbell_scale + 1];
+    q->cq.doorbell = &s->regs->doorbells[(idx * 2 + 1) * s->doorbell_scale];
 
     return q;
 fail:
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v2 3/5] block/nvme: support larger than 512 bytes sector devices
@ 2019-04-17 19:53   ` Maxim Levitsky
  0 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-04-17 19:53 UTC (permalink / raw)
  To: qemu-block; +Cc: Maxim Levitsky, Fam Zheng, Kevin Wolf, Max Reitz, qemu-devel

Currently the driver hardcodes the sector size to 512
and doesn't check the underlying device.

Also fail if the underlying NVMe device is formatted with metadata,
as this needs special support.
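
A sketch of the lookup this adds (using the NvmeIdNs/NvmeLBAF layout from
include/block/nvme.h): the sector size comes from the currently formatted
LBA format, e.g. ds = 12 yields 1 << 12 = 4096 bytes:

    static size_t nvme_sector_size(const NvmeIdNs *idns)
    {
        const NvmeLBAF *lbaf = &idns->lbaf[NVME_ID_NS_FLBAS_INDEX(idns->flbas)];
        return (size_t)1 << lbaf->ds; /* ds is a power-of-two shift */
    }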

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/nvme.c | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 208242cf1f..0b1da54574 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -101,8 +101,11 @@ typedef struct {
     size_t doorbell_scale;
     bool write_cache_supported;
     EventNotifier irq_notifier;
+
     uint64_t nsze; /* Namespace size reported by identify command */
     int nsid;      /* The namespace id to read/write data. */
+    size_t blkshift;
+
     uint64_t max_transfer;
     bool plugged;
 
@@ -415,8 +418,9 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
     BDRVNVMeState *s = bs->opaque;
     NvmeIdCtrl *idctrl;
     NvmeIdNs *idns;
+    NvmeLBAF *lbaf;
     uint8_t *resp;
-    int r;
+    int r, hwsect_size;
     uint64_t iova;
     NvmeCmd cmd = {
         .opcode = NVME_ADM_CMD_IDENTIFY,
@@ -463,7 +467,22 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
     }
 
     s->nsze = le64_to_cpu(idns->nsze);
+    lbaf = &idns->lbaf[NVME_ID_NS_FLBAS_INDEX(idns->flbas)];
+
+    if (lbaf->ms) {
+        error_setg(errp, "Namespaces with metadata are not yet supported");
+        goto out;
+    }
+
+    hwsect_size = 1 << lbaf->ds;
+
+    if (hwsect_size < BDRV_SECTOR_BITS || hwsect_size > s->page_size) {
+        error_setg(errp, "Namespace has unsupported block size (%d)",
+                hwsect_size);
+        goto out;
+    }
 
+    s->blkshift = lbaf->ds;
 out:
     qemu_vfio_dma_unmap(s->vfio, resp);
     qemu_vfree(resp);
@@ -782,8 +801,17 @@ fail:
 static int64_t nvme_getlength(BlockDriverState *bs)
 {
     BDRVNVMeState *s = bs->opaque;
+    return s->nsze << s->blkshift;
+}
+
 
-    return s->nsze << BDRV_SECTOR_BITS;
+static int nvme_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
+{
+    BDRVNVMeState *s = bs->opaque;
+    assert(s->blkshift >= 9);
+    bsz->phys = 1 << s->blkshift;
+    bsz->log = 1 << s->blkshift;
+    return 0;
 }
 
 /* Called with s->dma_map_lock */
@@ -914,13 +942,14 @@ static coroutine_fn int nvme_co_prw_aligned(BlockDriverState *bs,
     BDRVNVMeState *s = bs->opaque;
     NVMeQueuePair *ioq = s->queues[1];
     NVMeRequest *req;
-    uint32_t cdw12 = (((bytes >> BDRV_SECTOR_BITS) - 1) & 0xFFFF) |
+
+    uint32_t cdw12 = (((bytes >> s->blkshift) - 1) & 0xFFFF) |
                        (flags & BDRV_REQ_FUA ? 1 << 30 : 0);
     NvmeCmd cmd = {
         .opcode = is_write ? NVME_CMD_WRITE : NVME_CMD_READ,
         .nsid = cpu_to_le32(s->nsid),
-        .cdw10 = cpu_to_le32((offset >> BDRV_SECTOR_BITS) & 0xFFFFFFFF),
-        .cdw11 = cpu_to_le32(((offset >> BDRV_SECTOR_BITS) >> 32) & 0xFFFFFFFF),
+        .cdw10 = cpu_to_le32((offset >> s->blkshift) & 0xFFFFFFFF),
+        .cdw11 = cpu_to_le32(((offset >> s->blkshift) >> 32) & 0xFFFFFFFF),
         .cdw12 = cpu_to_le32(cdw12),
     };
     NVMeCoData data = {
@@ -1151,6 +1180,7 @@ static BlockDriver bdrv_nvme = {
     .bdrv_file_open           = nvme_file_open,
     .bdrv_close               = nvme_close,
     .bdrv_getlength           = nvme_getlength,
+    .bdrv_probe_blocksizes    = nvme_probe_blocksizes,
 
     .bdrv_co_preadv           = nvme_co_preadv,
     .bdrv_co_pwritev          = nvme_co_pwritev,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v2 4/5] block/nvme: add support for write zeros
@ 2019-04-17 19:53   ` Maxim Levitsky
  0 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-04-17 19:53 UTC (permalink / raw)
  To: qemu-block; +Cc: Maxim Levitsky, Fam Zheng, Kevin Wolf, Max Reitz, qemu-devel

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/nvme.c         | 69 +++++++++++++++++++++++++++++++++++++++++++-
 block/trace-events   |  1 +
 include/block/nvme.h | 19 +++++++++++-
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 0b1da54574..35b925899f 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -109,6 +109,8 @@ typedef struct {
     uint64_t max_transfer;
     bool plugged;
 
+    bool supports_write_zeros;
+
     CoMutex dma_map_lock;
     CoQueue dma_flush_queue;
 
@@ -457,6 +459,10 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
     s->max_transfer = MIN_NON_ZERO(s->max_transfer,
                           s->page_size / sizeof(uint64_t) * s->page_size);
 
+
+
+    s->supports_write_zeros = (idctrl->oncs & NVME_ONCS_WRITE_ZEROS) != 0;
+
     memset(resp, 0, 4096);
 
     cmd.cdw10 = 0;
@@ -469,6 +475,11 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
     s->nsze = le64_to_cpu(idns->nsze);
     lbaf = &idns->lbaf[NVME_ID_NS_FLBAS_INDEX(idns->flbas)];
 
+    if (NVME_ID_NS_DLFEAT_WRITE_ZEROS(idns->dlfeat) &&
+            NVME_ID_NS_DLFEAT_READ_BEHAVIOR(idns->dlfeat) ==
+                    NVME_ID_NS_DLFEAT_READ_BEHAVIOR_ZEROS)
+        bs->supported_write_flags |= BDRV_REQ_MAY_UNMAP;
+
     if (lbaf->ms) {
         error_setg(errp, "Namespaces with metadata are not yet supported");
         goto out;
@@ -763,6 +774,8 @@ static int nvme_file_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;
     BDRVNVMeState *s = bs->opaque;
 
+    bs->supported_write_flags = BDRV_REQ_FUA;
+
     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &error_abort);
     device = qemu_opt_get(opts, NVME_BLOCK_OPT_DEVICE);
@@ -791,7 +804,6 @@ static int nvme_file_open(BlockDriverState *bs, QDict *options, int flags,
             goto fail;
         }
     }
-    bs->supported_write_flags = BDRV_REQ_FUA;
     return 0;
 fail:
     nvme_close(bs);
@@ -1080,6 +1092,58 @@ static coroutine_fn int nvme_co_flush(BlockDriverState *bs)
 }
 
 
+static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
+    int64_t offset, int bytes, BdrvRequestFlags flags)
+{
+    BDRVNVMeState *s = bs->opaque;
+    NVMeQueuePair *ioq = s->queues[1];
+    NVMeRequest *req;
+
+    if (!s->supports_write_zeros) {
+        return -ENOTSUP;
+    }
+
+    uint32_t cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
+
+    NvmeCmd cmd = {
+        .opcode = NVME_CMD_WRITE_ZEROS,
+        .nsid = cpu_to_le32(s->nsid),
+        .cdw10 = cpu_to_le32((offset >> s->blkshift) & 0xFFFFFFFF),
+        .cdw11 = cpu_to_le32(((offset >> s->blkshift) >> 32) & 0xFFFFFFFF),
+    };
+
+    NVMeCoData data = {
+        .ctx = bdrv_get_aio_context(bs),
+        .ret = -EINPROGRESS,
+    };
+
+    if (flags & BDRV_REQ_MAY_UNMAP) {
+        cdw12 |= (1 << 25);
+    }
+
+    if (flags & BDRV_REQ_FUA) {
+        cdw12 |= (1 << 30);
+    }
+
+    cmd.cdw12 = cpu_to_le32(cdw12);
+
+    trace_nvme_write_zeros(s, offset, bytes, flags);
+    assert(s->nr_queues > 1);
+    req = nvme_get_free_req(ioq);
+    assert(req);
+
+    nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
+
+    data.co = qemu_coroutine_self();
+    while (data.ret == -EINPROGRESS) {
+        qemu_coroutine_yield();
+    }
+
+    trace_nvme_rw_done(s, true, offset, bytes, data.ret);
+    return data.ret;
+}
+
+
 static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
                                BlockReopenQueue *queue, Error **errp)
 {
@@ -1184,6 +1248,9 @@ static BlockDriver bdrv_nvme = {
 
     .bdrv_co_preadv           = nvme_co_preadv,
     .bdrv_co_pwritev          = nvme_co_pwritev,
+
+    .bdrv_co_pwrite_zeroes    = nvme_co_pwrite_zeroes,
+
     .bdrv_co_flush_to_disk    = nvme_co_flush,
     .bdrv_reopen_prepare      = nvme_reopen_prepare,
 
diff --git a/block/trace-events b/block/trace-events
index 7335a42540..943a58569f 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -144,6 +144,7 @@ nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6,
 nvme_handle_event(void *s) "s %p"
 nvme_poll_cb(void *s) "s %p"
 nvme_prw_aligned(void *s, int is_write, uint64_t offset, uint64_t bytes, int flags, int niov) "s %p is_write %d offset %"PRId64" bytes %"PRId64" flags %d niov %d"
+nvme_write_zeros(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p offset %"PRId64" bytes %"PRId64" flags %d"
 nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int align) "qiov %p n %d base %p size 0x%zx align 0x%x"
 nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int is_write) "s %p offset %"PRId64" bytes %"PRId64" niov %d is_write %d"
 nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int ret) "s %p is_write %d offset %"PRId64" bytes %"PRId64" ret %d"
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 0eae6f9f15..edf8e90557 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -653,12 +653,29 @@ typedef struct NvmeIdNs {
     uint8_t     mc;
     uint8_t     dpc;
     uint8_t     dps;
-    uint8_t     res30[98];
+
+    uint8_t     nmic;
+    uint8_t     rescap;
+    uint8_t     fpi;
+    uint8_t     dlfeat;
+
+    uint8_t     res30[94];
     NvmeLBAF    lbaf[16];
     uint8_t     res192[192];
     uint8_t     vs[3712];
 } NvmeIdNs;
 
+
+/*Deallocate Logical Block Features*/
+#define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat)       ((dlfeat) & 0x10)
+#define NVME_ID_NS_DLFEAT_WRITE_ZEROS(dlfeat)     ((dlfeat) & 0x04)
+
+#define NVME_ID_NS_DLFEAT_READ_BEHAVIOR(dlfeat)     ((dlfeat) & 0x3)
+#define NVME_ID_NS_DLFEAT_READ_BEHAVIOR_UNDEFINED   0
+#define NVME_ID_NS_DLFEAT_READ_BEHAVIOR_ZEROS       1
+#define NVME_ID_NS_DLFEAT_READ_BEHAVIOR_ONES        2
+
+
 #define NVME_ID_NS_NSFEAT_THIN(nsfeat)      ((nsfeat & 0x1))
 #define NVME_ID_NS_FLBAS_EXTENDED(flbas)    ((flbas >> 4) & 0x1)
 #define NVME_ID_NS_FLBAS_INDEX(flbas)       ((flbas & 0xf))
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v2 5/5] block/nvme: add support for discard
@ 2019-04-17 19:53   ` Maxim Levitsky
  0 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-04-17 19:53 UTC (permalink / raw)
  To: qemu-block; +Cc: Maxim Levitsky, Fam Zheng, Kevin Wolf, Max Reitz, qemu-devel

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 block/nvme.c       | 80 ++++++++++++++++++++++++++++++++++++++++++++++
 block/trace-events |  2 ++
 2 files changed, 82 insertions(+)

diff --git a/block/nvme.c b/block/nvme.c
index 35b925899f..b83912c627 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -110,6 +110,7 @@ typedef struct {
     bool plugged;
 
     bool supports_write_zeros;
+    bool supports_discard;
 
     CoMutex dma_map_lock;
     CoQueue dma_flush_queue;
@@ -462,6 +463,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
 
 
     s->supports_write_zeros = (idctrl->oncs & NVME_ONCS_WRITE_ZEROS) != 0;
+    s->supports_discard = (idctrl->oncs & NVME_ONCS_DSM) != 0;
 
     memset(resp, 0, 4096);
 
@@ -1144,6 +1146,83 @@ static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
 }
 
 
+static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
+        int64_t offset, int bytes)
+{
+    BDRVNVMeState *s = bs->opaque;
+    NVMeQueuePair *ioq = s->queues[1];
+    NVMeRequest *req;
+    NvmeDsmRange *buf;
+    QEMUIOVector local_qiov;
+    int r;
+
+    NvmeCmd cmd = {
+        .opcode = NVME_CMD_DSM,
+        .nsid = cpu_to_le32(s->nsid),
+        .cdw10 = 0, /*number of ranges - 0 based*/
+        .cdw11 = cpu_to_le32(1 << 2), /*deallocate bit*/
+    };
+
+    NVMeCoData data = {
+        .ctx = bdrv_get_aio_context(bs),
+        .ret = -EINPROGRESS,
+    };
+
+    if (!s->supports_discard) {
+        return -ENOTSUP;
+    }
+
+    assert(s->nr_queues > 1);
+
+    buf = qemu_try_blockalign0(bs, 4096);
+    if (!buf) {
+            return -ENOMEM;
+    }
+
+    buf->nlb = bytes >> s->blkshift;
+    buf->slba = offset >> s->blkshift;
+    buf->cattr = 0;
+
+    qemu_iovec_init(&local_qiov, 1);
+    qemu_iovec_add(&local_qiov, buf, 4096);
+
+    req = nvme_get_free_req(ioq);
+    assert(req);
+
+    qemu_co_mutex_lock(&s->dma_map_lock);
+    r = nvme_cmd_map_qiov(bs, &cmd, req, &local_qiov);
+    qemu_co_mutex_unlock(&s->dma_map_lock);
+
+    if (r) {
+        req->busy = false;
+        return r;
+    }
+
+    trace_nvme_dsm(s, offset, bytes);
+
+    nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
+
+    data.co = qemu_coroutine_self();
+    while (data.ret == -EINPROGRESS) {
+        qemu_coroutine_yield();
+    }
+
+    qemu_co_mutex_lock(&s->dma_map_lock);
+    r = nvme_cmd_unmap_qiov(bs, &local_qiov);
+    qemu_co_mutex_unlock(&s->dma_map_lock);
+    if (r) {
+        return r;
+    }
+
+    trace_nvme_dsm_done(s, offset, bytes, data.ret);
+
+    qemu_iovec_destroy(&local_qiov);
+    qemu_vfree(buf);
+    return data.ret;
+
+}
+
+
 static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
                                BlockReopenQueue *queue, Error **errp)
 {
@@ -1250,6 +1329,7 @@ static BlockDriver bdrv_nvme = {
     .bdrv_co_pwritev          = nvme_co_pwritev,
 
     .bdrv_co_pwrite_zeroes    = nvme_co_pwrite_zeroes,
+    .bdrv_co_pdiscard         = nvme_co_pdiscard,
 
     .bdrv_co_flush_to_disk    = nvme_co_flush,
     .bdrv_reopen_prepare      = nvme_reopen_prepare,
diff --git a/block/trace-events b/block/trace-events
index 943a58569f..e55ac5c40b 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -148,6 +148,8 @@ nvme_write_zeros(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p offs
 nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int align) "qiov %p n %d base %p size 0x%zx align 0x%x"
 nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int is_write) "s %p offset %"PRId64" bytes %"PRId64" niov %d is_write %d"
 nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int ret) "s %p is_write %d offset %"PRId64" bytes %"PRId64" ret %d"
+nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset %"PRId64" bytes %"PRId64""
+nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p offset %"PRId64" bytes %"PRId64" ret %d"
 nvme_dma_map_flush(void *s) "s %p"
 nvme_free_req_queue_wait(void *q) "q %p"
 nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s %p cmd %p req %p qiov %p entries %d"
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/5] Few fixes for userspace NVME driver
  2019-04-17 19:53 ` Maxim Levitsky
                   ` (5 preceding siblings ...)
@ 2019-06-03 12:26 ` Maxim Levitsky
  -1 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-06-03 12:26 UTC (permalink / raw)
  To: qemu-block; +Cc: Fam Zheng, Kevin Wolf, qemu-devel, Max Reitz

On Wed, 2019-04-17 at 22:53 +0300, Maxim Levitsky wrote:
> Hi!
> These are a few assorted fixes and features for the userspace
> NVMe driver.
> 
> Tested on my laptop with my Samsung X5 Thunderbolt drive, which
> happens to have 4K sectors, including the discard and write zeros
> support.
> 
> Also included are a bunch of fixes that had been sitting in my queue
> from the period when I developed the nvme-mdev driver.
> 
> Best regards,
>         Maxim Levitsky
> 
> Maxim Levitsky (5):
>   block/nvme: don't flip CQ phase bits
>   block/nvme: fix doorbell stride
>   block/nvme: support larger than 512 bytes sector devices
>   block/nvme: add support for write zeros
>   block/nvme: add support for discard
> 
>  block/nvme.c         | 193 +++++++++++++++++++++++++++++++++++++++++--
>  block/trace-events   |   3 +
>  include/block/nvme.h |  19 ++++-
>  3 files changed, 205 insertions(+), 10 deletions(-)
> 

Ping.

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits
  2019-04-17 19:53   ` Maxim Levitsky
@ 2019-06-03 22:25   ` John Snow
  2019-06-05  7:47     ` Maxim Levitsky
  -1 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2019-06-03 22:25 UTC (permalink / raw)
  To: Maxim Levitsky, qemu-block; +Cc: Fam Zheng, Kevin Wolf, qemu-devel, Max Reitz



On 4/17/19 3:53 PM, Maxim Levitsky wrote:
> Phase bits are only set by the hardware to indicate new completions
> and not by the device driver.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/nvme.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 0684bbd077..2d208000df 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
>          qemu_mutex_lock(&q->lock);
>          c->cid = cpu_to_le16(0);
>          q->inflight--;
> -        /* Flip Phase Tag bit. */
> -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
>          progress = true;
>      }
>      if (progress) {
> 

Since you've not got much traction on this and you've pinged a v2, can
you point me to a spec or a reproducer that illustrates the problem?

(Or wait for more NVME knowledgeable people to give you a review...!)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits
  2019-06-03 22:25   ` [Qemu-devel] [Qemu-block] " John Snow
@ 2019-06-05  7:47     ` Maxim Levitsky
  2019-06-06 21:23       ` John Snow
  0 siblings, 1 reply; 22+ messages in thread
From: Maxim Levitsky @ 2019-06-05  7:47 UTC (permalink / raw)
  To: John Snow, qemu-block; +Cc: Fam Zheng, Kevin Wolf, qemu-devel, Max Reitz

On Mon, 2019-06-03 at 18:25 -0400, John Snow wrote:
> 
> On 4/17/19 3:53 PM, Maxim Levitsky wrote:
> > Phase bits are only set by the hardware to indicate new completions
> > and not by the device driver.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  block/nvme.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 0684bbd077..2d208000df 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
> >          qemu_mutex_lock(&q->lock);
> >          c->cid = cpu_to_le16(0);
> >          q->inflight--;
> > -        /* Flip Phase Tag bit. */
> > -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
> >          progress = true;
> >      }
> >      if (progress) {
> > 
> 
> Since you've not got much traction on this and you've pinged a v2, can
> you point me to a spec or a reproducer that illustrates the problem?
> 
> (Or wait for more NVME knowledgeable people to give you a review...!)

"A Completion Queue entry is posted to the Completion Queue when the controller write of that Completion
Queue entry to the next free Completion Queue slot inverts the Phase Tag (P) bit from its previous value
in memory. The controller may generate an interrupt to the host to indicate that one or more Completion
Queue entries have been posted."
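
In other words, the host side looks roughly like this (an illustrative
sketch with made-up names, not the QEMU code): the host keeps its own
expected phase, flips it only when the head index wraps, and never writes
the Phase Tag in the entry itself, which is what the removed lines did:

    while ((le16_to_cpu(cq[head].status) & 0x1) == expected_phase) {
        process_cqe(&cq[head]);
        if (++head == cq_size) {
            head = 0;
            expected_phase ^= 1; /* host-local state, never stored in the CQE */
        }
    }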



Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] block/nvme: add support for write zeros
  2019-04-17 19:53   ` Maxim Levitsky
@ 2019-06-06  2:56   ` Fam Zheng
  -1 siblings, 0 replies; 22+ messages in thread
From: Fam Zheng @ 2019-06-06  2:56 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Kevin Wolf, qemu-devel, qemu-block, Max Reitz

On Wed, 04/17 22:53, Maxim Levitsky wrote:
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/nvme.c         | 69 +++++++++++++++++++++++++++++++++++++++++++-
>  block/trace-events   |  1 +
>  include/block/nvme.h | 19 +++++++++++-
>  3 files changed, 87 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 0b1da54574..35b925899f 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -109,6 +109,8 @@ typedef struct {
>      uint64_t max_transfer;
>      bool plugged;
>  
> +    bool supports_write_zeros;
> +
>      CoMutex dma_map_lock;
>      CoQueue dma_flush_queue;
>  
> @@ -457,6 +459,10 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>      s->max_transfer = MIN_NON_ZERO(s->max_transfer,
>                            s->page_size / sizeof(uint64_t) * s->page_size);
>  
> +
> +

Too many blank lines here.

> +    s->supports_write_zeros = (idctrl->oncs & NVME_ONCS_WRITE_ZEROS) != 0;
> +
>      memset(resp, 0, 4096);
>  
>      cmd.cdw10 = 0;
> @@ -469,6 +475,11 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>      s->nsze = le64_to_cpu(idns->nsze);
>      lbaf = &idns->lbaf[NVME_ID_NS_FLBAS_INDEX(idns->flbas)];
>  
> +    if (NVME_ID_NS_DLFEAT_WRITE_ZEROS(idns->dlfeat) &&
> +            NVME_ID_NS_DLFEAT_READ_BEHAVIOR(idns->dlfeat) ==
> +                    NVME_ID_NS_DLFEAT_READ_BEHAVIOR_ZEROS)
> +        bs->supported_write_flags |= BDRV_REQ_MAY_UNMAP;
> +
>      if (lbaf->ms) {
>          error_setg(errp, "Namespaces with metadata are not yet supported");
>          goto out;
> @@ -763,6 +774,8 @@ static int nvme_file_open(BlockDriverState *bs, QDict *options, int flags,
>      int ret;
>      BDRVNVMeState *s = bs->opaque;
>  
> +    bs->supported_write_flags = BDRV_REQ_FUA;
> +
>      opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
>      qemu_opts_absorb_qdict(opts, options, &error_abort);
>      device = qemu_opt_get(opts, NVME_BLOCK_OPT_DEVICE);
> @@ -791,7 +804,6 @@ static int nvme_file_open(BlockDriverState *bs, QDict *options, int flags,
>              goto fail;
>          }
>      }
> -    bs->supported_write_flags = BDRV_REQ_FUA;
>      return 0;
>  fail:
>      nvme_close(bs);
> @@ -1080,6 +1092,58 @@ static coroutine_fn int nvme_co_flush(BlockDriverState *bs)
>  }
>  
>  
> +static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
> +    int64_t offset, int bytes, BdrvRequestFlags flags)
> +{
> +    BDRVNVMeState *s = bs->opaque;
> +    NVMeQueuePair *ioq = s->queues[1];
> +    NVMeRequest *req;
> +
> +    if (!s->supports_write_zeros) {
> +        return -ENOTSUP;
> +    }

Declaring local variables below statements is not allowed per the coding style.
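
For illustration, the reordering that asks for would hoist the
declaration above the first statement, roughly:

    uint32_t cdw12;

    if (!s->supports_write_zeros) {
        return -ENOTSUP;
    }
    cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;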

> +
> +    uint32_t cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
> +
> +    NvmeCmd cmd = {
> +        .opcode = NVME_CMD_WRITE_ZEROS,
> +        .nsid = cpu_to_le32(s->nsid),
> +        .cdw10 = cpu_to_le32((offset >> s->blkshift) & 0xFFFFFFFF),
> +        .cdw11 = cpu_to_le32(((offset >> s->blkshift) >> 32) & 0xFFFFFFFF),
> +    };
> +
> +    NVMeCoData data = {
> +        .ctx = bdrv_get_aio_context(bs),
> +        .ret = -EINPROGRESS,
> +    };
> +
> +    if (flags & BDRV_REQ_MAY_UNMAP) {
> +        cdw12 |= (1 << 25);
> +    }
> +
> +    if (flags & BDRV_REQ_FUA) {
> +        cdw12 |= (1 << 30);
> +    }
> +
> +    cmd.cdw12 = cpu_to_le32(cdw12);
> +
> +    trace_nvme_write_zeros(s, offset, bytes, flags);
> +    assert(s->nr_queues > 1);
> +    req = nvme_get_free_req(ioq);
> +    assert(req);
> +
> +    nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
> +
> +    data.co = qemu_coroutine_self();
> +    while (data.ret == -EINPROGRESS) {
> +        qemu_coroutine_yield();
> +    }
> +
> +    trace_nvme_rw_done(s, true, offset, bytes, data.ret);
> +    return data.ret;
> +}
> +
> +
>  static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
>                                 BlockReopenQueue *queue, Error **errp)
>  {
> @@ -1184,6 +1248,9 @@ static BlockDriver bdrv_nvme = {
>  
>      .bdrv_co_preadv           = nvme_co_preadv,
>      .bdrv_co_pwritev          = nvme_co_pwritev,
> +
> +    .bdrv_co_pwrite_zeroes    = nvme_co_pwrite_zeroes,
> +
>      .bdrv_co_flush_to_disk    = nvme_co_flush,
>      .bdrv_reopen_prepare      = nvme_reopen_prepare,
>  
> diff --git a/block/trace-events b/block/trace-events
> index 7335a42540..943a58569f 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -144,6 +144,7 @@ nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6,
>  nvme_handle_event(void *s) "s %p"
>  nvme_poll_cb(void *s) "s %p"
>  nvme_prw_aligned(void *s, int is_write, uint64_t offset, uint64_t bytes, int flags, int niov) "s %p is_write %d offset %"PRId64" bytes %"PRId64" flags %d niov %d"
> +nvme_write_zeros(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p offset %"PRId64" bytes %"PRId64" flags %d"
>  nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int align) "qiov %p n %d base %p size 0x%zx align 0x%x"
>  nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int is_write) "s %p offset %"PRId64" bytes %"PRId64" niov %d is_write %d"
>  nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int ret) "s %p is_write %d offset %"PRId64" bytes %"PRId64" ret %d"
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 0eae6f9f15..edf8e90557 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -653,12 +653,29 @@ typedef struct NvmeIdNs {
>      uint8_t     mc;
>      uint8_t     dpc;
>      uint8_t     dps;
> -    uint8_t     res30[98];
> +
> +    uint8_t     nmic;
> +    uint8_t     rescap;
> +    uint8_t     fpi;
> +    uint8_t     dlfeat;
> +
> +    uint8_t     res30[94];
>      NvmeLBAF    lbaf[16];
>      uint8_t     res192[192];
>      uint8_t     vs[3712];
>  } NvmeIdNs;
>  
> +
> +/*Deallocate Logical Block Features*/
> +#define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat)       ((dlfeat) & 0x10)
> +#define NVME_ID_NS_DLFEAT_WRITE_ZEROS(dlfeat)     ((dlfeat) & 0x04)
> +
> +#define NVME_ID_NS_DLFEAT_READ_BEHAVIOR(dlfeat)     ((dlfeat) & 0x3)
> +#define NVME_ID_NS_DLFEAT_READ_BEHAVIOR_UNDEFINED   0
> +#define NVME_ID_NS_DLFEAT_READ_BEHAVIOR_ZEROS       1
> +#define NVME_ID_NS_DLFEAT_READ_BEHAVIOR_ONES        2
> +
> +
>  #define NVME_ID_NS_NSFEAT_THIN(nsfeat)      ((nsfeat & 0x1))
>  #define NVME_ID_NS_FLBAS_EXTENDED(flbas)    ((flbas >> 4) & 0x1)
>  #define NVME_ID_NS_FLBAS_INDEX(flbas)       ((flbas & 0xf))
> -- 
> 2.17.2
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] block/nvme: add support for discard
  2019-04-17 19:53   ` Maxim Levitsky
@ 2019-06-06  3:19   ` Fam Zheng
  2019-06-06  7:31     ` Maxim Levitsky
  -1 siblings, 1 reply; 22+ messages in thread
From: Fam Zheng @ 2019-06-06  3:19 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Kevin Wolf, qemu-devel, qemu-block, Max Reitz

On Wed, 04/17 22:53, Maxim Levitsky wrote:
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  block/nvme.c       | 80 ++++++++++++++++++++++++++++++++++++++++++++++
>  block/trace-events |  2 ++
>  2 files changed, 82 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 35b925899f..b83912c627 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -110,6 +110,7 @@ typedef struct {
>      bool plugged;
>  
>      bool supports_write_zeros;
> +    bool supports_discard;
>  
>      CoMutex dma_map_lock;
>      CoQueue dma_flush_queue;
> @@ -462,6 +463,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>  
>  
>      s->supports_write_zeros = (idctrl->oncs & NVME_ONCS_WRITE_ZEROS) != 0;
> +    s->supports_discard = (idctrl->oncs & NVME_ONCS_DSM) != 0;
>  
>      memset(resp, 0, 4096);
>  
> @@ -1144,6 +1146,83 @@ static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
>  }
>  
>  
> +static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
> +        int64_t offset, int bytes)

While you respin, you can align the parameters.
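
For illustration, the alignment being asked for is roughly:

    static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
                                             int64_t offset, int bytes)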

> +{
> +    BDRVNVMeState *s = bs->opaque;
> +    NVMeQueuePair *ioq = s->queues[1];
> +    NVMeRequest *req;
> +    NvmeDsmRange *buf;
> +    QEMUIOVector local_qiov;
> +    int r;
> +
> +    NvmeCmd cmd = {
> +        .opcode = NVME_CMD_DSM,
> +        .nsid = cpu_to_le32(s->nsid),
> +        .cdw10 = 0, /*number of ranges - 0 based*/
> +        .cdw11 = cpu_to_le32(1 << 2), /*deallocate bit*/
> +    };
> +
> +    NVMeCoData data = {
> +        .ctx = bdrv_get_aio_context(bs),
> +        .ret = -EINPROGRESS,
> +    };
> +
> +    if (!s->supports_discard) {
> +        return -ENOTSUP;
> +    }
> +
> +    assert(s->nr_queues > 1);
> +
> +    buf = qemu_try_blockalign0(bs, 4096);
> +    if (!buf) {
> +            return -ENOMEM;
> +    }
> +
> +    buf->nlb = bytes >> s->blkshift;
> +    buf->slba = offset >> s->blkshift;

This buffer is for the device, do we need to do anything about the endianness?

> +    buf->cattr = 0;
> +
> +    qemu_iovec_init(&local_qiov, 1);
> +    qemu_iovec_add(&local_qiov, buf, 4096);
> +
> +    req = nvme_get_free_req(ioq);
> +    assert(req);
> +
> +    qemu_co_mutex_lock(&s->dma_map_lock);
> +    r = nvme_cmd_map_qiov(bs, &cmd, req, &local_qiov);
> +    qemu_co_mutex_unlock(&s->dma_map_lock);
> +
> +    if (r) {
> +        req->busy = false;
> +        return r;
> +    }
> +
> +    trace_nvme_dsm(s, offset, bytes);
> +
> +    nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
> +
> +    data.co = qemu_coroutine_self();
> +    while (data.ret == -EINPROGRESS) {
> +        qemu_coroutine_yield();
> +    }
> +
> +    qemu_co_mutex_lock(&s->dma_map_lock);
> +    r = nvme_cmd_unmap_qiov(bs, &local_qiov);
> +    qemu_co_mutex_unlock(&s->dma_map_lock);
> +    if (r) {
> +        return r;
> +    }
> +
> +    trace_nvme_dsm_done(s, offset, bytes, data.ret);
> +
> +    qemu_iovec_destroy(&local_qiov);
> +    qemu_vfree(buf);
> +    return data.ret;
> +
> +}
> +
> +
>  static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
>                                 BlockReopenQueue *queue, Error **errp)
>  {
> @@ -1250,6 +1329,7 @@ static BlockDriver bdrv_nvme = {
>      .bdrv_co_pwritev          = nvme_co_pwritev,
>  
>      .bdrv_co_pwrite_zeroes    = nvme_co_pwrite_zeroes,
> +    .bdrv_co_pdiscard         = nvme_co_pdiscard,
>  
>      .bdrv_co_flush_to_disk    = nvme_co_flush,
>      .bdrv_reopen_prepare      = nvme_reopen_prepare,
> diff --git a/block/trace-events b/block/trace-events
> index 943a58569f..e55ac5c40b 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -148,6 +148,8 @@ nvme_write_zeros(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p offs
>  nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int align) "qiov %p n %d base %p size 0x%zx align 0x%x"
>  nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int is_write) "s %p offset %"PRId64" bytes %"PRId64" niov %d is_write %d"
>  nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int ret) "s %p is_write %d offset %"PRId64" bytes %"PRId64" ret %d"
> +nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset %"PRId64" bytes %"PRId64""
> +nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p offset %"PRId64" bytes %"PRId64" ret %d"
>  nvme_dma_map_flush(void *s) "s %p"
>  nvme_free_req_queue_wait(void *q) "q %p"
>  nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s %p cmd %p req %p qiov %p entries %d"
> -- 
> 2.17.2
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] block/nvme: add support for discard
  2019-06-06  3:19   ` Fam Zheng
@ 2019-06-06  7:31     ` Maxim Levitsky
  0 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-06-06  7:31 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Kevin Wolf, qemu-devel, qemu-block, Max Reitz

On Thu, 2019-06-06 at 11:19 +0800, Fam Zheng wrote:
> On Wed, 04/17 22:53, Maxim Levitsky wrote:
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  block/nvme.c       | 80 ++++++++++++++++++++++++++++++++++++++++++++++
> >  block/trace-events |  2 ++
> >  2 files changed, 82 insertions(+)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 35b925899f..b83912c627 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -110,6 +110,7 @@ typedef struct {
> >      bool plugged;
> >  
> >      bool supports_write_zeros;
> > +    bool supports_discard;
> >  
> >      CoMutex dma_map_lock;
> >      CoQueue dma_flush_queue;
> > @@ -462,6 +463,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
> >  
> >  
> >      s->supports_write_zeros = (idctrl->oncs & NVME_ONCS_WRITE_ZEROS) != 0;
> > +    s->supports_discard = (idctrl->oncs & NVME_ONCS_DSM) != 0;
> >  
> >      memset(resp, 0, 4096);
> >  
> > @@ -1144,6 +1146,83 @@ static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
> >  }
> >  
> >  
> > +static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
> > +        int64_t offset, int bytes)
> 
> While you respin, you can align the parameters.

Hi Fam!!


I didn't know that this is also required by the QEMU coding style (it is
kind of suggested in the kernel).
I'll be more than glad to do so!


> 
> > +{
> > +    BDRVNVMeState *s = bs->opaque;
> > +    NVMeQueuePair *ioq = s->queues[1];
> > +    NVMeRequest *req;
> > +    NvmeDsmRange *buf;
> > +    QEMUIOVector local_qiov;
> > +    int r;
> > +
> > +    NvmeCmd cmd = {
> > +        .opcode = NVME_CMD_DSM,
> > +        .nsid = cpu_to_le32(s->nsid),
> > +        .cdw10 = 0, /*number of ranges - 0 based*/
> > +        .cdw11 = cpu_to_le32(1 << 2), /*deallocate bit*/
> > +    };
> > +
> > +    NVMeCoData data = {
> > +        .ctx = bdrv_get_aio_context(bs),
> > +        .ret = -EINPROGRESS,
> > +    };
> > +
> > +    if (!s->supports_discard) {
> > +        return -ENOTSUP;
> > +    }
> > +
> > +    assert(s->nr_queues > 1);
> > +
> > +    buf = qemu_try_blockalign0(bs, 4096);
> > +    if (!buf) {
> > +            return -ENOMEM;
> > +    }
> > +
> > +    buf->nlb = bytes >> s->blkshift;
> > +    buf->slba = offset >> s->blkshift;
> 
> This buffer is for the device, do we need to do anything about the endianness?

Thank you very very much, this is indeed an endianness bug.
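
A minimal sketch of the corrected initialization (assuming the
NvmeDsmRange fields must be little-endian on the wire, per the DSM
command definition):

    buf->cattr = cpu_to_le32(0);
    buf->nlb   = cpu_to_le32(bytes >> s->blkshift);
    buf->slba  = cpu_to_le64(offset >> s->blkshift);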


Thanks a lot for the review,
	Best regards,
		Maxim Levitsky

> 
> > +    buf->cattr = 0;
> > +
> > +    qemu_iovec_init(&local_qiov, 1);
> > +    qemu_iovec_add(&local_qiov, buf, 4096);
> > +
> > +    req = nvme_get_free_req(ioq);
> > +    assert(req);
> > +
> > +    qemu_co_mutex_lock(&s->dma_map_lock);
> > +    r = nvme_cmd_map_qiov(bs, &cmd, req, &local_qiov);
> > +    qemu_co_mutex_unlock(&s->dma_map_lock);
> > +
> > +    if (r) {
> > +        req->busy = false;
> > +        return r;
> > +    }
> > +
> > +    trace_nvme_dsm(s, offset, bytes);
> > +
> > +    nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
> > +
> > +    data.co = qemu_coroutine_self();
> > +    while (data.ret == -EINPROGRESS) {
> > +        qemu_coroutine_yield();
> > +    }
> > +
> > +    qemu_co_mutex_lock(&s->dma_map_lock);
> > +    r = nvme_cmd_unmap_qiov(bs, &local_qiov);
> > +    qemu_co_mutex_unlock(&s->dma_map_lock);
> > +    if (r) {
> > +        return r;
> > +    }
> > +
> > +    trace_nvme_dsm_done(s, offset, bytes, data.ret);
> > +
> > +    qemu_iovec_destroy(&local_qiov);
> > +    qemu_vfree(buf);
> > +    return data.ret;
> > +
> > +}
> > +
> > +
> >  static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
> >                                 BlockReopenQueue *queue, Error **errp)
> >  {
> > @@ -1250,6 +1329,7 @@ static BlockDriver bdrv_nvme = {
> >      .bdrv_co_pwritev          = nvme_co_pwritev,
> >  
> >      .bdrv_co_pwrite_zeroes    = nvme_co_pwrite_zeroes,
> > +    .bdrv_co_pdiscard         = nvme_co_pdiscard,
> >  
> >      .bdrv_co_flush_to_disk    = nvme_co_flush,
> >      .bdrv_reopen_prepare      = nvme_reopen_prepare,
> > diff --git a/block/trace-events b/block/trace-events
> > index 943a58569f..e55ac5c40b 100644
> > --- a/block/trace-events
> > +++ b/block/trace-events
> > @@ -148,6 +148,8 @@ nvme_write_zeros(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p offs
> >  nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int align) "qiov %p n %d base %p size 0x%zx align 0x%x"
> >  nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int is_write) "s %p offset %"PRId64" bytes %"PRId64" niov %d is_write %d"
> >  nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int ret) "s %p is_write %d offset %"PRId64" bytes %"PRId64" ret %d"
> > +nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset %"PRId64" bytes %"PRId64""
> > +nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p offset %"PRId64" bytes %"PRId64" ret %d"
> >  nvme_dma_map_flush(void *s) "s %p"
> >  nvme_free_req_queue_wait(void *q) "q %p"
> >  nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s %p cmd %p req %p qiov %p entries %d"
> > -- 
> > 2.17.2
> > 
> 
> 




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits
  2019-06-05  7:47     ` Maxim Levitsky
@ 2019-06-06 21:23       ` John Snow
  2019-06-07 11:08         ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2019-06-06 21:23 UTC (permalink / raw)
  To: Maxim Levitsky, qemu-block
  Cc: Fam Zheng, Kevin Wolf, Paolo Bonzini, qemu-devel, Max Reitz



On 6/5/19 3:47 AM, Maxim Levitsky wrote:
> On Mon, 2019-06-03 at 18:25 -0400, John Snow wrote:
>>
>> On 4/17/19 3:53 PM, Maxim Levitsky wrote:
>>> Phase bits are only set by the hardware to indicate new completions
>>> and not by the device driver.
>>>
>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>> ---
>>>  block/nvme.c | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/block/nvme.c b/block/nvme.c
>>> index 0684bbd077..2d208000df 100644
>>> --- a/block/nvme.c
>>> +++ b/block/nvme.c
>>> @@ -340,8 +340,6 @@ static bool nvme_process_completion(BDRVNVMeState *s, NVMeQueuePair *q)
>>>          qemu_mutex_lock(&q->lock);
>>>          c->cid = cpu_to_le16(0);
>>>          q->inflight--;
>>> -        /* Flip Phase Tag bit. */
>>> -        c->status = cpu_to_le16(le16_to_cpu(c->status) ^ 0x1);
>>>          progress = true;
>>>      }
>>>      if (progress) {
>>>
>>
>> Since you've not got much traction on this and you've pinged a v2, can
>> you point me to a spec or a reproducer that illustrates the problem?
>>
>> (Or wait for more NVME knowledgeable people to give you a review...!)
> 
> "A Completion Queue entry is posted to the Completion Queue when the controller write of that Completion
> Queue entry to the next free Completion Queue slot inverts the Phase Tag (P) bit from its previous value
> in memory. The controller may generate an interrupt to the host to indicate that one or more Completion
> Queue entries have been posted."
> 

In the future, please reference the sections in your commit messages
when relevant:

NVM Express 1.3, Section 4.1 "Submission Queue & Completion Queue
Definition"

I also found 4.6 "Completion Queue Entry" to be informative, especially
Figure 28, which defines the phase bit.

So: This looks right; does this fix a bug that can be observed? Do we
have any regression tests for block/NVMe?

--js

> 
> 
> Best regards,
> 	Maxim Levitsky
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits
  2019-06-06 21:23       ` John Snow
@ 2019-06-07 11:08         ` Paolo Bonzini
  2019-06-07 19:28           ` John Snow
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2019-06-07 11:08 UTC (permalink / raw)
  To: John Snow, Maxim Levitsky, qemu-block
  Cc: Fam Zheng, Kevin Wolf, qemu-devel, Max Reitz

On 06/06/19 23:23, John Snow wrote:
> So: This looks right; does this fix a bug that can be observed? Do we
> have any regression tests for block/NVMe?

I don't think it fixes a bug; by the time the CQ entry is picked up by
QEMU, the device is not supposed to touch it anymore.

However, the idea behind the phase bits is that you can decide whether
the driver has placed a completion in the queue.  When we get here, we have

	(le16_to_cpu(c->status) & 0x1) == !q->cq_phase

On the next pass through the ring buffer q->cq_phase will be flipped,
and thus when we see this element we'll get

	(le16_to_cpu(c->status) & 0x1) == q->cq_phase

and not process it.  Since block/nvme.c flips the bit, this mechanism
does not work and the loop termination relies on the other part of the
condition, "if (!c->cid) break;".

So the patch is correct, but it would also be nice to either remove
phase handling altogether, or check that the phase handling works
properly and drop the !c->cid test.
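
To illustrate, a termination check based purely on the phase bit would
look roughly like this (just a sketch, reusing the names from the snippets
above, where q->cq_phase holds the phase value that stale entries from the
previous pass still carry):

    while (q->inflight) {
        NvmeCqe *c = (NvmeCqe *)&q->cq.queue[q->cq.head * NVME_CQ_ENTRY_BYTES];
        if ((le16_to_cpu(c->status) & 0x1) == q->cq_phase) {
            break;                      /* still the old phase: nothing new */
        }
        /* ... process the completion identified by c->cid ... */
        q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE;
        if (q->cq.head == 0) {
            q->cq_phase = !q->cq_phase; /* wrapped: stale entries now carry
                                         * the inverted phase value */
        }
    }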

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits
  2019-06-07 11:08         ` Paolo Bonzini
@ 2019-06-07 19:28           ` John Snow
  2019-06-11  8:50             ` Maxim Levitsky
  0 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2019-06-07 19:28 UTC (permalink / raw)
  To: Paolo Bonzini, Maxim Levitsky, qemu-block
  Cc: Fam Zheng, Kevin Wolf, qemu-devel, Max Reitz



On 6/7/19 7:08 AM, Paolo Bonzini wrote:
> On 06/06/19 23:23, John Snow wrote:
>> So: This looks right; does this fix a bug that can be observed? Do we
>> have any regression tests for block/NVMe?
> 
> I don't think it fixes a bug; by the time the CQ entry is picked up by
> QEMU, the device is not supposed to touch it anymore.
> 
> However, the idea behind the phase bits is that you can decide whether
> the driver has placed a completion in the queue.  When we get here, we have
> 
> 	(le16_to_cpu(c->status) & 0x1) == !q->cq_phase
> 
> On the next pass through the ring buffer q->cq_phase will be flipped,
> and thus when we see this element we'll get
> 
> 	(le16_to_cpu(c->status) & 0x1) == q->cq_phase
> 
> and not process it.  Since block/nvme.c flips the bit, this mechanism
> does not work and the loop termination relies on the other part of the
> condition, "if (!c->cid) break;".
> 
> So the patch is correct, but it would also be nice to either remove
> phase handling altogether, or check that the phase handling works
> properly and drop the !c->cid test.
> 
> Paolo
> 

Gotcha, I see, that's why it doesn't cause problems. Thanks :)

--js


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits
  2019-06-07 19:28           ` John Snow
@ 2019-06-11  8:50             ` Maxim Levitsky
  0 siblings, 0 replies; 22+ messages in thread
From: Maxim Levitsky @ 2019-06-11  8:50 UTC (permalink / raw)
  To: John Snow, Paolo Bonzini, qemu-block
  Cc: Fam Zheng, Kevin Wolf, qemu-devel, Max Reitz

On Fri, 2019-06-07 at 15:28 -0400, John Snow wrote:
> 
> On 6/7/19 7:08 AM, Paolo Bonzini wrote:
> > On 06/06/19 23:23, John Snow wrote:
> > > So: This looks right; does this fix a bug that can be observed? Do we
> > > have any regression tests for block/NVMe?
> > 
> > I don't think it fixes a bug; by the time the CQ entry is picked up by
> > QEMU, the device is not supposed to touch it anymore.
> > 
> > However, the idea behind the phase bits is that you can decide whether
> > the driver has placed a completion in the queue.  When we get here, we have
> > 
> > 	(le16_to_cpu(c->status) & 0x1) == !q->cq_phase
> > 
> > On the next pass through the ring buffer q->cq_phase will be flipped,
> > and thus when we see this element we'll get
> > 
> > 	(le16_to_cpu(c->status) & 0x1) == q->cq_phase
> > 
> > and not process it.  Since block/nvme.c flips the bit, this mechanism
> > does not work and the loop termination relies on the other part of the
> > condition, "if (!c->cid) break;".
> > 
> > So the patch is correct, but it would also be nice to either remove
> > phase handling altogether, or check that the phase handling works
> > properly and drop the !c->cid test.
> > 
> > Paolo


I agree with that and I'll send an updated patch soon.

The driver should not touch the completion entries at all, but rather just scan for the entries whose
phase bit was flipped by the hardware.

In fact, I don't even think that the 'c->cid' test ever becomes the exit
condition: since the device is not allowed to completely fill the completion
queue (it must always keep at least one free entry there), the end condition
would still be the check on the flipped phase bit.


I'll fix this to match the spec,

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-06-11  8:53 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
2019-04-17 19:53 [Qemu-devel] [PATCH v2 0/5] Few fixes for userspace NVME driver Maxim Levitsky
2019-04-17 19:53 ` Maxim Levitsky
2019-04-17 19:53 ` [Qemu-devel] [PATCH v2 1/5] block/nvme: don't flip CQ phase bits Maxim Levitsky
2019-04-17 19:53   ` Maxim Levitsky
2019-06-03 22:25   ` [Qemu-devel] [Qemu-block] " John Snow
2019-06-05  7:47     ` Maxim Levitsky
2019-06-06 21:23       ` John Snow
2019-06-07 11:08         ` Paolo Bonzini
2019-06-07 19:28           ` John Snow
2019-06-11  8:50             ` Maxim Levitsky
2019-04-17 19:53 ` [Qemu-devel] [PATCH v2 2/5] block/nvme: fix doorbell stride Maxim Levitsky
2019-04-17 19:53   ` Maxim Levitsky
2019-04-17 19:53 ` [Qemu-devel] [PATCH v2 3/5] block/nvme: support larger that 512 bytes sector devices Maxim Levitsky
2019-04-17 19:53   ` Maxim Levitsky
2019-04-17 19:53 ` [Qemu-devel] [PATCH v2 4/5] block/nvme: add support for write zeros Maxim Levitsky
2019-04-17 19:53   ` Maxim Levitsky
2019-06-06  2:56   ` Fam Zheng
2019-04-17 19:53 ` [Qemu-devel] [PATCH v2 5/5] block/nvme: add support for discard Maxim Levitsky
2019-04-17 19:53   ` Maxim Levitsky
2019-06-06  3:19   ` Fam Zheng
2019-06-06  7:31     ` Maxim Levitsky
2019-06-03 12:26 ` [Qemu-devel] [PATCH v2 0/5] Few fixes for userspace NVME driver Maxim Levitsky
