qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/5] NVMe passthrough: Support 64kB page host
@ 2020-10-15 11:52 Eric Auger
  2020-10-15 11:52 ` [RFC 1/5] block/nvme: use some NVME_CAP_* macros Eric Auger
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Eric Auger @ 2020-10-15 11:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, stefanha, fam,
	philmd, alex.williamson
  Cc: kwolf, mreitz

This series allows NVMe passthrough on aarch64 with 64kB page host.
Addresses and sizes of buffers which are VFIO DMA mapped are
aligned with the host page size.

nvme_register_buf() path is taken care of in this series
but it does not seem to prevent the use case from working.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/nvme_64k_rfc

This was tested on ARM only.

Eric Auger (5):
  block/nvme: use some NVME_CAP_* macros
  block/nvme: Change size and alignment of IDENTIFY response buffer
  block/nvme: Change size and alignment of queue
  block/nvme: Change size and alignment of prp_list_pages
  block/nvme: Align iov's va and size on host page size

 block/nvme.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

-- 
2.21.3



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC 1/5] block/nvme: use some NVME_CAP_* macros
  2020-10-15 11:52 [RFC 0/5] NVMe passthrough: Support 64kB page host Eric Auger
@ 2020-10-15 11:52 ` Eric Auger
  2020-10-15 13:29   ` Philippe Mathieu-Daudé
  2020-10-15 11:52 ` [RFC 2/5] block/nvme: Change size and alignment of IDENTIFY response buffer Eric Auger
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Eric Auger @ 2020-10-15 11:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, stefanha, fam,
	philmd, alex.williamson
  Cc: kwolf, mreitz

let's use NVME_CAP_DSTRD, NVME_CAP_MPSMIN and NVME_CAP_TO macros

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 block/nvme.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index f4f27b6da7..e3d96f20d0 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -728,10 +728,10 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
         goto out;
     }
 
-    s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
-    s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
+    s->page_size = MAX(4096, 1 << (12 + NVME_CAP_MPSMIN(cap)));
+    s->doorbell_scale = (4 << ((NVME_CAP_DSTRD(cap)))) / sizeof(uint32_t);
     bs->bl.opt_mem_alignment = s->page_size;
-    timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
+    timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
 
     /* Reset device to get a clean state. */
     s->regs->ctrl.cc = cpu_to_le32(le32_to_cpu(s->regs->ctrl.cc) & 0xFE);
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 2/5] block/nvme: Change size and alignment of IDENTIFY response buffer
  2020-10-15 11:52 [RFC 0/5] NVMe passthrough: Support 64kB page host Eric Auger
  2020-10-15 11:52 ` [RFC 1/5] block/nvme: use some NVME_CAP_* macros Eric Auger
@ 2020-10-15 11:52 ` Eric Auger
  2020-10-20 10:59   ` Philippe Mathieu-Daudé
  2020-10-15 11:52 ` [RFC 3/5] block/nvme: Change size and alignment of queue Eric Auger
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Eric Auger @ 2020-10-15 11:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, stefanha, fam,
	philmd, alex.williamson
  Cc: kwolf, mreitz

In preparation of 64kB host page support, let's change the size
and alignment of the IDENTIFY command response buffer so that
the VFIO DMA MAP succeeds. We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 block/nvme.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index e3d96f20d0..088ff1825a 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -513,19 +513,21 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
         .opcode = NVME_ADM_CMD_IDENTIFY,
         .cdw10 = cpu_to_le32(0x1),
     };
+    size_t id_size = QEMU_ALIGN_UP(sizeof(*id), qemu_real_host_page_size);
 
-    id = qemu_try_memalign(s->page_size, sizeof(*id));
+    id = qemu_try_memalign(qemu_real_host_page_size, id_size);
     if (!id) {
         error_setg(errp, "Cannot allocate buffer for identify response");
         goto out;
     }
-    r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
+    r = qemu_vfio_dma_map(s->vfio, id, id_size, true, &iova);
     if (r) {
         error_setg(errp, "Cannot map buffer for DMA");
         goto out;
     }
 
-    memset(id, 0, sizeof(*id));
+    memset(id, 0, id_size);
+
     cmd.dptr.prp1 = cpu_to_le64(iova);
     if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
         error_setg(errp, "Failed to identify controller");
@@ -547,7 +549,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
     s->supports_write_zeroes = !!(oncs & NVME_ONCS_WRITE_ZEROES);
     s->supports_discard = !!(oncs & NVME_ONCS_DSM);
 
-    memset(id, 0, sizeof(*id));
+    memset(id, 0, id_size);
     cmd.cdw10 = 0;
     cmd.nsid = cpu_to_le32(namespace);
     if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 3/5] block/nvme: Change size and alignment of queue
  2020-10-15 11:52 [RFC 0/5] NVMe passthrough: Support 64kB page host Eric Auger
  2020-10-15 11:52 ` [RFC 1/5] block/nvme: use some NVME_CAP_* macros Eric Auger
  2020-10-15 11:52 ` [RFC 2/5] block/nvme: Change size and alignment of IDENTIFY response buffer Eric Auger
@ 2020-10-15 11:52 ` Eric Auger
  2020-10-15 11:52 ` [RFC 4/5] block/nvme: Change size and alignment of prp_list_pages Eric Auger
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2020-10-15 11:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, stefanha, fam,
	philmd, alex.williamson
  Cc: kwolf, mreitz

In preparation of 64kB host page support, let's change the size
and alignment of the queue so that the VFIO DMA MAP succeeds.
We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 block/nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 088ff1825a..cf8ec200af 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -164,9 +164,9 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
     size_t bytes;
     int r;
 
-    bytes = ROUND_UP(nentries * entry_bytes, s->page_size);
+    bytes = ROUND_UP(nentries * entry_bytes, qemu_real_host_page_size);
     q->head = q->tail = 0;
-    q->queue = qemu_try_memalign(s->page_size, bytes);
+    q->queue = qemu_try_memalign(qemu_real_host_page_size, bytes);
     if (!q->queue) {
         error_setg(errp, "Cannot allocate queue");
         return;
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 4/5] block/nvme: Change size and alignment of prp_list_pages
  2020-10-15 11:52 [RFC 0/5] NVMe passthrough: Support 64kB page host Eric Auger
                   ` (2 preceding siblings ...)
  2020-10-15 11:52 ` [RFC 3/5] block/nvme: Change size and alignment of queue Eric Auger
@ 2020-10-15 11:52 ` Eric Auger
  2020-10-15 11:52 ` [RFC 5/5] block/nvme: Align iov's va and size on host page size Eric Auger
  2020-10-15 13:49 ` [RFC 0/5] NVMe passthrough: Support 64kB page host Philippe Mathieu-Daudé
  5 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2020-10-15 11:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, stefanha, fam,
	philmd, alex.williamson
  Cc: kwolf, mreitz

In preparation of 64kB host page support, let's change the size
and alignment of the prp_list_pages so that the VFIO DMA MAP succeeds
with 64kB host page size. We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 block/nvme.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index cf8ec200af..be8ec48bf2 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -210,24 +210,24 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState *s,
     Error *local_err = NULL;
     NVMeQueuePair *q;
     uint64_t prp_list_iova;
+    size_t bytes;
 
     q = g_try_new0(NVMeQueuePair, 1);
     if (!q) {
         return NULL;
     }
-    q->prp_list_pages = qemu_try_memalign(s->page_size,
-                                          s->page_size * NVME_NUM_REQS);
+    bytes = QEMU_ALIGN_UP(s->page_size * NVME_NUM_REQS, qemu_real_host_page_size);
+    q->prp_list_pages = qemu_try_memalign(qemu_real_host_page_size, bytes);
     if (!q->prp_list_pages) {
         goto fail;
     }
-    memset(q->prp_list_pages, 0, s->page_size * NVME_NUM_REQS);
+    memset(q->prp_list_pages, 0, bytes);
     qemu_mutex_init(&q->lock);
     q->s = s;
     q->index = idx;
     qemu_co_queue_init(&q->free_req_queue);
     q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, q);
-    r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
-                          s->page_size * NVME_NUM_REQS,
+    r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages, bytes,
                           false, &prp_list_iova);
     if (r) {
         goto fail;
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 5/5] block/nvme: Align iov's va and size on host page size
  2020-10-15 11:52 [RFC 0/5] NVMe passthrough: Support 64kB page host Eric Auger
                   ` (3 preceding siblings ...)
  2020-10-15 11:52 ` [RFC 4/5] block/nvme: Change size and alignment of prp_list_pages Eric Auger
@ 2020-10-15 11:52 ` Eric Auger
  2020-10-20 11:33   ` Philippe Mathieu-Daudé
  2020-10-15 13:49 ` [RFC 0/5] NVMe passthrough: Support 64kB page host Philippe Mathieu-Daudé
  5 siblings, 1 reply; 16+ messages in thread
From: Eric Auger @ 2020-10-15 11:52 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, stefanha, fam,
	philmd, alex.williamson
  Cc: kwolf, mreitz

Make sure iov's va and size are properly aligned on the host page
size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 block/nvme.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index be8ec48bf2..45807ed110 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -978,11 +978,12 @@ static coroutine_fn int nvme_cmd_map_qiov(BlockDriverState *bs, NvmeCmd *cmd,
     for (i = 0; i < qiov->niov; ++i) {
         bool retry = true;
         uint64_t iova;
+        size_t len = QEMU_ALIGN_UP(qiov->iov[i].iov_len,
+                                   qemu_real_host_page_size);
 try_map:
         r = qemu_vfio_dma_map(s->vfio,
                               qiov->iov[i].iov_base,
-                              qiov->iov[i].iov_len,
-                              true, &iova);
+                              len, true, &iova);
         if (r == -ENOMEM && retry) {
             retry = false;
             trace_nvme_dma_flush_queue_wait(s);
@@ -1126,8 +1127,8 @@ static inline bool nvme_qiov_aligned(BlockDriverState *bs,
     BDRVNVMeState *s = bs->opaque;
 
     for (i = 0; i < qiov->niov; ++i) {
-        if (!QEMU_PTR_IS_ALIGNED(qiov->iov[i].iov_base, s->page_size) ||
-            !QEMU_IS_ALIGNED(qiov->iov[i].iov_len, s->page_size)) {
+        if (!QEMU_PTR_IS_ALIGNED(qiov->iov[i].iov_base, qemu_real_host_page_size) ||
+            !QEMU_IS_ALIGNED(qiov->iov[i].iov_len, qemu_real_host_page_size)) {
             trace_nvme_qiov_unaligned(qiov, i, qiov->iov[i].iov_base,
                                       qiov->iov[i].iov_len, s->page_size);
             return false;
@@ -1143,7 +1144,7 @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     int r;
     uint8_t *buf = NULL;
     QEMUIOVector local_qiov;
-
+    size_t len = QEMU_ALIGN_UP(bytes, qemu_real_host_page_size);
     assert(QEMU_IS_ALIGNED(offset, s->page_size));
     assert(QEMU_IS_ALIGNED(bytes, s->page_size));
     assert(bytes <= s->max_transfer);
@@ -1151,7 +1152,7 @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
     }
     trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
-    buf = qemu_try_memalign(s->page_size, bytes);
+    buf = qemu_try_memalign(qemu_real_host_page_size, len);
 
     if (!buf) {
         return -ENOMEM;
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC 1/5] block/nvme: use some NVME_CAP_* macros
  2020-10-15 11:52 ` [RFC 1/5] block/nvme: use some NVME_CAP_* macros Eric Auger
@ 2020-10-15 13:29   ` Philippe Mathieu-Daudé
  2020-10-15 13:32     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-15 13:29 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

On 10/15/20 1:52 PM, Eric Auger wrote:
> let's use NVME_CAP_DSTRD, NVME_CAP_MPSMIN and NVME_CAP_TO macros
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   block/nvme.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index f4f27b6da7..e3d96f20d0 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -728,10 +728,10 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
>           goto out;
>       }
>   
> -    s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
> -    s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
> +    s->page_size = MAX(4096, 1 << (12 + NVME_CAP_MPSMIN(cap)));

Are you suggesting commit fad1eb68862 ("block/nvme: Use register
definitions from 'block/nvme.h'") is buggy?

> +    s->doorbell_scale = (4 << ((NVME_CAP_DSTRD(cap)))) / sizeof(uint32_t);
>       bs->bl.opt_mem_alignment = s->page_size;
> -    timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
> +    timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
>   
>       /* Reset device to get a clean state. */
>       s->regs->ctrl.cc = cpu_to_le32(le32_to_cpu(s->regs->ctrl.cc) & 0xFE);
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 1/5] block/nvme: use some NVME_CAP_* macros
  2020-10-15 13:29   ` Philippe Mathieu-Daudé
@ 2020-10-15 13:32     ` Philippe Mathieu-Daudé
  2020-10-15 13:36       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-15 13:32 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

On 10/15/20 3:29 PM, Philippe Mathieu-Daudé wrote:
> On 10/15/20 1:52 PM, Eric Auger wrote:
>> let's use NVME_CAP_DSTRD, NVME_CAP_MPSMIN and NVME_CAP_TO macros
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>   block/nvme.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/nvme.c b/block/nvme.c
>> index f4f27b6da7..e3d96f20d0 100644
>> --- a/block/nvme.c
>> +++ b/block/nvme.c
>> @@ -728,10 +728,10 @@ static int nvme_init(BlockDriverState *bs, const 
>> char *device, int namespace,
>>           goto out;
>>       }
>> -    s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>> -    s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
>> +    s->page_size = MAX(4096, 1 << (12 + NVME_CAP_MPSMIN(cap)));
> 
> Are you suggesting commit fad1eb68862 ("block/nvme: Use register
> definitions from 'block/nvme.h'") is buggy?

Buh I wonder how we missed that :/

> 
>> +    s->doorbell_scale = (4 << ((NVME_CAP_DSTRD(cap)))) / 
>> sizeof(uint32_t);
>>       bs->bl.opt_mem_alignment = s->page_size;
>> -    timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
>> +    timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
>>       /* Reset device to get a clean state. */
>>       s->regs->ctrl.cc = cpu_to_le32(le32_to_cpu(s->regs->ctrl.cc) & 
>> 0xFE);
>>
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 1/5] block/nvme: use some NVME_CAP_* macros
  2020-10-15 13:32     ` Philippe Mathieu-Daudé
@ 2020-10-15 13:36       ` Philippe Mathieu-Daudé
  2020-10-15 16:11         ` Auger Eric
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-15 13:36 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

On 10/15/20 3:32 PM, Philippe Mathieu-Daudé wrote:
> On 10/15/20 3:29 PM, Philippe Mathieu-Daudé wrote:
>> On 10/15/20 1:52 PM, Eric Auger wrote:
>>> let's use NVME_CAP_DSTRD, NVME_CAP_MPSMIN and NVME_CAP_TO macros
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> ---
>>>   block/nvme.c | 6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/nvme.c b/block/nvme.c
>>> index f4f27b6da7..e3d96f20d0 100644
>>> --- a/block/nvme.c
>>> +++ b/block/nvme.c
>>> @@ -728,10 +728,10 @@ static int nvme_init(BlockDriverState *bs, 
>>> const char *device, int namespace,
>>>           goto out;
>>>       }
>>> -    s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>>> -    s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / 
>>> sizeof(uint32_t);
>>> +    s->page_size = MAX(4096, 1 << (12 + NVME_CAP_MPSMIN(cap)));
>>
>> Are you suggesting commit fad1eb68862 ("block/nvme: Use register
>> definitions from 'block/nvme.h'") is buggy?
> 
> Buh I wonder how we missed that :/

Since your patch doesn't apply anyway, I might fix as:

         s->page_size = 4096 << NVME_CAP_MPSMIN(cap);

> 
>>
>>> +    s->doorbell_scale = (4 << ((NVME_CAP_DSTRD(cap)))) / 
>>> sizeof(uint32_t);
>>>       bs->bl.opt_mem_alignment = s->page_size;
>>> -    timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
>>> +    timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
>>>       /* Reset device to get a clean state. */
>>>       s->regs->ctrl.cc = cpu_to_le32(le32_to_cpu(s->regs->ctrl.cc) & 
>>> 0xFE);
>>>
>>
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/5] NVMe passthrough: Support 64kB page host
  2020-10-15 11:52 [RFC 0/5] NVMe passthrough: Support 64kB page host Eric Auger
                   ` (4 preceding siblings ...)
  2020-10-15 11:52 ` [RFC 5/5] block/nvme: Align iov's va and size on host page size Eric Auger
@ 2020-10-15 13:49 ` Philippe Mathieu-Daudé
  2020-10-15 16:15   ` Auger Eric
  5 siblings, 1 reply; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-15 13:49 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

On 10/15/20 1:52 PM, Eric Auger wrote:
> This series allows NVMe passthrough on aarch64 with 64kB page host.
> Addresses and sizes of buffers which are VFIO DMA mapped are
> aligned with the host page size.
> 
> nvme_register_buf() path is taken care of in this series
> but it does not seem to prevent the use case from working.
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/nvme_64k_rfc
> 
> This was tested on ARM only.
> 
> Eric Auger (5):
>    block/nvme: use some NVME_CAP_* macros
>    block/nvme: Change size and alignment of IDENTIFY response buffer
>    block/nvme: Change size and alignment of queue
>    block/nvme: Change size and alignment of prp_list_pages
>    block/nvme: Align iov's va and size on host page size

Since it is easier for me to rebase on top of your series,
I'm including it in my work (fixing the checkpatch errors)
and will repost block/nvme/ patches altogether.

Regards,

Phil.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 1/5] block/nvme: use some NVME_CAP_* macros
  2020-10-15 13:36       ` Philippe Mathieu-Daudé
@ 2020-10-15 16:11         ` Auger Eric
  0 siblings, 0 replies; 16+ messages in thread
From: Auger Eric @ 2020-10-15 16:11 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

Hi Philippe,

On 10/15/20 3:36 PM, Philippe Mathieu-Daudé wrote:
> On 10/15/20 3:32 PM, Philippe Mathieu-Daudé wrote:
>> On 10/15/20 3:29 PM, Philippe Mathieu-Daudé wrote:
>>> On 10/15/20 1:52 PM, Eric Auger wrote:
>>>> let's use NVME_CAP_DSTRD, NVME_CAP_MPSMIN and NVME_CAP_TO macros
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>>   block/nvme.c | 6 +++---
>>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/block/nvme.c b/block/nvme.c
>>>> index f4f27b6da7..e3d96f20d0 100644
>>>> --- a/block/nvme.c
>>>> +++ b/block/nvme.c
>>>> @@ -728,10 +728,10 @@ static int nvme_init(BlockDriverState *bs,
>>>> const char *device, int namespace,
>>>>           goto out;
>>>>       }
>>>> -    s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
>>>> -    s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) /
>>>> sizeof(uint32_t);
>>>> +    s->page_size = MAX(4096, 1 << (12 + NVME_CAP_MPSMIN(cap)));
>>>
>>> Are you suggesting commit fad1eb68862 ("block/nvme: Use register
>>> definitions from 'block/nvme.h'") is buggy?
yes I think so. Sorry I messed up my rebase and failed to grab that patch.
>>
>> Buh I wonder how we missed that :/
> 
> Since your patch doesn't apply anyway, I might fix as:
> 
>         s->page_size = 4096 << NVME_CAP_MPSMIN(cap);
1 << (12 + NVME_CAP_MPSMIN(cap)) matches the spec text so personally I
would keep that.

Thanks

Eric
> 
>>
>>>
>>>> +    s->doorbell_scale = (4 << ((NVME_CAP_DSTRD(cap)))) /
>>>> sizeof(uint32_t);
>>>>       bs->bl.opt_mem_alignment = s->page_size;
>>>> -    timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
>>>> +    timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
>>>>       /* Reset device to get a clean state. */
>>>>       s->regs->ctrl.cc = cpu_to_le32(le32_to_cpu(s->regs->ctrl.cc) &
>>>> 0xFE);
>>>>
>>>
>>
> 
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/5] NVMe passthrough: Support 64kB page host
  2020-10-15 13:49 ` [RFC 0/5] NVMe passthrough: Support 64kB page host Philippe Mathieu-Daudé
@ 2020-10-15 16:15   ` Auger Eric
  2020-10-15 18:01     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 16+ messages in thread
From: Auger Eric @ 2020-10-15 16:15 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

Hi Philippe,

On 10/15/20 3:49 PM, Philippe Mathieu-Daudé wrote:
> On 10/15/20 1:52 PM, Eric Auger wrote:
>> This series allows NVMe passthrough on aarch64 with 64kB page host.
>> Addresses and sizes of buffers which are VFIO DMA mapped are
>> aligned with the host page size.
>>
>> nvme_register_buf() path is taken care of in this series
>> but it does not seem to prevent the use case from working.
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/nvme_64k_rfc
>>
>> This was tested on ARM only.
>>
>> Eric Auger (5):
>>    block/nvme: use some NVME_CAP_* macros
>>    block/nvme: Change size and alignment of IDENTIFY response buffer
>>    block/nvme: Change size and alignment of queue
>>    block/nvme: Change size and alignment of prp_list_pages
>>    block/nvme: Align iov's va and size on host page size
> 
> Since it is easier for me to rebase on top of your series,
> I'm including it in my work (fixing the checkpatch errors)
> and will repost block/nvme/ patches altogether.

There should be one warning (line exceeding 80 chars) but no error. I
can easily rebase/respin if you prefer.

Thanks

Eric
> 
> Regards,
> 
> Phil.
> 
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/5] NVMe passthrough: Support 64kB page host
  2020-10-15 16:15   ` Auger Eric
@ 2020-10-15 18:01     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-15 18:01 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

On 10/15/20 6:15 PM, Auger Eric wrote:
> Hi Philippe,
> 
> On 10/15/20 3:49 PM, Philippe Mathieu-Daudé wrote:
>> On 10/15/20 1:52 PM, Eric Auger wrote:
>>> This series allows NVMe passthrough on aarch64 with 64kB page host.
>>> Addresses and sizes of buffers which are VFIO DMA mapped are
>>> aligned with the host page size.
>>>
>>> nvme_register_buf() path is taken care of in this series
>>> but it does not seem to prevent the use case from working.
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> This series can be found at:
>>> https://github.com/eauger/qemu/tree/nvme_64k_rfc
>>>
>>> This was tested on ARM only.
>>>
>>> Eric Auger (5):
>>>     block/nvme: use some NVME_CAP_* macros
>>>     block/nvme: Change size and alignment of IDENTIFY response buffer
>>>     block/nvme: Change size and alignment of queue
>>>     block/nvme: Change size and alignment of prp_list_pages
>>>     block/nvme: Align iov's va and size on host page size
>>
>> Since it is easier for me to rebase on top of your series,
>> I'm including it in my work (fixing the checkpatch errors)
>> and will repost block/nvme/ patches altogether.
> 
> There should be one warning (line exceeding 80 chars) but no error. I
> can easily rebase/respin if you prefer.

Yes, warnings, no error. No need to respin.

> 
> Thanks
> 
> Eric
>>
>> Regards,
>>
>> Phil.
>>
>>
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] block/nvme: Change size and alignment of IDENTIFY response buffer
  2020-10-15 11:52 ` [RFC 2/5] block/nvme: Change size and alignment of IDENTIFY response buffer Eric Auger
@ 2020-10-20 10:59   ` Philippe Mathieu-Daudé
  2020-10-20 11:31     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-20 10:59 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

On 10/15/20 1:52 PM, Eric Auger wrote:
> In preparation of 64kB host page support, let's change the size
> and alignment of the IDENTIFY command response buffer so that
> the VFIO DMA MAP succeeds. We align on the host page size.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   block/nvme.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index e3d96f20d0..088ff1825a 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -513,19 +513,21 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>           .opcode = NVME_ADM_CMD_IDENTIFY,
>           .cdw10 = cpu_to_le32(0x1),
>       };
> +    size_t id_size = QEMU_ALIGN_UP(sizeof(*id), qemu_real_host_page_size);

Do we really need to ALIGN_UP if we then call qemu_try_memalign()?

>   
> -    id = qemu_try_memalign(s->page_size, sizeof(*id));
> +    id = qemu_try_memalign(qemu_real_host_page_size, id_size);
>       if (!id) {
>           error_setg(errp, "Cannot allocate buffer for identify response");
>           goto out;
>       }
> -    r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
> +    r = qemu_vfio_dma_map(s->vfio, id, id_size, true, &iova);
>       if (r) {
>           error_setg(errp, "Cannot map buffer for DMA");
>           goto out;
>       }
>   
> -    memset(id, 0, sizeof(*id));
> +    memset(id, 0, id_size);
> +
>       cmd.dptr.prp1 = cpu_to_le64(iova);
>       if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
>           error_setg(errp, "Failed to identify controller");
> @@ -547,7 +549,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>       s->supports_write_zeroes = !!(oncs & NVME_ONCS_WRITE_ZEROES);
>       s->supports_discard = !!(oncs & NVME_ONCS_DSM);
>   
> -    memset(id, 0, sizeof(*id));
> +    memset(id, 0, id_size);
>       cmd.cdw10 = 0;
>       cmd.nsid = cpu_to_le32(namespace);
>       if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/5] block/nvme: Change size and alignment of IDENTIFY response buffer
  2020-10-20 10:59   ` Philippe Mathieu-Daudé
@ 2020-10-20 11:31     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-20 11:31 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

On 10/20/20 12:59 PM, Philippe Mathieu-Daudé wrote:
> On 10/15/20 1:52 PM, Eric Auger wrote:
>> In preparation of 64kB host page support, let's change the size
>> and alignment of the IDENTIFY command response buffer so that
>> the VFIO DMA MAP succeeds. We align on the host page size.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>   block/nvme.c | 10 ++++++----
>>   1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/nvme.c b/block/nvme.c
>> index e3d96f20d0..088ff1825a 100644
>> --- a/block/nvme.c
>> +++ b/block/nvme.c
>> @@ -513,19 +513,21 @@ static void nvme_identify(BlockDriverState *bs, 
>> int namespace, Error **errp)
>>           .opcode = NVME_ADM_CMD_IDENTIFY,
>>           .cdw10 = cpu_to_le32(0x1),
>>       };
>> +    size_t id_size = QEMU_ALIGN_UP(sizeof(*id), 
>> qemu_real_host_page_size);
> 
> Do we really need to ALIGN_UP if we then call qemu_try_memalign()?

Ah, we need to ALIGN_UP for qemu_vfio_dma_map(), OK.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> 
>> -    id = qemu_try_memalign(s->page_size, sizeof(*id));
>> +    id = qemu_try_memalign(qemu_real_host_page_size, id_size);
>>       if (!id) {
>>           error_setg(errp, "Cannot allocate buffer for identify 
>> response");
>>           goto out;
>>       }
>> -    r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
>> +    r = qemu_vfio_dma_map(s->vfio, id, id_size, true, &iova);
>>       if (r) {
>>           error_setg(errp, "Cannot map buffer for DMA");
>>           goto out;
>>       }
>> -    memset(id, 0, sizeof(*id));
>> +    memset(id, 0, id_size);
>> +
>>       cmd.dptr.prp1 = cpu_to_le64(iova);
>>       if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
>>           error_setg(errp, "Failed to identify controller");
>> @@ -547,7 +549,7 @@ static void nvme_identify(BlockDriverState *bs, 
>> int namespace, Error **errp)
>>       s->supports_write_zeroes = !!(oncs & NVME_ONCS_WRITE_ZEROES);
>>       s->supports_discard = !!(oncs & NVME_ONCS_DSM);
>> -    memset(id, 0, sizeof(*id));
>> +    memset(id, 0, id_size);
>>       cmd.cdw10 = 0;
>>       cmd.nsid = cpu_to_le32(namespace);
>>       if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
>>
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 5/5] block/nvme: Align iov's va and size on host page size
  2020-10-15 11:52 ` [RFC 5/5] block/nvme: Align iov's va and size on host page size Eric Auger
@ 2020-10-20 11:33   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-20 11:33 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, stefanha, fam,
	alex.williamson
  Cc: kwolf, mreitz

On 10/15/20 1:52 PM, Eric Auger wrote:
> Make sure iov's va and size are properly aligned on the host page
> size.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> ---
>   block/nvme.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index be8ec48bf2..45807ed110 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -978,11 +978,12 @@ static coroutine_fn int nvme_cmd_map_qiov(BlockDriverState *bs, NvmeCmd *cmd,
>       for (i = 0; i < qiov->niov; ++i) {
>           bool retry = true;
>           uint64_t iova;
> +        size_t len = QEMU_ALIGN_UP(qiov->iov[i].iov_len,
> +                                   qemu_real_host_page_size);
>   try_map:
>           r = qemu_vfio_dma_map(s->vfio,
>                                 qiov->iov[i].iov_base,
> -                              qiov->iov[i].iov_len,
> -                              true, &iova);
> +                              len, true, &iova);
>           if (r == -ENOMEM && retry) {
>               retry = false;
>               trace_nvme_dma_flush_queue_wait(s);
> @@ -1126,8 +1127,8 @@ static inline bool nvme_qiov_aligned(BlockDriverState *bs,
>       BDRVNVMeState *s = bs->opaque;
>   
>       for (i = 0; i < qiov->niov; ++i) {
> -        if (!QEMU_PTR_IS_ALIGNED(qiov->iov[i].iov_base, s->page_size) ||
> -            !QEMU_IS_ALIGNED(qiov->iov[i].iov_len, s->page_size)) {
> +        if (!QEMU_PTR_IS_ALIGNED(qiov->iov[i].iov_base, qemu_real_host_page_size) ||
> +            !QEMU_IS_ALIGNED(qiov->iov[i].iov_len, qemu_real_host_page_size)) {
>               trace_nvme_qiov_unaligned(qiov, i, qiov->iov[i].iov_base,
>                                         qiov->iov[i].iov_len, s->page_size);
>               return false;
> @@ -1143,7 +1144,7 @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>       int r;
>       uint8_t *buf = NULL;
>       QEMUIOVector local_qiov;
> -
> +    size_t len = QEMU_ALIGN_UP(bytes, qemu_real_host_page_size);
>       assert(QEMU_IS_ALIGNED(offset, s->page_size));
>       assert(QEMU_IS_ALIGNED(bytes, s->page_size));
>       assert(bytes <= s->max_transfer);
> @@ -1151,7 +1152,7 @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>           return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
>       }
>       trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
> -    buf = qemu_try_memalign(s->page_size, bytes);
> +    buf = qemu_try_memalign(qemu_real_host_page_size, len);
>   
>       if (!buf) {
>           return -ENOMEM;
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-10-20 11:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-15 11:52 [RFC 0/5] NVMe passthrough: Support 64kB page host Eric Auger
2020-10-15 11:52 ` [RFC 1/5] block/nvme: use some NVME_CAP_* macros Eric Auger
2020-10-15 13:29   ` Philippe Mathieu-Daudé
2020-10-15 13:32     ` Philippe Mathieu-Daudé
2020-10-15 13:36       ` Philippe Mathieu-Daudé
2020-10-15 16:11         ` Auger Eric
2020-10-15 11:52 ` [RFC 2/5] block/nvme: Change size and alignment of IDENTIFY response buffer Eric Auger
2020-10-20 10:59   ` Philippe Mathieu-Daudé
2020-10-20 11:31     ` Philippe Mathieu-Daudé
2020-10-15 11:52 ` [RFC 3/5] block/nvme: Change size and alignment of queue Eric Auger
2020-10-15 11:52 ` [RFC 4/5] block/nvme: Change size and alignment of prp_list_pages Eric Auger
2020-10-15 11:52 ` [RFC 5/5] block/nvme: Align iov's va and size on host page size Eric Auger
2020-10-20 11:33   ` Philippe Mathieu-Daudé
2020-10-15 13:49 ` [RFC 0/5] NVMe passthrough: Support 64kB page host Philippe Mathieu-Daudé
2020-10-15 16:15   ` Auger Eric
2020-10-15 18:01     ` Philippe Mathieu-Daudé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).