* [PATCH RFC 0/4] sqsize zero-based fixes
@ 2016-08-11  4:07 Jay Freyensee
  2016-08-11  4:07 ` [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec Jay Freyensee
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Jay Freyensee @ 2016-08-11  4:07 UTC (permalink / raw)


These patches are based on NVMe-over-Fabrics spec clarifications
by Dave Minturn, one of the spec authors.
In summary, sqsize and hsqsize are 0-based
values, and hrqsize cannot be smaller than hsqsize+1.

There is an issue with these patches I haven't worked out yet,
thus the reason for RFC.  The keep-alive timer fires and
then causes a kernel crash; I haven't worked out
exactly why yet, but I suspect
it has something to do with sqsize now being 0-based while
the RDMA implementation layer's queue_size is 1-based.
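
For reference, a minimal sketch (illustrative helpers only, not part of
the patches) of the two conventions in play:

#include <linux/types.h>

/*
 * Illustrative sketch only, not driver code: the fabrics spec carries
 * sqsize/hsqsize as 0-based values (31 means 32 entries), while the
 * RDMA implementation layer's queue_size is a plain 1-based count.
 */
static inline u16 depth_to_sqsize(u16 queue_size)	/* 1-based -> 0-based */
{
	return queue_size - 1;
}

static inline u16 sqsize_to_depth(u16 sqsize)		/* 0-based -> 1-based */
{
	return sqsize + 1;
}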

Anyway, feedback is appreciated as I continue
to work out how to fix this appropriately.

Jay Freyensee (4):
  fabrics: define admin sqsize min default, per spec
  nvme-rdma: fix sqsize/hsqsize/hrqsize per spec
  nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
  nvme-loop: set sqsize to 0-based value, per spec

 drivers/nvme/host/fabrics.c | 16 +++++++++++++++-
 drivers/nvme/host/nvme.h    |  1 +
 drivers/nvme/host/rdma.c    | 19 ++++++++++++++++---
 drivers/nvme/target/loop.c  |  2 +-
 drivers/nvme/target/rdma.c  |  8 ++++----
 5 files changed, 37 insertions(+), 9 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec
  2016-08-11  4:07 [PATCH RFC 0/4] sqsize zero-based fixes Jay Freyensee
@ 2016-08-11  4:07 ` Jay Freyensee
  2016-08-11  9:01   ` Sagi Grimberg
  2016-08-11  4:07 ` [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize " Jay Freyensee
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Jay Freyensee @ 2016-08-11  4:07 UTC (permalink / raw)


Upon admin queue connect(), the rdma qp was being
sized based on NVMF_AQ_DEPTH.  However, the fabrics layer was
using the sqsize field value set for I/O queues for the admin
queue, which threw the nvme layer and rdma layer out of whack:

[root at fedora23-fabrics-host1 nvmf]# dmesg
[ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize
being sent is: 128
[ 3507.798858] nvme nvme0: creating 16 I/O queues.
[ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr
192.168.1.3:4420

Thus, to allow a different admin queue depth (the fabrics
spec states, via the ASQSZ definition, that the minimum depth for
a fabrics admin queue is 32), we also need a new variable to hold
the sqsize for the fabrics admin queue.  This also allows fabric
implementation layers to set an admin sqsize greater than the
specification-defined minimum, which the fabrics spec permits.

Reported-by: Daniel Verkamp <daniel.verkamp at intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
---
 drivers/nvme/host/fabrics.c | 16 +++++++++++++++-
 drivers/nvme/host/nvme.h    |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index dc99676..cfc3607 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -363,7 +363,14 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
 	cmd.connect.opcode = nvme_fabrics_command;
 	cmd.connect.fctype = nvme_fabrics_type_connect;
 	cmd.connect.qid = 0;
-	cmd.connect.sqsize = cpu_to_le16(ctrl->sqsize);
+
+	/*
+	 * fabrics spec sets a minimum of depth 32 for admin queue, but
+	 * fabric implementation layer may choose to define something deeper.
+	 */
+	cmd.connect.sqsize = (ctrl->admin_sqsize < NVMF_AQ_DEPTH-1) ?
+			     cpu_to_le16(NVMF_AQ_DEPTH-1) :
+			     cpu_to_le16(ctrl->admin_sqsize);
 	/*
 	 * Set keep-alive timeout in seconds granularity (ms * 1000)
 	 * and add a grace period for controller kato enforcement
@@ -833,6 +840,13 @@ static ssize_t nvmf_dev_write(struct file *file, const char __user *ubuf,
 
 	seq_file->private = ctrl;
 
+	/*
+	 * set admin_sqsize a minimum default value
+	 * but allow fabric implementation modules
+	 * to define a larger sqsize, if desired
+	 */
+	ctrl->admin_sqsize = NVMF_AQ_DEPTH-1;
+
 out_unlock:
 	mutex_unlock(&nvmf_dev_mutex);
 	kfree(buf);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ab18b78..32577a7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -137,6 +137,7 @@ struct nvme_ctrl {
 	struct delayed_work ka_work;
 
 	/* Fabrics only */
+	u16 admin_sqsize;
 	u16 sqsize;
 	u32 ioccsz;
 	u32 iorcsz;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize per spec
  2016-08-11  4:07 [PATCH RFC 0/4] sqsize zero-based fixes Jay Freyensee
  2016-08-11  4:07 ` [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec Jay Freyensee
@ 2016-08-11  4:07 ` Jay Freyensee
  2016-08-11  7:03   ` Sagi Grimberg
  2016-08-11 15:21   ` Christoph Hellwig
  2016-08-11  4:07 ` [PATCH RFC 3/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize Jay Freyensee
  2016-08-11  4:07 ` [PATCH RFC 4/4] nvme-loop: set sqsize to 0-based value, per spec Jay Freyensee
  3 siblings, 2 replies; 13+ messages in thread
From: Jay Freyensee @ 2016-08-11  4:07 UTC (permalink / raw)


Per the NVMe-over-Fabrics 1.0 spec, sqsize is represented as
a 0-based value.

Also per spec, the RDMA binding's hsqsize shall be set
to the value of sqsize, which makes hsqsize a 0-based value as well.

Also per spec, though not stated very clearly, hrqsize is
hsqsize+1.

Thus, the sqsize during NVMf connect() is now:

[root at fedora23-fabrics-host1 for-48]# dmesg
[  318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for
admin queue: 31
[  318.720884] nvme nvme0: creating 16 I/O queues.
[  318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o
queue: 127

Reported-by: Daniel Verkamp <daniel.verkamp at intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
---
 drivers/nvme/host/rdma.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3e3ce2b..8be64f1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1284,8 +1284,21 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
 
 	priv.recfmt = cpu_to_le16(NVME_RDMA_CM_FMT_1_0);
 	priv.qid = cpu_to_le16(nvme_rdma_queue_idx(queue));
-	priv.hrqsize = cpu_to_le16(queue->queue_size);
-	priv.hsqsize = cpu_to_le16(queue->queue_size);
+
+	/*
+	 * On one end, the fabrics spec is pretty clear that
+	 * hsqsize variables shall be set to the value of sqsize,
+	 * which is a 0-based number. What is confusing is the value for
+	 * hrqsize.  After clarification from NVMe spec committee member,
+	 * the minimum value of hrqsize is hsqsize+1.
+	 */
+	if (priv.qid == 0) {
+		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
+		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize+1);
+	} else {
+		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
+		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize+1);
+	}
 
 	ret = rdma_connect(queue->cm_id, &param);
 	if (ret) {
@@ -1907,7 +1920,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
 	spin_lock_init(&ctrl->lock);
 
 	ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
-	ctrl->ctrl.sqsize = opts->queue_size;
+	ctrl->ctrl.sqsize = opts->queue_size-1;
 	ctrl->ctrl.kato = opts->kato;
 
 	ret = -ENOMEM;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 3/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
  2016-08-11  4:07 [PATCH RFC 0/4] sqsize zero-based fixes Jay Freyensee
  2016-08-11  4:07 ` [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec Jay Freyensee
  2016-08-11  4:07 ` [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize " Jay Freyensee
@ 2016-08-11  4:07 ` Jay Freyensee
  2016-08-11  4:07 ` [PATCH RFC 4/4] nvme-loop: set sqsize to 0-based value, per spec Jay Freyensee
  3 siblings, 0 replies; 13+ messages in thread
From: Jay Freyensee @ 2016-08-11  4:07 UTC (permalink / raw)


Now that the host will be sending hsqsize as a 0-based
value, with hrqsize equaling hsqsize+1,
the target needs to be adjusted as well.

Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
---
 drivers/nvme/target/rdma.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e06d504..4a08946 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1004,11 +1004,11 @@ nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
 	queue->host_qid = le16_to_cpu(req->qid);
 
 	/*
-	 * req->hsqsize corresponds to our recv queue size
-	 * req->hrqsize corresponds to our send queue size
+	 * req->hsqsize corresponds to our recv queue size, plus 1
+	 * req->hrqsize corresponds to our send queue size, plus 1
 	 */
-	queue->recv_queue_size = le16_to_cpu(req->hsqsize);
-	queue->send_queue_size = le16_to_cpu(req->hrqsize);
+	queue->recv_queue_size = le16_to_cpu(req->hsqsize+1);
+	queue->send_queue_size = le16_to_cpu(req->hrqsize+1);
 
 	if (!queue->host_qid && queue->recv_queue_size > NVMF_AQ_DEPTH)
 		return NVME_RDMA_CM_INVALID_HSQSIZE;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 4/4] nvme-loop: set sqsize to 0-based value, per spec
  2016-08-11  4:07 [PATCH RFC 0/4] sqsize zero-based fixes Jay Freyensee
                   ` (2 preceding siblings ...)
  2016-08-11  4:07 ` [PATCH RFC 3/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize Jay Freyensee
@ 2016-08-11  4:07 ` Jay Freyensee
  3 siblings, 0 replies; 13+ messages in thread
From: Jay Freyensee @ 2016-08-11  4:07 UTC (permalink / raw)


Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
---
 drivers/nvme/target/loop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 94e7829..21ce0e8 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -622,7 +622,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
 
 	ret = -ENOMEM;
 
-	ctrl->ctrl.sqsize = opts->queue_size;
+	ctrl->ctrl.sqsize = opts->queue_size-1;
 	ctrl->ctrl.kato = opts->kato;
 
 	ctrl->queues = kcalloc(opts->nr_io_queues + 1, sizeof(*ctrl->queues),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize per spec
  2016-08-11  4:07 ` [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize " Jay Freyensee
@ 2016-08-11  7:03   ` Sagi Grimberg
  2016-08-11 16:35     ` J Freyensee
  2016-08-11 15:21   ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Sagi Grimberg @ 2016-08-11  7:03 UTC (permalink / raw)




On 11/08/16 07:07, Jay Freyensee wrote:
> Per the NVMe-over-Fabrics 1.0 spec, sqsize is represented as
> a 0-based value.
>
> Also per spec, the RDMA binding's hsqsize shall be set
> to the value of sqsize, which makes hsqsize a 0-based value as well.
>
> Also per spec, though not stated very clearly, hrqsize is
> hsqsize+1.
>
> Thus, the sqsize during NVMf connect() is now:
>
> [root at fedora23-fabrics-host1 for-48]# dmesg
> [  318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for
> admin queue: 31
> [  318.720884] nvme nvme0: creating 16 I/O queues.
> [  318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o
> queue: 127
>
> Reported-by: Daniel Verkamp <daniel.verkamp at intel.com>
> Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
> ---
>  drivers/nvme/host/rdma.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 3e3ce2b..8be64f1 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1284,8 +1284,21 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
>
>  	priv.recfmt = cpu_to_le16(NVME_RDMA_CM_FMT_1_0);
>  	priv.qid = cpu_to_le16(nvme_rdma_queue_idx(queue));
> -	priv.hrqsize = cpu_to_le16(queue->queue_size);
> -	priv.hsqsize = cpu_to_le16(queue->queue_size);
> +
> +	/*
> +	 * On one end, the fabrics spec is pretty clear that
> +	 * hsqsize variables shall be set to the value of sqsize,
> +	 * which is a 0-based number. What is confusing is the value for
> +	 * hrqsize.  After clarification from NVMe spec committee member,
> +	 * the minimum value of hrqsize is hsqsize+1.
> +	 */
> +	if (priv.qid == 0) {
> +		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
> +		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize+1);
> +	} else {
> +		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
> +		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize+1);
> +	}

Huh? (scratch...) using priv.hrqsize = priv.hsqsize+1 is pointless.

We expose X to the block layer, we send X-1 to the target, and
the target does X+1 (looks goofy, but ok). We also size our RDMA
send/recv according to X, so why on earth would we want to tell the
target we have a recv queue of size X+1?
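
To put numbers on that (a worked example, assuming the series as posted
and an illustrative opts->queue_size of 128):

/*
 * Worked example with opts->queue_size = X = 128:
 *
 *   block layer depth / host RDMA send+recv sizing (per above):  X     = 128
 *   ctrl->ctrl.sqsize after patch 2 (0-based):                   X - 1 = 127
 *   hsqsize on the wire (= sqsize):                                      127
 *   hrqsize on the wire (= sqsize + 1):                                  128
 *   target recv_queue_size after patch 3 (hsqsize + 1):                  128
 *   target send_queue_size after patch 3 (hrqsize + 1):          X + 1 = 129
 *
 * i.e. the target ends up sized to post X + 1 entries toward a host
 * receive queue that only has X entries.
 */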

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec
  2016-08-11  4:07 ` [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec Jay Freyensee
@ 2016-08-11  9:01   ` Sagi Grimberg
  2016-08-11 15:08     ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Sagi Grimberg @ 2016-08-11  9:01 UTC (permalink / raw)




On 11/08/16 07:07, Jay Freyensee wrote:
> Upon admin queue connect(), the rdma qp was being
> sized based on NVMF_AQ_DEPTH.  However, the fabrics layer was
> using the sqsize field value set for I/O queues for the admin
> queue, which threw the nvme layer and rdma layer out of whack:
>
> [root at fedora23-fabrics-host1 nvmf]# dmesg
> [ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize
> being sent is: 128
> [ 3507.798858] nvme nvme0: creating 16 I/O queues.
> [ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr
> 192.168.1.3:4420
>
> Thus, to allow a different admin queue depth (the fabrics
> spec states, via the ASQSZ definition, that the minimum depth for
> a fabrics admin queue is 32), we also need a new variable to hold
> the sqsize for the fabrics admin queue.  This also allows fabric
> implementation layers to set an admin sqsize greater than the
> specification-defined minimum, which the fabrics spec permits.

We only need a variable if we actually have someone making use of
it with the non-default value. I honestly don't see the reason
to keep it in a variable when no one is making real use
of it.

If you had exposed it to user-space then it would make
better sense (although I'm not a fan of adding not very useful
configs to user-space).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec
  2016-08-11  9:01   ` Sagi Grimberg
@ 2016-08-11 15:08     ` Christoph Hellwig
  2016-08-11 16:33       ` J Freyensee
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2016-08-11 15:08 UTC (permalink / raw)


On Thu, Aug 11, 2016 at 12:01:59PM +0300, Sagi Grimberg wrote:
> it with the non-default value. I honestly don't see the reason
> to keep it in a variable when no one is making real use
> of it.
>
> If you had exposed it to user-space then it would make
> better sense (although I'm not a fan of adding not very useful
> configs to user-space).

I agree - until we have a good use case for large admin queues
we should keep things simple.  And I'm still pissed about the idiotic
addition of the admin queue size to the discovery records at the last
moment.  It's entirely contrary to the discovery service abstraction
we build, and pointless as well.  So the more we can ignore it, the
better.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize per spec
  2016-08-11  4:07 ` [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize " Jay Freyensee
  2016-08-11  7:03   ` Sagi Grimberg
@ 2016-08-11 15:21   ` Christoph Hellwig
  2016-08-11 16:40     ` J Freyensee
  2016-08-11 23:24     ` J Freyensee
  1 sibling, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2016-08-11 15:21 UTC (permalink / raw)


I think having different notations for hsqsize vs hrqsize is highly
confusing, and I've asked for a clarification of the hrqsize definition,
which is ambiguous at best at the moment.  But independent of
that, a few more comments:

> @@ -1907,7 +1920,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
>  	spin_lock_init(&ctrl->lock);
>  
>  	ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
> -	ctrl->ctrl.sqsize = opts->queue_size;
> +	ctrl->ctrl.sqsize = opts->queue_size-1;

We should keep our own sqsize in normal notation in core, and just
convert to strange notations on the wire.  Especially as we use it
to drive all the resource allocation, as Sagi said.


> +	if (priv.qid == 0) {
> +		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
> +		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize+1);

Based on that, priv.hsqsize should have a - 1 here and hrqsize should
be kept as-is.  Also, always use spaces around the operators.

And while we're at it - the fix to use the separate AQ values should
go into the first patch.
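
If that reading is right, the end result would look something like this
(an untested sketch only, not a replacement patch):

	/* nvme_rdma_create_ctrl(): keep the 1-based value in core */
	ctrl->ctrl.sqsize = opts->queue_size;

	/*
	 * nvme_rdma_route_resolved(): convert to the 0-based wire format
	 * only here (this assumes admin_sqsize would also hold a 1-based
	 * depth, e.g. NVMF_AQ_DEPTH).
	 */
	if (priv.qid == 0) {
		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize - 1);
		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
	} else {
		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize - 1);
		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
	}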

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec
  2016-08-11 15:08     ` Christoph Hellwig
@ 2016-08-11 16:33       ` J Freyensee
  0 siblings, 0 replies; 13+ messages in thread
From: J Freyensee @ 2016-08-11 16:33 UTC (permalink / raw)


On Thu, 2016-08-11 at 17:08 +0200, Christoph Hellwig wrote:
> On Thu, Aug 11, 2016 at 12:01:59PM +0300, Sagi Grimberg wrote:
> > 
> > it with the non-default value. I honestly don't see the reason
> > to keep it in a variable when no one is making real use
> > of it.
> > 
> > If you had exposed it to user-space then it would make
> > better sense (although I'm not a fan of adding not very useful
> > configs to user-space).
> 
> I agree - until we have a good use case for large admin queues
> we should keep things simple.  And I'm still pissed about the idiotic
> addition of the admin queue size to the discovery records at the last
> moment.  It's entirely contrary to the discovery service abstraction
> we build, and pointless as well.  So the more we can ignore it, the
> better.

OK, I have no problems with that.  So set the admin sqsize with a
#define in nvmf_connect_admin_queue(), OK?
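
Something along these lines, perhaps (a sketch, untested):

	/*
	 * In nvmf_connect_admin_queue(): always use the spec-defined
	 * minimum admin queue depth, expressed 0-based on the wire.
	 */
	cmd.connect.sqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);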

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize per spec
  2016-08-11  7:03   ` Sagi Grimberg
@ 2016-08-11 16:35     ` J Freyensee
  0 siblings, 0 replies; 13+ messages in thread
From: J Freyensee @ 2016-08-11 16:35 UTC (permalink / raw)


On Thu, 2016-08-11 at 10:03 +0300, Sagi Grimberg wrote:
> 
> On 11/08/16 07:07, Jay Freyensee wrote:
> > 
> > Per the NVMe-over-Fabrics 1.0 spec, sqsize is represented as
> > a 0-based value.
> > 
> > Also per spec, the RDMA binding's hsqsize shall be set
> > to the value of sqsize, which makes hsqsize a 0-based value as well.
> > 
> > Also per spec, though not stated very clearly, hrqsize is
> > hsqsize+1.
> > 
> > Thus, the sqsize during NVMf connect() is now:
> > 
> > [root at fedora23-fabrics-host1 for-48]# dmesg
> > [  318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for
> > admin queue: 31
> > [  318.720884] nvme nvme0: creating 16 I/O queues.
> > [  318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for
> > i/o
> > queue: 127
> > 
> > Reported-by: Daniel Verkamp <daniel.verkamp at intel.com>
> > Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
> > ---
> >  drivers/nvme/host/rdma.c | 19 ++++++++++++++++---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> > index 3e3ce2b..8be64f1 100644
> > --- a/drivers/nvme/host/rdma.c
> > +++ b/drivers/nvme/host/rdma.c
> > @@ -1284,8 +1284,21 @@ static int nvme_rdma_route_resolved(struct
> > nvme_rdma_queue *queue)
> > 
> >  	priv.recfmt = cpu_to_le16(NVME_RDMA_CM_FMT_1_0);
> >  	priv.qid = cpu_to_le16(nvme_rdma_queue_idx(queue));
> > -	priv.hrqsize = cpu_to_le16(queue->queue_size);
> > -	priv.hsqsize = cpu_to_le16(queue->queue_size);
> > +
> > +	/*
> > +	 * On one end, the fabrics spec is pretty clear that
> > +	 * hsqsize variables shall be set to the value of sqsize,
> > +	 * which is a 0-based number. What is confusing is the
> > value for
> > +	 * hrqsize.  After clarification from NVMe spec committee
> > member,
> > +	 * the minimum value of hrqsize is hsqsize+1.
> > +	 */
> > +	if (priv.qid == 0) {
> > +		priv.hsqsize = cpu_to_le16(queue->ctrl-
> > >ctrl.admin_sqsize);
> > +		priv.hrqsize = cpu_to_le16(queue->ctrl-
> > >ctrl.admin_sqsize+1);
> > +	} else {
> > +		priv.hsqsize = cpu_to_le16(queue->ctrl-
> > >ctrl.sqsize);
> > +		priv.hrqsize = cpu_to_le16(queue->ctrl-
> > >ctrl.sqsize+1);
> > +	}
> 
> Huh? (scratch...) using priv.hrqsize = priv.hsqsize+1 is pointless.

It may be pointless, but Dave said that is the current interpretation
of the NVMe-over-Fabrics spec (which I don't really understand either).

> 
> We expose X to the block layer, we send X-1 to the target, and
> the target does X+1 (looks goofy, but ok). We also size our RDMA
> send/recv according to X, so why on earth would we want to tell the
> target we have a recv queue of size X+1?

Could be the reason I see kato timeouts and then the kernel crashing...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize per spec
  2016-08-11 15:21   ` Christoph Hellwig
@ 2016-08-11 16:40     ` J Freyensee
  2016-08-11 23:24     ` J Freyensee
  1 sibling, 0 replies; 13+ messages in thread
From: J Freyensee @ 2016-08-11 16:40 UTC (permalink / raw)


On Thu, 2016-08-11 at 17:21 +0200, Christoph Hellwig wrote:
> I think having different notations for hsqsize vs hrqsize is highly
> confusing, and I've asked for a clarification of the hrqsize
> definition,
> which is ambiguous at best at the moment.  But independent of
> that, a few more comments:
> 
> > 
> > @@ -1907,7 +1920,7 @@ static struct nvme_ctrl
> > *nvme_rdma_create_ctrl(struct device *dev,
> >  	spin_lock_init(&ctrl->lock);
> >  
> >  	ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for
> > admin queue */
> > -	ctrl->ctrl.sqsize = opts->queue_size;
> > +	ctrl->ctrl.sqsize = opts->queue_size-1;
> 
> We should keep our own sqsize in normal notation in core, and just
> convert to strange notations on the wire.  Especially as we use it
> to drive all the resource allocation, as Sagi said.
> 
> 
> > 
> > +	if (priv.qid == 0) {
> > +		priv.hsqsize = cpu_to_le16(queue->ctrl-
> > >ctrl.admin_sqsize);
> > +		priv.hrqsize = cpu_to_le16(queue->ctrl-
> > >ctrl.admin_sqsize+1);
> 
> Based on that, priv.hsqsize should have a - 1 here and hrqsize should
> be kept as-is.  Also, always use spaces around the operators.

Hmmm, not sure I understand.  In this current patch series
implementation, admin_sqsize will be NVMF_AQ_DEPTH-1 (31), so it
doesn't need a -1.

Then you are saying do NOT add 1 for hrqsize?  Thus, hsqsize == hrqsize?

> 
> And while we're at it - the fix to use the separate AQ values should
> go into the first patch.

It will get reworked anyway, given the interest in not having a
variable at someone's disposal to create variable-sized admin
queue depths at the moment.
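
For reference, the two readings side by side, assuming NVMF_AQ_DEPTH is
32 (illustrative numbers, and possibly a misreading on my part):

/*
 * This RFC (admin_sqsize stored 0-based):
 *   admin_sqsize = NVMF_AQ_DEPTH - 1 = 31
 *   hsqsize      = admin_sqsize      = 31
 *   hrqsize      = admin_sqsize + 1  = 32
 *
 * Christoph's suggestion (admin_sqsize stored 1-based in core):
 *   admin_sqsize = NVMF_AQ_DEPTH     = 32
 *   hsqsize      = admin_sqsize - 1  = 31
 *   hrqsize      = admin_sqsize      = 32
 *
 * i.e. the wire values come out the same either way; only the in-core
 * notation differs.
 */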

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize per spec
  2016-08-11 15:21   ` Christoph Hellwig
  2016-08-11 16:40     ` J Freyensee
@ 2016-08-11 23:24     ` J Freyensee
  1 sibling, 0 replies; 13+ messages in thread
From: J Freyensee @ 2016-08-11 23:24 UTC (permalink / raw)


On Thu, 2016-08-11 at 17:21 +0200, Christoph Hellwig wrote:
> I think having different notations for hsqsize vs hrqsize is highly
> confusing, and I've asked for a clarification of the hrqsize
> definition,
> which is ambiguous at best at the moment.  But independent of
> that, a few more comments:
> 
> > 
> > @@ -1907,7 +1920,7 @@ static struct nvme_ctrl
> > *nvme_rdma_create_ctrl(struct device *dev,
> >  	spin_lock_init(&ctrl->lock);
> >  
> >  	ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for
> > admin queue */
> > -	ctrl->ctrl.sqsize = opts->queue_size;
> > +	ctrl->ctrl.sqsize = opts->queue_size-1;
> 
> We should keep our own sqsize in normal notation in core, and just
> convert to strange notations on the wire.  Especially as we use it
> to drive all the resource allocation, as Sagi said.

So what I'm understanding is to keep ctrl->ctrl.sqsize as a 1-based
value in the code, but when we actually do something like a connect(),
send sqsize-1 in the connect() command.

That almost works, but is still a tad ugly because there are code lines
such as:

ctrl->ctrl.sqsize =
	min_t(int, NVME_CAP_MQES(ctrl->cap) + 1, ctrl->ctrl.sqsize);

which also needs to be adjusted, because NVME_CAP_MQES() is a 0-based
value while ctrl->ctrl.sqsize is currently implemented as a 1-based
value.
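
For example (a sketch only; the second form is just what a 0-based
in-core sqsize would presumably require, not an actual patch):

	/* if ctrl->ctrl.sqsize stays 1-based in core (MQES is 0-based): */
	ctrl->ctrl.sqsize =
		min_t(int, NVME_CAP_MQES(ctrl->cap) + 1, ctrl->ctrl.sqsize);

	/* whereas a 0-based in-core sqsize would presumably want: */
	ctrl->ctrl.sqsize =
		min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->ctrl.sqsize);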

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-08-11 23:24 UTC | newest]

Thread overview: 13+ messages
2016-08-11  4:07 [PATCH RFC 0/4] sqsize zero-based fixes Jay Freyensee
2016-08-11  4:07 ` [PATCH RFC 1/4] fabrics: define admin sqsize min default, per spec Jay Freyensee
2016-08-11  9:01   ` Sagi Grimberg
2016-08-11 15:08     ` Christoph Hellwig
2016-08-11 16:33       ` J Freyensee
2016-08-11  4:07 ` [PATCH RFC 2/4] nvme-rdma: fix sqsize/hsqsize/hrqsize " Jay Freyensee
2016-08-11  7:03   ` Sagi Grimberg
2016-08-11 16:35     ` J Freyensee
2016-08-11 15:21   ` Christoph Hellwig
2016-08-11 16:40     ` J Freyensee
2016-08-11 23:24     ` J Freyensee
2016-08-11  4:07 ` [PATCH RFC 3/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize Jay Freyensee
2016-08-11  4:07 ` [PATCH RFC 4/4] nvme-loop: set sqsize to 0-based value, per spec Jay Freyensee
