All of lore.kernel.org
* [PATCHv2] nvmet: make MDTS value configurable
@ 2019-04-03 23:06 Hannes Reinecke
  2019-04-04  5:27 ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2019-04-03 23:06 UTC (permalink / raw)


Some drivers and/or use-cases might need to set a smaller MDTS
value, so add a per-port attribute to modify the MDTS value.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/nvme/target/admin-cmd.c |  3 +--
 drivers/nvme/target/configfs.c  | 29 +++++++++++++++++++++++++++++
 drivers/nvme/target/nvmet.h     |  1 +
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 76250181fee0..d33e0fd248cd 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -311,8 +311,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	/* we support multiple ports, multiples hosts and ANA: */
 	id->cmic = (1 << 0) | (1 << 1) | (1 << 3);
 
-	/* no limit on data transfer sizes for now */
-	id->mdts = 0;
+	id->mdts = req->port->mdts;
 	id->cntlid = cpu_to_le16(ctrl->cntlid);
 	id->ver = cpu_to_le32(ctrl->subsys->ver);
 
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index adb79545cdd7..a70b5e452ac3 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -299,6 +299,34 @@ static ssize_t nvmet_addr_trtype_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_, addr_trtype);
 
+static ssize_t nvmet_addr_mdts_show(struct config_item *item,
+					   char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return snprintf(page, PAGE_SIZE, "%u\n", port->mdts);
+}
+
+static ssize_t nvmet_addr_mdts_store(struct config_item *item,
+					    const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	unsigned int mdts;
+	int ret;
+
+	ret = kstrtou32(page, 0, &mdts);
+	if (ret)
+		return ret;
+	if (mdts > 0xFF)
+		return -EINVAL;
+	down_write(&nvmet_config_sem);
+	port->mdts = mdts;
+	up_write(&nvmet_config_sem);
+
+	return count;
+}
+CONFIGFS_ATTR(nvmet_, addr_mdts);
+
 /*
  * Namespace structures & file operation functions below
  */
@@ -1157,6 +1185,7 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_traddr,
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
+	&nvmet_attr_addr_mdts,
 	&nvmet_attr_param_inline_data_size,
 	NULL,
 };
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 51e49efd7849..fdd3d3e6e759 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -139,6 +139,7 @@ struct nvmet_port {
 	enum nvme_ana_state		*ana_state;
 	void				*priv;
 	bool				enabled;
+	u8				mdts;
 	int				inline_data_size;
 };
 
-- 
2.16.4
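With the patch applied, the new attribute could be exercised from userspace roughly like this (a sketch; the port number and paths assume a standard nvmet configfs setup):

```shell
# MDTS is a power-of-two exponent in units of the controller page size,
# so with 4K pages a value of 4 advertises a 64k maximum transfer size.
cd /sys/kernel/config/nvmet/ports/1

echo 4 > addr_mdts   # cap transfers at 64k
cat addr_mdts        # read the current value back

echo 0 > addr_mdts   # 0 = no limit (the previous hard-coded default)
```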

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCHv2] nvmet: make MDTS value configurable
  2019-04-03 23:06 [PATCHv2] nvmet: make MDTS value configurable Hannes Reinecke
@ 2019-04-04  5:27 ` Christoph Hellwig
  2019-04-07 12:51   ` Max Gurtovoy
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-04-04  5:27 UTC (permalink / raw)


On Thu, Apr 04, 2019 at 01:06:54AM +0200, Hannes Reinecke wrote:
> Some drivers and/or use-cases might need to set a smaller MDTS
> value, so add a per-port attribute to modify the MDTS value.

If the drivers need it they need to communicate that value up the
stack.  We should not require user inputs to make things work.


* [PATCHv2] nvmet: make MDTS value configurable
  2019-04-04  5:27 ` Christoph Hellwig
@ 2019-04-07 12:51   ` Max Gurtovoy
  2019-04-08  6:16     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Max Gurtovoy @ 2019-04-07 12:51 UTC (permalink / raw)



On 4/4/2019 8:27 AM, Christoph Hellwig wrote:
> On Thu, Apr 04, 2019 at 01:06:54AM +0200, Hannes Reinecke wrote:
>> Some drivers and/or use-cases might need to set a smaller MDTS
>> value, so add a per-port attribute to modify the MDTS value.
> If the drivers need it they need to communicate that value up the
> stack.  We should not require user inputs to make things work.

I suggested adding an ops function.

Something like (pseudo code):


static u8 nvmet_rdma_port_mdts(struct nvmet_port *nport)
{
	struct nvmet_rdma_port *port = nport->priv;
	struct rdma_cm_id *cm_id = port->cm_id;

	/* we assume ctrl page_size is 4K */
	return ilog2(cm_id->device->attrs.max_io_sz / SZ_4K);
}


but after digging the code, there is still work to be done in order to 
do it...
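As a side note on the encoding the sketch above relies on: MDTS is a power-of-two exponent in units of the controller page size (assumed 4K here), with 0 meaning unlimited. A minimal, hypothetical helper pair illustrating the arithmetic in both directions (plain userspace C, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* MDTS -> bytes: 4K << mdts; an MDTS of 0 means "no limit". */
static uint64_t nvme_mdts_to_bytes(uint8_t mdts)
{
	if (mdts == 0)
		return 0;	/* unlimited, no byte value to report */
	return 4096ULL << mdts;
}

/* bytes -> MDTS, mirroring the ilog2() in the pseudo code:
 * derive the exponent from a device's maximum I/O size. */
static uint8_t nvme_bytes_to_mdts(uint64_t max_io_bytes)
{
	uint8_t mdts = 0;
	uint64_t sz = max_io_bytes / 4096;

	while (sz > 1) {	/* ilog2(max_io_bytes / 4K) */
		sz >>= 1;
		mdts++;
	}
	return mdts;
}
```

So an MDTS of 4 corresponds to 64k transfers, which is exactly the kind of small limit the patch wants to be able to advertise.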


>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme


* [PATCHv2] nvmet: make MDTS value configurable
  2019-04-07 12:51   ` Max Gurtovoy
@ 2019-04-08  6:16     ` Christoph Hellwig
  2019-04-08 10:42       ` Max Gurtovoy
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-04-08  6:16 UTC (permalink / raw)


On Sun, Apr 07, 2019 at 03:51:53PM +0300, Max Gurtovoy wrote:
>
> On 4/4/2019 8:27 AM, Christoph Hellwig wrote:
>> On Thu, Apr 04, 2019 at 01:06:54AM +0200, Hannes Reinecke wrote:
>>> Some drivers and/or use-cases might need to set a smaller MDTS
>>> value, so add a per-port attribute to modify the MDTS value.
>> If the drivers need it they need to communicate that value up the
>> stack.  We should not require user inputs to make things work.
>
> I suggested to add ops function.
>
> Something like (pseudo code):
>
>
> static u8 nvmet_rdma_port_mdts(struct nvmet_port *nport)
> {
> 	struct nvmet_rdma_port *port = nport->priv;
> 	struct rdma_cm_id *cm_id = port->cm_id;
>
> 	/* we assume ctrl page_size is 4K */
> 	return ilog2(cm_id->device->attrs.max_io_sz / SZ_4K);
> }
>
>
> but after digging the code, there is still work to be done in order to do 
> it...

The general idea sounds fine to me.  Can you look into it?


* [PATCHv2] nvmet: make MDTS value configurable
  2019-04-08  6:16     ` Christoph Hellwig
@ 2019-04-08 10:42       ` Max Gurtovoy
  2019-04-08 10:50         ` Hannes Reinecke
  2019-04-24 17:05         ` Sagi Grimberg
  0 siblings, 2 replies; 8+ messages in thread
From: Max Gurtovoy @ 2019-04-08 10:42 UTC (permalink / raw)



On 4/8/2019 9:16 AM, Christoph Hellwig wrote:
> On Sun, Apr 07, 2019 at 03:51:53PM +0300, Max Gurtovoy wrote:
>> On 4/4/2019 8:27 AM, Christoph Hellwig wrote:
>>> On Thu, Apr 04, 2019 at 01:06:54AM +0200, Hannes Reinecke wrote:
>>>> Some drivers and/or use-cases might need to set a smaller MDTS
>>>> value, so add a per-port attribute to modify the MDTS value.
>>> If the drivers need it they need to communicate that value up the
>>> stack.  We should not require user inputs to make things work.
>> I suggested to add ops function.
>>
>> Something like (pseudo code):
>>
>>
>> static u8 nvmet_rdma_port_mdts(struct nvmet_port *nport)
>> {
>> 	struct nvmet_rdma_port *port = nport->priv;
>> 	struct rdma_cm_id *cm_id = port->cm_id;
>>
>> 	/* we assume ctrl page_size is 4K */
>> 	return ilog2(cm_id->device->attrs.max_io_sz / SZ_4K);
>> }
>>
>>
>> but after digging the code, there is still work to be done in order to do
>> it...
> The general idea sounds fine to me.  Can you look into it?

Yes, from what I saw we don't limit the max_io_size in the target since
we're using the rw-API.

In case 1 rdma ctx is not enough (an MR page list, for example) we use more
ctxs in the rw-API. I think we have a bug there since we don't check
that nr_ops is not too big for the transaction.

So probably we need to add some limit in nvmet/RDMA for that, as
NFS/RDMA does (let's agree on some value/module_param/tunable).

Or we can say that the limit is (if we force MRs for example):

rdma_rw_fr_page_list_len(device) * device_page_size * 
max_mrs_per_transaction.

We also need to calculate mr_pool_size and max_rdma_ctxs more carefully IMO.

So let's agree on the approach and then implement a solution (Sagi, can
you comment on that too, please?)


Nevertheless, while digging into it I saw some improvements we can make to
the rw-API, so I'm preparing a few patches (adding a static allocation
option instead of allocations in the fast path).

-Max.


* [PATCHv2] nvmet: make MDTS value configurable
  2019-04-08 10:42       ` Max Gurtovoy
@ 2019-04-08 10:50         ` Hannes Reinecke
  2019-04-24 17:05         ` Sagi Grimberg
  1 sibling, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2019-04-08 10:50 UTC (permalink / raw)


On 4/8/19 12:42 PM, Max Gurtovoy wrote:
> 
> On 4/8/2019 9:16 AM, Christoph Hellwig wrote:
>> On Sun, Apr 07, 2019 at 03:51:53PM +0300, Max Gurtovoy wrote:
>>> On 4/4/2019 8:27 AM, Christoph Hellwig wrote:
>>>> On Thu, Apr 04, 2019 at 01:06:54AM +0200, Hannes Reinecke wrote:
>>>>> Some drivers and/or use-cases might need to set a smaller MDTS
>>>>> value, so add a per-port attribute to modify the MDTS value.
>>>> If the drivers need it they need to communicate that value up the
>>>> stack.  We should not require user inputs to make things work.
>>> I suggested to add ops function.
>>>
>>> Something like (pseudo code):
>>>
>>>
>>> static u8 nvmet_rdma_port_mdts(struct nvmet_port *nport)
>>> {
>>> 	struct nvmet_rdma_port *port = nport->priv;
>>> 	struct rdma_cm_id *cm_id = port->cm_id;
>>>
>>> 	/* we assume ctrl page_size is 4K */
>>> 	return ilog2(cm_id->device->attrs.max_io_sz / SZ_4K);
>>> }
>>>
>>>
>>> but after digging the code, there is still work to be done in order 
>>> to do
>>> it...
>> The general idea sounds fine to me.  Can you look into it?
> 
> Yes, from what I saw we don't limit the max_io_size in the target since 
> we're using the rw-API.
> 
> In case 1 rdma ctx is not enough (MR page list for example) we use more 
> ctx's in the rw-API. I think we have a bug there since we don't check 
> that nr_ops is not to big for transaction.
> 
> So probably we need to add some limit int nvmet/RDMA for that as 
> NFS/RDMA does (let's agree on some value/module_param/tunable).
> 
> Or we can say that the limit is (if we force MRs for example):
> 
> rdma_rw_fr_page_list_len(device) * device_page_size * 
> max_mrs_per_transaction.
> 
> We also need to calculate mr_pool_size and max_rdma_ctxs more carefully 
> IMO.
> 
> So let's agree on the approach and then implement a solution (Sagi, can 
> you comments on that too please ?)
> 
> 
> Nevertheless, While digging into it I saw some improvements we can do to 
> rw-API, so I'm preparing few patches (add static allocations option 
> instead of allocations in fast-path).
> 
In general I'm in favour of having a 'port_ops' template, too.
That would also allow me to pass the correct count for the number of
hardware queues; ATM we're using the default for nvmet, yet the FC-NVMe
drivers already set their own count internally.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


* [PATCHv2] nvmet: make MDTS value configurable
  2019-04-08 10:42       ` Max Gurtovoy
  2019-04-08 10:50         ` Hannes Reinecke
@ 2019-04-24 17:05         ` Sagi Grimberg
  2019-04-25 14:29           ` Hannes Reinecke
  1 sibling, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2019-04-24 17:05 UTC (permalink / raw)



> Yes, from what I saw we don't limit the max_io_size in the target since 
> we're using the rw-API.
> 
> In case 1 rdma ctx is not enough (MR page list for example) we use more 
> ctx's in the rw-API. I think we have a bug there since we don't check 
> that nr_ops is not to big for transaction.

What do you mean too big?

> So probably we need to add some limit int nvmet/RDMA for that as 
> NFS/RDMA does (let's agree on some value/module_param/tunable).
> 
> Or we can say that the limit is (if we force MRs for example):
> 
> rdma_rw_fr_page_list_len(device) * device_page_size * 
> max_mrs_per_transaction.

if at all it should probably be:
rdma_rw_fr_page_list_len(device) * device_page_size * 
attr->cap.max_rdma_ctxs

> We also need to calculate mr_pool_size and max_rdma_ctxs more carefully 
> IMO.
> 
> So let's agree on the approach and then implement a solution (Sagi, can 
> you comments on that too please ?)

I don't think we want a modparam for something this low-level...

we reserve a qp space and mr pool, we should limit MDTS based on that
alone.
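To make the suggested bound concrete, here is a hypothetical back-of-the-envelope helper; the numbers used below are made up for illustration, and in reality the inputs would come from the device attributes and the QP caps:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the limit discussed above: the largest transfer the
 * reserved resources can cover is
 *   fr_page_list_len * page_size * max_rdma_ctxs
 * and the advertised MDTS should be derived from (never exceed)
 * this bound rather than from a modparam. */
static uint64_t rdma_rw_transfer_bound(uint32_t fr_page_list_len,
				       uint32_t page_size,
				       uint32_t max_rdma_ctxs)
{
	return (uint64_t)fr_page_list_len * page_size * max_rdma_ctxs;
}
```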

> Nevertheless, While digging into it I saw some improvements we can do to 
> rw-API, so I'm preparing few patches (add static allocations option 
> instead of allocations in fast-path).

Note that I still don't understand the original issue Hannes was trying
to resolve...


* [PATCHv2] nvmet: make MDTS value configurable
  2019-04-24 17:05         ` Sagi Grimberg
@ 2019-04-25 14:29           ` Hannes Reinecke
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2019-04-25 14:29 UTC (permalink / raw)


On 4/24/19 7:05 PM, Sagi Grimberg wrote:
> 
>> Yes, from what I saw we don't limit the max_io_size in the target 
>> since we're using the rw-API.
>>
>> In case 1 rdma ctx is not enough (MR page list for example) we use 
>> more ctx's in the rw-API. I think we have a bug there since we don't 
>> check that nr_ops is not to big for transaction.
> 
> What do you mean too big?
> 
>> So probably we need to add some limit int nvmet/RDMA for that as 
>> NFS/RDMA does (let's agree on some value/module_param/tunable).
>>
>> Or we can say that the limit is (if we force MRs for example):
>>
>> rdma_rw_fr_page_list_len(device) * device_page_size * 
>> max_mrs_per_transaction.
> 
> if at all it should probably be:
> rdma_rw_fr_page_list_len(device) * device_page_size * 
> attr->cap.max_rdma_ctxs
> 
>> We also need to calculate mr_pool_size and max_rdma_ctxs more 
>> carefully IMO.
>>
>> So let's agree on the approach and then implement a solution (Sagi, 
>> can you comments on that too please ?)
> 
> I don't think we want a modparam for something this low-level...
> 
> we reserve a qp space and mr pool, we should limit MDTS based on that
> alone.
> 
>> Nevertheless, While digging into it I saw some improvements we can do 
>> to rw-API, so I'm preparing few patches (add static allocations option 
>> instead of allocations in fast-path).
> 
> Note that I still don't understand what is the original issue Hannes
> was trying to resolve...

We've had quite some issues around bio splitting on NVMe (cf. Ming Lei's
patchset 'blk-mq: fix races related to freeing queue' and my patch
'block: use separate bio set for splitting').
These issues only materialized when using a smaller MDTS value (e.g. 64k),
hence the idea to have a proper blktests reproducer, which would require
us to be able to set a different MDTS value in the first place.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

