All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4 0/2] nvme: use xarray for ns tracking
@ 2020-07-20  3:32 Chaitanya Kulkarni
  2020-07-20  3:32 ` [PATCH V4 1/2] nvmet: use xarray for ctrl ns storing Chaitanya Kulkarni
  2020-07-20  3:32 ` [PATCH V4 2/2] nvme-core: use xarray for ctrl ns tracking Chaitanya Kulkarni
  0 siblings, 2 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-20  3:32 UTC (permalink / raw)
  To: kbusch, hch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme

Hi,

This patch-series uses ctrl->namespaces with an xarray for host-core and
target-core. We can see following performance improvement when running
fio with 32 parallel jobs where first 16 namespaces and last 16
namespaces are used for I/O using NVMeOF (nvme-loop) backed by nulli blk
devices mapped 1:1 on target namespaces.

For host even though nvme_find_get_ns() doesn't fall into the fast path
yet it does for NVMeOF passthru. This prepares us to improve performance
for future NVMeOF passthru backend which is under review which uses the
similar data structure as target.

Following are the performance numbers with NVMeOF (nvme-loop) backed by
null_blk devices mapped 1:1 on NVMeOF target backend :-

IOPS/Bandwidth ~16.4198% increase with XArray (higher the better) :-
-----------------------------------------------------------------------

default-1.fio.log:  read:  IOPS=820k,  BW=3204MiB/s  (3360MB/s)(188GiB/60002msec)
default-2.fio.log:  read:  IOPS=835k,  BW=3260MiB/s  (3418MB/s)(191GiB/60002msec)
default-3.fio.log:  read:  IOPS=834k,  BW=3257MiB/s  (3415MB/s)(191GiB/60001msec)
xarray-1.fio.log:   read:  IOPS=966k,  BW=3772MiB/s  (3955MB/s)(221GiB/60003msec)
xarray-2.fio.log:   read:  IOPS=966k,  BW=3775MiB/s  (3958MB/s)(221GiB/60002msec)
xarray-3.fio.log:   read:  IOPS=965k,  BW=3769MiB/s  (3952MB/s)(221GiB/60002msec)

Latency (submission) ~25% decrease with XArray (lower the better) :-
-----------------------------------------------------------------------

default-1.fio.log:  slat  (usec):  min=8,  max=26066,  avg=25.18,  stdev=47.72
default-2.fio.log:  slat  (usec):  min=8,  max=907,    avg=20.24,  stdev=7.36
default-3.fio.log:  slat  (usec):  min=8,  max=723,    avg=20.21,  stdev=7.16
xarray-1.fio.log:   slat  (usec):  min=8,  max=639,    avg=14.84,  stdev=1.50
xarray-2.fio.log:   slat  (usec):  min=8,  max=840,    avg=14.84,  stdev=1.51
xarray-3.fio.log:   slat  (usec):  min=8,  max=2161,   avg=15.08,  stdev=9.56

CPU usage (system) ~12.2807% decrease with XArray (lower the better) :-
-----------------------------------------------------------------------

default-1.fio.log:  cpu  :  usr=3.92%,  sys=57.25%,  ctx=2159595,  majf=0,  minf=2807
default-2.fio.log:  cpu  :  usr=3.98%,  sys=57.99%,  ctx=1565139,  majf=0,  minf=2425
default-3.fio.log:  cpu  :  usr=3.99%,  sys=57.85%,  ctx=1563792,  majf=0,  minf=2977
xarray-1.fio.log:   cpu  :  usr=4.47%,  sys=50.88%,  ctx=1810927,  majf=0,  minf=2478
xarray-2.fio.log:   cpu  :  usr=4.47%,  sys=50.88%,  ctx=1812184,  majf=0,  minf=2176
xarray-3.fio.log:   cpu  :  usr=4.49%,  sys=50.86%,  ctx=1816963,  majf=0,  minf=2736

For XArray helpers maybe we can do a separate series ?

Regards,
Chaitanya

* Changes from V3:-
-------------------

1. Get rid of the centralize helper for ctrl queue mgmt. 
2. Re-order patches and make nvmet patch first.
3. In the error patch for xa_insert() for nvmet patch restore subsystem
   max nsid and call percpu_ref_exit(). 
5. Ger rid of the rcu read_lock() and rcu_read_unlock() in
   nvmet_find_namespaces().
4. Remove an extra local varable and use ctrl->namespaces directly in
   host ns remove path.
5. Call nvme_nvm_unregister() when xa_insert() fails in nvme_alloc_ns().
6. Move xa_erase() call in nvme_ns_remove() before existing call to
   synchronize_rcu().

* Changes from V2:-
-------------------

1.  Add Xarray __xa_load() API as a preparation patch.
2.  Remove the id_ctrl call in nvme_dev_user_cmd().
3.  Remove the switch error check for xa_insert().
4.  Don't change the ns->kref code. when calling xa_erase().
5.  Keep XArray for deletion in the nvme_remove_invalid_namespaces()
    see [1].
6.  Keep XArray for deletion in the nvme_remove_namespaces() see [1].
7.  Remove randomly changed the lines to alingn the coding style in
    nvmet patch.
8.  Remove remaining #include nvme.h from the nvmet patch.
9.  Remove the xa_empty() from nvmet_max_nsid().
10. Centralize the blk-mq queue wrapper. The blk-mq queue related
    wrapper functions nvme_kill_queues(), nvme_unfreeze(),
    nvme_wait_freeze(), nvme_start_freeze(), nvme_stop_queues(),
    nvme_start_queues(), nvme_start_queues(), and nvme_sync_queues()
    differ in only one line i.e. blk_mq_queue_xxx() call. For the one
    line we have 7 functions and 7 exported symbols. Using a 
    centralize ctrl-queue action function and well defined enums
    represnting names of the helpers we can minimize the code and
    exported symbol and still maintain the redability.

* Change from V1:-
------------------

1. Use xarray instead of rcu locks.

Chaitanya Kulkarni (2):
  nvmet: use xarray for ctrl ns storing
  nvme-core: use xarray for ctrl ns tracking

 drivers/nvme/host/core.c        | 184 +++++++++++++++-----------------
 drivers/nvme/host/multipath.c   |  15 ++-
 drivers/nvme/host/nvme.h        |   5 +-
 drivers/nvme/target/admin-cmd.c |  17 ++-
 drivers/nvme/target/core.c      |  64 ++++-------
 drivers/nvme/target/nvmet.h     |   3 +-
 6 files changed, 123 insertions(+), 165 deletions(-)

-- 
2.26.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V4 1/2] nvmet: use xarray for ctrl ns storing
  2020-07-20  3:32 [PATCH V4 0/2] nvme: use xarray for ns tracking Chaitanya Kulkarni
@ 2020-07-20  3:32 ` Chaitanya Kulkarni
  2020-07-20 13:30   ` Christoph Hellwig
  2020-07-20  3:32 ` [PATCH V4 2/2] nvme-core: use xarray for ctrl ns tracking Chaitanya Kulkarni
  1 sibling, 1 reply; 7+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-20  3:32 UTC (permalink / raw)
  To: kbusch, hch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme

This patch replaces the ctrl->namespaces tracking from linked list to
xarray and improves the performance when accessing one namespce :-

XArray vs Default:-

IOPS and BW (more the better) increase BW (~1.8%):-
---------------------------------------------------

 XArray :-
  read:  IOPS=160k,  BW=626MiB/s  (656MB/s)(18.3GiB/30001msec)
  read:  IOPS=160k,  BW=626MiB/s  (656MB/s)(18.3GiB/30001msec)
  read:  IOPS=162k,  BW=631MiB/s  (662MB/s)(18.5GiB/30001msec)

 Default:-
  read:  IOPS=156k,  BW=609MiB/s  (639MB/s)(17.8GiB/30001msec)
  read:  IOPS=157k,  BW=613MiB/s  (643MB/s)(17.0GiB/30001msec)
  read:  IOPS=160k,  BW=626MiB/s  (656MB/s)(18.3GiB/30001msec)

Submission latency (less the better) decrease (~8.3%):-
-------------------------------------------------------

 XArray:-
  slat  (usec):  min=7,  max=8386,  avg=11.19,  stdev=5.96
  slat  (usec):  min=7,  max=441,   avg=11.09,  stdev=4.48
  slat  (usec):  min=7,  max=1088,  avg=11.21,  stdev=4.54

 Default :-
  slat  (usec):  min=8,   max=2826.5k,  avg=23.96,  stdev=3911.50
  slat  (usec):  min=8,   max=503,      avg=12.52,  stdev=5.07
  slat  (usec):  min=8,   max=2384,     avg=12.50,  stdev=5.28

CPU Usage (less the better) decrease (~5.2%):-
----------------------------------------------

 XArray:-
  cpu  :  usr=1.84%,  sys=18.61%,  ctx=949471,  majf=0,  minf=250
  cpu  :  usr=1.83%,  sys=18.41%,  ctx=950262,  majf=0,  minf=237
  cpu  :  usr=1.82%,  sys=18.82%,  ctx=957224,  majf=0,  minf=234

 Default:-
  cpu  :  usr=1.70%,  sys=19.21%,  ctx=858196,  majf=0,  minf=251
  cpu  :  usr=1.82%,  sys=19.98%,  ctx=929720,  majf=0,  minf=227
  cpu  :  usr=1.83%,  sys=20.33%,  ctx=947208,  majf=0,  minf=235.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 17 ++++-----
 drivers/nvme/target/core.c      | 64 +++++++++++----------------------
 drivers/nvme/target/nvmet.h     |  3 +-
 3 files changed, 27 insertions(+), 57 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 95bb3bc4e335..55918fcef80b 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -113,11 +113,10 @@ static u16 nvmet_get_smart_log_all(struct nvmet_req *req,
 	u64 data_units_read = 0, data_units_written = 0;
 	struct nvmet_ns *ns;
 	struct nvmet_ctrl *ctrl;
+	unsigned long idx;
 
 	ctrl = req->sq->ctrl;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
+	xa_for_each(&ctrl->subsys->namespaces, idx, ns) {
 		/* we don't have the right data for file backed ns */
 		if (!ns->bdev)
 			continue;
@@ -127,9 +126,7 @@ static u16 nvmet_get_smart_log_all(struct nvmet_req *req,
 		host_writes += part_stat_read(ns->bdev->bd_part, ios[WRITE]);
 		data_units_written += DIV_ROUND_UP(
 			part_stat_read(ns->bdev->bd_part, sectors[WRITE]), 1000);
-
 	}
-	rcu_read_unlock();
 
 	put_unaligned_le64(host_reads, &slog->host_reads[0]);
 	put_unaligned_le64(data_units_read, &slog->data_units_read[0]);
@@ -230,14 +227,13 @@ static u32 nvmet_format_ana_group(struct nvmet_req *req, u32 grpid,
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_ns *ns;
+	unsigned long idx;
 	u32 count = 0;
 
 	if (!(req->cmd->get_log_page.lsp & NVME_ANA_LOG_RGO)) {
-		rcu_read_lock();
-		list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link)
+		xa_for_each(&ctrl->subsys->namespaces, idx, ns)
 			if (ns->anagrpid == grpid)
 				desc->nsids[count++] = cpu_to_le32(ns->nsid);
-		rcu_read_unlock();
 	}
 
 	desc->grpid = cpu_to_le32(grpid);
@@ -556,6 +552,7 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 	static const int buf_size = NVME_IDENTIFY_DATA_SIZE;
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_ns *ns;
+	unsigned long idx;
 	u32 min_nsid = le32_to_cpu(req->cmd->identify.nsid);
 	__le32 *list;
 	u16 status = 0;
@@ -567,15 +564,13 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 		goto out;
 	}
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
+	xa_for_each(&ctrl->subsys->namespaces, idx, ns) {
 		if (ns->nsid <= min_nsid)
 			continue;
 		list[i++] = cpu_to_le32(ns->nsid);
 		if (i == buf_size / sizeof(__le32))
 			break;
 	}
-	rcu_read_unlock();
 
 	status = nvmet_copy_to_sgl(req, 0, list, buf_size);
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 9cdc39c8b729..9b9e88bfc210 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -115,13 +115,14 @@ u16 nvmet_zero_sgl(struct nvmet_req *req, off_t off, size_t len)
 
 static unsigned int nvmet_max_nsid(struct nvmet_subsys *subsys)
 {
-	struct nvmet_ns *ns;
+	unsigned long nsid = 0;
+	struct nvmet_ns *cur;
+	unsigned long idx;
 
-	if (list_empty(&subsys->namespaces))
-		return 0;
+	xa_for_each(&subsys->namespaces, idx, cur)
+		nsid = cur->nsid;
 
-	ns = list_last_entry(&subsys->namespaces, struct nvmet_ns, dev_link);
-	return ns->nsid;
+	return nsid;
 }
 
 static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
@@ -410,28 +411,13 @@ static void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl)
 	cancel_delayed_work_sync(&ctrl->ka_work);
 }
 
-static struct nvmet_ns *__nvmet_find_namespace(struct nvmet_ctrl *ctrl,
-		__le32 nsid)
-{
-	struct nvmet_ns *ns;
-
-	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
-		if (ns->nsid == le32_to_cpu(nsid))
-			return ns;
-	}
-
-	return NULL;
-}
-
 struct nvmet_ns *nvmet_find_namespace(struct nvmet_ctrl *ctrl, __le32 nsid)
 {
 	struct nvmet_ns *ns;
 
-	rcu_read_lock();
-	ns = __nvmet_find_namespace(ctrl, nsid);
+	ns = xa_load(&ctrl->subsys->namespaces, nsid);
 	if (ns)
 		percpu_ref_get(&ns->ref);
-	rcu_read_unlock();
 
 	return ns;
 }
@@ -586,24 +572,10 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 	if (ns->nsid > subsys->max_nsid)
 		subsys->max_nsid = ns->nsid;
 
-	/*
-	 * The namespaces list needs to be sorted to simplify the implementation
-	 * of the Identify Namepace List subcommand.
-	 */
-	if (list_empty(&subsys->namespaces)) {
-		list_add_tail_rcu(&ns->dev_link, &subsys->namespaces);
-	} else {
-		struct nvmet_ns *old;
-
-		list_for_each_entry_rcu(old, &subsys->namespaces, dev_link,
-					lockdep_is_held(&subsys->lock)) {
-			BUG_ON(ns->nsid == old->nsid);
-			if (ns->nsid < old->nsid)
-				break;
-		}
+	ret = xa_insert(&subsys->namespaces, ns->nsid, ns, GFP_KERNEL);
+	if (ret)
+		goto out_restore_subsys_maxnsid;
 
-		list_add_tail_rcu(&ns->dev_link, &old->dev_link);
-	}
 	subsys->nr_namespaces++;
 
 	nvmet_ns_changed(subsys, ns->nsid);
@@ -612,6 +584,10 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 out_unlock:
 	mutex_unlock(&subsys->lock);
 	return ret;
+
+out_restore_subsys_maxnsid:
+	subsys->max_nsid = nvmet_max_nsid(subsys);
+	percpu_ref_exit(&ns->ref);
 out_dev_put:
 	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
 		pci_dev_put(radix_tree_delete(&ctrl->p2p_ns_map, ns->nsid));
@@ -630,7 +606,7 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 		goto out_unlock;
 
 	ns->enabled = false;
-	list_del_rcu(&ns->dev_link);
+	xa_erase(&ns->subsys->namespaces, ns->nsid);
 	if (ns->nsid == subsys->max_nsid)
 		subsys->max_nsid = nvmet_max_nsid(subsys);
 
@@ -681,7 +657,6 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 	if (!ns)
 		return NULL;
 
-	INIT_LIST_HEAD(&ns->dev_link);
 	init_completion(&ns->disable_done);
 
 	ns->nsid = nsid;
@@ -1263,14 +1238,14 @@ static void nvmet_setup_p2p_ns_map(struct nvmet_ctrl *ctrl,
 		struct nvmet_req *req)
 {
 	struct nvmet_ns *ns;
+	unsigned long idx;
 
 	if (!req->p2p_client)
 		return;
 
 	ctrl->p2p_client = get_device(req->p2p_client);
 
-	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link,
-				lockdep_is_held(&ctrl->subsys->lock))
+	xa_for_each(&ctrl->subsys->namespaces, idx, ns)
 		nvmet_p2pmem_ns_add_p2p(ctrl, ns);
 }
 
@@ -1523,7 +1498,7 @@ struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
 	kref_init(&subsys->ref);
 
 	mutex_init(&subsys->lock);
-	INIT_LIST_HEAD(&subsys->namespaces);
+	xa_init(&subsys->namespaces);
 	INIT_LIST_HEAD(&subsys->ctrls);
 	INIT_LIST_HEAD(&subsys->hosts);
 
@@ -1535,8 +1510,9 @@ static void nvmet_subsys_free(struct kref *ref)
 	struct nvmet_subsys *subsys =
 		container_of(ref, struct nvmet_subsys, ref);
 
-	WARN_ON_ONCE(!list_empty(&subsys->namespaces));
+	WARN_ON_ONCE(!xa_empty(&subsys->namespaces));
 
+	xa_destroy(&subsys->namespaces);
 	kfree(subsys->subsysnqn);
 	kfree_rcu(subsys->model, rcuhead);
 	kfree(subsys);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 6f8bd6a93575..49e14111446c 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -52,7 +52,6 @@
 	(cpu_to_le32(offsetof(struct nvmf_connect_command, x)))
 
 struct nvmet_ns {
-	struct list_head	dev_link;
 	struct percpu_ref	ref;
 	struct block_device	*bdev;
 	struct file		*file;
@@ -219,7 +218,7 @@ struct nvmet_subsys {
 	struct mutex		lock;
 	struct kref		ref;
 
-	struct list_head	namespaces;
+	struct xarray		namespaces;
 	unsigned int		nr_namespaces;
 	unsigned int		max_nsid;
 	u16			cntlid_min;
-- 
2.26.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V4 2/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-20  3:32 [PATCH V4 0/2] nvme: use xarray for ns tracking Chaitanya Kulkarni
  2020-07-20  3:32 ` [PATCH V4 1/2] nvmet: use xarray for ctrl ns storing Chaitanya Kulkarni
@ 2020-07-20  3:32 ` Chaitanya Kulkarni
  2020-07-20 13:34   ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-20  3:32 UTC (permalink / raw)
  To: kbusch, hch, sagi; +Cc: Chaitanya Kulkarni, linux-nvme

This patch replaces the ctrl->namespaces tracking from linked list to
xarray and improves the performance. For host even though
nvme_find_get_ns() doesn't fall into the fast path yet it does for
NVMeOF passthru patch-series which is currently under review. This
prepares us to improve performance for future NVMeOF passthru backend
since nvme_find_get_ns() uses same data structure as it does in target
to find namespace in I/O patch from nsid specified in the nvme_rw_cmd.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/host/core.c      | 184 ++++++++++++++++------------------
 drivers/nvme/host/multipath.c |  15 ++-
 drivers/nvme/host/nvme.h      |   5 +-
 3 files changed, 96 insertions(+), 108 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f49085bcaa42..3f09774d2d54 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1426,9 +1426,9 @@ static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 static void nvme_update_formats(struct nvme_ctrl *ctrl, u32 *effects)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		if (_nvme_revalidate_disk(ns->disk))
 			nvme_set_queue_dying(ns);
 		else if (blk_queue_is_zoned(ns->disk->queue)) {
@@ -1440,7 +1440,6 @@ static void nvme_update_formats(struct nvme_ctrl *ctrl, u32 *effects)
 			 */
 			*effects |= NVME_CMD_EFFECTS_NCC;
 		}
-	up_read(&ctrl->namespaces_rwsem);
 }
 
 static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
@@ -3195,34 +3194,33 @@ static int nvme_dev_open(struct inode *inode, struct file *file)
 static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp)
 {
 	struct nvme_ns *ns;
-	int ret;
+	unsigned long idx;
+	int ret = -EINVAL;
+	int count = 0;
 
-	down_read(&ctrl->namespaces_rwsem);
-	if (list_empty(&ctrl->namespaces)) {
+	if (xa_empty(&ctrl->namespaces)) {
 		ret = -ENOTTY;
-		goto out_unlock;
+		goto out;
 	}
-
-	ns = list_first_entry(&ctrl->namespaces, struct nvme_ns, list);
-	if (ns != list_last_entry(&ctrl->namespaces, struct nvme_ns, list)) {
-		dev_warn(ctrl->device,
-			"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
-		ret = -EINVAL;
-		goto out_unlock;
+	xa_for_each(&ctrl->namespaces, idx, ns) {
+		if (count > 0)
+			goto err;
+		count++;
 	}
 
 	dev_warn(ctrl->device,
 		"using deprecated NVME_IOCTL_IO_CMD ioctl on the char device!\n");
 	kref_get(&ns->kref);
-	up_read(&ctrl->namespaces_rwsem);
 
 	ret = nvme_user_cmd(ctrl, ns, argp);
 	nvme_put_ns(ns);
-	return ret;
 
-out_unlock:
-	up_read(&ctrl->namespaces_rwsem);
+out:
 	return ret;
+err:
+	dev_warn(ctrl->device,
+			"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
+	goto out;
 }
 
 static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
@@ -3790,31 +3788,16 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 	return ret;
 }
 
-static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
-{
-	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
-	struct nvme_ns *nsb = container_of(b, struct nvme_ns, list);
-
-	return nsa->head->ns_id - nsb->head->ns_id;
-}
-
 static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
-	struct nvme_ns *ns, *ret = NULL;
+	struct nvme_ns *ns;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		if (ns->head->ns_id == nsid) {
-			if (!kref_get_unless_zero(&ns->kref))
-				continue;
-			ret = ns;
-			break;
-		}
-		if (ns->head->ns_id > nsid)
-			break;
-	}
-	up_read(&ctrl->namespaces_rwsem);
-	return ret;
+	rcu_read_lock();
+	ns = xa_load(&ctrl->namespaces, nsid);
+	ns = ns && kref_get_unless_zero(&ns->kref) ? ns : NULL;
+	rcu_read_unlock();
+
+	return ns;
 }
 
 static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
@@ -3884,9 +3867,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 		}
 	}
 
-	down_write(&ctrl->namespaces_rwsem);
-	list_add_tail(&ns->list, &ctrl->namespaces);
-	up_write(&ctrl->namespaces_rwsem);
+	ret = xa_insert(&ctrl->namespaces, nsid, ns, GFP_KERNEL);
+	if (ret)
+		goto out_unregister_nvm;
 
 	nvme_get_ctrl(ctrl);
 
@@ -3897,6 +3880,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	kfree(id);
 
 	return;
+ out_unregister_nvm:
+	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1)
+		nvme_nvm_unregister(ns);
  out_put_disk:
 	/* prevent double queue cleanup */
 	ns->disk->queue = NULL;
@@ -3929,6 +3915,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 		list_del_init(&ns->head->entry);
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
+	xa_erase(&ns->ctrl->namespaces, ns->head->ns_id);
 	synchronize_rcu(); /* guarantee not available in head->list */
 	nvme_mpath_clear_current_path(ns);
 	synchronize_srcu(&ns->head->srcu); /* wait for concurrent submissions */
@@ -3940,10 +3927,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 			blk_integrity_unregister(ns->disk);
 	}
 
-	down_write(&ns->ctrl->namespaces_rwsem);
-	list_del_init(&ns->list);
-	up_write(&ns->ctrl->namespaces_rwsem);
-
 	nvme_mpath_check_last_path(ns);
 	nvme_put_ns(ns);
 }
@@ -3974,19 +3957,31 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 					unsigned nsid)
 {
-	struct nvme_ns *ns, *next;
-	LIST_HEAD(rm_list);
+	struct xarray rm_array;
+	unsigned long tnsid;
+	struct nvme_ns *ns;
+	unsigned long idx;
+	int ret;
 
-	down_write(&ctrl->namespaces_rwsem);
-	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->head->ns_id > nsid || test_bit(NVME_NS_DEAD, &ns->flags))
-			list_move_tail(&ns->list, &rm_list);
+	xa_init(&rm_array);
+
+	xa_lock(&ctrl->namespaces);
+	xa_for_each(&ctrl->namespaces, idx, ns) {
+		tnsid = ns->head->ns_id;
+		if (tnsid > nsid || test_bit(NVME_NS_DEAD, &ns->flags)) {
+			__xa_erase(&ctrl->namespaces, tnsid);
+			xa_unlock(&ctrl->namespaces);
+			/* Even if insert fails keep going */
+			ret = xa_insert(&rm_array, nsid, ns, GFP_KERNEL);
+			if (ret)
+				pr_err("xa_insert %d\n", ret);
+			xa_lock(&ctrl->namespaces);
+		}
 	}
-	up_write(&ctrl->namespaces_rwsem);
+	xa_unlock(&ctrl->namespaces);
 
-	list_for_each_entry_safe(ns, next, &rm_list, list)
+	xa_for_each(&rm_array, idx, ns)
 		nvme_ns_remove(ns);
-
 }
 
 static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
@@ -4084,10 +4079,6 @@ static void nvme_scan_work(struct work_struct *work)
 	if (nvme_scan_ns_list(ctrl) != 0)
 		nvme_scan_ns_sequential(ctrl);
 	mutex_unlock(&ctrl->scan_lock);
-
-	down_write(&ctrl->namespaces_rwsem);
-	list_sort(NULL, &ctrl->namespaces, ns_cmp);
-	up_write(&ctrl->namespaces_rwsem);
 }
 
 /*
@@ -4097,8 +4088,12 @@ static void nvme_scan_work(struct work_struct *work)
  */
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns, *next;
-	LIST_HEAD(ns_list);
+	struct xarray rm_array;
+	struct nvme_ns *ns;
+	unsigned long idx;
+	int ret;
+
+	xa_init(&rm_array);
 
 	/*
 	 * make sure to requeue I/O to all namespaces as these
@@ -4119,11 +4114,18 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	if (ctrl->state == NVME_CTRL_DEAD)
 		nvme_kill_queues(ctrl);
 
-	down_write(&ctrl->namespaces_rwsem);
-	list_splice_init(&ctrl->namespaces, &ns_list);
-	up_write(&ctrl->namespaces_rwsem);
+	xa_lock(&ctrl->namespaces);
+	xa_for_each(&ctrl->namespaces, idx, ns) {
+		__xa_erase(&ctrl->namespaces, ns->head->ns_id);
+		xa_unlock(&ctrl->namespaces);
+		ret = xa_insert(&rm_array, ns->head->ns_id, ns, GFP_KERNEL);
+		if (ret)
+			pr_err("xa_insert %d\n", ret);
+		xa_lock(&ctrl->namespaces);
+	}
+	xa_unlock(&ctrl->namespaces);
 
-	list_for_each_entry_safe(ns, next, &ns_list, list)
+	xa_for_each(&rm_array, idx, ns)
 		nvme_ns_remove(ns);
 }
 EXPORT_SYMBOL_GPL(nvme_remove_namespaces);
@@ -4344,6 +4346,10 @@ static void nvme_free_ctrl(struct device *dev)
 	if (subsys && ctrl->instance != subsys->instance)
 		ida_simple_remove(&nvme_instance_ida, ctrl->instance);
 
+	WARN_ON_ONCE(!xa_empty(&ctrl->namespaces));
+
+	xa_destroy(&ctrl->namespaces);
+
 	list_for_each_entry_safe(cel, next, &ctrl->cels, entry) {
 		list_del(&cel->entry);
 		kfree(cel);
@@ -4378,9 +4384,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	ctrl->state = NVME_CTRL_NEW;
 	spin_lock_init(&ctrl->lock);
 	mutex_init(&ctrl->scan_lock);
-	INIT_LIST_HEAD(&ctrl->namespaces);
 	INIT_LIST_HEAD(&ctrl->cels);
-	init_rwsem(&ctrl->namespaces_rwsem);
+	xa_init(&ctrl->namespaces);
 	ctrl->dev = dev;
 	ctrl->ops = ops;
 	ctrl->quirks = quirks;
@@ -4460,98 +4465,87 @@ EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
-
-	down_read(&ctrl->namespaces_rwsem);
+	unsigned long idx;
 
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
 		blk_mq_unquiesce_queue(ctrl->admin_q);
 
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		nvme_set_queue_dying(ns);
-
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
 void nvme_unfreeze(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_mq_unfreeze_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_unfreeze);
 
 void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	xa_for_each(&ctrl->namespaces, idx, ns) {
 		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
 		if (timeout <= 0)
 			break;
 	}
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_wait_freeze_timeout);
 
 void nvme_wait_freeze(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_mq_freeze_queue_wait(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_wait_freeze);
 
 void nvme_start_freeze(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_freeze_queue_start(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_freeze);
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_mq_quiesce_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
 void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_mq_unquiesce_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
-
 void nvme_sync_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_sync_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 
 	if (ctrl->admin_q)
 		blk_sync_queue(ctrl->admin_q);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 74bad4e3d377..af486864a1dc 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -115,13 +115,12 @@ bool nvme_failover_req(struct request *req)
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	xa_for_each(&ctrl->namespaces, idx, ns) {
 		if (ns->head->disk)
 			kblockd_schedule_work(&ns->head->requeue_work);
 	}
-	up_read(&ctrl->namespaces_rwsem);
 }
 
 static const char *nvme_ana_state_names[] = {
@@ -155,13 +154,12 @@ bool nvme_mpath_clear_current_path(struct nvme_ns *ns)
 void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
 	mutex_lock(&ctrl->scan_lock);
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		if (nvme_mpath_clear_current_path(ns))
 			kblockd_schedule_work(&ns->head->requeue_work);
-	up_read(&ctrl->namespaces_rwsem);
 	mutex_unlock(&ctrl->scan_lock);
 }
 
@@ -495,6 +493,7 @@ static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
 	u32 nr_nsids = le32_to_cpu(desc->nnsids), n = 0;
 	unsigned *nr_change_groups = data;
 	struct nvme_ns *ns;
+	unsigned long idx;
 
 	dev_dbg(ctrl->device, "ANA group %d: %s.\n",
 			le32_to_cpu(desc->grpid),
@@ -506,8 +505,7 @@ static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
 	if (!nr_nsids)
 		return 0;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	xa_for_each(&ctrl->namespaces, idx, ns) {
 		unsigned nsid = le32_to_cpu(desc->nsids[n]);
 
 		if (ns->head->ns_id < nsid)
@@ -517,7 +515,6 @@ static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
 		if (++n == nr_nsids)
 			break;
 	}
-	up_read(&ctrl->namespaces_rwsem);
 	return 0;
 }
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 13ca90bcd352..d1b9761aff9c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -212,8 +212,7 @@ struct nvme_ctrl {
 	int numa_node;
 	struct blk_mq_tag_set *tagset;
 	struct blk_mq_tag_set *admin_tagset;
-	struct list_head namespaces;
-	struct rw_semaphore namespaces_rwsem;
+	struct xarray namespaces;
 	struct device ctrl_device;
 	struct device *device;	/* char device */
 	struct cdev cdev;
@@ -388,8 +387,6 @@ enum nvme_ns_features {
 };
 
 struct nvme_ns {
-	struct list_head list;
-
 	struct nvme_ctrl *ctrl;
 	struct request_queue *queue;
 	struct gendisk *disk;
-- 
2.26.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V4 1/2] nvmet: use xarray for ctrl ns storing
  2020-07-20  3:32 ` [PATCH V4 1/2] nvmet: use xarray for ctrl ns storing Chaitanya Kulkarni
@ 2020-07-20 13:30   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2020-07-20 13:30 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, hch, linux-nvme, sagi

Applied to nvme-5.9 after fixing a missing endianess conversion.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4 2/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-20  3:32 ` [PATCH V4 2/2] nvme-core: use xarray for ctrl ns tracking Chaitanya Kulkarni
@ 2020-07-20 13:34   ` Christoph Hellwig
  2020-07-20 19:41     ` Sagi Grimberg
  2020-07-20 21:32     ` Chaitanya Kulkarni
  0 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2020-07-20 13:34 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, hch, linux-nvme, sagi

On Sun, Jul 19, 2020 at 08:32:03PM -0700, Chaitanya Kulkarni wrote:
> This patch replaces the ctrl->namespaces tracking from linked list to
> xarray and improves the performance. For host even though
> nvme_find_get_ns() doesn't fall into the fast path yet it does for
> NVMeOF passthru patch-series which is currently under review. This
> prepares us to improve performance for future NVMeOF passthru backend
> since nvme_find_get_ns() uses same data structure as it does in target
> to find namespace in I/O patch from nsid specified in the nvme_rw_cmd.

І'm still not really sold on that rationale..

> +	if (xa_empty(&ctrl->namespaces)) {
>  		ret = -ENOTTY;
> -		goto out_unlock;
> +		goto out;
>  	}
> -
> -	ns = list_first_entry(&ctrl->namespaces, struct nvme_ns, list);
> -	if (ns != list_last_entry(&ctrl->namespaces, struct nvme_ns, list)) {
> -		dev_warn(ctrl->device,
> -			"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
> -		ret = -EINVAL;
> -		goto out_unlock;
> +	xa_for_each(&ctrl->namespaces, idx, ns) {
> +		if (count > 0)
> +			goto err;
> +		count++;

How about factoring the check into a little
nvme_has_multiple_namespaces helper and avoid the backwards jumping
goto later on?

> +	rcu_read_lock();
> +	ns = xa_load(&ctrl->namespaces, nsid);
> +	ns = ns && kref_get_unless_zero(&ns->kref) ? ns : NULL;
> +	rcu_read_unlock();

The canonical way to write this is:

	rcu_read_lock();
	ns = xa_load(&ctrl->namespaces, nsid);
	if (ns && !kref_get_unless_zero(&ns->kref))
		ns = NULL;
	rcu_read_unlock();

> + out_unregister_nvm:

Maybe call this out_unregister_lightnvm to be a little more descriptive.

> +
> +	xa_lock(&ctrl->namespaces);
> +	xa_for_each(&ctrl->namespaces, idx, ns) {
> +		tnsid = ns->head->ns_id;
> +		if (tnsid > nsid || test_bit(NVME_NS_DEAD, &ns->flags)) {
> +			__xa_erase(&ctrl->namespaces, tnsid);
> +			xa_unlock(&ctrl->namespaces);
> +			/* Even if insert fails keep going */
> +			ret = xa_insert(&rm_array, nsid, ns, GFP_KERNEL);
> +			if (ret)
> +				pr_err("xa_insert %d\n", ret);
> +			xa_lock(&ctrl->namespaces);
> +		}

And this is where I'm still worried that we now need an insert
as part of the namespace deletion.  Either keep the list here as
I suggested before, or we just need rework the deletion here first.


> @@ -4119,11 +4114,18 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	if (ctrl->state == NVME_CTRL_DEAD)
>  		nvme_kill_queues(ctrl);
>  
> -	down_write(&ctrl->namespaces_rwsem);
> -	list_splice_init(&ctrl->namespaces, &ns_list);
> -	up_write(&ctrl->namespaces_rwsem);
> +	xa_lock(&ctrl->namespaces);
> +	xa_for_each(&ctrl->namespaces, idx, ns) {
> +		__xa_erase(&ctrl->namespaces, ns->head->ns_id);
> +		xa_unlock(&ctrl->namespaces);
> +		ret = xa_insert(&rm_array, ns->head->ns_id, ns, GFP_KERNEL);
> +		if (ret)
> +			pr_err("xa_insert %d\n", ret);
> +		xa_lock(&ctrl->namespaces);
> +	}
> +	xa_unlock(&ctrl->namespaces);

Same here.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4 2/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-20 13:34   ` Christoph Hellwig
@ 2020-07-20 19:41     ` Sagi Grimberg
  2020-07-20 21:32     ` Chaitanya Kulkarni
  1 sibling, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2020-07-20 19:41 UTC (permalink / raw)
  To: Christoph Hellwig, Chaitanya Kulkarni; +Cc: kbusch, linux-nvme


>> This patch replaces the ctrl->namespaces tracking from linked list to
>> xarray and improves the performance. For host even though
>> nvme_find_get_ns() doesn't fall into the fast path yet it does for
>> NVMeOF passthru patch-series which is currently under review. This
>> prepares us to improve performance for future NVMeOF passthru backend
>> since nvme_find_get_ns() uses same data structure as it does in target
>> to find namespace in I/O patch from nsid specified in the nvme_rw_cmd.
> 
> І'm still not really sold on that rationale..

I think that your suggestion to have a passthru_q is a much more
elegant solution (although it probably cannot be unified with the
connect_q which should be able to make forward progress on a quiesced
queue as well for fabrics).


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4 2/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-20 13:34   ` Christoph Hellwig
  2020-07-20 19:41     ` Sagi Grimberg
@ 2020-07-20 21:32     ` Chaitanya Kulkarni
  1 sibling, 0 replies; 7+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-20 21:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme

On 7/20/20 06:35, Christoph Hellwig wrote:
> On Sun, Jul 19, 2020 at 08:32:03PM -0700, Chaitanya Kulkarni wrote:
>> This patch replaces the ctrl->namespaces tracking from linked list to
>> xarray and improves the performance. For host even though
>> nvme_find_get_ns() doesn't fall into the fast path yet it does for
>> NVMeOF passthru patch-series which is currently under review. This
>> prepares us to improve performance for future NVMeOF passthru backend
>> since nvme_find_get_ns() uses same data structure as it does in target
>> to find namespace in I/O patch from nsid specified in the nvme_rw_cmd.
> 
> І'm still not really sold on that rationale..
> 
>> +	if (xa_empty(&ctrl->namespaces)) {
>>   		ret = -ENOTTY;
>> -		goto out_unlock;
>> +		goto out;
>>   	}
>> -
>> -	ns = list_first_entry(&ctrl->namespaces, struct nvme_ns, list);
>> -	if (ns != list_last_entry(&ctrl->namespaces, struct nvme_ns, list)) {
>> -		dev_warn(ctrl->device,
>> -			"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
>> -		ret = -EINVAL;
>> -		goto out_unlock;
>> +	xa_for_each(&ctrl->namespaces, idx, ns) {
>> +		if (count > 0)
>> +			goto err;
>> +		count++;
> 
> How about factoring the check into a little
> nvme_has_multiple_namespaces helper and avoid the backwards jumping
> goto later on?
> 
Added to V5.
>> +	rcu_read_lock();
>> +	ns = xa_load(&ctrl->namespaces, nsid);
>> +	ns = ns && kref_get_unless_zero(&ns->kref) ? ns : NULL;
>> +	rcu_read_unlock();
> 
> The canonical way to write this is:
> 
Added to V5.
> 	rcu_read_lock();
> 	ns = xa_load(&ctrl->namespaces, nsid);
> 	if (ns && !kref_get_unless_zero(&ns->kref))
> 		ns = NULL;
> 	rcu_read_unlock();
> 
>> + out_unregister_nvm:
> 
> Maybe call this out_unregister_lightnvm to be a little more descriptive.
> 
Added to V5.
>> +
>> +	xa_lock(&ctrl->namespaces);
>> +	xa_for_each(&ctrl->namespaces, idx, ns) {
>> +		tnsid = ns->head->ns_id;
>> +		if (tnsid > nsid || test_bit(NVME_NS_DEAD, &ns->flags)) {
>> +			__xa_erase(&ctrl->namespaces, tnsid);
>> +			xa_unlock(&ctrl->namespaces);
>> +			/* Even if insert fails keep going */
>> +			ret = xa_insert(&rm_array, nsid, ns, GFP_KERNEL);
>> +			if (ret)
>> +				pr_err("xa_insert %d\n", ret);
>> +			xa_lock(&ctrl->namespaces);
>> +		}
> 
> And this is where I'm still worried that we now need an insert
> as part of the namespace deletion.  Either keep the list here as
> I suggested before, or we just need rework the deletion here first.
> 

I really want to avoid increasing the size of struct nvme_ns as 
explained in the last version's cover-letter just for the sake of 
deletion especially when it is approaching cache-line size.

Can you elaborate more about what you have in mind for rework ?

We can get away with a call to nvme_ns_remove() outside of the lock 
instead of xa_insert() in the xa_for_each() loop so no need for
rm_array and xa_insert(), is that what you are referring to ?

Basic tested patch on the top of this one see [1].

> 
>> @@ -4119,11 +4114,18 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>>   	if (ctrl->state == NVME_CTRL_DEAD)
>>   		nvme_kill_queues(ctrl);
>>   
>> -	down_write(&ctrl->namespaces_rwsem);
>> -	list_splice_init(&ctrl->namespaces, &ns_list);
>> -	up_write(&ctrl->namespaces_rwsem);
>> +	xa_lock(&ctrl->namespaces);
>> +	xa_for_each(&ctrl->namespaces, idx, ns) {
>> +		__xa_erase(&ctrl->namespaces, ns->head->ns_id);
>> +		xa_unlock(&ctrl->namespaces);
>> +		ret = xa_insert(&rm_array, ns->head->ns_id, ns, GFP_KERNEL);
>> +		if (ret)
>> +			pr_err("xa_insert %d\n", ret);
>> +		xa_lock(&ctrl->namespaces);
>> +	}
>> +	xa_unlock(&ctrl->namespaces);
> 
> Same here.
> 

Same here see [1].

[1] Get rid of the rm_xarray and xa_insert in the nvme_ns_remove()
     path:-

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3f09774d2d54..59fb00027530 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3957,13 +3957,9 @@ static void nvme_validate_ns(struct nvme_ctrl 
*ctrl, unsigned nsid)
  static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
                                         unsigned nsid)
  {
-       struct xarray rm_array;
         unsigned long tnsid;
         struct nvme_ns *ns;
         unsigned long idx;
-       int ret;
-
-       xa_init(&rm_array);

         xa_lock(&ctrl->namespaces);
         xa_for_each(&ctrl->namespaces, idx, ns) {
@@ -3971,17 +3967,11 @@ static void 
nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
                 if (tnsid > nsid || test_bit(NVME_NS_DEAD, &ns->flags)) {
                         __xa_erase(&ctrl->namespaces, tnsid);
                         xa_unlock(&ctrl->namespaces);
-                       /* Even if insert fails keep going */
-                       ret = xa_insert(&rm_array, nsid, ns, GFP_KERNEL);
-                       if (ret)
-                               pr_err("xa_insert %d\n", ret);
+                       nvme_ns_remove(ns);
                         xa_lock(&ctrl->namespaces);
                 }
         }
         xa_unlock(&ctrl->namespaces);
-
-       xa_for_each(&rm_array, idx, ns)
-               nvme_ns_remove(ns);
  }

  static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
@@ -4088,12 +4078,8 @@ static void nvme_scan_work(struct work_struct *work)
   */
  void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
  {
-       struct xarray rm_array;
         struct nvme_ns *ns;
         unsigned long idx;
-       int ret;
-
-       xa_init(&rm_array);

         /*
          * make sure to requeue I/O to all namespaces as these
@@ -4118,15 +4104,10 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
         xa_for_each(&ctrl->namespaces, idx, ns) {
                 __xa_erase(&ctrl->namespaces, ns->head->ns_id);
                 xa_unlock(&ctrl->namespaces);
-               ret = xa_insert(&rm_array, ns->head->ns_id, ns, GFP_KERNEL);
-               if (ret)
-                       pr_err("xa_insert %d\n", ret);
+               nvme_ns_remove(ns);
                 xa_lock(&ctrl->namespaces);
         }
         xa_unlock(&ctrl->namespaces);
-
-       xa_for_each(&rm_array, idx, ns)
-               nvme_ns_remove(ns);
  }
  EXPORT_SYMBOL_GPL(nvme_remove_namespaces);

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-07-20 21:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-20  3:32 [PATCH V4 0/2] nvme: use xarray for ns tracking Chaitanya Kulkarni
2020-07-20  3:32 ` [PATCH V4 1/2] nvmet: use xarray for ctrl ns storing Chaitanya Kulkarni
2020-07-20 13:30   ` Christoph Hellwig
2020-07-20  3:32 ` [PATCH V4 2/2] nvme-core: use xarray for ctrl ns tracking Chaitanya Kulkarni
2020-07-20 13:34   ` Christoph Hellwig
2020-07-20 19:41     ` Sagi Grimberg
2020-07-20 21:32     ` Chaitanya Kulkarni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.