Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking
@ 2020-07-01  2:25 Chaitanya Kulkarni
  2020-07-01  2:25 ` [PATCH V2 1/2] nvme-core: use xarray for ctrl " Chaitanya Kulkarni
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01  2:25 UTC (permalink / raw)
  To: kbusch; +Cc: Chaitanya Kulkarni, hch, linux-nvme, sagi

Hi,

This is a small patch-series which replaces ctrl->namespaces with
xarray for host-core and target-core. We can see following
performance improvement when running fio with 32 parallel jobs where
first 16 namespaces and last 16 namespaces are used for I/O. See [1] for
detailed performance numbers. 

Please consider this for nvme-5.9.

Regards,
Chaitanya

[1] XArray vs linked list performance comparision with NVMeOF nvme-loop
target with each namespace backed by unique null block device :-

Bandwidth ~16.4198% increase with Xarray :-
-----------------------------------------------------------------------

default-1.fio.log:  read:  IOPS=820k,  BW=3204MiB/s  (3360MB/s)(188GiB/60002msec)
default-2.fio.log:  read:  IOPS=835k,  BW=3260MiB/s  (3418MB/s)(191GiB/60002msec)
default-3.fio.log:  read:  IOPS=834k,  BW=3257MiB/s  (3415MB/s)(191GiB/60001msec)
xarray-1.fio.log:   read:  IOPS=966k,  BW=3772MiB/s  (3955MB/s)(221GiB/60003msec)
xarray-2.fio.log:   read:  IOPS=966k,  BW=3775MiB/s  (3958MB/s)(221GiB/60002msec)
xarray-3.fio.log:   read:  IOPS=965k,  BW=3769MiB/s  (3952MB/s)(221GiB/60002msec)

Latency ~25% decrease with Xarray :-
------------------------------------------------------------------------

default-1.fio.log:  slat  (usec):  min=8,  max=26066,  avg=25.18,  stdev=47.72
default-2.fio.log:  slat  (usec):  min=8,  max=907,    avg=20.24,  stdev=7.36
default-3.fio.log:  slat  (usec):  min=8,  max=723,    avg=20.21,  stdev=7.16
xarray-1.fio.log:   slat  (usec):  min=8,  max=639,    avg=14.84,  stdev=1.50
xarray-2.fio.log:   slat  (usec):  min=8,  max=840,    avg=14.84,  stdev=1.51
xarray-3.fio.log:   slat  (usec):  min=8,  max=2161,   avg=15.08,  stdev=9.56

CPU usage ~12.2807% decrease with Xarray :-
-----------------------------------------------------------------------

default-1.fio.log:  cpu  :  usr=3.92%,  sys=57.25%,  ctx=2159595,  majf=0,  minf=2807
default-2.fio.log:  cpu  :  usr=3.98%,  sys=57.99%,  ctx=1565139,  majf=0,  minf=2425
default-3.fio.log:  cpu  :  usr=3.99%,  sys=57.85%,  ctx=1563792,  majf=0,  minf=2977
xarray-1.fio.log:   cpu  :  usr=4.47%,  sys=50.88%,  ctx=1810927,  majf=0,  minf=2478
xarray-2.fio.log:   cpu  :  usr=4.47%,  sys=50.88%,  ctx=1812184,  majf=0,  minf=2176
xarray-3.fio.log:   cpu  :  usr=4.49%,  sys=50.86%,  ctx=1816963,  majf=0,  minf=2736

Change from V1:-

1. Use xarray instead of rcu locks.

Chaitanya Kulkarni (2):
  nvme-core: use xarray for ctrl ns tracking
  nvmet: use xarray for ctrl ns storing

 drivers/nvme/host/core.c        | 235 +++++++++++++++++++-------------
 drivers/nvme/host/multipath.c   |  15 +-
 drivers/nvme/host/nvme.h        |   5 +-
 drivers/nvme/target/admin-cmd.c |  25 ++--
 drivers/nvme/target/core.c      |  77 +++++------
 drivers/nvme/target/nvmet.h     |   3 +-
 6 files changed, 192 insertions(+), 168 deletions(-)

-- 
2.26.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH V2 1/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-01  2:25 [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Chaitanya Kulkarni
@ 2020-07-01  2:25 ` Chaitanya Kulkarni
  2020-07-01 13:12   ` Christoph Hellwig
  2020-07-01  2:25 ` [PATCH V2 2/2] nvmet: use xarray for ctrl ns storing Chaitanya Kulkarni
  2020-07-01  5:36 ` [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Christoph Hellwig
  2 siblings, 1 reply; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01  2:25 UTC (permalink / raw)
  To: kbusch; +Cc: Chaitanya Kulkarni, hch, linux-nvme, sagi

This patch replaces the ctrl->namespaces tracking from linked list to
xarray and improves the performance.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/host/core.c      | 235 ++++++++++++++++++++--------------
 drivers/nvme/host/multipath.c |  15 +--
 drivers/nvme/host/nvme.h      |   5 +-
 3 files changed, 145 insertions(+), 110 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e62fdc208b27..10e1fda8a21d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -437,10 +437,8 @@ static void nvme_put_ns_head(struct nvme_ns_head *head)
 	kref_put(&head->ref, nvme_free_ns_head);
 }
 
-static void nvme_free_ns(struct kref *kref)
+static void __nvme_free_ns(struct nvme_ns *ns)
 {
-	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
-
 	if (ns->ndev)
 		nvme_nvm_unregister(ns);
 
@@ -450,6 +448,13 @@ static void nvme_free_ns(struct kref *kref)
 	kfree(ns);
 }
 
+static void nvme_free_ns(struct kref *kref)
+{
+	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
+
+	__nvme_free_ns(ns);
+}
+
 static void nvme_put_ns(struct nvme_ns *ns)
 {
 	kref_put(&ns->kref, nvme_free_ns);
@@ -1381,12 +1386,11 @@ static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 static void nvme_update_formats(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		if (ns->disk && nvme_revalidate_disk(ns->disk))
 			nvme_set_queue_dying(ns);
-	up_read(&ctrl->namespaces_rwsem);
 }
 
 static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
@@ -3063,34 +3067,36 @@ static int nvme_dev_open(struct inode *inode, struct file *file)
 
 static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp)
 {
+	struct nvme_id_ctrl *id;
 	struct nvme_ns *ns;
-	int ret;
+	int ret = 0;
 
-	down_read(&ctrl->namespaces_rwsem);
-	if (list_empty(&ctrl->namespaces)) {
+	if (xa_empty(&ctrl->namespaces)) {
 		ret = -ENOTTY;
-		goto out_unlock;
+		goto out;
 	}
 
-	ns = list_first_entry(&ctrl->namespaces, struct nvme_ns, list);
-	if (ns != list_last_entry(&ctrl->namespaces, struct nvme_ns, list)) {
-		dev_warn(ctrl->device,
-			"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
+	/* Let the scan work finish updating ctrl->namespaces */
+	flush_work(&ctrl->scan_work);
+	if (nvme_identify_ctrl(ctrl, &id)) {
+		dev_err(ctrl->device, "nvme_identify_ctrl() failed\n");
 		ret = -EINVAL;
-		goto out_unlock;
+		goto out;
+	}
+	if (le32_to_cpu(id->nn) > 1) {
+		dev_warn(ctrl->device,
+		"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
+		goto out;
 	}
 
 	dev_warn(ctrl->device,
 		"using deprecated NVME_IOCTL_IO_CMD ioctl on the char device!\n");
 	kref_get(&ns->kref);
-	up_read(&ctrl->namespaces_rwsem);
 
 	ret = nvme_user_cmd(ctrl, ns, argp);
 	nvme_put_ns(ns);
-	return ret;
-
-out_unlock:
-	up_read(&ctrl->namespaces_rwsem);
+out:
+	kfree(id);
 	return ret;
 }
 
@@ -3590,31 +3596,21 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 	return ret;
 }
 
-static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
-{
-	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
-	struct nvme_ns *nsb = container_of(b, struct nvme_ns, list);
-
-	return nsa->head->ns_id - nsb->head->ns_id;
-}
-
 static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
-	struct nvme_ns *ns, *ret = NULL;
+	struct nvme_ns *ns;
+	XA_STATE(xas, &ctrl->namespaces, nsid);
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		if (ns->head->ns_id == nsid) {
-			if (!kref_get_unless_zero(&ns->kref))
-				continue;
-			ret = ns;
-			break;
-		}
-		if (ns->head->ns_id > nsid)
-			break;
-	}
-	up_read(&ctrl->namespaces_rwsem);
-	return ret;
+	rcu_read_lock();
+	do {
+		ns = xas_load(&xas);
+		if (xa_is_zero(ns))
+			ns = NULL;
+	} while (xas_retry(&xas, ns));
+	ns = ns && kref_get_unless_zero(&ns->kref) ? ns : NULL;
+	rcu_read_unlock();
+
+	return ns;
 }
 
 static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
@@ -3684,9 +3680,19 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 		}
 	}
 
-	down_write(&ctrl->namespaces_rwsem);
-	list_add_tail(&ns->list, &ctrl->namespaces);
-	up_write(&ctrl->namespaces_rwsem);
+	ret = xa_insert(&ctrl->namespaces, nsid, ns, GFP_KERNEL);
+	if (ret) {
+		switch (ret) {
+		case -ENOMEM:
+			dev_err(ctrl->device,
+				"xa insert memory allocation\n");
+			break;
+		case -EBUSY:
+			dev_err(ctrl->device,
+				"xa insert entry already present\n");
+			break;
+		}
+	}
 
 	nvme_get_ctrl(ctrl);
 
@@ -3718,6 +3724,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 static void nvme_ns_remove(struct nvme_ns *ns)
 {
+	struct xarray *xa = &ns->ctrl->namespaces;
+	bool free;
+
 	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
 		return;
 
@@ -3740,12 +3749,14 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 			blk_integrity_unregister(ns->disk);
 	}
 
-	down_write(&ns->ctrl->namespaces_rwsem);
-	list_del_init(&ns->list);
-	up_write(&ns->ctrl->namespaces_rwsem);
+	xa_lock(xa);
+	__xa_erase(xa, ns->head->ns_id);
+	free = refcount_dec_and_test(&ns->kref.refcount) ? true : false;
+	xa_unlock(xa);
 
 	nvme_mpath_check_last_path(ns);
-	nvme_put_ns(ns);
+	if (free)
+		__nvme_free_ns(ns);
 }
 
 static void nvme_ns_remove_by_nsid(struct nvme_ctrl *ctrl, u32 nsid)
@@ -3774,19 +3785,38 @@ static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 					unsigned nsid)
 {
-	struct nvme_ns *ns, *next;
-	LIST_HEAD(rm_list);
+	struct xarray *namespaces = &ctrl->namespaces;
+	struct xarray rm_array;
+	unsigned long tnsid;
+	struct nvme_ns *ns;
+	unsigned long idx;
+	int ret;
 
-	down_write(&ctrl->namespaces_rwsem);
-	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->head->ns_id > nsid || test_bit(NVME_NS_DEAD, &ns->flags))
-			list_move_tail(&ns->list, &rm_list);
+	xa_init(&rm_array);
+
+	xa_lock(namespaces);
+	xa_for_each(namespaces, idx, ns) {
+		tnsid = ns->head->ns_id;
+		if (tnsid > nsid || test_bit(NVME_NS_DEAD, &ns->flags)) {
+			xa_unlock(namespaces);
+			xa_erase(namespaces, tnsid);
+			/* Even if insert fails keep going */
+			ret = xa_insert(&rm_array, nsid, ns, GFP_KERNEL);
+			switch (ret) {
+			case -ENOMEM:
+				pr_err("xa insert memory allocation failed\n");
+				break;
+			case -EBUSY:
+				pr_err("xa insert entry already present\n");
+				break;
+			}
+			xa_lock(namespaces);
+		}
 	}
-	up_write(&ctrl->namespaces_rwsem);
+	xa_unlock(namespaces);
 
-	list_for_each_entry_safe(ns, next, &rm_list, list)
+	xa_for_each(&rm_array, idx, ns)
 		nvme_ns_remove(ns);
-
 }
 
 static int nvme_scan_ns_list(struct nvme_ctrl *ctrl)
@@ -3884,10 +3914,6 @@ static void nvme_scan_work(struct work_struct *work)
 	if (nvme_scan_ns_list(ctrl) != 0)
 		nvme_scan_ns_sequential(ctrl);
 	mutex_unlock(&ctrl->scan_lock);
-
-	down_write(&ctrl->namespaces_rwsem);
-	list_sort(NULL, &ctrl->namespaces, ns_cmp);
-	up_write(&ctrl->namespaces_rwsem);
 }
 
 /*
@@ -3897,8 +3923,13 @@ static void nvme_scan_work(struct work_struct *work)
  */
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns, *next;
-	LIST_HEAD(ns_list);
+	struct xarray rm_array;
+	unsigned long tnsid;
+	struct nvme_ns *ns;
+	unsigned long idx;
+	int ret;
+
+	xa_init(&rm_array);
 
 	/*
 	 * make sure to requeue I/O to all namespaces as these
@@ -3919,11 +3950,30 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	if (ctrl->state == NVME_CTRL_DEAD)
 		nvme_kill_queues(ctrl);
 
-	down_write(&ctrl->namespaces_rwsem);
-	list_splice_init(&ctrl->namespaces, &ns_list);
-	up_write(&ctrl->namespaces_rwsem);
+	xa_lock(&ctrl->namespaces);
+	xa_for_each(&ctrl->namespaces, idx, ns) {
+		tnsid = ns->head->ns_id;
+		xa_unlock(&ctrl->namespaces);
+		xa_erase(&ctrl->namespaces, tnsid);
+		/* Even if insert fails keep going */
+		ret = xa_insert(&rm_array, tnsid, ns, GFP_KERNEL);
+		if (ret) {
+			switch (ret) {
+			case -ENOMEM:
+				dev_err(ctrl->device,
+					"xa insert memory allocation\n");
+				break;
+			case -EBUSY:
+				dev_err(ctrl->device,
+					"xa insert entry already present\n");
+				break;
+			}
+		}
+		xa_lock(&ctrl->namespaces);
+	}
+	xa_unlock(&ctrl->namespaces);
 
-	list_for_each_entry_safe(ns, next, &ns_list, list)
+	xa_for_each(&rm_array, idx, ns)
 		nvme_ns_remove(ns);
 }
 EXPORT_SYMBOL_GPL(nvme_remove_namespaces);
@@ -4144,6 +4194,9 @@ static void nvme_free_ctrl(struct device *dev)
 	if (subsys && ctrl->instance != subsys->instance)
 		ida_simple_remove(&nvme_instance_ida, ctrl->instance);
 
+	WARN_ON_ONCE(!xa_empty(&ctrl->namespaces));
+
+	xa_destroy(&ctrl->namespaces);
 	kfree(ctrl->effects);
 	nvme_mpath_uninit(ctrl);
 	__free_page(ctrl->discard_page);
@@ -4174,8 +4227,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	ctrl->state = NVME_CTRL_NEW;
 	spin_lock_init(&ctrl->lock);
 	mutex_init(&ctrl->scan_lock);
-	INIT_LIST_HEAD(&ctrl->namespaces);
-	init_rwsem(&ctrl->namespaces_rwsem);
+	xa_init(&ctrl->namespaces);
 	ctrl->dev = dev;
 	ctrl->ops = ops;
 	ctrl->quirks = quirks;
@@ -4255,98 +4307,87 @@ EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
-
-	down_read(&ctrl->namespaces_rwsem);
+	unsigned long idx;
 
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
 		blk_mq_unquiesce_queue(ctrl->admin_q);
 
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		nvme_set_queue_dying(ns);
-
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
 void nvme_unfreeze(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_mq_unfreeze_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_unfreeze);
 
 void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	xa_for_each(&ctrl->namespaces, idx, ns) {
 		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
 		if (timeout <= 0)
 			break;
 	}
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_wait_freeze_timeout);
 
 void nvme_wait_freeze(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_mq_freeze_queue_wait(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_wait_freeze);
 
 void nvme_start_freeze(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_freeze_queue_start(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_freeze);
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_mq_quiesce_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
 void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_mq_unquiesce_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
-
 void nvme_sync_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		blk_sync_queue(ns->queue);
-	up_read(&ctrl->namespaces_rwsem);
 
 	if (ctrl->admin_q)
 		blk_sync_queue(ctrl->admin_q);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 18d084ed497e..18674735c4bc 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -115,13 +115,12 @@ bool nvme_failover_req(struct request *req)
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	xa_for_each(&ctrl->namespaces, idx, ns) {
 		if (ns->head->disk)
 			kblockd_schedule_work(&ns->head->requeue_work);
 	}
-	up_read(&ctrl->namespaces_rwsem);
 }
 
 static const char *nvme_ana_state_names[] = {
@@ -155,13 +154,12 @@ bool nvme_mpath_clear_current_path(struct nvme_ns *ns)
 void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
+	unsigned long idx;
 
 	mutex_lock(&ctrl->scan_lock);
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
+	xa_for_each(&ctrl->namespaces, idx, ns)
 		if (nvme_mpath_clear_current_path(ns))
 			kblockd_schedule_work(&ns->head->requeue_work);
-	up_read(&ctrl->namespaces_rwsem);
 	mutex_unlock(&ctrl->scan_lock);
 }
 
@@ -497,6 +495,7 @@ static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
 	u32 nr_nsids = le32_to_cpu(desc->nnsids), n = 0;
 	unsigned *nr_change_groups = data;
 	struct nvme_ns *ns;
+	unsigned long idx;
 
 	dev_dbg(ctrl->device, "ANA group %d: %s.\n",
 			le32_to_cpu(desc->grpid),
@@ -508,8 +507,7 @@ static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
 	if (!nr_nsids)
 		return 0;
 
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
+	xa_for_each(&ctrl->namespaces, idx, ns) {
 		unsigned nsid = le32_to_cpu(desc->nsids[n]);
 
 		if (ns->head->ns_id < nsid)
@@ -519,7 +517,6 @@ static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
 		if (++n == nr_nsids)
 			break;
 	}
-	up_read(&ctrl->namespaces_rwsem);
 	return 0;
 }
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2ef8d501e2a8..cff40e567bee 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -206,8 +206,7 @@ struct nvme_ctrl {
 	int numa_node;
 	struct blk_mq_tag_set *tagset;
 	struct blk_mq_tag_set *admin_tagset;
-	struct list_head namespaces;
-	struct rw_semaphore namespaces_rwsem;
+	struct xarray namespaces;
 	struct device ctrl_device;
 	struct device *device;	/* char device */
 	struct cdev cdev;
@@ -376,8 +375,6 @@ enum nvme_ns_features {
 };
 
 struct nvme_ns {
-	struct list_head list;
-
 	struct nvme_ctrl *ctrl;
 	struct request_queue *queue;
 	struct gendisk *disk;
-- 
2.26.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH V2 2/2] nvmet: use xarray for ctrl ns storing
  2020-07-01  2:25 [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Chaitanya Kulkarni
  2020-07-01  2:25 ` [PATCH V2 1/2] nvme-core: use xarray for ctrl " Chaitanya Kulkarni
@ 2020-07-01  2:25 ` Chaitanya Kulkarni
  2020-07-01  6:08   ` Christoph Hellwig
  2020-07-01  5:36 ` [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Christoph Hellwig
  2 siblings, 1 reply; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01  2:25 UTC (permalink / raw)
  To: kbusch; +Cc: Chaitanya Kulkarni, hch, linux-nvme, sagi

This patch replaces the ctrl->namespaces tracking from linked list to
xarray and improves the performance when accessing one namespce :-

XArray vs Default:-

IOPS and BW (more the better) increase BW (~1.8%):-
---------------------------------------------------

 XArray :-
  read:  IOPS=160k,  BW=626MiB/s  (656MB/s)(18.3GiB/30001msec)
  read:  IOPS=160k,  BW=626MiB/s  (656MB/s)(18.3GiB/30001msec)
  read:  IOPS=162k,  BW=631MiB/s  (662MB/s)(18.5GiB/30001msec)

 Default:-
  read:  IOPS=156k,  BW=609MiB/s  (639MB/s)(17.8GiB/30001msec)
  read:  IOPS=157k,  BW=613MiB/s  (643MB/s)(17.0GiB/30001msec)
  read:  IOPS=160k,  BW=626MiB/s  (656MB/s)(18.3GiB/30001msec)

Submission latency (less the better) decrease (~8.3%):-
-------------------------------------------------------

 XArray:-
  slat  (usec):  min=7,  max=8386,  avg=11.19,  stdev=5.96
  slat  (usec):  min=7,  max=441,   avg=11.09,  stdev=4.48
  slat  (usec):  min=7,  max=1088,  avg=11.21,  stdev=4.54

 Default :-
  slat  (usec):  min=8,   max=2826.5k,  avg=23.96,  stdev=3911.50
  slat  (usec):  min=8,   max=503,      avg=12.52,  stdev=5.07
  slat  (usec):  min=8,   max=2384,     avg=12.50,  stdev=5.28

CPU Usage (less the better) decrease (~5.2%):-
----------------------------------------------

 XArray:-
  cpu  :  usr=1.84%,  sys=18.61%,  ctx=949471,  majf=0,  minf=250
  cpu  :  usr=1.83%,  sys=18.41%,  ctx=950262,  majf=0,  minf=237
  cpu  :  usr=1.82%,  sys=18.82%,  ctx=957224,  majf=0,  minf=234

 Default:-
  cpu  :  usr=1.70%,  sys=19.21%,  ctx=858196,  majf=0,  minf=251
  cpu  :  usr=1.82%,  sys=19.98%,  ctx=929720,  majf=0,  minf=227
  cpu  :  usr=1.83%,  sys=20.33%,  ctx=947208,  majf=0,  minf=235.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 25 +++++------
 drivers/nvme/target/core.c      | 77 +++++++++++++++------------------
 drivers/nvme/target/nvmet.h     |  3 +-
 3 files changed, 47 insertions(+), 58 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 95bb3bc4e335..39e753ed9419 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -109,15 +109,14 @@ static u16 nvmet_get_smart_log_nsid(struct nvmet_req *req,
 static u16 nvmet_get_smart_log_all(struct nvmet_req *req,
 		struct nvme_smart_log *slog)
 {
-	u64 host_reads = 0, host_writes = 0;
 	u64 data_units_read = 0, data_units_written = 0;
-	struct nvmet_ns *ns;
+	u64 host_reads = 0, host_writes = 0;
 	struct nvmet_ctrl *ctrl;
+	struct nvmet_ns *ns;
+	unsigned long idx;
 
 	ctrl = req->sq->ctrl;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
+	xa_for_each(&ctrl->subsys->namespaces, idx, ns) {
 		/* we don't have the right data for file backed ns */
 		if (!ns->bdev)
 			continue;
@@ -127,9 +126,7 @@ static u16 nvmet_get_smart_log_all(struct nvmet_req *req,
 		host_writes += part_stat_read(ns->bdev->bd_part, ios[WRITE]);
 		data_units_written += DIV_ROUND_UP(
 			part_stat_read(ns->bdev->bd_part, sectors[WRITE]), 1000);
-
 	}
-	rcu_read_unlock();
 
 	put_unaligned_le64(host_reads, &slog->host_reads[0]);
 	put_unaligned_le64(data_units_read, &slog->data_units_read[0]);
@@ -230,14 +227,13 @@ static u32 nvmet_format_ana_group(struct nvmet_req *req, u32 grpid,
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_ns *ns;
+	unsigned long idx;
 	u32 count = 0;
 
 	if (!(req->cmd->get_log_page.lsp & NVME_ANA_LOG_RGO)) {
-		rcu_read_lock();
-		list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link)
+		xa_for_each(&ctrl->subsys->namespaces, idx, ns)
 			if (ns->anagrpid == grpid)
 				desc->nsids[count++] = cpu_to_le32(ns->nsid);
-		rcu_read_unlock();
 	}
 
 	desc->grpid = cpu_to_le32(grpid);
@@ -554,11 +550,12 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 {
 	static const int buf_size = NVME_IDENTIFY_DATA_SIZE;
+	u32 min_nsid = le32_to_cpu(req->cmd->identify.nsid);
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_ns *ns;
-	u32 min_nsid = le32_to_cpu(req->cmd->identify.nsid);
-	__le32 *list;
+	unsigned long idx;
 	u16 status = 0;
+	__le32 *list;
 	int i = 0;
 
 	list = kzalloc(buf_size, GFP_KERNEL);
@@ -567,15 +564,13 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 		goto out;
 	}
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
+	xa_for_each(&ctrl->subsys->namespaces, idx, ns) {
 		if (ns->nsid <= min_nsid)
 			continue;
 		list[i++] = cpu_to_le32(ns->nsid);
 		if (i == buf_size / sizeof(__le32))
 			break;
 	}
-	rcu_read_unlock();
 
 	status = nvmet_copy_to_sgl(req, 0, list, buf_size);
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index cea7c3834021..d14edaa22ad5 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -10,6 +10,9 @@
 #include <linux/pci-p2pdma.h>
 #include <linux/scatterlist.h>
 
+#include "../host/nvme.h"
+#undef CREATE_TRACE_POINTS
+
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
@@ -115,13 +118,17 @@ u16 nvmet_zero_sgl(struct nvmet_req *req, off_t off, size_t len)
 
 static unsigned int nvmet_max_nsid(struct nvmet_subsys *subsys)
 {
-	struct nvmet_ns *ns;
+	struct nvmet_ns *cur;
+	unsigned long idx;
+	unsigned long nsid;
 
-	if (list_empty(&subsys->namespaces))
+	if (xa_empty(&subsys->namespaces))
 		return 0;
 
-	ns = list_last_entry(&subsys->namespaces, struct nvmet_ns, dev_link);
-	return ns->nsid;
+	xa_for_each(&subsys->namespaces, idx, cur)
+		nsid = cur->nsid;
+
+	return nsid;
 }
 
 static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
@@ -410,25 +417,17 @@ static void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl)
 	cancel_delayed_work_sync(&ctrl->ka_work);
 }
 
-static struct nvmet_ns *__nvmet_find_namespace(struct nvmet_ctrl *ctrl,
-		__le32 nsid)
-{
-	struct nvmet_ns *ns;
-
-	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
-		if (ns->nsid == le32_to_cpu(nsid))
-			return ns;
-	}
-
-	return NULL;
-}
-
 struct nvmet_ns *nvmet_find_namespace(struct nvmet_ctrl *ctrl, __le32 nsid)
 {
-	struct nvmet_ns *ns;
+	XA_STATE(xas, &ctrl->subsys->namespaces, le32_to_cpu(nsid));
+	struct nvmet_ns *ns = NULL;
 
 	rcu_read_lock();
-	ns = __nvmet_find_namespace(ctrl, nsid);
+	do {
+		ns = xas_load(&xas);
+		if (xa_is_zero(ns))
+			ns = NULL;
+	} while (xas_retry(&xas, ns));
 	if (ns)
 		percpu_ref_get(&ns->ref);
 	rcu_read_unlock();
@@ -586,24 +585,20 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 	if (ns->nsid > subsys->max_nsid)
 		subsys->max_nsid = ns->nsid;
 
-	/*
-	 * The namespaces list needs to be sorted to simplify the implementation
-	 * of the Identify Namepace List subcommand.
-	 */
-	if (list_empty(&subsys->namespaces)) {
-		list_add_tail_rcu(&ns->dev_link, &subsys->namespaces);
-	} else {
-		struct nvmet_ns *old;
-
-		list_for_each_entry_rcu(old, &subsys->namespaces, dev_link,
-					lockdep_is_held(&subsys->lock)) {
-			BUG_ON(ns->nsid == old->nsid);
-			if (ns->nsid < old->nsid)
-				break;
+	ret = xa_insert(&subsys->namespaces, ns->nsid, ns, GFP_KERNEL);
+	if (ret) {
+		switch (ret) {
+		case -ENOMEM:
+			pr_err("xa insert memory allocation\n");
+			goto out_unlock;
+		case -EBUSY:
+			pr_err("xa insert entry already present\n");
+			goto out_unlock;
+		default:
+			goto out_unlock;
 		}
-
-		list_add_tail_rcu(&ns->dev_link, &old->dev_link);
 	}
+
 	subsys->nr_namespaces++;
 
 	nvmet_ns_changed(subsys, ns->nsid);
@@ -630,7 +625,7 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 		goto out_unlock;
 
 	ns->enabled = false;
-	list_del_rcu(&ns->dev_link);
+	xa_erase(&ns->subsys->namespaces, ns->nsid);
 	if (ns->nsid == subsys->max_nsid)
 		subsys->max_nsid = nvmet_max_nsid(subsys);
 
@@ -681,7 +676,6 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 	if (!ns)
 		return NULL;
 
-	INIT_LIST_HEAD(&ns->dev_link);
 	init_completion(&ns->disable_done);
 
 	ns->nsid = nsid;
@@ -1263,14 +1257,14 @@ static void nvmet_setup_p2p_ns_map(struct nvmet_ctrl *ctrl,
 		struct nvmet_req *req)
 {
 	struct nvmet_ns *ns;
+	unsigned long idx;
 
 	if (!req->p2p_client)
 		return;
 
 	ctrl->p2p_client = get_device(req->p2p_client);
 
-	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link,
-				lockdep_is_held(&ctrl->subsys->lock))
+	xa_for_each(&ctrl->subsys->namespaces, idx, ns)
 		nvmet_p2pmem_ns_add_p2p(ctrl, ns);
 }
 
@@ -1523,7 +1517,7 @@ struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
 	kref_init(&subsys->ref);
 
 	mutex_init(&subsys->lock);
-	INIT_LIST_HEAD(&subsys->namespaces);
+	xa_init(&subsys->namespaces);
 	INIT_LIST_HEAD(&subsys->ctrls);
 	INIT_LIST_HEAD(&subsys->hosts);
 
@@ -1535,8 +1529,9 @@ static void nvmet_subsys_free(struct kref *ref)
 	struct nvmet_subsys *subsys =
 		container_of(ref, struct nvmet_subsys, ref);
 
-	WARN_ON_ONCE(!list_empty(&subsys->namespaces));
+	WARN_ON_ONCE(!xa_empty(&subsys->namespaces));
 
+	xa_destroy(&subsys->namespaces);
 	kfree(subsys->subsysnqn);
 	kfree_rcu(subsys->model, rcuhead);
 	kfree(subsys);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 6f8bd6a93575..49e14111446c 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -52,7 +52,6 @@
 	(cpu_to_le32(offsetof(struct nvmf_connect_command, x)))
 
 struct nvmet_ns {
-	struct list_head	dev_link;
 	struct percpu_ref	ref;
 	struct block_device	*bdev;
 	struct file		*file;
@@ -219,7 +218,7 @@ struct nvmet_subsys {
 	struct mutex		lock;
 	struct kref		ref;
 
-	struct list_head	namespaces;
+	struct xarray		namespaces;
 	unsigned int		nr_namespaces;
 	unsigned int		max_nsid;
 	u16			cntlid_min;
-- 
2.26.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking
  2020-07-01  2:25 [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Chaitanya Kulkarni
  2020-07-01  2:25 ` [PATCH V2 1/2] nvme-core: use xarray for ctrl " Chaitanya Kulkarni
  2020-07-01  2:25 ` [PATCH V2 2/2] nvmet: use xarray for ctrl ns storing Chaitanya Kulkarni
@ 2020-07-01  5:36 ` Christoph Hellwig
  2020-07-01 13:21   ` Keith Busch
  2020-07-01 18:41   ` Chaitanya Kulkarni
  2 siblings, 2 replies; 16+ messages in thread
From: Christoph Hellwig @ 2020-07-01  5:36 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, hch, linux-nvme, sagi

On Tue, Jun 30, 2020 at 07:25:15PM -0700, Chaitanya Kulkarni wrote:
> Hi,
> 
> This is a small patch-series which replaces ctrl->namespaces with
> xarray for host-core and target-core. We can see following
> performance improvement when running fio with 32 parallel jobs where
> first 16 namespaces and last 16 namespaces are used for I/O. See [1] for
> detailed performance numbers. 

Why would that make any difference given that we don't look up namespaces
in the I/O path?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 2/2] nvmet: use xarray for ctrl ns storing
  2020-07-01  2:25 ` [PATCH V2 2/2] nvmet: use xarray for ctrl ns storing Chaitanya Kulkarni
@ 2020-07-01  6:08   ` Christoph Hellwig
  2020-07-01 18:36     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2020-07-01  6:08 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, hch, linux-nvme, sagi

Ok, for the target I can see how this makes sense unlike the host.
Comments below:

> -	u64 host_reads = 0, host_writes = 0;
>  	u64 data_units_read = 0, data_units_written = 0;
> -	struct nvmet_ns *ns;
> +	u64 host_reads = 0, host_writes = 0;
>  	struct nvmet_ctrl *ctrl;
> +	struct nvmet_ns *ns;
> +	unsigned long idx;

Please don't randomly reorder the variable declarations.  Same for
later function.

> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index cea7c3834021..d14edaa22ad5 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -10,6 +10,9 @@
>  #include <linux/pci-p2pdma.h>
>  #include <linux/scatterlist.h>
>  
> +#include "../host/nvme.h"
> +#undef CREATE_TRACE_POINTS

We really shoud not include the host nvme.h in the target code.  What
are you trying to get from it?

>  static unsigned int nvmet_max_nsid(struct nvmet_subsys *subsys)
>  {
> -	struct nvmet_ns *ns;
> +	struct nvmet_ns *cur;
> +	unsigned long idx;
> +	unsigned long nsid;
>  
> -	if (list_empty(&subsys->namespaces))
> +	if (xa_empty(&subsys->namespaces))
>  		return 0;
>  
> -	ns = list_last_entry(&subsys->namespaces, struct nvmet_ns, dev_link);
> -	return ns->nsid;
> +	xa_for_each(&subsys->namespaces, idx, cur)
> +		nsid = cur->nsid;
> +
> +	return nsid;

No need for the xa_empty here.  I also forwarded a question to Matthew
if there is a better way to find the highest index.

>  struct nvmet_ns *nvmet_find_namespace(struct nvmet_ctrl *ctrl, __le32 nsid)
>  {
> +	XA_STATE(xas, &ctrl->subsys->namespaces, le32_to_cpu(nsid));
> +	struct nvmet_ns *ns = NULL;
>  
>  	rcu_read_lock();
> +	do {
> +		ns = xas_load(&xas);
> +		if (xa_is_zero(ns))
> +			ns = NULL;
> +	} while (xas_retry(&xas, ns));
>  	if (ns)
>  		percpu_ref_get(&ns->ref);
>  	rcu_read_unlock();

Why doesn't this use xa_load?

> +	ret = xa_insert(&subsys->namespaces, ns->nsid, ns, GFP_KERNEL);
> +	if (ret) {
> +		switch (ret) {
> +		case -ENOMEM:
> +			pr_err("xa insert memory allocation\n");
> +			goto out_unlock;
> +		case -EBUSY:
> +			pr_err("xa insert entry already present\n");
> +			goto out_unlock;
> +		default:
> +			goto out_unlock;
>  		}
>  	}
> +

I don't think we need the switch statement, just return any error
encountered.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 1/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-01  2:25 ` [PATCH V2 1/2] nvme-core: use xarray for ctrl " Chaitanya Kulkarni
@ 2020-07-01 13:12   ` Christoph Hellwig
  2020-07-01 16:44     ` Keith Busch
  2020-07-01 18:19     ` Chaitanya Kulkarni
  0 siblings, 2 replies; 16+ messages in thread
From: Christoph Hellwig @ 2020-07-01 13:12 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: kbusch, Matthew Wilcox, hch, linux-nvme, sagi

[willy: a comment/request on the xa_load API below]

On Tue, Jun 30, 2020 at 07:25:16PM -0700, Chaitanya Kulkarni wrote:
> This patch replaces the ctrl->namespaces tracking from linked list to
> xarray and improves the performance.

The performance improvement needs to be clearly stated here.

>  static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp)
>  {
> +	struct nvme_id_ctrl *id;
>  	struct nvme_ns *ns;
> +	int ret = 0;
>  
> +	if (xa_empty(&ctrl->namespaces)) {
>  		ret = -ENOTTY;
> +		goto out;
>  	}
>  
> -	ns = list_first_entry(&ctrl->namespaces, struct nvme_ns, list);
> -	if (ns != list_last_entry(&ctrl->namespaces, struct nvme_ns, list)) {
> -		dev_warn(ctrl->device,
> -			"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
> +	/* Let the scan work finish updating ctrl->namespaces */
> +	flush_work(&ctrl->scan_work);
> +	if (nvme_identify_ctrl(ctrl, &id)) {
> +		dev_err(ctrl->device, "nvme_identify_ctrl() failed\n");
>  		ret = -EINVAL;
> -		goto out_unlock;
> +		goto out;
> +	}
> +	if (le32_to_cpu(id->nn) > 1) {
> +		dev_warn(ctrl->device,
> +		"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
> +		goto out;
>  	}

This code doesn't make any sense at all.  Why does a patch changing
data structures add new calls that go out on the wire?

> +	struct nvme_ns *ns;
> +	XA_STATE(xas, &ctrl->namespaces, nsid);
>  
> +	rcu_read_lock();
> +	do {
> +		ns = xas_load(&xas);
> +		if (xa_is_zero(ns))
> +			ns = NULL;
> +	} while (xas_retry(&xas, ns));
> +	ns = ns && kref_get_unless_zero(&ns->kref) ? ns : NULL;
> +	rcu_read_unlock();

This looks pretty weird, but I think the problem is one in the xarray
API, as for the typical lookup pattern we'd want an xa_load with
external RCU locking:

	rcu_read_lock();
	ns = xa_load_rcu(&ctrl->namespaces, nsid);
	if (ns && !kref_get_unless_zero(&ns->kref))
		ns = NULL;
	rcu_read_unlock();

instead of duplicating this fairly arcane loop in all kinds of callers.

> -	down_write(&ctrl->namespaces_rwsem);
> -	list_add_tail(&ns->list, &ctrl->namespaces);
> -	up_write(&ctrl->namespaces_rwsem);
> +	ret = xa_insert(&ctrl->namespaces, nsid, ns, GFP_KERNEL);
> +	if (ret) {
> +		switch (ret) {
> +		case -ENOMEM:
> +			dev_err(ctrl->device,
> +				"xa insert memory allocation\n");
> +			break;
> +		case -EBUSY:
> +			dev_err(ctrl->device,
> +				"xa insert entry already present\n");
> +			break;
> +		}
> +	}

No need for the switch and the detailed printks here, but we do need
actual error handling.

>  static void nvme_ns_remove(struct nvme_ns *ns)
>  {
> +	struct xarray *xa = &ns->ctrl->namespaces;
> +	bool free;
> +
>  	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
>  		return;
>  
> @@ -3740,12 +3749,14 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>  			blk_integrity_unregister(ns->disk);
>  	}
>  
> -	down_write(&ns->ctrl->namespaces_rwsem);
> -	list_del_init(&ns->list);
> -	up_write(&ns->ctrl->namespaces_rwsem);
> +	xa_lock(xa);
> +	__xa_erase(xa, ns->head->ns_id);
> +	free = refcount_dec_and_test(&ns->kref.refcount) ? true : false;
> +	xa_unlock(xa);
>  
>  	nvme_mpath_check_last_path(ns);
> -	nvme_put_ns(ns);
> +	if (free)
> +		__nvme_free_ns(ns);

This looks very strange to me.  Shoudn't this be a normal xa_erase
followed by a normal nvme_put_ns?  For certain the driver code has
no business poking into the kref internals.

>  static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>  					unsigned nsid)
>  {
> +	struct xarray *namespaces = &ctrl->namespaces;
> +	struct xarray rm_array;
> +	unsigned long tnsid;
> +	struct nvme_ns *ns;
> +	unsigned long idx;
> +	int ret;
>  
> +	xa_init(&rm_array);
> +
> +	xa_lock(namespaces);
> +	xa_for_each(namespaces, idx, ns) {
> +		tnsid = ns->head->ns_id;
> +		if (tnsid > nsid || test_bit(NVME_NS_DEAD, &ns->flags)) {
> +			xa_unlock(namespaces);
> +			xa_erase(namespaces, tnsid);
> +			/* Even if insert fails keep going */
> +			ret = xa_insert(&rm_array, nsid, ns, GFP_KERNEL);
> +			switch (ret) {
> +			case -ENOMEM:
> +				pr_err("xa insert memory allocation failed\n");
> +				break;
> +			case -EBUSY:
> +				pr_err("xa insert entry already present\n");
> +				break;
> +			}
> +			xa_lock(namespaces);
> +		}
>  	}
> +	xa_unlock(namespaces);

I don't think you want an xarray for the delete list.  Just keep the
list head for that now - once we moved to RCU read side locking some
of this could potentially be simplified later.

>   */
>  void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  {
> -	struct nvme_ns *ns, *next;
> -	LIST_HEAD(ns_list);
> +	struct xarray rm_array;
> +	unsigned long tnsid;
> +	struct nvme_ns *ns;
> +	unsigned long idx;
> +	int ret;
> +
> +	xa_init(&rm_array);
>  
>  	/*
>  	 * make sure to requeue I/O to all namespaces as these
> @@ -3919,11 +3950,30 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	if (ctrl->state == NVME_CTRL_DEAD)
>  		nvme_kill_queues(ctrl);
>  
> -	down_write(&ctrl->namespaces_rwsem);
> -	list_splice_init(&ctrl->namespaces, &ns_list);
> -	up_write(&ctrl->namespaces_rwsem);
> +	xa_lock(&ctrl->namespaces);
> +	xa_for_each(&ctrl->namespaces, idx, ns) {
> +		tnsid = ns->head->ns_id;
> +		xa_unlock(&ctrl->namespaces);
> +		xa_erase(&ctrl->namespaces, tnsid);
> +		/* Even if insert fails keep going */
> +		ret = xa_insert(&rm_array, tnsid, ns, GFP_KERNEL);
> +		if (ret) {
> +			switch (ret) {
> +			case -ENOMEM:
> +				dev_err(ctrl->device,
> +					"xa insert memory allocation\n");
> +				break;
> +			case -EBUSY:
> +				dev_err(ctrl->device,
> +					"xa insert entry already present\n");
> +				break;
> +			}
> +		}
> +		xa_lock(&ctrl->namespaces);
> +	}
> +	xa_unlock(&ctrl->namespaces);

Same here.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking
  2020-07-01  5:36 ` [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Christoph Hellwig
@ 2020-07-01 13:21   ` Keith Busch
  2020-07-01 16:49     ` Sagi Grimberg
  2020-07-01 18:41     ` Chaitanya Kulkarni
  2020-07-01 18:41   ` Chaitanya Kulkarni
  1 sibling, 2 replies; 16+ messages in thread
From: Keith Busch @ 2020-07-01 13:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, sagi, Chaitanya Kulkarni

On Wed, Jul 01, 2020 at 07:36:09AM +0200, Christoph Hellwig wrote:
> On Tue, Jun 30, 2020 at 07:25:15PM -0700, Chaitanya Kulkarni wrote:
> > Hi,
> > 
> > This is a small patch-series which replaces ctrl->namespaces with
> > xarray for host-core and target-core. We can see following
> > performance improvement when running fio with 32 parallel jobs where
> > first 16 namespaces and last 16 namespaces are used for I/O. See [1] for
> > detailed performance numbers. 
> 
> Why would that make any difference given that we don't look up namespaces
> in the I/O path?

Not in host side. Target does a lookup on each IO through:

  nvmet_req_init()
    nvmet_parse_io_cmrd()
      nvmet_find_namespaces()

And this detail should be mentioned in the test setup for the cover
letter, or in patch 2.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 1/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-01 13:12   ` Christoph Hellwig
@ 2020-07-01 16:44     ` Keith Busch
  2020-07-01 18:19     ` Chaitanya Kulkarni
  1 sibling, 0 replies; 16+ messages in thread
From: Keith Busch @ 2020-07-01 16:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Matthew Wilcox, sagi, Chaitanya Kulkarni

On Wed, Jul 01, 2020 at 03:12:35PM +0200, Christoph Hellwig wrote:
> [willy: a comment/request on the xa_load API below]
> 
> On Tue, Jun 30, 2020 at 07:25:16PM -0700, Chaitanya Kulkarni wrote:
> >  
> > +	rcu_read_lock();
> > +	do {
> > +		ns = xas_load(&xas);
> > +		if (xa_is_zero(ns))
> > +			ns = NULL;
> > +	} while (xas_retry(&xas, ns));
> > +	ns = ns && kref_get_unless_zero(&ns->kref) ? ns : NULL;
> > +	rcu_read_unlock();
> 
> This looks pretty weird, but I think the problem is one in the xarray
> API, as for the typical lookup pattern we'd want an xa_load with
> external RCU locking:
> 
> 	rcu_read_lock();
> 	ns = xa_load_rcu(&ctrl->namespaces, nsid);
> 	if (ns && !kref_get_unless_zero(&ns->kref))
> 		ns = NULL;
> 	rcu_read_unlock();
> 
> instead of duplicating this fairly arcane loop in all kinds of callers.

Is it not safe to use external RCU locking with the existing xa_load()?
The rcu_read_lock() api says nested read-side critical sections are
fine.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking
  2020-07-01 13:21   ` Keith Busch
@ 2020-07-01 16:49     ` Sagi Grimberg
  2020-07-01 18:43       ` Chaitanya Kulkarni
  2020-07-01 18:41     ` Chaitanya Kulkarni
  1 sibling, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-07-01 16:49 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig; +Cc: Chaitanya Kulkarni, linux-nvme


>>> Hi,
>>>
>>> This is a small patch-series which replaces ctrl->namespaces with
>>> xarray for host-core and target-core. We can see following
>>> performance improvement when running fio with 32 parallel jobs where
>>> first 16 namespaces and last 16 namespaces are used for I/O. See [1] for
>>> detailed performance numbers.
>>
>> Why would that make any difference given that we don't look up namespaces
>> in the I/O path?
> 
> Not in host side. Target does a lookup on each IO through:
> 
>    nvmet_req_init()
>      nvmet_parse_io_cmrd()
>        nvmet_find_namespaces()
> 
> And this detail should be mentioned in the test setup for the cover
> letter, or in patch 2.

Maybe ns scanning performance will improve :)

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 1/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-01 13:12   ` Christoph Hellwig
  2020-07-01 16:44     ` Keith Busch
@ 2020-07-01 18:19     ` Chaitanya Kulkarni
  2020-07-01 18:29       ` Keith Busch
  1 sibling, 1 reply; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01 18:19 UTC (permalink / raw)
  To: Christoph Hellwig, Matthew Wilcox; +Cc: kbusch, sagi, linux-nvme

On 7/1/20 6:12 AM, Christoph Hellwig wrote:
> [willy: a comment/request on the xa_load API below]
That will be great.
> 
> On Tue, Jun 30, 2020 at 07:25:16PM -0700, Chaitanya Kulkarni wrote:
>> This patch replaces the ctrl->namespaces tracking from linked list to
>> xarray and improves the performance.
> 
> The performance improvement needs to be clearly stated here.
> 

This is a preparation for the passthru code which uses 
nvme_find_get_ns() which falls in the fast path from passthru to host
core. Having linked list will have same performance issue with which
I've reported on for the NVMeOF gen-blk target using nvme-loop when we 
integrate passthru.

How about we get this series in a good shape and before you apply I'll 
forward port on the passthru V14 and document the performance numbers 
for both the non gen-blk and passthru NVMeOF target ?

OR you want to see the numbers now with the comments fixed in V3 ?

I'm fine either way.

>>   static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp)
>>   {
>> +	struct nvme_id_ctrl *id;
>>   	struct nvme_ns *ns;
>> +	int ret = 0;
>>   
>> +	if (xa_empty(&ctrl->namespaces)) {
>>   		ret = -ENOTTY;
>> +		goto out;
>>   	}
>>   
>> -	ns = list_first_entry(&ctrl->namespaces, struct nvme_ns, list);
>> -	if (ns != list_last_entry(&ctrl->namespaces, struct nvme_ns, list)) {
>> -		dev_warn(ctrl->device,
>> -			"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
>> +	/* Let the scan work finish updating ctrl->namespaces */
>> +	flush_work(&ctrl->scan_work);
>> +	if (nvme_identify_ctrl(ctrl, &id)) {
>> +		dev_err(ctrl->device, "nvme_identify_ctrl() failed\n");
>>   		ret = -EINVAL;
>> -		goto out_unlock;
>> +		goto out;
>> +	}
>> +	if (le32_to_cpu(id->nn) > 1) {
>> +		dev_warn(ctrl->device,
>> +		"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
>> +		goto out;
>>   	}
> 
> This code doesn't make any sense at all.  Why does a patch changing
> data structures add new calls that go out on the wire?
> 
Yes, this should not be here, I'll remove that and only keep the code to
check multiple namespaces and if needed this needs to be a separate
patch.

>> +	struct nvme_ns *ns;
>> +	XA_STATE(xas, &ctrl->namespaces, nsid);
>>   
>> +	rcu_read_lock();
>> +	do {
>> +		ns = xas_load(&xas);
>> +		if (xa_is_zero(ns))
>> +			ns = NULL;
>> +	} while (xas_retry(&xas, ns));
>> +	ns = ns && kref_get_unless_zero(&ns->kref) ? ns : NULL;
>> +	rcu_read_unlock();
> 
> This looks pretty weird, but I think the problem is one in the xarray
> API, as for the typical lookup pattern we'd want an xa_load with
> external RCU locking:
> 

The Kref needs to be taken under the lock so I've open coded the 
xa_load() and added kref get under rcu locking.

Matthew can shed more light above pattern this ?

> 	rcu_read_lock();
> 	ns = xa_load_rcu(&ctrl->namespaces, nsid);
> 	if (ns && !kref_get_unless_zero(&ns->kref))
> 		ns = NULL;
> 	rcu_read_unlock();
> 
> instead of duplicating this fairly arcane loop in all kinds of callers.
> 
>> -	down_write(&ctrl->namespaces_rwsem);
>> -	list_add_tail(&ns->list, &ctrl->namespaces);
>> -	up_write(&ctrl->namespaces_rwsem);
>> +	ret = xa_insert(&ctrl->namespaces, nsid, ns, GFP_KERNEL);
>> +	if (ret) {
>> +		switch (ret) {
>> +		case -ENOMEM:
>> +			dev_err(ctrl->device,
>> +				"xa insert memory allocation\n");
>> +			break;
>> +		case -EBUSY:
>> +			dev_err(ctrl->device,
>> +				"xa insert entry already present\n");
>> +			break;
>> +		}
>> +	}
> 
> No need for the switch and the detailed printks here, but we do need
> actual error handling.
> 

We need to add a wrapper nvme_xa_insert() to take care for the error 
handling with pr_debug() and switch() which can be used everywhere
to not bloat the functions calling xa_insert() are you okay with that ?

I'll have goto above based on if you are okay with the xa_insert() 
wrapper nvme_xa_insert() to undo things that alloc ns does.

>>   static void nvme_ns_remove(struct nvme_ns *ns)
>>   {
>> +	struct xarray *xa = &ns->ctrl->namespaces;
>> +	bool free;
>> +
>>   	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
>>   		return;
>>   
>> @@ -3740,12 +3749,14 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>>   			blk_integrity_unregister(ns->disk);
>>   	}
>>   
>> -	down_write(&ns->ctrl->namespaces_rwsem);
>> -	list_del_init(&ns->list);
>> -	up_write(&ns->ctrl->namespaces_rwsem);
>> +	xa_lock(xa);
>> +	__xa_erase(xa, ns->head->ns_id);
>> +	free = refcount_dec_and_test(&ns->kref.refcount) ? true : false;
>> +	xa_unlock(xa);
>>   
>>   	nvme_mpath_check_last_path(ns);
>> -	nvme_put_ns(ns);
>> +	if (free)
>> +		__nvme_free_ns(ns);
> 
> This looks very strange to me.  Shoudn't this be a normal xa_erase
> followed by a normal nvme_put_ns?  For certain the driver code has
> no business poking into the kref internals.
> 

There is a race when kref is manipulated in nvme_find_get_ns() and
nvme _ns_remove() pointed by Keith which needs ns->kref to be guarded 
with locks. Let me know I'll share a detail scenario.

Given that xarray locking only uses spinlocks we cannot call
nvme_put_ns() since it will sleep, so it separates the code.

>>   static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>>   					unsigned nsid)
>>   {
>> +	struct xarray *namespaces = &ctrl->namespaces;
>> +	struct xarray rm_array;
>> +	unsigned long tnsid;
>> +	struct nvme_ns *ns;
>> +	unsigned long idx;
>> +	int ret;
>>   
>> +	xa_init(&rm_array);
>> +
>> +	xa_lock(namespaces);
>> +	xa_for_each(namespaces, idx, ns) {
>> +		tnsid = ns->head->ns_id;
>> +		if (tnsid > nsid || test_bit(NVME_NS_DEAD, &ns->flags)) {
>> +			xa_unlock(namespaces);
>> +			xa_erase(namespaces, tnsid);
>> +			/* Even if insert fails keep going */
>> +			ret = xa_insert(&rm_array, nsid, ns, GFP_KERNEL);
>> +			switch (ret) {
>> +			case -ENOMEM:
>> +				pr_err("xa insert memory allocation failed\n");
>> +				break;
>> +			case -EBUSY:
>> +				pr_err("xa insert entry already present\n");
>> +				break;
>> +			}
>> +			xa_lock(namespaces);
>> +		}
>>   	}
>> +	xa_unlock(namespaces);
> 
> I don't think you want an xarray for the delete list.  Just keep the
> list head for that now - once we moved to RCU read side locking some
> of this could potentially be simplified later.
Okay that make sense will keep it simple, will do.
> 
>>    */
>>   void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>>   {
>> -	struct nvme_ns *ns, *next;
>> -	LIST_HEAD(ns_list);
>> +	struct xarray rm_array;
>> +	unsigned long tnsid;
>> +	struct nvme_ns *ns;
>> +	unsigned long idx;
>> +	int ret;
>> +
>> +	xa_init(&rm_array);
>>   
>>   	/*
>>   	 * make sure to requeue I/O to all namespaces as these
>> @@ -3919,11 +3950,30 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>>   	if (ctrl->state == NVME_CTRL_DEAD)
>>   		nvme_kill_queues(ctrl);
>>   
>> -	down_write(&ctrl->namespaces_rwsem);
>> -	list_splice_init(&ctrl->namespaces, &ns_list);
>> -	up_write(&ctrl->namespaces_rwsem);
>> +	xa_lock(&ctrl->namespaces);
>> +	xa_for_each(&ctrl->namespaces, idx, ns) {
>> +		tnsid = ns->head->ns_id;
>> +		xa_unlock(&ctrl->namespaces);
>> +		xa_erase(&ctrl->namespaces, tnsid);
>> +		/* Even if insert fails keep going */
>> +		ret = xa_insert(&rm_array, tnsid, ns, GFP_KERNEL);
>> +		if (ret) {
>> +			switch (ret) {
>> +			case -ENOMEM:
>> +				dev_err(ctrl->device,
>> +					"xa insert memory allocation\n");
>> +				break;
>> +			case -EBUSY:
>> +				dev_err(ctrl->device,
>> +					"xa insert entry already present\n");
>> +				break;
>> +			}
>> +		}
>> +		xa_lock(&ctrl->namespaces);
>> +	}
>> +	xa_unlock(&ctrl->namespaces);
> 
> Same here.
Yes there is pattern here and I was wondering if we can ge rid of this 
duplicate code will see.
> 


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 1/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-01 18:19     ` Chaitanya Kulkarni
@ 2020-07-01 18:29       ` Keith Busch
  2020-07-01 18:40         ` Chaitanya Kulkarni
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2020-07-01 18:29 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-nvme, Christoph Hellwig, Matthew Wilcox, sagi

On Wed, Jul 01, 2020 at 06:19:50PM +0000, Chaitanya Kulkarni wrote:
> On 7/1/20 6:12 AM, Christoph Hellwig wrote:
> > [willy: a comment/request on the xa_load API below]
> > On Tue, Jun 30, 2020 at 07:25:16PM -0700, Chaitanya Kulkarni wrote:
> >> +	if (le32_to_cpu(id->nn) > 1) {
> >> +		dev_warn(ctrl->device,
> >> +		"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
> >> +		goto out;
> >>   	}
> > 
> > This code doesn't make any sense at all.  Why does a patch changing
> > data structures add new calls that go out on the wire?
> > 
> Yes, this should not be here, I'll remove that and only keep the code to
> check multiple namespaces and if needed this needs to be a separate
> patch.

This isn't actually checking for multiple namespaces, though. It's just
getting the highest nsid, which doesn't tell us how many namespaces the
driver is bound to.

> >> -	down_write(&ns->ctrl->namespaces_rwsem);
> >> -	list_del_init(&ns->list);
> >> -	up_write(&ns->ctrl->namespaces_rwsem);
> >> +	xa_lock(xa);
> >> +	__xa_erase(xa, ns->head->ns_id);
> >> +	free = refcount_dec_and_test(&ns->kref.refcount) ? true : false;
> >> +	xa_unlock(xa);
> >>   
> >>   	nvme_mpath_check_last_path(ns);
> >> -	nvme_put_ns(ns);
> >> +	if (free)
> >> +		__nvme_free_ns(ns);
> > 
> > This looks very strange to me.  Shoudn't this be a normal xa_erase
> > followed by a normal nvme_put_ns?  For certain the driver code has
> > no business poking into the kref internals.
> > 
> 
> There is a race when kref is manipulated in nvme_find_get_ns() and
> nvme _ns_remove() pointed by Keith which needs ns->kref to be guarded 
> with locks. Let me know I'll share a detail scenario.

That wasn't the race I was pointing out. In your earlier internal
implementation, you had the xa_erase done after the final put, and I was
warning against that. The erase needs to happen where you have it, but I
don't see a need for protecting the kref like this.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 2/2] nvmet: use xarray for ctrl ns storing
  2020-07-01  6:08   ` Christoph Hellwig
@ 2020-07-01 18:36     ` Chaitanya Kulkarni
  0 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01 18:36 UTC (permalink / raw)
  To: Christoph Hellwig, Matthew Wilcox; +Cc: kbusch, sagi, linux-nvme

(+Matthew)

On 6/30/20 11:08 PM, Christoph Hellwig wrote:
> Ok, for the target I can see how this makes sense unlike the host.
Host side is for passthru code which is not merged yet.
> Comments below:
> 
>> -	u64 host_reads = 0, host_writes = 0;
>>   	u64 data_units_read = 0, data_units_written = 0;
>> -	struct nvmet_ns *ns;
>> +	u64 host_reads = 0, host_writes = 0;
>>   	struct nvmet_ctrl *ctrl;
>> +	struct nvmet_ns *ns;
>> +	unsigned long idx;
> 
> Please don't randomly reorder the variable declarations.  Same for
> later function.
Okay since I was touching these functions can we cleanup ?
> 
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index cea7c3834021..d14edaa22ad5 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -10,6 +10,9 @@
>>   #include <linux/pci-p2pdma.h>
>>   #include <linux/scatterlist.h>
>>   
>> +#include "../host/nvme.h"
>> +#undef CREATE_TRACE_POINTS
> 
> We really shoud not include the host nvme.h in the target code.  What
> are you trying to get from it?
> 
Yes we this needs to be removed. I've added nvme_xa_insert() wrapper for 
xa_insert() and didn't post that series. I'd really want us to use 
wrapper to minimize the code and help debugging with pr_debug.

If we do the prototype will go in ../host/nvme.h ? instead of new header
nvme-common.h, please confirm.

>>   static unsigned int nvmet_max_nsid(struct nvmet_subsys *subsys)
>>   {
>> -	struct nvmet_ns *ns;
>> +	struct nvmet_ns *cur;
>> +	unsigned long idx;
>> +	unsigned long nsid;
>>   
>> -	if (list_empty(&subsys->namespaces))
>> +	if (xa_empty(&subsys->namespaces))
>>   		return 0;
>>   
>> -	ns = list_last_entry(&subsys->namespaces, struct nvmet_ns, dev_link);
>> -	return ns->nsid;
>> +	xa_for_each(&subsys->namespaces, idx, cur)
>> +		nsid = cur->nsid;
>> +
>> +	return nsid;
> 
> No need for the xa_empty here.  I also forwarded a question to Matthew
> if there is a better way to find the highest index.
> 

I did look into this and I'm still to add possibly in next V3.
Meanwhile if Matthew has any suggestions that will be great.

I'm not an XArray expert but if it is some flavor of search tree then 
largest index will be on the right most leaf we can either cache it at 
each insertion which will add cost for each insert (not ideal) or just
traverse the tree with an API (ideal).

>>   struct nvmet_ns *nvmet_find_namespace(struct nvmet_ctrl *ctrl, __le32 nsid)
>>   {
>> +	XA_STATE(xas, &ctrl->subsys->namespaces, le32_to_cpu(nsid));
>> +	struct nvmet_ns *ns = NULL;
>>   
>>   	rcu_read_lock();
>> +	do {
>> +		ns = xas_load(&xas);
>> +		if (xa_is_zero(ns))
>> +			ns = NULL;
>> +	} while (xas_retry(&xas, ns));
>>   	if (ns)
>>   		percpu_ref_get(&ns->ref);
>>   	rcu_read_unlock();
> 
> Why doesn't this use xa_load?

I tried to keep load code similar to what we have in the host-core
so that we can add wrapper and not duplicate code which also follows the 
pattern of taking ref count under rcu lock.

> 
>> +	ret = xa_insert(&subsys->namespaces, ns->nsid, ns, GFP_KERNEL);
>> +	if (ret) {
>> +		switch (ret) {
>> +		case -ENOMEM:
>> +			pr_err("xa insert memory allocation\n");
>> +			goto out_unlock;
>> +		case -EBUSY:
>> +			pr_err("xa insert entry already present\n");
>> +			goto out_unlock;
>> +		default:
>> +			goto out_unlock;
>>   		}
>>   	}
>> +
> 
> I don't think we need the switch statement, just return any error
> encountered.
okay.
> 


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 1/2] nvme-core: use xarray for ctrl ns tracking
  2020-07-01 18:29       ` Keith Busch
@ 2020-07-01 18:40         ` Chaitanya Kulkarni
  0 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01 18:40 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, Christoph Hellwig, Matthew Wilcox, sagi

On 7/1/20 11:29 AM, Keith Busch wrote:
> On Wed, Jul 01, 2020 at 06:19:50PM +0000, Chaitanya Kulkarni wrote:
>> On 7/1/20 6:12 AM, Christoph Hellwig wrote:
>>> [willy: a comment/request on the xa_load API below]
>>> On Tue, Jun 30, 2020 at 07:25:16PM -0700, Chaitanya Kulkarni wrote:
>>>> +	if (le32_to_cpu(id->nn) > 1) {
>>>> +		dev_warn(ctrl->device,
>>>> +		"NVME_IOCTL_IO_CMD not supported when multiple namespaces present!\n");
>>>> +		goto out;
>>>>    	}
>>> This code doesn't make any sense at all.  Why does a patch changing
>>> data structures add new calls that go out on the wire?
>>>
>> Yes, this should not be here, I'll remove that and only keep the code to
>> check multiple namespaces and if needed this needs to be a separate
>> patch.
> This isn't actually checking for multiple namespaces, though. It's just
> getting the highest nsid, which doesn't tell us how many namespaces the
> driver is bound to.
> 
hmmm, I'll try and fix it in a next version.
>>>> -	down_write(&ns->ctrl->namespaces_rwsem);
>>>> -	list_del_init(&ns->list);
>>>> -	up_write(&ns->ctrl->namespaces_rwsem);
>>>> +	xa_lock(xa);
>>>> +	__xa_erase(xa, ns->head->ns_id);
>>>> +	free = refcount_dec_and_test(&ns->kref.refcount) ? true : false;
>>>> +	xa_unlock(xa);
>>>>    
>>>>    	nvme_mpath_check_last_path(ns);
>>>> -	nvme_put_ns(ns);
>>>> +	if (free)
>>>> +		__nvme_free_ns(ns);
>>> This looks very strange to me.  Shoudn't this be a normal xa_erase
>>> followed by a normal nvme_put_ns?  For certain the driver code has
>>> no business poking into the kref internals.
>>>
>> There is a race when kref is manipulated in nvme_find_get_ns() and
>> nvme _ns_remove() pointed by Keith which needs ns->kref to be guarded
>> with locks. Let me know I'll share a detail scenario.

> That wasn't the race I was pointing out. In your earlier internal
> implementation, you had the xa_erase done after the final put, and I was
> warning against that. The erase needs to happen where you have it, but I
> don't see a need for protecting the kref like this.
> 

Okay I've misunderstood it and follow the same pattern here due to my 
previous implementation. Maybe we can get rid of it, let me see.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking
  2020-07-01  5:36 ` [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Christoph Hellwig
  2020-07-01 13:21   ` Keith Busch
@ 2020-07-01 18:41   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01 18:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme

On 6/30/20 10:36 PM, Christoph Hellwig wrote:
> On Tue, Jun 30, 2020 at 07:25:15PM -0700, Chaitanya Kulkarni wrote:
>> Hi,
>>
>> This is a small patch-series which replaces ctrl->namespaces with
>> xarray for host-core and target-core. We can see following
>> performance improvement when running fio with 32 parallel jobs where
>> first 16 namespaces and last 16 namespaces are used for I/O. See [1] for
>> detailed performance numbers.
> Why would that make any difference given that we don't look up namespaces
> in the I/O path?
> 
This is for the passthru series which uses nvme_find_get_ns() in the 
fast path.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking
  2020-07-01 13:21   ` Keith Busch
  2020-07-01 16:49     ` Sagi Grimberg
@ 2020-07-01 18:41     ` Chaitanya Kulkarni
  1 sibling, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01 18:41 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig; +Cc: sagi, linux-nvme

On 7/1/20 6:21 AM, Keith Busch wrote:
> On Wed, Jul 01, 2020 at 07:36:09AM +0200, Christoph Hellwig wrote:
>> On Tue, Jun 30, 2020 at 07:25:15PM -0700, Chaitanya Kulkarni wrote:
>>> Hi,
>>>
>>> This is a small patch-series which replaces ctrl->namespaces with
>>> xarray for host-core and target-core. We can see following
>>> performance improvement when running fio with 32 parallel jobs where
>>> first 16 namespaces and last 16 namespaces are used for I/O. See [1] for
>>> detailed performance numbers.
>> Why would that make any difference given that we don't look up namespaces
>> in the I/O path?
> Not in host side. Target does a lookup on each IO through:
> 
>    nvmet_req_init()
>      nvmet_parse_io_cmrd()
>        nvmet_find_namespaces()
> 
> And this detail should be mentioned in the test setup for the cover
> letter, or in patch 2.
> 
Yes I've missed that I'll add these details in V3.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking
  2020-07-01 16:49     ` Sagi Grimberg
@ 2020-07-01 18:43       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-07-01 18:43 UTC (permalink / raw)
  To: Sagi Grimberg, Keith Busch, Christoph Hellwig; +Cc: linux-nvme

On 7/1/20 9:50 AM, Sagi Grimberg wrote:
>>>> Hi,
>>>>
>>>> This is a small patch-series which replaces ctrl->namespaces with
>>>> xarray for host-core and target-core. We can see following
>>>> performance improvement when running fio with 32 parallel jobs where
>>>> first 16 namespaces and last 16 namespaces are used for I/O. See [1] for
>>>> detailed performance numbers.
>>> Why would that make any difference given that we don't look up namespaces
>>> in the I/O path?
>> Not in host side. Target does a lookup on each IO through:
>>
>>     nvmet_req_init()
>>       nvmet_parse_io_cmrd()
>>         nvmet_find_namespaces()
>>
>> And this detail should be mentioned in the test setup for the cover
>> letter, or in patch 2.
> Maybe ns scanning performance will improve:)
> 

No it is not about scan only, (although it should as there is no sorting 
involve) this is for the passthru series which uses nvme_find_get_ns() 
in the fast path.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, back to index

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-01  2:25 [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Chaitanya Kulkarni
2020-07-01  2:25 ` [PATCH V2 1/2] nvme-core: use xarray for ctrl " Chaitanya Kulkarni
2020-07-01 13:12   ` Christoph Hellwig
2020-07-01 16:44     ` Keith Busch
2020-07-01 18:19     ` Chaitanya Kulkarni
2020-07-01 18:29       ` Keith Busch
2020-07-01 18:40         ` Chaitanya Kulkarni
2020-07-01  2:25 ` [PATCH V2 2/2] nvmet: use xarray for ctrl ns storing Chaitanya Kulkarni
2020-07-01  6:08   ` Christoph Hellwig
2020-07-01 18:36     ` Chaitanya Kulkarni
2020-07-01  5:36 ` [PATCH V2 0/2] nvme: replace linked list with xarray for ns tracking Christoph Hellwig
2020-07-01 13:21   ` Keith Busch
2020-07-01 16:49     ` Sagi Grimberg
2020-07-01 18:43       ` Chaitanya Kulkarni
2020-07-01 18:41     ` Chaitanya Kulkarni
2020-07-01 18:41   ` Chaitanya Kulkarni

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git