Subject: [PATCH V3 00/10] nvme: use xarray for ns tracking
From: Chaitanya Kulkarni
Date: 2020-07-14 23:30 UTC
To: kbusch, hch, sagi, willy
Cc: Chaitanya Kulkarni, linux-nvme

Hi,

This patch series replaces ctrl->namespaces with an XArray in both host-core
and target-core. We see the following performance improvement when running
fio with 32 parallel jobs, where the first 16 namespaces and the last 16
namespaces are used for I/O over NVMeOF (nvme-loop) backed by null_blk
devices mapped 1:1 onto the target namespaces.

On the host side, even though nvme_find_get_ns() is not in the fast path
yet, it is for NVMeOF passthru. This prepares us to improve performance for
the upcoming NVMeOF passthru backend (currently under review), which uses a
similar data structure to the target.
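
For reference, a minimal sketch of the kind of lookup the XArray enables is
shown below. It is illustrative only, not the exact code from this series:
the function name is made up, and it assumes ctrl->namespaces becomes a
struct xarray indexed by nsid with nvme_ns objects freed only after an RCU
grace period.

/*
 * Illustrative sketch only.  Assumes ctrl->namespaces is a struct xarray
 * indexed by nsid and that nvme_ns objects are freed only after an RCU
 * grace period, so lookup + kref_get is safe under rcu_read_lock().
 */
static struct nvme_ns *nvme_find_get_ns_sketch(struct nvme_ctrl *ctrl,
                                               unsigned int nsid)
{
        struct nvme_ns *ns;

        rcu_read_lock();
        ns = xa_load(&ctrl->namespaces, nsid);
        if (ns && !kref_get_unless_zero(&ns->kref))
                ns = NULL;
        rcu_read_unlock();

        return ns;
}

Compared with walking the namespace list under the namespaces rwsem, the
lookup walks a shallow radix tree instead of a linear list and no longer
needs a sleeping lock in the I/O path.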

The following are the performance numbers with NVMeOF (nvme-loop) backed by
null_blk devices mapped 1:1 onto the NVMeOF target backend :-

Bandwidth ~16.4198% increase with XArray (higher is better) :-
-----------------------------------------------------------------------

default-1.fio.log:  read:  IOPS=820k,  BW=3204MiB/s  (3360MB/s)(188GiB/60002msec)
default-2.fio.log:  read:  IOPS=835k,  BW=3260MiB/s  (3418MB/s)(191GiB/60002msec)
default-3.fio.log:  read:  IOPS=834k,  BW=3257MiB/s  (3415MB/s)(191GiB/60001msec)
xarray-1.fio.log:   read:  IOPS=966k,  BW=3772MiB/s  (3955MB/s)(221GiB/60003msec)
xarray-2.fio.log:   read:  IOPS=966k,  BW=3775MiB/s  (3958MB/s)(221GiB/60002msec)
xarray-3.fio.log:   read:  IOPS=965k,  BW=3769MiB/s  (3952MB/s)(221GiB/60002msec)

Latency (submission) ~25% decrease with XArray (lower is better) :-
------------------------------------------------------------------------

default-1.fio.log:  slat  (usec):  min=8,  max=26066,  avg=25.18,  stdev=47.72
default-2.fio.log:  slat  (usec):  min=8,  max=907,    avg=20.24,  stdev=7.36
default-3.fio.log:  slat  (usec):  min=8,  max=723,    avg=20.21,  stdev=7.16
xarray-1.fio.log:   slat  (usec):  min=8,  max=639,    avg=14.84,  stdev=1.50
xarray-2.fio.log:   slat  (usec):  min=8,  max=840,    avg=14.84,  stdev=1.51
xarray-3.fio.log:   slat  (usec):  min=8,  max=2161,   avg=15.08,  stdev=9.56

CPU usage (system) ~12.2807% decrease with XArray (lower is better) :-
-----------------------------------------------------------------------

default-1.fio.log:  cpu  :  usr=3.92%,  sys=57.25%,  ctx=2159595,  majf=0,  minf=2807
default-2.fio.log:  cpu  :  usr=3.98%,  sys=57.99%,  ctx=1565139,  majf=0,  minf=2425
default-3.fio.log:  cpu  :  usr=3.99%,  sys=57.85%,  ctx=1563792,  majf=0,  minf=2977
xarray-1.fio.log:   cpu  :  usr=4.47%,  sys=50.88%,  ctx=1810927,  majf=0,  minf=2478
xarray-2.fio.log:   cpu  :  usr=4.47%,  sys=50.88%,  ctx=1812184,  majf=0,  minf=2176
xarray-3.fio.log:   cpu  :  usr=4.49%,  sys=50.86%,  ctx=1816963,  majf=0,  minf=2736

We have not heard anything on the XArray bits; it would be great to have some
feedback from Matthew Wilcox so we can at least converge on the XArray part
of this series.

I'm still doing deeper testing, but the basics seem to work and the
performance numbers are consistent. A couple of issues are still outstanding
(using the XArray for ns removal instead of the list_head, and the overall
XArray comments from Matthew).

Regards,
Chaitanya

* Changes from V2:-
-------------------

1.  Add the XArray __xa_load() API as a preparation patch.
2.  Remove the id_ctrl call in nvme_dev_user_cmd().
3.  Remove the switch error check for xa_insert().
4.  Don't change the ns->kref code when calling xa_erase().
5.  Keep the XArray for deletion in nvme_remove_invalid_namespaces();
    see [1].
6.  Keep the XArray for deletion in nvme_remove_namespaces(); see [1].
7.  Remove the randomly changed lines that only aligned the coding style
    in the nvmet patch.
8.  Remove the remaining #include nvme.h from the nvmet patch.
9.  Remove the xa_empty() call from nvmet_max_nsid().
10. Centralize the blk-mq queue wrappers. The blk-mq queue related
    wrapper functions nvme_kill_queues(), nvme_unfreeze(),
    nvme_wait_freeze(), nvme_start_freeze(), nvme_stop_queues(),
    nvme_start_queues(), and nvme_sync_queues() differ in only one
    line, i.e. the blk_mq_queue_xxx() call. For that one line we have
    7 functions and 7 exported symbols. Using a centralized ctrl-queue
    action function and well-defined enums representing the names of
    the helpers, we can minimize the code and the exported symbols
    while still maintaining readability (see the sketch after this
    list).
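
To illustrate point 10, here is a minimal sketch of such a centralized
helper. The enum values, the function name, and the lack of explicit
lifetime protection around the iteration are assumptions for illustration,
not the exact code in the patches; only three of the seven actions are
shown.

/*
 * Illustrative sketch of a centralized ctrl-queue action helper.
 * Names and locking are assumptions, not the exact code in the patches.
 */
enum nvme_queue_action {
        NVME_QUEUE_STOP,        /* was nvme_stop_queues()  */
        NVME_QUEUE_START,       /* was nvme_start_queues() */
        NVME_QUEUE_SYNC,        /* was nvme_sync_queues()  */
};

static void nvme_queue_act(struct nvme_ctrl *ctrl, enum nvme_queue_action act)
{
        struct nvme_ns *ns;
        unsigned long idx;

        xa_for_each(&ctrl->namespaces, idx, ns) {
                switch (act) {
                case NVME_QUEUE_STOP:
                        blk_mq_quiesce_queue(ns->queue);
                        break;
                case NVME_QUEUE_START:
                        blk_mq_unquiesce_queue(ns->queue);
                        break;
                case NVME_QUEUE_SYNC:
                        blk_sync_queue(ns->queue);
                        break;
                }
        }
}

Each existing wrapper then reduces to a one-line call such as
nvme_queue_act(ctrl, NVME_QUEUE_STOP), or the wrappers can be dropped in
favour of a single exported symbol taking the enum.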

* Change from V1:-
------------------

1. Use xarray instead of rcu locks.

[1] Keeping the XArray for ns deletion in nvme_remove_namespaces() and
    nvme_remove_invalid_namespaces() :-

1.1 No impact on the fast path :-

    The list_head is only needed for deletion and has no impact on the
    fast path.
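
    A minimal sketch of what removal could look like when iterating the
    XArray directly instead of a private list is shown below; it is
    illustrative only, and both the function name and the way the final
    reference is dropped (nvme_ns_remove() here) are assumptions, not the
    exact code from this series.

    /*
     * Illustrative sketch; xa_for_each() is safe while erasing the
     * current entry.  nvme_ns_remove() usage is an assumption here.
     */
    static void nvme_remove_all_namespaces_sketch(struct nvme_ctrl *ctrl)
    {
            struct nvme_ns *ns;
            unsigned long idx;

            xa_for_each(&ctrl->namespaces, idx, ns) {
                    xa_erase(&ctrl->namespaces, idx);
                    nvme_ns_remove(ns);
            }
    }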

1.2 Effects on cacheline size :-

    The size of struct nvme_ns without the list_head is 112 bytes [1.3],
    which still leaves some room for future members, so we can fit the
    structure in 2 cachelines before a 3rd one is needed. With the
    list_head added it becomes 128 bytes [1.4], i.e. any new member will
    need a 3rd cacheline. The test machine has a cacheline size of 64
    bytes:

    # getconf LEVEL1_DCACHE_LINESIZE
    64

    Even though nvme_find_get_ns() is not in the fast path yet, with the
    addition of passthru it will be; and with any new field we would have
    to fetch one extra cacheline.


[1.3] Without the addition of the list_head, the size is 112 bytes :-

# dmesg  -c
[  737.953355] nvme_core_init 4601 sizeof(nvme_ns) 112 <---
# git diff
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a8c39254dae1..89c10cb7f6ef 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4598,6 +4598,7 @@ static int __init nvme_core_init(void)
 {
        int result = -ENOMEM;
 
+       pr_info("%s %d sizeof(nvme_ns) %lu\n", __func__, __LINE__, sizeof(struct nvme_ns));
        _nvme_check_size();
 
        nvme_wq = alloc_workqueue("nvme-wq",

[1.4] With the addition of the list_head, the size becomes 128 bytes :-

# dmesg  -c
[  772.513266] nvme_core_init 4601 sizeof(nvme_ns) 128 < ----
# git diff
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a8c39254dae1..89c10cb7f6ef 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4598,6 +4598,7 @@ static int __init nvme_core_init(void)
 {
        int result = -ENOMEM;
 
+       pr_info("%s %d sizeof(nvme_ns) %lu\n", __func__, __LINE__, sizeof(struct nvme_ns));
        _nvme_check_size();
 
        nvme_wq = alloc_workqueue("nvme-wq",
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1e8dedee74df..2907f6fe0400 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -394,6 +394,7 @@ enum nvme_ns_features {
 };
 
 struct nvme_ns {
+       struct list_head list;
        struct nvme_ctrl *ctrl;
        struct request_queue *queue;
        struct gendisk *disk;

Chaitanya Kulkarni (10):
  xarray: add __xa_load() version
  nvme-core: use xarray for ctrl ns tracking
  nvme: centralize queue action nvme_kill_queues()
  nvme: centralize queue action nvme_unfreeze()
  nvme: centralize queue action nvme_wait_freeze()
  nvme: centralize queue action nvme_start_freeze()
  nvme: centralize queue action nvme_stop_queues()
  nvme: centralize queue action nvme_start_queues()
  nvme: centralize queue action nvme_sync_queues()
  nvmet: use xarray for ctrl ns storing

 drivers/nvme/host/core.c        | 291 ++++++++++++++------------------
 drivers/nvme/host/fc.c          |   4 +-
 drivers/nvme/host/multipath.c   |  15 +-
 drivers/nvme/host/nvme.h        |  23 +--
 drivers/nvme/host/pci.c         |  28 +--
 drivers/nvme/host/rdma.c        |   6 +-
 drivers/nvme/host/tcp.c         |   6 +-
 drivers/nvme/target/admin-cmd.c |  17 +-
 drivers/nvme/target/core.c      |  58 ++-----
 drivers/nvme/target/loop.c      |   2 +-
 drivers/nvme/target/nvmet.h     |   3 +-
 include/linux/xarray.h          |   1 +
 lib/xarray.c                    |  26 ++-
 13 files changed, 212 insertions(+), 268 deletions(-)

-- 
2.26.0


