From: Dave Jiang <dave.jiang@intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"Koul, Vinod  <vinod.koul@intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>
Subject: Re: [PATCH v5 7/7] libnvdimm: Add blk-mq pmem driver
Date: Wed, 23 Aug 2017 12:56:04 -0700
Message-ID: <cc37e79b-904e-a8a3-faae-28887ed28b0f@intel.com>
In-Reply-To: <CAPcyv4h5wm3C8BhQQpA8vZ3SCrcunLCW-rgrq6QCUHX74YJpgQ@mail.gmail.com>



On 08/23/2017 11:39 AM, Dan Williams wrote:
> On Mon, Aug 21, 2017 at 2:11 PM, Dave Jiang <dave.jiang@intel.com> wrote:
>> Adding a DMA supported blk-mq driver for pmem.
> 
> "Add support for offloading pmem block-device I/O operations to a DMA engine."
> 
>> This provides signficant CPU
> 
> *significant
> 
>> utilization reduction.
> 
> "at the cost of some increased latency and bandwidth reduction in some cases."
> 
>> By default the pmem driver will be using blk-mq with
> 
> "By default the current cpu-copy based pmem driver will load, but this
> driver can be manually selected with a modprobe configuration."
> 
>> DMA through the dmaengine API. DMA can be turned off with use_dma=0 kernel
>> parameter.
> 
> Do we need the module option? It seems for debug / testing a user can
> simply unload the ioatdma driver, otherwise we should use dma by
> default.
> 
>> Additional kernel parameters are provided:
>>
>> queue_depth: The queue depth for blk-mq. Typically in relation to what the
>>              DMA engine can provide per queue/channel. This needs to take
>>              num_sg into account as well for some DMA engines, i.e.
>>              num_sg * queue_depth < total descriptors available per queue or
>>              channel.
>>
>> q_per_node: Hardware queues per node. Typically the number of channels the
>>             DMA engine can provide per socket.
>> num_sg: Number of scatterlist entries we can handle per I/O request.
> 
> Why do these need to be configurable?

The concern is with other arch/platforms that have different DMA
engines. The configurations would be platform dependent.
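For reference, the knobs map to straightforward module parameters; a minimal
sketch (the defaults below are illustrative, not what the posted patch uses):

#include <linux/module.h>

static int use_dma = 1;
module_param(use_dma, int, 0444);
MODULE_PARM_DESC(use_dma, "Offload I/O data copies to a DMA engine (0 = CPU copy)");

static int queue_depth = 128;
module_param(queue_depth, int, 0444);
MODULE_PARM_DESC(queue_depth, "blk-mq queue depth per hardware queue");

static int q_per_node = 2;
module_param(q_per_node, int, 0444);
MODULE_PARM_DESC(q_per_node, "Hardware queues (DMA channels) per node");

static int num_sg = 128;
module_param(num_sg, int, 0444);
MODULE_PARM_DESC(num_sg, "Scatterlist entries per I/O request");

/*
 * A sane platform configuration keeps num_sg * queue_depth below the
 * number of descriptors the DMA engine provides per queue/channel.
 */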

> 
>>
>> Numbers below are measured against pmem simulated via DRAM using
>> memmap=NN!SS.  DMA engine used is the ioatdma on Intel Skylake Xeon
>> platform.  Keep in mind the performance for persistent memory
>> will differ.
>> Fio 2.21 was used.
>>
>> 64k: 1 task queuedepth=1
>> CPU Read:  7631 MB/s  99.7% CPU    DMA Read: 2415 MB/s  54% CPU
>> CPU Write: 3552 MB/s  100% CPU     DMA Write 2173 MB/s  54% CPU
>>
>> 64k: 16 tasks queuedepth=16
>> CPU Read: 36800 MB/s  1593% CPU    DMA Read:  29100 MB/s  607% CPU
>> CPU Write 20900 MB/s  1589% CPU    DMA Write: 23400 MB/s  585% CPU
>>
>> 2M: 1 task queuedepth=1
>> CPU Read:  6013 MB/s  99.3% CPU    DMA Read:  7986 MB/s  59.3% CPU
>> CPU Write: 3579 MB/s  100% CPU     DMA Write: 5211 MB/s  58.3% CPU
>>
>> 2M: 16 tasks queuedepth=16
>> CPU Read:  18100 MB/s 1588% CPU    DMA Read:  21300 MB/s 180.9% CPU
>> CPU Write: 14100 MB/s 1594% CPU    DMA Write: 20400 MB/s 446.9% CPU
>>
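The exact fio job files are not part of the patch; an invocation along these
lines (device name and runtime are illustrative) covers the 64k, 1 task,
queue depth 1 read case:

fio --name=pmem-seq-read --filename=/dev/pmem0 --ioengine=libaio \
    --direct=1 --rw=read --bs=64k --numjobs=1 --iodepth=1 \
    --time_based --runtime=30 --group_reporting
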
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>  drivers/nvdimm/Kconfig   |   23 +
>>  drivers/nvdimm/Makefile  |    3
>>  drivers/nvdimm/pmem.h    |    3
>>  drivers/nvdimm/pmem_mq.c |  853 ++++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 882 insertions(+)
>>  create mode 100644 drivers/nvdimm/pmem_mq.c
>>
>> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
>> index 5bdd499..c88c2bb 100644
>> --- a/drivers/nvdimm/Kconfig
>> +++ b/drivers/nvdimm/Kconfig
>> @@ -36,6 +36,29 @@ config BLK_DEV_PMEM
>>
>>           Say Y if you want to use an NVDIMM
>>
>> +config BLK_DEV_PMEM_MQ
> 
> Lets call the driver pmem_dma instead of pmem_mq, the "mq" is just a
> side effect of enabling dma support.
> 
>> +       tristate "PMEM: Persistent memory block device multi-queue support"
>> +       depends on m
> 
> Not sure what the exact kconfig syntax should be, but the policy I'm
> looking for is that this driver should be allowed to be built-in if
> the pmem driver is disabled.
> 
>> +       default LIBNVDIMM
>> +       select DAX
>> +       select ND_BTT if BTT
>> +       select ND_PFN if NVDIMM_PFN
> 
> I think these should probably move to a common symbol used by both
> pmem and pmem_dma.
> 
>> +       select DMA_ENGINE
> 
> Shouldn't this be "depends on"?
> 
>> +       help
>> +         Memory ranges for PMEM are described by either an NFIT
>> +         (NVDIMM Firmware Interface Table, see CONFIG_NFIT_ACPI), a
>> +         non-standard OEM-specific E820 memory type (type-12, see
>> +         CONFIG_X86_PMEM_LEGACY), or it is manually specified by the
>> +         'memmap=nn[KMG]!ss[KMG]' kernel command line (see
>> +         Documentation/admin-guide/kernel-parameters.rst).  This driver
>> +         converts these persistent memory ranges into block devices that are
>> +         capable of DAX (direct-access) file system mappings.  See
>> +         Documentation/nvdimm/nvdimm.txt for more details. This driver
>> +         utilizes block layer multi-queue in order to support using DMA
>> +         engines to help offload the data copying.
> 
> Rather than duplicating some of the pmem text I think this help text
> should only talk about the incremental differences relative to the
> base pmem driver.
> 
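Putting these Kconfig comments together, the entry might end up looking
something like the following sketch (whether the ND_BTT/ND_PFN selects move
to a common symbol, and how to allow a built-in configuration when the base
pmem driver is disabled, are still open questions):

config BLK_DEV_PMEM_DMA
	tristate "PMEM: Persistent memory block device DMA offload support"
	depends on DMA_ENGINE
	default LIBNVDIMM
	select DAX
	select ND_BTT if BTT
	select ND_PFN if NVDIMM_PFN
	help
	  Variant of the pmem block driver that uses blk-mq to offload
	  I/O data copies to a DMA engine.
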
>> +
>> +         Say Y if you want to use an NVDIMM
>> +
>>  config ND_BLK
>>         tristate "BLK: Block data window (aperture) device support"
>>         default LIBNVDIMM
>> diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
>> index 909554c..8bfd107 100644
>> --- a/drivers/nvdimm/Makefile
>> +++ b/drivers/nvdimm/Makefile
>> @@ -1,11 +1,14 @@
>>  obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
>>  obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
>> +obj-$(CONFIG_BLK_DEV_PMEM_MQ) += nd_pmem_mq.o
>>  obj-$(CONFIG_ND_BTT) += nd_btt.o
>>  obj-$(CONFIG_ND_BLK) += nd_blk.o
>>  obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
>>
>>  nd_pmem-y := pmem.o
>>
>> +nd_pmem_mq-y := pmem_mq.o
>> +
> 
> s/mq/dma/
> 
> I also think it's okay to drop the nd_ prefix for this.
> 
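With that rename (and the nd_ prefix dropped), the Makefile hunk could shrink
to a single line, assuming the module stays a single pmem_dma.c source file:

obj-$(CONFIG_BLK_DEV_PMEM_DMA) += pmem_dma.o
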
>>  nd_btt-y := btt.o
>>
>>  nd_blk-y := blk.o
>> diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
>> index 5434321..bbbe982 100644
>> --- a/drivers/nvdimm/pmem.h
>> +++ b/drivers/nvdimm/pmem.h
>> @@ -4,6 +4,7 @@
>>  #include <linux/types.h>
>>  #include <linux/pfn_t.h>
>>  #include <linux/fs.h>
>> +#include <linux/blk-mq.h>
>>
>>  #ifdef CONFIG_ARCH_HAS_PMEM_API
>>  #define ARCH_MEMREMAP_PMEM MEMREMAP_WB
>> @@ -35,6 +36,8 @@ struct pmem_device {
>>         struct badblocks        bb;
>>         struct dax_device       *dax_dev;
>>         struct gendisk          *disk;
>> +       struct blk_mq_tag_set   tag_set;
>> +       struct request_queue    *q;
>>  };
>>
>>  long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>> diff --git a/drivers/nvdimm/pmem_mq.c b/drivers/nvdimm/pmem_mq.c
>> new file mode 100644
>> index 0000000..db2fc2a
>> --- /dev/null
>> +++ b/drivers/nvdimm/pmem_mq.c
> 
> [snip bulk of driver code]
> 
> Rather than copy-paste most of the existing pmem driver let's refactor
> all the common bits into a pmem-core module.
> 
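For context, the new tag_set/q fields added to pmem.h get wired up with the
usual blk-mq setup sequence; a generic sketch (not the posted, snipped code --
pmem_queue_rq is a hypothetical request handler that would kick off the DMA
copy):

#include <linux/blk-mq.h>
#include <linux/nodemask.h>
#include "pmem.h"

static blk_status_t pmem_queue_rq(struct blk_mq_hw_ctx *hctx,
				  const struct blk_mq_queue_data *bd);

static const struct blk_mq_ops pmem_mq_ops = {
	.queue_rq	= pmem_queue_rq,
};

static int pmem_setup_queue(struct pmem_device *pmem, int nid)
{
	pmem->tag_set.ops = &pmem_mq_ops;
	pmem->tag_set.nr_hw_queues = q_per_node * num_online_nodes();
	pmem->tag_set.queue_depth = queue_depth;
	pmem->tag_set.numa_node = nid;
	pmem->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;

	if (blk_mq_alloc_tag_set(&pmem->tag_set))
		return -ENOMEM;

	pmem->q = blk_mq_init_queue(&pmem->tag_set);
	if (IS_ERR(pmem->q)) {
		blk_mq_free_tag_set(&pmem->tag_set);
		return PTR_ERR(pmem->q);
	}

	return 0;
}
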
>> +MODULE_AUTHOR("Dave Jiang <dave.jiang@intel.com>");
> 
> Drop this MODULE_AUTHOR(), I think this is better handled with a
> MAINTAINERS entry.
> 
