From: Dan Williams <dan.j.williams@intel.com>
To: linux-nvdimm@lists.01.org
Cc: Jan Kara <jack@suse.cz>, Matthew Wilcox <mawilcox@microsoft.com>,
	linux-kernel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@lst.de>
Subject: [PATCH v2 19/33] dax, pmem: introduce 'copy_from_iter' dax operation
Date: Fri, 14 Apr 2017 19:34:48 -0700
Message-ID: <149222368842.32363.497776993572386297.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <149222358726.32363.15876405696577002849.stgit@dwillia2-desk3.amr.corp.intel.com>

The direct-I/O write path for a pmem device must ensure that data is
flushed to a power-fail safe zone when the operation is complete.
However, other dax-capable block devices, like brd, do not have this
requirement.  Introduce a 'copy_from_iter' dax operation so that pmem
can inject cache management without imposing this overhead on other
dax-capable block_device drivers.

This is also a first step toward moving all architecture-specific
pmem operations into the pmem driver.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pmem.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/dax.h   |    3 +++
 2 files changed, 46 insertions(+)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 3b3dab73d741..e501df4ab4b4 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -220,6 +220,48 @@ __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
 	return PHYS_PFN(pmem->size - pmem->pfn_pad - offset);
 }
 
+static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
+		void *addr, size_t bytes, struct iov_iter *i)
+{
+	size_t len;
+
+	/* TODO: skip the write-back by always using non-temporal stores */
+	len = copy_from_iter_nocache(addr, bytes, i);
+
+	/*
+	 * In the iovec case on x86_64 copy_from_iter_nocache() uses
+	 * non-temporal stores for the bulk of the transfer, but we need
+	 * to manually flush if the transfer is unaligned. A cached
+	 * memory copy is used when destination or size is not naturally
+	 * aligned. That is:
+	 *   - Require 8-byte alignment when size is 8 bytes or larger.
+	 *   - Require 4-byte alignment when size is 4 bytes.
+	 *
+	 * In the non-iovec case the entire destination needs to be
+	 * flushed.
+	 */
+	if (iter_is_iovec(i)) {
+		unsigned long flushed, dest = (unsigned long) addr;
+
+		if (bytes < 8) {
+			if (!IS_ALIGNED(dest, 4) || (bytes != 4))
+				wb_cache_pmem(addr, 1);
+		} else {
+			if (!IS_ALIGNED(dest, 8)) {
+				dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+				wb_cache_pmem(addr, 1);
+			}
+
+			flushed = dest - (unsigned long) addr;
+			if (bytes > flushed && !IS_ALIGNED(bytes - flushed, 8))
+				wb_cache_pmem(addr + bytes - 1, 1);
+		}
+	} else
+		wb_cache_pmem(addr, bytes);
+
+	return len;
+}
+
 static const struct block_device_operations pmem_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		pmem_rw_page,
@@ -236,6 +278,7 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 
 static const struct dax_operations pmem_dax_ops = {
 	.direct_access = pmem_dax_direct_access,
+	.copy_from_iter = pmem_copy_from_iter,
 };
 
 static void pmem_release_queue(void *q)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index d3158e74a59e..156f067d4db5 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -16,6 +16,9 @@ struct dax_operations {
 	 */
 	long (*direct_access)(struct dax_device *, pgoff_t, long,
 			void **, pfn_t *);
+	/* copy_from_iter: dax-driver override for default copy_from_iter */
+	size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
+			struct iov_iter *);
 };
 
 int dax_read_lock(void);
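
To make the alignment rules in the pmem_copy_from_iter() comment easier to
check by hand, here is a minimal userspace sketch of the iovec-case decision
only. It is not kernel code: struct flush_plan, plan_iovec_flush(),
is_aligned(), align_up() and the 64-byte CLFLUSH_SIZE are hypothetical
stand-ins (the patch uses IS_ALIGNED(), ALIGN() and
boot_cpu_data.x86_clflush_size), and instead of calling wb_cache_pmem() it
only records whether the cache line holding the first or last byte would be
written back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for boot_cpu_data.x86_clflush_size. */
#define CLFLUSH_SIZE ((uintptr_t)64)

/* Hypothetical record of which single-line write-backs would be issued. */
struct flush_plan {
	bool head;	/* line holding the first destination byte */
	bool tail;	/* line holding the last destination byte */
};

/* 'a' must be a power of two, as with the kernel's IS_ALIGNED()/ALIGN(). */
static bool is_aligned(uintptr_t x, uintptr_t a)
{
	return (x & (a - 1)) == 0;
}

static uintptr_t align_up(uintptr_t x, uintptr_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Mirrors the iter_is_iovec() branch of the patch, minus the actual flush. */
static struct flush_plan plan_iovec_flush(uintptr_t addr, size_t bytes)
{
	struct flush_plan plan = { false, false };
	uintptr_t dest = addr;
	size_t flushed;

	if (bytes < 8) {
		/* Only an aligned 4-byte copy is treated as non-temporal. */
		if (!is_aligned(dest, 4) || bytes != 4)
			plan.head = true;
	} else {
		/* Unaligned start: the head of the copy went through the cache. */
		if (!is_aligned(dest, 8)) {
			dest = align_up(dest, CLFLUSH_SIZE);
			plan.head = true;
		}

		/* Bytes already covered by the head write-back. */
		flushed = dest - addr;
		if (bytes > flushed && !is_aligned(bytes - flushed, 8))
			plan.tail = true;
	}

	return plan;
}
```

For example, under these assumptions a 101-byte copy to an address ending in
0x04 would plan both a head and a tail write-back, while a 16-byte copy to
an 8-byte-aligned address would plan none.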
