Linux-Fsdevel Archive on lore.kernel.org
* [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory 
@ 2020-02-07 20:26 Vivek Goyal
  2020-02-07 20:26 ` [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Vivek Goyal @ 2020-02-07 20:26 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
  Cc: dm-devel, vishal.l.verma, vgoyal

Hi,

This is V3 of the patches. I have dropped the RFC tag from the series, as
there appears to be agreement on the interface. These patches are also
available at:

https://github.com/rhvgoyal/linux/commits/dax-zero-range-v3

Previous versions were posted here:

v2:
https://lore.kernel.org/linux-fsdevel/20200203200029.4592-1-vgoyal@redhat.com/
v1:
https://lore.kernel.org/linux-fsdevel/20200123165249.GA7664@redhat.com/

Changes since V2:

Primarily addressed review comments from Christoph.

- Changed zero_page_range() parameters to pass the dax device offset as u64.
- Fixed the comment which said the current interface only supports zeroing
  within a page.
- Refactored pmem_do_bvec() and reused write side of code in
  zero_page_range().
- Removed generic_dax_zero_page_range()
- Fixed s390 dcssblk.c compilation issue.

Please review. 

Thanks
Vivek

Vivek Goyal (7):
  pmem: Add functions for reading/writing page to/from pmem
  pmem: Enable pmem_do_write() to deal with arbitrary ranges
  dax, pmem: Add a dax operation zero_page_range
  s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
  dm,dax: Add dax zero_page_range operation
  dax,iomap: Start using dax native zero_page_range()
  dax,iomap: Add helper dax_iomap_zero() to zero a range

 drivers/dax/super.c           |  19 ++++++
 drivers/md/dm-linear.c        |  21 +++++++
 drivers/md/dm-log-writes.c    |  19 ++++++
 drivers/md/dm-stripe.c        |  26 ++++++++
 drivers/md/dm.c               |  31 ++++++++++
 drivers/nvdimm/pmem.c         | 112 ++++++++++++++++++++++++----------
 drivers/s390/block/dcssblk.c  |  17 ++++++
 fs/dax.c                      |  53 ++++------------
 fs/iomap/buffered-io.c        |   9 +--
 include/linux/dax.h           |  20 ++----
 include/linux/device-mapper.h |   3 +
 11 files changed, 235 insertions(+), 95 deletions(-)

-- 
2.20.1



* [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem
  2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
@ 2020-02-07 20:26 ` Vivek Goyal
  2020-02-17 13:21   ` Christoph Hellwig
  2020-02-07 20:26 ` [PATCH v3 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-02-07 20:26 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
  Cc: dm-devel, vishal.l.verma, vgoyal

This splits pmem_do_bvec() into pmem_do_read() and pmem_do_write().
pmem_do_write() will also be used by pmem zero_page_range(), so both
paths share the same code.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 drivers/nvdimm/pmem.c | 79 ++++++++++++++++++++++++++-----------------
 1 file changed, 48 insertions(+), 31 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ad8e4df1282b..9ad07cb8c9fc 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -136,9 +136,25 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
 	return BLK_STS_OK;
 }
 
-static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
-			unsigned int len, unsigned int off, unsigned int op,
-			sector_t sector)
+static blk_status_t pmem_do_read(struct pmem_device *pmem,
+			struct page *page, unsigned int page_off,
+			sector_t sector, unsigned int len)
+{
+	blk_status_t rc;
+	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
+	void *pmem_addr = pmem->virt_addr + pmem_off;
+
+	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
+		return BLK_STS_IOERR;
+
+	rc = read_pmem(page, page_off, pmem_addr, len);
+	flush_dcache_page(page);
+	return rc;
+}
+
+static blk_status_t pmem_do_write(struct pmem_device *pmem,
+			struct page *page, unsigned int page_off,
+			sector_t sector, unsigned int len)
 {
 	blk_status_t rc = BLK_STS_OK;
 	bool bad_pmem = false;
@@ -148,39 +164,40 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
 		bad_pmem = true;
 
-	if (!op_is_write(op)) {
-		if (unlikely(bad_pmem))
-			rc = BLK_STS_IOERR;
-		else {
-			rc = read_pmem(page, off, pmem_addr, len);
-			flush_dcache_page(page);
-		}
-	} else {
-		/*
-		 * Note that we write the data both before and after
-		 * clearing poison.  The write before clear poison
-		 * handles situations where the latest written data is
-		 * preserved and the clear poison operation simply marks
-		 * the address range as valid without changing the data.
-		 * In this case application software can assume that an
-		 * interrupted write will either return the new good
-		 * data or an error.
-		 *
-		 * However, if pmem_clear_poison() leaves the data in an
-		 * indeterminate state we need to perform the write
-		 * after clear poison.
-		 */
-		flush_dcache_page(page);
-		write_pmem(pmem_addr, page, off, len);
-		if (unlikely(bad_pmem)) {
-			rc = pmem_clear_poison(pmem, pmem_off, len);
-			write_pmem(pmem_addr, page, off, len);
-		}
+	/*
+	 * Note that we write the data both before and after
+	 * clearing poison.  The write before clear poison
+	 * handles situations where the latest written data is
+	 * preserved and the clear poison operation simply marks
+	 * the address range as valid without changing the data.
+	 * In this case application software can assume that an
+	 * interrupted write will either return the new good
+	 * data or an error.
+	 *
+	 * However, if pmem_clear_poison() leaves the data in an
+	 * indeterminate state we need to perform the write
+	 * after clear poison.
+	 */
+	flush_dcache_page(page);
+	write_pmem(pmem_addr, page, page_off, len);
+	if (unlikely(bad_pmem)) {
+		rc = pmem_clear_poison(pmem, pmem_off, len);
+		write_pmem(pmem_addr, page, page_off, len);
 	}
 
 	return rc;
 }
 
+static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
+			unsigned int len, unsigned int off, unsigned int op,
+			sector_t sector)
+{
+	if (!op_is_write(op))
+		return pmem_do_read(pmem, page, off, sector, len);
+
+	return pmem_do_write(pmem, page, off, sector, len);
+}
+
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
 	int ret = 0;
-- 
2.20.1



* [PATCH v3 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges
  2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
  2020-02-07 20:26 ` [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
@ 2020-02-07 20:26 ` Vivek Goyal
  2020-02-17 13:23   ` Christoph Hellwig
  2020-02-07 20:26 ` [PATCH v3 3/7] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-02-07 20:26 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
  Cc: dm-devel, vishal.l.verma, vgoyal

Currently pmem_do_write() is written with the assumption that all I/O is
sector aligned. Soon I want to use this function in zero_page_range(),
where the range passed in does not have to be sector aligned.

Modify this function so that it can deal with an arbitrary range, which
is specified by pmem_off and len.
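
For illustration only, the sector rounding this implies can be modelled in
userspace as below. This is a sketch, not the patch itself: full_sectors()
and struct sector_span are hypothetical names, and SECTOR_SHIFT/SECTOR_SIZE
are redefined locally assuming 512-byte sectors. The start of the range is
rounded up and the end is rounded down, so only sectors that lie entirely
inside [off, off + len) are counted.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)

/* Hypothetical helper type: the run of sectors fully covered by a byte range. */
struct sector_span {
	uint64_t start;  /* first fully covered sector */
	uint64_t count;  /* number of fully covered sectors, 0 if none */
};

/*
 * Round the start of [off, off + len) up to a sector boundary and the
 * end down to one. If the rounded end does not lie past the rounded
 * start, the range covers no complete sector.
 */
static struct sector_span full_sectors(uint64_t off, uint64_t len)
{
	struct sector_span s = { 0, 0 };
	uint64_t start = (off + SECTOR_SIZE - 1) >> SECTOR_SHIFT;
	uint64_t end = (off + len) >> SECTOR_SHIFT;

	s.start = start;
	if (end > start)
		s.count = end - start;
	return s;
}
```

With these definitions, a 1000-byte range starting at byte 100 fully covers
only sector 1, while a 200-byte range at the same offset covers no complete
sector at all.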

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 drivers/nvdimm/pmem.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9ad07cb8c9fc..281fe04d25fd 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -154,15 +154,23 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
 
 static blk_status_t pmem_do_write(struct pmem_device *pmem,
 			struct page *page, unsigned int page_off,
-			sector_t sector, unsigned int len)
+			u64 pmem_off, unsigned int len)
 {
 	blk_status_t rc = BLK_STS_OK;
 	bool bad_pmem = false;
-	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
-	void *pmem_addr = pmem->virt_addr + pmem_off;
-
-	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
-		bad_pmem = true;
+	phys_addr_t pmem_real_off = pmem_off + pmem->data_offset;
+	void *pmem_addr = pmem->virt_addr + pmem_real_off;
+	sector_t sector_start, sector_end;
+	unsigned nr_sectors;
+
+	sector_start = DIV_ROUND_UP(pmem_off, SECTOR_SIZE);
+	sector_end = (pmem_off + len) >> SECTOR_SHIFT;
+	if (sector_end > sector_start) {
+		nr_sectors = sector_end - sector_start;
+		if (unlikely(is_bad_pmem(&pmem->bb, sector_start,
+					 nr_sectors << SECTOR_SHIFT)))
+			bad_pmem = true;
+	}
 
 	/*
 	 * Note that we write the data both before and after
@@ -181,7 +189,13 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
 	flush_dcache_page(page);
 	write_pmem(pmem_addr, page, page_off, len);
 	if (unlikely(bad_pmem)) {
-		rc = pmem_clear_poison(pmem, pmem_off, len);
+		/*
+		 * Pass sector aligned offset and length. That seems
+		 * to work as of now. Other finer grained alignment
+		 * cases can be addressed later if need be.
+		 */
+		rc = pmem_clear_poison(pmem, ALIGN(pmem_real_off, SECTOR_SIZE),
+				       nr_sectors << SECTOR_SHIFT);
 		write_pmem(pmem_addr, page, page_off, len);
 	}
 
@@ -195,7 +209,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 	if (!op_is_write(op))
 		return pmem_do_read(pmem, page, off, sector, len);
 
-	return pmem_do_write(pmem, page, off, sector, len);
+	return pmem_do_write(pmem, page, off, sector << SECTOR_SHIFT, len);
 }
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
-- 
2.20.1



* [PATCH v3 3/7] dax, pmem: Add a dax operation zero_page_range
  2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
  2020-02-07 20:26 ` [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
  2020-02-07 20:26 ` [PATCH v3 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
@ 2020-02-07 20:26 ` Vivek Goyal
  2020-02-17 13:26   ` Christoph Hellwig
  2020-02-07 20:26 ` [PATCH v3 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver Vivek Goyal
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-02-07 20:26 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
  Cc: dm-devel, vishal.l.verma, vgoyal

Add a dax operation zero_page_range, to zero a range of memory. This will
also clear any poison in the range being zeroed.

As of now, zeroing of up to one page is allowed in a single call. There
are no callers which try to zero more than a page in a single call.
Once callers appear which zero more than a page in a single call, we
can add that support. The primary reason for not doing that yet is that it
would add some complexity to the dm implementation, where a range might
span multiple underlying targets and one would have to split the range
into multiple sub-ranges and call zero_page_range() on individual targets.
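
As a rough userspace illustration of the single-page restriction and of the
kind of splitting a multi-page caller would need, consider the sketch
below. It is not part of this series: fits_in_one_page(), split_by_page()
and struct chunk are hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/* Userspace stand-in for the kernel's offset_in_page(). */
static uint64_t offset_in_page(uint64_t off)
{
	return off & (PAGE_SIZE - 1);
}

/* Nonzero if [off, off + len) does not cross a page boundary. */
static int fits_in_one_page(uint64_t off, size_t len)
{
	return offset_in_page(off) + len <= PAGE_SIZE;
}

/* Hypothetical descriptor for one sub-range that stays within a page. */
struct chunk {
	uint64_t off;
	size_t len;
};

/*
 * Split [off, off + len) into at most max chunks, none of which crosses
 * a page boundary. Returns the number of chunks written to out.
 */
static size_t split_by_page(uint64_t off, uint64_t len,
			    struct chunk *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	while (len > 0 && n < max) {
		uint64_t room = PAGE_SIZE - offset_in_page(off);
		uint64_t take = len < room ? len : room;

		out[n].off = off;
		out[n].len = (size_t)take;
		n++;
		off += take;
		len -= take;
	}
	return n;
}
```

For example, a 5000-byte range starting at byte 4000 is split into chunks
of 96, 4096 and 808 bytes, each of which passes the single-page check and
could be handed to a per-page zeroing call on its own.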

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 drivers/dax/super.c   | 19 +++++++++++++++++++
 drivers/nvdimm/pmem.c | 15 +++++++++++++++
 include/linux/dax.h   |  3 +++
 3 files changed, 37 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 26a654dbc69a..31ee0b47b4ed 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -344,6 +344,25 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 }
 EXPORT_SYMBOL_GPL(dax_copy_to_iter);
 
+int dax_zero_page_range(struct dax_device *dax_dev, u64 offset, size_t len)
+{
+	if (!dax_alive(dax_dev))
+		return -ENXIO;
+
+	if (!dax_dev->ops->zero_page_range)
+		return -EOPNOTSUPP;
+	/*
+	 * There are no callers that want to zero across a page boundary as of
+	 * now. Once users are there, this check can be removed after the
+	 * device mapper code has been updated to split ranges across targets.
+	 */
+	if (offset_in_page(offset) + len > PAGE_SIZE)
+		return -EIO;
+
+	return dax_dev->ops->zero_page_range(dax_dev, offset, len);
+}
+EXPORT_SYMBOL_GPL(dax_zero_page_range);
+
 #ifdef CONFIG_ARCH_HAS_PMEM_API
 void arch_wb_cache_pmem(void *addr, size_t size);
 void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 281fe04d25fd..0def7dc8e487 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -299,6 +299,20 @@ static const struct block_device_operations pmem_fops = {
 	.revalidate_disk =	nvdimm_revalidate_disk,
 };
 
+static int pmem_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
+				    size_t len)
+{
+	int rc;
+	struct pmem_device *pmem = dax_get_private(dax_dev);
+	struct page *page = ZERO_PAGE(0);
+
+	rc = pmem_do_write(pmem, page, 0, offset, len);
+	if (rc > 0)
+		return -EIO;
+
+	return 0;
+}
+
 static long pmem_dax_direct_access(struct dax_device *dax_dev,
 		pgoff_t pgoff, long nr_pages, void **kaddr, pfn_t *pfn)
 {
@@ -330,6 +344,7 @@ static const struct dax_operations pmem_dax_ops = {
 	.dax_supported = generic_fsdax_supported,
 	.copy_from_iter = pmem_copy_from_iter,
 	.copy_to_iter = pmem_copy_to_iter,
+	.zero_page_range = pmem_dax_zero_page_range,
 };
 
 static const struct attribute_group *pmem_attribute_groups[] = {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9bd8528bd305..a555f0aeb7bd 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -34,6 +34,8 @@ struct dax_operations {
 	/* copy_to_iter: required operation for fs-dax direct-i/o */
 	size_t (*copy_to_iter)(struct dax_device *, pgoff_t, void *, size_t,
 			struct iov_iter *);
+	/* zero_page_range: required operation. Zero range with-in a page  */
+	int (*zero_page_range)(struct dax_device *, u64, size_t);
 };
 
 extern struct attribute_group dax_attribute_group;
@@ -209,6 +211,7 @@ size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 		size_t bytes, struct iov_iter *i);
 size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
 		size_t bytes, struct iov_iter *i);
+int dax_zero_page_range(struct dax_device *dax_dev, u64 offset, size_t len);
 void dax_flush(struct dax_device *dax_dev, void *addr, size_t size);
 
 ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
-- 
2.20.1



* [PATCH v3 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
  2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
                   ` (2 preceding siblings ...)
  2020-02-07 20:26 ` [PATCH v3 3/7] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
@ 2020-02-07 20:26 ` Vivek Goyal
  2020-02-10 20:53   ` Gerald Schaefer
  2020-02-07 20:26 ` [PATCH v3 5/7] dm,dax: Add dax zero_page_range operation Vivek Goyal
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-02-07 20:26 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
  Cc: dm-devel, vishal.l.verma, vgoyal, linux-s390

Add dax operation zero_page_range for dcssblk driver.

CC: linux-s390@vger.kernel.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 drivers/s390/block/dcssblk.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 63502ca537eb..331abab5d066 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -57,11 +57,28 @@ static size_t dcssblk_dax_copy_to_iter(struct dax_device *dax_dev,
 	return copy_to_iter(addr, bytes, i);
 }
 
+static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
+				       size_t len)
+{
+	long rc;
+	void *kaddr;
+	pgoff_t pgoff = offset >> PAGE_SHIFT;
+	unsigned page_offset = offset_in_page(offset);
+
+	rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+	if (rc < 0)
+		return rc;
+	memset(kaddr + page_offset, 0, len);
+	dax_flush(dax_dev, kaddr + page_offset, len);
+	return 0;
+}
+
 static const struct dax_operations dcssblk_dax_ops = {
 	.direct_access = dcssblk_dax_direct_access,
 	.dax_supported = generic_fsdax_supported,
 	.copy_from_iter = dcssblk_dax_copy_from_iter,
 	.copy_to_iter = dcssblk_dax_copy_to_iter,
+	.zero_page_range = dcssblk_dax_zero_page_range,
 };
 
 struct dcssblk_dev_info {
-- 
2.20.1



* [PATCH v3 5/7] dm,dax: Add dax zero_page_range operation
  2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
                   ` (3 preceding siblings ...)
  2020-02-07 20:26 ` [PATCH v3 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver Vivek Goyal
@ 2020-02-07 20:26 ` Vivek Goyal
  2020-02-07 20:26 ` [PATCH v3 6/7] dax,iomap: Start using dax native zero_page_range() Vivek Goyal
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2020-02-07 20:26 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
  Cc: dm-devel, vishal.l.verma, vgoyal

This patch adds support for the dax zero_page_range operation to dm targets.
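
The offset arithmetic each target performs can be modelled in userspace
roughly as follows. This is a simplified sketch under stated assumptions,
not the dm code: struct linear_map and remap_offset() are hypothetical, the
mapping is a plain "add a start sector" stand-in for a linear target, the
page size is assumed to be 4 KiB, and the sketch ignores whatever else
bdev_dax_pgoff() does beyond turning a page-aligned sector into a page
offset.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12                          /* assumption: 4 KiB pages */
#define PAGE_SIZE    ((uint64_t)1 << PAGE_SHIFT)
#define SECTOR_SHIFT 9
#define PAGE_SECTORS (PAGE_SIZE >> SECTOR_SHIFT)

/* Hypothetical linear mapping: logical sector s maps to s + start. */
struct linear_map {
	uint64_t start;  /* in sectors, assumed page aligned */
};

/*
 * Split a byte offset into a page index and an in-page offset, remap
 * the page through the target, then recombine the two parts into a
 * byte offset on the underlying device.
 */
static uint64_t remap_offset(const struct linear_map *lm, uint64_t offset)
{
	uint64_t pgoff = offset >> PAGE_SHIFT;
	uint64_t page_offset = offset & (PAGE_SIZE - 1);
	uint64_t sector = pgoff * PAGE_SECTORS;
	uint64_t dev_sector = sector + lm->start;

	/* The sketch only handles mappings that keep pages aligned. */
	assert(dev_sector % PAGE_SECTORS == 0);
	pgoff = dev_sector / PAGE_SECTORS;
	return (pgoff << PAGE_SHIFT) + page_offset;
}
```

The in-page offset is carried through unchanged; only the page index is
translated. With a mapping that starts 16 sectors (two pages) into the
device, byte offset 12388 (page 3, offset 100) becomes 20580 (page 5,
offset 100).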

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 drivers/md/dm-linear.c        | 21 +++++++++++++++++++++
 drivers/md/dm-log-writes.c    | 19 +++++++++++++++++++
 drivers/md/dm-stripe.c        | 26 ++++++++++++++++++++++++++
 drivers/md/dm.c               | 31 +++++++++++++++++++++++++++++++
 include/linux/device-mapper.h |  3 +++
 5 files changed, 100 insertions(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 8d07fdf63a47..03f99e6ad372 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -201,10 +201,30 @@ static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
 	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
+static int linear_dax_zero_page_range(struct dm_target *ti, u64 offset,
+				      size_t len)
+{
+	int ret;
+	struct linear_c *lc = ti->private;
+	struct block_device *bdev = lc->dev->bdev;
+	struct dax_device *dax_dev = lc->dev->dax_dev;
+	pgoff_t pgoff = offset >> PAGE_SHIFT;
+	unsigned page_offset = offset_in_page(offset);
+	sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+
+	dev_sector = linear_map_sector(ti, sector);
+	ret = bdev_dax_pgoff(bdev, dev_sector, ALIGN(len, PAGE_SIZE), &pgoff);
+	if (ret)
+		return ret;
+	return dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + page_offset,
+				   len);
+}
+
 #else
 #define linear_dax_direct_access NULL
 #define linear_dax_copy_from_iter NULL
 #define linear_dax_copy_to_iter NULL
+#define linear_dax_zero_page_range NULL
 #endif
 
 static struct target_type linear_target = {
@@ -226,6 +246,7 @@ static struct target_type linear_target = {
 	.direct_access = linear_dax_direct_access,
 	.dax_copy_from_iter = linear_dax_copy_from_iter,
 	.dax_copy_to_iter = linear_dax_copy_to_iter,
+	.dax_zero_page_range = linear_dax_zero_page_range,
 };
 
 int __init dm_linear_init(void)
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 99721c76225d..f36ee223cb60 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -994,10 +994,28 @@ static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
 	return dax_copy_to_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
 }
 
+static int log_writes_dax_zero_page_range(struct dm_target *ti, u64 offset,
+					  size_t len)
+{
+	int ret;
+	struct log_writes_c *lc = ti->private;
+	pgoff_t pgoff = offset >> PAGE_SHIFT;
+	unsigned page_offset = offset_in_page(offset);
+	sector_t sector = pgoff * PAGE_SECTORS;
+
+	ret = bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(len, PAGE_SIZE),
+			     &pgoff);
+	if (ret)
+		return ret;
+	return dax_zero_page_range(lc->dev->dax_dev,
+				   (pgoff << PAGE_SHIFT) + page_offset, len);
+}
+
 #else
 #define log_writes_dax_direct_access NULL
 #define log_writes_dax_copy_from_iter NULL
 #define log_writes_dax_copy_to_iter NULL
+#define log_writes_dax_zero_page_range NULL
 #endif
 
 static struct target_type log_writes_target = {
@@ -1016,6 +1034,7 @@ static struct target_type log_writes_target = {
 	.direct_access = log_writes_dax_direct_access,
 	.dax_copy_from_iter = log_writes_dax_copy_from_iter,
 	.dax_copy_to_iter = log_writes_dax_copy_to_iter,
+	.dax_zero_page_range = log_writes_dax_zero_page_range,
 };
 
 static int __init dm_log_writes_init(void)
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 63bbcc20f49a..f5e17284c615 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -360,10 +360,35 @@ static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
 	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
+static int stripe_dax_zero_page_range(struct dm_target *ti, u64 offset,
+				      size_t len)
+{
+	int ret;
+	pgoff_t pgoff = offset >> PAGE_SHIFT;
+	unsigned page_offset = offset_in_page(offset);
+	sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+	struct stripe_c *sc = ti->private;
+	struct dax_device *dax_dev;
+	struct block_device *bdev;
+	uint32_t stripe;
+
+	stripe_map_sector(sc, sector, &stripe, &dev_sector);
+	dev_sector += sc->stripe[stripe].physical_start;
+	dax_dev = sc->stripe[stripe].dev->dax_dev;
+	bdev = sc->stripe[stripe].dev->bdev;
+
+	ret = bdev_dax_pgoff(bdev, dev_sector, ALIGN(len, PAGE_SIZE), &pgoff);
+	if (ret)
+		return ret;
+	return dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + page_offset,
+				   len);
+}
+
 #else
 #define stripe_dax_direct_access NULL
 #define stripe_dax_copy_from_iter NULL
 #define stripe_dax_copy_to_iter NULL
+#define stripe_dax_zero_page_range NULL
 #endif
 
 /*
@@ -486,6 +511,7 @@ static struct target_type stripe_target = {
 	.direct_access = stripe_dax_direct_access,
 	.dax_copy_from_iter = stripe_dax_copy_from_iter,
 	.dax_copy_to_iter = stripe_dax_copy_to_iter,
+	.dax_zero_page_range = stripe_dax_zero_page_range,
 };
 
 int __init dm_stripe_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e8f9661a10a1..4d2a6eadd901 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1198,6 +1198,36 @@ static size_t dm_dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 	return ret;
 }
 
+static int dm_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
+				  size_t len)
+{
+	struct mapped_device *md = dax_get_private(dax_dev);
+	pgoff_t pgoff = offset >> PAGE_SHIFT;
+	sector_t sector = pgoff * PAGE_SECTORS;
+	struct dm_target *ti;
+	int ret = -EIO;
+	int srcu_idx;
+
+	ti = dm_dax_get_live_target(md, sector, &srcu_idx);
+
+	if (!ti)
+		goto out;
+	if (WARN_ON(!ti->type->dax_zero_page_range)) {
+		/*
+		 * ->zero_page_range() is mandatory dax operation. If we are
+		 *  here, something is wrong.
+		 */
+		dm_put_live_table(md, srcu_idx);
+		goto out;
+	}
+	ret = ti->type->dax_zero_page_range(ti, offset, len);
+
+ out:
+	dm_put_live_table(md, srcu_idx);
+
+	return ret;
+}
+
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
  * allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_RESET,
@@ -3194,6 +3224,7 @@ static const struct dax_operations dm_dax_ops = {
 	.dax_supported = dm_dax_supported,
 	.copy_from_iter = dm_dax_copy_from_iter,
 	.copy_to_iter = dm_dax_copy_to_iter,
+	.zero_page_range = dm_dax_zero_page_range,
 };
 
 /*
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 475668c69dbc..b4ef5b07be74 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -141,6 +141,8 @@ typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff,
 		long nr_pages, void **kaddr, pfn_t *pfn);
 typedef size_t (*dm_dax_copy_iter_fn)(struct dm_target *ti, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i);
+typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, u64 offset,
+		size_t len);
 #define PAGE_SECTORS (PAGE_SIZE / 512)
 
 void dm_error(const char *message);
@@ -195,6 +197,7 @@ struct target_type {
 	dm_dax_direct_access_fn direct_access;
 	dm_dax_copy_iter_fn dax_copy_from_iter;
 	dm_dax_copy_iter_fn dax_copy_to_iter;
+	dm_dax_zero_page_range_fn dax_zero_page_range;
 
 	/* For internal device-mapper use. */
 	struct list_head list;
-- 
2.20.1



* [PATCH v3 6/7] dax,iomap: Start using dax native zero_page_range()
  2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
                   ` (4 preceding siblings ...)
  2020-02-07 20:26 ` [PATCH v3 5/7] dm,dax: Add dax zero_page_range operation Vivek Goyal
@ 2020-02-07 20:26 ` Vivek Goyal
  2020-02-17 13:26   ` Christoph Hellwig
  2020-02-07 20:26 ` [PATCH v3 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range Vivek Goyal
  2020-02-14 12:57 ` [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
  7 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-02-07 20:26 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
  Cc: dm-devel, vishal.l.verma, vgoyal

Stop calling the block device interface for zeroing in the iomap dax
zeroing path and use the dax native zeroing interface instead.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/dax.c | 45 +++++++++------------------------------------
 1 file changed, 9 insertions(+), 36 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 1f1f0201cad1..6757e12b86b2 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1044,48 +1044,21 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
 	return ret;
 }
 
-static bool dax_range_is_aligned(struct block_device *bdev,
-				 unsigned int offset, unsigned int length)
-{
-	unsigned short sector_size = bdev_logical_block_size(bdev);
-
-	if (!IS_ALIGNED(offset, sector_size))
-		return false;
-	if (!IS_ALIGNED(length, sector_size))
-		return false;
-
-	return true;
-}
-
 int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
 		unsigned int offset, unsigned int size)
 {
-	if (dax_range_is_aligned(bdev, offset, size)) {
-		sector_t start_sector = sector + (offset >> 9);
-
-		return blkdev_issue_zeroout(bdev, start_sector,
-				size >> 9, GFP_NOFS, 0);
-	} else {
-		pgoff_t pgoff;
-		long rc, id;
-		void *kaddr;
+	pgoff_t pgoff;
+	long rc, id;
 
-		rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
-		if (rc)
-			return rc;
+	rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
+	if (rc)
+		return rc;
 
-		id = dax_read_lock();
-		rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
-		if (rc < 0) {
-			dax_read_unlock(id);
-			return rc;
-		}
-		memset(kaddr + offset, 0, size);
-		dax_flush(dax_dev, kaddr + offset, size);
-		dax_read_unlock(id);
-	}
-	return 0;
+	id = dax_read_lock();
+	rc = dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + offset, size);
+	dax_read_unlock(id);
+	return rc;
 }
 EXPORT_SYMBOL_GPL(__dax_zero_page_range);
 
-- 
2.20.1



* [PATCH v3 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range
  2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
                   ` (5 preceding siblings ...)
  2020-02-07 20:26 ` [PATCH v3 6/7] dax,iomap: Start using dax native zero_page_range() Vivek Goyal
@ 2020-02-07 20:26 ` Vivek Goyal
  2020-02-17 13:27   ` Christoph Hellwig
  2020-02-14 12:57 ` [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
  7 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-02-07 20:26 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
  Cc: dm-devel, vishal.l.verma, vgoyal

Add a helper dax_iomap_zero() to zero a range. This patch basically
merges __dax_zero_page_range() and iomap_dax_zero().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/dax.c               | 12 ++++++------
 fs/iomap/buffered-io.c |  9 +--------
 include/linux/dax.h    | 17 +++--------------
 3 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6757e12b86b2..f6c4788ba764 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1044,23 +1044,23 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
 	return ret;
 }
 
-int __dax_zero_page_range(struct block_device *bdev,
-		struct dax_device *dax_dev, sector_t sector,
-		unsigned int offset, unsigned int size)
+int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
+		   struct iomap *iomap)
 {
 	pgoff_t pgoff;
 	long rc, id;
+	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
 
-	rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
+	rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
 	if (rc)
 		return rc;
 
 	id = dax_read_lock();
-	rc = dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + offset, size);
+	rc = dax_zero_page_range(iomap->dax_dev, (pgoff << PAGE_SHIFT) + offset,
+				 size);
 	dax_read_unlock(id);
 	return rc;
 }
-EXPORT_SYMBOL_GPL(__dax_zero_page_range);
 
 static loff_t
 dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 828444e14d09..5a5d784a110e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -974,13 +974,6 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }
 
-static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
-		struct iomap *iomap)
-{
-	return __dax_zero_page_range(iomap->bdev, iomap->dax_dev,
-			iomap_sector(iomap, pos & PAGE_MASK), offset, bytes);
-}
-
 static loff_t
 iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		void *data, struct iomap *iomap, struct iomap *srcmap)
@@ -1000,7 +993,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		bytes = min_t(loff_t, PAGE_SIZE - offset, count);
 
 		if (IS_DAX(inode))
-			status = iomap_dax_zero(pos, offset, bytes, iomap);
+			status = dax_iomap_zero(pos, offset, bytes, iomap);
 		else
 			status = iomap_zero(inode, pos, offset, bytes, iomap,
 					srcmap);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index a555f0aeb7bd..31d0e6fc3023 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -13,6 +13,7 @@
 typedef unsigned long dax_entry_t;
 
 struct iomap_ops;
+struct iomap;
 struct dax_device;
 struct dax_operations {
 	/*
@@ -223,20 +224,8 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
 int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
-
-#ifdef CONFIG_FS_DAX
-int __dax_zero_page_range(struct block_device *bdev,
-		struct dax_device *dax_dev, sector_t sector,
-		unsigned int offset, unsigned int length);
-#else
-static inline int __dax_zero_page_range(struct block_device *bdev,
-		struct dax_device *dax_dev, sector_t sector,
-		unsigned int offset, unsigned int length)
-{
-	return -ENXIO;
-}
-#endif
-
+int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
+			struct iomap *iomap);
 static inline bool dax_mapping(struct address_space *mapping)
 {
 	return mapping->host && IS_DAX(mapping->host);
-- 
2.20.1



* Re: [PATCH v3 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
  2020-02-07 20:26 ` [PATCH v3 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver Vivek Goyal
@ 2020-02-10 20:53   ` Gerald Schaefer
  2020-02-11 15:11     ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Gerald Schaefer @ 2020-02-10 20:53 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, dm-devel,
	vishal.l.verma, linux-s390

On Fri,  7 Feb 2020 15:26:49 -0500
Vivek Goyal <vgoyal@redhat.com> wrote:

> Add dax operation zero_page_range for dcssblk driver.
> 
> CC: linux-s390@vger.kernel.org
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  drivers/s390/block/dcssblk.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index 63502ca537eb..331abab5d066 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -57,11 +57,28 @@ static size_t dcssblk_dax_copy_to_iter(struct dax_device *dax_dev,
>  	return copy_to_iter(addr, bytes, i);
>  }
>  
> +static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
> +				       size_t len)
> +{
> +	long rc;
> +	void *kaddr;
> +	pgoff_t pgoff = offset >> PAGE_SHIFT;
> +	unsigned page_offset = offset_in_page(offset);
> +
> +	rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);

Why do you pass only 1 page as nr_pages argument for dax_direct_access()?
In some other patch in this series there is a comment that this will
currently only be used for one page, but support for more pages might be
added later. Wouldn't it make sense to rather use something like
PAGE_ALIGN(page_offset + len) >> PAGE_SHIFT instead of 1 here, so that
this won't have to be changed when callers will be ready to use it
with more than one page?

Of course, I guess then we'd also need some check on the return value
from dax_direct_access(), i.e. if the returned available range is
large enough for the requested range.
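
A minimal user-space sketch of the page-count calculation suggested above, together with the kind of check on the dax_direct_access() return value it would need. It assumes 4 KiB pages and redefines PAGE_SHIFT, PAGE_SIZE and PAGE_ALIGN locally; pages_spanned() and range_is_available() are hypothetical helper names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel takes these from the architecture. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Hypothetical helper: number of pages touched by [offset, offset + len). */
static uint64_t pages_spanned(uint64_t offset, size_t len)
{
	/* Same value offset_in_page() yields in the kernel. */
	uint64_t page_offset = offset & (PAGE_SIZE - 1);

	assert(len > 0);
	return PAGE_ALIGN(page_offset + len) >> PAGE_SHIFT;
}

/*
 * Hypothetical check on the value returned by dax_direct_access():
 * a negative value is an error, otherwise it is the number of pages
 * accessible starting at pgoff.
 */
static int range_is_available(long avail_pages, uint64_t offset, size_t len)
{
	if (avail_pages < 0)
		return 0;
	return (uint64_t)avail_pages >= pages_spanned(offset, len);
}
```

With the current callers the range never crosses a page boundary, so pages_spanned() would always evaluate to 1, which matches the hard-coded value in the patch.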



* Re: [PATCH v3 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
  2020-02-10 20:53   ` Gerald Schaefer
@ 2020-02-11 15:11     ` Vivek Goyal
  2020-02-11 15:49       ` Gerald Schaefer
  0 siblings, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2020-02-11 15:11 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, dm-devel,
	vishal.l.verma, linux-s390

On Mon, Feb 10, 2020 at 09:53:15PM +0100, Gerald Schaefer wrote:
> On Fri,  7 Feb 2020 15:26:49 -0500
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > Add dax operation zero_page_range for dcssblk driver.
> > 
> > CC: linux-s390@vger.kernel.org
> > Suggested-by: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  drivers/s390/block/dcssblk.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> > index 63502ca537eb..331abab5d066 100644
> > --- a/drivers/s390/block/dcssblk.c
> > +++ b/drivers/s390/block/dcssblk.c
> > @@ -57,11 +57,28 @@ static size_t dcssblk_dax_copy_to_iter(struct dax_device *dax_dev,
> >  	return copy_to_iter(addr, bytes, i);
> >  }
> >  
> > +static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
> > +				       size_t len)
> > +{
> > +	long rc;
> > +	void *kaddr;
> > +	pgoff_t pgoff = offset >> PAGE_SHIFT;
> > +	unsigned page_offset = offset_in_page(offset);
> > +
> > +	rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
> 
> Why do you pass only 1 page as nr_pages argument for dax_direct_access()?
> In some other patch in this series there is a comment that this will
> currently only be used for one page, but support for more pages might be
> added later. Wouldn't it make sense to rather use something like
> PAGE_ALIGN(page_offset + len) >> PAGE_SHIFT instead of 1 here, so that
> this won't have to be changed when callers will be ready to use it
> with more than one page?
> 
> Of course, I guess then we'd also need some check on the return value
> from dax_direct_access(), i.e. if the returned available range is
> large enough for the requested range.

I left it at 1 page because that's the current limitation of this
interface and there are no callers which are zeroing across page
boundaries.

I prefer to keep it this way and modify it when we are extending this
interface to allow zeroing across page boundaries. Because even if I add
that logic, I can't test it.

But if you still prefer to change it, I am open to making that change.

Thanks
Vivek



* Re: [PATCH v3 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
  2020-02-11 15:11     ` Vivek Goyal
@ 2020-02-11 15:49       ` Gerald Schaefer
  0 siblings, 0 replies; 20+ messages in thread
From: Gerald Schaefer @ 2020-02-11 15:49 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, dm-devel,
	vishal.l.verma, linux-s390

On Tue, 11 Feb 2020 10:11:14 -0500
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Mon, Feb 10, 2020 at 09:53:15PM +0100, Gerald Schaefer wrote:
> > On Fri,  7 Feb 2020 15:26:49 -0500
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > Add dax operation zero_page_range for dcssblk driver.
> > > 
> > > CC: linux-s390@vger.kernel.org
> > > Suggested-by: Christoph Hellwig <hch@infradead.org>
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > ---
> > >  drivers/s390/block/dcssblk.c | 17 +++++++++++++++++
> > >  1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> > > index 63502ca537eb..331abab5d066 100644
> > > --- a/drivers/s390/block/dcssblk.c
> > > +++ b/drivers/s390/block/dcssblk.c
> > > @@ -57,11 +57,28 @@ static size_t dcssblk_dax_copy_to_iter(struct dax_device *dax_dev,
> > >  	return copy_to_iter(addr, bytes, i);
> > >  }
> > >  
> > > +static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
> > > +				       size_t len)
> > > +{
> > > +	long rc;
> > > +	void *kaddr;
> > > +	pgoff_t pgoff = offset >> PAGE_SHIFT;
> > > +	unsigned page_offset = offset_in_page(offset);
> > > +
> > > +	rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
> > 
> > Why do you pass only 1 page as nr_pages argument for dax_direct_access()?
> > In some other patch in this series there is a comment that this will
> > currently only be used for one page, but support for more pages might be
> > added later. Wouldn't it make sense to rather use something like
> > PAGE_ALIGN(page_offset + len) >> PAGE_SHIFT instead of 1 here, so that
> > this won't have to be changed when callers will be ready to use it
> > with more than one page?
> > 
> > Of course, I guess then we'd also need some check on the return value
> > from dax_direct_access(), i.e. if the returned available range is
> > large enough for the requested range.
> 
> I left it at 1 page because that's the current limitation of this
> interface and there are no callers which are zeroing across page
> boundaries.
> 
> I prefer to keep it this way and modify it when we are extending this
> interface to allow zeroing across page boundaries. Because even if I add
> that logic, I can't test it.

OK, fine with me.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>



* Re: [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory
  2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
                   ` (6 preceding siblings ...)
  2020-02-07 20:26 ` [PATCH v3 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range Vivek Goyal
@ 2020-02-14 12:57 ` Vivek Goyal
  7 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2020-02-14 12:57 UTC (permalink / raw)
  To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams; +Cc: dm-devel, vishal.l.verma

On Fri, Feb 07, 2020 at 03:26:45PM -0500, Vivek Goyal wrote:
> Hi,
> 
> This is V3 of patches. I have dropped RFC tag from the series as it
> looks like there is agreement on the interface. These patches are also
> available at.

Hi Dan, Christoph,

Ping for this patch series. How does it look? Do you have concerns?
If not, it would be good to get this merged.

Thanks
Vivek

> 
> https://github.com/rhvgoyal/linux/commits/dax-zero-range-v3
> 
> I posted previous versions here.
> 
> v2:
> https://lore.kernel.org/linux-fsdevel/20200203200029.4592-1-vgoyal@redhat.com/
> v1:
> https://lore.kernel.org/linux-fsdevel/20200123165249.GA7664@redhat.com/
> 
> Changes since V2:
> 
> Primarily took care of comments from Christoph.
> 
> - Changed zero_copy_range() parameters to pass dax device offset as u64.
> - Fixed comment which says current interface only supports zeroing
>   with-in page.
> - Refactored pmem_do_bvec() and reused write side of code in
>   zero_page_range().
> - Removed generic_dax_zero_page_range()
> - Fixed s390 dcssblk.c compilation issue.
> 
> Please review. 
> 
> Thanks
> Vivek
> 
> Vivek Goyal (7):
>   pmem: Add functions for reading/writing page to/from pmem
>   pmem: Enable pmem_do_write() to deal with arbitrary ranges
>   dax, pmem: Add a dax operation zero_page_range
>   s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
>   dm,dax: Add dax zero_page_range operation
>   dax,iomap: Start using dax native zero_page_range()
>   dax,iomap: Add helper dax_iomap_zero() to zero a range
> 
>  drivers/dax/super.c           |  19 ++++++
>  drivers/md/dm-linear.c        |  21 +++++++
>  drivers/md/dm-log-writes.c    |  19 ++++++
>  drivers/md/dm-stripe.c        |  26 ++++++++
>  drivers/md/dm.c               |  31 ++++++++++
>  drivers/nvdimm/pmem.c         | 112 ++++++++++++++++++++++++----------
>  drivers/s390/block/dcssblk.c  |  17 ++++++
>  fs/dax.c                      |  53 ++++------------
>  fs/iomap/buffered-io.c        |   9 +--
>  include/linux/dax.h           |  20 ++----
>  include/linux/device-mapper.h |   3 +
>  11 files changed, 235 insertions(+), 95 deletions(-)
> 
> -- 
> 2.20.1
> 



* Re: [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem
  2020-02-07 20:26 ` [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
@ 2020-02-17 13:21   ` Christoph Hellwig
  2020-02-17 18:04     ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2020-02-17 13:21 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, dm-devel,
	vishal.l.verma

On Fri, Feb 07, 2020 at 03:26:46PM -0500, Vivek Goyal wrote:
> +static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
> +			unsigned int len, unsigned int off, unsigned int op,
> +			sector_t sector)
> +{
> +	if (!op_is_write(op))
> +		return pmem_do_read(pmem, page, off, sector, len);
> +
> +	return pmem_do_write(pmem, page, off, sector, len);

Why not:

	if (op_is_write(op))
		return pmem_do_write(pmem, page, off, sector, len);
	return pmem_do_read(pmem, page, off, sector, len);

that being said I don't see the point of this pmem_do_bvec helper given
that it only has two callers.

The rest looks good to me.



* Re: [PATCH v3 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges
  2020-02-07 20:26 ` [PATCH v3 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
@ 2020-02-17 13:23   ` Christoph Hellwig
  2020-02-17 14:59     ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2020-02-17 13:23 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, dm-devel,
	vishal.l.verma

On Fri, Feb 07, 2020 at 03:26:47PM -0500, Vivek Goyal wrote:
> Currently pmem_do_write() is written with assumption that all I/O is
> sector aligned. Soon I want to use this function in zero_page_range()
> where range passed in does not have to be sector aligned.
> 
> Modify this function to be able to deal with an arbitrary range. Which
> is specified by pmem_off and len.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  drivers/nvdimm/pmem.c | 30 ++++++++++++++++++++++--------
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 9ad07cb8c9fc..281fe04d25fd 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -154,15 +154,23 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
>  
>  static blk_status_t pmem_do_write(struct pmem_device *pmem,
>  			struct page *page, unsigned int page_off,
> -			sector_t sector, unsigned int len)
> +			u64 pmem_off, unsigned int len)
>  {
>  	blk_status_t rc = BLK_STS_OK;
>  	bool bad_pmem = false;
> -	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> -	void *pmem_addr = pmem->virt_addr + pmem_off;
> -
> -	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
> -		bad_pmem = true;
> +	phys_addr_t pmem_real_off = pmem_off + pmem->data_offset;
> +	void *pmem_addr = pmem->virt_addr + pmem_real_off;
> +	sector_t sector_start, sector_end;
> +	unsigned nr_sectors;
> +
> +	sector_start = DIV_ROUND_UP(pmem_off, SECTOR_SIZE);
> +	sector_end = (pmem_off + len) >> SECTOR_SHIFT;
> +	if (sector_end > sector_start) {
> +		nr_sectors = sector_end - sector_start;
> +		if (unlikely(is_bad_pmem(&pmem->bb, sector_start,
> +					 nr_sectors << SECTOR_SHIFT)))
> +			bad_pmem = true;

I don't think an unlikely annotation makes much sense for assigning
a boolean value to a flag variable.

> +		/*
> +		 * Pass sector aligned offset and length. That seems
> +		 * to work as of now. Other finer grained alignment
> +		 * cases can be addressed later if need be.
> +		 */

This comment seems pretty scary.  What other cases can you think of?


* Re: [PATCH v3 3/7] dax, pmem: Add a dax operation zero_page_range
  2020-02-07 20:26 ` [PATCH v3 3/7] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
@ 2020-02-17 13:26   ` Christoph Hellwig
  2020-02-17 18:08     ` Vivek Goyal
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2020-02-17 13:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, dm-devel,
	vishal.l.verma

> +	int rc;
> +	struct pmem_device *pmem = dax_get_private(dax_dev);
> +	struct page *page = ZERO_PAGE(0);

Nit: I tend to find code easier to read if variable declarations
with assignments are above those without.

Also I don't think we need the page variable here.

> +	rc = pmem_do_write(pmem, page, 0, offset, len);
> +	if (rc > 0)
> +		return -EIO;

pmem_do_write returns a blk_status_t, so the type of rc and the > check
seem odd.  But I think pmem_do_write (and pmem_do_read) might be better
off returning a normal errno anyway.


* Re: [PATCH v3 6/7] dax,iomap: Start using dax native zero_page_range()
  2020-02-07 20:26 ` [PATCH v3 6/7] dax,iomap: Start using dax native zero_page_range() Vivek Goyal
@ 2020-02-17 13:26   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2020-02-17 13:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, dm-devel,
	vishal.l.verma

On Fri, Feb 07, 2020 at 03:26:51PM -0500, Vivek Goyal wrote:
> Get rid of calling block device interface for zeroing in iomap dax
> zeroing path and use dax native zeroing interface instead.
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v3 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range
  2020-02-07 20:26 ` [PATCH v3 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range Vivek Goyal
@ 2020-02-17 13:27   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2020-02-17 13:27 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, dm-devel,
	vishal.l.verma

On Fri, Feb 07, 2020 at 03:26:52PM -0500, Vivek Goyal wrote:
> Add a helper dax_iomap_zero() to zero a range. This patch basically
> merges __dax_zero_page_range() and iomap_dax_zero().
> 
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/dax.c               | 12 ++++++------
>  fs/iomap/buffered-io.c |  9 +--------
>  include/linux/dax.h    | 17 +++--------------
>  3 files changed, 10 insertions(+), 28 deletions(-)

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v3 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges
  2020-02-17 13:23   ` Christoph Hellwig
@ 2020-02-17 14:59     ` Vivek Goyal
  0 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2020-02-17 14:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-fsdevel, linux-nvdimm, dan.j.williams, dm-devel, vishal.l.verma

On Mon, Feb 17, 2020 at 05:23:09AM -0800, Christoph Hellwig wrote:
> On Fri, Feb 07, 2020 at 03:26:47PM -0500, Vivek Goyal wrote:
> > Currently pmem_do_write() is written with assumption that all I/O is
> > sector aligned. Soon I want to use this function in zero_page_range()
> > where range passed in does not have to be sector aligned.
> > 
> > Modify this function to be able to deal with an arbitrary range. Which
> > is specified by pmem_off and len.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  drivers/nvdimm/pmem.c | 30 ++++++++++++++++++++++--------
> >  1 file changed, 22 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index 9ad07cb8c9fc..281fe04d25fd 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -154,15 +154,23 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
> >  
> >  static blk_status_t pmem_do_write(struct pmem_device *pmem,
> >  			struct page *page, unsigned int page_off,
> > -			sector_t sector, unsigned int len)
> > +			u64 pmem_off, unsigned int len)
> >  {
> >  	blk_status_t rc = BLK_STS_OK;
> >  	bool bad_pmem = false;
> > -	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> > -	void *pmem_addr = pmem->virt_addr + pmem_off;
> > -
> > -	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
> > -		bad_pmem = true;
> > +	phys_addr_t pmem_real_off = pmem_off + pmem->data_offset;
> > +	void *pmem_addr = pmem->virt_addr + pmem_real_off;
> > +	sector_t sector_start, sector_end;
> > +	unsigned nr_sectors;
> > +
> > +	sector_start = DIV_ROUND_UP(pmem_off, SECTOR_SIZE);
> > +	sector_end = (pmem_off + len) >> SECTOR_SHIFT;
> > +	if (sector_end > sector_start) {
> > +		nr_sectors = sector_end - sector_start;
> > +		if (unlikely(is_bad_pmem(&pmem->bb, sector_start,
> > +					 nr_sectors << SECTOR_SHIFT)))
> > +			bad_pmem = true;
> 
> I don't think an unlikely annotation makes much sense for assigning
> a boolean value to a flag variable.

OK, will get rid of this unlikely() instance.

> 
> > +		/*
> > +		 * Pass sector aligned offset and length. That seems
> > +		 * to work as of now. Other finer grained alignment
> > +		 * cases can be addressed later if need be.
> > +		 */
> 
> This comment seems pretty scary.  What other cases can you think of?

Currently firmware seems to have restrictions on alignment of size and
offset of poisoned memory being cleared.

drivers/nvdimm/bus.c

nvdimm_clear_poison()
{
...
	clear_err_unit = ars_cap.clear_err_unit;
	mask = clear_err_unit - 1;
	if ((phys | len) & mask)
		return -ENXIO;
...
}

On the system I was testing, clear_err_unit is 256. If I pass in offset
and len values which are not aligned to 256, I get errors.
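
A minimal user-space sketch of the alignment test in the excerpt above. The function name clear_range_is_aligned() is hypothetical, and the power-of-two assertion is an assumption this sketch makes so that the mask arithmetic is valid.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when both phys and len are multiples of
 * clear_err_unit, mirroring the ((phys | len) & mask) test quoted above.
 */
static int clear_range_is_aligned(uint64_t phys, uint64_t len,
				  uint64_t clear_err_unit)
{
	uint64_t mask;

	/* The mask trick is only valid for a non-zero power-of-two unit. */
	assert(clear_err_unit != 0 &&
	       (clear_err_unit & (clear_err_unit - 1)) == 0);

	mask = clear_err_unit - 1;
	/* Any low bit set in either value means a misaligned request. */
	return ((phys | len) & mask) == 0;
}
```

With a unit of 256, a request at offset 100 is rejected in this model, while any 512-byte sector-aligned request passes because 512 is a multiple of 256.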

So if a caller passes in an arbitrary offset and length, I clear poison
only on the part of the range which is aligned to 1 << SECTOR_SHIFT. Any
unaligned portion left over at the beginning or at the end of the range
does not get its poison cleared.
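
A minimal user-space sketch of the rounding described above, using the same arithmetic as the patch (round the start up and the end down to a sector boundary). It assumes 512-byte sectors; struct sector_range and covered_sectors() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 512-byte sectors, as in the kernel's SECTOR_SHIFT. */
#define SECTOR_SHIFT 9
#define SECTOR_SIZE  ((uint64_t)1 << SECTOR_SHIFT)

struct sector_range {
	uint64_t start;      /* first sector fully inside the byte range */
	uint64_t nr_sectors; /* 0 when no full sector is covered */
};

/* Hypothetical helper: sectors fully covered by [pmem_off, pmem_off + len). */
static struct sector_range covered_sectors(uint64_t pmem_off, uint64_t len)
{
	struct sector_range r = { 0, 0 };
	uint64_t sector_start, sector_end;

	/* Guard against wrap-around in this sketch. */
	assert(pmem_off + len >= pmem_off);

	/* Round the start up (DIV_ROUND_UP) and the end down. */
	sector_start = (pmem_off + SECTOR_SIZE - 1) >> SECTOR_SHIFT;
	sector_end = (pmem_off + len) >> SECTOR_SHIFT;

	r.start = sector_start;
	if (sector_end > sector_start)
		r.nr_sectors = sector_end - sector_start;
	return r;
}
```

For example, a 1024-byte range starting at byte 100 fully covers only sector 1, so in this model poison would be checked and cleared for that one sector and the partial sectors on either side would be left alone.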

Current code also clears poison on sector boundaries only. One can go
the extra mile and query "clear_err_unit" and, if it is less than
SECTOR_SIZE, possibly clear the poison on a range of memory which
is not sector aligned but is clear_err_unit aligned.

But this retains existing functionality and is not a regression w.r.t.
what we are already doing. Querying "clear_err_unit" and acting accordingly
is an improvement if one needs it.

Hence, I don't think this is something to be concerned about.

Thanks
Vivek



* Re: [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem
  2020-02-17 13:21   ` Christoph Hellwig
@ 2020-02-17 18:04     ` Vivek Goyal
  0 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-fsdevel, linux-nvdimm, dan.j.williams, dm-devel, vishal.l.verma

On Mon, Feb 17, 2020 at 05:21:38AM -0800, Christoph Hellwig wrote:
> On Fri, Feb 07, 2020 at 03:26:46PM -0500, Vivek Goyal wrote:
> > +static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
> > +			unsigned int len, unsigned int off, unsigned int op,
> > +			sector_t sector)
> > +{
> > +	if (!op_is_write(op))
> > +		return pmem_do_read(pmem, page, off, sector, len);
> > +
> > +	return pmem_do_write(pmem, page, off, sector, len);
> 
> Why not:
> 
> 	if (op_is_write(op))
> 		return pmem_do_write(pmem, page, off, sector, len);
> 	return pmem_do_read(pmem, page, off, sector, len);
> 
> that being said I don't see the point of this pmem_do_bvec helper given
> that it only has two callers.

OK, I am about to post V4 of the patches. I got rid of pmem_do_bvec(), and
callers now call pmem_do_read()/pmem_do_write() directly.

Vivek



* Re: [PATCH v3 3/7] dax, pmem: Add a dax operation zero_page_range
  2020-02-17 13:26   ` Christoph Hellwig
@ 2020-02-17 18:08     ` Vivek Goyal
  0 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-fsdevel, linux-nvdimm, dan.j.williams, dm-devel, vishal.l.verma

On Mon, Feb 17, 2020 at 05:26:07AM -0800, Christoph Hellwig wrote:
> > +	int rc;
> > +	struct pmem_device *pmem = dax_get_private(dax_dev);
> > +	struct page *page = ZERO_PAGE(0);
> 
> Nit: I tend to find code easier to read if variable declarations
> with assignments are above those without.

Fixed in V4. 

> 
> Also I don't think we need the page variable here.

Fixed in V4.

> 
> > +	rc = pmem_do_write(pmem, page, 0, offset, len);
> > +	if (rc > 0)
> > +		return -EIO;
> 
> pmem_do_write returns a blk_status_t, so the type of rc and the > check
> seem odd.  But I think pmem_do_write (and pmem_do_read) might be better
> off returning a normal errno anyway.

Now I am using blk_status_to_errno() to convert the error in V4.

        rc = pmem_do_write(pmem, ZERO_PAGE(0), 0, offset, len);
        return blk_status_to_errno(rc);

I did not modify pmem_do_read()/pmem_do_write() to return errno, as there
is still one caller which expects to return blk_status_t, and then that
caller would have to do the conversion.

Having said that, it probably is a good idea to clean up the functions called
by pmem_do_read()/pmem_do_write() to return errno. I prefer not to take on
that work in this patch series, as it seems like a nice-to-have thing
and can be handled in a separate patch series.

Thanks
Vivek



Thread overview: 20+ messages
2020-02-07 20:26 [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
2020-02-07 20:26 ` [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
2020-02-17 13:21   ` Christoph Hellwig
2020-02-17 18:04     ` Vivek Goyal
2020-02-07 20:26 ` [PATCH v3 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
2020-02-17 13:23   ` Christoph Hellwig
2020-02-17 14:59     ` Vivek Goyal
2020-02-07 20:26 ` [PATCH v3 3/7] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
2020-02-17 13:26   ` Christoph Hellwig
2020-02-17 18:08     ` Vivek Goyal
2020-02-07 20:26 ` [PATCH v3 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver Vivek Goyal
2020-02-10 20:53   ` Gerald Schaefer
2020-02-11 15:11     ` Vivek Goyal
2020-02-11 15:49       ` Gerald Schaefer
2020-02-07 20:26 ` [PATCH v3 5/7] dm,dax: Add dax zero_page_range operation Vivek Goyal
2020-02-07 20:26 ` [PATCH v3 6/7] dax,iomap: Start using dax native zero_page_range() Vivek Goyal
2020-02-17 13:26   ` Christoph Hellwig
2020-02-07 20:26 ` [PATCH v3 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range Vivek Goyal
2020-02-17 13:27   ` Christoph Hellwig
2020-02-14 12:57 ` [PATCH v3 0/7] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
