* [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory
@ 2020-02-17 18:16 Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
` (6 more replies)
0 siblings, 7 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:16 UTC (permalink / raw)
To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
Cc: dm-devel, vishal.l.verma, vgoyal
Hi,
This is v4 of the patches. They are also available at:
https://github.com/rhvgoyal/linux/commits/dax-zero-range-v4
Changes since v3:
- Rebased patches on top of 5.6-rc1
- Took care of some of the comments from Christoph.
- Captured Reviewed-by tags in some of the patches.
Previous versions of the patches are here:
v3:
https://lore.kernel.org/linux-fsdevel/20200214125717.GA18654@redhat.com/T/#t
v2:
https://lore.kernel.org/linux-fsdevel/20200203200029.4592-1-vgoyal@redhat.com/
v1:
https://lore.kernel.org/linux-fsdevel/20200123165249.GA7664@redhat.com/
Thanks
Vivek
Vivek Goyal (7):
pmem: Add functions for reading/writing page to/from pmem
pmem: Enable pmem_do_write() to deal with arbitrary ranges
dax, pmem: Add a dax operation zero_page_range
s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
dm,dax: Add dax zero_page_range operation
dax,iomap: Start using dax native zero_page_range()
dax,iomap: Add helper dax_iomap_zero() to zero a range
drivers/dax/super.c | 19 ++++++
drivers/md/dm-linear.c | 21 +++++++
drivers/md/dm-log-writes.c | 19 ++++++
drivers/md/dm-stripe.c | 26 ++++++++
drivers/md/dm.c | 31 +++++++++
drivers/nvdimm/pmem.c | 115 +++++++++++++++++++++++-----------
drivers/s390/block/dcssblk.c | 17 +++++
fs/dax.c | 53 ++++------------
fs/iomap/buffered-io.c | 9 +--
include/linux/dax.h | 20 ++----
include/linux/device-mapper.h | 3 +
11 files changed, 233 insertions(+), 100 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 12+ messages in thread
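The shape of the interface this series adds can be sketched in userspace C. This is a simplified model under assumed names (`dax_ops`, `dax_dev`, `toy_zero_page_range` are illustrative, not the kernel definitions): the dax operations vtable gains a `zero_page_range` hook, and a generic wrapper dispatches to it, failing with -EOPNOTSUPP when a driver does not implement the hook.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch of the design this series adds: the dax operations
 * vtable gains a zero_page_range hook and a generic wrapper dispatches
 * to it. Names mirror the kernel API loosely; this is a model, not the
 * kernel code.
 */
struct dax_ops {
	int (*zero_page_range)(void *priv, unsigned long long offset,
			       size_t len);
};

struct dax_dev {
	const struct dax_ops *ops;
	void *priv;
};

/* Generic wrapper: drivers that lack the hook get -EOPNOTSUPP. */
static int dax_zero_page_range(struct dax_dev *dev,
			       unsigned long long offset, size_t len)
{
	if (!dev->ops->zero_page_range)
		return -EOPNOTSUPP;
	return dev->ops->zero_page_range(dev->priv, offset, len);
}

/* Toy "driver": zero the backing buffer directly, as pmem would. */
static int toy_zero_page_range(void *priv, unsigned long long offset,
			       size_t len)
{
	memset((char *)priv + offset, 0, len);
	return 0;
}
```

In the series itself, pmem, dcssblk and the dm targets each supply their own hook, and the iomap zeroing path calls the generic wrapper.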
* [PATCH v4 1/7] pmem: Add functions for reading/writing page to/from pmem
2020-02-17 18:16 [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
@ 2020-02-17 18:16 ` Vivek Goyal
2020-02-18 17:07 ` [dm-devel] " Christoph Hellwig
2020-02-17 18:16 ` [PATCH v4 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
` (5 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:16 UTC (permalink / raw)
To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
Cc: dm-devel, vishal.l.verma, vgoyal
This splits pmem_do_bvec() into pmem_do_read() and pmem_do_write().
pmem_do_write() will also be used by the pmem zero_page_range()
implementation, hence the code is shared.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
drivers/nvdimm/pmem.c | 86 +++++++++++++++++++++++++------------------
1 file changed, 50 insertions(+), 36 deletions(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4eae441f86c9..075b11682192 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -136,9 +136,25 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
return BLK_STS_OK;
}
-static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
- unsigned int len, unsigned int off, unsigned int op,
- sector_t sector)
+static blk_status_t pmem_do_read(struct pmem_device *pmem,
+ struct page *page, unsigned int page_off,
+ sector_t sector, unsigned int len)
+{
+ blk_status_t rc;
+ phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
+ void *pmem_addr = pmem->virt_addr + pmem_off;
+
+ if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
+ return BLK_STS_IOERR;
+
+ rc = read_pmem(page, page_off, pmem_addr, len);
+ flush_dcache_page(page);
+ return rc;
+}
+
+static blk_status_t pmem_do_write(struct pmem_device *pmem,
+ struct page *page, unsigned int page_off,
+ sector_t sector, unsigned int len)
{
blk_status_t rc = BLK_STS_OK;
bool bad_pmem = false;
@@ -148,34 +164,25 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
bad_pmem = true;
- if (!op_is_write(op)) {
- if (unlikely(bad_pmem))
- rc = BLK_STS_IOERR;
- else {
- rc = read_pmem(page, off, pmem_addr, len);
- flush_dcache_page(page);
- }
- } else {
- /*
- * Note that we write the data both before and after
- * clearing poison. The write before clear poison
- * handles situations where the latest written data is
- * preserved and the clear poison operation simply marks
- * the address range as valid without changing the data.
- * In this case application software can assume that an
- * interrupted write will either return the new good
- * data or an error.
- *
- * However, if pmem_clear_poison() leaves the data in an
- * indeterminate state we need to perform the write
- * after clear poison.
- */
- flush_dcache_page(page);
- write_pmem(pmem_addr, page, off, len);
- if (unlikely(bad_pmem)) {
- rc = pmem_clear_poison(pmem, pmem_off, len);
- write_pmem(pmem_addr, page, off, len);
- }
+ /*
+ * Note that we write the data both before and after
+ * clearing poison. The write before clear poison
+ * handles situations where the latest written data is
+ * preserved and the clear poison operation simply marks
+ * the address range as valid without changing the data.
+ * In this case application software can assume that an
+ * interrupted write will either return the new good
+ * data or an error.
+ *
+ * However, if pmem_clear_poison() leaves the data in an
+ * indeterminate state we need to perform the write
+ * after clear poison.
+ */
+ flush_dcache_page(page);
+ write_pmem(pmem_addr, page, page_off, len);
+ if (unlikely(bad_pmem)) {
+ rc = pmem_clear_poison(pmem, pmem_off, len);
+ write_pmem(pmem_addr, page, page_off, len);
}
return rc;
@@ -197,8 +204,12 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
- rc = pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len,
- bvec.bv_offset, bio_op(bio), iter.bi_sector);
+ if (op_is_write(bio_op(bio)))
+ rc = pmem_do_write(pmem, bvec.bv_page, bvec.bv_offset,
+ iter.bi_sector, bvec.bv_len);
+ else
+ rc = pmem_do_read(pmem, bvec.bv_page, bvec.bv_offset,
+ iter.bi_sector, bvec.bv_len);
if (rc) {
bio->bi_status = rc;
break;
@@ -223,9 +234,12 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
struct pmem_device *pmem = bdev->bd_queue->queuedata;
blk_status_t rc;
- rc = pmem_do_bvec(pmem, page, hpage_nr_pages(page) * PAGE_SIZE,
- 0, op, sector);
-
+ if (op_is_write(op))
+ rc = pmem_do_write(pmem, page, 0, sector,
+ hpage_nr_pages(page) * PAGE_SIZE);
+ else
+ rc = pmem_do_read(pmem, page, 0, sector,
+ hpage_nr_pages(page) * PAGE_SIZE);
/*
* The ->rw_page interface is subtle and tricky. The core
* retries on any error, so we can only invoke page_endio() in
--
2.20.1
* [PATCH v4 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges
2020-02-17 18:16 [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
@ 2020-02-17 18:16 ` Vivek Goyal
2020-02-18 17:09 ` [dm-devel] " Christoph Hellwig
2020-02-17 18:16 ` [PATCH v4 3/7] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
` (4 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:16 UTC (permalink / raw)
To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
Cc: dm-devel, vishal.l.verma, vgoyal
Currently pmem_do_write() is written with the assumption that all I/O
is sector aligned. Soon I want to use this function in
zero_page_range(), where the range passed in does not have to be
sector aligned.
Modify this function to deal with an arbitrary range, specified by
pmem_off and len.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
drivers/nvdimm/pmem.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 075b11682192..fae8f67da9de 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -154,15 +154,23 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
static blk_status_t pmem_do_write(struct pmem_device *pmem,
struct page *page, unsigned int page_off,
- sector_t sector, unsigned int len)
+ u64 pmem_off, unsigned int len)
{
blk_status_t rc = BLK_STS_OK;
bool bad_pmem = false;
- phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
- void *pmem_addr = pmem->virt_addr + pmem_off;
-
- if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
- bad_pmem = true;
+ phys_addr_t pmem_real_off = pmem_off + pmem->data_offset;
+ void *pmem_addr = pmem->virt_addr + pmem_real_off;
+ sector_t sector_start, sector_end;
+ unsigned nr_sectors;
+
+ sector_start = DIV_ROUND_UP(pmem_off, SECTOR_SIZE);
+ sector_end = (pmem_off + len) >> SECTOR_SHIFT;
+ if (sector_end > sector_start) {
+ nr_sectors = sector_end - sector_start;
+ if (is_bad_pmem(&pmem->bb, sector_start,
+ nr_sectors << SECTOR_SHIFT))
+ bad_pmem = true;
+ }
/*
* Note that we write the data both before and after
@@ -181,7 +189,13 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
flush_dcache_page(page);
write_pmem(pmem_addr, page, page_off, len);
if (unlikely(bad_pmem)) {
- rc = pmem_clear_poison(pmem, pmem_off, len);
+ /*
+ * Pass sector aligned offset and length. That seems
+ * to work as of now. Other finer grained alignment
+ * cases can be addressed later if need be.
+ */
+ rc = pmem_clear_poison(pmem, ALIGN(pmem_real_off, SECTOR_SIZE),
+ nr_sectors << SECTOR_SHIFT);
write_pmem(pmem_addr, page, page_off, len);
}
@@ -206,7 +220,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
bio_for_each_segment(bvec, bio, iter) {
if (op_is_write(bio_op(bio)))
rc = pmem_do_write(pmem, bvec.bv_page, bvec.bv_offset,
- iter.bi_sector, bvec.bv_len);
+ iter.bi_sector << SECTOR_SHIFT, bvec.bv_len);
else
rc = pmem_do_read(pmem, bvec.bv_page, bvec.bv_offset,
iter.bi_sector, bvec.bv_len);
@@ -235,7 +249,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
blk_status_t rc;
if (op_is_write(op))
- rc = pmem_do_write(pmem, page, 0, sector,
+ rc = pmem_do_write(pmem, page, 0, sector << SECTOR_SHIFT,
hpage_nr_pages(page) * PAGE_SIZE);
else
rc = pmem_do_read(pmem, page, 0, sector,
--
2.20.1
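The badblock-range arithmetic that patch 2 introduces in pmem_do_write() can be illustrated with a small userspace sketch. Only whole sectors fully contained in the byte range `[pmem_off, pmem_off + len)` are checked against the badblock list; a sub-sector range covers no complete sector and skips the check. The helper names here (`div_round_up`, `covered_sectors`) are illustrative, but the arithmetic matches the patch.

```c
#include <stdint.h>

#define SECTOR_SIZE  512u
#define SECTOR_SHIFT 9

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Number of complete sectors covered by [pmem_off, pmem_off + len),
 * mirroring the sector_start/sector_end computation in the patch.
 * Returns 0 for a sub-sector range; *first_sector is the first fully
 * covered sector when the return value is non-zero.
 */
static unsigned int covered_sectors(uint64_t pmem_off, unsigned int len,
				    uint64_t *first_sector)
{
	uint64_t sector_start = div_round_up(pmem_off, SECTOR_SIZE);
	uint64_t sector_end = (pmem_off + len) >> SECTOR_SHIFT;

	*first_sector = sector_start;
	if (sector_end <= sector_start)
		return 0;	/* no full sector inside the range */
	return (unsigned int)(sector_end - sector_start);
}
```

For example, a 200-byte write at offset 100 covers no full sector and skips is_bad_pmem(), while a 1000-byte write at offset 700 covers exactly one full sector (sector 2).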
* [PATCH v4 3/7] dax, pmem: Add a dax operation zero_page_range
2020-02-17 18:16 [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
@ 2020-02-17 18:16 ` Vivek Goyal
2020-02-18 17:10 ` [dm-devel] " Christoph Hellwig
2020-02-17 18:16 ` [PATCH v4 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver Vivek Goyal
` (3 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:16 UTC (permalink / raw)
To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
Cc: dm-devel, vishal.l.verma, vgoyal
Add a dax operation, zero_page_range, to zero a range of memory. This
will also clear any poison in the range being zeroed.
As of now, zeroing of up to one page is allowed in a single call, and
there are no callers which try to zero more than a page in a single
call. Once callers appear that zero more than a page in a single call,
that support can be added. The primary reason for not doing so yet is
that it would add a little complexity to the dm implementation, where a
range might span multiple underlying targets and would have to be split
into multiple sub-ranges, calling zero_page_range() on each target
individually.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
drivers/dax/super.c | 19 +++++++++++++++++++
drivers/nvdimm/pmem.c | 11 +++++++++++
include/linux/dax.h | 3 +++
3 files changed, 33 insertions(+)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 26a654dbc69a..31ee0b47b4ed 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -344,6 +344,25 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
}
EXPORT_SYMBOL_GPL(dax_copy_to_iter);
+int dax_zero_page_range(struct dax_device *dax_dev, u64 offset, size_t len)
+{
+ if (!dax_alive(dax_dev))
+ return -ENXIO;
+
+ if (!dax_dev->ops->zero_page_range)
+ return -EOPNOTSUPP;
+ /*
+ * There are no callers that want to zero across a page boundary as of
+ * now. Once users are there, this check can be removed after the
+ * device mapper code has been updated to split ranges across targets.
+ */
+ if (offset_in_page(offset) + len > PAGE_SIZE)
+ return -EIO;
+
+ return dax_dev->ops->zero_page_range(dax_dev, offset, len);
+}
+EXPORT_SYMBOL_GPL(dax_zero_page_range);
+
#ifdef CONFIG_ARCH_HAS_PMEM_API
void arch_wb_cache_pmem(void *addr, size_t size);
void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index fae8f67da9de..44f660fa56ca 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -296,6 +296,16 @@ static const struct block_device_operations pmem_fops = {
.revalidate_disk = nvdimm_revalidate_disk,
};
+static int pmem_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
+ size_t len)
+{
+ struct pmem_device *pmem = dax_get_private(dax_dev);
+ blk_status_t rc;
+
+ rc = pmem_do_write(pmem, ZERO_PAGE(0), 0, offset, len);
+ return blk_status_to_errno(rc);
+}
+
static long pmem_dax_direct_access(struct dax_device *dax_dev,
pgoff_t pgoff, long nr_pages, void **kaddr, pfn_t *pfn)
{
@@ -327,6 +337,7 @@ static const struct dax_operations pmem_dax_ops = {
.dax_supported = generic_fsdax_supported,
.copy_from_iter = pmem_copy_from_iter,
.copy_to_iter = pmem_copy_to_iter,
+ .zero_page_range = pmem_dax_zero_page_range,
};
static const struct attribute_group *pmem_attribute_groups[] = {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9bd8528bd305..a555f0aeb7bd 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -34,6 +34,8 @@ struct dax_operations {
/* copy_to_iter: required operation for fs-dax direct-i/o */
size_t (*copy_to_iter)(struct dax_device *, pgoff_t, void *, size_t,
struct iov_iter *);
+ /* zero_page_range: required operation. Zero range within a page */
+ int (*zero_page_range)(struct dax_device *, u64, size_t);
};
extern struct attribute_group dax_attribute_group;
@@ -209,6 +211,7 @@ size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
size_t bytes, struct iov_iter *i);
size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
size_t bytes, struct iov_iter *i);
+int dax_zero_page_range(struct dax_device *dax_dev, u64 offset, size_t len);
void dax_flush(struct dax_device *dax_dev, void *addr, size_t size);
ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
--
2.20.1
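The single-page restriction that dax_zero_page_range() enforces in patch 3 is a simple bound check: the in-page offset plus the length must not exceed the page size. A userspace sketch (assuming 4K pages; `offset_in_page_` and `range_within_one_page` are hypothetical names standing in for the kernel's `offset_in_page()` and the inline check):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ 4096u	/* assumed 4K pages for the example */

/* Low bits of the offset: position of the range within its page. */
static unsigned int offset_in_page_(uint64_t offset)
{
	return (unsigned int)(offset & (PAGE_SIZE_ - 1));
}

/*
 * The check in dax_zero_page_range(): the range may start anywhere but
 * must end on or before the boundary of the page it starts in.
 */
static bool range_within_one_page(uint64_t offset, size_t len)
{
	return offset_in_page_(offset) + len <= PAGE_SIZE_;
}
```

So zeroing a full page at a page-aligned offset is allowed, while the same length starting one byte in crosses a boundary and is rejected with -EIO.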
* [PATCH v4 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
2020-02-17 18:16 [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
` (2 preceding siblings ...)
2020-02-17 18:16 ` [PATCH v4 3/7] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
@ 2020-02-17 18:16 ` Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 5/7] dm,dax: Add dax zero_page_range operation Vivek Goyal
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:16 UTC (permalink / raw)
To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
Cc: dm-devel, vishal.l.verma, vgoyal, linux-s390, Gerald Schaefer
Add the dax operation zero_page_range to the dcssblk driver.
CC: linux-s390@vger.kernel.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
drivers/s390/block/dcssblk.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 63502ca537eb..331abab5d066 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -57,11 +57,28 @@ static size_t dcssblk_dax_copy_to_iter(struct dax_device *dax_dev,
return copy_to_iter(addr, bytes, i);
}
+static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
+ size_t len)
+{
+ long rc;
+ void *kaddr;
+ pgoff_t pgoff = offset >> PAGE_SHIFT;
+ unsigned page_offset = offset_in_page(offset);
+
+ rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+ if (rc < 0)
+ return rc;
+ memset(kaddr + page_offset, 0, len);
+ dax_flush(dax_dev, kaddr + page_offset, len);
+ return 0;
+}
+
static const struct dax_operations dcssblk_dax_ops = {
.direct_access = dcssblk_dax_direct_access,
.dax_supported = generic_fsdax_supported,
.copy_from_iter = dcssblk_dax_copy_from_iter,
.copy_to_iter = dcssblk_dax_copy_to_iter,
+ .zero_page_range = dcssblk_dax_zero_page_range,
};
struct dcssblk_dev_info {
--
2.20.1
* [PATCH v4 5/7] dm,dax: Add dax zero_page_range operation
2020-02-17 18:16 [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
` (3 preceding siblings ...)
2020-02-17 18:16 ` [PATCH v4 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver Vivek Goyal
@ 2020-02-17 18:16 ` Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 6/7] dax,iomap: Start using dax native zero_page_range() Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range Vivek Goyal
6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:16 UTC (permalink / raw)
To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
Cc: dm-devel, vishal.l.verma, vgoyal
This patch adds support for the dax zero_page_range operation in dm targets.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
drivers/md/dm-linear.c | 21 +++++++++++++++++++++
drivers/md/dm-log-writes.c | 19 +++++++++++++++++++
drivers/md/dm-stripe.c | 26 ++++++++++++++++++++++++++
drivers/md/dm.c | 31 +++++++++++++++++++++++++++++++
include/linux/device-mapper.h | 3 +++
5 files changed, 100 insertions(+)
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 8d07fdf63a47..03f99e6ad372 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -201,10 +201,30 @@ static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
}
+static int linear_dax_zero_page_range(struct dm_target *ti, u64 offset,
+ size_t len)
+{
+ int ret;
+ struct linear_c *lc = ti->private;
+ struct block_device *bdev = lc->dev->bdev;
+ struct dax_device *dax_dev = lc->dev->dax_dev;
+ pgoff_t pgoff = offset >> PAGE_SHIFT;
+ unsigned page_offset = offset_in_page(offset);
+ sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+
+ dev_sector = linear_map_sector(ti, sector);
+ ret = bdev_dax_pgoff(bdev, dev_sector, ALIGN(len, PAGE_SIZE), &pgoff);
+ if (ret)
+ return ret;
+ return dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + page_offset,
+ len);
+}
+
#else
#define linear_dax_direct_access NULL
#define linear_dax_copy_from_iter NULL
#define linear_dax_copy_to_iter NULL
+#define linear_dax_zero_page_range NULL
#endif
static struct target_type linear_target = {
@@ -226,6 +246,7 @@ static struct target_type linear_target = {
.direct_access = linear_dax_direct_access,
.dax_copy_from_iter = linear_dax_copy_from_iter,
.dax_copy_to_iter = linear_dax_copy_to_iter,
+ .dax_zero_page_range = linear_dax_zero_page_range,
};
int __init dm_linear_init(void)
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 99721c76225d..f36ee223cb60 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -994,10 +994,28 @@ static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
return dax_copy_to_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
}
+static int log_writes_dax_zero_page_range(struct dm_target *ti, u64 offset,
+ size_t len)
+{
+ int ret;
+ struct log_writes_c *lc = ti->private;
+ pgoff_t pgoff = offset >> PAGE_SHIFT;
+ unsigned page_offset = offset_in_page(offset);
+ sector_t sector = pgoff * PAGE_SECTORS;
+
+ ret = bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(len, PAGE_SIZE),
+ &pgoff);
+ if (ret)
+ return ret;
+ return dax_zero_page_range(lc->dev->dax_dev,
+ (pgoff << PAGE_SHIFT) + page_offset, len);
+}
+
#else
#define log_writes_dax_direct_access NULL
#define log_writes_dax_copy_from_iter NULL
#define log_writes_dax_copy_to_iter NULL
+#define log_writes_dax_zero_page_range NULL
#endif
static struct target_type log_writes_target = {
@@ -1016,6 +1034,7 @@ static struct target_type log_writes_target = {
.direct_access = log_writes_dax_direct_access,
.dax_copy_from_iter = log_writes_dax_copy_from_iter,
.dax_copy_to_iter = log_writes_dax_copy_to_iter,
+ .dax_zero_page_range = log_writes_dax_zero_page_range,
};
static int __init dm_log_writes_init(void)
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 63bbcc20f49a..f5e17284c615 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -360,10 +360,35 @@ static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
}
+static int stripe_dax_zero_page_range(struct dm_target *ti, u64 offset,
+ size_t len)
+{
+ int ret;
+ pgoff_t pgoff = offset >> PAGE_SHIFT;
+ unsigned page_offset = offset_in_page(offset);
+ sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+ struct stripe_c *sc = ti->private;
+ struct dax_device *dax_dev;
+ struct block_device *bdev;
+ uint32_t stripe;
+
+ stripe_map_sector(sc, sector, &stripe, &dev_sector);
+ dev_sector += sc->stripe[stripe].physical_start;
+ dax_dev = sc->stripe[stripe].dev->dax_dev;
+ bdev = sc->stripe[stripe].dev->bdev;
+
+ ret = bdev_dax_pgoff(bdev, dev_sector, ALIGN(len, PAGE_SIZE), &pgoff);
+ if (ret)
+ return ret;
+ return dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + page_offset,
+ len);
+}
+
#else
#define stripe_dax_direct_access NULL
#define stripe_dax_copy_from_iter NULL
#define stripe_dax_copy_to_iter NULL
+#define stripe_dax_zero_page_range NULL
#endif
/*
@@ -486,6 +511,7 @@ static struct target_type stripe_target = {
.direct_access = stripe_dax_direct_access,
.dax_copy_from_iter = stripe_dax_copy_from_iter,
.dax_copy_to_iter = stripe_dax_copy_to_iter,
+ .dax_zero_page_range = stripe_dax_zero_page_range,
};
int __init dm_stripe_init(void)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b89f07ee2eff..c87cabdf7f18 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1198,6 +1198,36 @@ static size_t dm_dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
return ret;
}
+static int dm_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
+ size_t len)
+{
+ struct mapped_device *md = dax_get_private(dax_dev);
+ pgoff_t pgoff = offset >> PAGE_SHIFT;
+ sector_t sector = pgoff * PAGE_SECTORS;
+ struct dm_target *ti;
+ int ret = -EIO;
+ int srcu_idx;
+
+ ti = dm_dax_get_live_target(md, sector, &srcu_idx);
+
+ if (!ti)
+ goto out;
+ if (WARN_ON(!ti->type->dax_zero_page_range)) {
+ /*
+ * ->zero_page_range() is a mandatory dax operation. If we are
+ * here, something is wrong.
+ */
+ dm_put_live_table(md, srcu_idx);
+ goto out;
+ }
+ ret = ti->type->dax_zero_page_range(ti, offset, len);
+
+ out:
+ dm_put_live_table(md, srcu_idx);
+
+ return ret;
+}
+
/*
* A target may call dm_accept_partial_bio only from the map routine. It is
* allowed for all bio types except REQ_PREFLUSH, REQ_OP_ZONE_RESET,
@@ -3199,6 +3229,7 @@ static const struct dax_operations dm_dax_ops = {
.dax_supported = dm_dax_supported,
.copy_from_iter = dm_dax_copy_from_iter,
.copy_to_iter = dm_dax_copy_to_iter,
+ .zero_page_range = dm_dax_zero_page_range,
};
/*
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 475668c69dbc..b4ef5b07be74 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -141,6 +141,8 @@ typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn);
typedef size_t (*dm_dax_copy_iter_fn)(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i);
+typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, u64 offset,
+ size_t len);
#define PAGE_SECTORS (PAGE_SIZE / 512)
void dm_error(const char *message);
@@ -195,6 +197,7 @@ struct target_type {
dm_dax_direct_access_fn direct_access;
dm_dax_copy_iter_fn dax_copy_from_iter;
dm_dax_copy_iter_fn dax_copy_to_iter;
+ dm_dax_zero_page_range_fn dax_zero_page_range;
/* For internal device-mapper use. */
struct list_head list;
--
2.20.1
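Each dm target hook in patch 5 follows the same offset-remapping pattern: split the device-relative byte offset into a page index plus an in-page offset, remap the page index to the underlying device, then recombine. A userspace sketch of that pattern (the names `remap_offset` and `shift_by_four_pages` are illustrative; the remap callback stands in for the target-specific mapping plus `bdev_dax_pgoff()`):

```c
#include <stdint.h>

#define PAGE_SHIFT_ 12
#define PAGE_SIZE_  (1u << PAGE_SHIFT_)

/*
 * Split offset into (pgoff, page_offset), remap pgoff through the
 * target's mapping, and recombine — the shape of the dm target hooks:
 * dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + page_offset, len)
 */
static uint64_t remap_offset(uint64_t offset,
			     uint64_t (*remap_pgoff)(uint64_t))
{
	uint64_t pgoff = offset >> PAGE_SHIFT_;
	unsigned int page_offset = (unsigned int)(offset & (PAGE_SIZE_ - 1));

	pgoff = remap_pgoff(pgoff);
	return (pgoff << PAGE_SHIFT_) + page_offset;
}

/* Toy linear mapping: the target's data starts four pages in. */
static uint64_t shift_by_four_pages(uint64_t pgoff)
{
	return pgoff + 4;
}
```

The in-page offset survives the remap unchanged; only the page index moves, which is why the sub-page zeroing from patch 3 composes cleanly with the dm targets.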
* [PATCH v4 6/7] dax,iomap: Start using dax native zero_page_range()
2020-02-17 18:16 [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
` (4 preceding siblings ...)
2020-02-17 18:16 ` [PATCH v4 5/7] dm,dax: Add dax zero_page_range operation Vivek Goyal
@ 2020-02-17 18:16 ` Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range Vivek Goyal
6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:16 UTC (permalink / raw)
To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
Cc: dm-devel, vishal.l.verma, vgoyal, Christoph Hellwig
Stop calling the block device interface for zeroing in the iomap dax
zeroing path and use the native dax zeroing interface instead.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
fs/dax.c | 45 +++++++++------------------------------------
1 file changed, 9 insertions(+), 36 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 1f1f0201cad1..6757e12b86b2 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1044,48 +1044,21 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
return ret;
}
-static bool dax_range_is_aligned(struct block_device *bdev,
- unsigned int offset, unsigned int length)
-{
- unsigned short sector_size = bdev_logical_block_size(bdev);
-
- if (!IS_ALIGNED(offset, sector_size))
- return false;
- if (!IS_ALIGNED(length, sector_size))
- return false;
-
- return true;
-}
-
int __dax_zero_page_range(struct block_device *bdev,
struct dax_device *dax_dev, sector_t sector,
unsigned int offset, unsigned int size)
{
- if (dax_range_is_aligned(bdev, offset, size)) {
- sector_t start_sector = sector + (offset >> 9);
-
- return blkdev_issue_zeroout(bdev, start_sector,
- size >> 9, GFP_NOFS, 0);
- } else {
- pgoff_t pgoff;
- long rc, id;
- void *kaddr;
+ pgoff_t pgoff;
+ long rc, id;
- rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
- if (rc)
- return rc;
+ rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
+ if (rc)
+ return rc;
- id = dax_read_lock();
- rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
- if (rc < 0) {
- dax_read_unlock(id);
- return rc;
- }
- memset(kaddr + offset, 0, size);
- dax_flush(dax_dev, kaddr + offset, size);
- dax_read_unlock(id);
- }
- return 0;
+ id = dax_read_lock();
+ rc = dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + offset, size);
+ dax_read_unlock(id);
+ return rc;
}
EXPORT_SYMBOL_GPL(__dax_zero_page_range);
--
2.20.1
* [PATCH v4 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range
2020-02-17 18:16 [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
` (5 preceding siblings ...)
2020-02-17 18:16 ` [PATCH v4 6/7] dax,iomap: Start using dax native zero_page_range() Vivek Goyal
@ 2020-02-17 18:16 ` Vivek Goyal
6 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-02-17 18:16 UTC (permalink / raw)
To: linux-fsdevel, linux-nvdimm, hch, dan.j.williams
Cc: dm-devel, vishal.l.verma, vgoyal, Christoph Hellwig
Add a helper dax_iomap_zero() to zero a range. This patch basically
merges __dax_zero_page_range() and iomap_dax_zero().
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
fs/dax.c | 12 ++++++------
fs/iomap/buffered-io.c | 9 +--------
include/linux/dax.h | 17 +++--------------
3 files changed, 10 insertions(+), 28 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 6757e12b86b2..f6c4788ba764 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1044,23 +1044,23 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
return ret;
}
-int __dax_zero_page_range(struct block_device *bdev,
- struct dax_device *dax_dev, sector_t sector,
- unsigned int offset, unsigned int size)
+int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
+ struct iomap *iomap)
{
pgoff_t pgoff;
long rc, id;
+ sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
- rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
+ rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
if (rc)
return rc;
id = dax_read_lock();
- rc = dax_zero_page_range(dax_dev, (pgoff << PAGE_SHIFT) + offset, size);
+ rc = dax_zero_page_range(iomap->dax_dev, (pgoff << PAGE_SHIFT) + offset,
+ size);
dax_read_unlock(id);
return rc;
}
-EXPORT_SYMBOL_GPL(__dax_zero_page_range);
static loff_t
dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7c84c4c027c4..6f750da545e5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -974,13 +974,6 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
}
-static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
- struct iomap *iomap)
-{
- return __dax_zero_page_range(iomap->bdev, iomap->dax_dev,
- iomap_sector(iomap, pos & PAGE_MASK), offset, bytes);
-}
-
static loff_t
iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
void *data, struct iomap *iomap, struct iomap *srcmap)
@@ -1000,7 +993,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
bytes = min_t(loff_t, PAGE_SIZE - offset, count);
if (IS_DAX(inode))
- status = iomap_dax_zero(pos, offset, bytes, iomap);
+ status = dax_iomap_zero(pos, offset, bytes, iomap);
else
status = iomap_zero(inode, pos, offset, bytes, iomap,
srcmap);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index a555f0aeb7bd..31d0e6fc3023 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -13,6 +13,7 @@
typedef unsigned long dax_entry_t;
struct iomap_ops;
+struct iomap;
struct dax_device;
struct dax_operations {
/*
@@ -223,20 +224,8 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
pgoff_t index);
-
-#ifdef CONFIG_FS_DAX
-int __dax_zero_page_range(struct block_device *bdev,
- struct dax_device *dax_dev, sector_t sector,
- unsigned int offset, unsigned int length);
-#else
-static inline int __dax_zero_page_range(struct block_device *bdev,
- struct dax_device *dax_dev, sector_t sector,
- unsigned int offset, unsigned int length)
-{
- return -ENXIO;
-}
-#endif
-
+int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
+ struct iomap *iomap);
static inline bool dax_mapping(struct address_space *mapping)
{
return mapping->host && IS_DAX(mapping->host);
--
2.20.1
* Re: [dm-devel] [PATCH v4 1/7] pmem: Add functions for reading/writing page to/from pmem
2020-02-17 18:16 ` [PATCH v4 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
@ 2020-02-18 17:07 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-02-18 17:07 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, vishal.l.verma,
dm-devel
On Mon, Feb 17, 2020 at 01:16:47PM -0500, Vivek Goyal wrote:
> This splits pmem_do_bvec() into pmem_do_read() and pmem_do_write().
> pmem_do_write() will be used by pmem zero_page_range() as well, hence
> the code is shared.
>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dm-devel] [PATCH v4 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges
2020-02-17 18:16 ` [PATCH v4 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
@ 2020-02-18 17:09 ` Christoph Hellwig
2020-02-18 21:10 ` Vivek Goyal
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2020-02-18 17:09 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, vishal.l.verma,
dm-devel
On Mon, Feb 17, 2020 at 01:16:48PM -0500, Vivek Goyal wrote:
> Currently pmem_do_write() is written with the assumption that all I/O is
> sector aligned. Soon I want to use this function in zero_page_range(),
> where the range passed in does not have to be sector aligned.
>
> Modify this function to deal with an arbitrary range, which is
> specified by pmem_off and len.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
> drivers/nvdimm/pmem.c | 32 +++++++++++++++++++++++---------
> 1 file changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 075b11682192..fae8f67da9de 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -154,15 +154,23 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
>
> static blk_status_t pmem_do_write(struct pmem_device *pmem,
> struct page *page, unsigned int page_off,
> - sector_t sector, unsigned int len)
> + u64 pmem_off, unsigned int len)
> {
> blk_status_t rc = BLK_STS_OK;
> bool bad_pmem = false;
> - phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> - void *pmem_addr = pmem->virt_addr + pmem_off;
> -
> - if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
> - bad_pmem = true;
> + phys_addr_t pmem_real_off = pmem_off + pmem->data_offset;
> + void *pmem_addr = pmem->virt_addr + pmem_real_off;
> + sector_t sector_start, sector_end;
> + unsigned nr_sectors;
> +
> + sector_start = DIV_ROUND_UP(pmem_off, SECTOR_SIZE);
> + sector_end = (pmem_off + len) >> SECTOR_SHIFT;
> + if (sector_end > sector_start) {
> + nr_sectors = sector_end - sector_start;
> + if (is_bad_pmem(&pmem->bb, sector_start,
> + nr_sectors << SECTOR_SHIFT))
> + bad_pmem = true;
> + }
>
> /*
> * Note that we write the data both before and after
> @@ -181,7 +189,13 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
> flush_dcache_page(page);
> write_pmem(pmem_addr, page, page_off, len);
> if (unlikely(bad_pmem)) {
> - rc = pmem_clear_poison(pmem, pmem_off, len);
> + /*
> + * Pass sector aligned offset and length. That seems
> + * to work as of now. Other finer grained alignment
> + * cases can be addressed later if need be.
> + */
> + rc = pmem_clear_poison(pmem, ALIGN(pmem_real_off, SECTOR_SIZE),
> + nr_sectors << SECTOR_SHIFT);
> write_pmem(pmem_addr, page, page_off, len);
I'm still scared about the "as of now" comment. If the interface to
clearing poison is page aligned I think we should document that in the
actual pmem_clear_poison function, and make that take the unaligned
offset. I also think we want some feedback from Dan or others on what
the official interface is, instead of "seems to work".
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dm-devel] [PATCH v4 3/7] dax, pmem: Add a dax operation zero_page_range
2020-02-17 18:16 ` [PATCH v4 3/7] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
@ 2020-02-18 17:10 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-02-18 17:10 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux-fsdevel, linux-nvdimm, hch, dan.j.williams, vishal.l.verma,
dm-devel
> +static int pmem_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
> + size_t len)
> +{
> + struct pmem_device *pmem = dax_get_private(dax_dev);
> + blk_status_t rc;
> +
> + rc = pmem_do_write(pmem, ZERO_PAGE(0), 0, offset, len);
> + return blk_status_to_errno(rc);
No real need for the rc variable here.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 12+ messages in thread
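Applying Christoph's suggestion, the helper from the quoted patch would collapse to a single return statement. A sketch against the quoted code (not the final posted version):

```c
static int pmem_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
		size_t len)
{
	return blk_status_to_errno(pmem_do_write(dax_get_private(dax_dev),
				ZERO_PAGE(0), 0, offset, len));
}
```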
* Re: [dm-devel] [PATCH v4 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges
2020-02-18 17:09 ` [dm-devel] " Christoph Hellwig
@ 2020-02-18 21:10 ` Vivek Goyal
0 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-02-18 21:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-fsdevel, linux-nvdimm, dan.j.williams, vishal.l.verma, dm-devel
On Tue, Feb 18, 2020 at 09:09:28AM -0800, Christoph Hellwig wrote:
> On Mon, Feb 17, 2020 at 01:16:48PM -0500, Vivek Goyal wrote:
> > Currently pmem_do_write() is written with the assumption that all I/O is
> > sector aligned. Soon I want to use this function in zero_page_range(),
> > where the range passed in does not have to be sector aligned.
> >
> > Modify this function to deal with an arbitrary range, which is
> > specified by pmem_off and len.
> >
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> > drivers/nvdimm/pmem.c | 32 +++++++++++++++++++++++---------
> > 1 file changed, 23 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index 075b11682192..fae8f67da9de 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -154,15 +154,23 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
> >
> > static blk_status_t pmem_do_write(struct pmem_device *pmem,
> > struct page *page, unsigned int page_off,
> > - sector_t sector, unsigned int len)
> > + u64 pmem_off, unsigned int len)
> > {
> > blk_status_t rc = BLK_STS_OK;
> > bool bad_pmem = false;
> > - phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> > - void *pmem_addr = pmem->virt_addr + pmem_off;
> > -
> > - if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
> > - bad_pmem = true;
> > + phys_addr_t pmem_real_off = pmem_off + pmem->data_offset;
> > + void *pmem_addr = pmem->virt_addr + pmem_real_off;
> > + sector_t sector_start, sector_end;
> > + unsigned nr_sectors;
> > +
> > + sector_start = DIV_ROUND_UP(pmem_off, SECTOR_SIZE);
> > + sector_end = (pmem_off + len) >> SECTOR_SHIFT;
> > + if (sector_end > sector_start) {
> > + nr_sectors = sector_end - sector_start;
> > + if (is_bad_pmem(&pmem->bb, sector_start,
> > + nr_sectors << SECTOR_SHIFT))
> > + bad_pmem = true;
> > + }
> >
> > /*
> > * Note that we write the data both before and after
> > @@ -181,7 +189,13 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
> > flush_dcache_page(page);
> > write_pmem(pmem_addr, page, page_off, len);
> > if (unlikely(bad_pmem)) {
> > - rc = pmem_clear_poison(pmem, pmem_off, len);
> > + /*
> > + * Pass sector aligned offset and length. That seems
> > + * to work as of now. Other finer grained alignment
> > + * cases can be addressed later if need be.
> > + */
> > + rc = pmem_clear_poison(pmem, ALIGN(pmem_real_off, SECTOR_SIZE),
> > + nr_sectors << SECTOR_SHIFT);
> > write_pmem(pmem_addr, page, page_off, len);
>
> I'm still scared about the "as of now" comment. If the interface to
> clearing poison is page aligned I think we should document that in the
> actual pmem_clear_poison function, and make that take the unaligned
> offset. I also think we want some feedback from Dan or others on what
> the official interface is, instead of "seems to work".
Ok, I am adding one more patch to the series, enabling pmem_clear_poison()
to accept an arbitrary offset and length and letting it align them to the
sector boundary. I am keeping it in a separate patch so that Dan can have
a close look at it and make sure I am doing things correctly.
Here is the new patch. I will post V5 soon with this new patch.
Thanks
Vivek
Subject: drivers/pmem: Allow pmem_clear_poison() to accept arbitrary offset and len
Currently pmem_clear_poison() expects offset and len to be sector aligned.
At least that seems to be the assumption the code has been written with.
It is called only from pmem_do_bvec(), which in turn is called only from
pmem_rw_page() and pmem_make_request(), both of which pass only
sector-aligned offsets and lengths.
Soon we want to use this function from the dax_zero_page_range() code
path, which can try to zero an arbitrary range of memory within a page.
So update this function to assume that offset and length can be arbitrary
and to do the necessary alignment as needed.
nvdimm_clear_poison() seems to assume that offset and len are aligned to
the clear_err_unit boundary. But this is currently an internal detail and
is not exported for others to use. So for now, continue to align offset
and length to the SECTOR_SIZE boundary. Aligning to the clear_err_unit
boundary is a TODO item for the future.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
drivers/nvdimm/pmem.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 075b11682192..e72959203253 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -74,14 +74,28 @@ static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
sector_t sector;
long cleared;
blk_status_t rc = BLK_STS_OK;
+ phys_addr_t start_aligned, end_aligned;
+ unsigned int len_aligned;
- sector = (offset - pmem->data_offset) / 512;
+ /*
+ * Callers can pass arbitrary offset and len. But nvdimm_clear_poison()
+ * expects memory offset and length to meet certain alignment
+ * restriction (clear_err_unit). Currently nvdimm does not export
+ * required alignment. So align offset and length to sector boundary
+ * before passing it to nvdimm_clear_poison().
+ */
+ start_aligned = ALIGN(offset, SECTOR_SIZE);
+ end_aligned = ALIGN_DOWN((offset + len), SECTOR_SIZE) - 1;
+ len_aligned = end_aligned - start_aligned + 1;
+
+ sector = (start_aligned - pmem->data_offset) / 512;
- cleared = nvdimm_clear_poison(dev, pmem->phys_addr + offset, len);
- if (cleared < len)
+ cleared = nvdimm_clear_poison(dev, pmem->phys_addr + start_aligned,
+ len_aligned);
+ if (cleared < len_aligned)
rc = BLK_STS_IOERR;
if (cleared > 0 && cleared / 512) {
- hwpoison_clear(pmem, pmem->phys_addr + offset, cleared);
+ hwpoison_clear(pmem, pmem->phys_addr + start_aligned, cleared);
cleared /= 512;
dev_dbg(dev, "%#llx clear %ld sector%s\n",
(unsigned long long) sector, cleared,
--
2.20.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
Thread overview: 12+ messages
2020-02-17 18:16 [PATCH v4 0/7] dax/pmem: Provide a dax operation to zero range of memory Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 1/7] pmem: Add functions for reading/writing page to/from pmem Vivek Goyal
2020-02-18 17:07 ` [dm-devel] " Christoph Hellwig
2020-02-17 18:16 ` [PATCH v4 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges Vivek Goyal
2020-02-18 17:09 ` [dm-devel] " Christoph Hellwig
2020-02-18 21:10 ` Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 3/7] dax, pmem: Add a dax operation zero_page_range Vivek Goyal
2020-02-18 17:10 ` [dm-devel] " Christoph Hellwig
2020-02-17 18:16 ` [PATCH v4 4/7] s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 5/7] dm,dax: Add dax zero_page_range operation Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 6/7] dax,iomap: Start using dax native zero_page_range() Vivek Goyal
2020-02-17 18:16 ` [PATCH v4 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range Vivek Goyal