All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] dax: handling of media errors
@ 2016-03-24 23:17 ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, Dave Chinner, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel, linux-ext4,
	Al Viro

Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.

The first three patches from Dan re-enable dax even when media
errors are present.

The fourth patch from Matthew removes the
zeroout path from dax entirely, making zeroout operations always
go through the driver (The motivation is that if a backing device
has media errors, and we create a sparse file on it, we don't
want the initial zeroing to happen via dax, we want to give the
block driver a chance to clear the errors).

The fifth patch changes the behaviour of dax_do_io by adding a
wrapper around it that is passed all the arguments also needed by
__blockdev_do_direct_IO. If (the new) __dax_do_io fails with -EIO
due to a bad block, we simply retry with the direct_IO path which
forces the IO to go through the block driver, and can attempt to
clear the error.

Dan Williams (3):
  block, dax: pass blk_dax_ctl through to drivers
  dax: fallback from pmd to pte on error
  dax: enable dax in the presence of known media errors (badblocks)

Vishal Verma (2):
  dax: use sb_issue_zerout instead of calling dax_clear_sectors
  dax: handle media errors in dax_do_io

 arch/powerpc/sysdev/axonram.c | 10 +++----
 block/ioctl.c                 |  9 ------
 drivers/block/brd.c           |  9 +++---
 drivers/nvdimm/pmem.c         | 17 ++++++++---
 drivers/s390/block/dcssblk.c  | 12 ++++----
 fs/block_dev.c                |  7 +++--
 fs/dax.c                      | 70 +++++++++++++++++++++----------------------
 fs/ext2/inode.c               | 12 ++++----
 fs/ext4/indirect.c            | 11 ++++---
 fs/ext4/inode.c               |  5 ++--
 fs/xfs/xfs_aops.c             |  7 +++--
 fs/xfs/xfs_bmap_util.c        |  9 ------
 include/linux/blkdev.h        |  3 +-
 include/linux/dax.h           |  7 +++--
 14 files changed, 93 insertions(+), 95 deletions(-)

-- 
2.5.5

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 0/5] dax: handling of media errors
@ 2016-03-24 23:17 ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Vishal Verma, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dan Williams,
	Dave Chinner, Jan Kara, Jens Axboe, Al Viro, Andrew Morton

Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.

The first three patches from Dan re-enable dax even when media
errors are present.

The fourth patch from Matthew removes the
zeroout path from dax entirely, making zeroout operations always
go through the driver (The motivation is that if a backing device
has media errors, and we create a sparse file on it, we don't
want the initial zeroing to happen via dax, we want to give the
block driver a chance to clear the errors).

The fifth patch changes the behaviour of dax_do_io by adding a
wrapper around it that is passed all the arguments also needed by
__blockdev_do_direct_IO. If (the new) __dax_do_io fails with -EIO
due to a bad block, we simply retry with the direct_IO path which
forces the IO to go through the block driver, and can attempt to
clear the error.

Dan Williams (3):
  block, dax: pass blk_dax_ctl through to drivers
  dax: fallback from pmd to pte on error
  dax: enable dax in the presence of known media errors (badblocks)

Vishal Verma (2):
  dax: use sb_issue_zerout instead of calling dax_clear_sectors
  dax: handle media errors in dax_do_io

 arch/powerpc/sysdev/axonram.c | 10 +++----
 block/ioctl.c                 |  9 ------
 drivers/block/brd.c           |  9 +++---
 drivers/nvdimm/pmem.c         | 17 ++++++++---
 drivers/s390/block/dcssblk.c  | 12 ++++----
 fs/block_dev.c                |  7 +++--
 fs/dax.c                      | 70 +++++++++++++++++++++----------------------
 fs/ext2/inode.c               | 12 ++++----
 fs/ext4/indirect.c            | 11 ++++---
 fs/ext4/inode.c               |  5 ++--
 fs/xfs/xfs_aops.c             |  7 +++--
 fs/xfs/xfs_bmap_util.c        |  9 ------
 include/linux/blkdev.h        |  3 +-
 include/linux/dax.h           |  7 +++--
 14 files changed, 93 insertions(+), 95 deletions(-)

-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 0/5] dax: handling of media errors
@ 2016-03-24 23:17 ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Vishal Verma, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dan Williams,
	Dave Chinner, Jan Kara, Jens Axboe, Al Viro, Andrew Morton

Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.

The first three patches from Dan re-enable dax even when media
errors are present.

The fourth patch from Matthew removes the
zeroout path from dax entirely, making zeroout operations always
go through the driver (The motivation is that if a backing device
has media errors, and we create a sparse file on it, we don't
want the initial zeroing to happen via dax, we want to give the
block driver a chance to clear the errors).

The fifth patch changes the behaviour of dax_do_io by adding a
wrapper around it that is passed all the arguments also needed by
__blockdev_do_direct_IO. If (the new) __dax_do_io fails with -EIO
due to a bad block, we simply retry with the direct_IO path which
forces the IO to go through the block driver, and can attempt to
clear the error.

Dan Williams (3):
  block, dax: pass blk_dax_ctl through to drivers
  dax: fallback from pmd to pte on error
  dax: enable dax in the presence of known media errors (badblocks)

Vishal Verma (2):
  dax: use sb_issue_zerout instead of calling dax_clear_sectors
  dax: handle media errors in dax_do_io

 arch/powerpc/sysdev/axonram.c | 10 +++----
 block/ioctl.c                 |  9 ------
 drivers/block/brd.c           |  9 +++---
 drivers/nvdimm/pmem.c         | 17 ++++++++---
 drivers/s390/block/dcssblk.c  | 12 ++++----
 fs/block_dev.c                |  7 +++--
 fs/dax.c                      | 70 +++++++++++++++++++++----------------------
 fs/ext2/inode.c               | 12 ++++----
 fs/ext4/indirect.c            | 11 ++++---
 fs/ext4/inode.c               |  5 ++--
 fs/xfs/xfs_aops.c             |  7 +++--
 fs/xfs/xfs_bmap_util.c        |  9 ------
 include/linux/blkdev.h        |  3 +-
 include/linux/dax.h           |  7 +++--
 14 files changed, 93 insertions(+), 95 deletions(-)

-- 
2.5.5


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 0/5] dax: handling of media errors
@ 2016-03-24 23:17 ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, Vishal Verma, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel,
	Ross Zwisler, linux-ext4, Dan Williams, Al Viro

Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.

The first three patches from Dan re-enable dax even when media
errors are present.

The fourth patch from Matthew removes the
zeroout path from dax entirely, making zeroout operations always
go through the driver (The motivation is that if a backing device
has media errors, and we create a sparse file on it, we don't
want the initial zeroing to happen via dax, we want to give the
block driver a chance to clear the errors).

The fifth patch changes the behaviour of dax_do_io by adding a
wrapper around it that is passed all the arguments also needed by
__blockdev_do_direct_IO. If (the new) __dax_do_io fails with -EIO
due to a bad block, we simply retry with the direct_IO path which
forces the IO to go through the block driver, and can attempt to
clear the error.

Dan Williams (3):
  block, dax: pass blk_dax_ctl through to drivers
  dax: fallback from pmd to pte on error
  dax: enable dax in the presence of known media errors (badblocks)

Vishal Verma (2):
  dax: use sb_issue_zerout instead of calling dax_clear_sectors
  dax: handle media errors in dax_do_io

 arch/powerpc/sysdev/axonram.c | 10 +++----
 block/ioctl.c                 |  9 ------
 drivers/block/brd.c           |  9 +++---
 drivers/nvdimm/pmem.c         | 17 ++++++++---
 drivers/s390/block/dcssblk.c  | 12 ++++----
 fs/block_dev.c                |  7 +++--
 fs/dax.c                      | 70 +++++++++++++++++++++----------------------
 fs/ext2/inode.c               | 12 ++++----
 fs/ext4/indirect.c            | 11 ++++---
 fs/ext4/inode.c               |  5 ++--
 fs/xfs/xfs_aops.c             |  7 +++--
 fs/xfs/xfs_bmap_util.c        |  9 ------
 include/linux/blkdev.h        |  3 +-
 include/linux/dax.h           |  7 +++--
 14 files changed, 93 insertions(+), 95 deletions(-)

-- 
2.5.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 1/5] block, dax: pass blk_dax_ctl through to drivers
  2016-03-24 23:17 ` Vishal Verma
@ 2016-03-24 23:17   ` Vishal Verma
  -1 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Dan Williams, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dave Chinner, Jan Kara,
	Jens Axboe, Al Viro, Andrew Morton

From: Dan Williams <dan.j.williams@intel.com>

From: Dan Williams <dan.j.williams@intel.com>

This is in preparation for doing badblocks checking against the
requested sector range in the driver.  Currently we opportunistically
return as much data that can be "dax'd" starting at the given sector.
When errors are present we want to limit that range to the first
encountered error, or fail the dax request if the range encompasses an
error.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/sysdev/axonram.c | 10 +++++-----
 drivers/block/brd.c           |  9 +++++----
 drivers/nvdimm/pmem.c         |  9 +++++----
 drivers/s390/block/dcssblk.c  | 12 ++++++------
 fs/block_dev.c                |  2 +-
 include/linux/blkdev.h        |  3 +--
 6 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 0d112b9..d85673f 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -139,17 +139,17 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
 
 /**
  * axon_ram_direct_access - direct_access() method for block device
- * @device, @sector, @data: see block_device_operations method
+ * @dax: see block_device_operations method
  */
 static long
-axon_ram_direct_access(struct block_device *device, sector_t sector,
-		       void __pmem **kaddr, pfn_t *pfn)
+axon_ram_direct_access(struct block_device *device, struct blk_dax_ctl *dax)
 {
+	sector_t sector = get_start_sect(device) + dax->sector;
 	struct axon_ram_bank *bank = device->bd_disk->private_data;
 	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
-	*kaddr = (void __pmem __force *) bank->io_addr + offset;
-	*pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
+	dax->addr = (void __pmem __force *) bank->io_addr + offset;
+	dax->pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
 	return bank->size - offset;
 }
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index cb27190..0b8bcfa 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -380,9 +380,10 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
-static long brd_direct_access(struct block_device *bdev, sector_t sector,
-			void __pmem **kaddr, pfn_t *pfn)
+static long brd_direct_access(struct block_device *bdev,
+		struct blk_dax_ctl *dax)
 {
+	sector_t sector = get_start_sect(bdev) + dax->sector;
 	struct brd_device *brd = bdev->bd_disk->private_data;
 	struct page *page;
 
@@ -391,8 +392,8 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
 	page = brd_insert_page(brd, sector);
 	if (!page)
 		return -ENOSPC;
-	*kaddr = (void __pmem *)page_address(page);
-	*pfn = page_to_pfn_t(page);
+	dax->addr = (void __pmem *)page_address(page);
+	dax->pfn = page_to_pfn_t(page);
 
 	return PAGE_SIZE;
 }
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ca5721c..da10554 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -167,14 +167,15 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 	return rc;
 }
 
-static long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void __pmem **kaddr, pfn_t *pfn)
+static long pmem_direct_access(struct block_device *bdev,
+		struct blk_dax_ctl *dax)
 {
+	sector_t sector = get_start_sect(bdev) + dax->sector;
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
 
-	*kaddr = pmem->virt_addr + offset;
-	*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+	dax->addr = pmem->virt_addr + offset;
+	dax->pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
 
 	return pmem->size - pmem->pfn_pad - offset;
 }
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index ce7b701..e5bda68 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -30,8 +30,8 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static blk_qc_t dcssblk_make_request(struct request_queue *q,
 						struct bio *bio);
-static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
-			 void __pmem **kaddr, pfn_t *pfn);
+static long dcssblk_direct_access(struct block_device *bdev,
+		struct blk_dax_ctl *dax)
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -883,9 +883,9 @@ fail:
 }
 
 static long
-dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
-			void __pmem **kaddr, pfn_t *pfn)
+dcssblk_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
 {
+	sector_t secnum = get_start_sect(bdev) + dax->sector;
 	struct dcssblk_dev_info *dev_info;
 	unsigned long offset, dev_sz;
 
@@ -894,8 +894,8 @@ dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
 		return -ENODEV;
 	dev_sz = dev_info->end - dev_info->start;
 	offset = secnum * 512;
-	*kaddr = (void __pmem *) (dev_info->start + offset);
-	*pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
+	dax->addr = (void __pmem *) (dev_info->start + offset);
+	dax->pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
 
 	return dev_sz - offset;
 }
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..9c0765b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -488,7 +488,7 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
 	sector += get_start_sect(bdev);
 	if (sector % (PAGE_SIZE / 512))
 		return -EINVAL;
-	avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn);
+	avail = ops->direct_access(bdev, dax);
 	if (!avail)
 		return -ERANGE;
 	if (avail > 0 && avail & ~PAGE_MASK)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 413c84f..f8c65b8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1653,8 +1653,7 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	long (*direct_access)(struct block_device *, sector_t, void __pmem **,
-			pfn_t *);
+	long (*direct_access)(struct block_device *, struct blk_dax_ctl *dax);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 1/5] block, dax: pass blk_dax_ctl through to drivers
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, xfs, linux-block, linux-mm,
	Matthew Wilcox, linux-fsdevel, Dan Williams, linux-ext4,
	Ross Zwisler, Al Viro

From: Dan Williams <dan.j.williams@intel.com>

From: Dan Williams <dan.j.williams@intel.com>

This is in preparation for doing badblocks checking against the
requested sector range in the driver.  Currently we opportunistically
return as much data that can be "dax'd" starting at the given sector.
When errors are present we want to limit that range to the first
encountered error, or fail the dax request if the range encompasses an
error.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/sysdev/axonram.c | 10 +++++-----
 drivers/block/brd.c           |  9 +++++----
 drivers/nvdimm/pmem.c         |  9 +++++----
 drivers/s390/block/dcssblk.c  | 12 ++++++------
 fs/block_dev.c                |  2 +-
 include/linux/blkdev.h        |  3 +--
 6 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 0d112b9..d85673f 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -139,17 +139,17 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
 
 /**
  * axon_ram_direct_access - direct_access() method for block device
- * @device, @sector, @data: see block_device_operations method
+ * @dax: see block_device_operations method
  */
 static long
-axon_ram_direct_access(struct block_device *device, sector_t sector,
-		       void __pmem **kaddr, pfn_t *pfn)
+axon_ram_direct_access(struct block_device *device, struct blk_dax_ctl *dax)
 {
+	sector_t sector = get_start_sect(device) + dax->sector;
 	struct axon_ram_bank *bank = device->bd_disk->private_data;
 	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
-	*kaddr = (void __pmem __force *) bank->io_addr + offset;
-	*pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
+	dax->addr = (void __pmem __force *) bank->io_addr + offset;
+	dax->pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
 	return bank->size - offset;
 }
 
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index cb27190..0b8bcfa 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -380,9 +380,10 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
-static long brd_direct_access(struct block_device *bdev, sector_t sector,
-			void __pmem **kaddr, pfn_t *pfn)
+static long brd_direct_access(struct block_device *bdev,
+		struct blk_dax_ctl *dax)
 {
+	sector_t sector = get_start_sect(bdev) + dax->sector;
 	struct brd_device *brd = bdev->bd_disk->private_data;
 	struct page *page;
 
@@ -391,8 +392,8 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
 	page = brd_insert_page(brd, sector);
 	if (!page)
 		return -ENOSPC;
-	*kaddr = (void __pmem *)page_address(page);
-	*pfn = page_to_pfn_t(page);
+	dax->addr = (void __pmem *)page_address(page);
+	dax->pfn = page_to_pfn_t(page);
 
 	return PAGE_SIZE;
 }
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ca5721c..da10554 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -167,14 +167,15 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 	return rc;
 }
 
-static long pmem_direct_access(struct block_device *bdev, sector_t sector,
-		      void __pmem **kaddr, pfn_t *pfn)
+static long pmem_direct_access(struct block_device *bdev,
+		struct blk_dax_ctl *dax)
 {
+	sector_t sector = get_start_sect(bdev) + dax->sector;
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
 
-	*kaddr = pmem->virt_addr + offset;
-	*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+	dax->addr = pmem->virt_addr + offset;
+	dax->pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
 
 	return pmem->size - pmem->pfn_pad - offset;
 }
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index ce7b701..e5bda68 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -30,8 +30,8 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static blk_qc_t dcssblk_make_request(struct request_queue *q,
 						struct bio *bio);
-static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
-			 void __pmem **kaddr, pfn_t *pfn);
+static long dcssblk_direct_access(struct block_device *bdev,
+		struct blk_dax_ctl *dax)
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -883,9 +883,9 @@ fail:
 }
 
 static long
-dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
-			void __pmem **kaddr, pfn_t *pfn)
+dcssblk_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
 {
+	sector_t secnum = get_start_sect(bdev) + dax->sector;
 	struct dcssblk_dev_info *dev_info;
 	unsigned long offset, dev_sz;
 
@@ -894,8 +894,8 @@ dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
 		return -ENODEV;
 	dev_sz = dev_info->end - dev_info->start;
 	offset = secnum * 512;
-	*kaddr = (void __pmem *) (dev_info->start + offset);
-	*pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
+	dax->addr = (void __pmem *) (dev_info->start + offset);
+	dax->pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
 
 	return dev_sz - offset;
 }
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..9c0765b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -488,7 +488,7 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
 	sector += get_start_sect(bdev);
 	if (sector % (PAGE_SIZE / 512))
 		return -EINVAL;
-	avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn);
+	avail = ops->direct_access(bdev, dax);
 	if (!avail)
 		return -ERANGE;
 	if (avail > 0 && avail & ~PAGE_MASK)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 413c84f..f8c65b8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1653,8 +1653,7 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	long (*direct_access)(struct block_device *, sector_t, void __pmem **,
-			pfn_t *);
+	long (*direct_access)(struct block_device *, struct blk_dax_ctl *dax);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
-- 
2.5.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 2/5] dax: fallback from pmd to pte on error
  2016-03-24 23:17 ` Vishal Verma
  (?)
@ 2016-03-24 23:17   ` Vishal Verma
  -1 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, Dave Chinner, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel, linux-ext4,
	Al Viro

From: Dan Williams <dan.j.williams@intel.com>

From: Dan Williams <dan.j.williams@intel.com>

In preparation for consulting a badblocks list in pmem_direct_access(),
teach dax_pmd_fault() to fallback rather than fail immediately upon
encountering an error.  The thought being that reducing the span of the
dax request may avoid the error region.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bbb2ad7..bb7e9f8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -940,8 +940,8 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		long length = dax_map_atomic(bdev, &dax);
 
 		if (length < 0) {
-			result = VM_FAULT_SIGBUS;
-			goto out;
+			dax_pmd_dbg(&bh, address, "dax-error fallback");
+			goto fallback;
 		}
 		if (length < PMD_SIZE) {
 			dax_pmd_dbg(&bh, address, "dax-length too small");
-- 
2.5.5

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 2/5] dax: fallback from pmd to pte on error
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Dan Williams, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dave Chinner, Jan Kara,
	Jens Axboe, Al Viro, Andrew Morton

From: Dan Williams <dan.j.williams@intel.com>

From: Dan Williams <dan.j.williams@intel.com>

In preparation for consulting a badblocks list in pmem_direct_access(),
teach dax_pmd_fault() to fallback rather than fail immediately upon
encountering an error.  The thought being that reducing the span of the
dax request may avoid the error region.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bbb2ad7..bb7e9f8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -940,8 +940,8 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		long length = dax_map_atomic(bdev, &dax);
 
 		if (length < 0) {
-			result = VM_FAULT_SIGBUS;
-			goto out;
+			dax_pmd_dbg(&bh, address, "dax-error fallback");
+			goto fallback;
 		}
 		if (length < PMD_SIZE) {
 			dax_pmd_dbg(&bh, address, "dax-length too small");
-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 2/5] dax: fallback from pmd to pte on error
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, xfs, linux-block, linux-mm,
	Matthew Wilcox, linux-fsdevel, Dan Williams, linux-ext4,
	Ross Zwisler, Al Viro

From: Dan Williams <dan.j.williams@intel.com>

From: Dan Williams <dan.j.williams@intel.com>

In preparation for consulting a badblocks list in pmem_direct_access(),
teach dax_pmd_fault() to fallback rather than fail immediately upon
encountering an error.  The thought being that reducing the span of the
dax request may avoid the error region.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bbb2ad7..bb7e9f8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -940,8 +940,8 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		long length = dax_map_atomic(bdev, &dax);
 
 		if (length < 0) {
-			result = VM_FAULT_SIGBUS;
-			goto out;
+			dax_pmd_dbg(&bh, address, "dax-error fallback");
+			goto fallback;
 		}
 		if (length < PMD_SIZE) {
 			dax_pmd_dbg(&bh, address, "dax-length too small");
-- 
2.5.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 3/5] dax: enable dax in the presence of known media errors (badblocks)
  2016-03-24 23:17 ` Vishal Verma
  (?)
@ 2016-03-24 23:17   ` Vishal Verma
  -1 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, Dave Chinner, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel, linux-ext4,
	Al Viro

From: Dan Williams <dan.j.williams@intel.com>

From: Dan Williams <dan.j.williams@intel.com>

1/ If a mapping overlaps a bad sector fail the request.

2/ Do not opportunistically report more dax-capable capacity than is
   requested when errors present.

[vishal: fix a conflict with system RAM collision patches]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/ioctl.c         | 9 ---------
 drivers/nvdimm/pmem.c | 8 ++++++++
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index d8996bb..cd7f392 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -423,15 +423,6 @@ bool blkdev_dax_capable(struct block_device *bdev)
 			|| (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
 		return false;
 
-	/*
-	 * If the device has known bad blocks, force all I/O through the
-	 * driver / page cache.
-	 *
-	 * TODO: support finer grained dax error handling
-	 */
-	if (disk->bb && disk->bb->count)
-		return false;
-
 	return true;
 }
 #endif
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index da10554..eac5f93 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -174,9 +174,17 @@ static long pmem_direct_access(struct block_device *bdev,
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
 
+	if (unlikely(is_bad_pmem(&pmem->bb, sector, dax->size)))
+		return -EIO;
 	dax->addr = pmem->virt_addr + offset;
 	dax->pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
 
+	/*
+	 * If badblocks are present, limit known good range to the
+	 * requested range.
+	 */
+	if (unlikely(pmem->bb.count))
+		return dax->size;
 	return pmem->size - pmem->pfn_pad - offset;
 }
 
-- 
2.5.5

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 3/5] dax: enable dax in the presence of known media errors (badblocks)
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Dan Williams, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dave Chinner, Jan Kara,
	Jens Axboe, Al Viro, Andrew Morton

From: Dan Williams <dan.j.williams@intel.com>

From: Dan Williams <dan.j.williams@intel.com>

1/ If a mapping overlaps a bad sector fail the request.

2/ Do not opportunistically report more dax-capable capacity than is
   requested when errors present.

[vishal: fix a conflict with system RAM collision patches]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/ioctl.c         | 9 ---------
 drivers/nvdimm/pmem.c | 8 ++++++++
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index d8996bb..cd7f392 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -423,15 +423,6 @@ bool blkdev_dax_capable(struct block_device *bdev)
 			|| (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
 		return false;
 
-	/*
-	 * If the device has known bad blocks, force all I/O through the
-	 * driver / page cache.
-	 *
-	 * TODO: support finer grained dax error handling
-	 */
-	if (disk->bb && disk->bb->count)
-		return false;
-
 	return true;
 }
 #endif
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index da10554..eac5f93 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -174,9 +174,17 @@ static long pmem_direct_access(struct block_device *bdev,
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
 
+	if (unlikely(is_bad_pmem(&pmem->bb, sector, dax->size)))
+		return -EIO;
 	dax->addr = pmem->virt_addr + offset;
 	dax->pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
 
+	/*
+	 * If badblocks are present, limit known good range to the
+	 * requested range.
+	 */
+	if (unlikely(pmem->bb.count))
+		return dax->size;
 	return pmem->size - pmem->pfn_pad - offset;
 }
 
-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 3/5] dax: enable dax in the presence of known media errors (badblocks)
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, xfs, linux-block, linux-mm,
	Matthew Wilcox, linux-fsdevel, Dan Williams, linux-ext4,
	Ross Zwisler, Al Viro

From: Dan Williams <dan.j.williams@intel.com>

From: Dan Williams <dan.j.williams@intel.com>

1/ If a mapping overlaps a bad sector fail the request.

2/ Do not opportunistically report more dax-capable capacity than is
   requested when errors present.

[vishal: fix a conflict with system RAM collision patches]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/ioctl.c         | 9 ---------
 drivers/nvdimm/pmem.c | 8 ++++++++
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index d8996bb..cd7f392 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -423,15 +423,6 @@ bool blkdev_dax_capable(struct block_device *bdev)
 			|| (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
 		return false;
 
-	/*
-	 * If the device has known bad blocks, force all I/O through the
-	 * driver / page cache.
-	 *
-	 * TODO: support finer grained dax error handling
-	 */
-	if (disk->bb && disk->bb->count)
-		return false;
-
 	return true;
 }
 #endif
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index da10554..eac5f93 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -174,9 +174,17 @@ static long pmem_direct_access(struct block_device *bdev,
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
 
+	if (unlikely(is_bad_pmem(&pmem->bb, sector, dax->size)))
+		return -EIO;
 	dax->addr = pmem->virt_addr + offset;
 	dax->pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
 
+	/*
+	 * If badblocks are present, limit known good range to the
+	 * requested range.
+	 */
+	if (unlikely(pmem->bb.count))
+		return dax->size;
 	return pmem->size - pmem->pfn_pad - offset;
 }
 
-- 
2.5.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-24 23:17 ` Vishal Verma
  (?)
@ 2016-03-24 23:17   ` Vishal Verma
  -1 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, Dave Chinner, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel, linux-ext4,
	Al Viro

From: Matthew Wilcox <matthew.r.wilcox@intel.com>

dax_clear_sectors() cannot handle poisoned blocks.  These must be
zeroed using the BIO interface instead.  Convert ext2 and XFS to use
only sb_issue_zerout().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
[vishal: Also remove the dax_clear_sectors function entirely]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/dax.c               | 32 --------------------------------
 fs/ext2/inode.c        |  7 +++----
 fs/xfs/xfs_bmap_util.c |  9 ---------
 include/linux/dax.h    |  1 -
 4 files changed, 3 insertions(+), 46 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bb7e9f8..a30481e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
 	return page;
 }
 
-/*
- * dax_clear_sectors() is called from within transaction context from XFS,
- * and hence this means the stack from this point must follow GFP_NOFS
- * semantics for all operations.
- */
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
-{
-	struct blk_dax_ctl dax = {
-		.sector = _sector,
-		.size = _size,
-	};
-
-	might_sleep();
-	do {
-		long count, sz;
-
-		count = dax_map_atomic(bdev, &dax);
-		if (count < 0)
-			return count;
-		sz = min_t(long, count, SZ_128K);
-		clear_pmem(dax.addr, sz);
-		dax.size -= sz;
-		dax.sector += sz / 512;
-		dax_unmap_atomic(bdev, &dax);
-		cond_resched();
-	} while (dax.size);
-
-	wmb_pmem();
-	return 0;
-}
-EXPORT_SYMBOL_GPL(dax_clear_sectors);
-
 /* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
 static void dax_new_buf(void __pmem *addr, unsigned size, unsigned first,
 		loff_t pos, loff_t end)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 6bd58e6..824f249 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -26,6 +26,7 @@
 #include <linux/highuid.h>
 #include <linux/pagemap.h>
 #include <linux/dax.h>
+#include <linux/blkdev.h>
 #include <linux/quotaops.h>
 #include <linux/writeback.h>
 #include <linux/buffer_head.h>
@@ -737,10 +738,8 @@ static int ext2_get_blocks(struct inode *inode,
 		 * so that it's not found by another thread before it's
 		 * initialised
 		 */
-		err = dax_clear_sectors(inode->i_sb->s_bdev,
-				le32_to_cpu(chain[depth-1].key) <<
-				(inode->i_blkbits - 9),
-				1 << inode->i_blkbits);
+		err = sb_issue_zeroout(inode->i_sb,
+				le32_to_cpu(chain[depth-1].key), 1, GFP_NOFS);
 		if (err) {
 			mutex_unlock(&ei->truncate_mutex);
 			goto cleanup;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6c87601..23a759a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -72,16 +72,7 @@ xfs_zero_extent(
 	struct xfs_mount *mp = ip->i_mount;
 	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
 	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
-	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
 
-	if (IS_DAX(VFS_I(ip)))
-		return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
-				sector, size);
-
-	/*
-	 * let the block layer decide on the fastest method of
-	 * implementing the zeroing.
-	 */
 	return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);
 
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 636dd59..933198a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,7 +7,6 @@
 
 ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
 		  get_block_t, dio_iodone_t, int flags);
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size);
 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
 int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
-- 
2.5.5

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Vishal Verma, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dan Williams,
	Dave Chinner, Jan Kara, Jens Axboe, Al Viro, Andrew Morton

From: Matthew Wilcox <matthew.r.wilcox@intel.com>

dax_clear_sectors() cannot handle poisoned blocks.  These must be
zeroed using the BIO interface instead.  Convert ext2 and XFS to use
only sb_issue_zerout().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
[vishal: Also remove the dax_clear_sectors function entirely]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/dax.c               | 32 --------------------------------
 fs/ext2/inode.c        |  7 +++----
 fs/xfs/xfs_bmap_util.c |  9 ---------
 include/linux/dax.h    |  1 -
 4 files changed, 3 insertions(+), 46 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bb7e9f8..a30481e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
 	return page;
 }
 
-/*
- * dax_clear_sectors() is called from within transaction context from XFS,
- * and hence this means the stack from this point must follow GFP_NOFS
- * semantics for all operations.
- */
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
-{
-	struct blk_dax_ctl dax = {
-		.sector = _sector,
-		.size = _size,
-	};
-
-	might_sleep();
-	do {
-		long count, sz;
-
-		count = dax_map_atomic(bdev, &dax);
-		if (count < 0)
-			return count;
-		sz = min_t(long, count, SZ_128K);
-		clear_pmem(dax.addr, sz);
-		dax.size -= sz;
-		dax.sector += sz / 512;
-		dax_unmap_atomic(bdev, &dax);
-		cond_resched();
-	} while (dax.size);
-
-	wmb_pmem();
-	return 0;
-}
-EXPORT_SYMBOL_GPL(dax_clear_sectors);
-
 /* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
 static void dax_new_buf(void __pmem *addr, unsigned size, unsigned first,
 		loff_t pos, loff_t end)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 6bd58e6..824f249 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -26,6 +26,7 @@
 #include <linux/highuid.h>
 #include <linux/pagemap.h>
 #include <linux/dax.h>
+#include <linux/blkdev.h>
 #include <linux/quotaops.h>
 #include <linux/writeback.h>
 #include <linux/buffer_head.h>
@@ -737,10 +738,8 @@ static int ext2_get_blocks(struct inode *inode,
 		 * so that it's not found by another thread before it's
 		 * initialised
 		 */
-		err = dax_clear_sectors(inode->i_sb->s_bdev,
-				le32_to_cpu(chain[depth-1].key) <<
-				(inode->i_blkbits - 9),
-				1 << inode->i_blkbits);
+		err = sb_issue_zeroout(inode->i_sb,
+				le32_to_cpu(chain[depth-1].key), 1, GFP_NOFS);
 		if (err) {
 			mutex_unlock(&ei->truncate_mutex);
 			goto cleanup;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6c87601..23a759a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -72,16 +72,7 @@ xfs_zero_extent(
 	struct xfs_mount *mp = ip->i_mount;
 	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
 	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
-	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
 
-	if (IS_DAX(VFS_I(ip)))
-		return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
-				sector, size);
-
-	/*
-	 * let the block layer decide on the fastest method of
-	 * implementing the zeroing.
-	 */
 	return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);
 
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 636dd59..933198a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,7 +7,6 @@
 
 ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
 		  get_block_t, dio_iodone_t, int flags);
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size);
 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
 int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, Vishal Verma, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel,
	Ross Zwisler, linux-ext4, Dan Williams, Al Viro

From: Matthew Wilcox <matthew.r.wilcox@intel.com>

dax_clear_sectors() cannot handle poisoned blocks.  These must be
zeroed using the BIO interface instead.  Convert ext2 and XFS to use
only sb_issue_zerout().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
[vishal: Also remove the dax_clear_sectors function entirely]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/dax.c               | 32 --------------------------------
 fs/ext2/inode.c        |  7 +++----
 fs/xfs/xfs_bmap_util.c |  9 ---------
 include/linux/dax.h    |  1 -
 4 files changed, 3 insertions(+), 46 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bb7e9f8..a30481e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
 	return page;
 }
 
-/*
- * dax_clear_sectors() is called from within transaction context from XFS,
- * and hence this means the stack from this point must follow GFP_NOFS
- * semantics for all operations.
- */
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
-{
-	struct blk_dax_ctl dax = {
-		.sector = _sector,
-		.size = _size,
-	};
-
-	might_sleep();
-	do {
-		long count, sz;
-
-		count = dax_map_atomic(bdev, &dax);
-		if (count < 0)
-			return count;
-		sz = min_t(long, count, SZ_128K);
-		clear_pmem(dax.addr, sz);
-		dax.size -= sz;
-		dax.sector += sz / 512;
-		dax_unmap_atomic(bdev, &dax);
-		cond_resched();
-	} while (dax.size);
-
-	wmb_pmem();
-	return 0;
-}
-EXPORT_SYMBOL_GPL(dax_clear_sectors);
-
 /* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
 static void dax_new_buf(void __pmem *addr, unsigned size, unsigned first,
 		loff_t pos, loff_t end)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 6bd58e6..824f249 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -26,6 +26,7 @@
 #include <linux/highuid.h>
 #include <linux/pagemap.h>
 #include <linux/dax.h>
+#include <linux/blkdev.h>
 #include <linux/quotaops.h>
 #include <linux/writeback.h>
 #include <linux/buffer_head.h>
@@ -737,10 +738,8 @@ static int ext2_get_blocks(struct inode *inode,
 		 * so that it's not found by another thread before it's
 		 * initialised
 		 */
-		err = dax_clear_sectors(inode->i_sb->s_bdev,
-				le32_to_cpu(chain[depth-1].key) <<
-				(inode->i_blkbits - 9),
-				1 << inode->i_blkbits);
+		err = sb_issue_zeroout(inode->i_sb,
+				le32_to_cpu(chain[depth-1].key), 1, GFP_NOFS);
 		if (err) {
 			mutex_unlock(&ei->truncate_mutex);
 			goto cleanup;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6c87601..23a759a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -72,16 +72,7 @@ xfs_zero_extent(
 	struct xfs_mount *mp = ip->i_mount;
 	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
 	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
-	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
 
-	if (IS_DAX(VFS_I(ip)))
-		return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
-				sector, size);
-
-	/*
-	 * let the block layer decide on the fastest method of
-	 * implementing the zeroing.
-	 */
 	return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);
 
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 636dd59..933198a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,7 +7,6 @@
 
 ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
 		  get_block_t, dio_iodone_t, int flags);
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size);
 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
 int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
-- 
2.5.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 5/5] dax: handle media errors in dax_do_io
  2016-03-24 23:17 ` Vishal Verma
  (?)
@ 2016-03-24 23:17   ` Vishal Verma
  -1 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, Dave Chinner, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel, linux-ext4,
	Al Viro

dax_do_io (called for read() or write() for a dax file system) may fail
in the presence of bad blocks or media errors. Since we expect that a
write should clear media errors on nvdimms, make dax_do_io fall back to
the direct_IO path, which will send down a bio to the driver, which can
then attempt to clear the error.

Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/block_dev.c      |  5 +++--
 fs/dax.c            | 34 ++++++++++++++++++++++++++++++++--
 fs/ext2/inode.c     |  5 +++--
 fs/ext4/indirect.c  | 11 +++++++----
 fs/ext4/inode.c     |  5 +++--
 fs/xfs/xfs_aops.c   |  7 ++++---
 include/linux/dax.h |  6 +++++-
 7 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9c0765b..f3873ab 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -168,8 +168,9 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	struct inode *inode = bdev_file_inode(file);
 
 	if (IS_DAX(inode))
-		return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
-				NULL, DIO_SKIP_DIO_COUNT);
+		return dax_do_io(iocb, inode, I_BDEV(inode), iter, offset,
+				blkdev_get_block, blkdev_get_block,
+				NULL, NULL, DIO_SKIP_DIO_COUNT);
 	return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
 				    blkdev_get_block, NULL, NULL,
 				    DIO_SKIP_DIO_COUNT);
diff --git a/fs/dax.c b/fs/dax.c
index a30481e..b90c8e9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -208,7 +208,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 }
 
 /**
- * dax_do_io - Perform I/O to a DAX file
+ * __dax_do_io - Perform I/O to a DAX file
  * @iocb: The control block for this I/O
  * @inode: The file which the I/O is directed at
  * @iter: The addresses to do I/O from or to
@@ -224,7 +224,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
  * As with do_blockdev_direct_IO(), we increment i_dio_count while the I/O
  * is in progress.
  */
-ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+ssize_t __dax_do_io(struct kiocb *iocb, struct inode *inode,
 		  struct iov_iter *iter, loff_t pos, get_block_t get_block,
 		  dio_iodone_t end_io, int flags)
 {
@@ -262,8 +262,38 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
  out:
 	return retval;
 }
+EXPORT_SYMBOL_GPL(__dax_do_io);
+
+/*
+ * This is a library function for use by file systems. It will perform a
+ * fallback to direct_io semantics if the dax_io fails due to a media error.
+ */
+ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+		  struct block_device *bdev, struct iov_iter *iter, loff_t pos,
+		  get_block_t dax_get_block, get_block_t dio_get_block,
+		  dio_iodone_t end_io, dio_submit_t submit_io, int flags)
+{
+	ssize_t retval;
+
+	retval = __dax_do_io(iocb, inode, iter, pos, dax_get_block, end_io,
+				flags);
+	if (iov_iter_rw(iter) == WRITE && retval == -EIO) {
+		/*
+		 * __dax_do_io may have failed a write due to a bad block.
+		 * Retry with direct_io, and if the direct_IO also fails,
+		 * return -EIO as that was the original error that led us
+		 * down the direct_IO path.
+		 */
+		retval = __blockdev_direct_IO(iocb, inode, bdev, iter, pos,
+				dio_get_block, end_io, submit_io, flags);
+		if (retval < 0)
+			return -EIO;
+	}
+	return retval;
+}
 EXPORT_SYMBOL_GPL(dax_do_io);
 
+
 /*
  * The user has performed a load from a hole in the file.  Allocating
  * a new page in the file would cause excessive storage usage for
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 824f249..8a307cf 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -862,8 +862,9 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	ssize_t ret;
 
 	if (IS_DAX(inode))
-		ret = dax_do_io(iocb, inode, iter, offset, ext2_get_block, NULL,
-				DIO_LOCKING);
+		ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+				offset, ext2_get_block, ext2_get_block,
+				NULL, NULL, DIO_LOCKING | DIO_SKIP_HOLES);
 	else
 		ret = blockdev_direct_IO(iocb, inode, iter, offset,
 					 ext2_get_block);
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 355ef9c..4b087b7 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -692,8 +692,9 @@ retry:
 			goto locked;
 		}
 		if (IS_DAX(inode))
-			ret = dax_do_io(iocb, inode, iter, offset,
-					ext4_get_block, NULL, 0);
+			ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+					offset, ext4_get_block, ext4_get_block,
+					NULL, NULL, 0);
 		else
 			ret = __blockdev_direct_IO(iocb, inode,
 						   inode->i_sb->s_bdev, iter,
@@ -703,8 +704,10 @@ retry:
 	} else {
 locked:
 		if (IS_DAX(inode))
-			ret = dax_do_io(iocb, inode, iter, offset,
-					ext4_get_block, NULL, DIO_LOCKING);
+			ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+					offset, ext4_get_block, ext4_get_block,
+					NULL, NULL, DIO_LOCKING |
+					DIO_SKIP_HOLES);
 		else
 			ret = blockdev_direct_IO(iocb, inode, iter, offset,
 						 ext4_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index aee960b..4220dac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3315,8 +3315,9 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	BUG_ON(ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode));
 #endif
 	if (IS_DAX(inode))
-		ret = dax_do_io(iocb, inode, iter, offset, get_block_func,
-				ext4_end_io_dio, dio_flags);
+		ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter, offset,
+				get_block_func, get_block_func,
+				ext4_end_io_dio, NULL, dio_flags);
 	else
 		ret = __blockdev_direct_IO(iocb, inode,
 					   inode->i_sb->s_bdev, iter, offset,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a9ebabfe..dc4e088 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1682,11 +1682,12 @@ xfs_vm_do_dio(
 					 void		*private),
 	int			flags)
 {
-	struct block_device	*bdev;
+	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
 
 	if (IS_DAX(inode))
-		return dax_do_io(iocb, inode, iter, offset,
-				 xfs_get_blocks_direct, endio, 0);
+		return dax_do_io(iocb, inode, bdev, iter, offset,
+				 xfs_get_blocks_direct, xfs_get_blocks_direct,
+				 endio, NULL, flags);
 
 	bdev = xfs_find_bdev_for_inode(inode);
 	return  __blockdev_direct_IO(iocb, inode, bdev, iter, offset,
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 933198a..6981076 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -5,8 +5,12 @@
 #include <linux/mm.h>
 #include <asm/pgtable.h>
 
-ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
+ssize_t __dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
 		  get_block_t, dio_iodone_t, int flags);
+ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+		  struct block_device *bdev, struct iov_iter *iter, loff_t pos,
+		  get_block_t dax_get_block, get_block_t dio_get_block,
+		  dio_iodone_t end_io, dio_submit_t submit_io, int flags);
 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
 int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
-- 
2.5.5

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Vishal Verma, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dan Williams,
	Dave Chinner, Jan Kara, Jens Axboe, Al Viro, Andrew Morton

dax_do_io (called for read() or write() for a dax file system) may fail
in the presence of bad blocks or media errors. Since we expect that a
write should clear media errors on nvdimms, make dax_do_io fall back to
the direct_IO path, which will send down a bio to the driver, which can
then attempt to clear the error.

Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/block_dev.c      |  5 +++--
 fs/dax.c            | 34 ++++++++++++++++++++++++++++++++--
 fs/ext2/inode.c     |  5 +++--
 fs/ext4/indirect.c  | 11 +++++++----
 fs/ext4/inode.c     |  5 +++--
 fs/xfs/xfs_aops.c   |  7 ++++---
 include/linux/dax.h |  6 +++++-
 7 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9c0765b..f3873ab 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -168,8 +168,9 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	struct inode *inode = bdev_file_inode(file);
 
 	if (IS_DAX(inode))
-		return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
-				NULL, DIO_SKIP_DIO_COUNT);
+		return dax_do_io(iocb, inode, I_BDEV(inode), iter, offset,
+				blkdev_get_block, blkdev_get_block,
+				NULL, NULL, DIO_SKIP_DIO_COUNT);
 	return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
 				    blkdev_get_block, NULL, NULL,
 				    DIO_SKIP_DIO_COUNT);
diff --git a/fs/dax.c b/fs/dax.c
index a30481e..b90c8e9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -208,7 +208,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 }
 
 /**
- * dax_do_io - Perform I/O to a DAX file
+ * __dax_do_io - Perform I/O to a DAX file
  * @iocb: The control block for this I/O
  * @inode: The file which the I/O is directed at
  * @iter: The addresses to do I/O from or to
@@ -224,7 +224,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
  * As with do_blockdev_direct_IO(), we increment i_dio_count while the I/O
  * is in progress.
  */
-ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+ssize_t __dax_do_io(struct kiocb *iocb, struct inode *inode,
 		  struct iov_iter *iter, loff_t pos, get_block_t get_block,
 		  dio_iodone_t end_io, int flags)
 {
@@ -262,8 +262,38 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
  out:
 	return retval;
 }
+EXPORT_SYMBOL_GPL(__dax_do_io);
+
+/*
+ * This is a library function for use by file systems. It will perform a
+ * fallback to direct_io semantics if the dax_io fails due to a media error.
+ */
+ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+		  struct block_device *bdev, struct iov_iter *iter, loff_t pos,
+		  get_block_t dax_get_block, get_block_t dio_get_block,
+		  dio_iodone_t end_io, dio_submit_t submit_io, int flags)
+{
+	ssize_t retval;
+
+	retval = __dax_do_io(iocb, inode, iter, pos, dax_get_block, end_io,
+				flags);
+	if (iov_iter_rw(iter) == WRITE && retval == -EIO) {
+		/*
+		 * __dax_do_io may have failed a write due to a bad block.
+		 * Retry with direct_io, and if the direct_IO also fails,
+		 * return -EIO as that was the original error that led us
+		 * down the direct_IO path.
+		 */
+		retval = __blockdev_direct_IO(iocb, inode, bdev, iter, pos,
+				dio_get_block, end_io, submit_io, flags);
+		if (retval < 0)
+			return -EIO;
+	}
+	return retval;
+}
 EXPORT_SYMBOL_GPL(dax_do_io);
 
+
 /*
  * The user has performed a load from a hole in the file.  Allocating
  * a new page in the file would cause excessive storage usage for
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 824f249..8a307cf 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -862,8 +862,9 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	ssize_t ret;
 
 	if (IS_DAX(inode))
-		ret = dax_do_io(iocb, inode, iter, offset, ext2_get_block, NULL,
-				DIO_LOCKING);
+		ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+				offset, ext2_get_block, ext2_get_block,
+				NULL, NULL, DIO_LOCKING | DIO_SKIP_HOLES);
 	else
 		ret = blockdev_direct_IO(iocb, inode, iter, offset,
 					 ext2_get_block);
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 355ef9c..4b087b7 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -692,8 +692,9 @@ retry:
 			goto locked;
 		}
 		if (IS_DAX(inode))
-			ret = dax_do_io(iocb, inode, iter, offset,
-					ext4_get_block, NULL, 0);
+			ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+					offset, ext4_get_block, ext4_get_block,
+					NULL, NULL, 0);
 		else
 			ret = __blockdev_direct_IO(iocb, inode,
 						   inode->i_sb->s_bdev, iter,
@@ -703,8 +704,10 @@ retry:
 	} else {
 locked:
 		if (IS_DAX(inode))
-			ret = dax_do_io(iocb, inode, iter, offset,
-					ext4_get_block, NULL, DIO_LOCKING);
+			ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+					offset, ext4_get_block, ext4_get_block,
+					NULL, NULL, DIO_LOCKING |
+					DIO_SKIP_HOLES);
 		else
 			ret = blockdev_direct_IO(iocb, inode, iter, offset,
 						 ext4_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index aee960b..4220dac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3315,8 +3315,9 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	BUG_ON(ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode));
 #endif
 	if (IS_DAX(inode))
-		ret = dax_do_io(iocb, inode, iter, offset, get_block_func,
-				ext4_end_io_dio, dio_flags);
+		ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter, offset,
+				get_block_func, get_block_func,
+				ext4_end_io_dio, NULL, dio_flags);
 	else
 		ret = __blockdev_direct_IO(iocb, inode,
 					   inode->i_sb->s_bdev, iter, offset,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a9ebabfe..dc4e088 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1682,11 +1682,12 @@ xfs_vm_do_dio(
 					 void		*private),
 	int			flags)
 {
-	struct block_device	*bdev;
+	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
 
 	if (IS_DAX(inode))
-		return dax_do_io(iocb, inode, iter, offset,
-				 xfs_get_blocks_direct, endio, 0);
+		return dax_do_io(iocb, inode, bdev, iter, offset,
+				 xfs_get_blocks_direct, xfs_get_blocks_direct,
+				 endio, NULL, flags);
 
 	bdev = xfs_find_bdev_for_inode(inode);
 	return  __blockdev_direct_IO(iocb, inode, bdev, iter, offset,
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 933198a..6981076 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -5,8 +5,12 @@
 #include <linux/mm.h>
 #include <asm/pgtable.h>
 
-ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
+ssize_t __dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
 		  get_block_t, dio_iodone_t, int flags);
+ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+		  struct block_device *bdev, struct iov_iter *iter, loff_t pos,
+		  get_block_t dax_get_block, get_block_t dio_get_block,
+		  dio_iodone_t end_io, dio_submit_t submit_io, int flags);
 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
 int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
-- 
2.5.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-24 23:17   ` Vishal Verma
  0 siblings, 0 replies; 74+ messages in thread
From: Vishal Verma @ 2016-03-24 23:17 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jens Axboe, Jan Kara, Andrew Morton, Vishal Verma, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel,
	Ross Zwisler, linux-ext4, Dan Williams, Al Viro

dax_do_io (called for read() or write() for a dax file system) may fail
in the presence of bad blocks or media errors. Since we expect that a
write should clear media errors on nvdimms, make dax_do_io fall back to
the direct_IO path, which will send down a bio to the driver, which can
then attempt to clear the error.

Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/block_dev.c      |  5 +++--
 fs/dax.c            | 34 ++++++++++++++++++++++++++++++++--
 fs/ext2/inode.c     |  5 +++--
 fs/ext4/indirect.c  | 11 +++++++----
 fs/ext4/inode.c     |  5 +++--
 fs/xfs/xfs_aops.c   |  7 ++++---
 include/linux/dax.h |  6 +++++-
 7 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9c0765b..f3873ab 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -168,8 +168,9 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	struct inode *inode = bdev_file_inode(file);
 
 	if (IS_DAX(inode))
-		return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
-				NULL, DIO_SKIP_DIO_COUNT);
+		return dax_do_io(iocb, inode, I_BDEV(inode), iter, offset,
+				blkdev_get_block, blkdev_get_block,
+				NULL, NULL, DIO_SKIP_DIO_COUNT);
 	return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
 				    blkdev_get_block, NULL, NULL,
 				    DIO_SKIP_DIO_COUNT);
diff --git a/fs/dax.c b/fs/dax.c
index a30481e..b90c8e9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -208,7 +208,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 }
 
 /**
- * dax_do_io - Perform I/O to a DAX file
+ * __dax_do_io - Perform I/O to a DAX file
  * @iocb: The control block for this I/O
  * @inode: The file which the I/O is directed at
  * @iter: The addresses to do I/O from or to
@@ -224,7 +224,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
  * As with do_blockdev_direct_IO(), we increment i_dio_count while the I/O
  * is in progress.
  */
-ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+ssize_t __dax_do_io(struct kiocb *iocb, struct inode *inode,
 		  struct iov_iter *iter, loff_t pos, get_block_t get_block,
 		  dio_iodone_t end_io, int flags)
 {
@@ -262,8 +262,38 @@ ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
  out:
 	return retval;
 }
+EXPORT_SYMBOL_GPL(__dax_do_io);
+
+/*
+ * This is a library function for use by file systems. It will perform a
+ * fallback to direct_io semantics if the dax_io fails due to a media error.
+ */
+ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+		  struct block_device *bdev, struct iov_iter *iter, loff_t pos,
+		  get_block_t dax_get_block, get_block_t dio_get_block,
+		  dio_iodone_t end_io, dio_submit_t submit_io, int flags)
+{
+	ssize_t retval;
+
+	retval = __dax_do_io(iocb, inode, iter, pos, dax_get_block, end_io,
+				flags);
+	if (iov_iter_rw(iter) == WRITE && retval == -EIO) {
+		/*
+		 * __dax_do_io may have failed a write due to a bad block.
+		 * Retry with direct_io, and if the direct_IO also fails,
+		 * return -EIO as that was the original error that led us
+		 * down the direct_IO path.
+		 */
+		retval = __blockdev_direct_IO(iocb, inode, bdev, iter, pos,
+				dio_get_block, end_io, submit_io, flags);
+		if (retval < 0)
+			return -EIO;
+	}
+	return retval;
+}
 EXPORT_SYMBOL_GPL(dax_do_io);
 
+
 /*
  * The user has performed a load from a hole in the file.  Allocating
  * a new page in the file would cause excessive storage usage for
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 824f249..8a307cf 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -862,8 +862,9 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	ssize_t ret;
 
 	if (IS_DAX(inode))
-		ret = dax_do_io(iocb, inode, iter, offset, ext2_get_block, NULL,
-				DIO_LOCKING);
+		ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+				offset, ext2_get_block, ext2_get_block,
+				NULL, NULL, DIO_LOCKING | DIO_SKIP_HOLES);
 	else
 		ret = blockdev_direct_IO(iocb, inode, iter, offset,
 					 ext2_get_block);
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 355ef9c..4b087b7 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -692,8 +692,9 @@ retry:
 			goto locked;
 		}
 		if (IS_DAX(inode))
-			ret = dax_do_io(iocb, inode, iter, offset,
-					ext4_get_block, NULL, 0);
+			ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+					offset, ext4_get_block, ext4_get_block,
+					NULL, NULL, 0);
 		else
 			ret = __blockdev_direct_IO(iocb, inode,
 						   inode->i_sb->s_bdev, iter,
@@ -703,8 +704,10 @@ retry:
 	} else {
 locked:
 		if (IS_DAX(inode))
-			ret = dax_do_io(iocb, inode, iter, offset,
-					ext4_get_block, NULL, DIO_LOCKING);
+			ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter,
+					offset, ext4_get_block, ext4_get_block,
+					NULL, NULL, DIO_LOCKING |
+					DIO_SKIP_HOLES);
 		else
 			ret = blockdev_direct_IO(iocb, inode, iter, offset,
 						 ext4_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index aee960b..4220dac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3315,8 +3315,9 @@ static ssize_t ext4_ext_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	BUG_ON(ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode));
 #endif
 	if (IS_DAX(inode))
-		ret = dax_do_io(iocb, inode, iter, offset, get_block_func,
-				ext4_end_io_dio, dio_flags);
+		ret = dax_do_io(iocb, inode, inode->i_sb->s_bdev, iter, offset,
+				get_block_func, get_block_func,
+				ext4_end_io_dio, NULL, dio_flags);
 	else
 		ret = __blockdev_direct_IO(iocb, inode,
 					   inode->i_sb->s_bdev, iter, offset,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a9ebabfe..dc4e088 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1682,11 +1682,12 @@ xfs_vm_do_dio(
 					 void		*private),
 	int			flags)
 {
-	struct block_device	*bdev;
+	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
 
 	if (IS_DAX(inode))
-		return dax_do_io(iocb, inode, iter, offset,
-				 xfs_get_blocks_direct, endio, 0);
+		return dax_do_io(iocb, inode, bdev, iter, offset,
+				 xfs_get_blocks_direct, xfs_get_blocks_direct,
+				 endio, NULL, flags);
 
 	bdev = xfs_find_bdev_for_inode(inode);
 	return  __blockdev_direct_IO(iocb, inode, bdev, iter, offset,
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 933198a..6981076 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -5,8 +5,12 @@
 #include <linux/mm.h>
 #include <asm/pgtable.h>
 
-ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
+ssize_t __dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
 		  get_block_t, dio_iodone_t, int flags);
+ssize_t dax_do_io(struct kiocb *iocb, struct inode *inode,
+		  struct block_device *bdev, struct iov_iter *iter, loff_t pos,
+		  get_block_t dax_get_block, get_block_t dio_get_block,
+		  dio_iodone_t end_io, dio_submit_t submit_io, int flags);
 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
 int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
-- 
2.5.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/5] dax: enable dax in the presence of known media errors (badblocks)
  2016-03-24 23:17   ` Vishal Verma
  (?)
@ 2016-03-24 23:23     ` Verma, Vishal L
  -1 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-24 23:23 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: axboe, jack, david, xfs, linux-block, linux-mm, Wilcox,
	Matthew R  <matthew.r.wilcox@intel.com>,
	linux-fsdevel@vger.kernel.org, akpm, linux-ext4, viro

On Thu, 2016-03-24 at 17:17 -0600, Vishal Verma wrote:
> From: Dan Williams <dan.j.williams@intel.com>
> 
> From: Dan Williams <dan.j.williams@intel.com>

Eep, not sure how this happened, looked alright in the patches!
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/5] dax: enable dax in the presence of known media errors (badblocks)
@ 2016-03-24 23:23     ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-24 23:23 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Wilcox, Matthew R, linux-block, xfs, linux-mm, viro, axboe, akpm,
	linux-fsdevel, linux-ext4, david, jack

On Thu, 2016-03-24 at 17:17 -0600, Vishal Verma wrote:
> From: Dan Williams <dan.j.williams@intel.com>
> 
> From: Dan Williams <dan.j.williams@intel.com>

Eep, not sure how this happened, looked alright in the patches!

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/5] dax: enable dax in the presence of known media errors (badblocks)
@ 2016-03-24 23:23     ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-24 23:23 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: axboe, jack, xfs, linux-block, linux-mm, Wilcox, Matthew R,
	linux-fsdevel, akpm, linux-ext4, viro

On Thu, 2016-03-24 at 17:17 -0600, Vishal Verma wrote:
> From: Dan Williams <dan.j.williams@intel.com>
> 
> From: Dan Williams <dan.j.williams@intel.com>

Eep, not sure how this happened, looked alright in the patches!
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-24 23:17   ` Vishal Verma
@ 2016-03-25 10:44     ` Christoph Hellwig
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2016-03-25 10:44 UTC (permalink / raw)
  To: Vishal Verma
  Cc: linux-nvdimm, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dan Williams,
	Dave Chinner, Jan Kara, Jens Axboe, Al Viro, Andrew Morton

On Thu, Mar 24, 2016 at 05:17:29PM -0600, Vishal Verma wrote:
> @@ -72,16 +72,7 @@ xfs_zero_extent(
>  	struct xfs_mount *mp = ip->i_mount;
>  	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
>  	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
> -	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
>  
> -	if (IS_DAX(VFS_I(ip)))
> -		return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
> -				sector, size);
> -
> -	/*
> -	 * let the block layer decide on the fastest method of
> -	 * implementing the zeroing.
> -	 */
>  	return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);

While not new: using sb_issue_zeroout in XFS is wrong as it doesn't
account for the RT device.  We need the xfs_find_bdev_for_inode and
call blkdev_issue_zeroout directly with the bdev it returned.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-25 10:44     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2016-03-25 10:44 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Jens Axboe, Jan Kara, Andrew Morton, linux-nvdimm, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel,
	Ross Zwisler, linux-ext4, Dan Williams, Al Viro

On Thu, Mar 24, 2016 at 05:17:29PM -0600, Vishal Verma wrote:
> @@ -72,16 +72,7 @@ xfs_zero_extent(
>  	struct xfs_mount *mp = ip->i_mount;
>  	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
>  	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
> -	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
>  
> -	if (IS_DAX(VFS_I(ip)))
> -		return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
> -				sector, size);
> -
> -	/*
> -	 * let the block layer decide on the fastest method of
> -	 * implementing the zeroing.
> -	 */
>  	return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);

While not new: using sb_issue_zeroout in XFS is wrong as it doesn't
account for the RT device.  We need the xfs_find_bdev_for_inode and
call blkdev_issue_zeroout directly with the bdev it returned.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
  2016-03-24 23:17   ` Vishal Verma
  (?)
@ 2016-03-25 10:45     ` Christoph Hellwig
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2016-03-25 10:45 UTC (permalink / raw)
  To: Vishal Verma
  Cc: linux-nvdimm, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dan Williams,
	Dave Chinner, Jan Kara, Jens Axboe, Al Viro, Andrew Morton

On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> dax_do_io (called for read() or write() for a dax file system) may fail
> in the presence of bad blocks or media errors. Since we expect that a
> write should clear media errors on nvdimms, make dax_do_io fall back to
> the direct_IO path, which will send down a bio to the driver, which can
> then attempt to clear the error.

Leave the fallback on -EIO to the callers please.  They generally call
__blockdev_direct_IO anyway, so it should actually become simpler that
way.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-25 10:45     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2016-03-25 10:45 UTC (permalink / raw)
  To: Vishal Verma
  Cc: linux-nvdimm, linux-fsdevel, linux-block, xfs, linux-ext4,
	linux-mm, Matthew Wilcox, Ross Zwisler, Dan Williams,
	Dave Chinner, Jan Kara, Jens Axboe, Al Viro, Andrew Morton

On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> dax_do_io (called for read() or write() for a dax file system) may fail
> in the presence of bad blocks or media errors. Since we expect that a
> write should clear media errors on nvdimms, make dax_do_io fall back to
> the direct_IO path, which will send down a bio to the driver, which can
> then attempt to clear the error.

Leave the fallback on -EIO to the callers please.  They generally call
__blockdev_direct_IO anyway, so it should actually become simpler that
way.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-25 10:45     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2016-03-25 10:45 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Jens Axboe, Jan Kara, Andrew Morton, linux-nvdimm, xfs,
	linux-block, linux-mm, Matthew Wilcox, linux-fsdevel,
	Ross Zwisler, linux-ext4, Dan Williams, Al Viro

On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> dax_do_io (called for read() or write() for a dax file system) may fail
> in the presence of bad blocks or media errors. Since we expect that a
> write should clear media errors on nvdimms, make dax_do_io fall back to
> the direct_IO path, which will send down a bio to the driver, which can
> then attempt to clear the error.

Leave the fallback on -EIO to the callers please.  They generally call
__blockdev_direct_IO anyway, so it should actually become simpler that
way.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-24 23:17   ` Vishal Verma
  (?)
@ 2016-03-25 18:47     ` Dan Williams
  -1 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-25 18:47 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Jens Axboe, Jan Kara, linux-nvdimm, Dave Chinner, XFS Developers,
	linux-block, Linux MM, Matthew Wilcox, linux-fsdevel, linux-ext4,
	Andrew Morton, Al Viro

On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>
> dax_clear_sectors() cannot handle poisoned blocks.  These must be
> zeroed using the BIO interface instead.  Convert ext2 and XFS to use
> only sb_issue_zerout().
>
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> [vishal: Also remove the dax_clear_sectors function entirely]
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  fs/dax.c               | 32 --------------------------------
>  fs/ext2/inode.c        |  7 +++----
>  fs/xfs/xfs_bmap_util.c |  9 ---------
>  include/linux/dax.h    |  1 -
>  4 files changed, 3 insertions(+), 46 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index bb7e9f8..a30481e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
>         return page;
>  }
>
> -/*
> - * dax_clear_sectors() is called from within transaction context from XFS,
> - * and hence this means the stack from this point must follow GFP_NOFS
> - * semantics for all operations.
> - */
> -int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
> -{
> -       struct blk_dax_ctl dax = {
> -               .sector = _sector,
> -               .size = _size,
> -       };
> -
> -       might_sleep();
> -       do {
> -               long count, sz;
> -
> -               count = dax_map_atomic(bdev, &dax);
> -               if (count < 0)
> -                       return count;
> -               sz = min_t(long, count, SZ_128K);
> -               clear_pmem(dax.addr, sz);
> -               dax.size -= sz;
> -               dax.sector += sz / 512;
> -               dax_unmap_atomic(bdev, &dax);
> -               cond_resched();
> -       } while (dax.size);
> -
> -       wmb_pmem();
> -       return 0;
> -}
> -EXPORT_SYMBOL_GPL(dax_clear_sectors);

What about the other unwritten extent conversions in the dax path?
Shouldn't those be converted to block-layer zero-outs as well?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-25 18:47     ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-25 18:47 UTC (permalink / raw)
  To: Vishal Verma
  Cc: linux-nvdimm, linux-fsdevel, linux-block, XFS Developers,
	linux-ext4, Linux MM, Matthew Wilcox, Ross Zwisler, Dave Chinner,
	Jan Kara, Jens Axboe, Al Viro, Andrew Morton

On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>
> dax_clear_sectors() cannot handle poisoned blocks.  These must be
> zeroed using the BIO interface instead.  Convert ext2 and XFS to use
> only sb_issue_zerout().
>
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> [vishal: Also remove the dax_clear_sectors function entirely]
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  fs/dax.c               | 32 --------------------------------
>  fs/ext2/inode.c        |  7 +++----
>  fs/xfs/xfs_bmap_util.c |  9 ---------
>  include/linux/dax.h    |  1 -
>  4 files changed, 3 insertions(+), 46 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index bb7e9f8..a30481e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
>         return page;
>  }
>
> -/*
> - * dax_clear_sectors() is called from within transaction context from XFS,
> - * and hence this means the stack from this point must follow GFP_NOFS
> - * semantics for all operations.
> - */
> -int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
> -{
> -       struct blk_dax_ctl dax = {
> -               .sector = _sector,
> -               .size = _size,
> -       };
> -
> -       might_sleep();
> -       do {
> -               long count, sz;
> -
> -               count = dax_map_atomic(bdev, &dax);
> -               if (count < 0)
> -                       return count;
> -               sz = min_t(long, count, SZ_128K);
> -               clear_pmem(dax.addr, sz);
> -               dax.size -= sz;
> -               dax.sector += sz / 512;
> -               dax_unmap_atomic(bdev, &dax);
> -               cond_resched();
> -       } while (dax.size);
> -
> -       wmb_pmem();
> -       return 0;
> -}
> -EXPORT_SYMBOL_GPL(dax_clear_sectors);

What about the other unwritten extent conversions in the dax path?
Shouldn't those be converted to block-layer zero-outs as well?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-25 18:47     ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-25 18:47 UTC (permalink / raw)
  To: Vishal Verma
  Cc: Jens Axboe, Jan Kara, linux-nvdimm, XFS Developers, linux-block,
	Linux MM, Matthew Wilcox, linux-fsdevel, Ross Zwisler,
	linux-ext4, Andrew Morton, Al Viro

On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>
> dax_clear_sectors() cannot handle poisoned blocks.  These must be
> zeroed using the BIO interface instead.  Convert ext2 and XFS to use
> only sb_issue_zerout().
>
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> [vishal: Also remove the dax_clear_sectors function entirely]
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  fs/dax.c               | 32 --------------------------------
>  fs/ext2/inode.c        |  7 +++----
>  fs/xfs/xfs_bmap_util.c |  9 ---------
>  include/linux/dax.h    |  1 -
>  4 files changed, 3 insertions(+), 46 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index bb7e9f8..a30481e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
>         return page;
>  }
>
> -/*
> - * dax_clear_sectors() is called from within transaction context from XFS,
> - * and hence this means the stack from this point must follow GFP_NOFS
> - * semantics for all operations.
> - */
> -int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
> -{
> -       struct blk_dax_ctl dax = {
> -               .sector = _sector,
> -               .size = _size,
> -       };
> -
> -       might_sleep();
> -       do {
> -               long count, sz;
> -
> -               count = dax_map_atomic(bdev, &dax);
> -               if (count < 0)
> -                       return count;
> -               sz = min_t(long, count, SZ_128K);
> -               clear_pmem(dax.addr, sz);
> -               dax.size -= sz;
> -               dax.sector += sz / 512;
> -               dax_unmap_atomic(bdev, &dax);
> -               cond_resched();
> -       } while (dax.size);
> -
> -       wmb_pmem();
> -       return 0;
> -}
> -EXPORT_SYMBOL_GPL(dax_clear_sectors);

What about the other unwritten extent conversions in the dax path?
Shouldn't those be converted to block-layer zero-outs as well?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
  2016-03-25 10:45     ` Christoph Hellwig
@ 2016-03-25 20:59       ` Verma, Vishal L
  -1 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 20:59 UTC (permalink / raw)
  To: hch
  Cc: linux-block, xfs, linux-nvdimm, linux-mm, viro, Williams, Dan J,
	axboe, akpm, linux-fsdevel, ross.zwisler, linux-ext4, Wilcox,
	Matthew R, david, jack

On Fri, 2016-03-25 at 03:45 -0700, Christoph Hellwig wrote:
> On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> > 
> > dax_do_io (called for read() or write() for a dax file system) may
> > fail
> > in the presence of bad blocks or media errors. Since we expect that
> > a
> > write should clear media errors on nvdimms, make dax_do_io fall
> > back to
> > the direct_IO path, which will send down a bio to the driver, which
> > can
> > then attempt to clear the error.
> Leave the fallback on -EIO to the callers please.  They generally
> call
> __blockdev_direct_IO anyway, so it should actually become simpler
> that
> way.

I thought of this, but made the retrying happen in the wrapper so that
it can be centralized. If the callers were to become responsible for
the retry, then any new callers of dax_do_io might not realize they are
responsible for retrying, and hit problems. Another tricky point might
be: in the wrapper, if __dax_do_io failed with -EIO, and subsequently
__blockdev_direct_IO also fails with a *different* error, I chose to
return -EIO because that was the 'first' error we hit and caused us to
fallback.. (Does this even seem reasonable?) And if so, do we want to
push back this decision too, to the callers?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-25 20:59       ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 20:59 UTC (permalink / raw)
  To: hch
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	ross.zwisler, linux-fsdevel, Williams, Dan J, linux-ext4, akpm,
	Wilcox, Matthew R

On Fri, 2016-03-25 at 03:45 -0700, Christoph Hellwig wrote:
> On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> > 
> > dax_do_io (called for read() or write() for a dax file system) may
> > fail
> > in the presence of bad blocks or media errors. Since we expect that
> > a
> > write should clear media errors on nvdimms, make dax_do_io fall
> > back to
> > the direct_IO path, which will send down a bio to the driver, which
> > can
> > then attempt to clear the error.
> Leave the fallback on -EIO to the callers please.  They generally
> call
> __blockdev_direct_IO anyway, so it should actually become simpler
> that
> way.

I thought of this, but made the retrying happen in the wrapper so that
it can be centralized. If the callers were to become responsible for
the retry, then any new callers of dax_do_io might not realize they are
responsible for retrying, and hit problems. Another tricky point might
be: in the wrapper, if __dax_do_io failed with -EIO, and subsequently
__blockdev_direct_IO also fails with a *different* error, I chose to
return -EIO because that was the 'first' error we hit and caused us to
fallback.. (Does this even seem reasonable?) And if so, do we want to
push back this decision too, to the callers?
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-25 10:44     ` Christoph Hellwig
@ 2016-03-25 21:01       ` Verma, Vishal L
  -1 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 21:01 UTC (permalink / raw)
  To: hch
  Cc: linux-block, xfs, linux-nvdimm, linux-mm, viro, Williams, Dan J,
	axboe, akpm, linux-fsdevel, ross.zwisler, linux-ext4, Wilcox,
	Matthew R, david, jack

On Fri, 2016-03-25 at 03:44 -0700, Christoph Hellwig wrote:
> On Thu, Mar 24, 2016 at 05:17:29PM -0600, Vishal Verma wrote:
> > 
> > @@ -72,16 +72,7 @@ xfs_zero_extent(
> >  	struct xfs_mount *mp = ip->i_mount;
> >  	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
> >  	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
> > -	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
> >  
> > -	if (IS_DAX(VFS_I(ip)))
> > -		return
> > dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
> > -				sector, size);
> > -
> > -	/*
> > -	 * let the block layer decide on the fastest method of
> > -	 * implementing the zeroing.
> > -	 */
> >  	return sb_issue_zeroout(mp->m_super, block, count_fsb,
> > GFP_NOFS);
> While not new: using sb_issue_zeroout in XFS is wrong as it doesn't
> account for the RT device.  We need the xfs_find_bdev_for_inode and
> call blkdev_issue_zeroout directly with the bdev it returned.

Ok, I'll fix and send a v2. Thanks!
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-25 21:01       ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 21:01 UTC (permalink / raw)
  To: hch
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	ross.zwisler, linux-fsdevel, Williams, Dan J, linux-ext4, akpm,
	Wilcox, Matthew R

On Fri, 2016-03-25 at 03:44 -0700, Christoph Hellwig wrote:
> On Thu, Mar 24, 2016 at 05:17:29PM -0600, Vishal Verma wrote:
> > 
> > @@ -72,16 +72,7 @@ xfs_zero_extent(
> >  	struct xfs_mount *mp = ip->i_mount;
> >  	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
> >  	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
> > -	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
> >  
> > -	if (IS_DAX(VFS_I(ip)))
> > -		return
> > dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
> > -				sector, size);
> > -
> > -	/*
> > -	 * let the block layer decide on the fastest method of
> > -	 * implementing the zeroing.
> > -	 */
> >  	return sb_issue_zeroout(mp->m_super, block, count_fsb,
> > GFP_NOFS);
> While not new: using sb_issue_zeroout in XFS is wrong as it doesn't
> account for the RT device.  We need the xfs_find_bdev_for_inode and
> call blkdev_issue_zeroout directly with the bdev it returned.

Ok, I'll fix and send a v2. Thanks!
> 
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-25 18:47     ` Dan Williams
  (?)
@ 2016-03-25 21:03       ` Verma, Vishal L
  -1 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 21:03 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: axboe, jack, linux-nvdimm, david, xfs, linux-block, linux-mm,
	viro, linux-fsdevel, akpm, linux-ext4, Wilcox

On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
> On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.c
> om> wrote:
> > 
> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > 
> > dax_clear_sectors() cannot handle poisoned blocks.  These must be
> > zeroed using the BIO interface instead.  Convert ext2 and XFS to
> > use
> > only sb_issue_zerout().
> > 
> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > [vishal: Also remove the dax_clear_sectors function entirely]
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  fs/dax.c               | 32 --------------------------------
> >  fs/ext2/inode.c        |  7 +++----
> >  fs/xfs/xfs_bmap_util.c |  9 ---------
> >  include/linux/dax.h    |  1 -
> >  4 files changed, 3 insertions(+), 46 deletions(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index bb7e9f8..a30481e 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device
> > *bdev, sector_t n)
> >         return page;
> >  }
> > 
> > -/*
> > - * dax_clear_sectors() is called from within transaction context
> > from XFS,
> > - * and hence this means the stack from this point must follow
> > GFP_NOFS
> > - * semantics for all operations.
> > - */
> > -int dax_clear_sectors(struct block_device *bdev, sector_t _sector,
> > long _size)
> > -{
> > -       struct blk_dax_ctl dax = {
> > -               .sector = _sector,
> > -               .size = _size,
> > -       };
> > -
> > -       might_sleep();
> > -       do {
> > -               long count, sz;
> > -
> > -               count = dax_map_atomic(bdev, &dax);
> > -               if (count < 0)
> > -                       return count;
> > -               sz = min_t(long, count, SZ_128K);
> > -               clear_pmem(dax.addr, sz);
> > -               dax.size -= sz;
> > -               dax.sector += sz / 512;
> > -               dax_unmap_atomic(bdev, &dax);
> > -               cond_resched();
> > -       } while (dax.size);
> > -
> > -       wmb_pmem();
> > -       return 0;
> > -}
> > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
> What about the other unwritten extent conversions in the dax path?
> Shouldn't those be converted to block-layer zero-outs as well?

Could you point me to where these might be? I thought once we've
converted all the zeroout type callers (by removing dax_clear_sectors),
and fixed up dax_do_io to try a driver fallback, we've handled all the
media error cases in dax..
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-25 21:03       ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 21:03 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-block, xfs, linux-mm, viro, axboe, akpm, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
> On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.c
> om> wrote:
> > 
> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > 
> > dax_clear_sectors() cannot handle poisoned blocks.  These must be
> > zeroed using the BIO interface instead.  Convert ext2 and XFS to
> > use
> > only sb_issue_zerout().
> > 
> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > [vishal: Also remove the dax_clear_sectors function entirely]
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  fs/dax.c               | 32 --------------------------------
> >  fs/ext2/inode.c        |  7 +++----
> >  fs/xfs/xfs_bmap_util.c |  9 ---------
> >  include/linux/dax.h    |  1 -
> >  4 files changed, 3 insertions(+), 46 deletions(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index bb7e9f8..a30481e 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device
> > *bdev, sector_t n)
> >         return page;
> >  }
> > 
> > -/*
> > - * dax_clear_sectors() is called from within transaction context
> > from XFS,
> > - * and hence this means the stack from this point must follow
> > GFP_NOFS
> > - * semantics for all operations.
> > - */
> > -int dax_clear_sectors(struct block_device *bdev, sector_t _sector,
> > long _size)
> > -{
> > -       struct blk_dax_ctl dax = {
> > -               .sector = _sector,
> > -               .size = _size,
> > -       };
> > -
> > -       might_sleep();
> > -       do {
> > -               long count, sz;
> > -
> > -               count = dax_map_atomic(bdev, &dax);
> > -               if (count < 0)
> > -                       return count;
> > -               sz = min_t(long, count, SZ_128K);
> > -               clear_pmem(dax.addr, sz);
> > -               dax.size -= sz;
> > -               dax.sector += sz / 512;
> > -               dax_unmap_atomic(bdev, &dax);
> > -               cond_resched();
> > -       } while (dax.size);
> > -
> > -       wmb_pmem();
> > -       return 0;
> > -}
> > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
> What about the other unwritten extent conversions in the dax path?
> Shouldn't those be converted to block-layer zero-outs as well?

Could you point me to where these might be? I thought once we've
converted all the zeroout type callers (by removing dax_clear_sectors),
and fixed up dax_do_io to try a driver fallback, we've handled all the
media error cases in dax..

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-25 21:03       ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 21:03 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	linux-fsdevel, akpm, linux-ext4, ross.zwisler, Wilcox, Matthew R

On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
> On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.c
> om> wrote:
> > 
> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > 
> > dax_clear_sectors() cannot handle poisoned blocks.  These must be
> > zeroed using the BIO interface instead.  Convert ext2 and XFS to
> > use
> > only sb_issue_zerout().
> > 
> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > [vishal: Also remove the dax_clear_sectors function entirely]
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  fs/dax.c               | 32 --------------------------------
> >  fs/ext2/inode.c        |  7 +++----
> >  fs/xfs/xfs_bmap_util.c |  9 ---------
> >  include/linux/dax.h    |  1 -
> >  4 files changed, 3 insertions(+), 46 deletions(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index bb7e9f8..a30481e 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device
> > *bdev, sector_t n)
> >         return page;
> >  }
> > 
> > -/*
> > - * dax_clear_sectors() is called from within transaction context
> > from XFS,
> > - * and hence this means the stack from this point must follow
> > GFP_NOFS
> > - * semantics for all operations.
> > - */
> > -int dax_clear_sectors(struct block_device *bdev, sector_t _sector,
> > long _size)
> > -{
> > -       struct blk_dax_ctl dax = {
> > -               .sector = _sector,
> > -               .size = _size,
> > -       };
> > -
> > -       might_sleep();
> > -       do {
> > -               long count, sz;
> > -
> > -               count = dax_map_atomic(bdev, &dax);
> > -               if (count < 0)
> > -                       return count;
> > -               sz = min_t(long, count, SZ_128K);
> > -               clear_pmem(dax.addr, sz);
> > -               dax.size -= sz;
> > -               dax.sector += sz / 512;
> > -               dax_unmap_atomic(bdev, &dax);
> > -               cond_resched();
> > -       } while (dax.size);
> > -
> > -       wmb_pmem();
> > -       return 0;
> > -}
> > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
> What about the other unwritten extent conversions in the dax path?
> Shouldn't those be converted to block-layer zero-outs as well?

Could you point me to where these might be? I thought once we've
converted all the zeroout type callers (by removing dax_clear_sectors),
and fixed up dax_do_io to try a driver fallback, we've handled all the
media error cases in dax..
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-25 21:03       ` Verma, Vishal L
  (?)
@ 2016-03-25 21:20         ` Dan Williams
  -1 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-25 21:20 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, david, xfs, linux-block, linux-mm,
	viro, linux-fsdevel, akpm, linux-ext4, Wilcox

On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.c
>> om> wrote:
>> >
>> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> >
>> > dax_clear_sectors() cannot handle poisoned blocks.  These must be
>> > zeroed using the BIO interface instead.  Convert ext2 and XFS to
>> > use
>> > only sb_issue_zerout().
>> >
>> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > [vishal: Also remove the dax_clear_sectors function entirely]
>> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > ---
>> >  fs/dax.c               | 32 --------------------------------
>> >  fs/ext2/inode.c        |  7 +++----
>> >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> >  include/linux/dax.h    |  1 -
>> >  4 files changed, 3 insertions(+), 46 deletions(-)
>> >
>> > diff --git a/fs/dax.c b/fs/dax.c
>> > index bb7e9f8..a30481e 100644
>> > --- a/fs/dax.c
>> > +++ b/fs/dax.c
>> > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device
>> > *bdev, sector_t n)
>> >         return page;
>> >  }
>> >
>> > -/*
>> > - * dax_clear_sectors() is called from within transaction context
>> > from XFS,
>> > - * and hence this means the stack from this point must follow
>> > GFP_NOFS
>> > - * semantics for all operations.
>> > - */
>> > -int dax_clear_sectors(struct block_device *bdev, sector_t _sector,
>> > long _size)
>> > -{
>> > -       struct blk_dax_ctl dax = {
>> > -               .sector = _sector,
>> > -               .size = _size,
>> > -       };
>> > -
>> > -       might_sleep();
>> > -       do {
>> > -               long count, sz;
>> > -
>> > -               count = dax_map_atomic(bdev, &dax);
>> > -               if (count < 0)
>> > -                       return count;
>> > -               sz = min_t(long, count, SZ_128K);
>> > -               clear_pmem(dax.addr, sz);
>> > -               dax.size -= sz;
>> > -               dax.sector += sz / 512;
>> > -               dax_unmap_atomic(bdev, &dax);
>> > -               cond_resched();
>> > -       } while (dax.size);
>> > -
>> > -       wmb_pmem();
>> > -       return 0;
>> > -}
>> > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> What about the other unwritten extent conversions in the dax path?
>> Shouldn't those be converted to block-layer zero-outs as well?
>
> Could you point me to where these might be? I thought once we've
> converted all the zeroout type callers (by removing dax_clear_sectors),
> and fixed up dax_do_io to try a driver fallback, we've handled all the
> media error cases in dax..

grep for usages of clear_pmem()... which I was hoping to eliminate
after this change to push zeroing down to the driver.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-25 21:20         ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-25 21:20 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: linux-block, xfs, linux-mm, viro, axboe, akpm, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.c
>> om> wrote:
>> >
>> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> >
>> > dax_clear_sectors() cannot handle poisoned blocks.  These must be
>> > zeroed using the BIO interface instead.  Convert ext2 and XFS to
>> > use
>> > only sb_issue_zerout().
>> >
>> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > [vishal: Also remove the dax_clear_sectors function entirely]
>> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > ---
>> >  fs/dax.c               | 32 --------------------------------
>> >  fs/ext2/inode.c        |  7 +++----
>> >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> >  include/linux/dax.h    |  1 -
>> >  4 files changed, 3 insertions(+), 46 deletions(-)
>> >
>> > diff --git a/fs/dax.c b/fs/dax.c
>> > index bb7e9f8..a30481e 100644
>> > --- a/fs/dax.c
>> > +++ b/fs/dax.c
>> > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device
>> > *bdev, sector_t n)
>> >         return page;
>> >  }
>> >
>> > -/*
>> > - * dax_clear_sectors() is called from within transaction context
>> > from XFS,
>> > - * and hence this means the stack from this point must follow
>> > GFP_NOFS
>> > - * semantics for all operations.
>> > - */
>> > -int dax_clear_sectors(struct block_device *bdev, sector_t _sector,
>> > long _size)
>> > -{
>> > -       struct blk_dax_ctl dax = {
>> > -               .sector = _sector,
>> > -               .size = _size,
>> > -       };
>> > -
>> > -       might_sleep();
>> > -       do {
>> > -               long count, sz;
>> > -
>> > -               count = dax_map_atomic(bdev, &dax);
>> > -               if (count < 0)
>> > -                       return count;
>> > -               sz = min_t(long, count, SZ_128K);
>> > -               clear_pmem(dax.addr, sz);
>> > -               dax.size -= sz;
>> > -               dax.sector += sz / 512;
>> > -               dax_unmap_atomic(bdev, &dax);
>> > -               cond_resched();
>> > -       } while (dax.size);
>> > -
>> > -       wmb_pmem();
>> > -       return 0;
>> > -}
>> > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> What about the other unwritten extent conversions in the dax path?
>> Shouldn't those be converted to block-layer zero-outs as well?
>
> Could you point me to where these might be? I thought once we've
> converted all the zeroout type callers (by removing dax_clear_sectors),
> and fixed up dax_do_io to try a driver fallback, we've handled all the
> media error cases in dax..

grep for usages of clear_pmem()... which I was hoping to eliminate
after this change to push zeroing down to the driver.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-25 21:20         ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-25 21:20 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	linux-fsdevel, akpm, linux-ext4, ross.zwisler, Wilcox, Matthew R

On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.c
>> om> wrote:
>> >
>> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> >
>> > dax_clear_sectors() cannot handle poisoned blocks.  These must be
>> > zeroed using the BIO interface instead.  Convert ext2 and XFS to
>> > use
>> > only sb_issue_zerout().
>> >
>> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > [vishal: Also remove the dax_clear_sectors function entirely]
>> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > ---
>> >  fs/dax.c               | 32 --------------------------------
>> >  fs/ext2/inode.c        |  7 +++----
>> >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> >  include/linux/dax.h    |  1 -
>> >  4 files changed, 3 insertions(+), 46 deletions(-)
>> >
>> > diff --git a/fs/dax.c b/fs/dax.c
>> > index bb7e9f8..a30481e 100644
>> > --- a/fs/dax.c
>> > +++ b/fs/dax.c
>> > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device
>> > *bdev, sector_t n)
>> >         return page;
>> >  }
>> >
>> > -/*
>> > - * dax_clear_sectors() is called from within transaction context
>> > from XFS,
>> > - * and hence this means the stack from this point must follow
>> > GFP_NOFS
>> > - * semantics for all operations.
>> > - */
>> > -int dax_clear_sectors(struct block_device *bdev, sector_t _sector,
>> > long _size)
>> > -{
>> > -       struct blk_dax_ctl dax = {
>> > -               .sector = _sector,
>> > -               .size = _size,
>> > -       };
>> > -
>> > -       might_sleep();
>> > -       do {
>> > -               long count, sz;
>> > -
>> > -               count = dax_map_atomic(bdev, &dax);
>> > -               if (count < 0)
>> > -                       return count;
>> > -               sz = min_t(long, count, SZ_128K);
>> > -               clear_pmem(dax.addr, sz);
>> > -               dax.size -= sz;
>> > -               dax.sector += sz / 512;
>> > -               dax_unmap_atomic(bdev, &dax);
>> > -               cond_resched();
>> > -       } while (dax.size);
>> > -
>> > -       wmb_pmem();
>> > -       return 0;
>> > -}
>> > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> What about the other unwritten extent conversions in the dax path?
>> Shouldn't those be converted to block-layer zero-outs as well?
>
> Could you point me to where these might be? I thought once we've
> converted all the zeroout type callers (by removing dax_clear_sectors),
> and fixed up dax_do_io to try a driver fallback, we've handled all the
> media error cases in dax..

grep for usages of clear_pmem()... which I was hoping to eliminate
after this change to push zeroing down to the driver.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
  2016-03-25 20:59       ` Verma, Vishal L
@ 2016-03-25 21:42         ` Dan Williams
  -1 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-25 21:42 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: hch, linux-block, xfs, linux-nvdimm, linux-mm, viro, axboe, akpm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Fri, Mar 25, 2016 at 1:59 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 03:45 -0700, Christoph Hellwig wrote:
>> On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
>> >
>> > dax_do_io (called for read() or write() for a dax file system) may
>> > fail
>> > in the presence of bad blocks or media errors. Since we expect that
>> > a
>> > write should clear media errors on nvdimms, make dax_do_io fall
>> > back to
>> > the direct_IO path, which will send down a bio to the driver, which
>> > can
>> > then attempt to clear the error.
>> Leave the fallback on -EIO to the callers please.  They generally
>> call
>> __blockdev_direct_IO anyway, so it should actually become simpler
>> that
>> way.
>
> I thought of this, but made the retrying happen in the wrapper so that
> it can be centralized. If the callers were to become responsible for
> the retry, then any new callers of dax_do_io might not realize they are
> responsible for retrying, and hit problems.

That's their prerogative otherwise you are precluding an alternate
handling of a dax_do_io() failure.  Maybe a fs or upper layer can
recover in a different manner than re-submit the I/O to the
__blockdev_direct_IO path.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-25 21:42         ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-25 21:42 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: linux-block, jack, axboe, linux-nvdimm, xfs, hch, linux-mm, viro,
	linux-fsdevel, akpm, linux-ext4, ross.zwisler, Wilcox, Matthew R

On Fri, Mar 25, 2016 at 1:59 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 03:45 -0700, Christoph Hellwig wrote:
>> On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
>> >
>> > dax_do_io (called for read() or write() for a dax file system) may
>> > fail
>> > in the presence of bad blocks or media errors. Since we expect that
>> > a
>> > write should clear media errors on nvdimms, make dax_do_io fall
>> > back to
>> > the direct_IO path, which will send down a bio to the driver, which
>> > can
>> > then attempt to clear the error.
>> Leave the fallback on -EIO to the callers please.  They generally
>> call
>> __blockdev_direct_IO anyway, so it should actually become simpler
>> that
>> way.
>
> I thought of this, but made the retrying happen in the wrapper so that
> it can be centralized. If the callers were to become responsible for
> the retry, then any new callers of dax_do_io might not realize they are
> responsible for retrying, and hit problems.

That's their prerogative otherwise you are precluding an alternate
handling of a dax_do_io() failure.  Maybe a fs or upper layer can
recover in a different manner than re-submit the I/O to the
__blockdev_direct_IO path.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
  2016-03-25 21:42         ` Dan Williams
  (?)
@ 2016-03-25 22:36           ` Verma, Vishal L
  -1 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 22:36 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-block, hch, xfs, linux-nvdimm, linux-mm, viro, axboe, akpm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1656 bytes --]

On Fri, 2016-03-25 at 14:42 -0700, Dan Williams wrote:
> On Fri, Mar 25, 2016 at 1:59 PM, Verma, Vishal L
> <vishal.l.verma@intel.com> wrote:
> > 
> > On Fri, 2016-03-25 at 03:45 -0700, Christoph Hellwig wrote:
> > > 
> > > On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> > > > 
> > > > 
> > > > dax_do_io (called for read() or write() for a dax file system)
> > > > may
> > > > fail
> > > > in the presence of bad blocks or media errors. Since we expect
> > > > that
> > > > a
> > > > write should clear media errors on nvdimms, make dax_do_io fall
> > > > back to
> > > > the direct_IO path, which will send down a bio to the driver,
> > > > which
> > > > can
> > > > then attempt to clear the error.
> > > Leave the fallback on -EIO to the callers please.  They generally
> > > call
> > > __blockdev_direct_IO anyway, so it should actually become simpler
> > > that
> > > way.
> > I thought of this, but made the retrying happen in the wrapper so
> > that
> > it can be centralized. If the callers were to become responsible
> > for
> > the retry, then any new callers of dax_do_io might not realize they
> > are
> > responsible for retrying, and hit problems.
> That's their prerogative otherwise you are precluding an alternate
> handling of a dax_do_io() failure.  Maybe a fs or upper layer can
> recover in a different manner than re-submit the I/O to the
> __blockdev_direct_IO path.

I'm happy to make the change, but we don't preclude that -- __dax_do_io
is still exported and available..N‹§²æìr¸›zǧu©ž²Æ {\b­†éì¹»\x1c®&Þ–)îÆi¢žØ^n‡r¶‰šŽŠÝ¢j$½§$¢¸\x05¢¹¨­è§~Š'.)îÄÃ,yèm¶ŸÿÃ\f%Š{±šj+ƒðèž×¦j)Z†·Ÿ

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-25 22:36           ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 22:36 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-block, hch, xfs, linux-nvdimm, linux-mm, viro, axboe, akpm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Fri, 2016-03-25 at 14:42 -0700, Dan Williams wrote:
> On Fri, Mar 25, 2016 at 1:59 PM, Verma, Vishal L
> <vishal.l.verma@intel.com> wrote:
> > 
> > On Fri, 2016-03-25 at 03:45 -0700, Christoph Hellwig wrote:
> > > 
> > > On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> > > > 
> > > > 
> > > > dax_do_io (called for read() or write() for a dax file system)
> > > > may
> > > > fail
> > > > in the presence of bad blocks or media errors. Since we expect
> > > > that
> > > > a
> > > > write should clear media errors on nvdimms, make dax_do_io fall
> > > > back to
> > > > the direct_IO path, which will send down a bio to the driver,
> > > > which
> > > > can
> > > > then attempt to clear the error.
> > > Leave the fallback on -EIO to the callers please.  They generally
> > > call
> > > __blockdev_direct_IO anyway, so it should actually become simpler
> > > that
> > > way.
> > I thought of this, but made the retrying happen in the wrapper so
> > that
> > it can be centralized. If the callers were to become responsible
> > for
> > the retry, then any new callers of dax_do_io might not realize they
> > are
> > responsible for retrying, and hit problems.
> That's their prerogative otherwise you are precluding an alternate
> handling of a dax_do_io() failure.  Maybe a fs or upper layer can
> recover in a different manner than re-submit the I/O to the
> __blockdev_direct_IO path.

I'm happy to make the change, but we don't preclude that -- __dax_do_io
is still exported and available..

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-25 22:36           ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-25 22:36 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: hch, jack, axboe, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	linux-fsdevel, akpm, linux-ext4, ross.zwisler, Wilcox, Matthew R

On Fri, 2016-03-25 at 14:42 -0700, Dan Williams wrote:
> On Fri, Mar 25, 2016 at 1:59 PM, Verma, Vishal L
> <vishal.l.verma@intel.com> wrote:
> > 
> > On Fri, 2016-03-25 at 03:45 -0700, Christoph Hellwig wrote:
> > > 
> > > On Thu, Mar 24, 2016 at 05:17:30PM -0600, Vishal Verma wrote:
> > > > 
> > > > 
> > > > dax_do_io (called for read() or write() for a dax file system)
> > > > may
> > > > fail
> > > > in the presence of bad blocks or media errors. Since we expect
> > > > that
> > > > a
> > > > write should clear media errors on nvdimms, make dax_do_io fall
> > > > back to
> > > > the direct_IO path, which will send down a bio to the driver,
> > > > which
> > > > can
> > > > then attempt to clear the error.
> > > Leave the fallback on -EIO to the callers please.  They generally
> > > call
> > > __blockdev_direct_IO anyway, so it should actually become simpler
> > > that
> > > way.
> > I thought of this, but made the retrying happen in the wrapper so
> > that
> > it can be centralized. If the callers were to become responsible
> > for
> > the retry, then any new callers of dax_do_io might not realize they
> > are
> > responsible for retrying, and hit problems.
> That's their prerogative otherwise you are precluding an alternate
> handling of a dax_do_io() failure.  Maybe a fs or upper layer can
> recover in a different manner than re-submit the I/O to the
> __blockdev_direct_IO path.

I'm happy to make the change, but we don't preclude that -- __dax_do_io
is still exported and available..
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
  2016-03-25 21:42         ` Dan Williams
@ 2016-03-26 16:53           ` hch
  -1 siblings, 0 replies; 74+ messages in thread
From: hch @ 2016-03-26 16:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: Verma, Vishal L, linux-block, jack, axboe, linux-nvdimm, xfs,
	hch, linux-mm, viro, linux-fsdevel, akpm, linux-ext4,
	ross.zwisler, Wilcox, Matthew R

On Fri, Mar 25, 2016 at 02:42:37PM -0700, Dan Williams wrote:
> That's their prerogative otherwise you are precluding an alternate
> handling of a dax_do_io() failure.  Maybe a fs or upper layer can
> recover in a different manner than re-submit the I/O to the
> __blockdev_direct_IO path.

Let's keep the interface separate because they are, well separate.
There is a reason direct I/O falls back to buffered I/O by returning
and error if it can't handle it instead of handling all the magic.

I also really want to get rid of get_block as soon as possible for
DAX and direct I/O.  For DAX that should actually be possible
really quickly, while direct I/O might take some time and will
be have to be gradual.  So tighter integration of the two interface is
not just bad design, but actively harmful at this point in time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/5] dax: handle media errors in dax_do_io
@ 2016-03-26 16:53           ` hch
  0 siblings, 0 replies; 74+ messages in thread
From: hch @ 2016-03-26 16:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: axboe, jack, hch, Verma, Vishal L, xfs, linux-block, linux-mm,
	viro, linux-nvdimm, linux-fsdevel, akpm, linux-ext4,
	ross.zwisler, Wilcox, Matthew R

On Fri, Mar 25, 2016 at 02:42:37PM -0700, Dan Williams wrote:
> That's their prerogative otherwise you are precluding an alternate
> handling of a dax_do_io() failure.  Maybe a fs or upper layer can
> recover in a different manner than re-submit the I/O to the
> __blockdev_direct_IO path.

Let's keep the interface separate because they are, well separate.
There is a reason direct I/O falls back to buffered I/O by returning
and error if it can't handle it instead of handling all the magic.

I also really want to get rid of get_block as soon as possible for
DAX and direct I/O.  For DAX that should actually be possible
really quickly, while direct I/O might take some time and will
be have to be gradual.  So tighter integration of the two interface is
not just bad design, but actively harmful at this point in time.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-25 21:20         ` Dan Williams
  (?)
@ 2016-03-28 20:01           ` Verma, Vishal L
  -1 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-28 20:01 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: axboe, jack, linux-nvdimm, david, xfs, linux-block, linux-mm,
	viro, linux-fsdevel, akpm, linux-ext4, Wilcox, Matthew

On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
> <vishal.l.verma@intel.com> wrote:
> > 
> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
> > > 
> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int
> > > el.c
> > > om> wrote:
> > > > 
> > > > 
> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > > > 
> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must
> > > > be
> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS
> > > > to
> > > > use
> > > > only sb_issue_zerout().
> > > > 
> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > > > [vishal: Also remove the dax_clear_sectors function entirely]
> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > > > ---
> > > >  fs/dax.c               | 32 --------------------------------
> > > >  fs/ext2/inode.c        |  7 +++----
> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------
> > > >  include/linux/dax.h    |  1 -
> > > >  4 files changed, 3 insertions(+), 46 deletions(-)
> > > > 
> > > > diff --git a/fs/dax.c b/fs/dax.c
> > > > index bb7e9f8..a30481e 100644
> > > > --- a/fs/dax.c
> > > > +++ b/fs/dax.c
> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct
> > > > block_device
> > > > *bdev, sector_t n)
> > > >         return page;
> > > >  }
> > > > 
> > > > -/*
> > > > - * dax_clear_sectors() is called from within transaction
> > > > context
> > > > from XFS,
> > > > - * and hence this means the stack from this point must follow
> > > > GFP_NOFS
> > > > - * semantics for all operations.
> > > > - */
> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t
> > > > _sector,
> > > > long _size)
> > > > -{
> > > > -       struct blk_dax_ctl dax = {
> > > > -               .sector = _sector,
> > > > -               .size = _size,
> > > > -       };
> > > > -
> > > > -       might_sleep();
> > > > -       do {
> > > > -               long count, sz;
> > > > -
> > > > -               count = dax_map_atomic(bdev, &dax);
> > > > -               if (count < 0)
> > > > -                       return count;
> > > > -               sz = min_t(long, count, SZ_128K);
> > > > -               clear_pmem(dax.addr, sz);
> > > > -               dax.size -= sz;
> > > > -               dax.sector += sz / 512;
> > > > -               dax_unmap_atomic(bdev, &dax);
> > > > -               cond_resched();
> > > > -       } while (dax.size);
> > > > -
> > > > -       wmb_pmem();
> > > > -       return 0;
> > > > -}
> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
> > > What about the other unwritten extent conversions in the dax
> > > path?
> > > Shouldn't those be converted to block-layer zero-outs as well?
> > Could you point me to where these might be? I thought once we've
> > converted all the zeroout type callers (by removing
> > dax_clear_sectors),
> > and fixed up dax_do_io to try a driver fallback, we've handled all
> > the
> > media error cases in dax..
> grep for usages of clear_pmem()... which I was hoping to eliminate
> after this change to push zeroing down to the driver.

Ok, so I looked at these, and it looks like the majority of callers of
clear_pmem are from the fault path (either pmd or regular), and in
those cases we should be 'protected', as we would have failed at a
prior step (dax_map_atomic).

The two cases that may not be well handled are the calls to
dax_zero_page_range and dax_truncate_page which are called from file
systems. I think we may need to do a fallback to the driver for those
cases just like we do for dax_direct_io.. Thoughts?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-28 20:01           ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-28 20:01 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-block, xfs, linux-mm, viro, akpm, axboe, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
> <vishal.l.verma@intel.com> wrote:
> > 
> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
> > > 
> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int
> > > el.c
> > > om> wrote:
> > > > 
> > > > 
> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > > > 
> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must
> > > > be
> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS
> > > > to
> > > > use
> > > > only sb_issue_zerout().
> > > > 
> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > > > [vishal: Also remove the dax_clear_sectors function entirely]
> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > > > ---
> > > >  fs/dax.c               | 32 --------------------------------
> > > >  fs/ext2/inode.c        |  7 +++----
> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------
> > > >  include/linux/dax.h    |  1 -
> > > >  4 files changed, 3 insertions(+), 46 deletions(-)
> > > > 
> > > > diff --git a/fs/dax.c b/fs/dax.c
> > > > index bb7e9f8..a30481e 100644
> > > > --- a/fs/dax.c
> > > > +++ b/fs/dax.c
> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct
> > > > block_device
> > > > *bdev, sector_t n)
> > > >         return page;
> > > >  }
> > > > 
> > > > -/*
> > > > - * dax_clear_sectors() is called from within transaction
> > > > context
> > > > from XFS,
> > > > - * and hence this means the stack from this point must follow
> > > > GFP_NOFS
> > > > - * semantics for all operations.
> > > > - */
> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t
> > > > _sector,
> > > > long _size)
> > > > -{
> > > > -       struct blk_dax_ctl dax = {
> > > > -               .sector = _sector,
> > > > -               .size = _size,
> > > > -       };
> > > > -
> > > > -       might_sleep();
> > > > -       do {
> > > > -               long count, sz;
> > > > -
> > > > -               count = dax_map_atomic(bdev, &dax);
> > > > -               if (count < 0)
> > > > -                       return count;
> > > > -               sz = min_t(long, count, SZ_128K);
> > > > -               clear_pmem(dax.addr, sz);
> > > > -               dax.size -= sz;
> > > > -               dax.sector += sz / 512;
> > > > -               dax_unmap_atomic(bdev, &dax);
> > > > -               cond_resched();
> > > > -       } while (dax.size);
> > > > -
> > > > -       wmb_pmem();
> > > > -       return 0;
> > > > -}
> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
> > > What about the other unwritten extent conversions in the dax
> > > path?
> > > Shouldn't those be converted to block-layer zero-outs as well?
> > Could you point me to where these might be? I thought once we've
> > converted all the zeroout type callers (by removing
> > dax_clear_sectors),
> > and fixed up dax_do_io to try a driver fallback, we've handled all
> > the
> > media error cases in dax..
> grep for usages of clear_pmem()... which I was hoping to eliminate
> after this change to push zeroing down to the driver.

Ok, so I looked at these, and it looks like the majority of callers of
clear_pmem are from the fault path (either pmd or regular), and in
those cases we should be 'protected', as we would have failed at a
prior step (dax_map_atomic).

The two cases that may not be well handled are the calls to
dax_zero_page_range and dax_truncate_page which are called from file
systems. I think we may need to do a fallback to the driver for those
cases just like we do for dax_direct_io.. Thoughts?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-28 20:01           ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-28 20:01 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	linux-fsdevel, akpm, linux-ext4, ross.zwisler, Wilcox, Matthew R

On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
> <vishal.l.verma@intel.com> wrote:
> > 
> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
> > > 
> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int
> > > el.c
> > > om> wrote:
> > > > 
> > > > 
> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > > > 
> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must
> > > > be
> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS
> > > > to
> > > > use
> > > > only sb_issue_zerout().
> > > > 
> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > > > [vishal: Also remove the dax_clear_sectors function entirely]
> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > > > ---
> > > >  fs/dax.c               | 32 --------------------------------
> > > >  fs/ext2/inode.c        |  7 +++----
> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------
> > > >  include/linux/dax.h    |  1 -
> > > >  4 files changed, 3 insertions(+), 46 deletions(-)
> > > > 
> > > > diff --git a/fs/dax.c b/fs/dax.c
> > > > index bb7e9f8..a30481e 100644
> > > > --- a/fs/dax.c
> > > > +++ b/fs/dax.c
> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct
> > > > block_device
> > > > *bdev, sector_t n)
> > > >         return page;
> > > >  }
> > > > 
> > > > -/*
> > > > - * dax_clear_sectors() is called from within transaction
> > > > context
> > > > from XFS,
> > > > - * and hence this means the stack from this point must follow
> > > > GFP_NOFS
> > > > - * semantics for all operations.
> > > > - */
> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t
> > > > _sector,
> > > > long _size)
> > > > -{
> > > > -       struct blk_dax_ctl dax = {
> > > > -               .sector = _sector,
> > > > -               .size = _size,
> > > > -       };
> > > > -
> > > > -       might_sleep();
> > > > -       do {
> > > > -               long count, sz;
> > > > -
> > > > -               count = dax_map_atomic(bdev, &dax);
> > > > -               if (count < 0)
> > > > -                       return count;
> > > > -               sz = min_t(long, count, SZ_128K);
> > > > -               clear_pmem(dax.addr, sz);
> > > > -               dax.size -= sz;
> > > > -               dax.sector += sz / 512;
> > > > -               dax_unmap_atomic(bdev, &dax);
> > > > -               cond_resched();
> > > > -       } while (dax.size);
> > > > -
> > > > -       wmb_pmem();
> > > > -       return 0;
> > > > -}
> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
> > > What about the other unwritten extent conversions in the dax
> > > path?
> > > Shouldn't those be converted to block-layer zero-outs as well?
> > Could you point me to where these might be? I thought once we've
> > converted all the zeroout type callers (by removing
> > dax_clear_sectors),
> > and fixed up dax_do_io to try a driver fallback, we've handled all
> > the
> > media error cases in dax..
> grep for usages of clear_pmem()... which I was hoping to eliminate
> after this change to push zeroing down to the driver.

Ok, so I looked at these, and it looks like the majority of callers of
clear_pmem are from the fault path (either pmd or regular), and in
those cases we should be 'protected', as we would have failed at a
prior step (dax_map_atomic).

The two cases that may not be well handled are the calls to
dax_zero_page_range and dax_truncate_page which are called from file
systems. I think we may need to do a fallback to the driver for those
cases just like we do for dax_direct_io.. Thoughts?
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-28 20:01           ` Verma, Vishal L
  (?)
  (?)
@ 2016-03-28 23:34             ` Dan Williams
  -1 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-28 23:34 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, david, xfs, linux-block, linux-mm,
	viro, linux-fsdevel, akpm, linux-ext4, Wilcox

On Mon, Mar 28, 2016 at 1:01 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
>> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
>> <vishal.l.verma@intel.com> wrote:
>> >
>> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> > >
>> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int
>> > > el.c
>> > > om> wrote:
>> > > >
>> > > >
>> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > >
>> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must
>> > > > be
>> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS
>> > > > to
>> > > > use
>> > > > only sb_issue_zerout().
>> > > >
>> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > > [vishal: Also remove the dax_clear_sectors function entirely]
>> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > > > ---
>> > > >  fs/dax.c               | 32 --------------------------------
>> > > >  fs/ext2/inode.c        |  7 +++----
>> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> > > >  include/linux/dax.h    |  1 -
>> > > >  4 files changed, 3 insertions(+), 46 deletions(-)
>> > > >
>> > > > diff --git a/fs/dax.c b/fs/dax.c
>> > > > index bb7e9f8..a30481e 100644
>> > > > --- a/fs/dax.c
>> > > > +++ b/fs/dax.c
>> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct
>> > > > block_device
>> > > > *bdev, sector_t n)
>> > > >         return page;
>> > > >  }
>> > > >
>> > > > -/*
>> > > > - * dax_clear_sectors() is called from within transaction
>> > > > context
>> > > > from XFS,
>> > > > - * and hence this means the stack from this point must follow
>> > > > GFP_NOFS
>> > > > - * semantics for all operations.
>> > > > - */
>> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t
>> > > > _sector,
>> > > > long _size)
>> > > > -{
>> > > > -       struct blk_dax_ctl dax = {
>> > > > -               .sector = _sector,
>> > > > -               .size = _size,
>> > > > -       };
>> > > > -
>> > > > -       might_sleep();
>> > > > -       do {
>> > > > -               long count, sz;
>> > > > -
>> > > > -               count = dax_map_atomic(bdev, &dax);
>> > > > -               if (count < 0)
>> > > > -                       return count;
>> > > > -               sz = min_t(long, count, SZ_128K);
>> > > > -               clear_pmem(dax.addr, sz);
>> > > > -               dax.size -= sz;
>> > > > -               dax.sector += sz / 512;
>> > > > -               dax_unmap_atomic(bdev, &dax);
>> > > > -               cond_resched();
>> > > > -       } while (dax.size);
>> > > > -
>> > > > -       wmb_pmem();
>> > > > -       return 0;
>> > > > -}
>> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> > > What about the other unwritten extent conversions in the dax
>> > > path?
>> > > Shouldn't those be converted to block-layer zero-outs as well?
>> > Could you point me to where these might be? I thought once we've
>> > converted all the zeroout type callers (by removing
>> > dax_clear_sectors),
>> > and fixed up dax_do_io to try a driver fallback, we've handled all
>> > the
>> > media error cases in dax..
>> grep for usages of clear_pmem()... which I was hoping to eliminate
>> after this change to push zeroing down to the driver.
>
> Ok, so I looked at these, and it looks like the majority of callers of
> clear_pmem are from the fault path (either pmd or regular), and in
> those cases we should be 'protected', as we would have failed at a
> prior step (dax_map_atomic).

Seems kind of sad to fail the fault due to a bad block when we were
going to zero it anyway, right?  I'm not seeing a compelling reason to
keep any zeroing in fs/dax.c.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-28 23:34             ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-28 23:34 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: linux-block, xfs, linux-mm, viro, akpm, axboe, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Mon, Mar 28, 2016 at 1:01 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
>> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
>> <vishal.l.verma@intel.com> wrote:
>> >
>> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> > >
>> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int
>> > > el.c
>> > > om> wrote:
>> > > >
>> > > >
>> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > >
>> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must
>> > > > be
>> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS
>> > > > to
>> > > > use
>> > > > only sb_issue_zerout().
>> > > >
>> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > > [vishal: Also remove the dax_clear_sectors function entirely]
>> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > > > ---
>> > > >  fs/dax.c               | 32 --------------------------------
>> > > >  fs/ext2/inode.c        |  7 +++----
>> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> > > >  include/linux/dax.h    |  1 -
>> > > >  4 files changed, 3 insertions(+), 46 deletions(-)
>> > > >
>> > > > diff --git a/fs/dax.c b/fs/dax.c
>> > > > index bb7e9f8..a30481e 100644
>> > > > --- a/fs/dax.c
>> > > > +++ b/fs/dax.c
>> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct
>> > > > block_device
>> > > > *bdev, sector_t n)
>> > > >         return page;
>> > > >  }
>> > > >
>> > > > -/*
>> > > > - * dax_clear_sectors() is called from within transaction
>> > > > context
>> > > > from XFS,
>> > > > - * and hence this means the stack from this point must follow
>> > > > GFP_NOFS
>> > > > - * semantics for all operations.
>> > > > - */
>> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t
>> > > > _sector,
>> > > > long _size)
>> > > > -{
>> > > > -       struct blk_dax_ctl dax = {
>> > > > -               .sector = _sector,
>> > > > -               .size = _size,
>> > > > -       };
>> > > > -
>> > > > -       might_sleep();
>> > > > -       do {
>> > > > -               long count, sz;
>> > > > -
>> > > > -               count = dax_map_atomic(bdev, &dax);
>> > > > -               if (count < 0)
>> > > > -                       return count;
>> > > > -               sz = min_t(long, count, SZ_128K);
>> > > > -               clear_pmem(dax.addr, sz);
>> > > > -               dax.size -= sz;
>> > > > -               dax.sector += sz / 512;
>> > > > -               dax_unmap_atomic(bdev, &dax);
>> > > > -               cond_resched();
>> > > > -       } while (dax.size);
>> > > > -
>> > > > -       wmb_pmem();
>> > > > -       return 0;
>> > > > -}
>> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> > > What about the other unwritten extent conversions in the dax
>> > > path?
>> > > Shouldn't those be converted to block-layer zero-outs as well?
>> > Could you point me to where these might be? I thought once we've
>> > converted all the zeroout type callers (by removing
>> > dax_clear_sectors),
>> > and fixed up dax_do_io to try a driver fallback, we've handled all
>> > the
>> > media error cases in dax..
>> grep for usages of clear_pmem()... which I was hoping to eliminate
>> after this change to push zeroing down to the driver.
>
> Ok, so I looked at these, and it looks like the majority of callers of
> clear_pmem are from the fault path (either pmd or regular), and in
> those cases we should be 'protected', as we would have failed at a
> prior step (dax_map_atomic).

Seems kind of sad to fail the fault due to a bad block when we were
going to zero it anyway, right?  I'm not seeing a compelling reason to
keep any zeroing in fs/dax.c.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-28 23:34             ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-28 23:34 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: linux-block, xfs, linux-mm, viro, akpm, axboe, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Mon, Mar 28, 2016 at 1:01 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
>> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
>> <vishal.l.verma@intel.com> wrote:
>> >
>> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> > >
>> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int
>> > > el.c
>> > > om> wrote:
>> > > >
>> > > >
>> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > >
>> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must
>> > > > be
>> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS
>> > > > to
>> > > > use
>> > > > only sb_issue_zerout().
>> > > >
>> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > > [vishal: Also remove the dax_clear_sectors function entirely]
>> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > > > ---
>> > > >  fs/dax.c               | 32 --------------------------------
>> > > >  fs/ext2/inode.c        |  7 +++----
>> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> > > >  include/linux/dax.h    |  1 -
>> > > >  4 files changed, 3 insertions(+), 46 deletions(-)
>> > > >
>> > > > diff --git a/fs/dax.c b/fs/dax.c
>> > > > index bb7e9f8..a30481e 100644
>> > > > --- a/fs/dax.c
>> > > > +++ b/fs/dax.c
>> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct
>> > > > block_device
>> > > > *bdev, sector_t n)
>> > > >         return page;
>> > > >  }
>> > > >
>> > > > -/*
>> > > > - * dax_clear_sectors() is called from within transaction
>> > > > context
>> > > > from XFS,
>> > > > - * and hence this means the stack from this point must follow
>> > > > GFP_NOFS
>> > > > - * semantics for all operations.
>> > > > - */
>> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t
>> > > > _sector,
>> > > > long _size)
>> > > > -{
>> > > > -       struct blk_dax_ctl dax = {
>> > > > -               .sector = _sector,
>> > > > -               .size = _size,
>> > > > -       };
>> > > > -
>> > > > -       might_sleep();
>> > > > -       do {
>> > > > -               long count, sz;
>> > > > -
>> > > > -               count = dax_map_atomic(bdev, &dax);
>> > > > -               if (count < 0)
>> > > > -                       return count;
>> > > > -               sz = min_t(long, count, SZ_128K);
>> > > > -               clear_pmem(dax.addr, sz);
>> > > > -               dax.size -= sz;
>> > > > -               dax.sector += sz / 512;
>> > > > -               dax_unmap_atomic(bdev, &dax);
>> > > > -               cond_resched();
>> > > > -       } while (dax.size);
>> > > > -
>> > > > -       wmb_pmem();
>> > > > -       return 0;
>> > > > -}
>> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> > > What about the other unwritten extent conversions in the dax
>> > > path?
>> > > Shouldn't those be converted to block-layer zero-outs as well?
>> > Could you point me to where these might be? I thought once we've
>> > converted all the zeroout type callers (by removing
>> > dax_clear_sectors),
>> > and fixed up dax_do_io to try a driver fallback, we've handled all
>> > the
>> > media error cases in dax..
>> grep for usages of clear_pmem()... which I was hoping to eliminate
>> after this change to push zeroing down to the driver.
>
> Ok, so I looked at these, and it looks like the majority of callers of
> clear_pmem are from the fault path (either pmd or regular), and in
> those cases we should be 'protected', as we would have failed at a
> prior step (dax_map_atomic).

Seems kind of sad to fail the fault due to a bad block when we were
going to zero it anyway, right?  I'm not seeing a compelling reason to
keep any zeroing in fs/dax.c.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-28 23:34             ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-28 23:34 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	linux-fsdevel, akpm, linux-ext4, ross.zwisler, Wilcox, Matthew R

On Mon, Mar 28, 2016 at 1:01 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
>> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
>> <vishal.l.verma@intel.com> wrote:
>> >
>> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> > >
>> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int
>> > > el.c
>> > > om> wrote:
>> > > >
>> > > >
>> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > >
>> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must
>> > > > be
>> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS
>> > > > to
>> > > > use
>> > > > only sb_issue_zerout().
>> > > >
>> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > > [vishal: Also remove the dax_clear_sectors function entirely]
>> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > > > ---
>> > > >  fs/dax.c               | 32 --------------------------------
>> > > >  fs/ext2/inode.c        |  7 +++----
>> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> > > >  include/linux/dax.h    |  1 -
>> > > >  4 files changed, 3 insertions(+), 46 deletions(-)
>> > > >
>> > > > diff --git a/fs/dax.c b/fs/dax.c
>> > > > index bb7e9f8..a30481e 100644
>> > > > --- a/fs/dax.c
>> > > > +++ b/fs/dax.c
>> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct
>> > > > block_device
>> > > > *bdev, sector_t n)
>> > > >         return page;
>> > > >  }
>> > > >
>> > > > -/*
>> > > > - * dax_clear_sectors() is called from within transaction
>> > > > context
>> > > > from XFS,
>> > > > - * and hence this means the stack from this point must follow
>> > > > GFP_NOFS
>> > > > - * semantics for all operations.
>> > > > - */
>> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t
>> > > > _sector,
>> > > > long _size)
>> > > > -{
>> > > > -       struct blk_dax_ctl dax = {
>> > > > -               .sector = _sector,
>> > > > -               .size = _size,
>> > > > -       };
>> > > > -
>> > > > -       might_sleep();
>> > > > -       do {
>> > > > -               long count, sz;
>> > > > -
>> > > > -               count = dax_map_atomic(bdev, &dax);
>> > > > -               if (count < 0)
>> > > > -                       return count;
>> > > > -               sz = min_t(long, count, SZ_128K);
>> > > > -               clear_pmem(dax.addr, sz);
>> > > > -               dax.size -= sz;
>> > > > -               dax.sector += sz / 512;
>> > > > -               dax_unmap_atomic(bdev, &dax);
>> > > > -               cond_resched();
>> > > > -       } while (dax.size);
>> > > > -
>> > > > -       wmb_pmem();
>> > > > -       return 0;
>> > > > -}
>> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> > > What about the other unwritten extent conversions in the dax
>> > > path?
>> > > Shouldn't those be converted to block-layer zero-outs as well?
>> > Could you point me to where these might be? I thought once we've
>> > converted all the zeroout type callers (by removing
>> > dax_clear_sectors),
>> > and fixed up dax_do_io to try a driver fallback, we've handled all
>> > the
>> > media error cases in dax..
>> grep for usages of clear_pmem()... which I was hoping to eliminate
>> after this change to push zeroing down to the driver.
>
> Ok, so I looked at these, and it looks like the majority of callers of
> clear_pmem are from the fault path (either pmd or regular), and in
> those cases we should be 'protected', as we would have failed at a
> prior step (dax_map_atomic).

Seems kind of sad to fail the fault due to a bad block when we were
going to zero it anyway, right?  I'm not seeing a compelling reason to
keep any zeroing in fs/dax.c.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-28 23:34             ` Dan Williams
  (?)
  (?)
@ 2016-03-29 18:57               ` Verma, Vishal L
  -1 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-29 18:57 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: axboe, jack, linux-nvdimm, david, xfs, linux-block, linux-mm,
	viro, linux-fsdevel, akpm, linux-ext4, Wilcox

On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:

<>

> Seems kind of sad to fail the fault due to a bad block when we were
> going to zero it anyway, right?  I'm not seeing a compelling reason to
> keep any zeroing in fs/dax.c.

Agreed - but how do we do this? clear_pmem needs to be able to clear an
arbitrary number of bytes, but to go through the driver, we'd need to
send down a bio? If only the driver had an rw_bytes like interface that
could be used by anyone... :)
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-29 18:57               ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-29 18:57 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-block, xfs, linux-mm, viro, axboe, akpm, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 591 bytes --]

On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:

<>

> Seems kind of sad to fail the fault due to a bad block when we were
> going to zero it anyway, right?  I'm not seeing a compelling reason to
> keep any zeroing in fs/dax.c.

Agreed - but how do we do this? clear_pmem needs to be able to clear an
arbitrary number of bytes, but to go through the driver, we'd need to
send down a bio? If only the driver had an rw_bytes like interface that
could be used by anyone... :)N‹§²æìr¸›zǧu©ž²Æ {\b­†éì¹»\x1c®&Þ–)îÆi¢žØ^n‡r¶‰šŽŠÝ¢j$½§$¢¸\x05¢¹¨­è§~Š'.)îÄÃ,yèm¶ŸÿÃ\f%Š{±šj+ƒðèž×¦j)Z†·Ÿ

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-29 18:57               ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-29 18:57 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: linux-block, xfs, linux-mm, viro, axboe, akpm, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:

<>

> Seems kind of sad to fail the fault due to a bad block when we were
> going to zero it anyway, right?  I'm not seeing a compelling reason to
> keep any zeroing in fs/dax.c.

Agreed - but how do we do this? clear_pmem needs to be able to clear an
arbitrary number of bytes, but to go through the driver, we'd need to
send down a bio? If only the driver had an rw_bytes like interface that
could be used by anyone... :)

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-29 18:57               ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-03-29 18:57 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	linux-fsdevel, akpm, linux-ext4, ross.zwisler, Wilcox, Matthew R

On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:

<>

> Seems kind of sad to fail the fault due to a bad block when we were
> going to zero it anyway, right?  I'm not seeing a compelling reason to
> keep any zeroing in fs/dax.c.

Agreed - but how do we do this? clear_pmem needs to be able to clear an
arbitrary number of bytes, but to go through the driver, we'd need to
send down a bio? If only the driver had an rw_bytes like interface that
could be used by anyone... :)
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-29 18:57               ` Verma, Vishal L
  (?)
  (?)
@ 2016-03-29 19:37                 ` Dan Williams
  -1 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-29 19:37 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, david, xfs, linux-block, linux-mm,
	viro, linux-fsdevel, akpm, linux-ext4, Wilcox

On Tue, Mar 29, 2016 at 11:57 AM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
>
> <>
>
>> Seems kind of sad to fail the fault due to a bad block when we were
>> going to zero it anyway, right?  I'm not seeing a compelling reason to
>> keep any zeroing in fs/dax.c.
>
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

I think we're ok because clear_pmem() will always happen on PAGE_SIZE,
or HPAGE_SIZE boundaries.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-29 19:37                 ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-29 19:37 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: linux-block, xfs, linux-mm, viro, axboe, akpm, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Tue, Mar 29, 2016 at 11:57 AM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
>
> <>
>
>> Seems kind of sad to fail the fault due to a bad block when we were
>> going to zero it anyway, right?  I'm not seeing a compelling reason to
>> keep any zeroing in fs/dax.c.
>
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

I think we're ok because clear_pmem() will always happen on PAGE_SIZE,
or HPAGE_SIZE boundaries.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-29 19:37                 ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-29 19:37 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: linux-block, xfs, linux-mm, viro, axboe, akpm, linux-nvdimm,
	linux-fsdevel, ross.zwisler, linux-ext4, Wilcox, Matthew R,
	david, jack

On Tue, Mar 29, 2016 at 11:57 AM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
>
> <>
>
>> Seems kind of sad to fail the fault due to a bad block when we were
>> going to zero it anyway, right?  I'm not seeing a compelling reason to
>> keep any zeroing in fs/dax.c.
>
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

I think we're ok because clear_pmem() will always happen on PAGE_SIZE,
or HPAGE_SIZE boundaries.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-29 19:37                 ` Dan Williams
  0 siblings, 0 replies; 74+ messages in thread
From: Dan Williams @ 2016-03-29 19:37 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	linux-fsdevel, akpm, linux-ext4, ross.zwisler, Wilcox, Matthew R

On Tue, Mar 29, 2016 at 11:57 AM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
>
> <>
>
>> Seems kind of sad to fail the fault due to a bad block when we were
>> going to zero it anyway, right?  I'm not seeing a compelling reason to
>> keep any zeroing in fs/dax.c.
>
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

I think we're ok because clear_pmem() will always happen on PAGE_SIZE,
or HPAGE_SIZE boundaries.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-29 18:57               ` Verma, Vishal L
                                   ` (2 preceding siblings ...)
  (?)
@ 2016-03-30  7:49                 ` Jan Kara
  -1 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-03-30  7:49 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, david, xfs, linux-block, linux-mm,
	viro, linux-fsdevel, linux-ext4, akpm, Wilcox

On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> 
> <>
> 
> > Seems kind of sad to fail the fault due to a bad block when we were
> > going to zero it anyway, right?  I'm not seeing a compelling reason to
> > keep any zeroing in fs/dax.c.
> 
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

Actually, my patches for page fault locking remove zeroing from
dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens from
the filesystem only and the zeroing in those two functions is just a dead
code...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-30  7:49                 ` Jan Kara
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-03-30  7:49 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: Williams, Dan J, linux-block, xfs, linux-mm, viro, axboe, akpm,
	linux-nvdimm, linux-fsdevel, ross.zwisler, linux-ext4, Wilcox,
	Matthew R, david, jack

On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> 
> <>
> 
> > Seems kind of sad to fail the fault due to a bad block when we were
> > going to zero it anyway, right?��I'm not seeing a compelling reason to
> > keep any zeroing in fs/dax.c.
> 
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

Actually, my patches for page fault locking remove zeroing from
dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens from
the filesystem only and the zeroing in those two functions is just a dead
code...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-30  7:49                 ` Jan Kara
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-03-30  7:49 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: Williams, Dan J, linux-block, xfs, linux-mm, viro, axboe, akpm,
	linux-nvdimm, linux-fsdevel, ross.zwisler, linux-ext4, Wilcox,
	Matthew R, david, jack

On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> 
> <>
> 
> > Seems kind of sad to fail the fault due to a bad block when we were
> > going to zero it anyway, right?  I'm not seeing a compelling reason to
> > keep any zeroing in fs/dax.c.
> 
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

Actually, my patches for page fault locking remove zeroing from
dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens from
the filesystem only and the zeroing in those two functions is just a dead
code...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-30  7:49                 ` Jan Kara
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-03-30  7:49 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	ross.zwisler, linux-fsdevel, Williams, Dan J, linux-ext4, akpm,
	Wilcox, Matthew R

On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> 
> <>
> 
> > Seems kind of sad to fail the fault due to a bad block when we were
> > going to zero it anyway, right?  I'm not seeing a compelling reason to
> > keep any zeroing in fs/dax.c.
> 
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

Actually, my patches for page fault locking remove zeroing from
dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens from
the filesystem only and the zeroing in those two functions is just a dead
code...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-03-30  7:49                 ` Jan Kara
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-03-30  7:49 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: Williams, Dan J, linux-block, xfs, linux-mm, viro, axboe, akpm,
	linux-nvdimm, linux-fsdevel, ross.zwisler, linux-ext4, Wilcox,
	Matthew R, david, jack

On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> 
> <>
> 
> > Seems kind of sad to fail the fault due to a bad block when we were
> > going to zero it anyway, right?  I'm not seeing a compelling reason to
> > keep any zeroing in fs/dax.c.
> 
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

Actually, my patches for page fault locking remove zeroing from
dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens from
the filesystem only and the zeroing in those two functions is just a dead
code...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-03-30  7:49                 ` Jan Kara
  (?)
@ 2016-04-01 19:17                   ` Verma, Vishal L
  -1 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-04-01 19:17 UTC (permalink / raw)
  To: jack
  Cc: axboe, linux-nvdimm, david, xfs, linux-block, linux-mm, viro,
	linux-fsdevel, linux-ext4, akpm, Wilcox, Matthew

On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > 
> > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > 
> > <>
> > 
> > > 
> > > Seems kind of sad to fail the fault due to a bad block when we
> > > were
> > > going to zero it anyway, right?  I'm not seeing a compelling
> > > reason to
> > > keep any zeroing in fs/dax.c.
> > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > an
> > arbitrary number of bytes, but to go through the driver, we'd need
> > to
> > send down a bio? If only the driver had an rw_bytes like interface
> > that
> > could be used by anyone... :)
> Actually, my patches for page fault locking remove zeroing from
> dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> from
> the filesystem only and the zeroing in those two functions is just a
> dead
> code...

That should make things easier! Do you have a tree I could merge in to
get this? (WIP is ok as we know that my series will depend on yours..)
or, if you can distill out that patch on a 4.6-rc1 base, I could carry
it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

Thanks,
	-Vishal
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-04-01 19:17                   ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-04-01 19:17 UTC (permalink / raw)
  To: jack
  Cc: linux-block, xfs, linux-mm, viro, Williams, Dan J, axboe, akpm,
	linux-nvdimm, linux-fsdevel, ross.zwisler, linux-ext4, Wilcox,
	Matthew R, david

On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > 
> > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > 
> > <>
> > 
> > > 
> > > Seems kind of sad to fail the fault due to a bad block when we
> > > were
> > > going to zero it anyway, right?  I'm not seeing a compelling
> > > reason to
> > > keep any zeroing in fs/dax.c.
> > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > an
> > arbitrary number of bytes, but to go through the driver, we'd need
> > to
> > send down a bio? If only the driver had an rw_bytes like interface
> > that
> > could be used by anyone... :)
> Actually, my patches for page fault locking remove zeroing from
> dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> from
> the filesystem only and the zeroing in those two functions is just a
> dead
> code...

That should make things easier! Do you have a tree I could merge in to
get this? (WIP is ok as we know that my series will depend on yours..)
or, if you can distill out that patch on a 4.6-rc1 base, I could carry
it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

Thanks,
	-Vishal

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-04-01 19:17                   ` Verma, Vishal L
  0 siblings, 0 replies; 74+ messages in thread
From: Verma, Vishal L @ 2016-04-01 19:17 UTC (permalink / raw)
  To: jack
  Cc: axboe, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	ross.zwisler, linux-fsdevel, Williams, Dan J, linux-ext4, akpm,
	Wilcox, Matthew R

On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > 
> > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > 
> > <>
> > 
> > > 
> > > Seems kind of sad to fail the fault due to a bad block when we
> > > were
> > > going to zero it anyway, right?  I'm not seeing a compelling
> > > reason to
> > > keep any zeroing in fs/dax.c.
> > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > an
> > arbitrary number of bytes, but to go through the driver, we'd need
> > to
> > send down a bio? If only the driver had an rw_bytes like interface
> > that
> > could be used by anyone... :)
> Actually, my patches for page fault locking remove zeroing from
> dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> from
> the filesystem only and the zeroing in those two functions is just a
> dead
> code...

That should make things easier! Do you have a tree I could merge in to
get this? (WIP is ok as we know that my series will depend on yours..)
or, if you can distill out that patch on a 4.6-rc1 base, I could carry
it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

Thanks,
	-Vishal
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
  2016-04-01 19:17                   ` Verma, Vishal L
                                       ` (2 preceding siblings ...)
  (?)
@ 2016-04-04 12:09                     ` Jan Kara
  -1 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-04-04 12:09 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, david, xfs, linux-block, linux-mm,
	viro, linux-fsdevel, linux-ext4, akpm, Wilcox

On Fri 01-04-16 19:17:52, Verma, Vishal L wrote:
> On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> > On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > > 
> > > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > > 
> > > <>
> > > 
> > > > 
> > > > Seems kind of sad to fail the fault due to a bad block when we
> > > > were
> > > > going to zero it anyway, right?  I'm not seeing a compelling
> > > > reason to
> > > > keep any zeroing in fs/dax.c.
> > > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > > an
> > > arbitrary number of bytes, but to go through the driver, we'd need
> > > to
> > > send down a bio? If only the driver had an rw_bytes like interface
> > > that
> > > could be used by anyone... :)
> > Actually, my patches for page fault locking remove zeroing from
> > dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> > from
> > the filesystem only and the zeroing in those two functions is just a
> > dead
> > code...
> 
> That should make things easier! Do you have a tree I could merge in to
> get this? (WIP is ok as we know that my series will depend on yours..)
> or, if you can distill out that patch on a 4.6-rc1 base, I could carry
> it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

I'll CC you on the next posting of the series which I want to do this week.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-04-04 12:09                     ` Jan Kara
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-04-04 12:09 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: jack, linux-block, xfs, linux-mm, viro, Williams, Dan J, axboe,
	akpm, linux-nvdimm, linux-fsdevel, ross.zwisler, linux-ext4,
	Wilcox, Matthew R, david

On Fri 01-04-16 19:17:52, Verma, Vishal L wrote:
> On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> > On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > > 
> > > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > > 
> > > <>
> > > 
> > > > 
> > > > Seems kind of sad to fail the fault due to a bad block when we
> > > > were
> > > > going to zero it anyway, right?��I'm not seeing a compelling
> > > > reason to
> > > > keep any zeroing in fs/dax.c.
> > > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > > an
> > > arbitrary number of bytes, but to go through the driver, we'd need
> > > to
> > > send down a bio? If only the driver had an rw_bytes like interface
> > > that
> > > could be used by anyone... :)
> > Actually, my patches for page fault locking remove zeroing from
> > dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> > from
> > the filesystem only and the zeroing in those two functions is just a
> > dead
> > code...
> 
> That should make things easier! Do you have a tree I could merge in to
> get this? (WIP is ok as we know that my series will depend on yours..)
> or, if you can distill out that patch on a 4.6-rc1 base, I could carry
> it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

I'll CC you on the next posting of the series which I want to do this week.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-04-04 12:09                     ` Jan Kara
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-04-04 12:09 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: jack, linux-block, xfs, linux-mm, viro, Williams, Dan J, axboe,
	akpm, linux-nvdimm, linux-fsdevel, ross.zwisler, linux-ext4,
	Wilcox, Matthew R, david

On Fri 01-04-16 19:17:52, Verma, Vishal L wrote:
> On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> > On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > > 
> > > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > > 
> > > <>
> > > 
> > > > 
> > > > Seems kind of sad to fail the fault due to a bad block when we
> > > > were
> > > > going to zero it anyway, right?  I'm not seeing a compelling
> > > > reason to
> > > > keep any zeroing in fs/dax.c.
> > > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > > an
> > > arbitrary number of bytes, but to go through the driver, we'd need
> > > to
> > > send down a bio? If only the driver had an rw_bytes like interface
> > > that
> > > could be used by anyone... :)
> > Actually, my patches for page fault locking remove zeroing from
> > dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> > from
> > the filesystem only and the zeroing in those two functions is just a
> > dead
> > code...
> 
> That should make things easier! Do you have a tree I could merge in to
> get this? (WIP is ok as we know that my series will depend on yours..)
> or, if you can distill out that patch on a 4.6-rc1 base, I could carry
> it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

I'll CC you on the next posting of the series which I want to do this week.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-04-04 12:09                     ` Jan Kara
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-04-04 12:09 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: axboe, jack, linux-nvdimm, xfs, linux-block, linux-mm, viro,
	ross.zwisler, linux-fsdevel, Williams, Dan J, linux-ext4, akpm,
	Wilcox, Matthew R

On Fri 01-04-16 19:17:52, Verma, Vishal L wrote:
> On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> > On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > > 
> > > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > > 
> > > <>
> > > 
> > > > 
> > > > Seems kind of sad to fail the fault due to a bad block when we
> > > > were
> > > > going to zero it anyway, right?  I'm not seeing a compelling
> > > > reason to
> > > > keep any zeroing in fs/dax.c.
> > > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > > an
> > > arbitrary number of bytes, but to go through the driver, we'd need
> > > to
> > > send down a bio? If only the driver had an rw_bytes like interface
> > > that
> > > could be used by anyone... :)
> > Actually, my patches for page fault locking remove zeroing from
> > dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> > from
> > the filesystem only and the zeroing in those two functions is just a
> > dead
> > code...
> 
> That should make things easier! Do you have a tree I could merge in to
> get this? (WIP is ok as we know that my series will depend on yours..)
> or, if you can distill out that patch on a 4.6-rc1 base, I could carry
> it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

I'll CC you on the next posting of the series which I want to do this week.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
@ 2016-04-04 12:09                     ` Jan Kara
  0 siblings, 0 replies; 74+ messages in thread
From: Jan Kara @ 2016-04-04 12:09 UTC (permalink / raw)
  To: Verma, Vishal L
  Cc: jack, linux-block, xfs, linux-mm, viro, Williams, Dan J, axboe,
	akpm, linux-nvdimm, linux-fsdevel, ross.zwisler, linux-ext4,
	Wilcox, Matthew R, david

On Fri 01-04-16 19:17:52, Verma, Vishal L wrote:
> On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> > On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > > 
> > > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > > 
> > > <>
> > > 
> > > > 
> > > > Seems kind of sad to fail the fault due to a bad block when we
> > > > were
> > > > going to zero it anyway, right?  I'm not seeing a compelling
> > > > reason to
> > > > keep any zeroing in fs/dax.c.
> > > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > > an
> > > arbitrary number of bytes, but to go through the driver, we'd need
> > > to
> > > send down a bio? If only the driver had an rw_bytes like interface
> > > that
> > > could be used by anyone... :)
> > Actually, my patches for page fault locking remove zeroing from
> > dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> > from
> > the filesystem only and the zeroing in those two functions is just a
> > dead
> > code...
> 
> That should make things easier! Do you have a tree I could merge in to
> get this? (WIP is ok as we know that my series will depend on yours..)
> or, if you can distill out that patch on a 4.6-rc1 base, I could carry
> it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

I'll CC you on the next posting of the series which I want to do this week.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2016-04-04 12:09 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-24 23:17 [PATCH 0/5] dax: handling of media errors Vishal Verma
2016-03-24 23:17 ` Vishal Verma
2016-03-24 23:17 ` Vishal Verma
2016-03-24 23:17 ` Vishal Verma
2016-03-24 23:17 ` [PATCH 1/5] block, dax: pass blk_dax_ctl through to drivers Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-24 23:17 ` [PATCH 2/5] dax: fallback from pmd to pte on error Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-24 23:17 ` [PATCH 3/5] dax: enable dax in the presence of known media errors (badblocks) Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-24 23:23   ` Verma, Vishal L
2016-03-24 23:23     ` Verma, Vishal L
2016-03-24 23:23     ` Verma, Vishal L
2016-03-24 23:17 ` [PATCH 4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-25 10:44   ` Christoph Hellwig
2016-03-25 10:44     ` Christoph Hellwig
2016-03-25 21:01     ` Verma, Vishal L
2016-03-25 21:01       ` Verma, Vishal L
2016-03-25 18:47   ` Dan Williams
2016-03-25 18:47     ` Dan Williams
2016-03-25 18:47     ` Dan Williams
2016-03-25 21:03     ` Verma, Vishal L
2016-03-25 21:03       ` Verma, Vishal L
2016-03-25 21:03       ` Verma, Vishal L
2016-03-25 21:20       ` Dan Williams
2016-03-25 21:20         ` Dan Williams
2016-03-25 21:20         ` Dan Williams
2016-03-28 20:01         ` Verma, Vishal L
2016-03-28 20:01           ` Verma, Vishal L
2016-03-28 20:01           ` Verma, Vishal L
2016-03-28 23:34           ` Dan Williams
2016-03-28 23:34             ` Dan Williams
2016-03-28 23:34             ` Dan Williams
2016-03-28 23:34             ` Dan Williams
2016-03-29 18:57             ` Verma, Vishal L
2016-03-29 18:57               ` Verma, Vishal L
2016-03-29 18:57               ` Verma, Vishal L
2016-03-29 18:57               ` Verma, Vishal L
2016-03-29 19:37               ` Dan Williams
2016-03-29 19:37                 ` Dan Williams
2016-03-29 19:37                 ` Dan Williams
2016-03-29 19:37                 ` Dan Williams
2016-03-30  7:49               ` Jan Kara
2016-03-30  7:49                 ` Jan Kara
2016-03-30  7:49                 ` Jan Kara
2016-03-30  7:49                 ` Jan Kara
2016-03-30  7:49                 ` Jan Kara
2016-04-01 19:17                 ` Verma, Vishal L
2016-04-01 19:17                   ` Verma, Vishal L
2016-04-01 19:17                   ` Verma, Vishal L
2016-04-04 12:09                   ` Jan Kara
2016-04-04 12:09                     ` Jan Kara
2016-04-04 12:09                     ` Jan Kara
2016-04-04 12:09                     ` Jan Kara
2016-04-04 12:09                     ` Jan Kara
2016-03-24 23:17 ` [PATCH 5/5] dax: handle media errors in dax_do_io Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-24 23:17   ` Vishal Verma
2016-03-25 10:45   ` Christoph Hellwig
2016-03-25 10:45     ` Christoph Hellwig
2016-03-25 10:45     ` Christoph Hellwig
2016-03-25 20:59     ` Verma, Vishal L
2016-03-25 20:59       ` Verma, Vishal L
2016-03-25 21:42       ` Dan Williams
2016-03-25 21:42         ` Dan Williams
2016-03-25 22:36         ` Verma, Vishal L
2016-03-25 22:36           ` Verma, Vishal L
2016-03-25 22:36           ` Verma, Vishal L
2016-03-26 16:53         ` hch
2016-03-26 16:53           ` hch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.