* [PATCH v5 1/5] dax: fallback from pmd to pte on error
2016-05-06 21:53 [PATCH v5 0/5] dax: handling media errors (clear-on-zero only) Vishal Verma
@ 2016-05-06 21:53 ` Vishal Verma
2016-05-10 14:15 ` Jan Kara
2016-05-06 21:53 ` [PATCH v5 2/5] dax: enable dax in the presence of known media errors (badblocks) Vishal Verma
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Vishal Verma @ 2016-05-06 21:53 UTC
To: linux-nvdimm
Cc: Dan Williams, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dave Chinner, Jan Kara, Jens Axboe,
Andrew Morton, linux-kernel, Christoph Hellwig, Jeff Moyer,
Boaz Harrosh
From: Dan Williams <dan.j.williams@intel.com>
In preparation for consulting a badblocks list in pmem_direct_access(),
teach dax_pmd_fault() to fall back rather than fail immediately upon
encountering an error. The idea is that reducing the span of the
dax request may avoid the error region.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
fs/dax.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 5a34f08..52f0044 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1111,8 +1111,8 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
long length = dax_map_atomic(bdev, &dax);
if (length < 0) {
- result = VM_FAULT_SIGBUS;
- goto out;
+ dax_pmd_dbg(&bh, address, "dax-error fallback");
+ goto fallback;
}
if (length < PMD_SIZE) {
dax_pmd_dbg(&bh, address, "dax-length too small");
--
2.5.5
* Re: [PATCH v5 1/5] dax: fallback from pmd to pte on error
2016-05-06 21:53 ` [PATCH v5 1/5] dax: fallback from pmd to pte on error Vishal Verma
@ 2016-05-10 14:15 ` Jan Kara
0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2016-05-10 14:15 UTC
To: Vishal Verma
Cc: linux-nvdimm, Dan Williams, linux-fsdevel, linux-block, xfs,
linux-ext4, linux-mm, Ross Zwisler, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh
On Fri 06-05-16 15:53:07, Vishal Verma wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> In preparation for consulting a badblocks list in pmem_direct_access(),
> teach dax_pmd_fault() to fallback rather than fail immediately upon
> encountering an error. The thought being that reducing the span of the
> dax request may avoid the error region.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [PATCH v5 2/5] dax: enable dax in the presence of known media errors (badblocks)
2016-05-06 21:53 [PATCH v5 0/5] dax: handling media errors (clear-on-zero only) Vishal Verma
2016-05-06 21:53 ` [PATCH v5 1/5] dax: fallback from pmd to pte on error Vishal Verma
@ 2016-05-06 21:53 ` Vishal Verma
2016-05-06 21:53 ` [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors Vishal Verma
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Vishal Verma @ 2016-05-06 21:53 UTC
To: linux-nvdimm
Cc: Dan Williams, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dave Chinner, Jan Kara, Jens Axboe,
Andrew Morton, linux-kernel, Christoph Hellwig, Jeff Moyer,
Boaz Harrosh, Vishal Verma
From: Dan Williams <dan.j.williams@intel.com>
1/ If a mapping overlaps a bad sector, fail the request.
2/ Do not opportunistically report more dax-capable capacity than is
requested when errors are present.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[vishal: fix a conflict with system RAM collision patches]
[vishal: add a 'size' parameter to ->direct_access]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
arch/powerpc/sysdev/axonram.c | 2 +-
block/ioctl.c | 9 ---------
drivers/block/brd.c | 2 +-
drivers/nvdimm/pmem.c | 10 +++++++++-
drivers/s390/block/dcssblk.c | 2 +-
fs/block_dev.c | 2 +-
include/linux/blkdev.h | 2 +-
7 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 0d112b9..ff75d70 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -143,7 +143,7 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
*/
static long
axon_ram_direct_access(struct block_device *device, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+ void __pmem **kaddr, pfn_t *pfn, long size)
{
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
diff --git a/block/ioctl.c b/block/ioctl.c
index 4ff1f92..bf80bfd 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -423,15 +423,6 @@ bool blkdev_dax_capable(struct block_device *bdev)
|| (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
return false;
- /*
- * If the device has known bad blocks, force all I/O through the
- * driver / page cache.
- *
- * TODO: support finer grained dax error handling
- */
- if (disk->bb && disk->bb->count)
- return false;
-
return true;
}
#endif
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 51a071e..c04bd9b 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -381,7 +381,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
#ifdef CONFIG_BLK_DEV_RAM_DAX
static long brd_direct_access(struct block_device *bdev, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+ void __pmem **kaddr, pfn_t *pfn, long size)
{
struct brd_device *brd = bdev->bd_disk->private_data;
struct page *page;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f798899..c447579 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -182,14 +182,22 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
}
static long pmem_direct_access(struct block_device *bdev, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+ void __pmem **kaddr, pfn_t *pfn, long size)
{
struct pmem_device *pmem = bdev->bd_disk->private_data;
resource_size_t offset = sector * 512 + pmem->data_offset;
+ if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
+ return -EIO;
*kaddr = pmem->virt_addr + offset;
*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+ /*
+ * If badblocks are present, limit known good range to the
+ * requested range.
+ */
+ if (unlikely(pmem->bb.count))
+ return size;
return pmem->size - pmem->pfn_pad - offset;
}
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index b839086..c45d538 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -884,7 +884,7 @@ fail:
static long
dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
- void __pmem **kaddr, pfn_t *pfn)
+ void __pmem **kaddr, pfn_t *pfn, long size)
{
struct dcssblk_dev_info *dev_info;
unsigned long offset, dev_sz;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index b25bb23..02c68c4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -488,7 +488,7 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
sector += get_start_sect(bdev);
if (sector % (PAGE_SIZE / 512))
return -EINVAL;
- avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn);
+ avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn, size);
if (!avail)
return -ERANGE;
if (avail > 0 && avail & ~PAGE_MASK)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 669e419..55ed530 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1657,7 +1657,7 @@ struct block_device_operations {
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
long (*direct_access)(struct block_device *, sector_t, void __pmem **,
- pfn_t *);
+ pfn_t *, long);
unsigned int (*check_events) (struct gendisk *disk,
unsigned int clearing);
/* ->media_changed() is DEPRECATED, use ->check_events() instead */
--
2.5.5
* [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
2016-05-06 21:53 [PATCH v5 0/5] dax: handling media errors (clear-on-zero only) Vishal Verma
2016-05-06 21:53 ` [PATCH v5 1/5] dax: fallback from pmd to pte on error Vishal Verma
2016-05-06 21:53 ` [PATCH v5 2/5] dax: enable dax in the presence of known media errors (badblocks) Vishal Verma
@ 2016-05-06 21:53 ` Vishal Verma
2016-05-08 8:52 ` Christoph Hellwig
2016-05-10 14:16 ` Jan Kara
2016-05-06 21:53 ` [PATCH v5 4/5] dax: for truncate/hole-punch, do zeroing through the driver if possible Vishal Verma
` (2 subsequent siblings)
5 siblings, 2 replies; 14+ messages in thread
From: Vishal Verma @ 2016-05-06 21:53 UTC
To: linux-nvdimm
Cc: Matthew Wilcox, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh, Vishal Verma
From: Matthew Wilcox <matthew.r.wilcox@intel.com>
dax_clear_sectors() cannot handle poisoned blocks. These must be
zeroed using the BIO interface instead. Convert ext2 and XFS to use
only sb_issue_zeroout().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
[vishal: Also remove the dax_clear_sectors function entirely]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
fs/dax.c | 32 --------------------------------
fs/ext2/inode.c | 7 +++----
fs/xfs/xfs_bmap_util.c | 15 ++++-----------
include/linux/dax.h | 1 -
4 files changed, 7 insertions(+), 48 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 52f0044..5948d9b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -116,38 +116,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
return page;
}
-/*
- * dax_clear_sectors() is called from within transaction context from XFS,
- * and hence this means the stack from this point must follow GFP_NOFS
- * semantics for all operations.
- */
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
-{
- struct blk_dax_ctl dax = {
- .sector = _sector,
- .size = _size,
- };
-
- might_sleep();
- do {
- long count, sz;
-
- count = dax_map_atomic(bdev, &dax);
- if (count < 0)
- return count;
- sz = min_t(long, count, SZ_128K);
- clear_pmem(dax.addr, sz);
- dax.size -= sz;
- dax.sector += sz / 512;
- dax_unmap_atomic(bdev, &dax);
- cond_resched();
- } while (dax.size);
-
- wmb_pmem();
- return 0;
-}
-EXPORT_SYMBOL_GPL(dax_clear_sectors);
-
static bool buffer_written(struct buffer_head *bh)
{
return buffer_mapped(bh) && !buffer_unwritten(bh);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 1f07b75..35f2b0bf 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -26,6 +26,7 @@
#include <linux/highuid.h>
#include <linux/pagemap.h>
#include <linux/dax.h>
+#include <linux/blkdev.h>
#include <linux/quotaops.h>
#include <linux/writeback.h>
#include <linux/buffer_head.h>
@@ -737,10 +738,8 @@ static int ext2_get_blocks(struct inode *inode,
* so that it's not found by another thread before it's
* initialised
*/
- err = dax_clear_sectors(inode->i_sb->s_bdev,
- le32_to_cpu(chain[depth-1].key) <<
- (inode->i_blkbits - 9),
- 1 << inode->i_blkbits);
+ err = sb_issue_zeroout(inode->i_sb,
+ le32_to_cpu(chain[depth-1].key), 1, GFP_NOFS);
if (err) {
mutex_unlock(&ei->truncate_mutex);
goto cleanup;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3b63098..930ac6a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -72,18 +72,11 @@ xfs_zero_extent(
struct xfs_mount *mp = ip->i_mount;
xfs_daddr_t sector = xfs_fsb_to_db(ip, start_fsb);
sector_t block = XFS_BB_TO_FSBT(mp, sector);
- ssize_t size = XFS_FSB_TO_B(mp, count_fsb);
-
- if (IS_DAX(VFS_I(ip)))
- return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
- sector, size);
-
- /*
- * let the block layer decide on the fastest method of
- * implementing the zeroing.
- */
- return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);
+ return blkdev_issue_zeroout(xfs_find_bdev_for_inode(VFS_I(ip)),
+ block << (mp->m_super->s_blocksize_bits - 9),
+ count_fsb << (mp->m_super->s_blocksize_bits - 9),
+ GFP_NOFS, true);
}
/*
diff --git a/include/linux/dax.h b/include/linux/dax.h
index ef94fa7..426841a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -11,7 +11,6 @@
ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
get_block_t, dio_iodone_t, int flags);
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size);
int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
int dax_truncate_page(struct inode *, loff_t from, get_block_t);
int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
--
2.5.5
* Re: [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
2016-05-06 21:53 ` [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors Vishal Verma
@ 2016-05-08 8:52 ` Christoph Hellwig
2016-05-08 18:46 ` Verma, Vishal L
2016-05-10 14:16 ` Jan Kara
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2016-05-08  8:52 UTC
To: Vishal Verma
Cc: linux-nvdimm, Matthew Wilcox, linux-fsdevel, linux-block, xfs,
linux-ext4, linux-mm, Ross Zwisler, Dan Williams, Dave Chinner,
Jan Kara, Jens Axboe, Andrew Morton, linux-kernel,
Christoph Hellwig, Jeff Moyer, Boaz Harrosh
On Fri, May 06, 2016 at 03:53:09PM -0600, Vishal Verma wrote:
> From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>
> dax_clear_sectors() cannot handle poisoned blocks. These must be
> zeroed using the BIO interface instead. Convert ext2 and XFS to use
> only sb_issue_zerout().
>
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> [vishal: Also remove the dax_clear_sectors function entirely]
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Just to make sure: is the existing sb_issue_zeroout() as in 4.6-rc
already doing the right thing for DAX? I've got a pending patchset
for XFS that introduces another dax_clear_sectors() user, but if it's
already safe to use blkdev_issue_zeroout() I can switch to that and avoid
the merge conflict.
* Re: [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
2016-05-08 8:52 ` Christoph Hellwig
@ 2016-05-08 18:46 ` Verma, Vishal L
2016-05-09 14:55 ` Ross Zwisler
0 siblings, 1 reply; 14+ messages in thread
From: Verma, Vishal L @ 2016-05-08 18:46 UTC
To: hch
Cc: linux-kernel, linux-block, xfs, linux-nvdimm, jmoyer, linux-mm,
Williams, Dan J, axboe, akpm, linux-fsdevel, ross.zwisler,
linux-ext4, boaz, Wilcox, Matthew R, david, jack
On Sun, 2016-05-08 at 01:52 -0700, Christoph Hellwig wrote:
> On Fri, May 06, 2016 at 03:53:09PM -0600, Vishal Verma wrote:
> >
> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
> >
> > dax_clear_sectors() cannot handle poisoned blocks. These must be
> > zeroed using the BIO interface instead. Convert ext2 and XFS to
> > use
> > only sb_issue_zerout().
> >
> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > [vishal: Also remove the dax_clear_sectors function entirely]
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> Just to make sure: the existing sb_issue_zerout as in 4.6-rc
> is already doing the right thing for DAX? I've got a pending
> patchset
> for XFS that introduces another dax_clear_sectors users, but if it's
> already safe to use blkdev_issue_zeroout I can switch to that and
> avoid
> the merge conflict.
I believe so - Jan has moved all unwritten extent conversions out of
DAX with his patch set, and I believe zeroing through the driver is
always fine. Ross or Jan could confirm though.
* Re: [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
2016-05-08 18:46 ` Verma, Vishal L
@ 2016-05-09 14:55 ` Ross Zwisler
0 siblings, 0 replies; 14+ messages in thread
From: Ross Zwisler @ 2016-05-09 14:55 UTC
To: Verma, Vishal L
Cc: hch, linux-kernel, linux-block, xfs, linux-nvdimm, jmoyer,
linux-mm, Williams, Dan J, axboe, akpm, linux-fsdevel,
ross.zwisler, linux-ext4, boaz, Wilcox, Matthew R, david, jack
On Sun, May 08, 2016 at 06:46:13PM +0000, Verma, Vishal L wrote:
> On Sun, 2016-05-08 at 01:52 -0700, Christoph Hellwig wrote:
> > On Fri, May 06, 2016 at 03:53:09PM -0600, Vishal Verma wrote:
> > >
> > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > >
> > > dax_clear_sectors() cannot handle poisoned blocks. These must be
> > > zeroed using the BIO interface instead. Convert ext2 and XFS to
> > > use
> > > only sb_issue_zerout().
> > >
> > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> > > [vishal: Also remove the dax_clear_sectors function entirely]
> > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > Just to make sure: the existing sb_issue_zerout as in 4.6-rc
> > is already doing the right thing for DAX? I've got a pending
> > patchset
> > for XFS that introduces another dax_clear_sectors users, but if it's
> > already safe to use blkdev_issue_zeroout I can switch to that and
> > avoid
> > the merge conflict.
>
> I believe so - Jan has moved all unwritten extent conversions out of
> DAX with his patch set, and I believe zeroing through the driver is
> always fine. Ross or Jan could confirm though.
Yep, I believe that the existing sb_issue_zeroout() as of v4.6-rc* does the
right thing. We'll end up calling sb_issue_zeroout() => blkdev_issue_zeroout()
=> __blkdev_issue_zeroout() because we don't have support for discard or
write_same in PMEM. This will send zero page BIOs to the PMEM driver, which
will do the zeroing as normal writes.
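
The selection described above can be modelled in a few lines of plain C. This is only a simplified sketch of the decision, not the kernel code: `struct dev_caps`, `enum zero_method` and `pick_zero_method()` are hypothetical names invented for illustration, and the real blkdev_issue_zeroout() also, as far as I understand it, falls through to the next method when an offload attempt fails rather than deciding purely on capability flags.

```c
#include <stdbool.h>

/* Hypothetical capability flags for a block device; not a kernel structure. */
struct dev_caps {
	bool discard_zeroes_data;	/* discard is supported and reads back zeroes */
	bool write_same;		/* WRITE SAME style offload is supported */
};

/* The three ways a range could end up zeroed in this model. */
enum zero_method {
	ZERO_BY_DISCARD,
	ZERO_BY_WRITE_SAME,
	ZERO_BY_WRITING_ZERO_PAGES,	/* plain write BIOs carrying the zero page */
};

/*
 * Pick a zeroing method in the order described in the mail: discard first
 * (only if the caller allows it), then write_same, and finally ordinary
 * writes of zero pages.
 */
static enum zero_method pick_zero_method(const struct dev_caps *caps,
					 bool allow_discard)
{
	if (allow_discard && caps->discard_zeroes_data)
		return ZERO_BY_DISCARD;
	if (caps->write_same)
		return ZERO_BY_WRITE_SAME;
	return ZERO_BY_WRITING_ZERO_PAGES;
}
```

For a device modelled with both flags clear, which is how PMEM is described above, pick_zero_method() returns ZERO_BY_WRITING_ZERO_PAGES whether or not discard is allowed, so the zeroing reaches the driver as ordinary writes.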
* Re: [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors
2016-05-06 21:53 ` [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors Vishal Verma
2016-05-08 8:52 ` Christoph Hellwig
@ 2016-05-10 14:16 ` Jan Kara
1 sibling, 0 replies; 14+ messages in thread
From: Jan Kara @ 2016-05-10 14:16 UTC
To: Vishal Verma
Cc: linux-nvdimm, Matthew Wilcox, linux-fsdevel, linux-block, xfs,
linux-ext4, linux-mm, Ross Zwisler, Dan Williams, Dave Chinner,
Jan Kara, Jens Axboe, Andrew Morton, linux-kernel,
Christoph Hellwig, Jeff Moyer, Boaz Harrosh
On Fri 06-05-16 15:53:09, Vishal Verma wrote:
> From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>
> dax_clear_sectors() cannot handle poisoned blocks. These must be
> zeroed using the BIO interface instead. Convert ext2 and XFS to use
> only sb_issue_zerout().
>
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> [vishal: Also remove the dax_clear_sectors function entirely]
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [PATCH v5 4/5] dax: for truncate/hole-punch, do zeroing through the driver if possible
2016-05-06 21:53 [PATCH v5 0/5] dax: handling media errors (clear-on-zero only) Vishal Verma
` (2 preceding siblings ...)
2016-05-06 21:53 ` [PATCH v5 3/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors Vishal Verma
@ 2016-05-06 21:53 ` Vishal Verma
2016-05-10 14:21 ` Jan Kara
2016-05-06 21:53 ` [PATCH v5 5/5] dax: fix a comment in dax_zero_page_range and dax_truncate_page Vishal Verma
2016-05-08 8:55 ` [PATCH v5 0/5] dax: handling media errors (clear-on-zero only) Christoph Hellwig
5 siblings, 1 reply; 14+ messages in thread
From: Vishal Verma @ 2016-05-06 21:53 UTC
To: linux-nvdimm
Cc: Vishal Verma, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh
In the truncate or hole-punch path in dax, we clear out sub-page ranges.
If these sub-page ranges are sector aligned and sized, we can do the
zeroing through the driver instead so that error-clearing is handled
automatically.
For sub-sector ranges, we still have to rely on clear_pmem and have the
possibility of tripping over errors.
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
Documentation/filesystems/dax.txt | 32 ++++++++++++++++++++++++++++++++
fs/dax.c | 30 +++++++++++++++++++++++++-----
2 files changed, 57 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 7bde640..ce4587d 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -79,6 +79,38 @@ These filesystems may be used for inspiration:
- ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt
+Handling Media Errors
+---------------------
+
+The libnvdimm subsystem stores a record of known media error locations for
+each pmem block device (in gendisk->badblocks). If we fault at such location,
+or one with a latent error not yet discovered, the application can expect
+to receive a SIGBUS. Libnvdimm also allows clearing of these errors by simply
+writing the affected sectors (through the pmem driver, and if the underlying
+NVDIMM supports the clear_poison DSM defined by ACPI).
+
+Since DAX IO normally doesn't go through the driver/bio path, applications or
+sysadmins have an option to restore the lost data from a prior backup/inbuilt
+redundancy in the following ways:
+
+1. Delete the affected file, and restore from a backup (sysadmin route):
+ This will free the file system blocks that were being used by the file,
+ and the next time they're allocated, they will be zeroed first, which
+ happens through the driver, and will clear bad sectors.
+
+2. Truncate or hole-punch the part of the file that has a bad-block (at least
+ an entire aligned sector has to be hole-punched, but not necessarily an
+ entire filesystem block).
+
+These are the two basic paths that allow DAX filesystems to continue operating
+in the presence of media errors. More robust error recovery mechanisms can be
+built on top of this in the future, for example, involving redundancy/mirroring
+provided at the block layer through DM, or additionally, at the filesystem
+level. These would have to rely on the above two tenets, that error clearing
+can happen either by sending an IO through the driver, or zeroing (also through
+the driver).
+
+
Shortcomings
------------
diff --git a/fs/dax.c b/fs/dax.c
index 5948d9b..d8c974e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1196,6 +1196,20 @@ out:
}
EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
+static bool dax_range_is_aligned(struct block_device *bdev,
+ struct blk_dax_ctl *dax, unsigned int offset,
+ unsigned int length)
+{
+ unsigned short sector_size = bdev_logical_block_size(bdev);
+
+ if (((u64)dax->addr + offset) % sector_size)
+ return false;
+ if (length % sector_size)
+ return false;
+
+ return true;
+}
+
/**
* dax_zero_page_range - zero a range within a page of a DAX file
* @inode: The file being truncated
@@ -1240,11 +1254,17 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
.size = PAGE_SIZE,
};
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
- clear_pmem(dax.addr + offset, length);
- wmb_pmem();
- dax_unmap_atomic(bdev, &dax);
+ if (dax_range_is_aligned(bdev, &dax, offset, length))
+ return blkdev_issue_zeroout(bdev, dax.sector,
+ length / bdev_logical_block_size(bdev),
+ GFP_NOFS, true);
+ else {
+ if (dax_map_atomic(bdev, &dax) < 0)
+ return PTR_ERR(dax.addr);
+ clear_pmem(dax.addr + offset, length);
+ wmb_pmem();
+ dax_unmap_atomic(bdev, &dax);
+ }
}
return 0;
--
2.5.5
* Re: [PATCH v5 4/5] dax: for truncate/hole-punch, do zeroing through the driver if possible
2016-05-06 21:53 ` [PATCH v5 4/5] dax: for truncate/hole-punch, do zeroing through the driver if possible Vishal Verma
@ 2016-05-10 14:21 ` Jan Kara
0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2016-05-10 14:21 UTC
To: Vishal Verma
Cc: linux-nvdimm, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh
On Fri 06-05-16 15:53:10, Vishal Verma wrote:
> +static bool dax_range_is_aligned(struct block_device *bdev,
> + struct blk_dax_ctl *dax, unsigned int offset,
> + unsigned int length)
> +{
> + unsigned short sector_size = bdev_logical_block_size(bdev);
> +
> + if (((u64)dax->addr + offset) % sector_size)
> + return false;
> + if (length % sector_size)
> + return false;
sector_size had better be a power of two, so you can save some cycles by
using & instead of %.
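
A minimal user-space sketch of the mask-based check suggested here, assuming the sector size really is a power of two. The helper names (`is_pow2()`, `is_aligned()`, `range_is_aligned()`) are hypothetical stand-ins for illustration, not the names or signatures used in fs/dax.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when size is a non-zero power of two. */
static bool is_pow2(uint64_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/*
 * Alignment test using a mask instead of a modulo. For a power-of-two
 * size, (value % size) == 0 is equivalent to (value & (size - 1)) == 0.
 * The assert documents (and enforces) that precondition.
 */
static bool is_aligned(uint64_t value, uint64_t size)
{
	assert(is_pow2(size));
	return (value & (size - 1)) == 0;
}

/*
 * Hypothetical stand-in for the check in the patch: both the start
 * address and the length must be multiples of the sector size.
 */
static bool range_is_aligned(uint64_t addr, unsigned int offset,
			     unsigned int length, unsigned int sector_size)
{
	return is_aligned(addr + offset, sector_size) &&
	       is_aligned(length, sector_size);
}
```

With a 512-byte sector, a range starting 512 bytes into a page-aligned mapping with a length of 1024 passes the check, while an offset or length of 100 does not.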
> @@ -1240,11 +1254,17 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
> .size = PAGE_SIZE,
> };
>
> - if (dax_map_atomic(bdev, &dax) < 0)
> - return PTR_ERR(dax.addr);
> - clear_pmem(dax.addr + offset, length);
> - wmb_pmem();
> - dax_unmap_atomic(bdev, &dax);
> + if (dax_range_is_aligned(bdev, &dax, offset, length))
> + return blkdev_issue_zeroout(bdev, dax.sector,
> + length / bdev_logical_block_size(bdev),
> + GFP_NOFS, true);
This is actually wrong. blkdev_issue_zeroout() expects the length to be
in units of 512 bytes, so you need length >> 9 here.
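
To illustrate the difference, here is a small sketch comparing the two conversions. `SECTOR_SHIFT_512`, `bytes_to_sectors()` and `bytes_to_logical_blocks()` are hypothetical helpers written for this example only; the sole assumption taken from the review is that the sector count is expressed in 512-byte units regardless of the device's logical block size.

```c
#include <stdint.h>

/* Sector counts are in 512-byte units, so the shift is 9 (2^9 == 512). */
#define SECTOR_SHIFT_512 9

/* Conversion requested in the review: byte length to 512-byte sectors. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> SECTOR_SHIFT_512;
}

/*
 * What the patch computed instead: a count of logical blocks. This only
 * matches bytes_to_sectors() when the logical block size is 512.
 */
static uint64_t bytes_to_logical_blocks(uint64_t bytes,
					unsigned int logical_block_size)
{
	return bytes / logical_block_size;
}
```

For a 4096-byte range on a device with a 4096-byte logical block size, the first helper yields 8 sectors and the second yields 1, so only one eighth of the range would be zeroed if the block count were passed as a sector count.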
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [PATCH v5 5/5] dax: fix a comment in dax_zero_page_range and dax_truncate_page
2016-05-06 21:53 [PATCH v5 0/5] dax: handling media errors (clear-on-zero only) Vishal Verma
` (3 preceding siblings ...)
2016-05-06 21:53 ` [PATCH v5 4/5] dax: for truncate/hole-punch, do zeroing through the driver if possible Vishal Verma
@ 2016-05-06 21:53 ` Vishal Verma
2016-05-10 14:29 ` Jan Kara
2016-05-08 8:55 ` [PATCH v5 0/5] dax: handling media errors (clear-on-zero only) Christoph Hellwig
5 siblings, 1 reply; 14+ messages in thread
From: Vishal Verma @ 2016-05-06 21:53 UTC
To: linux-nvdimm
Cc: Vishal Verma, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh, Kirill A. Shutemov
The distinction between PAGE_SIZE and PAGE_CACHE_SIZE was removed in
commit 09cbfea ("mm, fs: get rid of PAGE_CACHE_* and
page_cache_{get,release} macros").
The comments for dax_zero_page_range() and dax_truncate_page() described
that distinction, which is now redundant, so remove those paragraphs.
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
fs/dax.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index d8c974e..b8fa85a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1221,12 +1221,6 @@ static bool dax_range_is_aligned(struct block_device *bdev,
* page in a DAX file. This is intended for hole-punch operations. If
* you are truncating a file, the helper function dax_truncate_page() may be
* more convenient.
- *
- * We work in terms of PAGE_SIZE here for commonality with
- * block_truncate_page(), but we could go down to PAGE_SIZE if the filesystem
- * took care of disposing of the unnecessary blocks. Even if the filesystem
- * block size is smaller than PAGE_SIZE, we have to zero the rest of the page
- * since the file might be mmapped.
*/
int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
get_block_t get_block)
@@ -1279,12 +1273,6 @@ EXPORT_SYMBOL_GPL(dax_zero_page_range);
*
* Similar to block_truncate_page(), this function can be called by a
* filesystem when it is truncating a DAX file to handle the partial page.
- *
- * We work in terms of PAGE_SIZE here for commonality with
- * block_truncate_page(), but we could go down to PAGE_SIZE if the filesystem
- * took care of disposing of the unnecessary blocks. Even if the filesystem
- * block size is smaller than PAGE_SIZE, we have to zero the rest of the page
- * since the file might be mmapped.
*/
int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
{
--
2.5.5
* Re: [PATCH v5 5/5] dax: fix a comment in dax_zero_page_range and dax_truncate_page
2016-05-06 21:53 ` [PATCH v5 5/5] dax: fix a comment in dax_zero_page_range and dax_truncate_page Vishal Verma
@ 2016-05-10 14:29 ` Jan Kara
0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2016-05-10 14:29 UTC
To: Vishal Verma
Cc: linux-nvdimm, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh, Kirill A. Shutemov
On Fri 06-05-16 15:53:11, Vishal Verma wrote:
> The distinction between PAGE_SIZE and PAGE_CACHE_SIZE was removed in
>
> 09cbfea mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release}
> macros
>
> The comments for the above functions described a distinction between
> those, that is now redundant, so remove those paragraphs
>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v5 0/5] dax: handling media errors (clear-on-zero only)
2016-05-06 21:53 [PATCH v5 0/5] dax: handling media errors (clear-on-zero only) Vishal Verma
` (4 preceding siblings ...)
2016-05-06 21:53 ` [PATCH v5 5/5] dax: fix a comment in dax_zero_page_range and dax_truncate_page Vishal Verma
@ 2016-05-08 8:55 ` Christoph Hellwig
5 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2016-05-08  8:55 UTC
To: Vishal Verma
Cc: linux-nvdimm, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh
This series looks fine to me:
Reviewed-by: Christoph Hellwig <hch@lst.de>