From: Christoph Hellwig <hch@infradead.org>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	dan.j.williams@intel.com, linux-nvdimm@lists.01.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] dax,pmem: Provide a dax operation to zero range of memory
Date: Thu, 30 Jan 2020 21:36:24 -0800
Message-ID: <20200131053624.GA3353@infradead.org>
In-Reply-To: <20200123165249.GA7664@redhat.com>

On Thu, Jan 23, 2020 at 11:52:49AM -0500, Vivek Goyal wrote:
> Hi,
> 
> This is an RFC patch to provide a dax operation to zero a range of memory.
> It will also clear poison in the process. This patch is primarily compile
> tested; I don't have real hardware to test the poison logic. I am posting
> this to figure out whether this is the right direction or not.
> 
> The motivation for this patch comes from Christoph's feedback that he
> would rather have a dax way to zero a range instead of relying on having
> to call blkdev_issue_zeroout() in __dax_zero_page_range().
> 
> https://lkml.org/lkml/2019/8/26/361
> 
> My motivation for this change is virtiofs DAX support. There we use DAX
> but we don't have a block device, so any dax code which assumes that a
> block device is always associated is a problem. This is more of a cleanup
> of one of the places where dax has this dependency on a block device;
> adding a dax operation for zeroing a range avoids having to call
> blkdev_issue_zeroout() in the dax path.
> 
> I have yet to take care of stacked block drivers (dm/md).
> 
> The current poison clearing logic is primarily written with the assumption
> that I/O is sector aligned. With this new method, that assumption is
> broken and one can pass any range of memory to zero. I have fixed a few
> places in the existing logic to handle an arbitrary start/end. I am not
> sure whether there are other dependencies which might need fixing or
> which prohibit us from providing this method.
> 
> Any feedback or comment is welcome.
> 
> Thanks
> Vivek
> 
> ---
>  drivers/dax/super.c   |   13 +++++++++
>  drivers/nvdimm/pmem.c |   67 ++++++++++++++++++++++++++++++++++++++++++--------
>  fs/dax.c              |   39 ++++++++---------------------
>  include/linux/dax.h   |    3 ++
>  4 files changed, 85 insertions(+), 37 deletions(-)
> 
> Index: rhvgoyal-linux/drivers/nvdimm/pmem.c
> ===================================================================
> --- rhvgoyal-linux.orig/drivers/nvdimm/pmem.c	2020-01-23 11:32:11.075139183 -0500
> +++ rhvgoyal-linux/drivers/nvdimm/pmem.c	2020-01-23 11:32:28.660139183 -0500
> @@ -52,8 +52,8 @@ static void hwpoison_clear(struct pmem_d
>  	if (is_vmalloc_addr(pmem->virt_addr))
>  		return;
>  
> -	pfn_start = PHYS_PFN(phys);
> -	pfn_end = pfn_start + PHYS_PFN(len);
> +	pfn_start = PFN_UP(phys);
> +	pfn_end = PFN_DOWN(phys + len);
>  	for (pfn = pfn_start; pfn < pfn_end; pfn++) {
>  		struct page *page = pfn_to_page(pfn);
>  

This change looks unrelated to the rest.
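
For illustration, a worked example of what the hunk above changes (the
numbers are made up, assuming PAGE_SIZE == 4096):

/*
 * Illustration only: clearing len == 0x2e00 bytes of poison starting at
 * phys == 0x1200, i.e. the byte range [0x1200, 0x4000).
 *
 * Old calculation:
 *   pfn_start = PHYS_PFN(0x1200)              = 1
 *   pfn_end   = pfn_start + PHYS_PFN(0x2e00)  = 3
 *   -> pages 1 and 2 have their poison flag cleared, although page 1 is
 *      only partially covered by the cleared range.
 *
 * New calculation:
 *   pfn_start = PFN_UP(0x1200)                = 2
 *   pfn_end   = PFN_DOWN(0x1200 + 0x2e00)     = 4
 *   -> only pages 2 and 3, which lie fully inside the cleared range,
 *      have their poison flag cleared.
 */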

> +	sector_end = ALIGN_DOWN((offset - pmem->data_offset + len), 512)/512;
> +	nr_sectors =  sector_end - sector_start;
>  
>  	cleared = nvdimm_clear_poison(dev, pmem->phys_addr + offset, len);
>  	if (cleared < len)
>  		rc = BLK_STS_IOERR;
> -	if (cleared > 0 && cleared / 512) {
> +	if (cleared > 0 && nr_sectors > 0) {
>  		hwpoison_clear(pmem, pmem->phys_addr + offset, cleared);
> -		cleared /= 512;
> -		dev_dbg(dev, "%#llx clear %ld sector%s\n",
> -				(unsigned long long) sector, cleared,
> -				cleared > 1 ? "s" : "");
> -		badblocks_clear(&pmem->bb, sector, cleared);
> +		dev_dbg(dev, "%#llx clear %d sector%s\n",
> +				(unsigned long long) sector_start, nr_sectors,
> +				nr_sectors > 1 ? "s" : "");
> +		badblocks_clear(&pmem->bb, sector_start, nr_sectors);
>  		if (pmem->bb_state)
>  			sysfs_notify_dirent(pmem->bb_state);
>  	}

As does this one?
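
Again for illustration, a worked example of the sector arithmetic in that
hunk (made-up values; the sector_start computation is not visible in the
quote and is assumed to be the symmetric round-up):

/*
 * Illustration only; sector_start is assumed to be
 * ALIGN(offset - pmem->data_offset, 512) / 512, which is not shown above.
 *
 * With offset - pmem->data_offset == 768 and len == 5000, the byte range
 * being zeroed relative to the data area is [768, 5768):
 *
 *   sector_start = ALIGN(768, 512) / 512              = 2
 *   sector_end   = ALIGN_DOWN(768 + 5000, 512) / 512  = 11
 *   nr_sectors   = sector_end - sector_start          = 9
 *
 * i.e. only the nine 512-byte sectors fully contained in the range are
 * passed to badblocks_clear(); the partial head and tail sectors stay
 * marked bad.
 */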

>  int __dax_zero_page_range(struct block_device *bdev,
>  		struct dax_device *dax_dev, sector_t sector,
>  		unsigned int offset, unsigned int size)
>  {
> +	pgoff_t pgoff;
> +	long rc, id;
>  
> +	rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
> +	if (rc)
> +		return rc;
> +
> +	id = dax_read_lock();
> +	rc = dax_zero_page_range(dax_dev, pgoff, offset, size);
> +	if (rc == -EOPNOTSUPP) {
>  		void *kaddr;
>  
> +		/* If driver does not implement zero page range, fallback */

I think we'll want to restructure this a bit.  First make the new
method mandatory, and just provide a generic_dax_zero_page_range or
similar for the non-pmem instances.
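
To make that concrete, a minimal sketch of what such a generic fallback
might look like, assuming it just reuses dax_direct_access() the way the
existing fallback path does; the name generic_dax_zero_page_range comes
from the suggestion above and the signature is inferred from the
dax_zero_page_range() call in the quoted hunk, so both are assumptions,
not merged code:

/*
 * Sketch only: a generic ->zero_page_range() implementation that non-pmem
 * dax drivers could use if the new method were made mandatory.  No poison
 * clearing happens here; that stays in the pmem-specific implementation.
 */
static int generic_dax_zero_page_range(struct dax_device *dax_dev,
		pgoff_t pgoff, unsigned int offset, size_t len)
{
	void *kaddr;
	long rc;

	/* Map one page of the dax device at pgoff. */
	rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
	if (rc < 0)
		return rc;

	memset(kaddr + offset, 0, len);
	dax_flush(dax_dev, kaddr + offset, len);
	return 0;
}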

Then __dax_zero_page_range and iomap_dax_zero should merge, and maybe
eventually iomap_zero_range_actor and iomap_zero_range should be split
into a pagecache and a DAX variant, lifting the IS_DAX check into the
callers.
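
And a rough sketch of what lifting the IS_DAX() check into the callers
could look like; dax_iomap_zero_range() and fs_zero_range() are
hypothetical names used only for this example:

/*
 * Sketch only: if iomap_zero_range() were split into a pagecache and a
 * DAX variant, a filesystem caller could dispatch on IS_DAX() itself
 * instead of the shared actor branching internally.
 */
static int fs_zero_range(struct inode *inode, loff_t pos, loff_t len,
		bool *did_zero, const struct iomap_ops *ops)
{
	if (IS_DAX(inode))
		return dax_iomap_zero_range(inode, pos, len, did_zero, ops);
	return iomap_zero_range(inode, pos, len, did_zero, ops);
}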

Thread overview: 7+ messages
2020-01-23 16:52 [RFC] dax,pmem: Provide a dax operation to zero range of memory Vivek Goyal
2020-01-23 19:01 ` Darrick J. Wong
2020-01-24 13:52   ` Vivek Goyal
2020-01-31 23:31   ` Dan Williams
2020-02-03  8:20     ` Christoph Hellwig
2020-02-04 23:23     ` Darrick J. Wong
2020-01-31  5:36 ` Christoph Hellwig [this message]
