linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, jack@suse.cz, hch@lst.de
Subject: Re: [PATCH v4 10/12] filesystem-dax: Introduce dax_lock_page()
Date: Tue, 12 Jun 2018 12:15:42 -0600	[thread overview]
Message-ID: <20180612181542.GB28436@linux.intel.com> (raw)
In-Reply-To: <152850187437.38390.2257981090761438811.stgit@dwillia2-desk3.amr.corp.intel.com>

On Fri, Jun 08, 2018 at 04:51:14PM -0700, Dan Williams wrote:
> In preparation for implementing support for memory poison (media error)
> handling via dax mappings, implement a lock_page() equivalent. Poison
> error handling requires rmap and needs guarantees that the page->mapping
> association is maintained / valid (inode not freed) for the duration of
> the lookup.
> 
> In the device-dax case it is sufficient to simply hold a dev_pagemap
> reference. In the filesystem-dax case we need to use the entry lock.
> 
> Export the entry lock via dax_lock_page() that uses rcu_read_lock() to
> protect against the inode being freed, and revalidates the page->mapping
> association under xa_lock().
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/dax.c            |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dax.h |   15 ++++++++++
>  2 files changed, 91 insertions(+)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index cccf6cad1a7a..b7e71b108fcf 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -361,6 +361,82 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
>  	}
>  }
>  
> +struct page *dax_lock_page(unsigned long pfn)
> +{
> +	pgoff_t index;
> +	struct inode *inode;
> +	wait_queue_head_t *wq;
> +	void *entry = NULL, **slot;
> +	struct address_space *mapping;
> +	struct wait_exceptional_entry_queue ewait;
> +	struct page *ret = NULL, *page = pfn_to_page(pfn);
> +
> +	rcu_read_lock();
> +	for (;;) {
> +		mapping = READ_ONCE(page->mapping);

Why the READ_ONCE()?

> +
> +		if (!mapping || !IS_DAX(mapping->host))

Might read better using the dax_mapping() helper.

Also, forgive my ignorance, but this implies that dev dax has page->mapping
set up and that that inode will have IS_DAX set, right?  This will let us get
past this point for device DAX, and we'll bail out at the S_ISCHR() check?

> +			break;
> +
> +		/*
> +		 * In the device-dax case there's no need to lock, a
> +		 * struct dev_pagemap pin is sufficient to keep the
> +		 * inode alive.
> +		 */
> +		inode = mapping->host;
> +		if (S_ISCHR(inode->i_mode)) {
> +			ret = page;

'ret' isn't actually used for anything in this function, we just
unconditionally return 'page'.

> +			break;
> +		}
> +
> +		xa_lock_irq(&mapping->i_pages);
> +		if (mapping != page->mapping) {
> +			xa_unlock_irq(&mapping->i_pages);
> +			continue;
> +		}
> +		index = page->index;
> +
> +		init_wait(&ewait.wait);
> +		ewait.wait.func = wake_exceptional_entry_func;
> +
> +		entry = __radix_tree_lookup(&mapping->i_pages, index, NULL,
> +				&slot);
> +		if (!entry ||

So if we do a lookup and there is no entry in the tree, we won't add an empty
entry and lock it, we'll just return with no entry in the tree and nothing
locked.

Then, when we call dax_unlock_page(), we'll eventually hit a WARN_ON_ONCE() in 
dax_unlock_mapping_entry() when we see entry is 0.  And, in that gap we've got
nothing locked so page faults could have happened, etc... (which would mean
that instead of WARN_ON_ONCE() for an empty entry, we'd hit it instead for an
unlocked entry).

Is that okay?  Or do we need to insert a locked empty entry here?

  parent reply	other threads:[~2018-06-12 18:15 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-08 23:50 [PATCH v4 00/12] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
2018-06-08 23:50 ` [PATCH v4 01/12] device-dax: Convert to vmf_insert_mixed and vm_fault_t Dan Williams
2018-06-08 23:50 ` [PATCH v4 02/12] device-dax: Cleanup vm_fault de-reference chains Dan Williams
2018-06-11 17:12   ` Laurent Dufour
2018-06-11 17:14     ` Dan Williams
2018-06-08 23:50 ` [PATCH v4 03/12] device-dax: Enable page_mapping() Dan Williams
2018-06-08 23:50 ` [PATCH v4 04/12] device-dax: Set page->index Dan Williams
2018-06-08 23:50 ` [PATCH v4 05/12] filesystem-dax: " Dan Williams
2018-06-08 23:50 ` [PATCH v4 06/12] mm, madvise_inject_error: Let memory_failure() optionally take a page reference Dan Williams
2018-06-08 23:50 ` [PATCH v4 07/12] x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses Dan Williams
2018-06-08 23:51 ` [PATCH v4 08/12] x86/memory_failure: Introduce {set, clear}_mce_nospec() Dan Williams
2018-06-08 23:51 ` [PATCH v4 09/12] mm, memory_failure: Pass page size to kill_proc() Dan Williams
2018-06-08 23:51 ` [PATCH v4 10/12] filesystem-dax: Introduce dax_lock_page() Dan Williams
2018-06-11 15:41   ` Jan Kara
2018-06-11 16:48     ` Dan Williams
2018-06-12 18:07     ` Ross Zwisler
2018-07-04 15:20       ` Dan Williams
2018-07-04 15:17     ` Dan Williams
2018-06-12 18:15   ` Ross Zwisler [this message]
2018-07-04 15:11     ` Dan Williams
2018-06-08 23:51 ` [PATCH v4 11/12] mm, memory_failure: Teach memory_failure() about dev_pagemap pages Dan Williams
2018-06-11 15:50   ` Jan Kara
2018-06-11 16:45     ` Dan Williams
2018-06-12 20:14   ` Ross Zwisler
2018-06-12 23:38     ` Dan Williams
2018-06-08 23:51 ` [PATCH v4 12/12] libnvdimm, pmem: Restore page attributes when clearing errors Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180612181542.GB28436@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).