From: Dan Williams <dan.j.williams@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>,
Christoph Hellwig <hch@lst.de>, Linux MM <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v4 10/12] filesystem-dax: Introduce dax_lock_page()
Date: Mon, 11 Jun 2018 09:48:45 -0700
Message-ID: <CAPcyv4g2+qTQoYN+_VjUsRTZYPKOagL43zZDfdoMi2qEKWJiAg@mail.gmail.com>
In-Reply-To: <20180611154146.jc5xt4gyaihq64lm@quack2.suse.cz>

On Mon, Jun 11, 2018 at 8:41 AM, Jan Kara <jack@suse.cz> wrote:
> On Fri 08-06-18 16:51:14, Dan Williams wrote:
>> In preparation for implementing support for memory poison (media error)
>> handling via dax mappings, implement a lock_page() equivalent. Poison
>> error handling requires rmap and needs guarantees that the page->mapping
>> association is maintained / valid (inode not freed) for the duration of
>> the lookup.
>>
>> In the device-dax case it is sufficient to simply hold a dev_pagemap
>> reference. In the filesystem-dax case we need to use the entry lock.
>>
>> Export the entry lock via dax_lock_page() that uses rcu_read_lock() to
>> protect against the inode being freed, and revalidates the page->mapping
>> association under xa_lock().
>>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Some comments below...
>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index cccf6cad1a7a..b7e71b108fcf 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -361,6 +361,82 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
>> }
>> }
>>
>> +struct page *dax_lock_page(unsigned long pfn)
>> +{
>
> Why do you return struct page here? Any reason behind that? Because struct
> page exists and can be accessed through pfn_to_page() regardless of result
> of this function so it looks a bit confusing. Also dax_lock_page() name
> seems a bit confusing. Maybe dax_lock_pfn_mapping_entry()?
>
>> +	pgoff_t index;
>> +	struct inode *inode;
>> +	wait_queue_head_t *wq;
>> +	void *entry = NULL, **slot;
>> +	struct address_space *mapping;
>> +	struct wait_exceptional_entry_queue ewait;
>> +	struct page *ret = NULL, *page = pfn_to_page(pfn);
>> +
>> +	rcu_read_lock();
>> +	for (;;) {
>> +		mapping = READ_ONCE(page->mapping);
>> +
>> +		if (!mapping || !IS_DAX(mapping->host))
>> +			break;
>> +
>> +		/*
>> +		 * In the device-dax case there's no need to lock, a
>> +		 * struct dev_pagemap pin is sufficient to keep the
>> +		 * inode alive.
>> +		 */
>> +		inode = mapping->host;
>> +		if (S_ISCHR(inode->i_mode)) {
>> +			ret = page;
>> +			break;
>> +		}
>> +
>> +		xa_lock_irq(&mapping->i_pages);
>> +		if (mapping != page->mapping) {
>> +			xa_unlock_irq(&mapping->i_pages);
>> +			continue;
>> +		}
>> +		index = page->index;
>> +
>> +		init_wait(&ewait.wait);
>> +		ewait.wait.func = wake_exceptional_entry_func;
>
> This initialization could be before the loop.
>
>> +
>> +		entry = __radix_tree_lookup(&mapping->i_pages, index, NULL,
>> +				&slot);
>> +		if (!entry ||
>> +		    WARN_ON_ONCE(!radix_tree_exceptional_entry(entry))) {
>> +			xa_unlock_irq(&mapping->i_pages);
>> +			break;
>> +		} else if (!slot_locked(mapping, slot)) {
>> +			lock_slot(mapping, slot);
>> +			ret = page;
>> +			xa_unlock_irq(&mapping->i_pages);
>> +			break;
>> +		}
>> +
>> +		wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key);
>> +		prepare_to_wait_exclusive(wq, &ewait.wait,
>> +				TASK_UNINTERRUPTIBLE);
>> +		xa_unlock_irq(&mapping->i_pages);
>> +		rcu_read_unlock();
>> +		schedule();
>> +		finish_wait(wq, &ewait.wait);
>> +		rcu_read_lock();
>> +	}
>> +	rcu_read_unlock();
>
> I don't like how this duplicates a lot of get_unlocked_mapping_entry().
> Can we possibly factor this out similarly to how it is done for wait_event()?

Ok, I'll give that a shot.
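
For illustration only, here is a rough userspace sketch of the kind of
wait_event()-style factoring suggested above. It is not the kernel code and
not the eventual patch: struct entry, entry_wait_event(), entry_lock(),
entry_unlock() and entry_invalidate() are hypothetical names, and a pthread
mutex and condition variable stand in for xa_lock and the DAX entry
waitqueue. The point is only that the "check, sleep, re-check" loop lives in
one macro that every caller shares.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a lockable mapping entry. */
struct entry {
	pthread_mutex_t lock;	/* plays the role of xa_lock */
	pthread_cond_t wq;	/* plays the role of the entry waitqueue */
	bool locked;		/* plays the role of the slot lock bit */
	bool valid;		/* false once the "mapping" has gone away */
};

static void entry_init(struct entry *e)
{
	pthread_mutex_init(&e->lock, NULL);
	pthread_cond_init(&e->wq, NULL);
	e->locked = false;
	e->valid = true;
}

/*
 * Shared wait loop in the spirit of wait_event(): sleep until @cond holds.
 * Call with e->lock held. pthread_cond_wait() drops the mutex while
 * sleeping and retakes it before returning, so @cond is always evaluated
 * under the lock.
 */
#define entry_wait_event(e, cond)					\
	do {								\
		while (!(cond))						\
			pthread_cond_wait(&(e)->wq, &(e)->lock);	\
	} while (0)

/* Take the entry lock; fail if the entry was invalidated meanwhile. */
static bool entry_lock(struct entry *e)
{
	bool ok;

	pthread_mutex_lock(&e->lock);
	entry_wait_event(e, !e->locked || !e->valid);
	ok = e->valid;
	if (ok)
		e->locked = true;
	pthread_mutex_unlock(&e->lock);
	return ok;
}

/* Drop the entry lock and wake every waiter so each re-checks its condition. */
static void entry_unlock(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	assert(e->locked);
	e->locked = false;
	pthread_cond_broadcast(&e->wq);
	pthread_mutex_unlock(&e->lock);
}

/* Mark the entry dead, as if page->mapping had changed under a waiter. */
static void entry_invalidate(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	e->valid = false;
	pthread_cond_broadcast(&e->wq);
	pthread_mutex_unlock(&e->lock);
}
```

A second caller with a different wake-up condition would reuse
entry_wait_event() with its own expression instead of open-coding the loop.
The kernel version additionally has to drop and retake rcu_read_lock()
around the sleep and revalidate page->mapping, which this sketch does not
model.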