From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Jan Kara <jack@suse.cz>
Cc: Chandan Rajendra <chandan@linux.ibm.com>,
mpe@ellerman.id.au, Dan Williams <dan.j.williams@intel.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: write fault on dax mapping and usage of set_pte_at.
Date: Thu, 21 Feb 2019 19:11:14 +0530
Message-ID: <faa8616d-cf9a-3dbb-353a-d163e20ad9e8@linux.ibm.com>
In-Reply-To: <20190221121238.GB21533@quack2.suse.cz>
On 2/21/19 5:42 PM, Jan Kara wrote:
> Hi Aneesh,
>
> On Thu 21-02-19 12:52:39, Aneesh Kumar K.V wrote:
>> We found this while testing dax with XFS, but I guess this is true for
>> other file systems too. The stack trace looks as
>>
>> [c00000000007610c] set_pte_at+0x3c/0x190
>> LR [c000000000378628] insert_pfn+0x208/0x280
>> Call Trace:
>> [c0000002125df980] [8000000000000104] 0x8000000000000104 (unreliable)
>> [c0000002125df9c0] [c000000000378488] insert_pfn+0x68/0x280
>> [c0000002125dfa30] [c0000000004a5494] dax_iomap_pte_fault.isra.7+0x734/0xa40
>> [c0000002125dfb50] [c000000000627250] __xfs_filemap_fault+0x280/0x2d0
>> [c0000002125dfbb0] [c000000000373abc] do_wp_page+0x48c/0xa40
>> [c0000002125dfc00] [c000000000379170] __handle_mm_fault+0x8d0/0x1fd0
>> [c0000002125dfd00] [c00000000037a9b0] handle_mm_fault+0x140/0x250
>> [c0000002125dfd40] [c000000000074bb0] __do_page_fault+0x300/0xd60
>> [c0000002125dfe20] [c00000000000acf4] handle_page_fault+0x18
>>
>>
>> Now that is WARN_ON in set_pte_at which is
>>
>> VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>>
>> Multiple architectures optimize set_pte_at based on the assumption that
>> we will never use set_pte_at to update a valid pte entry. This helps
>> avoid flushing the TLB, etc. We should be using ptep_set_access_flags
>> for this.
>
> Hum, I didn't know about this assumption and neither did a lot of other
> people reviewing DAX patches. Is this documented somewhere?
>
>> I guess iomap code doesn't handle this correctly? Or am I missing
>> some other ways we can end up flushing tlb?
>
> So for RW->RO transition we use ptep_clear_flush() in dax_entry_mkclean()
> so that one is certainly safe. Similarly for unmapping. The RO->RW
> transition does not seem to have any TLB flush so there TLB could still
> carry stale information but it's the same as with normal page faults on
> invalid PTEs or with protection faults for normal pages (see e.g.
> finish_mkwrite_fault()).
I am not sure I understood that. The RO -> RW transition can leave stale
TLB entries with the RO mapping in them. Some architectures flush the TLB
during the fault; others may not. We do have a NestMMU issue with that
transition, which requires us to mark the PTE invalid and flush the TLB
for that transition (see commit bd5050e38aec3055ff4257ade987d808ac93b582).
For invalid PTEs we should not have a TLB entry at all.
finish_mkwrite_fault() does use wp_page_reuse(), which does the right thing.
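A minimal user-space sketch of the RO->RW concern, offered only as a toy model and not as kernel code: one PTE, one TLB entry caching it, and hypothetical toy_* helpers. It assumes that set_pte_at() never flushes the TLB and warns when the old PTE is valid (the VM_WARN_ON quoted earlier), and it contrasts an in-place RO->RW upgrade with the invalidate, flush, then set sequence described for the NestMMU case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model, NOT kernel code: one PTE and the one TLB entry caching it. */
struct toy_pte { bool valid; bool write; unsigned long pfn; };
struct toy_tlb { bool cached; bool write; unsigned long pfn; };

/* Counts how often the modelled VM_WARN_ON() in set_pte_at() fires. */
int toy_warnings;

/* Assumed set_pte_at() contract: only fills a non-valid PTE, never
 * flushes the TLB, and warns if the old entry was valid. */
static void toy_set_pte_at(struct toy_pte *ptep, struct toy_pte val)
{
	if (ptep->valid) {
		toy_warnings++;
		fprintf(stderr, "WARN: set_pte_at() on a valid PTE\n");
	}
	*ptep = val;
}

/* NestMMU-style sequence: mark invalid, flush the TLB, then set. */
static void toy_invalidate_flush_set(struct toy_pte *ptep,
				     struct toy_tlb *tlb, struct toy_pte val)
{
	ptep->valid = false;		/* mark the PTE invalid */
	tlb->cached = false;		/* flush the cached translation */
	toy_set_pte_at(ptep, val);	/* legal now: old PTE is not valid */
}

/* A CPU write: refill the TLB from the PTE on a miss, then report
 * whether the cached translation allows the write. */
static bool toy_cpu_write(const struct toy_pte *ptep, struct toy_tlb *tlb)
{
	if (!tlb->cached) {
		if (!ptep->valid)
			return false;	/* would take a fault */
		tlb->cached = true;
		tlb->write = ptep->write;
		tlb->pfn = ptep->pfn;
	}
	return tlb->write;
}

/* RO->RW by overwriting the valid PTE in place; returns whether the
 * retried write succeeds. */
bool toy_upgrade_in_place(void)
{
	struct toy_pte pte = { .valid = true, .write = false, .pfn = 42 };
	struct toy_tlb tlb = { 0 };
	struct toy_pte rw = { .valid = true, .write = true, .pfn = 42 };

	(void)toy_cpu_write(&pte, &tlb);	/* caches a read-only entry */
	toy_set_pte_at(&pte, rw);		/* warns, and nothing flushes */
	return toy_cpu_write(&pte, &tlb);	/* hits the stale RO entry */
}

/* RO->RW via invalidate + flush + set; returns whether the retried
 * write succeeds. */
bool toy_upgrade_with_flush(void)
{
	struct toy_pte pte = { .valid = true, .write = false, .pfn = 42 };
	struct toy_tlb tlb = { 0 };
	struct toy_pte rw = { .valid = true, .write = true, .pfn = 42 };

	(void)toy_cpu_write(&pte, &tlb);	/* caches a read-only entry */
	toy_invalidate_flush_set(&pte, &tlb, rw);
	return toy_cpu_write(&pte, &tlb);	/* refills a writable entry */
}
```

In this model, overwriting the valid read-only PTE both trips the warning and leaves the stale read-only translation in the TLB, so the retried write still fails; the invalidate-and-flush sequence avoids both. Real TLB behaviour is architecture specific, so treat this only as an illustration of the invariant.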
> The only thing that's remaining is a situation
> when we replace a PTE with zero page with a PTE pointing to a real storage
> (block allocation on protection fault). However in this case we do
> unmap_mapping_pages() in dax_insert_entry() so the PTE actually gets
> cleared before we install a new correct block mapping. So this case is safe
> as well. Am I missing something?
>
Does the pfn_mkwrite callback need to insert the pfn details for a RO->RW
fault type? Can't we skip that pfn insert and let finish_mkwrite_fault()
handle the PTE update?
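The question can be sketched as a toy model of the write-protect dispatch, loosely following wp_pfn_shared() in mm/memory.c of this era (the exact code path is an assumption here): if ->pfn_mkwrite() returns VM_FAULT_NOPAGE the generic code stops, otherwise it continues into finish_mkwrite_fault(), which ends in wp_page_reuse() and ptep_set_access_flags(). Everything prefixed toy_ is hypothetical, and the structure and return values are simplified assumptions, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT kernel code. */
enum toy_fault { TOY_FAULT_DONE = 0, TOY_FAULT_NOPAGE = 1 };

struct toy_vmf {
	bool pte_valid, pte_write;	/* the faulting PTE */
	bool tlb_cached;		/* a TLB entry caches the RO PTE */
	int warnings;			/* modelled VM_WARN_ON() hits */
};

/* set_pte_at() model: warns when it overwrites a valid PTE, no flush. */
static void toy_vmf_set_pte(struct toy_vmf *vmf, bool write)
{
	if (vmf->pte_valid)
		vmf->warnings++;
	vmf->pte_valid = true;
	vmf->pte_write = write;
}

/* ptep_set_access_flags()-like model: relax a valid PTE and flush. */
static void toy_set_access_flags(struct toy_vmf *vmf)
{
	assert(vmf->pte_valid);		/* only relaxes an existing PTE */
	vmf->pte_write = true;
	vmf->tlb_cached = false;
}

/* Callback that installs the RW PTE itself and returns NOPAGE. */
static enum toy_fault toy_mkwrite_insert(struct toy_vmf *vmf)
{
	toy_vmf_set_pte(vmf, true);
	return TOY_FAULT_NOPAGE;
}

/* Callback that does only filesystem work and leaves the PTE alone. */
static enum toy_fault toy_mkwrite_no_insert(struct toy_vmf *vmf)
{
	(void)vmf;	/* hypothetical filesystem bookkeeping goes here */
	return TOY_FAULT_DONE;
}

/* Loosely follows wp_pfn_shared(): NOPAGE ends the fault, otherwise
 * generic code finishes (finish_mkwrite_fault() -> wp_page_reuse()). */
static enum toy_fault toy_wp_pfn_shared(struct toy_vmf *vmf,
			enum toy_fault (*mkwrite)(struct toy_vmf *))
{
	if (mkwrite(vmf) == TOY_FAULT_NOPAGE)
		return TOY_FAULT_NOPAGE;
	toy_set_access_flags(vmf);
	return TOY_FAULT_DONE;
}

/* Runs one RO->RW fault; returns warnings, reports whether TLB flushed. */
int toy_run(enum toy_fault (*mkwrite)(struct toy_vmf *), bool *flushed)
{
	struct toy_vmf vmf = { .pte_valid = true, .pte_write = false,
			       .tlb_cached = true, .warnings = 0 };

	toy_wp_pfn_shared(&vmf, mkwrite);
	*flushed = !vmf.tlb_cached;
	return vmf.warnings;
}
```

With the inserting callback, the model overwrites a valid PTE and never flushes; with the non-inserting callback, the upgrade goes through the access-flags path instead. Whether DAX can drop the insert in every case it handles is the open question, not something this model settles.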
-aneesh