Date: Thu, 21 Feb 2019 16:15:13 +0100
From: Jan Kara
To: "Aneesh Kumar K.V"
Cc: Jan Kara, Chandan Rajendra, mpe@ellerman.id.au, Dan Williams, linux-fsdevel@vger.kernel.org
Subject: Re: write fault on dax mapping and usage of set_pte_at.
Message-ID: <20190221151513.GB23071@quack2.suse.cz>
References: <871s41a9mo.fsf@linux.ibm.com> <20190221121238.GB21533@quack2.suse.cz>

On Thu 21-02-19 19:11:14, Aneesh Kumar K.V wrote:
> On 2/21/19 5:42 PM, Jan Kara wrote:
> > Hi Aneesh,
> >
> > On Thu 21-02-19 12:52:39, Aneesh Kumar K.V wrote:
> > > We found this while testing DAX with XFS, but I guess this is true for
> > > other file systems too. The stack trace looks like this:
> > >
> > > [c00000000007610c] set_pte_at+0x3c/0x190
> > > LR [c000000000378628] insert_pfn+0x208/0x280
> > > Call Trace:
> > > [c0000002125df980] [8000000000000104] 0x8000000000000104 (unreliable)
> > > [c0000002125df9c0] [c000000000378488] insert_pfn+0x68/0x280
> > > [c0000002125dfa30] [c0000000004a5494] dax_iomap_pte_fault.isra.7+0x734/0xa40
> > > [c0000002125dfb50] [c000000000627250] __xfs_filemap_fault+0x280/0x2d0
> > > [c0000002125dfbb0] [c000000000373abc] do_wp_page+0x48c/0xa40
> > > [c0000002125dfc00] [c000000000379170] __handle_mm_fault+0x8d0/0x1fd0
> > > [c0000002125dfd00] [c00000000037a9b0] handle_mm_fault+0x140/0x250
> > > [c0000002125dfd40] [c000000000074bb0] __do_page_fault+0x300/0xd60
> > > [c0000002125dfe20] [c00000000000acf4] handle_page_fault+0x18
> > >
> > >
> > > Now, that is the WARN_ON in set_pte_at, which is
> > >
> > > VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
> > >
> > > Multiple architectures optimize set_pte_at based on the assumption that
> > > we will never use set_pte_at to update a valid PTE. This helps avoid
> > > flushing the TLB etc. We should be using ptep_set_access_flags for
> > > this.
> >
> > Hum, I didn't know about this assumption and neither did a lot of other
> > people reviewing DAX patches. Is this documented somewhere?

Any answer here?
> > > I guess the iomap code doesn't handle this correctly? Or am I missing
> > > some other way we can end up flushing the TLB?
> >
> > So for the RW->RO transition we use ptep_clear_flush() in
> > dax_entry_mkclean(), so that one is certainly safe. Similarly for
> > unmapping. The RO->RW transition does not seem to have any TLB flush, so
> > the TLB could still carry stale information, but it's the same as with
> > normal page faults on invalid PTEs or with protection faults for normal
> > pages (see e.g. finish_mkwrite_fault()).
>
> I am not sure I understood that. The RO -> RW transition can leave stale
> TLB entries with the RO mapping in them. So some architectures do flush
> the TLB during the fault, some may not. We do have a NestMMU issue with
> that transition which requires us to mark the PTE invalid and flush the
> TLB for that transition.
> (see commit bd5050e38aec3055ff4257ade987d808ac93b582)

OK, I see.

> For invalid PTEs we should not have a TLB entry at all.
>
> finish_mkwrite_fault() does use wp_page_reuse(), which does the right thing.

Yes, sorry. Somehow I've misread the code.

> > The only remaining case is when we replace a PTE pointing to the zero
> > page with a PTE pointing to real storage (block allocation on a
> > protection fault). However, in this case we do unmap_mapping_pages() in
> > dax_insert_entry(), so the PTE actually gets cleared before we install
> > the new, correct block mapping. So this case is safe as well. Am I
> > missing something?
> >
>
> Does the pfn_mkwrite callback need to insert the pfn details for a RO->RW
> fault type? Can't we skip that pfn insert and let finish_mkwrite_fault
> handle that PTE update?

Yes, pfn_mkwrite() must fully update the PTE, as the PTE update must happen
under a lock that is private to the DAX code. Using ptep_set_access_flags()
in the iomap code isn't going to be simple either. I have to think about
whether / how that is possible.

								Honza
-- 
Jan Kara
SUSE Labs, CR