Date: Thu, 21 Feb 2019 13:12:38 +0100
From: Jan Kara
To: "Aneesh Kumar K.V"
Cc: Jan Kara, Chandan Rajendra, mpe@ellerman.id.au, Dan Williams, linux-fsdevel@vger.kernel.org
Subject: Re: write fault on dax mapping and usage of set_pte_at.
Message-ID: <20190221121238.GB21533@quack2.suse.cz>
References: <871s41a9mo.fsf@linux.ibm.com>
In-Reply-To: <871s41a9mo.fsf@linux.ibm.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Hi Aneesh,

On Thu 21-02-19 12:52:39, Aneesh Kumar K.V wrote:
> We found this while testing dax with XFS, but i guess this is true for
> other file systems too. The stack trace looks as
>
> [c00000000007610c] set_pte_at+0x3c/0x190
> LR [c000000000378628] insert_pfn+0x208/0x280
> Call Trace:
> [c0000002125df980] [8000000000000104] 0x8000000000000104 (unreliable)
> [c0000002125df9c0] [c000000000378488] insert_pfn+0x68/0x280
> [c0000002125dfa30] [c0000000004a5494] dax_iomap_pte_fault.isra.7+0x734/0xa40
> [c0000002125dfb50] [c000000000627250] __xfs_filemap_fault+0x280/0x2d0
> [c0000002125dfbb0] [c000000000373abc] do_wp_page+0x48c/0xa40
> [c0000002125dfc00] [c000000000379170] __handle_mm_fault+0x8d0/0x1fd0
> [c0000002125dfd00] [c00000000037a9b0] handle_mm_fault+0x140/0x250
> [c0000002125dfd40] [c000000000074bb0] __do_page_fault+0x300/0xd60
> [c0000002125dfe20] [c00000000000acf4] handle_page_fault+0x18
>
>
> Now that is WARN_ON in set_pte_at which is
>
> VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>
> Multiple architecture optimize set_pte_at based on the assumption that
> we will never use set_pte_at to update a valid pte entry. This helps in
> avoid flushing tlb etc. We should be using ptep_set_access_flags for
> this.

Hum, I didn't know about this assumption and neither did a lot of other
people reviewing DAX patches. Is this documented somewhere?

> I guess iomap code doesn't handle this correctly? Or am I missing
> some other ways we can end up flushing tlb?

So for the RW->RO transition we use ptep_clear_flush() in
dax_entry_mkclean(), so that one is certainly safe.
Similarly for unmapping. The RO->RW transition does not seem to have any
TLB flush, so the TLB could still carry stale information there, but that
is the same as with normal page faults on invalid PTEs or with protection
faults for normal pages (see e.g. finish_mkwrite_fault()).

The only remaining case is when we replace a PTE mapping the zero page
with a PTE pointing to real storage (block allocation on protection
fault). However, in this case we do unmap_mapping_pages() in
dax_insert_entry(), so the PTE actually gets cleared before we install the
new, correct block mapping. So this case is safe as well. Am I missing
something?

								Honza
--
Jan Kara
SUSE Labs, CR
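
The cases discussed above can be sketched with a toy user-space model in C.
This is only an illustration under stated assumptions: struct toy_pte and
all toy_* helpers are hypothetical stand-ins that mimic the roles of
set_pte_at(), ptep_clear_flush() and ptep_set_access_flags(). They are not
the kernel implementations, and the model ignores the pte_protnone()
exception in the quoted VM_WARN_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy PTE; NOT the kernel's pte_t. */
struct toy_pte {
	bool valid;		/* entry is present / hardware-valid */
	bool writable;		/* entry allows writes */
	unsigned long pfn;	/* page frame the entry points to */
};

static int toy_warnings;	/* how often the modelled warning fired */
static int toy_tlb_flushes;	/* how often a TLB flush was modelled */

/*
 * Stand-in for a set_pte_at()-style store. It assumes the target entry is
 * not valid, so it never flushes; storing over a valid entry trips the
 * modelled warning.
 */
static void toy_set_pte_at(struct toy_pte *ptep, struct toy_pte new_pte)
{
	if (ptep->valid) {
		toy_warnings++;
		fprintf(stderr, "toy WARN: set_pte_at on a valid PTE\n");
	}
	*ptep = new_pte;
}

/* Stand-in for ptep_clear_flush(): clear the entry, then flush the TLB. */
static struct toy_pte toy_ptep_clear_flush(struct toy_pte *ptep)
{
	struct toy_pte old = *ptep;

	ptep->valid = false;
	ptep->writable = false;
	ptep->pfn = 0;
	toy_tlb_flushes++;
	return old;
}

/*
 * Stand-in for a ptep_set_access_flags()-style update: change permissions
 * of an already valid entry in place. Returns true if the entry changed.
 */
static bool toy_set_access_flags(struct toy_pte *ptep, bool writable)
{
	bool changed = ptep->valid && ptep->writable != writable;

	if (changed)
		ptep->writable = writable;
	return changed;
}
```

In this model, installing an entry into a cleared slot never trips the
warning, while storing over a still-valid entry does. Clearing first (the
role unmap_mapping_pages() plays in the zero-page case) or using the
access-flags style update for an RO->RW upgrade keeps the invariant intact.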