Subject: Re: write fault on dax mapping and usage of set_pte_at.
To: Jan Kara
Cc: Chandan Rajendra, mpe@ellerman.id.au, Dan Williams, linux-fsdevel@vger.kernel.org
References: <871s41a9mo.fsf@linux.ibm.com> <20190221121238.GB21533@quack2.suse.cz>
In-Reply-To: <20190221121238.GB21533@quack2.suse.cz>
From: "Aneesh Kumar K.V"
Date: Thu, 21 Feb 2019 19:11:14 +0530
X-Mailing-List: linux-fsdevel@vger.kernel.org

On 2/21/19 5:42 PM, Jan Kara wrote:
> Hi Aneesh,
>
> On Thu 21-02-19 12:52:39, Aneesh Kumar K.V wrote:
>> We found this while testing dax with XFS, but i guess this is true for
>> other file systems too. The stack trace looks as
>>
>> [c00000000007610c] set_pte_at+0x3c/0x190
>> LR [c000000000378628] insert_pfn+0x208/0x280
>> Call Trace:
>> [c0000002125df980] [8000000000000104] 0x8000000000000104 (unreliable)
>> [c0000002125df9c0] [c000000000378488] insert_pfn+0x68/0x280
>> [c0000002125dfa30] [c0000000004a5494] dax_iomap_pte_fault.isra.7+0x734/0xa40
>> [c0000002125dfb50] [c000000000627250] __xfs_filemap_fault+0x280/0x2d0
>> [c0000002125dfbb0] [c000000000373abc] do_wp_page+0x48c/0xa40
>> [c0000002125dfc00] [c000000000379170] __handle_mm_fault+0x8d0/0x1fd0
>> [c0000002125dfd00] [c00000000037a9b0] handle_mm_fault+0x140/0x250
>> [c0000002125dfd40] [c000000000074bb0] __do_page_fault+0x300/0xd60
>> [c0000002125dfe20] [c00000000000acf4] handle_page_fault+0x18
>>
>>
>> Now that is WARN_ON in set_pte_at which is
>>
>> VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>>
>> Multiple architecture optimize set_pte_at based on the assumption that
>> we will never use set_pte_at to update a valid pte entry. This helps in
>> avoid flushing tlb etc. We should be using ptep_set_access_flags for
>> this.
>
> Hum, I didn't know about this assumption and neither did lot of other
> people reviewing DAX patches. Is this documented somewhere?
>
>> I guess iomap code doesn't handle this correctly? Or am I missing
>> some other ways we can end up flushing tlb?
>
> So for RW->RO transition we use ptep_clear_flush() in dax_entry_mkclean()
> so that one is certainly safe. Similarly for unmapping. The RO->RW
> transition does not seem to have any TLB flush so there TLB could still
> carry stale information but it's the same as with normal page faults on
> invalid PTEs or with protection faults for normal pages (see e.g.
> finish_mkwrite_fault()).

I am not sure I understood that. On an RO->RW transition the TLB can
still hold stale entries with the RO mapping. Some architectures flush
the TLB during the fault, some may not. We do have a NestMMU issue with
that transition, which requires us to mark the pte invalid and flush the
TLB (see commit bd5050e38aec3055ff4257ade987d808ac93b582).

For invalid PTEs we should not have a TLB entry at all.
finish_mkwrite_fault() does use wp_page_reuse(), which does the right
thing.

> The only thing that's remaining is a situation
> when we replace a PTE with zero page with a PTE pointing to a real storage
> (block allocation on protection fault). However in this case we do
> unmap_mapping_pages() in dax_insert_entry() so the PTE actually gets
> cleared before we install a new correct block mapping. So this case is safe
> as well. Am I missing something?
>

Does the pfn_mkwrite callback need to insert the pfn details for an
RO->RW fault? Can't we skip that pfn insert and let
finish_mkwrite_fault() handle the pte update?

-aneesh
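
To make the distinction in this thread concrete, here is a small userspace toy model in C. It is only a sketch, not kernel code: every name in it (toy_pte_t, toy_set_pte_at, toy_set_access_flags, toy_write_fault, struct toy_mmu) is hypothetical, and the real set_pte_at() and ptep_set_access_flags() have different signatures and architecture-specific behaviour. The model assumes one thing only: installing over an already-valid entry counts as a violation, while an RO->RW upgrade goes through a separate helper that accounts for the stale translation.

```c
/* Userspace toy model; all names are hypothetical, none are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTE_VALID 0x1u
#define TOY_PTE_WRITE 0x2u

typedef uint32_t toy_pte_t;

struct toy_mmu {
    int warnings;     /* times a valid entry was overwritten by toy_set_pte_at */
    int tlb_flushes;  /* simulated TLB invalidations (an assumption of the model) */
};

/* Models the assumption discussed above: the target entry must not be valid. */
void toy_set_pte_at(struct toy_mmu *mmu, toy_pte_t *ptep, toy_pte_t pte)
{
    if (*ptep & TOY_PTE_VALID)
        mmu->warnings++;          /* stands in for the VM_WARN_ON */
    *ptep = pte;
}

/* Models an access-flags style update of an already-valid entry.
 * Returns true if the entry changed; the model then counts one flush
 * for the stale read-only translation. */
bool toy_set_access_flags(struct toy_mmu *mmu, toy_pte_t *ptep, toy_pte_t pte)
{
    assert(*ptep & TOY_PTE_VALID);
    if (*ptep == pte)
        return false;
    *ptep = pte;
    mmu->tlb_flushes++;
    return true;
}

/* Write fault in the model: pick the helper based on the current entry. */
void toy_write_fault(struct toy_mmu *mmu, toy_pte_t *ptep)
{
    toy_pte_t want = TOY_PTE_VALID | TOY_PTE_WRITE;

    if (*ptep & TOY_PTE_VALID)
        toy_set_access_flags(mmu, ptep, want);  /* RO->RW upgrade */
    else
        toy_set_pte_at(mmu, ptep, want);        /* first install */
}
```

In this model, calling toy_set_pte_at() directly on a valid read-only entry increments warnings, which is the analogue of the trace above where insert_pfn() reaches set_pte_at() from do_wp_page(). Routing the same fault through toy_write_fault() leaves warnings at zero.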