From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yang Shi
To: hannes@cmpxchg.org, catalin.marinas@arm.com, will.deacon@arm.com,
	akpm@linux-foundation.org
Cc: yang.shi@linux.alibaba.com, xuyu@linux.alibaba.com, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH] mm: avoid access flag update TLB flush for retried page fault
Date: Thu, 16 Jul 2020 05:36:30 +0800
Message-Id: <1594848990-55657-1-git-send-email-yang.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Recently we found a regression when running the will_it_scale/page_fault3
test on ARM64: over 70% down for the multi-process cases and over 20% down
for the multi-thread cases.  It turns out the regression is caused by
commit 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").

The test mmaps a memory-sized file and then writes to the mapping.  This
makes all memory dirty and triggers dirty page throttling, at which point
that upstream commit releases mmap_sem and retries the page fault.  The
retried page fault sees the correct PTEs installed by the first try, then
updates the dirty bit, clears the read-only bit and flushes TLBs on ARM.
The regression is caused by the excessive TLB flushes.  It is fine on x86
since x86 doesn't clear the read-only bit, so there is no need to flush
the TLB in this case.

A page fault is retried due to:
1. Waiting for page readahead
2. Waiting for page swapped in
3. Waiting for dirty pages throttling

The first two cases don't have PTEs set up at all, so the retried page
fault installs the PTEs and never reaches this code.  But case #3 usually
has PTEs installed, so the retried page fault reaches the dirty bit and
read-only bit update.  It seems unnecessary to modify those bits again for
#3 since they should already have been set by the first page fault try.

Of course a parallel page fault may set up PTEs, but we only need to care
about write faults.  If the parallel page fault set up a writable and
dirty PTE, the retried fault doesn't need to do anything extra.  If the
parallel page fault set up a clean read-only PTE, the retried fault should
just call do_wp_page() and return, as the code snippet below shows:

	if (vmf->flags & FAULT_FLAG_WRITE) {
		if (!pte_write(entry))
			return do_wp_page(vmf);
	}

With this fix the test result gets back to normal.

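For readers who want to see the access pattern described above, here is a
minimal user-space sketch.  It is not the will_it_scale source: the function
name touch_shared_file_pages() and the temporary file name are made up for
illustration, and whether dirty throttling actually kicks in depends on the
length passed in and on the system's dirty thresholds.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a temporary file MAP_SHARED and write one byte
 * to every page, so each store takes a write fault on a file-backed page.
 * Returns the number of pages touched, or -1 on error.
 */
long touch_shared_file_pages(size_t len)
{
	char path[] = "/tmp/pf3-sketch-XXXXXX";	/* made-up file name */
	long page = sysconf(_SC_PAGESIZE);
	long touched = 0;
	char *p;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives until fd is closed */

	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	for (size_t off = 0; off < len; off += (size_t)page) {
		p[off] = 1;	/* write fault on a shared file page */
		touched++;
	}

	munmap(p, len);
	close(fd);
	return touched;
}
```

Calling it with a length close to the amount of RAM approximates the
"memory-sized file" case; small lengths only exercise the fault path.
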
Fixes: 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault")
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Johannes Weiner
Reported-by: Xu Yu
Debugged-by: Xu Yu
Tested-by: Xu Yu
Signed-off-by: Yang Shi
---
v2: * Incorporated the comment from Will Deacon.
    * Updated the commit log per the discussion.

 mm/memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 87ec87c..e93e1da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
 			return do_wp_page(vmf);
-		entry = pte_mkdirty(entry);
 	}
+
+	if (vmf->flags & FAULT_FLAG_TRIED)
+		goto unlock;
+
+	if (vmf->flags & FAULT_FLAG_WRITE)
+		entry = pte_mkdirty(entry);
+
 	entry = pte_mkyoung(entry);
 	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
 				vmf->flags & FAULT_FLAG_WRITE)) {
-- 
1.8.3.1
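
The decision order in the mm/memory.c hunk can be modelled in user space
roughly as follows.  This is only a sketch: struct model_pte, the MODEL_*
names and model_pte_fault_tail() are invented for illustration and are not
kernel interfaces, and the model leaves out locking, the pte_same() check
and the actual TLB maintenance.

```c
#include <stdbool.h>

/* Made-up stand-ins for the kernel's fault flags; the values are arbitrary. */
enum {
	MODEL_FLAG_WRITE = 1 << 0,
	MODEL_FLAG_TRIED = 1 << 1,
};

/* Made-up, simplified view of the PTE bits the patch cares about. */
struct model_pte {
	bool write;
	bool dirty;
	bool young;
};

enum model_action {
	MODEL_DO_WP_PAGE,	/* write fault on a read-only PTE */
	MODEL_SKIP_UPDATE,	/* retried fault: leave the PTE alone */
	MODEL_UPDATE_FLAGS,	/* first try: set dirty/young as before */
};

/* Mirrors the order of checks after the patch is applied. */
enum model_action model_pte_fault_tail(unsigned int flags,
				       struct model_pte *pte)
{
	if ((flags & MODEL_FLAG_WRITE) && !pte->write)
		return MODEL_DO_WP_PAGE;

	/* New with the patch: a retried fault skips the access flag update. */
	if (flags & MODEL_FLAG_TRIED)
		return MODEL_SKIP_UPDATE;

	if (flags & MODEL_FLAG_WRITE)
		pte->dirty = true;
	pte->young = true;
	return MODEL_UPDATE_FLAGS;
}
```

In this model a retried write fault on a read-only PTE still goes to the
write-protect path, while a retried fault on an already writable PTE leaves
the entry untouched, which is the case the patch aims at.
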