Date: Thu, 23 Jul 2020 21:56:49 -0700
From: Andrew Morton
To: Yang Shi
Cc: catalin.marinas@arm.com, hannes@cmpxchg.org, hdanton@sina.com,
	hughd@google.com, josef@toxicpanda.com,
	kirill.shutemov@linux.intel.com, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, torvalds@linux-foundation.org,
	will.deacon@arm.com, willy@infradead.org, xuyu@linux.alibaba.com
Subject: Re: [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault
Message-Id: <20200723215649.c7d960381117ff794c0a50f0@linux-foundation.org>
In-Reply-To: <30cf7356-bef1-c621-60cb-e12a8bd9111d@linux.alibaba.com>
References: <20200724041508.QlTbrHnfh%akpm@linux-foundation.org>
	<30cf7356-bef1-c621-60cb-e12a8bd9111d@linux.alibaba.com>

On Thu, 23 Jul 2020 21:38:10 -0700 Yang Shi wrote:

> > --- a/mm/memory.c~mm-avoid-access-flag-update-tlb-flush-for-retried-page-fault
> > +++ a/mm/memory.c
> > @@ -4241,8 +4241,13 @@ static vm_fault_t handle_pte_fault(struc
> >  	if (vmf->flags & FAULT_FLAG_WRITE) {
> >  		if (!pte_write(entry))
> >  			return do_wp_page(vmf);
> > -		entry = pte_mkdirty(entry);
> >  	}
> > +
> > +	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vmf->flags & FAULT_FLAG_TRIED))
> > +		entry = pte_mkdirty(entry);
> > +	else if (vmf->flags & FAULT_FLAG_TRIED)
> > +		goto unlock;
>
> Hi Andrew,
>
> It looks you forgot fold v2 update?

Argh, yes, sorry.  It should have been this:


From: Yang Shi
Subject: mm/memory.c: avoid access flag update TLB flush for retried page fault

Recently we found a regression when running the
will_it_scale/page_fault3 test on ARM64: over 70% down for the
multi-process cases and over 20% down for the multi-thread cases.  It
turns out the regression is caused by commit
89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").

The test mmaps a memory-sized file and then writes to the mapping.  This
makes all memory dirty and triggers dirty page throttling, at which
point that upstream commit releases mmap_sem and retries the page fault.
The retried page fault sees the correct PTEs installed by the first try,
then updates the dirty bit, clears the read-only bit and flushes TLBs on
ARM.  The regression is caused by the excessive TLB flushes.  It is fine
on x86 since x86 doesn't clear the read-only bit, so there is no need to
flush the TLB in this case.

The page fault would be retried due to:
1. Waiting for page readahead
2. Waiting for page swapped in
3. Waiting for dirty pages throttling

The first two cases don't have PTEs set up at all, so the retried page
fault installs the PTEs and never reaches this code.  But case #3
usually has PTEs installed, so the retried page fault reaches the dirty
bit and read-only bit update.  It does not seem necessary to modify
those bits again for #3, since they should already have been set by the
first page fault try.

Of course a parallel page fault may set up PTEs, but we only need to
care about write faults.  If the parallel page fault set up a writable
and dirty PTE then the retried fault doesn't need to do anything extra.
If the parallel page fault set up a clean read-only PTE, the retried
fault should just call do_wp_page() and return, as the code snippet
below shows:

	if (vmf->flags & FAULT_FLAG_WRITE) {
		if (!pte_write(entry))
			return do_wp_page(vmf);
	}

With this fix the test results get back to normal.

[yang.shi@linux.alibaba.com: incorporate comment from Will Deacon, update commit log per discussion]
  Link: http://lkml.kernel.org/r/1594848990-55657-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1594148072-91273-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi
Reported-by: Xu Yu
Debugged-by: Xu Yu
Tested-by: Xu Yu
Cc: Johannes Weiner
Cc: Matthew Wilcox (Oracle)
Cc: Kirill A. Shutemov
Cc: Josef Bacik
Cc: Hillf Danton
Cc: Hugh Dickins
Cc: Catalin Marinas
Cc: Will Deacon
Signed-off-by: Andrew Morton
---

 mm/memory.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/mm/memory.c~mm-avoid-access-flag-update-tlb-flush-for-retried-page-fault
+++ a/mm/memory.c
@@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struc
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
 			return do_wp_page(vmf);
-		entry = pte_mkdirty(entry);
 	}
+
+	if (vmf->flags & FAULT_FLAG_TRIED)
+		goto unlock;
+
+	if (vmf->flags & FAULT_FLAG_WRITE)
+		entry = pte_mkdirty(entry);
+
 	entry = pte_mkyoung(entry);
 	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
 				vmf->flags & FAULT_FLAG_WRITE)) {
_
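
For anyone who wants to check the ordering of the branches in the v2
hunk without booting a kernel, here is a rough userspace model of the
decision it makes.  This is only a sketch: the MODEL_* flag values,
struct model_pte, enum model_action and model_pte_fault_tail() are all
made up for illustration and are not kernel definitions.  It models only
which path is taken (do_wp_page(), the early "goto unlock", or falling
through to the access flag update), not the TLB behaviour itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; NOT the kernel's FAULT_FLAG_* values. */
#define MODEL_FAULT_FLAG_WRITE 0x1u
#define MODEL_FAULT_FLAG_TRIED 0x2u

/* Hypothetical stand-in for the PTE bits the hunk cares about. */
struct model_pte {
	bool write;	/* PTE is writable */
	bool dirty;	/* PTE dirty bit */
	bool young;	/* PTE access (young) bit */
};

/* Which path the modelled tail of handle_pte_fault() takes. */
enum model_action {
	MODEL_DO_WP_PAGE,	/* write fault on a read-only PTE */
	MODEL_SKIP_UPDATE,	/* retried fault: "goto unlock", PTE left untouched */
	MODEL_UPDATE_FLAGS,	/* fall through to the access flag update */
};

/* Mirrors the branch order of the v2 hunk, using the model types above. */
static enum model_action model_pte_fault_tail(struct model_pte *pte,
					      unsigned int flags)
{
	assert(pte != NULL);

	/* A write fault on a read-only PTE always goes to do_wp_page(). */
	if (flags & MODEL_FAULT_FLAG_WRITE) {
		if (!pte->write)
			return MODEL_DO_WP_PAGE;
	}

	/* A retried fault leaves the dirty and young bits alone. */
	if (flags & MODEL_FAULT_FLAG_TRIED)
		return MODEL_SKIP_UPDATE;

	/* First-try fault: mark dirty on write, always mark young. */
	if (flags & MODEL_FAULT_FLAG_WRITE)
		pte->dirty = true;
	pte->young = true;

	return MODEL_UPDATE_FLAGS;
}
```

Calling model_pte_fault_tail() with MODEL_FAULT_FLAG_WRITE |
MODEL_FAULT_FLAG_TRIED on a writable PTE returns MODEL_SKIP_UPDATE and
leaves the PTE unchanged, which is the case the commit log calls #3.
The same flags on a read-only PTE still return MODEL_DO_WP_PAGE, because
the write-protect check comes before the FAULT_FLAG_TRIED check.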