Date: Tue, 8 Jun 2021 15:05:47 +0300
From: "Kirill A. Shutemov"
To: "Aneesh Kumar K.V"
Cc: Hugh Dickins, linux-mm@kvack.org, akpm@linux-foundation.org,
    mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, kaleshsingh@google.com,
    npiggin@gmail.com, joel@joelfernandes.org, Christophe Leroy, Linus Torvalds
Subject: Re: [PATCH v7 01/11] mm/mremap: Fix race between MOVE_PMD mremap and pageout
Message-ID: <20210608120547.krz7ymie3qq2sd2r@box.shutemov.name>
References: <20210607055131.156184-1-aneesh.kumar@linux.ibm.com>
    <20210607055131.156184-2-aneesh.kumar@linux.ibm.com>
    <87o8cgokso.fsf@linux.ibm.com>
    <20210608094222.xcpvlc3kaq5j5sh3@box.shutemov.name>

On Tue, Jun 08, 2021 at 04:47:19PM +0530, Aneesh Kumar K.V wrote:
> On 6/8/21 3:12 PM, Kirill A. Shutemov wrote:
> > On Tue, Jun 08, 2021 at 01:22:23PM +0530, Aneesh Kumar K.V wrote:
> > >
> > > Hi Hugh,
> > >
> > > Hugh Dickins writes:
> > >
> > > > On Mon, 7 Jun 2021, Aneesh Kumar K.V wrote:
> > > >
> > > > > CPU 1                       CPU 2                           CPU 3
> > > > >
> > > > > mremap(old_addr, new_addr)  page_shrinker/try_to_unmap_one
> > > > >
> > > > > mmap_write_lock_killable()
> > > > >
> > > > >                             addr = old_addr
> > > > >                             lock(pte_ptl)
> > > > > lock(pmd_ptl)
> > > > > pmd = *old_pmd
> > > > > pmd_clear(old_pmd)
> > > > > flush_tlb_range(old_addr)
> > > > >
> > > > > *new_pmd = pmd
> > > > >                                                             *new_addr = 10; and fills
> > > > >                                                             TLB with new addr
> > > > >                                                             and old pfn
> > > > >
> > > > > unlock(pmd_ptl)
> > > > >                             ptep_clear_flush()
> > > > >                             old pfn is free.
> > > > >                                                             Stale TLB entry
> > > > >
> > > > > Fix this race by holding pmd lock in pageout. This still doesn't handle the race
> > > > > between MOVE_PUD and pageout.
> > > > >
> > > > > Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
> > > > > Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
> > > > > Signed-off-by: Aneesh Kumar K.V
> > > >
> > > > This seems very wrong to me, to require another level of locking in the
> > > > rmap lookup, just to fix some new pagetable games in mremap.
> > > >
> > > > But Linus asked "Am I missing something?": neither of you have mentioned
> > > > mremap's take_rmap_locks(), so I hope that already meets your need. And
> > > > if it needs to be called more often than before (see "need_rmap_locks"),
> > > > that's probably okay.
> > > >
> > > > Hugh
> > > >
> > >
> > > Thanks for reviewing the change. I missed the rmap lock in the code
> > > path. How about the below change?
> > >
> > > mm/mremap: hold the rmap lock in write mode when moving page table entries.
> > >
> > > To avoid a race between rmap walk and mremap, mremap does take_rmap_locks().
> > > The lock was taken to ensure that rmap walk doesn't miss a page table entry due to
> > > PTE moves via move_page_tables(). The kernel does further optimization of
> > > this lock such that if we are going to find the newly added vma after the
> > > old vma, the rmap lock is not taken. This is because rmap walk would find the
> > > vmas in the same order and if we don't find the page table attached to
> > > older vma we would find it with the new vma which we would iterate later.
> > > The actual lifetime of the page is still controlled by the PTE lock.
> > >
> > > This patch updates the locking requirement to handle another race condition
> > > explained below with optimized mremap:
> > >
> > > Optimized PMD move:
> > >
> > > CPU 1                       CPU 2                           CPU 3
> > > mremap(old_addr, new_addr)  page_shrinker/try_to_unmap_one
> > > mmap_write_lock_killable()
> > >                             addr = old_addr
> > >                             lock(pte_ptl)
> > > lock(pmd_ptl)
> > > pmd = *old_pmd
> > > pmd_clear(old_pmd)
> > > flush_tlb_range(old_addr)
> > > *new_pmd = pmd
> > >                                                             *new_addr = 10; and fills
> > >                                                             TLB with new addr
> > >                                                             and old pfn
> > > unlock(pmd_ptl)
> > >                             ptep_clear_flush()
> > >                             old pfn is free.
> > >                                                             Stale TLB entry
> > >
> > > Optimized PUD move:
> > >
> > > CPU 1                       CPU 2                           CPU 3
> > > mremap(old_addr, new_addr)  page_shrinker/try_to_unmap_one
> > > mmap_write_lock_killable()
> > >                             addr = old_addr
> > >                             lock(pte_ptl)
> > > lock(pud_ptl)
> > > pud = *old_pud
> > > pud_clear(old_pud)
> > > flush_tlb_range(old_addr)
> > > *new_pud = pud
> > >                                                             *new_addr = 10; and fills
> > >                                                             TLB with new addr
> > >                                                             and old pfn
> > > unlock(pud_ptl)
> > >                             ptep_clear_flush()
> > >                             old pfn is free.
> > >                                                             Stale TLB entry
> > >
> > > Both of the above race conditions can be fixed if we force the mremap path to take
> > > the rmap lock.
> > >
> > > Signed-off-by: Aneesh Kumar K.V
> >
> > Looks like it should be enough to address the race.
> >
> > It would be nice to understand what the performance overhead of the
> > additional locking is. Is it still faster to move a single PMD page table under
> > these locks compared to moving PTE page table entries without the locks?
> >
>
> The improvements provided by optimized mremap, as captured in patch 11, are
> large.
>
> mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:
> 1GB mremap - Source PTE-aligned, Destination PTE-aligned
>   mremap time:      2292772ns
> 1GB mremap - Source PMD-aligned, Destination PMD-aligned
>   mremap time:      1158928ns
> 1GB mremap - Source PUD-aligned, Destination PUD-aligned
>   mremap time:        63886ns
>
> With additional locking, I haven't observed much change in those numbers.
> But that could also be because there is no contention on these locks when
> this test is run?

Okay, it's good enough: contention should not be common and it's okay to
pay a price for correctness.

Acked-by: Kirill A. Shutemov

-- 
Kirill A. Shutemov
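
The mremap timings quoted above come from patch 11 of the series, which is not reproduced in this message. As a rough stand-in only, the userspace C sketch below times a single mremap() move of an anonymous region. The function name time_mremap_ns() is made up for illustration; the sketch does not force PMD or PUD alignment of source or destination, and whether the kernel takes the optimized move path depends on its configuration, so results should not be compared directly with the numbers in the thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patch series): move an anonymous
 * mapping of `len` bytes to a pre-reserved destination with mremap()
 * and return the elapsed time in nanoseconds, or -1 on any failure.
 */
long long time_mremap_ns(size_t len)
{
    struct timespec t0, t1;
    unsigned char *src, *dst, *moved;
    long long ns;

    assert(len > 0);

    src = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (src == MAP_FAILED)
        return -1;

    /* Reserve a destination range; mremap() replaces it below. */
    dst = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED) {
        munmap(src, len);
        return -1;
    }

    /* Fault in every page so there are page table entries to move. */
    memset(src, 0x5a, len);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    moved = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (moved == MAP_FAILED) {
        munmap(src, len);
        munmap(dst, len);
        return -1;
    }

    ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL +
         (t1.tv_nsec - t0.tv_nsec);

    /* The data must have followed the mapping to its new address. */
    if (moved[0] != 0x5a || moved[len - 1] != 0x5a)
        ns = -1;

    munmap(moved, len);
    return ns;
}
```

Calling time_mremap_ns((size_t)1 << 30) approximates the 1GB case discussed in the thread; a single-threaded run like this exercises the locks without contention, which is the limitation Aneesh raises about his own measurement.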