Date: Tue, 8 Jun 2021 12:42:22 +0300
From: "Kirill A.
Shutemov"
To: "Aneesh Kumar K.V"
Cc: Hugh Dickins, linux-mm@kvack.org, akpm@linux-foundation.org,
 mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, kaleshsingh@google.com,
 npiggin@gmail.com, joel@joelfernandes.org, Christophe Leroy,
 Linus Torvalds
Subject: Re: [PATCH v7 01/11] mm/mremap: Fix race between MOVE_PMD mremap and pageout
Message-ID: <20210608094222.xcpvlc3kaq5j5sh3@box.shutemov.name>
References: <20210607055131.156184-1-aneesh.kumar@linux.ibm.com>
 <20210607055131.156184-2-aneesh.kumar@linux.ibm.com>
 <87o8cgokso.fsf@linux.ibm.com>
In-Reply-To: <87o8cgokso.fsf@linux.ibm.com>

On Tue, Jun 08, 2021 at 01:22:23PM +0530, Aneesh Kumar K.V wrote:
> Hi Hugh,
>
> Hugh Dickins writes:
>
> > On Mon, 7 Jun 2021, Aneesh Kumar K.V wrote:
> >
> >> CPU 1                         CPU 2                           CPU 3
> >>
> >> mremap(old_addr, new_addr)    page_shrinker/try_to_unmap_one
> >>
> >> mmap_write_lock_killable()
> >>
> >>                               addr = old_addr
> >>                               lock(pte_ptl)
> >> lock(pmd_ptl)
> >> pmd = *old_pmd
> >> pmd_clear(old_pmd)
> >> flush_tlb_range(old_addr)
> >>
> >> *new_pmd = pmd
> >>                                                               *new_addr = 10; and fills
> >>                                                               TLB with new addr
> >>                                                               and old pfn
> >>
> >> unlock(pmd_ptl)
> >>                               ptep_clear_flush()
> >>                               old pfn is free.
> >>                                                               Stale TLB entry
> >>
> >> Fix this race by holding pmd lock in pageout.
> >> This still doesn't handle the race between MOVE_PUD and pageout.
> >>
> >> Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
> >> Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
> >> Signed-off-by: Aneesh Kumar K.V
> >
> > This seems very wrong to me, to require another level of locking in the
> > rmap lookup, just to fix some new pagetable games in mremap.
> >
> > But Linus asked "Am I missing something?": neither of you have mentioned
> > mremap's take_rmap_locks(), so I hope that already meets your need. And
> > if it needs to be called more often than before (see "need_rmap_locks"),
> > that's probably okay.
> >
> > Hugh
>
> Thanks for reviewing the change. I missed the rmap lock in the code
> path. How about the change below?
>
> mm/mremap: hold the rmap lock in write mode when moving page table entries.
>
> To avoid a race between the rmap walk and mremap, mremap does
> take_rmap_locks(). The lock is taken to ensure that the rmap walk doesn't
> miss a page table entry due to PTE moves via move_pagetables(). The kernel
> further optimizes away this lock: if the newly added vma will be found
> after the old vma during an rmap walk, the rmap lock is not taken. This is
> because the rmap walk finds the vmas in the same order, and if we don't
> find the page table entry attached to the older vma we will find it
> attached to the new vma, which we iterate later. The actual lifetime of
> the page is still controlled by the PTE lock.
> This patch updates the locking requirement to handle another race
> condition, explained below, with optimized mremap::
>
>  Optimized PMD move:
>
>     CPU 1                         CPU 2                           CPU 3
>
>     mremap(old_addr, new_addr)    page_shrinker/try_to_unmap_one
>
>     mmap_write_lock_killable()
>
>                                   addr = old_addr
>                                   lock(pte_ptl)
>     lock(pmd_ptl)
>     pmd = *old_pmd
>     pmd_clear(old_pmd)
>     flush_tlb_range(old_addr)
>
>     *new_pmd = pmd
>                                                                   *new_addr = 10; and fills
>                                                                   TLB with new addr
>                                                                   and old pfn
>
>     unlock(pmd_ptl)
>                                   ptep_clear_flush()
>                                   old pfn is free.
>                                                                   Stale TLB entry
>
>  Optimized PUD move:
>
>     CPU 1                         CPU 2                           CPU 3
>
>     mremap(old_addr, new_addr)    page_shrinker/try_to_unmap_one
>
>     mmap_write_lock_killable()
>
>                                   addr = old_addr
>                                   lock(pte_ptl)
>     lock(pud_ptl)
>     pud = *old_pud
>     pud_clear(old_pud)
>     flush_tlb_range(old_addr)
>
>     *new_pud = pud
>                                                                   *new_addr = 10; and fills
>                                                                   TLB with new addr
>                                                                   and old pfn
>
>     unlock(pud_ptl)
>                                   ptep_clear_flush()
>                                   old pfn is free.
>                                                                   Stale TLB entry
>
> Both of the above race conditions can be fixed if we force the mremap path
> to take the rmap lock.
>
> Signed-off-by: Aneesh Kumar K.V

Looks like it should be enough to address the race.

It would be nice to understand the performance overhead of the additional
locking: is it still faster to move a single PMD page table under these
locks than to move the PTE page table entries without the locks?

-- 
 Kirill A. Shutemov
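
P.S. A minimal sketch of the shape of the fix being discussed, assuming
helper names along the lines of mm/mremap.c (take_rmap_locks(),
drop_rmap_locks(), move_normal_pmd(), move_normal_pud()); this is
illustrative only, not the actual patch:

	/*
	 * Illustrative sketch, not the actual patch. The idea: when
	 * moving a whole PMD/PUD-level page table, always take the rmap
	 * locks, so a concurrent rmap walk (e.g. try_to_unmap_one())
	 * serializes against the move and cannot race with the TLB
	 * flush as in the diagrams above.
	 */
	static bool move_pgt_entry(enum pgt_entry entry,
				   struct vm_area_struct *vma,
				   unsigned long old_addr,
				   unsigned long new_addr,
				   void *old_entry, void *new_entry,
				   bool need_rmap_locks)
	{
		bool moved = false;

		/* Callers now pass need_rmap_locks == true for PMD/PUD moves. */
		if (need_rmap_locks)
			take_rmap_locks(vma);

		switch (entry) {
		case NORMAL_PMD:
			moved = move_normal_pmd(vma, old_addr, new_addr,
						old_entry, new_entry);
			break;
		case NORMAL_PUD:
			moved = move_normal_pud(vma, old_addr, new_addr,
						old_entry, new_entry);
			break;
		default:
			WARN_ON_ONCE(1);
			break;
		}

		if (need_rmap_locks)
			drop_rmap_locks(vma);

		return moved;
	}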