All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Kalesh Singh <kaleshsingh@google.com>,
	Nick Piggin <npiggin@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock
Date: Fri, 21 May 2021 18:20:45 +0530	[thread overview]
Message-ID: <5ea8fa4f-a5a2-7dc4-7958-23df6a2c1f3a@linux.ibm.com> (raw)
In-Reply-To: <CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com>

On 5/21/21 11:43 AM, Linus Torvalds wrote:
> On Thu, May 20, 2021 at 5:03 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> On 5/21/21 8:10 AM, Linus Torvalds wrote:
>>>
>>> So mremap does need to flush the TLB before releasing the page table
>>> lock, because that's the lifetime boundary for the page that got
>>> moved.
>>
>> How will we avoid that happening with
>> c49dd340180260c6239e453263a9a244da9a7c85 /
>> 2c91bd4a4e2e530582d6fd643ea7b86b27907151 . The commit improves mremap
>> performance by moving level3/level2 page table entries. When doing so we
>> are not holding level 4 ptl lock (pte_lock()). But rather we are holding
>> pmd_lock or pud_lock(). So if we move pages around without holding the
>> pte lock, won't the above issue happen even if we do a tlb flush with
>> holding pmd lock/pud lock?
> 
> Hmm. Interesting.
> 
> Your patch (to flush the TLB after clearing the old location, and
> before inserting it into the new one) looks like an "obvious" fix.
> 
> But I'm putting that "obvious" in quotes, because I'm now wondering if
> it actually fixes anything.
> 
> Lookie here:
> 
>   - CPU1 does a mremap of a pmd or pud.
> 
>      It clears the old pmd/pud, flushes the old TLB range, and then
> inserts the pmd/pud at the new location.
> 
>   - CPU2 does a page shrinker, which calls try_to_unmap, which calls
> try_to_unmap_one.
> 
> These are entirely asynchronous, because they have no shared lock. The
> mremap uses the pmd lock, the try_to_unmap_one() does the rmap walk,
> which does the pte lock.
> 
> Now, imagine that the following ordering happens with the two
> operations above, and a CPU3 that does accesses:
> 
>   - CPU2 follows (and sees) the old page tables in the old location and
> the took the pte lock
> 
>   - the mremap on CPU1 starts - cleared the old pmd, flushed the tlb,
> *and* inserts in the new place.
> 
>   - a user thread on CPU3 accesses the new location and fills the TLB
> of the *new* address
> 
>   - only now does CPU2 get to the "pte_get_and_clear()" to remove one page
> 
>   - CPU2 does a TLB flush and frees the page
> 
> End result:
> 
>   - both CPU1 _and_ CPU2 have flushed the TLB.
> 
>   - but both flushed the *OLD* address
> 
>   - the page is freed
> 
>   - CPU3 still has the stale TLB entry pointing to the page that is now
> free and might be reused for something else
> 
> Am I missing something?
> 

That is a problem. With that it looks like CONFIG_HAVE_MOVE_PMD/PUD is 
broken? I don't see an easy way to fix this?

-aneesh



WARNING: multiple messages have this Message-ID (diff)
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>, Linux-MM <linux-mm@kvack.org>,
	Kalesh Singh <kaleshsingh@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock
Date: Fri, 21 May 2021 18:20:45 +0530	[thread overview]
Message-ID: <5ea8fa4f-a5a2-7dc4-7958-23df6a2c1f3a@linux.ibm.com> (raw)
In-Reply-To: <CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com>

On 5/21/21 11:43 AM, Linus Torvalds wrote:
> On Thu, May 20, 2021 at 5:03 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> On 5/21/21 8:10 AM, Linus Torvalds wrote:
>>>
>>> So mremap does need to flush the TLB before releasing the page table
>>> lock, because that's the lifetime boundary for the page that got
>>> moved.
>>
>> How will we avoid that happening with
>> c49dd340180260c6239e453263a9a244da9a7c85 /
>> 2c91bd4a4e2e530582d6fd643ea7b86b27907151 . The commit improves mremap
>> performance by moving level3/level2 page table entries. When doing so we
>> are not holding level 4 ptl lock (pte_lock()). But rather we are holding
>> pmd_lock or pud_lock(). So if we move pages around without holding the
>> pte lock, won't the above issue happen even if we do a tlb flush with
>> holding pmd lock/pud lock?
> 
> Hmm. Interesting.
> 
> Your patch (to flush the TLB after clearing the old location, and
> before inserting it into the new one) looks like an "obvious" fix.
> 
> But I'm putting that "obvious" in quotes, because I'm now wondering if
> it actually fixes anything.
> 
> Lookie here:
> 
>   - CPU1 does a mremap of a pmd or pud.
> 
>      It clears the old pmd/pud, flushes the old TLB range, and then
> inserts the pmd/pud at the new location.
> 
>   - CPU2 does a page shrinker, which calls try_to_unmap, which calls
> try_to_unmap_one.
> 
> These are entirely asynchronous, because they have no shared lock. The
> mremap uses the pmd lock, the try_to_unmap_one() does the rmap walk,
> which does the pte lock.
> 
> Now, imagine that the following ordering happens with the two
> operations above, and a CPU3 that does accesses:
> 
>   - CPU2 follows (and sees) the old page tables in the old location and
> the took the pte lock
> 
>   - the mremap on CPU1 starts - cleared the old pmd, flushed the tlb,
> *and* inserts in the new place.
> 
>   - a user thread on CPU3 accesses the new location and fills the TLB
> of the *new* address
> 
>   - only now does CPU2 get to the "pte_get_and_clear()" to remove one page
> 
>   - CPU2 does a TLB flush and frees the page
> 
> End result:
> 
>   - both CPU1 _and_ CPU2 have flushed the TLB.
> 
>   - but both flushed the *OLD* address
> 
>   - the page is freed
> 
>   - CPU3 still has the stale TLB entry pointing to the page that is now
> free and might be reused for something else
> 
> Am I missing something?
> 

That is a problem. With that it looks like CONFIG_HAVE_MOVE_PMD/PUD is 
broken? I don't see an easy way to fix this?

-aneesh


  reply	other threads:[~2021-05-21 12:51 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22  5:43 [PATCH v5 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
2021-04-22  5:43 ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 1/9] selftest/mremap_test: Update the test to handle pagesize other than 4K Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 2/9] selftest/mremap_test: Avoid crash with static build Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-05-18 20:04   ` Nathan Chancellor
2021-05-18 20:04     ` Nathan Chancellor
2021-05-19  4:46     ` Aneesh Kumar K.V
2021-05-19  4:46       ` Aneesh Kumar K.V
2021-05-19 18:02       ` Nathan Chancellor
2021-05-19 18:02         ` Nathan Chancellor
2021-05-20  2:18       ` Peter Xu
2021-05-20  2:18         ` Peter Xu
2021-05-20  8:26         ` Aneesh Kumar K.V
2021-05-20  8:26           ` Aneesh Kumar K.V
2021-05-20 12:46           ` Peter Xu
2021-05-20 12:46             ` Peter Xu
2021-05-20 13:23             ` Aneesh Kumar K.V
2021-05-20 13:23               ` Aneesh Kumar K.V
2021-05-20 13:37               ` Aneesh Kumar K.V
2021-05-20 13:37                 ` Aneesh Kumar K.V
2021-05-20 14:57                 ` Peter Xu
2021-05-20 14:57                   ` Peter Xu
2021-05-20 19:06                   ` Zi Yan
2021-05-20 19:06                     ` Zi Yan
2021-05-20 20:01                     ` Peter Xu
2021-05-20 20:01                       ` Peter Xu
2021-05-20 20:25                       ` Kalesh Singh
2021-05-20 20:25                         ` Kalesh Singh
2021-04-22  5:43 ` [PATCH v5 4/9] powerpc/mm/book3s64: Fix possible build error Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-05-15 16:35   ` Guenter Roeck
2021-05-15 16:35     ` Guenter Roeck
2021-05-15 20:41     ` Andrew Morton
2021-05-15 20:41       ` Andrew Morton
2021-05-15 23:05       ` Guenter Roeck
2021-05-15 23:05         ` Guenter Roeck
2021-05-17  8:40     ` Aneesh Kumar K.V
2021-05-17  8:40       ` Aneesh Kumar K.V
2021-05-17 13:38       ` Guenter Roeck
2021-05-17 13:38         ` Guenter Roeck
2021-05-17 13:55         ` Aneesh Kumar K.V
2021-05-17 13:55           ` Aneesh Kumar K.V
2021-05-17 14:18           ` Guenter Roeck
2021-05-17 14:18             ` Guenter Roeck
2021-05-19  0:26             ` Michael Ellerman
2021-05-19  0:26               ` Michael Ellerman
2021-05-19  0:45               ` Segher Boessenkool
2021-05-19  0:45                 ` Segher Boessenkool
2021-05-19 12:03                 ` Segher Boessenkool
2021-05-19 13:37                   ` Guenter Roeck
2021-05-19 14:20                     ` Segher Boessenkool
2021-05-19 14:20                       ` Segher Boessenkool
2021-05-19 15:28                       ` Guenter Roeck
2021-05-19 15:28                         ` Guenter Roeck
2021-05-20  7:37                   ` Michael Ellerman
2021-05-20 12:17                     ` Segher Boessenkool
2021-05-19  1:08               ` Guenter Roeck
2021-05-19  1:08                 ` Guenter Roeck
2021-05-20 11:38                 ` Michael Ellerman
2021-05-20 11:38                   ` Michael Ellerman
2021-05-20 11:56                   ` Guenter Roeck
2021-05-20 11:56                     ` Guenter Roeck
2021-04-22  5:43 ` [PATCH v5 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-05-20 15:26   ` Aneesh Kumar K.V
2021-05-20 15:26     ` Aneesh Kumar K.V
2021-05-20 16:57     ` Aneesh Kumar K.V
2021-05-20 16:57       ` Aneesh Kumar K.V
2021-05-21  2:40       ` Linus Torvalds
2021-05-21  2:40         ` Linus Torvalds
2021-05-21  3:03         ` Aneesh Kumar K.V
2021-05-21  3:03           ` Aneesh Kumar K.V
2021-05-21  3:28           ` Aneesh Kumar K.V
2021-05-21  3:28             ` Aneesh Kumar K.V
2021-05-21  6:13           ` Linus Torvalds
2021-05-21  6:13             ` Linus Torvalds
2021-05-21 12:50             ` Aneesh Kumar K.V [this message]
2021-05-21 12:50               ` Aneesh Kumar K.V
2021-05-21 13:03               ` Aneesh Kumar K.V
2021-05-21 13:03                 ` Aneesh Kumar K.V
2021-05-21 16:03                 ` Linus Torvalds
2021-05-21 16:03                   ` Linus Torvalds
2021-05-21 16:29                   ` Aneesh Kumar K.V
2021-05-21 16:29                     ` Aneesh Kumar K.V
2021-05-24 14:24                   ` Aneesh Kumar K.V
2021-05-24 14:24                     ` Aneesh Kumar K.V
2021-05-21 15:24               ` Liam Howlett
2021-05-21 15:24                 ` Liam Howlett
2021-05-21 16:02                 ` Aneesh Kumar K.V
2021-05-21 16:02                   ` Aneesh Kumar K.V
2021-05-21 16:05                 ` Linus Torvalds
2021-05-21 16:05                   ` Linus Torvalds
2021-04-22  5:43 ` [PATCH v5 8/9] mm/mremap: Allow arch runtime override Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 9/9] powerpc/mm: Enable move pmd/pud Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-05-11 22:19   ` Andrew Morton
2021-05-11 22:19     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ea8fa4f-a5a2-7dc4-7958-23df6a2c1f3a@linux.ibm.com \
    --to=aneesh.kumar@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=joel@joelfernandes.org \
    --cc=kaleshsingh@google.com \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.