All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Michael Ellerman <mpe@ellerman.id.au>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	 Kalesh Singh <kaleshsingh@google.com>,
	Nick Piggin <npiggin@gmail.com>,
	 Joel Fernandes <joel@joelfernandes.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock
Date: Thu, 20 May 2021 16:40:41 -1000	[thread overview]
Message-ID: <CAHk-=wjq8thag3uNv-2MMu75OgX5ybMon7gZDUHYwzeTwcZHoA@mail.gmail.com> (raw)
In-Reply-To: <2eafd7df-65fd-1e2c-90b6-d143557a1fdc@linux.ibm.com>

On Thu, May 20, 2021 at 6:57 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Wondering whether this is correct considering we are holding mmap_sem in
> write mode in mremap.

Right. So *normally* the rule is to EITHER

 - hold the mmap_sem for writing

OR

 - hold the page table lock

and that the TLB flush needs to happen before you release that lock.

But as that commit message of commit eb66ae030829 ("mremap: properly
flush TLB before releasing the page") says, "mremap()" is a bit
special. It's special because mremap() didn't take ownership of the
page - it only moved it somewhere else. So now the page-out logic -
that relies on the page table lock - can free the page immediately
after we've released the page table lock.

So basically, in order to delay the TLB flush after releasing the page
table lock, it's not really sufficient to _just_ hold the mmap_sem for
writing. You also need to guarantee that the lifetime of the page
itself is held until after the TLB flush.

For normal operations like "munmap()", this happens naturally, because
we remove the page from the page table, and add it to the list of
pages to be freed after the TLB flush.

But mremap never did that "remove the page and add it to a list to be
free'd later". Instead, it just moved the page somewhere else. And
thus there is no guarantee that the page that got moved will continue
to exist until a TLB flush is done.

So mremap does need to flush the TLB before releasing the page table
lock, because that's the lifetime boundary for the page that got
moved.

                  Linus


WARNING: multiple messages have this Message-ID (diff)
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>, Linux-MM <linux-mm@kvack.org>,
	Kalesh Singh <kaleshsingh@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock
Date: Thu, 20 May 2021 16:40:41 -1000	[thread overview]
Message-ID: <CAHk-=wjq8thag3uNv-2MMu75OgX5ybMon7gZDUHYwzeTwcZHoA@mail.gmail.com> (raw)
In-Reply-To: <2eafd7df-65fd-1e2c-90b6-d143557a1fdc@linux.ibm.com>

On Thu, May 20, 2021 at 6:57 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Wondering whether this is correct considering we are holding mmap_sem in
> write mode in mremap.

Right. So *normally* the rule is to EITHER

 - hold the mmap_sem for writing

OR

 - hold the page table lock

and that the TLB flush needs to happen before you release that lock.

But as that commit message of commit eb66ae030829 ("mremap: properly
flush TLB before releasing the page") says, "mremap()" is a bit
special. It's special because mremap() didn't take ownership of the
page - it only moved it somewhere else. So now the page-out logic -
that relies on the page table lock - can free the page immediately
after we've released the page table lock.

So basically, in order to delay the TLB flush after releasing the page
table lock, it's not really sufficient to _just_ hold the mmap_sem for
writing. You also need to guarantee that the lifetime of the page
itself is held until after the TLB flush.

For normal operations like "munmap()", this happens naturally, because
we remove the page from the page table, and add it to the list of
pages to be freed after the TLB flush.

But mremap never did that "remove the page and add it to a list to be
free'd later". Instead, it just moved the page somewhere else. And
thus there is no guarantee that the page that got moved will continue
to exist until a TLB flush is done.

So mremap does need to flush the TLB before releasing the page table
lock, because that's the lifetime boundary for the page that got
moved.

                  Linus

  reply	other threads:[~2021-05-21  2:41 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22  5:43 [PATCH v5 0/9] Speedup mremap on ppc64 Aneesh Kumar K.V
2021-04-22  5:43 ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 1/9] selftest/mremap_test: Update the test to handle pagesize other than 4K Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 2/9] selftest/mremap_test: Avoid crash with static build Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-05-18 20:04   ` Nathan Chancellor
2021-05-18 20:04     ` Nathan Chancellor
2021-05-19  4:46     ` Aneesh Kumar K.V
2021-05-19  4:46       ` Aneesh Kumar K.V
2021-05-19 18:02       ` Nathan Chancellor
2021-05-19 18:02         ` Nathan Chancellor
2021-05-20  2:18       ` Peter Xu
2021-05-20  2:18         ` Peter Xu
2021-05-20  8:26         ` Aneesh Kumar K.V
2021-05-20  8:26           ` Aneesh Kumar K.V
2021-05-20 12:46           ` Peter Xu
2021-05-20 12:46             ` Peter Xu
2021-05-20 13:23             ` Aneesh Kumar K.V
2021-05-20 13:23               ` Aneesh Kumar K.V
2021-05-20 13:37               ` Aneesh Kumar K.V
2021-05-20 13:37                 ` Aneesh Kumar K.V
2021-05-20 14:57                 ` Peter Xu
2021-05-20 14:57                   ` Peter Xu
2021-05-20 19:06                   ` Zi Yan
2021-05-20 19:06                     ` Zi Yan
2021-05-20 20:01                     ` Peter Xu
2021-05-20 20:01                       ` Peter Xu
2021-05-20 20:25                       ` Kalesh Singh
2021-05-20 20:25                         ` Kalesh Singh
2021-04-22  5:43 ` [PATCH v5 4/9] powerpc/mm/book3s64: Fix possible build error Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-05-15 16:35   ` Guenter Roeck
2021-05-15 16:35     ` Guenter Roeck
2021-05-15 20:41     ` Andrew Morton
2021-05-15 20:41       ` Andrew Morton
2021-05-15 23:05       ` Guenter Roeck
2021-05-15 23:05         ` Guenter Roeck
2021-05-17  8:40     ` Aneesh Kumar K.V
2021-05-17  8:40       ` Aneesh Kumar K.V
2021-05-17 13:38       ` Guenter Roeck
2021-05-17 13:38         ` Guenter Roeck
2021-05-17 13:55         ` Aneesh Kumar K.V
2021-05-17 13:55           ` Aneesh Kumar K.V
2021-05-17 14:18           ` Guenter Roeck
2021-05-17 14:18             ` Guenter Roeck
2021-05-19  0:26             ` Michael Ellerman
2021-05-19  0:26               ` Michael Ellerman
2021-05-19  0:45               ` Segher Boessenkool
2021-05-19  0:45                 ` Segher Boessenkool
2021-05-19 12:03                 ` Segher Boessenkool
2021-05-19 13:37                   ` Guenter Roeck
2021-05-19 14:20                     ` Segher Boessenkool
2021-05-19 14:20                       ` Segher Boessenkool
2021-05-19 15:28                       ` Guenter Roeck
2021-05-19 15:28                         ` Guenter Roeck
2021-05-20  7:37                   ` Michael Ellerman
2021-05-20 12:17                     ` Segher Boessenkool
2021-05-19  1:08               ` Guenter Roeck
2021-05-19  1:08                 ` Guenter Roeck
2021-05-20 11:38                 ` Michael Ellerman
2021-05-20 11:38                   ` Michael Ellerman
2021-05-20 11:56                   ` Guenter Roeck
2021-05-20 11:56                     ` Guenter Roeck
2021-04-22  5:43 ` [PATCH v5 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-05-20 15:26   ` Aneesh Kumar K.V
2021-05-20 15:26     ` Aneesh Kumar K.V
2021-05-20 16:57     ` Aneesh Kumar K.V
2021-05-20 16:57       ` Aneesh Kumar K.V
2021-05-21  2:40       ` Linus Torvalds [this message]
2021-05-21  2:40         ` Linus Torvalds
2021-05-21  3:03         ` Aneesh Kumar K.V
2021-05-21  3:03           ` Aneesh Kumar K.V
2021-05-21  3:28           ` Aneesh Kumar K.V
2021-05-21  3:28             ` Aneesh Kumar K.V
2021-05-21  6:13           ` Linus Torvalds
2021-05-21  6:13             ` Linus Torvalds
2021-05-21 12:50             ` Aneesh Kumar K.V
2021-05-21 12:50               ` Aneesh Kumar K.V
2021-05-21 13:03               ` Aneesh Kumar K.V
2021-05-21 13:03                 ` Aneesh Kumar K.V
2021-05-21 16:03                 ` Linus Torvalds
2021-05-21 16:03                   ` Linus Torvalds
2021-05-21 16:29                   ` Aneesh Kumar K.V
2021-05-21 16:29                     ` Aneesh Kumar K.V
2021-05-24 14:24                   ` Aneesh Kumar K.V
2021-05-24 14:24                     ` Aneesh Kumar K.V
2021-05-21 15:24               ` Liam Howlett
2021-05-21 15:24                 ` Liam Howlett
2021-05-21 16:02                 ` Aneesh Kumar K.V
2021-05-21 16:02                   ` Aneesh Kumar K.V
2021-05-21 16:05                 ` Linus Torvalds
2021-05-21 16:05                   ` Linus Torvalds
2021-04-22  5:43 ` [PATCH v5 8/9] mm/mremap: Allow arch runtime override Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-04-22  5:43 ` [PATCH v5 9/9] powerpc/mm: Enable move pmd/pud Aneesh Kumar K.V
2021-04-22  5:43   ` Aneesh Kumar K.V
2021-05-11 22:19   ` Andrew Morton
2021-05-11 22:19     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjq8thag3uNv-2MMu75OgX5ybMon7gZDUHYwzeTwcZHoA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=joel@joelfernandes.org \
    --cc=kaleshsingh@google.com \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.