linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Yu Zhao <yuzhao@google.com>,  Andy Lutomirski <luto@kernel.org>,
	Peter Xu <peterx@redhat.com>, Pavel Emelyanov <xemul@openvz.org>,
	 Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	 Minchan Kim <minchan@kernel.org>, Will Deacon <will@kernel.org>,
	 Peter Zijlstra <peterz@infradead.org>,
	Hugh Dickins <hughd@google.com>,
	 "Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	 Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Kees Cook <keescook@chromium.org>,
	 John Hubbard <jhubbard@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,  Jan Kara <jack@suse.cz>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Nadav Amit <nadav.amit@gmail.com>,  Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
Date: Sat, 9 Jan 2021 19:51:52 -0800	[thread overview]
Message-ID: <CAHk-=wjZTMsv0_GOyQpLRk_5U1r5W8e21f8sV0jykK=z47hjGQ@mail.gmail.com> (raw)
In-Reply-To: <X/prosulFrEoNnoF@redhat.com>

On Sat, Jan 9, 2021 at 6:51 PM Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> I just don't see the simplification coming from
> 09854ba94c6aad7886996bfbee2530b3d8a7f4f4. Instead of checking
> page_mapcount above as an optimization, to me it looks much simpler to
> check it in a single place, in do_wp_page, that in addition of
> optimizing away the superfluous copy, would optimize away the above
> complexity as well.

Here's the difference:

 (a) in COW, "page_mapcount()" is pure and utter garbage, and has zero meaning.

Why? Because MAPCOUNT DOES NOT MATTER FOR COW.

COW is about "I'm about to write to this page, and that means I need
an _exclusive_ page so that I don't write to a page that somebody else
is using".

Can you admit that fundamental fact?

Notice how "page_mapcount()" has absolutely NOTHING to do with
"exclusive page". There are lots of other ways the page can be used
aside from mapcount. The page cache may have a reference to the page.
Somebody that did a GUP may have a reference to the page.

So what actually matters at COW time? The only thing that matters is
"am I the exclusive owner". And guess what? We have a count of page
references. It's "page_count()". That's *EXACTLY* the thing that says
"are there maybe other references to this page".

In other words, COW needs to use page_count(). It really is that easy.
End of story.

So, given that, why do I then do

> +     if (page_mapcount(page) != 1)
> +             return false;

in my patch, when I just told you that "page_mapcount()" is irrelevant for COW?

Guess what? The above isn't about COW. The above isn't about whether
we need to do a copy in order to be able to write to the page without
anybody else being affected by it.

No, at fork time, and at this clear_refs time, the question is
entirely different. The question is not "Do I have exclusive access to
the page", but it is "Did I _already_ made sure that I have exclusive
access to the page because I pinned it"?

See how different the question is?

Because *if* you have done a pinned COW for soem direct-IO read, and
*if* that page is dirty, then you know it's mapped only in your
address space. You're basically doing the _reverse_ of the COW test,
and asking yourself "is this my own private pinned page"?

And then it's actually perfectly sane to do a check that says
"obviously, if somebody else has this page mapped, then that's not the
case".

See?

For COW, "page_mapcount()" is pure and utter garbage, and entirely
meaningless. How many places it's mapped in doesn't matter. You may
have to COW even if it's only mapped in your address space (page
cache, GUP, whatever).

But for "did I already make this exclusive", then it's actually
meaningful to say "is it mapped somewhere else". We know it has other
users - that "page_may_be_pinned()" in fact *guarantees* that it has
other users. But we're double-checking that the other users aren't
other mappings.

That said, I did just realize that that "page_mapcount()" check is
actually pointless.

Because we do have a simpler one. Instead of checking whether all
those references that made us go "page_might_be_pinned()" aren't other
mappings, the simple check for "pte_writable()" would already have
told us that we had already done the COW.

So you are actually right that the page_mapcount() test in my patch is
not the best way to check for this. By the time we see
"page_may_be_pinned()", we might as well just say "Oh, it's a private
mapping and the pte is already writable, so we know we were the
exclusive mapper, because COW and fork() already guarantee that".

> And I won't comment if it's actually safe to skip random pages or
> not. All I know is for mprotect and uffd-wp, definitely the above
> approach wouldn't work.

Why do you say that?

You say ":definitely know", but I think you're full of it.

The fact is, if you have a pinned page, why wouldn't we say "you can't
turn it read-only"?

It's pinned in the VM address space - and it's pinned writable. Simple
and clear semantics. You can *remove* it, but you can't change the
pinning.

              Linus


  reply	other threads:[~2021-01-10  4:00 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-10  0:44 [PATCH 0/1] mm: restore full accuracy in COW page reuse Andrea Arcangeli
2021-01-10  0:44 ` [PATCH 1/1] " Andrea Arcangeli
2021-01-10  2:54   ` Andrea Arcangeli
2021-01-11 14:11     ` Kirill A. Shutemov
2021-01-10  0:55 ` [PATCH 0/1] " Linus Torvalds
2021-01-10  1:19   ` Linus Torvalds
2021-01-10  1:37     ` Linus Torvalds
2021-01-10  3:24       ` Andrea Arcangeli
2021-01-10  2:51     ` Andrea Arcangeli
2021-01-10  3:51       ` Linus Torvalds [this message]
2021-01-10 19:30         ` Linus Torvalds
2021-01-11  1:18           ` Jason Gunthorpe
2021-01-11  7:26           ` John Hubbard
2021-01-11 12:42             ` Matthew Wilcox
2021-01-11 16:05             ` Jason Gunthorpe
2021-01-11 16:15               ` Michal Hocko
2021-01-11 19:19             ` Linus Torvalds
2021-01-11 22:18               ` Linus Torvalds
2021-01-12 17:07                 ` Andy Lutomirski
2021-01-12 23:51                 ` Jerome Glisse
2021-01-13  2:16                 ` Matthew Wilcox
2021-01-13  2:43                   ` Linus Torvalds
2021-01-13  3:31                   ` Linus Torvalds
2021-01-13  8:52                     ` David Hildenbrand
2021-01-13  8:57                       ` David Hildenbrand
2021-01-13 12:32                     ` Kirill A. Shutemov
2021-01-13 12:55                       ` Matthew Wilcox
2021-01-13 19:54                         ` Linus Torvalds
2021-01-13 23:54           ` Peter Xu
2021-01-11 15:52       ` Jason Gunthorpe
2021-01-15  8:59 ` David Hildenbrand
2021-01-15 18:37   ` Jason Gunthorpe
2021-01-15 19:46     ` David Hildenbrand
2021-01-15 19:53       ` Jason Gunthorpe
2021-01-16  3:40       ` John Hubbard
2021-01-16 11:42         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjZTMsv0_GOyQpLRk_5U1r5W8e21f8sV0jykK=z47hjGQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=keescook@chromium.org \
    --cc=kirill@shutemov.name \
    --cc=ktkhai@virtuozzo.com \
    --cc=leonro@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xemul@openvz.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).