linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Jan Kara <jack@suse.cz>, Peter Xu <peterx@redhat.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Maya B . Gokhale" <gokhale2@llnl.gov>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Marty Mcfadden <mcfadden8@llnl.gov>,
	Kirill Shutemov <kirill@shutemov.name>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Date: Mon, 24 Aug 2020 11:22:03 -0700	[thread overview]
Message-ID: <CAHk-=whXyMiXLujhPdfQN3q5n-DpbVRFZfJqjBYLTYw37uSMvA@mail.gmail.com> (raw)
In-Reply-To: <dd6eb3e6-2797-1cf3-e1af-62a809ce83f2@virtuozzo.com>

On Mon, Aug 24, 2020 at 8:38 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>
> Sure, KSM does not increment page counter, when a page becomes PageKsm().
> Is patch comment about that? Even if so, I don't understand what this
> comment is about. "PageKsm() does not take additional counter" is not
> a reason the page can't be reused there.

No, the reason is that we don't want to reuse a KSM page, and the
page_count() check apparently isn't sufficient in all circumstances.

So the comment is there to explain why a plain "page_count()"
apparently isn't sufficient.

> The reason is that readers
> of this page may increase a counter without taking the lock, so
> this page_count() == 1 under the lock does not guarantee anything.

The intent is to get rid of

 (a) all the locking costs. The "lock_page()" we had here used to be
very expensive.

     It's shown up several times in the page lock problems, and the
reason seems to be simply that this is _the_ hottest non-IO path there
is, so it's somewhat easy to generate lots of contention on a shared
page.

 (b) the problems with GUP - because GUP (and some other page sharing
cases) don't increase the page_mapping() count, GUP was "invisible" to
the re-use code, and as a result the reuse code was a buggy mess.

 (c) the complete and pointless complexity of this path, that isn't
actually done anywhere else. The GUP issue was the immediate - and
currently existing - bug caused by this, but the locking costs are
another example.

So the page reuse is simply wrong. It's almost certainly also
pointless and entirely historical. The _reason_ for trying to reuse
the KSM pages was documented not as performance, but simple to match
the other (also pointless) complexity of the swap cache reuse.

So the intent is to do the "page_count()" test early, to get rid of
the locking issues with any shared pages.

So the logic is "if this page is marked PageKsm(), or if it has an
elevated page count, don't even try - just copy".

To make a very concrete example: it's not unusual at all to basically
have simultaneous page faults on a dirty page because it's COW-shared
in both parent and child. Trivial to trigger, with the child and
parent running on different CPU's and just writing to the same page
right after a fork. And there is absolutely _zero_ reason that should
be serialized by anything at all. The parent and child are complete
share-nothing things: taking the page lock was and is simply wrong.

Solution: don't do it. Just notice "Oh, this page has other users"
(and page_count() is the correct thing to do for that, not
page_mappings(), since GUP is also another user), and actively *avoid*
any serialization. Just copy the damn thing.

I'll take full blame for the historical stupidity. This was a bigger
deal back in the days when 4MB of RAM was considered normal. Plus page
locking wasn't even an issue back then. In fact, no locking at all was
needed back when the "try to reuse" code was originally written.
Things were simpler back then.

It's just that I'm 100% convinced that that historical legacy is very
very wrong these days. That "serialize on page lock if COW fault in
parent and child" is just an example of where this is fundamentally
wrong. But the whole complexity in the map count logic is just wholly
and totally wrong too.

I dare anybody to read the swapfile code for "total_map_swapcount" and
tell me they understand it fully.

So my theory is that this code - that is meant to *improve*
performance by sharing pages aggressively after a fork(), because that
used to be a primary issue, is now in fact making performance *much
worse*, because it's trying to optimize for a case that doesn't even
matter any more (does anybody truly believe that swap cache and shared
COW pages are a major source of performance?) and it does so at a huge
complexity _and_ performance cost.

So ripping out the KSM reuse code is just another "this is pointless
and wrong" issue. If you seriously try to KSM share a page that now
only has _one_ single user left, and that one single user writes to it
and is modifying it, then the problem is absolutely *NOT* that we
should try to re-use the page. No, the problem is that the KSM code
picked a horribly bad page to try to share.

Will that happen _occasionally_? Sure. But if it happens once in a
blue moon, we really shouldn't have that code to deal with it.

It's really that simple. All that reuse code is pointless and wrong.
It has historical roots, and it made sense at the time, but in this
day and age I'm convinced it's completely wrong.

Now, I'm _also_ admittedly convinced that I am occasionally completely
wrong, and people do odd things, and maybe there are loads where it
really matters. I doubt it in this case, but I think what we should do
is rip out all the existing historical code, and _if_ somebody has a
case where it matters, we can look at THAT case, and people can show

 (a) what the exact pattern is that we actually care about

 (b) numbers

and then maybe we can re-introduce some sort of re-use code with -
hopefully - a much more targeted and documented "this is why this
matters" approach.

So the intent is to get rid of the page lock thing, but I also hope
that long-term, we can get rid of reuse_swap_page() and some of that
mapcount stuff entirely.

                Linus

  reply	other threads:[~2020-08-24 18:22 UTC|newest]

Thread overview: 98+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 23:49 [PATCH 0/4] mm: Simplfy cow handling Peter Xu
2020-08-21 23:49 ` [PATCH 1/4] mm: Trial do_wp_page() simplification Peter Xu
2020-08-24  8:36   ` Kirill Tkhai
2020-08-24 14:30     ` Jan Kara
2020-08-24 15:37       ` Kirill Tkhai
2020-08-24 18:22         ` Linus Torvalds [this message]
2020-09-01  7:01           ` Hugh Dickins
2020-09-14 14:38   ` Jason Gunthorpe
2020-09-14 17:32     ` Linus Torvalds
2020-09-14 18:34       ` Peter Xu
2020-09-14 21:15         ` Peter Xu
2020-09-14 22:55           ` Jason Gunthorpe
2020-09-14 22:59             ` Linus Torvalds
2020-09-14 23:28               ` Jason Gunthorpe
2020-09-15  0:19                 ` Linus Torvalds
2020-09-15 14:50                 ` Peter Xu
2020-09-15 15:17                   ` Peter Xu
2020-09-15 16:05                   ` Jason Gunthorpe
2020-09-15 18:29                     ` Jason Gunthorpe
2020-09-15 19:13                       ` Peter Xu
2020-09-15 19:38                         ` Jason Gunthorpe
2020-09-15 21:33                           ` Peter Xu
2020-09-15 23:22                             ` Jason Gunthorpe
2020-09-16  1:50                               ` John Hubbard
2020-09-16 17:48                                 ` Jason Gunthorpe
2020-09-16 18:46                                   ` Peter Xu
2020-09-17 11:25                                     ` Jason Gunthorpe
2020-09-17 18:11                                       ` Linus Torvalds
2020-09-17 19:38                                         ` Jason Gunthorpe
2020-09-17 19:51                                           ` Linus Torvalds
2020-09-18 16:40                                             ` Peter Xu
2020-09-18 17:16                                               ` Linus Torvalds
2020-09-18 19:57                                                 ` Peter Xu
2020-09-18 17:32                                               ` Jason Gunthorpe
2020-09-18 20:40                                                 ` Peter Xu
2020-09-18 20:59                                                   ` Linus Torvalds
2020-09-19  0:28                                                     ` Jason Gunthorpe
2020-09-18 21:06                                                   ` John Hubbard
2020-09-19  0:01                                                     ` Jason Gunthorpe
2020-09-21  8:35                                                       ` Jan Kara
2020-09-21 12:03                                                         ` Jason Gunthorpe
2022-02-16 16:59                                                           ` Oded Gabbay
2022-02-16 17:24                                                             ` Oded Gabbay
2022-02-16 19:04                                                             ` Linus Torvalds
2022-02-16 19:20                                                               ` Oded Gabbay
2022-02-16 19:24                                                               ` David Hildenbrand
2020-09-21 13:42                                               ` Michal Hocko
2020-09-21 14:18                                                 ` Peter Xu
2020-09-21 14:28                                                   ` Michal Hocko
2020-09-21 14:38                                                     ` Tejun Heo
2020-09-21 14:43                                                       ` Christian Brauner
2020-09-21 14:55                                                         ` Michal Hocko
2020-09-21 15:04                                                           ` Christian Brauner
2020-09-21 16:06                                                             ` Michal Hocko
2020-09-23  7:53                                                               ` Michal Hocko
2020-09-21 14:41                                                 ` Christian Brauner
2020-09-21 14:57                                                   ` Michal Hocko
2020-09-21 16:31                                                     ` Peter Xu
2020-09-17 18:14                                       ` Peter Xu
2020-09-17 18:26                                         ` Linus Torvalds
2020-09-17 19:03                                           ` Peter Xu
2020-09-17 19:42                                             ` Linus Torvalds
2020-09-17 19:55                                               ` John Hubbard
2020-09-17 20:06                                               ` Jason Gunthorpe
2020-09-17 20:19                                                 ` John Hubbard
2020-09-17 20:25                                                   ` Jason Gunthorpe
2020-09-17 20:35                                                 ` Linus Torvalds
2020-09-17 21:40                                                   ` Peter Xu
2020-09-17 22:09                                                     ` Jason Gunthorpe
2020-09-17 22:25                                                       ` Linus Torvalds
2020-09-17 22:48                                                       ` Ira Weiny
2020-09-18  9:36                                                         ` Jan Kara
2020-09-18  9:44                                                       ` Jan Kara
2020-09-18 16:19                                             ` Jason Gunthorpe
2020-09-15 10:23           ` Leon Romanovsky
2020-09-15 15:56           ` Jason Gunthorpe
2020-09-15 15:03   ` Oleg Nesterov
2020-09-15 16:18     ` Peter Xu
2020-08-21 23:49 ` [PATCH 2/4] mm/ksm: Remove reuse_ksm_page() Peter Xu
2020-08-21 23:49 ` [PATCH 3/4] mm/gup: Remove enfornced COW mechanism Peter Xu
2020-09-14 14:27   ` Oleg Nesterov
2020-09-14 17:59     ` Peter Xu
2020-09-14 19:03       ` Linus Torvalds
2020-08-21 23:49 ` [PATCH 4/4] mm: Add PGREUSE counter Peter Xu
2020-08-22 16:14   ` Linus Torvalds
2020-08-24  0:24     ` Peter Xu
2020-08-22 16:05 ` [PATCH 0/4] mm: Simplfy cow handling Linus Torvalds
2020-08-23 23:58   ` Peter Xu
2020-08-24  8:38 ` Kirill Tkhai
2020-08-27 14:15 ` Peter Xu
2021-02-02 14:40 [PATCH 1/4] mm: Trial do_wp_page() simplification Gal Pressman
2021-02-02 16:31 ` Peter Xu
2021-02-02 16:44   ` Jason Gunthorpe
2021-02-02 17:05     ` Peter Xu
2021-02-02 17:13       ` Jason Gunthorpe
2021-02-03 12:43         ` Gal Pressman
2021-02-03 14:00           ` Jason Gunthorpe
2021-02-03 14:47             ` Gal Pressman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whXyMiXLujhPdfQN3q5n-DpbVRFZfJqjBYLTYw37uSMvA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=gokhale2@llnl.gov \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=kirill@shutemov.name \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcfadden8@llnl.gov \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).