All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <nadav.amit@gmail.com>, Jann Horn <jannh@google.com>,
	John Hubbard <jhubbard@nvidia.com>, X86 ML <x86@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	jroedel@suse.de, ubizjak@gmail.com,
	Alistair Popple <apopple@nvidia.com>
Subject: Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment
Date: Mon, 31 Oct 2022 10:19:43 -0700	[thread overview]
Message-ID: <CAHk-=wiuVAHSqQdGX837J279WaXXmOk883PmYojRihYC4oCLNg@mail.gmail.com> (raw)
In-Reply-To: <Y1+VWfNC7BvP8Pr2@hirez.programming.kicks-ass.net>

On Mon, Oct 31, 2022 at 2:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sun, Oct 30, 2022 at 03:47:36PM -0700, Linus Torvalds wrote:
>
> > + * This is the simplified form of page_remove_rmap(), that only
> > + * deals with last-level pages, so 'compound' is always false,
> > + * and the caller does 'munlock_vma_page(page, vma, compound)'
> > + * separately.
> > + *
> > + * This allows for a much simpler calling convention and code.
> > + *
> > + * The caller holds the pte lock.
> > + */
> > +void page_zap_pte_rmap(struct page *page)
> > +{
>
> One could consider adding something like:
>
> #ifdef USE_SPLIT_PTE_PTLOCKS
>         lockdep_assert_held(ptlock_ptr(page))
> #endif

Yes, except that the page lock goes away in the next few patches and
gets replaced by just using the safe dec_lruvec_page_state() instead,
so it's not really worth it.

> > +     if (!atomic_add_negative(-1, &page->_mapcount))
> > +             return;
> > +
> > +     lock_page_memcg(page);
> > +     __dec_lruvec_page_state(page,
> > +             PageAnon(page) ? NR_ANON_MAPPED : NR_FILE_MAPPED);
> > +     unlock_page_memcg(page);
> > +}
>
> Took me a little while, but yes, .compound=false seems to reduce to
> this.

Yeah - it's why I kept that thing as three separate patches, because
even if each of the patches isn't "immediately obvious", you can at
least go back and follow along and see what I did.

The *full* simplification end result just looks like magic.

Admittedly, I think a lot of that "looks like magic" is because the
rmap code has seriously cruftified over the years. We had that time
when we actually

Go back a decade, and we literally used to do pretty much exactly what
the simplified form does. The transformation to complexity hell starts
with commit 89c06bd52fb9 ("memcg: use new logic for page stat
accounting"), but just look at what it looked like before that:

  git show 89c06bd52fb9^:mm/rmap.c

gets you the state back when it was simple. And look at what it did:

        void page_remove_rmap(struct page *page)
        {
                /* page still mapped by someone else? */
                if (!atomic_add_negative(-1, &page->_mapcount))
                        return;
                ... statistics go here ..

so in the end the simplified form of this page_zap_pte_rmap() really
isn't *that* surprising.

In fact, that old code handled PageHuge() fairly naturally too, and
almost all the mess comes from the memcg accounting - and locking -
changes.

And I actually really wanted to one step further, and try to then
batch up the page state accounting too. It's kind of stupid how we do
all that memcg locking for each page, and with this new setup we have
one nice array of pages that we *could* try to just batch things with.

The pages in normal situations *probably* mostly all share the same
memcg and node, so just optimistically trying to do something like "as
long as it's the same memcg as last time, just keep the lock".

Instead of locking and unlocking for every single page.

But just looking at it exhausted me enough that I don't think I'll go there.

Put another way: after this series, it's not the 'rmap' code that
makes me go Ugh, it's the memcg tracking..

(But the hugepage rmap code is incredibly nasty stuff still, and I
think the whole 'compound=true' case would be better with somebody
taking a look at that case too, but that somebody won't be me).

                  Linus

  reply	other threads:[~2022-10-31 17:20 UTC|newest]

Thread overview: 148+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-22 11:14 [PATCH 00/13] Clean up pmd_get_atomic() and i386-PAE Peter Zijlstra
2022-10-22 11:14 ` [PATCH 01/13] mm: Update ptep_get_lockless()s comment Peter Zijlstra
2022-10-24  5:42   ` John Hubbard
2022-10-24  8:00     ` Peter Zijlstra
2022-10-24 19:58       ` Jann Horn
2022-10-24 20:19         ` Linus Torvalds
2022-10-24 20:23           ` Jann Horn
2022-10-24 20:36             ` Linus Torvalds
2022-10-25  3:21             ` Matthew Wilcox
2022-10-25  7:54               ` Alistair Popple
2022-10-25 13:33                 ` Peter Zijlstra
2022-10-25 13:44                 ` Jann Horn
2022-10-26  0:45                   ` Alistair Popple
2022-10-25 14:02         ` Peter Zijlstra
2022-10-25 14:18           ` Jann Horn
2022-10-25 15:06             ` Peter Zijlstra
2022-10-26 16:45               ` Jann Horn
2022-10-27  7:08                 ` Peter Zijlstra
2022-10-27 18:13                   ` Linus Torvalds
2022-10-27 19:35                     ` Peter Zijlstra
2022-10-27 19:43                       ` Linus Torvalds
2022-10-27 20:15                     ` Nadav Amit
2022-10-27 20:31                       ` Linus Torvalds
2022-10-27 21:44                         ` Nadav Amit
2022-10-28 23:57                           ` Nadav Amit
2022-10-29  0:42                             ` Linus Torvalds
2022-10-29 18:05                               ` Nadav Amit
2022-10-29 18:36                                 ` Linus Torvalds
2022-10-29 18:58                                   ` Linus Torvalds
2022-10-29 19:14                                     ` Linus Torvalds
2022-10-29 19:28                                       ` Nadav Amit
2022-10-30  0:18                                       ` Nadav Amit
2022-10-30  2:17                                     ` Nadav Amit
2022-10-30 18:19                                       ` Linus Torvalds
2022-10-30 18:51                                         ` Linus Torvalds
2022-10-30 22:47                                           ` Linus Torvalds
2022-10-31  1:47                                             ` Linus Torvalds
2022-10-31  4:09                                               ` Nadav Amit
2022-10-31  4:55                                                 ` Nadav Amit
2022-10-31  5:00                                                 ` Linus Torvalds
2022-10-31 15:43                                                   ` Nadav Amit
2022-10-31 17:32                                                     ` Linus Torvalds
2022-10-31  9:36                                               ` Peter Zijlstra
2022-10-31 17:28                                                 ` Linus Torvalds
2022-10-31 18:43                                                   ` mm: delay rmap removal until after TLB flush Linus Torvalds
2022-11-02  9:14                                                     ` Christian Borntraeger
2022-11-02  9:23                                                       ` Christian Borntraeger
2022-11-02 17:55                                                       ` Linus Torvalds
2022-11-02 18:28                                                         ` Linus Torvalds
2022-11-02 22:29                                                         ` Gerald Schaefer
2022-11-02 12:45                                                     ` Peter Zijlstra
2022-11-02 22:31                                                     ` Gerald Schaefer
2022-11-02 23:13                                                       ` Linus Torvalds
2022-11-03  9:52                                                     ` David Hildenbrand
2022-11-03 16:54                                                       ` Linus Torvalds
2022-11-03 17:09                                                         ` Linus Torvalds
2022-11-03 17:36                                                           ` David Hildenbrand
2022-11-04  6:33                                                     ` Alexander Gordeev
2022-11-04 17:35                                                       ` Linus Torvalds
2022-11-06 21:06                                                         ` Hugh Dickins
2022-11-06 22:34                                                           ` Linus Torvalds
2022-11-06 23:14                                                             ` Andrew Morton
2022-11-07  0:06                                                               ` Stephen Rothwell
2022-11-07 16:19                                                               ` Linus Torvalds
2022-11-07 23:02                                                                 ` Andrew Morton
2022-11-07 23:44                                                                   ` Stephen Rothwell
2022-11-07  9:12                                                           ` Peter Zijlstra
2022-11-07 20:07                                                           ` Johannes Weiner
2022-11-07 20:29                                                             ` Linus Torvalds
2022-11-07 23:47                                                               ` Linus Torvalds
2022-11-08  4:28                                                                 ` Linus Torvalds
2022-11-08 19:56                                                                   ` Linus Torvalds
2022-11-08 20:03                                                                     ` Konstantin Ryabitsev
2022-11-08 20:18                                                                       ` Linus Torvalds
2022-11-08 19:41                                                                 ` [PATCH 1/4] mm: introduce 'encoded' page pointers with embedded extra bits Linus Torvalds
2022-11-08 20:37                                                                   ` Nadav Amit
2022-11-08 20:46                                                                     ` Linus Torvalds
2022-11-09  6:36                                                                   ` Alexander Gordeev
2022-11-09 18:00                                                                     ` Linus Torvalds
2022-11-09 20:02                                                                       ` Linus Torvalds
2022-11-08 19:41                                                                 ` [PATCH 2/4] mm: teach release_pages() to take an array of encoded page pointers too Linus Torvalds
2022-11-08 19:41                                                                 ` [PATCH 3/4] mm: mmu_gather: prepare to gather encoded page pointers with flags Linus Torvalds
2022-11-08 19:41                                                                 ` [PATCH 4/4] mm: delay page_remove_rmap() until after the TLB has been flushed Linus Torvalds
2022-11-08 20:48                                                                   ` [lkp] [+115 bytes kernel size regression] [i386-tinyconfig] [0309f16088] " kernel test robot
2022-11-08 21:01                                                                     ` Linus Torvalds
2022-11-08 21:05                                                                   ` [PATCH 4/4] " Nadav Amit
2022-11-09 15:53                                                                   ` Johannes Weiner
2022-11-09 19:31                                                                     ` Hugh Dickins
2022-10-31  9:39                                               ` [PATCH 01/13] mm: Update ptep_get_lockless()s comment Peter Zijlstra
2022-10-31 17:22                                                 ` Linus Torvalds
2022-10-31  9:46                                               ` Peter Zijlstra
2022-10-31  9:28                                             ` Peter Zijlstra
2022-10-31 17:19                                               ` Linus Torvalds [this message]
2022-10-30 19:34                                         ` Nadav Amit
2022-10-29 19:39                                   ` John Hubbard
2022-10-29 20:15                                     ` Linus Torvalds
2022-10-29 20:30                                       ` Linus Torvalds
2022-10-29 20:42                                         ` John Hubbard
2022-10-29 20:56                                       ` Nadav Amit
2022-10-29 21:03                                         ` Nadav Amit
2022-10-29 21:12                                         ` Linus Torvalds
2022-10-29 20:59                                       ` Theodore Ts'o
2022-10-26 19:43               ` Nadav Amit
2022-10-27  7:27                 ` Peter Zijlstra
2022-10-27 17:30                   ` Nadav Amit
2022-10-22 11:14 ` [PATCH 02/13] x86/mm/pae: Make pmd_t similar to pte_t Peter Zijlstra
2022-10-22 11:14 ` [PATCH 03/13] sh/mm: " Peter Zijlstra
2022-12-21 13:54   ` Guenter Roeck
2022-10-22 11:14 ` [PATCH 04/13] mm: Fix pmd_read_atomic() Peter Zijlstra
2022-10-22 17:30   ` Linus Torvalds
2022-10-24  8:09     ` Peter Zijlstra
2022-11-01 12:41     ` Peter Zijlstra
2022-11-01 17:42       ` Linus Torvalds
2022-11-02  9:12       ` [tip: x86/mm] mm: Convert __HAVE_ARCH_P..P_GET to the new style tip-bot2 for Peter Zijlstra
2022-11-03 21:15       ` tip-bot2 for Peter Zijlstra
2022-12-17 18:55       ` tip-bot2 for Peter Zijlstra
2022-10-22 11:14 ` [PATCH 05/13] mm: Rename GUP_GET_PTE_LOW_HIGH Peter Zijlstra
2022-10-22 11:14 ` [PATCH 06/13] mm: Rename pmd_read_atomic() Peter Zijlstra
2022-10-22 11:14 ` [PATCH 07/13] mm/gup: Fix the lockless PMD access Peter Zijlstra
2022-10-23  0:42   ` Hugh Dickins
2022-10-24  7:42     ` Peter Zijlstra
2022-10-25  3:58       ` Hugh Dickins
2022-10-22 11:14 ` [PATCH 08/13] x86/mm/pae: Dont (ab)use atomic64 Peter Zijlstra
2022-10-22 11:14 ` [PATCH 09/13] x86/mm/pae: Use WRITE_ONCE() Peter Zijlstra
2022-10-22 17:42   ` Linus Torvalds
2022-10-24 10:21     ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 10/13] x86/mm/pae: Be consistent with pXXp_get_and_clear() Peter Zijlstra
2022-10-22 17:53   ` Linus Torvalds
2022-10-24 11:13     ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 11/13] x86_64: Remove pointless set_64bit() usage Peter Zijlstra
2022-10-22 17:55   ` Linus Torvalds
2022-11-03 19:09   ` Nathan Chancellor
2022-11-03 19:23     ` Uros Bizjak
2022-11-03 19:35       ` Nathan Chancellor
2022-11-03 20:39         ` Linus Torvalds
2022-11-03 21:06           ` Peter Zijlstra
2022-11-04 16:01           ` Peter Zijlstra
2022-11-04 17:15             ` Linus Torvalds
2022-11-05 13:29               ` Jason A. Donenfeld
2022-11-05 15:14                 ` Peter Zijlstra
2022-11-05 20:54                   ` Jason A. Donenfeld
2022-11-07  9:14                   ` David Laight
2022-12-19 15:44               ` Peter Zijlstra
2022-10-22 11:14 ` [PATCH 12/13] x86/mm/pae: Get rid of set_64bit() Peter Zijlstra
2022-10-22 11:14 ` [PATCH 13/13] mm: Remove pointless barrier() after pmdp_get_lockless() Peter Zijlstra
2022-10-22 19:59   ` Yu Zhao
2022-10-22 17:57 ` [PATCH 00/13] Clean up pmd_get_atomic() and i386-PAE Linus Torvalds
2022-10-29 12:21 ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wiuVAHSqQdGX837J279WaXXmOk883PmYojRihYC4oCLNg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=jannh@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=jroedel@suse.de \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    --cc=peterz@infradead.org \
    --cc=ubizjak@gmail.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.