linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"ast@kernel.org" <ast@kernel.org>, "bp@alien8.de" <bp@alien8.de>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"dborkman@redhat.com" <dborkman@redhat.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"imbrenda@linux.ibm.com" <imbrenda@linux.ibm.com>,
	Kernel Team <Kernel-team@fb.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mbenes@suse.cz" <mbenes@suse.cz>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"pmladek@suse.com" <pmladek@suse.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	Mike Rapoport <rppt@kernel.org>,
	"song@kernel.org" <song@kernel.org>,
	Song Liu <songliubraving@fb.com>
Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
Date: Thu, 21 Apr 2022 19:31:05 -0700	[thread overview]
Message-ID: <CAHk-=wjiV5LHnbFs+ObSK-cDdCc7pEV+mMUJcZJPnj4RnOGd2Q@mail.gmail.com> (raw)
In-Reply-To: <1650590628.043zdepwk1.astroid@bobo.none>

On Thu, Apr 21, 2022 at 6:51 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> > See
> >
> >     https://lore.kernel.org/all/20220415164413.2727220-3-song@kernel.org/
> >
> > for [PATCH 2/4] in the series for this particular issue.
>
> I was being facetious. The problem is you can't do ^ because x86 is
> buggy.

No, we probably *can* do that PATCH 2/4. I suspect x86 really isn't
that buggy. The bugs are elsewhere (including other vmalloc_huge()
uses).

Really. Why can't you just admit that the major bug was in the
hugepage code itself?

You claim:

> Because it can be transparent. The bug was (stupidly) using compound
> pages when it should have just used split higher order pages.

but we're in -rc3 for 5.18, and you seem to be entirely ignoring the
fact that that stupid bug has been around for a *YEAR* now.

Guess what? It was reported within *days* of the code having  been
enabled on x86.

But for about a year, youv'e been convinced that powerpc is fine,
because nobody ever reported it.

And you *still* try to make this about how it's some "x86 bug",
despite that bug not having been x86-specific AT ALL.

Nick, please take a long look at yourself in the mirror.

And stop this whole mindless "it's x86".

The *ONLY* thing x86-64 did was to show that the code that had been
enabled on powerpc for a year had gotten almost no testing there.

And don't bother mentioning s390. It got even less coverage there.

So exactly *because* bugs were uncovered in days by x86 enabling this,
I'm not rushing to re-enable it until I think it's gone through more
thinking and testing.

And in particular, I really *really* want to limit the fallout.

For example, your "two-liner fix" is not at all obvious.

That broken code case used to have a comment that remap_vmalloc_page()
required compound pages, and you just removed that whole thing as if
it didn't matter, and split the page.

(I also think the comment meant 'vmap_pages_range()', but whatever).

And the thing is, I'm not entirely convinced that comment was wrong
and could just be ignored. The freeing code in __vunmap() will do

                int i, step = 1U << page_order;

                for (i = 0; i < area->nr_pages; i += step) {
                        struct page *page = area->pages[i];

                        BUG_ON(!page);
                        mod_memcg_page_state(page, MEMCG_VMALLOC, -step);
                        __free_pages(page, page_order);

which now looks VERY VERY wrong.

You've split the pages, they may be used as individual pages (possibly
by other things), and then you now at freeing time treat them as a
single compound page after all..

So your "trivial two-liner" that tried to fix a bug that has been
there for a year now, itself seems quite questionable.

Maybe it works, maybe it doesn't. My bet is "it doesn't".

And guess what? I bet it worked just fine in your testing on powerpc,
because you probably didn't actually have any real huge-page vmalloc
cases except for those filesystem big-hash cases that never get
free'd.

So that "this code was completely buggy for a year on powerpc" never
seemed to teach you anything about the code.

And again - none of this is at all x86-specific. NOT AT ALL.

So how about you admit you were wrong to begin with.

That hugepage code needs more care before we re-enable it.

Your two-liner wasn't so obvious after all, was it?

I really think we're much safer saying "hugepage mappings only matter
for a couple of things, and those things will *not* do sub-page games,
so it's simple and safe".

.. and that requires that opt-in model.

Because your "it's transparent" argument has never ever actually been
true, now has it?

           Linus

  reply	other threads:[~2022-04-22  2:38 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220415164413.2727220-1-song@kernel.org>
     [not found] ` <20220415164413.2727220-2-song@kernel.org>
2022-04-15 17:43   ` [PATCH v4 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Rik van Riel
     [not found] ` <20220415164413.2727220-3-song@kernel.org>
2022-04-15 17:43   ` [PATCH v4 bpf 2/4] page_alloc: use vmalloc_huge for large system hash Rik van Riel
2022-04-25  7:07     ` Geert Uytterhoeven
2022-04-25  8:17       ` Linus Torvalds
2022-04-25  8:24         ` Geert Uytterhoeven
     [not found] ` <20220415164413.2727220-4-song@kernel.org>
2022-04-15 18:06   ` [PATCH v4 bpf 3/4] module: introduce module_alloc_huge Rik van Riel
2022-06-16 16:10   ` Dave Hansen
2022-04-15 19:05 ` [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Luis Chamberlain
2022-04-16  1:34   ` Song Liu
2022-04-16  1:42     ` Luis Chamberlain
2022-04-16  1:43       ` Luis Chamberlain
2022-04-16  5:08   ` Christoph Hellwig
2022-04-16 19:55     ` Song Liu
2022-04-16 20:30       ` Linus Torvalds
2022-04-16 22:26         ` Song Liu
2022-04-18 10:06           ` Mike Rapoport
2022-04-19  0:44             ` Luis Chamberlain
2022-04-19  1:56               ` Edgecombe, Rick P
2022-04-19  5:36                 ` Song Liu
2022-04-19 18:42                   ` Mike Rapoport
2022-04-19 19:20                     ` Linus Torvalds
2022-04-20  2:03                       ` Alexei Starovoitov
2022-04-20  2:18                         ` Linus Torvalds
2022-04-20 14:42                           ` Song Liu
2022-04-20 18:28                             ` Luis Chamberlain
2022-04-21  3:25                       ` Nicholas Piggin
2022-04-21  5:48                         ` Linus Torvalds
2022-04-21  6:02                           ` Linus Torvalds
2022-04-21  9:07                             ` Nicholas Piggin
2022-04-21  8:57                           ` Nicholas Piggin
2022-04-21 15:44                             ` Linus Torvalds
2022-04-21 23:30                               ` Nicholas Piggin
2022-04-22  0:49                                 ` Linus Torvalds
2022-04-22  1:51                                   ` Nicholas Piggin
2022-04-22  2:31                                     ` Linus Torvalds [this message]
2022-04-22  2:57                                       ` Nicholas Piggin
2022-04-21 15:47                             ` Edgecombe, Rick P
2022-04-21 16:15                               ` Linus Torvalds
2022-04-22  0:12                                 ` Nicholas Piggin
2022-04-22  2:29                                   ` Edgecombe, Rick P
2022-04-22  2:47                                     ` Linus Torvalds
2022-04-22 16:54                                       ` Edgecombe, Rick P
2022-04-22  3:08                                     ` Nicholas Piggin
2022-04-22  4:31                                       ` Nicholas Piggin
2022-04-22 17:10                                         ` Edgecombe, Rick P
2022-04-22 20:22                                           ` Edgecombe, Rick P
2022-04-22  3:33                                     ` Nicholas Piggin
2022-04-21  9:47                           ` Nicholas Piggin
2022-04-19 21:24                 ` Luis Chamberlain
2022-04-19 23:58                   ` Edgecombe, Rick P
2022-04-20  7:58                   ` Petr Mladek
2022-04-19 18:20               ` Mike Rapoport
2022-04-24 17:43       ` Linus Torvalds
2022-04-25  6:48         ` Song Liu
2022-04-21  3:19     ` Nicholas Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjiV5LHnbFs+ObSK-cDdCc7pEV+mMUJcZJPnj4RnOGd2Q@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dborkman@redhat.com \
    --cc=edumazet@google.com \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mbenes@suse.cz \
    --cc=mcgrof@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=pmladek@suse.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=song@kernel.org \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).