All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"ast@kernel.org" <ast@kernel.org>, "bp@alien8.de" <bp@alien8.de>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"dborkman@redhat.com" <dborkman@redhat.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"imbrenda@linux.ibm.com" <imbrenda@linux.ibm.com>,
	Kernel Team <Kernel-team@fb.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mbenes@suse.cz" <mbenes@suse.cz>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"pmladek@suse.com" <pmladek@suse.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	Mike Rapoport <rppt@kernel.org>,
	"song@kernel.org" <song@kernel.org>,
	Song Liu <songliubraving@fb.com>
Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
Date: Thu, 21 Apr 2022 18:57:22 +1000	[thread overview]
Message-ID: <1650530694.evuxjgtju7.astroid@bobo.none> (raw)
In-Reply-To: <CAHk-=wiQ5=S3m2+xRbm-1H8fuQwWfQxnO7tHhKg8FjegxzdVaQ@mail.gmail.com>

Excerpts from Linus Torvalds's message of April 21, 2022 3:48 pm:
> On Wed, Apr 20, 2022 at 8:25 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Why not just revert fac54e2bfb5b ?
> 
> That would be stupid, with no sane way forward.
> 
> The fact is, HUGE_VMALLOC was badly misdesigned, and enabling it on
> x86 only ended up showing the problems.
> 
> It wasn't fac54e2bfb5b that was the fundamental issue. It was the
> whole "oh, we should never have done it that way to begin with".
> 
> The whole initial notion that HAVE_ARCH_HUGE_VMALLOC means that there
> must be no PAGE_SIZE pte assumptions was simply broken.

It didn't have that requirement so much as required it to be
accounted for if the arch enabled it.

> There were
> actual real cases that had those assumptions, and the whole "let's
> just change vmalloc behavior by default and then people who don't like
> it can opt out" was just fundamentally a really bad idea.
> 
> Power had that random "oh, we don't want to do this for module_alloc",
> which you had a comment about "more testing" for.
> 
> And s390 had a case of hardware limitations where it didn't work for some cases.
> 
> And then enabling it on x86 turned up more issues.

Those were (AFAIKS) all in arch code though. The patch was the
fundamental issue for x86 because it had bugs. I don't quite see
what your objection is to power and s390's working implementations.
Some parts of the arch code could not cope with hue PTEs so they
used small.

Switching the API around to expect non-arch code to know whether or
not it can use huge mappings is much worse. How is 
alloc_large_system_hash expected to know whether it may use huge
pages on any given half-broken arch like x86?

It's the same like we have huge iomap for a long time. No driver
should be expect to have to understand that.

> So yes, commit fac54e2bfb5b _exposed_ things to a much larger
> audience. But all it just made clear was that your original notion of
> "let's change behavior and randomly disable it as things turn up" was
> just broken.
> 
> Including "small" details like the fact that apparently
> VM_FLUSH_RESET_PERMS didn't work correctly any more for this, which
> caused issues for bpf, and that [PATCH 4/4].

Which is another arch detail.

> And yes, there was a
> half-arsed comment ("may require extra work") to that effect in the
> powerpc __module_alloc() function, but it had been left to others to
> notice separately.

It had a comment in arch/Kconfig about it. Combing through the
details of every arch is left to others who choose to opt-in though.

> So no. We're not going back to that completely broken model. The
> lagepage thing needs to be opt-in, and needs a lot more care.

I don't think it should be opt-in at the caller level (at least not
outside arch/). As I said earlier maybe we end up finding fragmentation
to be a problem that can't be solved with simple heuristics tweaking
so we could think about adding something to give size/speed hint
tradeoff, but as for "can this caller use huge vmap backed memory",
it should not be something drivers or core code has to think about.

Thanks,
Nick

  parent reply	other threads:[~2022-04-21  8:57 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-15 16:44 [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
2022-04-15 16:44 ` [PATCH v4 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
2022-04-15 17:43   ` Rik van Riel
2022-04-15 16:44 ` [PATCH v4 bpf 2/4] page_alloc: use vmalloc_huge for large system hash Song Liu
2022-04-15 17:43   ` Rik van Riel
2022-04-25  7:07     ` Geert Uytterhoeven
2022-04-25  8:17       ` Linus Torvalds
2022-04-25  8:24         ` Geert Uytterhoeven
2022-04-15 16:44 ` [PATCH v4 bpf 3/4] module: introduce module_alloc_huge Song Liu
2022-04-15 18:06   ` Rik van Riel
2022-06-16 16:10   ` Dave Hansen
2022-04-15 16:44 ` [PATCH v4 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
2022-04-15 19:05 ` [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Luis Chamberlain
2022-04-16  1:34   ` Song Liu
2022-04-16  1:42     ` Luis Chamberlain
2022-04-16  1:43       ` Luis Chamberlain
2022-04-16  5:08   ` Christoph Hellwig
2022-04-16 19:55     ` Song Liu
2022-04-16 20:30       ` Linus Torvalds
2022-04-16 22:26         ` Song Liu
2022-04-18 10:06           ` Mike Rapoport
2022-04-19  0:44             ` Luis Chamberlain
2022-04-19  1:56               ` Edgecombe, Rick P
2022-04-19  5:36                 ` Song Liu
2022-04-19 18:42                   ` Mike Rapoport
2022-04-19 19:20                     ` Linus Torvalds
2022-04-20  2:03                       ` Alexei Starovoitov
2022-04-20  2:18                         ` Linus Torvalds
2022-04-20 14:42                           ` Song Liu
2022-04-20 18:28                             ` Luis Chamberlain
2022-04-21  7:29                             ` Song Liu
2022-04-21  3:25                       ` Nicholas Piggin
2022-04-21  5:48                         ` Linus Torvalds
2022-04-21  6:02                           ` Linus Torvalds
2022-04-21  9:07                             ` Nicholas Piggin
2022-04-21  8:57                           ` Nicholas Piggin [this message]
2022-04-21 15:44                             ` Linus Torvalds
2022-04-21 23:30                               ` Nicholas Piggin
2022-04-22  0:49                                 ` Linus Torvalds
2022-04-22  1:51                                   ` Nicholas Piggin
2022-04-22  2:31                                     ` Linus Torvalds
2022-04-22  2:57                                       ` Nicholas Piggin
2022-04-21 15:47                             ` Edgecombe, Rick P
2022-04-21 16:15                               ` Linus Torvalds
2022-04-22  0:12                                 ` Nicholas Piggin
2022-04-22  2:29                                   ` Edgecombe, Rick P
2022-04-22  2:47                                     ` Linus Torvalds
2022-04-22 16:54                                       ` Edgecombe, Rick P
2022-04-22  3:08                                     ` Nicholas Piggin
2022-04-22  4:31                                       ` Nicholas Piggin
2022-04-22 17:10                                         ` Edgecombe, Rick P
2022-04-22 20:22                                           ` Edgecombe, Rick P
2022-04-22  3:33                                     ` Nicholas Piggin
2022-04-21  9:47                           ` Nicholas Piggin
2022-04-19 21:24                 ` Luis Chamberlain
2022-04-19 23:58                   ` Edgecombe, Rick P
2022-04-20  7:58                   ` Petr Mladek
2022-04-19 18:20               ` Mike Rapoport
2022-04-24 17:43       ` Linus Torvalds
2022-04-25  6:48         ` Song Liu
2022-04-21  3:19     ` Nicholas Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1650530694.evuxjgtju7.astroid@bobo.none \
    --to=npiggin@gmail.com \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dborkman@redhat.com \
    --cc=edumazet@google.com \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mbenes@suse.cz \
    --cc=mcgrof@kernel.org \
    --cc=pmladek@suse.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=song@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.