linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Song Liu <songliubraving@fb.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 2/2] Revert "vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"
Date: Fri, 22 Apr 2022 10:08:00 -0700	[thread overview]
Message-ID: <CAHk-=wgFoWLCV9aPHkHe1Mpu0XqxYWaPkKLpe_hcsTS_Vx3aRA@mail.gmail.com> (raw)
In-Reply-To: <20220422060107.781512-3-npiggin@gmail.com>

On Thu, Apr 21, 2022 at 11:01 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> This reverts commit 559089e0a93d44280ec3ab478830af319c56dbe3
>
> The previous commit fixes huge vmalloc for drivers that use the
> vmalloc_to_page() struct pages.

Yeah, no.

The very revert shows the problem:

> --- a/arch/powerpc/kernel/module.c
> +++ b/arch/powerpc/kernel/module.c
> @@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
>          * too.
>          */
>         return __vmalloc_node_range(size, 1, start, end, gfp, prot,
> -                                   VM_FLUSH_RESET_PERMS,
> +                                   VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
>                                     NUMA_NO_NODE, __builtin_return_address(0));

This VM_NO_HUGE_VMAP is a sign of the fact that using hugepages for
mapping still isn't a transparent operation.

Now, in some cases that would be perfectly fine, ie the s390 case has
a nice clear comment about how it's a very special case:

> +       /*
> +        * The Create Secure Configuration Ultravisor Call does not support
> +        * using large pages for the virtual memory area.
> +        * This is a hardware limitation.
> +        */
> +       kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);

but as long as it is "anything that plays permission games with the
mapping is broken" we are not reverting that opt-in thing.

And no, it's not just that powerpc module code that is somehow magical.

This is the exact same issue that the bpf people hit.

It's also elsewhere, although it might well be hidden by "small
allocations will never trigger this" (eg the arm64 kprobes case only
does a single page).

I also wonder how this affects any use of 'set_memory_xyz()' with
partial mappings (I can point to "frob_text()" and friends for
modules, but I can easily imagine drivers doing odd things).

In particular, x86 does support pmd splitting for pmd's in
set_memory_xyz(), but I *really* couldn't tell you that it's ok with a
largepage that has already had its page counts split.

It only used to hit the big IO mappings traditionally.

Now I *think* it JustWorks(tm) - I don't actually see any obvious
problems there - and I also really hope that nobody actually even does
that "partial set_memory" on some vmalloc allocation in the first
place, but no, that kind of "let's hope" is not ok.

And we already know it happens at least for modules.

And no, don't even start about that "it's x86".  It *still* isn't
about x86 as shown by this very patch. The issue is generic, and x86
just tends to hit more odd cases and drivers.

In fact, I think x86 probably does *better* than powerpc.

Because it looks like 'set_memory_xyz()' just returns an error for
vmalloc addresses on powerpc. Sounds strange. Doesn't powerpc do
STRICT_MODULE_RWX? Does it work only because 'frob_text()' doesn't
actually check the return value?

Or maybe set_memory_xyz() is ok and it is *only* VM_FLUSH_RESET_PERMS
that doesn't work? I don't know.

But I do know bpf was affected, and I'm looking at that module thing,
and so I suspect it's elsewhere too.

Just opt-in with the mappings that matter.

                   Linus

  reply	other threads:[~2022-04-22 17:08 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-22  6:01 [PATCH 0/2] Request to test fix for "x86/Kconfig: select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP)" Nicholas Piggin
2022-04-22  6:01 ` [PATCH 1/2] mm/vmalloc: huge vmalloc backing pages should be split rather than compound Nicholas Piggin
2022-04-22 16:23   ` Linus Torvalds
2022-04-22  6:01 ` [PATCH 2/2] Revert "vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP" Nicholas Piggin
2022-04-22 17:08   ` Linus Torvalds [this message]
2022-04-22 18:41     ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgFoWLCV9aPHkHe1Mpu0XqxYWaPkKLpe_hcsTS_Vx3aRA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@gmail.com \
    --cc=pmenzel@molgen.mpg.de \
    --cc=rick.p.edgecombe@intel.com \
    --cc=songliubraving@fb.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).