From: Alex Ghiti <alex@ghiti.fr>
To: Palmer Dabbelt <palmer@dabbelt.com>, benh@kernel.crashing.org
Cc: aou@eecs.berkeley.edu, linux-mm@kvack.org,
Anup Patel <Anup.Patel@wdc.com>,
linux-kernel@vger.kernel.org, Atish Patra <Atish.Patra@wdc.com>,
paulus@samba.org, zong.li@sifive.com,
Paul Walmsley <paul.walmsley@sifive.com>,
linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
Date: Thu, 23 Jul 2020 01:36:45 -0400
Message-ID: <970adad4-6eec-dffe-ad1c-bf74646229ad@ghiti.fr>
In-Reply-To: <mhng-4b49d09a-0267-4879-849f-30c24f26e2c3@palmerdabbelt-glaptop1>
On 7/21/20 at 7:36 PM, Palmer Dabbelt wrote:
> On Tue, 21 Jul 2020 16:11:02 PDT (-0700), benh@kernel.crashing.org wrote:
>> On Tue, 2020-07-21 at 14:36 -0400, Alex Ghiti wrote:
>>> > > I guess I don't understand why this is necessary at all.
>>> > > Specifically: why can't we just relocate the kernel within the
>>> > > linear map? That would let the bootloader put the kernel wherever
>>> > > it wants, modulo the physical memory size we support. We'd need to
>>> > > handle the regions that are coupled to the kernel's execution
>>> > > address, but we could just put them in an explicit memory region
>>> > > which is what we should probably be doing anyway.
>>> >
>>> > Virtual relocation in the linear mapping requires moving the kernel
>>> > physically too. Zong implemented this physical move in his KASLR RFC
>>> > patchset, which is cumbersome since finding an available physical spot
>>> > is harder than just selecting a virtual range in the vmalloc range.
>>> >
>>> > In addition, having the kernel mapping in the linear mapping prevents
>>> > the use of hugepages for the linear mapping, resulting in a performance
>>> > loss (at least for the GB that encompasses the kernel).
>>> >
>>> > Why do you find this "ugly"? The vmalloc region is just a bunch of
>>> > available virtual addresses for whatever purpose we want, and as
>>> > noted by Zong, arm64 uses the same scheme.
>>
>> I don't get it :-)
>>
>> At least on powerpc we move the kernel in the linear mapping and it
>> works fine with huge pages, what is your problem there? Do you rely on
>> punching small-page-size holes in there?
>
> That was my original suggestion, and I'm not actually sure it's invalid.
> It would mean that both the kernel's physical and virtual addresses are
> set by the bootloader, which may or may not be workable if we want to
> have an sv48+sv39 kernel. My initial approach to sv48+sv39 kernels would
> be to just throw away the sv39 memory on sv48 kernels, which would
> preserve the linear map but mean that there is no single physical address
> that's accessible for both. That would require some coordination between
> the bootloader and the kernel as to where it should be loaded, but maybe
> there's a better way to design the linear map. Right now we have a bunch
> of unwritten rules about where things need to be loaded, which is a
> recipe for disaster.
>
> We could copy the kernel around, but I'm not sure I really like that
> idea. We do zero the BSS right now, so it's not like we entirely rely on
> the bootloader to set up the kernel image, but with the hart race boot
> scheme we have right now we'd at least need to leave a stub sitting
> around. Maybe we just throw away SBI v0.1, though, that's why we called
> it all legacy in the first place.
>
> My bigger worry is that anything that involves running the kernel at
> arbitrary virtual addresses means we need a PIC kernel, which means every
> global symbol needs an indirection. That's probably not so bad for shared
> libraries, but the kernel has a lot of global symbols. PLT references
> probably aren't so scary, as we have an incoherent instruction cache so
> the virtual function predictor isn't that hard to build, but making all
> global data accesses GOT-relative seems like a disaster for performance.
> This fixed-VA thing really just exists so we don't have to be full-on PIC.
>
> In theory I think we could just get away with pretending that medany is
> PIC, which I believe works as long as the data and text offset stays
> constant, you don't have any symbols between 2GiB and -2GiB (as those may
> stay fixed, even in medany), and you deal with GP accordingly (which
> should work itself out in the current startup code). We rely on this for
> some of the early boot code (and will soon for kexec), but that's a very
> controlled code base and we've already had some issues. I'd be much more
> comfortable adding an explicit semi-PIC code model, as I tend to miss
> something when doing these sorts of things and then we could at least add
> it to the GCC test runs and guarantee it actually works. Not really sure
> I want to deal with that, though. It would, however, be the only way to
> get random virtual addresses during kernel execution.
>
>> At least in the old days, there were a number of assumptions that
>> the kernel text/data/bss resides in the linear mapping.
>
> Ya, it terrified me as well. Alex says arm64 puts the kernel in the
> vmalloc region, so assuming that's the case it must be possible. I didn't
> get that from reading the arm64 port (I guess it's no secret that pretty
> much all I do is copy their code)
See https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/mmu.c#L615.
>
>> If you change that you need to ensure that it's still physically
>> contiguous and you'll have to tweak __va and __pa, which might induce
>> extra overhead.
>
> I'm operating under the assumption that we don't want to add an
> additional load to virt2phys conversions. arm64 bends over backwards to
> avoid the load, and I'm assuming they have a reason for doing so. Of
> course, if we're PIC then maybe performance just doesn't matter, but I'm
> not sure I want to just give up. Distros will probably build the
> sv48+sv39 kernels as soon as they show up, even if there's no sv48
> hardware for a while.
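
To make the __va/__pa concern above concrete, here is a minimal sketch of
what a virt-to-phys conversion might look like once the kernel image is
mapped outside the linear map. It is not taken from the patch: the address
constants, the sketch_* names and the two-offset scheme are assumptions
chosen only for illustration. The extra range check and the offsets held in
variables are the kind of added cost being discussed.

```c
#include <assert.h>
#include <stdint.h>

/* All constants are hypothetical values picked for this illustration. */
#define PAGE_OFFSET      0xffffffe000000000ULL /* start of the linear map */
#define KERNEL_LINK_ADDR 0xffffffff80000000ULL /* kernel image mapping */
#define KERNEL_SIZE      0x0000000002000000ULL /* 32 MiB image window */

/* Offsets computed at boot; reading them is a memory load on each use. */
static uint64_t va_pa_offset;        /* PAGE_OFFSET - physical RAM base */
static uint64_t va_kernel_pa_offset; /* KERNEL_LINK_ADDR - load address */

/* Record where RAM starts and where the bootloader placed the kernel. */
static void sketch_init(uint64_t ram_base, uint64_t kernel_load)
{
    assert(ram_base <= kernel_load);
    va_pa_offset = PAGE_OFFSET - ram_base;
    va_kernel_pa_offset = KERNEL_LINK_ADDR - kernel_load;
}

/* Virtual to physical: the kernel image window uses its own offset. */
static uint64_t sketch_pa(uint64_t va)
{
    if (va >= KERNEL_LINK_ADDR && va < KERNEL_LINK_ADDR + KERNEL_SIZE)
        return va - va_kernel_pa_offset;
    return va - va_pa_offset;
}

/* Physical to virtual: always returns the linear-map alias. */
static uint64_t sketch_va(uint64_t pa)
{
    return pa + va_pa_offset;
}
```

With this layout the kernel's physical pages have two virtual aliases (one
in the linear map, one in the image window), so sketch_pa() has to accept
both while sketch_va() only ever produces the linear-map one. If the two
offsets were compile-time constants, the loads would disappear but the
range check would remain.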