linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Alex Ghiti <alex@ghiti.fr>
To: Palmer Dabbelt <palmer@dabbelt.com>, benh@kernel.crashing.org
Cc: aou@eecs.berkeley.edu, linux-mm@kvack.org,
	Anup Patel <Anup.Patel@wdc.com>,
	linux-kernel@vger.kernel.org, Atish Patra <Atish.Patra@wdc.com>,
	paulus@samba.org, zong.li@sifive.com,
	Paul Walmsley <paul.walmsley@sifive.com>,
	linux-riscv@lists.infradead.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
Date: Thu, 23 Jul 2020 01:36:45 -0400	[thread overview]
Message-ID: <970adad4-6eec-dffe-ad1c-bf74646229ad@ghiti.fr> (raw)
In-Reply-To: <mhng-4b49d09a-0267-4879-849f-30c24f26e2c3@palmerdabbelt-glaptop1>



Le 7/21/20 à 7:36 PM, Palmer Dabbelt a écrit :
> On Tue, 21 Jul 2020 16:11:02 PDT (-0700), benh@kernel.crashing.org wrote:
>> On Tue, 2020-07-21 at 14:36 -0400, Alex Ghiti wrote:
>>> > > I guess I don't understand why this is necessary at all.
>>> > > Specifically: why
>>> > > can't we just relocate the kernel within the linear map?  That would
>>> > > let the
>>> > > bootloader put the kernel wherever it wants, modulo the physical
>>> > > memory size we
>>> > > support.  We'd need to handle the regions that are coupled to the
>>> > > kernel's
>>> > > execution address, but we could just put them in an explicit memory
>>> > > region
>>> > > which is what we should probably be doing anyway.
>>> >
>>> > Virtual relocation in the linear mapping requires to move the kernel
>>> > physically too. Zong implemented this physical move in its KASLR RFC
>>> > patchset, which is cumbersome since finding an available physical spot
>>> > is harder than just selecting a virtual range in the vmalloc range.
>>> >
>>> > In addition, having the kernel mapping in the linear mapping prevents
>>> > the use of hugepage for the linear mapping resulting in performance 
>>> loss
>>> > (at least for the GB that encompasses the kernel).
>>> >
>>> > Why do you find this "ugly" ? The vmalloc region is just a bunch of
>>> > available virtual addresses to whatever purpose we want, and as 
>>> noted by
>>> > Zong, arm64 uses the same scheme.
>>
>> I don't get it :-)
>>
>> At least on powerpc we move the kernel in the linear mapping and it
>> works fine with huge pages, what is your problem there ? You rely on
>> punching small-page size holes in there ?
> 
> That was my original suggestion, and I'm not actually sure it's 
> invalid.  It
> would mean that both the kernel's physical and virtual addresses are set 
> by the
> bootloader, which may or may not be workable if we want to have an 
> sv48+sv39
> kernel.  My initial approach to sv48+sv39 kernels would be to just throw 
> away
> the sv39 memory on sv48 kernels, which would preserve the linear map but 
> mean
> that there is no single physical address that's accessible for both.  That
> would require some coordination between the bootloader and the kernel as to
> where it should be loaded, but maybe there's a better way to design the 
> linear
> map.  Right now we have a bunch of unwritten rules about where things 
> need to
> be loaded, which is a recipe for disaster.
> 
> We could copy the kernel around, but I'm not sure I really like that 
> idea.  We
> do zero the BSS right now, so it's not like we entirely rely on the 
> bootloader
> to set up the kernel image, but with the hart race boot scheme we have 
> right
> now we'd at least need to leave a stub sitting around.  Maybe we just throw
> away SBI v0.1, though, that's why we called it all legacy in the first 
> place.
> 
> My bigger worry is that anything that involves running the kernel at 
> arbitrary
> virtual addresses means we need a PIC kernel, which means every global 
> symbol
> needs an indirection.  That's probably not so bad for shared libraries, 
> but the
> kernel has a lot of global symbols.  PLT references probably aren't so 
> scary,
> as we have an incoherent instruction cache so the virtual function 
> predictor
> isn't that hard to build, but making all global data accesses GOT-relative
> seems like a disaster for performance.  This fixed-VA thing really just 
> exists
> so we don't have to be full-on PIC.
> 
> In theory I think we could just get away with pretending that medany is 
> PIC,
> which I believe works as long as the data and text offset stays 
> constant, you
> you don't have any symbols between 2GiB and -2GiB (as those may stay fixed,
> even in medany), and you deal with GP accordingly (which should work 
> itself out
> in the current startup code).  We rely on this for some of the early 
> boot code
> (and will soon for kexec), but that's a very controlled code base and we've
> already had some issues.  I'd be much more comfortable adding an explicit
> semi-PIC code model, as I tend to miss something when doing these sorts of
> things and then we could at least add it to the GCC test runs and 
> guarantee it
> actually works.  Not really sure I want to deal with that, though.  It 
> would,
> however, be the only way to get random virtual addresses during kernel
> execution.
> 
>> At least in the old days, there were a number of assumptions that
>> the kernel text/data/bss resides in the linear mapping.
> 
> Ya, it terrified me as well.  Alex says arm64 puts the kernel in the 
> vmalloc
> region, so assuming that's the case it must be possible.  I didn't get that
> from reading the arm64 port (I guess it's no secret that pretty much all 
> I do
> is copy their code)

See https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/mmu.c#L615.

> 
>> If you change that you need to ensure that it's still physically
>> contiguous and you'll have to tweak __va and __pa, which might induce
>> extra overhead.
> 
> I'm operating under the assumption that we don't want to add an 
> additional load
> to virt2phys conversions.  arm64 bends over backwards to avoid the load, 
> and
> I'm assuming they have a reason for doing so.  Of course, if we're PIC then
> maybe performance just doesn't matter, but I'm not sure I want to just 
> give up.
> Distros will probably build the sv48+sv39 kernels as soon as they show 
> up, even
> if there's no sv48 hardware for a while.

  reply	other threads:[~2020-07-23  5:40 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-07  7:59 [PATCH v5 0/4] vmalloc kernel mapping and relocatable kernel Alexandre Ghiti
2020-06-07  7:59 ` [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone Alexandre Ghiti
2020-06-11 21:34   ` Atish Patra
2020-06-12 12:30     ` Alex Ghiti
2020-07-09  5:05   ` Palmer Dabbelt
2020-07-09  8:15     ` Zong Li
2020-07-09 11:11     ` Alex Ghiti
2020-07-21 18:36       ` Alex Ghiti
2020-07-21 19:05         ` Palmer Dabbelt
2020-07-21 23:12           ` Benjamin Herrenschmidt
2020-07-21 23:48             ` Palmer Dabbelt
2020-07-22  2:21               ` Benjamin Herrenschmidt
2020-07-22  4:50                 ` Michael Ellerman
2020-07-22  5:46                   ` Palmer Dabbelt
2020-07-22  9:43           ` Arnd Bergmann
2020-07-22 19:52             ` Palmer Dabbelt
2020-07-22 20:22               ` Arnd Bergmann
2020-07-22 21:05                 ` Atish Patra
2020-07-24  7:20                   ` Arnd Bergmann
2020-07-23  5:32           ` Alex Ghiti
2020-07-21 23:11         ` Benjamin Herrenschmidt
2020-07-21 23:36           ` Palmer Dabbelt
2020-07-23  5:36             ` Alex Ghiti [this message]
2020-07-23  5:21           ` Alex Ghiti
2020-07-23 22:33             ` Benjamin Herrenschmidt
2020-07-24  8:14               ` Arnd Bergmann
2020-06-07  7:59 ` [PATCH v5 2/4] riscv: Introduce CONFIG_RELOCATABLE Alexandre Ghiti
2020-06-10 14:10   ` Jerome Forissier
2020-06-11 19:43     ` Alex Ghiti
2020-06-07  7:59 ` [PATCH v5 3/4] powerpc: Move script to check relocations at compile time in scripts/ Alexandre Ghiti
2020-06-07  7:59 ` [PATCH v5 4/4] riscv: Check relocations at compile time Alexandre Ghiti
2020-07-08  4:21 ` [PATCH v5 0/4] vmalloc kernel mapping and relocatable kernel Alex Ghiti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=970adad4-6eec-dffe-ad1c-bf74646229ad@ghiti.fr \
    --to=alex@ghiti.fr \
    --cc=Anup.Patel@wdc.com \
    --cc=Atish.Patra@wdc.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=benh@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=paulus@samba.org \
    --cc=zong.li@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).