Re: Error on loading some network Kernel modules

From: Zong Li <zongbox@gmail.com>
To: Alex Ghiti <alex@ghiti.fr>
Cc: Paul Walmsley <paul@pwsan.com>,
	David Abdurachmanov <david.abdurachmanov@gmail.com>,
	Anup Patel <anup@brainfault.org>,
	Carlos Eduardo de Paula <me@carlosedp.com>,
	Vincent Chen <vincent.chen@sifive.com>,
	Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>,
	Zong Li <zong.li@sifive.com>,
	linux-riscv <linux-riscv@lists.infradead.org>,
	Aurelien Jarno <aurelien@aurel32.net>
Subject: Re: Error on loading some network Kernel modules
Date: Tue, 4 Feb 2020 15:19:32 +0800	[thread overview]
Message-ID: <CA+ZOyahEPtuNqgSUHVcZpZp3WP3oX4jFOiqJvO827ye4+1DT8Q@mail.gmail.com> (raw)
In-Reply-To: <68bb87cb-50d7-5e85-37f4-ad2cc44865f1@ghiti.fr>

Alex Ghiti <alex@ghiti.fr> 於 2020年2月4日 週二 下午2:50寫道：
>
> Hi Zong,
>
> On 2/3/20 10:55 PM, Zong Li wrote:
> > Vincent Chen <vincent.chen@sifive.com> 於 2020年2月3日 週一 下午6:04寫道：
> >> On Mon, Feb 3, 2020 at 12:27 AM Romain Dolbeau
> >> <romain.dolbeau@european-processor-initiative.eu> wrote:
> >>> On 2020-02-01 14:59, Alex Ghiti wrote:
> >>>> Why restrict to 128M whereas we have 2GB offset available to the end of
> >>>> the kernel ?
> >>> Isn't that 2 GiB offset to whatever the module requires in the kernel,
> >>> rather than to the end of the kernel space?
> >>>
> >>> Is there some guarantee that symbols accessible by modules are at the
> >>> end of the kernel? If so, wouldn't the maximum offset for this patch
> >>> still be (2 GiB - <total size of accessible symbols>)?
> >>>
> >>> Cordially,
> >>>
> >>> --
> >>> Romain Dolbeau
> >> It took me some time to find the root cause of this problem, please
> >> allow me to share some observations before the discussion.
> >> The root cause of this issue is that the percpu data is declared as a
> >> static variable. The "static" attribute will make the compiler think
> >> that this symbol is close to the .text section at runtime. Hence, the
> >> compiler uses "auipc" to access this percpu data instead of using GOT.
> >> In this case,  the access range is limited to + -2G. However, in
> >> practice, these percpu data are placed at a pre-allocated region
> >> created by the memblock_allocate() function. In other words, the
> >> distance between the pre-allocated region (>PAGE_OFFSET ) and the
> >> .text section of the kernel module (in VMALLOC region) is much larger
> >> than 2G.
> >> I agree that the original patch,
> >> https://github.com/bjoto/linux/commit/8a56d1c8e8e91c1bc3893946d52b9217c96e1589,
> >> can solve most cases. However, I do not think the patch still works if
> >> the kernel module region is determined by _end or <total size of
> >> accessible symbols>. The reason is that the pre-allocated region for
> >> module percpu data comes from the memblock function at runtime. Hence,
> >> we cannot know the actual address of this region at compile-time, and
> >> this issue probably may occur again in this case.
> >>
> >> By the way, I think maybe we can refer to the implementation of MIPS.
> >> 1. For general cases, we can use this patch to solve this issue.
> >> 2. For a large kernel image (>2G) or enabling the KASLR feature, we
> >> may need a new code mode to deal with this issue.
> >>
> > The range [&_end - 2G, VMALLOC_END] won't be suitable when KASLR is
> > enabled. Unless we limit the randomized range in small area, the
> > module region start address will be bigger than VMALLOC_END.
>
>
> Actually, the relocatable patch I proposed already moves "everything" up
> at the same
> time: the kernel itself but also all the "zones" below (vmalloc,
> vmemmap, fixup...etc)
> since all those zones are defined from PAGE_OFFSET that is now dynamic.
> So the modules
> ill remain at the same offset to the kernel, unless explicitly
> randomized in the vmalloc zone.
>

OK, it makes sense. The module region moves along with kernel seems to
stay away from the concern I mentioned.

So now, the problem is that the pre-allocated region of percpu data is
located after _end symbol as Vincent mentioned, the 2G distance seems
to be too far for module region start address. (i.e. &_end - 2G).

>
> > So if we
> > divide a region into module use, we also have to provide module
> > randomization at the same time when KASLR is enabled. It good to me if
> > there is a new code model to use GOT for static variable, everything
> > will be easy.
>
>
> GOT usage seems indeed to be easier, but it seems a bit overkill to me:
> arm64 already
> uses the range I proposed for modules (arch/arm64/kernel/kaslr.c,
> module_alloc_base
> definition).
>
> Anyway, even if GOT is chosen, it won't be anytime soon right ? So we
> could, for the moment,
> constraint the modules to be close to the kernel with any range you like
> and that will still
> work with our implementation of KASLR.

Yes, I'm with you on that, we need a resolution in kernel first.

>
> It seems urgent to fix those modules loading issues for 5.6.
>
> Alex
>