linux-riscv.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Chen <vincent.chen@sifive.com>
To: Alexandre Ghiti <alex@ghiti.fr>
Cc: Carlos Eduardo de Paula <me@carlosedp.com>,
	David Abdurachmanov <david.abdurachmanov@gmail.com>,
	Anup Patel <anup@brainfault.org>, Paul Walmsley <paul@pwsan.com>,
	Zong Li <zongbox@gmail.com>,
	Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>,
	Zong Li <zong.li@sifive.com>,
	linux-riscv <linux-riscv@lists.infradead.org>,
	Aurelien Jarno <aurelien@aurel32.net>
Subject: Re: Error on loading some network Kernel modules
Date: Tue, 4 Feb 2020 18:46:15 +0800	[thread overview]
Message-ID: <CABvJ_xiBVQjfJfZU0Dfp0fc9n_zAoc=DSHPFuDMKu4k5n0qJtQ@mail.gmail.com> (raw)
In-Reply-To: <c12ed63e-c717-9fa0-7a6c-74d6d4a83a04@ghiti.fr>

On Tue, Feb 4, 2020 at 5:32 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
>
> On 2/4/20 8:19 AM, Zong Li wrote:
> > Alex Ghiti <alex@ghiti.fr> 於 2020年2月4日 週二 下午2:50寫道:
> >> Hi Zong,
> >>
> >> On 2/3/20 10:55 PM, Zong Li wrote:
> >>> Vincent Chen <vincent.chen@sifive.com> 於 2020年2月3日 週一 下午6:04寫道:
> >>>> On Mon, Feb 3, 2020 at 12:27 AM Romain Dolbeau
> >>>> <romain.dolbeau@european-processor-initiative.eu> wrote:
> >>>>> On 2020-02-01 14:59, Alex Ghiti wrote:
> >>>>>> Why restrict to 128M whereas we have 2GB offset available to the end of
> >>>>>> the kernel ?
> >>>>> Isn't that 2 GiB offset to whatever the module requires in the kernel,
> >>>>> rather than to the end of the kernel space?
> >>>>>
> >>>>> Is there some guarantee that symbols accessible by modules are at the
> >>>>> end of the kernel? If so, wouldn't the maximum offset for this patch
> >>>>> still be (2 GiB - <total size of accessible symbols>)?
> >>>>>
> >>>>> Cordially,
> >>>>>
> >>>>> --
> >>>>> Romain Dolbeau
> >>>> It took me some time to find the root cause of this problem, please
> >>>> allow me to share some observations before the discussion.
> >>>> The root cause of this issue is that the percpu data is declared as a
> >>>> static variable. The "static" attribute will make the compiler think
> >>>> that this symbol is close to the .text section at runtime. Hence, the
> >>>> compiler uses "auipc" to access this percpu data instead of using GOT.
> >>>> In this case,  the access range is limited to + -2G. However, in
> >>>> practice, these percpu data are placed at a pre-allocated region
> >>>> created by the memblock_allocate() function. In other words, the
> >>>> distance between the pre-allocated region (>PAGE_OFFSET ) and the
> >>>> .text section of the kernel module (in VMALLOC region) is much larger
> >>>> than 2G.
> >>>> I agree that the original patch,
> >>>> https://github.com/bjoto/linux/commit/8a56d1c8e8e91c1bc3893946d52b9217c96e1589,
> >>>> can solve most cases. However, I do not think the patch still works if
> >>>> the kernel module region is determined by _end or <total size of
> >>>> accessible symbols>. The reason is that the pre-allocated region for
> >>>> module percpu data comes from the memblock function at runtime. Hence,
> >>>> we cannot know the actual address of this region at compile-time, and
> >>>> this issue probably may occur again in this case.
> >>>>
> >>>> By the way, I think maybe we can refer to the implementation of MIPS.
> >>>> 1. For general cases, we can use this patch to solve this issue.
> >>>> 2. For a large kernel image (>2G) or enabling the KASLR feature, we
> >>>> may need a new code mode to deal with this issue.
> >>>>
> >>> The range [&_end - 2G, VMALLOC_END] won't be suitable when KASLR is
> >>> enabled. Unless we limit the randomized range in small area, the
> >>> module region start address will be bigger than VMALLOC_END.
> >>
> >> Actually, the relocatable patch I proposed already moves "everything" up
> >> at the same
> >> time: the kernel itself but also all the "zones" below (vmalloc,
> >> vmemmap, fixup...etc)
> >> since all those zones are defined from PAGE_OFFSET that is now dynamic.
> >> So the modules
> >> ill remain at the same offset to the kernel, unless explicitly
> >> randomized in the vmalloc zone.
> >>
> > OK, it makes sense. The module region moves along with kernel seems to
> > stay away from the concern I mentioned.
> >
> > So now, the problem is that the pre-allocated region of percpu data is
> > located after _end symbol as Vincent mentioned, the 2G distance seems
> > to be too far for module region start address. (i.e. &_end - 2G).
>
>
> Actually, I don't understand this issue: we are limited to addressing
> symbols within +/- 2GB
> from PC. So as long as the percpu symbol exists in the kernel, it is
> below _end and then we don't
> care that its content is initialized dynamically, as long as we can
> 'compute' its address from PC,
> right ?

In this case, the static percpu symbols of this issue are declared in
the kernel module, not in the kernel image.
When kernel loads the kernel module, in general, it continuously
places all the module sections in the VMALLOC memory area. However,
the ".data..percpu" section is an exception. The kernel places the
".data..percpu" section in a pre-allocated memory region. This region
is used to place the percpu data instances for each CPU and is created
by the memblock(). Hence, the instance of these per-cpu symbols is
above the _end.  In this case, if we make the module region locate at
[_end-2G, PAGE_OFFSET], the distance between these percpu symbols (its
address >_end) and module text section will probably exceed the 2G
limitation.

>
> Can you point out where I can take a look at this pre-allocated region
> for percpu data please ?
>
In mm/percpu.c, you can find how the kernel allocates the percpu data
region during the initialization phase.
In kernel/module.c's simplify_symbols function, you can find kernel
treats .data.percpu section as an exception when loading module.

If I have any misunderstandings, please let me know. Thanks

Best regards,
Vincent

> Thanks,
>
> Alex
>
>
> >
> >>> So if we
> >>> divide a region into module use, we also have to provide module
> >>> randomization at the same time when KASLR is enabled. It good to me if
> >>> there is a new code model to use GOT for static variable, everything
> >>> will be easy.
> >>
> >> GOT usage seems indeed to be easier, but it seems a bit overkill to me:
> >> arm64 already
> >> uses the range I proposed for modules (arch/arm64/kernel/kaslr.c,
> >> module_alloc_base
> >> definition).
> >>
> >> Anyway, even if GOT is chosen, it won't be anytime soon right ? So we
> >> could, for the moment,
> >> constraint the modules to be close to the kernel with any range you like
> >> and that will still
> >> work with our implementation of KASLR.
> > Yes, I'm with you on that, we need a resolution in kernel first.
> >
> >> It seems urgent to fix those modules loading issues for 5.6.
> >>
> >> Alex
> >>


  reply	other threads:[~2020-02-04 10:46 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-28 19:22 Error on loading some network Kernel modules Carlos Eduardo de Paula
2020-01-30  3:00 ` Paul Walmsley
2020-01-30 16:20   ` David Abdurachmanov
2020-01-31 14:12     ` Carlos Eduardo de Paula
2020-01-31 20:11     ` Aurelien Jarno
2020-02-01  7:52       ` Anup Patel
2020-02-01 13:59         ` Alex Ghiti
2020-02-02 14:10           ` Anup Patel
2020-02-02 15:21             ` Troy Benjegerdes
2020-02-02 16:27           ` Romain Dolbeau
2020-02-03 10:04             ` Vincent Chen
2020-02-04  3:55               ` Zong Li
2020-02-04  6:50                 ` Alex Ghiti
2020-02-04  7:19                   ` Zong Li
2020-02-04  9:32                     ` Alexandre Ghiti
2020-02-04 10:46                       ` Vincent Chen [this message]
2020-02-04 11:30                         ` Anup Patel
2020-02-04 14:03                           ` Vincent Chen
2020-02-04 19:10                             ` Alex Ghiti
     [not found]                             ` <a55f265e-71b2-5ebb-b079-6345007a442e@ghiti.fr>
2020-02-05  3:22                               ` Vincent Chen
2020-02-05  4:24                                 ` Anup Patel
2020-02-05 10:37                                   ` Vincent Chen
2020-02-07 14:39                                     ` Vincent Chen
2020-02-07 14:51                                       ` Vincent Chen
2020-02-10  6:37                                         ` Alex Ghiti
2020-02-10  9:53                                           ` Vincent Chen
2020-02-19  6:46                                             ` Alex Ghiti
2020-02-19  7:30                                               ` Vincent Chen
2020-02-04 17:48                           ` Romain Dolbeau
2020-02-03 20:57             ` Alex Ghiti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABvJ_xiBVQjfJfZU0Dfp0fc9n_zAoc=DSHPFuDMKu4k5n0qJtQ@mail.gmail.com' \
    --to=vincent.chen@sifive.com \
    --cc=alex@ghiti.fr \
    --cc=anup@brainfault.org \
    --cc=aurelien@aurel32.net \
    --cc=david.abdurachmanov@gmail.com \
    --cc=linux-riscv@lists.infradead.org \
    --cc=me@carlosedp.com \
    --cc=paul@pwsan.com \
    --cc=romain.dolbeau@european-processor-initiative.eu \
    --cc=zong.li@sifive.com \
    --cc=zongbox@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).