Re: Error on loading some network Kernel modules

From: Alex Ghiti <alex@ghiti.fr>
To: Vincent Chen <vincent.chen@sifive.com>, Anup Patel <Anup.Patel@wdc.com>
Cc: Carlos Eduardo de Paula <me@carlosedp.com>,
	David Abdurachmanov <david.abdurachmanov@gmail.com>,
	Anup Patel <anup@brainfault.org>, Paul Walmsley <paul@pwsan.com>,
	Zong Li <zongbox@gmail.com>,
	Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>,
	Zong Li <zong.li@sifive.com>,
	linux-riscv <linux-riscv@lists.infradead.org>,
	Aurelien Jarno <aurelien@aurel32.net>
Subject: Re: Error on loading some network Kernel modules
Date: Mon, 10 Feb 2020 01:37:20 -0500	[thread overview]
Message-ID: <2b69bae3-b00f-a991-16f6-8f682faeddfe@ghiti.fr> (raw)
In-Reply-To: <CABvJ_xiqAmdqUjZ9s8w9E3BYU8ruFnACpiGZMH-Vc8nCKiwUcA@mail.gmail.com>

Hi Vincent,

On 2/7/20 9:51 AM, Vincent Chen wrote:
> On Fri, Feb 7, 2020 at 10:39 PM Vincent Chen <vincent.chen@sifive.com> wrote:
>>
>>>> For point1, we don't need a dedicated region as long as we are allocating
>>>> modules from VMALLOC area. Let's avoid another virtual memory region if
>>>> possible.
>>>>
>>>
>>> Your comments inspire me if we can know the end of the percpu data
>>> region, we can dynamically calculate the start address in each module
>>> allocation by "<end of the percpu data region> - 2GB" if needed. I am
>>> finding a way to derive the <end of the percpu data region>.
>>>
>>
>> In the implementation of the idea, I found that I did have some
>> misunderstandings about the mechanism of accessing static percpu
>> symbol. Currently, I think this issue can be solved by modifying the
>> module_allocate() to the following.
>> #define MODULE_START max(PFN_ALIGN(&_end - 2 *
>> SZ_1G), PFN_ALIGN(&_text + 8 * SZ_1K - 2 * SZ_1G))
>> void *module_alloc(unsigned long size)
>> {
>> return __vmalloc_node_range(size, 1, MODULE_START,
>> VMALLOC_END, GFP_KERNEL,
>> PAGE_KERNEL_EXEC, 0,
>> NUMA_NO_NODE,
>> __builtin_return_address(0));
>> }
>> After applying a similar modification above, the kernel can
>> successfully insert the test module provided by Romain. Any feedback
>> is welcome. If someone is curious about the reason, I roughly
>> described it at the end of this mail. I am sorry if someone was misled
>> by early discussion.
>>
>>
>> As mentioned early, kernel pre-allocates a memory region for each CPU
>> to place the instances of the percpu data. According to the
>> declaration type, each memory region can be divided into three chunks.
>> These three chunks from low memory to high memory are static chunk,
>> reserved chunk, and dynamic chunk.
>> a. The static chunk is used for the static percpu symbols declared in the kernel
>> b. The reserved chunk is used for the static percpu symbols declared
>> in the kernel module
>> c. The dynamic chunk is used for all dynamic percpu symbols.
>>
>> The order of the percpu symbols in each memory region is the same and
>> is recorded in a symbol list maintained by the kernel. Therefore, the
>> address of a specific CPU's percpu symbol is "The address of this
>> percpu symbol in the symbol list" + "The offset from symbols list to
>> the specific CPU region".
>> a. The symbol list is created based on the ".data..percpu" section. In
>> other words, the start address of this symbol list is the start
>> address of ".data..percpu". When a dynamic percpu symbol is created or
>> a kernel module import a static percpu symbol, the kernel will follow
>> the above chunk rule to add this percpu symbol to its suitable region.
>> b. When each memory area was created, "The offset from symbols list to
>> the specific CPU region" has been recorded.
>>
>> Back to the issue, according to the rules above, the kernel will
>> relocate the ".data..percpu" section of the loaded module to the end
>> of the ".data..percpu" section of the kernel image. The end of the
>> ".data..percpu" section of the kernel image almost equals _text, and
>> the current size of the reserved chunk is 8KB. Therefore, the
>> appropriate range to place the loaded module is [max(PFN_ALIGN(&_end -
>> 2 *SZ_1G), PFN_ALIGN(&_text + 8 * SZ_1K - 2 * SZ_1G)),  VMALLOC_END].
> 
> I did not notice that the size of the kernel .text section is
> impossible below 2 PAGES. Therefore, the modification of module_alloc
> function can be reduced to the following.
> void *module_alloc(unsigned long size)
>   {
>   return __vmalloc_node_range(size, 1, PFN_ALIGN(&_end - 2 *SZ_1G),
>   VMALLOC_END, GFP_KERNEL,
>   PAGE_KERNEL_EXEC, 0,
>   NUMA_NO_NODE,
>   __builtin_return_address(0));
>   }
> It is the same as Alex's suggestion. Is Alex willing to send this
> patch to resolve this bug?
> 

You did all the work, please send a patch explaining what you learnt and 
feel free to add a Suggested-By. Anyway, I'll add a Reviewed-by when you 
propose something.

Thank you Vincent for all your research,

Alex