linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* powerpc: Set crashkernel offset to mid of RMA region
@ 2022-02-04  8:56 Sourabh Jain
  2022-02-16 12:25 ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: Sourabh Jain @ 2022-02-04  8:56 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, hbathini, Abdul haleem

On large config LPARs (having 192 and more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.

The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel get some memory below the RMA region which
is used to be of size 256MB. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for the kernel to allocate memory for other system
resources.

Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.

This patch is dependent on changes to paca allocation for boot CPU. It
expect boot CPU to discover 1T segment support which is introduced by
the patch posted here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/rtas.c |  6 ++++++
 arch/powerpc/kexec/core.c  | 15 +++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)
 
 ---
Chnages in v4:
	- fix build issue for 32-bit.

Changes in v3:
	https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239371.html
 ---

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 733e6ef36758..1f42aabbbab3 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1313,6 +1313,12 @@ int __init early_init_dt_scan_rtas(unsigned long node,
 	entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
 	sizep  = of_get_flat_dt_prop(node, "rtas-size", NULL);
 
+#ifdef CONFIG_PPC64
+	/* need this feature to decide the crashkernel offset */
+	if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
+		powerpc_firmware_features |= FW_FEATURE_LPAR;
+#endif
+
 	if (basep && entryp && sizep) {
 		rtas.base = *basep;
 		rtas.entry = *entryp;
diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
index 8b68d9f91a03..abf5897ae88c 100644
--- a/arch/powerpc/kexec/core.c
+++ b/arch/powerpc/kexec/core.c
@@ -134,11 +134,18 @@ void __init reserve_crashkernel(void)
 	if (!crashk_res.start) {
 #ifdef CONFIG_PPC64
 		/*
-		 * On 64bit we split the RMO in half but cap it at half of
-		 * a small SLB (128MB) since the crash kernel needs to place
-		 * itself and some stacks to be in the first segment.
+		 * On the LPAR platform place the crash kernel to mid of
+		 * RMA size (512MB or more) to ensure the crash kernel
+		 * gets enough space to place itself and some stack to be
+		 * in the first segment. At the same time normal kernel
+		 * also get enough space to allocate memory for essential
+		 * system resource in the first segment. Keep the crash
+		 * kernel starts at 128MB offset on other platforms.
 		 */
-		crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
+		if (firmware_has_feature(FW_FEATURE_LPAR))
+			crashk_res.start = ppc64_rma_size / 2;
+		else
+			crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
 #else
 		crashk_res.start = KDUMP_KERNELBASE;
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: powerpc: Set crashkernel offset to mid of RMA region
  2022-02-04  8:56 powerpc: Set crashkernel offset to mid of RMA region Sourabh Jain
@ 2022-02-16 12:25 ` Michael Ellerman
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2022-02-16 12:25 UTC (permalink / raw)
  To: mpe, Sourabh Jain, linuxppc-dev; +Cc: Abdul haleem, hbathini, mahesh

On Fri, 4 Feb 2022 14:26:01 +0530, Sourabh Jain wrote:
> On large config LPARs (having 192 and more cores), Linux fails to boot
> due to insufficient memory in the first memblock. It is due to the
> memory reservation for the crash kernel which starts at 128MB offset of
> the first memblock. This memory reservation for the crash kernel doesn't
> leave enough space in the first memblock to accommodate other essential
> system resources.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Set crashkernel offset to mid of RMA region
      https://git.kernel.org/powerpc/c/7c5ed82b800d8615cdda00729e7b62e5899f0b13

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: powerpc: Set crashkernel offset to mid of RMA region
  2022-02-03 11:07     ` Michael Ellerman
@ 2022-02-04  9:14       ` Sourabh Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Sourabh Jain @ 2022-02-04  9:14 UTC (permalink / raw)
  To: linuxppc-dev


On 03/02/22 16:37, Michael Ellerman wrote:
> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>> On 01/02/22 17:14, Michael Ellerman wrote:
>>> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>>>> On large config LPARs (having 192 and more cores), Linux fails to boot
>>>> due to insufficient memory in the first memblock. It is due to the
>>>> memory reservation for the crash kernel which starts at 128MB offset of
>>>> the first memblock. This memory reservation for the crash kernel doesn't
>>>> leave enough space in the first memblock to accommodate other essential
>>>> system resources.
>>>>
>>>> The crash kernel start address was set to 128MB offset by default to
>>>> ensure that the crash kernel get some memory below the RMA region which
>>>> is used to be of size 256MB. But given that the RMA region size can be
>>>> 512MB or more, setting the crash kernel offset to mid of RMA size will
>>>> leave enough space for kernel to allocate memory for other system
>>>> resources.
>>>>
>>>> Since the above crash kernel offset change is only applicable to the LPAR
>>>> platform, the LPAR feature detection is pushed before the crash kernel
>>>> reservation. The rest of LPAR specific initialization will still
>>>> be done during pseries_probe_fw_features as usual.
>>>>
>>>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>>>> Reported-and-tested-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
>>>>
>>>> ---
>>>>    arch/powerpc/kernel/rtas.c |  4 ++++
>>>>    arch/powerpc/kexec/core.c  | 15 +++++++++++----
>>>>    2 files changed, 15 insertions(+), 4 deletions(-)
>>>>
>>>>    ---
>>>>    Change in v3:
>>>> 	Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
>>>> 	series [1] try to discover 1T segment MMU feature support
>>>> 	BEFORE boot CPU paca allocation ([1] describes why it is needed).
>>>> 	MPE has posted a patch [2] that archives a similar objective by moving
>>>> 	boot CPU paca allocation after mmu_early_init_devtree().
>>>>
>>>> NOTE: This patch is dependent on the patch [2].
>>>>
>>>> [1] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/
>>>> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
>>>>    ---
>>>>
>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>>> index 733e6ef36758..06df7464fb57 100644
>>>> --- a/arch/powerpc/kernel/rtas.c
>>>> +++ b/arch/powerpc/kernel/rtas.c
>>>> @@ -1313,6 +1313,10 @@ int __init early_init_dt_scan_rtas(unsigned long node,
>>>>    	entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
>>>>    	sizep  = of_get_flat_dt_prop(node, "rtas-size", NULL);
>>>>    
>>>> +	/* need this feature to decide the crashkernel offset */
>>>> +	if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
>>>> +		powerpc_firmware_features |= FW_FEATURE_LPAR;
>>>> +
>>> As you'd have seen this breaks the 32-bit build. It will need an #ifdef
>>> CONFIG_PPC64 around it.
>>>
>>>>    	if (basep && entryp && sizep) {
>>>>    		rtas.base = *basep;
>>>>    		rtas.entry = *entryp;
>>>> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
>>>> index 8b68d9f91a03..abf5897ae88c 100644
>>>> --- a/arch/powerpc/kexec/core.c
>>>> +++ b/arch/powerpc/kexec/core.c
>>>> @@ -134,11 +134,18 @@ void __init reserve_crashkernel(void)
>>>>    	if (!crashk_res.start) {
>>>>    #ifdef CONFIG_PPC64
>>>>    		/*
>>>> -		 * On 64bit we split the RMO in half but cap it at half of
>>>> -		 * a small SLB (128MB) since the crash kernel needs to place
>>>> -		 * itself and some stacks to be in the first segment.
>>>> +		 * On the LPAR platform place the crash kernel to mid of
>>>> +		 * RMA size (512MB or more) to ensure the crash kernel
>>>> +		 * gets enough space to place itself and some stack to be
>>>> +		 * in the first segment. At the same time normal kernel
>>>> +		 * also get enough space to allocate memory for essential
>>>> +		 * system resource in the first segment. Keep the crash
>>>> +		 * kernel starts at 128MB offset on other platforms.
>>>>    		 */
>>>> -		crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
>>>> +		if (firmware_has_feature(FW_FEATURE_LPAR))
>>>> +			crashk_res.start = ppc64_rma_size / 2;
>>>> +		else
>>>> +			crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
>>> I think this will break on machines using Radix won't it? At this point
>>> in boot ppc64_rma_size will be == 0. Because we won't call into
>>> hash__setup_initial_memory_limit().
>>>
>>> That's not changed by your patch, but seems like this code needs to be
>>> more careful/clever.
>> Interesting, but in my testing, I found that ppc64_rma_size
>> did get initialized before reserve_crashkernel() using radix on LPAR.
>>
>> I am not sure why but hash__setup_initial_memory_limit() function is
>> gets called
>> regardless of radix or hash. Not sure whether it is by design but here
>> is the flow:
> It sort of is by design. See:
>
>    103a8542cb35 ("powerpc/book3s64/radix: Fix boot failure with large amount of guest memory")
>
> Basically the hash restrictions are more strict, so we apply them until
> we know we will use radix.
>
> But ...
>
>> setup_initial_memory_limit()
>>
>>        static inline void setup_initial_memory_limit()
>> (arch/powerpc/include/asm/book3s/64/mmu.h)
>>
>>               if (!early_radix_enabled())  // FALSE regardless of radix is enabled or not
> You mean early_radix_enabled() is False regardless. But that's not true
> in all cases.
>
> We can now build the kernel without hash MMU support at all, see:
>
>    387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU")
>
> In which case early_radix_enabled() will be true here, because it's hard
> coded to be true at build time.

Sorry, my bad I was not aware of that. But when both hash and radix
are enabled the early_radix_enabled return FALSE regardless it is disabled
using kernel command line or not is confusing to me. Maybe it is too early
in the boot sequence...


>
>>                   hash__setup_initial_memory_limit() // initialize ppc64_rma_size
>>
>>        reserve_crashkernel()  // initialize crashkernel offset to mid of RMA size.
>>
>>
>> For the sack of understanding even if we restrict crashkernel offset
>> setting to mid RMA (i.e. ppc64_rma_size/2) for
>> only hash it may not save radix because even today we are assigning
>> crashkernel offset using
>> ppc64_rma_size variable.
> Yes. There's already a bug there, your patch doesn't make it better or worse.
>
>> Is the current flow of initializing ppc64_rma_size variable before
>> reserve_crashkernel() for radix expected?
>>
>> Please provide your input.
> I wonder if we're better off moving the crash kernel reservation later,
> once we've discovered what MMU we're using.
>
> I can't immediately see why that would be a problem, as long as we do
> the reservation before we do any (many?) allocations. I'll have to think
> about it a bit more though, these boot ordering things are always
> subtle.
Agree we have space to improve this piece of code. Let me know
if I can help you to make this better.
> For now I think this patch is OK if you send a v2 to fix the compile
> error
I sent the next version in the mailing list. Thanks for the support.

- Sourabh Jain

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: powerpc: Set crashkernel offset to mid of RMA region
  2022-02-01 11:10 ` Hari Bathini
@ 2022-02-04  9:00   ` Sourabh Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Sourabh Jain @ 2022-02-04  9:00 UTC (permalink / raw)
  To: linuxppc-dev


On 01/02/22 16:40, Hari Bathini wrote:
>
>
> On 28/01/22 3:34 pm, Sourabh Jain wrote:
>> On large config LPARs (having 192 and more cores), Linux fails to boot
>> due to insufficient memory in the first memblock. It is due to the
>> memory reservation for the crash kernel which starts at 128MB offset of
>> the first memblock. This memory reservation for the crash kernel doesn't
>> leave enough space in the first memblock to accommodate other essential
>> system resources.
>>
>> The crash kernel start address was set to 128MB offset by default to
>> ensure that the crash kernel get some memory below the RMA region which
>> is used to be of size 256MB. But given that the RMA region size can be
>> 512MB or more, setting the crash kernel offset to mid of RMA size will
>> leave enough space for kernel to allocate memory for other system
>> resources.
>>
>> Since the above crash kernel offset change is only applicable to the 
>> LPAR
>> platform, the LPAR feature detection is pushed before the crash kernel
>> reservation. The rest of LPAR specific initialization will still
>> be done during pseries_probe_fw_features as usual.
>>
>> Signed-off-by: Sourabh Jain<sourabhjain@linux.ibm.com>
>> Reported-and-tested-by: Abdul haleem<abdhalee@linux.vnet.ibm.com>
>>
>> ---
>>   arch/powerpc/kernel/rtas.c |  4 ++++
>>   arch/powerpc/kexec/core.c  | 15 +++++++++++----
>>   2 files changed, 15 insertions(+), 4 deletions(-)
>>
>>   ---
>>   Change in v3:
>>     Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
>>     series [1] try to discover 1T segment MMU feature support
>>     BEFORE boot CPU paca allocation ([1] describes why it is needed).
>>     MPE has posted a patch [2] that archives a similar objective by 
>> moving
>>     boot CPU paca allocation after mmu_early_init_devtree().
>>
>
>> NOTE: This patch is dependent on the patch [2].
>>
>> [1]https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/ 
>>
>> [2]https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html 
>>
>
> This dependency info must be captured somewhere within the changelog to
> be useful.

Added about the dependent patch in v4 patch commit message.
v4 patch link: 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/239642.html

Thanks for the review Hari.

- Sourabh Jain

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: powerpc: Set crashkernel offset to mid of RMA region
  2022-02-02 15:08   ` Sourabh Jain
@ 2022-02-03 11:07     ` Michael Ellerman
  2022-02-04  9:14       ` Sourabh Jain
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2022-02-03 11:07 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev; +Cc: mahesh, hbathini, Abdul haleem

Sourabh Jain <sourabhjain@linux.ibm.com> writes:
> On 01/02/22 17:14, Michael Ellerman wrote:
>> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>>> On large config LPARs (having 192 and more cores), Linux fails to boot
>>> due to insufficient memory in the first memblock. It is due to the
>>> memory reservation for the crash kernel which starts at 128MB offset of
>>> the first memblock. This memory reservation for the crash kernel doesn't
>>> leave enough space in the first memblock to accommodate other essential
>>> system resources.
>>>
>>> The crash kernel start address was set to 128MB offset by default to
>>> ensure that the crash kernel get some memory below the RMA region which
>>> is used to be of size 256MB. But given that the RMA region size can be
>>> 512MB or more, setting the crash kernel offset to mid of RMA size will
>>> leave enough space for kernel to allocate memory for other system
>>> resources.
>>>
>>> Since the above crash kernel offset change is only applicable to the LPAR
>>> platform, the LPAR feature detection is pushed before the crash kernel
>>> reservation. The rest of LPAR specific initialization will still
>>> be done during pseries_probe_fw_features as usual.
>>>
>>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>>> Reported-and-tested-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
>>>
>>> ---
>>>   arch/powerpc/kernel/rtas.c |  4 ++++
>>>   arch/powerpc/kexec/core.c  | 15 +++++++++++----
>>>   2 files changed, 15 insertions(+), 4 deletions(-)
>>>
>>>   ---
>>>   Change in v3:
>>> 	Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
>>> 	series [1] try to discover 1T segment MMU feature support
>>> 	BEFORE boot CPU paca allocation ([1] describes why it is needed).
>>> 	MPE has posted a patch [2] that archives a similar objective by moving
>>> 	boot CPU paca allocation after mmu_early_init_devtree().
>>>
>>> NOTE: This patch is dependent on the patch [2].
>>>
>>> [1] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/
>>> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
>>>   ---
>>>
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index 733e6ef36758..06df7464fb57 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -1313,6 +1313,10 @@ int __init early_init_dt_scan_rtas(unsigned long node,
>>>   	entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
>>>   	sizep  = of_get_flat_dt_prop(node, "rtas-size", NULL);
>>>   
>>> +	/* need this feature to decide the crashkernel offset */
>>> +	if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
>>> +		powerpc_firmware_features |= FW_FEATURE_LPAR;
>>> +
>> As you'd have seen this breaks the 32-bit build. It will need an #ifdef
>> CONFIG_PPC64 around it.
>>
>>>   	if (basep && entryp && sizep) {
>>>   		rtas.base = *basep;
>>>   		rtas.entry = *entryp;
>>> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
>>> index 8b68d9f91a03..abf5897ae88c 100644
>>> --- a/arch/powerpc/kexec/core.c
>>> +++ b/arch/powerpc/kexec/core.c
>>> @@ -134,11 +134,18 @@ void __init reserve_crashkernel(void)
>>>   	if (!crashk_res.start) {
>>>   #ifdef CONFIG_PPC64
>>>   		/*
>>> -		 * On 64bit we split the RMO in half but cap it at half of
>>> -		 * a small SLB (128MB) since the crash kernel needs to place
>>> -		 * itself and some stacks to be in the first segment.
>>> +		 * On the LPAR platform place the crash kernel to mid of
>>> +		 * RMA size (512MB or more) to ensure the crash kernel
>>> +		 * gets enough space to place itself and some stack to be
>>> +		 * in the first segment. At the same time normal kernel
>>> +		 * also get enough space to allocate memory for essential
>>> +		 * system resource in the first segment. Keep the crash
>>> +		 * kernel starts at 128MB offset on other platforms.
>>>   		 */
>>> -		crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
>>> +		if (firmware_has_feature(FW_FEATURE_LPAR))
>>> +			crashk_res.start = ppc64_rma_size / 2;
>>> +		else
>>> +			crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
>> I think this will break on machines using Radix won't it? At this point
>> in boot ppc64_rma_size will be == 0. Because we won't call into
>> hash__setup_initial_memory_limit().
>>
>> That's not changed by your patch, but seems like this code needs to be
>> more careful/clever.
>
> Interesting, but in my testing, I found that ppc64_rma_size
> did get initialized before reserve_crashkernel() using radix on LPAR.
>
> I am not sure why but hash__setup_initial_memory_limit() function is 
> gets called
> regardless of radix or hash. Not sure whether it is by design but here 
> is the flow:

It sort of is by design. See:

  103a8542cb35 ("powerpc/book3s64/radix: Fix boot failure with large amount of guest memory")

Basically the hash restrictions are more strict, so we apply them until
we know we will use radix.

But ...

> setup_initial_memory_limit()
>
>       static inline void setup_initial_memory_limit() 
> (arch/powerpc/include/asm/book3s/64/mmu.h)
>
>              if (!early_radix_enabled())  // FALSE regardless of radix is enabled or not

You mean early_radix_enabled() is False regardless. But that's not true
in all cases.

We can now build the kernel without hash MMU support at all, see:

  387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU")

In which case early_radix_enabled() will be true here, because it's hard
coded to be true at build time.

>                  hash__setup_initial_memory_limit() // initialize ppc64_rma_size
>
>       reserve_crashkernel()  // initialize crashkernel offset to mid of RMA size.
>
>
> For the sack of understanding even if we restrict crashkernel offset 
> setting to mid RMA (i.e. ppc64_rma_size/2) for
> only hash it may not save radix because even today we are assigning 
> crashkernel offset using
> ppc64_rma_size variable.

Yes. There's already a bug there, your patch doesn't make it better or worse.

> Is the current flow of initializing ppc64_rma_size variable before 
> reserve_crashkernel() for radix expected?
>
> Please provide your input.

I wonder if we're better off moving the crash kernel reservation later,
once we've discovered what MMU we're using.

I can't immediately see why that would be a problem, as long as we do
the reservation before we do any (many?) allocations. I'll have to think
about it a bit more though, these boot ordering things are always
subtle.

For now I think this patch is OK if you send a v2 to fix the compile
error.

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: powerpc: Set crashkernel offset to mid of RMA region
  2022-02-01 11:44 ` Michael Ellerman
@ 2022-02-02 15:08   ` Sourabh Jain
  2022-02-03 11:07     ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: Sourabh Jain @ 2022-02-02 15:08 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: mahesh, hbathini, Abdul haleem


On 01/02/22 17:14, Michael Ellerman wrote:
> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>> On large config LPARs (having 192 and more cores), Linux fails to boot
>> due to insufficient memory in the first memblock. It is due to the
>> memory reservation for the crash kernel which starts at 128MB offset of
>> the first memblock. This memory reservation for the crash kernel doesn't
>> leave enough space in the first memblock to accommodate other essential
>> system resources.
>>
>> The crash kernel start address was set to 128MB offset by default to
>> ensure that the crash kernel get some memory below the RMA region which
>> is used to be of size 256MB. But given that the RMA region size can be
>> 512MB or more, setting the crash kernel offset to mid of RMA size will
>> leave enough space for kernel to allocate memory for other system
>> resources.
>>
>> Since the above crash kernel offset change is only applicable to the LPAR
>> platform, the LPAR feature detection is pushed before the crash kernel
>> reservation. The rest of LPAR specific initialization will still
>> be done during pseries_probe_fw_features as usual.
>>
>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>> Reported-and-tested-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
>>
>> ---
>>   arch/powerpc/kernel/rtas.c |  4 ++++
>>   arch/powerpc/kexec/core.c  | 15 +++++++++++----
>>   2 files changed, 15 insertions(+), 4 deletions(-)
>>
>>   ---
>>   Change in v3:
>> 	Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
>> 	series [1] try to discover 1T segment MMU feature support
>> 	BEFORE boot CPU paca allocation ([1] describes why it is needed).
>> 	MPE has posted a patch [2] that archives a similar objective by moving
>> 	boot CPU paca allocation after mmu_early_init_devtree().
>>
>> NOTE: This patch is dependent on the patch [2].
>>
>> [1] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/
>> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
>>   ---
>>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 733e6ef36758..06df7464fb57 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -1313,6 +1313,10 @@ int __init early_init_dt_scan_rtas(unsigned long node,
>>   	entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
>>   	sizep  = of_get_flat_dt_prop(node, "rtas-size", NULL);
>>   
>> +	/* need this feature to decide the crashkernel offset */
>> +	if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
>> +		powerpc_firmware_features |= FW_FEATURE_LPAR;
>> +
> As you'd have seen this breaks the 32-bit build. It will need an #ifdef
> CONFIG_PPC64 around it.
>
>>   	if (basep && entryp && sizep) {
>>   		rtas.base = *basep;
>>   		rtas.entry = *entryp;
>> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
>> index 8b68d9f91a03..abf5897ae88c 100644
>> --- a/arch/powerpc/kexec/core.c
>> +++ b/arch/powerpc/kexec/core.c
>> @@ -134,11 +134,18 @@ void __init reserve_crashkernel(void)
>>   	if (!crashk_res.start) {
>>   #ifdef CONFIG_PPC64
>>   		/*
>> -		 * On 64bit we split the RMO in half but cap it at half of
>> -		 * a small SLB (128MB) since the crash kernel needs to place
>> -		 * itself and some stacks to be in the first segment.
>> +		 * On the LPAR platform place the crash kernel to mid of
>> +		 * RMA size (512MB or more) to ensure the crash kernel
>> +		 * gets enough space to place itself and some stack to be
>> +		 * in the first segment. At the same time normal kernel
>> +		 * also get enough space to allocate memory for essential
>> +		 * system resource in the first segment. Keep the crash
>> +		 * kernel starts at 128MB offset on other platforms.
>>   		 */
>> -		crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
>> +		if (firmware_has_feature(FW_FEATURE_LPAR))
>> +			crashk_res.start = ppc64_rma_size / 2;
>> +		else
>> +			crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
> I think this will break on machines using Radix won't it? At this point
> in boot ppc64_rma_size will be == 0. Because we won't call into
> hash__setup_initial_memory_limit().
>
> That's not changed by your patch, but seems like this code needs to be
> more careful/clever.

Interesting, but in my testing, I found that ppc64_rma_size
did get initialized before reserve_crashkernel() using radix on LPAR.

I am not sure why but hash__setup_initial_memory_limit() function is 
gets called
regardless of radix or hash. Not sure whether it is by design but here 
is the flow:

setup_initial_memory_limit()

      static inline void setup_initial_memory_limit() 
(arch/powerpc/include/asm/book3s/64/mmu.h)

             if (!early_radix_enabled())  // FALSE regardless of radix 
is enabled or not
                 hash__setup_initial_memory_limit() // initialize 
ppc64_rma_size

      reserve_crashkernel()  // initialize crashkernel offset to mid of 
RMA size.


For the sack of understanding even if we restrict crashkernel offset 
setting to mid RMA (i.e. ppc64_rma_size/2) for
only hash it may not save radix because even today we are assigning 
crashkernel offset using
ppc64_rma_size variable.

Is the current flow of initializing ppc64_rma_size variable before 
reserve_crashkernel() for radix expected?

Please provide your input.

Thanks,
Sourabh Jain


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: powerpc: Set crashkernel offset to mid of RMA region
  2022-01-28 10:04 Sourabh Jain
  2022-02-01  6:50 ` kernel test robot
  2022-02-01 11:10 ` Hari Bathini
@ 2022-02-01 11:44 ` Michael Ellerman
  2022-02-02 15:08   ` Sourabh Jain
  2 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2022-02-01 11:44 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev; +Cc: mahesh, hbathini, Abdul haleem

Sourabh Jain <sourabhjain@linux.ibm.com> writes:
> On large config LPARs (having 192 and more cores), Linux fails to boot
> due to insufficient memory in the first memblock. It is due to the
> memory reservation for the crash kernel which starts at 128MB offset of
> the first memblock. This memory reservation for the crash kernel doesn't
> leave enough space in the first memblock to accommodate other essential
> system resources.
>
> The crash kernel start address was set to 128MB offset by default to
> ensure that the crash kernel get some memory below the RMA region which
> is used to be of size 256MB. But given that the RMA region size can be
> 512MB or more, setting the crash kernel offset to mid of RMA size will
> leave enough space for kernel to allocate memory for other system
> resources.
>
> Since the above crash kernel offset change is only applicable to the LPAR
> platform, the LPAR feature detection is pushed before the crash kernel
> reservation. The rest of LPAR specific initialization will still
> be done during pseries_probe_fw_features as usual.
>
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Reported-and-tested-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
>
> ---
>  arch/powerpc/kernel/rtas.c |  4 ++++
>  arch/powerpc/kexec/core.c  | 15 +++++++++++----
>  2 files changed, 15 insertions(+), 4 deletions(-)
>
>  ---
>  Change in v3:
> 	Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
> 	series [1] try to discover 1T segment MMU feature support
> 	BEFORE boot CPU paca allocation ([1] describes why it is needed).
> 	MPE has posted a patch [2] that archives a similar objective by moving
> 	boot CPU paca allocation after mmu_early_init_devtree().
>
> NOTE: This patch is dependent on the patch [2].
>
> [1] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/
> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
>  ---
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 733e6ef36758..06df7464fb57 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1313,6 +1313,10 @@ int __init early_init_dt_scan_rtas(unsigned long node,
>  	entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
>  	sizep  = of_get_flat_dt_prop(node, "rtas-size", NULL);
>  
> +	/* need this feature to decide the crashkernel offset */
> +	if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
> +		powerpc_firmware_features |= FW_FEATURE_LPAR;
> +

As you'd have seen this breaks the 32-bit build. It will need an #ifdef
CONFIG_PPC64 around it.

>  	if (basep && entryp && sizep) {
>  		rtas.base = *basep;
>  		rtas.entry = *entryp;
> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
> index 8b68d9f91a03..abf5897ae88c 100644
> --- a/arch/powerpc/kexec/core.c
> +++ b/arch/powerpc/kexec/core.c
> @@ -134,11 +134,18 @@ void __init reserve_crashkernel(void)
>  	if (!crashk_res.start) {
>  #ifdef CONFIG_PPC64
>  		/*
> -		 * On 64bit we split the RMO in half but cap it at half of
> -		 * a small SLB (128MB) since the crash kernel needs to place
> -		 * itself and some stacks to be in the first segment.
> +		 * On the LPAR platform place the crash kernel to mid of
> +		 * RMA size (512MB or more) to ensure the crash kernel
> +		 * gets enough space to place itself and some stack to be
> +		 * in the first segment. At the same time normal kernel
> +		 * also get enough space to allocate memory for essential
> +		 * system resource in the first segment. Keep the crash
> +		 * kernel starts at 128MB offset on other platforms.
>  		 */
> -		crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
> +		if (firmware_has_feature(FW_FEATURE_LPAR))
> +			crashk_res.start = ppc64_rma_size / 2;
> +		else
> +			crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));

I think this will break on machines using Radix won't it? At this point
in boot ppc64_rma_size will be == 0. Because we won't call into 
hash__setup_initial_memory_limit().

That's not changed by your patch, but seems like this code needs to be
more careful/clever.

cheers

>  #else
>  		crashk_res.start = KDUMP_KERNELBASE;
>  #endif
> -- 
> 2.34.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: powerpc: Set crashkernel offset to mid of RMA region
  2022-01-28 10:04 Sourabh Jain
  2022-02-01  6:50 ` kernel test robot
@ 2022-02-01 11:10 ` Hari Bathini
  2022-02-04  9:00   ` Sourabh Jain
  2022-02-01 11:44 ` Michael Ellerman
  2 siblings, 1 reply; 10+ messages in thread
From: Hari Bathini @ 2022-02-01 11:10 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe; +Cc: mahesh, Abdul haleem



On 28/01/22 3:34 pm, Sourabh Jain wrote:
> On large config LPARs (having 192 and more cores), Linux fails to boot
> due to insufficient memory in the first memblock. It is due to the
> memory reservation for the crash kernel which starts at 128MB offset of
> the first memblock. This memory reservation for the crash kernel doesn't
> leave enough space in the first memblock to accommodate other essential
> system resources.
> 
> The crash kernel start address was set to 128MB offset by default to
> ensure that the crash kernel get some memory below the RMA region which
> is used to be of size 256MB. But given that the RMA region size can be
> 512MB or more, setting the crash kernel offset to mid of RMA size will
> leave enough space for kernel to allocate memory for other system
> resources.
> 
> Since the above crash kernel offset change is only applicable to the LPAR
> platform, the LPAR feature detection is pushed before the crash kernel
> reservation. The rest of LPAR specific initialization will still
> be done during pseries_probe_fw_features as usual.
> 
> Signed-off-by: Sourabh Jain<sourabhjain@linux.ibm.com>
> Reported-and-tested-by: Abdul haleem<abdhalee@linux.vnet.ibm.com>
> 
> ---
>   arch/powerpc/kernel/rtas.c |  4 ++++
>   arch/powerpc/kexec/core.c  | 15 +++++++++++----
>   2 files changed, 15 insertions(+), 4 deletions(-)
> 
>   ---
>   Change in v3:
> 	Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
> 	series [1] try to discover 1T segment MMU feature support
> 	BEFORE boot CPU paca allocation ([1] describes why it is needed).
> 	MPE has posted a patch [2] that archives a similar objective by moving
> 	boot CPU paca allocation after mmu_early_init_devtree().
> 

> NOTE: This patch is dependent on the patch [2].
> 
> [1]https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/
> [2]https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html

This dependency info must be captured somewhere within the changelog to
be useful.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: powerpc: Set crashkernel offset to mid of RMA region
  2022-01-28 10:04 Sourabh Jain
@ 2022-02-01  6:50 ` kernel test robot
  2022-02-01 11:10 ` Hari Bathini
  2022-02-01 11:44 ` Michael Ellerman
  2 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2022-02-01  6:50 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe
  Cc: mahesh, kbuild-all, hbathini, Abdul haleem

Hi Sourabh,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.17-rc2 next-20220131]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sourabh-Jain/powerpc-Set-crashkernel-offset-to-mid-of-RMA-region/20220128-180605
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-m031-20220130 (https://download.01.org/0day-ci/archive/20220201/202202011412.n27Qr2sT-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/60c8e71e56c0e5e321c312f14aad9a2ceb241c63
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sourabh-Jain/powerpc-Set-crashkernel-offset-to-mid-of-RMA-region/20220128-180605
        git checkout 60c8e71e56c0e5e321c312f14aad9a2ceb241c63
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   powerpc-linux-ld: warning: orphan section `.init.plt' from `drivers/net/ethernet/ti/davinci_cpdma.o' being placed in section `.init.plt'
   powerpc-linux-ld: warning: orphan section `.ftrace.tramp' from `drivers/net/ethernet/ti/davinci_cpdma.o' being placed in section `.ftrace.tramp'
   powerpc-linux-ld: arch/powerpc/kernel/rtas.o: in function `early_init_dt_scan_rtas':
>> rtas.c:(.init.text+0x2d2): undefined reference to `powerpc_firmware_features'
>> powerpc-linux-ld: rtas.c:(.init.text+0x2d6): undefined reference to `powerpc_firmware_features'
   powerpc-linux-ld: rtas.c:(.init.text+0x2de): undefined reference to `powerpc_firmware_features'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* powerpc: Set crashkernel offset to mid of RMA region
@ 2022-01-28 10:04 Sourabh Jain
  2022-02-01  6:50 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Sourabh Jain @ 2022-01-28 10:04 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, hbathini, Abdul haleem

On large config LPARs (having 192 and more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.

The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel get some memory below the RMA region which
is used to be of size 256MB. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for kernel to allocate memory for other system
resources.

Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reported-and-tested-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>

---
 arch/powerpc/kernel/rtas.c |  4 ++++
 arch/powerpc/kexec/core.c  | 15 +++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

 ---
 Change in v3:
	Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
	series [1] try to discover 1T segment MMU feature support
	BEFORE boot CPU paca allocation ([1] describes why it is needed).
	MPE has posted a patch [2] that archives a similar objective by moving
	boot CPU paca allocation after mmu_early_init_devtree().

NOTE: This patch is dependent on the patch [2].

[1] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
 ---

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 733e6ef36758..06df7464fb57 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1313,6 +1313,10 @@ int __init early_init_dt_scan_rtas(unsigned long node,
 	entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
 	sizep  = of_get_flat_dt_prop(node, "rtas-size", NULL);
 
+	/* need this feature to decide the crashkernel offset */
+	if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
+		powerpc_firmware_features |= FW_FEATURE_LPAR;
+
 	if (basep && entryp && sizep) {
 		rtas.base = *basep;
 		rtas.entry = *entryp;
diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
index 8b68d9f91a03..abf5897ae88c 100644
--- a/arch/powerpc/kexec/core.c
+++ b/arch/powerpc/kexec/core.c
@@ -134,11 +134,18 @@ void __init reserve_crashkernel(void)
 	if (!crashk_res.start) {
 #ifdef CONFIG_PPC64
 		/*
-		 * On 64bit we split the RMO in half but cap it at half of
-		 * a small SLB (128MB) since the crash kernel needs to place
-		 * itself and some stacks to be in the first segment.
+		 * On the LPAR platform place the crash kernel to mid of
+		 * RMA size (512MB or more) to ensure the crash kernel
+		 * gets enough space to place itself and some stack to be
+		 * in the first segment. At the same time normal kernel
+		 * also get enough space to allocate memory for essential
+		 * system resource in the first segment. Keep the crash
+		 * kernel starts at 128MB offset on other platforms.
 		 */
-		crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
+		if (firmware_has_feature(FW_FEATURE_LPAR))
+			crashk_res.start = ppc64_rma_size / 2;
+		else
+			crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
 #else
 		crashk_res.start = KDUMP_KERNELBASE;
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-02-16 12:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-04  8:56 powerpc: Set crashkernel offset to mid of RMA region Sourabh Jain
2022-02-16 12:25 ` Michael Ellerman
  -- strict thread matches above, loose matches on Subject: below --
2022-01-28 10:04 Sourabh Jain
2022-02-01  6:50 ` kernel test robot
2022-02-01 11:10 ` Hari Bathini
2022-02-04  9:00   ` Sourabh Jain
2022-02-01 11:44 ` Michael Ellerman
2022-02-02 15:08   ` Sourabh Jain
2022-02-03 11:07     ` Michael Ellerman
2022-02-04  9:14       ` Sourabh Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).