linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Baoquan He <bhe@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	<x86@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
	<linux-kernel@vger.kernel.org>, Dave Young <dyoung@redhat.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Eric Biederman <ebiederm@xmission.com>,
	<kexec@lists.infradead.org>, Will Deacon <will@kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	Rob Herring <robh+dt@kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	<devicetree@vger.kernel.org>, "Jonathan Corbet" <corbet@lwn.net>,
	<linux-doc@vger.kernel.org>, Randy Dunlap <rdunlap@infradead.org>,
	Feng Zhou <zhoufeng.zf@bytedance.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Chen Zhou <dingguo.cz@antgroup.com>,
	"John Donnelly" <John.p.donnelly@oracle.com>,
	Dave Kleikamp <dave.kleikamp@oracle.com>
Subject: Re: [PATCH v22 5/9] arm64: kdump: Reimplement crashkernel=X
Date: Fri, 29 Apr 2022 16:25:37 +0800	[thread overview]
Message-ID: <23e2dcf4-4e9a-5298-d5d8-8761b0bbbe21@huawei.com> (raw)
In-Reply-To: <a9c736a0-f2b3-5b8a-94d9-80742ccd2700@huawei.com>



On 2022/4/29 16:02, Leizhen (ThunderTown) wrote:
> 
> 
> On 2022/4/29 11:24, Baoquan He wrote:
>> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2022/4/28 11:52, Baoquan He wrote:
>>>> On 04/28/22 at 11:40am, Baoquan He wrote:
>>>>> Hi Catalin, Zhen Lei,
>>>>>
>>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote:
>>>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote:
>>>>>>> On 2022/4/27 20:32, Catalin Marinas wrote:
>>>>>>>> I think one could always pass a default command line like:
>>>>>>>>
>>>>>>>> 	crashkernel=1G,high crashkernel=128M,low
>>>>>>>>
>>>>>>>> without much knowledge of the SoC memory layout.
>>>>>>>
>>>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low
>>>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone.
>>>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory.
>>>>>>>
>>>>>>> +#define CRASH_ADDR_LOW_MAX             arm64_dma_phys_limit
>>>>>>> +       unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
>>>>>>> +       crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
>>>>>>>                                                crash_base, crash_max);
>>>>>>>
>>>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is
>>>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a
>>>>>>>> 'high' option at all:
>>>>>>>>
>>>>>>>> 	crashkernel=1G				- all within ZONE_DMA
>>>>>>>> 	crashkernel=1G crashkernel=128M,low	- 128M in ZONE_DMA
>>>>>>>> 						  1G above ZONE_DMA
>>>>>>>>
>>>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore
>>>>>>>> the 'low' option.
>>>>>>>
>>>>>>> I think although the code is hard to make generic, the interface is better to
>>>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and
>>>>>>> so on. It's not a good thing that the difference is too big.
>>>>>>
>>>>>> There will be some difference as the 4G limit doesn't always hold for
>>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify
>>>>>> things a bit while following the documented behaviour:
>>>>>>
>>>>>> 	crashkernel=Y		- current behaviour within ZONE_DMA
>>>>>> 	crashkernel=Y,high	- allocate from above ZONE_DMA
>>>>>> 	crashkernel=Y,low	- allocate within ZONE_DMA
>>>>>>
>>>>>> There is no fallback from crashkernel=Y.
>>>>>>
>>>>>> The question is whether we still want a default low allocation if
>>>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I
>>>>>> think we'd be consistent with kernel-parameters.txt for the 'low'
>>>>>> description. A default 'low' is probably not that bad but I'm tempted to
>>>>>> always mandate both 'high' and 'low'.
>>>>>
>>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns
>>>>> about this version. And I have the same concerns about them which comes
>>>>> from below points:
>>>>> 1) we may need to take best effort to keep ,high, ,low behaviour
>>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they
>>>>> deploy/configure kdump on different machines of different ARCHes in the
>>>>> same LAB. I think we should try to avoid the confusion.
>>>
>>> Yes, but for someone who is configuring crashkernel= for the first time, he
>>> needs to read doc to understand how to configure it. The doc can show the
>>> recommended default value of 'low' size.
>>>
>>> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"),
>>> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size()
>>> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low.
>>>
>>>
>>> +                * -swiotlb size: user-specified with swiotlb= or default.
>>> -               low_size = swiotlb_size_or_default() + (8UL<<20);
>>> +               low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20);
>>>
>>> That means all ARCHs can explicit configure crashkernel=256M,low, instead of
>>> omitting it. This may be another way to avoid confusion. It's not hard for
>>> programmer-turned-user/admin. However, this requires us to forgo backward
>>> compatibility with the default size of 'low'.
>>
>> We can make ,high and ,low simpler at first as they are alternative. If
>> possible, we can also simplify the ,high ,low implementation on x86_64
>> if it truly brings better archievement on arm64.
> 
> OK, I plan to remove optimization, fallback and default low size, to follow the
> suggestion of Catalin first. But there's one minor point of contention.
> 
> 1)    Both "crashkernel=X,high" and "crashkernel=X,low" must be present.
> 2)    Both "crashkernel=X,high" and "crashkernel=X,low" are present.
>    or
>       Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero.
> 
> I prefer 2), how about you?
> 
>>
>>>
>>>
>>>>> 2) Fallback behaviour is important to our distros. The reason is we will
>>>>> provide default value with crashkernel=xxxM along kernel of distros. In
>>>>> this case, we hope the reservation will succeed by all means. The ,high
>>>>> and ,low is an option if customer likes to take with expertise.
>>>
>>> OK, I got it.
>>>
>>>>>
>>>>> After going through arm64 memory init code, I got below summary about
>>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we
>>>>> can make use of it to facilitate to simplify code.
>>>>> ================================================================================
>>>>>                         DMA                      DMA32                    NORMAL
>>>>> 1)Raspberry Pi4         0~1G                     3G~4G                    (above 4G)
>>>>> 2)Normal machine        0~4G                     0                        (above 4G)
>>>>> 3)Special machine       (above 4G)~MAX
>>>>> 4)No DMA|DMA32                                                            (above 4G)~MAX
>>>
>>> arm64_memblock_init()
>>> 	reserve_crashkernel()        ---------------   0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
>> We don't need different code for this place of reservation as you are
>> doing in this patchset, since arm64_dma_phys_limit is initialized as 
>> below. In fact, in arm64_memblock_init(), we have made memblock ready,
>> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if
>> memblock_start_of_DRAM() is bigger than 4G, we possibly can call
>> reserve_crashkernel() here too.
> 
> Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing
> 32-bit devices to access high memory without SMMU is: Set a fixed value for the upper
> 32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G).

I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA offsets in the max_zone_phys() calculation")

    Currently, the kernel assumes that if RAM starts above 32-bit (or
    zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
    such constrained devices have a hardwired DMA offset. In practice, we
    haven't noticed any such hardware so let's assume that we can expand
    ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
    ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
    zone_bits.

So your suggestion is feasible.

> 
>>
>> phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1;
>>
>>> paging_init()                                       |
>>> 	map_mem()                                   |
>>> unflatten_device_tree or ACPI                       |  ----  //Raspberry Pi4 get dma zone base on dtb or ACPI
>>> bootmem_init();                                     |      |
>>> 	zone_sizes_init()                           |      |
>>> 		of_dma_get_max_cpu_address          |  ----|
>>> 		//Update arm64_dma_phys_limit       |  ----|
>>> 	reserve_crashkernel()        <--------------  //Because we need arm64_dma_phys_limit to be updated above
>>> request_standard_resources()
>>
>> Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case
>> as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But 
>> arm64_dma_phys_limit could be 1G or 4G, that's why your optimization
>> about BLOCKING may not be right since you assume the 4G boundary, while
>> forgetting Raspberry Pi4 on which 1G is the boundary of low memory and
> 
> No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory
> under 4G and block mapping for memory above 4G. But still try to reserve crash
> low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized,
> the memory 1-4G are mapped as page.
> 
>> high memory. So separating out BLOCKING optimization can let us focus on
>> the crashkernel,high support.
>>
>>>
>>>>>
>>>>> -------------------------------------------
>>>>>                       arm64_dma_phys_limit
>>>>> 1)Raspberry Pi4         1G                     
>>>>> 2)Normal machine        4G                     
>>>>> 3)Special machine       MAX
>>>>> 4)No DMA|DMA32          MAX
>>>>>
>>>>> Note: 3)Special machine means the machine's starting physical address is above 4G.
>>>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has
>>>>> IOMMU hardware supporting.
>>>>> ===================================================================================
>>>>>
>>>>> I made a draft patch based on this patchset, please feel free to check and
>>>>> see if it's OK, or anything missing or wrongly understood. I removed
>>>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and
>>>>> reserve_crashkernel_low() as the v21 did.
>>>>
>>>> Sorry, forgot attaching the draft patch.
>>>>
>>>> By the way, we can also have a simple version with basic ,high, ,low
>>>> support, no fallback. We can add fallback and other optimization later.
>>>> This can be plan B.
>>>
>>> Yes, That's what Catalin suggested also.
>>>
>>> Hi, Baoquan He:
>>>   Without optimization, the whole Patch 3-4 and 6-7 can be dropped.
>>>
>>> Process after abstraction:
>>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) {
>>> 		reserve_crashkernel()
>>> 		//block mapping
>>> 	} else {
>>> 		//page mapping
>>> 		reserve_crashkernel()
>>> 	}
>>>
>>> ------------ Simplified real-world process ---------
>> Yeah, this looks clearer. I would like to see a version with them.
>>
>>> arm64_memblock_init()
>>         Before reserve_crashkernel(), we can update arm64_dma_phys_limit
>> as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or
>> memblock_start_of_DRAM() is bigger than 4G.
>>         Then we go with:
>>         if (!arm64_dma_phys_limit)
>> 		reserve_crashkernel();
>>
>> Just personal opinion, please check if it's appropriate to handle case
>> 3) which has physical starting memory above 4G here. 
> 
> OK, I'll write it down and consider it in the future optimization.
> 
>>
>>> 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 		reserve_crashkernel()
>>            
>>> paging_init()
>>> 	map_mem()
>>> 		if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 			//block mapping
>>> 		else
>>> 			//page mapping
>>> unflatten_device_tree or ACPI
>>> bootmem_init();
>>> 	zone_sizes_init()
>>> 		of_dma_get_max_cpu_address
>>> 		//Update arm64_dma_phys_limit
>>> 	if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32))
>>> 		reserve_crashkernel()
>>
>> The rest sounds good with optimization code split out.
>>
>> .
>>
> 

-- 
Regards,
  Zhen Lei

  reply	other threads:[~2022-04-29  8:25 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-14 11:57 [PATCH v22 0/9] support reserving crashkernel above 4G on arm64 kdump Zhen Lei
2022-04-14 11:57 ` [PATCH v22 1/9] kdump: return -ENOENT if required cmdline option does not exist Zhen Lei
2022-04-25  3:49   ` Baoquan He
2022-04-14 11:57 ` [PATCH v22 2/9] arm64: Use insert_resource() to simplify code Zhen Lei
2022-04-14 11:57 ` [PATCH v22 3/9] arm64: kdump: Remove some redundant checks in map_mem() Zhen Lei
2022-04-14 11:57 ` [PATCH v22 4/9] arm64: kdump: Don't force page-level mappings for memory above 4G Zhen Lei
2022-04-26 14:26   ` Catalin Marinas
2022-04-27  7:12     ` Leizhen (ThunderTown)
2022-04-14 11:57 ` [PATCH v22 5/9] arm64: kdump: Reimplement crashkernel=X Zhen Lei
2022-04-26 18:02   ` Catalin Marinas
2022-04-27  6:54     ` Leizhen (ThunderTown)
2022-04-27 12:32       ` Catalin Marinas
2022-04-27 13:49         ` Leizhen (ThunderTown)
2022-04-27 16:04           ` Catalin Marinas
2022-04-28  2:22             ` Leizhen (ThunderTown)
2022-04-28  3:40             ` Baoquan He
2022-04-28  3:52               ` Baoquan He
2022-04-28  9:33                 ` Leizhen (ThunderTown)
2022-04-29  3:24                   ` Baoquan He
2022-04-29  8:02                     ` Leizhen (ThunderTown)
2022-04-29  8:25                       ` Leizhen (ThunderTown) [this message]
2022-05-03 22:00                         ` Catalin Marinas
2022-05-05  2:13                           ` Leizhen (ThunderTown)
2022-05-05  3:00                           ` Baoquan He
2022-05-05 14:20                             ` Catalin Marinas
2022-05-06 11:39                               ` Baoquan He
2022-04-14 11:57 ` [PATCH v22 6/9] arm64: kdump: Use page-level mapping for the high memory of crashkernel Zhen Lei
2022-04-14 11:57 ` [PATCH v22 7/9] arm64: kdump: Try not to use NO_BLOCK_MAPPINGS for memory under 4G Zhen Lei
2022-04-14 11:57 ` [PATCH v22 8/9] of: fdt: Add memory for devices by DT property "linux,usable-memory-range" Zhen Lei
2022-04-14 11:57 ` [PATCH v22 9/9] docs: kdump: Update the crashkernel description for arm64 Zhen Lei
2022-04-19 17:02 ` [PATCH v22 0/9] support reserving crashkernel above 4G on arm64 kdump Dave Kleikamp
2022-04-25  2:19 ` Leizhen (ThunderTown)
2022-04-25  2:45   ` Baoquan He
2022-04-25  6:29     ` Leizhen (ThunderTown)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=23e2dcf4-4e9a-5298-d5d8-8761b0bbbe21@huawei.com \
    --to=thunder.leizhen@huawei.com \
    --cc=John.p.donnelly@oracle.com \
    --cc=bhe@redhat.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.kleikamp@oracle.com \
    --cc=devicetree@vger.kernel.org \
    --cc=dingguo.cz@antgroup.com \
    --cc=dyoung@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=frowand.list@gmail.com \
    --cc=hpa@zytor.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rdunlap@infradead.org \
    --cc=robh+dt@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vgoyal@redhat.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=zhoufeng.zf@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).