linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages
@ 2020-12-03 15:23 carver4lio
  2020-12-04 13:42 ` Qian Cai
  2020-12-06 11:55 ` Mike Rapoport
  0 siblings, 2 replies; 8+ messages in thread
From: carver4lio @ 2020-12-03 15:23 UTC (permalink / raw)
  To: rppt; +Cc: akpm, linux-mm, linux-kernel, Hailong Liu

From: Hailong Liu <liu.hailong6@zte.com.cn>

During boot, the pages in the range [start, end) of a memblock are freed to
the buddy allocator in a loop: each chunk starts at an order as large as
possible (less than MAX_ORDER), which is then decreased until the chunk no
longer extends past end.

However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
in some cases.
Instead, *__ffs(end - start)* may be more appropriate and meaningful.

Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
---
 mm/memblock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index b68ee8678..7c6d0dde7 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1931,7 +1931,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 	int order;
 
 	while (start < end) {
-		order = min(MAX_ORDER - 1UL, __ffs(start));
+		order = min(MAX_ORDER - 1UL, __ffs(end - start));
 
 		while (start + (1UL << order) > end)
 			order--;
-- 
2.17.1




* Re: [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages
  2020-12-03 15:23 [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages carver4lio
@ 2020-12-04 13:42 ` Qian Cai
       [not found]   ` <CGME20201204160751eucas1p13cc7aad8c68dd2a495c4bbf422c4228c@eucas1p1.samsung.com>
  2020-12-06 11:55 ` Mike Rapoport
  1 sibling, 1 reply; 8+ messages in thread
From: Qian Cai @ 2020-12-04 13:42 UTC (permalink / raw)
  To: carver4lio, rppt
  Cc: akpm, linux-mm, linux-kernel, Hailong Liu, Stephen Rothwell,
	Linux Next Mailing List

On Thu, 2020-12-03 at 23:23 +0800, carver4lio@163.com wrote:
> From: Hailong Liu <liu.hailong6@zte.com.cn>
> 
> During boot, the pages in the range [start, end) of a memblock are freed to
> the buddy allocator in a loop: each chunk starts at an order as large as
> possible (less than MAX_ORDER), which is then decreased until the chunk no
> longer extends past end.
> 
> However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
> in some cases.
> Instead, *__ffs(end - start)* may be more appropriate and meaningful.
> 
> Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>

Reverting this commit on the top of today's linux-next fixed boot crashes on
multiple NUMA systems.

[    5.050736][    T0] flags: 0x3fffc000000000()
[    5.055103][    T0] raw: 003fffc000000000 ffffea0000000448 ffffea0000000448 0000000000000000
[    5.063572][    T0] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    5.072045][    T0] page dumped because: VM_BUG_ON_PAGE(pfn & ((1 << order) - 1))
[    5.079580][    T0] ------------[ cut here ]------------
[    5.084883][    T0] kernel BUG at mm/page_alloc.c:1015!
[    5.090151][    T0] invalid opcode: 0000 [#1] SMP KASAN NOPTI
[    5.095894][    T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc6-next-20201204+ #11
[    5.104099][    T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[    5.113370][    T0] RIP: 0010:__free_one_page+0xa19/0x1140
[    5.118864][    T0] Code: d2 e9 69 f6 ff ff 0f 0b 48 c7 c6 e0 52 2d a5 4c 89 ff e8 7a 98 f8 ff 0f 0b 0f 0b 48 c7 c6 60 53 2d a5 4c 89 ff e8 67 98 f8 ff <0f> 0b 48 c7 c6 c0 53 2d a5 4c 89 ff e8 56 98 f8 ff 0f 0b 48 89 da
[    5.138427][    T0] RSP: 0000:ffffffffa5807c30 EFLAGS: 00010086
[    5.144367][    T0] RAX: 0000000000000000 RBX: 0000000000000008 RCX: ffffffffa3c4abf4
[    5.152228][    T0] RDX: 1ffffd400000008f RSI: 0000000000000000 RDI: ffffea0000000478
[    5.160091][    T0] RBP: 0000000000000007 R08: fffffbfff5918fc5 R09: fffffbfff5918fc5
[    5.167951][    T0] R10: ffffffffac8c7e23 R11: fffffbfff5918fc4 R12: 0000000000000000
[    5.175815][    T0] R13: 0000000000000003 R14: ffff88887fff6000 R15: ffffea0000000440
[    5.183677][    T0] FS:  0000000000000000(0000) GS:ffff88881e800000(0000) knlGS:0000000000000000
[    5.192499][    T0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.198963][    T0] CR2: ffff88907efff000 CR3: 0000000ce3e14000 CR4: 00000000000406b0
[    5.206823][    T0] Call Trace:
[    5.209978][    T0]  ? rwlock_bug.part.1+0x90/0x90
[    5.214774][    T0]  free_one_page+0x7e/0x1e0
[    5.219142][    T0]  __free_pages_ok+0x646/0x13b0
[    5.223863][    T0]  memblock_free_all+0x21c/0x2c0
(inlined by) __free_memory_core at mm/memblock.c:2037
(inlined by) free_low_memory_core_early at mm/memblock.c:2060
(inlined by) memblock_free_all at mm/memblock.c:2100
[    5.228662][    T0]  ? reset_all_zones_managed_pages+0x9a/0x9a
[    5.234515][    T0]  ? memblock_alloc_try_nid+0xe6/0x127
[    5.239842][    T0]  ? memblock_alloc_try_nid_raw+0x12a/0x12a
[    5.245610][    T0]  ? early_amd_iommu_init+0x1e1f/0x1e1f
[    5.251024][    T0]  ? iommu_go_to_state+0x24/0x28
[    5.255831][    T0]  mem_init+0x1a/0x350
[    5.259762][    T0]  mm_init+0x5f/0x87
[    5.263515][    T0]  start_kernel+0x14c/0x3a7
[    5.267882][    T0]  ? copy_bootdata+0x19/0x47
[    5.272340][    T0]  secondary_startup_64_no_verify+0xc2/0xcb
[    5.278102][    T0] Modules linked in:
[    5.281869][    T0] random: get_random_bytes called from print_oops_end_marker+0x26/0x40 with crng_init=0
[    5.281878][    T0] ---[ end trace 32dd7228cc16af82 ]---
[    5.296795][    T0] RIP: 0010:__free_one_page+0xa19/0x1140
[    5.302299][    T0] Code: d2 e9 69 f6 ff ff 0f 0b 48 c7 c6 e0 52 2d a5 4c 89 ff e8 7a 98 f8 ff 0f 0b 0f 0b 48 c7 c6 60 53 2d a5 4c 89 ff e8 67 98 f8 ff <0f> 0b 48 c7 c6 c0 53 2d a5 4c 89 ff e8 56 98 f8 ff 0f 0b 48 89 da
[    5.321864][    T0] RSP: 0000:ffffffffa5807c30 EFLAGS: 00010086
[    5.327803][    T0] RAX: 0000000000000000 RBX: 0000000000000008 RCX: ffffffffa3c4abf4
[    5.335665][    T0] RDX: 1ffffd400000008f RSI: 0000000000000000 RDI: ffffea0000000478
[    5.343526][    T0] RBP: 0000000000000007 R08: fffffbfff5918fc5 R09: fffffbfff5918fc5
[    5.351389][    T0] R10: ffffffffac8c7e23 R11: fffffbfff5918fc4 R12: 0000000000000000
[    5.359249][    T0] R13: 0000000000000003 R14: ffff88887fff6000 R15: ffffea0000000440
[    5.367110][    T0] FS:  0000000000000000(0000) GS:ffff88881e800000(0000) knlGS:0000000000000000
[    5.375932][    T0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.382397][    T0] CR2: ffff88907efff000 CR3: 0000000ce3e14000 CR4: 00000000000406b0
[    5.390261][    T0] Kernel panic - not syncing: Fatal exception
[    5.396320][    T0] ---[ end Kernel panic - not syncing: Fatal exception ]---

> ---
>  mm/memblock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b68ee8678..7c6d0dde7 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1931,7 +1931,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
>  	int order;
>  
>  	while (start < end) {
> -		order = min(MAX_ORDER - 1UL, __ffs(start));
> +		order = min(MAX_ORDER - 1UL, __ffs(end - start));
>  
>  		while (start + (1UL << order) > end)
>  			order--;



* Re: [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages
       [not found]   ` <CGME20201204160751eucas1p13cc7aad8c68dd2a495c4bbf422c4228c@eucas1p1.samsung.com>
@ 2020-12-04 16:07     ` Marek Szyprowski
  2020-12-04 17:43       ` Jon Hunter
  0 siblings, 1 reply; 8+ messages in thread
From: Marek Szyprowski @ 2020-12-04 16:07 UTC (permalink / raw)
  To: Qian Cai, carver4lio, rppt
  Cc: akpm, linux-mm, linux-kernel, Hailong Liu, Stephen Rothwell,
	Linux Next Mailing List, Bartlomiej Zolnierkiewicz

Hi All,

On 04.12.2020 14:42, Qian Cai wrote:
> On Thu, 2020-12-03 at 23:23 +0800, carver4lio@163.com wrote:
>> From: Hailong Liu <liu.hailong6@zte.com.cn>
>>
>> During boot, the pages in the range [start, end) of a memblock are freed to
>> the buddy allocator in a loop: each chunk starts at an order as large as
>> possible (less than MAX_ORDER), which is then decreased until the chunk no
>> longer extends past end.
>>
>> However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
>> in some cases.
>> Instead, *__ffs(end - start)* may be more appropriate and meaningful.
>>
>> Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
> Reverting this commit on the top of today's linux-next fixed boot crashes on
> multiple NUMA systems.

I confirm. Reverting commit 4df001639c84 ("mm/memblock: use a more 
appropriate order calculation when free memblock pages") on top of linux 
next-20201204 fixed booting of my ARM32bit test systems.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



* Re: [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages
  2020-12-04 16:07     ` Marek Szyprowski
@ 2020-12-04 17:43       ` Jon Hunter
  2020-12-05 17:09         ` Anders Roxell
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Hunter @ 2020-12-04 17:43 UTC (permalink / raw)
  To: Marek Szyprowski, Qian Cai, carver4lio, rppt
  Cc: akpm, linux-mm, linux-kernel, Hailong Liu, Stephen Rothwell,
	Linux Next Mailing List, Bartlomiej Zolnierkiewicz, linux-tegra


On 04/12/2020 16:07, Marek Szyprowski wrote:
> Hi All,
> 
> On 04.12.2020 14:42, Qian Cai wrote:
>> On Thu, 2020-12-03 at 23:23 +0800, carver4lio@163.com wrote:
>>> From: Hailong Liu <liu.hailong6@zte.com.cn>
>>>
>>> During boot, the pages in the range [start, end) of a memblock are freed to
>>> the buddy allocator in a loop: each chunk starts at an order as large as
>>> possible (less than MAX_ORDER), which is then decreased until the chunk no
>>> longer extends past end.
>>>
>>> However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
>>> in some cases.
>>> Instead, *__ffs(end - start)* may be more appropriate and meaningful.
>>>
>>> Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
>> Reverting this commit on the top of today's linux-next fixed boot crashes on
>> multiple NUMA systems.
> 
> I confirm. Reverting commit 4df001639c84 ("mm/memblock: use a more 
> appropriate order calculation when free memblock pages") on top of linux 
> next-20201204 fixed booting of my ARM32bit test systems.


FWIW, I also confirm that this is causing several 32-bit Tegra platforms
to crash on boot and reverting this fixes the problem.

Jon

-- 
nvpublic


* Re: [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages
  2020-12-04 17:43       ` Jon Hunter
@ 2020-12-05 17:09         ` Anders Roxell
  2020-12-05 17:12           ` Anders Roxell
  0 siblings, 1 reply; 8+ messages in thread
From: Anders Roxell @ 2020-12-05 17:09 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Marek Szyprowski, Qian Cai, carver4lio, rppt, Andrew Morton,
	Linux-MM, Linux Kernel Mailing List, Hailong Liu,
	Stephen Rothwell, Linux Next Mailing List,
	Bartlomiej Zolnierkiewicz, linux-tegra

On Fri, 4 Dec 2020 at 18:44, Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
> On 04/12/2020 16:07, Marek Szyprowski wrote:
> > Hi All,
> >
> > On 04.12.2020 14:42, Qian Cai wrote:
> >> On Thu, 2020-12-03 at 23:23 +0800, carver4lio@163.com wrote:
> >>> From: Hailong Liu <liu.hailong6@zte.com.cn>
> >>>
> >>> During boot, the pages in the range [start, end) of a memblock are freed to
> >>> the buddy allocator in a loop: each chunk starts at an order as large as
> >>> possible (less than MAX_ORDER), which is then decreased until the chunk no
> >>> longer extends past end.
> >>>
> >>> However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
> >>> in some cases.
> >>> Instead, *__ffs(end - start)* may be more appropriate and meaningful.
> >>>
> >>> Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
> >> Reverting this commit on the top of today's linux-next fixed boot crashes on
> >> multiple NUMA systems.
> >
> > I confirm. Reverting commit 4df001639c84 ("mm/memblock: use a more
> > appropriate order calculation when free memblock pages") on top of linux
> > next-20201204 fixed booting of my ARM32bit test systems.
>
>
> FWIW, I also confirm that this is causing several 32-bit Tegra platforms
> to crash on boot and reverting this fixes the problem.

I had the same experience on an arm64 system.

Cheers,
Anders


* Re: [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages
  2020-12-05 17:09         ` Anders Roxell
@ 2020-12-05 17:12           ` Anders Roxell
  0 siblings, 0 replies; 8+ messages in thread
From: Anders Roxell @ 2020-12-05 17:12 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Marek Szyprowski, Qian Cai, carver4lio, rppt, Andrew Morton,
	Linux-MM, Linux Kernel Mailing List, Hailong Liu,
	Stephen Rothwell, Linux Next Mailing List,
	Bartlomiej Zolnierkiewicz, linux-tegra

On Sat, 5 Dec 2020 at 18:09, Anders Roxell <anders.roxell@linaro.org> wrote:
>
> On Fri, 4 Dec 2020 at 18:44, Jon Hunter <jonathanh@nvidia.com> wrote:
> >
> >
> > On 04/12/2020 16:07, Marek Szyprowski wrote:
> > > Hi All,
> > >
> > > On 04.12.2020 14:42, Qian Cai wrote:
> > >> On Thu, 2020-12-03 at 23:23 +0800, carver4lio@163.com wrote:
> > >>> From: Hailong Liu <liu.hailong6@zte.com.cn>
> > >>>
> > >>> During boot, the pages in the range [start, end) of a memblock are freed to
> > >>> the buddy allocator in a loop: each chunk starts at an order as large as
> > >>> possible (less than MAX_ORDER), which is then decreased until the chunk no
> > >>> longer extends past end.
> > >>>
> > >>> However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
> > >>> in some cases.
> > >>> Instead, *__ffs(end - start)* may be more appropriate and meaningful.
> > >>>
> > >>> Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
> > >> Reverting this commit on the top of today's linux-next fixed boot crashes on
> > >> multiple NUMA systems.
> > >
> > > I confirm. Reverting commit 4df001639c84 ("mm/memblock: use a more
> > > appropriate order calculation when free memblock pages") on top of linux
> > > next-20201204 fixed booting of my ARM32bit test systems.
> >
> >
> > FWIW, I also confirm that this is causing several 32-bit Tegra platforms
> > to crash on boot and reverting this fixes the problem.
>
> I had the same experience on an arm64 system.

This is the log that I see:

[    0.000000][    T0] percpu: Embedded 507 pages/cpu s2036568 r8192 d31912 u2076672
[    0.000000][    T0] Detected VIPT I-cache on CPU0
[    0.000000][    T0] CPU features: detected: ARM erratum 845719
[    0.000000][    T0] CPU features: GIC system register CPU interface present but disabled by higher exception level
[    0.000000][    T0] CPU features: kernel page table isolation forced OFF by kpti command line option
[    0.000000][    T0] Built 1 zonelists, mobility grouping on.  Total pages: 516096
[    0.000000][    T0] Policy zone: DMA
[    0.000000][    T0] Kernel command line: root=/dev/root rootfstype=9p rootflags=trans=virtio console=ttyAMA0,38400n8 earlycon=pl011,0x9000000 initcall_debug softlockup_panic=0 security=none kpti=no
[    0.000000][    T0] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000][    T0] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.000000][    T0] mem auto-init: stack:off, heap alloc:on, heap free:on
[    0.000000][    T0] mem auto-init: clearing system memory may take some time...
[    0.000000][    T0] page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x40010
[    0.000000][    T0] flags: 0x1fffe0000000000()
[    0.000000][    T0] raw: 01fffe0000000000 fffffc0000000408 fffffc0000000408 0000000000000000
[    0.000000][    T0] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    0.000000][    T0] page dumped because: VM_BUG_ON_PAGE(pfn & ((1 << order) - 1))
[    0.000000][    T0] ------------[ cut here ]------------
[    0.000000][    T0] kernel BUG at mm/page_alloc.c:1015!
[    0.000000][    T0] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[    0.000000][    T0] Modules linked in:
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc6-next-20201204-00010-g7f8e9106f747-dirty #1
[    0.000000][    T0] Hardware name: linux,dummy-virt (DT)
[    0.000000][    T0] pstate: 40400085 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
[    0.000000][    T0] pc : __free_one_page+0x14c/0x700
[    0.000000][    T0] lr : __free_one_page+0x14c/0x700
[    0.000000][    T0] sp : ffff800013fd7c10
[    0.000000][    T0] x29: ffff800013fd7c10 x28: 0000000000000000
[    0.000000][    T0] x27: 0000000000000200 x26: 0000000000000001
[    0.000000][    T0] x25: 0000000000000000 x24: 0000000000000009
[    0.000000][    T0] x23: ffff00007dbfbd40 x22: fffffc0000000400
[    0.000000][    T0] x21: 0000000000040010 x20: 0000000000000009
[    0.000000][    T0] x19: 00000000000001ff x18: 0000000000000000
[    0.000000][    T0] x17: 0000000000000000 x16: 0000000000000000
[    0.000000][    T0] x15: 0000000000000000 x14: 0000000000000000
[    0.000000][    T0] x13: 0000000000000000 x12: ffff70000281852d
[    0.000000][    T0] x11: 1ffff0000281852c x10: ffff70000281852c
[    0.000000][    T0] x9 : dfff800000000000 x8 : ffff8000140c2960
[    0.000000][    T0] x7 : 0000000000000001 x6 : 00008ffffd7e7ad4
[    0.000000][    T0] x5 : 0000000000000000 x4 : 0000000000000000
[    0.000000][    T0] x3 : ffff80001400ab00 x2 : 0000000000000000
[    0.000000][    T0] x1 : 0000000000000000 x0 : 0000000000000000
[    0.000000][    T0] Call trace:
[    0.000000][    T0]  __free_one_page+0x14c/0x700
[    0.000000][    T0]  free_one_page+0xf0/0x120
[    0.000000][    T0]  __free_pages_ok+0x720/0x780
[    0.000000][    T0]  __free_pages_core+0x240/0x280
[    0.000000][    T0]  memblock_free_pages+0x40/0x50
[    0.000000][    T0]  free_low_memory_core_early+0x230/0x2f0
[    0.000000][    T0]  memblock_free_all+0x28/0x58
[    0.000000][    T0]  mem_init+0xf0/0x10c
[    0.000000][    T0]  mm_init+0xb4/0xe8
[    0.000000][    T0]  start_kernel+0x1e0/0x520
[    0.000000][    T0] Code: 913a8021 aa1603e0 91030021 97fe7ec6 (d4210000)
[    0.000000][    T0] random: get_random_bytes called from oops_exit+0x50/0xa0 with crng_init=0
[    0.000000][    T0] ---[ end trace 0000000000000000 ]---
[    0.000000][    T0] Kernel panic - not syncing: Oops - BUG: Fatal exception
[    0.000000][    T0] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

Cheers,
Anders


* Re: [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages
  2020-12-03 15:23 [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages carver4lio
  2020-12-04 13:42 ` Qian Cai
@ 2020-12-06 11:55 ` Mike Rapoport
  2020-12-06 14:21   ` carver4lio
  1 sibling, 1 reply; 8+ messages in thread
From: Mike Rapoport @ 2020-12-06 11:55 UTC (permalink / raw)
  To: carver4lio; +Cc: akpm, linux-mm, linux-kernel, Hailong Liu

On Thu, Dec 03, 2020 at 11:23:10PM +0800, carver4lio@163.com wrote:
> From: Hailong Liu <liu.hailong6@zte.com.cn>
> 
> During boot, the pages in the range [start, end) of a memblock are freed to
> the buddy allocator in a loop: each chunk starts at an order as large as
> possible (less than MAX_ORDER), which is then decreased until the chunk no
> longer extends past end.
> 
> However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
> in some cases.

Do you have examples?
What is the memory configuration that causes suboptimal order selection
and what is the order in this case?

> Instead, *__ffs(end - start)* may be more appropriate and meaningful.

As several people reported, using __ffs(end - start) is not correct.
If the order selection is indeed suboptimal we'd need some better
formula ;-)

> Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
> ---
>  mm/memblock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b68ee8678..7c6d0dde7 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1931,7 +1931,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
>  	int order;
>  
>  	while (start < end) {
> -		order = min(MAX_ORDER - 1UL, __ffs(start));
> +		order = min(MAX_ORDER - 1UL, __ffs(end - start));
>  
>  		while (start + (1UL << order) > end)
>  			order--;
> -- 
> 2.17.1
> 
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages
  2020-12-06 11:55 ` Mike Rapoport
@ 2020-12-06 14:21   ` carver4lio
  0 siblings, 0 replies; 8+ messages in thread
From: carver4lio @ 2020-12-06 14:21 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: akpm, linux-mm, linux-kernel, Hailong Liu

On 12/6/20 7:55 PM, Mike Rapoport wrote:
> On Thu, Dec 03, 2020 at 11:23:10PM +0800, carver4lio@163.com wrote:
>> From: Hailong Liu <liu.hailong6@zte.com.cn>
>>
>> During boot, the pages in the range [start, end) of a memblock are freed to
>> the buddy allocator in a loop: each chunk starts at an order as large as
>> possible (less than MAX_ORDER), which is then decreased until the chunk no
>> longer extends past end.
>>
>> However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
>> in some cases.
> 
> Do you have examples?
> What is the memory configuration that causes suboptimal order selection
> and what is the order in this case?
> 
I'm sorry for my careless and inadequate testing (I only tested it on my x86
machine with 8 cores).

On my x86_64 machine, the layout of RAM looks like:
/ # cat /proc/iomem
00000100-00000fff : reserved
00001000-0009c7ff : System RAM
0009c800-0009ffff : reserved
.....
100000000-22dffffff : System RAM
  22c600000-22d0e01c0 : Kernel code
  22d0e01c1-22d96af3f : Kernel data
  22dae5000-22dbdcfff : Kernel bss
22e000000-22fffffff : RAM buffer

On my machine, I noticed that when the order derived from a start pfn is less
than MAX_ORDER, e.g. for the start physical address 0x00001000 (pfn 1), the
*order* returned by *min(MAX_ORDER - 1UL, __ffs(start))* is 0 (a single page),
but the free page span of the memblock is larger than that, so I guessed the
order should be derived from (end - start).

I tested my idea with some instrumentation code like this:
diff --git a/mm/memblock.c b/mm/memblock.c
index b68ee86788af..b0143e3f75db 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1928,18 +1928,23 @@ early_param("memblock", early_memblock);

 static void __init __free_pages_memory(unsigned long start, unsigned long end)
 {
-       int order;
+       int order, loop_cnt = 0, adjust_cnt = 0;
+       unsigned long orig_start = start;

        while (start < end) {
                order = min(MAX_ORDER - 1UL, __ffs(start));

-               while (start + (1UL << order) > end)
+               while (start + (1UL << order) > end) {
                        order--;
-
+                       adjust_cnt++;
+               }
                memblock_free_pages(pfn_to_page(start), start, order);

                start += (1UL << order);
+               loop_cnt++;
        }
+       pr_info("TST:[start %lu, end %lu]: loop cnt %d, adjust cnt %d\n",
+               orig_start, end, loop_cnt, adjust_cnt);
 }

If I change __ffs(start) to __ffs(end - start), the printed info shows a
smaller loop_cnt and adjust_cnt on my machine.
 
>> Instead, *__ffs(end - start)* may be more appropriate and meaningful.
> 
> As several people reported, using __ffs(end - start) is not correct.
> If the order selection is indeed suboptimal we'd need some better
> formula ;-)
> 
>> Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
>> ---
>>  mm/memblock.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index b68ee8678..7c6d0dde7 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1931,7 +1931,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
>>  	int order;
>>  
>>  	while (start < end) {
>> -		order = min(MAX_ORDER - 1UL, __ffs(start));
>> +		order = min(MAX_ORDER - 1UL, __ffs(end - start));
>>  
>>  		while (start + (1UL << order) > end)
>>  			order--;
>> -- 
>> 2.17.1
>>
>>
> 


