All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64
@ 2018-04-05  8:04 ` Jia He
  0 siblings, 0 replies; 28+ messages in thread
From: Jia He @ 2018-04-05  8:04 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Mark Rutland,
	Ard Biesheuvel, Andrew Morton, Michal Hocko
  Cc: Wei Yang, Kees Cook, Laura Abbott, Vladimir Murzin,
	Philip Derrin, AKASHI Takahiro, James Morse, Steve Capper,
	Pavel Tatashin, Gioh Kim, Vlastimil Babka, Mel Gorman,
	Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
	Andrey Ryabinin, Nikolay Borisov, Daniel Jordan, Daniel Vacek,
	Eugeniu Rosca, linux-arm-kernel, linux-kernel, linux-mm, Jia He

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") tried to optimize the loop in memmap_init_zone(). But
there is still some room for improvement.

Patch 1 remain the memblock_next_valid_pfn on arm and arm64
Patch 2 optimizes the memblock_next_valid_pfn()
Patch 3~5 optimizes the early_pfn_valid()

As for the performance improvement, after this set, I can see the time
overhead of memmap_init() is reduced from 41313 us to 24345 us in my
armv8a server(QDF2400 with 96G memory).

Attached the memblock region information in my server.
[   86.956758] Zone ranges:
[   86.959452]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
[   86.966041]   Normal   [mem 0x0000000100000000-0x00000017ffffffff]
[   86.972631] Movable zone start for each node
[   86.977179] Early memory node ranges
[   86.980985]   node   0: [mem 0x0000000000200000-0x000000000021ffff]
[   86.987666]   node   0: [mem 0x0000000000820000-0x000000000307ffff]
[   86.994348]   node   0: [mem 0x0000000003080000-0x000000000308ffff]
[   87.001029]   node   0: [mem 0x0000000003090000-0x00000000031fffff]
[   87.007710]   node   0: [mem 0x0000000003200000-0x00000000033fffff]
[   87.014392]   node   0: [mem 0x0000000003410000-0x000000000563ffff]
[   87.021073]   node   0: [mem 0x0000000005640000-0x000000000567ffff]
[   87.027754]   node   0: [mem 0x0000000005680000-0x00000000056dffff]
[   87.034435]   node   0: [mem 0x00000000056e0000-0x00000000086fffff]
[   87.041117]   node   0: [mem 0x0000000008700000-0x000000000871ffff]
[   87.047798]   node   0: [mem 0x0000000008720000-0x000000000894ffff]
[   87.054479]   node   0: [mem 0x0000000008950000-0x0000000008baffff]
[   87.061161]   node   0: [mem 0x0000000008bb0000-0x0000000008bcffff]
[   87.067842]   node   0: [mem 0x0000000008bd0000-0x0000000008c4ffff]
[   87.074524]   node   0: [mem 0x0000000008c50000-0x0000000008e2ffff]
[   87.081205]   node   0: [mem 0x0000000008e30000-0x0000000008e4ffff]
[   87.087886]   node   0: [mem 0x0000000008e50000-0x0000000008fcffff]
[   87.094568]   node   0: [mem 0x0000000008fd0000-0x000000000910ffff]
[   87.101249]   node   0: [mem 0x0000000009110000-0x00000000092effff]
[   87.107930]   node   0: [mem 0x00000000092f0000-0x000000000930ffff]
[   87.114612]   node   0: [mem 0x0000000009310000-0x000000000963ffff]
[   87.121293]   node   0: [mem 0x0000000009640000-0x000000000e61ffff]
[   87.127975]   node   0: [mem 0x000000000e620000-0x000000000e64ffff]
[   87.134657]   node   0: [mem 0x000000000e650000-0x000000000fffffff]
[   87.141338]   node   0: [mem 0x0000000010800000-0x0000000017feffff]
[   87.148019]   node   0: [mem 0x000000001c000000-0x000000001c00ffff]
[   87.154701]   node   0: [mem 0x000000001c010000-0x000000001c7fffff]
[   87.161383]   node   0: [mem 0x000000001c810000-0x000000007efbffff]
[   87.168064]   node   0: [mem 0x000000007efc0000-0x000000007efdffff]
[   87.174746]   node   0: [mem 0x000000007efe0000-0x000000007efeffff]
[   87.181427]   node   0: [mem 0x000000007eff0000-0x000000007effffff]
[   87.188108]   node   0: [mem 0x000000007f000000-0x00000017ffffffff]
[   87.194791] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]

Without this patchset:
[  117.106153] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[  117.113677] before memmap_init
[  117.118195] after  memmap_init
>>> memmap_init takes 4518 us
[  117.121446] before memmap_init
[  117.154992] after  memmap_init
>>> memmap_init takes 33546 us
[  117.158241] before memmap_init
[  117.161490] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 41313 us

With this patchset:
[   87.194791] Initmem setup node 0 [mem 0x0000000000200000-0x00000017ffffffff]
[   87.202314] before memmap_init
[   87.206164] after  memmap_init
>>> memmap_init takes 3850 us
[   87.209416] before memmap_init
[   87.226662] after  memmap_init
>>> memmap_init takes 17246 us
[   87.229911] before memmap_init
[   87.233160] after  memmap_init
>>> memmap_init takes 3249 us
>>> totally takes 24345 us

Changelog:
V7: - fix i386 compilation error. refine the commit description
V6: - simplify the codes, move arm/arm64 common codes to one file.
    - refine patches as suggested by Danial Vacek and Ard Biesheuvel
V5: - further refining as suggested by Danial Vacek. Make codes
      arm/arm64 more arch specific
V4: - refine patches as suggested by Danial Vacek and Wei Yang
    - optimized on arm besides arm64
V3: - fix 2 issues reported by kbuild test robot
V2: - rebase to mmotm latest
    - remain memblock_next_valid_pfn on arm64
    - refine memblock_search_pfn_regions and pfn_valid_region

Jia He (5):
  mm: page_alloc: remain memblock_next_valid_pfn() on arm and arm64
  arm: arm64: page_alloc: reduce unnecessary binary search in
    memblock_next_valid_pfn()
  mm/memblock: introduce memblock_search_pfn_regions()
  arm: arm64: introduce pfn_valid_region()
  mm: page_alloc: reduce unnecessary binary search in early_pfn_valid()

 arch/arm/mm/init.c           |  1 +
 arch/arm64/mm/init.c         |  1 +
 include/linux/arm96_common.h | 76 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/memblock.h     |  2 ++
 include/linux/mmzone.h       | 18 ++++++++++-
 mm/memblock.c                |  9 ++++++
 mm/page_alloc.c              |  2 +-
 7 files changed, 107 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/arm96_common.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-04-08  2:05 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-05  8:04 [PATCH v7 0/5] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64 Jia He
2018-04-05  8:04 ` Jia He
2018-04-05  8:04 ` [PATCH v7 1/5] mm: page_alloc: remain memblock_next_valid_pfn() " Jia He
2018-04-05  8:04   ` Jia He
2018-04-05 11:23   ` Matthew Wilcox
2018-04-05 11:23     ` Matthew Wilcox
2018-04-05 12:29     ` Jia He
2018-04-05 12:29       ` Jia He
2018-04-05  8:04 ` [PATCH v7 2/5] arm: arm64: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn() Jia He
2018-04-05  8:04   ` Jia He
2018-04-05 11:34   ` Matthew Wilcox
2018-04-05 11:34     ` Matthew Wilcox
2018-04-05 12:44     ` Jia He
2018-04-05 12:44       ` Jia He
2018-04-05 12:50       ` Matthew Wilcox
2018-04-05 12:50         ` Matthew Wilcox
2018-04-06  9:09         ` Russell King - ARM Linux
2018-04-06  9:09           ` Russell King - ARM Linux
2018-04-06 10:23           ` Daniel Vacek
2018-04-06 10:23             ` Daniel Vacek
2018-04-08  2:05           ` Jia He
2018-04-08  2:05             ` Jia He
2018-04-05  8:04 ` [PATCH v7 3/5] mm/memblock: introduce memblock_search_pfn_regions() Jia He
2018-04-05  8:04   ` Jia He
2018-04-05  8:04 ` [PATCH v7 4/5] arm: arm64: introduce pfn_valid_region() Jia He
2018-04-05  8:04   ` Jia He
2018-04-05  8:04 ` [PATCH v7 5/5] mm: page_alloc: reduce unnecessary binary search in early_pfn_valid() Jia He
2018-04-05  8:04   ` Jia He

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.