On 04/13/2015 06:16 AM, Mel Gorman wrote:
> Memory initialisation had been identified as one of the reasons why large
> machines take a long time to boot. Patches were posted a long time ago
> that attempted to move deferred initialisation into the page allocator
> paths. This was rejected on the grounds it should not be necessary to hurt
> the fast paths to parallelise initialisation. This series reuses much of
> the work from that time but defers the initialisation of memory to kswapd
> so that one thread per node initialises memory local to that node. The
> issue is that on the machines I tested with, memory initialisation was not
> a major contributor to boot times. I'm posting the RFC to both review the
> series and see if it actually helps users of very large machines.
>
> After applying the series and setting the appropriate Kconfig variable I
> see this in the boot log on a 64G machine
>
> [ 7.383764] kswapd 0 initialised deferred memory in 188ms
> [ 7.404253] kswapd 1 initialised deferred memory in 208ms
> [ 7.411044] kswapd 3 initialised deferred memory in 216ms
> [ 7.411551] kswapd 2 initialised deferred memory in 216ms
>
> On a 1TB machine, I see
>
> [ 11.913324] kswapd 0 initialised deferred memory in 1168ms
> [ 12.220011] kswapd 2 initialised deferred memory in 1476ms
> [ 12.245369] kswapd 3 initialised deferred memory in 1500ms
> [ 12.271680] kswapd 1 initialised deferred memory in 1528ms
>
> Once booted the machine appears to work as normal. Boot times were measured
> from the time shutdown was called until ssh was available again. In the
> 64G case, the boot time savings are negligible. On the 1TB machine, the
> savings were 10 seconds (about 8% improvement on kernel times but 1-2%
> overall as POST takes so long).
>
> It would be nice if the people that have access to really large machines
> would test this series and report back if the complexity is justified.
>
> Patches are against 4.0-rc7.
>
>  Documentation/kernel-parameters.txt |   8 +
>  arch/ia64/mm/numa.c                 |  19 +-
>  arch/x86/Kconfig                    |   2 +
>  include/linux/memblock.h            |  18 ++
>  include/linux/mm.h                  |   8 +-
>  include/linux/mmzone.h              |  37 +++-
>  init/main.c                         |   1 +
>  mm/Kconfig                          |  29 +++
>  mm/bootmem.c                        |   6 +-
>  mm/internal.h                       |  23 ++-
>  mm/memblock.c                       |  34 ++-
>  mm/mm_init.c                        |   9 +-
>  mm/nobootmem.c                      |   7 +-
>  mm/page_alloc.c                     | 398 +++++++++++++++++++++++++++++++-----
>  mm/vmscan.c                         |   6 +-
>  15 files changed, 507 insertions(+), 98 deletions(-)
>

I applied your patch series to the 4.0 kernel and booted a 16-socket, 12-TB
machine. I measured the elapsed time from the elilo prompt to the
availability of ssh login. Without the patch, the bootup time was 404s;
with the patch, it was reduced to 298s. So the series cut about 100s, or
roughly a quarter, off the total bootup time.

However, there were two bootup problems in the dmesg log that need to be
addressed:

1. Two vmalloc allocation failures:

[ 2.284686] vmalloc: allocation failure, allocated 16578404352 of 17179873280 bytes
[ 10.399938] vmalloc: allocation failure, allocated 7970922496 of 8589938688 bytes

2. Two soft lockup warnings:

[ 57.319453] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/0:1]
[ 85.409263] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/0:1]

Once those problems are fixed, the patch should be in pretty good shape.
I have attached the dmesg log for your reference.

Cheers,
Longman