linux-mm.kvack.org archive mirror
* Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
@ 2020-12-03 10:51 Paul Menzel
  2020-12-03 12:25 ` David Hildenbrand
  2020-12-04  8:05 ` Feng Tang
  0 siblings, 2 replies; 9+ messages in thread
From: Paul Menzel @ 2020-12-03 10:51 UTC (permalink / raw)
  To: Feng Tang; +Cc: linux-mm, Arjan van de Ven

Dear Feng,


I am trying to reduce the startup time of Debian’s Linux 5.9.9 on an 
Intel Kaby Lake system with 32 GB of memory (TUXEDO Book BU1406 (Clevo 
N240BU)). In the slides of your Linux Plumbers Conference 2019 talk 
*Linux Kernel Fastboot On the Way* [1], you mention *Deferred Memory Init*:

> Deferred Memory Init
> 
> •   8GB RAM’s initialization costs 100+ ms
> •   In early boot phase, we don’t need that much memory
> •   Utilize the memory hotplug feature
>     •   “mem=4096m” in cmdline to only init 2 GB
>     •   Use systemd service to add rest memory in parallel

Starting Linux with `mem=2G` indeed reduces the startup time, but I am 
unable to get the rest of the memory online. Compared with a boot 
without `mem=2G`, the `memoryX` devices for the remaining memory are 
missing under `/sys/devices/system/memory/`.

With `mem=2G`:

```
$ lsmem --output-all
RANGE                                  SIZE  STATE REMOVABLE BLOCK NODE ZONES
0x0000000000000000-0x0000000007ffffff  128M online       yes     0    0  None
0x0000000008000000-0x000000007fffffff  1,9G online       yes  1-15    0 DMA32

Memory block size:       128M
Total online memory:       2G
Total offline memory:      0B
$ ls -d /sys/devices/system/memory/memory*
/sys/devices/system/memory/memory0   /sys/devices/system/memory/memory2
/sys/devices/system/memory/memory1   /sys/devices/system/memory/memory3
/sys/devices/system/memory/memory10  /sys/devices/system/memory/memory4
/sys/devices/system/memory/memory11  /sys/devices/system/memory/memory5
/sys/devices/system/memory/memory12  /sys/devices/system/memory/memory6
/sys/devices/system/memory/memory13  /sys/devices/system/memory/memory7
/sys/devices/system/memory/memory14  /sys/devices/system/memory/memory8
/sys/devices/system/memory/memory15  /sys/devices/system/memory/memory9
```

Without `mem=2G`:

```
$ lsmem --output-all
RANGE                                  SIZE  STATE REMOVABLE  BLOCK NODE  ZONES
0x0000000000000000-0x0000000007ffffff  128M online       yes      0    0   None
0x0000000008000000-0x0000000087ffffff    2G online       yes   1-16    0  DMA32
0x0000000088000000-0x000000008fffffff  128M online       yes     17    0   None
0x0000000100000000-0x0000000867ffffff 29,6G online       yes 32-268    0 Normal
0x0000000868000000-0x000000086fffffff  128M online       yes    269    0   None

Memory block size:       128M
Total online memory:      32G
Total offline memory:      0B
```

Can the deferred memory initialization be done with the upstream Linux 
kernel, or were you using patches on top?


Kind regards,

Paul


[1]: 
https://www.linuxplumbersconf.org/event/4/contributions/281/attachments/216/617/LPC_2019_kernel_fastboot_on_the_way.pdf



* Re: Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
  2020-12-03 10:51 Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`? Paul Menzel
@ 2020-12-03 12:25 ` David Hildenbrand
  2020-12-03 12:52   ` Paul Menzel
  2020-12-04  1:17   ` Feng Tang
  2020-12-04  8:05 ` Feng Tang
  1 sibling, 2 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-12-03 12:25 UTC (permalink / raw)
  To: Paul Menzel, Feng Tang; +Cc: linux-mm, Arjan van de Ven

On 03.12.20 11:51, Paul Menzel wrote:
> Dear Feng,
> 
> 
> I am trying to reduce the startup time of Debian’s Linux 5.9.9 on a 
> Intel Kaby Lake system with 32 GB of memory (TUXEDO Book BU1406 (Clevo 
> N240BU)). On your Linux Plumbers Conference 2019 slides of your talk 
> *Linux Kernel Fastboot On the Way* [1], you mention *Deferred Memory Init*:
> 
>> Deferred Memory Init
>>
>> •   8GB RAM’s initialization costs 100+ ms
>> •   In early boot phase, we don’t need that much memory
>> •   Utilize the memory hotplug feature
>>     •   “mem=4096m” in cmdline to only init 2 GB
>>     •   Use systemd service to add rest memory in parallel

Uh, that sounds very wrong and flawed.

Even if you were adding and onlining memory in parallel, the memory
hotplug/onlining code runs strictly sequentially. This does not work.

And I question this approach in general.

We do have deferred meminit in the kernel during boot that can
initialize memory in parallel.


-- 
Thanks,

David / dhildenb




* Re: Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
  2020-12-03 12:25 ` David Hildenbrand
@ 2020-12-03 12:52   ` Paul Menzel
  2020-12-03 13:06     ` David Hildenbrand
  2020-12-04  1:17   ` Feng Tang
  1 sibling, 1 reply; 9+ messages in thread
From: Paul Menzel @ 2020-12-03 12:52 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-mm, Arjan van de Ven, Feng Tang

Dear David,


Thank you for the quick response.


Am 03.12.20 um 13:25 schrieb David Hildenbrand:
> On 03.12.20 11:51, Paul Menzel wrote:

>> I am trying to reduce the startup time of Debian’s Linux 5.9.9 on a
>> Intel Kaby Lake system with 32 GB of memory (TUXEDO Book BU1406 (Clevo
>> N240BU)).

[…]

> We do have deferred meminit in the kernel during boot that can
> initialize memory in parallel.

Is that used automatically, or do I need to activate it somehow?


Kind regards,

Paul



* Re: Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
  2020-12-03 12:52   ` Paul Menzel
@ 2020-12-03 13:06     ` David Hildenbrand
  2020-12-03 20:58       ` Daniel Jordan
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-12-03 13:06 UTC (permalink / raw)
  To: Paul Menzel; +Cc: linux-mm, Arjan van de Ven, Feng Tang

On 03.12.20 13:52, Paul Menzel wrote:
> Dear David,
> 
> 
> Thank you for the quick response.
> 
> 
> Am 03.12.20 um 13:25 schrieb David Hildenbrand:
>> On 03.12.20 11:51, Paul Menzel wrote:
> 
>>> I am trying to reduce the startup time of Debian’s Linux 5.9.9 on a
>>> Intel Kaby Lake system with 32 GB of memory (TUXEDO Book BU1406 (Clevo
>>> N240BU)).
> 
> […]
> 
>> We do have deferred meminit in the kernel during boot that can
>> initialize memory in parallel.
> 
> Is that used automatically, or do I need to activate it somehow?

If your kernel is compiled with

CONFIG_DEFERRED_STRUCT_PAGE_INIT

it should be enabled automatically.


config DEFERRED_STRUCT_PAGE_INIT
	bool "Defer initialisation of struct pages to kthreads"
	depends on SPARSEMEM
	depends on !NEED_PER_CPU_KM
	depends on 64BIT
	select PADATA
	help
	  Ordinarily all struct pages are initialised during early boot in a
	  single thread. On very large machines this can take a considerable
	  amount of time. If this option is set, large machines will bring up
	  a subset of memmap at boot and then initialise the rest in parallel.
	  This has a potential performance impact on tasks running early in the
	  lifetime of the system until these kthreads finish the
	  initialisation.
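
A minimal sketch for checking this on a running system. It assumes the
distribution installs the kernel configuration as a plain-text file such
as /boot/config-$(uname -r), as Debian does; `has_deferred_init` is a
hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Succeed if the given kernel config file enables deferred struct page init.
# Assumption: the config is a plain-text file, e.g. /boot/config-$(uname -r).
has_deferred_init() {
    grep -q '^CONFIG_DEFERRED_STRUCT_PAGE_INIT=y$' "$1"
}
```

For example: has_deferred_init "/boot/config-$(uname -r)" && echo enabled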

-- 
Thanks,

David / dhildenb




* Re: Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
  2020-12-03 13:06     ` David Hildenbrand
@ 2020-12-03 20:58       ` Daniel Jordan
  2020-12-04  7:31         ` Paul Menzel
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Jordan @ 2020-12-03 20:58 UTC (permalink / raw)
  To: David Hildenbrand, Paul Menzel; +Cc: linux-mm, Arjan van de Ven, Feng Tang

David Hildenbrand <david@redhat.com> writes:

> On 03.12.20 13:52, Paul Menzel wrote:
>> Dear David,
>> 
>> 
>> Thank you for the quick response.
>> 
>> 
>> Am 03.12.20 um 13:25 schrieb David Hildenbrand:
>>> On 03.12.20 11:51, Paul Menzel wrote:
>> 
>>>> I am trying to reduce the startup time of Debian’s Linux 5.9.9 on a
>>>> Intel Kaby Lake system with 32 GB of memory (TUXEDO Book BU1406 (Clevo
>>>> N240BU)).
>> 
>> […]
>> 
>>> We do have deferred meminit in the kernel during boot that can
>>> initialize memory in parallel.
>> 
>> Is that used automatically, or do I need to activate it somehow?
>
> If your kernel is compiled with
>
> CONFIG_DEFERRED_STRUCT_PAGE_INIT
>
> it should be enabled automatically.
>
>
> config DEFERRED_STRUCT_PAGE_INIT
> 	bool "Defer initialisation of struct pages to kthreads"
> 	depends on SPARSEMEM
> 	depends on !NEED_PER_CPU_KM
> 	depends on 64BIT
> 	select PADATA
> 	help
> 	  Ordinarily all struct pages are initialised during early boot in a
> 	  single thread. On very large machines this can take a considerable
> 	  amount of time. If this option is set, large machines will bring up
> 	  a subset of memmap at boot and then initialise the rest in parallel.
> 	  This has a potential performance impact on tasks running early in the
> 	  lifetime of the system until these kthreads finish the
> 	  initialisation.

Hello Paul,

If it is enabled, what does

dmesg | grep 'deferred pages'

give you?  And assuming you're running systemd, what does
systemd-analyze show you?

Thanks.



* Re: Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
  2020-12-03 12:25 ` David Hildenbrand
  2020-12-03 12:52   ` Paul Menzel
@ 2020-12-04  1:17   ` Feng Tang
  1 sibling, 0 replies; 9+ messages in thread
From: Feng Tang @ 2020-12-04  1:17 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Paul Menzel, linux-mm, Arjan van de Ven

Hi David,

On Thu, Dec 03, 2020 at 01:25:04PM +0100, David Hildenbrand wrote:
> On 03.12.20 11:51, Paul Menzel wrote:
> > Dear Feng,
> > 
> > 
> > I am trying to reduce the startup time of Debian’s Linux 5.9.9 on a 
> > Intel Kaby Lake system with 32 GB of memory (TUXEDO Book BU1406 (Clevo 
> > N240BU)). On your Linux Plumbers Conference 2019 slides of your talk 
> > *Linux Kernel Fastboot On the Way* [1], you mention *Deferred Memory Init*:
> > 
> >> Deferred Memory Init
> >>
> >> •   8GB RAM’s initialization costs 100+ ms
> >> •   In early boot phase, we don’t need that much memory
> >> •   Utilize the memory hotplug feature
> >>     •   “mem=4096m” in cmdline to only init 2 GB
> >>     •   Use systemd service to add rest memory in parallel
> 
> Uh, that sounds very wrong and flawed.
> 
> Even if you would be adding+onlining memory in parallel, memory
> hotplug/onlining code runs strictly sequential. This does not work.
> 
> And I question this approach in general.
> 
> We do have deferred meminit in the kernel during boot that can
> initialize memory in parallel.

Yes, this is what we can now use.

The slides were written in 2019 and the work was done in 2018 with
4.9 to 4.19 kernels, where this feature was not available.

Interestingly, I called for in-kernel deferred init at LPC, and in the
LPC room Pavel Tatashin told me that they were working on it. It's
great to see it completed; it is really useful for optimizing kernel
boot time.

btw, the user space task to online the other memory was actually some
script, like

for i in $(seq $x $y)	# x..y: numbers of the memory blocks to online
do
	echo online > /sys/devices/system/memory/memory$i/state
done
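
The idea of onlining blocks from user space can be sketched a bit more
defensively as below. This is an illustration only, not the script used
for the slides: `online_offline_blocks` is a hypothetical helper, and
its optional directory argument exists so the loop can be tried on a
scratch directory instead of the real sysfs tree.

```shell
#!/bin/sh
# Online every memory block under ROOT whose state file reads "offline".
# ROOT defaults to the real sysfs directory; writing there needs root.
online_offline_blocks() {
    root=${1:-/sys/devices/system/memory}
    for state in "$root"/memory*/state; do
        [ -f "$state" ] || continue      # glob matched nothing
        if [ "$(cat "$state")" = offline ]; then
            echo online > "$state" || echo "failed: $state" >&2
        fi
    done
}
```

The blocks are still onlined one after another; as noted earlier in the
thread, the kernel's onlining code is sequential anyway.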

Thanks,
Feng

> 
> -- 
> Thanks,
> 
> David / dhildenb



* Re: Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
  2020-12-03 20:58       ` Daniel Jordan
@ 2020-12-04  7:31         ` Paul Menzel
  2020-12-04 19:50           ` Daniel Jordan
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Menzel @ 2020-12-04  7:31 UTC (permalink / raw)
  To: Daniel Jordan, David Hildenbrand; +Cc: linux-mm, Arjan van de Ven, Feng Tang

Dear David, dear Daniel,


Am 03.12.20 um 21:58 schrieb Daniel Jordan:
> David Hildenbrand writes:
> 
>> On 03.12.20 13:52, Paul Menzel wrote:

>>> Am 03.12.20 um 13:25 schrieb David Hildenbrand:
>>>> On 03.12.20 11:51, Paul Menzel wrote:
>>>
>>>>> I am trying to reduce the startup time of Debian’s Linux 5.9.9 on a
>>>>> Intel Kaby Lake system with 32 GB of memory (TUXEDO Book BU1406 (Clevo
>>>>> N240BU)).
>>>
>>> […]
>>>
>>>> We do have deferred meminit in the kernel during boot that can
>>>> initialize memory in parallel.
>>>
>>> Is that used automatically, or do I need to activate it somehow?
>>
>> If your kernel is compiled with
>>
>> CONFIG_DEFERRED_STRUCT_PAGE_INIT
>>
>> it should be enabled automatically.

[…]

Yes, in Debian’s Linux kernel configuration, that option is selected.

> If it is enabled, what does
> 
> dmesg | grep 'deferred pages'
> 
> give you?

     $ grep 'deferred pages' dmesg-full.txt
     [    0.140199] node 0 deferred pages initialised in 40ms

     $ grep 'deferred pages' dmesg-2g.txt
     [    0.077892] node 0 deferred pages initialised in 4ms

> And assuming you're running systemd, what does systemd-analyze show you?

Please find it below. In my experience, unpacking the initrd is also a 
good measurement point, as is the time to write-protect the kernel 
read-only data.

Without `mem=` (32 GB):

     $ dmesg
     […]
     [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.9.0-3-amd64 root=UUID=d23ce27e-5c5c-45fb-bfa8-79a87caff13f ro quiet cryptomgr.notests ipv6.disable=1 log_buf_len=2M initcall_debug init=/lib/systemd/systemd-bootchart
     […]
     [    0.266513] Trying to unpack rootfs image as initramfs...
     [    0.275829] Freeing initrd memory: 4468K
     […]
     [    0.295890] Freeing unused kernel image (initmem) memory: 1640K
     [    0.335585] Write protecting the kernel read-only data: 18432k
     [    0.336261] Freeing unused kernel image (text/rodata gap) memory: 2044K
     [    0.336406] Freeing unused kernel image (rodata/data gap) memory: 292K
     [    0.392213] x86/mm: Checked W+X mappings: passed, no W+X pages found.
     [    0.392213] x86/mm: Checking user space page tables
     [    0.432697] x86/mm: Checked W+X mappings: passed, no W+X pages found.
     [    0.432701] Run /init as init process
     […]
     $ systemd-analyze time
     Startup finished in 3.792s (firmware) + 4.116s (loader) + 767ms (kernel) + 1.414s (userspace) = 10.091s
     graphical.target reached after 1.401s in userspace

With `mem=2G`:

     $ dmesg
     […]
     [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.9.0-3-amd64 root=UUID=d23ce27e-5c5c-45fb-bfa8-79a87caff13f ro quiet cryptomgr.notests ipv6.disable=1 log_buf_len=2M initcall_debug init=/lib/systemd/systemd-bootchart mem=2G
     […]
     [    0.199720] Trying to unpack rootfs image as initramfs...
     [    0.209058] Freeing initrd memory: 4468K
     […]
     [    0.227433] Freeing unused kernel image (initmem) memory: 1640K
     [    0.253494] Write protecting the kernel read-only data: 18432k
     [    0.253898] Freeing unused kernel image (text/rodata gap) memory: 2044K
     [    0.253987] Freeing unused kernel image (rodata/data gap) memory: 292K
     [    0.297996] x86/mm: Checked W+X mappings: passed, no W+X pages found.
     [    0.297997] x86/mm: Checking user space page tables
     [    0.338337] x86/mm: Checked W+X mappings: passed, no W+X pages found.
     [    0.338341] Run /init as init process
     $ systemd-analyze time
     Startup finished in 650ms (kernel) + 3.009s (userspace) = 3.659s
     graphical.target reached after 2.979s in userspace

(No idea why the firmware and loader timestamps are sometimes not 
available.)

So the different memory sizes result in a difference of almost 100 ms 
during start-up (`Run /init` at 0.433 s versus 0.338 s), and 
initializing 32 GB makes the boot time up to starting the init process 
roughly 30 percent longer. (Unfortunately, userspace and several 
drivers also take quite some time later on.)
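
The arithmetic behind these figures can be reproduced with a small
helper. This is only a sketch; `boot_delta` is a made-up name, and the
two arguments are the `Run /init` timestamps (in seconds) from the
dmesg excerpts above.

```shell
#!/bin/sh
# Print the difference between two dmesg timestamps in milliseconds and
# the relative increase of the first over the second in percent.
# Usage: boot_delta FULL_SECONDS LIMITED_SECONDS
boot_delta() {
    awk -v full="$1" -v limited="$2" 'BEGIN {
        printf "%.0f ms, %.1f%%\n",
               (full - limited) * 1000,
               (full / limited - 1) * 100
    }'
}
```

Calling `boot_delta 0.432701 0.338341` gives about 94 ms and just under 
28 percent for the two boots shown here.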

If you have ideas how to reduce the boot time of the full 32 GB, that’d 
be great.


Kind regards,

Paul



* Re: Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
  2020-12-03 10:51 Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`? Paul Menzel
  2020-12-03 12:25 ` David Hildenbrand
@ 2020-12-04  8:05 ` Feng Tang
  1 sibling, 0 replies; 9+ messages in thread
From: Feng Tang @ 2020-12-04  8:05 UTC (permalink / raw)
  To: Paul Menzel; +Cc: linux-mm, Arjan van de Ven

Hi Paul,

On Thu, Dec 03, 2020 at 11:51:58AM +0100, Paul Menzel wrote:
> Dear Feng,
> 
> 
> I am trying to reduce the startup time of Debian’s Linux 5.9.9 on a Intel
> Kaby Lake system with 32 GB of memory (TUXEDO Book BU1406 (Clevo N240BU)).
> On your Linux Plumbers Conference 2019 slides of your talk *Linux Kernel
> Fastboot On the Way* [1], you mention *Deferred Memory Init*:
> 
> >Deferred Memory Init
> >
> >•   8GB RAM’s initialization costs 100+ ms
> >•   In early boot phase, we don’t need that much memory
> >•   Utilize the memory hotplug feature
> >    •   “mem=4096m” in cmdline to only init 2 GB
> >    •   Use systemd service to add rest memory in parallel
> 
> Starting Linux with `mem=2G` indeed reduces the startup time, but I am
> unable to get the rest of the memory online. Comparing it with a boot
> without `mem=2G` the `memoryX`  devices under `/sys/devices/system/memory/`
> are missing.

[...]

> 
> Can the deferred memory initialization be done with the upstream Linux
> kernel, or were you using patches on top?

Yes, it should work with the upstream kernel. You need to do a 'probe'
operation to create the 'memoryX' devices for the deferred memory.

When you use "mem=2G", a large part of the memory in the e820 map is
chopped off, so you see only a few memory devices left in
/sys/devices/system/memory/.

To see the missing devices, check your dmesg log to find the physical
memory address range that was chopped off.

Following is a quick test I did on qemu:

	[    0.000000] BIOS-provided physical RAM map:
	[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
	[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
	[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
	[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdffff] usable  --> real E820 map
	...
	[    0.000000] user-defined physical RAM map:
	[    0.000000] user: [mem 0x0000000000000000-0x000000000009fbff] usable
	[    0.000000] user: [mem 0x000000000009fc00-0x000000000009ffff] reserved
	[    0.000000] user: [mem 0x00000000000f0000-0x00000000000fffff] reserved
	[    0.000000] user: [mem 0x0000000000100000-0x000000005dbfffff] usable	      --> with "mem=xxx" option

We can see that the "user-defined" version has much less "usable"
memory. Figure out the physical address of the missing range; for this
example,

	echo 0x60000000 > /sys/devices/system/memory/probe

will create a new "memoryX" in /sys/devices/system/memory/, and echoing
"online" to the 'state' file inside its folder will bring it online.

There may be more details to take care of, like the memory block size
and alignment; you can read the related code and documentation.
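
As a sketch of the alignment step: the helper below lists the
block-aligned start addresses inside a chopped-off range. `block_starts`
is a hypothetical name, and the sketch assumes the block size is known.
On a real system it can be read from
/sys/devices/system/memory/block_size_bytes, which reports the size in
hex. The probe file itself should only exist on kernels built with
CONFIG_ARCH_MEMORY_PROBE.

```shell
#!/bin/sh
# Print the start address of every memory block that lies fully inside
# [START, END), rounding START up to the next block boundary.
# Usage: block_starts START END BLOCK_SIZE   (decimal or 0x-prefixed hex)
block_starts() {
    start=$(( $1 )); end=$(( $2 )); size=$(( $3 ))
    addr=$(( (start + size - 1) / size * size ))   # round up to boundary
    while [ $(( addr + size )) -le "$end" ]; do
        printf '0x%x\n' "$addr"
        addr=$(( addr + size ))
    done
}
```

For the qemu maps above (usable memory cut at 0x5dc00000, real map
ending at 0x7ffe0000) and assuming 128 MiB blocks,
`block_starts 0x5dc00000 0x7ffe0000 0x8000000` starts at 0x60000000.
Each printed address could then be written to the probe file as root.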

Thanks,
Feng




* Re: Deferred Memory Init: How to bring rest of memory online after limiting it with `mem=XG`?
  2020-12-04  7:31         ` Paul Menzel
@ 2020-12-04 19:50           ` Daniel Jordan
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel Jordan @ 2020-12-04 19:50 UTC (permalink / raw)
  To: Paul Menzel, David Hildenbrand; +Cc: linux-mm, Arjan van de Ven, Feng Tang

Paul Menzel <pmenzel@molgen.mpg.de> writes:
> If you have ideas how to reduce the boot time of the full 32 GB, that’d 
> be great.

The kernel source has scripts/show_delta to see the relative times
between lines of dmesg.  That combined with the initcall_debug arg
you're already using should go some way toward helping you find where
the boot is slow.
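
To illustrate the idea of relative timestamps (this is a stand-in
written in awk, not scripts/show_delta itself; `dmesg_deltas` is a
made-up name), each line can be prefixed with the time elapsed since
the previous timestamped line:

```shell
#!/bin/sh
# Prefix each timestamped dmesg line with the seconds elapsed since the
# previous timestamped line; other lines are passed through unchanged.
# Reads the named files, or standard input if none are given.
dmesg_deltas() {
    awk '{
        line = $0
        if (line ~ /^[ \t]*\[ *[0-9]+\.[0-9]+\]/) {
            ts = line
            sub(/^[ \t]*\[ */, "", ts)   # strip up to the number
            t = ts + 0                   # numeric prefix of the rest
            delta = (seen ? t - prev : 0)
            prev = t; seen = 1
            printf "<+%.6f> %s\n", delta, line
        } else {
            print line
        }
    }' "$@"
}
```

Usage would be something like `dmesg | dmesg_deltas | less`; large
deltas point at the steps worth a closer look.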

Then there are a variety of tools you can use to drill down further if
necessary, with the simplest being some extra printk's in the kernel
paths that the output points you to if you're willing to do that.

You might also do some more runs to see if your numbers are stable.

thanks,
Daniel


