Dear Krzysztof,


On 10.11.21 at 00:10, Krzysztof Wilczyński wrote:

> [...]
>>> I am curious - why is this a problem? Are you power-cycling your servers
>>> so often to the point where the cumulative time spent in enumerating PCI
>>> devices and adding them later to IOMMU groups is a problem?
>>>
>>> I am simply wondering why you decided to single out the PCI enumeration as
>>> slow in particular, especially given that large server hardware tends to
>>> have (most of the time, as per my experience) a rather long initialisation
>>> time either from being powered off or after being power cycled. It can
>>> take a while before the actual operating system itself will start.
>>
>> It's not a problem per se, more a pet peeve of mine. Systems get faster
>> and faster, and boot times slower and slower. On desktop systems it
>> matters much more, with firmware like coreboot taking less than one
>> second to initialize the hardware and pass control to the
>> payload/operating system. If we are lucky, we are going to have servers
>> with FLOSS firmware.
>>
>> But already now, using kexec to reboot a system avoids the problems you
>> pointed out on servers, and being able to reboot a system as quickly as
>> possible lowers the bar for people to reboot systems more often, for
>> example so that updates take effect.
>
> A very good point about the kexec usage.
>
> This is definitely often invaluable to get security updates out of the
> door quickly, update the kernel version, or when you want to switch
> operating systems quickly (a trick that companies like Equinix Metal use
> when offering their bare metal as a service).
>
>>> We talked about this briefly with Bjorn, and there might be an option to
>>> perhaps add some caching, as we suspect that the culprit here is doing
>>> PCI configuration space reads for each device, which can be slow on some
>>> platforms.
>>>
>>> However, we would need to profile this to get some quantitative data to
>>> see whether doing anything would even be worthwhile. It would definitely
>>> help us understand better where the bottlenecks really are and of what
>>> magnitude.
>>>
>>> I personally don't have access to such large hardware as the one you
>>> have access to, thus I was wondering whether you would have some time,
>>> and be willing, to profile this for us on the hardware you have.
>>>
>>> Let me know what you think.
>>
>> Sounds good. I'd be willing to help. Note that I won't have time before
>> Wednesday next week, though.
>
> Not a problem! I am very grateful you are willing to devote some of your
> time to help with this.
>
> I only have access to a few systems such as some commodity hardware like
> a desktop PC and notebooks, and some assorted SoCs. These are sadly not
> even close to proper server platforms, and trying to measure anything on
> these does not really yield any useful data, as the delays related to PCI
> enumeration on startup are quite insignificant in comparison - there is
> just not enough hardware there, so to speak.
>
> I am really looking forward to the data you can gather for us and what
> insight it might provide us with.

So, kexec seems to work, aside from some DMAR-IR warnings [1].

`initcall_debug` increases the Linux boot time by over 50 %, from 7.7 s to
12 s, which I didn't expect.
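For context, `initcall_debug` makes the kernel log every initcall and its
duration to dmesg, which is where the numbers below come from. Picking out
the slow ones can be done with a small helper script along these lines
(entirely my own rough sketch, not anything from the kernel tree; the
200 ms threshold is just the cut-off I chose for the listing below):

#!/usr/bin/env python3
# Rough helper (my own, not from the kernel tree): list all initcalls
# that took longer than 200 ms, based on the lines initcall_debug writes
# to the kernel log, e.g.
#   initcall acpi_init+0x0/0x349 returned 0 after 7291015 usecs
import re
import subprocess

THRESHOLD_US = 200_000  # 200 ms

# Needs enough privileges to read the kernel log.
dmesg = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
pattern = re.compile(r"initcall (\S+) returned \S+ after (\d+) usecs")

slow = []
for line in dmesg.splitlines():
    match = pattern.search(line)
    if match and int(match.group(2)) >= THRESHOLD_US:
        slow.append((int(match.group(2)), match.group(1)))

# Slowest entries last, like the listing below.
for usecs, name in sorted(slow):
    print(f"initcall {name} took {usecs / 1_000_000:.3f} s")

The regular expression only matches the "initcall ... returned ... after
... usecs" lines that `initcall_debug` emits, so it picks up exactly the
entries quoted below.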
Here are the functions taking more than 200 ms:

initcall pci_apply_final_quirks+0x0/0x132 returned 0 after 228433 usecs
initcall raid6_select_algo+0x0/0x2d6 returned 0 after 383789 usecs
initcall pcibios_assign_resources+0x0/0xc0 returned 0 after 610757 usecs
initcall _mpt3sas_init+0x0/0x1c0 returned 0 after 721257 usecs
initcall ahci_pci_driver_init+0x0/0x1a returned 0 after 945094 usecs
initcall pci_iommu_init+0x0/0x3f returned 0 after 1487134 usecs
initcall acpi_init+0x0/0x349 returned 0 after 7291015 usecs

Some of them are run later in the boot process, but `acpi_init` sticks out
with 7.3 s.


Kind regards,

Paul


[1]: https://lore.kernel.org/linux-iommu/40a7581d-985b-f12b-0bb2-99c586a9f968@molgen.mpg.de/T/#u