From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B360C6778F for ; Thu, 26 Jul 2018 13:37:15 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0584F20685 for ; Thu, 26 Jul 2018 13:37:15 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0584F20685
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731082AbeGZOyH (ORCPT ); Thu, 26 Jul 2018 10:54:07 -0400
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:53840 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1729410AbeGZOyH (ORCPT ); Thu, 26 Jul 2018 10:54:07 -0400
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B877881663C5; Thu, 26 Jul 2018 13:37:10 +0000 (UTC)
Received: from localhost (ovpn-8-16.pek2.redhat.com [10.72.8.16]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 205BC2166BA3; Thu, 26 Jul 2018 13:37:08 +0000 (UTC)
Date: Thu, 26 Jul 2018 21:37:05 +0800
From: Baoquan He
To: Michal Hocko
Cc: Andrew Morton, linux-kernel@vger.kernel.org, robh+dt@kernel.org, dan.j.williams@intel.com, nicolas.pitre@linaro.org, josh@joshtriplett.org, fengguang.wu@intel.com, bp@suse.de, andy.shevchenko@gmail.com, patrik.r.jakobsson@gmail.com, airlied@linux.ie, kys@microsoft.com, haiyangz@microsoft.com, sthemmin@microsoft.com, dmitry.torokhov@gmail.com, frowand.list@gmail.com, keith.busch@intel.com, jonathan.derrick@intel.com, lorenzo.pieralisi@arm.com, bhelgaas@google.com, tglx@linutronix.de, brijesh.singh@amd.com, jglisse@redhat.com, thomas.lendacky@amd.com, gregkh@linuxfoundation.org, baiyaowei@cmss.chinamobile.com, richard.weiyang@gmail.com, devel@linuxdriverproject.org, linux-input@vger.kernel.org, linux-nvdimm@lists.01.org, devicetree@vger.kernel.org, linux-pci@vger.kernel.org, ebiederm@xmission.com, vgoyal@redhat.com, dyoung@redhat.com, yinghai@kernel.org, monstr@monstr.eu, davem@davemloft.net, chris@zankel.net, jcmvbkbc@gmail.com, gustavo@padovan.org, maarten.lankhorst@linux.intel.com, seanpaul@chromium.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kexec@lists.infradead.org
Subject: Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
Message-ID: <20180726133705.GM6480@MiWiFi-R3L-srv>
References: <20180718024944.577-1-bhe@redhat.com> <20180718024944.577-5-bhe@redhat.com> <20180718153326.b795e9ea7835432a56cd7011@linux-foundation.org> <20180719151753.GB7147@localhost.localdomain> <20180723143443.GD18181@dhcp22.suse.cz> <20180725064813.GI6480@MiWiFi-R3L-srv> <20180726125957.GH28386@dhcp22.suse.cz> <20180726130904.GL6480@MiWiFi-R3L-srv> <20180726131242.GI28386@dhcp22.suse.cz> <20180726131420.GJ28386@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180726131420.GJ28386@dhcp22.suse.cz>
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 26 Jul 2018 13:37:10 +0000 (UTC)
X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Thu, 26 Jul 2018 13:37:10 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'bhe@redhat.com' RCPT:''
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/26/18 at 03:14pm, Michal Hocko wrote:
> On Thu 26-07-18 15:12:42, Michal Hocko wrote:
> > On Thu 26-07-18 21:09:04, Baoquan He wrote:
> > > On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > > > On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > > > > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > > > > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > > > > > Kexec has been a formal feature in our distro, and customers owning
> > > > > > > those kind of very large machine can make use of this feature to speed
> > > > > > > up the reboot process. On uefi machine, the kexec_file loading will
> > > > > > > search place to put kernel under 4G from top to down. As we know, the
> > > > > > > 1st 4G space is DMA32 ZONE, dma, pci mmcfg, bios etc all try to consume
> > > > > > > it. It may have possibility to not be able to find a usable space for
> > > > > > > kernel/initrd. From the top down of the whole memory space, we don't
> > > > > > > have this worry.
> > > > > >
> > > > > > I do not have the full context here but let me note that you should be
> > > > > > careful when doing top-down reservation because you can easily get into
> > > > > > hotplugable memory and break the hotremove usecase. We even warn when
> > > > > > this is done. See memblock_find_in_range_node
> > > > >
> > > > > Kexec read kernel/initrd file into buffer, just search usable positions
> > > > > for them to do the later copying. You can see below struct kexec_segment,
> > > > > for the old kexec_load, kernel/initrd are read into user space buffer,
> > > > > the @buf stores the user space buffer address, @mem stores the position
> > > > > where kernel/initrd will be put.
> > > > > In kernel, it calls
> > > > > kimage_load_normal_segment() to copy user space buffer to intermediate
> > > > > pages which are allocated with flag GFP_KERNEL. These intermediate pages
> > > > > are recorded as entries, later when user execute "kexec -e" to trigger
> > > > > kexec jumping, it will do the final copying from the intermediate pages
> > > > > to the real destination pages which @mem pointed. Because we can't touch
> > > > > the existed data in 1st kernel when do kexec kernel loading. With my
> > > > > understanding, GFP_KERNEL will make those intermediate pages be
> > > > > allocated inside immovable area, it won't impact hotplugging. But the
> > > > > @mem we searched in the whole system RAM might be lost along with
> > > > > hotplug. Hence we need do kexec kernel again when hotplug event is
> > > > > detected.
> > > >
> > > > I am not sure I am following. If @mem is placed at movable node then the
> > > > memory hotremove simply won't work, because we are seeing reserved pages
> > > > and do not know what to do about them. They are not migrateable.
> > > > Allocating intermediate pages from other nodes doesn't really help.
> > >
> > > OK, I forgot the 2nd kernel which kexec jump into. It won't impact hotremove
> > > in 1st kernel, it does impact the kernel which kexec jump into if kernel
> > > is at top of system RAM and the top RAM is in movable node.
> >
> > It will affect the 1st kernel (which does the memblock allocation
> > top-down) as well. For reasons mentioned above.
>
> And btw. in the ideal world, we would restrict the memblock allocation
> top-down from the non-movable nodes. But I do not think we have that
> information ready at the time when the reservation is done.

Oh, you may be mixing up kexec loading with kdump kernel loading. For the
kdump kernel, we need to reserve a memory region during boot with the
memblock allocator. For kexec loading, we only operate after the system is
up, and do not need to reserve any memory region.
The memory used to load them is obtained in quite different ways.

Thanks
Baoquan