From: Michal Hocko <mhocko@kernel.org>
To: Baoquan He <bhe@redhat.com>
Cc: nicolas.pitre@linaro.org, brijesh.singh@amd.com,
devicetree@vger.kernel.org, airlied@linux.ie,
linux-pci@vger.kernel.org, richard.weiyang@gmail.com,
jcmvbkbc@gmail.com, baiyaowei@cmss.chinamobile.com,
kys@microsoft.com, frowand.list@gmail.com,
lorenzo.pieralisi@arm.com, sthemmin@microsoft.com,
linux-nvdimm@lists.01.org, patrik.r.jakobsson@gmail.com,
andy.shevchenko@gmail.com, linux-input@vger.kernel.org,
gustavo@padovan.org, bp@suse.de, dyoung@redhat.com,
thomas.lendacky@amd.com, haiyangz@microsoft.com,
maarten.lankhorst@linux.intel.com, josh@joshtriplett.org,
jglisse@redhat.com, robh+dt@kernel.org, seanpaul@chromium.org,
bhelgaas@google.com, tglx@linutronix.de, yinghai@kernel.org,
jonathan.derrick@intel.com, chris@zankel.net, monstr@monstr.eu,
linux-parisc@vger.kernel.org, gregkh@linuxfoundation.org,
dmitry.torokhov@gmail.com, kexec@lists.infradead.org,
linux-kernel@vger.kernel.org, ebiederm@xmission.com,
devel@linuxdriverproject.org,
Andrew Morton <akpm@linux-foundation.org>,
fengguang.wu@intel.com, linuxppc-dev@lists.ozlabs.org,
davem@davemloft.net
Subject: Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
Date: Thu, 26 Jul 2018 15:14:20 +0200
Message-ID: <20180726131420.GJ28386@dhcp22.suse.cz>
In-Reply-To: <20180726131242.GI28386@dhcp22.suse.cz>
On Thu 26-07-18 15:12:42, Michal Hocko wrote:
> On Thu 26-07-18 21:09:04, Baoquan He wrote:
> > On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > > On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > > > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > > > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > > > > > Kexec is a formally supported feature in our distro, and customers
> > > > > > > who own that kind of very large machine can use it to speed up the
> > > > > > > reboot process. On UEFI machines, kexec_file loading searches
> > > > > > > top-down below 4G for a place to put the kernel. As we know, the
> > > > > > > first 4G is the DMA32 zone; DMA, PCI MMCFG, BIOS etc. all try to
> > > > > > > consume it, so a usable space for the kernel/initrd may not be
> > > > > > > found. Searching top-down over the whole memory space, we don't
> > > > > > > have this worry.
> > > > >
> > > > > > I do not have the full context here, but let me note that you should
> > > > > > be careful when doing top-down reservation because you can easily get
> > > > > > into hotpluggable memory and break the hotremove use case. We even
> > > > > > warn when this is done. See memblock_find_in_range_node().
> > > >
> > > > > Kexec reads the kernel/initrd files into buffers and only searches for
> > > > > usable positions to copy them to later. You can see struct kexec_segment
> > > > > below: for the old kexec_load, the kernel/initrd are read into a user
> > > > > space buffer, @buf stores the user space buffer address, and @mem stores
> > > > > the position where the kernel/initrd will be put. The kernel calls
> > > > > kimage_load_normal_segment() to copy the user space buffer to
> > > > > intermediate pages, which are allocated with GFP_KERNEL. These
> > > > > intermediate pages are recorded as entries; later, when the user runs
> > > > > "kexec -e" to trigger the kexec jump, the final copy is done from the
> > > > > intermediate pages to the real destination pages that @mem points to.
> > > > > This is because we can't touch existing data in the 1st kernel while
> > > > > loading the kexec kernel. As I understand it, GFP_KERNEL makes those
> > > > > intermediate pages be allocated in the immovable area, so it won't
> > > > > impact hotplugging. But the @mem we found by searching the whole system
> > > > > RAM might be lost on hotplug. Hence we need to load the kexec kernel
> > > > > again when a hotplug event is detected.
> > >
> > > > I am not sure I am following. If @mem is placed on a movable node then
> > > > memory hotremove simply won't work, because we see reserved pages and
> > > > do not know what to do about them. They are not migratable.
> > > > Allocating intermediate pages from other nodes doesn't really help.
> >
> > > OK, I forgot the 2nd kernel, which kexec jumps into. It won't impact
> > > hotremove in the 1st kernel, but it does impact the kernel that kexec
> > > jumps into if that kernel is at the top of system RAM and the top RAM is
> > > in a movable node.
>
> It will affect the 1st kernel (which does the memblock allocation
> top-down) as well. For reasons mentioned above.
And btw., in an ideal world we would restrict the top-down memblock
allocation to non-movable nodes. But I do not think we have that
information ready at the time the reservation is done.
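
To make the idea concrete, here is a minimal userspace sketch of a top-down
search that can skip hotpluggable ranges. It is an illustration only, not the
kernel's memblock code: struct mem_range, find_top_down() and the
allow_hotplug flag are hypothetical names, and the sketch assumes the
hotpluggable information is already known, which is exactly what may not be
true at reservation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory region; not the kernel's struct. */
struct mem_range {
	uint64_t start;     /* inclusive */
	uint64_t end;       /* exclusive */
	bool hotpluggable;  /* true if the range may be hot-removed later */
};

/*
 * Walk the ranges (sorted by ascending address) from the top down and
 * return the highest base where `size` bytes fit with `align` alignment
 * (a power of two). Hotpluggable ranges are skipped unless allow_hotplug
 * is set. Returns 0 when nothing fits, so this sketch cannot report a
 * match at address 0.
 */
static uint64_t find_top_down(const struct mem_range *r, size_t n,
			      uint64_t size, uint64_t align,
			      bool allow_hotplug)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	for (size_t i = n; i-- > 0; ) {
		if (r[i].hotpluggable && !allow_hotplug)
			continue;
		if (r[i].end - r[i].start < size)
			continue;
		/* Highest aligned base that still ends inside the range. */
		uint64_t base = (r[i].end - size) & ~(align - 1);
		if (base >= r[i].start)
			return base;
	}
	return 0;
}
```

With a hotpluggable range at the top of RAM, an unrestricted search lands
inside it, while the restricted search falls back to the highest
non-hotpluggable range.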
--
Michal Hocko
SUSE Labs