From: ebiederm@xmission.com (Eric W. Biederman)
To: Petr Tesarik <ptesarik@suse.cz>
Cc: Dave Young <dyoung@redhat.com>, dzickus@redhat.com,
    Neil Horman <nhorman@redhat.com>, Tony Luck <tony.luck@intel.com>,
    bhe@redhat.com, Michael Ellerman <mpe@ellerman.id.au>,
    kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
    Martin Schwidefsky <schwidefsky@de.ibm.com>,
    Benjamin Herrenschmidt <benh@kernel.crashing.org>,
    Hari Bathini <hbathini@linux.vnet.ibm.com>,
    Cong Wang <xiyou.wangcong@gmail.com>,
    Andrew Morton <akpm@linux-foundation.org>,
    Ingo Molnar <mingo@kernel.org>, Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH] kdump: add default crashkernel reserve kernel config options
Date: Fri, 25 May 2018 15:00:13 -0500
Message-ID: <87d0xjwlo2.fsf@xmission.com>
In-Reply-To: <20180525065943.03bcb911@ezekiel.suse.cz> (Petr Tesarik's message of "Fri, 25 May 2018 06:59:43 +0200")

Petr Tesarik <ptesarik@suse.cz> writes:

> On Thu, 24 May 2018 11:34:05 -0500
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> Petr Tesarik <ptesarik@suse.cz> writes:
>>
>> > On Thu, 24 May 2018 09:49:05 +0800
>> > Dave Young <dyoung@redhat.com> wrote:
>> >
>> >> Hi Petr,
>> >>
>> >> On 05/23/18 at 10:22pm, Petr Tesarik wrote:
>> >>[...]
>> >> > In short, if one size fits none, what good is it to hardcode that "one
>> >> > size" into the kernel image?
>> >>
>> >> I agreed with all the things that we can not know the exact memory
>> >> requirement for 100% use cases. But that does not means this is useless
>> >> it is still useful for common use cases of no special and memory hog
>> >> requirements as I mentioned in another reply it can simplify the kdump
>> >> deployment for those people who do not need the special setup.
>> >
>> > I still tend to disagree. This "common-case" reservation depends on
>> > things that are defined by user space. It surely does not make it
>> > easier to build a distribution kernel. Today, I get bug reports that
>> > the number calculated and added to the boot loader configuration by the
>> > installer is inaccurate. If I put a fixed number into a kernel config
>> > option, I will start getting bugs that this number is incorrect (for
>> > some systems).
>> >
>> >> For example, if this is a workstation I just want to break into a shell
>> >> to collect some panic info, then I just need a very minimal initrd, then
>> >> the Kconfig will work just fine.
>> >
>> > What is "a very minimal initrd"? Last time I had to make a significant
>> > adjustment to the estimation for openSUSE, this was caused by growing
>> > user-space requirements (systemd in this case, but I don't want to
>> > start flamewars on that topic, please).
>> >
>> > Anyway, if you want to improve the "common case", then look how IBM
>> > tries to solve it for firmware-assisted dump (fadump) on powerpc:
>> >
>> > https://patchwork.ozlabs.org/patch/905026/
>> >
>> > The main idea is:
>> >
>> >> Instead of setting aside a significant chunk of memory nobody can use,
>> >> [...] reserve a significant chunk of memory that the kernel is prevented
>> >> from using [...], but applications are free to use it.
>> >
>> > That works great, because user space pages are filtered out in the
>> > common case, so they can be used freely by the panic kernel.
>>
>> They absolutely can not be used in the kdump case.
>>
>> The kdump requirement is that they are pages no-one initiates any I/O
>> to. To avoid the problem of devices doing DMA as the new kernel starts
>> and runs.
>
> Good point. This means that memory reserved for this purpose would also
> have to be excluded from allocations that may be eventually used for
> DMA transfers.

Think of a network card. The DMAs for incoming packets can be
indefinitely delayed into the future unless that network card is
reprogrammed. If the dump kernel does not load the driver, that won't
happen.

>> Secondarily to avoid problems with cpus that refused to halt.
>
> Let's face it - if some CPUs refused to halt, all bets are off. The
> code running on such a CPU can break many other things besides memory,
> most importantly, it may meddle with the HW registers of crucial
> devices in the system. To be less abstract, I have seen a failure to
> stop a CPU in the crashed kernel a few times, and the panic kernel
> could never successfully save anything; it always crashed at boot or a
> little bit later.

Crashing at boot is comparatively good. That is part of the design
criteria. It is better to fail to start up the kernel than to start a
corrupted kernel and mangle a user's data. But I do see how it can be a
crapshoot when dealing with another cpu.

The ultimate point is that the absolute best we can do is to run a
kernel in memory that we never use for anything else, and then we have a
fighting chance of getting the system working and getting a report of
the failure out to somewhere.

> Anyway, of course we would still have to keep the current method,
> because user pages are not always filtered. For example, a major SUSE
> account runs a database in user space and also inspects its data
> structures in case of a system crash.

And I understand the memory pressures that will encourage people to use
user pages for extra memory to run the dump capture kernel in.

Short of the presence of an IOMMU that all DMA transfers must go
through, I don't see how those user pages could reliably be used.

Eric