From: ebiederm@xmission.com (Eric W. Biederman)
To: Petr Tesarik <ptesarik@suse.cz>
Cc: dzickus@redhat.com, Neil Horman <nhorman@redhat.com>,
	Tony Luck <tony.luck@intel.com>,
	bhe@redhat.com, Michael Ellerman <mpe@ellerman.id.au>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	Hari Bathini <hbathini@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Young <dyoung@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH] kdump: add default crashkernel reserve kernel config options
Date: Tue, 29 May 2018 07:19:16 -0500
Message-ID: <874liqu01n.fsf@xmission.com>
In-Reply-To: <20180528143400.4fc68de4@ezekiel.suse.cz> (Petr Tesarik's message of "Mon, 28 May 2018 14:34:00 +0200")

Petr Tesarik <ptesarik@suse.cz> writes:

> On Fri, 25 May 2018 15:00:13 -0500
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>>[...]
>> The ultimate point is that the absolute best we can do is to run a
>> kernel in memory that we never use for anything else and then we have a
>> fighting chance of getting the system working and getting a report of
>> the failure out to somewhere.
>>
>> > Anyway, of course we would still have to keep the current method,
>> > because user pages are not always filtered. For example, a major SUSE
>> > account runs a database in user space and also inspects its data
>> > structures in case of a system crash.  
>> 
>> And I understand the memory pressures that will encourage people to use
>> user pages for extra memory to run the dump capture kernel in.  Short of
>> the presence of an IOMMU that all DMA transfers must go through I don't
>> see how those user pages could reliably be used.
>
> Absolutely. I fully understand that a system which reuses first
> kernel's memory in some way must be less reliable than the current
> state. However, some people are willing to trade less reliability for
> reduced resource consumption.

That is the kind of tradeoff that can easily result in the crash kernel
eating your data.  I will nack any patch that I see that goes anywhere
near that kind of solution for the kernel that takes the crash.

> Note that we're not talking about reserving a few gigs on a single
> machine with some terabytes of memory (i.e. less than 1% of total RAM),
> rather a few hundred megs of each 4-gig VM on an s390x machine (i.e.
> about 10% of total RAM).

You should be able to get away with tens of megs instead of hundreds.
The biggest reservation I remember anyone ever making is about 100Meg,
and that was a general-purpose configuration, not tuned at all, with the
maximum size meant for dealing with large machines.

kexec on panic grew up on machines with 4Gig or less, as it arrived
before everyone was 64bit.  It should be possible to tune your
crash-dump-taking kernel so that things run in a reasonable amount of
memory for the configuration you are talking about.  The usual trade-off
is time vs generality.  I simply have not seen people with non-embedded
configurations take the time to tune things.

Eric



Thread overview: 46+ messages
2018-05-21  2:53 [PATCH] kdump: add default crashkernel reserve kernel config options Dave Young
2018-05-21 19:02 ` Andrew Morton
2018-05-22  1:43   ` Dave Young
2018-05-22  1:48   ` Dave Young
2018-05-23  7:06   ` Dave Young
2018-05-23 15:53     ` Eric W. Biederman
2018-05-23 20:22       ` Petr Tesarik
2018-05-24  1:49         ` Dave Young
2018-05-24  6:57           ` Petr Tesarik
2018-05-24  7:26             ` Dave Young
2018-05-24  7:39               ` Dave Young
2018-05-24  7:56               ` Dave Young
2018-05-24  8:29                 ` Baoquan He
2018-05-24  9:02               ` Petr Tesarik
2018-05-24  7:31             ` Baoquan He
2018-05-24 16:34             ` Eric W. Biederman
2018-05-25  4:59               ` Petr Tesarik
2018-05-25 20:00                 ` Eric W. Biederman
2018-05-28 12:34                   ` Petr Tesarik
2018-05-29 12:19                     ` Eric W. Biederman [this message]
2018-05-24  1:42       ` Dave Young
2018-05-24 16:41         ` Eric W. Biederman
2018-05-25  2:43           ` Dave Young
