From mboxrd@z Thu Jan  1 00:00:00 1970
To: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, andreyknvl@google.com,
 christian.brauner@ubuntu.com, colin.king@canonical.com, corbet@lwn.net,
 dyoung@redhat.com, frederic@kernel.org, gpiccoli@canonical.com,
 john.p.donnelly@oracle.com, jpoimboe@redhat.com, keescook@chromium.org,
 linux-mm@kvack.org, masahiroy@kernel.org, mchehab+huawei@kernel.org,
 mike.kravetz@oracle.com, mingo@kernel.org, mm-commits@vger.kernel.org,
 paulmck@kernel.org, peterz@infradead.org, rdunlap@infradead.org,
 rostedt@goodmis.org, rppt@kernel.org, saeed.mirzamohammadi@oracle.com,
 samitolvanen@google.com, sboyd@kernel.org, tglx@linutronix.de,
 torvalds@linux-foundation.org, vgoyal@redhat.com, yifeifz2@illinois.edu,
 Michal Hocko <mhocko@kernel.org>
References: <20210507010432.IN24PudKT%akpm@linux-foundation.org>
 <889c6b90-7335-71ce-c955-3596e6ac7c5a@redhat.com>
 <20210508085133.GA2946@localhost.localdomain>
 <2d0f53d9-51ca-da57-95a3-583dc81f35ef@redhat.com>
 <20210510045338.GB2946@localhost.localdomain>
 <4a544493-0622-ac6d-f14b-fb338e33b25e@redhat.com>
 <20210510104359.GC2946@localhost.localdomain>
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Subject: Re: [patch 48/91] kernel/crash_core: add crashkernel=auto for vmcore
 creation
Message-ID: <e9332e4e-57bd-9c7c-b3ba-7e3b29fad7c8@redhat.com>
Date: Mon, 10 May 2021 13:01:08 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.8.1
MIME-Version: 1.0
In-Reply-To: <20210510104359.GC2946@localhost.localdomain>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
List-ID: <linux-mm.kvack.org>


>> I can understand the reasoning of "using a fraction of the memory size when
>> booting up, just to be on the safe side as we don't know", and that
>> motivation is much better than what I read so far. But then I wonder whether
>> we can handle that any better, because this feels very suboptimal to me and
>> I feel like there can be cases where the heuristic is just wrong.
>
> Yes, I understand what you said. Our headache mainly comes from bare-metal
> systems, where we worry that the reservation is not enough because of
> many devices.
>
> On VMs, it is truly different. With far fewer devices, it does waste
> some memory. Usually a fixed minimal size can cover 99.9% of systems
> unless too many devices are attached/added to the VM; I am not sure how
> likely that is. However, with the help of /sys/kernel/kexec_crash_size,
> you can shrink it to a small but still usable size. You may just need to
> reload the kdump kernel, because the loaded kernel will have been erased
> and is out of control. I would say the shrinking should be done at an
> early stage of kernel runtime, lest a crash happen during that period.
>
> We have tried several different ways to enlarge the crashkernel size
> dynamically, but didn't find a good way.

Yes, enlarging it at runtime is much more difficult than shrinking it.
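For reference, a minimal sketch of the shrink path from user space, assuming a kernel that exposes /sys/kernel/kexec_crash_size (reading returns the current reservation in bytes; writing a smaller value shrinks it). As far as I know, the kernel refuses to shrink while a crash kernel is loaded, may round the requested size up to its alignment, and never grows the area through this file, so the kdump kernel has to be unloaded first (e.g. `kexec -p -u`) and reloaded afterwards. The helper names read_crash_size() and shrink_crash_size() are hypothetical; the path is a parameter so the logic can be tried against an ordinary file.

```c
#include <stdio.h>

/*
 * Sketch only. Assumes the sysfs knob /sys/kernel/kexec_crash_size:
 * reading yields the current crashkernel reservation in bytes, writing a
 * smaller value asks the kernel to shrink (and release) part of it.
 */
#define KEXEC_CRASH_SIZE_PATH "/sys/kernel/kexec_crash_size"

/* Return the current reservation in bytes, or -1 on any error. */
long long read_crash_size(const char *path)
{
    long long size = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fscanf(f, "%lld", &size) != 1)
        size = -1;
    fclose(f);
    return size;
}

/*
 * Shrink the reservation to new_size bytes. Returns 0 on success (or if
 * nothing needs to change) and -1 on error. Growing is rejected here, as
 * the area cannot be enlarged at runtime. Assumption: the kdump kernel was
 * unloaded beforehand; otherwise the kernel rejects the write, and the
 * error surfaces when fclose() flushes the buffered data.
 */
int shrink_crash_size(const char *path, long long new_size)
{
    long long cur = read_crash_size(path);
    FILE *f;

    if (cur < 0 || new_size < 0)
        return -1;
    if (new_size >= cur)
        return new_size == cur ? 0 : -1;

    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%lld\n", new_size) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

An early boot service could call shrink_crash_size(KEXEC_CRASH_SIZE_PATH, 256LL << 20) and then reload the kdump kernel; the 256 MiB target is only an example value.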

[...]

>> A kernel early during boot can only guess. A kernel late during boot knows.
>> Please correct me if I'm wrong.
>
> Well, I would not call it a guess; I would rather call them empirical
> values from statistical data. With an a priori value given by 'auto',
> normal kdump users basically don't need to care about the setting.
> E.g. on Fedora, 'auto' can cover all systems, assuming nobody deploys
> it on a high-end server. Everything we do is to make things simple enough.
> If you don't know what to set, just add 'crashkernel=auto' to the cmdline,
> and everything is done. I believe you agree that not everybody would
> like to dig into kexec/kdump just to figure out how big the crashkernel
> size needs to be when they want to use kdump.

Oh, absolutely. But OTOH, most users will leave the value untouched if it
works -- and, at least in VM environments, complain to me about the
surprising waste of system RAM with "crashkernel=auto".
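To make the heuristic concrete: with the range syntax documented for crashkernel= (crashkernel=<range1>:<size1>[,<range2>:<size2>,...]), the reservation is a pure function of system RAM, so a lightly equipped VM and a device-heavy bare-metal box with the same amount of RAM end up with the same size. A minimal sketch of that lookup, assuming half-open [start, end) ranges; the table mirrors the example "crashkernel=512M-2G:64M,2G-:128M" and is illustrative only, not what the 'auto' patch uses, and pick_crash_size() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* One "<start>-[<end>]:<size>" entry of a crashkernel= range list. */
struct crash_range {
    uint64_t start; /* lower bound of system RAM, inclusive */
    uint64_t end;   /* upper bound, exclusive; UINT64_MAX if open-ended */
    uint64_t size;  /* bytes to reserve when start <= RAM < end */
};

/* Illustrative only: the equivalent of "crashkernel=512M-2G:64M,2G-:128M". */
static const struct crash_range example_ranges[] = {
    { 512 * MiB, 2 * GiB,    64 * MiB },
    { 2 * GiB,   UINT64_MAX, 128 * MiB },
};

/*
 * Pick the reservation for a given amount of system RAM, or 0 if no range
 * matches. Nothing but the RAM size goes into the decision: device count,
 * driver set and whether this is a VM are all invisible here.
 */
uint64_t pick_crash_size(uint64_t system_ram)
{
    size_t i;

    for (i = 0; i < sizeof(example_ranges) / sizeof(example_ranges[0]); i++) {
        const struct crash_range *r = &example_ranges[i];

        if (system_ram >= r->start && system_ram < r->end)
            return r->size;
    }
    return 0;
}
```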

[...]

>> I know how helpful "crashkernel=auto" has been so far, but I am also aware
>> that there was strong pushback in the past, and, as I remember, for the
>> reasons I gave. IMHO we should refine that approach instead of trying to
>> push the same thing upstream every couple of years.
>>
>> I ran into the "512MB crashkernel on a 64G VM with memory ballooning" issue
>> already but didn't report a BZ, because so far I was under the impression
>> that more memory requires a bigger crashkernel. But you explained to me that
>> I was just running into a (for my use case) bad heuristic.
>
> I re-read the old posts and didn't see strong pushback. People just gave
> some different ideas instead. While we were silent, we tried different
> ways, e.g. enlarging the crashkernel at runtime as mentioned above, but
> failed. Reusing free pages and user-space pages of the 1st kernel in the
> kdump kernel also failed. We also talked with people to consult whether it's

Thanks for the insight into the history.

> doable to remove 'auto' support; nobody would give an affirmative
> answer. I know SUSE has been using the approach you mentioned to get a
> recommended size for a long time, but it needs several more steps and a
> reboot. We would like to take that approach too, as an improvement. The
> simpler, the better.

At least I'm happy to hear that other people had the same idea as me ;)

I can understand the desire for simplicity. It would be great to hear
SUSE's perception of the problem and how they would ideally want to move
forward with this.

[...]

>> I'll be happy to help look into dynamic shrinking of the crashkernel size
>> if that approach makes sense. We could even let user space trigger that
>> resizing -- without a reboot.
>
> I won't reply to each inline comment since I believe they have been
> covered by the earlier reply. Thanks for looking into this and sharing
> your thoughts, and for letting us know that you really care about the
> extra memory on VMs; we were aware of it, but didn't realize it really
> causes issues.

I mess with dynamic resizing of VMs; that's why I usually take a closer
look at all things that do stuff based on the initial VM size. And yes,
there are still a lot of other such things out there.

It has also bugged me for quite a while that we don't have a sane way
upstream to achieve what we're doing here. It somewhat feels like "this
doesn't belong in the kernel and is user policy", but then, the existing
kernel support is suboptimal.

Maybe first reserving a size that is "maybe too big, but okayish to boot
the system in a sane environment -- e.g., X% of system RAM and at least Y",
and shrinking it later, triggered early by user space (where we do seem to
have a way to pre-calculate things now), might actually be a good
direction to look into.
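A minimal sketch of that two-step policy, assuming hypothetical knobs X (a percentage) and Y (a minimum in bytes): the kernel reserves max(X% of RAM, Y) at boot, and early user space later computes the actual need and shrinks to it, never above what was reserved, since growing at runtime is the hard part. All names (struct crash_policy, initial_reservation(), shrink_target()) are made up for illustration; the estimate itself would come from whatever user-space calculation is available.

```c
#include <stdint.h>

/* Hypothetical policy knobs: "X% of system RAM and at least Y". */
struct crash_policy {
    unsigned int percent; /* X, e.g. 5 for 5% */
    uint64_t min_bytes;   /* Y, lower bound of the boot-time reservation */
};

/*
 * Boot-time step: a generous reservation of max(X% of RAM, Y).
 * Dividing first avoids overflow; the percentage part is rounded down.
 */
uint64_t initial_reservation(const struct crash_policy *p, uint64_t system_ram)
{
    uint64_t by_percent = system_ram / 100 * p->percent;

    return by_percent > p->min_bytes ? by_percent : p->min_bytes;
}

/*
 * Early user-space step: given what is reserved and what user space
 * estimates the kdump kernel needs, return the size to shrink to.
 * Never more than what is already reserved, as growing is not possible.
 */
uint64_t shrink_target(uint64_t reserved, uint64_t estimated_need)
{
    return estimated_need < reserved ? estimated_need : reserved;
}
```

With X = 5 and Y = 256 MiB, a 4 GiB VM would start at 256 MiB and a 64 GiB VM at roughly 3.2 GiB, which user space would then shrink to its estimate.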

-- 
Thanks,

David / dhildenb