To: Baoquan He
Cc: Andrew Morton, andreyknvl@google.com, christian.brauner@ubuntu.com,
 colin.king@canonical.com, corbet@lwn.net, dyoung@redhat.com,
 frederic@kernel.org, gpiccoli@canonical.com, john.p.donnelly@oracle.com,
 jpoimboe@redhat.com, keescook@chromium.org, linux-mm@kvack.org,
 masahiroy@kernel.org, mchehab+huawei@kernel.org, mike.kravetz@oracle.com,
 mingo@kernel.org, mm-commits@vger.kernel.org, paulmck@kernel.org,
 peterz@infradead.org, rdunlap@infradead.org, rostedt@goodmis.org,
 rppt@kernel.org, saeed.mirzamohammadi@oracle.com, samitolvanen@google.com,
 sboyd@kernel.org, tglx@linutronix.de, torvalds@linux-foundation.org,
 vgoyal@redhat.com, yifeifz2@illinois.edu, Michal Hocko
References: <20210507010432.IN24PudKT%akpm@linux-foundation.org>
 <889c6b90-7335-71ce-c955-3596e6ac7c5a@redhat.com>
 <20210508085133.GA2946@localhost.localdomain>
 <2d0f53d9-51ca-da57-95a3-583dc81f35ef@redhat.com>
 <20210510045338.GB2946@localhost.localdomain>
 <4a544493-0622-ac6d-f14b-fb338e33b25e@redhat.com>
 <20210510104359.GC2946@localhost.localdomain>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [patch 48/91] kernel/crash_core: add crashkernel=auto for vmcore creation
Date: Mon, 10 May 2021 13:01:08 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1
MIME-Version: 1.0
In-Reply-To: <20210510104359.GC2946@localhost.localdomain>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Sender: owner-linux-mm@kvack.org

>> I can understand the reasoning of "using a fraction of the memory size when
>> booting up just to be on the safe side as we don't know", and that
>> motivation is much better than what I read so far. But then I wonder if we
>> cannot handle that any better? Because this feels very suboptimal to me and
>> I feel like there can be cases where the heuristic is just wrong.
>
> Yes, I understand what you said. Our headache is mainly from bare metal
> systems, worrying the reservation is not enough because of many devices.
>
> On VM, it is truly different. With far fewer devices, it does waste some
> memory. Usually a fixed minimal size can cover 99.9% of systems unless
> too many devices are attached/added to the VM; I am not sure what the
> probability of that is. Meanwhile, with the help of
> /sys/kernel/kexec_crash_size, you can shrink it to a small enough but
> still usable size. You may just need to reload the kdump kernel, because
> the loaded kernel would have been erased and be out of control. The
> shrinking should be done at an early stage of kernel running, I would
> say, lest a crash happen during that period.
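
For illustration only, a minimal Python sketch of the shrinking step
described in the quoted paragraph above. It assumes that reading
/sys/kernel/kexec_crash_size yields the current reservation in bytes and
that writing a smaller byte count shrinks it. The helper names and the
overridable `path` parameter are hypothetical, added so the sketch can be
tried against an ordinary file. As far as I know, the write is refused
while a crash kernel is loaded, so the kdump kernel would have to be
unloaded first and reloaded afterwards, and root privileges are needed.

```python
# Sketch under stated assumptions, not a definitive tool.
CRASH_SIZE_PATH = "/sys/kernel/kexec_crash_size"


def read_crash_size(path=CRASH_SIZE_PATH):
    """Return the current crashkernel reservation in bytes."""
    with open(path) as f:
        return int(f.read().strip())


def shrink_crash_size(new_size, path=CRASH_SIZE_PATH):
    """Shrink the reservation to new_size bytes and return the bytes freed.

    Growing is rejected here, mirroring the point made in the thread that
    only shrinking is practical at runtime.
    """
    current = read_crash_size(path)
    if new_size > current:
        raise ValueError("the reservation can only be shrunk, not enlarged")
    with open(path, "w") as f:
        f.write(str(new_size))
    return current - new_size
```

On a real system this would be called early during boot, e.g.
`shrink_crash_size(256 * 1024 * 1024)`, before the kdump kernel is
(re)loaded.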
>
> We once tried several different ways to enlarge the crashkernel size
> dynamically, but didn't find a good way.

Yes, enlarging it at runtime is much more difficult than shrinking.

[...]

>> A kernel early during boot can only guess. A kernel late during boot knows.
>> Please correct me if I'm wrong.
>
> Well, I would not call it a guess; I would rather call them empirical
> values from statistical data. With an a priori value given by 'auto',
> normal users of kdump basically don't need to care about the setting.
> E.g. on Fedora, 'auto' can cover all systems, assuming nobody would deploy
> it on a high-end server. Everything we do is to make things simple enough.
> If you don't know how to set it, just add 'crashkernel=auto' to the
> cmdline, and everything is done. I believe you agree that not everybody
> would like to dig into kexec/kdump just to find out how big the
> crashkernel size needs to be when they want to use kdump functionality.

Oh absolutely. But OTOH, most users will leave the value untouched if it
works -- and complain, at least in the VM environment, to me about
surprising waste of system RAM with "crashkernel=auto".

[...]

>> I know how helpful "crashkernel=auto" was so far, but I am also aware that
>> there was strong pushback in the past, and I remember for the reasons I
>> gave. IMHO we should refine that approach instead of trying to push the same
>> thing upstream every couple of years.
>>
>> I ran into the "512MB crashkernel" on a 64G VM with memory ballooning issue
>> already but didn't report a BZ, because so far, I was under the impression
>> that more memory means more crashkernel. But you explained to me that I was
>> just running into a (for my use case) bad heuristic.
>
> I re-read the old posts and didn't see strong push-back. People just gave
> some different ideas instead. While we were silent, we tried different
> ways, e.g. enlarging the crashkernel at run time as mentioned above, but
> failed.
> Reusing free pages and user space pages of the 1st kernel in the kdump
> kernel also failed. We also talked with people to ask whether it's

Thanks for an insight into the history.

> doable to remove 'auto' support; nobody would give an affirmative
> answer. I know SUSE has been using the way you mentioned to get a
> recommended size for a long time, but it needs several more steps and a
> reboot. We would like to take that way too as an improvement. The
> simpler, the better.

At least I'm happy to hear that other people had the same idea as me ;)

I can understand the desire for simplicity. It would be great to hear
SUSE's perception of the problem and how they would ideally want to move
forward with this.

[...]

>> I'll be happy to help looking into dynamic shrinking of the crashkernel size
>> if that approach makes sense. We could even let user space trigger that
>> resizing -- without a reboot.
>
> I won't reply to each inline comment since I believe they have been
> covered by the earlier reply. Thanks for looking into this and sharing
> your thoughts, letting us know that you really care about the extra
> memory on VMs, which we were aware of but didn't realize really causes
> issues.

I mess with dynamic resizing of VMs, that's why I usually take a closer
look at all things that do stuff based on the initial VM size; yes,
there are still a lot of other such things out there.

It also bugged me for quite a bit that we don't have a sane way to
achieve what we're doing here upstream. It somewhat feels like "this
doesn't belong in the kernel and is user policy" but then, the existing
kernel support is suboptimal.

Maybe reserving some "maybe too big but okayish to boot the system in a
sane environment -- e.g., X% of system RAM and at least Y" size first
and shrinking it later as triggered by user space early (where we do
seem to have a way to pre-calculate things now) might actually be a good
direction to look into.
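
To make the "X% of system RAM and at least Y, then shrink later" idea a
bit more concrete, here is a rough Python sketch of the arithmetic only.
The function names, the integer-percent parameter and the optional
headroom are hypothetical choices of mine; the thread does not fix
concrete values for X or Y.

```python
# Sketch under stated assumptions: all sizes are in bytes, and the values
# chosen for X (percent) and Y (minimum) are purely hypothetical.
MIB = 1 << 20
GIB = 1 << 30


def initial_reservation(system_ram, percent, minimum):
    """Boot-time guess: X% of system RAM, but at least Y bytes."""
    return max(system_ram * percent // 100, minimum)


def shrink_target(reserved, measured_need, headroom=0):
    """Size to shrink to once user space has pre-calculated the real need.

    The result never exceeds what was reserved at boot, because the
    reservation can only shrink at runtime.
    """
    return min(reserved, measured_need + headroom)
```

With X = 1 and Y = 256 MiB, for example, a 100 GiB machine would start
with `initial_reservation(100 * GIB, 1, 256 * MIB)`, i.e. 1 GiB, and could
later be trimmed to whatever `shrink_target()` returns for the measured
need.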
-- 
Thanks,

David / dhildenb