linux-mm.kvack.org archive mirror
From: Baoquan He <bhe@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	andreyknvl@google.com, christian.brauner@ubuntu.com,
	colin.king@canonical.com, corbet@lwn.net, dyoung@redhat.com,
	frederic@kernel.org, gpiccoli@canonical.com,
	john.p.donnelly@oracle.com, jpoimboe@redhat.com,
	keescook@chromium.org, linux-mm@kvack.org, masahiroy@kernel.org,
	mchehab+huawei@kernel.org, mike.kravetz@oracle.com,
	mingo@kernel.org, mm-commits@vger.kernel.org, paulmck@kernel.org,
	peterz@infradead.org, rdunlap@infradead.org, rostedt@goodmis.org,
	rppt@kernel.org, saeed.mirzamohammadi@oracle.com,
	samitolvanen@google.com, sboyd@kernel.org, tglx@linutronix.de,
	torvalds@linux-foundation.org, vgoyal@redhat.com,
	yifeifz2@illinois.edu
Subject: Re: [patch 48/91] kernel/crash_core: add crashkernel=auto for vmcore creation
Date: Mon, 10 May 2021 12:53:38 +0800
Message-ID: <20210510045338.GB2946@localhost.localdomain>
In-Reply-To: <2d0f53d9-51ca-da57-95a3-583dc81f35ef@redhat.com>

On 05/08/21 at 11:22am, David Hildenbrand wrote:
> > > Let me take a look .... oh, there it is from 2009
> > > 
> > > https://marc.info/?t=125006512600002&r=1&w=2
> > > 
> > > and then we had it in 2018
> > > 
> > > https://lkml.org/lkml/2018/5/20/262
> > 
> > Thanks for digging these two out; otherwise I might have needed to do it
> > so that people know the history better.
> 
> Sure, I stumbled over this myself recently when wondering about what fadump
> is.
> 
> 
> > > The issue I have with this: it's just plain wrong when you take memory
> > > hotplug into serious account as we see it quite heavily in VMs. You don't
> > > know what you'll need when building a kernel. Just pass it via the cmdline
> > 
> > Hmm, kdump may have no issue with memory hotplug as far as crashkernel
> > reservation is concerned. The system RAM size is not directly correlated
> > with the crashkernel size; that's why the default value in this patch is
> 
> "Not correlated directly" ...
> 
> "1G-64G:128M,64G-1T:256M,1T-:512M"
> 
> Am I still asleep and dreaming? :)

Well, I said 'not correlated directly', and then gave sentences to explain
the reason. I would like to repeat them:

1) The crashkernel needs more memory on some systems mainly because of
device drivers. Take any given system: no matter how much you increase or
decrease its total RAM, the crashkernel size it needs stays essentially
the same.

  - The extreme case I gave is the i40e.
  - And the more devices there are, naturally the more memory is needed.

2) About "1G-64G:128M,64G-1T:256M,1T-:512M": as I also said, the values
differ because we add a very low proportion of extra memory to avoid
potential risk, which is cost effective. Here, adding another 90M costs
roughly 0.13% of 64G and 0.0085% of 1TB.
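For anyone who wants to check the arithmetic, here is a minimal Python sketch of how a range-based value such as "1G-64G:128M,64G-1T:256M,1T-:512M" maps total RAM to a reservation. It assumes the documented crashkernel=<range>:<size>[,...] rules (start inclusive, end exclusive, an empty end meaning no upper bound, binary K/M/G/T suffixes). The helpers parse_size() and pick_crashkernel() are hypothetical and are not the kernel's parser; the optional @offset suffix is not handled.

```python
# Hypothetical user-space model of the crashkernel=<range>:<size>[,...] form.
# It is NOT the kernel's parser; the @offset suffix is not supported.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
G = 1 << 30


def parse_size(text: str) -> int:
    """Convert a size such as '128M' or '1T' to bytes (binary units)."""
    text = text.strip()
    suffix = text[-1:].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def pick_crashkernel(spec: str, system_ram: int) -> int:
    """Return the reservation in bytes for system_ram, or 0 if no range matches.

    Each entry is 'start-[end]:size'; start is inclusive, end is exclusive,
    and an empty end means there is no upper bound.
    """
    for entry in spec.split(","):
        rng, size = entry.split(":")
        start, end = rng.split("-")
        lo = parse_size(start)
        hi = parse_size(end) if end else float("inf")
        if lo <= system_ram < hi:
            return parse_size(size)
    return 0


AUTO = "1G-64G:128M,64G-1T:256M,1T-:512M"

if __name__ == "__main__":
    for ram_gib in (16, 64, 2048):
        ram = ram_gib * G
        size = pick_crashkernel(AUTO, ram)
        share = 100 * size / ram
        print(f"{ram_gib}G RAM -> {size >> 20}M reserved ({share:.4f}% of RAM)")
```

Running it shows the reserved share of RAM shrinking as RAM grows: the larger tiers add a small, bounded amount instead of scaling with memory.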

Hope it can help people sober up.

> 
> 
> > not linearly related to system RAM size. The proportion of crashkernel
> > size to total RAM size is what we take into account. Usually a 160M
> > crashkernel is enough on most systems. If system RAM is larger, extra
> > memory can be added just in case, without much impact on the
> > system.
> 
> So, all the rules we have are essentially broken because they rely
> completely on the system RAM during boot.

How did you arrive at this?

crashkernel=auto provides a default value. PCs, VMs, and ordinary
workstations and servers, which make up the overwhelming majority, work
well with it; I would put the number at 99%. Only a small number of
high-end workstations and servers that contain many PCI devices need
investigation to decide the crashkernel size, and for them a manual
setting and a reboot may be needed. You call this 'essentially broken'?
Yet the approach you suggest later, constructing the crashkernel value in
user space and rebooting, is not broken, even though it is a similar
thing? What is the logic behind your conclusion?

crashkernel=auto mainly targets the majority of systems, helping people
without much knowledge of the kdump implementation use it for debugging.

Let me say more about the benefit of crashkernel=auto. On Fedora, the
community distro sponsored by Red Hat, kexec/kdump is also maintained by
us. The Fedora kernel is the mainline kernel, so crashkernel=auto is not
provided there. We almost never get bug reports from Fedora users, which
means almost nobody uses it. We would like Fedora users' usage to help
test the functionality of the component.
> 
> > 
> > From our investigation, PCIe devices and the CPU count affect the
> > crashkernel size. There are always PCI devices whose drivers require tens
> > of KB of memory, or even MBs. E.g., in the patch below, my colleague Coiby
> > found that the i40e network card costs as much as 1.5G of memory to
> > initialize its ring buffers on ppc, and 85M on x86_64.
> > 
> > [PATCH v1 0/3] Reducing memory usage of i40e for kdump
> > http://lists.infradead.org/pipermail/kexec/2021-March/022117.html
> > 
> > Even though not all PCI devices need surprisingly large amounts of memory
> > like the i40e, a system with hundreds of PCI devices can also cost more
> > memory than expected. This kind of system is usually a high-end server,
> > and a specific crashkernel value needs to be set manually.
> > 
> > So system RAM size is the least important factor in crashkernel
> 
> Aehm, not with fadump, no?

Fadump makes use of the crashkernel reservation, but has a different
dumping mechanism. If this patch is accepted, it will need a kernel config
too, or it can add the value to the command line from a user space
program; I will talk about that later. This depends on IBM's decision. I
have added Hari to CC; they will make the best choice after consideration.

> 
> > costing. Take my X1 laptop: even if I extended its RAM to 100TB, a 160M
> > crashkernel would still be enough. We just want to add a tiny extra amount
> > to the crashkernel when the total RAM is very large; that's the rule
> > for crashkernel=auto. As for VMs, given their very few devices (virtio
> > disk, NAT NIC, etc.), no matter how much memory is deployed and hot
> > added/removed, the crashkernel size won't be influenced very much. That
> > is my personal understanding.
> 
> That's an interesting observation. But you're telling me that we end up
> wasting memory for the crashkernel because "crashkernel=auto" which is
> supposed to do something magical good automatically does something very
> suboptimal? Oh my ... this is broken.
> 
> Long story short: crashkernel=auto is pure ugliness.

Very interesting. Your long story is clear to me, but your short story
confuses me a lot.

Let me try to sort this out. In your first reply, you asserted "it's just
plain wrong when you take memory hotplug into serious account as we see
it quite heavily in VMs", which shows you did not actually know whether it
was wrong, yet you said it was plainly wrong. I answered 'no, not at all'
with a detailed explanation, showing the opposite of your assertion. Then
you quickly moved on to 'crashkernel=auto is pure ugliness'. If a simple
crashkernel=auto covers 99% of systems, and advanced handling is only
needed for the tiny remainder, and this is called pure ugliness, then what
is pure beauty? And my 99% figure may well be conservative.

> 
> Why can't we construct a crashkernel in user space when
> installing/activating kdump and requiring a reboot for kdump to be active as
> long as that crashkernel setting is not properly respected?
> 
> Just have a look at the system properties (is_qemu(), #PCI, ...) and propose
> a value for "crashkernel=". Check that that value is at least active when
> activating kdump. Otherwise don't enable kdump and fail.
> 
> Yes, it can be difficult with some newer/older kernels having some different
> demands, but things should change drastically, and a distro can always
> update its advises along with the kernel, no?
> 
> You could even have a kernel interface that gives you the current
> crashkernel size (maybe already there) vs. the recommended crashkernel size.
> Make kdump or *whoever* activate that in the cmdline and let kdump check if
> both values are satisfied when booting up.

Now, let's go to your long story.

In case you haven't seen our patch on the Fedora kexec-tools mailing
list: your suggested approach is exactly what we are doing. Please check
the patch below.

[PATCH v2] kdumpctl: Add kdumpctl estimate
https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.org/thread/YCEOJHQXKVEIVNB23M2TDAJGYVNP5MJZ/

We will provide a new feature in the user space script that lets users
check whether their current crashkernel size is adequate. If not, they
can adjust it accordingly.

But where does the current crashkernel size come from? From
crashkernel=auto, of course. You wouldn't set a random crashkernel size,
compare it with the recommended size, and then reboot, would you? If
crashkernel=auto yields the expected size, there is no need to reboot.
That means 99% of systems need no reboot; only very few systems need a
reboot after checking the recommended size.

Long story short: crashkernel=auto gives a default value that tries to
cover most systems. (The very few high-end servers need to check whether
it is enough, adjust it with the help of user space tools, and then
reboot.)


> 
> Also: this approach here doesn't make any sense when you want to do
> something dependent on other cmdline parameters. Take "fadump=on" vs
> "fadump=off" as an example. You just cannot handle it properly as proposed
> in this patch. To me the approach in this patch makes least sense TBH.

Why? Don't we have this kind of judgement in the kernel? crashkernel=auto
is a generic mechanism and was added much earlier. Fadump was added later
by IBM for their needs on ppc only; it relies on the crashkernel
reservation but uses a different dumping mechanism. If it needs a
different value than kdump, special handling is certainly needed. Who says
it has to be keyed on 'fadump=on'? They can check the value in a user
space program and add it to the cmdline as you suggested, or they can
build it into auto. Whatever is most suitable is best.
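As a rough illustration of handling fadump in user space rather than in the kernel, here is a minimal Python sketch that inspects a kernel command line for fadump=on and rewrites the crashkernel= parameter accordingly. The function names and the two size strings are hypothetical; real command lines can contain quoted values with spaces, which this simple whitespace split does not handle.

```python
def cmdline_params(cmdline: str) -> dict:
    """Map each parameter to its value (None for bare flags); last one wins."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def with_crashkernel(cmdline: str, kdump_size: str, fadump_size: str) -> str:
    """Return cmdline with a single crashkernel= entry, choosing the fadump
    value when fadump=on is present and the kdump value otherwise."""
    fadump_on = cmdline_params(cmdline).get("fadump") == "on"
    size = fadump_size if fadump_on else kdump_size
    # Drop any existing crashkernel= entries before appending the new one.
    kept = [t for t in cmdline.split() if not t.startswith("crashkernel=")]
    return " ".join(kept + [f"crashkernel={size}"])


if __name__ == "__main__":
    print(with_crashkernel("root=/dev/sda1 quiet crashkernel=auto", "256M", "1G"))
    print(with_crashkernel("root=/dev/sda1 fadump=on", "256M", "1G"))
```

Keeping this choice in a user space tool means the kernel-side default never has to know about fadump at all.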

I also have several questions, and I hope you can help answer them:

1) Have you ever met crashkernel=auto broken on virt platform?

I ask because you are on the Virt team, crashkernel=auto has been in RHEL
for many years, and we have been working with the Virt team to support
dumping. We haven't seen any bug report or complaint about
crashkernel=auto from Virt.

2) Option one: keep crashkernel=auto, plus kdumpctl estimate as a user
space program to get a recommended size, then reboot if needed. Option
two: remove crashkernel=auto and rely only on kdumpctl estimate to get a
recommended size, which always requires a reboot. In RHEL we will take the
first option. Are you willing to take the second one for the Virt
platform, since you think crashkernel=auto is plainly wrong, pure
ugliness, essentially broken, and makes the least sense?

Thanks
Baoquan




Thread overview: 119+ messages
2021-05-07  1:01 incoming Andrew Morton
2021-05-07  1:02 ` [patch 01/91] alpha: eliminate old-style function definitions Andrew Morton
2021-05-07  1:02 ` [patch 02/91] alpha: csum_partial_copy.c: add function prototypes from <net/checksum.h> Andrew Morton
2021-05-07  1:02 ` [patch 03/91] fs/proc/generic.c: fix incorrect pde_is_permanent check Andrew Morton
2021-05-07  1:02 ` [patch 04/91] proc: save LOC in __xlate_proc_name() Andrew Morton
2021-05-07  2:24   ` Linus Torvalds
2021-05-07  1:02 ` [patch 05/91] proc: mandate ->proc_lseek in "struct proc_ops" Andrew Morton
2021-05-07  1:02 ` [patch 06/91] proc: delete redundant subset=pid check Andrew Morton
2021-05-07  1:02 ` [patch 07/91] selftests: proc: test subset=pid Andrew Morton
2021-05-07  1:02 ` [patch 08/91] proc/sysctl: fix function name error in comments Andrew Morton
2021-05-07  1:02 ` [patch 09/91] include: remove pagemap.h from blkdev.h Andrew Morton
2021-05-07  1:02 ` [patch 10/91] kernel.h: drop inclusion in bitmap.h Andrew Morton
2021-05-07  1:02 ` [patch 11/91] linux/profile.h: remove unnecessary declaration Andrew Morton
2021-05-07  1:02 ` [patch 12/91] kernel/async.c: fix pr_debug statement Andrew Morton
2021-05-07  1:02 ` [patch 13/91] kernel/cred.c: make init_groups static Andrew Morton
2021-05-07  1:02 ` [patch 14/91] tools: disable -Wno-type-limits Andrew Morton
2021-05-07  1:02 ` [patch 15/91] tools: bitmap: sync function declarations with the kernel Andrew Morton
2021-05-07  1:02 ` [patch 16/91] tools: sync BITMAP_LAST_WORD_MASK() macro " Andrew Morton
2021-05-07  1:02 ` [patch 17/91] arch: rearrange headers inclusion order in asm/bitops for m68k, sh and h8300 Andrew Morton
2021-05-07  1:02 ` [patch 18/91] lib: extend the scope of small_const_nbits() macro Andrew Morton
2021-05-07  1:03 ` [patch 19/91] tools: sync small_const_nbits() macro with the kernel Andrew Morton
2021-05-07  1:03 ` [patch 20/91] lib: inline _find_next_bit() wrappers Andrew Morton
2021-05-07  1:03 ` [patch 21/91] tools: sync find_next_bit implementation Andrew Morton
2021-05-07  1:03 ` [patch 22/91] lib: add fast path for find_next_*_bit() Andrew Morton
2021-05-07  1:03 ` [patch 23/91] lib: add fast path for find_first_*_bit() and find_last_bit() Andrew Morton
2021-05-07  1:03 ` [patch 24/91] tools: sync lib/find_bit implementation Andrew Morton
2021-05-07  1:03 ` [patch 25/91] MAINTAINERS: add entry for the bitmap API Andrew Morton
2021-05-07  1:03 ` [patch 26/91] lib/bch.c: fix a typo in the file bch.c Andrew Morton
2021-05-07  1:03 ` [patch 27/91] lib: fix inconsistent indenting in process_bit1() Andrew Morton
2021-05-07  1:03 ` [patch 28/91] lib/list_sort.c: fix typo in function description Andrew Morton
2021-05-07  1:03 ` [patch 29/91] lib/genalloc.c: fix a typo Andrew Morton
2021-05-07  1:03 ` [patch 30/91] lib: crc8: pointer to data block should be const Andrew Morton
2021-05-07  1:03 ` [patch 31/91] lib: stackdepot: turn depot_lock spinlock to raw_spinlock Andrew Morton
2021-05-07  1:03 ` [patch 32/91] lib/percpu_counter: tame kernel-doc compile warning Andrew Morton
2021-05-07  1:03 ` [patch 33/91] lib/genalloc: add parameter description to fix doc " Andrew Morton
2021-05-07  1:03 ` [patch 34/91] lib: parser: clean up kernel-doc Andrew Morton
2021-05-07  1:03 ` [patch 35/91] include/linux/compat.h: remove unneeded declaration from COMPAT_SYSCALL_DEFINEx() Andrew Morton
2021-05-07  1:03 ` [patch 36/91] checkpatch: warn when missing newline in return sysfs_emit() formats Andrew Morton
2021-05-07  1:03 ` [patch 37/91] checkpatch: exclude four preprocessor sub-expressions from MACRO_ARG_REUSE Andrew Morton
2021-05-07  1:04 ` [patch 38/91] checkpatch: improve ALLOC_ARRAY_ARGS test Andrew Morton
2021-05-07  1:04 ` [patch 39/91] kselftest: introduce new epoll test case Andrew Morton
2021-05-07  1:04 ` [patch 40/91] fs/epoll: restore waking from ep_done_scan() Andrew Morton
2021-05-07  1:04 ` [patch 41/91] isofs: fix fall-through warnings for Clang Andrew Morton
2021-05-07  1:04 ` [patch 42/91] fs/nilfs2: fix misspellings using codespell tool Andrew Morton
2021-05-07  1:04 ` [patch 43/91] nilfs2: fix typos in comments Andrew Morton
2021-05-07  1:04 ` [patch 44/91] hpfs: replace one-element array with flexible-array member Andrew Morton
2021-05-07  1:04 ` [patch 45/91] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Andrew Morton
2021-05-07  1:04 ` [patch 46/91] kernel/fork.c: simplify copy_mm() Andrew Morton
2021-05-07  1:04 ` [patch 47/91] kernel/fork.c: fix typos Andrew Morton
2021-05-07  1:04 ` [patch 48/91] kernel/crash_core: add crashkernel=auto for vmcore creation Andrew Morton
2021-05-07  7:25   ` Linus Torvalds
2021-05-08  3:13     ` Baoquan He
2021-05-08  3:29       ` Baoquan He
2021-05-07  8:16   ` David Hildenbrand
2021-05-08  8:51     ` Baoquan He
2021-05-08  9:22       ` David Hildenbrand
2021-05-10  4:53         ` Baoquan He [this message]
2021-05-10  8:32           ` David Hildenbrand
2021-05-10 10:43             ` Baoquan He
2021-05-10 11:01               ` David Hildenbrand
2021-05-10 11:44                 ` Dave Young
2021-05-10 11:56                   ` David Hildenbrand
2021-05-11 13:36                     ` Baoquan He
2021-05-11 16:31                       ` Mike Rapoport
2021-05-11 17:07                         ` David Hildenbrand
2021-05-12 14:51                           ` Baoquan He
2021-05-12 15:07                             ` David Hildenbrand
2021-05-13  5:04                               ` Baoquan He
2021-05-12 19:03                             ` Kairui Song
2021-05-17  8:22                             ` David Hildenbrand
2021-05-18  8:49                               ` Baoquan He
2021-05-18  8:51                                 ` David Hildenbrand
2021-05-18  9:24                                   ` Dave Young
2021-05-12 14:13                         ` Baoquan He
2021-05-12  7:42                     ` Dave Young
2021-05-07  1:04 ` [patch 49/91] kexec: add kexec reboot string Andrew Morton
2021-05-07  1:04 ` [patch 50/91] kernel: kexec_file: fix error return code of kexec_calculate_store_digests() Andrew Morton
2021-05-07  1:04 ` [patch 51/91] kexec: dump kmessage before machine_kexec Andrew Morton
2021-05-07  1:04 ` [patch 52/91] gcov: combine common code Andrew Morton
2021-05-07  1:04 ` [patch 53/91] gcov: simplify buffer allocation Andrew Morton
2021-05-07  1:04 ` [patch 54/91] gcov: use kvmalloc() Andrew Morton
2021-05-07  1:04 ` [patch 55/91] gcov: clang: drop support for clang-10 and older Andrew Morton
2021-05-07  1:04 ` [patch 56/91] smp: kernel/panic.c - silence warnings Andrew Morton
2021-05-07  1:05 ` [patch 57/91] delayacct: clear right task's flag after blkio completes Andrew Morton
2021-05-07  1:05 ` [patch 58/91] gdb: lx-symbols: store the abspath() Andrew Morton
2021-05-07  1:05 ` [patch 59/91] scripts/gdb: document lx_current is only supported by x86 Andrew Morton
2021-05-07  1:05 ` [patch 60/91] scripts/gdb: add lx_current support for arm64 Andrew Morton
2021-05-07  1:05 ` [patch 61/91] kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources Andrew Morton
2021-05-07  1:05 ` [patch 62/91] kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources Andrew Morton
2021-05-07  1:05 ` [patch 63/91] kernel/resource: remove first_lvl / siblings_only logic Andrew Morton
2021-05-07  1:05 ` [patch 64/91] kernel/resource: allow region_intersects users to hold resource_lock Andrew Morton
2021-05-07  1:05 ` [patch 65/91] kernel/resource: refactor __request_region to allow external locking Andrew Morton
2021-05-07  1:05 ` [patch 66/91] kernel/resource: fix locking in request_free_mem_region Andrew Morton
2021-05-07  1:05 ` [patch 67/91] selftests: remove duplicate include Andrew Morton
2021-05-07  1:05 ` [patch 68/91] kernel/async.c: stop guarding pr_debug() statements Andrew Morton
2021-05-07  1:05 ` [patch 69/91] kernel/async.c: remove async_unregister_domain() Andrew Morton
2021-05-07  1:05 ` [patch 70/91] init/initramfs.c: do unpacking asynchronously Andrew Morton
2021-05-07  1:05 ` [patch 71/91] modules: add CONFIG_MODPROBE_PATH Andrew Morton
2021-05-07  1:05 ` [patch 72/91] ipc/sem.c: mundane typo fixes Andrew Morton
2021-05-07  1:05 ` [patch 73/91] mm: fix some typos and code style problems Andrew Morton
2021-05-07  1:05 ` [patch 74/91] drivers/char: remove /dev/kmem for good Andrew Morton
2021-05-07  1:06 ` [patch 75/91] mm: remove xlate_dev_kmem_ptr() Andrew Morton
2021-05-07  1:06 ` [patch 76/91] mm/vmalloc: remove vwrite() Andrew Morton
2021-05-07  1:06 ` [patch 77/91] arm: print alloc free paths for address in registers Andrew Morton
2021-05-07  1:06 ` [patch 78/91] scripts/spelling.txt: add "overlfow" Andrew Morton
2021-05-07  1:06 ` [patch 79/91] scripts/spelling.txt: add "diabled" typo Andrew Morton
2021-05-07  1:06 ` [patch 80/91] scripts/spelling.txt: add "overflw" Andrew Morton
2021-05-07  1:06 ` [patch 81/91] mm/slab.c: fix spelling mistake "disired" -> "desired" Andrew Morton
2021-05-07  1:06 ` [patch 82/91] include/linux/pgtable.h: few spelling fixes Andrew Morton
2021-05-07  1:06 ` [patch 83/91] kernel/umh.c: fix some spelling mistakes Andrew Morton
2021-05-07  1:06 ` [patch 84/91] kernel/user_namespace.c: fix typos Andrew Morton
2021-05-07  1:06 ` [patch 85/91] kernel/up.c: fix typo Andrew Morton
2021-05-07  1:06 ` [patch 86/91] kernel/sys.c: " Andrew Morton
2021-05-07  1:06 ` [patch 87/91] fs: fat: fix spelling typo of values Andrew Morton
2021-05-07  1:06 ` [patch 88/91] ipc/sem.c: spelling fix Andrew Morton
2021-05-07  1:06 ` [patch 89/91] treewide: remove editor modelines and cruft Andrew Morton
2021-05-07  1:06 ` [patch 90/91] mm: fix typos in comments Andrew Morton
2021-05-07  1:06 ` [patch 91/91] " Andrew Morton
2021-05-07  7:12 ` incoming Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210510045338.GB2946@localhost.localdomain \
    --to=bhe@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreyknvl@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=colin.king@canonical.com \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=dyoung@redhat.com \
    --cc=frederic@kernel.org \
    --cc=gpiccoli@canonical.com \
    --cc=john.p.donnelly@oracle.com \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-mm@kvack.org \
    --cc=masahiroy@kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@kernel.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=saeed.mirzamohammadi@oracle.com \
    --cc=samitolvanen@google.com \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vgoyal@redhat.com \
    --cc=yifeifz2@illinois.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.