linux-kernel.vger.kernel.org archive mirror
From: Edward Chron <echron@arista.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>, Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Shakeel Butt <shakeelb@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	colona@arista.com, Edward Chron <echron@arista.com>
Subject: [PATCH 06/10] mm/oom_debug: Add Select Vmalloc Entries Print
Date: Mon, 26 Aug 2019 12:36:34 -0700
Message-ID: <20190826193638.6638-7-echron@arista.com>
In-Reply-To: <20190826193638.6638-1-echron@arista.com>

Add OOM Debug code to allow select vmalloc entries to be printed
at the time of an OOM event. Listing some portion of the larger vmalloc
entries has proven useful in tracking memory usage during an OOM event
so the root cause of the event can be determined.

Configuring this OOM Debug Option (DEBUG_OOM_VMALLOC_SELECT_PRINT)
------------------------------------------------------------------
To configure this option, select it in the OOM Debugging configuration
menu. The entry is found under: Kernel hacking, Memory Debugging,
OOM Debugging, as the DEBUG_OOM_VMALLOC_SELECT_PRINT config entry.

Two dynamic OOM debug settings for this option: enabled, tenthpercent
---------------------------------------------------------------------
The oom debugfs base directory is found at: /sys/kernel/debug/oom.
The oom debugfs file name prefix for this option is: vmalloc_select_print_
and, as for the other select options, there are two debugfs files: the
enabled file and the tenthpercent file.

Dynamically disable or re-enable this OOM Debug option
------------------------------------------------------
This option may be disabled or re-enabled through its debugfs entry.
The file is found at: /sys/kernel/debug/oom/vmalloc_select_print_enabled
and its value determines whether the facility is enabled or disabled.
A value of 1 is enabled (default) and a value of 0 is disabled.
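
As an illustration of the enable/disable step above, here is a minimal
userspace sketch in C. It assumes the debugfs file accepts a decimal
"0" or "1" followed by a newline, which this message does not spell out.
The helper names (write_uint_to_file, vmalloc_select_print_set_enabled)
are hypothetical and not part of this patch. The path only exists on a
kernel built with this series and with debugfs mounted.

```c
#include <stdio.h>

/* Path taken from the description above; present only with this series. */
#define VMALLOC_SELECT_ENABLED_PATH \
	"/sys/kernel/debug/oom/vmalloc_select_print_enabled"

/* Hypothetical helper: write one unsigned decimal value to a file.
 * Returns 0 on success, -1 on any open/write/close failure. */
static int write_uint_to_file(const char *path, unsigned int value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%u\n", value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical wrapper: any non-zero 'enabled' is written as 1.
 * Assumption: the debugfs file parses a plain decimal 0 or 1. */
static int vmalloc_select_print_set_enabled(const char *path, int enabled)
{
	return write_uint_to_file(path, enabled ? 1u : 0u);
}
```

Calling vmalloc_select_print_set_enabled(VMALLOC_SELECT_ENABLED_PATH, 0)
as root would be expected to disable the facility; passing the path as a
parameter keeps the helper usable against an ordinary file for testing.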

Specifying the minimum entry size (0-1000) in the tenthpercent file
-------------------------------------------------------------------
For DEBUG_OOM_VMALLOC_SELECT_PRINT the number of vmalloc entries
printed can also be adjusted. By default, if the config option is
enabled, only entries that use 1% or more of memory are printed.
The minimum entry size can be set as low as 0% of memory or as high
as 100% of memory, in which case no entries are printed, only the
section header and summary line, as no vmalloc entry could possibly
use 100% of memory.
Adjustments are made through the debugfs file found at:
/sys/kernel/debug/oom/vmalloc_select_print_tenthpercent
Valid values are 0 through 1000, in tenths of a percent, representing
0% to 100% of memory. Only entries that use at least one page of memory
are printed, even if the minimum entry size is specified as 0, because
zero page entries have no memory assigned.
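
The threshold arithmetic can be sketched in userspace C as follows. It
mirrors the calculation in the diff below (total memory in kB times the
tenthpercent value, divided by 1000) and the per-entry test. The 4 KiB
page size and the function names are assumptions made for illustration.
The kernel code uses PAGE_SHIFT and totalram_pages() instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages (PAGE_SHIFT == 12). */
#define EXAMPLE_PAGE_SHIFT 12

/* Same conversion as the K() macro in the patch: pages to kB. */
static uint64_t pages_to_kb(uint64_t pages)
{
	return pages << (EXAMPLE_PAGE_SHIFT - 10);
}

/* Minimum entry size in kB for a tenthpercent setting (0..1000).
 * Integer division truncates, as it does in the kernel code. */
static uint64_t min_entry_kb(uint64_t total_pages, unsigned int tenthpercent)
{
	return pages_to_kb(total_pages) * tenthpercent / 1000;
}

/* An entry is printed only if it has pages and meets the minimum. */
static bool entry_is_printed(uint64_t nr_pages, uint64_t min_kb)
{
	return nr_pages > 0 && pages_to_kb(nr_pages) >= min_kb;
}
```

For example, on a hypothetical system with 4194304 pages (16 GiB), the
default setting of 10 gives a minimum of 167772 kB, so a 640 page entry
(2560 kB) is not printed. With a setting of 0 the minimum is 0 kB and
that entry is printed, while an entry with no pages still is not.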

Content of Vmalloc entry records and Vmalloc summary record
-----------------------------------------------------------
The output lists vmalloc entries, limited to entries equal to or
larger than the minimum size.
Unused vmallocs (no pages assigned to the vmalloc) are never printed.
The vmalloc entry information includes:
  - Size (in bytes)
  - Pages (number of pages in use)
  - Caller information to identify the request

Additional output consists of summary information that is printed
at the end of the output. This summary information includes:
  - Number of Vmalloc entries examined
  - Number of Vmalloc entries printed
  - Minimum entry size for selection

Sample Output
-------------
Output consists of a section header line, one line for each vmalloc
entry that is equal to or larger than the minimum entry size specified
by the percent_totalpages_print_limit (0% to 100.0%), and one line of
summary output.

Sample Vmalloc entries section header:

Aug 19 19:27:01 coronado kernel: Vmalloc Info:

Sample per entry selected print line output:

Jul 22 20:16:09 yoursystem kernel: Vmalloc size=2625536 pages=640
 caller=__do_sys_swapon+0x78e/0x1130

Sample summary print line output:

Jul 22 19:03:26 yoursystem kernel: Summary: Vmalloc entries examined:1070
 printed:989 minsize:0kB
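
For anyone post-processing logs, the summary line can be picked apart
with a short userspace C sketch like the one below. The format string
follows the pr_info() call in the diff. The struct and function names
are hypothetical and the code is not part of this patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three summary fields. */
struct vmalloc_oom_summary {
	unsigned int examined;    /* entries examined */
	unsigned int printed;     /* entries printed */
	unsigned long minsize_kb; /* minimum entry size in kB */
};

/* Find and parse the summary record anywhere in a log line.
 * Returns 0 on success, -1 if no complete summary record is found. */
static int parse_vmalloc_summary(const char *line,
				 struct vmalloc_oom_summary *out)
{
	const char *p = strstr(line, "Summary: Vmalloc entries examined:");

	if (!p)
		return -1;
	if (sscanf(p,
		   "Summary: Vmalloc entries examined:%u printed:%u minsize:%lukB",
		   &out->examined, &out->printed, &out->minsize_kb) != 3)
		return -1;
	return 0;
}
```

Because the record is located with strstr(), any syslog prefix
(timestamp, host name, "kernel:") in front of it is ignored.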


Signed-off-by: Edward Chron <echron@arista.com>
---
 include/linux/vmalloc.h | 12 ++++++++++++
 mm/Kconfig.debug        | 28 +++++++++++++++++++++++++++
 mm/oom_kill_debug.c     | 21 ++++++++++++++++++++
 mm/vmalloc.c            | 43 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 104 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9b21d0047710..09e3257fc382 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -227,4 +227,16 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 int register_vmap_purge_notifier(struct notifier_block *nb);
 int unregister_vmap_purge_notifier(struct notifier_block *nb);
 
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+/**
+ * Routine used to print select vmalloc entries on an OOM event so we
+ * can identify sizeable entries that may have a significant effect on
+ * kernel memory utilization. Output goes to dmesg along with all the OOM
+ * related messages when the config option DEBUG_OOM_VMALLOC_SELECT_PRINT
+ * is set to yes. The option may be dynamically enabled or disabled and
+ * the selection size is also dynamically configurable.
+ */
+extern void vmallocinfo_oom_print(unsigned long min_kb);
+#endif /* CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT */
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index c7d53ca95d32..ea3465343286 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -219,3 +219,31 @@ config DEBUG_OOM_SLAB_SELECT_PRINT
 	  print limit value of 10 or 1% of memory.
 
 	  If unsure, say N.
+
+config DEBUG_OOM_VMALLOC_SELECT_PRINT
+	bool "Debug OOM Select Vmallocs Print"
+	depends on DEBUG_OOM
+	help
+	  When enabled, allows the number of vmalloc entries printed
+	  to be print rate limited based on the amount of memory the
+	  vmalloc entry is consuming.
+
+	  If the option is configured it is enabled/disabled by setting
+	  the value of the file entry in the debugfs OOM interface at:
+	  /sys/kernel/debug/oom/vmalloc_select_print_enabled
+	  A value of 1 is enabled (default) and a value of 0 is disabled.
+
+	  When enabled, entries are print limited by the amount of memory
+	  they consume. The setting value defines the minimum memory
+	  size consumed and is represented in tenths of a percent.
+	  Values supported are 0 to 1000 where 0 allows all entries to be
+	  printed, 1 would allow entries using 0.1% or more to be printed,
+	  10 would allow entries using 1% or more of memory to be printed.
+
+	  If configured and enabled the rate limiting memory percentage
+	  is specified by setting a value in the debugfs OOM interface at:
+	  /sys/kernel/debug/oom/vmalloc_select_print_tenthpercent
+	  If configured the default settings are set to enabled and
+	  print limit value of 10 or 1% of memory.
+
+	  If unsure, say N.
diff --git a/mm/oom_kill_debug.c b/mm/oom_kill_debug.c
index 2b5245e1134d..d5e37f8508e6 100644
--- a/mm/oom_kill_debug.c
+++ b/mm/oom_kill_debug.c
@@ -168,6 +168,9 @@
 #ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
 #include "slab.h"
 #endif
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+#include <linux/vmalloc.h>
+#endif
 
 #define OOMD_MAX_FNAME 48
 #define OOMD_MAX_OPTNAME 32
@@ -223,6 +226,12 @@ static struct oom_debug_option oom_debug_options_table[] = {
 		.option_name	= "slab_select_print_",
 		.support_tpercent = true,
 	},
+#endif
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+	{
+		.option_name	= "vmalloc_select_print_",
+		.support_tpercent = true,
+	},
 #endif
 	{}
 };
@@ -243,6 +252,9 @@ enum oom_debug_options_index {
 #endif
 #ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
 	SELECT_SLABS_STATE,
+#endif
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+	SELECT_VMALLOC_STATE,
 #endif
 	OUT_OF_BOUNDS
 };
@@ -431,6 +443,15 @@ u32 oom_kill_debug_oom_event_is(void)
 		neightbl_print_stats("nd_tbl", &nd_tbl);
 #endif
 
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+	if (oom_kill_debug_enabled(SELECT_VMALLOC_STATE)) {
+		u16 ptenth = oom_kill_debug_tenthpercent(SELECT_VMALLOC_STATE);
+		unsigned long minkb = (K(totalram_pages()) * ptenth) / 1000;
+
+		vmallocinfo_oom_print(minkb);
+	}
+#endif
+
 #ifdef CONFIG_DEBUG_OOM_TASKS_SUMMARY
 	if (oom_kill_debug_enabled(TASKS_STATE))
 		oom_kill_debug_tasks_summary_print();
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7ba11e12a11f..2cdc0f0cd0af 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3523,4 +3523,47 @@ static int __init proc_vmalloc_init(void)
 }
 module_init(proc_vmalloc_init);
 
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+#define K(x) ((x) << (PAGE_SHIFT-10))
+/*
+ * Routine used to print select vmalloc entries on an OOM condition so
+ * we can identify sizeable entries that may have a significant effect on
+ * kernel memory utilization. Output goes to dmesg along with all the OOM
+ * related messages when the config option DEBUG_OOM_VMALLOC_SELECT_PRINT
+ * is set to yes. Both enable / disable and size selection value are
+ * dynamically configurable.
+ */
+void vmallocinfo_oom_print(unsigned long min_kb)
+{
+	struct vmap_area *vap;
+	struct vm_struct *vsp;
+	u_int32_t entries = 0;
+	u_int32_t printed = 0;
+
+	if (!spin_trylock(&vmap_area_lock)) {
+		pr_info("Vmalloc Info: Skipped, vmap_area_lock not available\n");
+		return;
+	}
+
+	pr_info("Vmalloc Info:\n");
+	list_for_each_entry(vap, &vmap_area_list, list) {
+		if (!(vap->flags & VM_VM_AREA))
+			continue;
+		++entries;
+		vsp = vap->vm;
+		if ((vsp->nr_pages > 0) && (K(vsp->nr_pages) >= min_kb)) {
+			pr_info("vmalloc size=%lu pages=%u caller=%pS\n",
+				vsp->size, vsp->nr_pages, vsp->caller);
+			++printed;
+		}
+	}
+
+	spin_unlock(&vmap_area_lock);
+
+	pr_info("Summary: Vmalloc entries examined:%u printed:%u minsize:%lukB\n",
+		entries, printed, min_kb);
+}
+EXPORT_SYMBOL(vmallocinfo_oom_print);
+#endif /* CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT */
+
 #endif
-- 
2.20.1

