* [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Yang Shi @ 2017-09-27 21:46 UTC
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
  Cc: Yang Shi, linux-mm, linux-kernel


Recently we ran into an OOM issue: a kernel panic because there was no killable process.
The dmesg showed huge unreclaimable slabs using almost 100% of memory, but kdump failed to capture a vmcore for some reason.

So it seems better to capture unreclaimable slab info in the OOM message when the kernel panics, to aid troubleshooting and to cover this corner case.
Since the kernel is already panicking, capturing more information is worthwhile and does not disturb the normal OOM killer.

With this patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slabs only.

And the OOM killer message will print all unreclaimable slabs with non-zero usage (num_objs * size != 0).

For details, please see the commit log of each commit.

Changelog v7 -> v8:
* Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path

Changelog v6 -> v7:
* Added an unreclaim_slabs_oom_ratio proc knob; unreclaimable slab info is dumped when the ratio of unreclaimable slabs to all user memory exceeds it

Changelog v5 -> v6:
* Fixed a checkpatch.pl warning in patch #2

Changelog v4 -> v5:
* Addressed David's comments
* Build tested with CONFIG_SLABINFO=n

Changelog v3 -> v4:
* Addressed David's comments
* Added David's Acked-by to patch 1

Changelog v2 -> v3:
* Show the used size and total size of each kmem cache, per David's comment

Changelog v1 -> v2:
* Dropped the original patch 1 ("mm: slab: output reclaimable flag in /proc/slabinfo") since Christoph suggested it might break compatibility and /proc/slabinfo is a legacy interface
* Added Christoph's Acked-by
* Stopped acquiring slab_mutex, per Tetsuo's comment


Yang Shi (2):
      tools: slabinfo: add "-U" option to show unreclaimable slabs only
      mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory

 mm/oom_kill.c       | 22 ++++++++++++++++++++++
 mm/slab.h           |  8 ++++++++
 mm/slab_common.c    | 29 +++++++++++++++++++++++++++++
 tools/vm/slabinfo.c | 11 ++++++++++-
 4 files changed, 69 insertions(+), 1 deletion(-)


* [PATCH 1/2] tools: slabinfo: add "-U" option to show unreclaimable slabs only
From: Yang Shi @ 2017-09-27 21:46 UTC
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
  Cc: Yang Shi, linux-mm, linux-kernel

Add a "-U" option to show unreclaimable slabs only.

"-U" and "-S" together show which unreclaimable slabs use the most
memory, which helps debug huge unreclaimable slab issues.
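
For example, a hypothetical invocation (illustrative only, assuming the
tool has been built in tools/vm):

	# list only the unreclaimable slabs, sorted by size
	./slabinfo -U -S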

Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
---
 tools/vm/slabinfo.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index b9d34b3..de8fa11 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -83,6 +83,7 @@ struct aliasinfo {
 int sort_loss;
 int extended_totals;
 int show_bytes;
+int unreclaim_only;
 
 /* Debug options */
 int sanity;
@@ -132,6 +133,7 @@ static void usage(void)
 		"-L|--Loss              Sort by loss\n"
 		"-X|--Xtotals           Show extended summary information\n"
 		"-B|--Bytes             Show size in bytes\n"
+		"-U|--Unreclaim		Show unreclaimable slabs only\n"
 		"\nValid debug options (FZPUT may be combined)\n"
 		"a / A          Switch on all debug options (=FZUP)\n"
 		"-              Switch off all debug options\n"
@@ -568,6 +570,9 @@ static void slabcache(struct slabinfo *s)
 	if (strcmp(s->name, "*") == 0)
 		return;
 
+	if (unreclaim_only && s->reclaim_account)
+		return;
+
 	if (actual_slabs == 1) {
 		report(s);
 		return;
@@ -1346,6 +1351,7 @@ struct option opts[] = {
 	{ "Loss", no_argument, NULL, 'L'},
 	{ "Xtotals", no_argument, NULL, 'X'},
 	{ "Bytes", no_argument, NULL, 'B'},
+	{ "Unreclaim", no_argument, NULL, 'U'},
 	{ NULL, 0, NULL, 0 }
 };
 
@@ -1357,7 +1363,7 @@ int main(int argc, char *argv[])
 
 	page_size = getpagesize();
 
-	while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXB",
+	while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXBU",
 						opts, NULL)) != -1)
 		switch (c) {
 		case '1':
@@ -1438,6 +1444,9 @@ int main(int argc, char *argv[])
 		case 'B':
 			show_bytes = 1;
 			break;
+		case 'U':
+			unreclaim_only = 1;
+			break;
 		default:
 			fatal("%s: Invalid option '%c'\n", argv[0], optopt);
 
-- 
1.8.3.1


* [PATCH 2/2] mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory
From: Yang Shi @ 2017-09-27 21:46 UTC
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
  Cc: Yang Shi, linux-mm, linux-kernel

The kernel may panic when an OOM happens without a killable process.
Sometimes this is caused by huge unreclaimable slabs used by the kernel.

Although kdump could help debug such a problem, kdump is not available
on all architectures and it might malfunction sometimes. And, since the
kernel is already panicking, it is worth capturing such information in
dmesg to aid troubleshooting.

Print out unreclaimable slab info (used size and total size) for slabs
whose actual memory usage is not zero (num_objs * size != 0) when the
amount of unreclaimable slabs is greater than total user memory (LRU
pages).

The output looks like:

Unreclaimable slab info:
Name                      Used          Total
rpc_buffers               31KB         31KB
rpc_tasks                  7KB          7KB
ebitmap_node            1964KB       1964KB
avtab_node              5024KB       5024KB
xfs_buf                 1402KB       1402KB
xfs_ili                  134KB        134KB
xfs_efi_item             115KB        115KB
xfs_efd_item             115KB        115KB
xfs_buf_item             134KB        134KB
xfs_log_item_desc        342KB        342KB
xfs_trans               1412KB       1412KB
xfs_ifork                212KB        212KB

Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
 mm/oom_kill.c    | 22 ++++++++++++++++++++++
 mm/slab.h        |  8 ++++++++
 mm/slab_common.c | 29 +++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..6d89397 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
 
 #include <asm/tlb.h>
 #include "internal.h"
+#include "slab.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/oom.h>
@@ -160,6 +161,25 @@ static bool oom_unkillable_task(struct task_struct *p,
 	return false;
 }
 
+/*
+ * Print out unreclaimable slab info when the amount of unreclaimable slabs
+ * is greater than all user memory (LRU pages)
+ */
+static bool is_dump_unreclaim_slabs(void)
+{
+	unsigned long nr_lru;
+
+	nr_lru = global_node_page_state(NR_ACTIVE_ANON) +
+		 global_node_page_state(NR_INACTIVE_ANON) +
+		 global_node_page_state(NR_ACTIVE_FILE) +
+		 global_node_page_state(NR_INACTIVE_FILE) +
+		 global_node_page_state(NR_ISOLATED_ANON) +
+		 global_node_page_state(NR_ISOLATED_FILE) +
+		 global_node_page_state(NR_UNEVICTABLE);
+
+	return (global_node_page_state(NR_SLAB_UNRECLAIMABLE) > nr_lru);
+}
+
 /**
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
@@ -423,6 +443,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 		mem_cgroup_print_oom_info(oc->memcg, p);
 	else
 		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
+	if (is_dump_unreclaim_slabs())
+		dump_unreclaimable_slab();
 	if (sysctl_oom_dump_tasks)
 		dump_tasks(oc->memcg, oc->nodemask);
 }
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..b0496d1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 void memcg_slab_stop(struct seq_file *m, void *p);
 int memcg_slab_show(struct seq_file *m, void *p);
 
+#ifdef CONFIG_SLABINFO
+void dump_unreclaimable_slab(void);
+#else
+static inline void dump_unreclaimable_slab(void)
+{
+}
+#endif
+
 void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
 
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..d08213d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1272,6 +1272,35 @@ static int slab_show(struct seq_file *m, void *p)
 	return 0;
 }
 
+void dump_unreclaimable_slab(void)
+{
+	struct kmem_cache *s, *s2;
+	struct slabinfo sinfo;
+
+	pr_info("Unreclaimable slab info:\n");
+	pr_info("Name                      Used          Total\n");
+
+	/*
+	 * Acquiring slab_mutex is skipped here: we do not want to sleep
+	 * in the OOM path right before a kernel panic, and taking the
+	 * mutex here risks a deadlock.
+	 * Since we are already in OOM, there should not be any big
+	 * allocation that could change the statistics significantly.
+	 */
+	list_for_each_entry_safe(s, s2, &slab_caches, list) {
+		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+			continue;
+
+		memset(&sinfo, 0, sizeof(sinfo));
+		get_slabinfo(s, &sinfo);
+
+		if (sinfo.num_objs > 0)
+			pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
+				(sinfo.active_objs * s->size) / 1024,
+				(sinfo.num_objs * s->size) / 1024);
+	}
+}
+
 #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
 void *memcg_slab_start(struct seq_file *m, loff_t *pos)
 {
-- 
1.8.3.1


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Tetsuo Handa @ 2017-09-28  4:36 UTC
  To: Yang Shi, mhocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

On 2017/09/28 6:46, Yang Shi wrote:
> Changelog v7 -> v8:
> * Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path

Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since V2
because there are

	mutex_lock(&slab_mutex);
	kmalloc(GFP_KERNEL);
	mutex_unlock(&slab_mutex);

users. If we call dump_unreclaimable_slab() for non OOM panic path, aren't we
introducing a risk of crash (i.e. kernel panic) for regular OOM path?

We can try mutex_trylock() from dump_unreclaimable_slab() at best.
But it is still remaining unsafe, isn't it?


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Yang Shi @ 2017-09-28 17:49 UTC
  To: Tetsuo Handa, mhocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel



On 9/27/17 9:36 PM, Tetsuo Handa wrote:
> On 2017/09/28 6:46, Yang Shi wrote:
>> Changelog v7 -> v8:
>> * Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path
> 
> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since V2
> because there are
> 
> 	mutex_lock(&slab_mutex);
> 	kmalloc(GFP_KERNEL);
> 	mutex_unlock(&slab_mutex);
> 
> users. If we call dump_unreclaimable_slab() for non OOM panic path, aren't we
> introducing a risk of crash (i.e. kernel panic) for regular OOM path?

I don't see the difference between the regular OOM path and the OOM
panic path, other than calling panic() at the end.

And, the slab dump may be called by the panic path too; it covers both
the regular and panic paths.

Thanks,
Yang

> 
> We can try mutex_trylock() from dump_unreclaimable_slab() at best.
> But it is still remaining unsafe, isn't it?
> 


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Tetsuo Handa @ 2017-09-28 19:57 UTC
  To: yang.s, mhocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

Yang Shi wrote:
> On 9/27/17 9:36 PM, Tetsuo Handa wrote:
> > On 2017/09/28 6:46, Yang Shi wrote:
> >> Changelog v7 -> v8:
> >> * Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path
> > 
> > Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since V2
> > because there are
> > 
> > 	mutex_lock(&slab_mutex);
> > 	kmalloc(GFP_KERNEL);
> > 	mutex_unlock(&slab_mutex);
> > 
> > users. If we call dump_unreclaimable_slab() for non OOM panic path, aren't we
> > introducing a risk of crash (i.e. kernel panic) for regular OOM path?
> 
> I don't see the difference between the regular OOM path and the OOM
> panic path, other than calling panic() at the end.
> 
> And, the slab dump may be called by the panic path too; it covers both
> the regular and panic paths.

Calling a function that might cause kerneloops immediately before calling panic()
would be tolerable, for the kernel will panic after all. But calling a function
that might cause kerneloops when there is no plan to call panic() is a bug.

> 
> Thanks,
> Yang
> 
> > 
> > We can try mutex_trylock() from dump_unreclaimable_slab() at best.
> > But it is still remaining unsafe, isn't it?
> > 
> 


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Yang Shi @ 2017-09-28 20:21 UTC
  To: Tetsuo Handa, mhocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel



On 9/28/17 12:57 PM, Tetsuo Handa wrote:
> Yang Shi wrote:
>> On 9/27/17 9:36 PM, Tetsuo Handa wrote:
>>> On 2017/09/28 6:46, Yang Shi wrote:
>>>> Changelog v7 -> v8:
>>>> * Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path
>>>
>>> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since V2
>>> because there are
>>>
>>> 	mutex_lock(&slab_mutex);
>>> 	kmalloc(GFP_KERNEL);
>>> 	mutex_unlock(&slab_mutex);
>>>
>>> users. If we call dump_unreclaimable_slab() for non OOM panic path, aren't we
>>> introducing a risk of crash (i.e. kernel panic) for regular OOM path?
>>
>> I don't see the difference between the regular OOM path and the OOM
>> panic path, other than calling panic() at the end.
>>
>> And, the slab dump may be called by the panic path too; it covers both
>> the regular and panic paths.
> 
> Calling a function that might cause kerneloops immediately before calling panic()
> would be tolerable, for the kernel will panic after all. But calling a function
> that might cause kerneloops when there is no plan to call panic() is a bug.

I got your point. slab_mutex is used to protect the list of all the
slabs; since we are already in OOM, there should be no kmem cache
destroy happening during the list traversal. And, list_for_each_entry()
has been replaced with list_for_each_entry_safe() to make the traversal
more robust.

Thanks,
Yang

> 
>>
>> Thanks,
>> Yang
>>
>>>
>>> We can try mutex_trylock() from dump_unreclaimable_slab() at best.
>>> But it is still remaining unsafe, isn't it?
>>>
>>


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Tetsuo Handa @ 2017-09-28 20:45 UTC
  To: yang.s, mhocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

Yang Shi wrote:
> On 9/28/17 12:57 PM, Tetsuo Handa wrote:
> > Yang Shi wrote:
> >> On 9/27/17 9:36 PM, Tetsuo Handa wrote:
> >>> On 2017/09/28 6:46, Yang Shi wrote:
> >>>> Changelog v7 -> v8:
> >>>> * Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path
> >>>
> >>> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since V2
> >>> because there are
> >>>
> >>> 	mutex_lock(&slab_mutex);
> >>> 	kmalloc(GFP_KERNEL);
> >>> 	mutex_unlock(&slab_mutex);
> >>>
> >>> users. If we call dump_unreclaimable_slab() for non OOM panic path, aren't we
> >>> introducing a risk of crash (i.e. kernel panic) for regular OOM path?
> >>
> >> I don't see the difference between the regular OOM path and the OOM
> >> panic path, other than calling panic() at the end.
> >>
> >> And, the slab dump may be called by the panic path too; it covers both
> >> the regular and panic paths.
> > 
> > Calling a function that might cause kerneloops immediately before calling panic()
> > would be tolerable, for the kernel will panic after all. But calling a function
> > that might cause kerneloops when there is no plan to call panic() is a bug.
> 
> >> I got your point. slab_mutex is used to protect the list of all the
> >> slabs; since we are already in OOM, there should be no kmem cache
> >> destroy happening during the list traversal. And, list_for_each_entry()
> >> has been replaced with list_for_each_entry_safe() to make the traversal
> >> more robust.

I consider that an OOM event and a kmem cache destroy event can run
concurrently, because slab_mutex is not held by the OOM event (and
unfortunately cannot be held, due to the possibility of deadlock) to
protect the list of all the slabs.

I don't think replacing list_for_each_entry() with list_for_each_entry_safe()
makes the traversal more robust, for list_for_each_entry_safe() does not defer
freeing of the memory used by a list element. Rather, replacing
list_for_each_entry() with list_for_each_entry_rcu() (and making the relevant
changes such as rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will make
the traversal safe.


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Yang Shi @ 2017-09-29 22:15 UTC
  To: Tetsuo Handa, mhocko
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel



On 9/28/17 1:45 PM, Tetsuo Handa wrote:
> Yang Shi wrote:
>> On 9/28/17 12:57 PM, Tetsuo Handa wrote:
>>> Yang Shi wrote:
>>>> On 9/27/17 9:36 PM, Tetsuo Handa wrote:
>>>>> On 2017/09/28 6:46, Yang Shi wrote:
>>>>>> Changelog v7 -> v8:
>>>>>> * Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path
>>>>>
>>>>> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since V2
>>>>> because there are
>>>>>
>>>>> 	mutex_lock(&slab_mutex);
>>>>> 	kmalloc(GFP_KERNEL);
>>>>> 	mutex_unlock(&slab_mutex);
>>>>>
>>>>> users. If we call dump_unreclaimable_slab() for non OOM panic path, aren't we
>>>>> introducing a risk of crash (i.e. kernel panic) for regular OOM path?
>>>>
>>>> I don't see the difference between the regular OOM path and the OOM
>>>> panic path, other than calling panic() at the end.
>>>>
>>>> And, the slab dump may be called by the panic path too; it covers both
>>>> the regular and panic paths.
>>>
>>> Calling a function that might cause kerneloops immediately before calling panic()
>>> would be tolerable, for the kernel will panic after all. But calling a function
>>> that might cause kerneloops when there is no plan to call panic() is a bug.
>>
>> I got your point. slab_mutex is used to protect the list of all the
>> slabs; since we are already in OOM, there should be no kmem cache
>> destroy happening during the list traversal. And, list_for_each_entry()
>> has been replaced with list_for_each_entry_safe() to make the traversal
>> more robust.
> 
> I consider that an OOM event and a kmem cache destroy event can run
> concurrently, because slab_mutex is not held by the OOM event (and
> unfortunately cannot be held, due to the possibility of deadlock) to
> protect the list of all the slabs.
> 
> I don't think replacing list_for_each_entry() with list_for_each_entry_safe()
> makes the traversal more robust, for list_for_each_entry_safe() does not defer
> freeing of the memory used by a list element. Rather, replacing
> list_for_each_entry() with list_for_each_entry_rcu() (and making the relevant
> changes such as rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will make
> the traversal safe.

I'm not sure RCU can satisfy this case. RCU can only protect the
slab_caches_to_rcu_destroy list, which is used by SLAB_TYPESAFE_BY_RCU
slabs.

Yang

> 


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Tetsuo Handa @ 2017-09-30 11:00 UTC
  To: yang.s
  Cc: mhocko, cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm,
	linux-kernel

Yang Shi wrote:
> On 9/28/17 1:45 PM, Tetsuo Handa wrote:
> > Yang Shi wrote:
> >> On 9/28/17 12:57 PM, Tetsuo Handa wrote:
> >>> Yang Shi wrote:
> >>>> On 9/27/17 9:36 PM, Tetsuo Handa wrote:
> >>>>> On 2017/09/28 6:46, Yang Shi wrote:
> >>>>>> Changelog v7 -> v8:
> >>>>>> * Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path
> >>>>>
> >>>>> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since V2
> >>>>> because there are
> >>>>>
> >>>>> 	mutex_lock(&slab_mutex);
> >>>>> 	kmalloc(GFP_KERNEL);
> >>>>> 	mutex_unlock(&slab_mutex);
> >>>>>
> >>>>> users. If we call dump_unreclaimable_slab() for non OOM panic path, aren't we
> >>>>> introducing a risk of crash (i.e. kernel panic) for regular OOM path?
> >>>>
> >>>> I don't see the difference between the regular OOM path and the OOM
> >>>> panic path, other than calling panic() at the end.
> >>>>
> >>>> And, the slab dump may be called by the panic path too; it covers both
> >>>> the regular and panic paths.
> >>>
> >>> Calling a function that might cause kerneloops immediately before calling panic()
> >>> would be tolerable, for the kernel will panic after all. But calling a function
> >>> that might cause kerneloops when there is no plan to call panic() is a bug.
> >>
> >> I got your point. slab_mutex is used to protect the list of all the
> >> slabs; since we are already in OOM, there should be no kmem cache
> >> destroy happening during the list traversal. And, list_for_each_entry()
> >> has been replaced with list_for_each_entry_safe() to make the traversal
> >> more robust.
> > 
> > I consider that an OOM event and a kmem cache destroy event can run
> > concurrently, because slab_mutex is not held by the OOM event (and
> > unfortunately cannot be held, due to the possibility of deadlock) to
> > protect the list of all the slabs.
> > 
> > I don't think replacing list_for_each_entry() with list_for_each_entry_safe()
> > makes the traversal more robust, for list_for_each_entry_safe() does not defer
> > freeing of the memory used by a list element. Rather, replacing
> > list_for_each_entry() with list_for_each_entry_rcu() (and making the relevant
> > changes such as rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will make
> > the traversal safe.
> 
> I'm not sure RCU can satisfy this case. RCU can only protect the
> slab_caches_to_rcu_destroy list, which is used by SLAB_TYPESAFE_BY_RCU
> slabs.

I'm not sure why you are talking about SLAB_TYPESAFE_BY_RCU.
What I meant is that

  Upon registration:

    // do initialize/setup stuff here
    synchronize_rcu(); // <= for dump_unreclaimable_slab()
    list_add_rcu(&kmem_cache->list, &slab_caches);

  Upon unregistration:

    list_del_rcu(&kmem_cache->list);
    synchronize_rcu(); // <= for dump_unreclaimable_slab()
    // do finalize/cleanup stuff here

then (if my understanding is correct)

	rcu_read_lock();
	list_for_each_entry_rcu(s, &slab_caches, list) {
		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
			continue;

		memset(&sinfo, 0, sizeof(sinfo));
		get_slabinfo(s, &sinfo);

		if (sinfo.num_objs > 0)
			pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
				(sinfo.active_objs * s->size) / 1024,
				(sinfo.num_objs * s->size) / 1024);
	}
	rcu_read_unlock();

will make dump_unreclaimable_slab() safe.


* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory
From: Christopher Lameter @ 2017-10-01  6:19 UTC
  To: Yang Shi
  Cc: penberg, rientjes, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel

On Thu, 28 Sep 2017, Yang Shi wrote:

> diff --git a/mm/slab.h b/mm/slab.h
> index 0733628..b0496d1 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>  void memcg_slab_stop(struct seq_file *m, void *p);
>  int memcg_slab_show(struct seq_file *m, void *p);
>
> +#ifdef CONFIG_SLABINFO
> +void dump_unreclaimable_slab(void);
> +#else
> +static inline void dump_unreclaimable_slab(void)
> +{
> +}
> +#endif
> +
>  void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
>
>  #ifdef CONFIG_SLAB_FREELIST_RANDOM
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 904a83b..d08213d 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1272,6 +1272,35 @@ static int slab_show(struct seq_file *m, void *p)
>  	return 0;
>  }
>
> +void dump_unreclaimable_slab(void)
> +{
> +	struct kmem_cache *s, *s2;
> +	struct slabinfo sinfo;
> +
> +	pr_info("Unreclaimable slab info:\n");
> +	pr_info("Name                      Used          Total\n");
> +
> +	/*
> +	 * Acquiring slab_mutex is skipped here: we do not want to sleep
> +	 * in the OOM path right before a kernel panic, and taking the
> +	 * mutex here risks a deadlock.
> +	 * Since we are already in OOM, there should not be any big
> +	 * allocation that could change the statistics significantly.
> +	 */
> +	list_for_each_entry_safe(s, s2, &slab_caches, list) {
> +		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
> +			continue;
> +
> +		memset(&sinfo, 0, sizeof(sinfo));
> +		get_slabinfo(s, &sinfo);
> +
> +		if (sinfo.num_objs > 0)
> +			pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
> +				(sinfo.active_objs * s->size) / 1024,
> +				(sinfo.num_objs * s->size) / 1024);
> +	}
> +}
> +
>  #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
>  void *memcg_slab_start(struct seq_file *m, loff_t *pos)
>  {
>

SLABINFO is a legacy feature and dump_unreclaimable_slab is definitely
not. It also does not depend on /proc/slabinfo support.

Please move the code out of the #ifdef CONFIG_SLABINFO section.
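
A minimal sketch of what the requested mm/slab.h change might look like
(an assumption about the intent, not a tested patch): the declaration
simply moves out of the #ifdef so it is available regardless of
CONFIG_SLABINFO:

	/* mm/slab.h: no longer guarded by CONFIG_SLABINFO */
	void dump_unreclaimable_slab(void);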


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
From: Michal Hocko @ 2017-10-02 11:20 UTC
  To: Tetsuo Handa
  Cc: Yang Shi, cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm,
	linux-kernel

On Thu 28-09-17 13:36:57, Tetsuo Handa wrote:
> On 2017/09/28 6:46, Yang Shi wrote:
> > Changelog v7 -> v8:
> > * Adopted Michal's suggestion to dump unreclaimable slab info when the amount of unreclaimable slabs is greater than total user memory, not only in the OOM panic path
> 
> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since V2
> because there are
> 
> 	mutex_lock(&slab_mutex);
> 	kmalloc(GFP_KERNEL);
> 	mutex_unlock(&slab_mutex);
> 
> users. If we call dump_unreclaimable_slab() for non OOM panic path, aren't we
> introducing a risk of crash (i.e. kernel panic) for regular OOM path?

yes we are
 
> We can try mutex_trylock() from dump_unreclaimable_slab() at best.
> But it is still remaining unsafe, isn't it?

using the trylock sounds like a reasonable compromise.
-- 
Michal Hocko
SUSE Labs
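
A minimal sketch of the trylock-based variant being discussed, for
illustration (an assumption about the eventual shape of the code, not
the merged version):

	void dump_unreclaimable_slab(void)
	{
		struct kmem_cache *s, *s2;
		struct slabinfo sinfo;

		/*
		 * Don't sleep or recurse on slab_mutex in the OOM path: only
		 * try to take the mutex, and skip the dump entirely if some
		 * other task (e.g. a kmem_cache_destroy()) is holding it.
		 */
		if (!mutex_trylock(&slab_mutex)) {
			pr_warn("excessive unreclaimable slab but cannot dump stats\n");
			return;
		}

		pr_info("Unreclaimable slab info:\n");
		pr_info("Name                      Used          Total\n");

		list_for_each_entry_safe(s, s2, &slab_caches, list) {
			if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
				continue;

			memset(&sinfo, 0, sizeof(sinfo));
			get_slabinfo(s, &sinfo);

			if (sinfo.num_objs > 0)
				pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
					(sinfo.active_objs * s->size) / 1024,
					(sinfo.num_objs * s->size) / 1024);
		}
		mutex_unlock(&slab_mutex);
	}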


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
  2017-09-30 11:00               ` Tetsuo Handa
@ 2017-10-02 15:40                 ` Yang Shi
  -1 siblings, 0 replies; 27+ messages in thread
From: Yang Shi @ 2017-10-02 15:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm,
	linux-kernel



On 9/30/17 4:00 AM, Tetsuo Handa wrote:
> Yang Shi wrote:
>> On 9/28/17 1:45 PM, Tetsuo Handa wrote:
>>> Yang Shi wrote:
>>>> On 9/28/17 12:57 PM, Tetsuo Handa wrote:
>>>>> Yang Shi wrote:
>>>>>> On 9/27/17 9:36 PM, Tetsuo Handa wrote:
>>>>>>> On 2017/09/28 6:46, Yang Shi wrote:
>>>>>>>> Changelog v7 -> v8:
>>>>>>>> * Adopted Michal’s suggestion to dump unreclaim slab info when unreclaimable slabs amount > total user memory. Not only in oom panic path.
>>>>>>>
>>>>>>> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since
>>>>>>> v2 because there are
>>>>>>>
>>>>>>> 	mutex_lock(&slab_mutex);
>>>>>>> 	kmalloc(GFP_KERNEL);
>>>>>>> 	mutex_unlock(&slab_mutex);
>>>>>>>
>>>>>>> users. If we call dump_unreclaimable_slab() on a non-panic OOM path, aren't
>>>>>>> we introducing a risk of a crash (i.e. kernel panic) on the regular OOM path?
>>>>>>
>>>>>> I don't see any difference between the regular oom path and the oom
>>>>>> panic path other than calling panic() at the end.
>>>>>>
>>>>>> And the slab dump may be called from the panic path too; it covers
>>>>>> both the regular and the panic path.
>>>>>
>>>>> Calling a function that might cause a kernel oops immediately before
>>>>> calling panic() would be tolerable, for the kernel will panic after all.
>>>>> But calling a function that might cause a kernel oops when there is no
>>>>> plan to call panic() is a bug.
>>>>
>>>> I got your point. slab_mutex is used to protect the list of all the
>>>> slabs; since we are already in oom, no kmem cache destroy should happen
>>>> during the list traversal. And list_for_each_entry() has been replaced
>>>> with list_for_each_entry_safe() to make the traversal more robust.
>>>
>>> I consider that an OOM event and a kmem cache destroy event can run
>>> concurrently, because slab_mutex, which protects the list of all the
>>> slabs, is not held by the OOM event (and unfortunately cannot be held,
>>> due to the possibility of deadlock).
>>>
>>> I don't think replacing list_for_each_entry() with list_for_each_entry_safe()
>>> makes the traversal more robust, for list_for_each_entry_safe() does not
>>> defer freeing of the memory used by a list element. Rather, replacing
>>> list_for_each_entry() with list_for_each_entry_rcu() (and making relevant
>>> changes such as rcu_read_lock()/rcu_read_unlock()/synchronize_rcu()) will
>>> make the traversal safe.
>>
>> I'm not sure whether RCU can cover this case. RCU can only protect the
>> slab_caches_to_rcu_destroy list, which is used by SLAB_TYPESAFE_BY_RCU
>> slabs.
> 
> I'm not sure why you are talking about SLAB_TYPESAFE_BY_RCU.
> What I meant is that
> 
>    Upon registration:
> 
>      // do initialize/setup stuff here
>      synchronize_rcu(); // <= for dump_unreclaimable_slab()
>      list_add_rcu(&kmem_cache->list, &slab_caches);
> 
>    Upon unregistration:
> 
>      list_del_rcu(&kmem_cache->list);
>      synchronize_rcu(); // <= for dump_unreclaimable_slab()
>      // do finalize/cleanup stuff here
> 
> then (if my understanding is correct)
> 
> 	rcu_read_lock();
> 	list_for_each_entry_rcu(s, &slab_caches, list) {
> 		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
> 			continue;
> 
> 		memset(&sinfo, 0, sizeof(sinfo));
> 		get_slabinfo(s, &sinfo);
> 
> 		if (sinfo.num_objs > 0)
> 			pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
> 				(sinfo.active_objs * s->size) / 1024,
> 				(sinfo.num_objs * s->size) / 1024);
> 	}
> 	rcu_read_unlock();
> 
> will make dump_unreclaimable_slab() safe.

Thanks for the detailed description. However, it sounds like this change is
too much for slub; I'm not sure whether it might change subtle slub
behavior.

trylock sounds like a good alternative.

Yang

> 

^ permalink raw reply	[flat|nested] 27+ messages in thread


* Re: [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message
  2017-10-02 11:20     ` Michal Hocko
@ 2017-10-02 15:46       ` Yang Shi
  -1 siblings, 0 replies; 27+ messages in thread
From: Yang Shi @ 2017-10-02 15:46 UTC (permalink / raw)
  To: Michal Hocko, Tetsuo Handa
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel



On 10/2/17 4:20 AM, Michal Hocko wrote:
> On Thu 28-09-17 13:36:57, Tetsuo Handa wrote:
>> On 2017/09/28 6:46, Yang Shi wrote:
>>> Changelog v7 —> v8:
>>> * Adopted Michal’s suggestion to dump unreclaim slab info when unreclaimable slabs amount > total user memory. Not only in oom panic path.
>>
>> Holding slab_mutex inside dump_unreclaimable_slab() has been avoided since
>> v2 because there are
>>
>> 	mutex_lock(&slab_mutex);
>> 	kmalloc(GFP_KERNEL);
>> 	mutex_unlock(&slab_mutex);
>>
>> users. If we call dump_unreclaimable_slab() on a non-panic OOM path, aren't we
>> introducing a risk of a crash (i.e. kernel panic) on the regular OOM path?
> 
> yes we are
>   
>> At best, we can try mutex_trylock() from dump_unreclaimable_slab().
>> But it would still remain unsafe, wouldn't it?
> 
> using the trylock sounds like a reasonable compromise.

OK, it sounds like we have reached agreement on trylock. Will address those
comments in v9.

Thanks,
Yang

> 

^ permalink raw reply	[flat|nested] 27+ messages in thread
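
For context, the "unreclaimable slabs > total user memory" trigger from the
v8 changelog could plausibly be checked on the oom report path along these
lines (a sketch only; the predicate name and the exact set of counters are
assumptions, not the posted patch):

	/* approximate "total user memory" by the LRU and isolated pages */
	static bool should_dump_unreclaim_slab(void)
	{
		unsigned long nr_lru;

		nr_lru = global_node_page_state(NR_ACTIVE_ANON) +
			 global_node_page_state(NR_INACTIVE_ANON) +
			 global_node_page_state(NR_ACTIVE_FILE) +
			 global_node_page_state(NR_INACTIVE_FILE) +
			 global_node_page_state(NR_ISOLATED_ANON) +
			 global_node_page_state(NR_ISOLATED_FILE) +
			 global_node_page_state(NR_UNEVICTABLE);

		return (global_node_page_state(NR_SLAB_UNRECLAIMABLE) > nr_lru);
	}

	/* in dump_header(), next to the existing memory/task dumps */
	if (should_dump_unreclaim_slab())
		dump_unreclaimable_slab();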


end of thread, other threads:[~2017-10-02 15:46 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-27 21:46 [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message Yang Shi
2017-09-27 21:46 ` [PATCH 1/2] tools: slabinfo: add "-U" option to show unreclaimable slabs only Yang Shi
2017-09-27 21:46 ` [PATCH 2/2] mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory Yang Shi
2017-10-01  6:19   ` Christopher Lameter
2017-09-28  4:36 ` [PATCH 0/2 v8] oom: capture unreclaimable slab info in oom message Tetsuo Handa
2017-09-28 17:49   ` Yang Shi
2017-09-28 19:57     ` Tetsuo Handa
2017-09-28 20:21       ` Yang Shi
2017-09-28 20:45         ` Tetsuo Handa
2017-09-29 22:15           ` Yang Shi
2017-09-30 11:00             ` Tetsuo Handa
2017-10-02 15:40               ` Yang Shi
2017-10-02 11:20   ` Michal Hocko
2017-10-02 15:46     ` Yang Shi
