* [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic
@ 2017-09-20 22:38 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-20 22:38 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
Recently we ran into an oom issue: the kernel panicked because there was no killable process.
dmesg showed that huge unreclaimable slabs used almost 100% of memory, but kdump failed to capture a vmcore for some reason.
So it seems worthwhile to capture unreclaimable slab info in the oom message when the kernel panics, to aid troubleshooting and cover this corner case.
Since the kernel is already panicking, capturing more information is worthwhile and does not affect the normal oom killer path.
With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slabs only.
Also, oom will print all non-zero (num_objs * size != 0) unreclaimable slabs in the oom killer message.
For details, please see the commit log for each commit.
Changelog v3 -> v4:
* Addressed the comments from David
* Added David’s Acked-by in patch 1
Changelog v2 -> v3:
* Show used size and total size of each kmem cache per David’s comment
Changelog v1 -> v2:
* Removed the original patch 1 (“mm: slab: output reclaimable flag in /proc/slabinfo”) since Christoph suggested it might break compatibility and /proc/slabinfo is legacy
* Added Christoph’s Acked-by
* Removed acquiring slab_mutex per Tetsuo’s comment
Yang Shi (2):
tools: slabinfo: add "-U" option to show unreclaimable slabs only
mm: oom: show unreclaimable slab info when kernel panic
mm/oom_kill.c | 3 +++
mm/slab.h | 8 ++++++++
mm/slab_common.c | 26 ++++++++++++++++++++++++++
tools/vm/slabinfo.c | 11 ++++++++++-
4 files changed, 47 insertions(+), 1 deletion(-)
* [PATCH 1/2] tools: slabinfo: add "-U" option to show unreclaimable slabs only
2017-09-20 22:38 ` Yang Shi
@ 2017-09-20 22:38 ` Yang Shi
-1 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-20 22:38 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
Add a "-U" option to show unreclaimable slabs only.
"-U" and "-S" together show which unreclaimable slabs use the most
memory, which helps debug huge unreclaimable slab issues.
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
---
tools/vm/slabinfo.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index b9d34b3..de8fa11 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -83,6 +83,7 @@ struct aliasinfo {
int sort_loss;
int extended_totals;
int show_bytes;
+int unreclaim_only;
/* Debug options */
int sanity;
@@ -132,6 +133,7 @@ static void usage(void)
"-L|--Loss Sort by loss\n"
"-X|--Xtotals Show extended summary information\n"
"-B|--Bytes Show size in bytes\n"
+ "-U|--Unreclaim Show unreclaimable slabs only\n"
"\nValid debug options (FZPUT may be combined)\n"
"a / A Switch on all debug options (=FZUP)\n"
"- Switch off all debug options\n"
@@ -568,6 +570,9 @@ static void slabcache(struct slabinfo *s)
if (strcmp(s->name, "*") == 0)
return;
+ if (unreclaim_only && s->reclaim_account)
+ return;
+
if (actual_slabs == 1) {
report(s);
return;
@@ -1346,6 +1351,7 @@ struct option opts[] = {
{ "Loss", no_argument, NULL, 'L'},
{ "Xtotals", no_argument, NULL, 'X'},
{ "Bytes", no_argument, NULL, 'B'},
+ { "Unreclaim", no_argument, NULL, 'U'},
{ NULL, 0, NULL, 0 }
};
@@ -1357,7 +1363,7 @@ int main(int argc, char *argv[])
page_size = getpagesize();
- while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXB",
+ while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXBU",
opts, NULL)) != -1)
switch (c) {
case '1':
@@ -1438,6 +1444,9 @@ int main(int argc, char *argv[])
case 'B':
show_bytes = 1;
break;
+ case 'U':
+ unreclaim_only = 1;
+ break;
default:
fatal("%s: Invalid option '%c'\n", argv[0], optopt);
--
1.8.3.1
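The option handling in this patch can be tried outside of slabinfo.c. The following is a minimal userspace sketch, not the tool's code: struct demo_slab, demo_opts, demo_parse() and demo_skip() are hypothetical names invented for illustration, and only the 'U' / "Unreclaim" option and the `unreclaim_only && s->reclaim_account` test come from the diff above.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical model of the one field the filter looks at. */
struct demo_slab {
	const char *name;
	int reclaim_account;	/* nonzero if the cache is reclaimable */
};

static int demo_unreclaim_only;

/* Long-option table: --Unreclaim maps to the same code as -U. */
static const struct option demo_opts[] = {
	{ "Unreclaim", no_argument, NULL, 'U' },
	{ NULL, 0, NULL, 0 }
};

/* Parse argv; returns 0 on success, -1 on an unknown option. */
static int demo_parse(int argc, char *argv[])
{
	int c;

	assert(argv != NULL);
	optind = 1;		/* restart scanning so the function can be reused */
	demo_unreclaim_only = 0;
	while ((c = getopt_long(argc, argv, "U", demo_opts, NULL)) != -1) {
		switch (c) {
		case 'U':
			demo_unreclaim_only = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}

/* Mirrors the early return added to slabcache(): nonzero means "skip". */
static int demo_skip(const struct demo_slab *s)
{
	return demo_unreclaim_only && s->reclaim_account;
}
```

With "-U" given, demo_skip() filters out reclaimable caches and keeps the rest; without it, nothing is filtered. Sorting by size ("-S" in the real tool) is independent of this filter and is not modelled here.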
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-20 22:38 ` Yang Shi
@ 2017-09-20 22:38 ` Yang Shi
-1 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-20 22:38 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an oom happens without a killable process;
sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not
available on all architectures and it might malfunction sometimes.
And since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Print out unreclaimable slab info (used size and total size) for caches whose
actual memory usage is not zero (num_objs * size != 0) when panic_on_oom is set
or there is no killable process. Since this information is only shown when the
kernel panics, it will not make the message too verbose for a normal oom.
The output looks like:
Unreclaimable slab info:
Name                      Used        Total
rpc_buffers               31KB         31KB
rpc_tasks                  7KB          7KB
ebitmap_node            1964KB       1964KB
avtab_node              5024KB       5024KB
xfs_buf                 1402KB       1402KB
xfs_ili                  134KB        134KB
xfs_efi_item             115KB        115KB
xfs_efd_item             115KB        115KB
xfs_buf_item             134KB        134KB
xfs_log_item_desc        342KB        342KB
xfs_trans               1412KB       1412KB
xfs_ifork                212KB        212KB
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 3 +++
mm/slab.h | 8 ++++++++
mm/slab_common.c | 26 ++++++++++++++++++++++++++
3 files changed, 37 insertions(+)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..bd48d34 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -960,6 +961,7 @@ static void check_panic_on_oom(struct oom_control *oc,
if (is_sysrq_oom(oc))
return;
dump_header(oc, NULL);
+ dump_unreclaimable_slab();
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1044,6 +1046,7 @@ bool out_of_memory(struct oom_control *oc)
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
dump_header(oc, NULL);
+ dump_unreclaimable_slab();
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..734a92d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+#ifdef CONFIG_SLABINFO
+void dump_unreclaimable_slab(void);
+#else
+void dump_unreclaimable_slab(void);
+{
+}
+#endif
+
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
#ifdef CONFIG_SLAB_FREELIST_RANDOM
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..90d9de3 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1272,6 +1272,32 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void dump_unreclaimable_slab(void)
+{
+ struct kmem_cache *s;
+ struct slabinfo sinfo;
+
+ pr_info("Unreclaimable slab info:\n");
+ pr_info("Name Used Total\n");
+
+ /*
+ * Here acquiring slab_mutex is unnecessary since we don't prefer to
+ * get sleep in oom path right before kernel panic, and avoid race condition.
+ * Since it is already oom, so there should be not any big allocation
+ * which could change the statistics significantly.
+ */
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!is_root_cache(s))
+ continue;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+ get_slabinfo(s, &sinfo);
+
+ if (!(s->flags & SLAB_RECLAIM_ACCOUNT) && sinfo.num_objs > 0)
+ pr_info("%-17s %10luKB %10luKB\n", cache_name(s), (sinfo.active_objs * s->size) / 1024, (sinfo.num_objs * s->size) / 1024);
+ }
+}
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
--
1.8.3.1
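The selection rule and the line format used by dump_unreclaimable_slab() can be tried in userspace. The sketch below is only an illustration under stated assumptions: struct demo_cache, DEMO_RECLAIM_ACCOUNT, demo_should_report() and demo_format_line() are hypothetical names that do not exist in the kernel. Only the format string and the (num_objs * size != 0) test are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's SLAB_RECLAIM_ACCOUNT flag bit. */
#define DEMO_RECLAIM_ACCOUNT 0x1u

/* Hypothetical userspace model of one kmem cache's statistics. */
struct demo_cache {
	const char *name;
	unsigned int flags;
	unsigned long active_objs;	/* objects currently in use */
	unsigned long num_objs;		/* objects allocated in total */
	unsigned long size;		/* bytes per object */
};

/* Return nonzero if the cache would be listed in the panic report. */
static int demo_should_report(const struct demo_cache *c)
{
	assert(c != NULL);
	if (c->flags & DEMO_RECLAIM_ACCOUNT)
		return 0;		/* reclaimable: never listed */
	return c->num_objs * c->size != 0;
}

/* Format one report line into buf; returns snprintf()'s result. */
static int demo_format_line(const struct demo_cache *c, char *buf, size_t len)
{
	assert(c != NULL && buf != NULL);
	return snprintf(buf, len, "%-17s %10luKB %10luKB",
			c->name,
			c->active_objs * c->size / 1024,
			c->num_objs * c->size / 1024);
}
```

The "Used" column is derived from active_objs and the "Total" column from num_objs, so a cache with many free objects shows a smaller first number than second.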
* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-20 22:38 ` Yang Shi
@ 2017-09-21 8:23 ` David Rientjes
-1 siblings, 0 replies; 44+ messages in thread
From: David Rientjes @ 2017-09-21 8:23 UTC (permalink / raw)
To: Yang Shi
Cc: cl, penberg, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel
On Thu, 21 Sep 2017, Yang Shi wrote:
> Kernel may panic when oom happens without killable process sometimes it
> is caused by huge unreclaimable slabs used by kernel.
>
> Although kdump could help debug such problem, however, kdump is not
> available on all architectures and it might be malfunction sometime.
> And, since kernel already panic it is worthy capturing such information
> in dmesg to aid touble shooting.
>
> Print out unreclaimable slab info (used size and total size) which
> actual memory usage is not zero (num_objs * size != 0) when panic_on_oom is set
> or no killable process. Since such information is just showed when kernel
> panic, so it will not lead too verbose message for normal oom.
>
> The output looks like:
>
> Unreclaimable slab info:
> Name Used Total
> rpc_buffers 31KB 31KB
> rpc_tasks 7KB 7KB
> ebitmap_node 1964KB 1964KB
> avtab_node 5024KB 5024KB
> xfs_buf 1402KB 1402KB
> xfs_ili 134KB 134KB
> xfs_efi_item 115KB 115KB
> xfs_efd_item 115KB 115KB
> xfs_buf_item 134KB 134KB
> xfs_log_item_desc 342KB 342KB
> xfs_trans 1412KB 1412KB
> xfs_ifork 212KB 212KB
>
> Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
> ---
> mm/oom_kill.c | 3 +++
> mm/slab.h | 8 ++++++++
> mm/slab_common.c | 26 ++++++++++++++++++++++++++
> 3 files changed, 37 insertions(+)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 99736e0..bd48d34 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -43,6 +43,7 @@
>
> #include <asm/tlb.h>
> #include "internal.h"
> +#include "slab.h"
>
> #define CREATE_TRACE_POINTS
> #include <trace/events/oom.h>
> @@ -960,6 +961,7 @@ static void check_panic_on_oom(struct oom_control *oc,
> if (is_sysrq_oom(oc))
> return;
> dump_header(oc, NULL);
> + dump_unreclaimable_slab();
> panic("Out of memory: %s panic_on_oom is enabled\n",
> sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
> }
> @@ -1044,6 +1046,7 @@ bool out_of_memory(struct oom_control *oc)
> /* Found nothing?!?! Either we hang forever, or we panic. */
> if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
> dump_header(oc, NULL);
> + dump_unreclaimable_slab();
> panic("Out of memory and no killable processes...\n");
> }
> if (oc->chosen && oc->chosen != (void *)-1UL) {
> diff --git a/mm/slab.h b/mm/slab.h
> index 0733628..734a92d 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> void memcg_slab_stop(struct seq_file *m, void *p);
> int memcg_slab_show(struct seq_file *m, void *p);
>
> +#ifdef CONFIG_SLABINFO
> +void dump_unreclaimable_slab(void);
> +#else
> +void dump_unreclaimable_slab(void);
This won't compile when CONFIG_SLABINFO is disabled.
static inline void dump_unreclaimable_slab(void)
{
}
when CONFIG_SLABINFO=n.
> +{
> +}
> +#endif
> +
> void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
>
> #ifdef CONFIG_SLAB_FREELIST_RANDOM
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 904a83b..90d9de3 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1272,6 +1272,32 @@ static int slab_show(struct seq_file *m, void *p)
> return 0;
> }
>
> +void dump_unreclaimable_slab(void)
> +{
> + struct kmem_cache *s;
> + struct slabinfo sinfo;
> +
> + pr_info("Unreclaimable slab info:\n");
> + pr_info("Name Used Total\n");
> +
> + /*
> + * Here acquiring slab_mutex is unnecessary since we don't prefer to
> + * get sleep in oom path right before kernel panic, and avoid race condition.
> + * Since it is already oom, so there should be not any big allocation
> + * which could change the statistics significantly.
The statistics themselves aren't protected by slab_mutex; it protects the
iteration of the list. I would suggest still taking the mutex here unless
there's a reason to avoid it.
> + */
> + list_for_each_entry(s, &slab_caches, list) {
> + if (!is_root_cache(s))
> + continue;
if (s->flags & SLAB_RECLAIM_ACCOUNT)
continue;
No need to do the memset or get_slabinfo() if it's reclaimable, so just
short-circuit it early in that case.
> +
> + memset(&sinfo, 0, sizeof(sinfo));
> + get_slabinfo(s, &sinfo);
> +
> + if (!(s->flags & SLAB_RECLAIM_ACCOUNT) && sinfo.num_objs > 0)
> + pr_info("%-17s %10luKB %10luKB\n", cache_name(s), (sinfo.active_objs * s->size) / 1024, (sinfo.num_objs * s->size) / 1024);
> + }
> +}
> +
> #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
> void *memcg_slab_start(struct seq_file *m, loff_t *pos)
> {
Please run scripts/checkpatch.pl on your patch since there are some
stylistic problems. Otherwise, I think we need one more revision and
we'll be good to go!
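The two code suggestions in this review (an empty static inline stub when the feature is compiled out, and skipping reclaimable caches before gathering statistics) can be sketched in userspace as follows. This is an illustration only, not the revised patch: DEMO_SLABINFO, struct demo_cache, struct demo_info, demo_get_info() and demo_dump() are hypothetical stand-ins for CONFIG_SLABINFO, struct kmem_cache, struct slabinfo, get_slabinfo() and dump_unreclaimable_slab().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_RECLAIM_ACCOUNT 0x1u
#define DEMO_SLABINFO 1		/* set to 0 to build the empty stub instead */

struct demo_cache {
	const char *name;
	unsigned int flags;
	unsigned long num_objs;
};

struct demo_info {
	unsigned long num_objs;
};

/* Counts how often statistics are gathered, to show the saved work. */
static unsigned int demo_stat_calls;

#if DEMO_SLABINFO
/* Stand-in for get_slabinfo(): copies the stats and counts the call. */
static void demo_get_info(const struct demo_cache *c, struct demo_info *out)
{
	demo_stat_calls++;
	out->num_objs = c->num_objs;
}

/* Returns how many caches would be printed. */
static unsigned int demo_dump(const struct demo_cache *caches, size_t n)
{
	unsigned int reported = 0;
	size_t i;

	assert(caches != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct demo_info info;

		if (caches[i].flags & DEMO_RECLAIM_ACCOUNT)
			continue;	/* reclaimable: skip before any stats work */

		memset(&info, 0, sizeof(info));
		demo_get_info(&caches[i], &info);
		if (info.num_objs > 0)
			reported++;
	}
	return reported;
}
#else
/* A definition, so there is no semicolon after the parameter list. */
static inline unsigned int demo_dump(const struct demo_cache *caches, size_t n)
{
	(void)caches;
	(void)n;
	return 0;
}
#endif
```

With the early `continue`, the memset and the statistics call run only for unreclaimable caches, so the later print condition only has to check the object count.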
* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-21 8:23 ` David Rientjes
@ 2017-09-21 17:51 ` Yang Shi
-1 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-21 17:51 UTC (permalink / raw)
To: David Rientjes
Cc: cl, penberg, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel
On 9/21/17 1:23 AM, David Rientjes wrote:
> On Thu, 21 Sep 2017, Yang Shi wrote:
>
>> Kernel may panic when oom happens without killable process sometimes it
>> is caused by huge unreclaimable slabs used by kernel.
>>
>> Although kdump could help debug such problem, however, kdump is not
>> available on all architectures and it might be malfunction sometime.
>> And, since kernel already panic it is worthy capturing such information
>> in dmesg to aid touble shooting.
>>
>> Print out unreclaimable slab info (used size and total size) which
>> actual memory usage is not zero (num_objs * size != 0) when panic_on_oom is set
>> or no killable process. Since such information is just showed when kernel
>> panic, so it will not lead too verbose message for normal oom.
>>
>> The output looks like:
>>
>> Unreclaimable slab info:
>> Name Used Total
>> rpc_buffers 31KB 31KB
>> rpc_tasks 7KB 7KB
>> ebitmap_node 1964KB 1964KB
>> avtab_node 5024KB 5024KB
>> xfs_buf 1402KB 1402KB
>> xfs_ili 134KB 134KB
>> xfs_efi_item 115KB 115KB
>> xfs_efd_item 115KB 115KB
>> xfs_buf_item 134KB 134KB
>> xfs_log_item_desc 342KB 342KB
>> xfs_trans 1412KB 1412KB
>> xfs_ifork 212KB 212KB
>>
>> Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
>> ---
>> mm/oom_kill.c | 3 +++
>> mm/slab.h | 8 ++++++++
>> mm/slab_common.c | 26 ++++++++++++++++++++++++++
>> 3 files changed, 37 insertions(+)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 99736e0..bd48d34 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -43,6 +43,7 @@
>>
>> #include <asm/tlb.h>
>> #include "internal.h"
>> +#include "slab.h"
>>
>> #define CREATE_TRACE_POINTS
>> #include <trace/events/oom.h>
>> @@ -960,6 +961,7 @@ static void check_panic_on_oom(struct oom_control *oc,
>> if (is_sysrq_oom(oc))
>> return;
>> dump_header(oc, NULL);
>> + dump_unreclaimable_slab();
>> panic("Out of memory: %s panic_on_oom is enabled\n",
>> sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
>> }
>> @@ -1044,6 +1046,7 @@ bool out_of_memory(struct oom_control *oc)
>> /* Found nothing?!?! Either we hang forever, or we panic. */
>> if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
>> dump_header(oc, NULL);
>> + dump_unreclaimable_slab();
>> panic("Out of memory and no killable processes...\n");
>> }
>> if (oc->chosen && oc->chosen != (void *)-1UL) {
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 0733628..734a92d 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>> void memcg_slab_stop(struct seq_file *m, void *p);
>> int memcg_slab_show(struct seq_file *m, void *p);
>>
>> +#ifdef CONFIG_SLABINFO
>> +void dump_unreclaimable_slab(void);
>> +#else
>> +void dump_unreclaimable_slab(void);
>
> This won't compile when CONFIG_SLABINFO is disabled.
>
> static inline void dump_unreclaimable_slab(void)
> {
> }
>
> when CONFIG_SLABINFO=n.
Thanks for pointing this out. I just tested the CONFIG_SLABINFO=n case. It
can't be disabled in menuconfig, so I manually modified init/Kconfig to test it.
>
>> +{
>> +}
>> +#endif
>> +
>> void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
>>
>> #ifdef CONFIG_SLAB_FREELIST_RANDOM
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 904a83b..90d9de3 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -1272,6 +1272,32 @@ static int slab_show(struct seq_file *m, void *p)
>> return 0;
>> }
>>
>> +void dump_unreclaimable_slab(void)
>> +{
>> + struct kmem_cache *s;
>> + struct slabinfo sinfo;
>> +
>> + pr_info("Unreclaimable slab info:\n");
>> + pr_info("Name Used Total\n");
>> +
>> + /*
>> + * Here acquiring slab_mutex is unnecessary since we don't prefer to
>> + * get sleep in oom path right before kernel panic, and avoid race condition.
>> + * Since it is already oom, so there should be not any big allocation
>> + * which could change the statistics significantly.
>
> The statistics themselves aren't protected by slab_mutex, it protects the
> iteration of the list. I would suggest still taking the mutex here unless
> there's a reason to avoid it.
I don't think we want to sleep in the oom path. Instead of acquiring the
mutex, I think we can use list_for_each_entry_safe() to guard against the
removal of a kmem cache while printing the statistics.
>
>> + */
>> + list_for_each_entry(s, &slab_caches, list) {
>> + if (!is_root_cache(s))
>> + continue;
>
> if (!(s->flags & SLAB_RECLAIM_ACCOUNT))
> continue;
>
> No need to do the memset or get_slabinfo() if it's reclaimable, so just
> short-circuit it early in that case.
>
>> +
>> + memset(&sinfo, 0, sizeof(sinfo));
>> + get_slabinfo(s, &sinfo);
>> +
>> + if (!(s->flags & SLAB_RECLAIM_ACCOUNT) && sinfo.num_objs > 0)
>> + pr_info("%-17s %10luKB %10luKB\n", cache_name(s), (sinfo.active_objs * s->size) / 1024, (sinfo.num_objs * s->size) / 1024);
>> + }
>> +}
>> +
>> #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
>> void *memcg_slab_start(struct seq_file *m, loff_t *pos)
>> {
>
> Please run scripts/checkpatch.pl on your patch since there are some
> stylistic problems. Otherwise, I think we need one more revision and
> we'll be good to go!
Thanks, will prepare v5 soon.
Yang
>
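A user-space sketch of the early-skip filter discussed in this review, not the kernel code. demo_cache, DEMO_RECLAIM_ACCOUNT and demo_count_reported() are hypothetical stand-ins for struct kmem_cache, SLAB_RECLAIM_ACCOUNT and the dump loop. Caches with the flag set are reclaimable, so those are the ones skipped here (the v6 patch tests the flag the same way):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SLAB_RECLAIM_ACCOUNT; the value is arbitrary. */
#define DEMO_RECLAIM_ACCOUNT 0x1u

/* Hypothetical stand-in for the few kmem_cache fields the loop looks at. */
struct demo_cache {
	const char *name;
	unsigned int flags;
	bool is_root;		/* models is_root_cache(s) */
	unsigned long num_objs;	/* models sinfo.num_objs */
};

/*
 * Counts the caches that would be printed. The cheap flag tests run
 * first, so reclaimable or non-root caches are skipped before any
 * statistics would have to be gathered.
 */
static size_t demo_count_reported(const struct demo_cache *caches, size_t n)
{
	size_t i, reported = 0;

	assert(caches != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct demo_cache *s = &caches[i];

		/* Short-circuit: child cache, or reclaimable cache. */
		if (!s->is_root || (s->flags & DEMO_RECLAIM_ACCOUNT))
			continue;

		/* Only caches that actually hold objects are reported. */
		if (s->num_objs > 0)
			reported++;
	}
	return reported;
}
```

With one unreclaimable root cache holding objects, one reclaimable cache, one child cache and one empty cache, only the first is counted.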
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic
2017-09-20 22:38 ` Yang Shi
@ 2017-09-25 14:23 ` Michal Hocko
-1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2017-09-25 14:23 UTC (permalink / raw)
To: Yang Shi
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel
On Thu 21-09-17 06:38:50, Yang Shi wrote:
> Recently we ran into a oom issue, kernel panic due to no killable process.
> The dmesg shows huge unreclaimable slabs used almost 100% memory, but kdump doesn't capture vmcore due to some reason.
>
> So, it may sound better to capture unreclaimable slab info in oom message when kernel panic to aid trouble shooting and cover the corner case.
> Since kernel already panic, so capturing more information sounds worthy and doesn't bother normal oom killer.
>
> With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slab only.
>
> And, oom will print all non zero (num_objs * size != 0) unreclaimable slabs in oom killer message.
Well, I do understand that this _might_ be useful but it also might
generate a _lot_ of output. The oom report can be quite verbose already
so is this something we want to have enabled by default?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic
2017-09-25 14:23 ` Michal Hocko
@ 2017-09-25 15:55 ` Yang Shi
-1 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-25 15:55 UTC (permalink / raw)
To: Michal Hocko
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel
On 9/25/17 7:23 AM, Michal Hocko wrote:
> On Thu 21-09-17 06:38:50, Yang Shi wrote:
>> Recently we ran into a oom issue, kernel panic due to no killable process.
>> The dmesg shows huge unreclaimable slabs used almost 100% memory, but kdump doesn't capture vmcore due to some reason.
>>
>> So, it may sound better to capture unreclaimable slab info in oom message when kernel panic to aid trouble shooting and cover the corner case.
>> Since kernel already panic, so capturing more information sounds worthy and doesn't bother normal oom killer.
>>
>> With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slab only.
>>
>> And, oom will print all non zero (num_objs * size != 0) unreclaimable slabs in oom killer message.
>
> Well, I do understand that this _might_ be useful but it also might
> generate a _lot_ of output. The oom report can be quite verbose already
> so is this something we want to have enabled by default?
The unreclaimable slab message is printed only when the kernel panics
(no killable process, or panic_on_oom is set), so it will not bother a
normal oom. Since the kernel is already panicking, it may be preferable to
have more information reported.
We can definitely add a proc knob to control it if we want to be able to
disable the message even when the kernel panics.
Thanks,
Yang
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic
2017-09-25 15:55 ` Yang Shi
@ 2017-09-25 20:32 ` Michal Hocko
-1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2017-09-25 20:32 UTC (permalink / raw)
To: Yang Shi
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel
On Mon 25-09-17 23:55:19, Yang Shi wrote:
>
>
> On 9/25/17 7:23 AM, Michal Hocko wrote:
> > On Thu 21-09-17 06:38:50, Yang Shi wrote:
> > > Recently we ran into a oom issue, kernel panic due to no killable process.
> > > The dmesg shows huge unreclaimable slabs used almost 100% memory, but kdump doesn't capture vmcore due to some reason.
> > >
> > > So, it may sound better to capture unreclaimable slab info in oom message when kernel panic to aid trouble shooting and cover the corner case.
> > > Since kernel already panic, so capturing more information sounds worthy and doesn't bother normal oom killer.
> > >
> > > With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slab only.
> > >
> > > And, oom will print all non zero (num_objs * size != 0) unreclaimable slabs in oom killer message.
> >
> > Well, I do understand that this _might_ be useful but it also might
> > generate a _lot_ of output. The oom report can be quite verbose already
> > so is this something we want to have enabled by default?
>
> The unreclaimable slab message is printed only when the kernel panics (no
> killable process, or panic_on_oom is set), so it will not bother a normal oom.
> Since the kernel is already panicking, it may be preferable to have more
> information reported.
Well, this certainly depends. If you have a limited console output (e.g.
no serial console) then the additional information can easily scroll
away the potentially much more useful information from earlier in the
oom report. We already have a control to enable/disable the task dump,
which can be very long as well.
> We can definitely add a proc knob to control it if we want to be able to
> disable the message even when the kernel panics.
Well, I do not have a strong opinion on this. I can see cases where this
kind of information would be useful but most OOM reports I have seen
were simply about memory pinned by user space. Slab memory leaks are seen
very seldom. Do you think a pr_debug with slab stats for all ooms would
still be useful?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic
2017-09-25 20:32 ` Michal Hocko
@ 2017-09-25 21:52 ` Yang Shi
-1 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-25 21:52 UTC (permalink / raw)
To: Michal Hocko
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel
On 9/25/17 1:32 PM, Michal Hocko wrote:
> On Mon 25-09-17 23:55:19, Yang Shi wrote:
>>
>>
>> On 9/25/17 7:23 AM, Michal Hocko wrote:
>>> On Thu 21-09-17 06:38:50, Yang Shi wrote:
>>>> Recently we ran into a oom issue, kernel panic due to no killable process.
>>>> The dmesg shows huge unreclaimable slabs used almost 100% memory, but kdump doesn't capture vmcore due to some reason.
>>>>
>>>> So, it may sound better to capture unreclaimable slab info in oom message when kernel panic to aid trouble shooting and cover the corner case.
>>>> Since kernel already panic, so capturing more information sounds worthy and doesn't bother normal oom killer.
>>>>
>>>> With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slab only.
>>>>
>>>> And, oom will print all non zero (num_objs * size != 0) unreclaimable slabs in oom killer message.
>>>
>>> Well, I do understand that this _might_ be useful but it also might
>>> generate a _lot_ of output. The oom report can be quite verbose already
>>> so is this something we want to have enabled by default?
>>
>> The unreclaimable slab message is printed only when the kernel panics (no
>> killable process, or panic_on_oom is set), so it will not bother a normal oom.
>> Since the kernel is already panicking, it may be preferable to have more
>> information reported.
>
> Well, this certainly depends. If you have a limited console output (e.g.
> no serial console) then the additional information can easily scroll
> away the potentially much more useful information from earlier in the
> oom report. We already have a control to enable/disable the task dump,
> which can be very long as well.
>
>> We can definitely add a proc knob to control it if we want to be able to
>> disable the message even when the kernel panics.
>
> Well, I do not have a strong opinion on this. I can see cases where this
> kind of information would be useful but most OOM reports I have seen
> were simply about memory pinned by user space. Slab memory leaks are seen
> very seldom. Do you think a pr_debug with slab stats for all ooms would
> still be useful?
It might be. But we can use slabinfo to get all slab stats in the non-panic
oom case; patch 1/2 (tools: slabinfo: add "-U" option to show
unreclaimable slabs only) is meant to cover this case.
Maybe we can set an unreclaimable slab/total memory ratio. For example, when
the unreclaimable slab size is >= 50% of the total memory size, we print out
the slab stats in oom? The ratio might be adjustable in /proc.
Or just replace pr_info with pr_debug. Once an oom happens, if a lot of
memory is consumed by unreclaimable slabs, we can just enable the debug
output and then try to reproduce.
Thanks,
Yang
>
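A minimal user-space sketch of the ratio idea floated in this mail, assuming a hypothetical helper demo_slab_ratio_exceeded() and page counts passed in by the caller. Nothing here is an existing kernel interface; the 50% figure would simply be passed as ratio_percent:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when unreclaimable slab makes up at
 * least ratio_percent of total memory. Both sizes are in pages.
 */
static bool demo_slab_ratio_exceeded(unsigned long unreclaim_pages,
				     unsigned long total_pages,
				     unsigned int ratio_percent)
{
	assert(ratio_percent <= 100);

	/* No memory information: do not trigger the dump. */
	if (total_pages == 0)
		return false;

	/* unreclaim / total >= ratio / 100, kept in integer arithmetic. */
	return (unsigned long long)unreclaim_pages * 100 >=
	       (unsigned long long)total_pages * ratio_percent;
}
```

Cross-multiplying avoids floating point, and the comparison is inclusive so that exactly 50% triggers the dump, matching the ">= 50%" wording.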
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic
2017-09-25 21:52 ` Yang Shi
@ 2017-09-26 7:56 ` Michal Hocko
-1 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2017-09-26 7:56 UTC (permalink / raw)
To: Yang Shi
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel
On Tue 26-09-17 05:52:50, Yang Shi wrote:
> Maybe we can set an unreclaimable slab/total memory ratio. For example, when
> the unreclaimable slab size is >= 50% of the total memory size, we print out
> the slab stats in oom? The ratio might be adjustable in /proc.
This sounds quite reasonable to me. I would compare the slab amount to
the directly user-backed memory (LRU pages) though.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 0/2 v6] oom: capture unreclaimable slab info in oom message when kernel panic
@ 2017-09-22 19:52 Yang Shi
2017-09-22 19:52 ` Yang Shi
0 siblings, 1 reply; 44+ messages in thread
From: Yang Shi @ 2017-09-22 19:52 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
Recently we ran into an oom issue: the kernel panicked because there was no killable process.
The dmesg shows that huge unreclaimable slabs used almost 100% of memory, but kdump failed to capture a vmcore for some reason.
So it seems worthwhile to capture unreclaimable slab info in the oom message when the kernel panics, to aid troubleshooting and cover this corner case.
Since the kernel is already panicking, capturing more information is worthwhile and does not affect the normal oom killer.
With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slabs only.
Also, oom will print all non-zero (num_objs * size != 0) unreclaimable slabs in the oom killer message.
For details, please see the commit log of each commit.
Changelog v5 -> v6:
* Fixed a checkpatch.pl warning in patch #2; both patches now have zero errors and warnings
Changelog v4 -> v5:
* Addressed the comments from David
* Build-tested with SLABINFO = n
Changelog v3 -> v4:
* Addressed the comments from David
* Added David’s Acked-by in patch 1
Changelog v2 -> v3:
* Show used size and total size of each kmem cache per David’s comment
Changelog v1 -> v2:
* Removed the original patch 1 (“mm: slab: output reclaimable flag in /proc/slabinfo”) since Christoph suggested it might break the compatibility and /proc/slabinfo is legacy
* Added Christoph’s Acked-by
* Removed acquiring slab_mutex per Tetsuo’s comment
Yang Shi (2):
tools: slabinfo: add "-U" option to show unreclaimable slabs only
mm: oom: show unreclaimable slab info when kernel panic
mm/oom_kill.c | 3 +++
mm/slab.h | 8 ++++++++
mm/slab_common.c | 29 +++++++++++++++++++++++++++++
tools/vm/slabinfo.c | 11 ++++++++++-
4 files changed, 50 insertions(+), 1 deletion(-)
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-22 19:52 [PATCH 0/2 v6] " Yang Shi
@ 2017-09-22 19:52 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-22 19:52 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an oom happens and there is no killable process.
Sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not available
on all architectures and it might malfunction sometimes.
And, since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Print out unreclaimable slab info (used size and total size) for caches
whose actual memory usage is not zero (num_objs * size != 0) when
panic_on_oom is set or there is no killable process. Since this
information is shown only when the kernel panics, it will not make the
message too verbose for a normal oom.
The output looks like:
Unreclaimable slab info:
Name                      Used        Total
rpc_buffers               31KB         31KB
rpc_tasks                  7KB          7KB
ebitmap_node            1964KB       1964KB
avtab_node              5024KB       5024KB
xfs_buf                 1402KB       1402KB
xfs_ili                  134KB        134KB
xfs_efi_item             115KB        115KB
xfs_efd_item             115KB        115KB
xfs_buf_item             134KB        134KB
xfs_log_item_desc        342KB        342KB
xfs_trans               1412KB       1412KB
xfs_ifork                212KB        212KB
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 3 +++
mm/slab.h | 8 ++++++++
mm/slab_common.c | 29 +++++++++++++++++++++++++++++
3 files changed, 40 insertions(+)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..bd48d34 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -960,6 +961,7 @@ static void check_panic_on_oom(struct oom_control *oc,
if (is_sysrq_oom(oc))
return;
dump_header(oc, NULL);
+ dump_unreclaimable_slab();
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1044,6 +1046,7 @@ bool out_of_memory(struct oom_control *oc)
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
dump_header(oc, NULL);
+ dump_unreclaimable_slab();
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..b0496d1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+#ifdef CONFIG_SLABINFO
+void dump_unreclaimable_slab(void);
+#else
+static inline void dump_unreclaimable_slab(void)
+{
+}
+#endif
+
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
#ifdef CONFIG_SLAB_FREELIST_RANDOM
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..d08213d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1272,6 +1272,35 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void dump_unreclaimable_slab(void)
+{
+ struct kmem_cache *s, *s2;
+ struct slabinfo sinfo;
+
+ pr_info("Unreclaimable slab info:\n");
+ pr_info("Name Used Total\n");
+
+ /*
+ * slab_mutex is deliberately not acquired here: we do not want to
+ * sleep in the oom path right before the kernel panics, so the list
+ * is walked with the _safe iterator instead.
+ * Since the system is already oom, there should not be any big
+ * allocation which could change the statistics significantly.
+ */
+ list_for_each_entry_safe(s, s2, &slab_caches, list) {
+ if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+ continue;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+ get_slabinfo(s, &sinfo);
+
+ if (sinfo.num_objs > 0)
+ pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
+ (sinfo.active_objs * s->size) / 1024,
+ (sinfo.num_objs * s->size) / 1024);
+ }
+}
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
--
1.8.3.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
@ 2017-09-22 19:52 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-22 19:52 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an oom happens and there is no killable process;
sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not
available on all architectures and it might malfunction sometimes.
And, since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Print out unreclaimable slab info (used size and total size) for caches whose
actual memory usage is not zero (num_objs * size != 0) when panic_on_oom
is set or there is no killable process. Since such information is only shown
when the kernel panics, it will not make the message too verbose for a normal oom.
The output looks like:
Unreclaimable slab info:
Name                      Used        Total
rpc_buffers               31KB         31KB
rpc_tasks                  7KB          7KB
ebitmap_node            1964KB       1964KB
avtab_node              5024KB       5024KB
xfs_buf                 1402KB       1402KB
xfs_ili                  134KB        134KB
xfs_efi_item             115KB        115KB
xfs_efd_item             115KB        115KB
xfs_buf_item             134KB        134KB
xfs_log_item_desc        342KB        342KB
xfs_trans               1412KB       1412KB
xfs_ifork                212KB        212KB
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 3 +++
mm/slab.h | 8 ++++++++
mm/slab_common.c | 29 +++++++++++++++++++++++++++++
3 files changed, 40 insertions(+)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..bd48d34 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -960,6 +961,7 @@ static void check_panic_on_oom(struct oom_control *oc,
if (is_sysrq_oom(oc))
return;
dump_header(oc, NULL);
+ dump_unreclaimable_slab();
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1044,6 +1046,7 @@ bool out_of_memory(struct oom_control *oc)
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
dump_header(oc, NULL);
+ dump_unreclaimable_slab();
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..b0496d1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+#ifdef CONFIG_SLABINFO
+void dump_unreclaimable_slab(void);
+#else
+static inline void dump_unreclaimable_slab(void)
+{
+}
+#endif
+
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
#ifdef CONFIG_SLAB_FREELIST_RANDOM
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..d08213d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1272,6 +1272,35 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void dump_unreclaimable_slab(void)
+{
+ struct kmem_cache *s, *s2;
+ struct slabinfo sinfo;
+
+ pr_info("Unreclaimable slab info:\n");
+ pr_info("Name Used Total\n");
+
+ /*
+ * Here acquiring slab_mutex is unnecessary since we don't prefer to
+ * get sleep in oom path right before kernel panic, and avoid race
+ * condition.
+ * Since it is already oom, so there should be not any big allocation
+ * which could change the statistics significantly.
+ */
+ list_for_each_entry_safe(s, s2, &slab_caches, list) {
+ if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+ continue;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+ get_slabinfo(s, &sinfo);
+
+ if (sinfo.num_objs > 0)
+ pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
+ (sinfo.active_objs * s->size) / 1024,
+ (sinfo.num_objs * s->size) / 1024);
+ }
+}
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
--
1.8.3.1
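For readers following the output format, here is a minimal userspace sketch of how one row of the table is built. It reuses the format string and the size arithmetic from the patch (object count times object size, divided by 1024), but format_slab_row() and its parameters are hypothetical names for illustration, and snprintf() stands in for pr_info().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace sketch of one output row. Sizes are computed the same way as
 * in the patch: object count times object size, divided by 1024.
 * format_slab_row() is a hypothetical helper, not a kernel function.
 * Returns the number of characters snprintf() would have written.
 */
static int format_slab_row(char *buf, size_t len, const char *name,
			   unsigned long active_objs, unsigned long num_objs,
			   unsigned long obj_size)
{
	assert(buf != NULL && name != NULL);

	return snprintf(buf, len, "%-17s %10luKB %10luKB",
			name,
			(active_objs * obj_size) / 1024,
			(num_objs * obj_size) / 1024);
}
```

The name column is left-justified in 17 characters and each size is right-justified in 10, so names longer than 17 characters would push the size columns to the right rather than being truncated.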
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-22 19:52 ` Yang Shi
@ 2017-09-24 6:10 ` Qixuan Wu
-1 siblings, 0 replies; 44+ messages in thread
From: Qixuan Wu @ 2017-09-24 6:10 UTC (permalink / raw)
To: Yang Shi
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko, linux-mm,
linux-kernel
On Sat, Sep 23, 2017, Yang Shi <yang.s@alibaba-inc.com> wrote:
>
> The kernel may panic when an oom happens and there is no killable process;
> sometimes this is caused by huge unreclaimable slabs used by the kernel.
>
> Although kdump could help debug such a problem, kdump is not
> available on all architectures and it might malfunction sometimes.
> And, since the kernel is already panicking, it is worth capturing such
> information in dmesg to aid troubleshooting.
......
> +void dump_unreclaimable_slab(void)
> +{
> + struct kmem_cache *s, *s2;
> + struct slabinfo sinfo;
> +
> + pr_info("Unreclaimable slab info:\n");
> + pr_info("Name Used Total\n");
> +
> + /*
> + * Here acquiring slab_mutex is unnecessary since we don't prefer to
> + * get sleep in oom path right before kernel panic, and avoid race
> + * condition.
> + * Since it is already oom, so there should be not any big allocation
> + * which could change the statistics significantly.
> + */
> + list_for_each_entry_safe(s, s2, &slab_caches, list) {
> + if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
> + continue;
> +
> + memset(&sinfo, 0, sizeof(sinfo));
> + get_slabinfo(s, &sinfo);
> +
> + if (sinfo.num_objs > 0)
> + pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
> + (sinfo.active_objs * s->size) / 1024,
> + (sinfo.num_objs * s->size) / 1024);
> + }
> +}
> +
This seems like a good feature and the patch looks fine; maybe the change below would be better.
Change
if (sinfo.num_objs > 0)
to
if (sinfo.num_objs > 0 && sinfo.active_objs > 0)
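To make the difference between the two conditions concrete, here is a small userspace sketch. struct cache_stats and both helper functions are hypothetical stand-ins, not kernel code; only the field names num_objs and active_objs come from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct slabinfo fields used by the patch. */
struct cache_stats {
	unsigned long active_objs;	/* objects currently in use */
	unsigned long num_objs;		/* objects the allocated slabs can hold */
};

/* Condition as posted: print any cache that has slabs allocated. */
static bool should_print_posted(const struct cache_stats *st)
{
	return st->num_objs > 0;
}

/* Condition as suggested: additionally require at least one active object. */
static bool should_print_suggested(const struct cache_stats *st)
{
	return st->num_objs > 0 && st->active_objs > 0;
}
```

A cache with allocated slabs but no active objects passes the posted check and is skipped by the suggested one, so its "Total" figure would no longer appear in the dump.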
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 0/2 v5] oom: capture unreclaimable slab info in oom message when kernel panic
@ 2017-09-21 20:52 Yang Shi
2017-09-21 20:52 ` Yang Shi
0 siblings, 1 reply; 44+ messages in thread
From: Yang Shi @ 2017-09-21 20:52 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
Recently we ran into an oom issue where the kernel panicked because there was no killable process.
The dmesg shows that huge unreclaimable slabs used almost 100% of memory, but kdump did not capture a vmcore for some reason.
So, it would be better to capture unreclaimable slab info in the oom message when the kernel panics, to aid troubleshooting and cover this corner case.
Since the kernel is already panicking, capturing more information is worthwhile and does not affect the normal oom killer.
With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slabs only.
And, oom will print all non-zero (num_objs * size != 0) unreclaimable slabs in the oom killer message.
For details, please see the commit log for each commit.
Changelog v4 -> v5:
* Addressed the comments from David
* Build tested with SLABINFO = n
Changelog v3 -> v4:
* Addressed the comments from David
* Added David's Acked-by in patch 1
Changelog v2 -> v3:
* Show used size and total size of each kmem cache per David's comment
Changelog v1 -> v2:
* Removed the original patch 1 ("mm: slab: output reclaimable flag in /proc/slabinfo") since Christoph suggested it might break compatibility and /proc/slabinfo is legacy
* Added Christoph's Acked-by
* Removed acquiring slab_mutex per Tetsuo's comment
Yang Shi (2):
tools: slabinfo: add "-U" option to show unreclaimable slabs only
mm: oom: show unreclaimable slab info when kernel panic
mm/oom_kill.c | 3 +++
mm/slab.h | 8 ++++++++
mm/slab_common.c | 29 +++++++++++++++++++++++++++++
tools/vm/slabinfo.c | 11 ++++++++++-
4 files changed, 50 insertions(+), 1 deletion(-)
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-21 20:52 [PATCH 0/2 v5] oom: capture unreclaimable slab info in oom message " Yang Shi
@ 2017-09-21 20:52 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-21 20:52 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an oom happens and there is no killable process;
sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not
available on all architectures and it might malfunction sometimes.
And, since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Print out unreclaimable slab info (used size and total size) for caches whose
actual memory usage is not zero (num_objs * size != 0) when panic_on_oom
is set or there is no killable process. Since such information is only shown
when the kernel panics, it will not make the message too verbose for a normal oom.
The output looks like:
Unreclaimable slab info:
Name                      Used        Total
rpc_buffers               31KB         31KB
rpc_tasks                  7KB          7KB
ebitmap_node            1964KB       1964KB
avtab_node              5024KB       5024KB
xfs_buf                 1402KB       1402KB
xfs_ili                  134KB        134KB
xfs_efi_item             115KB        115KB
xfs_efd_item             115KB        115KB
xfs_buf_item             134KB        134KB
xfs_log_item_desc        342KB        342KB
xfs_trans               1412KB       1412KB
xfs_ifork                212KB        212KB
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 3 +++
mm/slab.h | 8 ++++++++
mm/slab_common.c | 29 +++++++++++++++++++++++++++++
3 files changed, 40 insertions(+)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..bd48d34 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -960,6 +961,7 @@ static void check_panic_on_oom(struct oom_control *oc,
if (is_sysrq_oom(oc))
return;
dump_header(oc, NULL);
+ dump_unreclaimable_slab();
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1044,6 +1046,7 @@ bool out_of_memory(struct oom_control *oc)
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
dump_header(oc, NULL);
+ dump_unreclaimable_slab();
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..b0496d1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+#ifdef CONFIG_SLABINFO
+void dump_unreclaimable_slab(void);
+#else
+static inline void dump_unreclaimable_slab(void)
+{
+}
+#endif
+
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
#ifdef CONFIG_SLAB_FREELIST_RANDOM
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..72331ae 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1272,6 +1272,35 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void dump_unreclaimable_slab(void)
+{
+ struct kmem_cache *s, *s2;
+ struct slabinfo sinfo;
+
+ pr_info("Unreclaimable slab info:\n");
+ pr_info("Name Used Total\n");
+
+ /*
+ * Here acquiring slab_mutex is unnecessary since we don't prefer to
+ * get sleep in oom path right before kernel panic, and avoid race
+ * condition.
+ * Since it is already oom, so there should be not any big allocation
+ * which could change the statistics significantly.
+ */
+ list_for_each_entry_safe(s, s2, &slab_caches, list) {
+ if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+ continue;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+ get_slabinfo(s, &sinfo);
+
+ if (sinfo.num_objs > 0)
+ pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
+ (sinfo.active_objs * s->size) / 1024,
+ (sinfo.num_objs * s->size) / 1024);
+ }
+}
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
--
1.8.3.1
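The mm/slab.h hunk above relies on a common pattern: a real declaration when the option is enabled and an empty static inline stub otherwise, so callers in mm/oom_kill.c need no #ifdef of their own. A minimal userspace sketch of that pattern follows. FEATURE_SLABINFO and the *_demo names are hypothetical stand-ins for CONFIG_SLABINFO and the patch's functions, and the demo returns int (the patch's function is void) only so the effect can be observed.

```c
#include <stdio.h>

/*
 * FEATURE_SLABINFO is a hypothetical build switch standing in for
 * CONFIG_SLABINFO; compile with -DFEATURE_SLABINFO to get the real body.
 */
#ifdef FEATURE_SLABINFO
static inline int dump_unreclaimable_slab_demo(void)
{
	/* A real implementation would walk the cache list here. */
	return printf("Unreclaimable slab info:\n");
}
#else
/* Empty stub: callers compile unchanged and the call does nothing. */
static inline int dump_unreclaimable_slab_demo(void)
{
	return 0;
}
#endif

/* The caller needs no #ifdef of its own. */
static int before_panic_demo(void)
{
	return dump_unreclaimable_slab_demo();
}
```

Built without the switch, before_panic_demo() returns 0 and prints nothing, which mirrors what the SLABINFO = n build test in the v5 changelog is meant to cover.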
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [RFC v3] oom: capture unreclaimable slab info in oom message when kernel panic
@ 2017-09-20 19:09 Yang Shi
2017-09-20 19:09 ` Yang Shi
0 siblings, 1 reply; 44+ messages in thread
From: Yang Shi @ 2017-09-20 19:09 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
Recently we ran into an oom issue where the kernel panicked because there was no killable process.
The dmesg shows that huge unreclaimable slabs used almost 100% of memory, but kdump did not capture a vmcore for some reason.
So, it would be better to capture unreclaimable slab info in the oom message when the kernel panics, to aid troubleshooting and cover this corner case.
Since the kernel is already panicking, capturing more information is worthwhile and does not affect the normal oom killer.
With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slabs only.
And, oom will print all non-zero (num_objs * size != 0) unreclaimable slabs in the oom killer message.
For details, please see the commit log for each commit.
Changelog v2 -> v3:
* Show used size and total size of each kmem cache per David's comment
Changelog v1 -> v2:
* Removed the original patch 1 ("mm: slab: output reclaimable flag in /proc/slabinfo") since Christoph suggested it might break compatibility and /proc/slabinfo is legacy
* Added Christoph's Acked-by
* Removed acquiring slab_mutex per Tetsuo's comment
Yang Shi (2):
tools: slabinfo: add "-U" option to show unreclaimable slabs only
mm: oom: show unreclaimable slab info when kernel panic
mm/oom_kill.c | 13 +++++++++++--
mm/slab.c | 1 +
mm/slab.h | 7 +++++++
mm/slab_common.c | 31 +++++++++++++++++++++++++++++++
mm/slub.c | 1 +
tools/vm/slabinfo.c | 11 ++++++++++-
6 files changed, 61 insertions(+), 3 deletions(-)
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-20 19:09 [RFC v3] oom: capture unreclaimable slab info in oom message " Yang Shi
@ 2017-09-20 19:09 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-20 19:09 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an oom happens and there is no killable process;
sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not
available on all architectures and it might malfunction sometimes.
And, since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Add a field in struct slabinfo to show whether the slab is reclaimable or
not, and a helper function to derive the value from the
SLAB_RECLAIM_ACCOUNT flag.
Print out unreclaimable slab info (used size and total size) for caches whose
actual memory usage is not zero (num_objs * size != 0) when panic_on_oom is set
or there is no killable process. Since such information is only shown when the
kernel panics, it will not make the message too verbose for a normal oom.
The output looks like:
Unreclaimable slab info:
Name                      Used        Total
rpc_buffers               31KB         31KB
rpc_tasks                  7KB          7KB
ebitmap_node            1964KB       1964KB
avtab_node              5024KB       5024KB
xfs_buf                 1402KB       1402KB
xfs_ili                  134KB        134KB
xfs_efi_item             115KB        115KB
xfs_efd_item             115KB        115KB
xfs_buf_item             134KB        134KB
xfs_log_item_desc        342KB        342KB
xfs_trans               1412KB       1412KB
xfs_ifork                212KB        212KB
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 13 +++++++++++--
mm/slab.c | 1 +
mm/slab.h | 7 +++++++
mm/slab_common.c | 31 +++++++++++++++++++++++++++++++
mm/slub.c | 1 +
5 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..173c423 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -427,6 +428,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
dump_tasks(oc->memcg, oc->nodemask);
}
+static void dump_header_with_slabinfo(struct oom_control *oc, struct task_struct *p)
+{
+ dump_header(oc, p);
+
+ if (IS_ENABLED(CONFIG_SLABINFO))
+ show_unreclaimable_slab();
+}
+
/*
* Number of OOM victims in flight
*/
@@ -959,7 +968,7 @@ static void check_panic_on_oom(struct oom_control *oc,
/* Do not panic for oom kills triggered by sysrq */
if (is_sysrq_oom(oc))
return;
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1043,7 +1052,7 @@ bool out_of_memory(struct oom_control *oc)
select_bad_process(oc);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48..4f4971c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4132,6 +4132,7 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
sinfo->shared = cachep->shared;
sinfo->objects_per_slab = cachep->num;
sinfo->cache_order = cachep->gfporder;
+ sinfo->reclaim = is_reclaimable(cachep);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..2f1ebce 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -186,6 +186,7 @@ struct slabinfo {
unsigned int shared;
unsigned int objects_per_slab;
unsigned int cache_order;
+ unsigned int reclaim;
};
void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo);
@@ -352,6 +353,11 @@ static inline void memcg_link_cache(struct kmem_cache *s)
#endif /* CONFIG_MEMCG && !CONFIG_SLOB */
+static inline bool is_reclaimable(struct kmem_cache *s)
+{
+ return (s->flags & SLAB_RECLAIM_ACCOUNT) ? true : false;
+}
+
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
struct kmem_cache *cachep;
@@ -504,6 +510,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos);
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+void show_unreclaimable_slab(void);
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..f2c6200 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -35,6 +35,8 @@
static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
slab_caches_to_rcu_destroy_workfn);
+#define K(x) ((x)/1024)
+
/*
* Set of flags that will prevent slab merging
*/
@@ -1272,6 +1274,35 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void show_unreclaimable_slab()
+{
+ struct kmem_cache *s = NULL;
+ struct slabinfo sinfo;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+
+ printk("Unreclaimable slab info:\n");
+ printk("Name Used Total\n");
+
+ /*
+ * Here acquiring slab_mutex is unnecessary since we don't prefer to
+ * get sleep in oom path right before kernel panic, and avoid race condition.
+ * Since it is already oom, so there should be not any big allocation
+ * which could change the statistics significantly.
+ */
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!is_root_cache(s))
+ continue;
+
+ get_slabinfo(s, &sinfo);
+
+ if (!is_reclaimable(s) && sinfo.num_objs > 0)
+ printk("%-17s %10luKB %10luKB\n", cache_name(s), K(sinfo.active_objs * s->size), K(sinfo.num_objs * s->size));
+ }
+}
+EXPORT_SYMBOL(show_unreclaimable_slab);
+#undef K
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
diff --git a/mm/slub.c b/mm/slub.c
index 163352c..5c17c0a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5872,6 +5872,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
sinfo->num_slabs = nr_slabs;
sinfo->objects_per_slab = oo_objects(s->oo);
sinfo->cache_order = oo_order(s->oo);
+ sinfo->reclaim = is_reclaimable(s);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s)
--
1.8.3.1
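The selection logic in this version (test the reclaim flag, then skip empty caches) can be sketched in userspace as below. Everything here is an assumption made for illustration: struct demo_cache is a trimmed-down stand-in for struct kmem_cache, DEMO_RECLAIM_ACCOUNT is a placeholder bit rather than the kernel's definition of SLAB_RECLAIM_ACCOUNT, and a plain array replaces the slab_caches list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bit; the real SLAB_RECLAIM_ACCOUNT lives in <linux/slab.h>. */
#define DEMO_RECLAIM_ACCOUNT 0x00020000UL

/* Hypothetical, trimmed-down cache descriptor for illustration only. */
struct demo_cache {
	const char *name;
	unsigned long flags;
	unsigned long num_objs;
};

/* Same test as the patch's is_reclaimable(); "!= 0" replaces "? true : false". */
static bool demo_is_reclaimable(const struct demo_cache *c)
{
	return (c->flags & DEMO_RECLAIM_ACCOUNT) != 0;
}

/* Count the caches the dump loop would print: unreclaimable and non-empty. */
static size_t demo_count_printed(const struct demo_cache *caches, size_t n)
{
	size_t i, printed = 0;

	for (i = 0; i < n; i++) {
		if (demo_is_reclaimable(&caches[i]) || caches[i].num_objs == 0)
			continue;
		printed++;
	}
	return printed;
}
```

Because the flag test needs only the cache's flags, it can run before any statistics are gathered, which is how the later versions of the patch in this thread order the checks.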
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
@ 2017-09-20 19:09 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-20 19:09 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
Kernel may panic when oom happens without killable process sometimes it
is caused by huge unreclaimable slabs used by kernel.
Altough kdump could help debug such problem, however, kdump is not
available on all architectures and it might be malfunction sometime.
And, since kernel already panic it is worthy capturing such information
in dmesg to aid touble shooting.
Add a field in struct slibinfo to show if this slab is reclaimable or
not, and a helper function to achieve the value from
SLAB_RECLAIM_ACCOUNT flag.
Print out unreclaimable slab info (used size and total size) which actual
memory usage is not zero (num_objs * size != 0) when panic_on_oom is set or
no killable process. Since such information is just showed when kernel panic,
so it will not lead too verbose message for normal oom.
The output looks like:
Unreclaimable slab info:
Name Used Total
rpc_buffers 31KB 31KB
rpc_tasks 7KB 7KB
ebitmap_node 1964KB 1964KB
avtab_node 5024KB 5024KB
xfs_buf 1402KB 1402KB
xfs_ili 134KB 134KB
xfs_efi_item 115KB 115KB
xfs_efd_item 115KB 115KB
xfs_buf_item 134KB 134KB
xfs_log_item_desc 342KB 342KB
xfs_trans 1412KB 1412KB
xfs_ifork 212KB 212KB
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 13 +++++++++++--
mm/slab.c | 1 +
mm/slab.h | 7 +++++++
mm/slab_common.c | 31 +++++++++++++++++++++++++++++++
mm/slub.c | 1 +
5 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..173c423 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -427,6 +428,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
dump_tasks(oc->memcg, oc->nodemask);
}
+static void dump_header_with_slabinfo(struct oom_control *oc, struct task_struct *p)
+{
+ dump_header(oc, p);
+
+ if (IS_ENABLED(CONFIG_SLABINFO))
+ show_unreclaimable_slab();
+}
+
/*
* Number of OOM victims in flight
*/
@@ -959,7 +968,7 @@ static void check_panic_on_oom(struct oom_control *oc,
/* Do not panic for oom kills triggered by sysrq */
if (is_sysrq_oom(oc))
return;
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1043,7 +1052,7 @@ bool out_of_memory(struct oom_control *oc)
select_bad_process(oc);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48..4f4971c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4132,6 +4132,7 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
sinfo->shared = cachep->shared;
sinfo->objects_per_slab = cachep->num;
sinfo->cache_order = cachep->gfporder;
+ sinfo->reclaim = is_reclaimable(cachep);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..2f1ebce 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -186,6 +186,7 @@ struct slabinfo {
unsigned int shared;
unsigned int objects_per_slab;
unsigned int cache_order;
+ unsigned int reclaim;
};
void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo);
@@ -352,6 +353,11 @@ static inline void memcg_link_cache(struct kmem_cache *s)
#endif /* CONFIG_MEMCG && !CONFIG_SLOB */
+static inline bool is_reclaimable(struct kmem_cache *s)
+{
+ return (s->flags & SLAB_RECLAIM_ACCOUNT) ? true : false;
+}
+
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
struct kmem_cache *cachep;
@@ -504,6 +510,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos);
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+void show_unreclaimable_slab(void);
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..f2c6200 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -35,6 +35,8 @@
static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
slab_caches_to_rcu_destroy_workfn);
+#define K(x) ((x)/1024)
+
/*
* Set of flags that will prevent slab merging
*/
@@ -1272,6 +1274,35 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void show_unreclaimable_slab()
+{
+ struct kmem_cache *s = NULL;
+ struct slabinfo sinfo;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+
+ printk("Unreclaimable slab info:\n");
+ printk("Name Used Total\n");
+
+ /*
+ * Here acquiring slab_mutex is unnecessary since we don't prefer to
+ * get sleep in oom path right before kernel panic, and avoid race condition.
+ * Since it is already oom, so there should be not any big allocation
+ * which could change the statistics significantly.
+ */
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!is_root_cache(s))
+ continue;
+
+ get_slabinfo(s, &sinfo);
+
+ if (!is_reclaimable(s) && sinfo.num_objs > 0)
+ printk("%-17s %10luKB %10luKB\n", cache_name(s), K(sinfo.active_objs * s->size), K(sinfo.num_objs * s->size));
+ }
+}
+EXPORT_SYMBOL(show_unreclaimable_slab);
+#undef K
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
diff --git a/mm/slub.c b/mm/slub.c
index 163352c..5c17c0a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5872,6 +5872,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
sinfo->num_slabs = nr_slabs;
sinfo->objects_per_slab = oo_objects(s->oo);
sinfo->cache_order = oo_order(s->oo);
+ sinfo->reclaim = is_reclaimable(s);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s)
--
1.8.3.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-20 19:09 ` Yang Shi
@ 2017-09-20 21:00 ` David Rientjes
-1 siblings, 0 replies; 44+ messages in thread
From: David Rientjes @ 2017-09-20 21:00 UTC (permalink / raw)
To: Yang Shi
Cc: cl, penberg, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel
On Thu, 21 Sep 2017, Yang Shi wrote:
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 99736e0..173c423 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -43,6 +43,7 @@
>
> #include <asm/tlb.h>
> #include "internal.h"
> +#include "slab.h"
>
> #define CREATE_TRACE_POINTS
> #include <trace/events/oom.h>
> @@ -427,6 +428,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> dump_tasks(oc->memcg, oc->nodemask);
> }
>
> +static void dump_header_with_slabinfo(struct oom_control *oc, struct task_struct *p)
> +{
> + dump_header(oc, p);
> +
> + if (IS_ENABLED(CONFIG_SLABINFO))
> + show_unreclaimable_slab();
> +}
> +
> /*
> * Number of OOM victims in flight
> */
I don't think we need a new function for this. Where you want to dump
unreclaimable slab before panic, just call a new dump_unreclaimable_slab()
function that gets declared in slab.h that is a no-op when CONFIG_SLABINFO
is disabled. We just want to do
dump_header(...);
dump_unreclaimable_slab(...);
panic(...);
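A minimal userspace sketch of the pattern suggested above, not code from this thread: the function is always callable, and it compiles to an empty stub when the option is off, so the call site needs no IS_ENABLED() check. CONFIG_SLABINFO is modeled here as an ordinary preprocessor macro, and slab_dumps and oom_panic_path() are hypothetical names added only for illustration.

```c
#include <stdio.h>

/* Counts how often the real dump body ran; exists only for this sketch. */
static int slab_dumps;

#ifdef CONFIG_SLABINFO
/* Stand-in for the real dump; the kernel version would walk slab_caches. */
static void dump_unreclaimable_slab(void)
{
	printf("Unreclaimable slab info:\n");
	slab_dumps++;
}
#else
/* No-op stub: callers do not need their own #ifdef or IS_ENABLED() test. */
static inline void dump_unreclaimable_slab(void)
{
}
#endif

/* Hypothetical caller mirroring the dump_header / dump / panic order. */
static void oom_panic_path(void)
{
	printf("dump_header()\n");
	dump_unreclaimable_slab();
	printf("panic()\n");
}
```

Building with -DCONFIG_SLABINFO selects the real body; without it, calling oom_panic_path() prints only the two surrounding lines.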
> diff --git a/mm/slab.c b/mm/slab.c
> index 04dec48..4f4971c 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -4132,6 +4132,7 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
> sinfo->shared = cachep->shared;
> sinfo->objects_per_slab = cachep->num;
> sinfo->cache_order = cachep->gfporder;
> + sinfo->reclaim = is_reclaimable(cachep);
We don't need a new field, we already have cachep->flags accessible.
> }
>
> void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
> diff --git a/mm/slab.h b/mm/slab.h
> index 0733628..2f1ebce 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -186,6 +186,7 @@ struct slabinfo {
> unsigned int shared;
> unsigned int objects_per_slab;
> unsigned int cache_order;
> + unsigned int reclaim;
Not needed.
> };
>
> void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo);
> @@ -352,6 +353,11 @@ static inline void memcg_link_cache(struct kmem_cache *s)
>
> #endif /* CONFIG_MEMCG && !CONFIG_SLOB */
>
> +static inline bool is_reclaimable(struct kmem_cache *s)
> +{
> + return (s->flags & SLAB_RECLAIM_ACCOUNT) ? true : false;
> +}
> +
I don't think we need this.
> static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
> {
> struct kmem_cache *cachep;
> @@ -504,6 +510,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos);
> void memcg_slab_stop(struct seq_file *m, void *p);
> int memcg_slab_show(struct seq_file *m, void *p);
> +void show_unreclaimable_slab(void);
>
> void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 904a83b..f2c6200 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -35,6 +35,8 @@
> static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
> slab_caches_to_rcu_destroy_workfn);
>
> +#define K(x) ((x)/1024)
> +
I don't think we need this.
> /*
> * Set of flags that will prevent slab merging
> */
> @@ -1272,6 +1274,35 @@ static int slab_show(struct seq_file *m, void *p)
> return 0;
> }
>
> +void show_unreclaimable_slab()
void show_unreclaimable_slab(void)
> +{
> + struct kmem_cache *s = NULL;
No initialization needed.
> + struct slabinfo sinfo;
> +
> + memset(&sinfo, 0, sizeof(sinfo));
> +
> + printk("Unreclaimable slab info:\n");
> + printk("Name Used Total\n");
> +
> + /*
> + * Here acquiring slab_mutex is unnecessary since we don't prefer to
> + * get sleep in oom path right before kernel panic, and avoid race condition.
> + * Since it is already oom, so there should be not any big allocation
> + * which could change the statistics significantly.
> + */
> + list_for_each_entry(s, &slab_caches, list) {
> + if (!is_root_cache(s))
> + continue;
> +
We need to do the memset() here.
> + get_slabinfo(s, &sinfo);
> +
> + if (!is_reclaimable(s) && sinfo.num_objs > 0)
> + printk("%-17s %10luKB %10luKB\n", cache_name(s), K(sinfo.active_objs * s->size), K(sinfo.num_objs * s->size));
I think you can just check for SLAB_RECLAIM_ACCOUNT here.
Everything in this function should be pr_info().
> + }
> +}
> +EXPORT_SYMBOL(show_unreclaimable_slab);
> +#undef K
> +
> #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
> void *memcg_slab_start(struct seq_file *m, loff_t *pos)
> {
> diff --git a/mm/slub.c b/mm/slub.c
> index 163352c..5c17c0a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5872,6 +5872,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
> sinfo->num_slabs = nr_slabs;
> sinfo->objects_per_slab = oo_objects(s->oo);
> sinfo->cache_order = oo_order(s->oo);
> + sinfo->reclaim = is_reclaimable(s);
Not needed.
> }
>
> void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s)
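Taken together, the comments on show_unreclaimable_slab() could be sketched roughly as below. This is a userspace mock under stated assumptions, not the v4 patch: struct mock_cache, struct mock_slabinfo, mock_get_slabinfo() and the flag's bit value are made up for illustration, and printf() stands in for pr_info(). It shows the flag being tested directly instead of through a helper or a new slabinfo field, the memset() moved inside the loop, and the sizes computed without a K() macro.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative bit value for this mock only. */
#define SLAB_RECLAIM_ACCOUNT 0x00020000UL

/* Hypothetical stand-in for struct kmem_cache. */
struct mock_cache {
	const char *name;
	unsigned long flags;
	unsigned int size;		/* bytes per object */
	unsigned long active_objs;
	unsigned long num_objs;
};

/* Hypothetical stand-in for struct slabinfo. */
struct mock_slabinfo {
	unsigned long active_objs;
	unsigned long num_objs;
};

static void mock_get_slabinfo(const struct mock_cache *s,
			      struct mock_slabinfo *sinfo)
{
	sinfo->active_objs = s->active_objs;
	sinfo->num_objs = s->num_objs;
}

/* Returns the number of cache lines printed (for the sketch only). */
static int dump_unreclaimable(const struct mock_cache *caches, size_t n)
{
	struct mock_slabinfo sinfo;
	int printed = 0;
	size_t i;

	printf("Unreclaimable slab info:\n");
	printf("%-17s %10s %10s\n", "Name", "Used", "Total");

	for (i = 0; i < n; i++) {
		const struct mock_cache *s = &caches[i];

		/* Check the flag directly; no helper or extra field. */
		if (s->flags & SLAB_RECLAIM_ACCOUNT)
			continue;

		/* Reset per cache so no stale values carry over. */
		memset(&sinfo, 0, sizeof(sinfo));
		mock_get_slabinfo(s, &sinfo);

		if (sinfo.num_objs > 0) {
			printf("%-17s %10luKB %10luKB\n", s->name,
			       (sinfo.active_objs * s->size) / 1024,
			       (sinfo.num_objs * s->size) / 1024);
			printed++;
		}
	}
	return printed;
}
```

Given one reclaimable cache, one unreclaimable cache with objects, and one empty unreclaimable cache, only the second is printed.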
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-20 21:00 ` David Rientjes
@ 2017-09-20 21:32 ` Yang Shi
-1 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-20 21:32 UTC (permalink / raw)
To: David Rientjes
Cc: cl, penberg, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel
On 9/20/17 2:00 PM, David Rientjes wrote:
> On Thu, 21 Sep 2017, Yang Shi wrote:
>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 99736e0..173c423 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -43,6 +43,7 @@
>>
>> #include <asm/tlb.h>
>> #include "internal.h"
>> +#include "slab.h"
>>
>> #define CREATE_TRACE_POINTS
>> #include <trace/events/oom.h>
>> @@ -427,6 +428,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
>> dump_tasks(oc->memcg, oc->nodemask);
>> }
>>
>> +static void dump_header_with_slabinfo(struct oom_control *oc, struct task_struct *p)
>> +{
>> + dump_header(oc, p);
>> +
>> + if (IS_ENABLED(CONFIG_SLABINFO))
>> + show_unreclaimable_slab();
>> +}
>> +
>> /*
>> * Number of OOM victims in flight
>> */
>
> I don't think we need a new function for this. Where you want to dump
> unreclaimable slab before panic, just call a new dump_unreclaimable_slab()
> function that gets declared in slab.h that is a no-op when CONFIG_SLABINFO
> is disabled. We just want to do
>
> dump_header(...);
> dump_unreclaimable_slab(...);
> panic(...);
Thanks for the comments; they will be addressed in v4.
Yang
>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index 04dec48..4f4971c 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -4132,6 +4132,7 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
>> sinfo->shared = cachep->shared;
>> sinfo->objects_per_slab = cachep->num;
>> sinfo->cache_order = cachep->gfporder;
>> + sinfo->reclaim = is_reclaimable(cachep);
>
> We don't need a new field, we already have cachep->flags accessible.
>
>> }
>>
>> void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 0733628..2f1ebce 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -186,6 +186,7 @@ struct slabinfo {
>> unsigned int shared;
>> unsigned int objects_per_slab;
>> unsigned int cache_order;
>> + unsigned int reclaim;
>
> Not needed.
>
>> };
>>
>> void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo);
>> @@ -352,6 +353,11 @@ static inline void memcg_link_cache(struct kmem_cache *s)
>>
>> #endif /* CONFIG_MEMCG && !CONFIG_SLOB */
>>
>> +static inline bool is_reclaimable(struct kmem_cache *s)
>> +{
>> + return (s->flags & SLAB_RECLAIM_ACCOUNT) ? true : false;
>> +}
>> +
>
> I don't think we need this.
>
>> static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
>> {
>> struct kmem_cache *cachep;
>> @@ -504,6 +510,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>> void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos);
>> void memcg_slab_stop(struct seq_file *m, void *p);
>> int memcg_slab_show(struct seq_file *m, void *p);
>> +void show_unreclaimable_slab(void);
>>
>> void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
>>
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 904a83b..f2c6200 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -35,6 +35,8 @@
>> static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
>> slab_caches_to_rcu_destroy_workfn);
>>
>> +#define K(x) ((x)/1024)
>> +
>
> I don't think we need this.
>
>> /*
>> * Set of flags that will prevent slab merging
>> */
>> @@ -1272,6 +1274,35 @@ static int slab_show(struct seq_file *m, void *p)
>> return 0;
>> }
>>
>> +void show_unreclaimable_slab()
>
> void show_unreclaimable_slab(void)
>
>> +{
>> + struct kmem_cache *s = NULL;
>
> No initialization needed.
>
>> + struct slabinfo sinfo;
>> +
>> + memset(&sinfo, 0, sizeof(sinfo));
>> +
>> + printk("Unreclaimable slab info:\n");
>> + printk("Name Used Total\n");
>> +
>> + /*
>> + * Here acquiring slab_mutex is unnecessary since we don't prefer to
>> + * get sleep in oom path right before kernel panic, and avoid race condition.
>> + * Since it is already oom, so there should be not any big allocation
>> + * which could change the statistics significantly.
>> + */
>> + list_for_each_entry(s, &slab_caches, list) {
>> + if (!is_root_cache(s))
>> + continue;
>> +
>
> We need to do the memset() here.
>
>> + get_slabinfo(s, &sinfo);
>> +
>> + if (!is_reclaimable(s) && sinfo.num_objs > 0)
>> + printk("%-17s %10luKB %10luKB\n", cache_name(s), K(sinfo.active_objs * s->size), K(sinfo.num_objs * s->size));
>
> I think you can just check for SLAB_RECLAIM_ACCOUNT here.
>
> Everything in this function should be pr_info().
>
>> + }
>> +}
>> +EXPORT_SYMBOL(show_unreclaimable_slab);
>> +#undef K
>> +
>> #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
>> void *memcg_slab_start(struct seq_file *m, loff_t *pos)
>> {
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 163352c..5c17c0a 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -5872,6 +5872,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
>> sinfo->num_slabs = nr_slabs;
>> sinfo->objects_per_slab = oo_objects(s->oo);
>> sinfo->cache_order = oo_order(s->oo);
>> + sinfo->reclaim = is_reclaimable(s);
>
> Not needed.
>
>> }
>>
>> void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s)
^ permalink raw reply [flat|nested] 44+ messages in thread
* [RFC v2] oom: capture unreclaimable slab info in oom message when kernel panic
@ 2017-09-18 18:26 Yang Shi
2017-09-18 18:26 ` Yang Shi
0 siblings, 1 reply; 44+ messages in thread
From: Yang Shi @ 2017-09-18 18:26 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
Sorry, I forgot to send the cover letter with the changelog. Resending the patch series.
Recently we ran into an OOM issue: the kernel panicked because there was no killable process.
The dmesg shows huge unreclaimable slabs using almost 100% of memory, but kdump didn't capture a vmcore for some reason.
So it seems better to capture unreclaimable slab info in the OOM message when the kernel panics, to aid troubleshooting and cover this corner case.
Since the kernel is already panicking, capturing more information seems worthwhile and doesn't affect the normal OOM killer.
With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slabs only.
And OOM will print all non-zero (num_objs * size != 0) unreclaimable slabs in the OOM killer message.
For details, please see the commit log of each commit.
Changelog v1 -> v2:
* Removed the original patch 1 (“mm: slab: output reclaimable flag in /proc/slabinfo”) since Christoph suggested it might break compatibility and /proc/slabinfo is legacy
* Added Christoph’s Acked-by
* Removed acquiring slab_mutex per Tetsuo’s comment
Yang Shi (2):
tools: slabinfo: add "-U" option to show unreclaimable slabs only
mm: oom: show unreclaimable slab info when kernel panic
mm/oom_kill.c | 13 +++++++++++--
mm/slab.c | 1 +
mm/slab.h | 7 +++++++
mm/slab_common.c | 30 ++++++++++++++++++++++++++++++
mm/slub.c | 1 +
tools/vm/slabinfo.c | 11 ++++++++++-
6 files changed, 60 insertions(+), 3 deletions(-)
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-18 18:26 [RFC v2] oom: capture unreclaimable slab info in oom message " Yang Shi
@ 2017-09-18 18:26 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-18 18:26 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an OOM happens with no killable process;
sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not
available on all architectures and it might malfunction sometimes.
And since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Add a field in struct slabinfo to show whether the slab is reclaimable
or not, and a helper function to derive the value from the
SLAB_RECLAIM_ACCOUNT flag.
Print out info for unreclaimable slabs whose actual memory usage is not
zero (num_objs * size != 0) when panic_on_oom is set or there is no
killable process. Since such information is only shown when the kernel
panics, it will not make the message too verbose for a normal OOM.
The output looks like:
rpc_buffers 31KB
rpc_tasks 31KB
avtab_node 46735KB
xfs_buf 624KB
xfs_ili 48KB
xfs_efi_item 31KB
xfs_efd_item 31KB
xfs_buf_item 78KB
xfs_log_item_desc 141KB
xfs_trans 108KB
xfs_ifork 744KB
xfs_trans 108KB
xfs_ifork 744KB
xfs_da_state 126KB
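As a rough illustration of how each size in such a listing is derived, the sketch below applies the patch's K() macro to num_objs * size. The helper names slab_total_kb() and print_slab_line(), and any object counts passed to them, are hypothetical and are not taken from the system that produced the output above.

```c
#include <stdio.h>

/* Same conversion as the K() macro in the patch: bytes to KB, truncating. */
#define K(x) ((x) / 1024)

/* Total footprint of a cache in KB: number of objects times object size. */
static unsigned long slab_total_kb(unsigned long num_objs, unsigned int size)
{
	return K(num_objs * size);
}

/* Prints one line in the "%-17s %luKB" layout, skipping empty caches. */
static void print_slab_line(const char *name, unsigned long num_objs,
			    unsigned int size)
{
	if (num_objs > 0)
		printf("%-17s %luKB\n", name, slab_total_kb(num_objs, size));
}
```

Because the division truncates, a cache holding less than 1024 bytes in total still passes the num_objs > 0 check and is reported as 0KB.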
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 13 +++++++++++--
mm/slab.c | 1 +
mm/slab.h | 7 +++++++
mm/slab_common.c | 30 ++++++++++++++++++++++++++++++
mm/slub.c | 1 +
5 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..173c423 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -427,6 +428,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
dump_tasks(oc->memcg, oc->nodemask);
}
+static void dump_header_with_slabinfo(struct oom_control *oc, struct task_struct *p)
+{
+ dump_header(oc, p);
+
+ if (IS_ENABLED(CONFIG_SLABINFO))
+ show_unreclaimable_slab();
+}
+
/*
* Number of OOM victims in flight
*/
@@ -959,7 +968,7 @@ static void check_panic_on_oom(struct oom_control *oc,
/* Do not panic for oom kills triggered by sysrq */
if (is_sysrq_oom(oc))
return;
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1043,7 +1052,7 @@ bool out_of_memory(struct oom_control *oc)
select_bad_process(oc);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48..4f4971c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4132,6 +4132,7 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
sinfo->shared = cachep->shared;
sinfo->objects_per_slab = cachep->num;
sinfo->cache_order = cachep->gfporder;
+ sinfo->reclaim = is_reclaimable(cachep);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..2f1ebce 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -186,6 +186,7 @@ struct slabinfo {
unsigned int shared;
unsigned int objects_per_slab;
unsigned int cache_order;
+ unsigned int reclaim;
};
void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo);
@@ -352,6 +353,11 @@ static inline void memcg_link_cache(struct kmem_cache *s)
#endif /* CONFIG_MEMCG && !CONFIG_SLOB */
+static inline bool is_reclaimable(struct kmem_cache *s)
+{
+ return (s->flags & SLAB_RECLAIM_ACCOUNT) ? true : false;
+}
+
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
struct kmem_cache *cachep;
@@ -504,6 +510,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos);
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+void show_unreclaimable_slab(void);
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..665baf2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -35,6 +35,8 @@
static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
slab_caches_to_rcu_destroy_workfn);
+#define K(x) ((x)/1024)
+
/*
* Set of flags that will prevent slab merging
*/
@@ -1272,6 +1274,34 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void show_unreclaimable_slab()
+{
+ struct kmem_cache *s = NULL;
+ struct slabinfo sinfo;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+
+ printk("Unreclaimable slabs:\n");
+
+ /*
+ * Here acquiring slab_mutex is unnecessary since we don't prefer to
+ * get sleep in oom path right before kernel panic, and avoid race condition.
+ * Since it is already oom, so there should be not any big allocation
+ * which could change the statistics significantly.
+ */
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!is_root_cache(s))
+ continue;
+
+ get_slabinfo(s, &sinfo);
+
+ if (!is_reclaimable(s) && sinfo.num_objs > 0)
+ printk("%-17s %luKB\n", cache_name(s), K(sinfo.num_objs * s->size));
+ }
+}
+EXPORT_SYMBOL(show_unreclaimable_slab);
+#undef K
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
diff --git a/mm/slub.c b/mm/slub.c
index 163352c..5c17c0a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5872,6 +5872,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
sinfo->num_slabs = nr_slabs;
sinfo->objects_per_slab = oo_objects(s->oo);
sinfo->cache_order = oo_order(s->oo);
+ sinfo->reclaim = is_reclaimable(s);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
@ 2017-09-18 18:26 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-18 18:26 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an OOM happens with no killable process;
sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not
available on all architectures and it might malfunction sometimes.
And since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Add a field in struct slabinfo to show whether the slab is reclaimable
or not, and a helper function to derive the value from the
SLAB_RECLAIM_ACCOUNT flag.
Print out info for unreclaimable slabs whose actual memory usage is not
zero (num_objs * size != 0) when panic_on_oom is set or there is no
killable process. Since such information is only shown when the kernel
panics, it will not make the message too verbose for a normal OOM.
The output looks like:
rpc_buffers 31KB
rpc_tasks 31KB
avtab_node 46735KB
xfs_buf 624KB
xfs_ili 48KB
xfs_efi_item 31KB
xfs_efd_item 31KB
xfs_buf_item 78KB
xfs_log_item_desc 141KB
xfs_trans 108KB
xfs_ifork 744KB
xfs_trans 108KB
xfs_ifork 744KB
xfs_da_state 126KB
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 13 +++++++++++--
mm/slab.c | 1 +
mm/slab.h | 7 +++++++
mm/slab_common.c | 30 ++++++++++++++++++++++++++++++
mm/slub.c | 1 +
5 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..173c423 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -427,6 +428,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
dump_tasks(oc->memcg, oc->nodemask);
}
+static void dump_header_with_slabinfo(struct oom_control *oc, struct task_struct *p)
+{
+ dump_header(oc, p);
+
+ if (IS_ENABLED(CONFIG_SLABINFO))
+ show_unreclaimable_slab();
+}
+
/*
* Number of OOM victims in flight
*/
@@ -959,7 +968,7 @@ static void check_panic_on_oom(struct oom_control *oc,
/* Do not panic for oom kills triggered by sysrq */
if (is_sysrq_oom(oc))
return;
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1043,7 +1052,7 @@ bool out_of_memory(struct oom_control *oc)
select_bad_process(oc);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48..4f4971c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4132,6 +4132,7 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
sinfo->shared = cachep->shared;
sinfo->objects_per_slab = cachep->num;
sinfo->cache_order = cachep->gfporder;
+ sinfo->reclaim = is_reclaimable(cachep);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..2f1ebce 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -186,6 +186,7 @@ struct slabinfo {
unsigned int shared;
unsigned int objects_per_slab;
unsigned int cache_order;
+ unsigned int reclaim;
};
void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo);
@@ -352,6 +353,11 @@ static inline void memcg_link_cache(struct kmem_cache *s)
#endif /* CONFIG_MEMCG && !CONFIG_SLOB */
+static inline bool is_reclaimable(struct kmem_cache *s)
+{
+ return (s->flags & SLAB_RECLAIM_ACCOUNT) ? true : false;
+}
+
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
struct kmem_cache *cachep;
@@ -504,6 +510,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos);
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+void show_unreclaimable_slab(void);
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..665baf2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -35,6 +35,8 @@
static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
slab_caches_to_rcu_destroy_workfn);
+#define K(x) ((x)/1024)
+
/*
* Set of flags that will prevent slab merging
*/
@@ -1272,6 +1274,34 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void show_unreclaimable_slab()
+{
+ struct kmem_cache *s = NULL;
+ struct slabinfo sinfo;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+
+ printk("Unreclaimable slabs:\n");
+
+ /*
+ * Here acquiring slab_mutex is unnecessary since we don't prefer to
+ * get sleep in oom path right before kernel panic, and avoid race condition.
+ * Since it is already oom, so there should be not any big allocation
+ * which could change the statistics significantly.
+ */
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!is_root_cache(s))
+ continue;
+
+ get_slabinfo(s, &sinfo);
+
+ if (!is_reclaimable(s) && sinfo.num_objs > 0)
+ printk("%-17s %luKB\n", cache_name(s), K(sinfo.num_objs * s->size));
+ }
+}
+EXPORT_SYMBOL(show_unreclaimable_slab);
+#undef K
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
diff --git a/mm/slub.c b/mm/slub.c
index 163352c..5c17c0a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5872,6 +5872,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
sinfo->num_slabs = nr_slabs;
sinfo->objects_per_slab = oo_objects(s->oo);
sinfo->cache_order = oo_order(s->oo);
+ sinfo->reclaim = is_reclaimable(s);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s)
--
1.8.3.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-18 18:26 ` Yang Shi
@ 2017-09-19 20:57 ` David Rientjes
-1 siblings, 0 replies; 44+ messages in thread
From: David Rientjes @ 2017-09-19 20:57 UTC (permalink / raw)
To: Yang Shi
Cc: cl, penberg, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel
On Tue, 19 Sep 2017, Yang Shi wrote:
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -35,6 +35,8 @@
> static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
> slab_caches_to_rcu_destroy_workfn);
>
> +#define K(x) ((x)/1024)
> +
> /*
> * Set of flags that will prevent slab merging
> */
> @@ -1272,6 +1274,34 @@ static int slab_show(struct seq_file *m, void *p)
> return 0;
> }
>
> +void show_unreclaimable_slab()
> +{
> + struct kmem_cache *s = NULL;
> + struct slabinfo sinfo;
> +
> + memset(&sinfo, 0, sizeof(sinfo));
> +
> + printk("Unreclaimable slabs:\n");
> +
> + /*
> + * Here acquiring slab_mutex is unnecessary since we don't prefer to
> + * get sleep in oom path right before kernel panic, and avoid race condition.
> + * Since it is already oom, so there should be not any big allocation
> + * which could change the statistics significantly.
> + */
> + list_for_each_entry(s, &slab_caches, list) {
> + if (!is_root_cache(s))
> + continue;
> +
> + get_slabinfo(s, &sinfo);
> +
> + if (!is_reclaimable(s) && sinfo.num_objs > 0)
> + printk("%-17s %luKB\n", cache_name(s), K(sinfo.num_objs * s->size));
> + }
I like this, but could we be even more helpful by giving the user more
information from sinfo beyond just the total size of objects allocated?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-19 20:57 ` David Rientjes
@ 2017-09-19 21:45 ` Yang Shi
-1 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-19 21:45 UTC (permalink / raw)
To: David Rientjes
Cc: cl, penberg, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel
On 9/19/17 1:57 PM, David Rientjes wrote:
> On Tue, 19 Sep 2017, Yang Shi wrote:
>
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -35,6 +35,8 @@
>> static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
>> slab_caches_to_rcu_destroy_workfn);
>>
>> +#define K(x) ((x)/1024)
>> +
>> /*
>> * Set of flags that will prevent slab merging
>> */
>> @@ -1272,6 +1274,34 @@ static int slab_show(struct seq_file *m, void *p)
>> return 0;
>> }
>>
>> +void show_unreclaimable_slab()
>> +{
>> + struct kmem_cache *s = NULL;
>> + struct slabinfo sinfo;
>> +
>> + memset(&sinfo, 0, sizeof(sinfo));
>> +
>> + printk("Unreclaimable slabs:\n");
>> +
>> + /*
>> + * Here acquiring slab_mutex is unnecessary since we don't prefer to
>> + * get sleep in oom path right before kernel panic, and avoid race condition.
>> + * Since it is already oom, so there should be not any big allocation
>> + * which could change the statistics significantly.
>> + */
>> + list_for_each_entry(s, &slab_caches, list) {
>> + if (!is_root_cache(s))
>> + continue;
>> +
>> + get_slabinfo(s, &sinfo);
>> +
>> + if (!is_reclaimable(s) && sinfo.num_objs > 0)
>> + printk("%-17s %luKB\n", cache_name(s), K(sinfo.num_objs * s->size));
>> + }
>
> I like this, but could we be even more helpful by giving the user more
> information from sinfo beyond just the total size of objects allocated?
Sure, we definitely can. But the question is what info, other than the
size, would help users diagnose an oom.
I can think of the following:
- the number of active objs, the number of total objs, the percentage
of active objs per cache
- the number of active slabs, the number of total slabs, the percentage
of active slabs per cache
Anything else?
Thanks,
Yang
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-19 21:45 ` Yang Shi
@ 2017-09-19 22:41 ` David Rientjes
-1 siblings, 0 replies; 44+ messages in thread
From: David Rientjes @ 2017-09-19 22:41 UTC (permalink / raw)
To: Yang Shi
Cc: cl, penberg, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel
On Wed, 20 Sep 2017, Yang Shi wrote:
> > > --- a/mm/slab_common.c
> > > +++ b/mm/slab_common.c
> > > @@ -35,6 +35,8 @@
> > > static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
> > > slab_caches_to_rcu_destroy_workfn);
> > > +#define K(x) ((x)/1024)
> > > +
> > > /*
> > > * Set of flags that will prevent slab merging
> > > */
> > > @@ -1272,6 +1274,34 @@ static int slab_show(struct seq_file *m, void *p)
> > > return 0;
> > > }
> > > +void show_unreclaimable_slab()
> > > +{
> > > + struct kmem_cache *s = NULL;
> > > + struct slabinfo sinfo;
> > > +
> > > + memset(&sinfo, 0, sizeof(sinfo));
> > > +
> > > + printk("Unreclaimable slabs:\n");
> > > +
> > > + /*
> > > + * Here acquiring slab_mutex is unnecessary since we don't prefer to
> > > + * get sleep in oom path right before kernel panic, and avoid race
> > > condition.
> > > + * Since it is already oom, so there should be not any big allocation
> > > + * which could change the statistics significantly.
> > > + */
> > > + list_for_each_entry(s, &slab_caches, list) {
> > > + if (!is_root_cache(s))
> > > + continue;
> > > +
> > > + get_slabinfo(s, &sinfo);
> > > +
> > > + if (!is_reclaimable(s) && sinfo.num_objs > 0)
> > > + printk("%-17s %luKB\n", cache_name(s),
> > > K(sinfo.num_objs * s->size));
> > > + }
> >
> > I like this, but could we be even more helpful by giving the user more
> > information from sinfo beyond just the total size of objects allocated?
>
> Sure, we definitely can. But, the question is what info is helpful to users to
> diagnose oom other than the size.
>
> I think of the below:
> - the number of active objs, the number of total objs, the percentage
> of active objs per cache
> - the number of active slabs, the number of total slabs, the
> percentage of active slabs per cache
>
> Anything else?
>
Right now it's a useful tool to find out what unreclaimable slab is
sitting around that is causing the system to run out of memory. If we
knew how much of this slab is actually in use vs free, we could determine
if it's stranding or if there's a bug in the slab allocator itself.
We wouldn't need percentages, we can calculate that directly from the
data if necessary.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-19 22:41 ` David Rientjes
@ 2017-09-19 23:03 ` Yang Shi
-1 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-19 23:03 UTC (permalink / raw)
To: David Rientjes
Cc: cl, penberg, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel
On 9/19/17 3:41 PM, David Rientjes wrote:
> On Wed, 20 Sep 2017, Yang Shi wrote:
>
>>>> --- a/mm/slab_common.c
>>>> +++ b/mm/slab_common.c
>>>> @@ -35,6 +35,8 @@
>>>> static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
>>>> slab_caches_to_rcu_destroy_workfn);
>>>> +#define K(x) ((x)/1024)
>>>> +
>>>> /*
>>>> * Set of flags that will prevent slab merging
>>>> */
>>>> @@ -1272,6 +1274,34 @@ static int slab_show(struct seq_file *m, void *p)
>>>> return 0;
>>>> }
>>>> +void show_unreclaimable_slab()
>>>> +{
>>>> + struct kmem_cache *s = NULL;
>>>> + struct slabinfo sinfo;
>>>> +
>>>> + memset(&sinfo, 0, sizeof(sinfo));
>>>> +
>>>> + printk("Unreclaimable slabs:\n");
>>>> +
>>>> + /*
>>>> + * Here acquiring slab_mutex is unnecessary since we don't prefer to
>>>> + * get sleep in oom path right before kernel panic, and avoid race
>>>> condition.
>>>> + * Since it is already oom, so there should be not any big allocation
>>>> + * which could change the statistics significantly.
>>>> + */
>>>> + list_for_each_entry(s, &slab_caches, list) {
>>>> + if (!is_root_cache(s))
>>>> + continue;
>>>> +
>>>> + get_slabinfo(s, &sinfo);
>>>> +
>>>> + if (!is_reclaimable(s) && sinfo.num_objs > 0)
>>>> + printk("%-17s %luKB\n", cache_name(s),
>>>> K(sinfo.num_objs * s->size));
>>>> + }
>>>
>>> I like this, but could we be even more helpful by giving the user more
>>> information from sinfo beyond just the total size of objects allocated?
>>
>> Sure, we definitely can. But, the question is what info is helpful to users to
>> diagnose oom other than the size.
>>
>> I think of the below:
>> - the number of active objs, the number of total objs, the percentage
>> of active objs per cache
>> - the number of active slabs, the number of total slabs, the
>> percentage of active slabs per cache
>>
>> Anything else?
>>
>
> Right now it's a useful tool to find out what unreclaimable slab is
> sitting around that is causing the system to run out of memory. If we
> knew how much of this slab is actually in use vs free, it can determine if
> its stranding or if there's a bug in the slab allocator itself.
I see. You prefer to have a report which looks like:
Cache             Used size   Free size
mm_struct         100K        50K
Or it could show the total size (used + free) instead of the free size,
and perhaps also the number of objs and the number of total objs.
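A rough sketch of how one row of such a report could be derived. It assumes per-cache counters named like the active_objs and num_objs fields of struct slabinfo, plus an object size in bytes. The struct and helper names are hypothetical, and this is plain userspace C for illustration, not code from the patch:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-cache snapshot; field names are assumptions. */
struct slab_row {
	const char *name;
	unsigned long active_objs;	/* objects currently in use */
	unsigned long num_objs;		/* objects allocated in total */
	unsigned long obj_size;		/* size of one object, in bytes */
};

/* Memory held by objects that are in use, in KB. */
static unsigned long used_kb(const struct slab_row *r)
{
	return r->active_objs * r->obj_size / 1024;
}

/* Memory held by all allocated objects (used + free), in KB. */
static unsigned long total_kb(const struct slab_row *r)
{
	return r->num_objs * r->obj_size / 1024;
}

/*
 * Memory held by allocated-but-unused objects, in KB. The counters may
 * be read without locking, so guard against active_objs > num_objs.
 */
static unsigned long free_kb(const struct slab_row *r)
{
	if (r->active_objs > r->num_objs)
		return 0;
	return (r->num_objs - r->active_objs) * r->obj_size / 1024;
}

/* Format "name used total" into buf; returns the snprintf() result. */
static int format_row(const struct slab_row *r, char *buf, size_t len)
{
	assert(r != NULL && buf != NULL && len > 0);
	return snprintf(buf, len, "%-17s %10luKB %10luKB",
			r->name, used_kb(r), total_kb(r));
}
```

Printing used and total rather than a percentage keeps the raw numbers in the log, and the ratio can still be worked out from them afterwards.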
Thanks,
Yang
>
> We wouldn't need percentages, we can calculate that directly from the
> data if necessary.
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 1/2] tools: slabinfo: add "-U" option to show unreclaimable slabs only
@ 2017-09-18 18:23 Yang Shi
2017-09-18 18:23 ` Yang Shi
0 siblings, 1 reply; 44+ messages in thread
From: Yang Shi @ 2017-09-18 18:23 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
Add "-U" option to show unreclaimable slabs only.
Combining "-U" with "-S" tells us which unreclaimable slabs use the most
memory, which helps debug huge unreclaimable slab issues.
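The flag handling described above can be sketched in isolation as follows. This is a minimal illustration, not the actual tools/vm/slabinfo.c code: struct cli_opts, struct cache_entry, parse_opts() and should_show() are hypothetical names, and only the two options mentioned here are parsed.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical, trimmed-down view of one cache as the tool sees it. */
struct cache_entry {
	const char *name;
	int reclaim_account;	/* nonzero if the cache is reclaimable */
};

/* Hypothetical container for the two options discussed here. */
struct cli_opts {
	int unreclaim_only;	/* set by -U / --unreclaim */
	int sort_size;		/* set by -S */
};

/* Parse -S and -U; returns 0 on success, -1 on an unknown option. */
static int parse_opts(int argc, char *argv[], struct cli_opts *o)
{
	static const struct option longopts[] = {
		{ "unreclaim", no_argument, NULL, 'U' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	assert(o != NULL);
	o->unreclaim_only = 0;
	o->sort_size = 0;
	optind = 1;	/* restart scanning so repeated calls work */
	opterr = 0;	/* keep getopt quiet; the caller reports errors */

	while ((c = getopt_long(argc, argv, "SU", longopts, NULL)) != -1) {
		switch (c) {
		case 'S':
			o->sort_size = 1;
			break;
		case 'U':
			o->unreclaim_only = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}

/* Same test as the one added to slabcache(): hide reclaimable caches under -U. */
static int should_show(const struct cache_entry *e, const struct cli_opts *o)
{
	return !(o->unreclaim_only && e->reclaim_account);
}
```

With both flags set, a caller would drop every entry for which should_show() returns 0 and then sort the remainder by size.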
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
tools/vm/slabinfo.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index b9d34b3..9673190 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -83,6 +83,7 @@ struct aliasinfo {
int sort_loss;
int extended_totals;
int show_bytes;
+int unreclaim_only;
/* Debug options */
int sanity;
@@ -132,6 +133,7 @@ static void usage(void)
"-L|--Loss Sort by loss\n"
"-X|--Xtotals Show extended summary information\n"
"-B|--Bytes Show size in bytes\n"
+ "-U|--unreclaim Show unreclaimable slabs only\n"
"\nValid debug options (FZPUT may be combined)\n"
"a / A Switch on all debug options (=FZUP)\n"
"- Switch off all debug options\n"
@@ -568,6 +570,9 @@ static void slabcache(struct slabinfo *s)
if (strcmp(s->name, "*") == 0)
return;
+ if (unreclaim_only && s->reclaim_account)
+ return;
+
if (actual_slabs == 1) {
report(s);
return;
@@ -1346,6 +1351,7 @@ struct option opts[] = {
{ "Loss", no_argument, NULL, 'L'},
{ "Xtotals", no_argument, NULL, 'X'},
{ "Bytes", no_argument, NULL, 'B'},
+ { "unreclaim", no_argument, NULL, 'U'},
{ NULL, 0, NULL, 0 }
};
@@ -1357,7 +1363,7 @@ int main(int argc, char *argv[])
page_size = getpagesize();
- while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXB",
+ while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXBU",
opts, NULL)) != -1)
switch (c) {
case '1':
@@ -1438,6 +1444,9 @@ int main(int argc, char *argv[])
case 'B':
show_bytes = 1;
break;
+ case 'U':
+ unreclaim_only = 1;
+ break;
default:
fatal("%s: Invalid option '%c'\n", argv[0], optopt);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
2017-09-18 18:23 [PATCH 1/2] tools: slabinfo: add "-U" option to show unreclaimable slabs only Yang Shi
@ 2017-09-18 18:23 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-18 18:23 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an oom happens and there is no killable process;
sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not
available on all architectures and it may malfunction sometimes.
And since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Add a field in struct slabinfo to show if this slab is reclaimable or
not, and a helper function to derive the value from the
SLAB_RECLAIM_ACCOUNT flag.
Print out info for unreclaimable slabs whose actual memory usage is not zero
(num_objs * size != 0) when panic_on_oom is set or there is no killable process.
Since such information is only shown when the kernel panics, it will not
make the message too verbose for a normal oom.
The output looks like:
rpc_buffers       31KB
rpc_tasks         31KB
avtab_node        46735KB
xfs_buf           624KB
xfs_ili           48KB
xfs_efi_item      31KB
xfs_efd_item      31KB
xfs_buf_item      78KB
xfs_log_item_desc 141KB
xfs_trans         108KB
xfs_ifork         744KB
xfs_trans         108KB
xfs_ifork         744KB
xfs_da_state      126KB
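The filter and the line format described above can be tried out in userspace with a sketch like the one below. It is not the kernel code: struct cache_sketch, SKETCH_RECLAIM_ACCOUNT (a stand-in value, not the kernel's SLAB_RECLAIM_ACCOUNT bit) and both helpers are hypothetical, and only the "%-17s %luKB" format and the K() macro are taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_RECLAIM_ACCOUNT 0x1UL	/* stand-in bit, not the kernel's value */
#define K(x) ((x) / 1024)		/* bytes to KB, as in the patch */

/* Hypothetical subset of the per-cache data the report needs. */
struct cache_sketch {
	const char *name;
	unsigned long flags;
	unsigned long num_objs;	/* total objects allocated */
	unsigned long size;	/* object size in bytes */
};

/* A cache counts as reclaimable when the accounting flag is set. */
static int is_reclaimable_sketch(const struct cache_sketch *c)
{
	return (c->flags & SKETCH_RECLAIM_ACCOUNT) != 0;
}

/*
 * Format one report line into buf. Returns the snprintf() result, or 0
 * when the cache is filtered out (reclaimable, or holding no objects).
 */
static int format_unreclaimable(const struct cache_sketch *c,
				char *buf, size_t len)
{
	assert(c != NULL && buf != NULL && len > 0);
	buf[0] = '\0';
	if (is_reclaimable_sketch(c) || c->num_objs == 0)
		return 0;
	return snprintf(buf, len, "%-17s %luKB",
			c->name, K(c->num_objs * c->size));
}
```

Note that K() rounds down, so a cache holding less than 1024 bytes in total would still be listed, as 0KB.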
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 13 +++++++++++--
mm/slab.c | 1 +
mm/slab.h | 7 +++++++
mm/slab_common.c | 30 ++++++++++++++++++++++++++++++
mm/slub.c | 1 +
5 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..173c423 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -427,6 +428,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
dump_tasks(oc->memcg, oc->nodemask);
}
+static void dump_header_with_slabinfo(struct oom_control *oc, struct task_struct *p)
+{
+ dump_header(oc, p);
+
+ if (IS_ENABLED(CONFIG_SLABINFO))
+ show_unreclaimable_slab();
+}
+
/*
* Number of OOM victims in flight
*/
@@ -959,7 +968,7 @@ static void check_panic_on_oom(struct oom_control *oc,
/* Do not panic for oom kills triggered by sysrq */
if (is_sysrq_oom(oc))
return;
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1043,7 +1052,7 @@ bool out_of_memory(struct oom_control *oc)
select_bad_process(oc);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48..4f4971c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4132,6 +4132,7 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
sinfo->shared = cachep->shared;
sinfo->objects_per_slab = cachep->num;
sinfo->cache_order = cachep->gfporder;
+ sinfo->reclaim = is_reclaimable(cachep);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..2f1ebce 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -186,6 +186,7 @@ struct slabinfo {
unsigned int shared;
unsigned int objects_per_slab;
unsigned int cache_order;
+ unsigned int reclaim;
};
void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo);
@@ -352,6 +353,11 @@ static inline void memcg_link_cache(struct kmem_cache *s)
#endif /* CONFIG_MEMCG && !CONFIG_SLOB */
+static inline bool is_reclaimable(struct kmem_cache *s)
+{
+ return (s->flags & SLAB_RECLAIM_ACCOUNT) ? true : false;
+}
+
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
struct kmem_cache *cachep;
@@ -504,6 +510,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos);
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+void show_unreclaimable_slab(void);
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..665baf2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -35,6 +35,8 @@
static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
slab_caches_to_rcu_destroy_workfn);
+#define K(x) ((x)/1024)
+
/*
* Set of flags that will prevent slab merging
*/
@@ -1272,6 +1274,34 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void show_unreclaimable_slab()
+{
+ struct kmem_cache *s = NULL;
+ struct slabinfo sinfo;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+
+ printk("Unreclaimable slabs:\n");
+
+ /*
+ * slab_mutex is deliberately not acquired here: we do not want to sleep
+ * in the oom path right before the kernel panics. The list walk is
+ * therefore unlocked. Since the system is already out of memory, no
+ * large allocation should change the statistics significantly.
+ */
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!is_root_cache(s))
+ continue;
+
+ get_slabinfo(s, &sinfo);
+
+ if (!is_reclaimable(s) && sinfo.num_objs > 0)
+ printk("%-17s %luKB\n", cache_name(s), K(sinfo.num_objs * s->size));
+ }
+}
+EXPORT_SYMBOL(show_unreclaimable_slab);
+#undef K
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
diff --git a/mm/slub.c b/mm/slub.c
index 163352c..5c17c0a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5872,6 +5872,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
sinfo->num_slabs = nr_slabs;
sinfo->objects_per_slab = oo_objects(s->oo);
sinfo->cache_order = oo_order(s->oo);
+ sinfo->reclaim = is_reclaimable(s);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic
@ 2017-09-18 18:23 ` Yang Shi
0 siblings, 0 replies; 44+ messages in thread
From: Yang Shi @ 2017-09-18 18:23 UTC (permalink / raw)
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel
The kernel may panic when an oom happens and there is no killable process.
Sometimes this is caused by huge unreclaimable slabs used by the kernel.
Although kdump could help debug such a problem, kdump is not
available on all architectures and it may sometimes malfunction.
And, since the kernel is already panicking, it is worth capturing such
information in dmesg to aid troubleshooting.
Add a field to struct slabinfo to show whether the slab is reclaimable,
and a helper function that derives the value from the
SLAB_RECLAIM_ACCOUNT flag.
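The helper amounts to a single bit test on the cache flags. A minimal user-space sketch, assuming a hypothetical struct mock_cache in place of struct kmem_cache and an illustrative flag value (the real definition lives in include/linux/slab.h):

```c
#include <stdbool.h>

/* Illustrative value only; the kernel header is authoritative. */
#define SLAB_RECLAIM_ACCOUNT 0x00020000UL

/* Hypothetical stand-in for struct kmem_cache: only the flags word is modelled. */
struct mock_cache {
	unsigned long flags;
};

/* Same test as the patch's helper: reclaimable iff the flag bit is set. */
static inline bool is_reclaimable(const struct mock_cache *s)
{
	return (s->flags & SLAB_RECLAIM_ACCOUNT) != 0;
}
```

Comparing against zero yields a bool directly, so the "? true : false" form used in the patch is not strictly needed.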
Print out info for unreclaimable slabs whose actual memory usage is not zero
(num_objs * size != 0) when panic_on_oom is set or there is no killable process.
Since this information is only shown when the kernel panics, it will not
make the message too verbose for a normal oom.
The output looks like:
rpc_buffers 31KB
rpc_tasks 31KB
avtab_node 46735KB
xfs_buf 624KB
xfs_ili 48KB
xfs_efi_item 31KB
xfs_efd_item 31KB
xfs_buf_item 78KB
xfs_log_item_desc 141KB
xfs_trans 108KB
xfs_ifork 744KB
xfs_trans 108KB
xfs_ifork 744KB
xfs_da_state 126KB
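
The filter and the line format behind this output can be sketched in user space. This assumes a hypothetical struct mock_cache that merges the fields the patch reads from struct kmem_cache and struct slabinfo; the K() macro and the "%-17s %luKB" format are taken from the patch, everything else is illustrative:

```c
#include <stdbool.h>
#include <stdio.h>

#define K(x) ((x) / 1024)

/* Hypothetical stand-in for the kernel structures used by the patch. */
struct mock_cache {
	const char *name;
	unsigned long num_objs;	/* objects currently in the cache */
	unsigned long size;	/* object size in bytes */
	bool reclaimable;	/* SLAB_RECLAIM_ACCOUNT set? */
};

/*
 * Format one report line into buf. Returns false and leaves buf untouched
 * when the cache is reclaimable or its memory usage is zero.
 */
static bool format_unreclaimable(const struct mock_cache *c,
				 char *buf, size_t len)
{
	if (c->reclaimable || c->num_objs * c->size == 0)
		return false;

	snprintf(buf, len, "%-17s %luKB", c->name, K(c->num_objs * c->size));
	return true;
}
```

For example, an unreclaimable cache holding 1248 objects of 512 bytes produces a 624KB line, while reclaimable or empty caches are skipped.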
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
mm/oom_kill.c | 13 +++++++++++--
mm/slab.c | 1 +
mm/slab.h | 7 +++++++
mm/slab_common.c | 30 ++++++++++++++++++++++++++++++
mm/slub.c | 1 +
5 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 99736e0..173c423 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -43,6 +43,7 @@
#include <asm/tlb.h>
#include "internal.h"
+#include "slab.h"
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
@@ -427,6 +428,14 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
dump_tasks(oc->memcg, oc->nodemask);
}
+static void dump_header_with_slabinfo(struct oom_control *oc, struct task_struct *p)
+{
+ dump_header(oc, p);
+
+ if (IS_ENABLED(CONFIG_SLABINFO))
+ show_unreclaimable_slab();
+}
+
/*
* Number of OOM victims in flight
*/
@@ -959,7 +968,7 @@ static void check_panic_on_oom(struct oom_control *oc,
/* Do not panic for oom kills triggered by sysrq */
if (is_sysrq_oom(oc))
return;
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
}
@@ -1043,7 +1052,7 @@ bool out_of_memory(struct oom_control *oc)
select_bad_process(oc);
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
- dump_header(oc, NULL);
+ dump_header_with_slabinfo(oc, NULL);
panic("Out of memory and no killable processes...\n");
}
if (oc->chosen && oc->chosen != (void *)-1UL) {
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48..4f4971c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4132,6 +4132,7 @@ void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
sinfo->shared = cachep->shared;
sinfo->objects_per_slab = cachep->num;
sinfo->cache_order = cachep->gfporder;
+ sinfo->reclaim = is_reclaimable(cachep);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *cachep)
diff --git a/mm/slab.h b/mm/slab.h
index 0733628..2f1ebce 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -186,6 +186,7 @@ struct slabinfo {
unsigned int shared;
unsigned int objects_per_slab;
unsigned int cache_order;
+ unsigned int reclaim;
};
void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo);
@@ -352,6 +353,11 @@ static inline void memcg_link_cache(struct kmem_cache *s)
#endif /* CONFIG_MEMCG && !CONFIG_SLOB */
+static inline bool is_reclaimable(struct kmem_cache *s)
+{
+ return (s->flags & SLAB_RECLAIM_ACCOUNT) ? true : false;
+}
+
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
struct kmem_cache *cachep;
@@ -504,6 +510,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos);
void memcg_slab_stop(struct seq_file *m, void *p);
int memcg_slab_show(struct seq_file *m, void *p);
+void show_unreclaimable_slab(void);
void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83b..665baf2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -35,6 +35,8 @@
static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
slab_caches_to_rcu_destroy_workfn);
+#define K(x) ((x)/1024)
+
/*
* Set of flags that will prevent slab merging
*/
@@ -1272,6 +1274,34 @@ static int slab_show(struct seq_file *m, void *p)
return 0;
}
+void show_unreclaimable_slab(void)
+{
+ struct kmem_cache *s = NULL;
+ struct slabinfo sinfo;
+
+ memset(&sinfo, 0, sizeof(sinfo));
+
+ printk("Unreclaimable slabs:\n");
+
+ /*
+ * slab_mutex is deliberately not acquired here: we do not want to sleep
+ * in the oom path right before the kernel panics. The list walk is
+ * therefore unlocked. Since the system is already out of memory, no
+ * large allocation should change the statistics significantly.
+ */
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!is_root_cache(s))
+ continue;
+
+ get_slabinfo(s, &sinfo);
+
+ if (!is_reclaimable(s) && sinfo.num_objs > 0)
+ printk("%-17s %luKB\n", cache_name(s), K(sinfo.num_objs * s->size));
+ }
+}
+EXPORT_SYMBOL(show_unreclaimable_slab);
+#undef K
+
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
void *memcg_slab_start(struct seq_file *m, loff_t *pos)
{
diff --git a/mm/slub.c b/mm/slub.c
index 163352c..5c17c0a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5872,6 +5872,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
sinfo->num_slabs = nr_slabs;
sinfo->objects_per_slab = oo_objects(s->oo);
sinfo->cache_order = oo_order(s->oo);
+ sinfo->reclaim = is_reclaimable(s);
}
void slabinfo_show_stats(struct seq_file *m, struct kmem_cache *s)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
end of thread, other threads:[~2017-09-26 7:56 UTC | newest]
Thread overview: 44+ messages
2017-09-20 22:38 [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message when kernel panic Yang Shi
2017-09-20 22:38 ` Yang Shi
2017-09-20 22:38 ` [PATCH 1/2] tools: slabinfo: add "-U" option to show unreclaimable slabs only Yang Shi
2017-09-20 22:38 ` Yang Shi
2017-09-20 22:38 ` [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic Yang Shi
2017-09-20 22:38 ` Yang Shi
2017-09-21 8:23 ` David Rientjes
2017-09-21 8:23 ` David Rientjes
2017-09-21 17:51 ` Yang Shi
2017-09-21 17:51 ` Yang Shi
2017-09-25 14:23 ` [PATCH 0/2 v4] oom: capture unreclaimable slab info in oom message " Michal Hocko
2017-09-25 14:23 ` Michal Hocko
2017-09-25 15:55 ` Yang Shi
2017-09-25 15:55 ` Yang Shi
2017-09-25 20:32 ` Michal Hocko
2017-09-25 20:32 ` Michal Hocko
2017-09-25 21:52 ` Yang Shi
2017-09-25 21:52 ` Yang Shi
2017-09-26 7:56 ` Michal Hocko
2017-09-26 7:56 ` Michal Hocko
-- strict thread matches above, loose matches on Subject: below --
2017-09-22 19:52 [PATCH 0/2 v6] " Yang Shi
2017-09-22 19:52 ` [PATCH 2/2] mm: oom: show unreclaimable slab info " Yang Shi
2017-09-22 19:52 ` Yang Shi
2017-09-24 6:10 ` Qixuan Wu
2017-09-24 6:10 ` Qixuan Wu
2017-09-21 20:52 [PATCH 0/2 v5] oom: capture unreclaimable slab info in oom message " Yang Shi
2017-09-21 20:52 ` [PATCH 2/2] mm: oom: show unreclaimable slab info " Yang Shi
2017-09-21 20:52 ` Yang Shi
2017-09-20 19:09 [RFC v3] oom: capture unreclaimable slab info in oom message " Yang Shi
2017-09-20 19:09 ` [PATCH 2/2] mm: oom: show unreclaimable slab info " Yang Shi
2017-09-20 19:09 ` Yang Shi
2017-09-20 21:00 ` David Rientjes
2017-09-20 21:00 ` David Rientjes
2017-09-20 21:32 ` Yang Shi
2017-09-20 21:32 ` Yang Shi
2017-09-18 18:26 [RFC v2] oom: capture unreclaimable slab info in oom message " Yang Shi
2017-09-18 18:26 ` [PATCH 2/2] mm: oom: show unreclaimable slab info " Yang Shi
2017-09-18 18:26 ` Yang Shi
2017-09-19 20:57 ` David Rientjes
2017-09-19 20:57 ` David Rientjes
2017-09-19 21:45 ` Yang Shi
2017-09-19 21:45 ` Yang Shi
2017-09-19 22:41 ` David Rientjes
2017-09-19 22:41 ` David Rientjes
2017-09-19 23:03 ` Yang Shi
2017-09-19 23:03 ` Yang Shi
2017-09-18 18:23 [PATCH 1/2] tools: slabinfo: add "-U" option to show unreclaimable slabs only Yang Shi
2017-09-18 18:23 ` [PATCH 2/2] mm: oom: show unreclaimable slab info when kernel panic Yang Shi
2017-09-18 18:23 ` Yang Shi