Subject: [PATCH 0/2 -mmotm] oom: show single slab cache in oom whose size > 10% of total system memory
From: Yang Shi
Date: 2017-10-25 22:48 UTC
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel

Per the suggestion from David [1], this implementation dumps a single slab cache if its size is over 10% of total system memory. In the current implementation, the ratio is fixed at 10%.

To get the size of total system memory, patch #1 extracts the common code from show_mem() so that it can be reused for checking the ratio.

The patchset is based on the mmotm tree.

[1] https://marc.info/?l=linux-mm&m=150819933626604&w=2

Yang Shi (2):
      mm: extract common code for calculating total memory size
      mm: oom: dump single excessive slab cache when oom

 include/linux/mm.h | 25 +++++++++++++++++++++++++
 lib/show_mem.c     | 20 +-------------------
 mm/oom_kill.c      | 22 +---------------------
 mm/slab.h          |  4 ++--
 mm/slab_common.c   | 21 ++++++++++++++++-----
 5 files changed, 45 insertions(+), 47 deletions(-)
Subject: [PATCH 1/2] mm: extract common code for calculating total memory size
From: Yang Shi
Date: 2017-10-25 22:48 UTC
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel

Total memory size is needed by the unreclaimable-slab OOM path to check whether significant memory is used by a single slab. The calculation is currently done in show_mem(), so extract the common code so that it can be shared.

Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
 include/linux/mm.h | 25 +++++++++++++++++++++++++
 lib/show_mem.c     | 20 +-------------------
 2 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 935c4d4..e21b81e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2050,6 +2050,31 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
 static inline void zero_resv_unavail(void) {}
 #endif
 
+static inline void calc_mem_size(unsigned long *total, unsigned long *reserved,
+				 unsigned long *highmem)
+{
+	pg_data_t *pgdat;
+
+	for_each_online_pgdat(pgdat) {
+		unsigned long flags;
+		int zoneid;
+
+		pgdat_resize_lock(pgdat, &flags);
+		for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
+			struct zone *zone = &pgdat->node_zones[zoneid];
+			if (!populated_zone(zone))
+				continue;
+
+			*total += zone->present_pages;
+			*reserved += zone->present_pages - zone->managed_pages;
+
+			if (is_highmem_idx(zoneid))
+				*highmem += zone->present_pages;
+		}
+		pgdat_resize_unlock(pgdat, &flags);
+	}
+}
+
 extern void set_dma_reserve(unsigned long new_dma_reserve);
 extern void memmap_init_zone(unsigned long, int, unsigned long,
 				unsigned long, enum memmap_context);
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..115475e 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -11,30 +11,12 @@
 void show_mem(unsigned int filter, nodemask_t *nodemask)
 {
-	pg_data_t *pgdat;
 	unsigned long total = 0, reserved = 0, highmem = 0;
 
 	printk("Mem-Info:\n");
 	show_free_areas(filter, nodemask);
 
-	for_each_online_pgdat(pgdat) {
-		unsigned long flags;
-		int zoneid;
-
-		pgdat_resize_lock(pgdat, &flags);
-		for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
-			struct zone *zone = &pgdat->node_zones[zoneid];
-			if (!populated_zone(zone))
-				continue;
-
-			total += zone->present_pages;
-			reserved += zone->present_pages - zone->managed_pages;
-
-			if (is_highmem_idx(zoneid))
-				highmem += zone->present_pages;
-		}
-		pgdat_resize_unlock(pgdat, &flags);
-	}
+	calc_mem_size(&total, &reserved, &highmem);
 
 	printk("%lu pages RAM\n", total);
 	printk("%lu pages HighMem/MovableOnly\n", highmem);
-- 
1.8.3.1
Subject: Re: [PATCH 1/2] mm: extract common code for calculating total memory size
From: Christopher Lameter
Date: 2017-10-27 10:00 UTC
To: Yang Shi
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel

On Thu, 26 Oct 2017, Yang Shi wrote:

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 935c4d4..e21b81e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2050,6 +2050,31 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn,
>  static inline void zero_resv_unavail(void) {}
>  #endif
> 
> +static inline void calc_mem_size(unsigned long *total, unsigned long *reserved,
> +				 unsigned long *highmem)
> +{

Huge inline function. This needs to go into mm/page_alloc.c or
mm/slab_common.c.
Subject: Re: [PATCH 1/2] mm: extract common code for calculating total memory size
From: Yang Shi
Date: 2017-10-27 16:51 UTC
To: Christopher Lameter
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel

On 10/27/17 3:00 AM, Christopher Lameter wrote:
> On Thu, 26 Oct 2017, Yang Shi wrote:
>
>> +static inline void calc_mem_size(unsigned long *total, unsigned long *reserved,
>> +				 unsigned long *highmem)
>> +{
>
> Huge inline function. This needs to go into mm/page_alloc.c or
> mm/slab_common.c

It is used by lib/show_mem.c too. But since it is definitely on a hot patch, I think I can change it to non-inline.

Thanks,
Yang
Subject: Re: [PATCH 1/2] mm: extract common code for calculating total memory size
From: Yang Shi
Date: 2017-10-31 16:45 UTC
To: Christopher Lameter
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, mhocko, linux-mm, linux-kernel

On 10/27/17 9:51 AM, Yang Shi wrote:
> On 10/27/17 3:00 AM, Christopher Lameter wrote:
> [...]
>> Huge inline function. This needs to go into mm/page_alloc.c or
>> mm/slab_common.c
>
> It is used by lib/show_mem.c too. But since it is definitely on a hot
> patch, I think I can change it to non-inline.

I mean it is *not* on the hot path. Sorry for the typo and the inconvenience.

Yang
Subject: [PATCH 2/2] mm: oom: dump single excessive slab cache when oom
From: Yang Shi
Date: 2017-10-25 22:49 UTC
To: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko
Cc: Yang Shi, linux-mm, linux-kernel

Per the discussion with David [1], it looks more reasonable to just dump
the single excessive slab cache instead of dumping all slab caches when
oom.

Dump single excessive slab cache if its size is > 10% of total system
memory size when oom regardless it is unreclaimable.

[1] https://marc.info/?l=linux-mm&m=150819933626604&w=2

Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
---
 mm/oom_kill.c    | 22 +---------------------
 mm/slab.h        |  4 ++--
 mm/slab_common.c | 21 ++++++++++++++++-----
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 26add8a..f996f29 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -162,25 +162,6 @@ static bool oom_unkillable_task(struct task_struct *p,
 	return false;
 }
 
-/*
- * Print out unreclaimble slabs info when unreclaimable slabs amount is greater
- * than all user memory (LRU pages)
- */
-static bool is_dump_unreclaim_slabs(void)
-{
-	unsigned long nr_lru;
-
-	nr_lru = global_node_page_state(NR_ACTIVE_ANON) +
-		 global_node_page_state(NR_INACTIVE_ANON) +
-		 global_node_page_state(NR_ACTIVE_FILE) +
-		 global_node_page_state(NR_INACTIVE_FILE) +
-		 global_node_page_state(NR_ISOLATED_ANON) +
-		 global_node_page_state(NR_ISOLATED_FILE) +
-		 global_node_page_state(NR_UNEVICTABLE);
-
-	return (global_node_page_state(NR_SLAB_UNRECLAIMABLE) > nr_lru);
-}
-
 /**
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
@@ -443,8 +424,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 		mem_cgroup_print_oom_info(oc->memcg, p);
 	else {
 		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
-		if (is_dump_unreclaim_slabs())
-			dump_unreclaimable_slab();
+		dump_slab_cache();
 	}
 	if (sysctl_oom_dump_tasks)
 		dump_tasks(oc->memcg, oc->nodemask);
diff --git a/mm/slab.h b/mm/slab.h
index 6a86025..818b569 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -507,9 +507,9 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 int memcg_slab_show(struct seq_file *m, void *p);
 
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG)
-void dump_unreclaimable_slab(void);
+void dump_slab_cache(void);
 #else
-static inline void dump_unreclaimable_slab(void)
+static inline void dump_slab_cache(void)
 {
 }
 #endif
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1b14fe0..e5bfa07 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1311,7 +1311,18 @@ static int slab_show(struct seq_file *m, void *p)
 	return 0;
 }
 
-void dump_unreclaimable_slab(void)
+static bool inline is_dump_slabs(struct kmem_cache *s, struct slabinfo *sinfo)
+{
+	unsigned long total = 0, reserved = 0, highmem = 0;
+	unsigned long slab_size = sinfo->num_objs * s->size;
+
+	calc_mem_size(&total, &reserved, &highmem);
+
+	/* Check if single slab > 10% of total memory size */
+	return (slab_size > (total * PAGE_SIZE / 10));
+}
+
+void dump_slab_cache(void)
 {
 	struct kmem_cache *s, *s2;
 	struct slabinfo sinfo;
@@ -1324,20 +1335,20 @@ void dump_unreclaimable_slab(void)
 	 * without acquiring the mutex.
 	 */
 	if (!mutex_trylock(&slab_mutex)) {
-		pr_warn("excessive unreclaimable slab but cannot dump stats\n");
+		pr_warn("excessive slab cache but cannot dump stats\n");
 		return;
 	}
 
-	pr_info("Unreclaimable slab info:\n");
+	pr_info("The list of excessive single slab cache:\n");
 	pr_info("Name                      Used          Total\n");
 
 	list_for_each_entry_safe(s, s2, &slab_caches, list) {
-		if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT))
+		if (!is_root_cache(s))
 			continue;
 
 		get_slabinfo(s, &sinfo);
 
-		if (sinfo.num_objs > 0)
+		if (is_dump_slabs(s, &sinfo))
 			pr_info("%-17s %10luKB %10luKB\n", cache_name(s),
 				(sinfo.active_objs * s->size) / 1024,
 				(sinfo.num_objs * s->size) / 1024);
-- 
1.8.3.1
Subject: Re: [PATCH 2/2] mm: oom: dump single excessive slab cache when oom
From: Michal Hocko
Date: 2017-10-26 14:53 UTC
To: Yang Shi
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

On Thu 26-10-17 06:49:00, Yang Shi wrote:
> Per the discussion with David [1], it looks more reasonable to just dump

Please try to avoid external references in the changelog as much as possible.

> the single excessive slab cache instead of dumping all slab caches when
> oom.

You meant to say "to just dump all slab caches which exceed 10% of the total memory."

While we are at it: abusing calc_mem_size() seems rather clumsy, and it is not nodemask aware, so the whole thing is dubious for NUMA-constrained OOMs.

The more I think about this, the more I am convinced that this is just fiddling with the code without a good reason and without a much better outcome.

> Dump single excessive slab cache if its size is > 10% of total system

s@single@all@

> memory size when oom regardless it is unreclaimable.
> [...]

-- 
Michal Hocko
SUSE Labs
Subject: Re: [PATCH 2/2] mm: oom: dump single excessive slab cache when oom
From: Yang Shi
Date: 2017-10-26 16:15 UTC
To: Michal Hocko
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

On 10/26/17 7:53 AM, Michal Hocko wrote:
> On Thu 26-10-17 06:49:00, Yang Shi wrote:
>> Per the discussion with David [1], it looks more reasonable to just dump
>
> Please try to avoid external references in the changelog as much as
> possible.

OK.

>> the single excessive slab cache instead of dumping all slab caches when
>> oom.
>
> You meant to say "to just dump all slab caches which exceed 10% of the
> total memory."
>
> While we are at it: abusing calc_mem_size() seems rather clumsy, and it
> is not nodemask aware, so the whole thing is dubious for NUMA-constrained
> OOMs.

Since we just need the total memory size of the node for a NUMA-constrained OOM, we should be able to use show_mem_node_skip() to bring in the nodemask.

> The more I think about this, the more I am convinced that this is just
> fiddling with the code without a good reason and without a much better
> outcome.

I don't get you. Do you mean the benefit is not that much with just dumping excessive slab caches?

Thanks,
Yang

>> Dump single excessive slab cache if its size is > 10% of total system
>
> s@single@all@
>
>> memory size when oom regardless it is unreclaimable.
>> [...]
Subject: Re: [PATCH 2/2] mm: oom: dump single excessive slab cache when oom
From: Michal Hocko
Date: 2017-10-26 16:27 UTC
To: Yang Shi
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

On Fri 27-10-17 00:15:17, Yang Shi wrote:
> On 10/26/17 7:53 AM, Michal Hocko wrote:
> [...]
> > While we are at it: abusing calc_mem_size() seems rather clumsy, and it
> > is not nodemask aware, so the whole thing is dubious for NUMA-constrained
> > OOMs.
>
> Since we just need the total memory size of the node for a NUMA-constrained
> OOM, we should be able to use show_mem_node_skip() to bring in the nodemask.

yes

> > The more I think about this, the more I am convinced that this is just
> > fiddling with the code without a good reason and without a much better
> > outcome.
>
> I don't get you. Do you mean the benefit is not that much with just dumping
> excessive slab caches?

Yes, I am not sure it makes sense to touch it without further experience. I am not saying this is a wrong approach; I would just give it some more time to see how it behaves in the wild and then make changes based on that experience.

-- 
Michal Hocko
SUSE Labs
Subject: Re: [PATCH 2/2] mm: oom: dump single excessive slab cache when oom
From: Michal Hocko
Date: 2017-10-26 17:14 UTC
To: Yang Shi
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

On Thu 26-10-17 18:27:01, Michal Hocko wrote:
> On Fri 27-10-17 00:15:17, Yang Shi wrote:
> [...]
> > Since we just need the total memory size of the node for a NUMA-constrained
> > OOM, we should be able to use show_mem_node_skip() to bring in the nodemask.
>
> yes

To be more specific: this would work for the total-number-of-pages calculation. It is still not enough, though. You would also have to filter slabs per NUMA node, and this is getting more and more complicated for a marginal improvement.

-- 
Michal Hocko
SUSE Labs
Subject: Re: [PATCH 2/2] mm: oom: dump single excessive slab cache when oom
From: Yang Shi
Date: 2017-10-31 16:47 UTC
To: Michal Hocko
Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

On 10/26/17 10:14 AM, Michal Hocko wrote:
> On Thu 26-10-17 18:27:01, Michal Hocko wrote:
> [...]
> To be more specific: this would work for the total-number-of-pages
> calculation. It is still not enough, though. You would also have to
> filter slabs per NUMA node, and this is getting more and more complicated
> for a marginal improvement.

Yes, it sounds so. Basically, I agree with you to wait for a while to see how the current implementation is doing.

Thanks,
Yang