* [PATCH 1/2 v4] lib: debugobjects: export max loops counter
@ 2018-01-25  0:17 Yang Shi
  2018-01-25  0:17 ` [PATCH 2/2 v4] lib: debugobjects: handle objects free in a batch outside the loop Yang Shi
  0 siblings, 1 reply; 4+ messages in thread
From: Yang Shi @ 2018-01-25  0:17 UTC (permalink / raw)
  To: tglx, longman; +Cc: yang.shi, linux-kernel

Currently the max chain counter is exported to debugfs, but it only
records the counter of the inner loop. However, there might be a
significant number of iterations of the outer loop, so it may take a
significant amount of time to finish all of the checks. This may
occasionally cause a lockup on !CONFIG_PREEMPT kernel builds.

Record the counter of the max loops and export it via debugfs so that
the user can be aware of the real overhead.
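
With this change, the accounting in __debug_check_no_obj_freed()
roughly looks like the below (simplified sketch of the loop; the
object state handling inside the inner loop is omitted, see the hunks
for the actual change):

	for (; chunks > 0; chunks--, paddr += ODEBUG_CHUNK_SIZE) {
		db = get_bucket(paddr);

		cnt = 0;
		/* inner loop: every object hashed into this bucket */
		hlist_for_each_entry_safe(obj, tmp, &db->list, node)
			cnt++;

		/* longest single hash chain (existing counter) */
		if (cnt > debug_objects_maxchain)
			debug_objects_maxchain = cnt;

		/* total iterations done by this call (new counter) */
		max_loops += cnt;
	}

	if (max_loops > debug_objects_maxloops)
		debug_objects_maxloops = max_loops;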

Then the output of /sys/kernel/debug/debug_objects/stats looks like:

max_chain     :121
max_loops     :543267
warnings      :0
fixups        :0
pool_free     :1764
pool_min_free :341
pool_used     :86438
pool_max_used :268887
objs_allocated:6068254
objs_freed    :5981076

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
v1 --> v2:
 * Correct the typo in the commit log

 lib/debugobjects.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 2f5349c..166488d 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -50,6 +50,7 @@ struct debug_bucket {
 static struct kmem_cache	*obj_cache;
 
 static int			debug_objects_maxchain __read_mostly;
+static int			debug_objects_maxloops __read_mostly;
 static int			debug_objects_fixups __read_mostly;
 static int			debug_objects_warnings __read_mostly;
 static int			debug_objects_enabled __read_mostly
@@ -720,7 +721,7 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 	enum debug_obj_state state;
 	struct debug_bucket *db;
 	struct debug_obj *obj;
-	int cnt;
+	int cnt, max_loops = 0;
 
 	saddr = (unsigned long) address;
 	eaddr = saddr + size;
@@ -765,7 +766,12 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 
 		if (cnt > debug_objects_maxchain)
 			debug_objects_maxchain = cnt;
+
+		max_loops += cnt;
 	}
+
+	if (max_loops > debug_objects_maxloops)
+		debug_objects_maxloops = max_loops;
 }
 
 void debug_check_no_obj_freed(const void *address, unsigned long size)
@@ -780,6 +786,7 @@ void debug_check_no_obj_freed(const void *address, unsigned long size)
 static int debug_stats_show(struct seq_file *m, void *v)
 {
 	seq_printf(m, "max_chain     :%d\n", debug_objects_maxchain);
+	seq_printf(m, "max_loops     :%d\n", debug_objects_maxloops);
 	seq_printf(m, "warnings      :%d\n", debug_objects_warnings);
 	seq_printf(m, "fixups        :%d\n", debug_objects_fixups);
 	seq_printf(m, "pool_free     :%d\n", obj_pool_free);
-- 
1.8.3.1

* [PATCH 2/2 v4] lib: debugobjects: handle objects free in a batch outside the loop
  2018-01-25  0:17 [PATCH 1/2 v4] lib: debugobjects: export max loops counter Yang Shi
@ 2018-01-25  0:17 ` Yang Shi
  2018-01-31 16:01   ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Yang Shi @ 2018-01-25  0:17 UTC (permalink / raw)
  To: tglx, longman; +Cc: yang.shi, linux-kernel

There are nested loops on the debug objects free path; sometimes it may
take over a hundred thousand iterations, which occasionally causes a soft
lockup with !CONFIG_PREEMPT, like below:

NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [stress-ng-getde:110342]

 CPU: 15 PID: 110342 Comm: stress-ng-getde Tainted: G
E   4.9.44-003.ali3000.alios7.x86_64.debug #1
 Hardware name: Dell Inc. PowerEdge R720xd/0X6FFV, BIOS
1.6.0 03/07/2013
 task: ffff884cbb0d0000 task.stack: ffff884cabc70000
 RIP: 0010:[<ffffffff817d647b>]  [<ffffffff817d647b>]
_raw_spin_unlock_irqrestore+0x3b/0x60
 RSP: 0018:ffff884cabc77b78  EFLAGS: 00000292
 RAX: ffff884cbb0d0000 RBX: 0000000000000292 RCX: 0000000000000000
 RDX: ffff884cbb0d0000 RSI: 0000000000000001 RDI: 0000000000000292
 RBP: ffff884cabc77b88 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8357a0d8
 R13: ffff884cabc77bc8 R14: ffffffff8357a0d0 R15: 00000000000000fc
 FS:  00002aee845fd2c0(0000) GS:ffff8852bd400000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000002991808 CR3: 0000005123abf000 CR4: 00000000000406e0
 Stack:
  ffff884ff4fe0000 ffff884ff4fd8000 ffff884cabc77c00 ffffffff8141177e
  0000000000000202 ffff884cbb0d0000 ffff884cabc77bc8 0000000000000006
  ffff884ff4fda000 ffffffff8357a0d8 0000000000000000 91f5d976f6020b6c
 Call Trace:
  [<ffffffff8141177e>] debug_check_no_obj_freed+0x13e/0x220
  [<ffffffff811f8751>] __free_pages_ok+0x1f1/0x5c0
  [<ffffffff811fa785>] __free_pages+0x25/0x40
  [<ffffffff812638db>] __free_slab+0x19b/0x270
  [<ffffffff812639e9>] discard_slab+0x39/0x50
  [<ffffffff812679f7>] __slab_free+0x207/0x270
  [<ffffffff81269966>] ___cache_free+0xa6/0xb0
  [<ffffffff8126c267>] qlist_free_all+0x47/0x80
  [<ffffffff8126c5a9>] quarantine_reduce+0x159/0x190
  [<ffffffff8126b3bf>] kasan_kmalloc+0xaf/0xc0
  [<ffffffff8126b8a2>] kasan_slab_alloc+0x12/0x20
  [<ffffffff81265e8a>] kmem_cache_alloc+0xfa/0x360
  [<ffffffff812abc8f>] ? getname_flags+0x4f/0x1f0
  [<ffffffff812abc8f>] getname_flags+0x4f/0x1f0
  [<ffffffff812abe42>] getname+0x12/0x20
  [<ffffffff81298da9>] do_sys_open+0xf9/0x210
  [<ffffffff81298ede>] SyS_open+0x1e/0x20
  [<ffffffff817d6e01>] entry_SYSCALL_64_fastpath+0x1f/0xc2

The code path might be called in either atomic or non-atomic context,
and in_atomic() can't tell whether the current context is atomic on a
!PREEMPT kernel, so cond_resched() can't be used to prevent the soft
lockup.
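
For illustration, the obvious workaround does not help here (sketch
only, not part of the patch):

	/*
	 * Not safe: without CONFIG_PREEMPT_COUNT, spin_lock() does not
	 * touch preempt_count, so in_atomic() may return 0 even when a
	 * caller holds a spinlock or runs with interrupts disabled, and
	 * this cond_resched() could then sleep in atomic context.
	 */
	if (!in_atomic())
		cond_resched();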

Defer freeing the objects in a batch outside of the loop to save some
cycles in the loop.
The objects are added to a global free list, then put back to the pool
list from a work item. When allocating objects, check whether any
objects are available on the global free list and reuse them if it is
not empty.
Reuse the pool lock to protect the free list; introducing a new lock
would be overkill.

Export the number of objects on the global free list via debugfs, which
looks like:

max_chain     :79
max_loops     :8147
warnings      :0
fixups        :0
pool_free     :1697
pool_min_free :346
pool_used     :15356
pool_max_used :23933
on_free_list  :39
objs_allocated:32617
objs_freed    :16588

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
---
v4: Dropped the approach of touching the softlockup watchdog, and
    instead defer freeing objects outside the for loop per the
    suggestion from tglx.

 lib/debugobjects.c | 48 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 166488d..cebf952 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -42,11 +42,14 @@ struct debug_bucket {
 static DEFINE_RAW_SPINLOCK(pool_lock);
 
 static HLIST_HEAD(obj_pool);
+static HLIST_HEAD(obj_to_free);
 
 static int			obj_pool_min_free = ODEBUG_POOL_SIZE;
 static int			obj_pool_free = ODEBUG_POOL_SIZE;
 static int			obj_pool_used;
 static int			obj_pool_max_used;
+/* The number of objs on the global free list */
+static int			obj_free;
 static struct kmem_cache	*obj_cache;
 
 static int			debug_objects_maxchain __read_mostly;
@@ -141,7 +144,8 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
 }
 
 /*
- * Allocate a new object. If the pool is empty, switch off the debugger.
+ * Allocate a new object. Retrieve from global freelist first. If the pool is
+ * empty, switch off the debugger.
  * Must be called with interrupts disabled.
  */
 static struct debug_obj *
@@ -150,6 +154,13 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
 	struct debug_obj *obj = NULL;
 
 	raw_spin_lock(&pool_lock);
+	if (obj_free > 0 && (obj_pool_free < obj_pool_min_free)) {
+		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
+		obj_free--;
+		hlist_del(&obj->node);
+		goto out;
+	}
+
 	if (obj_pool.first) {
 		obj	    = hlist_entry(obj_pool.first, typeof(*obj), node);
 
@@ -169,6 +180,8 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
 		if (obj_pool_free < obj_pool_min_free)
 			obj_pool_min_free = obj_pool_free;
 	}
+
+out:
 	raw_spin_unlock(&pool_lock);
 
 	return obj;
@@ -186,11 +199,25 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
 static void free_obj_work(struct work_struct *work)
 {
 	struct debug_obj *objs[ODEBUG_FREE_BATCH];
+	struct hlist_node *tmp;
+	struct debug_obj *obj;
 	unsigned long flags;
 	int i;
 
 	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
 		return;
+
+	/* Move free obj to pool list from global free list */
+	if (obj_free > 0) {
+		hlist_for_each_entry_safe(obj, tmp, &obj_to_free, node) {
+			hlist_del(&obj->node);
+			hlist_add_head(&obj->node, &obj_pool);
+			obj_pool_free++;
+			obj_pool_used--;
+			obj_free--;
+		}
+	}
+
 	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
 		for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
 			objs[i] = hlist_entry(obj_pool.first,
@@ -716,7 +743,6 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 {
 	unsigned long flags, oaddr, saddr, eaddr, paddr, chunks;
 	struct hlist_node *tmp;
-	HLIST_HEAD(freelist);
 	struct debug_obj_descr *descr;
 	enum debug_obj_state state;
 	struct debug_bucket *db;
@@ -752,18 +778,17 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 				goto repeat;
 			default:
 				hlist_del(&obj->node);
-				hlist_add_head(&obj->node, &freelist);
+				/* Put obj on the global free list */
+				raw_spin_lock(&pool_lock);
+				hlist_add_head(&obj->node, &obj_to_free);
+				/* Update the counter of objs on the global freelist */
+				obj_free++;
+				raw_spin_unlock(&pool_lock);
 				break;
 			}
 		}
 		raw_spin_unlock_irqrestore(&db->lock, flags);
 
-		/* Now free them */
-		hlist_for_each_entry_safe(obj, tmp, &freelist, node) {
-			hlist_del(&obj->node);
-			free_object(obj);
-		}
-
 		if (cnt > debug_objects_maxchain)
 			debug_objects_maxchain = cnt;
 
@@ -772,6 +797,10 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 
 	if (max_loops > debug_objects_maxloops)
 		debug_objects_maxloops = max_loops;
+
+	/* Schedule work to move free objs to pool list */
+	if (obj_free > 0)
+		schedule_work(&debug_obj_work);
 }
 
 void debug_check_no_obj_freed(const void *address, unsigned long size)
@@ -793,6 +822,7 @@ static int debug_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "pool_min_free :%d\n", obj_pool_min_free);
 	seq_printf(m, "pool_used     :%d\n", obj_pool_used);
 	seq_printf(m, "pool_max_used :%d\n", obj_pool_max_used);
+	seq_printf(m, "on_free_list  :%d\n", obj_free);
 	seq_printf(m, "objs_allocated:%d\n", debug_objects_allocated);
 	seq_printf(m, "objs_freed    :%d\n", debug_objects_freed);
 	return 0;
-- 
1.8.3.1

* Re: [PATCH 2/2 v4] lib: debugobjects: handle objects free in a batch outside the loop
  2018-01-25  0:17 ` [PATCH 2/2 v4] lib: debugobjects: handle objects free in a batch outside the loop Yang Shi
@ 2018-01-31 16:01   ` Thomas Gleixner
  2018-01-31 18:51     ` Yang Shi
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2018-01-31 16:01 UTC (permalink / raw)
  To: Yang Shi; +Cc: longman, linux-kernel

On Thu, 25 Jan 2018, Yang Shi wrote:

> There are nested loops on the debug objects free path; sometimes it may
> take over a hundred thousand iterations, which occasionally causes a soft
> lockup with !CONFIG_PREEMPT, like below:
> 
> NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [stress-ng-getde:110342]
> 
>  CPU: 15 PID: 110342 Comm: stress-ng-getde Tainted: G
> E   4.9.44-003.ali3000.alios7.x86_64.debug #1
>  Hardware name: Dell Inc. PowerEdge R720xd/0X6FFV, BIOS
> 1.6.0 03/07/2013
>  task: ffff884cbb0d0000 task.stack: ffff884cabc70000
>  RIP: 0010:[<ffffffff817d647b>]  [<ffffffff817d647b>]
> _raw_spin_unlock_irqrestore+0x3b/0x60
>  RSP: 0018:ffff884cabc77b78  EFLAGS: 00000292
>  RAX: ffff884cbb0d0000 RBX: 0000000000000292 RCX: 0000000000000000
>  RDX: ffff884cbb0d0000 RSI: 0000000000000001 RDI: 0000000000000292
>  RBP: ffff884cabc77b88 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8357a0d8
>  R13: ffff884cabc77bc8 R14: ffffffff8357a0d0 R15: 00000000000000fc
>  FS:  00002aee845fd2c0(0000) GS:ffff8852bd400000(0000)
> knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000002991808 CR3: 0000005123abf000 CR4: 00000000000406e0
>  Stack:
>   ffff884ff4fe0000 ffff884ff4fd8000 ffff884cabc77c00 ffffffff8141177e
>   0000000000000202 ffff884cbb0d0000 ffff884cabc77bc8 0000000000000006
>   ffff884ff4fda000 ffffffff8357a0d8 0000000000000000 91f5d976f6020b6c

Please trim that. The register content is not really interesting

> @@ -186,11 +199,25 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
>  static void free_obj_work(struct work_struct *work)
>  {
>  	struct debug_obj *objs[ODEBUG_FREE_BATCH];
> +	struct hlist_node *tmp;
> +	struct debug_obj *obj;
>  	unsigned long flags;
>  	int i;
>  
>  	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
>  		return;
> +
> +	/* Move free obj to pool list from global free list */

If the pool is full, why would you shuffle those objects to the pool list
first just to free them 10 lines further down?

> +	if (obj_free > 0) {

  obj_nr_tofree might be a more descriptive name for this variable

> +		hlist_for_each_entry_safe(obj, tmp, &obj_to_free, node) {
> +			hlist_del(&obj->node);
> +			hlist_add_head(&obj->node, &obj_pool);
> +			obj_pool_free++;
> +			obj_pool_used--;
> +			obj_free--;
> +		}
> +	}

The other thing here is that this whole list walk operation happens with
the pool lock held and interrupts disabled. That's suboptimal at best.

So the right thing to do here is:

	HLIST_HEAD(tofree);
		
  	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
  		return;

	while (obj_pool_free < debug_objects_pool_size) {
		if (!obj_nr_tofree)
			break;
		hlist_del(...)
		hlist_add(...)
	}

	if (obj_nr_tofree) {
		hlist_move_list(&obj_to_free, &tofree);
		....
	}

	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
	      ....
	}

	raw_spin_unlock_irqrestore(&pool_lock, flags);

	hlist_for_each_entry_safe(obj, tmp, &tofree, node) {
		hlist_del();
		kmem_cache_free(....);
	}

That way you minimize the lock held times and spare pointless list walks
and shuffling.

Thanks,

	tglx

* Re: [PATCH 2/2 v4] lib: debugobjects: handle objects free in a batch outside the loop
  2018-01-31 16:01   ` Thomas Gleixner
@ 2018-01-31 18:51     ` Yang Shi
  0 siblings, 0 replies; 4+ messages in thread
From: Yang Shi @ 2018-01-31 18:51 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: longman, linux-kernel



On 1/31/18 8:01 AM, Thomas Gleixner wrote:
> On Thu, 25 Jan 2018, Yang Shi wrote:
>
>> There are nested loops on the debug objects free path; sometimes it may
>> take over a hundred thousand iterations, which occasionally causes a soft
>> lockup with !CONFIG_PREEMPT, like below:
>>
>> NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [stress-ng-getde:110342]
>>
>>   CPU: 15 PID: 110342 Comm: stress-ng-getde Tainted: G
>> E   4.9.44-003.ali3000.alios7.x86_64.debug #1
>>   Hardware name: Dell Inc. PowerEdge R720xd/0X6FFV, BIOS
>> 1.6.0 03/07/2013
>>   task: ffff884cbb0d0000 task.stack: ffff884cabc70000
>>   RIP: 0010:[<ffffffff817d647b>]  [<ffffffff817d647b>]
>> _raw_spin_unlock_irqrestore+0x3b/0x60
>>   RSP: 0018:ffff884cabc77b78  EFLAGS: 00000292
>>   RAX: ffff884cbb0d0000 RBX: 0000000000000292 RCX: 0000000000000000
>>   RDX: ffff884cbb0d0000 RSI: 0000000000000001 RDI: 0000000000000292
>>   RBP: ffff884cabc77b88 R08: 0000000000000000 R09: 0000000000000000
>>   R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8357a0d8
>>   R13: ffff884cabc77bc8 R14: ffffffff8357a0d0 R15: 00000000000000fc
>>   FS:  00002aee845fd2c0(0000) GS:ffff8852bd400000(0000)
>> knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 0000000002991808 CR3: 0000005123abf000 CR4: 00000000000406e0
>>   Stack:
>>    ffff884ff4fe0000 ffff884ff4fd8000 ffff884cabc77c00 ffffffff8141177e
>>    0000000000000202 ffff884cbb0d0000 ffff884cabc77bc8 0000000000000006
>>    ffff884ff4fda000 ffffffff8357a0d8 0000000000000000 91f5d976f6020b6c
> Please trim that. The register content is not really interesting

Yes, sure.

>
>> @@ -186,11 +199,25 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
>>   static void free_obj_work(struct work_struct *work)
>>   {
>>   	struct debug_obj *objs[ODEBUG_FREE_BATCH];
>> +	struct hlist_node *tmp;
>> +	struct debug_obj *obj;
>>   	unsigned long flags;
>>   	int i;
>>   
>>   	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
>>   		return;
>> +
>> +	/* Move free obj to pool list from global free list */
> If the pool is full, why would you shuffle those objects to the pool list
> first just to free them 10 lines further down?

Yes, thanks for pointing this out. I should have checked whether the
pool list is full before moving the free objects to it.

>
>> +	if (obj_free > 0) {
>    obj_nr_tofree might be a more descriptive name for this variable

OK.

>
>> +		hlist_for_each_entry_safe(obj, tmp, &obj_to_free, node) {
>> +			hlist_del(&obj->node);
>> +			hlist_add_head(&obj->node, &obj_pool);
>> +			obj_pool_free++;
>> +			obj_pool_used--;
>> +			obj_free--;
>> +		}
>> +	}
> The other thing here is that this whole list walk operation happens with
> the pool lock held and interrupts disabled. That's suboptimal at best.
>
> So the right thing to do here is:
>
> 	HLIST_HEAD(tofree);
> 		
>    	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
>    		return;
>
> 	while (obj_pool_free < debug_objects_pool_size) {
> 		if (!obj_nr_tofree)
> 			break;
> 		hlist_del(...)
> 		hlist_add(...)
> 	}
>
> 	if (obj_nr_tofree) {
> 		hlist_move_list(&obj_to_free, &tofree);
> 		....
> 	}
>
> 	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
> 	      ....
> 	}
>
> 	raw_spin_unlock_irqrestore(&pool_lock, flags);
>
> 	hlist_for_each_entry_safe(obj, tmp, &tofree, node) {
> 		hlist_del();
> 		kmem_cache_free(....);
> 	}
>
> That way you minimize the lock held times and spare pointless list walks
> and shuffling.

Thanks a lot for the suggestion, will fix these in the new version.
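
Something along these lines, if I read the outline correctly? Just a
quick sketch with the obj_nr_tofree rename, not the final patch (the
obj_pool_used accounting is omitted here):

	static void free_obj_work(struct work_struct *work)
	{
		struct hlist_node *tmp;
		struct debug_obj *obj;
		unsigned long flags;
		HLIST_HEAD(tofree);

		if (!raw_spin_trylock_irqsave(&pool_lock, flags))
			return;

		/* Refill the pool from the global free list first */
		while (obj_pool_free < debug_objects_pool_size && obj_nr_tofree) {
			obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
			hlist_del(&obj->node);
			hlist_add_head(&obj->node, &obj_pool);
			obj_pool_free++;
			obj_nr_tofree--;
		}

		/* Whatever is still on the global free list is surplus */
		if (obj_nr_tofree) {
			hlist_move_list(&obj_to_free, &tofree);
			debug_objects_freed += obj_nr_tofree;
			obj_nr_tofree = 0;
		}

		/* the existing ODEBUG_FREE_BATCH shrink loop stays here */

		raw_spin_unlock_irqrestore(&pool_lock, flags);

		/* Free the surplus objects without holding pool_lock */
		hlist_for_each_entry_safe(obj, tmp, &tofree, node) {
			hlist_del(&obj->node);
			kmem_cache_free(obj_cache, obj);
		}
	}

That keeps the list splice O(1) via hlist_move_list() and moves every
kmem_cache_free() call outside of pool_lock.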

Regards,
Yang

>
> Thanks,
>
> 	tglx
