* [PATCH 0/4 v6] lib: debugobjects: Introduce new global free list and defer objects free via the free list
From: Yang Shi @ 2018-02-05 23:18 UTC
  To: tglx, longman; +Cc: yang.shi, linux-kernel


Since this version consists of 4 patches, it is hard to track the change
log in a single patch, so this cover letter tracks the changes.

Here is the problem.

There are nested loops on the debug objects free path; sometimes they may
run for hundreds of thousands of iterations, occasionally causing a soft
lockup on !CONFIG_PREEMPT kernels, like below:

NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s!
[stress-ng-getde:110342]

 CPU: 15 PID: 110342 Comm: stress-ng-getde Tainted: G
E   4.9.44-003.ali3000.alios7.x86_64.debug #1
 Hardware name: Dell Inc. PowerEdge R720xd/0X6FFV, BIOS
1.6.0 03/07/2013

Call Trace:
  [<ffffffff8141177e>] debug_check_no_obj_freed+0x13e/0x220
  [<ffffffff811f8751>] __free_pages_ok+0x1f1/0x5c0
  [<ffffffff811fa785>] __free_pages+0x25/0x40
  [<ffffffff812638db>] __free_slab+0x19b/0x270
  [<ffffffff812639e9>] discard_slab+0x39/0x50
  [<ffffffff812679f7>] __slab_free+0x207/0x270
  [<ffffffff81269966>] ___cache_free+0xa6/0xb0
  [<ffffffff8126c267>] qlist_free_all+0x47/0x80
  [<ffffffff8126c5a9>] quarantine_reduce+0x159/0x190
  [<ffffffff8126b3bf>] kasan_kmalloc+0xaf/0xc0
  [<ffffffff8126b8a2>] kasan_slab_alloc+0x12/0x20
  [<ffffffff81265e8a>] kmem_cache_alloc+0xfa/0x360
  [<ffffffff812abc8f>] ? getname_flags+0x4f/0x1f0
  [<ffffffff812abc8f>] getname_flags+0x4f/0x1f0
  [<ffffffff812abe42>] getname+0x12/0x20
  [<ffffffff81298da9>] do_sys_open+0xf9/0x210
  [<ffffffff81298ede>] SyS_open+0x1e/0x20
  [<ffffffff817d6e01>] entry_SYSCALL_64_fastpath+0x1f/0xc2

The code path might be called in either atomic or non-atomic context, and
in_atomic() can't tell whether the current context is atomic on a !PREEMPT
kernel, so cond_resched() can't be used to prevent the soft lockup.
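
To illustrate (a hypothetical fragment, not from the patches): on a kernel
built without CONFIG_PREEMPT_COUNT, spin_lock() does not change
preempt_count(), so in_atomic() cannot see spinlock-held sections:

        spin_lock(&lock);       /* no preempt_count() change on !CONFIG_PREEMPT_COUNT */
        if (!in_atomic())       /* true here, despite the held lock */
                cond_resched(); /* would schedule in atomic context -> bug */
        spin_unlock(&lock);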

Defer freeing the objects in a batch outside of the loop to save some
cycles in the loop.
The objects are added to a global free list; a work item then moves them
back to the pool list if the pool list is not full. If the pool list is
already full, the objects stay on the global free list and their memory
is freed later.
When allocating objects, check whether any objects are available on the
global free list and reuse them if it is not empty. The existing pool
lock is reused to protect the free list.
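
In code terms, the free-path decision looks roughly like this sketch
(simplified from patch 3 below; locking and the counters are trimmed):

        static HLIST_HEAD(obj_to_free); /* global free list, pool_lock protected */

        /* Recycle a freed object into the pool, or park it for the worker */
        static bool __free_object(struct debug_obj *obj)
        {
                bool work = obj_pool_free > debug_objects_pool_size && obj_cache;

                if (work)
                        hlist_add_head(&obj->node, &obj_to_free); /* worker frees or reuses */
                else
                        hlist_add_head(&obj->node, &obj_pool);    /* pool not full yet */
                return work;    /* caller schedules the worker once per batch */
        }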

v6:
 * Split the second patch into 3 patches
 * Introduced __free_object()
 * Moved free objs reuse to fill_pool()
 * Fixed obj_pool_used leak
v5:
 * Trimmed commit log to just keep call stack and process info.
 * Per tglx's comment, just move free objs to the pool list when it is not
   full. If the pool list is full, free the memory of objs on the free list.
v4:
 * Dropped the approach of touching the softlockup watchdog; defer objects
   free outside the for loop per the suggestion from tglx.
v3:
 * Use debugfs_create_u32() helper API per Waiman's suggestion
v2:
 * Added suppress_lockup knob in debugfs per Waiman's suggestion

Yang Shi (4):
      lib: debugobjects: export max loops counter
      lib: debugobjects: add global free list and the counter
      lib: debugobjects: use global free list in free_object()
      lib: debugobjects: handle objects free in a batch outside the loop

 lib/debugobjects.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 91 insertions(+), 25 deletions(-)


* [PATCH 1/4 v6] lib: debugobjects: export max loops counter
From: Yang Shi @ 2018-02-05 23:18 UTC
  To: tglx, longman; +Cc: yang.shi, linux-kernel

Currently the max chain counter is exported to debugfs, but it only records
the counter of the inner loop. However, the outer loop may run for a
significant number of iterations, so it may take a significant amount of
time to finish all of the checks. This may occasionally cause a soft lockup
on !CONFIG_PREEMPT kernel builds.

Record the counter of the max loops and export it to debugfs so that the
user can be aware of the real overhead.

Then the output of /sys/kernel/debug/debug_objects/stats looks like:

max_chain     :121
max_loops     :543267
warnings      :0
fixups        :0
pool_free     :1764
pool_min_free :341
pool_used     :86438
pool_max_used :268887
objs_allocated:6068254
objs_freed    :5981076

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 lib/debugobjects.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 2f5349c..166488d 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -50,6 +50,7 @@ struct debug_bucket {
 static struct kmem_cache	*obj_cache;
 
 static int			debug_objects_maxchain __read_mostly;
+static int			debug_objects_maxloops __read_mostly;
 static int			debug_objects_fixups __read_mostly;
 static int			debug_objects_warnings __read_mostly;
 static int			debug_objects_enabled __read_mostly
@@ -720,7 +721,7 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 	enum debug_obj_state state;
 	struct debug_bucket *db;
 	struct debug_obj *obj;
-	int cnt;
+	int cnt, max_loops = 0;
 
 	saddr = (unsigned long) address;
 	eaddr = saddr + size;
@@ -765,7 +766,12 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 
 		if (cnt > debug_objects_maxchain)
 			debug_objects_maxchain = cnt;
+
+		max_loops += cnt;
 	}
+
+	if (max_loops > debug_objects_maxloops)
+		debug_objects_maxloops = max_loops;
 }
 
 void debug_check_no_obj_freed(const void *address, unsigned long size)
@@ -780,6 +786,7 @@ void debug_check_no_obj_freed(const void *address, unsigned long size)
 static int debug_stats_show(struct seq_file *m, void *v)
 {
 	seq_printf(m, "max_chain     :%d\n", debug_objects_maxchain);
+	seq_printf(m, "max_loops     :%d\n", debug_objects_maxloops);
 	seq_printf(m, "warnings      :%d\n", debug_objects_warnings);
 	seq_printf(m, "fixups        :%d\n", debug_objects_fixups);
 	seq_printf(m, "pool_free     :%d\n", obj_pool_free);
-- 
1.8.3.1


* [PATCH 2/4 v6] lib: debugobjects: add global free list and the counter
From: Yang Shi @ 2018-02-05 23:18 UTC
  To: tglx, longman; +Cc: yang.shi, linux-kernel

Add a global free list and counter for reusing the objects.

Move free objects from the global free list to the pool list when the pool
list is not full. If the pool list is already full, just free the memory of
the objects later.

When initializing objects, fill the pool list from the global free list
first if it has free objects for reuse.

Reuse pool lock to protect the global free list and its counter.

Also, export the number of objects on the global free list to debugfs:

max_chain     :79
max_loops     :8147
warnings      :0
fixups        :0
pool_free     :1697
pool_min_free :346
pool_used     :15356
pool_max_used :23933
on_free_list  :39
objs_allocated:32617
objs_freed    :16588

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
---
 lib/debugobjects.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 166488d..c15fb5f 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -42,11 +42,14 @@ struct debug_bucket {
 static DEFINE_RAW_SPINLOCK(pool_lock);
 
 static HLIST_HEAD(obj_pool);
+static HLIST_HEAD(obj_to_free);
 
 static int			obj_pool_min_free = ODEBUG_POOL_SIZE;
 static int			obj_pool_free = ODEBUG_POOL_SIZE;
 static int			obj_pool_used;
 static int			obj_pool_max_used;
+/* The number of objs on the global free list */
+static int			obj_nr_tofree;
 static struct kmem_cache	*obj_cache;
 
 static int			debug_objects_maxchain __read_mostly;
@@ -97,9 +100,23 @@ static int __init disable_object_debug(char *str)
 static void fill_pool(void)
 {
 	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
-	struct debug_obj *new;
+	struct debug_obj *new, *obj;
 	unsigned long flags;
 
+	/*
+	 * Reuse objs from the global free list; they will be reinitialized
+	 * when allocating.
+	 */
+	while (obj_nr_tofree > 0 && (obj_pool_free < obj_pool_min_free)) {
+		raw_spin_lock_irqsave(&pool_lock, flags);
+		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
+		hlist_del(&obj->node);
+		obj_nr_tofree--;
+		hlist_add_head(&obj->node, &obj_pool);
+		obj_pool_free++;
+		raw_spin_unlock_irqrestore(&pool_lock, flags);
+	}
+
 	if (likely(obj_pool_free >= debug_objects_pool_min_level))
 		return;
 
@@ -186,11 +203,40 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
 static void free_obj_work(struct work_struct *work)
 {
 	struct debug_obj *objs[ODEBUG_FREE_BATCH];
+	struct hlist_node *tmp;
+	struct debug_obj *obj;
 	unsigned long flags;
 	int i;
+	HLIST_HEAD(tofree);
 
 	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
 		return;
+
+	/*
+	 * The objs on the pool list might be allocated before the work is
+	 * run, so recheck if the pool list is full or not; if not, fill
+	 * the pool list from the global free list
+	 */
+	while (obj_pool_free < debug_objects_pool_size) {
+		if (obj_nr_tofree <= 0)
+			break;
+
+		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
+		hlist_del(&obj->node);
+		hlist_add_head(&obj->node, &obj_pool);
+		obj_pool_free++;
+		obj_nr_tofree--;
+	}
+
+	/*
+	 * Pool list is already full and there are still objs on the free list;
+	 * move the remaining free objs to a separate list to free the memory later.
+	 */
+	if (obj_nr_tofree > 0) {
+		hlist_move_list(&obj_to_free, &tofree);
+		obj_nr_tofree = 0;
+	}
+
 	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
 		for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
 			objs[i] = hlist_entry(obj_pool.first,
@@ -211,6 +257,13 @@ static void free_obj_work(struct work_struct *work)
 			return;
 	}
 	raw_spin_unlock_irqrestore(&pool_lock, flags);
+
+	if (!hlist_empty(&tofree)) {
+		hlist_for_each_entry_safe(obj, tmp, &tofree, node) {
+			hlist_del(&obj->node);
+			kmem_cache_free(obj_cache, obj);
+		}
+	}
 }
 
 /*
@@ -793,6 +846,7 @@ static int debug_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "pool_min_free :%d\n", obj_pool_min_free);
 	seq_printf(m, "pool_used     :%d\n", obj_pool_used);
 	seq_printf(m, "pool_max_used :%d\n", obj_pool_max_used);
+	seq_printf(m, "on_free_list  :%d\n", obj_nr_tofree);
 	seq_printf(m, "objs_allocated:%d\n", debug_objects_allocated);
 	seq_printf(m, "objs_freed    :%d\n", debug_objects_freed);
 	return 0;
-- 
1.8.3.1


* [PATCH 3/4 v6] lib: debugobjects: use global free list in free_object()
From: Yang Shi @ 2018-02-05 23:18 UTC
  To: tglx, longman; +Cc: yang.shi, linux-kernel

When the pool list is already full, just put the free objects on the
global free list, then schedule work to move them to the pool list or to
free the memory later.

If the pool list is not full, just put the objects back to the pool
list.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
---
 lib/debugobjects.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index c15fb5f..09f2469 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -266,27 +266,34 @@ static void free_obj_work(struct work_struct *work)
 	}
 }
 
-/*
- * Put the object back into the pool and schedule work to free objects
- * if necessary.
- */
-static void free_object(struct debug_obj *obj)
+static bool __free_object(struct debug_obj *obj)
 {
 	unsigned long flags;
-	int sched = 0;
+	bool work;
 
 	raw_spin_lock_irqsave(&pool_lock, flags);
-	/*
-	 * schedule work when the pool is filled and the cache is
-	 * initialized:
-	 */
-	if (obj_pool_free > debug_objects_pool_size && obj_cache)
-		sched = 1;
-	hlist_add_head(&obj->node, &obj_pool);
-	obj_pool_free++;
+	work = (obj_pool_free > debug_objects_pool_size) && obj_cache ?
+		true : false;
 	obj_pool_used--;
+
+	if (work) {
+		obj_nr_tofree++;
+		hlist_add_head(&obj->node, &obj_to_free);
+	} else {
+		obj_pool_free++;
+		hlist_add_head(&obj->node, &obj_pool);
+	}
 	raw_spin_unlock_irqrestore(&pool_lock, flags);
-	if (sched)
+	return work;
+}
+
+/*
+ * Put the object back into the pool and schedule work to free objects
+ * if necessary.
+ */
+static void free_object(struct debug_obj *obj)
+{
+	if (__free_object(obj))
 		schedule_work(&debug_obj_work);
 }
 
-- 
1.8.3.1


* [PATCH 4/4 v6] lib: debugobjects: handle objects free in a batch outside the loop
From: Yang Shi @ 2018-02-05 23:18 UTC
  To: tglx, longman; +Cc: yang.shi, linux-kernel

There are nested loops on the debug objects free path; sometimes they may
run for hundreds of thousands of iterations, occasionally causing a soft
lockup on !CONFIG_PREEMPT kernels, like below:

NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s!
[stress-ng-getde:110342]

 CPU: 15 PID: 110342 Comm: stress-ng-getde Tainted: G
E   4.9.44-003.ali3000.alios7.x86_64.debug #1
 Hardware name: Dell Inc. PowerEdge R720xd/0X6FFV, BIOS
1.6.0 03/07/2013

Call Trace:
  [<ffffffff8141177e>] debug_check_no_obj_freed+0x13e/0x220
  [<ffffffff811f8751>] __free_pages_ok+0x1f1/0x5c0
  [<ffffffff811fa785>] __free_pages+0x25/0x40
  [<ffffffff812638db>] __free_slab+0x19b/0x270
  [<ffffffff812639e9>] discard_slab+0x39/0x50
  [<ffffffff812679f7>] __slab_free+0x207/0x270
  [<ffffffff81269966>] ___cache_free+0xa6/0xb0
  [<ffffffff8126c267>] qlist_free_all+0x47/0x80
  [<ffffffff8126c5a9>] quarantine_reduce+0x159/0x190
  [<ffffffff8126b3bf>] kasan_kmalloc+0xaf/0xc0
  [<ffffffff8126b8a2>] kasan_slab_alloc+0x12/0x20
  [<ffffffff81265e8a>] kmem_cache_alloc+0xfa/0x360
  [<ffffffff812abc8f>] ? getname_flags+0x4f/0x1f0
  [<ffffffff812abc8f>] getname_flags+0x4f/0x1f0
  [<ffffffff812abe42>] getname+0x12/0x20
  [<ffffffff81298da9>] do_sys_open+0xf9/0x210
  [<ffffffff81298ede>] SyS_open+0x1e/0x20
  [<ffffffff817d6e01>] entry_SYSCALL_64_fastpath+0x1f/0xc2

The code path might be called in either atomic or non-atomic context, and
in_atomic() can't tell whether the current context is atomic on a !PREEMPT
kernel, so cond_resched() can't be used to prevent the soft lockup.

Utilize the global free list to defer freeing the objects in a batch
outside of the loop, saving some cycles in the loop.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
---
 lib/debugobjects.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 09f2469..b1b42bd 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -776,12 +776,12 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 {
 	unsigned long flags, oaddr, saddr, eaddr, paddr, chunks;
 	struct hlist_node *tmp;
-	HLIST_HEAD(freelist);
 	struct debug_obj_descr *descr;
 	enum debug_obj_state state;
 	struct debug_bucket *db;
 	struct debug_obj *obj;
 	int cnt, max_loops = 0;
+	bool work = false;
 
 	saddr = (unsigned long) address;
 	eaddr = saddr + size;
@@ -812,18 +812,12 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 				goto repeat;
 			default:
 				hlist_del(&obj->node);
-				hlist_add_head(&obj->node, &freelist);
+				work |= __free_object(obj);
 				break;
 			}
 		}
 		raw_spin_unlock_irqrestore(&db->lock, flags);
 
-		/* Now free them */
-		hlist_for_each_entry_safe(obj, tmp, &freelist, node) {
-			hlist_del(&obj->node);
-			free_object(obj);
-		}
-
 		if (cnt > debug_objects_maxchain)
 			debug_objects_maxchain = cnt;
 
@@ -832,6 +826,10 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 
 	if (max_loops > debug_objects_maxloops)
 		debug_objects_maxloops = max_loops;
+
+	/* Schedule work to move free objs to pool list */
+	if (work)
+		schedule_work(&debug_obj_work);
 }
 
 void debug_check_no_obj_freed(const void *address, unsigned long size)
-- 
1.8.3.1


* Re: [PATCH 1/4 v6] lib: debugobjects: export max loops counter
From: Thomas Gleixner @ 2018-02-12 15:04 UTC
  To: Yang Shi; +Cc: longman, linux-kernel

On Tue, 6 Feb 2018, Yang Shi wrote:
> @@ -720,7 +721,7 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
>  	enum debug_obj_state state;
>  	struct debug_bucket *db;
>  	struct debug_obj *obj;
> -	int cnt;
> +	int cnt, max_loops = 0;
>  
>  	saddr = (unsigned long) address;
>  	eaddr = saddr + size;
> @@ -765,7 +766,12 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
>  
>  		if (cnt > debug_objects_maxchain)
>  			debug_objects_maxchain = cnt;
> +
> +		max_loops += cnt;

I don't think max_loops is the proper name for this. It's not counting
loops. It's counting the aggregate number of objects inspected for a single
invocation of __debug_check_no_obj_freed() while max_chain records the
chain length in a hash bucket corresponding to a single memory chunk. I'll
fix that up myself. No need to resend.

Thanks,

	tglx


* Re: [PATCH 2/4 v6] lib: debugobjects: add global free list and the counter
From: Thomas Gleixner @ 2018-02-12 15:52 UTC
  To: Yang Shi; +Cc: longman, linux-kernel

On Tue, 6 Feb 2018, Yang Shi wrote:
> +	/*
> +	 * The objs on the pool list might be allocated before the work is
> +	 * run, so recheck if pool list it full or not, if not fill pool
> +	 * list from the global free list
> +	 */
> +	while (obj_pool_free < debug_objects_pool_size) {
> +		if (obj_nr_tofree <= 0)
> +			break;
> +
> +		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
> +		hlist_del(&obj->node);
> +		hlist_add_head(&obj->node, &obj_pool);
> +		obj_pool_free++;
> +		obj_nr_tofree--;
> +	}
> +
> +	/*
> +	 * pool list is already full, and there are still objs on the free list,
> +	 * move remaining free objs to a separate list to free the memory later.
> +	 */
> +	if (obj_nr_tofree > 0) {
> +		hlist_move_list(&obj_to_free, &tofree);
> +		obj_nr_tofree = 0;
> +	}
> +
>  	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
>  		for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
>  			objs[i] = hlist_entry(obj_pool.first,

This whole section is now pointless and can be removed. There is no way
that this code path can be entered after this change. Surplus objects are
on the obj_to_free list and not on the obj_pool itself.

Thanks,

	tglx


* Re: [PATCH 2/4 v6] lib: debugobjects: add global free list and the counter
From: Thomas Gleixner @ 2018-02-12 15:54 UTC
  To: Yang Shi; +Cc: longman, linux-kernel

On Mon, 12 Feb 2018, Thomas Gleixner wrote:
> On Tue, 6 Feb 2018, Yang Shi wrote:
> > +	/*
> > +	 * The objs on the pool list might be allocated before the work is
> > +	 * run, so recheck if pool list it full or not, if not fill pool
> > +	 * list from the global free list
> > +	 */
> > +	while (obj_pool_free < debug_objects_pool_size) {
> > +		if (obj_nr_tofree <= 0)
> > +			break;
> > +
> > +		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
> > +		hlist_del(&obj->node);
> > +		hlist_add_head(&obj->node, &obj_pool);
> > +		obj_pool_free++;
> > +		obj_nr_tofree--;
> > +	}
> > +
> > +	/*
> > +	 * pool list is already full, and there are still objs on the free list,
> > +	 * move remaining free objs to a separate list to free the memory later.
> > +	 */
> > +	if (obj_nr_tofree > 0) {
> > +		hlist_move_list(&obj_to_free, &tofree);
> > +		obj_nr_tofree = 0;
> > +	}
> > +
> >  	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
> >  		for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
> >  			objs[i] = hlist_entry(obj_pool.first,
> 
> This whole section is now pointless and can be removed. There is no way
> that this code path can be entered after this change. Surplus objects are
> on the obj_to_free list and not on the obj_pool itself.

Actually not in this patch, but in the next one.

Thanks,

	tglx


* Re: [PATCH 2/4 v6] lib: debugobjects: add global free list and the counter
From: Thomas Gleixner @ 2018-02-12 16:25 UTC
  To: Yang Shi; +Cc: longman, linux-kernel

On Tue, 6 Feb 2018, Yang Shi wrote:
> +	/*
> +	 * Reuse objs from the global free list, they will be reinitialized
> +	 * when allocating
> +	 */
> +	while (obj_nr_tofree > 0 && (obj_pool_free < obj_pool_min_free)) {
> +		raw_spin_lock_irqsave(&pool_lock, flags);
> +		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);

This is racy vs. the worker thread. Assume obj_nr_tofree = 1:

CPU0					CPU1
worker
   lock(&pool_lock);			while (obj_nr_tofree > 0 && ...) {
     obj = hlist_entry(obj_to_free);	  lock(&pool_lock);
     hlist_del(obj);			  
     obj_nr_tofree--;
     ...
   unlock(&pool_lock);
					  obj = hlist_entry(obj_to_free);
					  hlist_del(obj); <------- NULL pointer dereference

Not what you want, right? The counter or the list head need to be rechecked
after the lock is acquired.

> +		hlist_del(&obj->node);
> +		obj_nr_tofree--;
> +		hlist_add_head(&obj->node, &obj_pool);
> +		obj_pool_free++;
> +		raw_spin_unlock_irqrestore(&pool_lock, flags);
> +	}
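
For reference, the shape of the fix is to recheck the counter with the
lock held; a minimal sketch (this matches the fix later applied, see the
tip commit further down):

        while (obj_nr_tofree && (obj_pool_free < obj_pool_min_free)) {
                raw_spin_lock_irqsave(&pool_lock, flags);
                /* Recheck under the lock; the worker may have won the race */
                if (obj_nr_tofree) {
                        obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
                        hlist_del(&obj->node);
                        obj_nr_tofree--;
                        hlist_add_head(&obj->node, &obj_pool);
                        obj_pool_free++;
                }
                raw_spin_unlock_irqrestore(&pool_lock, flags);
        }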

Thanks,

	tglx


* Re: [PATCH 2/4 v6] lib: debugobjects: add global free list and the counter
From: Yang Shi @ 2018-02-12 18:48 UTC
  To: Thomas Gleixner; +Cc: longman, linux-kernel



On 2/12/18 8:25 AM, Thomas Gleixner wrote:
> On Tue, 6 Feb 2018, Yang Shi wrote:
>> +	/*
>> +	 * Reuse objs from the global free list, they will be reinitialized
>> +	 * when allocating
>> +	 */
>> +	while (obj_nr_tofree > 0 && (obj_pool_free < obj_pool_min_free)) {
>> +		raw_spin_lock_irqsave(&pool_lock, flags);
>> +		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
> This is racy vs. the worker thread. Assume obj_nr_tofree = 1:
>
> CPU0					CPU1
> worker
>     lock(&pool_lock);			while (obj_nr_tofree > 0 && ...) {
>       obj = hlist_entry(obj_to_free);	  lock(&pool_lock);
>       hlist_del(obj);			
>       obj_nr_tofree--;
>       ...
>     unlock(&pool_lock);
> 					  obj = hlist_entry(obj_to_free);
> 					  hlist_del(obj); <------- NULL pointer dereference
>
> Not what you want, right? The counter or the list head need to be rechecked
> after the lock is acquired.

Yes, you are right. Will fix the race in newer version.

Regards,
Yang

>
>> +		hlist_del(&obj->node);
>> +		obj_nr_tofree--;
>> +		hlist_add_head(&obj->node, &obj_pool);
>> +		obj_pool_free++;
>> +		raw_spin_unlock_irqrestore(&pool_lock, flags);
>> +	}
> Thanks,
>
> 	tglx


* Re: [PATCH 2/4 v6] lib: debugobjects: add global free list and the counter
From: Yang Shi @ 2018-02-12 18:53 UTC
  To: Thomas Gleixner; +Cc: longman, linux-kernel



On 2/12/18 7:54 AM, Thomas Gleixner wrote:
> On Mon, 12 Feb 2018, Thomas Gleixner wrote:
>> On Tue, 6 Feb 2018, Yang Shi wrote:
>>> +	/*
>>> +	 * The objs on the pool list might be allocated before the work is
>>> +	 * run, so recheck if pool list it full or not, if not fill pool
>>> +	 * list from the global free list
>>> +	 */
>>> +	while (obj_pool_free < debug_objects_pool_size) {
>>> +		if (obj_nr_tofree <= 0)
>>> +			break;
>>> +
>>> +		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
>>> +		hlist_del(&obj->node);
>>> +		hlist_add_head(&obj->node, &obj_pool);
>>> +		obj_pool_free++;
>>> +		obj_nr_tofree--;
>>> +	}
>>> +
>>> +	/*
>>> +	 * pool list is already full, and there are still objs on the free list,
>>> +	 * move remaining free objs to a separate list to free the memory later.
>>> +	 */
>>> +	if (obj_nr_tofree > 0) {
>>> +		hlist_move_list(&obj_to_free, &tofree);
>>> +		obj_nr_tofree = 0;
>>> +	}
>>> +
>>>   	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
>>>   		for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
>>>   			objs[i] = hlist_entry(obj_pool.first,
>> This whole section is now pointless and can be removed. There is no way
>> that this code path can be entered after this change. Surplus objects are
>> on the obj_to_free list and not on the obj_pool itself.

Yes, it is correct. I think the whole ODEBUG_FREE_BATCH thing can be 
removed too.

Will do it in the next version.

Regards,
Yang

> Actually not in this patch, but in the next one.
>
> Thanks,
>
> 	tglx


* Re: [PATCH 2/4 v6] lib: debugobjects: add global free list and the counter
From: Thomas Gleixner @ 2018-02-13 10:02 UTC
  To: Yang Shi; +Cc: longman, linux-kernel

On Mon, 12 Feb 2018, Yang Shi wrote:
> On 2/12/18 8:25 AM, Thomas Gleixner wrote:
> > On Tue, 6 Feb 2018, Yang Shi wrote:
> > > +	/*
> > > +	 * Reuse objs from the global free list, they will be reinitialized
> > > +	 * when allocating
> > > +	 */
> > > +	while (obj_nr_tofree > 0 && (obj_pool_free < obj_pool_min_free)) {
> > > +		raw_spin_lock_irqsave(&pool_lock, flags);
> > > +		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
> > This is racy vs. the worker thread. Assume obj_nr_tofree = 1:
> > 
> > CPU0					CPU1
> > worker
> >     lock(&pool_lock);			while (obj_nr_tofree > 0 && ...) {
> >       obj = hlist_entry(obj_to_free);	  lock(&pool_lock);
> >       hlist_del(obj);			
> >       obj_nr_tofree--;
> >       ...
> >     unlock(&pool_lock);
> > 					  obj = hlist_entry(obj_to_free);
> > 					  hlist_del(obj); <------- NULL
> > pointer dereference
> > 
> > Not what you want, right? The counter or the list head need to be rechecked
> > after the lock is acquired.
> 
> Yes, you are right. Will fix the race in newer version.

I fixed up all the minor issues with this series and applied it to:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/debugobjects

Please double check the result.

Thanks,

	tglx


* [tip:core/debugobjects] debugobjects: Export max loops counter
From: tip-bot for Yang Shi @ 2018-02-13 10:03 UTC
  To: linux-tip-commits; +Cc: linux-kernel, yang.shi, hpa, mingo, tglx

Commit-ID:  bd9dcd046509cd5355605e43791eacee8bf5e40f
Gitweb:     https://git.kernel.org/tip/bd9dcd046509cd5355605e43791eacee8bf5e40f
Author:     Yang Shi <yang.shi@linux.alibaba.com>
AuthorDate: Tue, 6 Feb 2018 07:18:25 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 13 Feb 2018 10:58:58 +0100

debugobjects: Export max loops counter

__debug_check_no_obj_freed() can be an expensive operation depending on the
size of memory freed. It already exports the maximum chain walk length via
debugfs, but this only records the maximum of a single memory chunk.

However, there is no information about the total number of objects inspected
for a __debug_check_no_obj_freed() operation, which might be significantly
larger when a huge memory region is freed.

Aggregate the number of objects inspected for a single invocation of
__debug_check_no_obj_freed() and export it via debugfs.

The resulting output of /sys/kernel/debug/debug_objects/stats looks like:

max_chain     :121
max_checked   :543267
warnings      :0
fixups        :0
pool_free     :1764
pool_min_free :341
pool_used     :86438
pool_max_used :268887
objs_allocated:6068254
objs_freed    :5981076

[ tglx: Renamed the variable to max_checked and adjusted changelog ]

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: longman@redhat.com
Link: https://lkml.kernel.org/r/1517872708-24207-2-git-send-email-yang.shi@linux.alibaba.com

---
 lib/debugobjects.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 2f5349c..f6d57a1 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -50,6 +50,7 @@ static int			obj_pool_max_used;
 static struct kmem_cache	*obj_cache;
 
 static int			debug_objects_maxchain __read_mostly;
+static int			debug_objects_maxchecked __read_mostly;
 static int			debug_objects_fixups __read_mostly;
 static int			debug_objects_warnings __read_mostly;
 static int			debug_objects_enabled __read_mostly
@@ -720,7 +721,7 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 	enum debug_obj_state state;
 	struct debug_bucket *db;
 	struct debug_obj *obj;
-	int cnt;
+	int cnt, objs_checked = 0;
 
 	saddr = (unsigned long) address;
 	eaddr = saddr + size;
@@ -765,7 +766,12 @@ repeat:
 
 		if (cnt > debug_objects_maxchain)
 			debug_objects_maxchain = cnt;
+
+		objs_checked += cnt;
 	}
+
+	if (objs_checked > debug_objects_maxchecked)
+		debug_objects_maxchecked = objs_checked;
 }
 
 void debug_check_no_obj_freed(const void *address, unsigned long size)
@@ -780,6 +786,7 @@ void debug_check_no_obj_freed(const void *address, unsigned long size)
 static int debug_stats_show(struct seq_file *m, void *v)
 {
 	seq_printf(m, "max_chain     :%d\n", debug_objects_maxchain);
+	seq_printf(m, "max_checked   :%d\n", debug_objects_maxchecked);
 	seq_printf(m, "warnings      :%d\n", debug_objects_warnings);
 	seq_printf(m, "fixups        :%d\n", debug_objects_fixups);
 	seq_printf(m, "pool_free     :%d\n", obj_pool_free);


* [tip:core/debugobjects] debugobjects: Add global free list and the counter
From: tip-bot for Yang Shi @ 2018-02-13 10:04 UTC
  To: linux-tip-commits; +Cc: hpa, yang.shi, mingo, tglx, linux-kernel

Commit-ID:  36c4ead6f6dfbbe777d3d7e9cc8702530b71a94f
Gitweb:     https://git.kernel.org/tip/36c4ead6f6dfbbe777d3d7e9cc8702530b71a94f
Author:     Yang Shi <yang.shi@linux.alibaba.com>
AuthorDate: Tue, 6 Feb 2018 07:18:26 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 13 Feb 2018 10:58:58 +0100

debugobjects: Add global free list and the counter

free_object() adds objects to the pool list and schedules work when the
pool list is larger than the pool size.  The worker handles the actual
kfree() of the object by iterating the pool list until the pool size is
below the maximum pool size again.

To iterate the pool list, pool_lock has to be held and the objects which
should be freed need to be put into temporary storage so pool_lock can be
dropped for the actual kmem_cache_free() invocation. That's a pointless and
expensive exercise if there is a large number of objects to free.

In such a case it's better to evaluate the fill level of the pool in
free_object() and queue the object either on the pool list or, if the pool
is full, on a separate global free list.

The worker can then do the following simpler operation:

  - Move objects back from the global free list to the pool list if the
    pool list is no longer full.

  - Remove the remaining objects in a single list move operation from the
    global free list and do the kmem_cache_free() operation lockless from
    the temporary list head.

In fill_pool() the global free list is checked as well to avoid real
allocations from the kmem cache.

Add the necessary list head and a counter for the number of objects on the
global free list and export that counter via debugfs:

max_chain     :79
max_loops     :8147
warnings      :0
fixups        :0
pool_free     :1697
pool_min_free :346
pool_used     :15356
pool_max_used :23933
on_free_list  :39
objs_allocated:32617
objs_freed    :16588

Nothing queues objects on the global free list yet. This happens in a
follow-up change.

[ tglx: Simplified implementation and massaged changelog ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: longman@redhat.com
Link: https://lkml.kernel.org/r/1517872708-24207-3-git-send-email-yang.shi@linux.alibaba.com

---
 lib/debugobjects.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index f6d57a1..e31273b 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -42,11 +42,14 @@ static struct debug_obj		obj_static_pool[ODEBUG_POOL_SIZE] __initdata;
 static DEFINE_RAW_SPINLOCK(pool_lock);
 
 static HLIST_HEAD(obj_pool);
+static HLIST_HEAD(obj_to_free);
 
 static int			obj_pool_min_free = ODEBUG_POOL_SIZE;
 static int			obj_pool_free = ODEBUG_POOL_SIZE;
 static int			obj_pool_used;
 static int			obj_pool_max_used;
+/* The number of objs on the global free list */
+static int			obj_nr_tofree;
 static struct kmem_cache	*obj_cache;
 
 static int			debug_objects_maxchain __read_mostly;
@@ -97,12 +100,32 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
 static void fill_pool(void)
 {
 	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
-	struct debug_obj *new;
+	struct debug_obj *new, *obj;
 	unsigned long flags;
 
 	if (likely(obj_pool_free >= debug_objects_pool_min_level))
 		return;
 
+	/*
+	 * Reuse objs from the global free list; they will be reinitialized
+	 * when allocating.
+	 */
+	while (obj_nr_tofree && (obj_pool_free < obj_pool_min_free)) {
+		raw_spin_lock_irqsave(&pool_lock, flags);
+		/*
+		 * Recheck with the lock held as the worker thread might have
+		 * won the race and freed the global free list already.
+		 */
+		if (obj_nr_tofree) {
+			obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
+			hlist_del(&obj->node);
+			obj_nr_tofree--;
+			hlist_add_head(&obj->node, &obj_pool);
+			obj_pool_free++;
+		}
+		raw_spin_unlock_irqrestore(&pool_lock, flags);
+	}
+
 	if (unlikely(!obj_cache))
 		return;
 
@@ -186,11 +209,38 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
 static void free_obj_work(struct work_struct *work)
 {
 	struct debug_obj *objs[ODEBUG_FREE_BATCH];
+	struct hlist_node *tmp;
+	struct debug_obj *obj;
 	unsigned long flags;
 	int i;
+	HLIST_HEAD(tofree);
 
 	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
 		return;
+
+	/*
+	 * The objs on the pool list might be allocated before the work is
+	 * run, so recheck if the pool list is full or not; if not, fill
+	 * the pool list from the global free list
+	 */
+	while (obj_nr_tofree && obj_pool_free < debug_objects_pool_size) {
+		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
+		hlist_del(&obj->node);
+		hlist_add_head(&obj->node, &obj_pool);
+		obj_pool_free++;
+		obj_nr_tofree--;
+	}
+
+	/*
+	 * Pool list is already full and there are still objs on the free
+	 * list. Move remaining free objs to a temporary list to free the
+	 * memory outside the pool_lock held region.
+	 */
+	if (obj_nr_tofree) {
+		hlist_move_list(&obj_to_free, &tofree);
+		obj_nr_tofree = 0;
+	}
+
 	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
 		for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
 			objs[i] = hlist_entry(obj_pool.first,
@@ -211,6 +261,11 @@ static void free_obj_work(struct work_struct *work)
 			return;
 	}
 	raw_spin_unlock_irqrestore(&pool_lock, flags);
+
+	hlist_for_each_entry_safe(obj, tmp, &tofree, node) {
+		hlist_del(&obj->node);
+		kmem_cache_free(obj_cache, obj);
+	}
 }
 
 /*
@@ -793,6 +848,7 @@ static int debug_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "pool_min_free :%d\n", obj_pool_min_free);
 	seq_printf(m, "pool_used     :%d\n", obj_pool_used);
 	seq_printf(m, "pool_max_used :%d\n", obj_pool_max_used);
+	seq_printf(m, "on_free_list  :%d\n", obj_nr_tofree);
 	seq_printf(m, "objs_allocated:%d\n", debug_objects_allocated);
 	seq_printf(m, "objs_freed    :%d\n", debug_objects_freed);
 	return 0;


* [tip:core/debugobjects] debugobjects: Use global free list in free_object()
From: tip-bot for Yang Shi @ 2018-02-13 10:04 UTC
  To: linux-tip-commits; +Cc: hpa, yang.shi, mingo, tglx, linux-kernel

Commit-ID:  636e1970fd7deaa0d0ee0dfb6ac65fbd690b32d2
Gitweb:     https://git.kernel.org/tip/636e1970fd7deaa0d0ee0dfb6ac65fbd690b32d2
Author:     Yang Shi <yang.shi@linux.alibaba.com>
AuthorDate: Tue, 6 Feb 2018 07:18:27 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 13 Feb 2018 10:58:59 +0100

debugobjects: Use global free list in free_object()

The newly added global free list makes it possible to avoid lengthy
pool_list iterations in free_obj_work() by putting objects either on the
pool list, when the fill level of the pool is below the maximum, or on the
global free list immediately.

As the pool is now guaranteed to never exceed the maximum fill level, this
allows removing the batch removal from the pool list in free_obj_work().

Split free_object() into two parts, so the actual queueing function can be
reused without invoking schedule_work() on every invocation.

[ tglx: Remove the batch removal from pool list and massage changelog ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: longman@redhat.com
Link: https://lkml.kernel.org/r/1517872708-24207-4-git-send-email-yang.shi@linux.alibaba.com

---
 lib/debugobjects.c | 63 +++++++++++++++++++-----------------------------------
 1 file changed, 22 insertions(+), 41 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index e31273b..3e79c10 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -201,18 +201,13 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
  * workqueue function to free objects.
  *
  * To reduce contention on the global pool_lock, the actual freeing of
- * debug objects will be delayed if the pool_lock is busy. We also free
- * the objects in a batch of 4 for each lock/unlock cycle.
+ * debug objects will be delayed if the pool_lock is busy.
  */
-#define ODEBUG_FREE_BATCH	4
-
 static void free_obj_work(struct work_struct *work)
 {
-	struct debug_obj *objs[ODEBUG_FREE_BATCH];
 	struct hlist_node *tmp;
 	struct debug_obj *obj;
 	unsigned long flags;
-	int i;
 	HLIST_HEAD(tofree);
 
 	if (!raw_spin_trylock_irqsave(&pool_lock, flags))
@@ -240,26 +235,6 @@ static void free_obj_work(struct work_struct *work)
 		hlist_move_list(&obj_to_free, &tofree);
 		obj_nr_tofree = 0;
 	}
-
-	while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
-		for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
-			objs[i] = hlist_entry(obj_pool.first,
-					      typeof(*objs[0]), node);
-			hlist_del(&objs[i]->node);
-		}
-
-		obj_pool_free -= ODEBUG_FREE_BATCH;
-		debug_objects_freed += ODEBUG_FREE_BATCH;
-		/*
-		 * We release pool_lock across kmem_cache_free() to
-		 * avoid contention on pool_lock.
-		 */
-		raw_spin_unlock_irqrestore(&pool_lock, flags);
-		for (i = 0; i < ODEBUG_FREE_BATCH; i++)
-			kmem_cache_free(obj_cache, objs[i]);
-		if (!raw_spin_trylock_irqsave(&pool_lock, flags))
-			return;
-	}
 	raw_spin_unlock_irqrestore(&pool_lock, flags);
 
 	hlist_for_each_entry_safe(obj, tmp, &tofree, node) {
@@ -268,27 +243,33 @@ static void free_obj_work(struct work_struct *work)
 	}
 }
 
-/*
- * Put the object back into the pool and schedule work to free objects
- * if necessary.
- */
-static void free_object(struct debug_obj *obj)
+static bool __free_object(struct debug_obj *obj)
 {
 	unsigned long flags;
-	int sched = 0;
+	bool work;
 
 	raw_spin_lock_irqsave(&pool_lock, flags);
-	/*
-	 * schedule work when the pool is filled and the cache is
-	 * initialized:
-	 */
-	if (obj_pool_free > debug_objects_pool_size && obj_cache)
-		sched = 1;
-	hlist_add_head(&obj->node, &obj_pool);
-	obj_pool_free++;
+	work = (obj_pool_free > debug_objects_pool_size) && obj_cache;
 	obj_pool_used--;
+
+	if (work) {
+		obj_nr_tofree++;
+		hlist_add_head(&obj->node, &obj_to_free);
+	} else {
+		obj_pool_free++;
+		hlist_add_head(&obj->node, &obj_pool);
+	}
 	raw_spin_unlock_irqrestore(&pool_lock, flags);
-	if (sched)
+	return work;
+}
+
+/*
+ * Put the object back into the pool and schedule work to free objects
+ * if necessary.
+ */
+static void free_object(struct debug_obj *obj)
+{
+	if (__free_object(obj))
 		schedule_work(&debug_obj_work);
 }
 


* [tip:core/debugobjects] debugobjects: Use global free list in __debug_check_no_obj_freed()
From: tip-bot for Yang Shi @ 2018-02-13 10:05 UTC
  To: linux-tip-commits; +Cc: yang.shi, mingo, linux-kernel, hpa, tglx

Commit-ID:  1ea9b98b007a662e402551a41a4413becad40a65
Gitweb:     https://git.kernel.org/tip/1ea9b98b007a662e402551a41a4413becad40a65
Author:     Yang Shi <yang.shi@linux.alibaba.com>
AuthorDate: Tue, 6 Feb 2018 07:18:28 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 13 Feb 2018 10:59:18 +0100

debugobjects: Use global free list in __debug_check_no_obj_freed()

__debug_check_no_obj_freed() iterates over the to be freed memory region in
chunks and iterates over the corresponding hash bucket list for each
chunk. This can accumulate to hundreds of thousands of checked objects. In the
worst case this can trigger the soft lockup detector:

NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s!
CPU: 15 PID: 110342 Comm: stress-ng-getde
Call Trace:
  [<ffffffff8141177e>] debug_check_no_obj_freed+0x13e/0x220
  [<ffffffff811f8751>] __free_pages_ok+0x1f1/0x5c0
  [<ffffffff811fa785>] __free_pages+0x25/0x40
  [<ffffffff812638db>] __free_slab+0x19b/0x270
  [<ffffffff812639e9>] discard_slab+0x39/0x50
  [<ffffffff812679f7>] __slab_free+0x207/0x270
  [<ffffffff81269966>] ___cache_free+0xa6/0xb0
  [<ffffffff8126c267>] qlist_free_all+0x47/0x80
  [<ffffffff8126c5a9>] quarantine_reduce+0x159/0x190
  [<ffffffff8126b3bf>] kasan_kmalloc+0xaf/0xc0
  [<ffffffff8126b8a2>] kasan_slab_alloc+0x12/0x20
  [<ffffffff81265e8a>] kmem_cache_alloc+0xfa/0x360
  [<ffffffff812abc8f>] ? getname_flags+0x4f/0x1f0
  [<ffffffff812abc8f>] getname_flags+0x4f/0x1f0
  [<ffffffff812abe42>] getname+0x12/0x20
  [<ffffffff81298da9>] do_sys_open+0xf9/0x210
  [<ffffffff81298ede>] SyS_open+0x1e/0x20
  [<ffffffff817d6e01>] entry_SYSCALL_64_fastpath+0x1f/0xc2

The code path might be called in either atomic or non-atomic context, but
in_atomic() can't tell if the current context is atomic or not on a
PREEMPT=n kernel, so cond_resched() can't be used to prevent the
softlockup.

Utilize the global free list to shorten the loop execution time.

[ tglx: Massaged changelog ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: longman@redhat.com
Link: https://lkml.kernel.org/r/1517872708-24207-5-git-send-email-yang.shi@linux.alibaba.com
---
 lib/debugobjects.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 3e79c10..faab2c4 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -751,13 +751,13 @@ EXPORT_SYMBOL_GPL(debug_object_active_state);
 static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 {
 	unsigned long flags, oaddr, saddr, eaddr, paddr, chunks;
-	struct hlist_node *tmp;
-	HLIST_HEAD(freelist);
 	struct debug_obj_descr *descr;
 	enum debug_obj_state state;
 	struct debug_bucket *db;
+	struct hlist_node *tmp;
 	struct debug_obj *obj;
 	int cnt, objs_checked = 0;
+	bool work = false;
 
 	saddr = (unsigned long) address;
 	eaddr = saddr + size;
@@ -788,18 +788,12 @@ repeat:
 				goto repeat;
 			default:
 				hlist_del(&obj->node);
-				hlist_add_head(&obj->node, &freelist);
+				work |= __free_object(obj);
 				break;
 			}
 		}
 		raw_spin_unlock_irqrestore(&db->lock, flags);
 
-		/* Now free them */
-		hlist_for_each_entry_safe(obj, tmp, &freelist, node) {
-			hlist_del(&obj->node);
-			free_object(obj);
-		}
-
 		if (cnt > debug_objects_maxchain)
 			debug_objects_maxchain = cnt;
 
@@ -808,6 +802,10 @@ repeat:
 
 	if (objs_checked > debug_objects_maxchecked)
 		debug_objects_maxchecked = objs_checked;
+
+	/* Schedule work to actually kmem_cache_free() objects */
+	if (work)
+		schedule_work(&debug_obj_work);
 }
 
 void debug_check_no_obj_freed(const void *address, unsigned long size)


* Re: [PATCH 2/4 v6] lib: debugobjects: add global free list and the counter
From: Yang Shi @ 2018-02-14  5:33 UTC
  To: Thomas Gleixner; +Cc: longman, linux-kernel



On 2/13/18 2:02 AM, Thomas Gleixner wrote:
> On Mon, 12 Feb 2018, Yang Shi wrote:
>> On 2/12/18 8:25 AM, Thomas Gleixner wrote:
>>> On Tue, 6 Feb 2018, Yang Shi wrote:
>>>> +	/*
>>>> +	 * Reuse objs from the global free list, they will be reinitialized
>>>> +	 * when allocating
>>>> +	 */
>>>> +	while (obj_nr_tofree > 0 && (obj_pool_free < obj_pool_min_free)) {
>>>> +		raw_spin_lock_irqsave(&pool_lock, flags);
>>>> +		obj = hlist_entry(obj_to_free.first, typeof(*obj), node);
>>> This is racy vs. the worker thread. Assume obj_nr_tofree = 1:
>>>
>>> CPU0					CPU1
>>> worker
>>>      lock(&pool_lock);			while (obj_nr_tofree > 0 && ...) {
>>>        obj = hlist_entry(obj_to_free);	  lock(&pool_lock);
>>>        hlist_del(obj);			
>>>        obj_nr_tofree--;
>>>        ...
>>>      unlock(&pool_lock);
>>> 					  obj = hlist_entry(obj_to_free);
>>> 					  hlist_del(obj); <------- NULL
>>> pointer dereference
>>>
>>> Not what you want, right? The counter or the list head need to be rechecked
>>> after the lock is acquired.
>> Yes, you are right. Will fix the race in newer version.
> I fixed up all the minor issues with this series and applied it to:
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/debugobjects
>
> Please double check the result.

Thanks a lot. It looks good.

Regards,
Yang

>
> Thanks,
>
> 	tglx

