* [RESEND PATCH v2 0/3] debugobjects: Reduce global pool_lock contention
@ 2017-01-05 20:17 Waiman Long
2017-01-05 20:17 ` [RESEND PATCH v2 1/3] debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done Waiman Long
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Waiman Long @ 2017-01-05 20:17 UTC (permalink / raw)
To: Thomas Gleixner, Andrew Morton
Cc: linux-kernel, Du Changbin, Christian Borntraeger, Jan Stancek,
Waiman Long
v1->v2:
- Move patch 2 in front of patch 1.
- Fix merge conflict with linux-next.
This patchset aims to reduce contention on the global pool_lock
while improving performance at the same time. It addresses the
following soft lockup seen with a debug kernel on some large SMP
systems:
NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
...
RIP: 0010:[<ffffffff817c216b>] [<ffffffff817c216b>]
_raw_spin_unlock_irqrestore+0x3b/0x60
...
Call Trace:
[<ffffffff813f40d1>] free_object+0x81/0xb0
[<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
[<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[<ffffffff81284996>] ? file_free_rcu+0x36/0x60
[<ffffffff81251712>] kmem_cache_free+0xd2/0x380
[<ffffffff81284960>] ? fput+0x90/0x90
[<ffffffff81284996>] file_free_rcu+0x36/0x60
[<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
[<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
[<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
[<ffffffff810c59d1>] kthread+0x101/0x120
[<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[<ffffffff817c2d32>] ret_from_fork+0x22/0x50
On an 8-socket IvyBridge-EX system (120 cores, 240 threads), the
elapsed time of a parallel kernel build (make -j 240) dropped from
7m57s with a stock 4.9-rc7 kernel to 7m19s with a patched one,
roughly an 8% reduction. There was also about a 10X reduction in the
number of debug objects allocated from or freed to the kmem_cache
during the kernel build.
Waiman Long (3):
debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
debugobjects: Scale thresholds with # of CPUs
debugobjects: Reduce contention on the global pool_lock
lib/debugobjects.c | 57 ++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 45 insertions(+), 12 deletions(-)
--
1.8.3.1
* [RESEND PATCH v2 1/3] debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
2017-01-05 20:17 [RESEND PATCH v2 0/3] debugobjects: Reduce global pool_lock contention Waiman Long
@ 2017-01-05 20:17 ` Waiman Long
2017-02-04 16:22 ` [tip:core/debugobjects] " tip-bot for Waiman Long
2017-01-05 20:17 ` [RESEND PATCH v2 2/3] debugobjects: Scale thresholds with # of CPUs Waiman Long
2017-01-05 20:17 ` [RESEND PATCH v2 3/3] debugobjects: Reduce contention on the global pool_lock Waiman Long
2 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2017-01-05 20:17 UTC (permalink / raw)
To: Thomas Gleixner, Andrew Morton
Cc: linux-kernel, Du Changbin, Christian Borntraeger, Jan Stancek,
Waiman Long
New debugfs stat counters are added to track the numbers of
kmem_cache_alloc() and kmem_cache_free() function calls to get a
sense of how the internal debug objects cache management is performing.
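For reference, a rough sketch of what the debugfs stats output could
then look like (values are made up for illustration; the stats file is
the existing debugobjects entry, typically
/sys/kernel/debug/debug_objects/debug_stats when debugfs is mounted at
/sys/kernel/debug):
  ...
  pool_min_free :256
  pool_used     :1842
  pool_max_used :2284
  objects_alloc :15273
  objects_freed :14098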
Signed-off-by: Waiman Long <longman@redhat.com>
---
lib/debugobjects.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 04c1ef7..d78673e 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -55,6 +55,12 @@ struct debug_bucket {
static struct debug_obj_descr *descr_test __read_mostly;
+/*
+ * Track numbers of kmem_cache_alloc and kmem_cache_free done.
+ */
+static int debug_objects_alloc;
+static int debug_objects_freed;
+
static void free_obj_work(struct work_struct *work);
static DECLARE_WORK(debug_obj_work, free_obj_work);
@@ -102,6 +108,7 @@ static void fill_pool(void)
raw_spin_lock_irqsave(&pool_lock, flags);
hlist_add_head(&new->node, &obj_pool);
+ debug_objects_alloc++;
obj_pool_free++;
raw_spin_unlock_irqrestore(&pool_lock, flags);
}
@@ -173,6 +180,7 @@ static void free_obj_work(struct work_struct *work)
obj = hlist_entry(obj_pool.first, typeof(*obj), node);
hlist_del(&obj->node);
obj_pool_free--;
+ debug_objects_freed++;
/*
* We release pool_lock across kmem_cache_free() to
* avoid contention on pool_lock.
@@ -758,6 +766,8 @@ static int debug_stats_show(struct seq_file *m, void *v)
seq_printf(m, "pool_min_free :%d\n", obj_pool_min_free);
seq_printf(m, "pool_used :%d\n", obj_pool_used);
seq_printf(m, "pool_max_used :%d\n", obj_pool_max_used);
+ seq_printf(m, "objects_alloc :%d\n", debug_objects_alloc);
+ seq_printf(m, "objects_freed :%d\n", debug_objects_freed);
return 0;
}
--
1.8.3.1
* [RESEND PATCH v2 2/3] debugobjects: Scale thresholds with # of CPUs
2017-01-05 20:17 [RESEND PATCH v2 0/3] debugobjects: Reduce global pool_lock contention Waiman Long
2017-01-05 20:17 ` [RESEND PATCH v2 1/3] debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done Waiman Long
@ 2017-01-05 20:17 ` Waiman Long
2017-02-04 16:23 ` [tip:core/debugobjects] " tip-bot for Waiman Long
2017-01-05 20:17 ` [RESEND PATCH v2 3/3] debugobjects: Reduce contention on the global pool_lock Waiman Long
2 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2017-01-05 20:17 UTC (permalink / raw)
To: Thomas Gleixner, Andrew Morton
Cc: linux-kernel, Du Changbin, Christian Borntraeger, Jan Stancek,
Waiman Long
On large SMP systems with hundreds of CPUs, the current thresholds
for allocating and freeing debug objects (256 and 1024 respectively)
may not work well. This can cause a lot of needless calls to
kmem_cache_alloc() and kmem_cache_free() on those systems.
To alleviate this thrashing problem, the object freeing threshold
is now increased to "1024 + # of CPUs * 32", and the object
allocation threshold to "256 + # of CPUs * 4". That should make the
debug objects subsystem scale better with the number of CPUs
available in the system.
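For illustration (my arithmetic, assuming num_possible_cpus() returns
240 on the 8-socket test system mentioned in the cover letter), the
scaled thresholds then work out to:
  debug_objects_pool_min_level = 256  + 240 * 4  = 1216  (refill pool below this)
  debug_objects_pool_size      = 1024 + 240 * 32 = 8704  (free back above this)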
Signed-off-by: Waiman Long <longman@redhat.com>
---
lib/debugobjects.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index d78673e..dc78217 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -52,7 +52,10 @@ struct debug_bucket {
static int debug_objects_warnings __read_mostly;
static int debug_objects_enabled __read_mostly
= CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT;
-
+static int debug_objects_pool_size __read_mostly
+ = ODEBUG_POOL_SIZE;
+static int debug_objects_pool_min_level __read_mostly
+ = ODEBUG_POOL_MIN_LEVEL;
static struct debug_obj_descr *descr_test __read_mostly;
/*
@@ -94,13 +97,13 @@ static void fill_pool(void)
struct debug_obj *new;
unsigned long flags;
- if (likely(obj_pool_free >= ODEBUG_POOL_MIN_LEVEL))
+ if (likely(obj_pool_free >= debug_objects_pool_min_level))
return;
if (unlikely(!obj_cache))
return;
- while (obj_pool_free < ODEBUG_POOL_MIN_LEVEL) {
+ while (obj_pool_free < debug_objects_pool_min_level) {
new = kmem_cache_zalloc(obj_cache, gfp);
if (!new)
@@ -176,7 +179,7 @@ static void free_obj_work(struct work_struct *work)
unsigned long flags;
raw_spin_lock_irqsave(&pool_lock, flags);
- while (obj_pool_free > ODEBUG_POOL_SIZE) {
+ while (obj_pool_free > debug_objects_pool_size) {
obj = hlist_entry(obj_pool.first, typeof(*obj), node);
hlist_del(&obj->node);
obj_pool_free--;
@@ -206,7 +209,7 @@ static void free_object(struct debug_obj *obj)
* schedule work when the pool is filled and the cache is
* initialized:
*/
- if (obj_pool_free > ODEBUG_POOL_SIZE && obj_cache)
+ if (obj_pool_free > debug_objects_pool_size && obj_cache)
sched = 1;
hlist_add_head(&obj->node, &obj_pool);
obj_pool_free++;
@@ -1126,4 +1129,11 @@ void __init debug_objects_mem_init(void)
pr_warn("out of memory.\n");
} else
debug_objects_selftest();
+
+ /*
+ * Increase the thresholds for allocating and freeing objects
+ * according to the number of possible CPUs available in the system.
+ */
+ debug_objects_pool_size += num_possible_cpus() * 32;
+ debug_objects_pool_min_level += num_possible_cpus() * 4;
}
--
1.8.3.1
* [RESEND PATCH v2 3/3] debugobjects: Reduce contention on the global pool_lock
2017-01-05 20:17 [RESEND PATCH v2 0/3] debugobjects: Reduce global pool_lock contention Waiman Long
2017-01-05 20:17 ` [RESEND PATCH v2 1/3] debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done Waiman Long
2017-01-05 20:17 ` [RESEND PATCH v2 2/3] debugobjects: Scale thresholds with # of CPUs Waiman Long
@ 2017-01-05 20:17 ` Waiman Long
2017-02-04 16:23 ` [tip:core/debugobjects] " tip-bot for Waiman Long
2017-02-05 16:13 ` tip-bot for Waiman Long
2 siblings, 2 replies; 12+ messages in thread
From: Waiman Long @ 2017-01-05 20:17 UTC (permalink / raw)
To: Thomas Gleixner, Andrew Morton
Cc: linux-kernel, Du Changbin, Christian Borntraeger, Jan Stancek,
Waiman Long
On a large SMP system with many CPUs, the global pool_lock may become
a performance bottleneck as all the CPUs that need to allocate or
free debug objects have to take the lock. That can sometimes cause
soft lockups like:
NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
...
RIP: 0010:[<ffffffff817c216b>] [<ffffffff817c216b>]
_raw_spin_unlock_irqrestore+0x3b/0x60
...
Call Trace:
[<ffffffff813f40d1>] free_object+0x81/0xb0
[<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
[<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[<ffffffff81284996>] ? file_free_rcu+0x36/0x60
[<ffffffff81251712>] kmem_cache_free+0xd2/0x380
[<ffffffff81284960>] ? fput+0x90/0x90
[<ffffffff81284996>] file_free_rcu+0x36/0x60
[<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
[<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
[<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
[<ffffffff810c59d1>] kthread+0x101/0x120
[<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[<ffffffff817c2d32>] ret_from_fork+0x22/0x50
To reduce the amount of contention on the pool_lock, the actual
kmem_cache_free() of the debug objects will be delayed if the
pool_lock is busy. This temporarily increases the number of free
objects available in the free pool when the system is busy; if the
lock cannot be taken, the worker simply returns and the freeing is
retried the next time the worker is scheduled from free_object().
As a result, the number of kmem_cache allocations and frees should
be reduced.
This patch also groups the freeing of debug objects into batches of
4 to reduce the total number of lock/unlock cycles.
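As a rough illustration of the batching win (my arithmetic, not a
measurement), draining 1024 surplus objects needs
  old: ~1 pool_lock unlock/lock cycle per object           -> ~1024 cycles
  new:  1 cycle per batch of ODEBUG_FREE_BATCH (4) objects ->   256 cycles
at the cost of a small on-stack array of 4 object pointers in
free_obj_work().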
Signed-off-by: Waiman Long <longman@redhat.com>
---
lib/debugobjects.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index dc78217..5476bbe 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -172,25 +172,38 @@ static struct debug_obj *lookup_object(void *addr, struct debug_bucket *b)
/*
* workqueue function to free objects.
+ *
+ * To reduce contention on the global pool_lock, the actual freeing of
+ * debug objects will be delayed if the pool_lock is busy. We also free
+ * the objects in a batch of 4 for each lock/unlock cycle.
*/
+#define ODEBUG_FREE_BATCH 4
static void free_obj_work(struct work_struct *work)
{
- struct debug_obj *obj;
+ struct debug_obj *objs[ODEBUG_FREE_BATCH];
unsigned long flags;
+ int i;
- raw_spin_lock_irqsave(&pool_lock, flags);
- while (obj_pool_free > debug_objects_pool_size) {
- obj = hlist_entry(obj_pool.first, typeof(*obj), node);
- hlist_del(&obj->node);
- obj_pool_free--;
- debug_objects_freed++;
+ if (!raw_spin_trylock_irqsave(&pool_lock, flags))
+ return;
+ while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
+ for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
+ objs[i] = hlist_entry(obj_pool.first,
+ typeof(*objs[0]), node);
+ hlist_del(&objs[i]->node);
+ }
+
+ obj_pool_free -= ODEBUG_FREE_BATCH;
+ debug_objects_freed += ODEBUG_FREE_BATCH;
/*
* We release pool_lock across kmem_cache_free() to
* avoid contention on pool_lock.
*/
raw_spin_unlock_irqrestore(&pool_lock, flags);
- kmem_cache_free(obj_cache, obj);
- raw_spin_lock_irqsave(&pool_lock, flags);
+ for (i = 0; i < ODEBUG_FREE_BATCH; i++)
+ kmem_cache_free(obj_cache, objs[i]);
+ if (!raw_spin_trylock_irqsave(&pool_lock, flags))
+ return;
}
raw_spin_unlock_irqrestore(&pool_lock, flags);
}
--
1.8.3.1
* [tip:core/debugobjects] debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
2017-01-05 20:17 ` [RESEND PATCH v2 1/3] debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done Waiman Long
@ 2017-02-04 16:22 ` tip-bot for Waiman Long
2017-02-05 10:08 ` Ingo Molnar
0 siblings, 1 reply; 12+ messages in thread
From: tip-bot for Waiman Long @ 2017-02-04 16:22 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, longman, mingo, akpm, jstancek, changbin.du, hpa,
borntraeger, tglx
Commit-ID: c4b73aabd0989d93b82894417ae501690bd1db5e
Gitweb: http://git.kernel.org/tip/c4b73aabd0989d93b82894417ae501690bd1db5e
Author: Waiman Long <longman@redhat.com>
AuthorDate: Thu, 5 Jan 2017 15:17:03 -0500
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 4 Feb 2017 09:01:54 +0100
debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
New debugfs stat counters are added to track the numbers of
kmem_cache_alloc() and kmem_cache_free() function calls to get a
sense of how the internal debug objects cache management is performing.
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-2-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
lib/debugobjects.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 04c1ef7..d78673e 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -55,6 +55,12 @@ static int debug_objects_enabled __read_mostly
static struct debug_obj_descr *descr_test __read_mostly;
+/*
+ * Track numbers of kmem_cache_alloc and kmem_cache_free done.
+ */
+static int debug_objects_alloc;
+static int debug_objects_freed;
+
static void free_obj_work(struct work_struct *work);
static DECLARE_WORK(debug_obj_work, free_obj_work);
@@ -102,6 +108,7 @@ static void fill_pool(void)
raw_spin_lock_irqsave(&pool_lock, flags);
hlist_add_head(&new->node, &obj_pool);
+ debug_objects_alloc++;
obj_pool_free++;
raw_spin_unlock_irqrestore(&pool_lock, flags);
}
@@ -173,6 +180,7 @@ static void free_obj_work(struct work_struct *work)
obj = hlist_entry(obj_pool.first, typeof(*obj), node);
hlist_del(&obj->node);
obj_pool_free--;
+ debug_objects_freed++;
/*
* We release pool_lock across kmem_cache_free() to
* avoid contention on pool_lock.
@@ -758,6 +766,8 @@ static int debug_stats_show(struct seq_file *m, void *v)
seq_printf(m, "pool_min_free :%d\n", obj_pool_min_free);
seq_printf(m, "pool_used :%d\n", obj_pool_used);
seq_printf(m, "pool_max_used :%d\n", obj_pool_max_used);
+ seq_printf(m, "objects_alloc :%d\n", debug_objects_alloc);
+ seq_printf(m, "objects_freed :%d\n", debug_objects_freed);
return 0;
}
* [tip:core/debugobjects] debugobjects: Scale thresholds with # of CPUs
2017-01-05 20:17 ` [RESEND PATCH v2 2/3] debugobjects: Scale thresholds with # of CPUs Waiman Long
@ 2017-02-04 16:23 ` tip-bot for Waiman Long
0 siblings, 0 replies; 12+ messages in thread
From: tip-bot for Waiman Long @ 2017-02-04 16:23 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, jstancek, tglx, changbin.du, akpm, longman, borntraeger,
linux-kernel, mingo
Commit-ID: 97dd552eb23c83dbf626a6e84666c7e281375d47
Gitweb: http://git.kernel.org/tip/97dd552eb23c83dbf626a6e84666c7e281375d47
Author: Waiman Long <longman@redhat.com>
AuthorDate: Thu, 5 Jan 2017 15:17:04 -0500
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 4 Feb 2017 09:01:55 +0100
debugobjects: Scale thresholds with # of CPUs
On large SMP systems with hundreds of CPUs, the current thresholds
for allocating and freeing debug objects (256 and 1024 respectively)
may not work well. This can cause a lot of needless calls to
kmem_cache_alloc() and kmem_cache_free() on those systems.
To alleviate this thrashing problem, the object freeing threshold
is now increased to "1024 + # of CPUs * 32", and the object
allocation threshold to "256 + # of CPUs * 4". That should make the
debug objects subsystem scale better with the number of CPUs
available in the system.
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-3-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
lib/debugobjects.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index d78673e..dc78217 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -52,7 +52,10 @@ static int debug_objects_fixups __read_mostly;
static int debug_objects_warnings __read_mostly;
static int debug_objects_enabled __read_mostly
= CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT;
-
+static int debug_objects_pool_size __read_mostly
+ = ODEBUG_POOL_SIZE;
+static int debug_objects_pool_min_level __read_mostly
+ = ODEBUG_POOL_MIN_LEVEL;
static struct debug_obj_descr *descr_test __read_mostly;
/*
@@ -94,13 +97,13 @@ static void fill_pool(void)
struct debug_obj *new;
unsigned long flags;
- if (likely(obj_pool_free >= ODEBUG_POOL_MIN_LEVEL))
+ if (likely(obj_pool_free >= debug_objects_pool_min_level))
return;
if (unlikely(!obj_cache))
return;
- while (obj_pool_free < ODEBUG_POOL_MIN_LEVEL) {
+ while (obj_pool_free < debug_objects_pool_min_level) {
new = kmem_cache_zalloc(obj_cache, gfp);
if (!new)
@@ -176,7 +179,7 @@ static void free_obj_work(struct work_struct *work)
unsigned long flags;
raw_spin_lock_irqsave(&pool_lock, flags);
- while (obj_pool_free > ODEBUG_POOL_SIZE) {
+ while (obj_pool_free > debug_objects_pool_size) {
obj = hlist_entry(obj_pool.first, typeof(*obj), node);
hlist_del(&obj->node);
obj_pool_free--;
@@ -206,7 +209,7 @@ static void free_object(struct debug_obj *obj)
* schedule work when the pool is filled and the cache is
* initialized:
*/
- if (obj_pool_free > ODEBUG_POOL_SIZE && obj_cache)
+ if (obj_pool_free > debug_objects_pool_size && obj_cache)
sched = 1;
hlist_add_head(&obj->node, &obj_pool);
obj_pool_free++;
@@ -1126,4 +1129,11 @@ void __init debug_objects_mem_init(void)
pr_warn("out of memory.\n");
} else
debug_objects_selftest();
+
+ /*
+ * Increase the thresholds for allocating and freeing objects
+ * according to the number of possible CPUs available in the system.
+ */
+ debug_objects_pool_size += num_possible_cpus() * 32;
+ debug_objects_pool_min_level += num_possible_cpus() * 4;
}
* [tip:core/debugobjects] debugobjects: Reduce contention on the global pool_lock
2017-01-05 20:17 ` [RESEND PATCH v2 3/3] debugobjects: Reduce contention on the global pool_lock Waiman Long
@ 2017-02-04 16:23 ` tip-bot for Waiman Long
2017-02-05 10:03 ` Ingo Molnar
2017-02-05 16:13 ` tip-bot for Waiman Long
1 sibling, 1 reply; 12+ messages in thread
From: tip-bot for Waiman Long @ 2017-02-04 16:23 UTC (permalink / raw)
To: linux-tip-commits
Cc: hpa, tglx, changbin.du, linux-kernel, mingo, longman, jstancek,
akpm, borntraeger
Commit-ID: 6d2fea9837a584e706edad9b4b52833e31396736
Gitweb: http://git.kernel.org/tip/6d2fea9837a584e706edad9b4b52833e31396736
Author: Waiman Long <longman@redhat.com>
AuthorDate: Thu, 5 Jan 2017 15:17:05 -0500
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 4 Feb 2017 09:01:55 +0100
debugobjects: Reduce contention on the global pool_lock
On a large SMP system with many CPUs, the global pool_lock may become
a performance bottleneck as all the CPUs that need to allocate or
free debug objects have to take the lock. That can sometimes cause
soft lockups like:
NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
...
RIP: 0010:[<ffffffff817c216b>] [<ffffffff817c216b>]
_raw_spin_unlock_irqrestore+0x3b/0x60
...
Call Trace:
[<ffffffff813f40d1>] free_object+0x81/0xb0
[<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
[<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[<ffffffff81284996>] ? file_free_rcu+0x36/0x60
[<ffffffff81251712>] kmem_cache_free+0xd2/0x380
[<ffffffff81284960>] ? fput+0x90/0x90
[<ffffffff81284996>] file_free_rcu+0x36/0x60
[<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
[<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
[<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
[<ffffffff810c59d1>] kthread+0x101/0x120
[<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[<ffffffff817c2d32>] ret_from_fork+0x22/0x50
To reduce the amount of contention on the pool_lock, the actual
kmem_cache_free() of the debug objects will be delayed if the
pool_lock is busy. This will temporarily increase the number of free
objects available in the free pool when the system is busy. As a
result, the number of kmem_cache allocations and frees is reduced.
To further reduce the number of lock operations, free the debug
objects in batches of four.
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-4-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
lib/debugobjects.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index dc78217..5476bbe 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -172,25 +172,38 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
/*
* workqueue function to free objects.
+ *
+ * To reduce contention on the global pool_lock, the actual freeing of
+ * debug objects will be delayed if the pool_lock is busy. We also free
+ * the objects in a batch of 4 for each lock/unlock cycle.
*/
+#define ODEBUG_FREE_BATCH 4
static void free_obj_work(struct work_struct *work)
{
- struct debug_obj *obj;
+ struct debug_obj *objs[ODEBUG_FREE_BATCH];
unsigned long flags;
+ int i;
- raw_spin_lock_irqsave(&pool_lock, flags);
- while (obj_pool_free > debug_objects_pool_size) {
- obj = hlist_entry(obj_pool.first, typeof(*obj), node);
- hlist_del(&obj->node);
- obj_pool_free--;
- debug_objects_freed++;
+ if (!raw_spin_trylock_irqsave(&pool_lock, flags))
+ return;
+ while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
+ for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
+ objs[i] = hlist_entry(obj_pool.first,
+ typeof(*objs[0]), node);
+ hlist_del(&objs[i]->node);
+ }
+
+ obj_pool_free -= ODEBUG_FREE_BATCH;
+ debug_objects_freed += ODEBUG_FREE_BATCH;
/*
* We release pool_lock across kmem_cache_free() to
* avoid contention on pool_lock.
*/
raw_spin_unlock_irqrestore(&pool_lock, flags);
- kmem_cache_free(obj_cache, obj);
- raw_spin_lock_irqsave(&pool_lock, flags);
+ for (i = 0; i < ODEBUG_FREE_BATCH; i++)
+ kmem_cache_free(obj_cache, objs[i]);
+ if (!raw_spin_trylock_irqsave(&pool_lock, flags))
+ return;
}
raw_spin_unlock_irqrestore(&pool_lock, flags);
}
* Re: [tip:core/debugobjects] debugobjects: Reduce contention on the global pool_lock
2017-02-04 16:23 ` [tip:core/debugobjects] " tip-bot for Waiman Long
@ 2017-02-05 10:03 ` Ingo Molnar
2017-02-06 22:09 ` Waiman Long
0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2017-02-05 10:03 UTC (permalink / raw)
To: hpa, tglx, linux-kernel, changbin.du, longman, jstancek, akpm,
borntraeger
Cc: linux-tip-commits
* tip-bot for Waiman Long <tipbot@zytor.com> wrote:
> ---
> lib/debugobjects.c | 31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index dc78217..5476bbe 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -172,25 +172,38 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
>
> /*
> * workqueue function to free objects.
> + *
> + * To reduce contention on the global pool_lock, the actual freeing of
> + * debug objects will be delayed if the pool_lock is busy. We also free
> + * the objects in a batch of 4 for each lock/unlock cycle.
> */
> +#define ODEBUG_FREE_BATCH 4
> static void free_obj_work(struct work_struct *work)
> {
Please put an extra newline before function definitions.
Looks good otherwise!
Thanks,
Ingo
* Re: [tip:core/debugobjects] debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
2017-02-04 16:22 ` [tip:core/debugobjects] " tip-bot for Waiman Long
@ 2017-02-05 10:08 ` Ingo Molnar
0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2017-02-05 10:08 UTC (permalink / raw)
To: borntraeger, tglx, hpa, linux-kernel, longman, changbin.du,
jstancek, akpm
Cc: linux-tip-commits
* tip-bot for Waiman Long <tipbot@zytor.com> wrote:
> Commit-ID: c4b73aabd0989d93b82894417ae501690bd1db5e
> Gitweb: http://git.kernel.org/tip/c4b73aabd0989d93b82894417ae501690bd1db5e
> Author: Waiman Long <longman@redhat.com>
> AuthorDate: Thu, 5 Jan 2017 15:17:03 -0500
> Committer: Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Sat, 4 Feb 2017 09:01:54 +0100
>
> debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
>
> New debugfs stat counters are added to track the numbers of
> kmem_cache_alloc() and kmem_cache_free() function calls to get a
> sense of how the internal debug objects cache management is performing.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: "Du Changbin" <changbin.du@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jan Stancek <jstancek@redhat.com>
> Link: http://lkml.kernel.org/r/1483647425-4135-2-git-send-email-longman@redhat.com
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> ---
> lib/debugobjects.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 04c1ef7..d78673e 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -55,6 +55,12 @@ static int debug_objects_enabled __read_mostly
>
> static struct debug_obj_descr *descr_test __read_mostly;
>
> +/*
> + * Track numbers of kmem_cache_alloc and kmem_cache_free done.
Nit:
/*
* Track the number of kmem_cache_alloc()/free() calls done.
*/
Another nit:
> + */
> +static int debug_objects_alloc;
> +static int debug_objects_freed;
Yeah, so we want to either use past tense consistently:
static int debug_objects_allocated;
static int debug_objects_freed;
Or we want to use present tense consistently:
static int debug_objects_alloc;
static int debug_objects_free;
... but we don't want to mix the two when naming related counters!
( Btw., I'm for the _allocated/_freed pattern - that's the usual
nomenclature for statistics counters. )
> + seq_printf(m, "objects_alloc :%d\n", debug_objects_alloc);
> + seq_printf(m, "objects_freed :%d\n", debug_objects_freed);
Ditto.
Thanks,
Ingo
* [tip:core/debugobjects] debugobjects: Reduce contention on the global pool_lock
2017-01-05 20:17 ` [RESEND PATCH v2 3/3] debugobjects: Reduce contention on the global pool_lock Waiman Long
2017-02-04 16:23 ` [tip:core/debugobjects] " tip-bot for Waiman Long
@ 2017-02-05 16:13 ` tip-bot for Waiman Long
1 sibling, 0 replies; 12+ messages in thread
From: tip-bot for Waiman Long @ 2017-02-05 16:13 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, changbin.du, jstancek, mingo, longman, hpa, tglx,
borntraeger, akpm
Commit-ID: 858274b6a13b4db0e6fb451eea7f8817c42426a7
Gitweb: http://git.kernel.org/tip/858274b6a13b4db0e6fb451eea7f8817c42426a7
Author: Waiman Long <longman@redhat.com>
AuthorDate: Thu, 5 Jan 2017 15:17:05 -0500
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 5 Feb 2017 17:09:32 +0100
debugobjects: Reduce contention on the global pool_lock
On a large SMP system with many CPUs, the global pool_lock may become
a performance bottleneck as all the CPUs that need to allocate or
free debug objects have to take the lock. That can sometimes cause
soft lockups like:
NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
...
RIP: 0010:[<ffffffff817c216b>] [<ffffffff817c216b>]
_raw_spin_unlock_irqrestore+0x3b/0x60
...
Call Trace:
[<ffffffff813f40d1>] free_object+0x81/0xb0
[<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
[<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[<ffffffff81284996>] ? file_free_rcu+0x36/0x60
[<ffffffff81251712>] kmem_cache_free+0xd2/0x380
[<ffffffff81284960>] ? fput+0x90/0x90
[<ffffffff81284996>] file_free_rcu+0x36/0x60
[<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
[<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
[<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
[<ffffffff810c59d1>] kthread+0x101/0x120
[<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[<ffffffff817c2d32>] ret_from_fork+0x22/0x50
To reduce the amount of contention on the pool_lock, the actual
kmem_cache_free() of the debug objects will be delayed if the
pool_lock is busy. This will temporarily increase the number of free
objects available in the free pool when the system is busy. As a
result, the number of kmem_cache allocations and frees is reduced.
To further reduce the number of lock operations, free the debug
objects in batches of four.
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-4-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
lib/debugobjects.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index dc78217..5e1bf2f 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -172,25 +172,39 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
/*
* workqueue function to free objects.
+ *
+ * To reduce contention on the global pool_lock, the actual freeing of
+ * debug objects will be delayed if the pool_lock is busy. We also free
+ * the objects in a batch of 4 for each lock/unlock cycle.
*/
+#define ODEBUG_FREE_BATCH 4
+
static void free_obj_work(struct work_struct *work)
{
- struct debug_obj *obj;
+ struct debug_obj *objs[ODEBUG_FREE_BATCH];
unsigned long flags;
+ int i;
- raw_spin_lock_irqsave(&pool_lock, flags);
- while (obj_pool_free > debug_objects_pool_size) {
- obj = hlist_entry(obj_pool.first, typeof(*obj), node);
- hlist_del(&obj->node);
- obj_pool_free--;
- debug_objects_freed++;
+ if (!raw_spin_trylock_irqsave(&pool_lock, flags))
+ return;
+ while (obj_pool_free >= debug_objects_pool_size + ODEBUG_FREE_BATCH) {
+ for (i = 0; i < ODEBUG_FREE_BATCH; i++) {
+ objs[i] = hlist_entry(obj_pool.first,
+ typeof(*objs[0]), node);
+ hlist_del(&objs[i]->node);
+ }
+
+ obj_pool_free -= ODEBUG_FREE_BATCH;
+ debug_objects_freed += ODEBUG_FREE_BATCH;
/*
* We release pool_lock across kmem_cache_free() to
* avoid contention on pool_lock.
*/
raw_spin_unlock_irqrestore(&pool_lock, flags);
- kmem_cache_free(obj_cache, obj);
- raw_spin_lock_irqsave(&pool_lock, flags);
+ for (i = 0; i < ODEBUG_FREE_BATCH; i++)
+ kmem_cache_free(obj_cache, objs[i]);
+ if (!raw_spin_trylock_irqsave(&pool_lock, flags))
+ return;
}
raw_spin_unlock_irqrestore(&pool_lock, flags);
}
* Re: [tip:core/debugobjects] debugobjects: Reduce contention on the global pool_lock
2017-02-05 10:03 ` Ingo Molnar
@ 2017-02-06 22:09 ` Waiman Long
2017-02-06 22:50 ` Ingo Molnar
0 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2017-02-06 22:09 UTC (permalink / raw)
To: Ingo Molnar, hpa, tglx, linux-kernel, changbin.du, jstancek,
akpm, borntraeger
Cc: linux-tip-commits
On 02/05/2017 05:03 AM, Ingo Molnar wrote:
> * tip-bot for Waiman Long <tipbot@zytor.com> wrote:
>
>> ---
>> lib/debugobjects.c | 31 ++++++++++++++++++++++---------
>> 1 file changed, 22 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>> index dc78217..5476bbe 100644
>> --- a/lib/debugobjects.c
>> +++ b/lib/debugobjects.c
>> @@ -172,25 +172,38 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
>>
>> /*
>> * workqueue function to free objects.
>> + *
>> + * To reduce contention on the global pool_lock, the actual freeing of
>> + * debug objects will be delayed if the pool_lock is busy. We also free
>> + * the objects in a batch of 4 for each lock/unlock cycle.
>> */
>> +#define ODEBUG_FREE_BATCH 4
>> static void free_obj_work(struct work_struct *work)
>> {
> Please put an extra newline before function definitions.
>
> Looks good otherwise!
>
> Thanks,
>
> Ingo
Sure, I can do that. BTW, the debugobjects patches were also pulled
into the -mm tree a little while ago. Will that be a problem?
-Longman
* Re: [tip:core/debugobjects] debugobjects: Reduce contention on the global pool_lock
2017-02-06 22:09 ` Waiman Long
@ 2017-02-06 22:50 ` Ingo Molnar
0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2017-02-06 22:50 UTC (permalink / raw)
To: Waiman Long, Andrew Morton
Cc: hpa, tglx, linux-kernel, changbin.du, jstancek, akpm,
borntraeger, linux-tip-commits
* Waiman Long <longman@redhat.com> wrote:
> On 02/05/2017 05:03 AM, Ingo Molnar wrote:
> > * tip-bot for Waiman Long <tipbot@zytor.com> wrote:
> >
> >> ---
> >> lib/debugobjects.c | 31 ++++++++++++++++++++++---------
> >> 1 file changed, 22 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> >> index dc78217..5476bbe 100644
> >> --- a/lib/debugobjects.c
> >> +++ b/lib/debugobjects.c
> >> @@ -172,25 +172,38 @@ alloc_object(void *addr, struct debug_bucket *b, struct debug_obj_descr *descr)
> >>
> >> /*
> >> * workqueue function to free objects.
> >> + *
> >> + * To reduce contention on the global pool_lock, the actual freeing of
> >> + * debug objects will be delayed if the pool_lock is busy. We also free
> >> + * the objects in a batch of 4 for each lock/unlock cycle.
> >> */
> >> +#define ODEBUG_FREE_BATCH 4
> >> static void free_obj_work(struct work_struct *work)
> >> {
> > Please put an extra newline before function definitions.
> >
> > Looks good otherwise!
> >
> > Thanks,
> >
> > Ingo
>
>
> Sure, I can do that. BTW the debugobjects patch was also pull into the
> -mm tree a little while ago. Will that be a problem?
Should not be a problem usually; Andrew typically drops such patches
if they show up in a maintainer tree via linux-next.
And once accepted we don't silently drop patches from -tip-hosted
maintainer trees, so this is a reliable workflow.
Thanks,
Ingo