linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/3] latencytop lock usage improvement
@ 2019-04-29  8:03 Feng Tang
  2019-04-29  8:03 ` [RFC PATCH 1/3] kernel/sysctl: add description for "latencytop" Feng Tang
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Feng Tang @ 2019-04-29  8:03 UTC (permalink / raw)
  To: Andrew Morton, Arjan van de Ven, Jonathan Corbet, Ingo Molnar,
	Eric W Biederman, Peter Zijlstra, Dmitry Vyukov, Thomas Gleixner,
	Andy Lutomirski, Ying Huang, linux-kernel
  Cc: Feng Tang

Hi All,

latencytop is a very nice tool for tracing system latency hotspots, and
we heavily use it in our LKP test suites.

However, in some benchmark tests we found very severe lock contention,
which takes 70%+ of CPU cycles in the perf profile. This shows up
especially for benchmarks involving massive process scheduling on
platforms with many CPUs, such as running hackbench
  "hackbench -g 1408 --process --pipe -l 1875 -s 1024"
on a 2-socket Xeon E5-2699 machine (44 cores/88 CPUs) with 64GB RAM.

And due to that, we have to explicitly disable latencytop for those test
cases.

By checking the code, we found that latencytop uses one global spinlock
to protect the updates of both the global data and the per-task data.

So initially we tried splitting the single lock into one global lock and
one per-task lock for better lock granularity, but the lock contention
was reduced only slightly (a drop of only 1% in the perf profile). The
contention remains severe because the benchmark causes massive
scheduling on the 88 CPUs, and every schedule-in call still tries to
acquire the global lock.
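
To illustrate why the split alone helps so little, here is a minimal
single-threaded sketch. It is not kernel code: "struct fake_lock",
"account_single()" and "account_split()" are hypothetical names, and a
lock is modeled as a plain acquisition counter. Under those assumptions
it shows that each schedule-in event takes the global lock exactly once,
both before and after the split.

```c
/* Hypothetical model: a "lock" is just a counter of acquisitions. */
struct fake_lock {
	unsigned long acquisitions;
};

struct fake_task {
	struct fake_lock lock;	/* models a per-task lock */
	unsigned long events;	/* per-task latency events */
};

static struct fake_lock global_lock;
static unsigned long global_events;

static void take(struct fake_lock *l)
{
	l->acquisitions++;
}

/* Before the split: one global lock covers both updates. */
static void account_single(struct fake_task *t)
{
	take(&global_lock);
	global_events++;
	t->events++;
}

/* After the split: global lock for global data, task lock for task data. */
static void account_split(struct fake_task *t)
{
	take(&global_lock);
	global_events++;

	take(&t->lock);
	t->events++;
}

/* Run n schedule-in events; return how often the global lock was taken. */
static unsigned long global_acquisitions(void (*account)(struct fake_task *),
					 unsigned long n)
{
	struct fake_task t = { { 0 }, 0 };
	unsigned long i;

	global_lock.acquisitions = 0;
	global_events = 0;
	for (i = 0; i < n; i++)
		account(&t);
	return global_lock.acquisitions;
}
```

Calling global_acquisitions() with either accounting function and the
same n returns the same number, which matches the observation that the
global lock stays the hot spot.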

Then we tried to reduce the work done under latency_lock (between the
raw_spin_lock_irqsave/raw_spin_unlock_irqrestore pair), but that also
gave only a very small improvement, and the lock contention stayed
high.

Finally, we tried adding an extra lazy mode, which only updates the global
latency data when a task exits, while still updating the per-task data
in real time. This reduces the lock contention from 70%+ to less than
5% and boosts that hackbench case's throughput by 267.6%.
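
The lazy mode idea can be sketched in plain C as follows. This is a
simplified model under stated assumptions, not the patch itself:
"struct lat_rec", "struct lat_table", "task_account()" and
"task_exit_merge()" are hypothetical names, a string stands in for the
stored backtrace, and "lock_taken" only counts how often a table's lock
would be taken. When merging, the sketch keeps the larger of the two
maxima rather than comparing against the accumulated time.

```c
#include <string.h>

#define MAX_RECORDS 8

/* Hypothetical, simplified stand-in for a latency record. */
struct lat_rec {
	const char *reason;	/* stands in for the stored backtrace */
	unsigned int count;
	unsigned long time_us;
	unsigned long max_us;
};

struct lat_table {
	struct lat_rec rec[MAX_RECORDS];
	int used;
	unsigned long lock_taken;	/* how often the table's lock was taken */
};

/* Fold one record into a table, matching on the reason string. */
static void table_add(struct lat_table *t, const struct lat_rec *r)
{
	int i;

	for (i = 0; i < t->used; i++) {
		if (strcmp(t->rec[i].reason, r->reason) == 0) {
			t->rec[i].count += r->count;
			t->rec[i].time_us += r->time_us;
			if (r->max_us > t->rec[i].max_us)
				t->rec[i].max_us = r->max_us;
			return;
		}
	}
	/* No match: append if there is room, otherwise drop the record. */
	if (t->used < MAX_RECORDS)
		t->rec[t->used++] = *r;
}

/* Real-time path in lazy mode: only the task's own table is touched. */
static void task_account(struct lat_table *task, const char *reason,
			 unsigned long usecs)
{
	struct lat_rec r = { reason, 1, usecs, usecs };

	task->lock_taken++;
	table_add(task, &r);
}

/* Exit path: take the global lock once and merge every task record. */
static void task_exit_merge(struct lat_table *global,
			    const struct lat_table *task)
{
	int i;

	global->lock_taken++;
	for (i = 0; i < task->used; i++)
		table_add(global, &task->rec[i]);
}
```

In this model the global table's lock is taken once per task exit, no
matter how many latency events the task recorded while it was running.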

Please help to review, thanks!

Patch 1/3 adds the missing sysctl description for "latencytop" and I
think it could be merged independently.

Patch 2/3 splits latency_lock into a global lock and a per-task lock.
A more aggressive idea is that the per-task lock may not be needed at
all, as the per-task data is only updated at enqueue time for a task,
which implies there is no race condition for it.

Patch 3/3 implements the lazy mode and updates the documentation.

Thanks,
Feng


Feng Tang (3):
  kernel/sysctl: add description for "latencytop"
  latencytop: split latency_lock to global lock and per task lock
  latencytop: add a lazy mode for updating global data

 Documentation/sysctl/kernel.txt | 23 ++++++++++++++++++++
 include/linux/latencytop.h      |  5 +++++
 include/linux/sched.h           |  1 +
 init/init_task.c                |  3 +++
 kernel/exit.c                   |  2 ++
 kernel/fork.c                   |  4 ++++
 kernel/latencytop.c             | 47 +++++++++++++++++++++++++++++++++++------
 7 files changed, 78 insertions(+), 7 deletions(-)

-- 
2.7.4



* [RFC PATCH 1/3] kernel/sysctl: add description for "latencytop"
  2019-04-29  8:03 [RFC PATCH 0/3] latencytop lock usage improvement Feng Tang
@ 2019-04-29  8:03 ` Feng Tang
  2019-04-29  8:03 ` [RFC PATCH 2/3] latencytop: split latency_lock to global lock and per task lock Feng Tang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Feng Tang @ 2019-04-29  8:03 UTC (permalink / raw)
  To: Andrew Morton, Arjan van de Ven, Jonathan Corbet, Ingo Molnar,
	Eric W Biederman, Peter Zijlstra, Dmitry Vyukov, Thomas Gleixner,
	Andy Lutomirski, Ying Huang, linux-kernel
  Cc: Feng Tang

The body of the description is mostly copied from the comments in
kernel/latencytop.c.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 Documentation/sysctl/kernel.txt | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index c0527d8..080ef66 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -43,6 +43,7 @@ show up in /proc/sys/kernel:
 - hyperv_record_panic_msg
 - kexec_load_disabled
 - kptr_restrict
+- latencytop
 - l2cr                        [ PPC only ]
 - modprobe                    ==> Documentation/debugging-modules.txt
 - modules_disabled
@@ -437,6 +438,23 @@ When kptr_restrict is set to (2), kernel pointers printed using
 
 ==============================================================
 
+latencytop:
+
+This value controls whether to start collecting kernel latency
+data, it is off (0) by default, and could be switched on (1).
+The latency talked here is not the 'traditional' interrupt
+latency (which is primarily caused by something else consuming CPU),
+but instead, it is the latency an application encounters because
+the kernel sleeps on its behalf for various reasons.
+
+The info is exported via /proc/latency_stats and /proc/<pid>/latency.
+
+This file shows up only if CONFIG_LATENCYTOP is enabled. Please
+note that turning it on may bring notable system overhead when
+there is massive scheduling in the system.
+
+==============================================================
+
 l2cr: (PPC only)
 
 This flag controls the L2 cache of G3 processor boards. If
-- 
2.7.4



* [RFC PATCH 2/3] latencytop: split latency_lock to global lock and per task lock
  2019-04-29  8:03 [RFC PATCH 0/3] latencytop lock usage improvement Feng Tang
  2019-04-29  8:03 ` [RFC PATCH 1/3] kernel/sysctl: add description for "latencytop" Feng Tang
@ 2019-04-29  8:03 ` Feng Tang
  2019-04-29  8:03 ` [RFC PATCH 3/3] latencytop: add a lazy mode for updating global latency data Feng Tang
  2019-04-30  8:09 ` [RFC PATCH 0/3] latencytop lock usage improvement Peter Zijlstra
  3 siblings, 0 replies; 8+ messages in thread
From: Feng Tang @ 2019-04-29  8:03 UTC (permalink / raw)
  To: Andrew Morton, Arjan van de Ven, Jonathan Corbet, Ingo Molnar,
	Eric W Biederman, Peter Zijlstra, Dmitry Vyukov, Thomas Gleixner,
	Andy Lutomirski, Ying Huang, linux-kernel
  Cc: Feng Tang

Currently there is one global "latency_lock" that covers updates of
both the global and the per-task latency data. Splitting it into one
global lock and a per-task lock improves the lock granularity and
reduces the contention.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 include/linux/sched.h | 1 +
 init/init_task.c      | 3 +++
 kernel/fork.c         | 4 ++++
 kernel/latencytop.c   | 9 +++++----
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f9b43c9..84cf13c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1095,6 +1095,7 @@ struct task_struct {
 	unsigned long			dirty_paused_when;
 
 #ifdef CONFIG_LATENCYTOP
+	raw_spinlock_t			latency_lock;
 	int				latency_record_count;
 	struct latency_record		latency_record[LT_SAVECOUNT];
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index 5aebe3b..f7cc0fb 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -158,6 +158,9 @@ struct task_struct init_task
 	.numa_group	= NULL,
 	.numa_faults	= NULL,
 #endif
+#ifdef CONFIG_LATENCYTOP
+	.latency_lock	= __RAW_SPIN_LOCK_UNLOCKED(init_task.latency_lock),
+#endif
 #ifdef CONFIG_KASAN
 	.kasan_depth	= 1,
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index b69248e..2109468 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1963,6 +1963,10 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef TIF_SYSCALL_EMU
 	clear_tsk_thread_flag(p, TIF_SYSCALL_EMU);
 #endif
+
+#ifdef CONFIG_LATENCYTOP
+	raw_spin_lock_init(&p->latency_lock);
+#endif
 	clear_all_latency_tracing(p);
 
 	/* ok, now we should be set up.. */
diff --git a/kernel/latencytop.c b/kernel/latencytop.c
index 96b4179..6d7a174 100644
--- a/kernel/latencytop.c
+++ b/kernel/latencytop.c
@@ -74,10 +74,10 @@ void clear_all_latency_tracing(struct task_struct *p)
 	if (!latencytop_enabled)
 		return;
 
-	raw_spin_lock_irqsave(&latency_lock, flags);
+	raw_spin_lock_irqsave(&p->latency_lock, flags);
 	memset(&p->latency_record, 0, sizeof(p->latency_record));
 	p->latency_record_count = 0;
-	raw_spin_unlock_irqrestore(&latency_lock, flags);
+	raw_spin_unlock_irqrestore(&p->latency_lock, flags);
 }
 
 static void clear_global_latency_tracing(void)
@@ -194,9 +194,10 @@ __account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
 	store_stacktrace(tsk, &lat);
 
 	raw_spin_lock_irqsave(&latency_lock, flags);
-
 	account_global_scheduler_latency(tsk, &lat);
+	raw_spin_unlock_irqrestore(&latency_lock, flags);
 
+	raw_spin_lock_irqsave(&tsk->latency_lock, flags);
 	for (i = 0; i < tsk->latency_record_count; i++) {
 		struct latency_record *mylat;
 		int same = 1;
@@ -234,7 +235,7 @@ __account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
 	memcpy(&tsk->latency_record[i], &lat, sizeof(struct latency_record));
 
 out_unlock:
-	raw_spin_unlock_irqrestore(&latency_lock, flags);
+	raw_spin_unlock_irqrestore(&tsk->latency_lock, flags);
 }
 
 static int lstats_show(struct seq_file *m, void *v)
-- 
2.7.4



* [RFC PATCH 3/3] latencytop: add a lazy mode for updating global latency data
  2019-04-29  8:03 [RFC PATCH 0/3] latencytop lock usage improvement Feng Tang
  2019-04-29  8:03 ` [RFC PATCH 1/3] kernel/sysctl: add description for "latencytop" Feng Tang
  2019-04-29  8:03 ` [RFC PATCH 2/3] latencytop: split latency_lock to global lock and per task lock Feng Tang
@ 2019-04-29  8:03 ` Feng Tang
  2019-04-30  8:09 ` [RFC PATCH 0/3] latencytop lock usage improvement Peter Zijlstra
  3 siblings, 0 replies; 8+ messages in thread
From: Feng Tang @ 2019-04-29  8:03 UTC (permalink / raw)
  To: Andrew Morton, Arjan van de Ven, Jonathan Corbet, Ingo Molnar,
	Eric W Biederman, Peter Zijlstra, Dmitry Vyukov, Thomas Gleixner,
	Andy Lutomirski, Ying Huang, linux-kernel
  Cc: Feng Tang

latencytop is a nice tool for tracing system latency hotspots, and
we heavily use it in 0day/LKP test suites.

However, when running some scheduler benchmarks like hackbench,
we noticed that in some cases the global latency_lock takes around
70% of CPU cycles in the perf profile. This mainly comes from contention
for "latency_lock" inside __account_scheduler_latency(): when the
system runs a workload that causes massive process scheduling,
most of the processes contend for this global lock. Given that,
we have to disable latencytop when running such benchmarks.

Add an extra lazy mode option, which only updates the global
latency data when a task exits. This greatly reduces the possible
lock contention for "latency_lock". With this new lazy mode, the lock
contention for latency_lock is cut from 70% to less than 3%
(perf profile data), and hackbench throughput improves:

            v5.0    v5.0 + patches
---------------- ---------------------------
    540207          +267.6%    1986052        hackbench.throughput

The test was run on a 2-socket Xeon E5-2699 machine (44 cores/88 CPUs)
with 64GB RAM, with the command:
  "hackbench -g 1408 --process --pipe -l 1875 -s 1024"

As a new mode is added, the sysctl "kernel.latencytop" and
/proc/sys/kernel/latencytop are changed as follows:

	0 - Disabled
	1 - Enabled (normal mode): update the global data each time a task
	    gets scheduled (same as before the patch)
	2 - Enabled (lazy mode): update the global data only when a task
	    exits
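
As an illustration of how the mode values above would be selected from
user space, here is a minimal sketch in C. The enum names and
set_latencytop_mode() are hypothetical helpers, not part of this patch;
the function only writes the numeric value to the path it is given.
Writing to the real /proc/sys/kernel/latencytop needs root and a kernel
built with CONFIG_LATENCYTOP, and value 2 is only meaningful with this
patch applied.

```c
#include <stdio.h>

/* Mode values as listed in the table above (names are hypothetical). */
enum latencytop_mode {
	LT_MODE_DISABLED = 0,
	LT_MODE_NORMAL   = 1,
	LT_MODE_LAZY     = 2,	/* assumes this patch is applied */
};

/*
 * Write a mode value to a sysctl file.
 * Returns 0 on success, -1 on an invalid mode or an I/O error.
 */
static int set_latencytop_mode(const char *path, int mode)
{
	FILE *f;
	int ok;

	if (mode < LT_MODE_DISABLED || mode > LT_MODE_LAZY)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", mode) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would use it as
set_latencytop_mode("/proc/sys/kernel/latencytop", LT_MODE_LAZY).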

Suggested-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/sysctl/kernel.txt | 17 +++++++++++------
 include/linux/latencytop.h      |  5 +++++
 kernel/exit.c                   |  2 ++
 kernel/latencytop.c             | 41 +++++++++++++++++++++++++++++++++++++----
 4 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 080ef66..05cd9e2 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -440,12 +440,17 @@ When kptr_restrict is set to (2), kernel pointers printed using
 
 latencytop:
 
-This value controls whether to start collecting kernel latency
-data, it is off (0) by default, and could be switched on (1).
-The latency talked here is not the 'traditional' interrupt
-latency (which is primarily caused by something else consuming CPU),
-but instead, it is the latency an application encounters because
-the kernel sleeps on its behalf for various reasons.
+This value controls whether and how to collect kernel latency
+data, it is off (0) by default. The latency talked here is not
+the 'traditional' interrupt latency (which is primarily caused by
+something else consuming CPU), but instead, it is the latency an
+application encounters because the kernel sleeps on its behalf
+for various reasons.
+
+0 - Disabled
+1 - Enabled (normal mode): update the global data each time a task
+    gets scheduled
+2 - Enabled (lazy mode): update the global data only when a task exits
 
 The info is exported via /proc/latency_stats and /proc/<pid>/latency.
 
diff --git a/include/linux/latencytop.h b/include/linux/latencytop.h
index 7c560e0..08eeabb 100644
--- a/include/linux/latencytop.h
+++ b/include/linux/latencytop.h
@@ -41,6 +41,7 @@ void clear_all_latency_tracing(struct task_struct *p);
 extern int sysctl_latencytop(struct ctl_table *table, int write,
 			void __user *buffer, size_t *lenp, loff_t *ppos);
 
+extern void update_task_latency_data(struct task_struct *tsk);
 #else
 
 static inline void
@@ -52,6 +53,10 @@ static inline void clear_all_latency_tracing(struct task_struct *p)
 {
 }
 
+static inline void update_task_latency_data(struct task_struct *tsk)
+{
+}
+
 #endif
 
 #endif
diff --git a/kernel/exit.c b/kernel/exit.c
index 2639a30..701b8bd 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -859,6 +859,8 @@ void __noreturn do_exit(long code)
 	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
+	update_task_latency_data(tsk);
+
 	exit_mm();
 
 	if (group_dead)
diff --git a/kernel/latencytop.c b/kernel/latencytop.c
index 6d7a174..9943cce 100644
--- a/kernel/latencytop.c
+++ b/kernel/latencytop.c
@@ -65,6 +65,17 @@ static DEFINE_RAW_SPINLOCK(latency_lock);
 #define MAXLR 128
 static struct latency_record latency_record[MAXLR];
 
+/*
+ * latencytop working modes:
+ *	0 - disabled
+ *	1 - enabled, update the global data each time a task
+ *	    gets scheduled
+ *	2 - enabled, only update the global data when a task
+ *	    exits
+ */
+#define LATENCYTOP_DISABLED	0
+#define LATENCYTOP_NORMAL_MODE	1
+#define LATENCYTOP_LAZY_MODE	2
 int latencytop_enabled;
 
 void clear_all_latency_tracing(struct task_struct *p)
@@ -125,7 +136,7 @@ account_global_scheduler_latency(struct task_struct *tsk,
 				break;
 		}
 		if (same) {
-			latency_record[i].count++;
+			latency_record[i].count += lat->count;
 			latency_record[i].time += lat->time;
 			if (lat->time > latency_record[i].max)
 				latency_record[i].max = lat->time;
@@ -141,6 +152,25 @@ account_global_scheduler_latency(struct task_struct *tsk,
 	memcpy(&latency_record[i], lat, sizeof(struct latency_record));
 }
 
+/* Used only in lazy mode */
+void update_task_latency_data(struct task_struct *tsk)
+{
+	unsigned long flags;
+	int i;
+
+	if (latencytop_enabled != LATENCYTOP_LAZY_MODE)
+		return;
+
+	/* skip kernel threads for now */
+	if (!tsk->mm)
+		return;
+
+	raw_spin_lock_irqsave(&latency_lock, flags);
+	for (i = 0; i < tsk->latency_record_count; i++)
+		account_global_scheduler_latency(tsk, &tsk->latency_record[i]);
+	raw_spin_unlock_irqrestore(&latency_lock, flags);
+}
+
 /*
  * Iterator to store a backtrace into a latency record entry
  */
@@ -193,9 +223,12 @@ __account_scheduler_latency(struct task_struct *tsk, int usecs, int inter)
 	lat.max = usecs;
 	store_stacktrace(tsk, &lat);
 
-	raw_spin_lock_irqsave(&latency_lock, flags);
-	account_global_scheduler_latency(tsk, &lat);
-	raw_spin_unlock_irqrestore(&latency_lock, flags);
+	/* Don't do the global update in lazy mode */
+	if (latencytop_enabled == LATENCYTOP_NORMAL_MODE) {
+		raw_spin_lock_irqsave(&latency_lock, flags);
+		account_global_scheduler_latency(tsk, &lat);
+		raw_spin_unlock_irqrestore(&latency_lock, flags);
+	}
 
 	raw_spin_lock_irqsave(&tsk->latency_lock, flags);
 	for (i = 0; i < tsk->latency_record_count; i++) {
-- 
2.7.4



* Re: [RFC PATCH 0/3] latencytop lock usage improvement
  2019-04-29  8:03 [RFC PATCH 0/3] latencytop lock usage improvement Feng Tang
                   ` (2 preceding siblings ...)
  2019-04-29  8:03 ` [RFC PATCH 3/3] latencytop: add a lazy mode for updating global latency data Feng Tang
@ 2019-04-30  8:09 ` Peter Zijlstra
  2019-04-30  8:35   ` Feng Tang
  3 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-04-30  8:09 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Arjan van de Ven, Jonathan Corbet, Ingo Molnar,
	Eric W Biederman, Dmitry Vyukov, Thomas Gleixner,
	Andy Lutomirski, Ying Huang, linux-kernel

On Mon, Apr 29, 2019 at 04:03:28PM +0800, Feng Tang wrote:
> Hi All,
> 
> latencytop is a very nice tool for tracing system latency hotspots, and
> we heavily use it in our LKP test suites.

What data does latency-top give that perf cannot give you? Ideally we'd
remove latencytop entirely.


* Re: [RFC PATCH 0/3] latencytop lock usage improvement
  2019-04-30  8:09 ` [RFC PATCH 0/3] latencytop lock usage improvement Peter Zijlstra
@ 2019-04-30  8:35   ` Feng Tang
  2019-04-30  9:10     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Feng Tang @ 2019-04-30  8:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Arjan van de Ven, Jonathan Corbet, Ingo Molnar,
	Eric W Biederman, Dmitry Vyukov, Thomas Gleixner,
	Andy Lutomirski, Ying Huang, linux-kernel

Hi Peter,

On Tue, Apr 30, 2019 at 10:09:10AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 29, 2019 at 04:03:28PM +0800, Feng Tang wrote:
> > Hi All,
> > 
> > latencytop is a very nice tool for tracing system latency hotspots, and
> > we heavily use it in our LKP test suites.
> 
> What data does latency-top give that perf cannot give you? Ideally we'd
> remove latencytop entirely.

Thanks for the review. In the 0day/LKP test service, we have many tools
for monitoring and analyzing the test results. perf is the most important
one and contributes the largest part of our auto-generated comparison
results, for example to identify spinlock contention and system hotspots.

latencytop is another tool we use to find out why systems go idle, such as
why a workload chose to sleep or is waiting for something.

Thanks,
Feng


* Re: [RFC PATCH 0/3] latencytop lock usage improvement
  2019-04-30  8:35   ` Feng Tang
@ 2019-04-30  9:10     ` Peter Zijlstra
  2019-04-30  9:22       ` Feng Tang
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-04-30  9:10 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Arjan van de Ven, Jonathan Corbet, Ingo Molnar,
	Eric W Biederman, Dmitry Vyukov, Thomas Gleixner,
	Andy Lutomirski, Ying Huang, linux-kernel

On Tue, Apr 30, 2019 at 04:35:05PM +0800, Feng Tang wrote:
> Hi Peter,
> 
> On Tue, Apr 30, 2019 at 10:09:10AM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 29, 2019 at 04:03:28PM +0800, Feng Tang wrote:
> > > Hi All,
> > > 
> > > latencytop is a very nice tool for tracing system latency hotspots, and
> > > we heavily use it in our LKP test suites.
> > 
> > What data does latency-top give that perf cannot give you? Ideally we'd
> > remove latencytop entirely.
> 
> Thanks for the review. In the 0day/LKP test service, we have many tools
> for monitoring and analyzing the test results. perf is the most important
> one and contributes the largest part of our auto-generated comparison
> results, for example to identify spinlock contention and system hotspots.
> 
> latencytop is another tool we use to find out why systems go idle, such as
> why a workload chose to sleep or is waiting for something.

You're not answering the question; why can't you use perf for that? ISTR
we explicitly added support for things like that.


* Re: [RFC PATCH 0/3] latencytop lock usage improvement
  2019-04-30  9:10     ` Peter Zijlstra
@ 2019-04-30  9:22       ` Feng Tang
  0 siblings, 0 replies; 8+ messages in thread
From: Feng Tang @ 2019-04-30  9:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Arjan van de Ven, Jonathan Corbet, Ingo Molnar,
	Eric W Biederman, Dmitry Vyukov, Thomas Gleixner,
	Andy Lutomirski, Ying Huang, linux-kernel

Hi Peter,

On Tue, Apr 30, 2019 at 11:10:33AM +0200, Peter Zijlstra wrote:
> On Tue, Apr 30, 2019 at 04:35:05PM +0800, Feng Tang wrote:
> > Hi Peter,
> > 
> > On Tue, Apr 30, 2019 at 10:09:10AM +0200, Peter Zijlstra wrote:
> > > On Mon, Apr 29, 2019 at 04:03:28PM +0800, Feng Tang wrote:
> > > > Hi All,
> > > > 
> > > > latencytop is a very nice tool for tracing system latency hotspots, and
> > > > we heavily use it in our LKP test suites.
> > > 
> > > What data does latency-top give that perf cannot give you? Ideally we'd
> > > remove latencytop entirely.
> > 
> > Thanks for the review. In the 0day/LKP test service, we have many tools
> > for monitoring and analyzing the test results. perf is the most important
> > one and contributes the largest part of our auto-generated comparison
> > results, for example to identify spinlock contention and system hotspots.
> > 
> > latencytop is another tool we use to find out why systems go idle, such as
> > why a workload chose to sleep or is waiting for something.
> 
> You're not answering the question; why can't you use perf for that? ISTR
> we explicitly added support for things like that.

I was not very familiar with perf before. After my last reply,
I googled a little and found that "perf sched latency" has a similar
function, except that I can't directly get the call chain. Any
suggestions for this? Thanks!

- Feng


