Question to perf spending a large amount of time monitoring a java process

* Question to perf spending a large amount of time monitoring a java process
@ 2017-12-01  7:32 zhangmengting
  2017-12-01  8:03 ` Wangnan (F)
  2017-12-05 17:40 ` Andi Kleen
  0 siblings, 2 replies; 4+ messages in thread
From: zhangmengting @ 2017-12-01  7:32 UTC
  To: linux-perf-users
  Cc: acme, namhyung, jolsa, huawei.libin, cj.chengjian, zhangmengting

[-- Attachment #1: Type: text/plain, Size: 4196 bytes --]

Hi all,

I found that perf record spends a very long time attaching to and
monitoring a Java process that contends on a lock, although the Java
process itself runs for less than one minute when it is not traced.

Attachment 1 (ContextSwitchTest.java) is the Java source code used to
reproduce the problem. The code is compiled and run with the following
commands. The arguments of the process are <number of RUNS>
(how many times the test code is executed) and <lock ITERATES>
(how many times each thread acquires the lock).
With arguments <1, 1000000>, the process runs for less than one minute.

$javac ContextSwitchTest.java

$java ContextSwitchTest
Usage:
java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints  ContextSwitchTest  <number of RUNS>  <lock ITERATES>
Example:
java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints  ContextSwitchTest 1 1000000

I have tested this on both x86 and ARM64 platforms with a 4.14 kernel
and perf 4.14.
For convenience, I have added time check code to measure the execution
time of perf record.
Attachment 2 is the time check patch
(0001-perf-record-add-execution-time-check-code.patch).

The test results are shown below:
1) On x86 platform
a. The execution time of this Java process
$java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ContextSwitchTest 1 1000000
RUNS : 1, ITERATES : 1000000
Name : Thread-0, 21
Name : Thread-1, 22
parks: 979010
parks: 978929
Average time: 28642ns
Total time: 56081313428ns = 56s
b. The execution time of perf monitoring this process with several event groups
$perf record -N -B -T -g -e 
'{cycles,r008,r01b,r10c,r009},{cycles,r100,r102,r107,r108},\
{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\
{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\
{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\
{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108}'\
java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
ContextSwitchTest 1 1000000
record_open_RUN is 0.235386s
RUNS : 1, ITERATES : 1000000
Name : Thread-0, 21
Name : Thread-1, 22
parks: 997895
parks: 998116
Average time: 72197ns
Total time: 144107437593ns = 144s
pollfd_RUN is 169.4294951967s
[ perf record: Woken up 148 times to write data ]
[ perf record: Captured and wrote 0.060 MB perf.data ]
Record_RUN is 170.4294783665s

2) On ARM64 platform
a. The execution time of this Java process
$java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ContextSwitchTest 1 1000000
RUNS : 1, ITERATES : 1000000
Name : Thread-0, 24
Name : Thread-1, 25
parks: 977285
parks: 977279
Average time: 4708ns
Total time: 9203640720ns = 9s
b. The execution time of perf monitoring this process with several event groups
$perf record -N -B -T -g -e'{cycles,r008,r01b,r10c,r009,r010,r012},\
{cycles,r100,r102,r107,r108,r076,r078},{cycles,r001,r002,r014,r179,r177},\
{cycles,r121,r122,r123,r124,r125,r126},{cycles,r040,r042,r050,r052,r060,r061},\
{cycles,r003,r004,r005,r016,r017},{cycles,r070,r071,r073,r074,r075,r077},\
{cycles,r112,r113,r12c,r111,r120},{cycles,r06c,r06d,r06e,r07c,r07d,r07e},\
{cycles,r150,r151,r152,r16a,r079,r07a}' \
java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
ContextSwitchTest 1 1000000
record_open_RUN is 0.40954s
RUNS : 1, ITERATES : 1000000
Name : Thread-0, 24
Name : Thread-1, 25
parks: 1008468
parks: 1003826
Average time: 1154505ns
Total time: 2323208237220ns = 2323s
pollfd_RUN is 2326.645806s
[ perf record: Woken up 18463 times to write data ]
[ perf record: Captured and wrote 6263.982 MB perf.data ]
Record_RUN is 2328.4294867157s
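
A note on reading the *_RUN values above: the time check patch prints the
tv_sec difference and the tv_usec difference as two separate unsigned
integers, so the part after the dot is not a decimal fraction. When
end.tv_usec is smaller than start.tv_usec, the difference is negative and
wraps around in the cast to unsigned int, which is where values such as
4294951967 come from. The following is only a minimal sketch of how such a
value can be decoded, assuming a 32-bit unsigned int; elapsed_usec() and
decode_printed() are hypothetical helper names and are not part of perf.

```c
#include <stdint.h>
#include <sys/time.h>

/* Elapsed microseconds between two gettimeofday() samples. */
static int64_t elapsed_usec(const struct timeval *start,
			    const struct timeval *end)
{
	return (int64_t)(end->tv_sec - start->tv_sec) * 1000000
		+ (int64_t)(end->tv_usec - start->tv_usec);
}

/*
 * Decode a "<sec>.<usec>" pair as printed by the time check patch.
 * usec_printed is the raw (unsigned int)(end.tv_usec - start.tv_usec).
 * A real tv_usec difference is always below 1000000 in magnitude, so
 * any larger value must be a negative difference that wrapped by 2^32
 * (assumption: unsigned int is 32 bits wide).
 */
static int64_t decode_printed(uint32_t sec_printed, uint32_t usec_printed)
{
	int64_t usec = usec_printed;

	if (usec >= 1000000)
		usec -= 4294967296LL;	/* undo the 2^32 wrap-around */
	return (int64_t)sec_printed * 1000000 + usec;
}
```

For example, decode_printed(169, 4294951967u) returns 168984671, so the
x86 pollfd_RUN value corresponds to about 168.98 seconds, and
"record_open_RUN is 0.40954s" on ARM64 means 40954 microseconds, not
0.40954 seconds.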

The test results show that perf record spends almost all of its time in
the main loop of __cmd_record() that polls the fds.
In addition, it seems that when a large number of events are traced, perf
greatly extends the execution time of the traced process, especially on
the ARM64 platform.
A process that runs for only about 9 seconds now needs about 39 minutes
(2323 seconds), which is somewhat insane.
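
The slowdown implied by the "Total time" values reported above can be
worked out directly. This is only a small sketch of the arithmetic;
slowdown() and to_min_sec() are hypothetical helper names introduced for
illustration.

```c
#include <stdint.h>

/* Ratio of the traced run time to the untraced run time. */
static double slowdown(uint64_t traced_ns, uint64_t baseline_ns)
{
	return (double)traced_ns / (double)baseline_ns;
}

/* Whole minutes and leftover seconds for a duration in nanoseconds. */
static void to_min_sec(uint64_t ns, unsigned *min, unsigned *sec)
{
	uint64_t s = ns / 1000000000ULL;

	*min = (unsigned)(s / 60);
	*sec = (unsigned)(s % 60);
}
```

With the totals above, slowdown(144107437593ULL, 56081313428ULL) is about
2.6 on x86, while slowdown(2323208237220ULL, 9203640720ULL) is about 252
on ARM64, and to_min_sec() turns the ARM64 traced total into 38 minutes
and 43 seconds.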

I am confused about how perf affects the traced process, and whether the
final perf.data is still accurate, given that perf has changed the
behavior of the traced process so much.
Is there something wrong with perf?

Thanks,
Mengting Zhang


[-- Attachment #2: ContextSwitchTest.java --]
[-- Type: text/java, Size: 2634 bytes --]

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;    

public final class ContextSwitchTest {
    static int RUNS = 1;
    static int ITERATES = 1000;
    static AtomicReference<Thread> turn = new AtomicReference<>();

    static final class WorkerThread extends Thread {
        volatile Thread other;
        volatile int nparks;
        public void run() {
            final AtomicReference<Thread> t = turn;
            final Thread other = this.other;

            Thread current = Thread.currentThread();
            System.out.println("Name : " + current.getName() + ", " + current.getId());

            if (turn == null || other == null)
                throw new NullPointerException();
            int p = 0;
            for (int i = 0; i < ITERATES; ++i) {
                while (!t.compareAndSet(other, this)) {
                    LockSupport.park();
                    ++p;
                }
                LockSupport.unpark(other);
            }
            LockSupport.unpark(other);
            nparks = p;
            System.out.println("parks: " + p);

        }
    }

    static void test() throws Exception {
        WorkerThread a = new WorkerThread();
        WorkerThread b = new WorkerThread();
        a.other = b;
        b.other = a;
        turn.set(a);
        long startTime = System.nanoTime();
        a.start();
        b.start();
        a.join();
        b.join();
        long endTime = System.nanoTime();
        int parkNum = a.nparks + b.nparks;
        System.out.println("Average time: " + ((endTime - startTime) / parkNum)
                           + "ns");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: \n" +
                               "java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints  ContextSwitchTest  <number of RUNS>  <lock ITERATES>");
            System.out.println("Example: \n" +
                               "java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints  ContextSwitchTest 1 1000000");
            System.exit(0);
        }
        // Exactly two arguments are present here: <number of RUNS> <lock ITERATES>.
        RUNS = Integer.parseInt(args[0]);
        ITERATES = Integer.parseInt(args[1]);

        System.out.println("RUNS : " + RUNS + ", ITERATES : " + ITERATES);
        long startTime = System.nanoTime();
        for (int i = 0; i < RUNS; i++) {
            test();
        }
        long endTime = System.nanoTime();
        System.out.println("Total time: " + ((endTime - startTime)) + "ns = " + (endTime - startTime) / 1000000000 + "s");
    }
}

[-- Attachment #3: 0001-perf-record-add-execution-time-check-code.patch --]
[-- Type: text/plain, Size: 3279 bytes --]

From f21d8b2f7329785da27548e61152d7cd542d9ee1 Mon Sep 17 00:00:00 2001
From: Mengting Zhang <zhangmengting@huawei.com>
Date: Fri, 1 Dec 2017 13:43:57 +0800
Subject: [PATCH] perf record: add execution time check code

"record_open_RUN" means the time of record__open();
"Record_RUN" means the time of cmd_record();
"pollfd_RUN" means the time of main part of __cmd_record()
polling fds;

Test it:
$perf record sleep 1
$record_open_RUN is 1.4294047351s
pollfd_RUN is 1.8617s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (28 samples) ]
Record_RUN is 2.4294628690s

Signed-off-by: Mengting Zhang <zhangmengting@huawei.com>
---
 tools/perf/builtin-record.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 56f8142..f0f0dab 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -50,6 +50,7 @@
 #include <signal.h>
 #include <sys/mman.h>
 #include <sys/wait.h>
+#include <sys/time.h>
 #include <asm/bug.h>
 #include <linux/time64.h>
 
@@ -432,6 +433,8 @@ static int record__open(struct record *rec)
 	struct record_opts *opts = &rec->opts;
 	struct perf_evsel_config_term *err_term;
 	int rc = 0;
+	struct timeval start, end;
+	gettimeofday(&start, NULL);
 
 	perf_evlist__config(evlist, opts, &callchain_param);
 
@@ -475,6 +478,10 @@ static int record__open(struct record *rec)
 	session->evlist = evlist;
 	perf_session__set_id_hdr_size(session);
 out:
+	gettimeofday(&end, NULL);
+	printf("record_open_RUN is %u.%us\n",
+		(unsigned int)(end.tv_sec - start.tv_sec),
+		(unsigned int)(end.tv_usec - start.tv_usec));
 	return rc;
 }
 
@@ -881,6 +888,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	struct perf_session *session;
 	bool disabled = false, draining = false;
 	int fd;
+	struct timeval start, end;
 
 	rec->progname = argv[0];
 
@@ -1051,6 +1059,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	trigger_ready(&auxtrace_snapshot_trigger);
 	trigger_ready(&switch_output_trigger);
 	perf_hooks__invoke_record_start();
+	gettimeofday(&start, NULL);
 	for (;;) {
 		unsigned long long hits = rec->samples;
 
@@ -1148,6 +1157,11 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			disabled = true;
 		}
 	}
+	gettimeofday(&end, NULL);
+	printf("pollfd_RUN is %u.%us\n",
+		(unsigned int)(end.tv_sec - start.tv_sec),
+		(unsigned int)(end.tv_usec - start.tv_usec));
+
 	trigger_off(&auxtrace_snapshot_trigger);
 	trigger_off(&switch_output_trigger);
 
@@ -1688,6 +1702,8 @@ int cmd_record(int argc, const char **argv)
 	int err;
 	struct record *rec = &record;
 	char errbuf[BUFSIZ];
+	struct timeval start, end;
+	gettimeofday(&start, NULL);	
 
 #ifndef HAVE_LIBBPF_SUPPORT
 # define set_nobuild(s, l, c) set_option_nobuild(record_options, s, l, "NO_LIBBPF=1", c)
@@ -1884,6 +1900,12 @@ int cmd_record(int argc, const char **argv)
 	perf_evlist__delete(rec->evlist);
 	symbol__exit();
 	auxtrace_record__free(rec->itr);
+
+	gettimeofday(&end, NULL);
+	printf("Record_RUN is %u.%us\n",
+		(unsigned int)(end.tv_sec - start.tv_sec),
+		(unsigned int)(end.tv_usec - start.tv_usec));
+
 	return err;
 }
 
-- 
1.7.12.4

