From: "Wangnan (F)" <wangnan0@huawei.com>
To: zhangmengting <zhangmengting@huawei.com>,
	linux-perf-users@vger.kernel.org
Cc: acme@redhat.com, namhyung@kernel.org, jolsa@redhat.com,
	huawei.libin@huawei.com, cj.chengjian@huawei.com,
	Andi Kleen <andi@firstfloor.org>
Subject: Re: Question to perf spending a large amount of time monitoring a java process
Date: Fri, 1 Dec 2017 16:03:25 +0800	[thread overview]
Message-ID: <fdd35498-4678-7e0a-79b2-9a8ea604a0e0@huawei.com> (raw)
In-Reply-To: <20d3fa12-295c-84d3-2eae-aa32b5757d19@huawei.com>

I have contacted Mengting by phone. My suggestion is listed below.

I think what she wants is to generate data for a top-down analysis on
ARM64. Adding Andi Kleen to the Cc list to see if he has any suggestions.

On 2017/12/1 15:32, zhangmengting wrote:
> Hi all,
>
> I found that perf spends a large amount of time attaching to and monitoring
> a Java process that uses locks, although the execution time of the Java
> process is below 1 minute.
>
> Attachment 1 (ContextSwitchTest.java) is the Java source code used to
> reproduce the problem. The code is compiled and run with the following
> commands. The arguments of the process are <number of RUNS>
> (how many times the test code will be executed) and <lock ITERATES>
> (how many times each thread acquires the lock).
> With arguments <1, 1000000>, the execution time of the process is under
> one minute.
>
> $javac ContextSwitchTest.java
>
> $java ContextSwitchTest
> Usage:
> java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
> ContextSwitchTest  <number of RUNS>  <lock ITERATES>
> Example:
> java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
> ContextSwitchTest 1 1000000
>
> I've tested the problem on both x86 and ARM64 platforms with a 4.14
> kernel and 4.14 perf.
> For convenience, I've added timing code to measure the execution time
> of perf record.
> Attachment 2 is the timing patch
> (0001-perf-record-add-execution-time-check-code.patch).
>
> The test result is shown below:
> 1) On x86 platform
> a. The execution time of this java process
> $java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
> ContextSwitchTest 1 1000000
> RUNS : 1, ITERATES : 1000000
> Name : Thread-0, 21
> Name : Thread-1, 22
> parks: 979010
> parks: 978929
> Average time: 28642ns
> Total time: 56081313428ns = 56s
> b. The execution time of perf monitoring this process with several 
> event groups
> $perf record -N -B -T -g -e '{cycles,r008,r01b,r10c,r009},{cycles,r100,r102,r107,r108},\
> {cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\
> {cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\
> {cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\
> {cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108}' \

There are too many event groups. They introduce heavy overhead on context
switching, which is exactly what your benchmark measures. Adding '-a'
(system-wide mode) may help reduce the event group switching overhead.
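
For example, something along these lines (an untested sketch; the event
codes are only copied from your command, keep just the groups you actually
need for the top-down metrics):

  $ perf record -N -B -T -g -a \
        -e '{cycles,r008,r01b,r10c,r009},{cycles,r100,r102,r107,r108}' -- \
        java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints \
        ContextSwitchTest 1 1000000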

> java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
> ContextSwitchTest 1 1000000
> record_open_RUN is 0.235386s
> RUNS : 1, ITERATES : 1000000
> Name : Thread-0, 21
> Name : Thread-1, 22
> parks: 997895
> parks: 998116
> Average time: 72197ns
> Total time: 144107437593ns = 144s
> pollfd_RUN is 169.4294951967s
> [ perf record: Woken up 148 times to write data ]
> [ perf record: Captured and wrote 0.060 MB perf.data ]
> Record_RUN is 170.4294783665s
>
> 2) On ARM64 platform
> a. The execution time of this java process
> $java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
> ContextSwitchTest 1 1000000
> RUNS : 1, ITERATES : 1000000
> Name : Thread-0, 24
> Name : Thread-1, 25
> parks: 977285
> parks: 977279
> Average time: 4708ns
> Total time: 9203640720ns = 9s
> b. The execution time of perf monitoring this process with several 
> event groups
> $perf record -N -B -T -g -e '{cycles,r008,r01b,r10c,r009,r010,r012},\
> {cycles,r100,r102,r107,r108,r076,r078},{cycles,r001,r002,r014,r179,r177},\
> {cycles,r121,r122,r123,r124,r125,r126},{cycles,r040,r042,r050,r052,r060,r061},\
> {cycles,r003,r004,r005,r016,r017},{cycles,r070,r071,r073,r074,r075,r077},\
> {cycles,r112,r113,r12c,r111,r120},{cycles,r06c,r06d,r06e,r07c,r07d,r07e},\
> {cycles,r150,r151,r152,r16a,r079,r07a}' \
> java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
> ContextSwitchTest 1 1000000
> record_open_RUN is 0.40954s
> RUNS : 1, ITERATES : 1000000
> Name : Thread-0, 24
> Name : Thread-1, 25
> parks: 1008468
> parks: 1003826
> Average time: 1154505ns
> Total time: 2323208237220ns = 2323s
> pollfd_RUN is 2326.645806s
> [ perf record: Woken up 18463 times to write data ]
> [ perf record: Captured and wrote 6263.982 MB perf.data ]

The event frequency seems too high. Please try -F (sample frequency) or
-c (sample period) to control the data rate. When using '-a', you also
need '--exclude-perf' to filter out the events generated by perf itself.

To reduce IO overhead, you can enlarge the ring buffer with '-m' or switch
to overwrite mode.
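
A rough sketch of what I mean (untested; the frequency and mmap size are
only placeholders, please tune them for your machine, '--overwrite' is
what I mean by overwrite mode, and I left out '--exclude-perf' here
because, if I remember correctly, it only works after a tracepoint -e):

  $ perf record -N -B -T -g -a -F 999 -m 512 --overwrite \
        -e '{cycles,r100,r102,r107,r108,r076,r078}' -- \
        java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints \
        ContextSwitchTest 1 1000000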

> Record_RUN is 2328.4294867157s
>
> The test results show that perf consumes most of its time polling fds.
> In addition, it seems that when tracing a large number of events, perf may
> extend the execution time of the traced process, especially on the ARM64
> platform.
> A process that runs for only 10 seconds now needs nearly 40 minutes of
> execution time, which is somewhat insane.
>
> I am confused about how perf affects the traced process, and whether the
> final perf.data is still accurate given that perf has affected the traced
> process.
> Is there something wrong with perf?
>
> Thanks,
> Mengting Zhang
>

Thread overview: 4+ messages
2017-12-01  7:32 Question to perf spending a large amount of time monitoring a java process zhangmengting
2017-12-01  8:03 ` Wangnan (F) [this message]
2017-12-05 17:40 ` Andi Kleen
2017-12-05 17:52   ` Peter Zijlstra
