Hi all, I found that perf spends a large amount of time attaching and monitoring a java process with lock, although the execution time of the java process is below 1 minute. Attachment 1(ContextSwitchTest.java) is the java source code used to reproduce the problem. The code is compiled and run with the following commands. The arguments of the process are (how many times the test code will be excuated) and (how many times the thread acquires lock). With arguments <1, 1000000>, the execution time of the process is just one minute. $javac ContextSwitchTest.java $java ContextSwitchTest Usage: java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ContextSwitchTest Example: java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ContextSwitchTest 1 1000000 I've tested the problem on both x86 and ARM64 platform with 4.14 kernel and 4.14 perf. And for convenience, I've add time check code to detect the execution time for perf record. Attachment 2 is the time check patch (0001-perf-record-add-execution-time-check-code.patch) The test result is shown below: 1) On x86 platform a. The execution time of this java process $java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ContextSwitchTest 1 1000000 RUNS : 1, ITERATES : 1000000 Name : Thread-0, 21 Name : Thread-1, 22 parks: 979010 parks: 978929 Average time: 28642ns Total time: 56081313428ns = 56s b. The execution time of perf monitoring this process with several event groups $perf record -N -B -T -g -e '{cycles,r008,r01b,r10c,r009},{cycles,r100,r102,r107,r108},\ {cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\ {cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\ {cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},\ {cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108},{cycles,r100,r102,r107,r108}'\ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ContextSwitchTest 1 1000000 record_open_RUN is 0.235386s RUNS : 1, ITERATES : 1000000 Name : Thread-0, 21 Name : Thread-1, 22 parks: 997895 parks: 998116 Average time: 72197ns Total time: 144107437593ns = 144s pollfd_RUN is 169.4294951967s [ perf record: Woken up 148 times to write data ] [ perf record: Captured and wrote 0.060 MB perf.data ] Record_RUN is 170.4294783665s 2) On ARM64 platform a. The execution time of this java process $java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ContextSwitchTest 1 1000000 RUNS : 1, ITERATES : 1000000 Name : Thread-0, 24 Name : Thread-1, 25 parks: 977285 parks: 977279 Average time: 4708ns Total time: 9203640720ns = 9s b. The execution time of perf monitoring this process with several event groups $perf record -N -B -T -g -e'{cycles,r008,r01b,r10c,r009,r010,r012},\ {cycles,r100,r102,r107,r108,r076,r078},{cycles,r001,r002,r014,r179,r177},\ {cycles,r121,r122,r123,r124,r125,r126},{cycles,r040,r042,r050,r052,r060,r061},\ {cycles,r003,r004,r005,r016,r017},{cycles,r070,r071,r073,r074,r075,r077},\ {cycles,r112,r113,r12c,r111,r120},{cycles,r06c,r06d,r06e,r07c,r07d,r07e},\ {cycles,r150,r151,r152,r16a,r079,r07a}' \ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ContextSwitchTest 1 1000000 record_open_RUN is 0.40954s RUNS : 1, ITERATES : 1000000 Name : Thread-0, 24 Name : Thread-1, 25 parks: 1008468 parks: 1003826 Average time: 1154505ns Total time: 2323208237220ns = 2323s pollfd_RUN is 2326.645806s [ perf record: Woken up 18463 times to write data ] [ perf record: Captured and wrote 6263.982 MB perf.data ] Record_RUN is 2328.4294867157s The test result shows that perf consumes most of the time polling fds. In addtion, it seems that when tracing a great amount of events, perf may extend the execution time of the traced process, especially on ARM64 platform. A process that runs only 10 seconds now needs an hour execution time, which is somewhat insane. I confuses that how perf affects the traced process and whether the final perf.data is still accurate since perf has affected the traced process? Is there something wrong with perf? Thanks, Mengting Zhang