From: Alexey Budankov <alexey.budankov@linux.intel.com>
To: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andi Kleen <andi@firstfloor.org>
Subject: Re: [RFCv2 00/48] perf tools: Add threads to record command
Date: Fri, 21 Sep 2018 09:13:08 +0300
Message-ID: <4f63c3d5-2a33-28ed-4e45-086045e9ab50@linux.intel.com>
In-Reply-To: <71153c79-f0b9-4bf7-7491-202f46c6b5ed@linux.intel.com>

Hello Jiri,

On 14.09.2018 12:37, Alexey Budankov wrote:
> On 14.09.2018 11:28, Jiri Olsa wrote:
>> On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote:
>>
>> SNIP
>>
>>>>> The threaded monitoring currently can't monitor backward maps
>>>>> and there are probably more limitations which I haven't spotted
>>>>> yet.
>>>>>
>>>>> So far I tested on laptop:
>>>>>   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
>>>>>
>>>>> and on one bigger server:
>>>>>   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
>>>>>
>>>>> I can see decrease in recorded LOST events, but both the benchmark
>>>>> and the monitoring must be carefully configured wrt:
>>>>>   - number of events (frequency)
>>>>>   - size of the memory maps
>>>>>   - size of events (callchains)
>>>>>   - final perf.data size
>>>>>
>>>>> It's also available in:
>>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>>>>>   perf/record_threads
>>>>>
>>>>> thoughts? ;-) thanks
>>>>> jirka
>>>>
>>>> It is preferable to split this into smaller pieces, each
>>>> bringing an improvement backed by metrics and ready for
>>>> merging upstream. Do we have more metrics than the
>>>> data loss from the trace AIO patches?
>>>
>>> well the primary focus is to get more events in,
>>> so the LOST metric is the main one
>>
>> actually I was hoping, could you please run it through the same
>> tests as you do for AIO code on some huge server? 
> 
> Yeah, I will, but it takes some time.

Here it is:

Hardware:
cat /proc/cpuinfo
processor	: 271
vendor_id	: GenuineIntel
cpu family	: 6
model		: 133
model name	: Intel(R) Xeon Phi(TM) CPU 7285 @ 1.30GHz
stepping	: 0
microcode	: 0xe
cpu MHz		: 1064.235
cache size	: 1024 KB
physical id	: 0
siblings	: 272
core id		: 73
cpu cores	: 68
apicid		: 295
initial apicid	: 295
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts avx512_vpopcntdq avx512_4vnniw avx512_4fmaps
bugs		: cpu_meltdown spectre_v1 spectre_v2
bogomips	: 2594.07
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

uname -a
Linux nntpat98-196 4.18.0-rc7+ #2 SMP Thu Sep 6 13:24:37 MSK 2018 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/sys/kernel/perf_event_paranoid
0

cat /proc/sys/kernel/perf_event_mlock_kb 
516

cat /proc/sys/kernel/perf_event_max_sample_rate 
3000

cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Metrics:
runtime overhead (x) : elapsed_time_under_profiling / elapsed_time
data loss (%)        : 100 * paused_time / elapsed_time_under_profiling
LOST events          : stat from perf report --stats
SAMPLE events        : stat from perf report --stats
perf.data size (GiB) : size of trace file on disk
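
As a rough illustration of how the first two metrics are derived, here is a minimal Python sketch. The function names are hypothetical and not part of perf. The overhead example uses the wall-clock times of the matrix.icc run with T=272 and P=0.1 reported further down in this message. The paused time passed to the data-loss helper is a made-up value, because the message gives only the resulting percentages.

```python
def runtime_overhead(elapsed_under_profiling_s, elapsed_s):
    """Slowdown factor: profiled wall time divided by unprofiled wall time."""
    return elapsed_under_profiling_s / elapsed_s


def data_loss_percent(paused_s, elapsed_under_profiling_s):
    """Share of the profiled run during which collection was paused, in percent."""
    return 100.0 * paused_s / elapsed_under_profiling_s


# Wall-clock times of the matrix.icc run with T=272, P=0.1 from this message.
overhead = runtime_overhead(323.54, 7.12)
print(f"runtime overhead: {overhead:.1f}x")

# The paused time below is an illustrative value only; the message does not
# report raw paused times.
loss = data_loss_percent(30.0, 120.0)
print(f"data loss: {loss:.0f}%")
```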

Events:
cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD

=================================================

Command:
/usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
        -e cpu/period=P,event=0x3c/Duk,\
           cpu/period=P,umask=0x3/Duk,\
           cpu/period=P,event=0xc0/Duk,\
           cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
           cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
           cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
         --clockid=monotonic_raw -- ./matrix.(icc|gcc)
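
The fixed periods in the command above are given in hex. The sketch below decodes them to decimal, assuming the usual meaning of period=N (one sample every N occurrences of the event). It is a toy parser written for this illustration, not perf's own event parser, and parse_event is a hypothetical helper name. Entries using the P placeholder are left out because they have no fixed value.

```python
import re

# Event specifications copied from the perf record command line above.
EVENT_SPECS = [
    "cpu/period=0x30d40,event=0xc2,umask=0x10/uk",
    "cpu/period=0x4e20,event=0xc2,umask=0x20/uk",
    "cpu/period=0x4e20,event=0xc2,umask=0x40/uk",
]


def parse_event(spec):
    """Split 'pmu/key=value,.../modifiers' into (pmu, fields, modifiers)."""
    match = re.fullmatch(r"([^/]+)/([^/]*)/(\w*)", spec)
    if match is None:
        raise ValueError(f"unrecognized event spec: {spec!r}")
    pmu, terms, modifiers = match.groups()
    fields = {}
    for term in terms.split(","):
        key, _, value = term.partition("=")
        fields[key] = int(value, 0)  # base 0 accepts 0x-prefixed hex and decimal
    return pmu, fields, modifiers


for spec in EVENT_SPECS:
    pmu, fields, modifiers = parse_event(spec)
    print(f"{spec}: one sample every {fields['period']} occurrences")
```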

Workload: matrix multiplication in 256 threads

/usr/bin/time ./matrix.icc
Addr of buf1 = 0x7ff9faa73010
Offs of buf1 = 0x7ff9faa73180
Addr of buf2 = 0x7ff9f8a72010
Offs of buf2 = 0x7ff9f8a721c0
Addr of buf3 = 0x7ff9f6a71010
Offs of buf3 = 0x7ff9f6a71100
Addr of buf4 = 0x7ff9f4a70010
Offs of buf4 = 0x7ff9f4a70140
Threads #: 256 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Freq = 0.997720 GHz
Execution time = 9.061 seconds
1639.55user 6.59system 0:07.12elapsed 23094%CPU (0avgtext+0avgdata 100448maxresident)k
96inputs+0outputs (1major+33839minor)pagefaults 0swaps

T : 272
	P (period, ms)       : 0.1
	runtime overhead (x) : 45x ~ 323.54 / 7.12
	data loss (%)        : 96
	LOST events          : 323662
	SAMPLE events        : 31885479
	perf.data size (GiB) : 42

	P (period, ms)       : 0.25
	runtime overhead (x) : 25x ~ 180.76 / 7.12
	data loss (%)        : 69
	LOST events          : 10636
	SAMPLE events        : 18692998
	perf.data size (GiB) : 23.5

	P (period, ms)       : 0.35
	runtime overhead (x) : 16x ~ 119.49 / 7.12
	data loss (%)        : 1
	LOST events          : 6
	SAMPLE events        : 11178524
	perf.data size (GiB) : 14

T : 128
	P (period, ms)       : 0.35
	runtime overhead (x) : 15x ~ 111.98 / 7.12
	data loss (%)        : 62
	LOST events          : 2825
	SAMPLE events        : 11267247
	perf.data size (GiB) : 15

T : 64
	P (period, ms)       : 0.35
	runtime overhead (x) : 14x ~ 101.55 / 7.12
	data loss (%)        : 67
	LOST events          : 5155
	SAMPLE events        : 10966297
	perf.data size (GiB) : 13.7
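
One way to sanity-check the matrix.icc numbers above is to divide the trace size by the SAMPLE count. The sketch below does that. It only gives an upper bound on the average sample size, because perf.data also holds non-sample records, and the sizes are rounded in the report. The helper name bytes_per_sample is hypothetical.

```python
GIB = 1024 ** 3

# (T, P in ms, SAMPLE events, perf.data size in GiB) for the matrix.icc runs above.
ICC_RUNS = [
    (272, 0.10, 31885479, 42.0),
    (272, 0.25, 18692998, 23.5),
    (272, 0.35, 11178524, 14.0),
    (128, 0.35, 11267247, 15.0),
    (64, 0.35, 10966297, 13.7),
]


def bytes_per_sample(samples, size_gib):
    """Average trace bytes per SAMPLE event (upper bound on the sample size)."""
    return size_gib * GIB / samples


for threads, period_ms, samples, size_gib in ICC_RUNS:
    avg = bytes_per_sample(samples, size_gib)
    print(f"T={threads:3d} P={period_ms:.2f} ms: ~{avg:.0f} bytes per sample")
```

A figure somewhat above 1 KiB per sample would be consistent with --call-graph dwarf,1024, which copies up to 1024 bytes of user stack into each sample.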

Workload: matrix multiplication in 128 threads

/usr/bin/time ./matrix.gcc
Addr of buf1 = 0x7f072e630010
Offs of buf1 = 0x7f072e630180
Addr of buf2 = 0x7f072c62f010
Offs of buf2 = 0x7f072c62f1c0
Addr of buf3 = 0x7f072a62e010
Offs of buf3 = 0x7f072a62e100
Addr of buf4 = 0x7f072862d010
Offs of buf4 = 0x7f072862d140
Threads #: 128 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Execution time = 6.639 seconds
767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 100756maxresident)k
88inputs+0outputs (0major+139898minor)pagefaults 0swaps

T : 272
	P (period, ms)       : 0.1
	runtime overhead (x) : 29x ~ 198.81 / 6.81
	data loss (%)        : 21
	LOST events          : 2502
	SAMPLE events        : 22481062
	perf.data size (GiB) : 27.6

	P (period, ms)       : 0.25
	runtime overhead (x) : 13x ~ 88.47 / 6.81
	data loss (%)        : 0
	LOST events          : 0
	SAMPLE events        : 9572787
	perf.data size (GiB) : 11.3

	P (period, ms)       : 0.35
	runtime overhead (x) : 10x ~ 67.11 / 6.81
	data loss (%)        : 1
	LOST events          : 137
	SAMPLE events        : 6985930
	perf.data size (GiB) : 8

T : 128
	P (period, ms)       : 0.35
	runtime overhead (x) : 9.5x ~ 64.33 / 6.81
	data loss (%)        : 1
	LOST events          : 3
	SAMPLE events        : 6666903
	perf.data size (GiB) : 7.8

T : 64
	P (period, ms)       : 0.25
	runtime overhead (x) : 17x ~ 114.27 / 6.81
	data loss (%)        : 2
	LOST events          : 52
	SAMPLE events        : 12643645
	perf.data size (GiB) : 15.5

	P (period, ms)       : 0.35
	runtime overhead (x) : 10x ~ 68.60 / 6.81
	data loss (%)        : 1
	LOST events          : 93
	SAMPLE events        : 7164368
	perf.data size (GiB) : 8.5

Thanks,
Alexey

> 
>>
>> thanks,
>> jirka
>>
> 

