* [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data
@ 2022-05-22 22:14 Tyson Thomas
  2022-05-24 14:58 ` Liang, Kan
  0 siblings, 1 reply; 5+ messages in thread
From: Tyson Thomas @ 2022-05-22 22:14 UTC (permalink / raw)
  To: kan.liang, linux-kernel, linux-perf-users

Hi Kan, Linux-Perf Team

I have observed some odd behaviour within perf when using perf-mem. Specifically, the reported latency under the weight column appears to be unreasonably high.

Here is a sample from a recent test. I find that some of the latencies are close to the maximum of an unsigned short, and I cannot understand why that would be, other than it being an issue with the perf events.

This can be replicated using a NAS benchmark, specifically cg.D.
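
For reference, the rough steps to reproduce are something like the following (the ./bin/cg.D.x path is just where my NPB build places the binary, so it may differ on other setups):

$ perf mem record ./bin/cg.D.x   # sample memory loads while the benchmark runs
$ perf mem report --stdio        # report per-sample latency under the weight column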

I observe the following results in perf mem report (showing just the top 10 results):

 0.02% ,62515 ,L1 or L1 hit 
 0.02% ,54048 ,L1 or L1 hit 
 0.02% ,52206 ,L1 or L1 hit 
 0.02% ,49831 ,L1 or L1 hit 
 0.02% ,49056 ,Local RAM or RAM hit 
 0.01% ,40666 ,LFB or LFB hit 
 0.01% ,38080 ,L1 or L1 hit 
 0.01% ,36772 ,L1 or L1 hit 
 0.01% ,36729 ,LFB or LFB hit 
 0.01% ,27101 ,LFB or LFB hit

Is it possible for someone to shed some light on this, or am I misunderstanding how the weight column is used here?
This appears to have been an issue on 5.4, 5.10, and 5.15. I am checking whether it is still present in 5.17 and 5.18.

I've also tried this on different Intel CPUs, such as the Intel Xeon 6230, i5-1135G7, and Intel Xeon 6330.

Any insight or help would be appreciated,
Tyson


* Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data
  2022-05-22 22:14 [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data Tyson Thomas
@ 2022-05-24 14:58 ` Liang, Kan
  2022-06-01  5:03   ` Tyson Thomas
  0 siblings, 1 reply; 5+ messages in thread
From: Liang, Kan @ 2022-05-24 14:58 UTC (permalink / raw)
  To: Tyson Thomas, linux-kernel, linux-perf-users



On 5/22/2022 6:14 PM, Tyson Thomas wrote:
> Hi Kan, Linux-Perf Team
> 
> I have observed some odd behaviour within perf when using perf-mem. Specifically, the reported latency under the weight column appears to be unreasonably high.
> 
> Here is a sample from a recent test. I find that some of the latencies are close to the maximum of an unsigned short, and I cannot understand why that would be, other than it being an issue with the perf events.
> 
> This can be replicated using a NAS benchmark, specifically cg.D.
> 
> I observe the following results in perf mem report (just getting the top 10 results)
>


Could you please show me the exact perf command used?

With my perf mem report output:
The first column is the Overhead.
The second column is the number of samples.
The third column is the weight.
The fourth is the Memory access.

It seems like the weight column is missing from your output?

Could you please check perf report -D?
It will dump the weight for each sample.
Does it look correct?
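
For example, something along these lines should be enough to spot-check the recorded weights (assuming the dump prints a "weight:" field for each PERF_RECORD_SAMPLE; the exact format may vary with the perf version):

$ perf report -D | grep -i weight | head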

Thanks,
Kan

>   0.02% ,62515 ,L1 or L1 hit
>   0.02% ,54048 ,L1 or L1 hit
>   0.02% ,52206 ,L1 or L1 hit
>   0.02% ,49831 ,L1 or L1 hit
>   0.02% ,49056 ,Local RAM or RAM hit
>   0.01% ,40666 ,LFB or LFB hit
>   0.01% ,38080 ,L1 or L1 hit
>   0.01% ,36772 ,L1 or L1 hit
>   0.01% ,36729 ,LFB or LFB hit
>   0.01% ,27101 ,LFB or LFB hit
> 
> Is it possible for someone to shed some light on this or am I misunderstanding how the weight column is used here?
> This appears to have been an issue on 5.4, 5.10 and 5.15. I am looking into seeing if it is still present in 5.17 and 5.18.
> 
> I've also tried this on different Intel CPUs such as Intel Xeon 6230, i5-1135G7, Intel Xeon 6330
> 
> Any insight or help would be appreciated,
> Tyson


* Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data
  2022-05-24 14:58 ` Liang, Kan
@ 2022-06-01  5:03   ` Tyson Thomas
  2022-06-03 14:03     ` Liang, Kan
  0 siblings, 1 reply; 5+ messages in thread
From: Tyson Thomas @ 2022-06-01  5:03 UTC (permalink / raw)
  To: Liang, Kan, linux-kernel, linux-perf-users

Hi Kan,

Thanks for the reply.

The exact command used was: perf mem report

The columns are, in order: Overhead, Samples, Local Weight, Memory access.
The results I sent before show: Overhead, Local Weight, and Memory access.

To update the table from before: each row has a sample count of 1, so it is:

   0.02%,1 ,62515 ,L1 or L1 hit
   0.02%,1 ,54048 ,L1 or L1 hit
   0.02%,1 ,52206 ,L1 or L1 hit
   0.02%,1 ,49831 ,L1 or L1 hit
   0.02%,1 ,49056 ,Local RAM or RAM hit
   0.01%,1 ,40666 ,LFB or LFB hit
   0.01%,1 ,38080 ,L1 or L1 hit
   0.01%,1 ,36772 ,L1 or L1 hit

So, the weights are absurdly high here. 

I'll give a reply on perf report -D shortly.

Regards,
Tyson



From: Liang, Kan <kan.liang@linux.intel.com>
Sent: Wednesday, 25 May 2022 12:58 AM
To: Tyson Thomas <tyson.thomas@sydney.edu.au>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; linux-perf-users@vger.kernel.org <linux-perf-users@vger.kernel.org>
Subject: Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data 
 


On 5/22/2022 6:14 PM, Tyson Thomas wrote:
> Hi Kan, Linux-Perf Team
> 
> I have observed some odd behaviour within perf when using perf-mem. Specifically, the reported latency under the weight column appears to be unreasonably high.
> 
> Here is a sample from a recent test. I find that some of the latencies are close to the maximum of an unsigned short, and I cannot understand why that would be, other than it being an issue with the perf events.
> 
> This can be replicated using a NAS benchmark, specifically cg.D.
> 
> I observe the following results in perf mem report (just getting the top 10 results)
>


Could you please show me the exact perf command used?

With my perf mem report output:
The first column is the Overhead.
The second column is the number of samples.
The third column is the weight.
The fourth is the Memory access.

It seems like the weight column is missing from your output?

Could you please check perf report -D?
It will dump the weight for each sample.
Does it look correct?

Thanks,
Kan

>   0.02% ,62515 ,L1 or L1 hit
>   0.02% ,54048 ,L1 or L1 hit
>   0.02% ,52206 ,L1 or L1 hit
>   0.02% ,49831 ,L1 or L1 hit
>   0.02% ,49056 ,Local RAM or RAM hit
>   0.01% ,40666 ,LFB or LFB hit
>   0.01% ,38080 ,L1 or L1 hit
>   0.01% ,36772 ,L1 or L1 hit
>   0.01% ,36729 ,LFB or LFB hit
>   0.01% ,27101 ,LFB or LFB hit
> 
> Is it possible for someone to shed some light on this or am I misunderstanding how the weight column is used here?
> This appears to have been an issue on 5.4, 5.10 and 5.15. I am looking into seeing if it is still present in 5.17 and 5.18.
> 
> I've also tried this on different Intel CPUs such as Intel Xeon 6230, i5-1135G7, Intel Xeon 6330
> 
> Any insight or help would be appreciated,
> Tyson


* Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data
  2022-06-01  5:03   ` Tyson Thomas
@ 2022-06-03 14:03     ` Liang, Kan
  2022-06-07 20:39       ` Tyson Thomas
  0 siblings, 1 reply; 5+ messages in thread
From: Liang, Kan @ 2022-06-03 14:03 UTC (permalink / raw)
  To: Tyson Thomas, linux-kernel, linux-perf-users



On 6/1/2022 1:03 AM, Tyson Thomas wrote:
> Hi Kan,
> 
> Thanks for the reply.
> 
> The exact command used was: perf mem report
> 
> The columns are, in order: Overhead, Samples, Local Weight, Memory access.
> The results I sent before show: Overhead, Local Weight, and Memory access.
> 
> To update the table from before: each row has a sample count of 1, so it is:
> 
>     0.02%,1 ,62515 ,L1 or L1 hit
>     0.02%,1 ,54048 ,L1 or L1 hit
>     0.02%,1 ,52206 ,L1 or L1 hit
>     0.02%,1 ,49831 ,L1 or L1 hit
>     0.02%,1 ,49056 ,Local RAM or RAM hit
>     0.01%,1 ,40666 ,LFB or LFB hit
>     0.01%,1 ,38080 ,L1 or L1 hit
>     0.01%,1 ,36772 ,L1 or L1 hit
> 
> So, the weights are absurdly high here.
> 
> I'll give a reply on perf report -D shortly.
> 
> Regards,
> Tyson
> 
> 
> 
> From: Liang, Kan <kan.liang@linux.intel.com>
> Sent: Wednesday, 25 May 2022 12:58 AM
> To: Tyson Thomas <tyson.thomas@sydney.edu.au>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; linux-perf-users@vger.kernel.org <linux-perf-users@vger.kernel.org>
> Subject: Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data
>   
> 
> 
> On 5/22/2022 6:14 PM, Tyson Thomas wrote:
>> Hi Kan, Linux-Perf Team
>>
>> I have observed some odd behaviour within perf when using perf-mem. Specifically, the reported latency under the weight column appears to be unreasonably high.
>>
>> Here is a sample from a recent test. I find that some of the latencies are close to the maximum of an unsigned short, and I cannot understand why that would be, other than it being an issue with the perf events.
>>
>> This can be replicated using a NAS benchmark, specifically cg.D.


I tried the cg.D benchmark on my machine, but I cannot reproduce the 
issue. The weight looks reasonable.

$ perf mem record ./bin/cg.D.x


  NAS Parallel Benchmarks (NPB3.4-OMP) - CG Benchmark

  Size:     1500000
  Iterations:                    100
  Number of available threads:     4

^C[ perf record: Woken up 118 times to write data ]
[ perf record: Captured and wrote 30.108 MB perf.data (438307 samples) ]

$ perf mem report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 209K of event 'cpu/mem-loads,ldlat=30/Pu'
# Total weight : 56835744
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,p_stage_cyc
#
# Overhead       Samples  Local Weight  Memory access             Symbol                            Shared Object         Data Symbol >
# ........  ............  ............  ........................  ................................  ....................  ............>
#
      0.01%            66  53            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.01%            53  58            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.01%            59  52            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            53  53            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            43  65            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            50  55            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            51  50            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            46  55            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            49  51            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2496          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%            48  52            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2441          Local RAM or RAM hit      [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2420          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2415          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2403          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2399          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2390          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2385          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2385          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2378          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2358          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>

Thanks,
Kan

>>
>> I observe the following results in perf mem report (just getting the top 10 results)
>>
> 
> 
> Could you please show me the exact perf command used?
> 
> With my perf mem report output:
> The first column is the Overhead.
> The second column is the number of samples.
> The third column is the weight.
> The fourth is the Memory access.
> 
> It seems like the weight column is missing from your output?
> 
> Could you please check perf report -D?
> It will dump the weight for each sample.
> Does it look correct?
> 
> Thanks,
> Kan
> 
>>     0.02% ,62515 ,L1 or L1 hit
>>     0.02% ,54048 ,L1 or L1 hit
>>     0.02% ,52206 ,L1 or L1 hit
>>     0.02% ,49831 ,L1 or L1 hit
>>     0.02% ,49056 ,Local RAM or RAM hit
>>     0.01% ,40666 ,LFB or LFB hit
>>     0.01% ,38080 ,L1 or L1 hit
>>     0.01% ,36772 ,L1 or L1 hit
>>     0.01% ,36729 ,LFB or LFB hit
>>     0.01% ,27101 ,LFB or LFB hit
>>
>> Is it possible for someone to shed some light on this or am I misunderstanding how the weight column is used here?
>> This appears to have been an issue on 5.4, 5.10 and 5.15. I am looking into seeing if it is still present in 5.17 and 5.18.
>>
>> I've also tried this on different Intel CPUs such as Intel Xeon 6230, i5-1135G7, Intel Xeon 6330
>>
>> Any insight or help would be appreciated,
>> Tyson


* Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data
  2022-06-03 14:03     ` Liang, Kan
@ 2022-06-07 20:39       ` Tyson Thomas
  0 siblings, 0 replies; 5+ messages in thread
From: Tyson Thomas @ 2022-06-07 20:39 UTC (permalink / raw)
  To: Liang, Kan, linux-kernel, linux-perf-users

Hi Kan,

Thanks for the reply and sharing your output.

However, I have a couple of questions, since the output is not what I expected and appears odd.

* Are you using the OpenMP version? I ask because I don't observe any OpenMP symbols in your output.
* Does your computer have enough RAM for cg.D? (Your output does not include the iteration results, and the perf.data file is smaller than expected.)
I observe a similar issue with cg.C.x (cg.C's memory requirement is around 8 GB).


Regards,
Tyson

From: Liang, Kan <kan.liang@linux.intel.com>
Sent: Saturday, 4 June 2022 12:03 AM
To: Tyson Thomas <tyson.thomas@sydney.edu.au>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; linux-perf-users@vger.kernel.org <linux-perf-users@vger.kernel.org>
Subject: Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data 
 


On 6/1/2022 1:03 AM, Tyson Thomas wrote:
> Hi Kan,
> 
> Thanks for the reply.
> 
> The exact command used was: perf mem report
> 
> The columns are, in order: Overhead, Samples, Local Weight, Memory access.
> The results I sent before show: Overhead, Local Weight, and Memory access.
> 
> To update the table from before: each row has a sample count of 1, so it is:
> 
>     0.02%,1 ,62515 ,L1 or L1 hit
>     0.02%,1 ,54048 ,L1 or L1 hit
>     0.02%,1 ,52206 ,L1 or L1 hit
>     0.02%,1 ,49831 ,L1 or L1 hit
>     0.02%,1 ,49056 ,Local RAM or RAM hit
>     0.01%,1 ,40666 ,LFB or LFB hit
>     0.01%,1 ,38080 ,L1 or L1 hit
>     0.01%,1 ,36772 ,L1 or L1 hit
> 
> So, the weights are absurdly high here.
> 
> I'll give a reply on perf report -D shortly.
> 
> Regards,
> Tyson
> 
> 
> 
> From: Liang, Kan <kan.liang@linux.intel.com>
> Sent: Wednesday, 25 May 2022 12:58 AM
> To: Tyson Thomas <tyson.thomas@sydney.edu.au>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; linux-perf-users@vger.kernel.org <linux-perf-users@vger.kernel.org>
> Subject: Re: [BUG REPORT] perf tools: x86_64: weight column displays odd memory latency data
>   
> 
> 
> On 5/22/2022 6:14 PM, Tyson Thomas wrote:
>> Hi Kan, Linux-Perf Team
>>
>> I have observed some odd behaviour within perf when using perf-mem. Specifically, the reported latency under the weight column appears to be unreasonably high.
>>
>> Here is a sample from a recent test. I find that some of the latencies are close to the maximum of an unsigned short, and I cannot understand why that would be, other than it being an issue with the perf events.
>>
>> This can be replicated using a NAS benchmark, specifically cg.D.


I tried the cg.D benchmark on my machine, but I cannot reproduce the 
issue. The weight looks reasonable.

$ perf mem record ./bin/cg.D.x


  NAS Parallel Benchmarks (NPB3.4-OMP) - CG Benchmark

  Size:     1500000
  Iterations:                    100
  Number of available threads:     4

^C[ perf record: Woken up 118 times to write data ]
[ perf record: Captured and wrote 30.108 MB perf.data (438307 samples) ]

$ perf mem report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 209K of event 'cpu/mem-loads,ldlat=30/Pu'
# Total weight : 56835744
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,p_stage_cyc
#
# Overhead       Samples  Local Weight  Memory access             Symbol                            Shared Object         Data Symbol >
# ........  ............  ............  ........................  ................................  ....................  ............>
#
      0.01%            66  53            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.01%            53  58            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.01%            59  52            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            53  53            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            43  65            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            50  55            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            51  50            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            46  55            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%            49  51            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2496          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%            48  52            L1 or L1 hit              [.] randlc_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2441          Local RAM or RAM hit      [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2420          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2415          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2403          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2399          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2390          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2385          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2385          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2378          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>
      0.00%             1  2358          LFB or LFB hit            [.] sparse_                       cg.D.x                [.] 0x00007f>

Thanks,
Kan

>>
>> I observe the following results in perf mem report (just getting the top 10 results)
>>
> 
> 
> Could you please show me the exact perf command used?
> 
> With my perf mem report output:
> The first column is the Overhead.
> The second column is the number of samples.
> The third column is the weight.
> The fourth is the Memory access.
> 
> It seems like the weight column is missing from your output?
> 
> Could you please check perf report -D?
> It will dump the weight for each sample.
> Does it look correct?
> 
> Thanks,
> Kan
> 
>>     0.02% ,62515 ,L1 or L1 hit
>>     0.02% ,54048 ,L1 or L1 hit
>>     0.02% ,52206 ,L1 or L1 hit
>>     0.02% ,49831 ,L1 or L1 hit
>>     0.02% ,49056 ,Local RAM or RAM hit
>>     0.01% ,40666 ,LFB or LFB hit
>>     0.01% ,38080 ,L1 or L1 hit
>>     0.01% ,36772 ,L1 or L1 hit
>>     0.01% ,36729 ,LFB or LFB hit
>>     0.01% ,27101 ,LFB or LFB hit
>>
>> Is it possible for someone to shed some light on this or am I misunderstanding how the weight column is used here?
>> This appears to have been an issue on 5.4, 5.10 and 5.15. I am looking into seeing if it is still present in 5.17 and 5.18.
>>
>> I've also tried this on different Intel CPUs such as Intel Xeon 6230, i5-1135G7, Intel Xeon 6330
>>
>> Any insight or help would be appreciated,
>> Tyson


