Date: Fri, 22 Jul 2011 11:55:41 -0700
Subject: Re: [PATCH 0/4] perf: memory load/store events generalization
From: Stephane Eranian
To: Lin Ming
Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, Arnaldo Carvalho de Melo, linux-kernel

Lin,

On Mon, Jul 4, 2011 at 1:02 AM, Lin Ming wrote:
> Hi, all
>
> The Intel PMU provides two facilities to monitor memory operations:
> load latency and precise store. This patchset tries to generalize
> memory load/store events, so that other arches may also add such
> features.
>
> A new sub-command "mem" is added:
>
> $ perf mem
>
>  usage: perf mem [<options>] {record <command> | report}
>
>    -t, --type        memory operations (load/store)
>    -L, --latency     latency to sample (only for load op)
>
That looks okay as a first approach for the tool. But what people are
most often interested in is seeing where the misses occur, i.e., you
need to display load/store addresses somehow, especially for the more
costly misses (the ones the compiler cannot really hide by hoisting
loads).

> $ perf mem -t load record make -j8
>
> $ perf mem -t load report
>
> Memory load operation statistics
> ================================
>                      L1-local: total latency=   28027, count=    3355(avg=8)

That's wrong. On Intel, you need to subtract 4 cycles from the latency
you get out of PEBS-LL. The kernel can do that.

>                      L2-snoop: total latency=    1430, count=      29(avg=49)

I suspect L2-snoop is not correct. If this line item relates to bit 2
of the data source, then it corresponds to a secondary miss, i.e., a
load to a cache line that is already being requested.
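To make both corrections concrete, here is a rough, untested sketch of
what I mean. The 4-cycle constant and the 0x2 "LFB hit" encoding are
from my reading of the SDM's load-latency data-source table (please
double-check against your copy), and all of the names below are
invented for illustration, not the patchset's actual code:

/*
 * Untested sketch. PEBS_LL_LATENCY_BIAS and the 0x2 encoding are my
 * reading of the SDM load-latency table; everything else is made up.
 */
#include <stdint.h>

#define PEBS_LL_LATENCY_BIAS 4 /* constant overhead in PEBS-LL latency */

/* The kernel could apply this before the sample reaches userland. */
static uint64_t pebs_ll_adjust_latency(uint64_t raw_lat)
{
	return raw_lat > PEBS_LL_LATENCY_BIAS ?
	       raw_lat - PEBS_LL_LATENCY_BIAS : 0;
}

/* Decode the low 4 bits of the PEBS-LL data source (partial table). */
static const char *pebs_ll_source_str(uint64_t dse)
{
	switch (dse & 0xf) {
	case 0x1: return "L1 hit";
	case 0x2: return "LFB hit (secondary miss: line already in flight)";
	case 0x3: return "L2 hit";
	/* remaining encodings omitted here */
	default:  return "unknown";
	}
}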
>                      L2-local: total latency=     124, count=       8(avg=15)
>             L3-snoop, found M: total latency=     452, count=       4(avg=113)
>          L3-snoop, found no M: total latency=       0, count=       0(avg=0)
> L3-snoop, no coherency actions: total latency=     875, count=      18(avg=48)
>        L3-miss, snoop, shared: total latency=       0, count=       0(avg=0)
>     L3-miss, local, exclusive: total latency=       0, count=       0(avg=0)
>        L3-miss, local, shared: total latency=       0, count=       0(avg=0)
>    L3-miss, remote, exclusive: total latency=       0, count=       0(avg=0)
>       L3-miss, remote, shared: total latency=       0, count=       0(avg=0)
>                    Unknown L3: total latency=       0, count=       0(avg=0)
>                            IO: total latency=       0, count=       0(avg=0)
>                      Uncached: total latency=     464, count=      30(avg=15)

I think it would be more useful to also print the % of loads captured
by each category; a sketch of what I mean is at the end of this mail.

> $ perf mem -t store record make -j8
>
> $ perf mem -t store report
>
> Memory store operation statistics
> =================================
>                data-cache hit:     8138
>               data-cache miss:        0
>                      STLB hit:     8138
>                     STLB miss:        0
>                 Locked access:        0
>               Unlocked access:     8138
>
> Any comment is appreciated.
>
> Thanks,
> Lin Ming
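As promised above, here is a rough illustration of the percentage
column I am suggesting for the load report. The struct and every name
in it are invented for the example, not the patchset's actual data
structures:

/*
 * Illustrative only: adds a "% of all sampled loads" column next to
 * the per-category count and average latency.
 */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

struct mem_lvl_stat {
	const char *name;
	uint64_t latency;	/* summed (adjusted) latency */
	uint64_t count;		/* samples in this category */
};

static void print_load_stats(const struct mem_lvl_stat *s, int nr,
			     uint64_t total)
{
	int i;

	for (i = 0; i < nr; i++)
		printf("%30s: total latency=%8"PRIu64", count=%8"PRIu64
		       " (%5.1f%%, avg=%"PRIu64")\n",
		       s[i].name, s[i].latency, s[i].count,
		       total ? 100.0 * s[i].count / total : 0.0,
		       s[i].count ? s[i].latency / s[i].count : 0);
}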