linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Properly interpret indirect call in perf annotate.
@ 2018-08-23 12:29 Martin Liška
  2018-08-23 14:12 ` Arnaldo Carvalho de Melo
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Martin Liška @ 2018-08-23 12:29 UTC (permalink / raw)
  To: linux-perf-users, lkml; +Cc: Arnaldo Carvalho de Melo, Jiri Olsa

[-- Attachment #1: Type: text/plain, Size: 364 bytes --]

The patch changes interpretation of:
callq  *0x8(%rbx)

from:
  0.26 │     → callq  *8
to:
  0.26 │     → callq  *0x8(%rbx)

in this can an address is followed by a register, thus
one can't parse only address.

Signed-off-by: Martin Liška <mliska@suse.cz>
---
 tools/perf/util/annotate.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)



[-- Attachment #2: 0001-Properly-interpret-indirect-call-in-perf-annotate.patch --]
[-- Type: text/x-patch, Size: 668 bytes --]

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e4268b948e0e..e32ead4744bd 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -246,8 +246,14 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_s
 
 indirect_call:
 	tok = strchr(endptr, '*');
-	if (tok != NULL)
-		ops->target.addr = strtoull(tok + 1, NULL, 16);
+	if (tok != NULL) {
+		endptr++;
+
+		/* Indirect call can use a non-rip register and offset: callq  *0x8(%rbx).
+		 * Do not parse such instruction.  */
+		if (strstr(endptr, "(%r") == NULL)
+			ops->target.addr = strtoull(endptr, NULL, 16);
+	}
 	goto find_target;
 }
 


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-23 12:29 [PATCH] Properly interpret indirect call in perf annotate Martin Liška
@ 2018-08-23 14:12 ` Arnaldo Carvalho de Melo
  2018-08-27  9:06   ` Martin Liška
  2018-08-23 23:00 ` Kim Phillips
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-08-23 14:12 UTC (permalink / raw)
  To: Martin Liška; +Cc: linux-perf-users, lkml, Jiri Olsa

Em Thu, Aug 23, 2018 at 02:29:34PM +0200, Martin Liška escreveu:
> The patch changes interpretation of:
> callq  *0x8(%rbx)
> 
> from:
>   0.26 │     → callq  *8
> to:
>   0.26 │     → callq  *0x8(%rbx)
> 
> in this can an address is followed by a register, thus
> one can't parse only address.

Please mention one or two functions where such sequence appears, so that
others can reproduce your before/after more quickly,

- Arnaldo
 
> Signed-off-by: Martin Liška <mliska@suse.cz>
> ---
>  tools/perf/util/annotate.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> 

> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index e4268b948e0e..e32ead4744bd 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -246,8 +246,14 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_s
>  
>  indirect_call:
>  	tok = strchr(endptr, '*');
> -	if (tok != NULL)
> -		ops->target.addr = strtoull(tok + 1, NULL, 16);
> +	if (tok != NULL) {
> +		endptr++;
> +
> +		/* Indirect call can use a non-rip register and offset: callq  *0x8(%rbx).
> +		 * Do not parse such instruction.  */
> +		if (strstr(endptr, "(%r") == NULL)
> +			ops->target.addr = strtoull(endptr, NULL, 16);
> +	}
>  	goto find_target;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-23 12:29 [PATCH] Properly interpret indirect call in perf annotate Martin Liška
  2018-08-23 14:12 ` Arnaldo Carvalho de Melo
@ 2018-08-23 23:00 ` Kim Phillips
  2018-08-27 10:37 ` Namhyung Kim
  2018-08-28 14:10 ` Arnaldo Carvalho de Melo
  3 siblings, 0 replies; 10+ messages in thread
From: Kim Phillips @ 2018-08-23 23:00 UTC (permalink / raw)
  To: Martin Liška
  Cc: linux-perf-users, lkml, Arnaldo Carvalho de Melo, Jiri Olsa

On Thu, 23 Aug 2018 14:29:34 +0200
Martin Liška <mliska@suse.cz> wrote:

> The patch changes interpretation of:
> callq  *0x8(%rbx)
> 
> from:
>   0.26 │     → callq  *8
> to:
>   0.26 │     → callq  *0x8(%rbx)
> 
> in this can an address is followed by a register, thus
> one can't parse only address.
> 
> Signed-off-by: Martin Liška <mliska@suse.cz>
> ---

Tested this doesn't incur any grievances parsing aarch64 code:

Tested-by: Kim Phillips <kim.phillips@arm.com>

Thanks,

Kim

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-23 14:12 ` Arnaldo Carvalho de Melo
@ 2018-08-27  9:06   ` Martin Liška
  2018-08-28 14:10     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Liška @ 2018-08-27  9:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users, lkml, Jiri Olsa

[-- Attachment #1: Type: text/plain, Size: 1887 bytes --]

On 08/23/2018 04:12 PM, Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 23, 2018 at 02:29:34PM +0200, Martin Liška escreveu:
>> The patch changes interpretation of:
>> callq  *0x8(%rbx)
>>
>> from:
>>   0.26 │     → callq  *8
>> to:
>>   0.26 │     → callq  *0x8(%rbx)
>>
>> in this can an address is followed by a register, thus
>> one can't parse only address.
> 
> Please mention one or two functions where such sequence appears, so that
> others can reproduce your before/after more quickly,

Sure, there's self-contained example on can compile (-O2) and test.
It's following call in test function:

test:
.LFB1:
        .cfi_startproc
        movq    %rdi, %rax
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movq    %rsi, %rdi
        movq    %rdx, %rsi
        call    *8(%rax) <---- here
        cmpl    $1, %eax
        adcl    $-1, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

Martin

> 
> - Arnaldo
>  
>> Signed-off-by: Martin Liška <mliska@suse.cz>
>> ---
>>  tools/perf/util/annotate.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>>
> 
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index e4268b948e0e..e32ead4744bd 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -246,8 +246,14 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_s
>>  
>>  indirect_call:
>>  	tok = strchr(endptr, '*');
>> -	if (tok != NULL)
>> -		ops->target.addr = strtoull(tok + 1, NULL, 16);
>> +	if (tok != NULL) {
>> +		endptr++;
>> +
>> +		/* Indirect call can use a non-rip register and offset: callq  *0x8(%rbx).
>> +		 * Do not parse such instruction.  */
>> +		if (strstr(endptr, "(%r") == NULL)
>> +			ops->target.addr = strtoull(endptr, NULL, 16);
>> +	}
>>  	goto find_target;
>>  }
>>  
>>
> 

[-- Attachment #2: perf-callq.c --]
[-- Type: text/x-csrc, Size: 1271 bytes --]

typedef int (*htab_eq) (const void *, const void *);

static int eq (const void *a, const void *b)
{
  return a - b;
}

struct htab
{
  /* Current size (in entries) of the hash table */
  int size;

  /* Pointer to comparison function.  */
  htab_eq eq_f;

  /* Table itself.  */
  void **entries;

  /* Current number of elements including also deleted elements */
  int n_elements;

  /* Current number of deleted elements in the table */
  int n_deleted;

  /* The following member is used for debugging. Its value is number
     of all calls of `htab_find_slot' for the hash table. */
  unsigned int searches;

  /* The following member is used for debugging.  Its value is number
     of collisions fixed for time of work with the hash table. */
  unsigned int collisions;

  unsigned int max_size;

  /* This is non-zero if we are allowed to return NULL for function calls
     that allocate memory.  */
  int return_allocation_failure;
};

int
__attribute__ ((noinline))
test(struct htab *t, int *a, int *b)
{
  int r = t->eq_f (a, b);
  if (r)
    return r - 1;

  return 0;
}

struct htab mytable;
int r;

int main(int argc, char **argv)
{
  mytable.eq_f = &eq;
  for (unsigned i = 0; i < 100000000; i++)
    r += test (&mytable, &argc, &argc);
  
  return 0;
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-23 12:29 [PATCH] Properly interpret indirect call in perf annotate Martin Liška
  2018-08-23 14:12 ` Arnaldo Carvalho de Melo
  2018-08-23 23:00 ` Kim Phillips
@ 2018-08-27 10:37 ` Namhyung Kim
  2018-08-27 14:28   ` Martin Liška
  2018-08-28 14:10 ` Arnaldo Carvalho de Melo
  3 siblings, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2018-08-27 10:37 UTC (permalink / raw)
  To: Martin Liška
  Cc: linux-perf-users, lkml, Arnaldo Carvalho de Melo, Jiri Olsa, kernel-team

Hello,

On Thu, Aug 23, 2018 at 02:29:34PM +0200, Martin Liška wrote:
> The patch changes interpretation of:
> callq  *0x8(%rbx)
> 
> from:
>   0.26 │     → callq  *8
> to:
>   0.26 │     → callq  *0x8(%rbx)
> 
> in this can an address is followed by a register, thus
> one can't parse only address.

Also there's a case with no offset like:  callq  *%rbx


> 
> Signed-off-by: Martin Liška <mliska@suse.cz>
> ---
>  tools/perf/util/annotate.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> 

> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index e4268b948e0e..e32ead4744bd 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -246,8 +246,14 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_s
>  
>  indirect_call:
>  	tok = strchr(endptr, '*');
> -	if (tok != NULL)
> -		ops->target.addr = strtoull(tok + 1, NULL, 16);
> +	if (tok != NULL) {
> +		endptr++;
> +
> +		/* Indirect call can use a non-rip register and offset: callq  *0x8(%rbx).
> +		 * Do not parse such instruction.  */
> +		if (strstr(endptr, "(%r") == NULL)
> +			ops->target.addr = strtoull(endptr, NULL, 16);

It seems too x86-specific, what about this? (not tested)


indirect_call:
	tok = strchr(endptr, '*');
	if (tok != NULL) {
		endptr++;
		if (!isdigit(*endptr))
			return 0;

		addr = strtoull(endptr, &endptr, 0);
		if (*endptr != '('))
			ops->target.addr = addr;


Thanks,
Namhyung


> +	}
>  	goto find_target;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-27 10:37 ` Namhyung Kim
@ 2018-08-27 14:28   ` Martin Liška
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Liška @ 2018-08-27 14:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-perf-users, lkml, Arnaldo Carvalho de Melo, Jiri Olsa, kernel-team

[-- Attachment #1: Type: text/plain, Size: 2004 bytes --]

On 08/27/2018 12:37 PM, Namhyung Kim wrote:
> Hello,
> 
> On Thu, Aug 23, 2018 at 02:29:34PM +0200, Martin Liška wrote:
>> The patch changes interpretation of:
>> callq  *0x8(%rbx)
>>
>> from:
>>   0.26 │     → callq  *8
>> to:
>>   0.26 │     → callq  *0x8(%rbx)
>>
>> in this can an address is followed by a register, thus
>> one can't parse only address.
> 
> Also there's a case with no offset like:  callq  *%rbx

Yes. But this case is fine as strtoull returns 0 for that:
'If there were no digits at all, strtoul() stores the original value of nptr in *endptr (and returns 0).'
So ops->target.addr is then 0 and it's fine.

> 
> 
>>
>> Signed-off-by: Martin Liška <mliska@suse.cz>
>> ---
>>  tools/perf/util/annotate.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>>
> 
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index e4268b948e0e..e32ead4744bd 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -246,8 +246,14 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_s
>>  
>>  indirect_call:
>>  	tok = strchr(endptr, '*');
>> -	if (tok != NULL)
>> -		ops->target.addr = strtoull(tok + 1, NULL, 16);
>> +	if (tok != NULL) {
>> +		endptr++;
>> +
>> +		/* Indirect call can use a non-rip register and offset: callq  *0x8(%rbx).
>> +		 * Do not parse such instruction.  */
>> +		if (strstr(endptr, "(%r") == NULL)
>> +			ops->target.addr = strtoull(endptr, NULL, 16);
> 
> It seems too x86-specific, what about this? (not tested)

It is, I'm fine with that. I've just tested that for the callq  *0x8(%rbx) example.
I'm sending patch for that version.

Martin

> 
> 
> indirect_call:
> 	tok = strchr(endptr, '*');
> 	if (tok != NULL) {
> 		endptr++;
> 		if (!isdigit(*endptr))
> 			return 0;
> 
> 		addr = strtoull(endptr, &endptr, 0);
> 		if (*endptr != '('))
> 			ops->target.addr = addr;
> 
> 
> Thanks,
> Namhyung
> 
> 
>> +	}
>>  	goto find_target;
>>  }
>>  
>>
> 


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Properly-interpret-indirect-call-in-perf-annotate.patch --]
[-- Type: text/x-patch; name="0001-Properly-interpret-indirect-call-in-perf-annotate.patch", Size: 1495 bytes --]

From 58a0eca544be8cc9e15b2ab5ecd9d9401ff4d2ec Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Thu, 23 Aug 2018 14:25:33 +0200
Subject: [PATCH] Properly interpret indirect call in perf annotate.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The patch changes interpretation of:
callq  *0x8(%rbx)

from:
  0.26 │     → callq  *8
to:
  0.26 │     → callq  *0x8(%rbx)

in this can an address is followed by a register, thus
one can't parse only address.

Signed-off-by: Martin Liška <mliska@suse.cz>
---
 tools/perf/util/annotate.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e4268b948e0e..18a8477d4664 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -212,6 +212,7 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_s
 	struct addr_map_symbol target = {
 		.map = map,
 	};
+  u64 addr;
 
 	ops->target.addr = strtoull(ops->raw, &endptr, 16);
 
@@ -246,8 +247,15 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_s
 
 indirect_call:
 	tok = strchr(endptr, '*');
-	if (tok != NULL)
-		ops->target.addr = strtoull(tok + 1, NULL, 16);
+	if (tok != NULL) {
+		endptr++;
+		if (!isdigit(*endptr))
+			return 0;
+
+		addr = strtoull(endptr, &endptr, 0);
+		if (*endptr != '(')
+			ops->target.addr = addr;
+	}
 	goto find_target;
 }
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-23 12:29 [PATCH] Properly interpret indirect call in perf annotate Martin Liška
                   ` (2 preceding siblings ...)
  2018-08-27 10:37 ` Namhyung Kim
@ 2018-08-28 14:10 ` Arnaldo Carvalho de Melo
  3 siblings, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-08-28 14:10 UTC (permalink / raw)
  To: Martin Liška; +Cc: linux-perf-users, lkml, Jiri Olsa

Em Thu, Aug 23, 2018 at 02:29:34PM +0200, Martin Liška escreveu:
> The patch changes interpretation of:
> callq  *0x8(%rbx)
> 
> from:
>   0.26 │     → callq  *8
> to:
>   0.26 │     → callq  *0x8(%rbx)
> 
> in this can an address is followed by a register, thus
> one can't parse only address.
> 
> Signed-off-by: Martin Liška <mliska@suse.cz>
> ---
>  tools/perf/util/annotate.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

Please don't send the patch as an attachment, use 'git-format-patch',
I'm fixing this up this time.

- Arnaldo
 
> 

> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index e4268b948e0e..e32ead4744bd 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -246,8 +246,14 @@ static int call__parse(struct arch *arch, struct ins_operands *ops, struct map_s
>  
>  indirect_call:
>  	tok = strchr(endptr, '*');
> -	if (tok != NULL)
> -		ops->target.addr = strtoull(tok + 1, NULL, 16);
> +	if (tok != NULL) {
> +		endptr++;
> +
> +		/* Indirect call can use a non-rip register and offset: callq  *0x8(%rbx).
> +		 * Do not parse such instruction.  */
> +		if (strstr(endptr, "(%r") == NULL)
> +			ops->target.addr = strtoull(endptr, NULL, 16);
> +	}
>  	goto find_target;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-27  9:06   ` Martin Liška
@ 2018-08-28 14:10     ` Arnaldo Carvalho de Melo
  2018-08-28 14:18       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-08-28 14:10 UTC (permalink / raw)
  To: Martin Liška; +Cc: linux-perf-users, lkml, Jiri Olsa

Em Mon, Aug 27, 2018 at 11:06:21AM +0200, Martin Liška escreveu:
> On 08/23/2018 04:12 PM, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Aug 23, 2018 at 02:29:34PM +0200, Martin Liška escreveu:
> >> The patch changes interpretation of:
> >> callq  *0x8(%rbx)
> >>
> >> from:
> >>   0.26 │     → callq  *8
> >> to:
> >>   0.26 │     → callq  *0x8(%rbx)
> >>
> >> in this can an address is followed by a register, thus
> >> one can't parse only address.
> > 
> > Please mention one or two functions where such sequence appears, so that
> > others can reproduce your before/after more quickly,
> 
> Sure, there's self-contained example on can compile (-O2) and test.
> It's following call in test function:
> 
> test:
> .LFB1:
>         .cfi_startproc
>         movq    %rdi, %rax
>         subq    $8, %rsp
>         .cfi_def_cfa_offset 16
>         movq    %rsi, %rdi
>         movq    %rdx, %rsi
>         call    *8(%rax) <---- here
>         cmpl    $1, %eax
>         adcl    $-1, %eax
>         addq    $8, %rsp
>         .cfi_def_cfa_offset 8
>         ret
>         .cfi_endproc

Here I'm getting:

Samples: 2K of event 'cycles:uppp', 4000 Hz, Event count (approx.): 1808551484
test  /home/acme/c/perf-callq [Percent: local period]
  0.17 │      mov    %rdx,-0x28(%rbp)
  0.58 │      mov    -0x18(%rbp),%rax
  7.90 │      mov    0x8(%rax),%rax
  8.67 │      mov    -0x28(%rbp),%rcx
       │      mov    -0x20(%rbp),%rdx
  0.08 │      mov    %rcx,%rsi
  6.28 │      mov    %rdx,%rdi
 10.50 │    → callq  *%rax
  1.67 │      mov    %eax,-0x4(%rbp)
 11.95 │      cmpl   $0x0,-0x4(%rbp)
  8.14 │    ↓ je     3d
       │      mov    -0x4(%rbp),%eax
       │      sub    $0x1,%eax
       │    ↓ jmp    42
       │3d:   mov    $0x0,%eax
  7.84 │42:   leaveq
       │    ← retq

Without the patch, will check if something changes with it.

- Arnaldo



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-28 14:10     ` Arnaldo Carvalho de Melo
@ 2018-08-28 14:18       ` Arnaldo Carvalho de Melo
  2018-08-28 17:55         ` Martin Liška
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-08-28 14:18 UTC (permalink / raw)
  To: Martin Liška; +Cc: linux-perf-users, lkml, Jiri Olsa

Em Tue, Aug 28, 2018 at 11:10:47AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Aug 27, 2018 at 11:06:21AM +0200, Martin Liška escreveu:
> > On 08/23/2018 04:12 PM, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Aug 23, 2018 at 02:29:34PM +0200, Martin Liška escreveu:
> > >> The patch changes interpretation of:
> > >> callq  *0x8(%rbx)
> > >>
> > >> from:
> > >>   0.26 │     → callq  *8
> > >> to:
> > >>   0.26 │     → callq  *0x8(%rbx)

<SNIP>

> > > Please mention one or two functions where such sequence appears, so that
> > > others can reproduce your before/after more quickly,

> > Sure, there's self-contained example on can compile (-O2) and test.
> > It's following call in test function:

> > test:
> > .LFB1:
> >         .cfi_startproc
> >         movq    %rdi, %rax
> >         subq    $8, %rsp
> >         .cfi_def_cfa_offset 16
> >         movq    %rsi, %rdi
> >         movq    %rdx, %rsi
> >         call    *8(%rax) <---- here
> >         cmpl    $1, %eax
> >         adcl    $-1, %eax
> >         addq    $8, %rsp
> >         .cfi_def_cfa_offset 8
> >         ret
> >         .cfi_endproc
> 
> Here I'm getting:
> 
> Samples: 2K of event 'cycles:uppp', 4000 Hz, Event count (approx.): 1808551484
> test  /home/acme/c/perf-callq [Percent: local period]
>   0.17 │      mov    %rdx,-0x28(%rbp)
>   0.58 │      mov    -0x18(%rbp),%rax
>   7.90 │      mov    0x8(%rax),%rax
>   8.67 │      mov    -0x28(%rbp),%rcx
>        │      mov    -0x20(%rbp),%rdx
>   0.08 │      mov    %rcx,%rsi
>   6.28 │      mov    %rdx,%rdi
>  10.50 │    → callq  *%rax
>   1.67 │      mov    %eax,-0x4(%rbp)
>  11.95 │      cmpl   $0x0,-0x4(%rbp)
>   8.14 │    ↓ je     3d
>        │      mov    -0x4(%rbp),%eax
>        │      sub    $0x1,%eax
>        │    ↓ jmp    42
>        │3d:   mov    $0x0,%eax
>   7.84 │42:   leaveq
>        │    ← retq
> 
> Without the patch, will check if something changes with it.

No changes with the patch, but then I did another test, ran a system
wide record for a while, then tested without/with your patch, with
--stdio2 redirecting to /tmp/{before,after} and got the expected
results, see below.

Thanks, applying,

- Arnaldo

--- /tmp/before 2018-08-28 11:16:03.238384143 -0300
+++ /tmp/after  2018-08-28 11:15:39.335341042 -0300
@@ -13274,7 +13274,7 @@
              ↓ jle    128     
                hash_value = hash_table->hash_func (key);
                mov    0x8(%rsp),%rdi
-  0.91       → callq  *30     
+  0.91       → callq  *0x30(%r12)
                mov    $0x2,%r8d
                cmp    $0x2,%eax
                node_hash = hash_table->hashes[node_index];
@@ -13848,7 +13848,7 @@
                 mov    %r14,%rdi
                 sub    %rbx,%r13
                 mov    %r13,%rdx
-              → callq  *38      
+              → callq  *0x38(%r15)
                 cmp    %rax,%r13
   1.91        ↓ je     240      
          1b4:   mov    $0xffffffff,%r13d
@@ -14026,7 +14026,7 @@
                 mov    %rcx,-0x500(%rbp)
                 mov    %r15,%rsi
                 mov    %r14,%rdi
-              → callq  *38      
+              → callq  *0x38(%rax)
                 mov    -0x500(%rbp),%rcx
                 cmp    %rax,%rcx
               ↓ jne    9b0
<SNIP tons of other such cases>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Properly interpret indirect call in perf annotate.
  2018-08-28 14:18       ` Arnaldo Carvalho de Melo
@ 2018-08-28 17:55         ` Martin Liška
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Liška @ 2018-08-28 17:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users, lkml, Jiri Olsa

On 08/28/2018 04:18 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Aug 28, 2018 at 11:10:47AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Mon, Aug 27, 2018 at 11:06:21AM +0200, Martin Liška escreveu:
>>> On 08/23/2018 04:12 PM, Arnaldo Carvalho de Melo wrote:
>>>> Em Thu, Aug 23, 2018 at 02:29:34PM +0200, Martin Liška escreveu:
>>>>> The patch changes interpretation of:
>>>>> callq  *0x8(%rbx)
>>>>>
>>>>> from:
>>>>>    0.26 │     → callq  *8
>>>>> to:
>>>>>    0.26 │     → callq  *0x8(%rbx)
> 
> <SNIP>
> 
>>>> Please mention one or two functions where such sequence appears, so that
>>>> others can reproduce your before/after more quickly,
> 
>>> Sure, there's self-contained example on can compile (-O2) and test.
>>> It's following call in test function:
> 
>>> test:
>>> .LFB1:
>>>          .cfi_startproc
>>>          movq    %rdi, %rax
>>>          subq    $8, %rsp
>>>          .cfi_def_cfa_offset 16
>>>          movq    %rsi, %rdi
>>>          movq    %rdx, %rsi
>>>          call    *8(%rax) <---- here
>>>          cmpl    $1, %eax
>>>          adcl    $-1, %eax
>>>          addq    $8, %rsp
>>>          .cfi_def_cfa_offset 8
>>>          ret
>>>          .cfi_endproc
>>
>> Here I'm getting:
>>
>> Samples: 2K of event 'cycles:uppp', 4000 Hz, Event count (approx.): 1808551484
>> test  /home/acme/c/perf-callq [Percent: local period]
>>    0.17 │      mov    %rdx,-0x28(%rbp)
>>    0.58 │      mov    -0x18(%rbp),%rax
>>    7.90 │      mov    0x8(%rax),%rax
>>    8.67 │      mov    -0x28(%rbp),%rcx
>>         │      mov    -0x20(%rbp),%rdx
>>    0.08 │      mov    %rcx,%rsi
>>    6.28 │      mov    %rdx,%rdi
>>   10.50 │    → callq  *%rax
>>    1.67 │      mov    %eax,-0x4(%rbp)
>>   11.95 │      cmpl   $0x0,-0x4(%rbp)
>>    8.14 │    ↓ je     3d
>>         │      mov    -0x4(%rbp),%eax
>>         │      sub    $0x1,%eax
>>         │    ↓ jmp    42
>>         │3d:   mov    $0x0,%eax
>>    7.84 │42:   leaveq
>>         │    ← retq
>>
>> Without the patch, will check if something changes with it.

Hi Arnaldo.

Thanks for re-sending of the patch and for the testing. The example I sent
is dependent on version of GCC compiler.

> 
> No changes with the patch, but then I did another test, ran a system
> wide record for a while, then tested without/with your patch, with
> --stdio2 redirecting to /tmp/{before,after} and got the expected
> results, see below.
> 
> Thanks, applying,

Good!
Martin

> 
> - Arnaldo
> 
> --- /tmp/before 2018-08-28 11:16:03.238384143 -0300
> +++ /tmp/after  2018-08-28 11:15:39.335341042 -0300
> @@ -13274,7 +13274,7 @@
>                ↓ jle    128
>                  hash_value = hash_table->hash_func (key);
>                  mov    0x8(%rsp),%rdi
> -  0.91       → callq  *30
> +  0.91       → callq  *0x30(%r12)
>                  mov    $0x2,%r8d
>                  cmp    $0x2,%eax
>                  node_hash = hash_table->hashes[node_index];
> @@ -13848,7 +13848,7 @@
>                   mov    %r14,%rdi
>                   sub    %rbx,%r13
>                   mov    %r13,%rdx
> -              → callq  *38
> +              → callq  *0x38(%r15)
>                   cmp    %rax,%r13
>     1.91        ↓ je     240
>            1b4:   mov    $0xffffffff,%r13d
> @@ -14026,7 +14026,7 @@
>                   mov    %rcx,-0x500(%rbp)
>                   mov    %r15,%rsi
>                   mov    %r14,%rdi
> -              → callq  *38
> +              → callq  *0x38(%rax)
>                   mov    -0x500(%rbp),%rcx
>                   cmp    %rax,%rcx
>                 ↓ jne    9b0
> <SNIP tons of other such cases>
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-08-28 17:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-23 12:29 [PATCH] Properly interpret indirect call in perf annotate Martin Liška
2018-08-23 14:12 ` Arnaldo Carvalho de Melo
2018-08-27  9:06   ` Martin Liška
2018-08-28 14:10     ` Arnaldo Carvalho de Melo
2018-08-28 14:18       ` Arnaldo Carvalho de Melo
2018-08-28 17:55         ` Martin Liška
2018-08-23 23:00 ` Kim Phillips
2018-08-27 10:37 ` Namhyung Kim
2018-08-27 14:28   ` Martin Liška
2018-08-28 14:10 ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).