linux-arm-kernel.lists.infradead.org archive mirror
* perf tool issue following 'perf stat: Fix --no-scale' patch integration
@ 2019-08-21 14:58 Gerald BAEZA
  0 siblings, 0 replies; 2+ messages in thread
From: Gerald BAEZA @ 2019-08-21 14:58 UTC
  To: linux-arm-kernel, linux-kernel, ak, mathieu.poirier, peterz,
	mingo, acme, alexander.shishkin, jolsa, namhyung, suzuki.poulose
  Cc: Alexandre TORGUE

Dear Andi and all perf tool / ARM debug experts,

This is about the following patch:
       perf stat: Fix --no-scale
       SHA-1 : 75998bb263bf48c1c85d78cd2d2f3a97d3747cab

Since it was applied in the kernel, I have noticed that the perf tool fails on my ARMv7 platform (STM32MP1 with Cortex-A7 and NEON) with the following error:
       root@stm32mp1:~# perf stat --no-scale sleep 1
       [10827.350202] Alignment trap: perf (631) PC=0x001139e8 Instr=0xf4640adf Address=0x0021a804 1
       [10827.357704] Alignment trap: not handling instruction f4640adf at [<001139e8>]
       [10827.364867] 8<--- cut here ---
       [10827.367875] Unhandled fault: alignment exception (0x001) at 0x0021a804
       [10827.374427] pgd = 8abc1568
       [10827.377090] [0021a804] *pgd=ff2e8835
       Bus error

The same error happens with or without the --no-scale option.
This is to give the context. I do not blame your patch, Andi :)

I analyzed the root cause of this issue, summarized below, but I need your insight to find the best fix.

One of the changes in the patch concerns tools/perf/util/stat.c:
                case AGGR_GLOBAL:
                        aggr->val += count->val;
       -                if (config->scale) {
       -                        aggr->ena += count->ena;
       -                        aggr->run += count->run;
       -                }
       +                aggr->ena += count->ena;
       +                aggr->run += count->run;
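
For readers without the perf tree at hand, the post-patch accumulation can be sketched in isolation as below. This is only an illustration: `struct counts_values` and `aggregate_global` are hypothetical names, and the layout (three adjacent 64-bit fields named val, ena and run) is an assumption based on the field names in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's per-counter values:
 * three adjacent 64-bit fields (assumed layout). */
struct counts_values {
	uint64_t val;
	uint64_t ena;
	uint64_t run;
};

/* Post-patch behaviour: ena and run are accumulated unconditionally,
 * so all three fields of *count are read on every call. */
void aggregate_global(struct counts_values *aggr,
		      const struct counts_values *count)
{
	assert(aggr && count);
	aggr->val += count->val;
	aggr->ena += count->ena;
	aggr->run += count->run;
}
```

Because the fields are adjacent and always read, a compiler is free to fetch two of them with a single 128-bit load.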

As a consequence of this change, GCC generates a NEON vector instruction that loads the count->val and count->ena values into 64-bit registers, since they are adjacent in memory and now always read:
                f4640adf              vld1.64  {d16-d17}, [r4 :64]

The problem comes from the ':64' qualifier, which requires the address in r4 to be 8-byte aligned.
The 'count' pointer points inside the 'contents[]' array of 'struct xyarray'.
If I force this field to be 64-bit aligned, then perf works again:
 struct xyarray {
        size_t row_size;
        size_t entry_size;
        size_t entries;
        size_t max_x;
        size_t max_y;
-       char contents[];
+       char contents[] __attribute__((aligned(64)));
 };
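
To make the misalignment concrete, here is a minimal sketch that models the 32-bit layout on any host. It is an illustration under assumptions, not perf code: `struct xyarray32`, `misalignment8` and `contents_misalignment` are hypothetical names, size_t is modelled as a 4-byte `uint32_t`, and the allocation base is assumed to be 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of struct xyarray on a 32-bit target:
 * size_t is replaced by uint32_t so the layout is host-independent. */
struct xyarray32 {
	uint32_t row_size;
	uint32_t entry_size;
	uint32_t entries;
	uint32_t max_x;
	uint32_t max_y;
	char contents[];
};

/* Number of bytes addr lies past the previous 8-byte boundary (0 = aligned). */
unsigned int misalignment8(uintptr_t addr)
{
	return (unsigned int)(addr & 7u);
}

/* Misalignment of contents[] for an allocation starting at base.
 * Assumption: base itself is 8-byte aligned. */
unsigned int contents_misalignment(uintptr_t base)
{
	assert((base & 7u) == 0);
	return misalignment8(base + offsetof(struct xyarray32, contents));
}
```

With five 4-byte fields, contents[] starts at offset 20, so it sits 4 bytes past an 8-byte boundary. The faulting address in the log above, 0x0021a804, is likewise 4 bytes past an 8-byte boundary, which matches this picture if each entry's size is a multiple of 8.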

But the xyarray structure is generic, so I think this patch cannot be the final one.
Some GCC versions have a -mgeneral-regs-only option that forbids the generation of NEON instructions when compiling a given file, but this does not seem to be mainlined (?).

I am hesitating and do not know what kind of fix I should apply.
I also do not know the perf tool source code very well, which limits the options I can think of :)

Can you help me, please?

Best regards

Gérald


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* RE: perf tool issue following 'perf stat: Fix --no-scale' patch integration
       [not found]   ` <20190821195451.GG3929@kernel.org>
@ 2019-08-22  7:17     ` Gerald BAEZA
  0 siblings, 0 replies; 2+ messages in thread
From: Gerald BAEZA @ 2019-08-22  7:17 UTC
  To: acme@kernel.org, Andi Kleen, linux-arm-kernel, linux-kernel
  Cc: Alexandre TORGUE, mathieu.poirier, suzuki.poulose, peterz,
	alexander.shishkin, mingo, namhyung, jolsa

Hello Arnaldo and Andi

Indeed, it should be 'aligned(8)' instead of 'aligned(64)'.
Thanks for your quick feedback; I am going to prepare the patch.

Gérald
 


> On Wed, Aug 21, 2019 at 09:26:35AM -0700, Andi Kleen wrote:
> > >
> > >    +             char contents[] __attribute__((aligned(64)));
> >
> > I think you want aligned(8). The parameter is bytes, not bits.
> >
> > >
> > >
> > >    But the xyarray structure is generic so I think this patch cannot be the
> > >    final one.
> >
> > I think it's actually fine to just apply this generically (with 8). It
> > will only waste a few bytes on other 32-bit architectures and should be
> > a no-op on 64-bit, not worth doing anything more sophisticated.
> >
> > I would just submit a patch to do that.
> 
> Agreed.
> 
> - Arnaldo
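
The cost estimate quoted above can be checked with a small layout sketch. This is an illustration only: the four structs are hypothetical models of struct xyarray in which size_t is replaced by fixed-width types (4 bytes for a 32-bit target, 8 bytes for a 64-bit one), and the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model: size_t is 4 bytes. */
struct xy32_plain {
	uint32_t row_size, entry_size, entries, max_x, max_y;
	char contents[];
};

struct xy32_aligned {
	uint32_t row_size, entry_size, entries, max_x, max_y;
	char contents[] __attribute__((aligned(8)));	/* argument is in bytes */
};

/* Hypothetical 64-bit model: size_t is 8 bytes. */
struct xy64_plain {
	uint64_t row_size, entry_size, entries, max_x, max_y;
	char contents[];
};

struct xy64_aligned {
	uint64_t row_size, entry_size, entries, max_x, max_y;
	char contents[] __attribute__((aligned(8)));
};

/* Padding inserted before contents[] by aligned(8) in the 32-bit model. */
size_t padding32(void)
{
	return offsetof(struct xy32_aligned, contents) -
	       offsetof(struct xy32_plain, contents);
}

/* Padding inserted before contents[] by aligned(8) in the 64-bit model. */
size_t padding64(void)
{
	return offsetof(struct xy64_aligned, contents) -
	       offsetof(struct xy64_plain, contents);
}

/* Sanity check of the two models. */
void check_padding(void)
{
	assert(padding32() == 4);	/* offset 20 becomes 24 */
	assert(padding64() == 0);	/* offset 40 is already a multiple of 8 */
}
```

In this model the attribute costs 4 bytes per xyarray on a 32-bit layout and nothing on a 64-bit one, consistent with the "few bytes" and "no-op" assessment in the quoted reply.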

