All of lore.kernel.org
* Floating point register corruption on ARM Cortex A57 (ARMv8) with RT_PREEMPT linux
@ 2018-10-05  2:12 Anup Pemmaiah
  2018-10-05 16:55 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: Anup Pemmaiah @ 2018-10-05  2:12 UTC
  To: linux-rt-users

Hi,

I have a multithreaded userspace application running on RT_PREEMPT
Linux on an ARM Cortex-A57 (ARMv8). The Linux version is 4.14.0 with the
corresponding RT_PREEMPT patch. The code below is a simplified version
that re-creates the issue. When the code is compiled with optimization
level -O1 or above, it errors out after some time with a random value
in fl_val (the result of the multiplication in thr_fn) at Line-24. After
looking at the compiled assembly and stepping through it with gdb, I
noticed that the value 0.002 is loaded into a floating point register
(s8) before the loop starts at Line-21. The s8 register is then
multiplied with the register holding the frand value at Line-24. At
random times, s8 holds a garbage value and the process aborts as
expected. It looks like the floating point registers are not being
properly restored across kernel context switches. The same
code works fine, without any errors, on a non-RT kernel of the same
version on the same hardware.
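If the hypothesis above is right, a bad result is simply frand multiplied by whatever s8 held at that moment, so dividing the bad product by the operand approximately recovers the clobbered multiplier. Note also that the range check at Line-25 lets a NaN through, because every ordered comparison with NaN is false. A minimal sketch, assuming the test is extended to log rand_num together with fl_val; `product_in_range`, `implied_multiplier` and `report_bad_product` are hypothetical helpers, not part of the original test:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <limits>

typedef float float32_t;

// Stricter version of the Line-25 check: also rejects NaN and infinity.
bool product_in_range(float32_t fl_val) {
  return std::isfinite(fl_val) && fl_val >= 0.0F && fl_val <= 2.0F;
}

// If fl_val == frand * m (up to rounding), m is approximately fl_val / frand.
// Returns NaN when rand_num is 0, because the multiplier cannot be recovered.
float32_t implied_multiplier(uint32_t rand_num, float32_t fl_val) {
  if (rand_num == 0) {
    return std::numeric_limits<float32_t>::quiet_NaN();
  }
  return fl_val / static_cast<float32_t>(rand_num);
}

// Prints the operand, the bad product and the multiplier it implies.
void report_bad_product(uint32_t rand_num, float32_t fl_val) {
  std::fprintf(stderr, "rand_num=%u fl_val=%g implied multiplier=%g (expected %g)\n",
               static_cast<unsigned>(rand_num), fl_val,
               implied_multiplier(rand_num, fl_val), 0.002F);
}
```

Calling report_bad_product() just before setting gflag = -1 would put the suspected register contents into the log instead of only an abort.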

My questions:

1) Is there any floating point related kernel setting that I should
set in the RT_PREEMPT kernel? I have set eagerfpu=on (even though it
is on by default).

2) I was reading about "Lazy Stacking" for the Cortex-M4, but I am not
sure whether it applies to the Cortex-A57.

Any comments will be greatly appreciated.

----------------------------------------------Test-code------------------------------------------------------------
  1 #include <iostream>
  2 #include <cstdlib>
  3 #include <cstdint>
  4 #include <atomic>
  5 #include <random>
  6 #include <pthread.h>
  7 #include <string.h>
  8 #include <errno.h>
  9 #include <unistd.h>
 10 #include <signal.h>
 11
 12 #define MAX_THR 30
 13
 14 typedef float float32_t;
 15 std::atomic<int32_t> gflag{0};  // written by workers, polled by main
 16
 17 void *thr_fn(void *arg) {
 18   //pthread_t cur_tid = pthread_self();
 19   std::random_device rd;   // non-deterministic generator
 20   std::mt19937 gen(rd());
 21   while (1) {
 22     uint32_t rand_num = gen() % 1000;
 23     float32_t frand = static_cast<float32_t>(rand_num);
 24     float32_t fl_val = (frand * 0.002F);
 25     if ((fl_val < 0.0F) || (fl_val > 2.0F)) {
 26       gflag = -1;
 27       // Keep sleeping till the process gets aborted by the main thread
 28       while(1){usleep(2000);}
 29     }
 30     usleep(1000);
 31   }
 32 }
 33
 34 int main() {
 35   int thr_cnt;
 36   int res = 0;
 37   pthread_t tid;
 38   for (thr_cnt = 0; thr_cnt < MAX_THR; thr_cnt++) {
 39     res = pthread_create(&tid,  NULL, thr_fn, NULL);
 40     if (res != 0) {
 41       std::cout << "Error creating pthread: " << strerror(res) << std::endl;
 42     }
 43   }
 44   while(1){
 45     if (gflag != 0) {
 46       kill(getpid(), SIGABRT);
 47     }
 48     usleep(1000 * 1000 * 1);
 49   }
 50   return (0);
 51 }
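For validating a kernel change, a version of the test that runs for a fixed time and returns a count is easier to automate than one that raises SIGABRT. A minimal sketch, assuming C++11 std::thread and atomics are acceptable; `run_float_stress` is a hypothetical name. The code generated for the lambda may not keep 0.002 in s8 the way thr_fn does, so check the disassembly before relying on it as a reproducer:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

typedef float float32_t;

// Runs num_threads workers for `duration`, each doing the same multiply as
// thr_fn, and returns how many out-of-range (or NaN) products were seen.
uint64_t run_float_stress(unsigned num_threads, std::chrono::milliseconds duration) {
  std::atomic<bool> stop{false};
  std::atomic<uint64_t> bad{0};
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < num_threads; ++i) {
    workers.emplace_back([&stop, &bad] {
      std::random_device rd;  // non-deterministic seed, as in thr_fn
      std::mt19937 gen(rd());
      while (!stop.load(std::memory_order_relaxed)) {
        uint32_t rand_num = gen() % 1000;
        float32_t frand = static_cast<float32_t>(rand_num);
        float32_t fl_val = frand * 0.002F;
        // Negated form so that a NaN product is also counted as bad.
        if (!(fl_val >= 0.0F && fl_val <= 2.0F)) {
          bad.fetch_add(1, std::memory_order_relaxed);
        }
        std::this_thread::sleep_for(std::chrono::microseconds(1000));
      }
    });
  }
  std::this_thread::sleep_for(duration);
  stop.store(true);
  for (auto& t : workers) {
    t.join();
  }
  return bad.load();
}
```

For example, call `run_float_stress(MAX_THR, std::chrono::seconds(60))` from main(), build with -O1 and -pthread, and treat a nonzero return as a reproduction.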
Thanks
Anup


* Re: Floating point register corruption on ARM Cortex A57 (ARMv8) with RT_PREEMPT linux
  2018-10-05  2:12 Floating point register corruption on ARM Cortex A57 (ARMv8) with RT_PREEMPT linux Anup Pemmaiah
@ 2018-10-05 16:55 ` Sebastian Andrzej Siewior
  2018-10-07 16:58   ` Anup Pemmaiah
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-05 16:55 UTC
  To: Anup Pemmaiah; +Cc: linux-rt-users

On 2018-10-04 19:12:53 [-0700], Anup Pemmaiah wrote:
> 1) Is there any floating point related kernel setting that I should
> set in the RT_PREEMPT kernel? I have set eagerfpu=on (even though it
> is on by default)

nope, should work by default. Do you have NEON related crypto code or
EFI enabled?

> 2) Was reading about "Lazy Stacking" for Cortex-M4, but not sure if it
> applies for Cortex A57
> 
> Any comments will be greatly appreciated.

Could you please try the latest v4.18? I believe it is fixed there and
needs just backporting. Could you please try?

Sebastian


* Re: Floating point register corruption on ARM Cortex A57 (ARMv8) with RT_PREEMPT linux
  2018-10-05 16:55 ` Sebastian Andrzej Siewior
@ 2018-10-07 16:58   ` Anup Pemmaiah
  2018-10-08  5:35     ` Anup Pemmaiah
  0 siblings, 1 reply; 5+ messages in thread
From: Anup Pemmaiah @ 2018-10-07 16:58 UTC
  To: sebastian.siewior; +Cc: linux-rt-users

> nope, should work by default. Do you have NEON related crypto code or
> EFI enabled?

Sebastian, thank you for the comments. I have NEON-related crypto code
enabled right now, but I remember disabling it and it did not make a
difference. I will disable it again and give it a try. In the meantime,
when I disabled the following four lines in the config file and
re-compiled the kernel, the test code worked fine, without the floating
point issue described earlier. Do you suspect that NEON-related crypto
interferes with the real-time kernel but not with the non-RT kernel?


  # CONFIG_PREEMPT_RT_BASE=y
  # CONFIG_HAVE_PREEMPT_LAZY=y
  # CONFIG_PREEMPT_LAZY=y
  # CONFIG_PREEMPT_RT_FULL=y


> Could you please try the latest v4.18? I believe it is fixed there and
> needs just backporting. Could you please try?

I will try it as a last resort, because I am not sure whether the board
BSP supports v4.18. Right now, I am trying to figure out why it works
fine with the non-RT kernel and why I only see the issue when the above
four RT_PREEMPT config
options are turned on.
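When comparing builds like this, it helps to confirm from user space which kernel is actually running. A heuristic sketch, assuming that RT-patched kernels of this era include "PREEMPT RT" in the version string printed by `uname -v` (newer kernels print "PREEMPT_RT"); this is an assumption, not an authoritative configuration check. If CONFIG_IKCONFIG_PROC is enabled, /proc/config.gz is the definitive source. Both helper names are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <sys/utsname.h>

// Heuristic (assumption): RT kernels mention "PREEMPT RT" or "PREEMPT_RT"
// in the build flags of the version string shown by `uname -v`.
bool version_mentions_preempt_rt(const std::string& version) {
  return version.find("PREEMPT RT") != std::string::npos ||
         version.find("PREEMPT_RT") != std::string::npos;
}

// Returns the running kernel's version string (what `uname -v` prints),
// or an empty string if uname() fails.
std::string running_kernel_version() {
  struct utsname u;
  if (uname(&u) != 0) {
    return std::string();
  }
  return std::string(u.version);
}
```

Printing `version_mentions_preempt_rt(running_kernel_version())` at the start of the test would record, next to each result, which of the two kernels produced it.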




* Re: Floating point register corruption on ARM Cortex A57 (ARMv8) with RT_PREEMPT linux
  2018-10-07 16:58   ` Anup Pemmaiah
@ 2018-10-08  5:35     ` Anup Pemmaiah
  2018-10-12  1:42       ` Anup Pemmaiah
  0 siblings, 1 reply; 5+ messages in thread
From: Anup Pemmaiah @ 2018-10-08  5:35 UTC
  To: sebastian.siewior; +Cc: linux-rt-users

Some more observations with the RT_PREEMPT configs enabled:

1) I re-ran the tests with all crypto, including the NEON-related
crypto, and the EFI kernel config options disabled. I still randomly see
the floating point register getting corrupted.

2) I noticed that when I run the test with an RT scheduling policy and
RT priority, e.g. "chrt -f 5 ./test_float" or "chrt -r 5 ./test_float",
I am not able to reproduce the corruption. But when I run the test
without any RT policy or priority (just ./test_float, i.e. SCHED_OTHER),
I can easily reproduce it.

I tried disabling PREEMPT_LAZY with "echo NO_PREEMPT_LAZY >
/sys/kernel/debug/sched_features". It did not help; I am still able to
reproduce the problem.

3) I have another ARM Cortex-A57 system from a different vendor (I
cannot name the vendors for proprietary reasons) running Linux kernel
4.9.38 with RT_PREEMPT enabled. I do not see any floating point
corruption there, whether I run the test as SCHED_OTHER or with
real-time settings. So that tells me moving to 4.18 may not help. What do
you think?
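Both experiments in item 2) can also be driven from inside the test. A minimal sketch with hypothetical helper names: `make_current_thread_fifo` is an in-process stand-in for `chrt -f 5` built on the standard pthread_setschedparam() call, and `sched_feature_enabled` parses the space-separated contents of /sys/kernel/debug/sched_features, assuming a disabled feature is listed with a NO_ prefix:

```cpp
#include <cassert>
#include <cerrno>
#include <optional>
#include <pthread.h>
#include <sched.h>
#include <sstream>
#include <string>

// Switches the calling thread to SCHED_FIFO at the given priority.
// Returns 0 on success or an error number (EPERM without sufficient privilege).
int make_current_thread_fifo(int priority) {
  sched_param param{};
  param.sched_priority = priority;
  return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}

// Looks up `name` in the contents of /sys/kernel/debug/sched_features.
// Returns true if listed as enabled, false if listed as NO_<name>,
// and std::nullopt if the feature does not appear at all.
std::optional<bool> sched_feature_enabled(const std::string& contents,
                                          const std::string& name) {
  std::istringstream in(contents);
  std::string token;
  while (in >> token) {
    if (token == name) return true;
    if (token == "NO_" + name) return false;
  }
  return std::nullopt;
}
```

Calling make_current_thread_fifo(5) at the top of main(), before the pthread_create() loop, should behave like `chrt -f 5`, since threads created with default attributes inherit the creator's scheduling policy.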

Thanks
Anup


* Re: Floating point register corruption on ARM Cortex A57 (ARMv8) with RT_PREEMPT linux
  2018-10-08  5:35     ` Anup Pemmaiah
@ 2018-10-12  1:42       ` Anup Pemmaiah
  0 siblings, 0 replies; 5+ messages in thread
From: Anup Pemmaiah @ 2018-10-12  1:42 UTC
  To: sebastian.siewior; +Cc: linux-rt-users

Just wanted to give a quick update. I could not get a 4.18 BSP from the
vendor, since they only release BSPs for LTS kernels, so I could not
move to the 4.18 kernel. Instead, I diffed arch/arm64/kernel/fpsimd.c
between my 4.14 version and the 4.18 one. I did not port the SVE part,
but just backported the preempt_disable()/preempt_enable() calls in the
fpsimd* functions around local_bh_disable()/local_bh_enable(). With that
fix, I do not see the floating
point corruption anymore.
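The ordering of the backported change can be illustrated with a toy model. This is a user-space sketch, not kernel code: `ToyCpu` and `ToyFpsimdGuard` are hypothetical, the counters only stand in for the kernel's preempt count, and the real fpsimd.c logic is not reproduced. It models one plausible reading of why the backport helps: on a non-RT kernel local_bh_disable() also prevents preemption, while with the RT patch it does not, so code relying on it to keep the FPSIMD state stable needs an explicit preempt_disable() around it:

```cpp
#include <cassert>

// Toy model (NOT kernel code): counters standing in for the preempt count
// and the bottom-half disable depth of one CPU.
struct ToyCpu {
  explicit ToyCpu(bool is_rt) : rt(is_rt) {}
  bool rt;  // true models RT semantics for local_bh_disable()
  int preempt_depth = 0;
  int bh_depth = 0;

  void preempt_disable() { ++preempt_depth; }
  void preempt_enable() { --preempt_depth; }
  // Non-RT: disabling bottom halves also blocks preemption. RT: it does not.
  void local_bh_disable() { ++bh_depth; if (!rt) ++preempt_depth; }
  void local_bh_enable() { if (!rt) --preempt_depth; --bh_depth; }
  bool preemptible() const { return preempt_depth == 0; }
};

// RAII guard with the nesting described above:
// preempt_disable(); local_bh_disable(); ... local_bh_enable(); preempt_enable();
class ToyFpsimdGuard {
 public:
  explicit ToyFpsimdGuard(ToyCpu& cpu) : cpu_(cpu) {
    cpu_.preempt_disable();
    cpu_.local_bh_disable();
  }
  ~ToyFpsimdGuard() {
    cpu_.local_bh_enable();
    cpu_.preempt_enable();
  }
  ToyFpsimdGuard(const ToyFpsimdGuard&) = delete;
  ToyFpsimdGuard& operator=(const ToyFpsimdGuard&) = delete;

 private:
  ToyCpu& cpu_;
};

// Reports whether a context switch could occur inside a bh-only section.
bool bh_only_section_preemptible(bool rt) {
  ToyCpu cpu(rt);
  cpu.local_bh_disable();
  bool preemptible = cpu.preemptible();
  cpu.local_bh_enable();
  return preemptible;
}
```

In this model a bh-only section is preemptible only when `rt` is true, while the guarded section is non-preemptible in both configurations.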


Thread overview: 5 messages
2018-10-05  2:12 Floating point register corruption on ARM Cortex A57 (ARMv8) with RT_PREEMPT linux Anup Pemmaiah
2018-10-05 16:55 ` Sebastian Andrzej Siewior
2018-10-07 16:58   ` Anup Pemmaiah
2018-10-08  5:35     ` Anup Pemmaiah
2018-10-12  1:42       ` Anup Pemmaiah
