* mysterious crashes on OMAP5 uevm
From: Grazvydas Ignotas @ 2015-09-08 12:46 UTC
  To: linux-arm-kernel, linux-omap, Dr. H. Nikolaus Schaller
  Cc: Nishanth Menon, Russell King - ARM Linux

Hi,

this is a longstanding problem I've been seeing since the very beginning,
which was around 3.12 or so (when I first got the hardware), and 4.2
still seems to be affected by it. Basically, Xorg randomly segfaults at
some "impossible" location. I don't have the details at the moment
(could get them if needed), but from what I examined with gdb some time
ago, the situation did not make any sense.

There are two workarounds I know of that make the problem go away
(either one is enough):
- recompile Xorg with -marm (I'm using Debian armhf, so it's Thumb-2 by default)
- disable ARCH_MULTI_V6 in the kernel config

Because of the above workarounds, I have forgotten about it several
times, but it regularly comes back to bite me. It looks like a missing
erratum workaround, but I have all of them enabled in the kernel.

Does anyone know about this? Perhaps a missing erratum workaround
in the bootloader? U-Boot isn't too old here (2015.07).

Gražvydas

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: mysterious crashes on OMAP5 uevm
From: Tony Lindgren @ 2015-09-08 14:38 UTC
  To: Grazvydas Ignotas
  Cc: Nishanth Menon, Dr. H. Nikolaus Schaller, linux-omap,
	Russell King - ARM Linux, linux-arm-kernel

* Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
> Hi,
> 
> this is a longstanding problem I've been seeing since the very beginning,
> which was around 3.12 or so (when I first got the hardware), and 4.2
> still seems to be affected by it. Basically, Xorg randomly segfaults at
> some "impossible" location. I don't have the details at the moment
> (could get them if needed), but from what I examined with gdb some time
> ago, the situation did not make any sense.
>
> There are two workarounds I know of that make the problem go away
> (either one is enough):
> - recompile Xorg with -marm (I'm using Debian armhf, so it's Thumb-2 by default)
> - disable ARCH_MULTI_V6 in the kernel config
>
> Because of the above workarounds, I have forgotten about it several
> times, but it regularly comes back to bite me. It looks like a missing
> erratum workaround, but I have all of them enabled in the kernel.
>
> Does anyone know about this? Perhaps a missing erratum workaround
> in the bootloader? U-Boot isn't too old here (2015.07).

This looks like some incorrect handling with CONFIG_CPU_V6 compiled in.
Maybe try to narrow it down by commenting out some of the CONFIG_CPU_V6
and __LINUX_ARM_ARCH__ == 6 ifdefs at the places "git grep CONFIG_CPU_V6"
finds, ignoring the uncompress and davinci code.

Do you have some easy way to reproduce this issue?

Regards,

Tony

* Re: mysterious crashes on OMAP5 uevm
From: Grazvydas Ignotas @ 2015-09-08 20:41 UTC
  To: Tony Lindgren
  Cc: Nishanth Menon, Dr. H. Nikolaus Schaller, linux-omap,
	Russell King - ARM Linux, linux-arm-kernel

On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
>> Hi,
>>
>> this is a longstanding problem I've been seeing since the very beginning,
>> which was around 3.12 or so (when I first got the hardware), and 4.2
>> still seems to be affected by it. Basically, Xorg randomly segfaults at
>> some "impossible" location. I don't have the details at the moment
>> (could get them if needed), but from what I examined with gdb some time
>> ago, the situation did not make any sense.
>>
>> There are two workarounds I know of that make the problem go away
>> (either one is enough):
>> - recompile Xorg with -marm (I'm using Debian armhf, so it's Thumb-2 by default)
>> - disable ARCH_MULTI_V6 in the kernel config
>>
>> Because of the above workarounds, I have forgotten about it several
>> times, but it regularly comes back to bite me. It looks like a missing
>> erratum workaround, but I have all of them enabled in the kernel.
>>
>> Does anyone know about this? Perhaps a missing erratum workaround
>> in the bootloader? U-Boot isn't too old here (2015.07).
>
> This looks like some incorrect handling with CONFIG_CPU_V6 compiled in.
> Maybe try to narrow it down by commenting out some of the CONFIG_CPU_V6
> and __LINUX_ARM_ARCH__ == 6 ifdefs at the places "git grep CONFIG_CPU_V6"
> finds, ignoring the uncompress and davinci code.

OK, with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
disabled, it is enough to just do this:

--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
                /*
                 * The LSB of the handler determines if we're going to
                 * be using THUMB or ARM mode for this signal handler.
                 */
                thumb = handler & 1;

-#if __LINUX_ARM_ARCH__ >= 7
+#if 0 //__LINUX_ARM_ARCH__ >= 7
                /*
                 * Clear the If-Then Thumb-2 execution state
                 * ARM spec requires this to be all 000s in ARM mode
                 * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
                 * signal transition without this.
                 */

... and the problem appears, so I guess this needs some real
multiplatform handling.

> Do you have some easy way to reproduce this issue?

Just moving a browser window around with the mouse usually triggers it
within a minute.

>
> Regards,
>
> Tony

Gražvydas

* Re: mysterious crashes on OMAP5 uevm
From: Tony Lindgren @ 2015-09-08 21:07 UTC
  To: Grazvydas Ignotas
  Cc: Nishanth Menon, Dr. H. Nikolaus Schaller, linux-omap,
	Russell King - ARM Linux, linux-arm-kernel

* Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> > * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
> >> Hi,
> >>
> >> this is a longstanding problem I've been seeing since the very beginning,
> >> which was around 3.12 or so (when I first got the hardware), and 4.2
> >> still seems to be affected by it. Basically, Xorg randomly segfaults at
> >> some "impossible" location. I don't have the details at the moment
> >> (could get them if needed), but from what I examined with gdb some time
> >> ago, the situation did not make any sense.
> >>
> >> There are two workarounds I know of that make the problem go away
> >> (either one is enough):
> >> - recompile Xorg with -marm (I'm using Debian armhf, so it's Thumb-2 by default)
> >> - disable ARCH_MULTI_V6 in the kernel config
> >>
> >> Because of the above workarounds, I have forgotten about it several
> >> times, but it regularly comes back to bite me. It looks like a missing
> >> erratum workaround, but I have all of them enabled in the kernel.
> >>
> >> Does anyone know about this? Perhaps a missing erratum workaround
> >> in the bootloader? U-Boot isn't too old here (2015.07).
> >
> > This looks like some incorrect handling with CONFIG_CPU_V6 compiled in.
> > Maybe try to narrow it down by commenting out some of the CONFIG_CPU_V6
> > and __LINUX_ARM_ARCH__ == 6 ifdefs at the places "git grep CONFIG_CPU_V6"
> > finds, ignoring the uncompress and davinci code.
> 
> OK, with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
> disabled, it is enough to just do this:
> 
> --- a/arch/arm/kernel/signal.c
> +++ b/arch/arm/kernel/signal.c
> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
>                 /*
>                  * The LSB of the handler determines if we're going to
>                  * be using THUMB or ARM mode for this signal handler.
>                  */
>                 thumb = handler & 1;
> 
> -#if __LINUX_ARM_ARCH__ >= 7
> +#if 0 //__LINUX_ARM_ARCH__ >= 7
>                 /*
>                  * Clear the If-Then Thumb-2 execution state
>                  * ARM spec requires this to be all 000s in ARM mode
>                  * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
>                  * signal transition without this.
>                  */
> 
> ... and the problem appears, so I guess this needs some real
> multiplatform handling.

OK, nice to hear you found it. Yeah, it looks like some runtime
capability check is needed.
 
> > Do you have some easy way to reproduce this issue?
> 
> Just moving a browser window around with the mouse usually triggers it
> within a minute.

OK, good to know.

Regards,

Tony

* Re: mysterious crashes on OMAP5 uevm
From: Dr. H. Nikolaus Schaller @ 2015-09-10 06:42 UTC
  To: Tony Lindgren
  Cc: Nishanth Menon, Russell King - ARM Linux, Grazvydas Ignotas,
	Marek Belisko, linux-omap, linux-arm-kernel


On 08.09.2015 at 23:07, Tony Lindgren <tony@atomide.com> wrote:

> * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
>> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
>>> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
>>>> Hi,
>>>> 
>>>> this is a longstanding problem I've been seeing since the very beginning,
>>>> which was around 3.12 or so (when I first got the hardware), and 4.2
>>>> still seems to be affected by it. Basically, Xorg randomly segfaults at
>>>> some "impossible" location. I don't have the details at the moment
>>>> (could get them if needed), but from what I examined with gdb some time
>>>> ago, the situation did not make any sense.
>>>>
>>>> There are two workarounds I know of that make the problem go away
>>>> (either one is enough):
>>>> - recompile Xorg with -marm (I'm using Debian armhf, so it's Thumb-2 by default)
>>>> - disable ARCH_MULTI_V6 in the kernel config
>>>>
>>>> Because of the above workarounds, I have forgotten about it several
>>>> times, but it regularly comes back to bite me. It looks like a missing
>>>> erratum workaround, but I have all of them enabled in the kernel.
>>>>
>>>> Does anyone know about this? Perhaps a missing erratum workaround
>>>> in the bootloader? U-Boot isn't too old here (2015.07).
>>>
>>> This looks like some incorrect handling with CONFIG_CPU_V6 compiled in.
>>> Maybe try to narrow it down by commenting out some of the CONFIG_CPU_V6
>>> and __LINUX_ARM_ARCH__ == 6 ifdefs at the places "git grep CONFIG_CPU_V6"
>>> finds, ignoring the uncompress and davinci code.
>> 
>> OK, with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
>> disabled, it is enough to just do this:
>> 
>> --- a/arch/arm/kernel/signal.c
>> +++ b/arch/arm/kernel/signal.c
>> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
>>                /*
>>                 * The LSB of the handler determines if we're going to
>>                 * be using THUMB or ARM mode for this signal handler.
>>                 */
>>                thumb = handler & 1;
>> 
>> -#if __LINUX_ARM_ARCH__ >= 7
>> +#if 0 //__LINUX_ARM_ARCH__ >= 7
>>                /*
>>                 * Clear the If-Then Thumb-2 execution state
>>                 * ARM spec requires this to be all 000s in ARM mode
>>                 * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
>>                 * signal transition without this.
>>                 */
>> 
>> ... and the problem appears, so I guess this needs some real
>> multiplatform handling.
>
> OK, nice to hear you found it. Yeah, it looks like some runtime
> capability check is needed.
> 
>>> Do you have some easy way to reproduce this issue?
>> 
>> Just moving a browser window around with the mouse usually triggers it
>> within a minute.
>
> OK, good to know.

It looks as if this also explains the same symptom on our OMAP3 board (GTA04).
There, drawing on the touch screen for ~10 seconds is enough to make the X server segfault.

[We are using the binary X server from Debian wheezy:
ii  xserver-xorg-core                        2:1.12.4-6+deb7u5             armhf        Xorg X server - core server]

We have known about this bug for a while, but so far we thought that some touch screen
event bit had changed and that we had to fix our touch screen driver.

Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away, and applying the
"#if 0 //__LINUX_ARM_ARCH__ >= 7" change makes it reappear.

A while ago I tried to debug this by running the X server under strace and found that it
also has something to do with SIGALRM.

That is very consistent with being able to enable/disable the bug by modifying arch/arm/kernel/signal.c.

BR,
Nikolaus

* Re: mysterious crashes on OMAP5 uevm
From: Russell King - ARM Linux @ 2015-09-10 08:30 UTC
  To: Dr. H. Nikolaus Schaller
  Cc: Nishanth Menon, Tony Lindgren, Grazvydas Ignotas, Marek Belisko,
	linux-omap, linux-arm-kernel

On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
> 
> On 08.09.2015 at 23:07, Tony Lindgren <tony@atomide.com> wrote:
> 
> > * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
> >> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> >>> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
> >>>> Hi,
> >>>> 
> >>>> this is a longstanding problem I've been seeing since the very beginning,
> >>>> which was around 3.12 or so (when I first got the hardware), and 4.2
> >>>> still seems to be affected by it. Basically, Xorg randomly segfaults at
> >>>> some "impossible" location. I don't have the details at the moment
> >>>> (could get them if needed), but from what I examined with gdb some time
> >>>> ago, the situation did not make any sense.
> >>>>
> >>>> There are two workarounds I know of that make the problem go away
> >>>> (either one is enough):
> >>>> - recompile Xorg with -marm (I'm using Debian armhf, so it's Thumb-2 by default)
> >>>> - disable ARCH_MULTI_V6 in the kernel config
> >>>>
> >>>> Because of the above workarounds, I have forgotten about it several
> >>>> times, but it regularly comes back to bite me. It looks like a missing
> >>>> erratum workaround, but I have all of them enabled in the kernel.
> >>>>
> >>>> Does anyone know about this? Perhaps a missing erratum workaround
> >>>> in the bootloader? U-Boot isn't too old here (2015.07).
> >>>
> >>> This looks like some incorrect handling with CONFIG_CPU_V6 compiled in.
> >>> Maybe try to narrow it down by commenting out some of the CONFIG_CPU_V6
> >>> and __LINUX_ARM_ARCH__ == 6 ifdefs at the places "git grep CONFIG_CPU_V6"
> >>> finds, ignoring the uncompress and davinci code.
> >> 
> >> OK, with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
> >> disabled, it is enough to just do this:
> >> 
> >> --- a/arch/arm/kernel/signal.c
> >> +++ b/arch/arm/kernel/signal.c
> >> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
> >>                /*
> >>                 * The LSB of the handler determines if we're going to
> >>                 * be using THUMB or ARM mode for this signal handler.
> >>                 */
> >>                thumb = handler & 1;
> >> 
> >> -#if __LINUX_ARM_ARCH__ >= 7
> >> +#if 0 //__LINUX_ARM_ARCH__ >= 7
> >>                /*
> >>                 * Clear the If-Then Thumb-2 execution state
> >>                 * ARM spec requires this to be all 000s in ARM mode
> >>                 * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
> >>                 * signal transition without this.
> >>                 */
> >> 
> >> ... and the problem appears, so I guess this needs some real
> >> multiplatform handling.
> >
> > OK, nice to hear you found it. Yeah, it looks like some runtime
> > capability check is needed.
> > 
> >>> Do you have some easy way to reproduce this issue?
> >> 
> >> Just moving a browser window around with the mouse usually triggers it
> >> within a minute.
> >
> > OK, good to know.
> 
> It looks as if this also explains the same symptom on our OMAP3 board (GTA04).
> There, drawing on the touch screen for ~10 seconds is enough to make the X server segfault.
>
> [We are using the binary X server from Debian wheezy:
> ii  xserver-xorg-core                        2:1.12.4-6+deb7u5             armhf        Xorg X server - core server]
>
> We have known about this bug for a while, but so far we thought that some touch screen
> event bit had changed and that we had to fix our touch screen driver.
>
> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away, and applying the
> "#if 0 //__LINUX_ARM_ARCH__ >= 7" change makes it reappear.
>
> A while ago I tried to debug this by running the X server under strace and found that it
> also has something to do with SIGALRM.
>
> That is very consistent with being able to enable/disable the bug by modifying arch/arm/kernel/signal.c.

It would be really nice if someone could diagnose what's going on here.
What exception is causing the X server to be killed (someone said a
segfault)?  What is the register state at the point that happens?  What
does the code look like?  Is it happening inside the SIGALRM handler, or
after the SIGALRM handler has returned?

I'd suggest attaching gdb to the X server, but remember to set gdb to
ignore SIGPIPEs.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-10  8:30           ` Russell King - ARM Linux
@ 2015-09-10  8:57             ` Dr. H. Nikolaus Schaller
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-09-10  8:57 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Nishanth Menon, Tony Lindgren, Grazvydas Ignotas, Marek Belisko,
	linux-omap, linux-arm-kernel


On 10.09.2015 at 10:30, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
>> 
>> On 08.09.2015 at 23:07, Tony Lindgren <tony@atomide.com> wrote:
>> 
>>> * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
>>>> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
>>>>> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
>>>>>> Hi,
>>>>>> 
>>>>>> this is a longstanding problem I'm seeing since the very beginning,
>>>>>> which was around 3.12 or so (when I've first got the hardware) and it
>>>>>> seems 4.2 is affected by it still. Basically what happens is Xorg
>>>>>> randomly segfaults at some "impossible" location. I don't have the
>>>>>> details at the moment (could get them is needed), but from what I
>>>>>> examined with gdb some time ago the situation did not make any sense.
>>>>>> 
>>>>>> There are 2 workarounds that I know which make the problem go away
>>>>>> (one is enough):
>>>>>> - recompile Xorg with -marm (I'm using Debian armhf so it's thumb2 by default)
>>>>>> - disable ARCH_MULTI_V6 in the kernel config
>>>>>> 
>>>>>> Because of the above workarounds I have forgotten about it several
>>>>>> times, but it regularly comes back and bites again. It would look like
>>>>>> some missing erratum workaround, but I have all of them enabled in the
>>>>>> kernel.
>>>>>> 
>>>>>> Does anyone know about this? Perhaps some missing erratum workaround
>>>>>> in the bootloader? u-boot isn't too old here (2015.07).
>>>>> 
>>>>> Seems like some incorrect handling with CONFIG_CPU_V6 compiled in..
>>>>> Maybe try to narrow it down by commenting out some CONFIG_CPU_V6 and
>>>>> __LINUX_ARM_ARCH__ = 6 ifdefs in the git grep CONFIG_CPU_V6
>>>>> places ignoring uncompress and davinci code.
>>>> 
>>>> ok with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
>>>> disabled, it is enough to just do this:
>>>> 
>>>> --- a/arch/arm/kernel/signal.c
>>>> +++ b/arch/arm/kernel/signal.c
>>>> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
>>>>               /*
>>>>                * The LSB of the handler determines if we're going to
>>>>                * be using THUMB or ARM mode for this signal handler.
>>>>                */
>>>>               thumb = handler & 1;
>>>> 
>>>> -#if __LINUX_ARM_ARCH__ >= 7
>>>> +#if 0 //__LINUX_ARM_ARCH__ >= 7
>>>>               /*
>>>>                * Clear the If-Then Thumb-2 execution state
>>>>                * ARM spec requires this to be all 000s in ARM mode
>>>>                * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
>>>>                * signal transition without this.
>>>>                */
>>>> 
>>>> ... and the problem appears, so I guess this needs some real
>>>> multiplatform handling,.
>>> 
>>> OK nice to hear you found it. Yeah looks like some runtime
>>> capability check is needed.
>>> 
>>>>> Do you have some easy way to reproduce this issue?
>>>> 
>>>> Just moving a browser window around with mouse usually triggers it
>>>> within a minute.
>>> 
>>> OK good to know.
>> 
>> It looks as if this is the solution for the same symptom on our OMAP3 board (gta04).
>> There, it suffices to draw on the touch screen for ~10 seconds to make the xserver segfault.
>> 
>> [we are using the binary xserver from debian wheezy
>> ii  xserver-xorg-core                        2:1.12.4-6+deb7u5             armhf        Xorg X server - core server]
>> 
>> We know about this bug for a while, but so far did think that some touch screen
>> event bit has changed and we have to fix our touch screen driver.
>> 
>> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
>>>> #if 0 //__LINUX_ARM_ARCH__ >= 7
>> makes it re-appear.
>> 
>> A while ago I tried to debug running the x-server under strace and could find that it also has
>> something to do with SIGALRM.
>> 
>> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c
> 
> It would be really nice if someone could diagnose what's going on here.
> What exception is causing the X server to be killed (someone said a
> segfault)?  What is the register state at the point that happens?  What
> does the code look like  Is it happening inside the SIGALRM handler, or
> when the SIGALRM handler has returned?
> 
> I'd suggest attaching gdb to the X server, but remember to set gdb to
> ignore SIGPIPEs.

I don’t have a setup to run gdb (with source) on the device, and I have
really zero experience with the Xserver sources. But maybe Grazvydas can
do that better than me.

Attached is an strace log I recorded during my earlier experiments.
The X server appears to make heavy use not only of SIGALRM but also of SIGIO.
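You can tell whether the fault happens inside a handler by reading such a trace as a stack. Each "--- SIGxxx ---" line pushes a handler, and each sigreturn() line pops one. Below is a minimal C sketch of that bookkeeping. It assumes every delivered signal runs a handler that ends in sigreturn; ignored or default-action signals would break the count. handler_at_segv() is a hypothetical helper and is not part of strace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16	/* deepest signal nesting we keep names for */

/*
 * Hypothetical helper: scan strace output line by line, keeping a stack
 * of signal deliveries ("--- SIGxxx ...") not yet matched by a
 * sigreturn() or rt_sigreturn() line.  When SIGSEGV is delivered, copy
 * the name of the innermost still-active handler ("none" if no handler
 * was active) into 'out' and return 1.  Return 0 if no SIGSEGV is seen.
 */
static int handler_at_segv(const char *const *lines, size_t n,
			   char *out, size_t outlen)
{
	char stack[MAX_DEPTH][16];
	size_t depth = 0;

	assert(out != NULL && outlen > 0);

	for (size_t i = 0; i < n; i++) {
		const char *l = lines[i];

		if (strncmp(l, "--- SIG", 7) == 0) {
			char name[16];

			/* The signal name runs from "SIG" to the next space. */
			if (sscanf(l + 4, "%15s", name) != 1)
				continue;

			if (strcmp(name, "SIGSEGV") == 0) {
				const char *top = "none";

				if (depth > MAX_DEPTH)
					top = "(too deep)";
				else if (depth > 0)
					top = stack[depth - 1];
				snprintf(out, outlen, "%s", top);
				return 1;
			}

			/* Always count the delivery; store its name if there is room. */
			if (depth < MAX_DEPTH)
				snprintf(stack[depth], sizeof stack[depth], "%s", name);
			depth++;
		} else if (strncmp(l, "sigreturn(", 10) == 0 ||
			   strncmp(l, "rt_sigreturn(", 13) == 0) {
			/* A handler finished; pop it (ignore unmatched returns). */
			if (depth > 0)
				depth--;
		}
	}
	return 0;
}
```

If you feed it the last lines of the trace, from the final SIGIO delivery through the SIGSEGV line, it reports SIGIO as the active handler.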

And it looks as if the SEGFAULT occurs inside the SIGIO handler, after
it has made 3 syscalls (select, read, clock_gettime) but before the
sigreturn. At least in this example.
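One way to connect this with the #if experiment quoted above, offered only as an unconfirmed hypothesis, goes as follows. Suppose SIGIO arrives while Thumb-2 code is partway through an IT block and the IT bits are not cleared. The handler's first instructions would then execute conditionally under stale state, which could fault inside the handler. With ARCH_MULTI_V6 enabled, __LINUX_ARM_ARCH__ is presumably 6 (the lowest selected architecture), so the clearing block is compiled out. Below is a simplified C model of just that CPSR computation in setup_return(). sig_handler_cpsr() and its clear_it flag are hypothetical; clear_it stands in for the #if as a run-time flag and is not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPSR_T_BIT   0x00000020u  /* same value as the kernel's PSR_T_BIT */
#define CPSR_IT_MASK 0x0600fc00u  /* same value as the kernel's PSR_IT_MASK */

static_assert((CPSR_IT_MASK & CPSR_T_BIT) == 0,
	      "T bit must not overlap the IT field");

/*
 * Simplified, hypothetical model of how setup_return() derives the CPSR
 * a signal handler starts with.  'clear_it' replaces the compile-time
 * "#if __LINUX_ARM_ARCH__ >= 7" test with a run-time decision.
 */
static uint32_t sig_handler_cpsr(uint32_t interrupted_cpsr,
				 uint32_t handler, bool clear_it)
{
	uint32_t cpsr = interrupted_cpsr;
	bool thumb = handler & 1;  /* LSB of the handler selects Thumb/ARM */

	if (clear_it)
		cpsr &= ~CPSR_IT_MASK;  /* start the handler outside any IT block */

	if (thumb)
		cpsr |= CPSR_T_BIT;
	else
		cpsr &= ~CPSR_T_BIT;

	return cpsr;
}
```

Take an interrupted CPSR of 0x20001830 (Thumb, inside an IT block) and a Thumb handler at 0x8001. With clear_it = true the result is 0x20000030. With clear_it = false, the IT bits 0x1800 are carried into the handler unchanged.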

The Xserver then does a graceful shutdown after the SEGFAULT, i.e. it
prints the segfault message itself.

Hope this is a useful piece to solve the puzzle and helps a little.

BR,
Nikolaus

…
--- SIGALRM (Alarm clock) @ 0 (0) ---
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T^\351\n\0\3\0\0\0:\4\0\0;\230\353T^\351\n\0\3\0\1\0=\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 494831541}) = 0
sigreturn()                             = ? (mask now [ILL ABRT KILL USR1 SEGV PIPE TERM STKFLT CHLD STOP TSTP TTIN XFSZ VTALRM PROF IO PWR RTMIN])
sigreturn()                             = ? (mask now [])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 499042967}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 500050047}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 501911619}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353Tw\20\v\0\3\0\0\0h\4\0\0;\230\353Tw\20\v\0\3\0\1\0\256\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 504536131}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
clock_gettime(CLOCK_MONOTONIC, {7330, 506275633}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 506855467}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 507587889}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 508442381}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 508961180}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 509418943}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 509998777}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 511860350}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353TT7\v\0\3\0\0\0\242\4\0\0;\230\353TT7\v\0\3\0\1\0\367\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 514484861}) = 0
sigreturn()                             = ? (mask now [])
clock_gettime(CLOCK_MONOTONIC, {7330, 516224363}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 516743162}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 517200926}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 517719725}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 518452147}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 519367674}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 519947508}) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353Tn^\v\0\3\0\0\0\370\4\0\0;\230\353Tn^\v\0\3\0\1\0y\10\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 525074461}) = 0
sigreturn()                             = ? (mask now [])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 528400877}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 529377440}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 530018309}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 531910399}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\246\205\v\0\3\0\0\0V\5\0\0;\230\353T\246\205\v\0\3\0\1\0\336\10\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 534534910}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
writev(20, [{"\6\0T\3\256\332o\0\345\0\0\0\3\0\0\1\0\0\0\0h\0\377\0h\0\377\0\0\1\1\0"..., 224}], 1) = 224
clock_gettime(CLOCK_MONOTONIC, {7330, 542164305}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353TX\255\v\0\3\0\0\0\317\5\0\0;\230\353TX\255\v\0\3\0\1\0T\t\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 546253660}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
read(20, "5\20\4\0\236\0\0\1\3\0\0\1\33\1\257\0\224\4\6\0\237\0\0\1\236\0\0\1)\0\0\0"..., 4096) = 1088
clock_gettime(CLOCK_MONOTONIC, {7330, 548756102}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 549366453}) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [HUP QUIT ILL])
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\273\323\v\0\3\0\0\0K\6\0\0;\230\353T\273\323\v\0\3\0\1\0\314\t\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 554707029}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 558155516}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 559132078}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 560749510}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\325\372\v\0\3\0\0\0\326\6\0\0;\230\353T\325\372\v\0\3\0\1\0:\n\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 564564207}) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 565968016}) = 0
write(0, "[  7330.565] ", 13)           = 13
write(0, "\n", 1)                       = 1
write(2, "Backtrace:\n", 11Backtrace:
)            = 11
clock_gettime(CLOCK_MONOTONIC, {7330, 568195799}) = 0
write(0, "[  7330.568] ", 13)           = 13
write(0, "Backtrace:\n", 11)            = 11
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 571125486}) = 0
write(0, "[  7330.571] ", 13)           = 13
write(0, "\n", 1)                       = 1
futex(0xb6c587d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "Segmentation fault at address (n"..., 36Segmentation fault at address (nil)
) = 36
clock_gettime(CLOCK_MONOTONIC, {7330, 575092772}) = 0
write(0, "[  7330.575] ", 13)           = 13
write(0, "Segmentation fault at address (n"..., 36) = 36
write(2, "\nFatal server error:\n", 21
Fatal server error:
) = 21
clock_gettime(CLOCK_MONOTONIC, {7330, 577412108}) = 0
write(0, "[  7330.577] ", 13)           = 13
write(0, "\nFatal server error:\n", 21) = 21
write(2, "Caught signal 11 (Segmentation f"..., 55Caught signal 11 (Segmentation fault). Server aborting
) = 55
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [ABRT BUS FPE USR1 SEGV USR2 ALRM STKFLT CHLD CONT TTIN TTOU URG XCPU VTALRM PROF WINCH IO PWR RTMIN])
clock_gettime(CLOCK_MONOTONIC, {7330, 582752684}) = 0
write(0, "[  7330.582] ", 13)           = 13
write(0, "Caught signal 11 (Segmentation f"..., 55) = 55
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 585041502}) = 0
write(0, "[  7330.585] ", 13)           = 13
write(0, "\n", 1)                       = 1
write(2, "\nPlease consult the The X.Org Fo"..., 85
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
for help. 
) = 85
clock_gettime(CLOCK_MONOTONIC, {7330, 587208250}) = 0
write(0, "[  7330.587] ", 13)           = 13
write(0, "\nPlease consult the The X.Org Fo"..., 85) = 85
write(2, "Please also check the log file a"..., 84Please also check the log file at "/var/log/Xorg.0.log" for additional information.
) = 84
clock_gettime(CLOCK_MONOTONIC, {7330, 589466551}) = 0
write(0, "[  7330.589] ", 13)           = 13
write(0, "Please also check the log file a"..., 84) = 84
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 593525389}) = 0
write(0, "[  7330.593] ", 13)           = 13
write(0, "\n", 1)                       = 1
close(1)                                = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
unlink("/tmp/.X11-unix/X0")             = 0
unlink("/tmp/.X0-lock")                 = 0
rt_sigprocmask(SIG_BLOCK, [ALRM CHLD TSTP TTIN TTOU VTALRM WINCH IO], [SEGV IO], 8) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 599567869}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 601948240}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 603168943}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 604145506}) = 0
fcntl64(9, F_GETFL)                     = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(9, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
fcntl64(9, F_GETFD)                     = 0
close(9)                                = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 606983641}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 608509520}) = 0
write(0, "[  7330.608] ", 13)           = 13
write(0, "(II) evdev: Touchscreen: Close\n", 31) = 31
clock_gettime(CLOCK_MONOTONIC, {7330, 610798338}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 611408690}) = 0
write(0, "[  7330.611] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
clock_gettime(CLOCK_MONOTONIC, {7330, 613361815}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 614368895}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 615009764}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 615986326}) = 0
fcntl64(10, F_GETFL)                    = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl64(10, F_GETFD)                    = 0
close(10)                               = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 618336180}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 619007567}) = 0
write(0, "[  7330.619] ", 13)           = 13
write(0, "(II) evdev: Power Button: Close\n", 32) = 32
clock_gettime(CLOCK_MONOTONIC, {7330, 621601561}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 622181395}) = 0
write(0, "[  7330.622] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
fcntl64(11, F_GETFL)                    = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(11, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl64(11, F_GETFD)                    = 0
rt_sigaction(SIGIO, {SIG_IGN, [IO], 0x4000000 /* SA_??? */}, {0xb6f0d63d, [IO], 0x4000000 /* SA_??? */}, 8) = 0
close(11)                               = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 626606443}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 627308348}) = 0
write(0, "[  7330.627] ", 13)           = 13
write(0, "(II) evdev: AUX Button: Close\n", 30) = 30
clock_gettime(CLOCK_MONOTONIC, {7330, 629261473}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 629810789}) = 0
write(0, "[  7330.629] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
rt_sigprocmask(SIG_SETMASK, [SEGV IO], NULL, 8) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [IO], [SEGV IO], 8) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 634663084}) = 0
write(0, "[  7330.634] ", 13)           = 13
write(0, "(NI) OMAPFBLeaveVT\n", 19)    = 19
ioctl(7, KDSETMODE, 0)                  = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
ioctl(7, KDSKBMODE, 0x3)                = 0
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
ioctl(7, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(7, VIDIOC_RESERVED or VT_GETMODE, 0xbef3b348) = 0
ioctl(7, VIDIOC_ENUM_FMT or VT_SETMODE, 0xbef3b348) = 0
ioctl(7, VT_ACTIVATE, 0x1)              = 0
ioctl(7, VT_WAITACTIVE, 0x1)            = 0
close(7)                                = 0
write(2, "Server terminated with error (1)"..., 52Server terminated with error (1). Closing log file.
) = 52
clock_gettime(CLOCK_MONOTONIC, {7330, 655903318}) = 0
write(0, "[  7330.655] ", 13)           = 13
write(0, "Server terminated with error (1)"..., 52) = 52
close(0)                                = 0
rt_sigprocmask(SIG_BLOCK, [ALRM CHLD TSTP TTIN TTOU VTALRM WINCH IO], [SEGV IO], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(4586, 4586, SIGABRT)             = 0
--- SIGABRT (Aborted) @ 0 (0) ---
root@gta04:~# 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* mysterious crashes on OMAP5 uevm
@ 2015-09-10  8:57             ` Dr. H. Nikolaus Schaller
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-09-10  8:57 UTC (permalink / raw)
  To: linux-arm-kernel


On 10.09.2015 at 10:30, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
>> 
>> On 08.09.2015 at 23:07, Tony Lindgren <tony@atomide.com> wrote:
>> 
>>> * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
>>>> On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
>>>>> * Grazvydas Ignotas <notasas@gmail.com> [150908 05:50]:
>>>>>> Hi,
>>>>>> 
>>>>>> this is a longstanding problem I'm seeing since the very beginning,
>>>>>> which was around 3.12 or so (when I've first got the hardware) and it
>>>>>> seems 4.2 is affected by it still. Basically what happens is Xorg
>>>>>> randomly segfaults at some "impossible" location. I don't have the
>>>>>> details at the moment (could get them is needed), but from what I
>>>>>> examined with gdb some time ago the situation did not make any sense.
>>>>>> 
>>>>>> There are 2 workarounds that I know which make the problem go away
>>>>>> (one is enough):
>>>>>> - recompile Xorg with -marm (I'm using Debian armhf so it's thumb2 by default)
>>>>>> - disable ARCH_MULTI_V6 in the kernel config
>>>>>> 
>>>>>> Because of the above workarounds I have forgotten about it several
>>>>>> times, but it regularly comes back and bites again. It would look like
>>>>>> some missing erratum workaround, but I have all of them enabled in the
>>>>>> kernel.
>>>>>> 
>>>>>> Does anyone know about this? Perhaps some missing erratum workaround
>>>>>> in the bootloader? u-boot isn't too old here (2015.07).
>>>>> 
>>>>> Seems like some incorrect handling with CONFIG_CPU_V6 compiled in..
>>>>> Maybe try to narrow it down by commenting out some CONFIG_CPU_V6 and
>>>>> __LINUX_ARM_ARCH__ = 6 ifdefs in the git grep CONFIG_CPU_V6
>>>>> places ignoring uncompress and davinci code.
>>>> 
>>>> ok with that it was quite easy to find. On a kernel with ARCH_MULTI_V6
>>>> disabled, it is enough to just do this:
>>>> 
>>>> --- a/arch/arm/kernel/signal.c
>>>> +++ b/arch/arm/kernel/signal.c
>>>> @@ -340,13 +340,13 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
>>>>               /*
>>>>                * The LSB of the handler determines if we're going to
>>>>                * be using THUMB or ARM mode for this signal handler.
>>>>                */
>>>>               thumb = handler & 1;
>>>> 
>>>> -#if __LINUX_ARM_ARCH__ >= 7
>>>> +#if 0 //__LINUX_ARM_ARCH__ >= 7
>>>>               /*
>>>>                * Clear the If-Then Thumb-2 execution state
>>>>                * ARM spec requires this to be all 000s in ARM mode
>>>>                * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
>>>>                * signal transition without this.
>>>>                */
>>>> 
>>>> ... and the problem appears, so I guess this needs some real
>>>> multiplatform handling,.
>>> 
>>> OK nice to hear you found it. Yeah looks like some runtime
>>> capability check is needed.
>>> 
>>>>> Do you have some easy way to reproduce this issue?
>>>> 
>>>> Just moving a browser window around with mouse usually triggers it
>>>> within a minute.
>>> 
>>> OK good to know.
>> 
>> It looks as if this is the solution for the same symptom on our OMAP3 board (gta04).
>> There, it suffices to draw on the touch screen for ~10 seconds to make the xserver segfault.
>> 
>> [we are using the binary xserver from debian wheezy
>> ii  xserver-xorg-core                        2:1.12.4-6+deb7u5             armhf        Xorg X server - core server]
>> 
>> We know about this bug for a while, but so far did think that some touch screen
>> event bit has changed and we have to fix our touch screen driver.
>> 
>> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
>>>> #if 0 //__LINUX_ARM_ARCH__ >= 7
>> makes it re-appear.
>> 
>> A while ago I tried to debug running the x-server under strace and could find that it also has
>> something to do with SIGALRM.
>> 
>> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c
> 
> It would be really nice if someone could diagnose what's going on here.
> What exception is causing the X server to be killed (someone said a
> segfault)?  What is the register state at the point that happens?  What
> does the code look like  Is it happening inside the SIGALRM handler, or
> when the SIGALRM handler has returned?
> 
> I'd suggest attaching gdb to the X server, but remember to set gdb to
> ignore SIGPIPEs.

I don’t have a setup to run gdb (with source) on the device, and I have
really zero experience with the Xserver sources. But maybe Grazvydas can
do that better than me.

Attached is an strace log I recorded during my earlier experiments.
The X server appears to make heavy use not only of SIGALRM but also of SIGIO.

And it looks as if the SEGFAULT occurs inside the SIGIO handler, after
it has made 3 syscalls (select, read, clock_gettime) but before the
sigreturn. At least in this example.

The Xserver then does a graceful shutdown after the SEGFAULT, i.e. it
prints the segfault message itself.

Hope this is a useful piece to solve the puzzle and helps a little.

BR,
Nikolaus

…
--- SIGALRM (Alarm clock) @ 0 (0) ---
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T^\351\n\0\3\0\0\0:\4\0\0;\230\353T^\351\n\0\3\0\1\0=\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 494831541}) = 0
sigreturn()                             = ? (mask now [ILL ABRT KILL USR1 SEGV PIPE TERM STKFLT CHLD STOP TSTP TTIN XFSZ VTALRM PROF IO PWR RTMIN])
sigreturn()                             = ? (mask now [])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 499042967}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 500050047}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 501911619}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353Tw\20\v\0\3\0\0\0h\4\0\0;\230\353Tw\20\v\0\3\0\1\0\256\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 504536131}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
clock_gettime(CLOCK_MONOTONIC, {7330, 506275633}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 506855467}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 507587889}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 508442381}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 508961180}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 509418943}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 509998777}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 511860350}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353TT7\v\0\3\0\0\0\242\4\0\0;\230\353TT7\v\0\3\0\1\0\367\7\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 514484861}) = 0
sigreturn()                             = ? (mask now [])
clock_gettime(CLOCK_MONOTONIC, {7330, 516224363}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 516743162}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 517200926}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 517719725}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 518452147}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 519367674}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 519947508}) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353Tn^\v\0\3\0\0\0\370\4\0\0;\230\353Tn^\v\0\3\0\1\0y\10\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 525074461}) = 0
sigreturn()                             = ? (mask now [])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 528400877}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 529377440}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 530018309}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 531910399}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\246\205\v\0\3\0\0\0V\5\0\0;\230\353T\246\205\v\0\3\0\1\0\336\10\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 534534910}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
writev(20, [{"\6\0T\3\256\332o\0\345\0\0\0\3\0\0\1\0\0\0\0h\0\377\0h\0\377\0\0\1\1\0"..., 224}], 1) = 224
clock_gettime(CLOCK_MONOTONIC, {7330, 542164305}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353TX\255\v\0\3\0\0\0\317\5\0\0;\230\353TX\255\v\0\3\0\1\0T\t\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 546253660}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
read(20, "5\20\4\0\236\0\0\1\3\0\0\1\33\1\257\0\224\4\6\0\237\0\0\1\236\0\0\1)\0\0\0"..., 4096) = 1088
clock_gettime(CLOCK_MONOTONIC, {7330, 548756102}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 549366453}) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [HUP QUIT ILL])
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\273\323\v\0\3\0\0\0K\6\0\0;\230\353T\273\323\v\0\3\0\1\0\314\t\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 554707029}) = 0
sigreturn()                             = ? (mask now [HUP QUIT ILL])
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 4 5 12 13 14 15 16 19], NULL, NULL, {0, 0}) = 1 (in [19], left {0, 0})
clock_gettime(CLOCK_MONOTONIC, {7330, 558155516}) = 0
setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 559132078}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 560749510}) = 0
--- SIGIO (I/O possible) @ 0 (0) ---
select(12, [9 10 11], NULL, NULL, {0, 0}) = 1 (in [9], left {0, 0})
read(9, ";\230\353T\325\372\v\0\3\0\0\0\326\6\0\0;\230\353T\325\372\v\0\3\0\1\0:\n\0\0"..., 256) = 64
clock_gettime(CLOCK_MONOTONIC, {7330, 564564207}) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 565968016}) = 0
write(0, "[  7330.565] ", 13)           = 13
write(0, "\n", 1)                       = 1
write(2, "Backtrace:\n", 11Backtrace:
)            = 11
clock_gettime(CLOCK_MONOTONIC, {7330, 568195799}) = 0
write(0, "[  7330.568] ", 13)           = 13
write(0, "Backtrace:\n", 11)            = 11
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 571125486}) = 0
write(0, "[  7330.571] ", 13)           = 13
write(0, "\n", 1)                       = 1
futex(0xb6c587d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "Segmentation fault at address (n"..., 36Segmentation fault at address (nil)
) = 36
clock_gettime(CLOCK_MONOTONIC, {7330, 575092772}) = 0
write(0, "[  7330.575] ", 13)           = 13
write(0, "Segmentation fault at address (n"..., 36) = 36
write(2, "\nFatal server error:\n", 21
Fatal server error:
) = 21
clock_gettime(CLOCK_MONOTONIC, {7330, 577412108}) = 0
write(0, "[  7330.577] ", 13)           = 13
write(0, "\nFatal server error:\n", 21) = 21
write(2, "Caught signal 11 (Segmentation f"..., 55Caught signal 11 (Segmentation fault). Server aborting
) = 55
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [ABRT BUS FPE USR1 SEGV USR2 ALRM STKFLT CHLD CONT TTIN TTOU URG XCPU VTALRM PROF WINCH IO PWR RTMIN])
clock_gettime(CLOCK_MONOTONIC, {7330, 582752684}) = 0
write(0, "[  7330.582] ", 13)           = 13
write(0, "Caught signal 11 (Segmentation f"..., 55) = 55
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 585041502}) = 0
write(0, "[  7330.585] ", 13)           = 13
write(0, "\n", 1)                       = 1
write(2, "\nPlease consult the The X.Org Fo"..., 85
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
for help. 
) = 85
clock_gettime(CLOCK_MONOTONIC, {7330, 587208250}) = 0
write(0, "[  7330.587] ", 13)           = 13
write(0, "\nPlease consult the The X.Org Fo"..., 85) = 85
write(2, "Please also check the log file a"..., 84Please also check the log file at "/var/log/Xorg.0.log" for additional information.
) = 84
clock_gettime(CLOCK_MONOTONIC, {7330, 589466551}) = 0
write(0, "[  7330.589] ", 13)           = 13
write(0, "Please also check the log file a"..., 84) = 84
write(2, "\n", 1
)                       = 1
clock_gettime(CLOCK_MONOTONIC, {7330, 593525389}) = 0
write(0, "[  7330.593] ", 13)           = 13
write(0, "\n", 1)                       = 1
close(1)                                = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
unlink("/tmp/.X11-unix/X0")             = 0
unlink("/tmp/.X0-lock")                 = 0
rt_sigprocmask(SIG_BLOCK, [ALRM CHLD TSTP TTIN TTOU VTALRM WINCH IO], [SEGV IO], 8) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 599567869}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 601948240}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 603168943}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 604145506}) = 0
fcntl64(9, F_GETFL)                     = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(9, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
fcntl64(9, F_GETFD)                     = 0
close(9)                                = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 606983641}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 608509520}) = 0
write(0, "[  7330.608] ", 13)           = 13
write(0, "(II) evdev: Touchscreen: Close\n", 31) = 31
clock_gettime(CLOCK_MONOTONIC, {7330, 610798338}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 611408690}) = 0
write(0, "[  7330.611] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
clock_gettime(CLOCK_MONOTONIC, {7330, 613361815}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 614368895}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 615009764}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 615986326}) = 0
fcntl64(10, F_GETFL)                    = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl64(10, F_GETFD)                    = 0
close(10)                               = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 618336180}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 619007567}) = 0
write(0, "[  7330.619] ", 13)           = 13
write(0, "(II) evdev: Power Button: Close\n", 32) = 32
clock_gettime(CLOCK_MONOTONIC, {7330, 621601561}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 622181395}) = 0
write(0, "[  7330.622] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
fcntl64(11, F_GETFL)                    = 0x2802 (flags O_RDWR|O_NONBLOCK|O_ASYNC)
fcntl64(11, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl64(11, F_GETFD)                    = 0
rt_sigaction(SIGIO, {SIG_IGN, [IO], 0x4000000 /* SA_??? */}, {0xb6f0d63d, [IO], 0x4000000 /* SA_??? */}, 8) = 0
close(11)                               = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 626606443}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 627308348}) = 0
write(0, "[  7330.627] ", 13)           = 13
write(0, "(II) evdev: AUX Button: Close\n", 30) = 30
clock_gettime(CLOCK_MONOTONIC, {7330, 629261473}) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 629810789}) = 0
write(0, "[  7330.629] ", 13)           = 13
write(0, "(II) UnloadModule: \"evdev\"\n", 27) = 27
rt_sigprocmask(SIG_SETMASK, [SEGV IO], NULL, 8) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [IO], [SEGV IO], 8) = 0
clock_gettime(CLOCK_MONOTONIC, {7330, 634663084}) = 0
write(0, "[  7330.634] ", 13)           = 13
write(0, "(NI) OMAPFBLeaveVT\n", 19)    = 19
ioctl(7, KDSETMODE, 0)                  = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
ioctl(7, KDSKBMODE, 0x3)                = 0
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
ioctl(7, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(7, VIDIOC_RESERVED or VT_GETMODE, 0xbef3b348) = 0
ioctl(7, VIDIOC_ENUM_FMT or VT_SETMODE, 0xbef3b348) = 0
ioctl(7, VT_ACTIVATE, 0x1)              = 0
ioctl(7, VT_WAITACTIVE, 0x1)            = 0
close(7)                                = 0
write(2, "Server terminated with error (1)"..., 52Server terminated with error (1). Closing log file.
) = 52
clock_gettime(CLOCK_MONOTONIC, {7330, 655903318}) = 0
write(0, "[  7330.655] ", 13)           = 13
write(0, "Server terminated with error (1)"..., 52) = 52
close(0)                                = 0
rt_sigprocmask(SIG_BLOCK, [ALRM CHLD TSTP TTIN TTOU VTALRM WINCH IO], [SEGV IO], 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(4586, 4586, SIGABRT)             = 0
--- SIGABRT (Aborted) @ 0 (0) ---
root@gta04:~# 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: mysterious crashes on OMAP5 uevm
  2015-09-10  8:30           ` Russell King - ARM Linux
@ 2015-09-10 23:33             ` Woodruff, Richard
  -1 siblings, 0 replies; 38+ messages in thread
From: Woodruff, Richard @ 2015-09-10 23:33 UTC (permalink / raw)
  To: Russell King - ARM Linux, Dr. H. Nikolaus Schaller
  Cc: Menon, Nishanth, Tony Lindgren, Grazvydas Ignotas, Marek Belisko,
	linux-omap, linux-arm-kernel

> From: linux-arm-kernel [mailto:linux-arm-kernel-
> bounces@lists.infradead.org] On Behalf Of Russell King - ARM Linux
 
> > >>>> There are 2 workarounds that I know which make the problem go
> > >>>> away (one is enough):
> > >>>> - recompile Xorg with -marm (I'm using Debian armhf so it's
> > >>>> thumb2 by default)
> > >>>> - disable ARCH_MULTI_V6 in the kernel config

This reminds me of a customer crash I saw quite a while ago relating to thumb2.  I thought it was fixed but maybe not.

In a couple of spots, PSR_IT_MASK was not handled well by the conditionals in the ARCH_MULTI_V6 flow.  Some stack sanity check failed and a BUG() was triggered.

Compiling the app for v6 or pulling MULTI from the kernel build solved the issue.

Additionally, it was not handled correctly in GDB.  The old build of GDB didn't do MULTI and needed a hack to be usable on Thumb-2 code.

Regards,
Richard W.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-10  8:30           ` Russell King - ARM Linux
@ 2015-09-11 13:27             ` Grazvydas Ignotas
  -1 siblings, 0 replies; 38+ messages in thread
From: Grazvydas Ignotas @ 2015-09-11 13:27 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Nishanth Menon, Tony Lindgren, Dr. H. Nikolaus Schaller,
	Marek Belisko, linux-omap, linux-arm-kernel

On Thu, Sep 10, 2015 at 10:30 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
>> ...
>>
>> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
>> >> #if 0 //__LINUX_ARM_ARCH__ >= 7
>> makes it re-appear.
>>
>> A while ago I tried to debug running the x-server under strace and could find that it also has
>> something to do with SIGALRM.
>>
>> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c
>
> It would be really nice if someone could diagnose what's going on here.
> What exception is causing the X server to be killed (someone said a
> segfault)?  What is the register state at the point that happens?  What
> does the code look like?  Is it happening inside the SIGALRM handler, or
> when the SIGALRM handler has returned?
>
> I'd suggest attaching gdb to the X server, but remember to set gdb to
> ignore SIGPIPEs.

It's actually pretty random; see some debug sessions in [1].
The first one is the most useful one, but I hadn't thought of checking
what pixman_rasterize_edges() was doing when the signal arrived, and
most often it's the "less useful" segfaults that occur. However, the
disassembly (see debug1_libpixman.gz) shows that the signal arrived
right after an IT instruction.

[1] http://notaz.gp2x.de/tmp/thumb_segfault/

Gražvydas

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-11 13:27             ` Grazvydas Ignotas
@ 2015-09-11 14:03               ` Russell King - ARM Linux
  -1 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2015-09-11 14:03 UTC (permalink / raw)
  To: Grazvydas Ignotas
  Cc: Nishanth Menon, Tony Lindgren, Dr. H. Nikolaus Schaller,
	Marek Belisko, linux-omap, linux-arm-kernel

On Fri, Sep 11, 2015 at 03:27:13PM +0200, Grazvydas Ignotas wrote:
> On Thu, Sep 10, 2015 at 10:30 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
> >> ...
> >>
> >> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
> >> >> #if 0 //__LINUX_ARM_ARCH__ >= 7
> >> makes it re-appear.
> >>
> >> A while ago I tried to debug running the x-server under strace and could find that it also has
> >> something to do with SIGALRM.
> >>
> >> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c
> >
> > It would be really nice if someone could diagnose what's going on here.
> > What exception is causing the X server to be killed (someone said a
> > segfault)?  What is the register state at the point that happens?  What
> > does the code look like?  Is it happening inside the SIGALRM handler, or
> > when the SIGALRM handler has returned?
> >
> > I'd suggest attaching gdb to the X server, but remember to set gdb to
> > ignore SIGPIPEs.
> 
> It's actually pretty random, see some debug sessions in [1].
> The first one is the most useful one, but I haven't thought of checking
> what pixman_rasterize_edges() was doing when the signal arrived, and
> most often the "less useful" segfaults occur. However from the
> disassembly (see debug1_libpixman.gz) it can be seen that the signal
> arrived right after IT.
> 
> [1] http://notaz.gp2x.de/tmp/thumb_segfault/

We're not switching from ARM -> Thumb or Thumb -> ARM here; Thumb code
in libpixman is being interrupted to call a Thumb signal handler.

Working through the code:

   0x7f717ec8 <SmartScheduleTimer>:     ldr     r2, [pc, #20]   ; = 0x0004112e
   0x7f717eca <SmartScheduleTimer+2>:   ldr     r1, [pc, #24]   ; = 0x00000c48
   0x7f717ecc <SmartScheduleTimer+4>:   ldr     r3, [pc, #24]   ; = 0x00000e6c
   0x7f717ece <SmartScheduleTimer+6>:   add     r2, pc
   0x7f717ed0 <SmartScheduleTimer+8>:   ldr     r1, [r2, r1]
   0x7f717ed2 <SmartScheduleTimer+10>:  ldr     r3, [r2, r3]
=> 0x7f717ed4 <SmartScheduleTimer+12>:  ldr     r2, [r1, #0]

The instruction at 0x7f717ed4 was trying to access 0xd1242963 which
is in kernel space, and this is the faulting instruction.

At this point, r2 should contain 0x0004112e plus the PC value.  r2 in
the register dump was 0x7f717fa0.  Let's calculate what the PC value
would have had to be: 0x7f717fa0 - 0x0004112e = 0x7f6d6e72, which is
clearly wrong for an instruction at 0x7f717ece.

So, I don't think the first instruction here was executed by the CPU.

gdb indicates that, in the parent context of the signal frame, pc was at
0xb6dd87f8, which works out at 0x297f8 into the libpixman-1 library:

   297f0:       449c            add     ip, r3
   297f2:       f1bc 0fff       cmp.w   ip, #255        ; 0xff
   297f6:       bfd4            ite     le
   297f8:       fa5f fc8c       uxtble.w        ip, ip
   297fc:       f04f 0cff       movgt.w ip, #255        ; 0xff
   29800:       f88a c000       strb.w  ip, [sl]

and as you say, is just after an IT instruction, which would have
set the IT execution state to appropriately skip either the first or
the second instruction.

Unfortunately, the IT instruction's condition is being carried forward
to the signal handler, causing either the first or second instruction
there to be skipped.

Looking back at the history, the original commit introducing the
clearing of the PSR_IT_MASK bits is just wrong:

-               if (thumb)
+               if (thumb) {
                        cpsr |= PSR_T_BIT;
-               else
+#if __LINUX_ARM_ARCH__ >= 7
+                       /* clear the If-Then Thumb-2 execution state */
+                       cpsr &= ~PSR_IT_MASK;
+#endif
+               } else
                        cpsr &= ~PSR_T_BIT;

This shouldn't be a compile-time decision at all, and it certainly should
not be dependent on __LINUX_ARM_ARCH__, which marks the _lowest_ supported
architecture.

However, even the idea that it's ARMv7 or later is wrong.  According to
the ARM ARM, the IT instruction is present in ARMv6T2 as well, which
means it's ARMv6 too (which would have __LINUX_ARM_ARCH__ = 6).

Looking at the ARM ARM, these bits are "reserved" in previous non-T2
architectures, have an undefined value at reset, and are probably zero
anyway.

Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the problem,
and I doubt there are any ARMv6 non-T2 systems out there that would be
affected by clearing the IT state bits.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: mysterious crashes on OMAP5 uevm
  2015-09-11 14:03               ` Russell King - ARM Linux
@ 2015-09-11 16:12                 ` Woodruff, Richard
  -1 siblings, 0 replies; 38+ messages in thread
From: Woodruff, Richard @ 2015-09-11 16:12 UTC (permalink / raw)
  To: Russell King - ARM Linux, Grazvydas Ignotas
  Cc: Menon, Nishanth, Tony Lindgren, Dr. H. Nikolaus Schaller,
	Marek Belisko, linux-omap, linux-arm-kernel

> From: linux-omap-owner@vger.kernel.org [mailto:linux-omap-
> owner@vger.kernel.org] On Behalf Of Russell King - ARM Linux
> Sent: Friday, September 11, 2015 9:03 AM
> To: Grazvydas Ignotas

> However, even the idea that it's ARMv7 or later is wrong.  According to
> the ARM ARM, the IT instruction is present in ARMv6T2 as well, which
> means it's ARMv6 too (which would have __LINUX_ARM_ARCH__ = 6).

I recall ARMv6T2 first being implemented in the ARM1156, which is a v6 CPU with the T2 option added.

The Cortex-R class was the ARMv7 successor to the 1156 and also uses T2.

> Looking at the ARM ARM, these bits are "reserved" in previous non-T2
> architectures, have an undefined value at reset, and are probably zero
> anyway.
> 
> Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the
> problem,
> and I doubt there's any ARMv6 non-T2 systems out there that would be
> affected by clearing the IT state bits.

You have probably already looked, but cpsr.it usage is not restricted to this one spot.

Looking back at old notes, I think both the debug and signal handler code keyed on these bits.  From LXR, I see the kernel's KVM code also uses them in some capacity.

The 1156/Cortex-R are typically MMU-less.  They may (or may not) have something else to consider when fixing this.

Regards,
Richard W.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-11 16:12                 ` Woodruff, Richard
@ 2015-09-11 17:48                   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2015-09-11 17:48 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: Menon, Nishanth, Tony Lindgren, Dr. H. Nikolaus Schaller,
	Grazvydas Ignotas, Marek Belisko, linux-omap, linux-arm-kernel

On Fri, Sep 11, 2015 at 04:12:21PM +0000, Woodruff, Richard wrote:
> > From: linux-omap-owner@vger.kernel.org [mailto:linux-omap-
> > owner@vger.kernel.org] On Behalf Of Russell King - ARM Linux
> > Sent: Friday, September 11, 2015 9:03 AM
> > To: Grazvydas Ignotas
> 
> > However, even the idea that it's ARMv7 or later is wrong.  According to
> > the ARM ARM, the IT instruction is present in ARMv6T2 as well, which
> > means it's ARMv6 too (which would have __LINUX_ARM_ARCH__ = 6).
> 
> I recall seeing ARMv6T2 first implemented in the ARM1156 which is a
> v6 CPU with T2 option added.

Exactly, which is why we need to be dealing with the IT bits in signal
handling for >= ARMv6, not >= ARMv7.

> > Looking at the ARM ARM, these bits are "reserved" in previous non-T2
> > architectures, have an undefined value at reset, and are probably zero
> > anyway.
> > 
> > Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the
> > problem,
> > and I doubt there's any ARMv6 non-T2 systems out there that would be
> > affected by clearing the IT state bits.
> 
> Probably you already looked, but cpsr.it usage is not restricted to this
> one spot.

Other places:

arch/arm/mm/extable.c-#ifdef CONFIG_THUMB2_KERNEL
arch/arm/mm/extable.c-          /* Clear the IT state to avoid nasty surprises in the fixup */
arch/arm/mm/extable.c:          regs->ARM_cpsr &= ~PSR_IT_MASK;
arch/arm/mm/extable.c-#endif

which is irrelevant here.  This code only deals with kernel mode, and
the only time that this makes sense is when the kernel is built using
Thumb2 instructions.  CONFIG_THUMB2_KERNEL covers the case properly.

arch/arm/probes/kprobes/test-core.c-    regs->ARM_lr = val ^ (14 << 8);
arch/arm/probes/kprobes/test-core.c:    regs->ARM_cpsr &= ~(APSR_MASK | PSR_IT_MASK);
arch/arm/probes/kprobes/test-core.c-    regs->ARM_cpsr |= test_context_cpsr(scenario);

From what I can see, this happens unconditionally.

KVM and Xen code... that requires virtualisation support, which is ARMv7.

arch/arm/probes/kprobes/actions-thumb.c... emulating an IT instruction.
arch/arm/probes/decode.h::it_advance... emulating Thumb2.

So really there are no other places that need fixing.

> Looking back at old notes I think both debug and signal handler code
> keyed on bit usage.  I see from LXR kernel KVM code also uses in some
> capacity.

Frankly, Richard, you're getting on my nerves in this thread - you
seem to know all about this problem, yet you never reported the problem
upstream, so people are effectively having to waste time re-doing the
work that you've already done.

Nothing annoys me more than having people say "oh yes, I found that
problem and worked on it" and nothing coming of it (no report, no
patch, no nothing.)

As you have "old notes" you've already investigated this issue, and
presumably you came up with a patch.  Where is it?

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: mysterious crashes on OMAP5 uevm
  2015-09-11 17:48                   ` Russell King - ARM Linux
@ 2015-09-11 18:34                     ` Woodruff, Richard
  -1 siblings, 0 replies; 38+ messages in thread
From: Woodruff, Richard @ 2015-09-11 18:34 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Menon, Nishanth, Tony Lindgren, Dr. H. Nikolaus Schaller,
	Grazvydas Ignotas, Marek Belisko, linux-omap, linux-arm-kernel

> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> Sent: Friday, September 11, 2015 12:49 PM

> Frankly, Richard, you're getting on my nerves in this thread - you seem to
> know all about this problem, yet you never reported the problem upstream,
> so people are effectively having to waste time re-doing the work that you've
> already done.
>
> Nothing annoys me more than having people say "oh yes, I found that
> problem and worked on it" and nothing coming of it (no report, no patch, no
> nothing.)

Yes, when I put out the hint (to help speed resolution) I expected there might be some negative interpretation.

When I originally hit the issue, I did pass the information along to folks who work in the area, with the expectation that they would follow through.  Probably it got lost.

When I noticed this thread, it appeared that the CPSR.IT information hadn't made it out, so I directly posted what I recalled.

> As you have "old notes" you've already investigated this issue, and
> presumably you came up with a patch.  Where is it?

I didn't produce a comprehensive one. I did a couple of hack versions, but was unsure about some of the areas your analysis has now cleared up. For that issue I ended up advising a reversion of MULTI_V6 for that older kernel.

Regards,
Richard W.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-11 14:03               ` Russell King - ARM Linux
@ 2015-09-14 12:12                 ` Russell King - ARM Linux
  -1 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2015-09-14 12:12 UTC (permalink / raw)
  To: Grazvydas Ignotas, Will Deacon
  Cc: Nishanth Menon, Tony Lindgren, Dr. H. Nikolaus Schaller,
	Marek Belisko, linux-omap, linux-arm-kernel

On Fri, Sep 11, 2015 at 03:03:07PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 03:27:13PM +0200, Grazvydas Ignotas wrote:
> > On Thu, Sep 10, 2015 at 10:30 AM, Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> > > On Thu, Sep 10, 2015 at 08:42:57AM +0200, Dr. H. Nikolaus Schaller wrote:
> > >> ...
> > >>
> > >> Now, disabling CONFIG_ARCH_MULTI_V6 also makes the bug go away and adding the
> > >> >> #if 0 //__LINUX_ARM_ARCH__ >= 7
> > >> makes it re-appear.
> > >>
> > >> A while ago I tried to debug running the x-server under strace and could find that it also has
> > >> something to do with SIGALRM.
> > >>
> > >> And that is very consistent with “enable/disable” by modifying arch/arm/kernel/signal.c
> > >
> > > It would be really nice if someone could diagnose what's going on here.
> > > What exception is causing the X server to be killed (someone said a
> > > segfault)?  What is the register state at the point that happens?  What
> > > does the code look like?  Is it happening inside the SIGALRM handler, or
> > > when the SIGALRM handler has returned?
> > >
> > > I'd suggest attaching gdb to the X server, but remember to set gdb to
> > > ignore SIGPIPEs.
> > 
> > It's actually pretty random, see some debug sessions in [1].
> > The first one is the most useful one, but I haven't thought of checking
> > what pixman_rasterize_edges() was doing when the signal arrived, and
> > most often the "less useful" segfaults occur. However from the
> > disassembly (see debug1_libpixman.gz) it can be seen that the signal
> > arrived right after IT.
> > 
> > [1] http://notaz.gp2x.de/tmp/thumb_segfault/
> 
> We're not going from ARM -> Thumb or Thumb -> ARM here, but Thumb code
> in libpixman is being interrupted calling a Thumb signal handler.
> 
> Working through the code:
> 
>    0x7f717ec8 <SmartScheduleTimer>:     ldr     r2, [pc, #20]   ; = 0x0004112e
>    0x7f717eca <SmartScheduleTimer+2>:   ldr     r1, [pc, #24]   ; = 0x00000c48
>    0x7f717ecc <SmartScheduleTimer+4>:   ldr     r3, [pc, #24]   ; = 0x00000e6c
>    0x7f717ece <SmartScheduleTimer+6>:   add     r2, pc
>    0x7f717ed0 <SmartScheduleTimer+8>:   ldr     r1, [r2, r1]
>    0x7f717ed2 <SmartScheduleTimer+10>:  ldr     r3, [r2, r3]
> => 0x7f717ed4 <SmartScheduleTimer+12>:  ldr     r2, [r1, #0]
> 
> The instruction at 0x7f717ed4 was trying to access 0xd1242963 which
> is in kernel space, and this is the faulting instruction.
> 
> At this point, r2 should contain 0x0004112e plus the PC value.  r2 in
> the register dump was 0x7f717fa0.  Let's calculate the value that PC
> should be here.  0x7f717fa0 - 0x0004112e = 0x7f6d6e72, which is
> clearly wrong.
> 
> So, I don't think the first instruction here was executed by the CPU.
> 
> gdb indicates that the parent context to the signal frame, pc was at
> 0xb6dd87f8, which works out at 0x297f8 into the libpixman-1 library:
> 
>    297f0:       449c            add     ip, r3
>    297f2:       f1bc 0fff       cmp.w   ip, #255        ; 0xff
>    297f6:       bfd4            ite     le
>    297f8:       fa5f fc8c       uxtble.w        ip, ip
>    297fc:       f04f 0cff       movgt.w ip, #255        ; 0xff
>    29800:       f88a c000       strb.w  ip, [sl]
> 
> and as you say, is just after an IT instruction, which would have
> set the IT execution state to appropriately skip either the first or
> the second instruction.
> 
> Unfortunately, the IT instruction's condition is being carried forward
> to the signal handler, causing either the first or second instruction
> there to be skipped.
> 
> Looking back at the history, the original commit introducing the
> clearing of the PSR_IT_MASK bits is just wrong:
> 
> -               if (thumb)
> +               if (thumb) {
>                         cpsr |= PSR_T_BIT;
> -               else
> +#if __LINUX_ARM_ARCH__ >= 7
> +                       /* clear the If-Then Thumb-2 execution state */
> +                       cpsr &= ~PSR_IT_MASK;
> +#endif
> +               } else
>                         cpsr &= ~PSR_T_BIT;
> 
> This shouldn't be a compile-time decision at all, and it certainly should
> not be dependent on __LINUX_ARM_ARCH__, which marks the _lowest_ supported
> architecture.
> 
> However, even the idea that it's ARMv7 or later is wrong.  According to
> the ARM ARM, the IT instruction is present in ARMv6T2 as well, which
> means it's ARMv6 too (which would have __LINUX_ARM_ARCH__ = 6).
> 
> Looking at the ARM ARM, these bits are "reserved" in previous non-T2
> architectures, have an undefined value at reset, and are probably zero
> anyway.
> 
> Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the problem,
> and I doubt there's any ARMv6 non-T2 systems out there that would be
> affected by clearing the IT state bits.

Please test the following patch:

8<===
From: Russell King <rmk+kernel@arm.linux.org.uk>
Subject: [PATCH] ARM: fix Thumb2 signal handling when ARMv6 is enabled

When a kernel is built covering ARMv6 to ARMv7, we omit to clear the
IT state when entering a signal handler.  This can cause the first
few instructions to be conditionally executed depending on the parent
context.

In any case, the original test for >= ARMv7 is broken - ARMv6 can have
Thumb-2 support as well, and an ARMv6T2 specific build would omit this
code too.

Relax the test back to ARMv6 or greater.  This results in us always
clearing the IT state bits in the PSR, even on CPUs where these bits
are reserved.  However, they're reserved for the IT state, so this
should cause no harm.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/kernel/signal.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index b6cda06b455f..b43b4d360bab 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -343,12 +343,17 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
 		 */
 		thumb = handler & 1;
 
-#if __LINUX_ARM_ARCH__ >= 7
+#if __LINUX_ARM_ARCH__ >= 6
 		/*
-		 * Clear the If-Then Thumb-2 execution state
-		 * ARM spec requires this to be all 000s in ARM mode
-		 * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
-		 * signal transition without this.
+		 * Clear the If-Then Thumb-2 execution state.  ARM spec
+		 * requires this to be all 000s in ARM mode.  Snapdragon
+		 * S4/Krait misbehaves on a Thumb=>ARM signal transition
+		 * without this.
+		 *
+		 * We must do this whenever we are running on a Thumb-2
+		 * capable CPU, which includes ARMv6T2.  However, we elect
+		 * to do this whenever we're on an ARMv6 or later CPU for
+		 * simplicity.
 		 */
 		cpsr &= ~PSR_IT_MASK;
 #endif
-- 
2.1.0



-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-14 12:12                 ` Russell King - ARM Linux
@ 2015-09-14 19:02                   ` Tony Lindgren
  -1 siblings, 0 replies; 38+ messages in thread
From: Tony Lindgren @ 2015-09-14 19:02 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Nishanth Menon, Dr. H. Nikolaus Schaller, Will Deacon,
	Grazvydas Ignotas, Marek Belisko, linux-omap, linux-arm-kernel

* Russell King - ARM Linux <linux@arm.linux.org.uk> [150914 05:16]:
> On Fri, Sep 11, 2015 at 03:03:07PM +0100, Russell King - ARM Linux wrote:
> > 
> > Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the problem,
> > and I doubt there's any ARMv6 non-T2 systems out there that would be
> > affected by clearing the IT state bits.
> 
> Please test the following patch:

While we're waiting for Grazvydas to test.. Looks good to me:

Acked-by: Tony Lindgren <tony@atomide.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-14 19:02                   ` Tony Lindgren
@ 2015-09-14 19:35                     ` Dr. H. Nikolaus Schaller
  -1 siblings, 0 replies; 38+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-09-14 19:35 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Nishanth Menon, Russell King - ARM Linux, Will Deacon,
	Grazvydas Ignotas, Marek Belisko, linux-omap, linux-arm-kernel


Am 14.09.2015 um 21:02 schrieb Tony Lindgren <tony@atomide.com>:

> * Russell King - ARM Linux <linux@arm.linux.org.uk> [150914 05:16]:
>> On Fri, Sep 11, 2015 at 03:03:07PM +0100, Russell King - ARM Linux wrote:
>>> 
>>> Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the problem,
>>> and I doubt there's any ARMv6 non-T2 systems out there that would be
>>> affected by clearing the IT state bits.
>> 
>> Please test the following patch:
> 
> While we're waiting for Grazvydas to test.. Looks good to me:
> 
> Acked-by: Tony Lindgren <tony@atomide.com>

I have tested on:
* GTA04 with DM3730 (OMAP3)
* Pyra prototype with OMAP5432
No X server crashes seen any more.

Tested-by: H. Nikolaus Schaller <hns@goldelico.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-14 19:35                     ` Dr. H. Nikolaus Schaller
@ 2015-09-15 17:31                       ` Grazvydas Ignotas
  -1 siblings, 0 replies; 38+ messages in thread
From: Grazvydas Ignotas @ 2015-09-15 17:31 UTC (permalink / raw)
  To: Dr. H. Nikolaus Schaller
  Cc: Nishanth Menon, Russell King - ARM Linux, Tony Lindgren,
	Will Deacon, Marek Belisko, linux-omap, linux-arm-kernel

On Mon, Sep 14, 2015 at 10:35 PM, Dr. H. Nikolaus Schaller
<hns@goldelico.com> wrote:
>
> Am 14.09.2015 um 21:02 schrieb Tony Lindgren <tony@atomide.com>:
>
>> * Russell King - ARM Linux <linux@arm.linux.org.uk> [150914 05:16]:
>>> On Fri, Sep 11, 2015 at 03:03:07PM +0100, Russell King - ARM Linux wrote:
>>>>
>>>> Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the problem,
>>>> and I doubt there's any ARMv6 non-T2 systems out there that would be
>>>> affected by clearing the IT state bits.
>>>
>>> Please test the following patch:
>>
>> While we're waiting for Grazvydas to test.. Looks good to me:
>>
>> Acked-by: Tony Lindgren <tony@atomide.com>
>
> I have tested on:
> * GTA04 with DM3730 (OMAP3)
> * Pyra prototype with OMAP5432
> No X server crashes seen any more.
>
> Tested-by: H. Nikolaus Schaller <hns@goldelico.com>

Tested-by: Grazvydas Ignotas <notasas@gmail.com>
on OMAP5 uevm running v4.2 built with omap2plus_defconfig.
On v4.3-rc1 hsmmc controller probe is deferred for whatever reason and
never reprobes, so my rootfs is never mounted and I could not test,
but that looks unrelated.

I guess it's worth marking this one for stable.

Gražvydas

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-15 17:31                       ` Grazvydas Ignotas
@ 2015-09-16 10:07                         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 38+ messages in thread
From: Russell King - ARM Linux @ 2015-09-16 10:07 UTC (permalink / raw)
  To: Grazvydas Ignotas
  Cc: Nishanth Menon, Tony Lindgren, Dr. H. Nikolaus Schaller,
	Will Deacon, Marek Belisko, linux-omap, linux-arm-kernel

On Tue, Sep 15, 2015 at 08:31:44PM +0300, Grazvydas Ignotas wrote:
> On Mon, Sep 14, 2015 at 10:35 PM, Dr. H. Nikolaus Schaller
> <hns@goldelico.com> wrote:
> >
> > Am 14.09.2015 um 21:02 schrieb Tony Lindgren <tony@atomide.com>:
> >
> >> * Russell King - ARM Linux <linux@arm.linux.org.uk> [150914 05:16]:
> >>> On Fri, Sep 11, 2015 at 03:03:07PM +0100, Russell King - ARM Linux wrote:
> >>>>
> >>>> Merely changing __LINUX_ARM_ARCH__ >= 7 to >= 6 should fix the problem,
> >>>> and I doubt there's any ARMv6 non-T2 systems out there that would be
> >>>> affected by clearing the IT state bits.
> >>>
> >>> Please test the following patch:
> >>
> >> While we're waiting for Grazvydas to test.. Looks good to me:
> >>
> >> Acked-by: Tony Lindgren <tony@atomide.com>
> >
> > I have tested on:
> > * GTA04 with DM3730 (OMAP3)
> > * Pyra prototype with OMAP5432
> > No X server crashes seen any more.
> >
> > Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
> 
> Tested-by: Grazvydas Ignotas <notasas@gmail.com>
> on OMAP5 uevm running v4.2 built with omap2plus_defconfig.
> On v4.3-rc1 hsmmc controller probe is deferred for whatever reason and
> never reprobes, so my rootfs is never mounted and I could not test,
> but that looks unrelated.

Thanks.

> I guess it's worth marking this one for stable.

Indeed.

Having looked closer at the ARM ARM, these bits on older CPUs are marked
as UNK/SBZP (unknown, should be zero or preserved).  So it's safe to get
rid of that #if entirely.  Removing that #if won't affect the validity
of your testing as you've only tested on ARMv7 platforms with ARMv6
included in the kernel.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: mysterious crashes on OMAP5 uevm
  2015-09-08 21:07       ` Tony Lindgren
@ 2015-09-18 17:48         ` Tony Lindgren
  -1 siblings, 0 replies; 38+ messages in thread
From: Tony Lindgren @ 2015-09-18 17:48 UTC (permalink / raw)
  To: Grazvydas Ignotas
  Cc: Nishanth Menon, Dr. H. Nikolaus Schaller, linux-omap,
	Russell King - ARM Linux, linux-arm-kernel

Hi Grazvydas,

* Tony Lindgren <tony@atomide.com> [150908 14:11]:
> * Grazvydas Ignotas <notasas@gmail.com> [150908 13:44]:
> > On Tue, Sep 8, 2015 at 4:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> OK nice to hear you found it. Yeah looks like some runtime
> capability check is needed.
>  
> > > Do you have some easy way to reproduce this issue?
> > 
> > Just moving a browser window around with mouse usually triggers it
> > within a minute.
> 
> OK good to know.

Just FYI, I was now able to reproduce it here too by moving Iceweasel
around for about a minute. And I can confirm that Russell's patch
fixes the problem.

I'm using the i3 tiling window manager here and don't usually have
any floating windows, which probably explains why I did not run
into this issue earlier with my lapdock experiments :)

Regards,

Tony

end of thread, other threads:[~2015-09-18 17:48 UTC | newest]

Thread overview: 38+ messages
2015-09-08 12:46 mysterious crashes on OMAP5 uevm Grazvydas Ignotas
2015-09-08 14:38 ` Tony Lindgren
2015-09-08 20:41   ` Grazvydas Ignotas
2015-09-08 21:07     ` Tony Lindgren
2015-09-10  6:42       ` Dr. H. Nikolaus Schaller
2015-09-10  8:30         ` Russell King - ARM Linux
2015-09-10  8:57           ` Dr. H. Nikolaus Schaller
2015-09-10 23:33           ` Woodruff, Richard
2015-09-11 13:27           ` Grazvydas Ignotas
2015-09-11 14:03             ` Russell King - ARM Linux
2015-09-11 16:12               ` Woodruff, Richard
2015-09-11 17:48                 ` Russell King - ARM Linux
2015-09-11 18:34                   ` Woodruff, Richard
2015-09-14 12:12               ` Russell King - ARM Linux
2015-09-14 19:02                 ` Tony Lindgren
2015-09-14 19:35                   ` Dr. H. Nikolaus Schaller
2015-09-15 17:31                     ` Grazvydas Ignotas
2015-09-16 10:07                       ` Russell King - ARM Linux
2015-09-18 17:48       ` Tony Lindgren