* arm64: unhandled level 0 translation fault
@ 2017-12-12 10:20 ` Geert Uytterhoeven
  0 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-12 10:20 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, Linux-Renesas, linux-kernel

Hi Catalin, Will, et al,

During userspace (Debian jessie NFS root) boot on arm64:

rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
esr 0x92000004, in dash[aaaaadf77000+1a000]
CPU: 0 PID: 1083 Comm: rpcbind Not tainted
4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
pstate: 80000000 (Nzcv daif -PAN -UAO)
pc : 0xaaaaadf8a51c
lr : 0xaaaaadf8ac08
sp : 0000ffffcffeac00
x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
x27: 0000ffffcffebf7c x26: 0000ffffcffead20
x25: 0000aaaacea1c5f0 x24: 0000000000000000
x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
x21: 0000000000000000 x20: 0000000000000008
x19: 0000000000000000 x18: 0000ffffcffeb500
x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
x15: 0000ffffa2363588 x14: ffffffffffffffff
x13: 0000000000000020 x12: 0000000000000010
x11: 0101010101010101 x10: 0000aaaaadfa1000
x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
x3 : 0000aaaaadfa2338 x2 : 0000000000000000
x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30

Sometimes it happens with other processes, but the main address, esr, and
pstate values are always the same.
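
As a reading aid for the values in the report above, here is a minimal Python sketch that decodes the ESR and locates the faulting pc inside the dash mapping. The field layout (EC in bits 31:26, IL in bit 25, and for data aborts WnR in bit 6 and DFSC in bits 5:0) follows the Arm architecture's ESR_ELx description. The function names are illustrative only and are not kernel APIs.

```python
# Decode the ESR and pc values from the fault report.
# decode_esr() and offset_in_mapping() are hypothetical helper names.

EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort without a change in exception level",
}


def decode_esr(esr):
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    ec = (esr >> 26) & 0x3F        # exception class, bits 31:26
    il = (esr >> 25) & 0x1         # instruction length bit
    iss = esr & 0x1FFFFFF          # instruction specific syndrome, bits 24:0
    dfsc = iss & 0x3F              # data fault status code, bits 5:0
    is_write = bool((iss >> 6) & 0x1)  # WnR: 1 = write, 0 = read

    # DFSC 0b0001LL encodes a translation fault at level LL.
    if (dfsc & 0x3C) == 0x04:
        fault = "level %d translation fault" % (dfsc & 0x3)
    else:
        fault = "other"

    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "other"),
        "il": il,
        "dfsc": dfsc,
        "is_write": is_write,
        "fault": fault,
    }


def offset_in_mapping(addr, base, size):
    """Return addr's offset into a mapping, or raise if it lies outside."""
    off = addr - base
    if not 0 <= off < size:
        raise ValueError("address is outside the mapping")
    return off


info = decode_esr(0x92000004)
print(hex(info["ec"]), info["ec_name"], info["fault"], info["is_write"])

# "in dash[aaaaadf77000+1a000]" gives the mapping base and size.
print(hex(offset_in_mapping(0xAAAAADF8A51C, 0xAAAAADF77000, 0x1A000)))
```

For esr 0x92000004 this yields EC 0x24 (a data abort taken from userspace), a read access, and DFSC 0x04, which matches the "level 0 translation fault" text in the message. The pc sits at offset 0x1351c into the dash mapping, and x20 holds the same value (0x8) as the faulting address.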

I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
releases, so the last time was two weeks ago), but had never seen the issue
until today, so v4.15-rc1 is probably OK.
Unfortunately it doesn't happen during every boot, which makes it
cumbersome to bisect.

My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
and even without today's arm64/for-next/core merged in, I still managed to
reproduce the issue, so I believe it was introduced in v4.15-rc2 or
v4.15-rc3.

Once, when the kernel message above wasn't shown, I got an error from
userspace, which may be related:
*** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***

Do you have a clue?
Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: arm64: unhandled level 0 translation fault
  2017-12-12 10:20 ` Geert Uytterhoeven
@ 2017-12-12 10:36   ` Will Deacon
  -1 siblings, 0 replies; 36+ messages in thread
From: Will Deacon @ 2017-12-12 10:36 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Catalin Marinas, linux-arm-kernel, Linux-Renesas, linux-kernel

Hi Geert,

On Tue, Dec 12, 2017 at 11:20:09AM +0100, Geert Uytterhoeven wrote:
> During userspace (Debian jessie NFS root) boot on arm64:
> 
> rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
> esr 0x92000004, in dash[aaaaadf77000+1a000]
> CPU: 0 PID: 1083 Comm: rpcbind Not tainted
> 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
> Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
> pstate: 80000000 (Nzcv daif -PAN -UAO)
> pc : 0xaaaaadf8a51c
> lr : 0xaaaaadf8ac08
> sp : 0000ffffcffeac00
> x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
> x27: 0000ffffcffebf7c x26: 0000ffffcffead20
> x25: 0000aaaacea1c5f0 x24: 0000000000000000
> x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
> x21: 0000000000000000 x20: 0000000000000008
> x19: 0000000000000000 x18: 0000ffffcffeb500
> x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
> x15: 0000ffffa2363588 x14: ffffffffffffffff
> x13: 0000000000000020 x12: 0000000000000010
> x11: 0101010101010101 x10: 0000aaaaadfa1000
> x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
> x7 : 0000000000000000 x6 : 0000000000000000
> x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
> x3 : 0000aaaaadfa2338 x2 : 0000000000000000
> x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
> 
> Sometimes it happens with other processes, but the main address, esr, and
> pstate values are always the same.
> 
> I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
> releases, so the last time was two weeks ago), but never saw the issue
> before until today, so probably v4.15-rc1 is OK.
> Unfortunately it doesn't happen during every boot, which makes it
> cumbersome to bisect.
> 
> My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
> and even without today's arm64/for-next/core merged in, I still managed to
> reproduce the issue, so I believe it was introduced in v4.15-rc2 or
> v4.15-rc3.

Urgh, this looks nasty. Thanks for the report! A few questions:

 - Can you share your .config somewhere please?
 - What was your last known-good kernel?
 - Have you seen it on any other SoC?
 - What's the CPU in your SoC?

If I can reproduce the failure here, then I should be able to debug ASAP.

Cheers,

Will


* Re: arm64: unhandled level 0 translation fault
  2017-12-12 10:36   ` Will Deacon
@ 2017-12-12 15:11     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-12 15:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, Linux-Renesas, linux-kernel

Hi Will,

On Tue, Dec 12, 2017 at 11:36 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Dec 12, 2017 at 11:20:09AM +0100, Geert Uytterhoeven wrote:
>> During userspace (Debian jessie NFS root) boot on arm64:
>>
>> rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
>> esr 0x92000004, in dash[aaaaadf77000+1a000]
>> CPU: 0 PID: 1083 Comm: rpcbind Not tainted
>> 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
>> Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>> pstate: 80000000 (Nzcv daif -PAN -UAO)
>> pc : 0xaaaaadf8a51c
>> lr : 0xaaaaadf8ac08
>> sp : 0000ffffcffeac00
>> x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
>> x27: 0000ffffcffebf7c x26: 0000ffffcffead20
>> x25: 0000aaaacea1c5f0 x24: 0000000000000000
>> x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
>> x21: 0000000000000000 x20: 0000000000000008
>> x19: 0000000000000000 x18: 0000ffffcffeb500
>> x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
>> x15: 0000ffffa2363588 x14: ffffffffffffffff
>> x13: 0000000000000020 x12: 0000000000000010
>> x11: 0101010101010101 x10: 0000aaaaadfa1000
>> x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
>> x7 : 0000000000000000 x6 : 0000000000000000
>> x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
>> x3 : 0000aaaaadfa2338 x2 : 0000000000000000
>> x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
>>
>> Sometimes it happens with other processes, but the main address, esr, and
>> pstate values are always the same.
>>
>> I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
>> releases, so the last time was two weeks ago), but never saw the issue
>> before until today, so probably v4.15-rc1 is OK.
>> Unfortunately it doesn't happen during every boot, which makes it
>> cumbersome to bisect.
>>
>> My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
>> and even without today's arm64/for-next/core merged in, I still managed to
>> reproduce the issue, so I believe it was introduced in v4.15-rc2 or
>> v4.15-rc3.
>
> Urgh, this looks nasty. Thanks for the report! A few questions:
>
>  - Can you share your .config somewhere please?

I managed to reproduce it on plain v4.15-rc3 using both arm64_defconfig, and
renesas_defconfig (from Simon's repo).

>  - What was your last known-good kernel?

v4.15-rc1.

>  - Have you seen it on any other SoC?

I haven't seen it on any Renesas arm32 SoC, only on arm64.

>  - What's the CPU in your SoC?

Quad Cortex-A57.

> If I can reproduce the failure here, then I should be able to debug ASAP.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: arm64: unhandled level 0 translation fault
  2017-12-12 15:11     ` Geert Uytterhoeven
@ 2017-12-12 16:00       ` Geert Uytterhoeven
  -1 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-12 16:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, Linux-Renesas, linux-kernel

Hi Will,

On Tue, Dec 12, 2017 at 4:11 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Tue, Dec 12, 2017 at 11:36 AM, Will Deacon <will.deacon@arm.com> wrote:
>> On Tue, Dec 12, 2017 at 11:20:09AM +0100, Geert Uytterhoeven wrote:
>>> During userspace (Debian jessie NFS root) boot on arm64:
>>>
>>> rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
>>> esr 0x92000004, in dash[aaaaadf77000+1a000]
>>> CPU: 0 PID: 1083 Comm: rpcbind Not tainted
>>> 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
>>> Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>>> pstate: 80000000 (Nzcv daif -PAN -UAO)
>>> pc : 0xaaaaadf8a51c
>>> lr : 0xaaaaadf8ac08
>>> sp : 0000ffffcffeac00
>>> x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
>>> x27: 0000ffffcffebf7c x26: 0000ffffcffead20
>>> x25: 0000aaaacea1c5f0 x24: 0000000000000000
>>> x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
>>> x21: 0000000000000000 x20: 0000000000000008
>>> x19: 0000000000000000 x18: 0000ffffcffeb500
>>> x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
>>> x15: 0000ffffa2363588 x14: ffffffffffffffff
>>> x13: 0000000000000020 x12: 0000000000000010
>>> x11: 0101010101010101 x10: 0000aaaaadfa1000
>>> x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
>>> x7 : 0000000000000000 x6 : 0000000000000000
>>> x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
>>> x3 : 0000aaaaadfa2338 x2 : 0000000000000000
>>> x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
>>>
>>> Sometimes it happens with other processes, but the main address, esr, and
>>> pstate values are always the same.
>>>
>>> I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
>>> releases, so the last time was two weeks ago), but never saw the issue
>>> before until today, so probably v4.15-rc1 is OK.
>>> Unfortunately it doesn't happen during every boot, which makes it
>>> cumbersome to bisect.
>>>
>>> My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
>>> and even without today's arm64/for-next/core merged in, I still managed to
>>> reproduce the issue, so I believe it was introduced in v4.15-rc2 or
>>> v4.15-rc3.
>>
>> Urgh, this looks nasty. Thanks for the report! A few questions:
>>
>>  - Can you share your .config somewhere please?
>
> I managed to reproduce it on plain v4.15-rc3 using both arm64_defconfig, and
> renesas_defconfig (from Simon's repo).

v4.15-rc2 is affected, too.

>>  - What was your last known-good kernel?
>
> v4.15-rc1.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: arm64: unhandled level 0 translation fault
  2017-12-12 16:00       ` Geert Uytterhoeven
@ 2017-12-12 16:57         ` Will Deacon
  -1 siblings, 0 replies; 36+ messages in thread
From: Will Deacon @ 2017-12-12 16:57 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Catalin Marinas, linux-arm-kernel, Linux-Renesas, linux-kernel

On Tue, Dec 12, 2017 at 05:00:33PM +0100, Geert Uytterhoeven wrote:
> On Tue, Dec 12, 2017 at 4:11 PM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > On Tue, Dec 12, 2017 at 11:36 AM, Will Deacon <will.deacon@arm.com> wrote:
> >> On Tue, Dec 12, 2017 at 11:20:09AM +0100, Geert Uytterhoeven wrote:
> >>> During userspace (Debian jessie NFS root) boot on arm64:
> >>>
> >>> rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
> >>> esr 0x92000004, in dash[aaaaadf77000+1a000]
> >>> CPU: 0 PID: 1083 Comm: rpcbind Not tainted
> >>> 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
> >>> Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
> >>> pstate: 80000000 (Nzcv daif -PAN -UAO)
> >>> pc : 0xaaaaadf8a51c
> >>> lr : 0xaaaaadf8ac08
> >>> sp : 0000ffffcffeac00
> >>> x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
> >>> x27: 0000ffffcffebf7c x26: 0000ffffcffead20
> >>> x25: 0000aaaacea1c5f0 x24: 0000000000000000
> >>> x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
> >>> x21: 0000000000000000 x20: 0000000000000008
> >>> x19: 0000000000000000 x18: 0000ffffcffeb500
> >>> x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
> >>> x15: 0000ffffa2363588 x14: ffffffffffffffff
> >>> x13: 0000000000000020 x12: 0000000000000010
> >>> x11: 0101010101010101 x10: 0000aaaaadfa1000
> >>> x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
> >>> x7 : 0000000000000000 x6 : 0000000000000000
> >>> x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
> >>> x3 : 0000aaaaadfa2338 x2 : 0000000000000000
> >>> x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
> >>>
> >>> Sometimes it happens with other processes, but the main address, esr, and
> >>> pstate values are always the same.
> >>>
> >>> I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
> >>> releases, so the last time was two weeks ago), but never saw the issue
> >>> before until today, so probably v4.15-rc1 is OK.
> >>> Unfortunately it doesn't happen during every boot, which makes it
> >>> cumbersome to bisect.
> >>>
> >>> My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
> >>> and even without today's arm64/for-next/core merged in, I still managed to
> >>> reproduce the issue, so I believe it was introduced in v4.15-rc2 or
> >>> v4.15-rc3.
> >>
> >> Urgh, this looks nasty. Thanks for the report! A few questions:
> >>
> >>  - Can you share your .config somewhere please?
> >
> > I managed to reproduce it on plain v4.15-rc3 using both arm64_defconfig, and
> > renesas_defconfig (from Simon's repo).
> 
> v4.15-rc2 is affected, too.

Do you reckon you can bisect between -rc1 and -rc2? We've been unable to
reproduce this on any of our systems, unfortunately.

Will


* Re: arm64: unhandled level 0 translation fault
  2017-12-12 16:57         ` Will Deacon
@ 2017-12-12 20:54           ` Geert Uytterhoeven
  -1 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-12 20:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, Linux-Renesas, linux-kernel

Hi Will,

On Tue, Dec 12, 2017 at 5:57 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Dec 12, 2017 at 05:00:33PM +0100, Geert Uytterhoeven wrote:
>> On Tue, Dec 12, 2017 at 4:11 PM, Geert Uytterhoeven
>> <geert@linux-m68k.org> wrote:
>> > On Tue, Dec 12, 2017 at 11:36 AM, Will Deacon <will.deacon@arm.com> wrote:
>> >> On Tue, Dec 12, 2017 at 11:20:09AM +0100, Geert Uytterhoeven wrote:
>> >>> During userspace (Debian jessie NFS root) boot on arm64:
>> >>>
>> >>> rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
>> >>> esr 0x92000004, in dash[aaaaadf77000+1a000]
>> >>> CPU: 0 PID: 1083 Comm: rpcbind Not tainted
>> >>> 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
>> >>> Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>> >>> pstate: 80000000 (Nzcv daif -PAN -UAO)
>> >>> pc : 0xaaaaadf8a51c
>> >>> lr : 0xaaaaadf8ac08
>> >>> sp : 0000ffffcffeac00
>> >>> x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
>> >>> x27: 0000ffffcffebf7c x26: 0000ffffcffead20
>> >>> x25: 0000aaaacea1c5f0 x24: 0000000000000000
>> >>> x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
>> >>> x21: 0000000000000000 x20: 0000000000000008
>> >>> x19: 0000000000000000 x18: 0000ffffcffeb500
>> >>> x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
>> >>> x15: 0000ffffa2363588 x14: ffffffffffffffff
>> >>> x13: 0000000000000020 x12: 0000000000000010
>> >>> x11: 0101010101010101 x10: 0000aaaaadfa1000
>> >>> x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
>> >>> x7 : 0000000000000000 x6 : 0000000000000000
>> >>> x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
>> >>> x3 : 0000aaaaadfa2338 x2 : 0000000000000000
>> >>> x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
>> >>>
>> >>> Sometimes it happens with other processes, but the main address, esr, and
>> >>> pstate values are always the same.
>> >>>
>> >>> I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
>> >>> releases, so the last time was two weeks ago), but never saw the issue
>> >>> before until today, so probably v4.15-rc1 is OK.
>> >>> Unfortunately it doesn't happen during every boot, which makes it
>> >>> cumbersome to bisect.
>> >>>
>> >>> My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
>> >>> and even without today's arm64/for-next/core merged in, I still managed to
>> >>> reproduce the issue, so I believe it was introduced in v4.15-rc2 or
>> >>> v4.15-rc3.
>> >>
>> >> Urgh, this looks nasty. Thanks for the report! A few questions:
>> >>
>> >>  - Can you share your .config somewhere please?
>> >
>> > I managed to reproduce it on plain v4.15-rc3 using both arm64_defconfig, and
>> > renesas_defconfig (from Simon's repo).
>>
>> v4.15-rc2 is affected, too.
>
> Do you reckon you can bisect between -rc1 and -rc2? We've been unable to
> reproduce this on any of our systems, unfortunately.

I've tried, but ended up on an unrelated XFS merge commit. Probably I
marked a few commits good due to not seeing this heisenbug.

For reference, here's the bisect log.

Bad commits showed one or both of "unhandled level 0 translation fault" and
"invalid pointer". Good commits didn't show any during 6 tries.

git bisect start
# bad: [ae64f9bd1d3621b5e60d7363bc20afb46aede215] Linux 4.15-rc2
git bisect bad ae64f9bd1d3621b5e60d7363bc20afb46aede215
# good: [4fbd8d194f06c8a3fd2af1ce560ddb31f7ec8323] Linux 4.15-rc1
git bisect good 4fbd8d194f06c8a3fd2af1ce560ddb31f7ec8323
# good: [9e0600f5cf6cecfcab5046d1453a9538c054d8a7] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 9e0600f5cf6cecfcab5046d1453a9538c054d8a7
# good: [503505bfea19b7d69e2572297e6defa0f9c2404e] Merge branch 'drm-fixes-4.15' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
git bisect good 503505bfea19b7d69e2572297e6defa0f9c2404e
# good: [ae753ee2771a1bacade56411bb98037b2545c929] Merge tag 'afs-fixes-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
git bisect good ae753ee2771a1bacade56411bb98037b2545c929
# good: [e1ba1c99dad92c5917b22b1047cf36e4426b124a] Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
git bisect good e1ba1c99dad92c5917b22b1047cf36e4426b124a
# bad: [2db767d9889cef087149a5eaa35c1497671fa40f] Merge tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
git bisect bad 2db767d9889cef087149a5eaa35c1497671fa40f
# good: [22a6c83777ac7c17d6c63891beeeac24cf5da450] xfs: ubsan fixes
git bisect good 22a6c83777ac7c17d6c63891beeeac24cf5da450
# bad: [788c1da05b73aee68ed98f05b577c308351f5619] Merge tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad 788c1da05b73aee68ed98f05b577c308351f5619
# good: [3b42d385753c22b29d259ccb9d4c3f419e583b30] xfs: scrub inode mode properly
git bisect good 3b42d385753c22b29d259ccb9d4c3f419e583b30
# good: [373b0589dc8d58bc09c9a28d03611ae4fb216057] xfs: Properly retry failed dquot items in case of error during buffer writeback
git bisect good 373b0589dc8d58bc09c9a28d03611ae4fb216057
# first bad commit: [788c1da05b73aee68ed98f05b577c308351f5619] Merge tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Tomorrow there's another day in bisection paradise...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread
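
The log above marks a commit good after a fixed number of clean boots. A minimal sketch of how such a repeat-until-fail check could be automated with "git bisect run" follows. The script name and the boot-test command are hypothetical (the thread does not say how the boots were driven); the only facts relied on are the exit codes that "git bisect run" documents: 0 means good, 1 to 127 (except 125) means bad.

```python
import subprocess
import sys

# Exit codes interpreted by `git bisect run`.
GOOD = 0
BAD = 1


def classify(command, tries):
    """Run `command` up to `tries` times.

    Returns BAD as soon as one run exits non-zero (fault reproduced),
    GOOD only if every run exits with status 0.
    """
    for _ in range(tries):
        result = subprocess.run(command)
        if result.returncode != 0:
            return BAD
    return GOOD


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage:
    #   git bisect run python3 repeat.py 10 ./boot-and-check.sh
    # where ./boot-and-check.sh is an assumed script that boots the
    # board and exits non-zero if the fault message shows up.
    sys.exit(classify(sys.argv[2:], int(sys.argv[1])))
```

A single failing run is enough to call a commit bad, whereas a good verdict is only as trustworthy as the number of tries, so the try count is the knob to raise when a bisect converges on an implausible commit.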

* Re: arm64: unhandled level 0 translation fault
  2017-12-12 20:54           ` Geert Uytterhoeven
@ 2017-12-13 10:24             ` Will Deacon
  -1 siblings, 0 replies; 36+ messages in thread
From: Will Deacon @ 2017-12-13 10:24 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Catalin Marinas, linux-arm-kernel, Linux-Renesas, linux-kernel

Hi Geert,

Thanks for trying to bisect this.

On Tue, Dec 12, 2017 at 09:54:05PM +0100, Geert Uytterhoeven wrote:
> On Tue, Dec 12, 2017 at 5:57 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Do you reckon you can bisect between -rc1 and -rc2? We've been unable to
> > reproduce this on any of our systems, unfortunately.
> 
> I've tried, but ended up on an unrelated XFS merge commit. Probably I
> marked a few commits good due to not seeing this heisenbug.
> 
> For reference, here's the bisect log.
> 
> Bad commits showed one or both of "unhandled level 0 translation fault" and
> "invalid pointer". Good commits didn't show any during 6 tries.
> 
> git bisect start
> # bad: [ae64f9bd1d3621b5e60d7363bc20afb46aede215] Linux 4.15-rc2
> git bisect bad ae64f9bd1d3621b5e60d7363bc20afb46aede215
> # good: [4fbd8d194f06c8a3fd2af1ce560ddb31f7ec8323] Linux 4.15-rc1
> git bisect good 4fbd8d194f06c8a3fd2af1ce560ddb31f7ec8323
> # good: [9e0600f5cf6cecfcab5046d1453a9538c054d8a7] Merge tag
> 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
> git bisect good 9e0600f5cf6cecfcab5046d1453a9538c054d8a7
> # good: [503505bfea19b7d69e2572297e6defa0f9c2404e] Merge branch
> 'drm-fixes-4.15' of git://people.freedesktop.org/~agd5f/linux into
> drm-fixes
> git bisect good 503505bfea19b7d69e2572297e6defa0f9c2404e
> # good: [ae753ee2771a1bacade56411bb98037b2545c929] Merge tag
> 'afs-fixes-20171201' of
> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
> git bisect good ae753ee2771a1bacade56411bb98037b2545c929
> # good: [e1ba1c99dad92c5917b22b1047cf36e4426b124a] Merge tag
> 'riscv-for-linus-4.15-rc2_cleanups' of
> git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
> git bisect good e1ba1c99dad92c5917b22b1047cf36e4426b124a

^^ This one is the first "good" commit containing the arm64-fixes pull.
Maybe try stressing it a bit more and see if it also fails?

That said, I'm still suspicious that nobody else is seeing this -- I also
checked the various build/boot farms and everything looks ok.

Will

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-12 10:20 ` Geert Uytterhoeven
@ 2017-12-14 14:34   ` Geert Uytterhoeven
  -1 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-14 14:34 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Dave Martin
  Cc: linux-arm-kernel, Linux-Renesas, Linux Kernel Mailing List,
	Alex Bennée, Ard Biesheuvel

Hi Catalin, Will, Dave,

On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> During userspace (Debian jessie NFS root) boot on arm64:
>
> rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
> esr 0x92000004, in dash[aaaaadf77000+1a000]
> CPU: 0 PID: 1083 Comm: rpcbind Not tainted
> 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
> Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)

This is a quad Cortex A57.

> pstate: 80000000 (Nzcv daif -PAN -UAO)
> pc : 0xaaaaadf8a51c
> lr : 0xaaaaadf8ac08
> sp : 0000ffffcffeac00
> x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
> x27: 0000ffffcffebf7c x26: 0000ffffcffead20
> x25: 0000aaaacea1c5f0 x24: 0000000000000000
> x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
> x21: 0000000000000000 x20: 0000000000000008
> x19: 0000000000000000 x18: 0000ffffcffeb500
> x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
> x15: 0000ffffa2363588 x14: ffffffffffffffff
> x13: 0000000000000020 x12: 0000000000000010
> x11: 0101010101010101 x10: 0000aaaaadfa1000
> x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
> x7 : 0000000000000000 x6 : 0000000000000000
> x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
> x3 : 0000aaaaadfa2338 x2 : 0000000000000000
> x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
>
> Sometimes it happens with other processes, but the main address, esr, and
> pstate values are always the same.
>
> I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
> releases, so the last time was two weeks ago), but never saw the issue
> before until today, so probably v4.15-rc1 is OK.
> Unfortunately it doesn't happen during every boot, which makes it
> cumbersome to bisect.
>
> My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
> and even without today's arm64/for-next/core merged in, I still managed to
> reproduce the issue, so I believe it was introduced in v4.15-rc2 or
> v4.15-rc3.
>
> Once, when the kernel message above wasn't shown, I got an error from
> userspace, which may be related:
> *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***

With more boots (10 instead of 6) to declare a kernel good, I bisected this
to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
state after signals").

Reverting that commit on top of v4.15-rc3 fixed the issue for me.

Gr{oetje,eeting}s,

                        Geert

^ permalink raw reply	[flat|nested] 36+ messages in thread
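
Why going from 6 to 10 boots per commit helps can be put in numbers. The sketch below assumes each boot of a bad kernel shows the fault independently with some probability p; the thread does not give a reproduction rate, so any value of p passed in is purely illustrative. Under that assumption a bad kernel survives n boots unnoticed with probability (1 - p)^n.

```python
import math


def false_good(p, boots):
    """Chance that a bad kernel passes `boots` boots in a row,
    assuming each boot reproduces the fault independently with
    probability p (an assumed, not measured, rate)."""
    return (1.0 - p) ** boots


def boots_needed(p, risk):
    """Smallest number of boots n such that (1 - p)**n <= risk."""
    return math.ceil(math.log(risk) / math.log(1.0 - p))


if __name__ == "__main__":
    # Illustrative rates only; compare the two thresholds from the thread.
    for p in (0.2, 0.3, 0.5):
        print(p, false_good(p, 6), false_good(p, 10))
```

Since a bisect between two release candidates takes roughly a dozen steps, even a modest per-step chance of a false "good" adds up, which is consistent with the first attempt ending on an unrelated merge.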

* Re: arm64: unhandled level 0 translation fault
  2017-12-14 14:34   ` Geert Uytterhoeven
@ 2017-12-14 15:16     ` Will Deacon
  -1 siblings, 0 replies; 36+ messages in thread
From: Will Deacon @ 2017-12-14 15:16 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Catalin Marinas, Dave Martin, linux-arm-kernel, Linux-Renesas,
	Linux Kernel Mailing List, Alex Bennée, Ard Biesheuvel

Hi Geert,

On Thu, Dec 14, 2017 at 03:34:50PM +0100, Geert Uytterhoeven wrote:
> On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > During userspace (Debian jessie NFS root) boot on arm64:
> >
> > rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
> > esr 0x92000004, in dash[aaaaadf77000+1a000]
> > CPU: 0 PID: 1083 Comm: rpcbind Not tainted
> > 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
> > Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
> 
> This is a quad Cortex A57.

It's so bizarre that nobody else is running into this!

> > pstate: 80000000 (Nzcv daif -PAN -UAO)
> > pc : 0xaaaaadf8a51c
> > lr : 0xaaaaadf8ac08
> > sp : 0000ffffcffeac00
> > x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
> > x27: 0000ffffcffebf7c x26: 0000ffffcffead20
> > x25: 0000aaaacea1c5f0 x24: 0000000000000000
> > x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
> > x21: 0000000000000000 x20: 0000000000000008
> > x19: 0000000000000000 x18: 0000ffffcffeb500
> > x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
> > x15: 0000ffffa2363588 x14: ffffffffffffffff
> > x13: 0000000000000020 x12: 0000000000000010
> > x11: 0101010101010101 x10: 0000aaaaadfa1000
> > x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
> > x7 : 0000000000000000 x6 : 0000000000000000
> > x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
> > x3 : 0000aaaaadfa2338 x2 : 0000000000000000
> > x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
> >
> > Sometimes it happens with other processes, but the main address, esr, and
> > pstate values are always the same.
> >
> > I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
> > releases, so the last time was two weeks ago), but never saw the issue
> > before until today, so probably v4.15-rc1 is OK.
> > Unfortunately it doesn't happen during every boot, which makes it
> > cumbersome to bisect.
> >
> > My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
> > and even without today's arm64/for-next/core merged in, I still managed to
> > reproduce the issue, so I believe it was introduced in v4.15-rc2 or
> > v4.15-rc3.
> >
> > Once, when the kernel message above wasn't shown, I got an error from
> > userspace, which may be related:
> > *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***
> 
> With more boots (10 instead of 6) to declare a kernel good, I bisected this
> to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
> state after signals").
> 
> Reverting that commit on top of v4.15-rc3 fixed the issue for me.

Thanks for persevering with the bisect. We'll get this fixed ASAP, but we'll
be relying on you to test the patch we come up with.

Cheers,

Will

^ permalink raw reply	[flat|nested] 36+ messages in thread
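
For readers following the register dump, the esr value in the report (0x92000004) can be decoded by hand. The sketch below uses the ESR_ELx field layout from the Arm architecture (EC in bits 31:26, IL in bit 25, ISS in bits 24:0; for data aborts, DFSC in ISS bits 5:0 and WnR in ISS bit 6). The function name and the small lookup tables are illustrative and cover only the values relevant here.

```python
# Only the encodings needed for this report; not a complete table.
EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort taken without a change in exception level",
}
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF          # bits 24:0
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class, bits 31:26
        "il": (esr >> 25) & 1,     # instruction length bit
        "dfsc": iss & 0x3F,        # data fault status code
        "wnr": (iss >> 6) & 1,     # 1 = write, 0 = read
    }


if __name__ == "__main__":
    fields = decode_esr(0x92000004)
    print(EC_NAMES.get(fields["ec"]), DFSC_NAMES.get(fields["dfsc"]), fields)
```

With the value from the report this yields EC 0x24, DFSC 0x04 and WnR 0, i.e. a read from userspace that found no mapping at the top translation level, which fits the reported fault address 0x00000008 (x20 in the dump holds the same value).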

* Re: arm64: unhandled level 0 translation fault
  2017-12-14 14:34   ` Geert Uytterhoeven
@ 2017-12-14 15:24     ` Dave P Martin
  -1 siblings, 0 replies; 36+ messages in thread
From: Dave P Martin @ 2017-12-14 15:24 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, Linux-Renesas,
	Linux Kernel Mailing List, Alex Bennée, Ard Biesheuvel

On Thu, Dec 14, 2017 at 02:34:50PM +0000, Geert Uytterhoeven wrote:
> Hi Catalin, Will, Dave,
>
> On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > During userspace (Debian jessie NFS root) boot on arm64:
> >
> > rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
> > esr 0x92000004, in dash[aaaaadf77000+1a000]
> > CPU: 0 PID: 1083 Comm: rpcbind Not tainted
> > 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
> > Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>
> This is a quad Cortex A57.
>
> > pstate: 80000000 (Nzcv daif -PAN -UAO)
> > pc : 0xaaaaadf8a51c
> > lr : 0xaaaaadf8ac08
> > sp : 0000ffffcffeac00
> > x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
> > x27: 0000ffffcffebf7c x26: 0000ffffcffead20
> > x25: 0000aaaacea1c5f0 x24: 0000000000000000
> > x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
> > x21: 0000000000000000 x20: 0000000000000008
> > x19: 0000000000000000 x18: 0000ffffcffeb500
> > x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
> > x15: 0000ffffa2363588 x14: ffffffffffffffff
> > x13: 0000000000000020 x12: 0000000000000010
> > x11: 0101010101010101 x10: 0000aaaaadfa1000
> > x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
> > x7 : 0000000000000000 x6 : 0000000000000000
> > x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
> > x3 : 0000aaaaadfa2338 x2 : 0000000000000000
> > x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
> >
> > Sometimes it happens with other processes, but the main address, esr, and
> > pstate values are always the same.
> >
> > I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
> > releases, so the last time was two weeks ago), but never saw the issue
> > before until today, so probably v4.15-rc1 is OK.
> > Unfortunately it doesn't happen during every boot, which makes it
> > cumbersome to bisect.
> >
> > My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
> > and even without today's arm64/for-next/core merged in, I still managed to
> > reproduce the issue, so I believe it was introduced in v4.15-rc2 or
> > v4.15-rc3.
> >
> > Once, when the kernel message above wasn't shown, I got an error from
> > userspace, which may be related:
> > *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***
>
> With more boots (10 instead of 6) to declare a kernel good, I bisected this
> to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
> state after signals").
>
> Reverting that commit on top of v4.15-rc3 fixed the issue for me.

Good work on the bisect -- I'll need to have a think about this...

That patch fixes a genuine problem so we can't just revert it.


What if you revert _just this function_ back to what it was in v4.14?

Cheers
---Dave

^ permalink raw reply	[flat|nested] 36+ messages in thread

* arm64: unhandled level 0 translation fault
@ 2017-12-14 15:24     ` Dave P Martin
  0 siblings, 0 replies; 36+ messages in thread
From: Dave P Martin @ 2017-12-14 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 14, 2017 at 02:34:50PM +0000, Geert Uytterhoeven wrote:
> Hi Catalin, Will, Dave,
>
> On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > During userspace (Debian jessie NFS root) boot on arm64:
> >
> > rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
> > esr 0x92000004, in dash[aaaaadf77000+1a000]
> > CPU: 0 PID: 1083 Comm: rpcbind Not tainted
> > 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
> > Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>
> This is a quad Cortex A57.
>
> > pstate: 80000000 (Nzcv daif -PAN -UAO)
> > pc : 0xaaaaadf8a51c
> > lr : 0xaaaaadf8ac08
> > sp : 0000ffffcffeac00
> > x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
> > x27: 0000ffffcffebf7c x26: 0000ffffcffead20
> > x25: 0000aaaacea1c5f0 x24: 0000000000000000
> > x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
> > x21: 0000000000000000 x20: 0000000000000008
> > x19: 0000000000000000 x18: 0000ffffcffeb500
> > x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
> > x15: 0000ffffa2363588 x14: ffffffffffffffff
> > x13: 0000000000000020 x12: 0000000000000010
> > x11: 0101010101010101 x10: 0000aaaaadfa1000
> > x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
> > x7 : 0000000000000000 x6 : 0000000000000000
> > x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
> > x3 : 0000aaaaadfa2338 x2 : 0000000000000000
> > x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
> >
> > Sometimes it happens with other processes, but the main address, esr, and
> > pstate values are always the same.
> >
> > I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
> > releases, so the last time was two weeks ago), but never saw the issue
> > before until today, so probably v4.15-rc1 is OK.
> > Unfortunately it doesn't happen during every boot, which makes it
> > cumbersome to bisect.
> >
> > My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
> > and even without today's arm64/for-next/core merged in, I still managed to
> > reproduce the issue, so I believe it was introduced in v4.15-rc2 or
> > v4.15-rc3.
> >
> > Once, when the kernel message above wasn't shown, I got an error from
> > userspace, which may be related:
> > *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***
>
> With more boots (10 instead of 6) to declare a kernel good, I bisected this
> to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
> state after signals").
>
> Reverting that commit on top of v4.15-rc3 fixed the issue for me.
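
For reference, the esr value quoted in the report can be decoded by hand.
The sketch below extracts the fields using the ESR_ELx bit positions from
the Arm architecture manual; the helper names are made up for illustration
and are not kernel functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Field extraction for a data-abort ESR_ELx value (helper names are
 * hypothetical, bit positions follow the architecture manual). */
static inline unsigned esr_ec(uint32_t esr)   { return (esr >> 26) & 0x3f; } /* exception class */
static inline unsigned esr_il(uint32_t esr)   { return (esr >> 25) & 0x1;  } /* 32-bit instruction */
static inline unsigned esr_wnr(uint32_t esr)  { return (esr >> 6)  & 0x1;  } /* write, not read */
static inline unsigned esr_dfsc(uint32_t esr) { return esr & 0x3f;         } /* data fault status code */

/* Print a one-line decode of the syndrome value. */
static inline void esr_print(uint32_t esr)
{
	printf("EC=0x%02x IL=%u WnR=%u DFSC=0x%02x\n",
	       esr_ec(esr), esr_il(esr), esr_wnr(esr), esr_dfsc(esr));
}
```

Calling esr_print(0x92000004) gives EC=0x24 (data abort from a lower
exception level), WnR=0 (a read) and DFSC=0x04 (level 0 translation
fault), which matches the "level 0 translation fault" wording in the log.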

Good work on the bisect -- I'll need to have a think about this...

That patch fixes a genuine problem so we can't just revert it.


What if you revert _just this function_ back to what it was in v4.14?

Cheers
---Dave

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-14 15:24     ` Dave P Martin
@ 2017-12-14 18:08       ` Geert Uytterhoeven
  -1 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-14 18:08 UTC (permalink / raw)
  To: Dave P Martin
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, Linux-Renesas,
	Linux Kernel Mailing List, Alex Bennée, Ard Biesheuvel

Hi Dave,

On Thu, Dec 14, 2017 at 4:24 PM, Dave P Martin <Dave.Martin@arm.com> wrote:
> On Thu, Dec 14, 2017 at 02:34:50PM +0000, Geert Uytterhoeven wrote:
>> On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven
>> <geert@linux-m68k.org> wrote:
>> > During userspace (Debian jessie NFS root) boot on arm64:
>> >
>> > rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
>> > esr 0x92000004, in dash[aaaaadf77000+1a000]
>> > CPU: 0 PID: 1083 Comm: rpcbind Not tainted
>> > 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
>> > Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>>
>> This is a quad Cortex A57.
>>
>> > pstate: 80000000 (Nzcv daif -PAN -UAO)
>> > pc : 0xaaaaadf8a51c
>> > lr : 0xaaaaadf8ac08
>> > sp : 0000ffffcffeac00
>> > x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
>> > x27: 0000ffffcffebf7c x26: 0000ffffcffead20
>> > x25: 0000aaaacea1c5f0 x24: 0000000000000000
>> > x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
>> > x21: 0000000000000000 x20: 0000000000000008
>> > x19: 0000000000000000 x18: 0000ffffcffeb500
>> > x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
>> > x15: 0000ffffa2363588 x14: ffffffffffffffff
>> > x13: 0000000000000020 x12: 0000000000000010
>> > x11: 0101010101010101 x10: 0000aaaaadfa1000
>> > x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
>> > x7 : 0000000000000000 x6 : 0000000000000000
>> > x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
>> > x3 : 0000aaaaadfa2338 x2 : 0000000000000000
>> > x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
>> >
>> > Sometimes it happens with other processes, but the main address, esr, and
>> > pstate values are always the same.
>> >
>> > I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
>> > releases, so the last time was two weeks ago), but never saw the issue
>> > before until today, so probably v4.15-rc1 is OK.
>> > Unfortunately it doesn't happen during every boot, which makes it
>> > cumbersome to bisect.
>> >
>> > My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
>> > and even without today's arm64/for-next/core merged in, I still managed to
>> > reproduce the issue, so I believe it was introduced in v4.15-rc2 or
>> > v4.15-rc3.
>> >
>> > Once, when the kernel message above wasn't shown, I got an error from
>> > userspace, which may be related:
>> > *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***
>>
>> With more boots (10 instead of 6) to declare a kernel good, I bisected this
>> to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
>> state after signals").
>>
>> Reverting that commit on top of v4.15-rc3 fixed the issue for me.
>
> Good work on the bisect -- I'll need to have a think about this...
>
> That patch fixes a genuine problem so we can't just revert it.
>
> What if you revert _just this function_ back to what it was in v4.14?

With fpsimd_update_current_state() reverted to v4.14, and

-               __this_cpu_write(fpsimd_last_state, st);
+               __this_cpu_write(fpsimd_last_state.st, st);

to make it build, the problem seems to be fixed, too.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-14 18:08       ` Geert Uytterhoeven
@ 2017-12-15 11:23         ` Dave Martin
  -1 siblings, 0 replies; 36+ messages in thread
From: Dave Martin @ 2017-12-15 11:23 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Linux Kernel Mailing List, Linux-Renesas, Alex Bennée,
	linux-arm-kernel

On Thu, Dec 14, 2017 at 07:08:27PM +0100, Geert Uytterhoeven wrote:
> Hi Dave,
> 
> On Thu, Dec 14, 2017 at 4:24 PM, Dave P Martin <Dave.Martin@arm.com> wrote:
> > On Thu, Dec 14, 2017 at 02:34:50PM +0000, Geert Uytterhoeven wrote:
> >> On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven
> >> <geert@linux-m68k.org> wrote:
> >> > During userspace (Debian jessie NFS root) boot on arm64:
> >> >
> >> > rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
> >> > esr 0x92000004, in dash[aaaaadf77000+1a000]
> >> > CPU: 0 PID: 1083 Comm: rpcbind Not tainted
> >> > 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
> >> > Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
> >>
> >> This is a quad Cortex A57.
> >>
> >> > pstate: 80000000 (Nzcv daif -PAN -UAO)
> >> > pc : 0xaaaaadf8a51c
> >> > lr : 0xaaaaadf8ac08
> >> > sp : 0000ffffcffeac00
> >> > x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
> >> > x27: 0000ffffcffebf7c x26: 0000ffffcffead20
> >> > x25: 0000aaaacea1c5f0 x24: 0000000000000000
> >> > x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
> >> > x21: 0000000000000000 x20: 0000000000000008
> >> > x19: 0000000000000000 x18: 0000ffffcffeb500
> >> > x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
> >> > x15: 0000ffffa2363588 x14: ffffffffffffffff
> >> > x13: 0000000000000020 x12: 0000000000000010
> >> > x11: 0101010101010101 x10: 0000aaaaadfa1000
> >> > x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
> >> > x7 : 0000000000000000 x6 : 0000000000000000
> >> > x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
> >> > x3 : 0000aaaaadfa2338 x2 : 0000000000000000
> >> > x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
> >> >
> >> > Sometimes it happens with other processes, but the main address, esr, and
> >> > pstate values are always the same.
> >> >
> >> > I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
> >> > releases, so the last time was two weeks ago), but never saw the issue
> >> > before until today, so probably v4.15-rc1 is OK.
> >> > Unfortunately it doesn't happen during every boot, which makes it
> >> > cumbersome to bisect.
> >> >
> >> > My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
> >> > and even without today's arm64/for-next/core merged in, I still managed to
> >> > reproduce the issue, so I believe it was introduced in v4.15-rc2 or
> >> > v4.15-rc3.
> >> >
> >> > Once, when the kernel message above wasn't shown, I got an error from
> >> > userspace, which may be related:
> >> > *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***
> >>
> >> With more boots (10 instead of 6) to declare a kernel good, I bisected this
> >> to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
> >> state after signals").
> >>
> >> Reverting that commit on top of v4.15-rc3 fixed the issue for me.
> >
> > Good work on the bisect -- I'll need to have a think about this...
> >
> > That patch fixes a genuine problem so we can't just revert it.
> >
> > What if you revert _just this function_ back to what it was in v4.14?
> 
> With fpsimd_update_current_state() reverted to v4.14, and
> 
> -               __this_cpu_write(fpsimd_last_state, st);
> +               __this_cpu_write(fpsimd_last_state.st, st);
> 
> to make it build, the problem seems to be fixed, too.
> 
> Thanks!
> 
> Gr{oetje,eeting}s,
> 
>                         Geert

Interesting.  If I apply that to v4.14 and then flatten the new code for
CONFIG_ARM64_SVE=n, I get:

Working:

void fpsimd_update_current_state(struct fpsimd_state *state)
{
	local_bh_disable();

	fpsimd_load_state(state);
	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
		struct fpsimd_state *st = &current->thread.fpsimd_state;

		__this_cpu_write(fpsimd_last_state.st, st);
		st->cpu = smp_processor_id();
	}

	local_bh_enable();
}

Broken:

void fpsimd_update_current_state(struct fpsimd_state *state)
{
	struct fpsimd_last_state_struct *last;
	struct fpsimd_state *st;

	local_bh_disable();

	current->thread.fpsimd_state = *state;
	fpsimd_load_state(&current->thread.fpsimd_state);

	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
		last = this_cpu_ptr(&fpsimd_last_state);
		st = &current->thread.fpsimd_state;

		last->st = st;
		last->sve_in_use = test_thread_flag(TIF_SVE);
		st->cpu = smp_processor_id();
	}

	local_bh_enable();
}

Can you try my flattened "broken" version by itself and see if that does
reproduce the bug?  If not, my flattening may be making bad assumptions...


Assuming the "broken" version reproduces the bug, I can't yet see exactly
where the breakage comes from.

The two important differences here seem to be:

1) Staging the state via current->thread.fpsimd_state instead of loading
directly:

-	fpsimd_load_state(state);
+	current->thread.fpsimd_state = *state;
+	fpsimd_load_state(&current->thread.fpsimd_state);

and

2) Using this_cpu_ptr() + assignment instead of __this_cpu_write() when
reassociating the task's fpsimd context with the cpu:

 {
+	struct fpsimd_last_state_struct *last;
+	struct fpsimd_state *st;

[...]

 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
-		struct fpsimd_state *st = &current->thread.fpsimd_state;
-
-		__this_cpu_write(fpsimd_last_state.st, st);
-		st->cpu = smp_processor_id();
+		last = this_cpu_ptr(&fpsimd_last_state);
+		st = &current->thread.fpsimd_state;
+
+		last->st = st;
+		last->sve_in_use = test_thread_flag(TIF_SVE);
+		st->cpu = smp_processor_id();
 	}


I can't see why either of these breaks anything yet.

Can you try them independently and see whether you can isolate the
breakage to one of them?

Cheers
---Dave

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-15 11:23         ` Dave Martin
@ 2017-12-15 13:30           ` Geert Uytterhoeven
  -1 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-15 13:30 UTC (permalink / raw)
  To: Dave Martin
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Linux Kernel Mailing List, Linux-Renesas, Alex Bennée,
	linux-arm-kernel

Hi Dave,

On Fri, Dec 15, 2017 at 12:23 PM, Dave Martin <Dave.Martin@arm.com> wrote:
> On Thu, Dec 14, 2017 at 07:08:27PM +0100, Geert Uytterhoeven wrote:
>> On Thu, Dec 14, 2017 at 4:24 PM, Dave P Martin <Dave.Martin@arm.com> wrote:
>> > On Thu, Dec 14, 2017 at 02:34:50PM +0000, Geert Uytterhoeven wrote:
>> >> On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven
>> >> <geert@linux-m68k.org> wrote:
>> >> > During userspace (Debian jessie NFS root) boot on arm64:
>> >> >
>> >> > rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
>> >> > esr 0x92000004, in dash[aaaaadf77000+1a000]
>> >> > CPU: 0 PID: 1083 Comm: rpcbind Not tainted
>> >> > 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
>> >> > Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>> >>
>> >> This is a quad Cortex A57.
>> >>
>> >> > pstate: 80000000 (Nzcv daif -PAN -UAO)
>> >> > pc : 0xaaaaadf8a51c
>> >> > lr : 0xaaaaadf8ac08
>> >> > sp : 0000ffffcffeac00
>> >> > x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
>> >> > x27: 0000ffffcffebf7c x26: 0000ffffcffead20
>> >> > x25: 0000aaaacea1c5f0 x24: 0000000000000000
>> >> > x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
>> >> > x21: 0000000000000000 x20: 0000000000000008
>> >> > x19: 0000000000000000 x18: 0000ffffcffeb500
>> >> > x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
>> >> > x15: 0000ffffa2363588 x14: ffffffffffffffff
>> >> > x13: 0000000000000020 x12: 0000000000000010
>> >> > x11: 0101010101010101 x10: 0000aaaaadfa1000
>> >> > x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
>> >> > x7 : 0000000000000000 x6 : 0000000000000000
>> >> > x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
>> >> > x3 : 0000aaaaadfa2338 x2 : 0000000000000000
>> >> > x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
>> >> >
>> >> > Sometimes it happens with other processes, but the main address, esr, and
>> >> > pstate values are always the same.
>> >> >
>> >> > I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
>> >> > releases, so the last time was two weeks ago), but never saw the issue
>> >> > before until today, so probably v4.15-rc1 is OK.
>> >> > Unfortunately it doesn't happen during every boot, which makes it
>> >> > cumbersome to bisect.
>> >> >
>> >> > My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
>> >> > and even without today's arm64/for-next/core merged in, I still managed to
>> >> > reproduce the issue, so I believe it was introduced in v4.15-rc2 or
>> >> > v4.15-rc3.
>> >> >
>> >> > Once, when the kernel message above wasn't shown, I got an error from
>> >> > userspace, which may be related:
>> >> > *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***
>> >>
>> >> With more boots (10 instead of 6) to declare a kernel good, I bisected this
>> >> to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
>> >> state after signals").
>> >>
>> >> Reverting that commit on top of v4.15-rc3 fixed the issue for me.
>> >
>> > Good work on the bisect -- I'll need to have a think about this...
>> >
>> > That patch fixes a genuine problem so we can't just revert it.
>> >
>> > What if you revert _just this function_ back to what it was in v4.14?
>>
>> With fpsimd_update_current_state() reverted to v4.14, and
>>
>> -               __this_cpu_write(fpsimd_last_state, st);
>> +               __this_cpu_write(fpsimd_last_state.st, st);
>>
>> to make it build, the problem seems to be fixed, too.

> Interesting.  If I apply that to v4.14 and then flatten the new code for
> CONFIG_ARM64_SVE=n, I get:
>
> Working:
>
> void fpsimd_update_current_state(struct fpsimd_state *state)
> {
>         local_bh_disable();
>
>         fpsimd_load_state(state);
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                 struct fpsimd_state *st = &current->thread.fpsimd_state;
>
>                 __this_cpu_write(fpsimd_last_state.st, st);
>                 st->cpu = smp_processor_id();
>         }
>
>         local_bh_enable();
> }
>
> Broken:
>
> void fpsimd_update_current_state(struct fpsimd_state *state)
> {
>         struct fpsimd_last_state_struct *last;
>         struct fpsimd_state *st;
>
>         local_bh_disable();
>
>         current->thread.fpsimd_state = *state;
>         fpsimd_load_state(&current->thread.fpsimd_state);
>
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                 last = this_cpu_ptr(&fpsimd_last_state);
>                 st = &current->thread.fpsimd_state;
>
>                 last->st = st;
>                 last->sve_in_use = test_thread_flag(TIF_SVE);
>                 st->cpu = smp_processor_id();
>         }
>
>         local_bh_enable();
> }
>
> Can you try my flattened "broken" version by itself and see if that does
> reproduce the bug?  If not, my flattening may be making bad assumptions...
>
> Assuming the "broken" version reproduces the bug, I can't yet see exactly
> where the breakage comes from.

Correct: the "Working" version above works, and the "Broken" version is
broken.

> The two important differences here seem to be
>
> 1) Staging the state via current->thread.fpsimd_state instead of loading
> directly:
>
> -       fpsimd_load_state(state);
> +       current->thread.fpsimd_state = *state;
> +       fpsimd_load_state(&current->thread.fpsimd_state);

The change above introduces the breakage.

> 2) Using this_cpu_ptr() + assignment instead of __this_cpu_write() when
> reassociating the task's fpsimd context with the cpu:
>
>  {
> +       struct fpsimd_last_state_struct *last;
> +       struct fpsimd_state *st;
>
> [...]
>
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
> -               struct fpsimd_state *st = &current->thread.fpsimd_state;
> -
> -               __this_cpu_write(fpsimd_last_state.st, st);
> -               st->cpu = smp_processor_id();
> +               last = this_cpu_ptr(&fpsimd_last_state);
> +               st = &current->thread.fpsimd_state;
> +
> +               last->st = st;
> +               last->sve_in_use = test_thread_flag(TIF_SVE);
> +               st->cpu = smp_processor_id();
>         }

The change above is fine.
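
As a side note on change 1): a whole-struct assignment overwrites every
member of the destination, including any bookkeeping member such as the
cpu field that the code above sets via st->cpu.  The userspace sketch
below only illustrates that C behaviour.  The struct is a shortened,
hypothetical stand-in for struct fpsimd_state, the function names are
invented, and whether this effect is what triggers the fault is not
established at this point in the thread.

```c
#include <string.h>

/* Hypothetical stand-in for struct fpsimd_state: register contents plus
 * a bookkeeping field naming the CPU that last held them. */
struct mock_fpsimd_state {
	unsigned long long vregs[4];	/* shortened, illustrative only */
	unsigned int fpsr;
	unsigned int fpcr;
	unsigned int cpu;		/* bookkeeping, not register state */
};

/* Same shape as change 1): whole-struct assignment copies every member. */
static inline void stage_whole(struct mock_fpsimd_state *dst,
			       const struct mock_fpsimd_state *src)
{
	*dst = *src;
}

/* For comparison: copy the register contents only, leave cpu alone. */
static inline void stage_regs_only(struct mock_fpsimd_state *dst,
				   const struct mock_fpsimd_state *src)
{
	memcpy(dst->vregs, src->vregs, sizeof(dst->vregs));
	dst->fpsr = src->fpsr;
	dst->fpcr = src->fpcr;
}
```

With a destination whose cpu is 2 and a source whose cpu is 77,
stage_whole() leaves the destination's cpu at 77, while
stage_regs_only() leaves it at 2.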

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread

* arm64: unhandled level 0 translation fault
@ 2017-12-15 13:30           ` Geert Uytterhoeven
  0 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-15 13:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dave,

On Fri, Dec 15, 2017 at 12:23 PM, Dave Martin <Dave.Martin@arm.com> wrote:
> On Thu, Dec 14, 2017 at 07:08:27PM +0100, Geert Uytterhoeven wrote:
>> On Thu, Dec 14, 2017 at 4:24 PM, Dave P Martin <Dave.Martin@arm.com> wrote:
>> > On Thu, Dec 14, 2017 at 02:34:50PM +0000, Geert Uytterhoeven wrote:
>> >> On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven
>> >> <geert@linux-m68k.org> wrote:
>> >> > During userspace (Debian jessie NFS root) boot on arm64:
>> >> >
>> >> > rpcbind[1083]: unhandled level 0 translation fault (11) at 0x00000008,
>> >> > esr 0x92000004, in dash[aaaaadf77000+1a000]
>> >> > CPU: 0 PID: 1083 Comm: rpcbind Not tainted
>> >> > 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51
>> >> > Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>> >>
>> >> This is a quad Cortex A57.
>> >>
>> >> > pstate: 80000000 (Nzcv daif -PAN -UAO)
>> >> > pc : 0xaaaaadf8a51c
>> >> > lr : 0xaaaaadf8ac08
>> >> > sp : 0000ffffcffeac00
>> >> > x29: 0000ffffcffeac00 x28: 0000aaaaadfa1000
>> >> > x27: 0000ffffcffebf7c x26: 0000ffffcffead20
>> >> > x25: 0000aaaacea1c5f0 x24: 0000000000000000
>> >> > x23: 0000aaaaadfa1000 x22: 0000aaaaadfa1000
>> >> > x21: 0000000000000000 x20: 0000000000000008
>> >> > x19: 0000000000000000 x18: 0000ffffcffeb500
>> >> > x17: 0000ffffa22babfc x16: 0000aaaaadfa1ae8
>> >> > x15: 0000ffffa2363588 x14: ffffffffffffffff
>> >> > x13: 0000000000000020 x12: 0000000000000010
>> >> > x11: 0101010101010101 x10: 0000aaaaadfa1000
>> >> > x9 : 00000000ffffff81 x8 : 0000aaaaadfa2000
>> >> > x7 : 0000000000000000 x6 : 0000000000000000
>> >> > x5 : 0000aaaaadfa2338 x4 : 0000aaaaadfa2000
>> >> > x3 : 0000aaaaadfa2338 x2 : 0000000000000000
>> >> > x1 : 0000aaaaadfa28b0 x0 : 0000aaaaadfa4c30
>> >> >
>> >> > Sometimes it happens with other processes, but the main address, esr, and
>> >> > pstate values are always the same.
>> >> >
>> >> > I regularly run arm64/for-next/core (through bi-weekly renesas-drivers
>> >> > releases, so the last time was two weeks ago), but never saw the issue
>> >> > before until today, so probably v4.15-rc1 is OK.
>> >> > Unfortunately it doesn't happen during every boot, which makes it
>> >> > cumbersome to bisect.
>> >> >
>> >> > My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that,
>> >> > and even without today's arm64/for-next/core merged in, I still managed to
>> >> > reproduce the issue, so I believe it was introduced in v4.15-rc2 or
>> >> > v4.15-rc3.
>> >> >
>> >> > Once, when the kernel message above wasn't shown, I got an error from
>> >> > userspace, which may be related:
>> >> > *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaadd970988 ***
>> >>
>> >> With more boots (10 instead of 6) to declare a kernel good, I bisected this
>> >> to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
>> >> state after signals").
>> >>
>> >> Reverting that commit on top of v4.15-rc3 fixed the issue for me.
>> >
>> > Good work on the bisect -- I'll need to have a think about this...
>> >
>> > That patch fixes a genuine problem so we can't just revert it.
>> >
>> > What if you revert _just this function_ back to what it was in v4.14?
>>
>> With fpsimd_update_current_state() reverted to v4.14, and
>>
>> -               __this_cpu_write(fpsimd_last_state, st);
>> +               __this_cpu_write(fpsimd_last_state.st, st);
>>
>> to make it build, the problem seems to be fixed, too.

> Interesting if I apply that to v4.14 and then flatten the new code for CONFIG_ARM64_SVE=n, I get:
>
> Working:
>
> void fpsimd_update_current_state(struct fpsimd_state *state)
> {
>         local_bh_disable();
>
>         fpsimd_load_state(state);
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                 struct fpsimd_state *st = &current->thread.fpsimd_state;
>
>                 __this_cpu_write(fpsimd_last_state.st, st);
>                 st->cpu = smp_processor_id();
>         }
>
>         local_bh_enable();
> }
>
> Broken:
>
> void fpsimd_update_current_state(struct fpsimd_state *state)
> {
>         struct fpsimd_last_state_struct *last;
>         struct fpsimd_state *st;
>
>         local_bh_disable();
>
>         current->thread.fpsimd_state = *state;
>         fpsimd_load_state(&current->thread.fpsimd_state);
>
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                 last = this_cpu_ptr(&fpsimd_last_state);
>                 st = &current->thread.fpsimd_state;
>
>                 last->st = st;
>                 last->sve_in_use = test_thread_flag(TIF_SVE);
>                 st->cpu = smp_processor_id();
>         }
>
>         local_bh_enable();
> }
>
> Can you try my flattened "broken" version by itself and see if that does
> reproduce the bug?  If not, my flattening may be making bad assumptions...
>
> Assuming the "broken" version reproduces the bug, I can't yet see exactly
> where the breakage comes from.

Correct, above "Working" is working, and "Broken" is broken.

> The two important differences here seem to be
>
> 1) Staging the state via current->thread.fpsimd_state instead of loading
> directly:
>
> -       fpsimd_load_state(state);
> +       current->thread.fpsimd_state = *state;
> +       fpsimd_load_state(&current->thread.fpsimd_state);

The change above introduces the breakage.

> 2) Using this_cpu_ptr() + assignment instead of __this_cpu_write() when
> reassociating the task's fpsimd context with the cpu:
>
>  {
> +       struct fpsimd_last_state_struct *last;
> +       struct fpsimd_state *st;
>
> [...]
>
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
> -               struct fpsimd_state *st = &current->thread.fpsimd_state;
> -
> -               __this_cpu_write(fpsimd_last_state.st, st);
> -               st->cpu = smp_processor_id();
> +               last = this_cpu_ptr(&fpsimd_last_state);
> +               st = &current->thread.fpsimd_state;
> +
> +               last->st = st;
> +               last->sve_in_use = test_thread_flag(TIF_SVE);
> +               st->cpu = smp_processor_id();
>         }

The change above is fine.
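
For readers following along, the "reassociating" step in the hunk above can be sketched in plain C. The hunk records the association in both directions (the per-CPU `last->st` pointer and the per-context `st->cpu` field). The validity check shown here is an assumption about how those two records are consumed, not something stated in this thread, and every name in the block is a hypothetical stand-in rather than a kernel symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical stand-in: only the bookkeeping field matters here. */
struct fp_ctx_sketch {
	unsigned int cpu;	/* CPU this context was last loaded on */
};

/* Per-CPU record of the context most recently loaded into the registers. */
static struct fp_ctx_sketch *last_loaded[SKETCH_NR_CPUS];

/* Bind a context to a CPU in both directions, as the quoted hunk does. */
static void bind_ctx_to_cpu(struct fp_ctx_sketch *ctx, unsigned int cpu)
{
	assert(ctx != NULL && cpu < SKETCH_NR_CPUS);
	last_loaded[cpu] = ctx;
	ctx->cpu = cpu;
}

/* Assumed check: the registers are trusted only if both records agree. */
static bool regs_still_valid(const struct fp_ctx_sketch *ctx, unsigned int cpu)
{
	return cpu < SKETCH_NR_CPUS &&
	       last_loaded[cpu] == ctx && ctx->cpu == cpu;
}
```

Under that assumption, overwriting either record (the per-CPU pointer or the context's `cpu` field) is enough to change the answer of the check, which is why both are updated together.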

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-15 13:30           ` Geert Uytterhoeven
@ 2017-12-15 14:27             ` Will Deacon
  -1 siblings, 0 replies; 36+ messages in thread
From: Will Deacon @ 2017-12-15 14:27 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Dave Martin, Ard Biesheuvel, Catalin Marinas,
	Linux Kernel Mailing List, Linux-Renesas, Alex Bennée,
	linux-arm-kernel

On Fri, Dec 15, 2017 at 02:30:00PM +0100, Geert Uytterhoeven wrote:
> On Fri, Dec 15, 2017 at 12:23 PM, Dave Martin <Dave.Martin@arm.com> wrote:
> > The two important differences here seem to be
> >
> > 1) Staging the state via current->thread.fpsimd_state instead of loading
> > directly:
> >
> > -       fpsimd_load_state(state);
> > +       current->thread.fpsimd_state = *state;
> > +       fpsimd_load_state(&current->thread.fpsimd_state);
> 
> The change above introduces the breakage.

I finally managed to reproduce this, but only by using the exact same
compiler as Geert:

https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.9.0/x86_64-gcc-4.9.0-nolibc_aarch64-linux.tar.xz

I then reliably see the problem if I run:

  # /usr/bin/update-ca-certificates

from Debian Jessie.

Note that my normal toolchain (Linaro 7.1.1 build) works fine, and things
also work if I use the toolchain above but disable CONFIG_ARM64_CRYPTO.

So there's some toolchain-specific interaction between this change and the
crypto code...

Will

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-15 14:27             ` Will Deacon
@ 2017-12-15 15:56               ` Geert Uytterhoeven
  -1 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-15 15:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dave Martin, Ard Biesheuvel, Catalin Marinas,
	Linux Kernel Mailing List, Linux-Renesas, Alex Bennée,
	linux-arm-kernel

On Fri, Dec 15, 2017 at 3:27 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Dec 15, 2017 at 02:30:00PM +0100, Geert Uytterhoeven wrote:
>> On Fri, Dec 15, 2017 at 12:23 PM, Dave Martin <Dave.Martin@arm.com> wrote:
>> > The two important differences here seem to be
>> >
>> > 1) Staging the state via current->thread.fpsimd_state instead of loading
>> > directly:
>> >
>> > -       fpsimd_load_state(state);
>> > +       current->thread.fpsimd_state = *state;
>> > +       fpsimd_load_state(&current->thread.fpsimd_state);
>>
>> The change above introduces the breakage.
>
> I finally managed to reproduce this, but only by using the exact same
> compiler as Geert:
>
> https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.9.0/x86_64-gcc-4.9.0-nolibc_aarch64-linux.tar.xz
>
> I then reliably see the problem if I run:
>
>   # /usr/bin/update-ca-certificates
>
> from Debian Jessie.
>
> Note that my normal toolchain (Linaro 7.1.1 build) works fine and also
> if I use the toolchain above but disable CONFIG_ARM64_CRYPTO then things
> work too.
>
> So there's some toolchain-specific interaction between this change and the
> crypto code...
>
> Will



-- 
Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-15 14:27             ` Will Deacon
@ 2017-12-15 15:59               ` Geert Uytterhoeven
  -1 siblings, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2017-12-15 15:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: Dave Martin, Ard Biesheuvel, Catalin Marinas,
	Linux Kernel Mailing List, Linux-Renesas, Alex Bennée,
	linux-arm-kernel

Hi Will,

On Fri, Dec 15, 2017 at 3:27 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Dec 15, 2017 at 02:30:00PM +0100, Geert Uytterhoeven wrote:
>> On Fri, Dec 15, 2017 at 12:23 PM, Dave Martin <Dave.Martin@arm.com> wrote:
>> > The two important differences here seem to be
>> >
>> > 1) Staging the state via current->thread.fpsimd_state instead of loading
>> > directly:
>> >
>> > -       fpsimd_load_state(state);
>> > +       current->thread.fpsimd_state = *state;
>> > +       fpsimd_load_state(&current->thread.fpsimd_state);
>>
>> The change above introduces the breakage.
>
> I finally managed to reproduce this, but only by using the exact same
> compiler as Geert:
>
> https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.9.0/x86_64-gcc-4.9.0-nolibc_aarch64-linux.tar.xz
>
> I then reliably see the problem if I run:
>
>   # /usr/bin/update-ca-certificates

/usr/sbin/... ?

> from Debian Jessie.

Funny, I've just got both

    *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaac17d4988 ***

and

    mountall.sh[2172]: unhandled level 0 translation fault (11) at 0x0000004d, esr 0x92000004, in dash[aaaace7e5000+1a000]

during boot up, but I can't get update-ca-certificates to fail...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-15 15:59               ` Geert Uytterhoeven
@ 2017-12-15 16:06                 ` Will Deacon
  -1 siblings, 0 replies; 36+ messages in thread
From: Will Deacon @ 2017-12-15 16:06 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Dave Martin, Ard Biesheuvel, Catalin Marinas,
	Linux Kernel Mailing List, Linux-Renesas, Alex Bennée,
	linux-arm-kernel

On Fri, Dec 15, 2017 at 04:59:28PM +0100, Geert Uytterhoeven wrote:
> On Fri, Dec 15, 2017 at 3:27 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Dec 15, 2017 at 02:30:00PM +0100, Geert Uytterhoeven wrote:
> >> On Fri, Dec 15, 2017 at 12:23 PM, Dave Martin <Dave.Martin@arm.com> wrote:
> >> > The two important differences here seem to be
> >> >
> >> > 1) Staging the state via current->thread.fpsimd_state instead of loading
> >> > directly:
> >> >
> >> > -       fpsimd_load_state(state);
> >> > +       current->thread.fpsimd_state = *state;
> >> > +       fpsimd_load_state(&current->thread.fpsimd_state);
> >>
> >> The change above introduces the breakage.
> >
> > I finally managed to reproduce this, but only by using the exact same
> > compiler as Geert:
> >
> > https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.9.0/x86_64-gcc-4.9.0-nolibc_aarch64-linux.tar.xz
> >
> > I then reliably see the problem if I run:
> >
> >   # /usr/bin/update-ca-certificates
> 
> /usr/sbin/... ?
> 
> > from Debian Jessie.
> 
> Funny, I've just got both
> 
>     *** Error in `/bin/sh': free(): invalid pointer: 0x0000aaaac17d4988 ***
> 
> and
> 
>     mountall.sh[2172]: unhandled level 0 translation fault (11) at 0x0000004d, esr 0x92000004, in dash[aaaace7e5000+1a000]
> 
> during boot up, but I can't get update-ca-certificates to fail...

Can you try the diff below, please?

Will

--->8

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 540a1e010eb5..fae81f7964b4 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1043,7 +1043,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
 
        local_bh_disable();
 
-       current->thread.fpsimd_state = *state;
+       current->thread.fpsimd_state.user_fpsimd = state->user_fpsimd;
        if (system_supports_sve() && test_thread_flag(TIF_SVE))
                fpsimd_to_sve(current);
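
A userspace sketch of what the one-line change above does differently, for readers following along. It assumes that `struct fpsimd_state` holds bookkeeping (the `cpu` field set in the code quoted earlier in the thread) alongside the user-visible registers in `user_fpsimd`, and that the caller's copy of that bookkeeping need not be meaningful. The thread does not spell out the reasoning behind the diff, so treat this as one possible reading. All type and function names below are hypothetical, simplified stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct user_fpsimd_sketch {
	unsigned long long vregs[4];	/* stand-in for the vector registers */
	unsigned int fpsr;
	unsigned int fpcr;
};

struct fpsimd_state_sketch {
	struct user_fpsimd_sketch user_fpsimd;	/* user-visible register contents */
	unsigned int cpu;			/* bookkeeping: CPU the state was last loaded on */
};

/* Whole-struct assignment (the removed line): dst->cpu is overwritten too. */
static void copy_whole_state(struct fpsimd_state_sketch *dst,
			     const struct fpsimd_state_sketch *src)
{
	assert(dst != NULL && src != NULL);
	*dst = *src;
}

/* Member-only copy (the added line): dst->cpu is left untouched. */
static void copy_user_regs_only(struct fpsimd_state_sketch *dst,
				const struct fpsimd_state_sketch *src)
{
	assert(dst != NULL && src != NULL);
	dst->user_fpsimd = src->user_fpsimd;
}
```

Given a destination whose `cpu` is 2 and a source whose `cpu` holds some unrelated value, `copy_user_regs_only()` updates the registers and leaves `cpu` at 2, whereas `copy_whole_state()` replaces `cpu` with whatever the source happened to contain.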

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: arm64: unhandled level 0 translation fault
  2017-12-15 13:30           ` Geert Uytterhoeven
@ 2017-12-15 17:11             ` Dave Martin
  -1 siblings, 0 replies; 36+ messages in thread
From: Dave Martin @ 2017-12-15 17:11 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Linux Kernel Mailing List, Linux-Renesas, Alex Bennée,
	linux-arm-kernel

On Fri, Dec 15, 2017 at 02:30:00PM +0100, Geert Uytterhoeven wrote:
> Hi Dave,
> 
> On Fri, Dec 15, 2017 at 12:23 PM, Dave Martin <Dave.Martin@arm.com> wrote:
> > On Thu, Dec 14, 2017 at 07:08:27PM +0100, Geert Uytterhoeven wrote:
> >> On Thu, Dec 14, 2017 at 4:24 PM, Dave P Martin <Dave.Martin@arm.com> wrote:

[...]

> >> > Good work on the bisect -- I'll need to have a think about this...
> >> >
> >> > That patch fixes a genuine problem so we can't just revert it.
> >> >
> >> > What if you revert _just this function_ back to what it was in v4.14?
> >>
> >> With fpsimd_update_current_state() reverted to v4.14, and
> >>
> >> -               __this_cpu_write(fpsimd_last_state, st);
> >> +               __this_cpu_write(fpsimd_last_state.st, st);
> >>
> >> to make it build, the problem seems to be fixed, too.
> 
> > Interesting. If I apply that to v4.14 and then flatten the new code for
> > CONFIG_ARM64_SVE=n, I get:
> >
> > Working:
> >
> > void fpsimd_update_current_state(struct fpsimd_state *state)
> > {
> >         local_bh_disable();
> >
> >         fpsimd_load_state(state);
> >         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
> >                 struct fpsimd_state *st = &current->thread.fpsimd_state;
> >
> >                 __this_cpu_write(fpsimd_last_state.st, st);
> >                 st->cpu = smp_processor_id();
> >         }
> >
> >         local_bh_enable();
> > }
> >
> > Broken:
> >
> > void fpsimd_update_current_state(struct fpsimd_state *state)
> > {
> >         struct fpsimd_last_state_struct *last;
> >         struct fpsimd_state *st;
> >
> >         local_bh_disable();
> >
> >         current->thread.fpsimd_state = *state;
> >         fpsimd_load_state(&current->thread.fpsimd_state);
> >
> >         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
> >                 last = this_cpu_ptr(&fpsimd_last_state);
> >                 st = &current->thread.fpsimd_state;
> >
> >                 last->st = st;
> >                 last->sve_in_use = test_thread_flag(TIF_SVE);
> >                 st->cpu = smp_processor_id();
> >         }
> >
> >         local_bh_enable();
> > }
> >
> > Can you try my flattened "broken" version by itself and see if that does
> > reproduce the bug?  If not, my flattening may be making bad assumptions...
> >
> > Assuming the "broken" version reproduces the bug, I can't yet see exactly
> > where the breakage comes from.
> 
> Correct, above "Working" is working, and "Broken" is broken.
> 
> > The two important differences here seem to be
> >
> > 1) Staging the state via current->thread.fpsimd_state instead of loading
> > directly:
> >
> > -       fpsimd_load_state(state);
> > +       current->thread.fpsimd_state = *state;
> > +       fpsimd_load_state(&current->thread.fpsimd_state);
> 
> The change above introduces the breakage.
> 
> > 2) Using this_cpu_ptr() + assignment instead of __this_cpu_write() when
> > reassociating the task's fpsimd context with the cpu:
> >
> >  {
> > +       struct fpsimd_last_state_struct *last;
> > +       struct fpsimd_state *st;
> >
> > [...]
> >
> >         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
> > -               struct fpsimd_state *st = &current->thread.fpsimd_state;
> > -
> > -               __this_cpu_write(fpsimd_last_state.st, st);
> > -               st->cpu = smp_processor_id();
> > +               last = this_cpu_ptr(&fpsimd_last_state);
> > +               st = &current->thread.fpsimd_state;
> > +
> > +               last->st = st;
> > +               last->sve_in_use = test_thread_flag(TIF_SVE);
> > +               st->cpu = smp_processor_id();
> >         }
> 
> The change above is fine.

Thanks for this.

Will came up with a convincing hypothesis for how the dodgy change broke
things here -- see the diff in his separate reply.

I'll cook up a more complete fix, but the diff Will provided should at
least get things working.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-12-15 17:11 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-12 10:20 arm64: unhandled level 0 translation fault Geert Uytterhoeven
2017-12-12 10:20 ` Geert Uytterhoeven
2017-12-12 10:36 ` Will Deacon
2017-12-12 10:36   ` Will Deacon
2017-12-12 15:11   ` Geert Uytterhoeven
2017-12-12 15:11     ` Geert Uytterhoeven
2017-12-12 16:00     ` Geert Uytterhoeven
2017-12-12 16:00       ` Geert Uytterhoeven
2017-12-12 16:57       ` Will Deacon
2017-12-12 16:57         ` Will Deacon
2017-12-12 20:54         ` Geert Uytterhoeven
2017-12-12 20:54           ` Geert Uytterhoeven
2017-12-13 10:24           ` Will Deacon
2017-12-13 10:24             ` Will Deacon
2017-12-14 14:34 ` Geert Uytterhoeven
2017-12-14 14:34   ` Geert Uytterhoeven
2017-12-14 15:16   ` Will Deacon
2017-12-14 15:16     ` Will Deacon
2017-12-14 15:24   ` Dave P Martin
2017-12-14 15:24     ` Dave P Martin
2017-12-14 18:08     ` Geert Uytterhoeven
2017-12-14 18:08       ` Geert Uytterhoeven
2017-12-15 11:23       ` Dave Martin
2017-12-15 11:23         ` Dave Martin
2017-12-15 13:30         ` Geert Uytterhoeven
2017-12-15 13:30           ` Geert Uytterhoeven
2017-12-15 14:27           ` Will Deacon
2017-12-15 14:27             ` Will Deacon
2017-12-15 15:56             ` Geert Uytterhoeven
2017-12-15 15:56               ` Geert Uytterhoeven
2017-12-15 15:59             ` Geert Uytterhoeven
2017-12-15 15:59               ` Geert Uytterhoeven
2017-12-15 16:06               ` Will Deacon
2017-12-15 16:06                 ` Will Deacon
2017-12-15 17:11           ` Dave Martin
2017-12-15 17:11             ` Dave Martin
