All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm64: restore get_current() optimisation
@ 2017-01-03 18:27 Mark Rutland
  2017-01-04 15:23 ` Will Deacon
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Mark Rutland @ 2017-01-03 18:27 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Catalin,

My THREAD_INFO_IN_TASK series had an unintended performance regression in
get_current() / current_thread_info(). Could you please take the below as a
fix for the next rc?

Thanks,
Mark.

---->8----
Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
inverted the relationship between get_current() and
current_thread_info(), with sp_el0 now holding the current task_struct
rather than the current thead_info. The new implementation of
get_current() prevents the compiler from being able to optimize repeated
calls to either, resulting in a noticeable penalty in some
microbenchmarks.

This patch restores the previous optimisation by implementing
get_current() in the same way as our old current_thread_info(), using a
non-volatile asm statement.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Davidlohr Bueso <dbueso@suse.de>
---
 arch/arm64/include/asm/current.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/current.h b/arch/arm64/include/asm/current.h
index f2bcbe2..86c4041 100644
--- a/arch/arm64/include/asm/current.h
+++ b/arch/arm64/include/asm/current.h
@@ -9,9 +9,17 @@
 
 struct task_struct;
 
+/*
+ * We don't use read_sysreg() as we want the compiler to cache the value where
+ * possible.
+ */
 static __always_inline struct task_struct *get_current(void)
 {
-	return (struct task_struct *)read_sysreg(sp_el0);
+	unsigned long sp_el0;
+
+	asm ("mrs %0, sp_el0" : "=r" (sp_el0));
+
+	return (struct task_struct *)sp_el0;
 }
 
 #define current get_current()
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-01-03 18:27 [PATCH] arm64: restore get_current() optimisation Mark Rutland
@ 2017-01-04 15:23 ` Will Deacon
  2017-03-02 11:35 ` Jon Hunter
  2017-03-02 11:54 ` Andreas Färber
  2 siblings, 0 replies; 14+ messages in thread
From: Will Deacon @ 2017-01-04 15:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 03, 2017 at 06:27:01PM +0000, Mark Rutland wrote:
> Hi Catalin,
> 
> My THREAD_INFO_IN_TASK series had an unintended performance regression in
> get_current() / current_thread_info(). Could you please take the below as a
> fix for the next rc?
> 
> Thanks,
> Mark.
> 
> ---->8----
> Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
> inverted the relationship between get_current() and
> current_thread_info(), with sp_el0 now holding the current task_struct
> rather than the current thead_info. The new implementation of
> get_current() prevents the compiler from being able to optimize repeated
> calls to either, resulting in a noticeable penalty in some
> microbenchmarks.
> 
> This patch restores the previous optimisation by implementing
> get_current() in the same way as our old current_thread_info(), using a
> non-volatile asm statement.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  arch/arm64/include/asm/current.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)

Acked-by: Will Deacon <will.deacon@arm.com>

Thanks for putting this back like it was!

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-01-03 18:27 [PATCH] arm64: restore get_current() optimisation Mark Rutland
  2017-01-04 15:23 ` Will Deacon
@ 2017-03-02 11:35 ` Jon Hunter
  2017-03-02 12:35   ` Mark Rutland
  2017-03-02 11:54 ` Andreas Färber
  2 siblings, 1 reply; 14+ messages in thread
From: Jon Hunter @ 2017-03-02 11:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On 03/01/17 18:27, Mark Rutland wrote:
> Hi Catalin,
> 
> My THREAD_INFO_IN_TASK series had an unintended performance regression in
> get_current() / current_thread_info(). Could you please take the below as a
> fix for the next rc?
> 
> Thanks,
> Mark.
> 
> ---->8----
> Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
> inverted the relationship between get_current() and
> current_thread_info(), with sp_el0 now holding the current task_struct
> rather than the current thead_info. The new implementation of
> get_current() prevents the compiler from being able to optimize repeated
> calls to either, resulting in a noticeable penalty in some
> microbenchmarks.
> 
> This patch restores the previous optimisation by implementing
> get_current() in the same way as our old current_thread_info(), using a
> non-volatile asm statement.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  arch/arm64/include/asm/current.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/current.h b/arch/arm64/include/asm/current.h
> index f2bcbe2..86c4041 100644
> --- a/arch/arm64/include/asm/current.h
> +++ b/arch/arm64/include/asm/current.h
> @@ -9,9 +9,17 @@
>  
>  struct task_struct;
>  
> +/*
> + * We don't use read_sysreg() as we want the compiler to cache the value where
> + * possible.
> + */
>  static __always_inline struct task_struct *get_current(void)
>  {
> -	return (struct task_struct *)read_sysreg(sp_el0);
> +	unsigned long sp_el0;
> +
> +	asm ("mrs %0, sp_el0" : "=r" (sp_el0));
> +
> +	return (struct task_struct *)sp_el0;
>  }
>  
>  #define current get_current()

I noticed that with v4.10 I am seeing the following panic ...

[  184.523390] Unable to handle kernel paging request at virtual address ffff8001bb7a2800
[  184.531316] pgd = ffff8000b96b1000
[  184.534711] [ffff8001bb7a2800] *pgd=0000000000000000
[  184.539670] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  184.545231] Modules linked in:
[  184.548285] CPU: 2 PID: 1407 Comm: tinymix Not tainted 4.10.0-00017-g50bc4a8b2868 #19
[  184.556104] Hardware name: Google Pixel C (DT)
[  184.560540] task: ffff8000bb558c80 task.stack: ffff8000b9648000
[  184.566458] PC is at regcache_flat_read+0x14/0x20
[  184.571155] LR is at regcache_read+0x50/0x78
[  184.575417] pc : [<ffff0000085d0c6c>] lr : [<ffff0000085cefa8>] pstate: 400001c5
[  184.582802] sp : ffff8000b964b970
[  184.586108] x29: ffff8000b964b970 x28: ffff8000b9584800 
[  184.591412] x27: ffff8000b964bcc8 x26: ffff8000b9461000 
[  184.596716] x25: 0000000000000000 x24: 0000000000000000 
[  184.602019] x23: 00000000ffff8000 x22: ffff8000b964ba1c 
[  184.607322] x21: ffff8000b964ba1c x20: 00000000ffff8000 
[  184.612626] x19: ffff8000bb7dc400 x18: 0000000000000000 
[  184.617928] x17: 0000000000000001 x16: ffff0000081f79e8 
[  184.623230] x15: 0000000000497000 x14: 0000000000000000 
[  184.628532] x13: 0000000000000001 x12: 0000000005cc6000 
[  184.633835] x11: 0000000000000000 x10: ffff8000bc16bf00 
[  184.639138] x9 : 0000000000000000 x8 : 0000000000000000 
[  184.644441] x7 : ffff8000bff68908 x6 : 0000000000000000 
[  184.649742] x5 : ffff000008fc9f00 x4 : ffff8000bb7aa800 
[  184.655044] x3 : 0000000000000002 x2 : ffff8000b964ba1c 
[  184.660347] x1 : 000000003fffe000 x0 : 0000000000000000 
[  184.665650] 
[  184.667137] Process tinymix (pid: 1407, stack limit = 0xffff8000b9648000)
[  184.673913] Stack: (0xffff8000b964b970 to 0xffff8000b964c000)
[  184.679649] b960:                                   ffff8000b964b9a0 ffff0000085cce60
[  184.687469] b980: ffff8000bb7dc400 ffff8000bb7dc400 00000000ffff8000 ffff0000085cd104
[  184.695288] b9a0: ffff8000b964b9d0 ffff0000085cd218 ffff8000b964ba8f ffff8000bb7dc400
[  184.703109] b9c0: 00000000bc1d14a0 00000000ffff8000 ffff8000b964ba20 ffff0000085ce1d8
[  184.710929] b9e0: ffff8000bb7dc400 00000000ffff8000 00000000bc1d14a0 00000000ffff8000
[  184.718748] ba00: ffff8000b964ba8f 0000000000000000 ffff8000bb7dc400 ffff0000085ce1e8
[  184.726567] ba20: ffff8000b964ba70 ffff000008856c44 ffff000008ffbff0 ffff000008ffbe08
[  184.734386] ba40: 0000000000000001 ffff8000b964bb08 ffff8000b964bb28 0000000000000000
[  184.742206] ba60: ffff000008ffc020 ffff00000884e700 ffff8000b964ba90 ffff00000884e7f4
[  184.750026] ba80: ffff8000b964ba80 00ff8000b964ba80 ffff8000b964bb40 ffff00000884eb2c
[  184.757846] baa0: ffff8000b9584748 0000000000000008 ffff8000b9583900 ffff000008ffbe08
[  184.765666] bac0: ffff000008ffaa30 ffff8000b964bcc8 0000000000000003 0000000000000002
[  184.773485] bae0: 0000000000000003 ffff000008ffaa20 ffff8000b964bb20 ffff000008d6ede8
[  184.781303] bb00: ffff8000bb7dc400 ffff8000b9544710 ffff8000b9544710 ffff8000b964bb18
[  184.789122] bb20: ffff8000b964bb18 ffff8000b964bb28 ffff8000b964bb28 ffff00000884ebbc
[  184.796942] bb40: ffff8000b964bb80 ffff00000884eb9c ffff000008ffbe08 ffff8000b9583900
[  184.804762] bb60: ffff000008ffbe58 0000000000000001 ffff000008ffaa20 0000000000000001
[  184.812581] bb80: ffff8000b964bbc0 ffff00000886bd04 0000000000000001 ffff8000b9583900
[  184.820402] bba0: ffff8000b964bcf0 ffff8000b964bcf0 ffff000009062000 ffff000008b0a390
[  184.828220] bbc0: ffff8000b964bcf0 ffff000008830110 ffff8000bc33b000 ffff8000bc1d1000
[  184.836039] bbe0: 00000000ffffffff ffff8000b96a9800 ffff8000bc1d14a0 ffff8000bc1d1870
[  184.843858] bc00: 0000000000000123 000000000000001d ffff000008982000 ffff8000bb558c80
[  184.851677] bc20: ffff8000b964bd40 0000000000000000 0000000000000001 ffff000008830b24
[  184.859496] bc40: ffff000008b0a390 ffff8000bc33b000 ffff8000bb7b9520 ffff8000bb7b9400
[  184.867316] bc60: 0000000200000139 0000024000000040 78754d2000000440 ffff8000b9583900
[  184.875137] bc80: 3f30031f00000240 0000000000000000 0000000000000000 0000000000000000
[  184.882956] bca0: ffff8000b9583900 ff1cf31300000440 ffff800000000000 ffff000008830038
[  184.890777] bcc0: ffff8000bc33b000 ffff8000b9583900 0f1f03ff00000040 ffff800000000001
[  184.898597] bce0: ffff8000bc1d14a0 ffff00000818f4e4 ffff8000b964bd70 ffff000008830610
[  184.906417] bd00: ffff8000bc33b000 0000000000000000 0000fffffdbf4308 ffff8000bc1d1000
[  184.914236] bd20: ffff8000b96a9800 000000000000001d ffff000008982000 0000000000000000
[  184.922055] bd40: ffff8000b964bd70 ffff0000088305d0 00000000c4c85513 0000fffffdbf4308
[  184.929875] bd60: 0000fffffdbf4308 0000000000000000 ffff8000b964be00 ffff0000081f7354
[  184.937694] bd80: ffff8000b9665600 0000fffffdbf4308 ffff8000b969b238 0000000000000003
[  184.945514] bda0: 00000000c4c85513 0000fffffdbf4308 0000000000000123 0000000092000047
[  184.953333] bdc0: 000000003a0f1018 ffff8000b964bec0 0000000060000000 0000000000000024
[  184.961152] bde0: 0000000092000047 000000003a0f1018 0000000000000020 ffff8000bb558c80
[  184.968972] be00: ffff8000b964be80 ffff0000081f7a74 0000000000000000 ffff8000b9665600
[  184.976792] be20: ffff8000b9665600 0000000000000003 00000000c4c85513 0000000000415230
[  184.984612] be40: ffff8000b964be80 ffff0000081f7a28 0000000000000000 ffff8000b9665600
[  184.992432] be60: ffff8000b9665600 0000000000000003 00000000c4c85513 ffff0000081f7a0c
[  185.000253] be80: 0000000000000000 ffff000008082f30 0000000000000000 00008000b70ac000
[  185.008072] bea0: ffffffffffffffff 000000000041c51c 0000000080000000 0000000000000015
[  185.015892] bec0: 0000000000000003 00000000c4c85513 0000fffffdbf4308 0000000000000010
[  185.023712] bee0: fffffffffffffff0 0000000000000040 000000000000003f 0000000000000000
[  185.031530] bf00: 000000000000001d 0000000000000004 0101010101010101 0000000000000005
[  185.039350] bf20: ffffffffffffffff 0000000000499000 0000000000499000 0000000000497000
[  185.047169] bf40: 0000fffffdbf4b68 0000000000000001 0000000000000000 00000000004001a0
[  185.054988] bf60: 0000000000000000 00000000004001a0 0000000000000000 0000000000000000
[  185.062807] bf80: 000000000040559c 00000000004054e4 0000000000000000 0000000000000000
[  185.070627] bfa0: 0000000000000000 0000fffffdbf42e0 0000000000402998 0000fffffdbf42e0
[  185.078447] bfc0: 000000000041c51c 0000000080000000 0000000000000003 000000000000001d
[  185.086265] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  185.094083] Call trace:
[  185.096525] Exception stack(0xffff8000b964b7a0 to 0xffff8000b964b8d0)
[  185.102954] b7a0: ffff8000bb7dc400 0001000000000000 ffff8000b964b970 ffff0000085d0c6c
[  185.110774] b7c0: ffff8000b964b7e0 ffff0000080fb520 0000000000000001 0000002af6734537
[  185.118594] b7e0: ffff8000b964b810 ffff0000080e8f40 ffff8000bc16be80 00000000000008f4
[  185.126415] b800: ffff8000b964b830 ffff0000080eeb14 ffff8000bbaa0d00 ffff8000bff688e0
[  185.134234] b820: ffff8000b964b830 ffff0000080eeb28 ffff8000b964b850 ffff0000080e94f4
[  185.142054] b840: 0000000000000000 000000003fffe000 ffff8000b964ba1c 0000000000000002
[  185.149873] b860: ffff8000bb7aa800 ffff000008fc9f00 0000000000000000 ffff8000bff68908
[  185.157693] b880: 0000000000000000 0000000000000000 ffff8000bc16bf00 0000000000000000
[  185.165512] b8a0: 0000000005cc6000 0000000000000001 0000000000000000 0000000000497000
[  185.173331] b8c0: ffff0000081f79e8 0000000000000001
[  185.178203] [<ffff0000085d0c6c>] regcache_flat_read+0x14/0x20
[  185.183939] [<ffff0000085cce60>] _regmap_read+0x98/0xe8
[  185.189155] [<ffff0000085cd218>] _regmap_update_bits+0xa0/0xf0
[  185.194978] [<ffff0000085ce1d8>] regmap_update_bits_base+0x60/0x90
[  185.201152] [<ffff000008856c44>] snd_soc_component_update_bits+0x24/0x40
[  185.207843] [<ffff00000884e7f4>] dapm_power_widgets+0x474/0x730
[  185.213751] [<ffff00000884eb2c>] soc_dapm_mux_update_power.isra.29+0x7c/0xa0
[  185.220787] [<ffff00000884eb9c>] snd_soc_dapm_mux_update_power+0x4c/0x88
[  185.227479] [<ffff00000886bd04>] tegra210_xbar_put_value_enum+0x1b4/0x228
[  185.234256] [<ffff000008830110>] snd_ctl_elem_write+0x110/0x188
[  185.240165] [<ffff000008830610>] snd_ctl_ioctl+0xd0/0x798
[  185.245557] [<ffff0000081f7354>] do_vfs_ioctl+0xa4/0x738
[  185.250859] [<ffff0000081f7a74>] SyS_ioctl+0x8c/0xa0
[  185.255818] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
[  185.261121] Code: 52800000 b941c883 f9410084 1ac32421 (b8615881) 
[  185.267223] ---[ end trace 5f6a6332822eca30 ]---

Bisecting the panic ends up at this patch and reverting it on top of v4.10 prevents this from
occurring. 

The occurs when I start playing audio on Tegra210 using tinymix. I do have some out-of-tree
patches for Tegra audio that I am using when seeing this but I have been using those for
probably a year or so, as I am gradually upstreaming bits.

I am a bit flummoxed by the above, any thoughts?

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-01-03 18:27 [PATCH] arm64: restore get_current() optimisation Mark Rutland
  2017-01-04 15:23 ` Will Deacon
  2017-03-02 11:35 ` Jon Hunter
@ 2017-03-02 11:54 ` Andreas Färber
  2017-03-02 12:40   ` Mark Rutland
  2 siblings, 1 reply; 14+ messages in thread
From: Andreas Färber @ 2017-03-02 11:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

Am 03.01.2017 um 19:27 schrieb Mark Rutland:
> Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
> inverted the relationship between get_current() and
> current_thread_info(), with sp_el0 now holding the current task_struct
> rather than the current thead_info. The new implementation of

"thread_info"?

> get_current() prevents the compiler from being able to optimize repeated
> calls to either, resulting in a noticeable penalty in some
> microbenchmarks.
> 
> This patch restores the previous optimisation by implementing
> get_current() in the same way as our old current_thread_info(), using a
> non-volatile asm statement.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Reported-by: Davidlohr Bueso <dbueso@suse.de>

Should this get a Fixes: c02433dd6de32f04 ("arm64: split thread_info
from task stack") header?

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 N?rnberg, Germany
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 11:35 ` Jon Hunter
@ 2017-03-02 12:35   ` Mark Rutland
  2017-03-02 15:30     ` Jon Hunter
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2017-03-02 12:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 02, 2017 at 11:35:06AM +0000, Jon Hunter wrote:
> Hi Mark,

Hi Jon,

> On 03/01/17 18:27, Mark Rutland wrote:
> > Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
> > inverted the relationship between get_current() and
> > current_thread_info(), with sp_el0 now holding the current task_struct
> > rather than the current thead_info. The new implementation of
> > get_current() prevents the compiler from being able to optimize repeated
> > calls to either, resulting in a noticeable penalty in some
> > microbenchmarks.
> > 
> > This patch restores the previous optimisation by implementing
> > get_current() in the same way as our old current_thread_info(), using a
> > non-volatile asm statement.

> > +/*
> > + * We don't use read_sysreg() as we want the compiler to cache the value where
> > + * possible.
> > + */
> >  static __always_inline struct task_struct *get_current(void)
> >  {
> > -	return (struct task_struct *)read_sysreg(sp_el0);
> > +	unsigned long sp_el0;
> > +
> > +	asm ("mrs %0, sp_el0" : "=r" (sp_el0));
> > +
> > +	return (struct task_struct *)sp_el0;
> >  }
> >  
> >  #define current get_current()

> I noticed that with v4.10 I am seeing the following panic ...

Ouch. :(

For reference, which toolchain are you using? This kind of code tends to be
toolchain-sensitive.

> [  184.523390] Unable to handle kernel paging request at virtual address ffff8001bb7a2800
> [  184.531316] pgd = ffff8000b96b1000
> [  184.534711] [ffff8001bb7a2800] *pgd=0000000000000000
> [  184.539670] Internal error: Oops: 96000005 [#1] PREEMPT SMP

That ESR_EL1 value decodes as a "Data Abort taken without a change in Exception
level", the DFSC decodes as "Translation fault, level 1", and WnR is clear.

So we're blowing up on a read of a bogus address.

> [  184.566458] PC is at regcache_flat_read+0x14/0x20
> [  184.571155] LR is at regcache_read+0x50/0x78
> [  184.575417] pc : [<ffff0000085d0c6c>] lr : [<ffff0000085cefa8>] pstate: 400001c5

Judging by the PC, that read could be any of:

* the read of map->cache at the start of regcache_flat_read()

* an inlined regcache_get_index_by_order()'s read of map->reg_stride_order

* the read of cache[regcache_flat_get_index(map, reg)]

... so it seems either map or map->cache is dodgy.

If you're can addr2line that PC, that should tell us which access is
blowing up, and therefore which pointer is dodgy.

We'll want the full output considering inlined functions, i.e.

${CROSS_COMPILE}addr2line -ife vmlinux 0xffff0000085d0c6c

> [  184.582802] sp : ffff8000b964b970
> [  184.586108] x29: ffff8000b964b970 x28: ffff8000b9584800 
> [  184.591412] x27: ffff8000b964bcc8 x26: ffff8000b9461000 
> [  184.596716] x25: 0000000000000000 x24: 0000000000000000 
> [  184.602019] x23: 00000000ffff8000 x22: ffff8000b964ba1c 
> [  184.607322] x21: ffff8000b964ba1c x20: 00000000ffff8000 
> [  184.612626] x19: ffff8000bb7dc400 x18: 0000000000000000 
> [  184.617928] x17: 0000000000000001 x16: ffff0000081f79e8 
> [  184.623230] x15: 0000000000497000 x14: 0000000000000000 
> [  184.628532] x13: 0000000000000001 x12: 0000000005cc6000 
> [  184.633835] x11: 0000000000000000 x10: ffff8000bc16bf00 
> [  184.639138] x9 : 0000000000000000 x8 : 0000000000000000 
> [  184.644441] x7 : ffff8000bff68908 x6 : 0000000000000000 
> [  184.649742] x5 : ffff000008fc9f00 x4 : ffff8000bb7aa800 
> [  184.655044] x3 : 0000000000000002 x2 : ffff8000b964ba1c 
> [  184.660347] x1 : 000000003fffe000 x0 : 0000000000000000 

> [  185.178203] [<ffff0000085d0c6c>] regcache_flat_read+0x14/0x20
> [  185.183939] [<ffff0000085cce60>] _regmap_read+0x98/0xe8
> [  185.189155] [<ffff0000085cd218>] _regmap_update_bits+0xa0/0xf0
> [  185.194978] [<ffff0000085ce1d8>] regmap_update_bits_base+0x60/0x90
> [  185.201152] [<ffff000008856c44>] snd_soc_component_update_bits+0x24/0x40

AFAICT, these don't implicitly access current as part of generating the
map pointer, so the dodgy pointer must have been generated above this
level.

At this level I can't see why current would be involved at all. Beyond this
point it's rather painful to follow the backtrace due to inlining.

> [  185.207843] [<ffff00000884e7f4>] dapm_power_widgets+0x474/0x730
> [  185.213751] [<ffff00000884eb2c>] soc_dapm_mux_update_power.isra.29+0x7c/0xa0
> [  185.220787] [<ffff00000884eb9c>] snd_soc_dapm_mux_update_power+0x4c/0x88
> [  185.227479] [<ffff00000886bd04>] tegra210_xbar_put_value_enum+0x1b4/0x228
> [  185.234256] [<ffff000008830110>] snd_ctl_elem_write+0x110/0x188
> [  185.240165] [<ffff000008830610>] snd_ctl_ioctl+0xd0/0x798
> [  185.245557] [<ffff0000081f7354>] do_vfs_ioctl+0xa4/0x738
> [  185.250859] [<ffff0000081f7a74>] SyS_ioctl+0x8c/0xa0
> [  185.255818] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
> [  185.261121] Code: 52800000 b941c883 f9410084 1ac32421 (b8615881) 
> [  185.267223] ---[ end trace 5f6a6332822eca30 ]---
> 
> Bisecting the panic ends up at this patch and reverting it on top of v4.10 prevents this from
> occurring. 
> 
> The occurs when I start playing audio on Tegra210 using tinymix. I do have some out-of-tree
> patches for Tegra audio that I am using when seeing this but I have been using those for
> probably a year or so, as I am gradually upstreaming bits.
> 
> I am a bit flummoxed by the above, any thoughts?

Likewise. :/

It could just be that this happens to change the alignment/size of things, and
unmasks a latent bug. Possibly, the removal of volatile has allowed some code
to be reordered, highlighting missing barriers/synchronisation.

Maybe we are generating current wrong in some case, though I can't see how, and
this is the only such report I've seen.

If the commit in question is resulting in get_current() behaving differently,
it *might* be possible to detect with the hack below. I haven't seen it blow up
on my test systems.

Otherwise, it might be worth giving KASAN a go; that might detect data
corruption. If you have a recent enough toolchain, you only need enable
CONFIG_KASAN. This will make your kernel Image a fair amount larger.

Thanks,
Mark.

---->8----
diff --git a/arch/arm64/include/asm/current.h b/arch/arm64/include/asm/current.h
index 86c4041..2093103 100644
--- a/arch/arm64/include/asm/current.h
+++ b/arch/arm64/include/asm/current.h
@@ -1,6 +1,7 @@
 #ifndef __ASM_CURRENT_H
 #define __ASM_CURRENT_H
 
+#include <linux/bug.h>
 #include <linux/compiler.h>
 
 #include <asm/sysreg.h>
@@ -13,7 +14,7 @@
  * We don't use read_sysreg() as we want the compiler to cache the value where
  * possible.
  */
-static __always_inline struct task_struct *get_current(void)
+static __always_inline struct task_struct *__get_cached_current(void)
 {
        unsigned long sp_el0;
 
@@ -22,6 +23,19 @@ static __always_inline struct task_struct *get_current(void)
        return (struct task_struct *)sp_el0;
 }
 
+static __always_inline struct task_struct *__get_uncached_current(void)
+{
+       return (struct task_struct *)read_sysreg(sp_el0);
+}
+
+static __always_inline struct task_struct *get_current(void)
+{
+       struct task_struct *cur = __get_cached_current();
+       BUG_ON(cur != __get_uncached_current());
+
+       return cur;
+}
+
 #define current get_current()
 
 #endif /* __ASSEMBLY__ */

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 11:54 ` Andreas Färber
@ 2017-03-02 12:40   ` Mark Rutland
  2017-03-02 12:43     ` Andreas Färber
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2017-03-02 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 02, 2017 at 12:54:24PM +0100, Andreas F?rber wrote:
> Hi Mark,
> 
> Am 03.01.2017 um 19:27 schrieb Mark Rutland:
> > Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
> > inverted the relationship between get_current() and
> > current_thread_info(), with sp_el0 now holding the current task_struct
> > rather than the current thead_info. The new implementation of
> 
> "thread_info"?

struct thread_info, as returned by current_thread_info().

> > get_current() prevents the compiler from being able to optimize repeated
> > calls to either, resulting in a noticeable penalty in some
> > microbenchmarks.
> > 
> > This patch restores the previous optimisation by implementing
> > get_current() in the same way as our old current_thread_info(), using a
> > non-volatile asm statement.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Reported-by: Davidlohr Bueso <dbueso@suse.de>
> 
> Should this get a Fixes: c02433dd6de32f04 ("arm64: split thread_info
> from task stack") header?

Both c02433dd6de32f04 and this patch are already in mainline.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 12:40   ` Mark Rutland
@ 2017-03-02 12:43     ` Andreas Färber
  2017-03-02 13:37       ` Mark Rutland
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Färber @ 2017-03-02 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

Am 02.03.2017 um 13:40 schrieb Mark Rutland:
> On Thu, Mar 02, 2017 at 12:54:24PM +0100, Andreas F?rber wrote:
>> Hi Mark,
>>
>> Am 03.01.2017 um 19:27 schrieb Mark Rutland:
>>> Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
>>> inverted the relationship between get_current() and
>>> current_thread_info(), with sp_el0 now holding the current task_struct
>>> rather than the current thead_info. The new implementation of
>>
>> "thread_info"?
> 
> struct thread_info, as returned by current_thread_info().

I was pointing out the missing R. :)

>>> get_current() prevents the compiler from being able to optimize repeated
>>> calls to either, resulting in a noticeable penalty in some
>>> microbenchmarks.
>>>
>>> This patch restores the previous optimisation by implementing
>>> get_current() in the same way as our old current_thread_info(), using a
>>> non-volatile asm statement.
>>>
>>> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Reported-by: Davidlohr Bueso <dbueso@suse.de>
>>
>> Should this get a Fixes: c02433dd6de32f04 ("arm64: split thread_info
>> from task stack") header?
> 
> Both c02433dd6de32f04 and this patch are already in mainline.

Sorry, this thread was at top of LAKML and it had one Acked-by but no
reply saying someone merged it.

Cheers,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 N?rnberg, Germany
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 12:43     ` Andreas Färber
@ 2017-03-02 13:37       ` Mark Rutland
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Rutland @ 2017-03-02 13:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 02, 2017 at 01:43:09PM +0100, Andreas F?rber wrote:
> Am 02.03.2017 um 13:40 schrieb Mark Rutland:
> > On Thu, Mar 02, 2017 at 12:54:24PM +0100, Andreas F?rber wrote:
> >> Hi Mark,
> >>
> >> Am 03.01.2017 um 19:27 schrieb Mark Rutland:
> >>> Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
> >>> inverted the relationship between get_current() and
> >>> current_thread_info(), with sp_el0 now holding the current task_struct
> >>> rather than the current thead_info. The new implementation of
> >>
> >> "thread_info"?
> > 
> > struct thread_info, as returned by current_thread_info().
> 
> I was pointing out the missing R. :)

Ah; clearly I should invest in some reading glasses... :)

Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 12:35   ` Mark Rutland
@ 2017-03-02 15:30     ` Jon Hunter
  2017-03-02 16:12       ` Robin Murphy
  2017-03-02 16:46       ` Mark Rutland
  0 siblings, 2 replies; 14+ messages in thread
From: Jon Hunter @ 2017-03-02 15:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On 02/03/17 12:35, Mark Rutland wrote:
> On Thu, Mar 02, 2017 at 11:35:06AM +0000, Jon Hunter wrote:
>> Hi Mark,
> 
> Hi Jon,
> 
>> On 03/01/17 18:27, Mark Rutland wrote:
>>> Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
>>> inverted the relationship between get_current() and
>>> current_thread_info(), with sp_el0 now holding the current task_struct
>>> rather than the current thead_info. The new implementation of
>>> get_current() prevents the compiler from being able to optimize repeated
>>> calls to either, resulting in a noticeable penalty in some
>>> microbenchmarks.
>>>
>>> This patch restores the previous optimisation by implementing
>>> get_current() in the same way as our old current_thread_info(), using a
>>> non-volatile asm statement.
> 
>>> +/*
>>> + * We don't use read_sysreg() as we want the compiler to cache the value where
>>> + * possible.
>>> + */
>>>  static __always_inline struct task_struct *get_current(void)
>>>  {
>>> -	return (struct task_struct *)read_sysreg(sp_el0);
>>> +	unsigned long sp_el0;
>>> +
>>> +	asm ("mrs %0, sp_el0" : "=r" (sp_el0));
>>> +
>>> +	return (struct task_struct *)sp_el0;
>>>  }
>>>  
>>>  #define current get_current()
> 
>> I noticed that with v4.10 I am seeing the following panic ...
> 
> Ouch. :(
> 
> For reference, which toolchain are you using? This kind of code tends to be
> toolchain-sensitive.

This is with Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4. I have also tried ...

gcc version 5.3.1 20160412 (Linaro GCC 5.3-2016.05) 
gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11)

... and see the same panic.
 
>> [  184.523390] Unable to handle kernel paging request at virtual address ffff8001bb7a2800
>> [  184.531316] pgd = ffff8000b96b1000
>> [  184.534711] [ffff8001bb7a2800] *pgd=0000000000000000
>> [  184.539670] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> 
> That ESR_EL1 value decodes as a "Data Abort taken without a change in Exception
> level", the DFSC decodes as "Translation fault, level 1", and WnR is clear.
> 
> So we're blowing up on a read of a bogus address.
> 
>> [  184.566458] PC is at regcache_flat_read+0x14/0x20
>> [  184.571155] LR is at regcache_read+0x50/0x78
>> [  184.575417] pc : [<ffff0000085d0c6c>] lr : [<ffff0000085cefa8>] pstate: 400001c5
> 
> Judging by the PC, that read could be any of:
> 
> * the read of map->cache at the start of regcache_flat_read()
> 
> * an inlined regcache_get_index_by_order()'s read of map->reg_stride_order
> 
> * the read of cache[regcache_flat_get_index(map, reg)]
> 
> ... so it seems either map or map->cache is dodgy.
> 
> If you're can addr2line that PC, that should tell us which access is
> blowing up, and therefore which pointer is dodgy.
> 
> We'll want the full output considering inlined functions, i.e.
> 
> ${CROSS_COMPILE}addr2line -ife vmlinux 0xffff0000085d0c6c

This shows ...

regcache_flat_read
/home/jonathanh/workdir/tegra/korg-linux.git/drivers/base/regmap/regcache-flat.c:60
 
>> [  184.582802] sp : ffff8000b964b970
>> [  184.586108] x29: ffff8000b964b970 x28: ffff8000b9584800 
>> [  184.591412] x27: ffff8000b964bcc8 x26: ffff8000b9461000 
>> [  184.596716] x25: 0000000000000000 x24: 0000000000000000 
>> [  184.602019] x23: 00000000ffff8000 x22: ffff8000b964ba1c 
>> [  184.607322] x21: ffff8000b964ba1c x20: 00000000ffff8000 
>> [  184.612626] x19: ffff8000bb7dc400 x18: 0000000000000000 
>> [  184.617928] x17: 0000000000000001 x16: ffff0000081f79e8 
>> [  184.623230] x15: 0000000000497000 x14: 0000000000000000 
>> [  184.628532] x13: 0000000000000001 x12: 0000000005cc6000 
>> [  184.633835] x11: 0000000000000000 x10: ffff8000bc16bf00 
>> [  184.639138] x9 : 0000000000000000 x8 : 0000000000000000 
>> [  184.644441] x7 : ffff8000bff68908 x6 : 0000000000000000 
>> [  184.649742] x5 : ffff000008fc9f00 x4 : ffff8000bb7aa800 
>> [  184.655044] x3 : 0000000000000002 x2 : ffff8000b964ba1c 
>> [  184.660347] x1 : 000000003fffe000 x0 : 0000000000000000 
> 
>> [  185.178203] [<ffff0000085d0c6c>] regcache_flat_read+0x14/0x20
>> [  185.183939] [<ffff0000085cce60>] _regmap_read+0x98/0xe8
>> [  185.189155] [<ffff0000085cd218>] _regmap_update_bits+0xa0/0xf0
>> [  185.194978] [<ffff0000085ce1d8>] regmap_update_bits_base+0x60/0x90
>> [  185.201152] [<ffff000008856c44>] snd_soc_component_update_bits+0x24/0x40
> 
> AFAICT, these don't implicitly access current as part of generating the
> map pointer, so the dodgy pointer must have been generated above this
> level.
> 
> At this level I can't see why current would be involved at all. Beyond this
> point it's rather painful to follow the backtrace due to inlining.
> 
>> [  185.207843] [<ffff00000884e7f4>] dapm_power_widgets+0x474/0x730
>> [  185.213751] [<ffff00000884eb2c>] soc_dapm_mux_update_power.isra.29+0x7c/0xa0
>> [  185.220787] [<ffff00000884eb9c>] snd_soc_dapm_mux_update_power+0x4c/0x88
>> [  185.227479] [<ffff00000886bd04>] tegra210_xbar_put_value_enum+0x1b4/0x228
>> [  185.234256] [<ffff000008830110>] snd_ctl_elem_write+0x110/0x188
>> [  185.240165] [<ffff000008830610>] snd_ctl_ioctl+0xd0/0x798
>> [  185.245557] [<ffff0000081f7354>] do_vfs_ioctl+0xa4/0x738
>> [  185.250859] [<ffff0000081f7a74>] SyS_ioctl+0x8c/0xa0
>> [  185.255818] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
>> [  185.261121] Code: 52800000 b941c883 f9410084 1ac32421 (b8615881) 
>> [  185.267223] ---[ end trace 5f6a6332822eca30 ]---
>>
>> Bisecting the panic ends up at this patch and reverting it on top of v4.10 prevents this from
>> occurring. 
>>
>> The occurs when I start playing audio on Tegra210 using tinymix. I do have some out-of-tree
>> patches for Tegra audio that I am using when seeing this but I have been using those for
>> probably a year or so, as I am gradually upstreaming bits.
>>
>> I am a bit flummoxed by the above, any thoughts?
> 
> Likewise. :/
> 
> It could just be that this happens to change the alignment/size of things, and
> unmasks a latent bug. Possibly, the removal of volatile has allowed some code
> to be reordered, highlighting missing barriers/synchronisation.
> 
> Maybe we are generating current wrong in some case, though I can't see how, and
> this is the only such report I've seen.
> 
> If the commit in question is resulting in get_current() behaving differently,
> it *might* be possible to detect with the hack below. I haven't seen it blow up
> on my test systems.

Unfortunately, that did not catch it :-(
 
> Otherwise, it might be worth giving KASAN a go; that might detect data
> corruption. If you have a recent enough toolchain, you only need enable
> CONFIG_KASAN. This will make your kernel Image a fair amount larger.

I enabled this with gcc 6.2.1 but now the PC is at __asan_load4 ...

[   19.516956] Unable to handle kernel paging request at virtual address ffff100033fcc660
[   19.524940] pgd = ffff80009c4c8000
[   19.528365] [ffff100033fcc660] *pgd=0000000000000000
[   19.533357] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[   19.538949] Modules linked in:
[   19.542033] CPU: 1 PID: 1465 Comm: tinymix Not tainted 4.10.0-00018-g0db5ca31acab #3
[   19.549822] Hardware name: Google Pixel C (DT)
[   19.554289] task: ffff8000a47e0d00 task.stack: ffff8000a3818000
[   19.560239] PC is at __asan_load4+0x24/0xa0
[   19.564450] LR is at regcache_flat_read+0x40/0x68
[   19.569176] pc : [<ffff200008269f94>] lr : [<ffff200008889ec8>] pstate: 200001c5
[   19.576616] sp : ffff8000a381b5a0
[   19.579951] x29: ffff8000a381b5a0 x28: ffff2000092a4240 
[   19.585288] x27: 0000000000000000 x26: 00000000a4c19f80 
[   19.590624] x25: 0000000000000000 x24: 00000000ffff8000 
[   19.595960] x23: ffff8000a381b6c0 x22: ffff80009fe6b300 
[   19.601295] x21: ffff8000a381b6c0 x20: ffff80009f821b00 
[   19.606632] x19: 000000003fffe000 x18: 0000000000000000 
[   19.611967] x17: 0000000000000001 x16: ffff2000082ac7d0 
[   19.617302] x15: 0000000000497000 x14: ffff200008c4f2f0 
[   19.622637] x13: ffff200008c4f264 x12: ffffffffffffffff 
[   19.627972] x11: 0000000000000040 x10: 0000000000000870 
[   19.633307] x9 : ffff8000a381b5a0 x8 : 00000000f4f4f404 
[   19.638642] x7 : ffff1000147036d4 x6 : 00000000f3f3f3f3 
[   19.643976] x5 : 0000000000000000 x4 : ffff80019fe63300 
[   19.649312] x3 : ffff200008889e88 x2 : 0000000000000000 
[   19.654646] x1 : 1ffff00033fcc660 x0 : dfff200000000000 
[   19.659979] 
[   19.661494] Process tinymix (pid: 1465, stack limit = 0xffff8000a3818000)
[   19.668304] Stack: (0xffff8000a381b5a0 to 0xffff8000a381c000)
[   19.674077] b5a0: ffff8000a381b5b0 ffff200008889ec8 ffff8000a381b5e0 ffff200008886ed4
[   19.681955] b5c0: ffff80009f821b00 ffff200009b08580 00000000ffff8000 ffff8000a381b6c0
[   19.689834] b5e0: ffff8000a381b610 ffff200008883908 ffff80009f821b00 ffff80009f821b00
[   19.697711] b600: 00000000ffff8000 ffff80009f821ced ffff8000a381b650 ffff200008883fb8
[   19.705590] b620: 1ffff000147036d4 ffff8000a381b7c0 ffff80009f821b00 0000000000000000
[   19.713467] b640: 00000000ffff8000 ffff200008883f14 ffff8000a381b700 ffff2000088859cc
[   19.721344] b660: ffff80009f821b00 ffff80009f821b30 ffff80009f821bb0 0000000000000000
[   19.729222] b680: 00000000ffff8000 00000000a4c19f80 00000000ffff8000 ffff8000a381b7c0
[   19.737100] b6a0: 0000000041b58ab3 ffff2000094ee250 ffff200008883e88 ffff80009f821b30
[   19.744979] b6c0: ffff200008880c50 0000000000000000 ffff8000a381b6e0 ffff200008880c70
[   19.752856] b6e0: ffff8000a381b700 ffff2000088859a0 ffff8000a381b700 ffff2000088859ac
[   19.760733] b700: ffff8000a381b760 ffff200008c5cad4 1ffff000147036f4 ffff80009ffe5bc0
[   19.768610] b720: 00000000ffff8000 00000000a4c19f80 00000000ffff8000 ffff80009ec442a8
[   19.776489] b740: ffff8000a381bae0 ffff200009c69f80 0000000000000000 ffff200008c5cab0
[   19.784366] b760: ffff8000a381b800 ffff200008c4ed24 ffff80009ee6de00 ffff80009ec44280
[   19.792243] b780: ffff200009c6a168 ffff200009c6a228 ffff200009c6a198 ffff8000a381b8f0
[   19.800120] b7a0: 0000000041b58ab3 ffff20000953f560 ffff200008c5ca38 ffff200008c4c2f0
[   19.807997] b7c0: ffff8000a381b700 ffff8000a381b7c0 ffff200009c6a198 ffff80009f29d500
[   19.815875] b7e0: ffff200009c6a148 ffff200009c69f80 ffff8000a381b800 ffff200008c4ed0c
[   19.823754] b800: ffff8000a381b970 ffff200008c4f264 ffff80009ee6dcc8 ffff20000931e4a0
[   19.831630] b820: 0000000000000008 ffff200009c5c990 ffff80009ee8de00 ffff200009c69f80
[   19.839507] b840: ffff2000092f0d60 0000000000000002 ffff80009ee8de00 0000000000000028
[   19.847384] b860: 1ffff00014703712 ffff2000ffff8000 ffff8000a4c19f80 ffff80009ffe5bc0
[   19.855261] b880: ffff2000092a3000 0000000009b08580 0000000041b58ab3 ffff20000953f2c0
[   19.863138] b8a0: ffff200008c4e480 ffff200008883908 ffff80009ed9c110 ffff80009ed9c110
[   19.871015] b8c0: ffff8000a381b8f0 ffff200008880ca4 ffff80009f821b00 0000000000000000
[   19.878892] b8e0: ffff80009f821b30 ffff200008880c98 ffff8000a381b8f0 ffff8000a381b8f0
[   19.886769] b900: ffff8000a381b930 ffff200008c4be90 ffff80009ecabc00 ffff80009ec443a0
[   19.894647] b920: ffff80009ec44280 ffff200008c4be64 ffff8000a381b930 ffff8000a381b930
[   19.902524] b940: ffff80009ecabc00 ffff20000931e4a0 0000000000000008 ffff200009c5c990
[   19.910403] b960: ffff8000a381b970 ffff200008c4f240 ffff8000a381b9c0 ffff200008c4f2f0
[   19.918281] b980: ffff200009c69f80 ffff8000a381bae0 ffff200009c69fd0 ffff200009c6a210
[   19.926158] b9a0: ffff80009ee8de00 0000000000000001 ffff200009c5c980 ffff200008c4f2d8
[   19.934036] b9c0: ffff8000a381ba10 ffff200008c7bda0 ffff8000a381bb58 ffff2000092f0bf4
[   19.941912] b9e0: ffff8000a381bb08 00000000ff1cf313 0000000000000002 ffff200009c5c980
[   19.949790] ba00: 0000000000000001 ffff200008c7bd80 ffff8000a381bb80 ffff200008c1e9b4
[   19.957667] ba20: ffff8000a3e71100 ffff80009ee8de00 1ffff0001470377c 0000000000000055
[   19.965546] ba40: ffff80009ee8de00 ffff80009f29d500 ffff80009f29d9a0 ffff8000a441f200
[   19.973423] ba60: 0000000000000000 ffff8000a47e0d00 1ffff00014703760 0000000000000050
[   19.981301] ba80: 0000000000000001 0000000000000000 1ffff00014703758 ffff8000a3e71100
[   19.989178] baa0: ffff8000a3e71148 ffff80009ffe5ca0 ffff80009ffe5b80 0000000300000000
[   19.997056] bac0: 0000000041b58ab3 ffff2000095421a0 ffff200008c7bb40 ffff8000a441f210
[   20.004933] bae0: ffff80009ee8de00 3f30031f00000240 ffff800000000000 ffff8000a4c19f80
[   20.012810] bb00: 0000000041b58ab3 ffff20000953e4f0 ff1cf31300000440 ffff200000000000
[   20.020688] bb20: ffff8000a381bb80 ffff200008c1e86c ffff8000a3e71100 0f1f03ff00000040
[   20.028564] bb40: 1ffff00000000001 0000000000000000 0000ffffcc230cc8 ffff80009f29d500
[   20.036443] bb60: ffff80009f29d9a0 ffff8000a441f200 ffff8000a381bb80 ffff200008c1e98c
[   20.044321] bb80: ffff8000a381bc60 ffff200008c1f190 1ffff00014703798 ffff8000a3e71100
[   20.052197] bba0: ffff8000a441f200 0000000000000000 0000ffffcc230cc8 ffff80009f29d500
[   20.060076] bbc0: ffff80009f29dd70 000000000000001d ffff200008e14000 ffff200008213d04
[   20.067953] bbe0: 0000000041b58ab3 ffff20000953e4c0 ffff200008c1e7e8 0000ffffcc230cc8
[   20.075830] bc00: 0000ffffcc230cc8 ffff80009ef4aec0 ffff80009f29d500 000000000000001d
[   20.083707] bc20: ffff8000a381bc30 ffff200008213d2c ffff8000a381bc60 ffff200008c1f148
[   20.091583] bc40: 1ffff00014703798 00000000c4c85513 ffff8000a381bc60 ffff200008c1f16c
[   20.099461] bc60: ffff8000a381bd40 ffff2000082abebc 1ffff000147037b4 00000000c4c85513
[   20.107339] bc80: ffff80009cdbfb80 0000ffffcc230cc8 ffff20000929b0a0 ffff80009ef4aec0
[   20.115216] bca0: 0000000000000123 000000000000001d ffff200008e14000 014000c000000055
[   20.123094] bcc0: 0000000041b58ab3 ffff20000953e4c0 ffff200008c1f058 0000000000000000
[   20.130970] bce0: 0000000000000000 0000000000000000 0000000000000000 ffff80009c455a40
[   20.138847] bd00: ffff7e0002711570 0000000000000000 ffff8000a47e0d00 ffff8000a381bec0
[   20.146725] bd20: ffff8000a381bd30 ffff20000809f5d8 ffff8000a381bd40 ffff2000082abea4
[   20.154602] bd40: ffff8000a381be80 ffff2000082ac85c 0000000000000000 ffff80009cdbfb80
[   20.162479] bd60: ffff80009cdbfb80 0000000000000003 00000000c4c85513 0000ffffcc230cc8
[   20.170355] bd80: 0000000000000123 000000000000001d ffff200008e14000 ffff8000a47e0d00
[   20.178232] bda0: 0000000041b58ab3 ffff2000094dd808 ffff2000082abd88 ffff20000808336c
[   20.186109] bdc0: 0000000000000000 00006000b6877000 ffffffffffffffff 0000000000415230
[   20.193986] bde0: 0000000060000000 0000000000000024 0000000092000047 000000000a148018
[   20.201863] be00: 0000000041b58ab3 ffff2000094cc5c8 ffff200008081360 ffff2000094da138
[   20.209741] be20: ffff200008239b00 ffff80009e48b0f0 ffff8000a381be40 ffff2000082bd2d4
[   20.217617] be40: ffff8000a381be80 ffff2000082ac810 0000000000000000 ffff80009cdbfb80
[   20.225494] be60: ffff80009cdbfb80 0000000000000003 00000000c4c85513 ffff2000082ac7f4
[   20.233370] be80: 0000000000000000 ffff200008083730 0000000000000000 00006000b6877000
[   20.241247] bea0: ffffffffffffffff 000000000041c51c 0000000080000000 0000000000000015
[   20.249125] bec0: 0000000000000003 00000000c4c85513 0000ffffcc230cc8 0000000000000010
[   20.257002] bee0: fffffffffffffff0 0000000000000040 000000000000003f 0000000000000000
[   20.264879] bf00: 000000000000001d 0000000000000004 0101010101010101 0000000000000005
[   20.272756] bf20: ffffffffffffffff 0000000000499000 0000000000499000 0000000000497000
[   20.280634] bf40: 0000ffffcc231528 0000000000000001 0000000000000000 00000000004001a0
[   20.288510] bf60: 0000000000000000 00000000004001a0 0000000000000000 0000000000000000
[   20.296386] bf80: 000000000040559c 00000000004054e4 0000000000000000 0000000000000000
[   20.304263] bfa0: 0000000000000000 0000ffffcc230ca0 0000000000402998 0000ffffcc230ca0
[   20.312139] bfc0: 000000000041c51c 0000000080000000 0000000000000003 000000000000001d
[   20.320016] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   20.327888] Call trace:
[   20.330358] Exception stack(0xffff8000a381b370 to 0xffff8000a381b4a0)
[   20.336821] b360:                                   000000003fffe000 0001000000000000
[   20.344698] b380: ffff8000a381b5a0 ffff200008269f94 00000000200001c5 0000000000000025
[   20.352575] b3a0: 0000000000000000 00000000a4c19f80 0000000041b58ab3 ffff2000094cc5c8
[   20.360452] b3c0: ffff200008081360 0000000000000003 ffff200009729b74 0000000000000008
[   20.368330] b3e0: ffff200008e2fec0 ffff200008e37000 ffff8000a381b400 ffff200008549b64
[   20.376208] b400: ffff8000a381b410 ffff20000811a1ec ffff8000a381b440 ffff20000811a860
[   20.384085] b420: ffff8000a381b430 ffff20000811a2cc ffff8000a381b470 ffff20000811aa60
[   20.391961] b440: 0000000000000002 ffff8000a375ef80 ffff8000bff628e0 0000000000000001
[   20.399838] b460: ffff8000a381b470 ffff20000811a988 dfff200000000000 1ffff00033fcc660
[   20.407715] b480: 0000000000000000 ffff200008889e88 ffff80019fe63300 0000000000000000
[   20.415595] [<ffff200008269f94>] __asan_load4+0x24/0xa0
[   20.420845] [<ffff200008889ec8>] regcache_flat_read+0x40/0x68
[   20.426618] [<ffff200008886ed4>] regcache_read+0x7c/0xa8
[   20.431955] [<ffff200008883908>] _regmap_read+0xd0/0x130
[   20.437292] [<ffff200008883fb8>] _regmap_update_bits+0x130/0x178
[   20.443322] [<ffff2000088859cc>] regmap_update_bits_base+0x84/0xd0
[   20.449532] [<ffff200008c5cad4>] snd_soc_component_update_bits+0x9c/0xf0
[   20.456256] [<ffff200008c4ed24>] dapm_power_widgets+0x8a4/0xd28
[   20.462199] [<ffff200008c4f264>] soc_dapm_mux_update_power.isra.29+0xbc/0xe0
[   20.469270] [<ffff200008c4f2f0>] snd_soc_dapm_mux_update_power+0x68/0xb0
[   20.475996] [<ffff200008c7bda0>] tegra210_xbar_put_value_enum+0x260/0x348
[   20.482809] [<ffff200008c1e9b4>] snd_ctl_elem_write+0x1cc/0x250
[   20.488751] [<ffff200008c1f190>] snd_ctl_ioctl+0x138/0x998
[   20.494263] [<ffff2000082abebc>] do_vfs_ioctl+0x134/0xa48
[   20.499684] [<ffff2000082ac85c>] SyS_ioctl+0x8c/0xa0
[   20.504675] [<ffff200008083730>] el0_svc_naked+0x24/0x28
[   20.510013] Code: d343fc01 aa0003e4 d2c40000 f2fbffe0 (78606822) 
[   20.516180] ---[ end trace 97433b67122c9a34 ]---

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 15:30     ` Jon Hunter
@ 2017-03-02 16:12       ` Robin Murphy
  2017-03-02 17:11         ` Mark Rutland
  2017-03-02 16:46       ` Mark Rutland
  1 sibling, 1 reply; 14+ messages in thread
From: Robin Murphy @ 2017-03-02 16:12 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/03/17 15:30, Jon Hunter wrote:
> Hi Mark,
> 
> On 02/03/17 12:35, Mark Rutland wrote:
>> On Thu, Mar 02, 2017 at 11:35:06AM +0000, Jon Hunter wrote:
>>> Hi Mark,
>>
>> Hi Jon,
>>
>>> On 03/01/17 18:27, Mark Rutland wrote:
>>>> Commit c02433dd6de32f04 ("arm64: split thread_info from task stack")
>>>> inverted the relationship between get_current() and
>>>> current_thread_info(), with sp_el0 now holding the current task_struct
>>>> rather than the current thead_info. The new implementation of
>>>> get_current() prevents the compiler from being able to optimize repeated
>>>> calls to either, resulting in a noticeable penalty in some
>>>> microbenchmarks.
>>>>
>>>> This patch restores the previous optimisation by implementing
>>>> get_current() in the same way as our old current_thread_info(), using a
>>>> non-volatile asm statement.
>>
>>>> +/*
>>>> + * We don't use read_sysreg() as we want the compiler to cache the value where
>>>> + * possible.
>>>> + */
>>>>  static __always_inline struct task_struct *get_current(void)
>>>>  {
>>>> -	return (struct task_struct *)read_sysreg(sp_el0);
>>>> +	unsigned long sp_el0;
>>>> +
>>>> +	asm ("mrs %0, sp_el0" : "=r" (sp_el0));
>>>> +
>>>> +	return (struct task_struct *)sp_el0;
>>>>  }
>>>>  
>>>>  #define current get_current()
>>
>>> I noticed that with v4.10 I am seeing the following panic ...
>>
>> Ouch. :(
>>
>> For reference, which toolchain are you using? This kind of code tends to be
>> toolchain-sensitive.
> 
> This is with Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4. I have also tried ...
> 
> gcc version 5.3.1 20160412 (Linaro GCC 5.3-2016.05) 
> gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11)
> 
> ... and see the same panic.
>  
>>> [  184.523390] Unable to handle kernel paging request at virtual address ffff8001bb7a2800

Notably, this is x4 + x23, where I'd bet on x4 being the address of
"cache", and x23 being the index, except that apparently the top half of
a pointer has somehow got in there instead - the stack contents at b9c8
and b9e8 also stand out in that regard. I'm wondering if the removal of
volatile means we get some stack access hoisted before an earlier
swizzling of current, the effect of which only makes itself known way
down the line.

The KASAN version below is also interesting in that the
reasonable-looking duff address is x0 + x1, but neither of those looks
like anything sane on their own.

Robin.

>>> [  184.531316] pgd = ffff8000b96b1000
>>> [  184.534711] [ffff8001bb7a2800] *pgd=0000000000000000
>>> [  184.539670] Internal error: Oops: 96000005 [#1] PREEMPT SMP
>>
>> That ESR_EL1 value decodes as a "Data Abort taken without a change in Exception
>> level", the DFSC decodes as "Translation fault, level 1", and WnR is clear.
>>
>> So we're blowing up on a read of a bogus address.
>>
>>> [  184.566458] PC is at regcache_flat_read+0x14/0x20
>>> [  184.571155] LR is at regcache_read+0x50/0x78
>>> [  184.575417] pc : [<ffff0000085d0c6c>] lr : [<ffff0000085cefa8>] pstate: 400001c5
>>
>> Judging by the PC, that read could be any of:
>>
>> * the read of map->cache at the start of regcache_flat_read()
>>
>> * an inlined regcache_get_index_by_order()'s read of map->reg_stride_order
>>
>> * the read of cache[regcache_flat_get_index(map, reg)]
>>
>> ... so it seems either map or map->cache is dodgy.
>>
>> If you're can addr2line that PC, that should tell us which access is
>> blowing up, and therefore which pointer is dodgy.
>>
>> We'll want the full output considering inlined functions, i.e.
>>
>> ${CROSS_COMPILE}addr2line -ife vmlinux 0xffff0000085d0c6c
> 
> This shows ...
> 
> regcache_flat_read
> /home/jonathanh/workdir/tegra/korg-linux.git/drivers/base/regmap/regcache-flat.c:60
>  
>>> [  184.582802] sp : ffff8000b964b970
>>> [  184.586108] x29: ffff8000b964b970 x28: ffff8000b9584800 
>>> [  184.591412] x27: ffff8000b964bcc8 x26: ffff8000b9461000 
>>> [  184.596716] x25: 0000000000000000 x24: 0000000000000000 
>>> [  184.602019] x23: 00000000ffff8000 x22: ffff8000b964ba1c 
>>> [  184.607322] x21: ffff8000b964ba1c x20: 00000000ffff8000 
>>> [  184.612626] x19: ffff8000bb7dc400 x18: 0000000000000000 
>>> [  184.617928] x17: 0000000000000001 x16: ffff0000081f79e8 
>>> [  184.623230] x15: 0000000000497000 x14: 0000000000000000 
>>> [  184.628532] x13: 0000000000000001 x12: 0000000005cc6000 
>>> [  184.633835] x11: 0000000000000000 x10: ffff8000bc16bf00 
>>> [  184.639138] x9 : 0000000000000000 x8 : 0000000000000000 
>>> [  184.644441] x7 : ffff8000bff68908 x6 : 0000000000000000 
>>> [  184.649742] x5 : ffff000008fc9f00 x4 : ffff8000bb7aa800 
>>> [  184.655044] x3 : 0000000000000002 x2 : ffff8000b964ba1c 
>>> [  184.660347] x1 : 000000003fffe000 x0 : 0000000000000000 
>>
>>> [  185.178203] [<ffff0000085d0c6c>] regcache_flat_read+0x14/0x20
>>> [  185.183939] [<ffff0000085cce60>] _regmap_read+0x98/0xe8
>>> [  185.189155] [<ffff0000085cd218>] _regmap_update_bits+0xa0/0xf0
>>> [  185.194978] [<ffff0000085ce1d8>] regmap_update_bits_base+0x60/0x90
>>> [  185.201152] [<ffff000008856c44>] snd_soc_component_update_bits+0x24/0x40
>>
>> AFAICT, these don't implicitly access current as part of generating the
>> map pointer, so the dodgy pointer must have been generated above this
>> level.
>>
>> At this level I can't see why current would be involved at all. Beyond this
>> point it's rather painful to follow the backtrace due to inlining.
>>
>>> [  185.207843] [<ffff00000884e7f4>] dapm_power_widgets+0x474/0x730
>>> [  185.213751] [<ffff00000884eb2c>] soc_dapm_mux_update_power.isra.29+0x7c/0xa0
>>> [  185.220787] [<ffff00000884eb9c>] snd_soc_dapm_mux_update_power+0x4c/0x88
>>> [  185.227479] [<ffff00000886bd04>] tegra210_xbar_put_value_enum+0x1b4/0x228
>>> [  185.234256] [<ffff000008830110>] snd_ctl_elem_write+0x110/0x188
>>> [  185.240165] [<ffff000008830610>] snd_ctl_ioctl+0xd0/0x798
>>> [  185.245557] [<ffff0000081f7354>] do_vfs_ioctl+0xa4/0x738
>>> [  185.250859] [<ffff0000081f7a74>] SyS_ioctl+0x8c/0xa0
>>> [  185.255818] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
>>> [  185.261121] Code: 52800000 b941c883 f9410084 1ac32421 (b8615881) 
>>> [  185.267223] ---[ end trace 5f6a6332822eca30 ]---
>>>
>>> Bisecting the panic ends up at this patch and reverting it on top of v4.10 prevents this from
>>> occurring. 
>>>
>>> The occurs when I start playing audio on Tegra210 using tinymix. I do have some out-of-tree
>>> patches for Tegra audio that I am using when seeing this but I have been using those for
>>> probably a year or so, as I am gradually upstreaming bits.
>>>
>>> I am a bit flummoxed by the above, any thoughts?
>>
>> Likewise. :/
>>
>> It could just be that this happens to change the alignment/size of things, and
>> unmasks a latent bug. Possibly, the removal of volatile has allowed some code
>> to be reordered, highlighting missing barriers/synchronisation.
>>
>> Maybe we are generating current wrong in some case, though I can't see how, and
>> this is the only such report I've seen.
>>
>> If the commit in question is resulting in get_current() behaving differently,
>> it *might* be possible to detect with the hack below. I haven't seen it blow up
>> on my test systems.
> 
> Unfortunately, that did not catch it :-(
>  
>> Otherwise, it might be worth giving KASAN a go; that might detect data
>> corruption. If you have a recent enough toolchain, you only need enable
>> CONFIG_KASAN. This will make your kernel Image a fair amount larger.
> 
> I enabled this with gcc 6.2.1 but now the PC is at __asan_load4 ...
> 
> [   19.516956] Unable to handle kernel paging request at virtual address ffff100033fcc660
> [   19.524940] pgd = ffff80009c4c8000
> [   19.528365] [ffff100033fcc660] *pgd=0000000000000000
> [   19.533357] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> [   19.538949] Modules linked in:
> [   19.542033] CPU: 1 PID: 1465 Comm: tinymix Not tainted 4.10.0-00018-g0db5ca31acab #3
> [   19.549822] Hardware name: Google Pixel C (DT)
> [   19.554289] task: ffff8000a47e0d00 task.stack: ffff8000a3818000
> [   19.560239] PC is at __asan_load4+0x24/0xa0
> [   19.564450] LR is at regcache_flat_read+0x40/0x68
> [   19.569176] pc : [<ffff200008269f94>] lr : [<ffff200008889ec8>] pstate: 200001c5
> [   19.576616] sp : ffff8000a381b5a0
> [   19.579951] x29: ffff8000a381b5a0 x28: ffff2000092a4240 
> [   19.585288] x27: 0000000000000000 x26: 00000000a4c19f80 
> [   19.590624] x25: 0000000000000000 x24: 00000000ffff8000 
> [   19.595960] x23: ffff8000a381b6c0 x22: ffff80009fe6b300 
> [   19.601295] x21: ffff8000a381b6c0 x20: ffff80009f821b00 
> [   19.606632] x19: 000000003fffe000 x18: 0000000000000000 
> [   19.611967] x17: 0000000000000001 x16: ffff2000082ac7d0 
> [   19.617302] x15: 0000000000497000 x14: ffff200008c4f2f0 
> [   19.622637] x13: ffff200008c4f264 x12: ffffffffffffffff 
> [   19.627972] x11: 0000000000000040 x10: 0000000000000870 
> [   19.633307] x9 : ffff8000a381b5a0 x8 : 00000000f4f4f404 
> [   19.638642] x7 : ffff1000147036d4 x6 : 00000000f3f3f3f3 
> [   19.643976] x5 : 0000000000000000 x4 : ffff80019fe63300 
> [   19.649312] x3 : ffff200008889e88 x2 : 0000000000000000 
> [   19.654646] x1 : 1ffff00033fcc660 x0 : dfff200000000000 
> [   19.659979] 
> [   19.661494] Process tinymix (pid: 1465, stack limit = 0xffff8000a3818000)
> [   19.668304] Stack: (0xffff8000a381b5a0 to 0xffff8000a381c000)
> [   19.674077] b5a0: ffff8000a381b5b0 ffff200008889ec8 ffff8000a381b5e0 ffff200008886ed4
> [   19.681955] b5c0: ffff80009f821b00 ffff200009b08580 00000000ffff8000 ffff8000a381b6c0
> [   19.689834] b5e0: ffff8000a381b610 ffff200008883908 ffff80009f821b00 ffff80009f821b00
> [   19.697711] b600: 00000000ffff8000 ffff80009f821ced ffff8000a381b650 ffff200008883fb8
> [   19.705590] b620: 1ffff000147036d4 ffff8000a381b7c0 ffff80009f821b00 0000000000000000
> [   19.713467] b640: 00000000ffff8000 ffff200008883f14 ffff8000a381b700 ffff2000088859cc
> [   19.721344] b660: ffff80009f821b00 ffff80009f821b30 ffff80009f821bb0 0000000000000000
> [   19.729222] b680: 00000000ffff8000 00000000a4c19f80 00000000ffff8000 ffff8000a381b7c0
> [   19.737100] b6a0: 0000000041b58ab3 ffff2000094ee250 ffff200008883e88 ffff80009f821b30
> [   19.744979] b6c0: ffff200008880c50 0000000000000000 ffff8000a381b6e0 ffff200008880c70
> [   19.752856] b6e0: ffff8000a381b700 ffff2000088859a0 ffff8000a381b700 ffff2000088859ac
> [   19.760733] b700: ffff8000a381b760 ffff200008c5cad4 1ffff000147036f4 ffff80009ffe5bc0
> [   19.768610] b720: 00000000ffff8000 00000000a4c19f80 00000000ffff8000 ffff80009ec442a8
> [   19.776489] b740: ffff8000a381bae0 ffff200009c69f80 0000000000000000 ffff200008c5cab0
> [   19.784366] b760: ffff8000a381b800 ffff200008c4ed24 ffff80009ee6de00 ffff80009ec44280
> [   19.792243] b780: ffff200009c6a168 ffff200009c6a228 ffff200009c6a198 ffff8000a381b8f0
> [   19.800120] b7a0: 0000000041b58ab3 ffff20000953f560 ffff200008c5ca38 ffff200008c4c2f0
> [   19.807997] b7c0: ffff8000a381b700 ffff8000a381b7c0 ffff200009c6a198 ffff80009f29d500
> [   19.815875] b7e0: ffff200009c6a148 ffff200009c69f80 ffff8000a381b800 ffff200008c4ed0c
> [   19.823754] b800: ffff8000a381b970 ffff200008c4f264 ffff80009ee6dcc8 ffff20000931e4a0
> [   19.831630] b820: 0000000000000008 ffff200009c5c990 ffff80009ee8de00 ffff200009c69f80
> [   19.839507] b840: ffff2000092f0d60 0000000000000002 ffff80009ee8de00 0000000000000028
> [   19.847384] b860: 1ffff00014703712 ffff2000ffff8000 ffff8000a4c19f80 ffff80009ffe5bc0
> [   19.855261] b880: ffff2000092a3000 0000000009b08580 0000000041b58ab3 ffff20000953f2c0
> [   19.863138] b8a0: ffff200008c4e480 ffff200008883908 ffff80009ed9c110 ffff80009ed9c110
> [   19.871015] b8c0: ffff8000a381b8f0 ffff200008880ca4 ffff80009f821b00 0000000000000000
> [   19.878892] b8e0: ffff80009f821b30 ffff200008880c98 ffff8000a381b8f0 ffff8000a381b8f0
> [   19.886769] b900: ffff8000a381b930 ffff200008c4be90 ffff80009ecabc00 ffff80009ec443a0
> [   19.894647] b920: ffff80009ec44280 ffff200008c4be64 ffff8000a381b930 ffff8000a381b930
> [   19.902524] b940: ffff80009ecabc00 ffff20000931e4a0 0000000000000008 ffff200009c5c990
> [   19.910403] b960: ffff8000a381b970 ffff200008c4f240 ffff8000a381b9c0 ffff200008c4f2f0
> [   19.918281] b980: ffff200009c69f80 ffff8000a381bae0 ffff200009c69fd0 ffff200009c6a210
> [   19.926158] b9a0: ffff80009ee8de00 0000000000000001 ffff200009c5c980 ffff200008c4f2d8
> [   19.934036] b9c0: ffff8000a381ba10 ffff200008c7bda0 ffff8000a381bb58 ffff2000092f0bf4
> [   19.941912] b9e0: ffff8000a381bb08 00000000ff1cf313 0000000000000002 ffff200009c5c980
> [   19.949790] ba00: 0000000000000001 ffff200008c7bd80 ffff8000a381bb80 ffff200008c1e9b4
> [   19.957667] ba20: ffff8000a3e71100 ffff80009ee8de00 1ffff0001470377c 0000000000000055
> [   19.965546] ba40: ffff80009ee8de00 ffff80009f29d500 ffff80009f29d9a0 ffff8000a441f200
> [   19.973423] ba60: 0000000000000000 ffff8000a47e0d00 1ffff00014703760 0000000000000050
> [   19.981301] ba80: 0000000000000001 0000000000000000 1ffff00014703758 ffff8000a3e71100
> [   19.989178] baa0: ffff8000a3e71148 ffff80009ffe5ca0 ffff80009ffe5b80 0000000300000000
> [   19.997056] bac0: 0000000041b58ab3 ffff2000095421a0 ffff200008c7bb40 ffff8000a441f210
> [   20.004933] bae0: ffff80009ee8de00 3f30031f00000240 ffff800000000000 ffff8000a4c19f80
> [   20.012810] bb00: 0000000041b58ab3 ffff20000953e4f0 ff1cf31300000440 ffff200000000000
> [   20.020688] bb20: ffff8000a381bb80 ffff200008c1e86c ffff8000a3e71100 0f1f03ff00000040
> [   20.028564] bb40: 1ffff00000000001 0000000000000000 0000ffffcc230cc8 ffff80009f29d500
> [   20.036443] bb60: ffff80009f29d9a0 ffff8000a441f200 ffff8000a381bb80 ffff200008c1e98c
> [   20.044321] bb80: ffff8000a381bc60 ffff200008c1f190 1ffff00014703798 ffff8000a3e71100
> [   20.052197] bba0: ffff8000a441f200 0000000000000000 0000ffffcc230cc8 ffff80009f29d500
> [   20.060076] bbc0: ffff80009f29dd70 000000000000001d ffff200008e14000 ffff200008213d04
> [   20.067953] bbe0: 0000000041b58ab3 ffff20000953e4c0 ffff200008c1e7e8 0000ffffcc230cc8
> [   20.075830] bc00: 0000ffffcc230cc8 ffff80009ef4aec0 ffff80009f29d500 000000000000001d
> [   20.083707] bc20: ffff8000a381bc30 ffff200008213d2c ffff8000a381bc60 ffff200008c1f148
> [   20.091583] bc40: 1ffff00014703798 00000000c4c85513 ffff8000a381bc60 ffff200008c1f16c
> [   20.099461] bc60: ffff8000a381bd40 ffff2000082abebc 1ffff000147037b4 00000000c4c85513
> [   20.107339] bc80: ffff80009cdbfb80 0000ffffcc230cc8 ffff20000929b0a0 ffff80009ef4aec0
> [   20.115216] bca0: 0000000000000123 000000000000001d ffff200008e14000 014000c000000055
> [   20.123094] bcc0: 0000000041b58ab3 ffff20000953e4c0 ffff200008c1f058 0000000000000000
> [   20.130970] bce0: 0000000000000000 0000000000000000 0000000000000000 ffff80009c455a40
> [   20.138847] bd00: ffff7e0002711570 0000000000000000 ffff8000a47e0d00 ffff8000a381bec0
> [   20.146725] bd20: ffff8000a381bd30 ffff20000809f5d8 ffff8000a381bd40 ffff2000082abea4
> [   20.154602] bd40: ffff8000a381be80 ffff2000082ac85c 0000000000000000 ffff80009cdbfb80
> [   20.162479] bd60: ffff80009cdbfb80 0000000000000003 00000000c4c85513 0000ffffcc230cc8
> [   20.170355] bd80: 0000000000000123 000000000000001d ffff200008e14000 ffff8000a47e0d00
> [   20.178232] bda0: 0000000041b58ab3 ffff2000094dd808 ffff2000082abd88 ffff20000808336c
> [   20.186109] bdc0: 0000000000000000 00006000b6877000 ffffffffffffffff 0000000000415230
> [   20.193986] bde0: 0000000060000000 0000000000000024 0000000092000047 000000000a148018
> [   20.201863] be00: 0000000041b58ab3 ffff2000094cc5c8 ffff200008081360 ffff2000094da138
> [   20.209741] be20: ffff200008239b00 ffff80009e48b0f0 ffff8000a381be40 ffff2000082bd2d4
> [   20.217617] be40: ffff8000a381be80 ffff2000082ac810 0000000000000000 ffff80009cdbfb80
> [   20.225494] be60: ffff80009cdbfb80 0000000000000003 00000000c4c85513 ffff2000082ac7f4
> [   20.233370] be80: 0000000000000000 ffff200008083730 0000000000000000 00006000b6877000
> [   20.241247] bea0: ffffffffffffffff 000000000041c51c 0000000080000000 0000000000000015
> [   20.249125] bec0: 0000000000000003 00000000c4c85513 0000ffffcc230cc8 0000000000000010
> [   20.257002] bee0: fffffffffffffff0 0000000000000040 000000000000003f 0000000000000000
> [   20.264879] bf00: 000000000000001d 0000000000000004 0101010101010101 0000000000000005
> [   20.272756] bf20: ffffffffffffffff 0000000000499000 0000000000499000 0000000000497000
> [   20.280634] bf40: 0000ffffcc231528 0000000000000001 0000000000000000 00000000004001a0
> [   20.288510] bf60: 0000000000000000 00000000004001a0 0000000000000000 0000000000000000
> [   20.296386] bf80: 000000000040559c 00000000004054e4 0000000000000000 0000000000000000
> [   20.304263] bfa0: 0000000000000000 0000ffffcc230ca0 0000000000402998 0000ffffcc230ca0
> [   20.312139] bfc0: 000000000041c51c 0000000080000000 0000000000000003 000000000000001d
> [   20.320016] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [   20.327888] Call trace:
> [   20.330358] Exception stack(0xffff8000a381b370 to 0xffff8000a381b4a0)
> [   20.336821] b360:                                   000000003fffe000 0001000000000000
> [   20.344698] b380: ffff8000a381b5a0 ffff200008269f94 00000000200001c5 0000000000000025
> [   20.352575] b3a0: 0000000000000000 00000000a4c19f80 0000000041b58ab3 ffff2000094cc5c8
> [   20.360452] b3c0: ffff200008081360 0000000000000003 ffff200009729b74 0000000000000008
> [   20.368330] b3e0: ffff200008e2fec0 ffff200008e37000 ffff8000a381b400 ffff200008549b64
> [   20.376208] b400: ffff8000a381b410 ffff20000811a1ec ffff8000a381b440 ffff20000811a860
> [   20.384085] b420: ffff8000a381b430 ffff20000811a2cc ffff8000a381b470 ffff20000811aa60
> [   20.391961] b440: 0000000000000002 ffff8000a375ef80 ffff8000bff628e0 0000000000000001
> [   20.399838] b460: ffff8000a381b470 ffff20000811a988 dfff200000000000 1ffff00033fcc660
> [   20.407715] b480: 0000000000000000 ffff200008889e88 ffff80019fe63300 0000000000000000
> [   20.415595] [<ffff200008269f94>] __asan_load4+0x24/0xa0
> [   20.420845] [<ffff200008889ec8>] regcache_flat_read+0x40/0x68
> [   20.426618] [<ffff200008886ed4>] regcache_read+0x7c/0xa8
> [   20.431955] [<ffff200008883908>] _regmap_read+0xd0/0x130
> [   20.437292] [<ffff200008883fb8>] _regmap_update_bits+0x130/0x178
> [   20.443322] [<ffff2000088859cc>] regmap_update_bits_base+0x84/0xd0
> [   20.449532] [<ffff200008c5cad4>] snd_soc_component_update_bits+0x9c/0xf0
> [   20.456256] [<ffff200008c4ed24>] dapm_power_widgets+0x8a4/0xd28
> [   20.462199] [<ffff200008c4f264>] soc_dapm_mux_update_power.isra.29+0xbc/0xe0
> [   20.469270] [<ffff200008c4f2f0>] snd_soc_dapm_mux_update_power+0x68/0xb0
> [   20.475996] [<ffff200008c7bda0>] tegra210_xbar_put_value_enum+0x260/0x348
> [   20.482809] [<ffff200008c1e9b4>] snd_ctl_elem_write+0x1cc/0x250
> [   20.488751] [<ffff200008c1f190>] snd_ctl_ioctl+0x138/0x998
> [   20.494263] [<ffff2000082abebc>] do_vfs_ioctl+0x134/0xa48
> [   20.499684] [<ffff2000082ac85c>] SyS_ioctl+0x8c/0xa0
> [   20.504675] [<ffff200008083730>] el0_svc_naked+0x24/0x28
> [   20.510013] Code: d343fc01 aa0003e4 d2c40000 f2fbffe0 (78606822) 
> [   20.516180] ---[ end trace 97433b67122c9a34 ]---
> 
> Cheers
> Jon
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 15:30     ` Jon Hunter
  2017-03-02 16:12       ` Robin Murphy
@ 2017-03-02 16:46       ` Mark Rutland
  2017-03-03 15:32         ` Jon Hunter
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2017-03-02 16:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 02, 2017 at 03:30:26PM +0000, Jon Hunter wrote:
> On 02/03/17 12:35, Mark Rutland wrote:
> > On Thu, Mar 02, 2017 at 11:35:06AM +0000, Jon Hunter wrote:
> >> On 03/01/17 18:27, Mark Rutland wrote:

> >> I noticed that with v4.10 I am seeing the following panic ...

> This is with Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4. I have also tried ...
> 
> gcc version 5.3.1 20160412 (Linaro GCC 5.3-2016.05) 
> gcc version 6.2.1 20161016 (Linaro GCC 6.2-2016.11)
> 
> ... and see the same panic.

So far I've been testing with the Linaro 15.08 toolchain; I'll give
these a go.

[...]

> >> [  184.566458] PC is at regcache_flat_read+0x14/0x20
> >> [  184.571155] LR is at regcache_read+0x50/0x78
> >> [  184.575417] pc : [<ffff0000085d0c6c>] lr : [<ffff0000085cefa8>] pstate: 400001c5
> > 
> > Judging by the PC, that read could be any of:
> > 
> > * the read of map->cache at the start of regcache_flat_read()
> > 
> > * an inlined regcache_get_index_by_order()'s read of map->reg_stride_order
> > 
> > * the read of cache[regcache_flat_get_index(map, reg)]
> > 
> > ... so it seems either map or map->cache is dodgy.
> > 
> > If you're can addr2line that PC, that should tell us which access is
> > blowing up, and therefore which pointer is dodgy.
> > 
> > We'll want the full output considering inlined functions, i.e.
> > 
> > ${CROSS_COMPILE}addr2line -ife vmlinux 0xffff0000085d0c6c
> 
> This shows ...
> 
> regcache_flat_read
> /home/jonathanh/workdir/tegra/korg-linux.git/drivers/base/regmap/regcache-flat.c:60

Ok, so it appears that either map->cache is dodgy, or the reg passed to
regcache_flat_read() is bogus, and we end up generating a cache index
that is well out of bounds.

Perhaps a caller is deriving that from an uninitialised variable? The
code and stack layout changes as a result of my patch could easily
tickle that.

I'm not at all familiar with regmap, but IIUC we can catch that with:

---->8----
diff --git a/drivers/base/regmap/regcache-flat.c b/drivers/base/regmap/regcache-flat.c
index 4d2e50b..2d42c01 100644
--- a/drivers/base/regmap/regcache-flat.c
+++ b/drivers/base/regmap/regcache-flat.c
@@ -57,6 +57,8 @@ static int regcache_flat_read(struct regmap *map,
 {
        unsigned int *cache = map->cache;
 
+       BUG_ON(reg > map->max_register);
+
        *value = cache[regcache_flat_get_index(map, reg)];
 
        return 0;
---->8----

> > If the commit in question is resulting in get_current() behaving differently,
> > it *might* be possible to detect with the hack below. I haven't seen it blow up
> > on my test systems.
> 
> Unfortunately, that did not catch it :-(

That would seem to imply that get_current() is returning the right value.

So, I'd suspect that we've uncovered a latent bug elsewhere.

If the reg index isn't out-of-bounds, it would imply that either the
regmap data is getting corrupted somewhere, or it isn't initialised
correctly.

Can we log when the regmap and regcache in question are initialised,
dumping their addresses? That way we can see if they're getting
clobbered behind our backs.

> > Otherwise, it might be worth giving KASAN a go; that might detect data
> > corruption. If you have a recent enough toolchain, you only need enable
> > CONFIG_KASAN. This will make your kernel Image a fair amount larger.
> 
> I enabled this with gcc 6.2.1 but now the PC is at __asan_load4 ...

That's expected if the dodgy address is sufficiently dodgy, so we can ignore
this. KASAN has a shadow of all mapped memory, and if you try to access
something that falls outside of mapped memory it can also fall outside
of the mapped shadow.

KASAN didn't detect anything before that, so it's unlikely to help us
here.

Thanks,
Mark.

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 16:12       ` Robin Murphy
@ 2017-03-02 17:11         ` Mark Rutland
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Rutland @ 2017-03-02 17:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 02, 2017 at 04:12:08PM +0000, Robin Murphy wrote:
> On 02/03/17 15:30, Jon Hunter wrote:
> > On 02/03/17 12:35, Mark Rutland wrote:
> >> On Thu, Mar 02, 2017 at 11:35:06AM +0000, Jon Hunter wrote:

> >>> [  184.523390] Unable to handle kernel paging request at virtual address ffff8001bb7a2800
> 
> Notably, this is x4 + x23, where I'd bet on x4 being the address of
> "cache", and x23 being the index, except that apparently the top half of
> a pointer has somehow got in there instead - the stack contents at b9c8
> and b9e8 also stand out in that regard.

Indeed.

This could be indicative of an uninitialise reg value being passed in
from above, assuming map->reg_stride_order is 0.

> I'm wondering if the removal of volatile means we get some stack
> access hoisted before an earlier swizzling of current, the effect of
> which only makes itself known way down the line.

I don't see how that can happen, as current is never swizzled from the
PoV of the thread.

We switch it in assembly, in cpu_switch_to(), along with the other regs.
It's also initialsied in assembly, so at no point should C code be able
to observe a stale value.

> The KASAN version below is also interesting in that the
> reasonable-looking duff address is x0 + x1, but neither of those looks
> like anything sane on their own.

This is just an edge-case of KASAN. Anything that's outside of a mapped
area can also fall outside of the mapped shadow for that area.

> >>> [  184.582802] sp : ffff8000b964b970
> >>> [  184.586108] x29: ffff8000b964b970 x28: ffff8000b9584800 
> >>> [  184.591412] x27: ffff8000b964bcc8 x26: ffff8000b9461000 
> >>> [  184.596716] x25: 0000000000000000 x24: 0000000000000000 
> >>> [  184.602019] x23: 00000000ffff8000 x22: ffff8000b964ba1c 
> >>> [  184.607322] x21: ffff8000b964ba1c x20: 00000000ffff8000 
> >>> [  184.612626] x19: ffff8000bb7dc400 x18: 0000000000000000 
> >>> [  184.617928] x17: 0000000000000001 x16: ffff0000081f79e8 
> >>> [  184.623230] x15: 0000000000497000 x14: 0000000000000000 
> >>> [  184.628532] x13: 0000000000000001 x12: 0000000005cc6000 
> >>> [  184.633835] x11: 0000000000000000 x10: ffff8000bc16bf00 
> >>> [  184.639138] x9 : 0000000000000000 x8 : 0000000000000000 
> >>> [  184.644441] x7 : ffff8000bff68908 x6 : 0000000000000000 
> >>> [  184.649742] x5 : ffff000008fc9f00 x4 : ffff8000bb7aa800 
> >>> [  184.655044] x3 : 0000000000000002 x2 : ffff8000b964ba1c 
> >>> [  184.660347] x1 : 000000003fffe000 x0 : 0000000000000000 

> >> If the commit in question is resulting in get_current() behaving differently,
> >> it *might* be possible to detect with the hack below. I haven't seen it blow up
> >> on my test systems.
> > 
> > Unfortunately, that did not catch it :-(

Just to check, did it still blow up with that patch applied, or did it
function without any warning?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-02 16:46       ` Mark Rutland
@ 2017-03-03 15:32         ` Jon Hunter
  2017-03-03 19:48           ` Mark Rutland
  0 siblings, 1 reply; 14+ messages in thread
From: Jon Hunter @ 2017-03-03 15:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

On 02/03/17 16:46, Mark Rutland wrote:

...

> Ok, so it appears that either map->cache is dodgy, or the reg passed to
> regcache_flat_read() is bogus, and we end up generating a cache index
> that is well out of bounds.
> 
> Perhaps a caller is deriving that from an uninitialised variable? The
> code and stack layout changes as a result of my patch could easily
> tickle that.
> 
> I'm not at all familiar with regmap, but IIUC we can catch that with:
> 
> ---->8----
> diff --git a/drivers/base/regmap/regcache-flat.c b/drivers/base/regmap/regcache-flat.c
> index 4d2e50b..2d42c01 100644
> --- a/drivers/base/regmap/regcache-flat.c
> +++ b/drivers/base/regmap/regcache-flat.c
> @@ -57,6 +57,8 @@ static int regcache_flat_read(struct regmap *map,
>  {
>         unsigned int *cache = map->cache;
>  
> +       BUG_ON(reg > map->max_register);
> +
>         *value = cache[regcache_flat_get_index(map, reg)];
>  
>         return 0;
> ---->8----

You are correct, the reg passed to regcache_flat_read() is indeed dodgy
and the above BUG() did trip it up.

Commit e411b0b5eb9b ('ASoC: dapm: Support second register for DAPM
control updates') introduced a 2nd register set into the
snd_soc_dapm_update struct and if the 'has_second_set' is true then it
tries to access this register. One of our out-of-tree audio patches was
not initialising the 'has_second_set' and hence it was trying to write a
bogus register (need to get this upstream!).

I should have caught this, but the bisect threw me off the scent! I
think that with reverting this patch we were just getting lucky and the
problem could have still occurred. I went back and tried it again and it
still works when reverting this change, but its just luck.

Sorry for the noise and thanks for pointing me in the right direction!

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] arm64: restore get_current() optimisation
  2017-03-03 15:32         ` Jon Hunter
@ 2017-03-03 19:48           ` Mark Rutland
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Rutland @ 2017-03-03 19:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Mar 03, 2017 at 03:32:22PM +0000, Jon Hunter wrote:
> Hi Mark,
> 
> On 02/03/17 16:46, Mark Rutland wrote:

> > Perhaps a caller is deriving that from an uninitialised variable? The
> > code and stack layout changes as a result of my patch could easily
> > tickle that.

> You are correct, the reg passed to regcache_flat_read() is indeed dodgy
> and the above BUG() did trip it up.
> 
> Commit e411b0b5eb9b ('ASoC: dapm: Support second register for DAPM
> control updates') introduced a 2nd register set into the
> snd_soc_dapm_update struct and if the 'has_second_set' is true then it
> tries to access this register. One of our out-of-tree audio patches was
> not initialising the 'has_second_set' and hence it was trying to write a
> bogus register (need to get this upstream!).

Phew. That's get_current() off the hook!

> I should have caught this, but the bisect threw me off the scent! I
> think that with reverting this patch we were just getting lucky and the
> problem could have still occurred. I went back and tried it again and it
> still works when reverting this change, but its just luck.
> 
> Sorry for the noise and thanks for pointing me in the right direction!

No worries; it's always a pain to debug this sort of thing, especially
with a bisect seeming so reliable.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-03-03 19:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-03 18:27 [PATCH] arm64: restore get_current() optimisation Mark Rutland
2017-01-04 15:23 ` Will Deacon
2017-03-02 11:35 ` Jon Hunter
2017-03-02 12:35   ` Mark Rutland
2017-03-02 15:30     ` Jon Hunter
2017-03-02 16:12       ` Robin Murphy
2017-03-02 17:11         ` Mark Rutland
2017-03-02 16:46       ` Mark Rutland
2017-03-03 15:32         ` Jon Hunter
2017-03-03 19:48           ` Mark Rutland
2017-03-02 11:54 ` Andreas Färber
2017-03-02 12:40   ` Mark Rutland
2017-03-02 12:43     ` Andreas Färber
2017-03-02 13:37       ` Mark Rutland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.