* [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64)
@ 2017-02-24 0:52 Hamish Martin
2017-02-24 0:52 ` [PATCH v2 1/2] powerpc: Move THREAD_SHIFT config to KConfig Hamish Martin
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Hamish Martin @ 2017-02-24 0:52 UTC (permalink / raw)
To: benh, paulus, mpe; +Cc: chris.packham, linuxppc-dev, Hamish Martin
This patch series adds the ability to configure the THREAD_SHIFT value and
thereby alter the stack size on powerpc systems. We are particularly interested
in configuring for a 32k stack on PPC64.
Using an NXP T2081 (e6500 PPC64 cores) we are observing stack overflows as a
result of applying a DTS overlay containing some I2C devices. Our scenario is
an ethernet switch chassis with plug-in cards. The I2C is driven from the T2081
through a PCA9548 mux on the main board. When we detect insertion of the plugin
card we schedule work for a call to of_overlay_create() to install a DTS
overlay for the plugin board. This DTS overlay contains a further PCA9548 mux
with more devices hanging off it including a PCA9539 GPIO expander. The
ultimate installed I2C tree is:
T2081 --- PCA9548 MUX --- PCA9548 MUX --- PCA9539 GPIO Expander
When we install the overlay the devices described in the overlay are probed and
we see a large number of stack frames used as a result. If this is coupled with
an interrupt happening that requires moderate to high stack use we observe
stack corruption. Here is an example long stack (from a 4.10-rc8 kernel) that
does not show corruption but does demonstrate the length and frame sizes
involved.
Depth Size Location (72 entries)
----- ---- --------
0) 13872 128 .__raise_softirq_irqoff+0x1c/0x130
1) 13744 144 .raise_softirq+0x30/0x70
2) 13600 112 .invoke_rcu_core+0x54/0x70
3) 13488 336 .rcu_check_callbacks+0x294/0xde0
4) 13152 128 .update_process_times+0x40/0x90
5) 13024 144 .tick_sched_handle.isra.16+0x40/0xb0
6) 12880 144 .tick_sched_timer+0x6c/0xe0
7) 12736 272 .__hrtimer_run_queues+0x1a0/0x4b0
8) 12464 208 .hrtimer_interrupt+0xe8/0x2a0
9) 12256 160 .__timer_interrupt+0xdc/0x330
10) 12096 160 .timer_interrupt+0x138/0x190
11) 11936 752 exc_0x900_common+0xe0/0xe4
12) 11184 128 .ftrace_ops_no_ops+0x11c/0x230
13) 11056 176 .ftrace_ops_test.isra.12+0x30/0x50
14) 10880 160 .ftrace_ops_no_ops+0xd4/0x230
15) 10720 112 ftrace_call+0x4/0x8
16) 10608 176 .lock_timer_base+0x3c/0xf0
17) 10432 144 .try_to_del_timer_sync+0x2c/0x90
18) 10288 128 .del_timer_sync+0x60/0x80
19) 10160 256 .schedule_timeout+0x1fc/0x490
20) 9904 208 .i2c_wait+0x238/0x290
21) 9696 256 .mpc_xfer+0x4e4/0x570
22) 9440 208 .__i2c_transfer+0x158/0x6d0
23) 9232 192 .pca954x_reg_write+0x70/0x110
24) 9040 160 .__i2c_mux_master_xfer+0xb4/0xf0
25) 8880 208 .__i2c_transfer+0x158/0x6d0
26) 8672 192 .pca954x_reg_write+0x70/0x110
27) 8480 144 .pca954x_select_chan+0x68/0xa0
28) 8336 160 .__i2c_mux_master_xfer+0x64/0xf0
29) 8176 208 .__i2c_transfer+0x158/0x6d0
30) 7968 144 .i2c_transfer+0x98/0x130
31) 7824 320 .i2c_smbus_xfer_emulated+0x168/0x600
32) 7504 208 .i2c_smbus_xfer+0x1c0/0x5d0
33) 7296 192 .i2c_smbus_write_byte_data+0x50/0x70
34) 7104 144 .pca953x_write_single+0x6c/0xe0
35) 6960 192 .pca953x_gpio_direction_output+0xa4/0x160
36) 6768 160 ._gpiod_direction_output_raw+0xec/0x460
37) 6608 160 .gpiod_hog+0x98/0x250
38) 6448 176 .of_gpiochip_add+0xdc/0x1c0
39) 6272 256 .gpiochip_add_data+0x4f4/0x8c0
40) 6016 144 .devm_gpiochip_add_data+0x64/0xf0
41) 5872 208 .pca953x_probe+0x2b4/0x5f0
42) 5664 144 .i2c_device_probe+0x224/0x2e0
43) 5520 160 .really_probe+0x244/0x380
44) 5360 160 .bus_for_each_drv+0x94/0x100
45) 5200 160 .__device_attach+0x118/0x160
46) 5040 144 .bus_probe_device+0xe8/0x100
47) 4896 208 .device_add+0x500/0x6c0
48) 4688 144 .i2c_new_device+0x1f8/0x240
49) 4544 256 .of_i2c_register_device+0x160/0x280
50) 4288 192 .i2c_register_adapter+0x238/0x630
51) 4096 208 .i2c_mux_add_adapter+0x3f8/0x540
52) 3888 192 .pca954x_probe+0x234/0x370
53) 3696 144 .i2c_device_probe+0x224/0x2e0
54) 3552 160 .really_probe+0x244/0x380
55) 3392 160 .bus_for_each_drv+0x94/0x100
56) 3232 160 .__device_attach+0x118/0x160
57) 3072 144 .bus_probe_device+0xe8/0x100
58) 2928 208 .device_add+0x500/0x6c0
59) 2720 144 .i2c_new_device+0x1f8/0x240
60) 2576 256 .of_i2c_register_device+0x160/0x280
61) 2320 144 .of_i2c_notify+0x12c/0x1d0
62) 2176 160 .notifier_call_chain+0x8c/0x100
63) 2016 160 .__blocking_notifier_call_chain+0x6c/0xe0
64) 1856 208 .__of_changeset_entry_notify+0xd8/0x140
65) 1648 192 .__of_changeset_apply+0x7c/0x100
66) 1456 272 .of_overlay_create+0x2e0/0x4b0
67) 1184 128 .xem2_install_overlay+0x40/0x90
68) 1056 176 .process_one_work+0x18c/0x540
69) 880 240 .worker_thread+0x98/0x550
70) 640 192 .kthread+0x150/0x190
71) 448 448 .ret_from_kernel_thread+0x58/0x64
13872
Obviously this could be avoided by constant whack-a-mole type activity of
restructuring code. We have in fact reworked our code from a two overlay
install to a one overlay install to avoid the worst cases. However, we believe
there is a more fundamental issue at the heart of the problem that ought to be
addressed.
In this thread from 2008 (https://lkml.org/lkml/2008/11/17/493) discussing
similar issues it is observed that the minimum stack frame size for PPC64 is
112 bytes compared with 16 bytes for PPC32. We consider that this fact means
the straight doubling of the 8k PPC32 stack to 16K for PPC64 does not lead to
an "equitable" situation with regard to stack headroom. The PPC64 system will
not have an equivalent amount of space to operate in.
For instance for a 70 frame stack, the architecture overhead just for the stack
frames is:
70 * 16 bytes = 1120 bytes for PPC32, and
70 * 112 bytes = 7840 bytes for PPC64.
So a simple doubling of the PPC32 stack size leaves us with a shortfall of 5600
bytes (7840 - (2 * 1120)). In the example the stack frame overhead for PPC32 is
1120/8192 = 13.5% of the stack space, whereas for PPC64 it is 7840/16384 =
47.8% of the space.
The aim of this series is to provide the ability for users to configure for
larger stacks without altering the defaults in a way that would impact existing
users. However, given the inequity between the PPC32 and PPC64 stacks when
taking into account the respective minimum stack frame sizes, we believe
consideration should be given to having a large default. We would appreciate
any input or opinions on this issue.
Changes:
v2:
* make THREAD_SHIFT dependent on EXPERT as suggested by Michael Ellerman.
Hamish Martin (2):
powerpc: Move THREAD_SHIFT config to KConfig
powerpc64: Allow for THREAD_SIZE > 16k
arch/powerpc/Kconfig | 10 ++++++++++
arch/powerpc/include/asm/thread_info.h | 10 +---------
arch/powerpc/kernel/head_64.S | 3 ++-
3 files changed, 13 insertions(+), 10 deletions(-)
--
2.11.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/2] powerpc: Move THREAD_SHIFT config to KConfig
2017-02-24 0:52 [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64) Hamish Martin
@ 2017-02-24 0:52 ` Hamish Martin
2017-03-21 11:36 ` [v2,1/2] " Michael Ellerman
2017-02-24 0:52 ` [PATCH v2 2/2] powerpc64: Allow for THREAD_SIZE > 16k Hamish Martin
2017-02-24 9:54 ` [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64) David Laight
2 siblings, 1 reply; 6+ messages in thread
From: Hamish Martin @ 2017-02-24 0:52 UTC (permalink / raw)
To: benh, paulus, mpe; +Cc: chris.packham, linuxppc-dev, Hamish Martin
Shift the logic for defining THREAD_SHIFT logic to Kconfig in order to
allow override by users.
Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
---
arch/powerpc/Kconfig | 10 ++++++++++
arch/powerpc/include/asm/thread_info.h | 10 +---------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 281f4f1fcd1f..cd4dd9354b3c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -669,6 +669,16 @@ config PPC_256K_PAGES
endchoice
+config THREAD_SHIFT
+ int "Thread shift" if EXPERT
+ range 13 15
+ default "15" if PPC_256K_PAGES
+ default "14" if PPC64
+ default "13"
+ help
+ Used to define the stack size. The default is almost always what you
+ want. Only change this if you know what you are doing.
+
config FORCE_MAX_ZONEORDER
int "Maximum zone order"
range 8 9 if PPC64 && PPC_64K_PAGES
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 87e4b2d8dcd4..2e17d668c472 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -10,15 +10,7 @@
#ifdef __KERNEL__
-/* We have 8k stacks on ppc32 and 16k on ppc64 */
-
-#if defined(CONFIG_PPC64)
-#define THREAD_SHIFT 14
-#elif defined(CONFIG_PPC_256K_PAGES)
-#define THREAD_SHIFT 15
-#else
-#define THREAD_SHIFT 13
-#endif
+#define THREAD_SHIFT CONFIG_THREAD_SHIFT
#define THREAD_SIZE (1 << THREAD_SHIFT)
--
2.11.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 2/2] powerpc64: Allow for THREAD_SIZE > 16k
2017-02-24 0:52 [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64) Hamish Martin
2017-02-24 0:52 ` [PATCH v2 1/2] powerpc: Move THREAD_SHIFT config to KConfig Hamish Martin
@ 2017-02-24 0:52 ` Hamish Martin
2017-02-24 9:54 ` [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64) David Laight
2 siblings, 0 replies; 6+ messages in thread
From: Hamish Martin @ 2017-02-24 0:52 UTC (permalink / raw)
To: benh, paulus, mpe; +Cc: chris.packham, linuxppc-dev, Hamish Martin
Fix an assembler error when the THREAD_SIZE is greater than 16k.
Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
---
arch/powerpc/kernel/head_64.S | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 1dc5eae2ced3..0ddc602b33a4 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -949,7 +949,8 @@ start_here_multiplatform:
LOAD_REG_ADDR(r3,init_thread_union)
/* set up a stack pointer */
- addi r1,r3,THREAD_SIZE
+ LOAD_REG_IMMEDIATE(r1,THREAD_SIZE)
+ add r1,r3,r1
li r0,0
stdu r0,-STACK_FRAME_OVERHEAD(r1)
--
2.11.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* RE: [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64)
2017-02-24 0:52 [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64) Hamish Martin
2017-02-24 0:52 ` [PATCH v2 1/2] powerpc: Move THREAD_SHIFT config to KConfig Hamish Martin
2017-02-24 0:52 ` [PATCH v2 2/2] powerpc64: Allow for THREAD_SIZE > 16k Hamish Martin
@ 2017-02-24 9:54 ` David Laight
2017-02-26 20:17 ` Hamish Martin
2 siblings, 1 reply; 6+ messages in thread
From: David Laight @ 2017-02-24 9:54 UTC (permalink / raw)
To: 'Hamish Martin', benh, paulus, mpe; +Cc: chris.packham, linuxppc-dev
From: Hamish Martin
> Sent: 24 February 2017 00:52
> This patch series adds the ability to configure the THREAD_SHIFT value an=
d
> thereby alter the stack size on powerpc systems. We are particularly inte=
rested
> in configuring for a 32k stack on PPC64.
>=20
> Using an NXP T2081 (e6500 PPC64 cores) we are observing stack overflows a=
s a
> result of applying a DTS overlay containing some I2C devices. Our scenari=
o is
> an ethernet switch chassis with plug-in cards. The I2C is driven from the=
T2081
> through a PCA9548 mux on the main board. When we detect insertion of the =
plugin
> card we schedule work for a call to of_overlay_create() to install a DTS
> overlay for the plugin board. This DTS overlay contains a further PCA9548=
mux
> with more devices hanging off it including a PCA9539 GPIO expander. The
> ultimate installed I2C tree is:
>=20
> T2081 --- PCA9548 MUX --- PCA9548 MUX --- PCA9539 GPIO Expander
>=20
> When we install the overlay the devices described in the overlay are prob=
ed and
> we see a large number of stack frames used as a result. If this is couple=
d with
> an interrupt happening that requires moderate to high stack use we observ=
e
> stack corruption. Here is an example long stack (from a 4.10-rc8 kernel) =
that
> does not show corruption but does demonstrate the length and frame sizes
> involved.
...
ISTM that the device probe needs to be iterative rather than recursive so
that deeply nested buses don't require deep stacks.
Switching stacks on interrupt entry would also make it much less likely tha=
t
you'll get an unexpected stack corruption.
The problem with just doubling the stack size is that code will just eat
it all up and, in a few years, something will hit the limit again.
David
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64)
2017-02-24 9:54 ` [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64) David Laight
@ 2017-02-26 20:17 ` Hamish Martin
0 siblings, 0 replies; 6+ messages in thread
From: Hamish Martin @ 2017-02-26 20:17 UTC (permalink / raw)
To: David Laight, benh, paulus, mpe; +Cc: Chris Packham, linuxppc-dev
On 02/24/2017 10:54 PM, David Laight wrote:
> From: Hamish Martin
>> Sent: 24 February 2017 00:52
>> This patch series adds the ability to configure the THREAD_SHIFT value a=
nd
>> thereby alter the stack size on powerpc systems. We are particularly int=
erested
>> in configuring for a 32k stack on PPC64.
>>
>> Using an NXP T2081 (e6500 PPC64 cores) we are observing stack overflows =
as a
>> result of applying a DTS overlay containing some I2C devices. Our scenar=
io is
>> an ethernet switch chassis with plug-in cards. The I2C is driven from th=
e T2081
>> through a PCA9548 mux on the main board. When we detect insertion of the=
plugin
>> card we schedule work for a call to of_overlay_create() to install a DTS
>> overlay for the plugin board. This DTS overlay contains a further PCA954=
8 mux
>> with more devices hanging off it including a PCA9539 GPIO expander. The
>> ultimate installed I2C tree is:
>>
>> T2081 --- PCA9548 MUX --- PCA9548 MUX --- PCA9539 GPIO Expander
>>
>> When we install the overlay the devices described in the overlay are pro=
bed and
>> we see a large number of stack frames used as a result. If this is coupl=
ed with
>> an interrupt happening that requires moderate to high stack use we obser=
ve
>> stack corruption. Here is an example long stack (from a 4.10-rc8 kernel)=
that
>> does not show corruption but does demonstrate the length and frame sizes
>> involved.
> ...
>
> ISTM that the device probe needs to be iterative rather than recursive so
> that deeply nested buses don't require deep stacks.
>
> Switching stacks on interrupt entry would also make it much less likely t=
hat
> you'll get an unexpected stack corruption.
>
> The problem with just doubling the stack size is that code will just eat
> it all up and, in a few years, something will hit the limit again.
>
> David
>
>
Thanks for your comments David.
Yes, restructuring bus/device probing would be a better solution in the=20
longer term. It's what Ben H was alluding to in his reply to v1 of this=20
patch series. I do suspect that would be a fairly large undertaking=20
across the kernel drivers that may expose numerous bugs.
However, I still feel this series has validity given the inequity=20
between PPC64 and PPC32 minimum stack frame overhead.
Flipping it on its head, would anyone even consider a patch that sought=20
to make things more equivalent between PPC32 and PPC64 where the patch=20
involved reducing the PPC32 stack from 8k to something like 6k? I think not=
.
And of course, finally, this is user selectable and we don't seek to=20
modify the current default behaviour.
Cheers,
Hamish M.=
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [v2,1/2] powerpc: Move THREAD_SHIFT config to KConfig
2017-02-24 0:52 ` [PATCH v2 1/2] powerpc: Move THREAD_SHIFT config to KConfig Hamish Martin
@ 2017-03-21 11:36 ` Michael Ellerman
0 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2017-03-21 11:36 UTC (permalink / raw)
To: Hamish Martin, benh, paulus; +Cc: Hamish Martin, chris.packham, linuxppc-dev
On Fri, 2017-02-24 at 00:52:09 UTC, Hamish Martin wrote:
> Shift the logic for defining THREAD_SHIFT logic to Kconfig in order to
> allow override by users.
>
> Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
> Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Series applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/476134070c037820bd909ff6e43e0d
cheers
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-03-21 11:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-24 0:52 [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64) Hamish Martin
2017-02-24 0:52 ` [PATCH v2 1/2] powerpc: Move THREAD_SHIFT config to KConfig Hamish Martin
2017-03-21 11:36 ` [v2,1/2] " Michael Ellerman
2017-02-24 0:52 ` [PATCH v2 2/2] powerpc64: Allow for THREAD_SIZE > 16k Hamish Martin
2017-02-24 9:54 ` [PATCH v2 0/2] Allow configurable stack size (especially 32k on PPC64) David Laight
2017-02-26 20:17 ` Hamish Martin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.