* AMDGPU and 16B stack alignment @ 2019-10-14 22:22 Nick Desaulniers [not found] ` <9e4d6378-5032-8521-13a9-d9d9519d07de@amd.com> 0 siblings, 1 reply; 11+ messages in thread From: Nick Desaulniers @ 2019-10-14 22:22 UTC (permalink / raw) To: Harry Wentland, Deucher, Alexander Cc: yshuiv7, andrew.cooper3, Arnd Bergmann, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML Hello! The x86 kernel is compiled with an 8B stack alignment via `-mpreferred-stack-boundary=3` for GCC since 3.6-rc1 via commit d9b0cde91c60 ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported") or `-mstack-alignment=8` for Clang. Parts of the AMDGPU driver are compiled with 16B stack alignment. Generally, the stack alignment is part of the ABI. Linking together two different translation units with differing stack alignment is dangerous, particularly when the translation unit with the smaller stack alignment makes calls into the translation unit with the larger stack alignment. While 8B aligned stacks are sometimes also 16B aligned, they are not always. Multiple users have reported General Protection Faults (GPF) when using the AMDGPU driver compiled with Clang. Clang is placing objects in stack slots assuming the stack is 16B aligned, and selecting instructions that require 16B aligned memory operands. At runtime, syscalls handling 8B stack aligned code calls into code that assumes 16B stack alignment. When the stack is a multiple of 8B but not 16B, these instructions result in a GPF. GCC doesn't select instructions with alignment requirements, so the GPFs aren't observed, but it is still considered an ABI breakage to mix and match stack alignment. I have patches that basically remove -mpreferred-stack-boundary=4 and -mstack-alignment=16 from AMDGPU: https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-541247601 Yuxuan has tested with Clang and GCC and reported it fixes the GPF's observed. I've split the patch into 4; same commit message but different Fixes tags so that they backport to stable on finer granularity. 2 questions BEFORE I send the series: 1. Would you prefer 4 patches with unique `fixes` tags, or 1 patch? 2. Was there or is there still a good reason for the stack alignment mismatch? (Further, I think we can use -msse2 for BOTH clang+gcc after my patch, but I don't have hardware to test on. I'm happy to write/send the follow up patch, but I'd need help testing). -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <9e4d6378-5032-8521-13a9-d9d9519d07de@amd.com>]
* Re: AMDGPU and 16B stack alignment [not found] ` <9e4d6378-5032-8521-13a9-d9d9519d07de@amd.com> @ 2019-10-15 7:19 ` Arnd Bergmann 2019-10-15 10:48 ` David Laight 2019-10-15 18:05 ` Nick Desaulniers 0 siblings, 2 replies; 11+ messages in thread From: Arnd Bergmann @ 2019-10-15 7:19 UTC (permalink / raw) To: S, Shirish Cc: Nick Desaulniers, Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote: > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > My gcc build fails with below errors: > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > While GPF observed on clang builds seem to be fixed. Ok, so it seems that gcc insists on having at least 2^4 bytes stack alignment when SSE is enabled on x86-64, but does not actually rely on that for correct operation unless it's using sse2. So -msse always has to be paired with -mpreferred-stack-boundary=3. For clang, it sounds like the opposite is true: when passing 16 byte stack alignment and having sse/sse2 enabled, it requires the incoming stack to be 16 byte aligned, but passing 8 byte alignment makes it do the right thing. So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4) to get the desired outcome on both? Arnd ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: AMDGPU and 16B stack alignment 2019-10-15 7:19 ` Arnd Bergmann @ 2019-10-15 10:48 ` David Laight 2019-10-15 18:05 ` Nick Desaulniers 1 sibling, 0 replies; 11+ messages in thread From: David Laight @ 2019-10-15 10:48 UTC (permalink / raw) To: 'Arnd Bergmann', S, Shirish Cc: Nick Desaulniers, Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML From: Arnd Bergmann > Sent: 15 October 2019 08:19 > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote: > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > My gcc build fails with below errors: > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > While GPF observed on clang builds seem to be fixed. > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack > alignment when > SSE is enabled on x86-64, but does not actually rely on that for > correct operation > unless it's using sse2. So -msse always has to be paired with > -mpreferred-stack-boundary=3. > > For clang, it sounds like the opposite is true: when passing 16 byte > stack alignment > and having sse/sse2 enabled, it requires the incoming stack to be 16 > byte aligned, > but passing 8 byte alignment makes it do the right thing. > > So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4) > to get the desired outcome on both? It probably won't solve the problem. You need to find all the asm blocks that call back into C and ensure they maintain the required stack alignment. This might be possible in the kernel, but is almost impossible in userspace. ISTR that gcc arbitrarily changed the stack alignment for i386 a few years ago. While it helped code generation it broke a lot of things. I can't remember the correct set of options to get the stack alignment code added only where it was needed (generates a double %bp frame). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: AMDGPU and 16B stack alignment 2019-10-15 7:19 ` Arnd Bergmann 2019-10-15 10:48 ` David Laight @ 2019-10-15 18:05 ` Nick Desaulniers 2019-10-15 18:11 ` Nick Desaulniers ` (2 more replies) 1 sibling, 3 replies; 11+ messages in thread From: Nick Desaulniers @ 2019-10-15 18:05 UTC (permalink / raw) To: Arnd Bergmann, S, Shirish Cc: Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote: > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > My gcc build fails with below errors: > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 I was able to reproduce this failure on pre-7.1 versions of GCC. It seems that when: 1. code is using doubles 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment than GCC produces that error: https://godbolt.org/z/7T8nbH That's already a tall order of constraints, so it's understandable that the compiler would just error likely during instruction selection, but was eventually taught how to solve such constraints. > > > > While GPF observed on clang builds seem to be fixed. Thanks for the report. Your testing these patches is invaluable, Shirish! > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack > alignment when > SSE is enabled on x86-64, but does not actually rely on that for > correct operation > unless it's using sse2. So -msse always has to be paired with > -mpreferred-stack-boundary=3. Seemingly only for older versions of GCC, pre 7.1. > > For clang, it sounds like the opposite is true: when passing 16 byte > stack alignment > and having sse/sse2 enabled, it requires the incoming stack to be 16 > byte aligned, I don't think it requires the incoming stack to be 16B aligned for sse2, I think it requires the incoming and current stack alignment to match. Today it does not, which is why we observe GPFs. > but passing 8 byte alignment makes it do the right thing. > > So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4) > to get the desired outcome on both? Hmmm...I would have liked to remove it outright, as it is an ABI mismatch that is likely to result in instability and non-fun-to-debug runtime issues in the future. I suspect my patch does work for GCC 7.1+. The question is: Do we want to either: 1. mark AMDGPU broken for GCC < 7.1, or 2. continue supporting it via stack alignment mismatch? 2 is brittle, and may break at any point in the future, but if it's working for someone it does make me feel bad to outright disable it. What I'd image 2 looks like is (psuedo code in a Makefile): if CC_IS_GCC && GCC_VERSION < 7.1: set stack alignment to 16B and hope for the best So my diff would be amended to keep the stack alignment flags, but only to support GCC < 7.1. And that assumes my change compiles with GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would feel even more confident if someone with hardware to test on and GCC 7.1+ could boot test). -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: AMDGPU and 16B stack alignment 2019-10-15 18:05 ` Nick Desaulniers @ 2019-10-15 18:11 ` Nick Desaulniers 2019-10-15 18:30 ` Alex Deucher 2019-10-15 20:26 ` Arvind Sankar 2 siblings, 0 replies; 11+ messages in thread From: Nick Desaulniers @ 2019-10-15 18:11 UTC (permalink / raw) To: Arnd Bergmann, S, Shirish Cc: Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML On Tue, Oct 15, 2019 at 11:05 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote: > > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > > > My gcc build fails with below errors: > > > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > I was able to reproduce this failure on pre-7.1 versions of GCC. It > seems that when: > 1. code is using doubles > 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment > than GCC produces that error: > https://godbolt.org/z/7T8nbH > Also, I suspect that very error solves the mystery of "why was 16B stack alignment ever specified?" -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: AMDGPU and 16B stack alignment 2019-10-15 18:05 ` Nick Desaulniers 2019-10-15 18:11 ` Nick Desaulniers @ 2019-10-15 18:30 ` Alex Deucher 2019-10-15 20:15 ` Nick Desaulniers 2019-10-15 20:26 ` Arvind Sankar 2 siblings, 1 reply; 11+ messages in thread From: Alex Deucher @ 2019-10-15 18:30 UTC (permalink / raw) To: Nick Desaulniers Cc: Arnd Bergmann, S, Shirish, Zhou, David(ChunMing), clang-built-linux, andrew.cooper3, LKML, amd-gfx list, S, Shirish, Matthias Kaehlcke, yshuiv7, Deucher, Alexander, Wentland, Harry, Koenig, Christian On Tue, Oct 15, 2019 at 2:07 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote: > > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > > > My gcc build fails with below errors: > > > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > I was able to reproduce this failure on pre-7.1 versions of GCC. It > seems that when: > 1. code is using doubles > 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment > than GCC produces that error: > https://godbolt.org/z/7T8nbH > > That's already a tall order of constraints, so it's understandable > that the compiler would just error likely during instruction > selection, but was eventually taught how to solve such constraints. > > > > > > > While GPF observed on clang builds seem to be fixed. > > Thanks for the report. Your testing these patches is invaluable, Shirish! > > > > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack > > alignment when > > SSE is enabled on x86-64, but does not actually rely on that for > > correct operation > > unless it's using sse2. So -msse always has to be paired with > > -mpreferred-stack-boundary=3. > > Seemingly only for older versions of GCC, pre 7.1. > > > > > For clang, it sounds like the opposite is true: when passing 16 byte > > stack alignment > > and having sse/sse2 enabled, it requires the incoming stack to be 16 > > byte aligned, > > I don't think it requires the incoming stack to be 16B aligned for > sse2, I think it requires the incoming and current stack alignment to > match. Today it does not, which is why we observe GPFs. > > > but passing 8 byte alignment makes it do the right thing. > > > > So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4) > > to get the desired outcome on both? > > Hmmm...I would have liked to remove it outright, as it is an ABI > mismatch that is likely to result in instability and non-fun-to-debug > runtime issues in the future. I suspect my patch does work for GCC > 7.1+. The question is: Do we want to either: > 1. mark AMDGPU broken for GCC < 7.1, or > 2. continue supporting it via stack alignment mismatch? > > 2 is brittle, and may break at any point in the future, but if it's > working for someone it does make me feel bad to outright disable it. > What I'd image 2 looks like is (psuedo code in a Makefile): Well, it's been working as is for years now, at least with gcc, so I'd hate to break that. Alex > > if CC_IS_GCC && GCC_VERSION < 7.1: > set stack alignment to 16B and hope for the best > > So my diff would be amended to keep the stack alignment flags, but > only to support GCC < 7.1. And that assumes my change compiles with > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > feel even more confident if someone with hardware to test on and GCC > 7.1+ could boot test). > -- > Thanks, > ~Nick Desaulniers > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: AMDGPU and 16B stack alignment 2019-10-15 18:30 ` Alex Deucher @ 2019-10-15 20:15 ` Nick Desaulniers 0 siblings, 0 replies; 11+ messages in thread From: Nick Desaulniers @ 2019-10-15 20:15 UTC (permalink / raw) To: Alex Deucher Cc: Arnd Bergmann, S, Shirish, Zhou, David(ChunMing), clang-built-linux, andrew.cooper3, LKML, amd-gfx list, S, Shirish, Matthias Kaehlcke, yshuiv7, Deucher, Alexander, Wentland, Harry, Koenig, Christian On Tue, Oct 15, 2019 at 11:30 AM Alex Deucher <alexdeucher@gmail.com> wrote: > > On Tue, Oct 15, 2019 at 2:07 PM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote: > > > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote: > > > > > > > My gcc build fails with below errors: > > > > > > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > > > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12 > > > > I was able to reproduce this failure on pre-7.1 versions of GCC. It > > seems that when: > > 1. code is using doubles > > 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment > > than GCC produces that error: > > https://godbolt.org/z/7T8nbH > > > > That's already a tall order of constraints, so it's understandable > > that the compiler would just error likely during instruction > > selection, but was eventually taught how to solve such constraints. > > > > > > > > > > While GPF observed on clang builds seem to be fixed. > > > > Thanks for the report. Your testing these patches is invaluable, Shirish! > > > > > > > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack > > > alignment when > > > SSE is enabled on x86-64, but does not actually rely on that for > > > correct operation > > > unless it's using sse2. So -msse always has to be paired with > > > -mpreferred-stack-boundary=3. > > > > Seemingly only for older versions of GCC, pre 7.1. > > > > > > > > For clang, it sounds like the opposite is true: when passing 16 byte > > > stack alignment > > > and having sse/sse2 enabled, it requires the incoming stack to be 16 > > > byte aligned, > > > > I don't think it requires the incoming stack to be 16B aligned for > > sse2, I think it requires the incoming and current stack alignment to > > match. Today it does not, which is why we observe GPFs. > > > > > but passing 8 byte alignment makes it do the right thing. > > > > > > So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4) > > > to get the desired outcome on both? > > > > Hmmm...I would have liked to remove it outright, as it is an ABI > > mismatch that is likely to result in instability and non-fun-to-debug > > runtime issues in the future. I suspect my patch does work for GCC > > 7.1+. The question is: Do we want to either: > > 1. mark AMDGPU broken for GCC < 7.1, or > > 2. continue supporting it via stack alignment mismatch? > > > > 2 is brittle, and may break at any point in the future, but if it's > > working for someone it does make me feel bad to outright disable it. > > What I'd image 2 looks like is (psuedo code in a Makefile): > > Well, it's been working as is for years now, at least with gcc, so I'd > hate to break that. Ok, I'm happy to leave that as is for GCC, then. Would you prefer I modify it for GCC >7.1 or just leave it alone (maybe I'll add a comment about *why* it's done for GCC)? Would you prefer 1 patch or 4? > > Alex > > > > > if CC_IS_GCC && GCC_VERSION < 7.1: > > set stack alignment to 16B and hope for the best Ie, this ^ > > > > So my diff would be amended to keep the stack alignment flags, but > > only to support GCC < 7.1. And that assumes my change compiles with > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > > feel even more confident if someone with hardware to test on and GCC > > 7.1+ could boot test). -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: AMDGPU and 16B stack alignment 2019-10-15 18:05 ` Nick Desaulniers 2019-10-15 18:11 ` Nick Desaulniers 2019-10-15 18:30 ` Alex Deucher @ 2019-10-15 20:26 ` Arvind Sankar 2019-10-16 1:51 ` Nick Desaulniers 2 siblings, 1 reply; 11+ messages in thread From: Arvind Sankar @ 2019-10-15 20:26 UTC (permalink / raw) To: Nick Desaulniers Cc: Arnd Bergmann, S, Shirish, Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote: > Hmmm...I would have liked to remove it outright, as it is an ABI > mismatch that is likely to result in instability and non-fun-to-debug > runtime issues in the future. I suspect my patch does work for GCC > 7.1+. The question is: Do we want to either: > 1. mark AMDGPU broken for GCC < 7.1, or > 2. continue supporting it via stack alignment mismatch? > > 2 is brittle, and may break at any point in the future, but if it's > working for someone it does make me feel bad to outright disable it. > What I'd image 2 looks like is (psuedo code in a Makefile): > > if CC_IS_GCC && GCC_VERSION < 7.1: > set stack alignment to 16B and hope for the best > > So my diff would be amended to keep the stack alignment flags, but > only to support GCC < 7.1. And that assumes my change compiles with > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > feel even more confident if someone with hardware to test on and GCC > 7.1+ could boot test). > -- > Thanks, > ~Nick Desaulniers If we do keep it, would adding -mstackrealign make it more robust? That's simple and will only add the alignment to functions that require 16-byte alignment (at least on gcc). Alternative is to use __attribute__((force_align_arg_pointer)) on functions that might be called from 8-byte-aligned code. It looks like -mstackrealign should work from gcc 5.3 onwards. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: AMDGPU and 16B stack alignment 2019-10-15 20:26 ` Arvind Sankar @ 2019-10-16 1:51 ` Nick Desaulniers 2019-10-16 18:55 ` Arvind Sankar 0 siblings, 1 reply; 11+ messages in thread From: Nick Desaulniers @ 2019-10-16 1:51 UTC (permalink / raw) To: Arvind Sankar Cc: Arnd Bergmann, S, Shirish, Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar <nivedita@alum.mit.edu> wrote: > > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote: > > Hmmm...I would have liked to remove it outright, as it is an ABI > > mismatch that is likely to result in instability and non-fun-to-debug > > runtime issues in the future. I suspect my patch does work for GCC > > 7.1+. The question is: Do we want to either: > > 1. mark AMDGPU broken for GCC < 7.1, or > > 2. continue supporting it via stack alignment mismatch? > > > > 2 is brittle, and may break at any point in the future, but if it's > > working for someone it does make me feel bad to outright disable it. > > What I'd image 2 looks like is (psuedo code in a Makefile): > > > > if CC_IS_GCC && GCC_VERSION < 7.1: > > set stack alignment to 16B and hope for the best > > > > So my diff would be amended to keep the stack alignment flags, but > > only to support GCC < 7.1. And that assumes my change compiles with > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > > feel even more confident if someone with hardware to test on and GCC > > 7.1+ could boot test). > > -- > > Thanks, > > ~Nick Desaulniers > > If we do keep it, would adding -mstackrealign make it more robust? > That's simple and will only add the alignment to functions that require > 16-byte alignment (at least on gcc). I think there's also `-mincoming-stack-boundary=`. https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017 > > Alternative is to use > __attribute__((force_align_arg_pointer)) on functions that might be > called from 8-byte-aligned code. Which is hard to automate and easy to forget. Likely a large diff to fix today. > > It looks like -mstackrealign should work from gcc 5.3 onwards. The kernel would generally like to support GCC 4.9+. There's plenty of different ways to keep layering on duct tape and bailing wire to support differing ABIs, but that's just adding technical debt that will have to be repaid one day. That's why the cleanest solution IMO is mark the driver broken for old toolchains, and use a code-base-consistent stack alignment. Bending over backwards to support old toolchains means accepting stack alignment mismatches, which is in the "unspecified behavior" ring of the "undefined behavior" Venn diagram. I have the same opinion on relying on explicitly undefined behavior. I'll send patches for fixing up Clang, but please consider my strong advice to generally avoid stack alignment mismatches, regardless of compiler. -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: AMDGPU and 16B stack alignment 2019-10-16 1:51 ` Nick Desaulniers @ 2019-10-16 18:55 ` Arvind Sankar 2019-10-16 23:05 ` Nick Desaulniers 0 siblings, 1 reply; 11+ messages in thread From: Arvind Sankar @ 2019-10-16 18:55 UTC (permalink / raw) To: Nick Desaulniers Cc: Arvind Sankar, Arnd Bergmann, S, Shirish, Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML On Tue, Oct 15, 2019 at 06:51:26PM -0700, Nick Desaulniers wrote: > On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar <nivedita@alum.mit.edu> wrote: > > > > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote: > > > Hmmm...I would have liked to remove it outright, as it is an ABI > > > mismatch that is likely to result in instability and non-fun-to-debug > > > runtime issues in the future. I suspect my patch does work for GCC > > > 7.1+. The question is: Do we want to either: > > > 1. mark AMDGPU broken for GCC < 7.1, or > > > 2. continue supporting it via stack alignment mismatch? > > > > > > 2 is brittle, and may break at any point in the future, but if it's > > > working for someone it does make me feel bad to outright disable it. > > > What I'd image 2 looks like is (psuedo code in a Makefile): > > > > > > if CC_IS_GCC && GCC_VERSION < 7.1: > > > set stack alignment to 16B and hope for the best > > > > > > So my diff would be amended to keep the stack alignment flags, but > > > only to support GCC < 7.1. And that assumes my change compiles with > > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > > > feel even more confident if someone with hardware to test on and GCC > > > 7.1+ could boot test). > > > -- > > > Thanks, > > > ~Nick Desaulniers > > > > If we do keep it, would adding -mstackrealign make it more robust? > > That's simple and will only add the alignment to functions that require > > 16-byte alignment (at least on gcc). > > I think there's also `-mincoming-stack-boundary=`. > https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017 Yes, but -mstackrealign looks like it's supported by clang as well. > > > > > Alternative is to use > > __attribute__((force_align_arg_pointer)) on functions that might be > > called from 8-byte-aligned code. > > Which is hard to automate and easy to forget. Likely a large diff to fix today. Right, this is a no-go, esp to just fix old compilers. > > > > > It looks like -mstackrealign should work from gcc 5.3 onwards. > > The kernel would generally like to support GCC 4.9+. > > There's plenty of different ways to keep layering on duct tape and > bailing wire to support differing ABIs, but that's just adding > technical debt that will have to be repaid one day. That's why the > cleanest solution IMO is mark the driver broken for old toolchains, > and use a code-base-consistent stack alignment. Bending over > backwards to support old toolchains means accepting stack alignment > mismatches, which is in the "unspecified behavior" ring of the > "undefined behavior" Venn diagram. I have the same opinion on relying > on explicitly undefined behavior. > > I'll send patches for fixing up Clang, but please consider my strong > advice to generally avoid stack alignment mismatches, regardless of > compiler. > -- > Thanks, > ~Nick Desaulniers What I suggested was in reference to your proposal for dropping the -mpreferred-stack-boundary=4 for modern compilers, but keeping it for <7.1. -mstackrealign would at least let 5.3 onwards be less likely to break (and it doesn't error before then, I think it just doesn't actually do anything, so no worse than now at least). Simply dropping support for <7.1 would be cleanest, yes, but it sounds like people don't want to go that far. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: AMDGPU and 16B stack alignment 2019-10-16 18:55 ` Arvind Sankar @ 2019-10-16 23:05 ` Nick Desaulniers 0 siblings, 0 replies; 11+ messages in thread From: Nick Desaulniers @ 2019-10-16 23:05 UTC (permalink / raw) To: Arvind Sankar Cc: Arnd Bergmann, S, Shirish, Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing), Koenig, Christian, amd-gfx list, LKML On Wed, Oct 16, 2019 at 11:55 AM Arvind Sankar <nivedita@alum.mit.edu> wrote: > > On Tue, Oct 15, 2019 at 06:51:26PM -0700, Nick Desaulniers wrote: > > On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar <nivedita@alum.mit.edu> wrote: > > > > > > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote: > > > > Hmmm...I would have liked to remove it outright, as it is an ABI > > > > mismatch that is likely to result in instability and non-fun-to-debug > > > > runtime issues in the future. I suspect my patch does work for GCC > > > > 7.1+. The question is: Do we want to either: > > > > 1. mark AMDGPU broken for GCC < 7.1, or > > > > 2. continue supporting it via stack alignment mismatch? > > > > > > > > 2 is brittle, and may break at any point in the future, but if it's > > > > working for someone it does make me feel bad to outright disable it. > > > > What I'd image 2 looks like is (psuedo code in a Makefile): > > > > > > > > if CC_IS_GCC && GCC_VERSION < 7.1: > > > > set stack alignment to 16B and hope for the best > > > > > > > > So my diff would be amended to keep the stack alignment flags, but > > > > only to support GCC < 7.1. And that assumes my change compiles with > > > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would > > > > feel even more confident if someone with hardware to test on and GCC > > > > 7.1+ could boot test). > > > > -- > > > > Thanks, > > > > ~Nick Desaulniers > > > > > > If we do keep it, would adding -mstackrealign make it more robust? > > > That's simple and will only add the alignment to functions that require > > > 16-byte alignment (at least on gcc). > > > > I think there's also `-mincoming-stack-boundary=`. > > https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017 > > Yes, but -mstackrealign looks like it's supported by clang as well. Good to know, but I want less duct tape, not more. > > > > > > > > Alternative is to use > > > __attribute__((force_align_arg_pointer)) on functions that might be > > > called from 8-byte-aligned code. > > > > Which is hard to automate and easy to forget. Likely a large diff to fix today. > > Right, this is a no-go, esp to just fix old compilers. > > > > > > > > It looks like -mstackrealign should work from gcc 5.3 onwards. > > > > The kernel would generally like to support GCC 4.9+. > > > > There's plenty of different ways to keep layering on duct tape and > > bailing wire to support differing ABIs, but that's just adding > > technical debt that will have to be repaid one day. That's why the > > cleanest solution IMO is mark the driver broken for old toolchains, > > and use a code-base-consistent stack alignment. Bending over > > backwards to support old toolchains means accepting stack alignment > > mismatches, which is in the "unspecified behavior" ring of the > > "undefined behavior" Venn diagram. I have the same opinion on relying > > on explicitly undefined behavior. > > > > I'll send patches for fixing up Clang, but please consider my strong > > advice to generally avoid stack alignment mismatches, regardless of > > compiler. > > -- > > Thanks, > > ~Nick Desaulniers > > What I suggested was in reference to your proposal for dropping the > -mpreferred-stack-boundary=4 for modern compilers, but keeping it for > <7.1. -mstackrealign would at least let 5.3 onwards be less likely to > break (and it doesn't error before then, I think it just doesn't > actually do anything, so no worse than now at least). > > Simply dropping support for <7.1 would be cleanest, yes, but it sounds > like people don't want to go that far. That's fair. I've included your suggestions in the commit message of 02/03 of a series I just sent but forgot to in reply to this thread: https://lkml.org/lkml/2019/10/16/1700 Also, I do appreciate the suggestions and understand the value of brainstorming. -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2019-10-16 23:05 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-10-14 22:22 AMDGPU and 16B stack alignment Nick Desaulniers [not found] ` <9e4d6378-5032-8521-13a9-d9d9519d07de@amd.com> 2019-10-15 7:19 ` Arnd Bergmann 2019-10-15 10:48 ` David Laight 2019-10-15 18:05 ` Nick Desaulniers 2019-10-15 18:11 ` Nick Desaulniers 2019-10-15 18:30 ` Alex Deucher 2019-10-15 20:15 ` Nick Desaulniers 2019-10-15 20:26 ` Arvind Sankar 2019-10-16 1:51 ` Nick Desaulniers 2019-10-16 18:55 ` Arvind Sankar 2019-10-16 23:05 ` Nick Desaulniers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).