linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* AMDGPU and 16B stack alignment
@ 2019-10-14 22:22 Nick Desaulniers
       [not found] ` <9e4d6378-5032-8521-13a9-d9d9519d07de@amd.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Nick Desaulniers @ 2019-10-14 22:22 UTC (permalink / raw)
  To: Harry Wentland, Deucher, Alexander
  Cc: yshuiv7, andrew.cooper3, Arnd Bergmann, clang-built-linux,
	Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

Hello!

The x86 kernel is compiled with an 8B stack alignment via
    `-mpreferred-stack-boundary=3` for GCC since 3.6-rc1 via
    commit d9b0cde91c60 ("x86-64, gcc: Use
-mpreferred-stack-boundary=3 if supported")
    or `-mstack-alignment=8` for Clang. Parts of the AMDGPU driver are
    compiled with 16B stack alignment.

    Generally, the stack alignment is part of the ABI. Linking together two
    different translation units with differing stack alignment is dangerous,
    particularly when the translation unit with the smaller stack alignment
    makes calls into the translation unit with the larger stack alignment.
    While 8B aligned stacks are sometimes also 16B aligned, they are not
    always.

    Multiple users have reported General Protection Faults (GPF) when using
    the AMDGPU driver compiled with Clang. Clang is placing objects in stack
    slots assuming the stack is 16B aligned, and selecting instructions that
    require 16B aligned memory operands. At runtime, syscalls handling 8B
    stack aligned code calls into code that assumes 16B stack alignment.
    When the stack is a multiple of 8B but not 16B, these instructions
    result in a GPF.

    GCC doesn't select instructions with alignment requirements, so the GPFs
    aren't observed, but it is still considered an ABI breakage to mix and
    match stack alignment.

I have patches that basically remove -mpreferred-stack-boundary=4 and
-mstack-alignment=16 from AMDGPU:
https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-541247601
Yuxuan has tested with Clang and GCC and reported it fixes the GPF's observed.

I've split the patch into 4; same commit message but different Fixes
tags so that they backport to stable on finer granularity. 2 questions
BEFORE I send the series:

1. Would you prefer 4 patches with unique `fixes` tags, or 1 patch?
2. Was there or is there still a good reason for the stack alignment mismatch?

(Further, I think we can use -msse2 for BOTH clang+gcc after my patch,
but I don't have hardware to test on. I'm happy to write/send the
follow up patch, but I'd need help testing).
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
       [not found] ` <9e4d6378-5032-8521-13a9-d9d9519d07de@amd.com>
@ 2019-10-15  7:19   ` Arnd Bergmann
  2019-10-15 10:48     ` David Laight
  2019-10-15 18:05     ` Nick Desaulniers
  0 siblings, 2 replies; 11+ messages in thread
From: Arnd Bergmann @ 2019-10-15  7:19 UTC (permalink / raw)
  To: S, Shirish
  Cc: Nick Desaulniers, Wentland, Harry, Deucher, Alexander, yshuiv7,
	andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish,
	Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote:
> On 10/15/2019 3:52 AM, Nick Desaulniers wrote:

> My gcc build fails with below errors:
>
> dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
>
> dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
>
> While GPF observed on clang builds seem to be fixed.

Ok, so it seems that gcc insists on having at least 2^4 bytes stack
alignment when
SSE is enabled on x86-64, but does not actually rely on that for
correct operation
unless it's using sse2. So -msse always has to be paired with
 -mpreferred-stack-boundary=3.

For clang, it sounds like the opposite is true: when passing 16 byte
stack alignment
and having sse/sse2 enabled, it requires the incoming stack to be 16
byte aligned,
but passing 8 byte alignment makes it do the right thing.

So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
to get the desired outcome on both?

       Arnd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: AMDGPU and 16B stack alignment
  2019-10-15  7:19   ` Arnd Bergmann
@ 2019-10-15 10:48     ` David Laight
  2019-10-15 18:05     ` Nick Desaulniers
  1 sibling, 0 replies; 11+ messages in thread
From: David Laight @ 2019-10-15 10:48 UTC (permalink / raw)
  To: 'Arnd Bergmann', S, Shirish
  Cc: Nick Desaulniers, Wentland, Harry, Deucher, Alexander, yshuiv7,
	andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S, Shirish,
	Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

From: Arnd Bergmann
> Sent: 15 October 2019 08:19
> 
> On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote:
> > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> 
> > My gcc build fails with below errors:
> >
> > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> >
> > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> >
> > While GPF observed on clang builds seem to be fixed.
> 
> Ok, so it seems that gcc insists on having at least 2^4 bytes stack
> alignment when
> SSE is enabled on x86-64, but does not actually rely on that for
> correct operation
> unless it's using sse2. So -msse always has to be paired with
>  -mpreferred-stack-boundary=3.
> 
> For clang, it sounds like the opposite is true: when passing 16 byte
> stack alignment
> and having sse/sse2 enabled, it requires the incoming stack to be 16
> byte aligned,
> but passing 8 byte alignment makes it do the right thing.
> 
> So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
> to get the desired outcome on both?

It probably won't solve the problem.
You need to find all the asm blocks that call back into C and ensure they
maintain the required stack alignment.
This might be possible in the kernel, but is almost impossible in userspace.

ISTR that gcc arbitrarily changed the stack alignment for i386 a few years ago.
While it helped code generation it broke a lot of things.
I can't remember the correct set of options to get the stack alignment
code added only where it was needed (generates a double %bp frame).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
  2019-10-15  7:19   ` Arnd Bergmann
  2019-10-15 10:48     ` David Laight
@ 2019-10-15 18:05     ` Nick Desaulniers
  2019-10-15 18:11       ` Nick Desaulniers
                         ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Nick Desaulniers @ 2019-10-15 18:05 UTC (permalink / raw)
  To: Arnd Bergmann, S, Shirish
  Cc: Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3,
	clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou,
	David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote:
> > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
>
> > My gcc build fails with below errors:
> >
> > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> >
> > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12

I was able to reproduce this failure on pre-7.1 versions of GCC.  It
seems that when:
1. code is using doubles
2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment
than GCC produces that error:
https://godbolt.org/z/7T8nbH

That's already a tall order of constraints, so it's understandable
that the compiler would just error likely during instruction
selection, but was eventually taught how to solve such constraints.

> >
> > While GPF observed on clang builds seem to be fixed.

Thanks for the report.  Your testing these patches is invaluable, Shirish!

>
> Ok, so it seems that gcc insists on having at least 2^4 bytes stack
> alignment when
> SSE is enabled on x86-64, but does not actually rely on that for
> correct operation
> unless it's using sse2. So -msse always has to be paired with
>  -mpreferred-stack-boundary=3.

Seemingly only for older versions of GCC, pre 7.1.

>
> For clang, it sounds like the opposite is true: when passing 16 byte
> stack alignment
> and having sse/sse2 enabled, it requires the incoming stack to be 16
> byte aligned,

I don't think it requires the incoming stack to be 16B aligned for
sse2, I think it requires the incoming and current stack alignment to
match. Today it does not, which is why we observe GPFs.

> but passing 8 byte alignment makes it do the right thing.
>
> So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
> to get the desired outcome on both?

Hmmm...I would have liked to remove it outright, as it is an ABI
mismatch that is likely to result in instability and non-fun-to-debug
runtime issues in the future.  I suspect my patch does work for GCC
7.1+.  The question is: Do we want to either:
1. mark AMDGPU broken for GCC < 7.1, or
2. continue supporting it via stack alignment mismatch?

2 is brittle, and may break at any point in the future, but if it's
working for someone it does make me feel bad to outright disable it.
What I'd image 2 looks like is (psuedo code in a Makefile):

if CC_IS_GCC && GCC_VERSION < 7.1:
  set stack alignment to 16B and hope for the best

So my diff would be amended to keep the stack alignment flags, but
only to support GCC < 7.1.  And that assumes my change compiles with
GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
feel even more confident if someone with hardware to test on and GCC
7.1+ could boot test).
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
  2019-10-15 18:05     ` Nick Desaulniers
@ 2019-10-15 18:11       ` Nick Desaulniers
  2019-10-15 18:30       ` Alex Deucher
  2019-10-15 20:26       ` Arvind Sankar
  2 siblings, 0 replies; 11+ messages in thread
From: Nick Desaulniers @ 2019-10-15 18:11 UTC (permalink / raw)
  To: Arnd Bergmann, S, Shirish
  Cc: Wentland, Harry, Deucher, Alexander, yshuiv7, andrew.cooper3,
	clang-built-linux, Matthias Kaehlcke, S, Shirish, Zhou,
	David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

On Tue, Oct 15, 2019 at 11:05 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote:
> > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> >
> > > My gcc build fails with below errors:
> > >
> > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> > >
> > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
>
> I was able to reproduce this failure on pre-7.1 versions of GCC.  It
> seems that when:
> 1. code is using doubles
> 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment
> than GCC produces that error:
> https://godbolt.org/z/7T8nbH
>

Also, I suspect that very error solves the mystery of "why was 16B
stack alignment ever specified?"
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
  2019-10-15 18:05     ` Nick Desaulniers
  2019-10-15 18:11       ` Nick Desaulniers
@ 2019-10-15 18:30       ` Alex Deucher
  2019-10-15 20:15         ` Nick Desaulniers
  2019-10-15 20:26       ` Arvind Sankar
  2 siblings, 1 reply; 11+ messages in thread
From: Alex Deucher @ 2019-10-15 18:30 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, S, Shirish, Zhou, David(ChunMing),
	clang-built-linux, andrew.cooper3, LKML, amd-gfx list, S,
	Shirish, Matthias Kaehlcke, yshuiv7, Deucher, Alexander,
	Wentland, Harry, Koenig, Christian

On Tue, Oct 15, 2019 at 2:07 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote:
> > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> >
> > > My gcc build fails with below errors:
> > >
> > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> > >
> > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
>
> I was able to reproduce this failure on pre-7.1 versions of GCC.  It
> seems that when:
> 1. code is using doubles
> 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment
> than GCC produces that error:
> https://godbolt.org/z/7T8nbH
>
> That's already a tall order of constraints, so it's understandable
> that the compiler would just error likely during instruction
> selection, but was eventually taught how to solve such constraints.
>
> > >
> > > While GPF observed on clang builds seem to be fixed.
>
> Thanks for the report.  Your testing these patches is invaluable, Shirish!
>
> >
> > Ok, so it seems that gcc insists on having at least 2^4 bytes stack
> > alignment when
> > SSE is enabled on x86-64, but does not actually rely on that for
> > correct operation
> > unless it's using sse2. So -msse always has to be paired with
> >  -mpreferred-stack-boundary=3.
>
> Seemingly only for older versions of GCC, pre 7.1.
>
> >
> > For clang, it sounds like the opposite is true: when passing 16 byte
> > stack alignment
> > and having sse/sse2 enabled, it requires the incoming stack to be 16
> > byte aligned,
>
> I don't think it requires the incoming stack to be 16B aligned for
> sse2, I think it requires the incoming and current stack alignment to
> match. Today it does not, which is why we observe GPFs.
>
> > but passing 8 byte alignment makes it do the right thing.
> >
> > So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
> > to get the desired outcome on both?
>
> Hmmm...I would have liked to remove it outright, as it is an ABI
> mismatch that is likely to result in instability and non-fun-to-debug
> runtime issues in the future.  I suspect my patch does work for GCC
> 7.1+.  The question is: Do we want to either:
> 1. mark AMDGPU broken for GCC < 7.1, or
> 2. continue supporting it via stack alignment mismatch?
>
> 2 is brittle, and may break at any point in the future, but if it's
> working for someone it does make me feel bad to outright disable it.
> What I'd image 2 looks like is (psuedo code in a Makefile):

Well, it's been working as is for years now, at least with gcc, so I'd
hate to break that.

Alex

>
> if CC_IS_GCC && GCC_VERSION < 7.1:
>   set stack alignment to 16B and hope for the best
>
> So my diff would be amended to keep the stack alignment flags, but
> only to support GCC < 7.1.  And that assumes my change compiles with
> GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> feel even more confident if someone with hardware to test on and GCC
> 7.1+ could boot test).
> --
> Thanks,
> ~Nick Desaulniers
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
  2019-10-15 18:30       ` Alex Deucher
@ 2019-10-15 20:15         ` Nick Desaulniers
  0 siblings, 0 replies; 11+ messages in thread
From: Nick Desaulniers @ 2019-10-15 20:15 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Arnd Bergmann, S, Shirish, Zhou, David(ChunMing),
	clang-built-linux, andrew.cooper3, LKML, amd-gfx list, S,
	Shirish, Matthias Kaehlcke, yshuiv7, Deucher, Alexander,
	Wentland, Harry, Koenig, Christian

On Tue, Oct 15, 2019 at 11:30 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 2:07 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Tue, Oct 15, 2019 at 12:19 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@amd.com> wrote:
> > > > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> > >
> > > > My gcc build fails with below errors:
> > > >
> > > > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> > > >
> > > > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> >
> > I was able to reproduce this failure on pre-7.1 versions of GCC.  It
> > seems that when:
> > 1. code is using doubles
> > 2. setting -mpreferred-stack-boundary=3 -mno-sse2, ie. 8B stack alignment
> > than GCC produces that error:
> > https://godbolt.org/z/7T8nbH
> >
> > That's already a tall order of constraints, so it's understandable
> > that the compiler would just error likely during instruction
> > selection, but was eventually taught how to solve such constraints.
> >
> > > >
> > > > While GPF observed on clang builds seem to be fixed.
> >
> > Thanks for the report.  Your testing these patches is invaluable, Shirish!
> >
> > >
> > > Ok, so it seems that gcc insists on having at least 2^4 bytes stack
> > > alignment when
> > > SSE is enabled on x86-64, but does not actually rely on that for
> > > correct operation
> > > unless it's using sse2. So -msse always has to be paired with
> > >  -mpreferred-stack-boundary=3.
> >
> > Seemingly only for older versions of GCC, pre 7.1.
> >
> > >
> > > For clang, it sounds like the opposite is true: when passing 16 byte
> > > stack alignment
> > > and having sse/sse2 enabled, it requires the incoming stack to be 16
> > > byte aligned,
> >
> > I don't think it requires the incoming stack to be 16B aligned for
> > sse2, I think it requires the incoming and current stack alignment to
> > match. Today it does not, which is why we observe GPFs.
> >
> > > but passing 8 byte alignment makes it do the right thing.
> > >
> > > So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
> > > to get the desired outcome on both?
> >
> > Hmmm...I would have liked to remove it outright, as it is an ABI
> > mismatch that is likely to result in instability and non-fun-to-debug
> > runtime issues in the future.  I suspect my patch does work for GCC
> > 7.1+.  The question is: Do we want to either:
> > 1. mark AMDGPU broken for GCC < 7.1, or
> > 2. continue supporting it via stack alignment mismatch?
> >
> > 2 is brittle, and may break at any point in the future, but if it's
> > working for someone it does make me feel bad to outright disable it.
> > What I'd image 2 looks like is (psuedo code in a Makefile):
>
> Well, it's been working as is for years now, at least with gcc, so I'd
> hate to break that.

Ok, I'm happy to leave that as is for GCC, then.  Would you prefer I
modify it for GCC >7.1 or just leave it alone (maybe I'll add a
comment about *why* it's done for GCC)? Would you prefer 1 patch or 4?

>
> Alex
>
> >
> > if CC_IS_GCC && GCC_VERSION < 7.1:
> >   set stack alignment to 16B and hope for the best

Ie, this ^

> >
> > So my diff would be amended to keep the stack alignment flags, but
> > only to support GCC < 7.1.  And that assumes my change compiles with
> > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> > feel even more confident if someone with hardware to test on and GCC
> > 7.1+ could boot test).

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
  2019-10-15 18:05     ` Nick Desaulniers
  2019-10-15 18:11       ` Nick Desaulniers
  2019-10-15 18:30       ` Alex Deucher
@ 2019-10-15 20:26       ` Arvind Sankar
  2019-10-16  1:51         ` Nick Desaulniers
  2 siblings, 1 reply; 11+ messages in thread
From: Arvind Sankar @ 2019-10-15 20:26 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, S, Shirish, Wentland, Harry, Deucher, Alexander,
	yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S,
	Shirish, Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote:
> Hmmm...I would have liked to remove it outright, as it is an ABI
> mismatch that is likely to result in instability and non-fun-to-debug
> runtime issues in the future.  I suspect my patch does work for GCC
> 7.1+.  The question is: Do we want to either:
> 1. mark AMDGPU broken for GCC < 7.1, or
> 2. continue supporting it via stack alignment mismatch?
> 
> 2 is brittle, and may break at any point in the future, but if it's
> working for someone it does make me feel bad to outright disable it.
> What I'd image 2 looks like is (psuedo code in a Makefile):
> 
> if CC_IS_GCC && GCC_VERSION < 7.1:
>   set stack alignment to 16B and hope for the best
> 
> So my diff would be amended to keep the stack alignment flags, but
> only to support GCC < 7.1.  And that assumes my change compiles with
> GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> feel even more confident if someone with hardware to test on and GCC
> 7.1+ could boot test).
> -- 
> Thanks,
> ~Nick Desaulniers

If we do keep it, would adding -mstackrealign make it more robust?
That's simple and will only add the alignment to functions that require
16-byte alignment (at least on gcc).

Alternative is to use
__attribute__((force_align_arg_pointer)) on functions that might be
called from 8-byte-aligned code.

It looks like -mstackrealign should work from gcc 5.3 onwards.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
  2019-10-15 20:26       ` Arvind Sankar
@ 2019-10-16  1:51         ` Nick Desaulniers
  2019-10-16 18:55           ` Arvind Sankar
  0 siblings, 1 reply; 11+ messages in thread
From: Nick Desaulniers @ 2019-10-16  1:51 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Arnd Bergmann, S, Shirish, Wentland, Harry, Deucher, Alexander,
	yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S,
	Shirish, Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote:
> > Hmmm...I would have liked to remove it outright, as it is an ABI
> > mismatch that is likely to result in instability and non-fun-to-debug
> > runtime issues in the future.  I suspect my patch does work for GCC
> > 7.1+.  The question is: Do we want to either:
> > 1. mark AMDGPU broken for GCC < 7.1, or
> > 2. continue supporting it via stack alignment mismatch?
> >
> > 2 is brittle, and may break at any point in the future, but if it's
> > working for someone it does make me feel bad to outright disable it.
> > What I'd image 2 looks like is (psuedo code in a Makefile):
> >
> > if CC_IS_GCC && GCC_VERSION < 7.1:
> >   set stack alignment to 16B and hope for the best
> >
> > So my diff would be amended to keep the stack alignment flags, but
> > only to support GCC < 7.1.  And that assumes my change compiles with
> > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> > feel even more confident if someone with hardware to test on and GCC
> > 7.1+ could boot test).
> > --
> > Thanks,
> > ~Nick Desaulniers
>
> If we do keep it, would adding -mstackrealign make it more robust?
> That's simple and will only add the alignment to functions that require
> 16-byte alignment (at least on gcc).

I think there's also `-mincoming-stack-boundary=`.
https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017

>
> Alternative is to use
> __attribute__((force_align_arg_pointer)) on functions that might be
> called from 8-byte-aligned code.

Which is hard to automate and easy to forget.  Likely a large diff to fix today.

>
> It looks like -mstackrealign should work from gcc 5.3 onwards.

The kernel would generally like to support GCC 4.9+.

There's plenty of different ways to keep layering on duct tape and
bailing wire to support differing ABIs, but that's just adding
technical debt that will have to be repaid one day.  That's why the
cleanest solution IMO is mark the driver broken for old toolchains,
and use a code-base-consistent stack alignment.  Bending over
backwards to support old toolchains means accepting stack alignment
mismatches, which is in the "unspecified behavior" ring of the
"undefined behavior" Venn diagram.  I have the same opinion on relying
on explicitly undefined behavior.

I'll send patches for fixing up Clang, but please consider my strong
advice to generally avoid stack alignment mismatches, regardless of
compiler.
--
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
  2019-10-16  1:51         ` Nick Desaulniers
@ 2019-10-16 18:55           ` Arvind Sankar
  2019-10-16 23:05             ` Nick Desaulniers
  0 siblings, 1 reply; 11+ messages in thread
From: Arvind Sankar @ 2019-10-16 18:55 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arvind Sankar, Arnd Bergmann, S, Shirish, Wentland, Harry,
	Deucher, Alexander, yshuiv7, andrew.cooper3, clang-built-linux,
	Matthias Kaehlcke, S, Shirish, Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

On Tue, Oct 15, 2019 at 06:51:26PM -0700, Nick Desaulniers wrote:
> On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
> >
> > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote:
> > > Hmmm...I would have liked to remove it outright, as it is an ABI
> > > mismatch that is likely to result in instability and non-fun-to-debug
> > > runtime issues in the future.  I suspect my patch does work for GCC
> > > 7.1+.  The question is: Do we want to either:
> > > 1. mark AMDGPU broken for GCC < 7.1, or
> > > 2. continue supporting it via stack alignment mismatch?
> > >
> > > 2 is brittle, and may break at any point in the future, but if it's
> > > working for someone it does make me feel bad to outright disable it.
> > > What I'd image 2 looks like is (psuedo code in a Makefile):
> > >
> > > if CC_IS_GCC && GCC_VERSION < 7.1:
> > >   set stack alignment to 16B and hope for the best
> > >
> > > So my diff would be amended to keep the stack alignment flags, but
> > > only to support GCC < 7.1.  And that assumes my change compiles with
> > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> > > feel even more confident if someone with hardware to test on and GCC
> > > 7.1+ could boot test).
> > > --
> > > Thanks,
> > > ~Nick Desaulniers
> >
> > If we do keep it, would adding -mstackrealign make it more robust?
> > That's simple and will only add the alignment to functions that require
> > 16-byte alignment (at least on gcc).
> 
> I think there's also `-mincoming-stack-boundary=`.
> https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017

Yes, but -mstackrealign looks like it's supported by clang as well.
> 
> >
> > Alternative is to use
> > __attribute__((force_align_arg_pointer)) on functions that might be
> > called from 8-byte-aligned code.
> 
> Which is hard to automate and easy to forget.  Likely a large diff to fix today.

Right, this is a no-go, esp to just fix old compilers.
> 
> >
> > It looks like -mstackrealign should work from gcc 5.3 onwards.
> 
> The kernel would generally like to support GCC 4.9+.
> 
> There's plenty of different ways to keep layering on duct tape and
> bailing wire to support differing ABIs, but that's just adding
> technical debt that will have to be repaid one day.  That's why the
> cleanest solution IMO is mark the driver broken for old toolchains,
> and use a code-base-consistent stack alignment.  Bending over
> backwards to support old toolchains means accepting stack alignment
> mismatches, which is in the "unspecified behavior" ring of the
> "undefined behavior" Venn diagram.  I have the same opinion on relying
> on explicitly undefined behavior.
> 
> I'll send patches for fixing up Clang, but please consider my strong
> advice to generally avoid stack alignment mismatches, regardless of
> compiler.
> --
> Thanks,
> ~Nick Desaulniers

What I suggested was in reference to your proposal for dropping the
-mpreferred-stack-boundary=4 for modern compilers, but keeping it for
<7.1. -mstackrealign would at least let 5.3 onwards be less likely to
break (and it doesn't error before then, I think it just doesn't
actually do anything, so no worse than now at least).

Simply dropping support for <7.1 would be cleanest, yes, but it sounds
like people don't want to go that far.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: AMDGPU and 16B stack alignment
  2019-10-16 18:55           ` Arvind Sankar
@ 2019-10-16 23:05             ` Nick Desaulniers
  0 siblings, 0 replies; 11+ messages in thread
From: Nick Desaulniers @ 2019-10-16 23:05 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Arnd Bergmann, S, Shirish, Wentland, Harry, Deucher, Alexander,
	yshuiv7, andrew.cooper3, clang-built-linux, Matthias Kaehlcke, S,
	Shirish, Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx list, LKML

On Wed, Oct 16, 2019 at 11:55 AM Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Tue, Oct 15, 2019 at 06:51:26PM -0700, Nick Desaulniers wrote:
> > On Tue, Oct 15, 2019 at 1:26 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
> > >
> > > On Tue, Oct 15, 2019 at 11:05:56AM -0700, Nick Desaulniers wrote:
> > > > Hmmm...I would have liked to remove it outright, as it is an ABI
> > > > mismatch that is likely to result in instability and non-fun-to-debug
> > > > runtime issues in the future.  I suspect my patch does work for GCC
> > > > 7.1+.  The question is: Do we want to either:
> > > > 1. mark AMDGPU broken for GCC < 7.1, or
> > > > 2. continue supporting it via stack alignment mismatch?
> > > >
> > > > 2 is brittle, and may break at any point in the future, but if it's
> > > > working for someone it does make me feel bad to outright disable it.
> > > > What I'd image 2 looks like is (psuedo code in a Makefile):
> > > >
> > > > if CC_IS_GCC && GCC_VERSION < 7.1:
> > > >   set stack alignment to 16B and hope for the best
> > > >
> > > > So my diff would be amended to keep the stack alignment flags, but
> > > > only to support GCC < 7.1.  And that assumes my change compiles with
> > > > GCC 7.1+. (Looks like it does for me locally with GCC 8.3, but I would
> > > > feel even more confident if someone with hardware to test on and GCC
> > > > 7.1+ could boot test).
> > > > --
> > > > Thanks,
> > > > ~Nick Desaulniers
> > >
> > > If we do keep it, would adding -mstackrealign make it more robust?
> > > That's simple and will only add the alignment to functions that require
> > > 16-byte alignment (at least on gcc).
> >
> > I think there's also `-mincoming-stack-boundary=`.
> > https://github.com/ClangBuiltLinux/linux/issues/735#issuecomment-540038017
>
> Yes, but -mstackrealign looks like it's supported by clang as well.

Good to know, but I want less duct tape, not more.

> >
> > >
> > > Alternative is to use
> > > __attribute__((force_align_arg_pointer)) on functions that might be
> > > called from 8-byte-aligned code.
> >
> > Which is hard to automate and easy to forget.  Likely a large diff to fix today.
>
> Right, this is a no-go, esp to just fix old compilers.
> >
> > >
> > > It looks like -mstackrealign should work from gcc 5.3 onwards.
> >
> > The kernel would generally like to support GCC 4.9+.
> >
> > There's plenty of different ways to keep layering on duct tape and
> > bailing wire to support differing ABIs, but that's just adding
> > technical debt that will have to be repaid one day.  That's why the
> > cleanest solution IMO is mark the driver broken for old toolchains,
> > and use a code-base-consistent stack alignment.  Bending over
> > backwards to support old toolchains means accepting stack alignment
> > mismatches, which is in the "unspecified behavior" ring of the
> > "undefined behavior" Venn diagram.  I have the same opinion on relying
> > on explicitly undefined behavior.
> >
> > I'll send patches for fixing up Clang, but please consider my strong
> > advice to generally avoid stack alignment mismatches, regardless of
> > compiler.
> > --
> > Thanks,
> > ~Nick Desaulniers
>
> What I suggested was in reference to your proposal for dropping the
> -mpreferred-stack-boundary=4 for modern compilers, but keeping it for
> <7.1. -mstackrealign would at least let 5.3 onwards be less likely to
> break (and it doesn't error before then, I think it just doesn't
> actually do anything, so no worse than now at least).
>
> Simply dropping support for <7.1 would be cleanest, yes, but it sounds
> like people don't want to go that far.

That's fair.  I've included your suggestions in the commit message of
02/03 of a series I just sent but forgot to in reply to this thread:
https://lkml.org/lkml/2019/10/16/1700

Also, I do appreciate the suggestions and understand the value of brainstorming.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-10-16 23:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-14 22:22 AMDGPU and 16B stack alignment Nick Desaulniers
     [not found] ` <9e4d6378-5032-8521-13a9-d9d9519d07de@amd.com>
2019-10-15  7:19   ` Arnd Bergmann
2019-10-15 10:48     ` David Laight
2019-10-15 18:05     ` Nick Desaulniers
2019-10-15 18:11       ` Nick Desaulniers
2019-10-15 18:30       ` Alex Deucher
2019-10-15 20:15         ` Nick Desaulniers
2019-10-15 20:26       ` Arvind Sankar
2019-10-16  1:51         ` Nick Desaulniers
2019-10-16 18:55           ` Arvind Sankar
2019-10-16 23:05             ` Nick Desaulniers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).