dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* mainline build failure for x86_64 allmodconfig with clang
@ 2022-08-04 18:36 Sudip Mukherjee (Codethink)
  2022-08-04 18:52 ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Sudip Mukherjee (Codethink) @ 2022-08-04 18:36 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: clang-built-linux, David Airlie, Pan, Xinhui, linux-kernel,
	amd-gfx, dri-devel, Alex Deucher, Linus Torvalds,
	Christian König

Hi All,

The latest mainline kernel branch fails to build for x86_64 allmodconfig
with clang. The errors are:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.c:3596:6: error: stack frame size (2216) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
void dml30_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:3908:6: error: stack frame size (2184) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1726:6: error: stack frame size (2176) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)

git bisect points to 3876a8b5e241 ("drm/amd/display: Enable building new display engine with KCOV enabled").

My last good build for clang was with e2b542100719 ("Merge tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux").

I will be happy to test any patch or provide any extra log if needed.


--
Regards
Sudip

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-04 18:36 mainline build failure for x86_64 allmodconfig with clang Sudip Mukherjee (Codethink)
@ 2022-08-04 18:52 ` Linus Torvalds
  2022-08-04 19:24   ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2022-08-04 18:52 UTC (permalink / raw)
  To: Sudip Mukherjee (Codethink)
  Cc: Pan, Xinhui, David Airlie, clang-built-linux, linux-kernel,
	amd-gfx, Nathan Chancellor, dri-devel, Alex Deucher,
	Christian König

On Thu, Aug 4, 2022 at 11:37 AM Sudip Mukherjee (Codethink)
<sudipm.mukherjee@gmail.com> wrote:
>
> git bisect points to 3876a8b5e241 ("drm/amd/display: Enable building new display engine with KCOV enabled").

Ahh. So that was presumably why it was disabled before - because it
presumably does disgusting things that make KCOV generate even bigger
stack frames than it already has.

Those functions do seem to have fairly big stack footprints already (I
didn't try to look into why, I assume it's partly due to aggressive
inlining, and probably some automatic structures on stack). But gcc
doesn't seem to make it all that much worse with KCOV (and my clang
build doesn't enable KCOV).

So it's presumably some KCOV-vs-clang thing. Nathan?

              Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-04 18:52 ` Linus Torvalds
@ 2022-08-04 19:24   ` Arnd Bergmann
  2022-08-04 20:43     ` Nathan Chancellor
  0 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2022-08-04 19:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pan, Xinhui, David Airlie, clang-built-linux,
	Linux Kernel Mailing List, amd-gfx list, Christian König,
	Nathan Chancellor, dri-devel, Alex Deucher,
	Sudip Mukherjee (Codethink)

On Thu, Aug 4, 2022 at 8:52 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Aug 4, 2022 at 11:37 AM Sudip Mukherjee (Codethink)
> <sudipm.mukherjee@gmail.com> wrote:cov_trace_cmp
> >
> > git bisect points to 3876a8b5e241 ("drm/amd/display: Enable building new display engine with KCOV enabled").
>
> Ahh. So that was presumably why it was disabled before - because it
> presumably does disgusting things that make KCOV generate even bigger
> stack frames than it already has.
>
> Those functions do seem to have fairly big stack footprints already (I
> didn't try to look into why, I assume it's partly due to aggressive
> inlining, and probably some automatic structures on stack). But gcc
> doesn't seem to make it all that much worse with KCOV (and my clang
> build doesn't enable KCOV).
>
> So it's presumably some KCOV-vs-clang thing. Nathan?

The dependency was originally added to avoid a link failure in 9d1d02ff3678
 ("drm/amd/display: Don't build DCN1 when kcov is enabled") after I reported the
problem in https://lists.freedesktop.org/archives/dri-devel/2018-August/186131.html

The commit from the bisection just turns off KCOV for the entire directory
to avoid the link failure, so it's not actually a problem with KCOV vs clang,
but I think a problem with clang vs badly written code that was obscured
in allmodconfig builds prior to this.

The dml30_ModeSupportAndSystemConfigurationFull() function exercises
a few paths in the compiler that are otherwise rare. On thing it does is to
pass up to 60 arguments to other functions, and it heavily uses float and
double variables. Both of these make it rather fragile when it comes to
unusual compiler options, so the files keep coming up whenever a new
instrumentation feature gets added. There is probably some other flag
in allmodconfig that we can disable to improve this again, but I have not
checked this time.

        Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-04 19:24   ` Arnd Bergmann
@ 2022-08-04 20:43     ` Nathan Chancellor
  2022-08-04 21:59       ` Linus Torvalds
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Nathan Chancellor @ 2022-08-04 20:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: clang-built-linux, David Airlie, Pan, Xinhui,
	Linux Kernel Mailing List, amd-gfx list, Christian König,
	dri-devel, Alex Deucher, Linus Torvalds,
	Sudip Mukherjee (Codethink)

On Thu, Aug 04, 2022 at 09:24:41PM +0200, Arnd Bergmann wrote:
> On Thu, Aug 4, 2022 at 8:52 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Thu, Aug 4, 2022 at 11:37 AM Sudip Mukherjee (Codethink)
> > <sudipm.mukherjee@gmail.com> wrote:cov_trace_cmp
> > >
> > > git bisect points to 3876a8b5e241 ("drm/amd/display: Enable building new display engine with KCOV enabled").
> >
> > Ahh. So that was presumably why it was disabled before - because it
> > presumably does disgusting things that make KCOV generate even bigger
> > stack frames than it already has.
> >
> > Those functions do seem to have fairly big stack footprints already (I
> > didn't try to look into why, I assume it's partly due to aggressive
> > inlining, and probably some automatic structures on stack). But gcc
> > doesn't seem to make it all that much worse with KCOV (and my clang
> > build doesn't enable KCOV).
> >
> > So it's presumably some KCOV-vs-clang thing. Nathan?

Looks like Arnd beat me to it :)

> The dependency was originally added to avoid a link failure in 9d1d02ff3678
>  ("drm/amd/display: Don't build DCN1 when kcov is enabled") after I reported the
> problem in https://lists.freedesktop.org/archives/dri-devel/2018-August/186131.html
> 
> The commit from the bisection just turns off KCOV for the entire directory
> to avoid the link failure, so it's not actually a problem with KCOV vs clang,
> but I think a problem with clang vs badly written code that was obscured
> in allmodconfig builds prior to this.

Right, I do think the sanitizers make things worse here too, as those get
enabled with allmodconfig. I ran some really quick tests with allmodconfig and
a few instrumentation options flipped on/off:

allmodconfig (CONFIG_KASAN=y, CONFIG_KCSAN=n, CONFIG_KCOV=y, and CONFIG_UBSAN=y):

warning: stack frame size (2216) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
warning: stack frame size (2184) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
warning: stack frame size (2176) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]

allmodconfig + CONFIG_KASAN=n:

warning: stack frame size (2112) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]

allmodconfig + CONFIG_KCOV=n:

warning: stack frame size (2216) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
warning: stack frame size (2184) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
warning: stack frame size (2176) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]

allmodconfig + CONFIG_UBSAN=n:

warning: stack frame size (2584) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
warning: stack frame size (2680) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
warning: stack frame size (2352) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]

allmodconfig + CONFIG_KASAN=n + CONFIG_KCSAN=y + CONFIG_UBSAN=n:

warning: stack frame size (2504) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
warning: stack frame size (2600) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
warning: stack frame size (2264) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]

allmodconfig + CONFIG_KASAN=n + CONFIG_KCSAN=n + CONFIG_UBSAN=n:

warning: stack frame size (2072) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]

There might be other debugging configurations that make this worse too,
as I don't see those warnings on my distribution configuration.

> The dml30_ModeSupportAndSystemConfigurationFull() function exercises
> a few paths in the compiler that are otherwise rare. On thing it does is to
> pass up to 60 arguments to other functions, and it heavily uses float and
> double variables. Both of these make it rather fragile when it comes to
> unusual compiler options, so the files keep coming up whenever a new
> instrumentation feature gets added. There is probably some other flag
> in allmodconfig that we can disable to improve this again, but I have not
> checked this time.

I do notice that these files build with a non-configurable
-Wframe-large-than value:

$ rg frame_warn_flag drivers/gpu/drm/amd/display/dc/dml/Makefile
54:frame_warn_flag := -Wframe-larger-than=2048
70:CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_mode_vba_30.o := $(dml_ccflags) $(frame_warn_flag)
72:CFLAGS_$(AMDDALPATH)/dc/dml/dcn31/display_mode_vba_31.o := $(dml_ccflags) $(frame_warn_flag)
76:CFLAGS_$(AMDDALPATH)/dc/dml/dcn32/display_mode_vba_32.o := $(dml_ccflags) $(frame_warn_flag)

I suppose that could just be bumped as a quick workaround? Two of those
files have a comment that implies modifying them in non-trivial ways is
not recommended.

/*
 * NOTE:
 *   This file is gcc-parsable HW gospel, coming straight from HW engineers.
 *
 * It doesn't adhere to Linux kernel style and sometimes will do things in odd
 * ways. Unless there is something clearly wrong with it the code should
 * remain as-is as it provides us with a guarantee from HW that it is correct.
 */

I do note that commit 1b54a0121dba ("drm/amd/display: Reduce stack size
in the mode support function") did have a workaround for GCC. It appears
clang will still inline mode_support_configuration(). If I mark it as
'noinline', the warning disappears in that file.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-04 20:43     ` Nathan Chancellor
@ 2022-08-04 21:59       ` Linus Torvalds
  2022-08-04 22:43         ` Nathan Chancellor
  2022-08-05  9:46       ` David Laight
  2022-08-05 15:32       ` Harry Wentland
  2 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2022-08-04 21:59 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Arnd Bergmann, Pan, Xinhui, David Airlie, clang-built-linux,
	Linux Kernel Mailing List, amd-gfx list, Christian König,
	dri-devel, Alex Deucher, Sudip Mukherjee (Codethink)

On Thu, Aug 4, 2022 at 1:43 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> I do note that commit 1b54a0121dba ("drm/amd/display: Reduce stack size
> in the mode support function") did have a workaround for GCC. It appears
> clang will still inline mode_support_configuration(). If I mark it as
> 'noinline', the warning disappears in that file.

That sounds like probably the best option for now. Gcc does not inline
that function (at least for allmodconfig builds in my testing), so if
that makes clang match what gcc does, it seems a reasonable thing to
do.

            Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-04 21:59       ` Linus Torvalds
@ 2022-08-04 22:43         ` Nathan Chancellor
  0 siblings, 0 replies; 17+ messages in thread
From: Nathan Chancellor @ 2022-08-04 22:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Pan, Xinhui, David Airlie, clang-built-linux,
	Linux Kernel Mailing List, amd-gfx list, Christian König,
	dri-devel, Alex Deucher, Sudip Mukherjee (Codethink)

On Thu, Aug 04, 2022 at 02:59:01PM -0700, Linus Torvalds wrote:
> On Thu, Aug 4, 2022 at 1:43 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > I do note that commit 1b54a0121dba ("drm/amd/display: Reduce stack size
> > in the mode support function") did have a workaround for GCC. It appears
> > clang will still inline mode_support_configuration(). If I mark it as
> > 'noinline', the warning disappears in that file.
> 
> That sounds like probably the best option for now. Gcc does not inline
> that function (at least for allmodconfig builds in my testing), so if
> that makes clang match what gcc does, it seems a reasonable thing to
> do.

Sounds good. That solution only takes care of the warning in
display_mode_vba_32.c. I will try and come up with something similar for
the other two files tomorrow, unless the AMD folks beat me to it, since
they will know the driver better than I will ;)

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: mainline build failure for x86_64 allmodconfig with clang
  2022-08-04 20:43     ` Nathan Chancellor
  2022-08-04 21:59       ` Linus Torvalds
@ 2022-08-05  9:46       ` David Laight
  2022-08-05 15:32       ` Harry Wentland
  2 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2022-08-05  9:46 UTC (permalink / raw)
  To: 'Nathan Chancellor', Arnd Bergmann
  Cc: clang-built-linux, David Airlie, Pan, Xinhui,
	Linux Kernel Mailing List, amd-gfx list, Christian König,
	dri-devel, Alex Deucher, Linus Torvalds,
	Sudip Mukherjee (Codethink)

...
>  * NOTE:
>  *   This file is gcc-parsable HW gospel, coming straight from HW engineers.

I never trust hardware engineers to write code :-)
(Although at the moment they trust me to write VHDL...)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-04 20:43     ` Nathan Chancellor
  2022-08-04 21:59       ` Linus Torvalds
  2022-08-05  9:46       ` David Laight
@ 2022-08-05 15:32       ` Harry Wentland
  2022-08-05 16:16         ` Arnd Bergmann
  2 siblings, 1 reply; 17+ messages in thread
From: Harry Wentland @ 2022-08-05 15:32 UTC (permalink / raw)
  To: Nathan Chancellor, Arnd Bergmann, Siqueira, Rodrigo
  Cc: clang-built-linux, David Airlie, Pan, Xinhui,
	Linux Kernel Mailing List, amd-gfx list,
	Sudip Mukherjee (Codethink),
	dri-devel, Alex Deucher, Linus Torvalds, Christian König



On 2022-08-04 16:43, Nathan Chancellor wrote:
> On Thu, Aug 04, 2022 at 09:24:41PM +0200, Arnd Bergmann wrote:
>> On Thu, Aug 4, 2022 at 8:52 PM Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> On Thu, Aug 4, 2022 at 11:37 AM Sudip Mukherjee (Codethink)
>>> <sudipm.mukherjee@gmail.com> wrote:cov_trace_cmp
>>>>
>>>> git bisect points to 3876a8b5e241 ("drm/amd/display: Enable building new display engine with KCOV enabled").
>>>
>>> Ahh. So that was presumably why it was disabled before - because it
>>> presumably does disgusting things that make KCOV generate even bigger
>>> stack frames than it already has.
>>>
>>> Those functions do seem to have fairly big stack footprints already (I
>>> didn't try to look into why, I assume it's partly due to aggressive
>>> inlining, and probably some automatic structures on stack). But gcc
>>> doesn't seem to make it all that much worse with KCOV (and my clang
>>> build doesn't enable KCOV).
>>>
>>> So it's presumably some KCOV-vs-clang thing. Nathan?
> 
> Looks like Arnd beat me to it :)
> 
>> The dependency was originally added to avoid a link failure in 9d1d02ff3678
>>  ("drm/amd/display: Don't build DCN1 when kcov is enabled") after I reported the
>> problem in https://lists.freedesktop.org/archives/dri-devel/2018-August/186131.html>>>
>> The commit from the bisection just turns off KCOV for the entire directory
>> to avoid the link failure, so it's not actually a problem with KCOV vs clang,
>> but I think a problem with clang vs badly written code that was obscured
>> in allmodconfig builds prior to this.
> 
> Right, I do think the sanitizers make things worse here too, as those get
> enabled with allmodconfig. I ran some really quick tests with allmodconfig and
> a few instrumentation options flipped on/off:
> 
> allmodconfig (CONFIG_KASAN=y, CONFIG_KCSAN=n, CONFIG_KCOV=y, and CONFIG_UBSAN=y):
> 
> warning: stack frame size (2216) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> warning: stack frame size (2184) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> warning: stack frame size (2176) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> 
> allmodconfig + CONFIG_KASAN=n:
> 
> warning: stack frame size (2112) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> 
> allmodconfig + CONFIG_KCOV=n:
> 
> warning: stack frame size (2216) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> warning: stack frame size (2184) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> warning: stack frame size (2176) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> 
> allmodconfig + CONFIG_UBSAN=n:
> 
> warning: stack frame size (2584) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> warning: stack frame size (2680) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> warning: stack frame size (2352) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> 
> allmodconfig + CONFIG_KASAN=n + CONFIG_KCSAN=y + CONFIG_UBSAN=n:
> 
> warning: stack frame size (2504) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> warning: stack frame size (2600) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> warning: stack frame size (2264) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> 
> allmodconfig + CONFIG_KASAN=n + CONFIG_KCSAN=n + CONFIG_UBSAN=n:
> 
> warning: stack frame size (2072) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
> 
> There might be other debugging configurations that make this worse too,
> as I don't see those warnings on my distribution configuration.
> 
>> The dml30_ModeSupportAndSystemConfigurationFull() function exercises
>> a few paths in the compiler that are otherwise rare. On thing it does is to
>> pass up to 60 arguments to other functions, and it heavily uses float and
>> double variables. Both of these make it rather fragile when it comes to
>> unusual compiler options, so the files keep coming up whenever a new
>> instrumentation feature gets added. There is probably some other flag
>> in allmodconfig that we can disable to improve this again, but I have not
>> checked this time.
> 
> I do notice that these files build with a non-configurable
> -Wframe-large-than value:
> 
> $ rg frame_warn_flag drivers/gpu/drm/amd/display/dc/dml/Makefile
> 54:frame_warn_flag := -Wframe-larger-than=2048

Tbh, I was looking at the history and I can't find a good reason this
was added. It should be safe to drop this. I would much rather use
the CONFIG_FRAME_WARN value than override it.

AFAIK most builds use 2048 by default anyways.

> 70:CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_mode_vba_30.o := $(dml_ccflags) $(frame_warn_flag)
> 72:CFLAGS_$(AMDDALPATH)/dc/dml/dcn31/display_mode_vba_31.o := $(dml_ccflags) $(frame_warn_flag)
> 76:CFLAGS_$(AMDDALPATH)/dc/dml/dcn32/display_mode_vba_32.o := $(dml_ccflags) $(frame_warn_flag)
> 
> I suppose that could just be bumped as a quick workaround? Two of those
> files have a comment that implies modifying them in non-trivial ways is
> not recommended.
> 
> /*
>  * NOTE:
>  *   This file is gcc-parsable HW gospel, coming straight from HW engineers.
>  *
>  * It doesn't adhere to Linux kernel style and sometimes will do things in odd
>  * ways. Unless there is something clearly wrong with it the code should
>  * remain as-is as it provides us with a guarantee from HW that it is correct.
>  */
> 
> I do note that commit 1b54a0121dba ("drm/amd/display: Reduce stack size
> in the mode support function") did have a workaround for GCC. It appears
> clang will still inline mode_support_configuration(). If I mark it as
> 'noinline', the warning disappears in that file.
> 

That'd be the best quick fix. I guess if we split out functions to fix
stack usage we should mark them as 'noinline' in the future to avoid
agressive compiler optimizations.

Harry

> Cheers,
> Nathan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-05 15:32       ` Harry Wentland
@ 2022-08-05 16:16         ` Arnd Bergmann
  2022-08-05 18:02           ` Nathan Chancellor
  0 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2022-08-05 16:16 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Pan, Xinhui, David Airlie, clang-built-linux, Siqueira, Rodrigo,
	Linux Kernel Mailing List, amd-gfx list,
	Sudip Mukherjee (Codethink),
	Nathan Chancellor, dri-devel, Alex Deucher, Linus Torvalds,
	Christian König

On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@amd.com> wrote:
> > I do notice that these files build with a non-configurable
> > -Wframe-large-than value:
> >
> > $ rg frame_warn_flag drivers/gpu/drm/amd/display/dc/dml/Makefile
> > 54:frame_warn_flag := -Wframe-larger-than=2048
>
> Tbh, I was looking at the history and I can't find a good reason this
> was added. It should be safe to drop this. I would much rather use
> the CONFIG_FRAME_WARN value than override it.
>
> AFAIK most builds use 2048 by default anyways.

I'm fairly sure this was done for 32-bit builds, which default to a lower
warning limit of 1024 bytes and would otherwise run into this
problem when 64-bit platforms don't. With the default warning limit,
clang warns even more about an i386 build:

display/dc/dml/dcn20/display_rq_dlg_calc_20.c:1549:6: error: stack
frame size (1324) exceeds limit (1024) in 'dml20_rq_dlg_get_dlg_reg'
display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:1550:6: error: stack
frame size (1324) exceeds limit (1024) in 'dml20v2_rq_dlg_get_dlg_reg'
display/dc/dml/dcn30/display_rq_dlg_calc_30.c:1742:6: error: stack
frame size (1484) exceeds limit (1024) in 'dml30_rq_dlg_get_dlg_reg'
display/dc/dml/dcn31/display_rq_dlg_calc_31.c:1571:6: error: stack
frame size (1548) exceeds limit (1024) in 'dml31_rq_dlg_get_dlg_reg'
display/dc/dml/dcn21/display_rq_dlg_calc_21.c:1657:6: error: stack
frame size (1388) exceeds limit (1024) in 'dml21_rq_dlg_get_dlg_reg'
display/dc/dml/dcn32/display_rq_dlg_calc_32.c:206:6: error: stack
frame size (1276) exceeds limit (1024) in 'dml32_rq_dlg_get_dlg_reg'
display/dc/dml/dcn31/display_mode_vba_31.c:2049:13: error: stack frame
size (1468) exceeds limit (1024) in
'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
display/dc/dml/dcn20/display_mode_vba_20v2.c:1145:13: error: stack
frame size (1228) exceeds limit (1024) in
'dml20v2_DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
display/dc/dml/dcn20/display_mode_vba_20.c:1085:13: error: stack frame
size (1340) exceeds limit (1024) in
'dml20_DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
display/dc/dml/dcn31/display_mode_vba_31.c:3908:6: error: stack frame
size (1996) exceeds limit (1024) in
'dml31_ModeSupportAndSystemConfigurationFull'
display/dc/dml/dcn21/display_mode_vba_21.c:1466:13: error: stack frame
size (1308) exceeds limit (1024) in
'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
display/dc/dml/dcn20/display_mode_vba_20v2.c:3393:6: error: stack
frame size (1356) exceeds limit (1024) in
'dml20v2_ModeSupportAndSystemConfigurationFull'
display/dc/dml/dcn20/display_mode_vba_20.c:3286:6: error: stack frame
size (1468) exceeds limit (1024) in
'dml20_ModeSupportAndSystemConfigurationFull'
display/dc/dml/dcn21/display_mode_vba_21.c:3518:6: error: stack frame
size (1228) exceeds limit (1024) in
'dml21_ModeSupportAndSystemConfigurationFull'
display/dc/dml/dcn30/display_mode_vba_30.c:1906:13: error: stack frame
size (1436) exceeds limit (1024) in
'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
display/dc/dml/dcn30/display_mode_vba_30.c:3596:6: error: stack frame
size (2092) exceeds limit (1024) in
'dml30_ModeSupportAndSystemConfigurationFull'
> > I do note that commit 1b54a0121dba ("drm/amd/display: Reduce stack size
> > in the mode support function") did have a workaround for GCC. It appears
> > clang will still inline mode_support_configuration(). If I mark it as
> > 'noinline', the warning disappears in that file.
>
> That'd be the best quick fix. I guess if we split out functions to fix
> stack usage we should mark them as 'noinline' in the future to avoid
> agressive compiler optimizations.

While splitting out sub-functions can help reduce the maximum stack
usage, it seems that in this case it makes the actual problem worse:
I see 2168 bytes for the combined
dml32_ModeSupportAndSystemConfigurationFull(), but marking
mode_support_configuration() as noinline gives me 1992 bytes
for the outer function plus 384 bytes for the inner one. So it does
avoid the warning (barely), but not the problem that the warning tries
to point out.

        Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-05 16:16         ` Arnd Bergmann
@ 2022-08-05 18:02           ` Nathan Chancellor
  2022-08-05 19:32             ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Nathan Chancellor @ 2022-08-05 18:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Pan, Xinhui, David Airlie, Linus Torvalds, clang-built-linux,
	Siqueira, Rodrigo, Linux Kernel Mailing List, amd-gfx list,
	Sudip Mukherjee (Codethink),
	dri-devel, Alex Deucher, Christian König

On Fri, Aug 05, 2022 at 06:16:45PM +0200, Arnd Bergmann wrote:
> On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@amd.com> wrote:
> > > I do notice that these files build with a non-configurable
> > > -Wframe-large-than value:
> > >
> > > $ rg frame_warn_flag drivers/gpu/drm/amd/display/dc/dml/Makefile
> > > 54:frame_warn_flag := -Wframe-larger-than=2048
> >
> > Tbh, I was looking at the history and I can't find a good reason this
> > was added. It should be safe to drop this. I would much rather use
> > the CONFIG_FRAME_WARN value than override it.
> >
> > AFAIK most builds use 2048 by default anyways.
> 
> I'm fairly sure this was done for 32-bit builds, which default to a lower
> warning limit of 1024 bytes and would otherwise run into this
> problem when 64-bit platforms don't. With the default warning limit,
> clang warns even more about an i386 build:
> 
> display/dc/dml/dcn20/display_rq_dlg_calc_20.c:1549:6: error: stack
> frame size (1324) exceeds limit (1024) in 'dml20_rq_dlg_get_dlg_reg'
> display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:1550:6: error: stack
> frame size (1324) exceeds limit (1024) in 'dml20v2_rq_dlg_get_dlg_reg'
> display/dc/dml/dcn30/display_rq_dlg_calc_30.c:1742:6: error: stack
> frame size (1484) exceeds limit (1024) in 'dml30_rq_dlg_get_dlg_reg'
> display/dc/dml/dcn31/display_rq_dlg_calc_31.c:1571:6: error: stack
> frame size (1548) exceeds limit (1024) in 'dml31_rq_dlg_get_dlg_reg'
> display/dc/dml/dcn21/display_rq_dlg_calc_21.c:1657:6: error: stack
> frame size (1388) exceeds limit (1024) in 'dml21_rq_dlg_get_dlg_reg'
> display/dc/dml/dcn32/display_rq_dlg_calc_32.c:206:6: error: stack
> frame size (1276) exceeds limit (1024) in 'dml32_rq_dlg_get_dlg_reg'
> display/dc/dml/dcn31/display_mode_vba_31.c:2049:13: error: stack frame
> size (1468) exceeds limit (1024) in
> 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
> display/dc/dml/dcn20/display_mode_vba_20v2.c:1145:13: error: stack
> frame size (1228) exceeds limit (1024) in
> 'dml20v2_DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
> display/dc/dml/dcn20/display_mode_vba_20.c:1085:13: error: stack frame
> size (1340) exceeds limit (1024) in
> 'dml20_DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
> display/dc/dml/dcn31/display_mode_vba_31.c:3908:6: error: stack frame
> size (1996) exceeds limit (1024) in
> 'dml31_ModeSupportAndSystemConfigurationFull'
> display/dc/dml/dcn21/display_mode_vba_21.c:1466:13: error: stack frame
> size (1308) exceeds limit (1024) in
> 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
> display/dc/dml/dcn20/display_mode_vba_20v2.c:3393:6: error: stack
> frame size (1356) exceeds limit (1024) in
> 'dml20v2_ModeSupportAndSystemConfigurationFull'
> display/dc/dml/dcn20/display_mode_vba_20.c:3286:6: error: stack frame
> size (1468) exceeds limit (1024) in
> 'dml20_ModeSupportAndSystemConfigurationFull'
> display/dc/dml/dcn21/display_mode_vba_21.c:3518:6: error: stack frame
> size (1228) exceeds limit (1024) in
> 'dml21_ModeSupportAndSystemConfigurationFull'
> display/dc/dml/dcn30/display_mode_vba_30.c:1906:13: error: stack frame
> size (1436) exceeds limit (1024) in
> 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
> display/dc/dml/dcn30/display_mode_vba_30.c:3596:6: error: stack frame
> size (2092) exceeds limit (1024) in
> 'dml30_ModeSupportAndSystemConfigurationFull'
> > > I do note that commit 1b54a0121dba ("drm/amd/display: Reduce stack size
> > > in the mode support function") did have a workaround for GCC. It appears
> > > clang will still inline mode_support_configuration(). If I mark it as
> > > 'noinline', the warning disappears in that file.
> >
> > That'd be the best quick fix. I guess if we split out functions to fix
> > stack usage we should mark them as 'noinline' in the future to avoid
> > agressive compiler optimizations.
> 
> While splitting out sub-functions can help reduce the maximum stack
> usage, it seems that in this case it makes the actual problem worse:
> I see 2168 bytes for the combined
> dml32_ModeSupportAndSystemConfigurationFull(), but marking
> mode_support_configuration() as noinline gives me 1992 bytes
> for the outer function plus 384 bytes for the inner one. So it does
> avoid the warning (barely), but not the problem that the warning tries
> to point out.

I haven't had a chance to take a look at splitting things up yet, would
you recommend a different approach?

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-05 18:02           ` Nathan Chancellor
@ 2022-08-05 19:32             ` Arnd Bergmann
  2022-08-07 17:36               ` David Laight
  2022-08-18 15:59               ` Nathan Chancellor
  0 siblings, 2 replies; 17+ messages in thread
From: Arnd Bergmann @ 2022-08-05 19:32 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Pan, Xinhui, David Airlie, Linus Torvalds, clang-built-linux,
	Siqueira, Rodrigo, Linux Kernel Mailing List, amd-gfx list,
	Sudip Mukherjee (Codethink),
	dri-devel, Alex Deucher, Christian König

On Fri, Aug 5, 2022 at 8:02 PM Nathan Chancellor <nathan@kernel.org> wrote:
> On Fri, Aug 05, 2022 at 06:16:45PM +0200, Arnd Bergmann wrote:
> > On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@amd.com> wrote:
> > While splitting out sub-functions can help reduce the maximum stack
> > usage, it seems that in this case it makes the actual problem worse:
> > I see 2168 bytes for the combined
> > dml32_ModeSupportAndSystemConfigurationFull(), but marking
> > mode_support_configuration() as noinline gives me 1992 bytes
> > for the outer function plus 384 bytes for the inner one. So it does
> > avoid the warning (barely), but not the problem that the warning tries
> > to point out.
>
> I haven't had a chance to take a look at splitting things up yet, would
> you recommend a different approach?

Splitting up large functions can help when you have large local variables
that are used in different parts of the function, and the split gets the
compiler to reuse stack locations.

I think in this particular function, the problem isn't actually local variables
but either pushing variables on the stack for argument passing,
or something that causes the compiler to run out of registers so it
has to spill registers to the stack.

In either case, one has to actually look at the generated output
and then try to rearrange the codes so this does not happen.

One thing to try would be to condense a function call like

                dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(

&v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport,
                        mode_lib->vba.USRRetrainingRequiredFinal,
                        mode_lib->vba.UsesMALLForPStateChange,

mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb],
                        mode_lib->vba.NumberOfActiveSurfaces,
                        mode_lib->vba.MaxLineBufferLines,
                        mode_lib->vba.LineBufferSizeFinal,
                        mode_lib->vba.WritebackInterfaceBufferSize,
                        mode_lib->vba.DCFCLK,
                        mode_lib->vba.ReturnBW,
                        mode_lib->vba.SynchronizeTimingsFinal,

mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
                        mode_lib->vba.DRRDisplay,
                        v->dpte_group_bytes,
                        v->meta_row_height,
                        v->meta_row_height_chroma,

v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters,
                        mode_lib->vba.WritebackChunkSize,
                        mode_lib->vba.SOCCLK,
                        v->DCFCLKDeepSleep,
                        mode_lib->vba.DETBufferSizeY,
                        mode_lib->vba.DETBufferSizeC,
                        mode_lib->vba.SwathHeightY,
                        mode_lib->vba.SwathHeightC,
                        mode_lib->vba.LBBitPerPixel,
                        v->SwathWidthY,
                        v->SwathWidthC,
                        mode_lib->vba.HRatio,
                        mode_lib->vba.HRatioChroma,
                        mode_lib->vba.vtaps,
                        mode_lib->vba.VTAPsChroma,
                        mode_lib->vba.VRatio,
                        mode_lib->vba.VRatioChroma,
                        mode_lib->vba.HTotal,
                        mode_lib->vba.VTotal,
                        mode_lib->vba.VActive,
                        mode_lib->vba.PixelClock,
                        mode_lib->vba.BlendingAndTiming,
                        .... /* more arguments */);

into calling conventions that take a pointer to 'mode_lib->vba' and another
one to 'v', so these are no longer passed on the stack individually.

       Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: mainline build failure for x86_64 allmodconfig with clang
  2022-08-05 19:32             ` Arnd Bergmann
@ 2022-08-07 17:36               ` David Laight
  2022-08-07 17:55                 ` Linus Torvalds
  2022-08-18 15:59               ` Nathan Chancellor
  1 sibling, 1 reply; 17+ messages in thread
From: David Laight @ 2022-08-07 17:36 UTC (permalink / raw)
  To: 'Arnd Bergmann', Nathan Chancellor
  Cc: Pan, Xinhui, David Airlie, Linus Torvalds, clang-built-linux,
	Siqueira, Rodrigo, Linux Kernel Mailing List, amd-gfx list,
	Sudip Mukherjee (Codethink),
	dri-devel, Alex Deucher, Christian König

From: Arnd Bergmann
> Sent: 05 August 2022 20:32
...
> One thing to try would be to condense a function call like
> 
>                 dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
> 
...
>                         .... /* more arguments */);
> 
> into calling conventions that take a pointer to 'mode_lib->vba' and another
> one to 'v', so these are no longer passed on the stack individually.

Or, if it is only called once (I can't find the source)
force it to be inlined.

Or just shoot the software engineer who thinks 100 arguments
is sane. :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-07 17:36               ` David Laight
@ 2022-08-07 17:55                 ` Linus Torvalds
  0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2022-08-07 17:55 UTC (permalink / raw)
  To: David Laight
  Cc: Arnd Bergmann, Pan, Xinhui, David Airlie, clang-built-linux,
	Siqueira, Rodrigo, Linux Kernel Mailing List, amd-gfx list,
	Sudip Mukherjee (Codethink),
	Nathan Chancellor, dri-devel, Alex Deucher, Christian König

On Sun, Aug 7, 2022 at 10:36 AM David Laight <David.Laight@aculab.com> wrote:
>
> Or just shoot the software engineer who thinks 100 arguments
> is sane. :-)

I suspect the issue is that it's not primarily a software engineer who
wrote that code.

Hardware people writing code are about as scary as software engineers
with a soldering iron.

               Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-05 19:32             ` Arnd Bergmann
  2022-08-07 17:36               ` David Laight
@ 2022-08-18 15:59               ` Nathan Chancellor
  2022-08-25 22:34                 ` Nathan Chancellor
  1 sibling, 1 reply; 17+ messages in thread
From: Nathan Chancellor @ 2022-08-18 15:59 UTC (permalink / raw)
  To: Arnd Bergmann, Harry Wentland, Siqueira, Rodrigo, Pan, Xinhui,
	Christian König, Alex Deucher
  Cc: David Airlie, clang-built-linux, Linux Kernel Mailing List,
	dri-devel, amd-gfx list, Linus Torvalds,
	Sudip Mukherjee (Codethink)

[-- Attachment #1: Type: text/plain, Size: 8557 bytes --]

Hi Arnd,

Doubling back around to this now since I think this is the only thing
breaking x86_64 allmodconfig with clang 11 through 15.

On Fri, Aug 05, 2022 at 09:32:13PM +0200, Arnd Bergmann wrote:
> On Fri, Aug 5, 2022 at 8:02 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > On Fri, Aug 05, 2022 at 06:16:45PM +0200, Arnd Bergmann wrote:
> > > On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@amd.com> wrote:
> > > While splitting out sub-functions can help reduce the maximum stack
> > > usage, it seems that in this case it makes the actual problem worse:
> > > I see 2168 bytes for the combined
> > > dml32_ModeSupportAndSystemConfigurationFull(), but marking
> > > mode_support_configuration() as noinline gives me 1992 bytes
> > > for the outer function plus 384 bytes for the inner one. So it does
> > > avoid the warning (barely), but not the problem that the warning tries
> > > to point out.
> >
> > I haven't had a chance to take a look at splitting things up yet, would
> > you recommend a different approach?
> 
> Splitting up large functions can help when you have large local variables
> that are used in different parts of the function, and the split gets the
> compiler to reuse stack locations.
> 
> I think in this particular function, the problem isn't actually local variables
> but either pushing variables on the stack for argument passing,
> or something that causes the compiler to run out of registers so it
> has to spill registers to the stack.
> 
> In either case, one has to actually look at the generated output
> and then try to rearrange the codes so this does not happen.
> 
> One thing to try would be to condense a function call like
> 
>                 dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
> 
> &v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport,
>                         mode_lib->vba.USRRetrainingRequiredFinal,
>                         mode_lib->vba.UsesMALLForPStateChange,
> 
> mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb],
>                         mode_lib->vba.NumberOfActiveSurfaces,
>                         mode_lib->vba.MaxLineBufferLines,
>                         mode_lib->vba.LineBufferSizeFinal,
>                         mode_lib->vba.WritebackInterfaceBufferSize,
>                         mode_lib->vba.DCFCLK,
>                         mode_lib->vba.ReturnBW,
>                         mode_lib->vba.SynchronizeTimingsFinal,
> 
> mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
>                         mode_lib->vba.DRRDisplay,
>                         v->dpte_group_bytes,
>                         v->meta_row_height,
>                         v->meta_row_height_chroma,
> 
> v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters,
>                         mode_lib->vba.WritebackChunkSize,
>                         mode_lib->vba.SOCCLK,
>                         v->DCFCLKDeepSleep,
>                         mode_lib->vba.DETBufferSizeY,
>                         mode_lib->vba.DETBufferSizeC,
>                         mode_lib->vba.SwathHeightY,
>                         mode_lib->vba.SwathHeightC,
>                         mode_lib->vba.LBBitPerPixel,
>                         v->SwathWidthY,
>                         v->SwathWidthC,
>                         mode_lib->vba.HRatio,
>                         mode_lib->vba.HRatioChroma,
>                         mode_lib->vba.vtaps,
>                         mode_lib->vba.VTAPsChroma,
>                         mode_lib->vba.VRatio,
>                         mode_lib->vba.VRatioChroma,
>                         mode_lib->vba.HTotal,
>                         mode_lib->vba.VTotal,
>                         mode_lib->vba.VActive,
>                         mode_lib->vba.PixelClock,
>                         mode_lib->vba.BlendingAndTiming,
>                         .... /* more arguments */);
> 
> into calling conventions that take a pointer to 'mode_lib->vba' and another
> one to 'v', so these are no longer passed on the stack individually.

So I took a whack at reducing this function's number of parameters and
ended up with the attached patch. I basically just removed any
parameters that were identical between the two call sites and access them
through the vba pointer, as you suggested.

AMD folks, is this an acceptable approach? It didn't take a trivial
amount of time so I want to make sure this is okay before I do it to
more functions/files.

Due to the potential size of these changes, I am a little weary of them
going into 6.0; even though they should be a simple search and replace
for the most part, it might be nice for them to have some decent soak
time in -next. One solution would be to raise the warning limit for
these files on 6.0 so that allmodconfig does not ship broken then reduce
the limit for 6.1 once these patches have been applied.

Additionally, I took a look at the stack usage across all compilers that
the kernel supports and I thought it was kind of interesting that the
usage really jumps from GCC 7 to 8, which I am guessing is a result of
commit e8a170ff9a35 ("drm/amdgpu: enable -msse2 for GCC 7.1+ users").
GCC 8 allmodconfig actually errors now too:

https://lore.kernel.org/alpine.DEB.2.22.394.2208152006320.289321@ramsan.of.borg/

          |-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
          | dml30_ModeSupportAndSystemConfigurationFull() | dml31_ModeSupportAndSystemConfigurationFull() | dml32_ModeSupportAndSystemConfigurationFull() |
|---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
| GCC 5   |                  1056 bytes                   |                   656 bytes                   |                  1040 bytes                   |
| GCC 6   |                  1024 bytes                   |                   672 bytes                   |                  1056 bytes                   |
| GCC 7   |                  1040 bytes                   |                   664 bytes                   |                  1056 bytes                   |
| GCC 8   |                  1760 bytes                   |                  1608 bytes                   |                  2144 bytes                   |
| GCC 9   |                  1664 bytes                   |                  1392 bytes                   |                  1960 bytes                   |
| GCC 10  |                  1648 bytes                   |                  1368 bytes                   |                  1952 bytes                   |
| GCC 11  |                  1680 bytes                   |                  1400 bytes                   |                  1952 bytes                   |
| GCC 12  |                  1680 bytes                   |                  1400 bytes                   |                  1984 bytes                   |
|---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
| LLVM 11 |                  2104 bytes                   |                  2056 bytes                   |                  2120 bytes                   |
| LLVM 12 |                  2152 bytes                   |                  2200 bytes                   |                  2152 bytes                   |
| LLVM 13 |                  2216 bytes                   |                  2248 bytes                   |                  2168 bytes                   |
| LLVM 14 |                  2168 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
| LLVM 15 |                  2216 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
| LLVM 16 |                  2232 bytes                   |                  2216 bytes                   |                  2176 bytes                   |
|---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|

With the patch I have attached,
dml32_ModeSupportAndSystemConfigurationFull() drops from 2176 to 1944
for LLVM 16, which is obviously still not great but it at least avoids
the warning.

Cheers,
Nathan

[-- Attachment #2: 0001-drm-amd-display-Reduce-number-of-arguments-to-dml32_.patch --]
[-- Type: text/plain, Size: 36924 bytes --]

From 193f6f6f708df6949b0c4df9fe99903e1252211a Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan@kernel.org>
Date: Thu, 18 Aug 2022 08:36:51 -0700
Subject: [PATCH] drm/amd/display: Reduce number of arguments to
 dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport()

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 .../dc/dml/dcn32/display_mode_vba_32.c        |  72 +----
 .../dc/dml/dcn32/display_mode_vba_util_32.c   | 259 ++++++++----------
 .../dc/dml/dcn32/display_mode_vba_util_32.h   |  36 +--
 3 files changed, 120 insertions(+), 247 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
index 890612db08dc..58d4bc5ebcd5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_32.c
@@ -1167,66 +1167,34 @@ static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerforman
 		v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters.SMNLatency = mode_lib->vba.SMNLatency;
 
 		dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
-			&v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport,
-			mode_lib->vba.USRRetrainingRequiredFinal,
-			mode_lib->vba.UsesMALLForPStateChange,
+			v,
 			mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb],
-			mode_lib->vba.NumberOfActiveSurfaces,
-			mode_lib->vba.MaxLineBufferLines,
-			mode_lib->vba.LineBufferSizeFinal,
-			mode_lib->vba.WritebackInterfaceBufferSize,
 			mode_lib->vba.DCFCLK,
 			mode_lib->vba.ReturnBW,
-			mode_lib->vba.SynchronizeTimingsFinal,
-			mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
-			mode_lib->vba.DRRDisplay,
-			v->dpte_group_bytes,
-			v->meta_row_height,
-			v->meta_row_height_chroma,
 			v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters,
-			mode_lib->vba.WritebackChunkSize,
 			mode_lib->vba.SOCCLK,
 			v->DCFCLKDeepSleep,
 			mode_lib->vba.DETBufferSizeY,
 			mode_lib->vba.DETBufferSizeC,
 			mode_lib->vba.SwathHeightY,
 			mode_lib->vba.SwathHeightC,
-			mode_lib->vba.LBBitPerPixel,
 			v->SwathWidthY,
 			v->SwathWidthC,
-			mode_lib->vba.HRatio,
-			mode_lib->vba.HRatioChroma,
-			mode_lib->vba.vtaps,
-			mode_lib->vba.VTAPsChroma,
-			mode_lib->vba.VRatio,
-			mode_lib->vba.VRatioChroma,
-			mode_lib->vba.HTotal,
-			mode_lib->vba.VTotal,
-			mode_lib->vba.VActive,
-			mode_lib->vba.PixelClock,
-			mode_lib->vba.BlendingAndTiming,
 			mode_lib->vba.DPPPerPlane,
 			v->BytePerPixelDETY,
 			v->BytePerPixelDETC,
 			v->DSTXAfterScaler,
 			v->DSTYAfterScaler,
-			mode_lib->vba.WritebackEnable,
-			mode_lib->vba.WritebackPixelFormat,
-			mode_lib->vba.WritebackDestinationWidth,
-			mode_lib->vba.WritebackDestinationHeight,
-			mode_lib->vba.WritebackSourceHeight,
 			v->UnboundedRequestEnabled,
 			v->CompressedBufferSizeInkByte,
 
 			/* Output */
-			&v->Watermark,
 			&v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.dummy_dramchange_support,
 			v->MaxActiveDRAMClockChangeLatencySupported,
 			v->SubViewportLinesNeededInMALL,
 			&v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.dummy_fclkchange_support,
 			&v->MinActiveFCLKChangeLatencySupported,
-			&v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.dummy_USRRetrainingSupport,
-			mode_lib->vba.ActiveDRAMClockChangeLatencyMargin);
+			&v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.dummy_USRRetrainingSupport);
 
 		/* DCN32 has a new struct Watermarks (typedef) which is used to store
 		 * calculated WM values. Copy over values from struct to vba varaibles
@@ -3566,66 +3534,34 @@ void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_l
 
 			{
 				dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
-						&v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport,
-						mode_lib->vba.USRRetrainingRequiredFinal,
-						mode_lib->vba.UsesMALLForPStateChange,
+						v,
 						mode_lib->vba.PrefetchModePerState[i][j],
-						mode_lib->vba.NumberOfActiveSurfaces,
-						mode_lib->vba.MaxLineBufferLines,
-						mode_lib->vba.LineBufferSizeFinal,
-						mode_lib->vba.WritebackInterfaceBufferSize,
 						mode_lib->vba.DCFCLKState[i][j],
 						mode_lib->vba.ReturnBWPerState[i][j],
-						mode_lib->vba.SynchronizeTimingsFinal,
-						mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
-						mode_lib->vba.DRRDisplay,
-						mode_lib->vba.dpte_group_bytes,
-						mode_lib->vba.meta_row_height,
-						mode_lib->vba.meta_row_height_chroma,
 						v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.mSOCParameters,
-						mode_lib->vba.WritebackChunkSize,
 						mode_lib->vba.SOCCLKPerState[i],
 						mode_lib->vba.ProjectedDCFCLKDeepSleep[i][j],
 						mode_lib->vba.DETBufferSizeYThisState,
 						mode_lib->vba.DETBufferSizeCThisState,
 						mode_lib->vba.SwathHeightYThisState,
 						mode_lib->vba.SwathHeightCThisState,
-						mode_lib->vba.LBBitPerPixel,
 						mode_lib->vba.SwathWidthYThisState, // 24
 						mode_lib->vba.SwathWidthCThisState,
-						mode_lib->vba.HRatio,
-						mode_lib->vba.HRatioChroma,
-						mode_lib->vba.vtaps,
-						mode_lib->vba.VTAPsChroma,
-						mode_lib->vba.VRatio,
-						mode_lib->vba.VRatioChroma,
-						mode_lib->vba.HTotal,
-						mode_lib->vba.VTotal,
-						mode_lib->vba.VActive,
-						mode_lib->vba.PixelClock,
-						mode_lib->vba.BlendingAndTiming,
 						mode_lib->vba.NoOfDPPThisState,
 						mode_lib->vba.BytePerPixelInDETY,
 						mode_lib->vba.BytePerPixelInDETC,
 						v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.DSTXAfterScaler,
 						v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.DSTYAfterScaler,
-						mode_lib->vba.WritebackEnable,
-						mode_lib->vba.WritebackPixelFormat,
-						mode_lib->vba.WritebackDestinationWidth,
-						mode_lib->vba.WritebackDestinationHeight,
-						mode_lib->vba.WritebackSourceHeight,
 						mode_lib->vba.UnboundedRequestEnabledThisState,
 						mode_lib->vba.CompressedBufferSizeInkByteThisState,
 
 						/* Output */
-						&mode_lib->vba.Watermark, // Store the values in vba
 						&mode_lib->vba.DRAMClockChangeSupport[i][j],
 						&v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_single2[0], // double *MaxActiveDRAMClockChangeLatencySupported
 						&v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_integer[0], // Long SubViewportLinesNeededInMALL[]
 						&mode_lib->vba.FCLKChangeSupport[i][j],
 						&v->dummy_vars.dml32_ModeSupportAndSystemConfigurationFull.dummy_single2[1], // double *MinActiveFCLKChangeLatencySupported
-						&mode_lib->vba.USRRetrainingSupport[i][j],
-						mode_lib->vba.ActiveDRAMClockChangeLatencyMargin);
+						&mode_lib->vba.USRRetrainingSupport[i][j]);
 			}
 		}
 	} // End of Prefetch Check
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
index 07f8f3b8626b..d4917761f3ea 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
@@ -4159,67 +4159,36 @@ void dml32_CalculateFlipSchedule(
 } // CalculateFlipSchedule
 
 void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
-		struct dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport *st_vars,
-		bool USRRetrainingRequiredFinal,
-		enum dm_use_mall_for_pstate_change_mode UseMALLForPStateChange[],
+		struct vba_vars_st *v,
 		unsigned int PrefetchMode,
-		unsigned int NumberOfActiveSurfaces,
-		unsigned int MaxLineBufferLines,
-		unsigned int LineBufferSize,
-		unsigned int WritebackInterfaceBufferSize,
 		double DCFCLK,
 		double ReturnBW,
-		bool SynchronizeTimingsFinal,
-		bool SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
-		bool DRRDisplay[],
-		unsigned int dpte_group_bytes[],
-		unsigned int meta_row_height[],
-		unsigned int meta_row_height_chroma[],
 		SOCParametersList mmSOCParameters,
-		unsigned int WritebackChunkSize,
 		double SOCCLK,
 		double DCFClkDeepSleep,
 		unsigned int DETBufferSizeY[],
 		unsigned int DETBufferSizeC[],
 		unsigned int SwathHeightY[],
 		unsigned int SwathHeightC[],
-		unsigned int LBBitPerPixel[],
 		double SwathWidthY[],
 		double SwathWidthC[],
-		double HRatio[],
-		double HRatioChroma[],
-		unsigned int VTaps[],
-		unsigned int VTapsChroma[],
-		double VRatio[],
-		double VRatioChroma[],
-		unsigned int HTotal[],
-		unsigned int VTotal[],
-		unsigned int VActive[],
-		double PixelClock[],
-		unsigned int BlendingAndTiming[],
 		unsigned int DPPPerSurface[],
 		double BytePerPixelDETY[],
 		double BytePerPixelDETC[],
 		double DSTXAfterScaler[],
 		double DSTYAfterScaler[],
-		bool WritebackEnable[],
-		enum source_format_class WritebackPixelFormat[],
-		double WritebackDestinationWidth[],
-		double WritebackDestinationHeight[],
-		double WritebackSourceHeight[],
 		bool UnboundedRequestEnabled,
 		unsigned int CompressedBufferSizeInkByte,
 
 		/* Output */
-		Watermarks *Watermark,
 		enum clock_change_support *DRAMClockChangeSupport,
 		double MaxActiveDRAMClockChangeLatencySupported[],
 		unsigned int SubViewportLinesNeededInMALL[],
 		enum dm_fclock_change_support *FCLKChangeSupport,
 		double *MinActiveFCLKChangeLatencySupported,
-		bool *USRRetrainingSupport,
-		double ActiveDRAMClockChangeLatencyMargin[])
+		bool *USRRetrainingSupport)
 {
+	struct dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport *st_vars = &v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport;
 	unsigned int i, j, k;
 
 	st_vars->SurfaceWithMinActiveFCLKChangeMargin = 0;
@@ -4231,136 +4200,136 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
 	st_vars->TotalPixelBW = 0.0;
 	st_vars->TotalActiveWriteback = 0;
 
-	Watermark->UrgentWatermark = mmSOCParameters.UrgentLatency + mmSOCParameters.ExtraLatency;
-	Watermark->USRRetrainingWatermark = mmSOCParameters.UrgentLatency + mmSOCParameters.ExtraLatency
+	v->Watermark->UrgentWatermark = mmSOCParameters.UrgentLatency + mmSOCParameters.ExtraLatency;
+	v->Watermark->USRRetrainingWatermark = mmSOCParameters.UrgentLatency + mmSOCParameters.ExtraLatency
 			+ mmSOCParameters.USRRetrainingLatency + mmSOCParameters.SMNLatency;
-	Watermark->DRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency + Watermark->UrgentWatermark;
-	Watermark->FCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency + Watermark->UrgentWatermark;
-	Watermark->StutterExitWatermark = mmSOCParameters.SRExitTime + mmSOCParameters.ExtraLatency
+	v->Watermark->DRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency + v->Watermark->UrgentWatermark;
+	v->Watermark->FCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency + v->Watermark->UrgentWatermark;
+	v->Watermark->StutterExitWatermark = mmSOCParameters.SRExitTime + mmSOCParameters.ExtraLatency
 			+ 10 / DCFClkDeepSleep;
-	Watermark->StutterEnterPlusExitWatermark = mmSOCParameters.SREnterPlusExitTime + mmSOCParameters.ExtraLatency
+	v->Watermark->StutterEnterPlusExitWatermark = mmSOCParameters.SREnterPlusExitTime + mmSOCParameters.ExtraLatency
 			+ 10 / DCFClkDeepSleep;
-	Watermark->Z8StutterExitWatermark = mmSOCParameters.SRExitZ8Time + mmSOCParameters.ExtraLatency
+	v->Watermark->Z8StutterExitWatermark = mmSOCParameters.SRExitZ8Time + mmSOCParameters.ExtraLatency
 			+ 10 / DCFClkDeepSleep;
-	Watermark->Z8StutterEnterPlusExitWatermark = mmSOCParameters.SREnterPlusExitZ8Time
+	v->Watermark->Z8StutterEnterPlusExitWatermark = mmSOCParameters.SREnterPlusExitZ8Time
 			+ mmSOCParameters.ExtraLatency + 10 / DCFClkDeepSleep;
 
 #ifdef __DML_VBA_DEBUG__
 	dml_print("DML::%s: UrgentLatency = %f\n", __func__, mmSOCParameters.UrgentLatency);
 	dml_print("DML::%s: ExtraLatency = %f\n", __func__, mmSOCParameters.ExtraLatency);
 	dml_print("DML::%s: DRAMClockChangeLatency = %f\n", __func__, mmSOCParameters.DRAMClockChangeLatency);
-	dml_print("DML::%s: UrgentWatermark = %f\n", __func__, Watermark->UrgentWatermark);
-	dml_print("DML::%s: USRRetrainingWatermark = %f\n", __func__, Watermark->USRRetrainingWatermark);
-	dml_print("DML::%s: DRAMClockChangeWatermark = %f\n", __func__, Watermark->DRAMClockChangeWatermark);
-	dml_print("DML::%s: FCLKChangeWatermark = %f\n", __func__, Watermark->FCLKChangeWatermark);
-	dml_print("DML::%s: StutterExitWatermark = %f\n", __func__, Watermark->StutterExitWatermark);
-	dml_print("DML::%s: StutterEnterPlusExitWatermark = %f\n", __func__, Watermark->StutterEnterPlusExitWatermark);
-	dml_print("DML::%s: Z8StutterExitWatermark = %f\n", __func__, Watermark->Z8StutterExitWatermark);
+	dml_print("DML::%s: UrgentWatermark = %f\n", __func__, v->Watermark->UrgentWatermark);
+	dml_print("DML::%s: USRRetrainingWatermark = %f\n", __func__, v->Watermark->USRRetrainingWatermark);
+	dml_print("DML::%s: DRAMClockChangeWatermark = %f\n", __func__, v->Watermark->DRAMClockChangeWatermark);
+	dml_print("DML::%s: FCLKChangeWatermark = %f\n", __func__, v->Watermark->FCLKChangeWatermark);
+	dml_print("DML::%s: StutterExitWatermark = %f\n", __func__, v->Watermark->StutterExitWatermark);
+	dml_print("DML::%s: StutterEnterPlusExitWatermark = %f\n", __func__, v->Watermark->StutterEnterPlusExitWatermark);
+	dml_print("DML::%s: Z8StutterExitWatermark = %f\n", __func__, v->Watermark->Z8StutterExitWatermark);
 	dml_print("DML::%s: Z8StutterEnterPlusExitWatermark = %f\n",
-			__func__, Watermark->Z8StutterEnterPlusExitWatermark);
+			__func__, v->Watermark->Z8StutterEnterPlusExitWatermark);
 #endif
 
 
 	st_vars->TotalActiveWriteback = 0;
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
-		if (WritebackEnable[k] == true)
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
+		if (v->WritebackEnable[k] == true)
 			st_vars->TotalActiveWriteback = st_vars->TotalActiveWriteback + 1;
 	}
 
 	if (st_vars->TotalActiveWriteback <= 1) {
-		Watermark->WritebackUrgentWatermark = mmSOCParameters.WritebackLatency;
+		v->Watermark->WritebackUrgentWatermark = mmSOCParameters.WritebackLatency;
 	} else {
-		Watermark->WritebackUrgentWatermark = mmSOCParameters.WritebackLatency
-				+ WritebackChunkSize * 1024.0 / 32.0 / SOCCLK;
+		v->Watermark->WritebackUrgentWatermark = mmSOCParameters.WritebackLatency
+				+ v->WritebackChunkSize * 1024.0 / 32.0 / SOCCLK;
 	}
-	if (USRRetrainingRequiredFinal)
-		Watermark->WritebackUrgentWatermark = Watermark->WritebackUrgentWatermark
+	if (v->USRRetrainingRequiredFinal)
+		v->Watermark->WritebackUrgentWatermark = v->Watermark->WritebackUrgentWatermark
 				+ mmSOCParameters.USRRetrainingLatency;
 
 	if (st_vars->TotalActiveWriteback <= 1) {
-		Watermark->WritebackDRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency
+		v->Watermark->WritebackDRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency
 				+ mmSOCParameters.WritebackLatency;
-		Watermark->WritebackFCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency
+		v->Watermark->WritebackFCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency
 				+ mmSOCParameters.WritebackLatency;
 	} else {
-		Watermark->WritebackDRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency
-				+ mmSOCParameters.WritebackLatency + WritebackChunkSize * 1024.0 / 32.0 / SOCCLK;
-		Watermark->WritebackFCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency
-				+ mmSOCParameters.WritebackLatency + WritebackChunkSize * 1024 / 32 / SOCCLK;
+		v->Watermark->WritebackDRAMClockChangeWatermark = mmSOCParameters.DRAMClockChangeLatency
+				+ mmSOCParameters.WritebackLatency + v->WritebackChunkSize * 1024.0 / 32.0 / SOCCLK;
+		v->Watermark->WritebackFCLKChangeWatermark = mmSOCParameters.FCLKChangeLatency
+				+ mmSOCParameters.WritebackLatency + v->WritebackChunkSize * 1024 / 32 / SOCCLK;
 	}
 
-	if (USRRetrainingRequiredFinal)
-		Watermark->WritebackDRAMClockChangeWatermark = Watermark->WritebackDRAMClockChangeWatermark
+	if (v->USRRetrainingRequiredFinal)
+		v->Watermark->WritebackDRAMClockChangeWatermark = v->Watermark->WritebackDRAMClockChangeWatermark
 				+ mmSOCParameters.USRRetrainingLatency;
 
-	if (USRRetrainingRequiredFinal)
-		Watermark->WritebackFCLKChangeWatermark = Watermark->WritebackFCLKChangeWatermark
+	if (v->USRRetrainingRequiredFinal)
+		v->Watermark->WritebackFCLKChangeWatermark = v->Watermark->WritebackFCLKChangeWatermark
 				+ mmSOCParameters.USRRetrainingLatency;
 
 #ifdef __DML_VBA_DEBUG__
 	dml_print("DML::%s: WritebackDRAMClockChangeWatermark = %f\n",
-			__func__, Watermark->WritebackDRAMClockChangeWatermark);
-	dml_print("DML::%s: WritebackFCLKChangeWatermark = %f\n", __func__, Watermark->WritebackFCLKChangeWatermark);
-	dml_print("DML::%s: WritebackUrgentWatermark = %f\n", __func__, Watermark->WritebackUrgentWatermark);
-	dml_print("DML::%s: USRRetrainingRequiredFinal = %d\n", __func__, USRRetrainingRequiredFinal);
+			__func__, v->Watermark->WritebackDRAMClockChangeWatermark);
+	dml_print("DML::%s: WritebackFCLKChangeWatermark = %f\n", __func__, v->Watermark->WritebackFCLKChangeWatermark);
+	dml_print("DML::%s: WritebackUrgentWatermark = %f\n", __func__, v->Watermark->WritebackUrgentWatermark);
+	dml_print("DML::%s: v->USRRetrainingRequiredFinal = %d\n", __func__, v->USRRetrainingRequiredFinal);
 	dml_print("DML::%s: USRRetrainingLatency = %f\n", __func__, mmSOCParameters.USRRetrainingLatency);
 #endif
 
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
-		st_vars->TotalPixelBW = st_vars->TotalPixelBW + DPPPerSurface[k] * (SwathWidthY[k] * BytePerPixelDETY[k] * VRatio[k] +
-				SwathWidthC[k] * BytePerPixelDETC[k] * VRatioChroma[k]) / (HTotal[k] / PixelClock[k]);
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
+		st_vars->TotalPixelBW = st_vars->TotalPixelBW + DPPPerSurface[k] * (SwathWidthY[k] * BytePerPixelDETY[k] * v->VRatio[k] +
+				SwathWidthC[k] * BytePerPixelDETC[k] * v->VRatioChroma[k]) / (v->HTotal[k] / v->PixelClock[k]);
 	}
 
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
 
-		st_vars->LBLatencyHidingSourceLinesY[k] = dml_min((double) MaxLineBufferLines, dml_floor(LineBufferSize / LBBitPerPixel[k] / (SwathWidthY[k] / dml_max(HRatio[k], 1.0)), 1)) - (VTaps[k] - 1);
-		st_vars->LBLatencyHidingSourceLinesC[k] = dml_min((double) MaxLineBufferLines, dml_floor(LineBufferSize / LBBitPerPixel[k] / (SwathWidthC[k] / dml_max(HRatioChroma[k], 1.0)), 1)) - (VTapsChroma[k] - 1);
+		st_vars->LBLatencyHidingSourceLinesY[k] = dml_min((double) v->MaxLineBufferLines, dml_floor(v->LineBufferSizeFinal / v->LBBitPerPixel[k] / (SwathWidthY[k] / dml_max(v->HRatio[k], 1.0)), 1)) - (v->vtaps[k] - 1);
+		st_vars->LBLatencyHidingSourceLinesC[k] = dml_min((double) v->MaxLineBufferLines, dml_floor(v->LineBufferSizeFinal / v->LBBitPerPixel[k] / (SwathWidthC[k] / dml_max(v->HRatioChroma[k], 1.0)), 1)) - (v->VTAPsChroma[k] - 1);
 
 
 #ifdef __DML_VBA_DEBUG__
-		dml_print("DML::%s: k=%d, MaxLineBufferLines = %d\n", __func__, k, MaxLineBufferLines);
-		dml_print("DML::%s: k=%d, LineBufferSize     = %d\n", __func__, k, LineBufferSize);
-		dml_print("DML::%s: k=%d, LBBitPerPixel      = %d\n", __func__, k, LBBitPerPixel[k]);
-		dml_print("DML::%s: k=%d, HRatio             = %f\n", __func__, k, HRatio[k]);
-		dml_print("DML::%s: k=%d, VTaps              = %d\n", __func__, k, VTaps[k]);
+		dml_print("DML::%s: k=%d, v->MaxLineBufferLines = %d\n", __func__, k, v->MaxLineBufferLines);
+		dml_print("DML::%s: k=%d, v->LineBufferSizeFinal     = %d\n", __func__, k, v->LineBufferSizeFinal);
+		dml_print("DML::%s: k=%d, v->LBBitPerPixel      = %d\n", __func__, k, v->LBBitPerPixel[k]);
+		dml_print("DML::%s: k=%d, v->HRatio             = %f\n", __func__, k, v->HRatio[k]);
+		dml_print("DML::%s: k=%d, v->vtaps              = %d\n", __func__, k, v->vtaps[k]);
 #endif
 
-		st_vars->EffectiveLBLatencyHidingY = st_vars->LBLatencyHidingSourceLinesY[k] / VRatio[k] * (HTotal[k] / PixelClock[k]);
-		st_vars->EffectiveLBLatencyHidingC = st_vars->LBLatencyHidingSourceLinesC[k] / VRatioChroma[k] * (HTotal[k] / PixelClock[k]);
+		st_vars->EffectiveLBLatencyHidingY = st_vars->LBLatencyHidingSourceLinesY[k] / v->VRatio[k] * (v->HTotal[k] / v->PixelClock[k]);
+		st_vars->EffectiveLBLatencyHidingC = st_vars->LBLatencyHidingSourceLinesC[k] / v->VRatioChroma[k] * (v->HTotal[k] / v->PixelClock[k]);
 		st_vars->EffectiveDETBufferSizeY = DETBufferSizeY[k];
 
 		if (UnboundedRequestEnabled) {
 			st_vars->EffectiveDETBufferSizeY = st_vars->EffectiveDETBufferSizeY
 					+ CompressedBufferSizeInkByte * 1024
-							* (SwathWidthY[k] * BytePerPixelDETY[k] * VRatio[k])
-							/ (HTotal[k] / PixelClock[k]) / st_vars->TotalPixelBW;
+							* (SwathWidthY[k] * BytePerPixelDETY[k] * v->VRatio[k])
+							/ (v->HTotal[k] / v->PixelClock[k]) / st_vars->TotalPixelBW;
 		}
 
 		st_vars->LinesInDETY[k] = (double) st_vars->EffectiveDETBufferSizeY / BytePerPixelDETY[k] / SwathWidthY[k];
 		st_vars->LinesInDETYRoundedDownToSwath[k] = dml_floor(st_vars->LinesInDETY[k], SwathHeightY[k]);
-		st_vars->FullDETBufferingTimeY = st_vars->LinesInDETYRoundedDownToSwath[k] * (HTotal[k] / PixelClock[k]) / VRatio[k];
+		st_vars->FullDETBufferingTimeY = st_vars->LinesInDETYRoundedDownToSwath[k] * (v->HTotal[k] / v->PixelClock[k]) / v->VRatio[k];
 
 		st_vars->ActiveClockChangeLatencyHidingY = st_vars->EffectiveLBLatencyHidingY + st_vars->FullDETBufferingTimeY
-				- (DSTXAfterScaler[k] / HTotal[k] + DSTYAfterScaler[k]) * HTotal[k] / PixelClock[k];
+				- (DSTXAfterScaler[k] / v->HTotal[k] + DSTYAfterScaler[k]) * v->HTotal[k] / v->PixelClock[k];
 
-		if (NumberOfActiveSurfaces > 1) {
+		if (v->NumberOfActiveSurfaces > 1) {
 			st_vars->ActiveClockChangeLatencyHidingY = st_vars->ActiveClockChangeLatencyHidingY
-					- (1 - 1 / NumberOfActiveSurfaces) * SwathHeightY[k] * HTotal[k]
-							/ PixelClock[k] / VRatio[k];
+					- (1 - 1 / v->NumberOfActiveSurfaces) * SwathHeightY[k] * v->HTotal[k]
+							/ v->PixelClock[k] / v->VRatio[k];
 		}
 
 		if (BytePerPixelDETC[k] > 0) {
 			st_vars->LinesInDETC[k] = DETBufferSizeC[k] / BytePerPixelDETC[k] / SwathWidthC[k];
 			st_vars->LinesInDETCRoundedDownToSwath[k] = dml_floor(st_vars->LinesInDETC[k], SwathHeightC[k]);
-			st_vars->FullDETBufferingTimeC = st_vars->LinesInDETCRoundedDownToSwath[k] * (HTotal[k] / PixelClock[k])
-					/ VRatioChroma[k];
+			st_vars->FullDETBufferingTimeC = st_vars->LinesInDETCRoundedDownToSwath[k] * (v->HTotal[k] / v->PixelClock[k])
+					/ v->VRatioChroma[k];
 			st_vars->ActiveClockChangeLatencyHidingC = st_vars->EffectiveLBLatencyHidingC + st_vars->FullDETBufferingTimeC
-					- (DSTXAfterScaler[k] / HTotal[k] + DSTYAfterScaler[k]) * HTotal[k]
-							/ PixelClock[k];
-			if (NumberOfActiveSurfaces > 1) {
+					- (DSTXAfterScaler[k] / v->HTotal[k] + DSTYAfterScaler[k]) * v->HTotal[k]
+							/ v->PixelClock[k];
+			if (v->NumberOfActiveSurfaces > 1) {
 				st_vars->ActiveClockChangeLatencyHidingC = st_vars->ActiveClockChangeLatencyHidingC
-						- (1 - 1 / NumberOfActiveSurfaces) * SwathHeightC[k] * HTotal[k]
-								/ PixelClock[k] / VRatioChroma[k];
+						- (1 - 1 / v->NumberOfActiveSurfaces) * SwathHeightC[k] * v->HTotal[k]
+								/ v->PixelClock[k] / v->VRatioChroma[k];
 			}
 			st_vars->ActiveClockChangeLatencyHiding = dml_min(st_vars->ActiveClockChangeLatencyHidingY,
 					st_vars->ActiveClockChangeLatencyHidingC);
@@ -4368,47 +4337,47 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
 			st_vars->ActiveClockChangeLatencyHiding = st_vars->ActiveClockChangeLatencyHidingY;
 		}
 
-		ActiveDRAMClockChangeLatencyMargin[k] = st_vars->ActiveClockChangeLatencyHiding - Watermark->UrgentWatermark
-				- Watermark->DRAMClockChangeWatermark;
-		st_vars->ActiveFCLKChangeLatencyMargin[k] = st_vars->ActiveClockChangeLatencyHiding - Watermark->UrgentWatermark
-				- Watermark->FCLKChangeWatermark;
-		st_vars->USRRetrainingLatencyMargin[k] = st_vars->ActiveClockChangeLatencyHiding - Watermark->USRRetrainingWatermark;
-
-		if (WritebackEnable[k]) {
-			st_vars->WritebackLatencyHiding = WritebackInterfaceBufferSize * 1024
-					/ (WritebackDestinationWidth[k] * WritebackDestinationHeight[k]
-							/ (WritebackSourceHeight[k] * HTotal[k] / PixelClock[k]) * 4);
-			if (WritebackPixelFormat[k] == dm_444_64)
+		v->ActiveDRAMClockChangeLatencyMargin[k] = st_vars->ActiveClockChangeLatencyHiding - v->Watermark->UrgentWatermark
+				- v->Watermark->DRAMClockChangeWatermark;
+		st_vars->ActiveFCLKChangeLatencyMargin[k] = st_vars->ActiveClockChangeLatencyHiding - v->Watermark->UrgentWatermark
+				- v->Watermark->FCLKChangeWatermark;
+		st_vars->USRRetrainingLatencyMargin[k] = st_vars->ActiveClockChangeLatencyHiding - v->Watermark->USRRetrainingWatermark;
+
+		if (v->WritebackEnable[k]) {
+			st_vars->WritebackLatencyHiding = v->WritebackInterfaceBufferSize * 1024
+					/ (v->WritebackDestinationWidth[k] * v->WritebackDestinationHeight[k]
+							/ (v->WritebackSourceHeight[k] * v->HTotal[k] / v->PixelClock[k]) * 4);
+			if (v->WritebackPixelFormat[k] == dm_444_64)
 				st_vars->WritebackLatencyHiding = st_vars->WritebackLatencyHiding / 2;
 
 			st_vars->WritebackDRAMClockChangeLatencyMargin = st_vars->WritebackLatencyHiding
-					- Watermark->WritebackDRAMClockChangeWatermark;
+					- v->Watermark->WritebackDRAMClockChangeWatermark;
 
 			st_vars->WritebackFCLKChangeLatencyMargin = st_vars->WritebackLatencyHiding
-					- Watermark->WritebackFCLKChangeWatermark;
+					- v->Watermark->WritebackFCLKChangeWatermark;
 
-			ActiveDRAMClockChangeLatencyMargin[k] = dml_min(ActiveDRAMClockChangeLatencyMargin[k],
+			v->ActiveDRAMClockChangeLatencyMargin[k] = dml_min(v->ActiveDRAMClockChangeLatencyMargin[k],
 					st_vars->WritebackFCLKChangeLatencyMargin);
 			st_vars->ActiveFCLKChangeLatencyMargin[k] = dml_min(st_vars->ActiveFCLKChangeLatencyMargin[k],
 					st_vars->WritebackDRAMClockChangeLatencyMargin);
 		}
 		MaxActiveDRAMClockChangeLatencySupported[k] =
-				(UseMALLForPStateChange[k] == dm_use_mall_pstate_change_phantom_pipe) ?
+				(v->UseMALLForPStateChange[k] == dm_use_mall_pstate_change_phantom_pipe) ?
 						0 :
-						(ActiveDRAMClockChangeLatencyMargin[k]
+						(v->ActiveDRAMClockChangeLatencyMargin[k]
 								+ mmSOCParameters.DRAMClockChangeLatency);
 	}
 
-	for (i = 0; i < NumberOfActiveSurfaces; ++i) {
-		for (j = 0; j < NumberOfActiveSurfaces; ++j) {
+	for (i = 0; i < v->NumberOfActiveSurfaces; ++i) {
+		for (j = 0; j < v->NumberOfActiveSurfaces; ++j) {
 			if (i == j ||
-					(BlendingAndTiming[i] == i && BlendingAndTiming[j] == i) ||
-					(BlendingAndTiming[j] == j && BlendingAndTiming[i] == j) ||
-					(BlendingAndTiming[i] == BlendingAndTiming[j] && BlendingAndTiming[i] != i) ||
-					(SynchronizeTimingsFinal && PixelClock[i] == PixelClock[j] &&
-					HTotal[i] == HTotal[j] && VTotal[i] == VTotal[j] &&
-					VActive[i] == VActive[j]) || (SynchronizeDRRDisplaysForUCLKPStateChangeFinal &&
-					(DRRDisplay[i] || DRRDisplay[j]))) {
+					(v->BlendingAndTiming[i] == i && v->BlendingAndTiming[j] == i) ||
+					(v->BlendingAndTiming[j] == j && v->BlendingAndTiming[i] == j) ||
+					(v->BlendingAndTiming[i] == v->BlendingAndTiming[j] && v->BlendingAndTiming[i] != i) ||
+					(v->SynchronizeTimingsFinal && v->PixelClock[i] == v->PixelClock[j] &&
+					v->HTotal[i] == v->HTotal[j] && v->VTotal[i] == v->VTotal[j] &&
+					v->VActive[i] == v->VActive[j]) || (v->SynchronizeRRDisplaysForUCLKPStateChangeFinal &&
+					(v->DRRDisplay[i] || v->DRRDisplay[j]))) {
 				st_vars->SynchronizedSurfaces[i][j] = true;
 			} else {
 				st_vars->SynchronizedSurfaces[i][j] = false;
@@ -4416,8 +4385,8 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
 		}
 	}
 
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
-		if ((UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) &&
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
+		if ((v->UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) &&
 				(!st_vars->FoundFirstSurfaceWithMinActiveFCLKChangeMargin ||
 				st_vars->ActiveFCLKChangeLatencyMargin[k] < st_vars->MinActiveFCLKChangeMargin)) {
 			st_vars->FoundFirstSurfaceWithMinActiveFCLKChangeMargin = true;
@@ -4429,9 +4398,9 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
 	*MinActiveFCLKChangeLatencySupported = st_vars->MinActiveFCLKChangeMargin + mmSOCParameters.FCLKChangeLatency;
 
 	st_vars->SameTimingForFCLKChange = true;
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
 		if (!st_vars->SynchronizedSurfaces[k][st_vars->SurfaceWithMinActiveFCLKChangeMargin]) {
-			if ((UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) &&
+			if ((v->UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) &&
 					(st_vars->SameTimingForFCLKChange ||
 					st_vars->ActiveFCLKChangeLatencyMargin[k] <
 					st_vars->SecondMinActiveFCLKChangeMarginOneDisplayInVBLank)) {
@@ -4451,18 +4420,18 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
 	}
 
 	*USRRetrainingSupport = true;
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
-		if ((UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) &&
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
+		if ((v->UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe) &&
 				(st_vars->USRRetrainingLatencyMargin[k] < 0)) {
 			*USRRetrainingSupport = false;
 		}
 	}
 
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
-		if (UseMALLForPStateChange[k] != dm_use_mall_pstate_change_full_frame &&
-				UseMALLForPStateChange[k] != dm_use_mall_pstate_change_sub_viewport &&
-				UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe &&
-				ActiveDRAMClockChangeLatencyMargin[k] < 0) {
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
+		if (v->UseMALLForPStateChange[k] != dm_use_mall_pstate_change_full_frame &&
+				v->UseMALLForPStateChange[k] != dm_use_mall_pstate_change_sub_viewport &&
+				v->UseMALLForPStateChange[k] != dm_use_mall_pstate_change_phantom_pipe &&
+				v->ActiveDRAMClockChangeLatencyMargin[k] < 0) {
 			if (PrefetchMode > 0) {
 				st_vars->DRAMClockChangeSupportNumber = 2;
 			} else if (st_vars->DRAMClockChangeSupportNumber == 0) {
@@ -4475,10 +4444,10 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
 		}
 	}
 
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
-		if (UseMALLForPStateChange[k] == dm_use_mall_pstate_change_full_frame)
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
+		if (v->UseMALLForPStateChange[k] == dm_use_mall_pstate_change_full_frame)
 			st_vars->DRAMClockChangeMethod = 1;
-		else if (UseMALLForPStateChange[k] == dm_use_mall_pstate_change_sub_viewport)
+		else if (v->UseMALLForPStateChange[k] == dm_use_mall_pstate_change_sub_viewport)
 			st_vars->DRAMClockChangeMethod = 2;
 	}
 
@@ -4505,16 +4474,16 @@ void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
 			*DRAMClockChangeSupport = dm_dram_clock_change_unsupported;
 	}
 
-	for (k = 0; k < NumberOfActiveSurfaces; ++k) {
+	for (k = 0; k < v->NumberOfActiveSurfaces; ++k) {
 		unsigned int dst_y_pstate;
 		unsigned int src_y_pstate_l;
 		unsigned int src_y_pstate_c;
 		unsigned int src_y_ahead_l, src_y_ahead_c, sub_vp_lines_l, sub_vp_lines_c;
 
-		dst_y_pstate = dml_ceil((mmSOCParameters.DRAMClockChangeLatency + mmSOCParameters.UrgentLatency) / (HTotal[k] / PixelClock[k]), 1);
-		src_y_pstate_l = dml_ceil(dst_y_pstate * VRatio[k], SwathHeightY[k]);
+		dst_y_pstate = dml_ceil((mmSOCParameters.DRAMClockChangeLatency + mmSOCParameters.UrgentLatency) / (v->HTotal[k] / v->PixelClock[k]), 1);
+		src_y_pstate_l = dml_ceil(dst_y_pstate * v->VRatio[k], SwathHeightY[k]);
 		src_y_ahead_l = dml_floor(DETBufferSizeY[k] / BytePerPixelDETY[k] / SwathWidthY[k], SwathHeightY[k]) + st_vars->LBLatencyHidingSourceLinesY[k];
-		sub_vp_lines_l = src_y_pstate_l + src_y_ahead_l + meta_row_height[k];
+		sub_vp_lines_l = src_y_pstate_l + src_y_ahead_l + v->meta_row_height[k];
 
 #ifdef __DML_VBA_DEBUG__
 dml_print("DML::%s: k=%d, DETBufferSizeY               = %d\n", __func__, k, DETBufferSizeY[k]);
@@ -4525,21 +4494,21 @@ dml_print("DML::%s: k=%d, LBLatencyHidingSourceLinesY  = %d\n", __func__, k, st_
 dml_print("DML::%s: k=%d, dst_y_pstate      = %d\n", __func__, k, dst_y_pstate);
 dml_print("DML::%s: k=%d, src_y_pstate_l    = %d\n", __func__, k, src_y_pstate_l);
 dml_print("DML::%s: k=%d, src_y_ahead_l     = %d\n", __func__, k, src_y_ahead_l);
-dml_print("DML::%s: k=%d, meta_row_height   = %d\n", __func__, k, meta_row_height[k]);
+dml_print("DML::%s: k=%d, v->meta_row_height   = %d\n", __func__, k, v->meta_row_height[k]);
 dml_print("DML::%s: k=%d, sub_vp_lines_l    = %d\n", __func__, k, sub_vp_lines_l);
 #endif
 		SubViewportLinesNeededInMALL[k] = sub_vp_lines_l;
 
 		if (BytePerPixelDETC[k] > 0) {
-			src_y_pstate_c = dml_ceil(dst_y_pstate * VRatioChroma[k], SwathHeightC[k]);
+			src_y_pstate_c = dml_ceil(dst_y_pstate * v->VRatioChroma[k], SwathHeightC[k]);
 			src_y_ahead_c = dml_floor(DETBufferSizeC[k] / BytePerPixelDETC[k] / SwathWidthC[k], SwathHeightC[k]) + st_vars->LBLatencyHidingSourceLinesC[k];
-			sub_vp_lines_c = src_y_pstate_c + src_y_ahead_c + meta_row_height_chroma[k];
+			sub_vp_lines_c = src_y_pstate_c + src_y_ahead_c + v->meta_row_height_chroma[k];
 			SubViewportLinesNeededInMALL[k] = dml_max(sub_vp_lines_l, sub_vp_lines_c);
 
 #ifdef __DML_VBA_DEBUG__
 dml_print("DML::%s: k=%d, src_y_pstate_c            = %d\n", __func__, k, src_y_pstate_c);
 dml_print("DML::%s: k=%d, src_y_ahead_c             = %d\n", __func__, k, src_y_ahead_c);
-dml_print("DML::%s: k=%d, meta_row_height_chroma    = %d\n", __func__, k, meta_row_height_chroma[k]);
+dml_print("DML::%s: k=%d, v->meta_row_height_chroma    = %d\n", __func__, k, v->meta_row_height_chroma[k]);
 dml_print("DML::%s: k=%d, sub_vp_lines_c            = %d\n", __func__, k, sub_vp_lines_c);
 #endif
 		}
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h
index 37a314ce284b..8515a65c61da 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.h
@@ -811,66 +811,34 @@ void dml32_CalculateFlipSchedule(
 		bool *ImmediateFlipSupportedForPipe);
 
 void dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
-		struct dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport *st_vars,
-		bool USRRetrainingRequiredFinal,
-		enum dm_use_mall_for_pstate_change_mode UseMALLForPStateChange[],
+		struct vba_vars_st *v,
 		unsigned int PrefetchMode,
-		unsigned int NumberOfActiveSurfaces,
-		unsigned int MaxLineBufferLines,
-		unsigned int LineBufferSize,
-		unsigned int WritebackInterfaceBufferSize,
 		double DCFCLK,
 		double ReturnBW,
-		bool SynchronizeTimingsFinal,
-		bool SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
-		bool DRRDisplay[],
-		unsigned int dpte_group_bytes[],
-		unsigned int meta_row_height[],
-		unsigned int meta_row_height_chroma[],
 		SOCParametersList mmSOCParameters,
-		unsigned int WritebackChunkSize,
 		double SOCCLK,
 		double DCFClkDeepSleep,
 		unsigned int DETBufferSizeY[],
 		unsigned int DETBufferSizeC[],
 		unsigned int SwathHeightY[],
 		unsigned int SwathHeightC[],
-		unsigned int LBBitPerPixel[],
 		double SwathWidthY[],
 		double SwathWidthC[],
-		double HRatio[],
-		double HRatioChroma[],
-		unsigned int VTaps[],
-		unsigned int VTapsChroma[],
-		double VRatio[],
-		double VRatioChroma[],
-		unsigned int HTotal[],
-		unsigned int VTotal[],
-		unsigned int VActive[],
-		double PixelClock[],
-		unsigned int BlendingAndTiming[],
 		unsigned int DPPPerSurface[],
 		double BytePerPixelDETY[],
 		double BytePerPixelDETC[],
 		double DSTXAfterScaler[],
 		double DSTYAfterScaler[],
-		bool WritebackEnable[],
-		enum source_format_class WritebackPixelFormat[],
-		double WritebackDestinationWidth[],
-		double WritebackDestinationHeight[],
-		double WritebackSourceHeight[],
 		bool UnboundedRequestEnabled,
 		unsigned int CompressedBufferSizeInkByte,
 
 		/* Output */
-		Watermarks *Watermark,
 		enum clock_change_support *DRAMClockChangeSupport,
 		double MaxActiveDRAMClockChangeLatencySupported[],
 		unsigned int SubViewportLinesNeededInMALL[],
 		enum dm_fclock_change_support *FCLKChangeSupport,
 		double *MinActiveFCLKChangeLatencySupported,
-		bool *USRRetrainingSupport,
-		double ActiveDRAMClockChangeLatencyMargin[]);
+		bool *USRRetrainingSupport);
 
 double dml32_CalculateWriteBackDISPCLK(
 		enum source_format_class WritebackPixelFormat,
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-18 15:59               ` Nathan Chancellor
@ 2022-08-25 22:34                 ` Nathan Chancellor
  2022-08-26 14:31                   ` Alex Deucher
  0 siblings, 1 reply; 17+ messages in thread
From: Nathan Chancellor @ 2022-08-25 22:34 UTC (permalink / raw)
  To: Harry Wentland, Siqueira, Rodrigo, Pan, Xinhui,
	Christian König, Alex Deucher
  Cc: Arnd Bergmann, David Airlie, clang-built-linux,
	Linux Kernel Mailing List, dri-devel, amd-gfx list,
	Linus Torvalds, Sudip Mukherjee (Codethink)

Hi AMD folks,

Top posting because it might not have been obvious but I was looking for
your feedback on this message (which can be viewed on lore.kernel.org if
you do not have the original [1]) so that we can try to get this fixed
in some way for 6.0/6.1. If my approach is not welcome, please consider
suggesting another one or looking to see if this is something you all
could look into.

[1]: https://lore.kernel.org/Yv5h0rb3AgTZLVJv@dev-arch.thelio-3990X/

Cheers,
Nathan

On Thu, Aug 18, 2022 at 08:59:14AM -0700, Nathan Chancellor wrote:
> Hi Arnd,
> 
> Doubling back around to this now since I think this is the only thing
> breaking x86_64 allmodconfig with clang 11 through 15.
> 
> On Fri, Aug 05, 2022 at 09:32:13PM +0200, Arnd Bergmann wrote:
> > On Fri, Aug 5, 2022 at 8:02 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > On Fri, Aug 05, 2022 at 06:16:45PM +0200, Arnd Bergmann wrote:
> > > > On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@amd.com> wrote:
> > > > While splitting out sub-functions can help reduce the maximum stack
> > > > usage, it seems that in this case it makes the actual problem worse:
> > > > I see 2168 bytes for the combined
> > > > dml32_ModeSupportAndSystemConfigurationFull(), but marking
> > > > mode_support_configuration() as noinline gives me 1992 bytes
> > > > for the outer function plus 384 bytes for the inner one. So it does
> > > > avoid the warning (barely), but not the problem that the warning tries
> > > > to point out.
> > >
> > > I haven't had a chance to take a look at splitting things up yet, would
> > > you recommend a different approach?
> > 
> > Splitting up large functions can help when you have large local variables
> > that are used in different parts of the function, and the split gets the
> > compiler to reuse stack locations.
> > 
> > I think in this particular function, the problem isn't actually local variables
> > but either pushing variables on the stack for argument passing,
> > or something that causes the compiler to run out of registers so it
> > has to spill registers to the stack.
> > 
> > In either case, one has to actually look at the generated output
> > and then try to rearrange the codes so this does not happen.
> > 
> > One thing to try would be to condense a function call like
> > 
> >                 dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
> > 
> > &v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport,
> >                         mode_lib->vba.USRRetrainingRequiredFinal,
> >                         mode_lib->vba.UsesMALLForPStateChange,
> > 
> > mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb],
> >                         mode_lib->vba.NumberOfActiveSurfaces,
> >                         mode_lib->vba.MaxLineBufferLines,
> >                         mode_lib->vba.LineBufferSizeFinal,
> >                         mode_lib->vba.WritebackInterfaceBufferSize,
> >                         mode_lib->vba.DCFCLK,
> >                         mode_lib->vba.ReturnBW,
> >                         mode_lib->vba.SynchronizeTimingsFinal,
> > 
> > mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
> >                         mode_lib->vba.DRRDisplay,
> >                         v->dpte_group_bytes,
> >                         v->meta_row_height,
> >                         v->meta_row_height_chroma,
> > 
> > v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters,
> >                         mode_lib->vba.WritebackChunkSize,
> >                         mode_lib->vba.SOCCLK,
> >                         v->DCFCLKDeepSleep,
> >                         mode_lib->vba.DETBufferSizeY,
> >                         mode_lib->vba.DETBufferSizeC,
> >                         mode_lib->vba.SwathHeightY,
> >                         mode_lib->vba.SwathHeightC,
> >                         mode_lib->vba.LBBitPerPixel,
> >                         v->SwathWidthY,
> >                         v->SwathWidthC,
> >                         mode_lib->vba.HRatio,
> >                         mode_lib->vba.HRatioChroma,
> >                         mode_lib->vba.vtaps,
> >                         mode_lib->vba.VTAPsChroma,
> >                         mode_lib->vba.VRatio,
> >                         mode_lib->vba.VRatioChroma,
> >                         mode_lib->vba.HTotal,
> >                         mode_lib->vba.VTotal,
> >                         mode_lib->vba.VActive,
> >                         mode_lib->vba.PixelClock,
> >                         mode_lib->vba.BlendingAndTiming,
> >                         .... /* more arguments */);
> > 
> > into calling conventions that take a pointer to 'mode_lib->vba' and another
> > one to 'v', so these are no longer passed on the stack individually.
> 
> So I took a whack at reducing this function's number of parameters and
> ended up with the attached patch. I basically just removed any
> parameters that were identical between the two call sites and access them
> through the vba pointer, as you suggested.
> 
> AMD folks, is this an acceptable approach? It didn't take a trivial
> amount of time so I want to make sure this is okay before I do it to
> more functions/files.
> 
> Due to the potential size of these changes, I am a little weary of them
> going into 6.0; even though they should be a simple search and replace
> for the most part, it might be nice for them to have some decent soak
> time in -next. One solution would be to raise the warning limit for
> these files on 6.0 so that allmodconfig does not ship broken then reduce
> the limit for 6.1 once these patches have been applied.
> 
> Additionally, I took a look at the stack usage across all compilers that
> the kernel supports and I thought it was kind of interesting that the
> usage really jumps from GCC 7 to 8, which I am guessing is a result of
> commit e8a170ff9a35 ("drm/amdgpu: enable -msse2 for GCC 7.1+ users").
> GCC 8 allmodconfig actually errors now too:
> 
> https://lore.kernel.org/alpine.DEB.2.22.394.2208152006320.289321@ramsan.of.borg/
> 
>           |-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
>           | dml30_ModeSupportAndSystemConfigurationFull() | dml31_ModeSupportAndSystemConfigurationFull() | dml32_ModeSupportAndSystemConfigurationFull() |
> |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> | GCC 5   |                  1056 bytes                   |                   656 bytes                   |                  1040 bytes                   |
> | GCC 6   |                  1024 bytes                   |                   672 bytes                   |                  1056 bytes                   |
> | GCC 7   |                  1040 bytes                   |                   664 bytes                   |                  1056 bytes                   |
> | GCC 8   |                  1760 bytes                   |                  1608 bytes                   |                  2144 bytes                   |
> | GCC 9   |                  1664 bytes                   |                  1392 bytes                   |                  1960 bytes                   |
> | GCC 10  |                  1648 bytes                   |                  1368 bytes                   |                  1952 bytes                   |
> | GCC 11  |                  1680 bytes                   |                  1400 bytes                   |                  1952 bytes                   |
> | GCC 12  |                  1680 bytes                   |                  1400 bytes                   |                  1984 bytes                   |
> |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> | LLVM 11 |                  2104 bytes                   |                  2056 bytes                   |                  2120 bytes                   |
> | LLVM 12 |                  2152 bytes                   |                  2200 bytes                   |                  2152 bytes                   |
> | LLVM 13 |                  2216 bytes                   |                  2248 bytes                   |                  2168 bytes                   |
> | LLVM 14 |                  2168 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
> | LLVM 15 |                  2216 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
> | LLVM 16 |                  2232 bytes                   |                  2216 bytes                   |                  2176 bytes                   |
> |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> 
> With the patch I have attached,
> dml32_ModeSupportAndSystemConfigurationFull() drops from 2176 to 1944
> for LLVM 16, which is obviously still not great but it at least avoids
> the warning.
> 
> Cheers,
> Nathan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-25 22:34                 ` Nathan Chancellor
@ 2022-08-26 14:31                   ` Alex Deucher
  2022-08-30 20:38                     ` Nathan Chancellor
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Deucher @ 2022-08-26 14:31 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Arnd Bergmann, clang-built-linux, David Airlie, Linus Torvalds,
	Pan, Xinhui, Siqueira, Rodrigo, Linux Kernel Mailing List,
	dri-devel, Sudip Mukherjee (Codethink),
	amd-gfx list, Alex Deucher, Christian König

On Thu, Aug 25, 2022 at 6:34 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> Hi AMD folks,
>
> Top posting because it might not have been obvious but I was looking for
> your feedback on this message (which can be viewed on lore.kernel.org if
> you do not have the original [1]) so that we can try to get this fixed
> in some way for 6.0/6.1. If my approach is not welcome, please consider
> suggesting another one or looking to see if this is something you all
> could look into.

The patch looks good to me.  I was hoping Harry or Rodrigo could
comment more since they are more familiar with this code and trying to
keep it in sync with what we get from the hardware teams.

Alex


>
> [1]: https://lore.kernel.org/Yv5h0rb3AgTZLVJv@dev-arch.thelio-3990X/
>
> Cheers,
> Nathan
>
> On Thu, Aug 18, 2022 at 08:59:14AM -0700, Nathan Chancellor wrote:
> > Hi Arnd,
> >
> > Doubling back around to this now since I think this is the only thing
> > breaking x86_64 allmodconfig with clang 11 through 15.
> >
> > On Fri, Aug 05, 2022 at 09:32:13PM +0200, Arnd Bergmann wrote:
> > > On Fri, Aug 5, 2022 at 8:02 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > > On Fri, Aug 05, 2022 at 06:16:45PM +0200, Arnd Bergmann wrote:
> > > > > On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@amd.com> wrote:
> > > > > While splitting out sub-functions can help reduce the maximum stack
> > > > > usage, it seems that in this case it makes the actual problem worse:
> > > > > I see 2168 bytes for the combined
> > > > > dml32_ModeSupportAndSystemConfigurationFull(), but marking
> > > > > mode_support_configuration() as noinline gives me 1992 bytes
> > > > > for the outer function plus 384 bytes for the inner one. So it does
> > > > > avoid the warning (barely), but not the problem that the warning tries
> > > > > to point out.
> > > >
> > > > I haven't had a chance to take a look at splitting things up yet, would
> > > > you recommend a different approach?
> > >
> > > Splitting up large functions can help when you have large local variables
> > > that are used in different parts of the function, and the split gets the
> > > compiler to reuse stack locations.
> > >
> > > I think in this particular function, the problem isn't actually local variables
> > > but either pushing variables on the stack for argument passing,
> > > or something that causes the compiler to run out of registers so it
> > > has to spill registers to the stack.
> > >
> > > In either case, one has to actually look at the generated output
> > > and then try to rearrange the codes so this does not happen.
> > >
> > > One thing to try would be to condense a function call like
> > >
> > >                 dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
> > >
> > > &v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport,
> > >                         mode_lib->vba.USRRetrainingRequiredFinal,
> > >                         mode_lib->vba.UsesMALLForPStateChange,
> > >
> > > mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb],
> > >                         mode_lib->vba.NumberOfActiveSurfaces,
> > >                         mode_lib->vba.MaxLineBufferLines,
> > >                         mode_lib->vba.LineBufferSizeFinal,
> > >                         mode_lib->vba.WritebackInterfaceBufferSize,
> > >                         mode_lib->vba.DCFCLK,
> > >                         mode_lib->vba.ReturnBW,
> > >                         mode_lib->vba.SynchronizeTimingsFinal,
> > >
> > > mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
> > >                         mode_lib->vba.DRRDisplay,
> > >                         v->dpte_group_bytes,
> > >                         v->meta_row_height,
> > >                         v->meta_row_height_chroma,
> > >
> > > v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters,
> > >                         mode_lib->vba.WritebackChunkSize,
> > >                         mode_lib->vba.SOCCLK,
> > >                         v->DCFCLKDeepSleep,
> > >                         mode_lib->vba.DETBufferSizeY,
> > >                         mode_lib->vba.DETBufferSizeC,
> > >                         mode_lib->vba.SwathHeightY,
> > >                         mode_lib->vba.SwathHeightC,
> > >                         mode_lib->vba.LBBitPerPixel,
> > >                         v->SwathWidthY,
> > >                         v->SwathWidthC,
> > >                         mode_lib->vba.HRatio,
> > >                         mode_lib->vba.HRatioChroma,
> > >                         mode_lib->vba.vtaps,
> > >                         mode_lib->vba.VTAPsChroma,
> > >                         mode_lib->vba.VRatio,
> > >                         mode_lib->vba.VRatioChroma,
> > >                         mode_lib->vba.HTotal,
> > >                         mode_lib->vba.VTotal,
> > >                         mode_lib->vba.VActive,
> > >                         mode_lib->vba.PixelClock,
> > >                         mode_lib->vba.BlendingAndTiming,
> > >                         .... /* more arguments */);
> > >
> > > into calling conventions that take a pointer to 'mode_lib->vba' and another
> > > one to 'v', so these are no longer passed on the stack individually.
> >
> > So I took a whack at reducing this function's number of parameters and
> > ended up with the attached patch. I basically just removed any
> > parameters that were identical between the two call sites and access them
> > through the vba pointer, as you suggested.
> >
> > AMD folks, is this an acceptable approach? It didn't take a trivial
> > amount of time so I want to make sure this is okay before I do it to
> > more functions/files.
> >
> > Due to the potential size of these changes, I am a little weary of them
> > going into 6.0; even though they should be a simple search and replace
> > for the most part, it might be nice for them to have some decent soak
> > time in -next. One solution would be to raise the warning limit for
> > these files on 6.0 so that allmodconfig does not ship broken then reduce
> > the limit for 6.1 once these patches have been applied.
> >
> > Additionally, I took a look at the stack usage across all compilers that
> > the kernel supports and I thought it was kind of interesting that the
> > usage really jumps from GCC 7 to 8, which I am guessing is a result of
> > commit e8a170ff9a35 ("drm/amdgpu: enable -msse2 for GCC 7.1+ users").
> > GCC 8 allmodconfig actually errors now too:
> >
> > https://lore.kernel.org/alpine.DEB.2.22.394.2208152006320.289321@ramsan.of.borg/
> >
> >           |-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> >           | dml30_ModeSupportAndSystemConfigurationFull() | dml31_ModeSupportAndSystemConfigurationFull() | dml32_ModeSupportAndSystemConfigurationFull() |
> > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> > | GCC 5   |                  1056 bytes                   |                   656 bytes                   |                  1040 bytes                   |
> > | GCC 6   |                  1024 bytes                   |                   672 bytes                   |                  1056 bytes                   |
> > | GCC 7   |                  1040 bytes                   |                   664 bytes                   |                  1056 bytes                   |
> > | GCC 8   |                  1760 bytes                   |                  1608 bytes                   |                  2144 bytes                   |
> > | GCC 9   |                  1664 bytes                   |                  1392 bytes                   |                  1960 bytes                   |
> > | GCC 10  |                  1648 bytes                   |                  1368 bytes                   |                  1952 bytes                   |
> > | GCC 11  |                  1680 bytes                   |                  1400 bytes                   |                  1952 bytes                   |
> > | GCC 12  |                  1680 bytes                   |                  1400 bytes                   |                  1984 bytes                   |
> > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> > | LLVM 11 |                  2104 bytes                   |                  2056 bytes                   |                  2120 bytes                   |
> > | LLVM 12 |                  2152 bytes                   |                  2200 bytes                   |                  2152 bytes                   |
> > | LLVM 13 |                  2216 bytes                   |                  2248 bytes                   |                  2168 bytes                   |
> > | LLVM 14 |                  2168 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
> > | LLVM 15 |                  2216 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
> > | LLVM 16 |                  2232 bytes                   |                  2216 bytes                   |                  2176 bytes                   |
> > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> >
> > With the patch I have attached,
> > dml32_ModeSupportAndSystemConfigurationFull() drops from 2176 to 1944
> > for LLVM 16, which is obviously still not great but it at least avoids
> > the warning.
> >
> > Cheers,
> > Nathan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: mainline build failure for x86_64 allmodconfig with clang
  2022-08-26 14:31                   ` Alex Deucher
@ 2022-08-30 20:38                     ` Nathan Chancellor
  0 siblings, 0 replies; 17+ messages in thread
From: Nathan Chancellor @ 2022-08-30 20:38 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Arnd Bergmann, clang-built-linux, David Airlie, Linus Torvalds,
	Pan, Xinhui, Siqueira, Rodrigo, Linux Kernel Mailing List,
	dri-devel, Sudip Mukherjee (Codethink),
	amd-gfx list, Alex Deucher, Christian König

On Fri, Aug 26, 2022 at 10:31:34AM -0400, Alex Deucher wrote:
> On Thu, Aug 25, 2022 at 6:34 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > Hi AMD folks,
> >
> > Top posting because it might not have been obvious but I was looking for
> > your feedback on this message (which can be viewed on lore.kernel.org if
> > you do not have the original [1]) so that we can try to get this fixed
> > in some way for 6.0/6.1. If my approach is not welcome, please consider
> > suggesting another one or looking to see if this is something you all
> > could look into.
> 
> The patch looks good to me.  I was hoping Harry or Rodrigo could
> comment more since they are more familiar with this code and trying to
> keep it in sync with what we get from the hardware teams.

Thanks a lot for the input! That patch was broken but I have polished it
and a few other patches up and sent them along for review:

https://lore.kernel.org/20220830203409.3491379-1-nathan@kernel.org/

I did not CC everyone from this thread but it is on lore if others want
to comment on it. Hopefully we can get this all sorted out for 6.0
final.

Cheers,
Nathan

> > [1]: https://lore.kernel.org/Yv5h0rb3AgTZLVJv@dev-arch.thelio-3990X/
> >
> > Cheers,
> > Nathan
> >
> > On Thu, Aug 18, 2022 at 08:59:14AM -0700, Nathan Chancellor wrote:
> > > Hi Arnd,
> > >
> > > Doubling back around to this now since I think this is the only thing
> > > breaking x86_64 allmodconfig with clang 11 through 15.
> > >
> > > On Fri, Aug 05, 2022 at 09:32:13PM +0200, Arnd Bergmann wrote:
> > > > On Fri, Aug 5, 2022 at 8:02 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > > > On Fri, Aug 05, 2022 at 06:16:45PM +0200, Arnd Bergmann wrote:
> > > > > > On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@amd.com> wrote:
> > > > > > While splitting out sub-functions can help reduce the maximum stack
> > > > > > usage, it seems that in this case it makes the actual problem worse:
> > > > > > I see 2168 bytes for the combined
> > > > > > dml32_ModeSupportAndSystemConfigurationFull(), but marking
> > > > > > mode_support_configuration() as noinline gives me 1992 bytes
> > > > > > for the outer function plus 384 bytes for the inner one. So it does
> > > > > > avoid the warning (barely), but not the problem that the warning tries
> > > > > > to point out.
> > > > >
> > > > > I haven't had a chance to take a look at splitting things up yet, would
> > > > > you recommend a different approach?
> > > >
> > > > Splitting up large functions can help when you have large local variables
> > > > that are used in different parts of the function, and the split gets the
> > > > compiler to reuse stack locations.
> > > >
> > > > I think in this particular function, the problem isn't actually local variables
> > > > but either pushing variables on the stack for argument passing,
> > > > or something that causes the compiler to run out of registers so it
> > > > has to spill registers to the stack.
> > > >
> > > > In either case, one has to actually look at the generated output
> > > > and then try to rearrange the codes so this does not happen.
> > > >
> > > > One thing to try would be to condense a function call like
> > > >
> > > >                 dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
> > > >
> > > > &v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport,
> > > >                         mode_lib->vba.USRRetrainingRequiredFinal,
> > > >                         mode_lib->vba.UsesMALLForPStateChange,
> > > >
> > > > mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb],
> > > >                         mode_lib->vba.NumberOfActiveSurfaces,
> > > >                         mode_lib->vba.MaxLineBufferLines,
> > > >                         mode_lib->vba.LineBufferSizeFinal,
> > > >                         mode_lib->vba.WritebackInterfaceBufferSize,
> > > >                         mode_lib->vba.DCFCLK,
> > > >                         mode_lib->vba.ReturnBW,
> > > >                         mode_lib->vba.SynchronizeTimingsFinal,
> > > >
> > > > mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
> > > >                         mode_lib->vba.DRRDisplay,
> > > >                         v->dpte_group_bytes,
> > > >                         v->meta_row_height,
> > > >                         v->meta_row_height_chroma,
> > > >
> > > > v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters,
> > > >                         mode_lib->vba.WritebackChunkSize,
> > > >                         mode_lib->vba.SOCCLK,
> > > >                         v->DCFCLKDeepSleep,
> > > >                         mode_lib->vba.DETBufferSizeY,
> > > >                         mode_lib->vba.DETBufferSizeC,
> > > >                         mode_lib->vba.SwathHeightY,
> > > >                         mode_lib->vba.SwathHeightC,
> > > >                         mode_lib->vba.LBBitPerPixel,
> > > >                         v->SwathWidthY,
> > > >                         v->SwathWidthC,
> > > >                         mode_lib->vba.HRatio,
> > > >                         mode_lib->vba.HRatioChroma,
> > > >                         mode_lib->vba.vtaps,
> > > >                         mode_lib->vba.VTAPsChroma,
> > > >                         mode_lib->vba.VRatio,
> > > >                         mode_lib->vba.VRatioChroma,
> > > >                         mode_lib->vba.HTotal,
> > > >                         mode_lib->vba.VTotal,
> > > >                         mode_lib->vba.VActive,
> > > >                         mode_lib->vba.PixelClock,
> > > >                         mode_lib->vba.BlendingAndTiming,
> > > >                         .... /* more arguments */);
> > > >
> > > > into calling conventions that take a pointer to 'mode_lib->vba' and another
> > > > one to 'v', so these are no longer passed on the stack individually.
> > >
> > > So I took a whack at reducing this function's number of parameters and
> > > ended up with the attached patch. I basically just removed any
> > > parameters that were identical between the two call sites and access them
> > > through the vba pointer, as you suggested.
> > >
> > > AMD folks, is this an acceptable approach? It didn't take a trivial
> > > amount of time so I want to make sure this is okay before I do it to
> > > more functions/files.
> > >
> > > Due to the potential size of these changes, I am a little weary of them
> > > going into 6.0; even though they should be a simple search and replace
> > > for the most part, it might be nice for them to have some decent soak
> > > time in -next. One solution would be to raise the warning limit for
> > > these files on 6.0 so that allmodconfig does not ship broken then reduce
> > > the limit for 6.1 once these patches have been applied.
> > >
> > > Additionally, I took a look at the stack usage across all compilers that
> > > the kernel supports and I thought it was kind of interesting that the
> > > usage really jumps from GCC 7 to 8, which I am guessing is a result of
> > > commit e8a170ff9a35 ("drm/amdgpu: enable -msse2 for GCC 7.1+ users").
> > > GCC 8 allmodconfig actually errors now too:
> > >
> > > https://lore.kernel.org/alpine.DEB.2.22.394.2208152006320.289321@ramsan.of.borg/
> > >
> > >           |-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> > >           | dml30_ModeSupportAndSystemConfigurationFull() | dml31_ModeSupportAndSystemConfigurationFull() | dml32_ModeSupportAndSystemConfigurationFull() |
> > > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> > > | GCC 5   |                  1056 bytes                   |                   656 bytes                   |                  1040 bytes                   |
> > > | GCC 6   |                  1024 bytes                   |                   672 bytes                   |                  1056 bytes                   |
> > > | GCC 7   |                  1040 bytes                   |                   664 bytes                   |                  1056 bytes                   |
> > > | GCC 8   |                  1760 bytes                   |                  1608 bytes                   |                  2144 bytes                   |
> > > | GCC 9   |                  1664 bytes                   |                  1392 bytes                   |                  1960 bytes                   |
> > > | GCC 10  |                  1648 bytes                   |                  1368 bytes                   |                  1952 bytes                   |
> > > | GCC 11  |                  1680 bytes                   |                  1400 bytes                   |                  1952 bytes                   |
> > > | GCC 12  |                  1680 bytes                   |                  1400 bytes                   |                  1984 bytes                   |
> > > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> > > | LLVM 11 |                  2104 bytes                   |                  2056 bytes                   |                  2120 bytes                   |
> > > | LLVM 12 |                  2152 bytes                   |                  2200 bytes                   |                  2152 bytes                   |
> > > | LLVM 13 |                  2216 bytes                   |                  2248 bytes                   |                  2168 bytes                   |
> > > | LLVM 14 |                  2168 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
> > > | LLVM 15 |                  2216 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
> > > | LLVM 16 |                  2232 bytes                   |                  2216 bytes                   |                  2176 bytes                   |
> > > |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> > >
> > > With the patch I have attached,
> > > dml32_ModeSupportAndSystemConfigurationFull() drops from 2176 to 1944
> > > for LLVM 16, which is obviously still not great but it at least avoids
> > > the warning.
> > >
> > > Cheers,
> > > Nathan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-08-30 20:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-04 18:36 mainline build failure for x86_64 allmodconfig with clang Sudip Mukherjee (Codethink)
2022-08-04 18:52 ` Linus Torvalds
2022-08-04 19:24   ` Arnd Bergmann
2022-08-04 20:43     ` Nathan Chancellor
2022-08-04 21:59       ` Linus Torvalds
2022-08-04 22:43         ` Nathan Chancellor
2022-08-05  9:46       ` David Laight
2022-08-05 15:32       ` Harry Wentland
2022-08-05 16:16         ` Arnd Bergmann
2022-08-05 18:02           ` Nathan Chancellor
2022-08-05 19:32             ` Arnd Bergmann
2022-08-07 17:36               ` David Laight
2022-08-07 17:55                 ` Linus Torvalds
2022-08-18 15:59               ` Nathan Chancellor
2022-08-25 22:34                 ` Nathan Chancellor
2022-08-26 14:31                   ` Alex Deucher
2022-08-30 20:38                     ` Nathan Chancellor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).