* [RFC 0/7] Proposal for isolating FPU operation
@ 2021-01-25 13:43 Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 1/7] drm/amd/display: Introduce FPU directory inside DC Rodrigo Siqueira
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Rodrigo Siqueira @ 2021-01-25 13:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Harry Wentland, Christian König, Nicholas Kazlauskas

Hi,

In the display core, we use float and double types to calculate modesetting
parameters. One side effect of this use of double precision is that FPU
accesses are spread across the driver, which means we can accidentally clobber
the user-space FPU state. As an attempt to fix this problem, we have the
following proposal:

1. We first need to move functions that deal with the FPU to a single place in
   order to make things more manageable;
2. After we isolate these functions in a single place, we want to remove any
   compilation flag that deals with the FPU from other files and centralize it
   only in the files that need it;
3. We need to implement an interface for safely calling those FPU functions.
   The idea is to add a thin function layer where FPU functions are invoked
   under the protection of kernel_fpu_begin/end (see the sketch right after
   this list).
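
To make step 3 concrete, the pattern we have in mind looks roughly like the
sketch below (illustrative function name; the real patches use the
DC_FP_START/DC_FP_END macros, DC's existing wrappers around
kernel_fpu_begin/end and the PPC equivalents):

 /* fpu_operation/dcn3x_commons.c -- built with the FPU compilation flags */
 static void _dcn3x_some_fpu_calculation(struct dc *dc)
 {
         /* all float/double math stays inside this static helper */
 }

 /* Public entry point; callers never touch the FPU state directly */
 void dcn3x_some_fpu_calculation(struct dc *dc)
 {
         DC_FP_START();
         _dcn3x_some_fpu_calculation(dc);
         DC_FP_END();
 }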

One of the challenges in the above steps is identifying which functions use
FPU registers; fortunately, Peter Zijlstra wrote a patch a couple of months
ago that introduced an FPU check for objtool. I used the following command to
identify potential FPU usage:

 ./tools/objtool/objtool check -Ffa "drivers/gpu/drm/amd/display/dc/ANY_FILE.o"

Based on the output of the above command and the step-by-step approach that we
want to adopt, I decided to start this work focusing on DCN3 and DCN302. I
believe the best way to read this RFC is:

1. The first patch introduces an FPU folder inside display/dc, intended to
   centralize functions that deal with the FPU. Note that I introduced two new
   files named dcn3x_commons (a .c and a .h) inside a new folder called
   fpu_operation; I used the name dcn3x because some of the functions inside
   this folder are shared with DCN301 and DCN302. In other words, every FPU
   function that is shared across DCN3x will be placed in that file.
2. The next set of patches starts to move some of the functions that require
   FPU access to the file dcn3x_commons. I did it in small chunks to make it
   easy to bisect in case of regressions.
3. Note that one of the patches touches DCN2; the reason is that the function
   dcn20_calculate_dlg_params is shared between DCN2 and DCN3. Because of
   that, I created a new file named fpu_commons for keeping functions that
   are shared across multiple ASICs.
4. When we move some of the functions, notice that I also add an API for
   accessing them under the protection of kernel_fpu_begin/end (part of the
   resulting public interface is sketched right after this list).
5. At the end of the series, I dropped the FPU flags from the files that I
   refactored.
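
For reference, part of the public interface that results from this series
looks roughly like this (taken from the patches below; not exhaustive):

 /* fpu_operation/dcn3x_commons.h: FPU helpers shared across DCN3x ASICs */
 void dcn3x_populate_dml_writeback_from_context(struct dc *dc,
         struct resource_context *res_ctx, display_e2e_pipe_params_st *pipes);
 void dcn3x_set_mcif_arb_params(struct dc *dc, struct dc_state *context,
         display_e2e_pipe_params_st *pipes, int pipe_cnt);

 /* fpu_operation/fpu_commons.h: FPU helpers shared across ASIC families */
 void dcn20_calculate_dlg_params(struct dc *dc, struct dc_state *context,
         display_e2e_pipe_params_st *pipes, int pipe_cnt, int vlevel);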

We are also working on stress tests for validating this change from the user
space and kernel perspectives.

Keep in mind that this series is not done yet. I'm looking for feedback on
this approach because we plan to use it to fix our FPU problems over the next
couple of weeks. Finally, we want to do this work step by step because it is
easy to introduce regressions when dealing with these FPU problems.

Best Regards

Rodrigo Siqueira (7):
  drm/amd/display: Introduce FPU directory inside DC
  drm/amd/display: Moves dcn30_set_mcif_arb_params to FPU folder
  drm/amd/display: Add FPU file for functions shared across ASICs
  drm/amd/display: Move calculate_wm_and_dlg to FPU folder
  drm/amd/display: Move patch bounding box to FPU folder
  drm/amd/display: Move bounding box functions to FPU folder
  drm/amd/display: Drop float flags from DCN30 files

 drivers/gpu/drm/amd/display/dc/Makefile       |   1 +
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 106 +--
 .../drm/amd/display/dc/dcn20/dcn20_resource.h |   8 -
 .../drm/amd/display/dc/dcn21/dcn21_resource.c |   2 +
 drivers/gpu/drm/amd/display/dc/dcn30/Makefile |  30 -
 .../drm/amd/display/dc/dcn30/dcn30_resource.c | 683 +---------------
 .../drm/amd/display/dc/dcn30/dcn30_resource.h |  20 -
 .../amd/display/dc/dcn301/dcn301_resource.c   |  10 +-
 .../gpu/drm/amd/display/dc/dcn302/Makefile    |  25 -
 .../amd/display/dc/dcn302/dcn302_resource.c   |  10 +-
 .../drm/amd/display/dc/fpu_operation/Makefile |  58 ++
 .../display/dc/fpu_operation/dcn3x_commons.c  | 743 ++++++++++++++++++
 .../display/dc/fpu_operation/dcn3x_commons.h  |  44 ++
 .../display/dc/fpu_operation/fpu_commons.c    | 145 ++++
 .../display/dc/fpu_operation/fpu_commons.h    |  37 +
 15 files changed, 1051 insertions(+), 871 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h

-- 
2.25.1


* [RFC 1/7] drm/amd/display: Introduce FPU directory inside DC
  2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
@ 2021-01-25 13:43 ` Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 2/7] drm/amd/display: Moves dcn30_set_mcif_arb_params to FPU folder Rodrigo Siqueira
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Rodrigo Siqueira @ 2021-01-25 13:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Harry Wentland, Christian König, Nicholas Kazlauskas

The display core files rely on FPU operations, which require special
compilation flags. Ideally, we don't want these FPU operations spread
around the DC code; nevertheless, that is what happens in the current
source. This commit introduces a new directory named fpu_operation that
intends to centralize all files requiring the FPU compilation flags. As
part of this new component, this patch also moves one of the functions
that require FPU access into a single shared file. Notice that this is
the first part of the work, and it does not fix the FPU issue yet; we
still need other patches to achieve complete isolation of the FPU code.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
---
 drivers/gpu/drm/amd/display/dc/Makefile       |   1 +
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |  92 +----------
 .../drm/amd/display/dc/dcn30/dcn30_resource.h |   2 -
 .../amd/display/dc/dcn301/dcn301_resource.c   |   3 +-
 .../amd/display/dc/dcn302/dcn302_resource.c   |   3 +-
 .../drm/amd/display/dc/fpu_operation/Makefile |  57 +++++++
 .../display/dc/fpu_operation/dcn3x_commons.c  | 146 ++++++++++++++++++
 .../display/dc/fpu_operation/dcn3x_commons.h  |  32 ++++
 8 files changed, 242 insertions(+), 94 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h

diff --git a/drivers/gpu/drm/amd/display/dc/Makefile b/drivers/gpu/drm/amd/display/dc/Makefile
index bf8fe0471b8f..61dd2c5865c7 100644
--- a/drivers/gpu/drm/amd/display/dc/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/Makefile
@@ -33,6 +33,7 @@ DC_LIBS += dcn21
 DC_LIBS += dcn30
 DC_LIBS += dcn301
 DC_LIBS += dcn302
+DC_LIBS += fpu_operation
 endif
 
 DC_LIBS += dce120
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index dcf848f747a5..02e3107f04ee 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -34,6 +34,7 @@
 #include "dcn20/dcn20_resource.h"
 
 #include "dcn30_resource.h"
+#include "fpu_operation/dcn3x_commons.h"
 
 #include "dcn10/dcn10_ipp.h"
 #include "dcn30/dcn30_hubbub.h"
@@ -1470,95 +1471,6 @@ int dcn30_populate_dml_pipes_from_context(
 	return pipe_cnt;
 }
 
-void dcn30_populate_dml_writeback_from_context(
-	struct dc *dc, struct resource_context *res_ctx, display_e2e_pipe_params_st *pipes)
-{
-	int pipe_cnt, i, j;
-	double max_calc_writeback_dispclk;
-	double writeback_dispclk;
-	struct writeback_st dout_wb;
-
-	for (i = 0, pipe_cnt = 0; i < dc->res_pool->pipe_count; i++) {
-		struct dc_stream_state *stream = res_ctx->pipe_ctx[i].stream;
-
-		if (!stream)
-			continue;
-		max_calc_writeback_dispclk = 0;
-
-		/* Set writeback information */
-		pipes[pipe_cnt].dout.wb_enable = 0;
-		pipes[pipe_cnt].dout.num_active_wb = 0;
-		for (j = 0; j < stream->num_wb_info; j++) {
-			struct dc_writeback_info *wb_info = &stream->writeback_info[j];
-
-			if (wb_info->wb_enabled && wb_info->writeback_source_plane &&
-					(wb_info->writeback_source_plane == res_ctx->pipe_ctx[i].plane_state)) {
-				pipes[pipe_cnt].dout.wb_enable = 1;
-				pipes[pipe_cnt].dout.num_active_wb++;
-				dout_wb.wb_src_height = wb_info->dwb_params.cnv_params.crop_en ?
-					wb_info->dwb_params.cnv_params.crop_height :
-					wb_info->dwb_params.cnv_params.src_height;
-				dout_wb.wb_src_width = wb_info->dwb_params.cnv_params.crop_en ?
-					wb_info->dwb_params.cnv_params.crop_width :
-					wb_info->dwb_params.cnv_params.src_width;
-				dout_wb.wb_dst_width = wb_info->dwb_params.dest_width;
-				dout_wb.wb_dst_height = wb_info->dwb_params.dest_height;
-
-				/* For IP that doesn't support WB scaling, set h/v taps to 1 to avoid DML validation failure */
-				if (dc->dml.ip.writeback_max_hscl_taps > 1) {
-					dout_wb.wb_htaps_luma = wb_info->dwb_params.scaler_taps.h_taps;
-					dout_wb.wb_vtaps_luma = wb_info->dwb_params.scaler_taps.v_taps;
-				} else {
-					dout_wb.wb_htaps_luma = 1;
-					dout_wb.wb_vtaps_luma = 1;
-				}
-				dout_wb.wb_htaps_chroma = 0;
-				dout_wb.wb_vtaps_chroma = 0;
-				dout_wb.wb_hratio = wb_info->dwb_params.cnv_params.crop_en ?
-					(double)wb_info->dwb_params.cnv_params.crop_width /
-						(double)wb_info->dwb_params.dest_width :
-					(double)wb_info->dwb_params.cnv_params.src_width /
-						(double)wb_info->dwb_params.dest_width;
-				dout_wb.wb_vratio = wb_info->dwb_params.cnv_params.crop_en ?
-					(double)wb_info->dwb_params.cnv_params.crop_height /
-						(double)wb_info->dwb_params.dest_height :
-					(double)wb_info->dwb_params.cnv_params.src_height /
-						(double)wb_info->dwb_params.dest_height;
-				if (wb_info->dwb_params.cnv_params.fc_out_format == DWB_OUT_FORMAT_64BPP_ARGB ||
-					wb_info->dwb_params.cnv_params.fc_out_format == DWB_OUT_FORMAT_64BPP_RGBA)
-					dout_wb.wb_pixel_format = dm_444_64;
-				else
-					dout_wb.wb_pixel_format = dm_444_32;
-
-				/* Workaround for cases where multiple writebacks are connected to same plane
-				 * In which case, need to compute worst case and set the associated writeback parameters
-				 * This workaround is necessary due to DML computation assuming only 1 set of writeback
-				 * parameters per pipe
-				 */
-				writeback_dispclk = dml30_CalculateWriteBackDISPCLK(
-						dout_wb.wb_pixel_format,
-						pipes[pipe_cnt].pipe.dest.pixel_rate_mhz,
-						dout_wb.wb_hratio,
-						dout_wb.wb_vratio,
-						dout_wb.wb_htaps_luma,
-						dout_wb.wb_vtaps_luma,
-						dout_wb.wb_src_width,
-						dout_wb.wb_dst_width,
-						pipes[pipe_cnt].pipe.dest.htotal,
-						dc->current_state->bw_ctx.dml.ip.writeback_line_buffer_buffer_size);
-
-				if (writeback_dispclk > max_calc_writeback_dispclk) {
-					max_calc_writeback_dispclk = writeback_dispclk;
-					pipes[pipe_cnt].dout.wb = dout_wb;
-				}
-			}
-		}
-
-		pipe_cnt++;
-	}
-
-}
-
 unsigned int dcn30_calc_max_scaled_time(
 		unsigned int time_per_pixel,
 		enum mmhubbub_wbif_mode mode,
@@ -2489,7 +2401,7 @@ static const struct resource_funcs dcn30_res_pool_funcs = {
 	.add_stream_to_ctx = dcn30_add_stream_to_ctx,
 	.add_dsc_to_stream_resource = dcn20_add_dsc_to_stream_resource,
 	.remove_stream_from_ctx = dcn20_remove_stream_from_ctx,
-	.populate_dml_writeback_from_context = dcn30_populate_dml_writeback_from_context,
+	.populate_dml_writeback_from_context = dcn3x_populate_dml_writeback_from_context,
 	.set_mcif_arb_params = dcn30_set_mcif_arb_params,
 	.find_first_free_match_stream_enc_for_link = dcn10_find_first_free_match_stream_enc_for_link,
 	.acquire_post_bldn_3dlut = dcn30_acquire_post_bldn_3dlut,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
index 8ce7f6d39a20..b91e2856097f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
@@ -60,8 +60,6 @@ void dcn30_calculate_wm_and_dlg(
 		display_e2e_pipe_params_st *pipes,
 		int pipe_cnt,
 		int vlevel);
-void dcn30_populate_dml_writeback_from_context(
-		struct dc *dc, struct resource_context *res_ctx, display_e2e_pipe_params_st *pipes);
 
 int dcn30_populate_dml_pipes_from_context(
 	struct dc *dc, struct dc_state *context,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index 5d4b2c60192e..e57f365aef95 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -35,6 +35,7 @@
 #include "dcn301_resource.h"
 
 #include "dcn20/dcn20_resource.h"
+#include "fpu_operation/dcn3x_commons.h"
 
 #include "dcn10/dcn10_ipp.h"
 #include "dcn301/dcn301_hubbub.h"
@@ -1630,7 +1631,7 @@ static struct resource_funcs dcn301_res_pool_funcs = {
 	.add_stream_to_ctx = dcn30_add_stream_to_ctx,
 	.add_dsc_to_stream_resource = dcn20_add_dsc_to_stream_resource,
 	.remove_stream_from_ctx = dcn20_remove_stream_from_ctx,
-	.populate_dml_writeback_from_context = dcn30_populate_dml_writeback_from_context,
+	.populate_dml_writeback_from_context = dcn3x_populate_dml_writeback_from_context,
 	.set_mcif_arb_params = dcn30_set_mcif_arb_params,
 	.find_first_free_match_stream_enc_for_link = dcn10_find_first_free_match_stream_enc_for_link,
 	.acquire_post_bldn_3dlut = dcn30_acquire_post_bldn_3dlut,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
index 22ba0be88faf..c0f4a6d10cb6 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
@@ -42,6 +42,7 @@
 
 #include "dcn20/dcn20_dsc.h"
 #include "dcn20/dcn20_resource.h"
+#include "fpu_operation/dcn3x_commons.h"
 
 #include "dcn10/dcn10_resource.h"
 
@@ -1240,7 +1241,7 @@ static struct resource_funcs dcn302_res_pool_funcs = {
 		.add_stream_to_ctx = dcn30_add_stream_to_ctx,
 		.add_dsc_to_stream_resource = dcn20_add_dsc_to_stream_resource,
 		.remove_stream_from_ctx = dcn20_remove_stream_from_ctx,
-		.populate_dml_writeback_from_context = dcn30_populate_dml_writeback_from_context,
+		.populate_dml_writeback_from_context = dcn3x_populate_dml_writeback_from_context,
 		.set_mcif_arb_params = dcn30_set_mcif_arb_params,
 		.find_first_free_match_stream_enc_for_link = dcn10_find_first_free_match_stream_enc_for_link,
 		.acquire_post_bldn_3dlut = dcn30_acquire_post_bldn_3dlut,
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile b/drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile
new file mode 100644
index 000000000000..8a0957bfa8ba
--- /dev/null
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile
@@ -0,0 +1,57 @@
+# 
+# Copyright 2020 Advanced Micro Devices, Inc.
+# 
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+# 
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+# 
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+# OTHER DEALINGS IN THE SOFTWARE.
+# 
+# Authors: AMD
+# 
+# 
+
+DCN3X_COMMONS = dcn3x_commons.o
+
+ifdef CONFIG_X86
+dcn3x_commons_ccflags := -mhard-float -msse
+endif
+
+ifdef CONFIG_PPC64
+dcn3x_commons_ccflags := -mhard-float -maltivec
+endif
+
+ifdef CONFIG_CC_IS_GCC
+ifeq ($(call cc-ifversion, -lt, 0701, y), y)
+IS_OLD_GCC = 1
+endif
+endif
+
+ifdef CONFIG_X86
+ifdef IS_OLD_GCC
+# Stack alignment mismatch, proceed with caution.
+# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
+# (8B stack alignment).
+dcn3x_commons_ccflags += -mpreferred-stack-boundary=4
+else
+dcn3x_commons_ccflags += -msse2
+endif
+endif
+
+CFLAGS_$(AMDDALPATH)/dc/fpu_operation/dcn3x_commons.o += $(dcn3x_commons_ccflags)
+
+AMD_DAL_DCN3X_COMMONS = $(addprefix $(AMDDALPATH)/dc/fpu_operation/,$(DCN3X_COMMONS))
+
+AMD_DISPLAY_FILES += $(AMD_DAL_DCN3X_COMMONS)
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
new file mode 100644
index 000000000000..767ca59551c1
--- /dev/null
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
@@ -0,0 +1,146 @@
+/*
+ * Copyright 2021 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: AMD
+ *
+ */
+
+
+/**
+ * DOC: overview
+ *
+ * The DCN architecture relies on FPU operations, which require special
+ * compilation flags and the kernel_fpu_begin/end macros. Ideally, we want to
+ * avoid spreading this sort of function across multiple object files in order
+ * to keep the FPU work manageable; furthermore, we want to avoid clobbering
+ * the FPU state of userspace applications that use double precision. This
+ * file aims to centralize all DCN30 functions that require FPU access -
+ * including those shared with other DCN3x architectures (i.e., DCN3.01 and
+ * DCN3.02) - in a single place. Notice that this file follows this pattern:
+ *
+ * - Functions shared between multiple DCN architectures start with dcn3x;
+ * - The functions in this file are exposed via public wrappers that call the
+ *   static implementations that need FPU operations under the
+ *   kernel_fpu_begin/end macros.
+ */
+
+#include "resource.h"
+#include "dml/dcn30/display_mode_vba_30.h"
+
+#include "fpu_operation/dcn3x_commons.h"
+
+static void _dcn3x_populate_dml_writeback_from_context(struct dc *dc,
+		struct resource_context *res_ctx,
+		display_e2e_pipe_params_st *pipes)
+{
+	int pipe_cnt, i, j;
+	double max_calc_writeback_dispclk;
+	double writeback_dispclk;
+	struct writeback_st dout_wb;
+
+	for (i = 0, pipe_cnt = 0; i < dc->res_pool->pipe_count; i++) {
+		struct dc_stream_state *stream = res_ctx->pipe_ctx[i].stream;
+
+		if (!stream)
+			continue;
+		max_calc_writeback_dispclk = 0;
+
+		/* Set writeback information */
+		pipes[pipe_cnt].dout.wb_enable = 0;
+		pipes[pipe_cnt].dout.num_active_wb = 0;
+		for (j = 0; j < stream->num_wb_info; j++) {
+			struct dc_writeback_info *wb_info = &stream->writeback_info[j];
+
+			if (wb_info->wb_enabled && wb_info->writeback_source_plane &&
+					(wb_info->writeback_source_plane == res_ctx->pipe_ctx[i].plane_state)) {
+				pipes[pipe_cnt].dout.wb_enable = 1;
+				pipes[pipe_cnt].dout.num_active_wb++;
+				dout_wb.wb_src_height = wb_info->dwb_params.cnv_params.crop_en ?
+					wb_info->dwb_params.cnv_params.crop_height :
+					wb_info->dwb_params.cnv_params.src_height;
+				dout_wb.wb_src_width = wb_info->dwb_params.cnv_params.crop_en ?
+					wb_info->dwb_params.cnv_params.crop_width :
+					wb_info->dwb_params.cnv_params.src_width;
+				dout_wb.wb_dst_width = wb_info->dwb_params.dest_width;
+				dout_wb.wb_dst_height = wb_info->dwb_params.dest_height;
+
+				/* For IP that doesn't support WB scaling, set h/v taps to 1 to avoid DML validation failure */
+				if (dc->dml.ip.writeback_max_hscl_taps > 1) {
+					dout_wb.wb_htaps_luma = wb_info->dwb_params.scaler_taps.h_taps;
+					dout_wb.wb_vtaps_luma = wb_info->dwb_params.scaler_taps.v_taps;
+				} else {
+					dout_wb.wb_htaps_luma = 1;
+					dout_wb.wb_vtaps_luma = 1;
+				}
+				dout_wb.wb_htaps_chroma = 0;
+				dout_wb.wb_vtaps_chroma = 0;
+				dout_wb.wb_hratio = wb_info->dwb_params.cnv_params.crop_en ?
+					(double)wb_info->dwb_params.cnv_params.crop_width /
+						(double)wb_info->dwb_params.dest_width :
+					(double)wb_info->dwb_params.cnv_params.src_width /
+						(double)wb_info->dwb_params.dest_width;
+				dout_wb.wb_vratio = wb_info->dwb_params.cnv_params.crop_en ?
+					(double)wb_info->dwb_params.cnv_params.crop_height /
+						(double)wb_info->dwb_params.dest_height :
+					(double)wb_info->dwb_params.cnv_params.src_height /
+						(double)wb_info->dwb_params.dest_height;
+				if (wb_info->dwb_params.cnv_params.fc_out_format == DWB_OUT_FORMAT_64BPP_ARGB ||
+					wb_info->dwb_params.cnv_params.fc_out_format == DWB_OUT_FORMAT_64BPP_RGBA)
+					dout_wb.wb_pixel_format = dm_444_64;
+				else
+					dout_wb.wb_pixel_format = dm_444_32;
+
+				/* Workaround for cases where multiple writebacks are connected to same plane
+				 * In which case, need to compute worst case and set the associated writeback parameters
+				 * This workaround is necessary due to DML computation assuming only 1 set of writeback
+				 * parameters per pipe
+				 */
+				writeback_dispclk = dml30_CalculateWriteBackDISPCLK(
+						dout_wb.wb_pixel_format,
+						pipes[pipe_cnt].pipe.dest.pixel_rate_mhz,
+						dout_wb.wb_hratio,
+						dout_wb.wb_vratio,
+						dout_wb.wb_htaps_luma,
+						dout_wb.wb_vtaps_luma,
+						dout_wb.wb_src_width,
+						dout_wb.wb_dst_width,
+						pipes[pipe_cnt].pipe.dest.htotal,
+						dc->current_state->bw_ctx.dml.ip.writeback_line_buffer_buffer_size);
+
+				if (writeback_dispclk > max_calc_writeback_dispclk) {
+					max_calc_writeback_dispclk = writeback_dispclk;
+					pipes[pipe_cnt].dout.wb = dout_wb;
+				}
+			}
+		}
+
+		pipe_cnt++;
+	}
+
+}
+
+void dcn3x_populate_dml_writeback_from_context(struct dc *dc,
+	struct resource_context *res_ctx, display_e2e_pipe_params_st *pipes)
+{
+	DC_FP_START();
+	_dcn3x_populate_dml_writeback_from_context(dc, res_ctx, pipes);
+	DC_FP_END();
+}
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
new file mode 100644
index 000000000000..c3d6297d3585
--- /dev/null
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright 2020 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: AMD
+ *
+ */
+
+#ifndef _DCN3X_COMMONS_H_
+#define _DCN3X_COMMONS_H_
+
+void dcn3x_populate_dml_writeback_from_context(struct dc *dc,
+	struct resource_context *res_ctx, display_e2e_pipe_params_st *pipes);
+
+#endif /* _DCN3X_COMMONS_H_ */
-- 
2.25.1


* [RFC 2/7] drm/amd/display: Moves dcn30_set_mcif_arb_params to FPU folder
  2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 1/7] drm/amd/display: Introduce FPU directory inside DC Rodrigo Siqueira
@ 2021-01-25 13:43 ` Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 3/7] drm/amd/display: Add FPU file for functions shared across ASICs Rodrigo Siqueira
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Rodrigo Siqueira @ 2021-01-25 13:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Harry Wentland, Christian König, Nicholas Kazlauskas

The function dcn30_set_mcif_arb_params uses some double-precision
operations; for this reason, this commit moves it to the fpu_operation
directory, where all FPU operations should be centralized.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
---
 .../drm/amd/display/dc/dcn30/dcn30_resource.c | 81 +-----------------
 .../drm/amd/display/dc/dcn30/dcn30_resource.h | 11 ---
 .../amd/display/dc/dcn301/dcn301_resource.c   |  2 +-
 .../amd/display/dc/dcn302/dcn302_resource.c   |  2 +-
 .../display/dc/fpu_operation/dcn3x_commons.c  | 84 +++++++++++++++++++
 .../display/dc/fpu_operation/dcn3x_commons.h  |  3 +
 6 files changed, 90 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 02e3107f04ee..b82d616f8a21 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -1471,85 +1471,6 @@ int dcn30_populate_dml_pipes_from_context(
 	return pipe_cnt;
 }
 
-unsigned int dcn30_calc_max_scaled_time(
-		unsigned int time_per_pixel,
-		enum mmhubbub_wbif_mode mode,
-		unsigned int urgent_watermark)
-{
-	unsigned int time_per_byte = 0;
-	unsigned int total_free_entry = 0xb40;
-	unsigned int buf_lh_capability;
-	unsigned int max_scaled_time;
-
-	if (mode == PACKED_444) /* packed mode 32 bpp */
-		time_per_byte = time_per_pixel/4;
-	else if (mode == PACKED_444_FP16) /* packed mode 64 bpp */
-		time_per_byte = time_per_pixel/8;
-
-	if (time_per_byte == 0)
-		time_per_byte = 1;
-
-	buf_lh_capability = (total_free_entry*time_per_byte*32) >> 6; /* time_per_byte is in u6.6*/
-	max_scaled_time   = buf_lh_capability - urgent_watermark;
-	return max_scaled_time;
-}
-
-void dcn30_set_mcif_arb_params(
-		struct dc *dc,
-		struct dc_state *context,
-		display_e2e_pipe_params_st *pipes,
-		int pipe_cnt)
-{
-	enum mmhubbub_wbif_mode wbif_mode;
-	struct display_mode_lib *dml = &context->bw_ctx.dml;
-	struct mcif_arb_params *wb_arb_params;
-	int i, j, k, dwb_pipe;
-
-	/* Writeback MCIF_WB arbitration parameters */
-	dwb_pipe = 0;
-	for (i = 0; i < dc->res_pool->pipe_count; i++) {
-
-		if (!context->res_ctx.pipe_ctx[i].stream)
-			continue;
-
-		for (j = 0; j < MAX_DWB_PIPES; j++) {
-			struct dc_writeback_info *writeback_info = &context->res_ctx.pipe_ctx[i].stream->writeback_info[j];
-
-			if (writeback_info->wb_enabled == false)
-				continue;
-
-			//wb_arb_params = &context->res_ctx.pipe_ctx[i].stream->writeback_info[j].mcif_arb_params;
-			wb_arb_params = &context->bw_ctx.bw.dcn.bw_writeback.mcif_wb_arb[dwb_pipe];
-
-			if (writeback_info->dwb_params.cnv_params.fc_out_format == DWB_OUT_FORMAT_64BPP_ARGB ||
-				writeback_info->dwb_params.cnv_params.fc_out_format == DWB_OUT_FORMAT_64BPP_RGBA)
-				wbif_mode = PACKED_444_FP16;
-			else
-				wbif_mode = PACKED_444;
-
-			for (k = 0; k < sizeof(wb_arb_params->cli_watermark)/sizeof(wb_arb_params->cli_watermark[0]); k++) {
-				wb_arb_params->cli_watermark[k] = get_wm_writeback_urgent(dml, pipes, pipe_cnt) * 1000;
-				wb_arb_params->pstate_watermark[k] = get_wm_writeback_dram_clock_change(dml, pipes, pipe_cnt) * 1000;
-			}
-			wb_arb_params->time_per_pixel = (1000000 << 6) / context->res_ctx.pipe_ctx[i].stream->phy_pix_clk; /* time_per_pixel should be in u6.6 format */
-			wb_arb_params->slice_lines = 32;
-			wb_arb_params->arbitration_slice = 2; /* irrelevant since there is no YUV output */
-			wb_arb_params->max_scaled_time = dcn30_calc_max_scaled_time(wb_arb_params->time_per_pixel,
-					wbif_mode,
-					wb_arb_params->cli_watermark[0]); /* assume 4 watermark sets have the same value */
-			wb_arb_params->dram_speed_change_duration = dml->vba.WritebackAllowDRAMClockChangeEndPosition[j] * pipes[0].clks_cfg.refclk_mhz; /* num_clock_cycles = us * MHz */
-
-			dwb_pipe++;
-
-			if (dwb_pipe >= MAX_DWB_PIPES)
-				return;
-		}
-		if (dwb_pipe >= MAX_DWB_PIPES)
-			return;
-	}
-
-}
-
 static struct dc_cap_funcs cap_funcs = {
 	.get_dcc_compression_cap = dcn20_get_dcc_compression_cap
 };
@@ -2402,7 +2323,7 @@ static const struct resource_funcs dcn30_res_pool_funcs = {
 	.add_dsc_to_stream_resource = dcn20_add_dsc_to_stream_resource,
 	.remove_stream_from_ctx = dcn20_remove_stream_from_ctx,
 	.populate_dml_writeback_from_context = dcn3x_populate_dml_writeback_from_context,
-	.set_mcif_arb_params = dcn30_set_mcif_arb_params,
+	.set_mcif_arb_params = dcn3x_set_mcif_arb_params,
 	.find_first_free_match_stream_enc_for_link = dcn10_find_first_free_match_stream_enc_for_link,
 	.acquire_post_bldn_3dlut = dcn30_acquire_post_bldn_3dlut,
 	.release_post_bldn_3dlut = dcn30_release_post_bldn_3dlut,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
index b91e2856097f..42960574cce9 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
@@ -42,17 +42,6 @@ struct resource_pool *dcn30_create_resource_pool(
 		const struct dc_init_data *init_data,
 		struct dc *dc);
 
-void dcn30_set_mcif_arb_params(
-		struct dc *dc,
-		struct dc_state *context,
-		display_e2e_pipe_params_st *pipes,
-		int pipe_cnt);
-
-unsigned int dcn30_calc_max_scaled_time(
-		unsigned int time_per_pixel,
-		enum mmhubbub_wbif_mode mode,
-		unsigned int urgent_watermark);
-
 bool dcn30_validate_bandwidth(struct dc *dc, struct dc_state *context,
 		bool fast_validate);
 void dcn30_calculate_wm_and_dlg(
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index e57f365aef95..8369b009a853 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -1632,7 +1632,7 @@ static struct resource_funcs dcn301_res_pool_funcs = {
 	.add_dsc_to_stream_resource = dcn20_add_dsc_to_stream_resource,
 	.remove_stream_from_ctx = dcn20_remove_stream_from_ctx,
 	.populate_dml_writeback_from_context = dcn3x_populate_dml_writeback_from_context,
-	.set_mcif_arb_params = dcn30_set_mcif_arb_params,
+	.set_mcif_arb_params = dcn3x_set_mcif_arb_params,
 	.find_first_free_match_stream_enc_for_link = dcn10_find_first_free_match_stream_enc_for_link,
 	.acquire_post_bldn_3dlut = dcn30_acquire_post_bldn_3dlut,
 	.release_post_bldn_3dlut = dcn30_release_post_bldn_3dlut,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
index c0f4a6d10cb6..0554a3dd3214 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
@@ -1242,7 +1242,7 @@ static struct resource_funcs dcn302_res_pool_funcs = {
 		.add_dsc_to_stream_resource = dcn20_add_dsc_to_stream_resource,
 		.remove_stream_from_ctx = dcn20_remove_stream_from_ctx,
 		.populate_dml_writeback_from_context = dcn3x_populate_dml_writeback_from_context,
-		.set_mcif_arb_params = dcn30_set_mcif_arb_params,
+		.set_mcif_arb_params = dcn3x_set_mcif_arb_params,
 		.find_first_free_match_stream_enc_for_link = dcn10_find_first_free_match_stream_enc_for_link,
 		.acquire_post_bldn_3dlut = dcn30_acquire_post_bldn_3dlut,
 		.release_post_bldn_3dlut = dcn30_release_post_bldn_3dlut,
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
index 767ca59551c1..707f782127a7 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
@@ -137,6 +137,90 @@ static void _dcn3x_populate_dml_writeback_from_context(struct dc *dc,
 
 }
 
+static unsigned int dcn3x_calc_max_scaled_time(unsigned int time_per_pixel,
+	enum mmhubbub_wbif_mode mode, unsigned int urgent_watermark)
+{
+	unsigned int time_per_byte = 0;
+	unsigned int total_free_entry = 0xb40;
+	unsigned int buf_lh_capability;
+	unsigned int max_scaled_time;
+
+	if (mode == PACKED_444) /* packed mode 32 bpp */
+		time_per_byte = time_per_pixel/4;
+	else if (mode == PACKED_444_FP16) /* packed mode 64 bpp */
+		time_per_byte = time_per_pixel/8;
+
+	if (time_per_byte == 0)
+		time_per_byte = 1;
+
+	buf_lh_capability = (total_free_entry*time_per_byte*32) >> 6; /* time_per_byte is in u6.6*/
+	max_scaled_time   = buf_lh_capability - urgent_watermark;
+	return max_scaled_time;
+}
+
+static void _dcn3x_set_mcif_arb_params(struct dc *dc, struct dc_state *context,
+		display_e2e_pipe_params_st *pipes,
+		int pipe_cnt)
+{
+	enum mmhubbub_wbif_mode wbif_mode;
+	struct display_mode_lib *dml = &context->bw_ctx.dml;
+	struct mcif_arb_params *wb_arb_params;
+	int i, j, k, dwb_pipe;
+
+	/* Writeback MCIF_WB arbitration parameters */
+	dwb_pipe = 0;
+	for (i = 0; i < dc->res_pool->pipe_count; i++) {
+
+		if (!context->res_ctx.pipe_ctx[i].stream)
+			continue;
+
+		for (j = 0; j < MAX_DWB_PIPES; j++) {
+			struct dc_writeback_info *writeback_info = &context->res_ctx.pipe_ctx[i].stream->writeback_info[j];
+
+			if (writeback_info->wb_enabled == false)
+				continue;
+
+			//wb_arb_params = &context->res_ctx.pipe_ctx[i].stream->writeback_info[j].mcif_arb_params;
+			wb_arb_params = &context->bw_ctx.bw.dcn.bw_writeback.mcif_wb_arb[dwb_pipe];
+
+			if (writeback_info->dwb_params.cnv_params.fc_out_format == DWB_OUT_FORMAT_64BPP_ARGB ||
+				writeback_info->dwb_params.cnv_params.fc_out_format == DWB_OUT_FORMAT_64BPP_RGBA)
+				wbif_mode = PACKED_444_FP16;
+			else
+				wbif_mode = PACKED_444;
+
+			for (k = 0; k < sizeof(wb_arb_params->cli_watermark)/sizeof(wb_arb_params->cli_watermark[0]); k++) {
+				wb_arb_params->cli_watermark[k] = get_wm_writeback_urgent(dml, pipes, pipe_cnt) * 1000;
+				wb_arb_params->pstate_watermark[k] = get_wm_writeback_dram_clock_change(dml, pipes, pipe_cnt) * 1000;
+			}
+			wb_arb_params->time_per_pixel = (1000000 << 6) / context->res_ctx.pipe_ctx[i].stream->phy_pix_clk; /* time_per_pixel should be in u6.6 format */
+			wb_arb_params->slice_lines = 32;
+			wb_arb_params->arbitration_slice = 2; /* irrelevant since there is no YUV output */
+			wb_arb_params->max_scaled_time = dcn3x_calc_max_scaled_time(wb_arb_params->time_per_pixel,
+					wbif_mode,
+					wb_arb_params->cli_watermark[0]); /* assume 4 watermark sets have the same value */
+			wb_arb_params->dram_speed_change_duration = dml->vba.WritebackAllowDRAMClockChangeEndPosition[j] * pipes[0].clks_cfg.refclk_mhz; /* num_clock_cycles = us * MHz */
+
+			dwb_pipe++;
+
+			if (dwb_pipe >= MAX_DWB_PIPES)
+				return;
+		}
+		if (dwb_pipe >= MAX_DWB_PIPES)
+			return;
+	}
+
+}
+
+void dcn3x_set_mcif_arb_params(struct dc *dc, struct dc_state *context,
+		display_e2e_pipe_params_st *pipes,
+		int pipe_cnt)
+{
+	DC_FP_START();
+	_dcn3x_set_mcif_arb_params(dc, context, pipes, pipe_cnt);
+	DC_FP_END();
+}
+
 void dcn3x_populate_dml_writeback_from_context(struct dc *dc,
 	struct resource_context *res_ctx, display_e2e_pipe_params_st *pipes)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
index c3d6297d3585..b3b6d8a66c28 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
@@ -29,4 +29,7 @@
 void dcn3x_populate_dml_writeback_from_context(struct dc *dc,
 	struct resource_context *res_ctx, display_e2e_pipe_params_st *pipes);
 
+void dcn3x_set_mcif_arb_params(struct dc *dc, struct dc_state *context,
+	display_e2e_pipe_params_st *pipes, int pipe_cnt);
+
 #endif /* _DCN3X_COMMONS_H_ */
-- 
2.25.1


* [RFC 3/7] drm/amd/display: Add FPU file for functions shared across ASICs
  2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 1/7] drm/amd/display: Introduce FPU directory inside DC Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 2/7] drm/amd/display: Moves dcn30_set_mcif_arb_params to FPU folder Rodrigo Siqueira
@ 2021-01-25 13:43 ` Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 4/7] drm/amd/display: Move calculate_wm_and_dlg to FPU folder Rodrigo Siqueira
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Rodrigo Siqueira @ 2021-01-25 13:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Harry Wentland, Christian König, Nicholas Kazlauskas

In our DC code, we have a couple of functions that use the FPU and are
shared between multiple ASICs; this situation complicates the work of
centralizing FPU functions in a single place. This commit tries to
alleviate this problem by creating a file named fpu_commons that
centralizes shared functions that use FPU operations. As a starting
point, we move dcn20_calculate_dlg_params to this file.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
---
 .../drm/amd/display/dc/dcn20/dcn20_resource.c |  70 +-----------
 .../drm/amd/display/dc/dcn20/dcn20_resource.h |   5 -
 .../drm/amd/display/dc/dcn21/dcn21_resource.c |   2 +
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |   1 +
 .../drm/amd/display/dc/fpu_operation/Makefile |   3 +-
 .../display/dc/fpu_operation/dcn3x_commons.c  |   1 +
 .../display/dc/fpu_operation/fpu_commons.c    | 102 ++++++++++++++++++
 .../display/dc/fpu_operation/fpu_commons.h    |  34 ++++++
 8 files changed, 144 insertions(+), 74 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c
 create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 064f158ce671..d5bf740b408c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -87,6 +87,8 @@
 
 #include "amdgpu_socbb.h"
 
+#include "fpu_operation/fpu_commons.h"
+
 #define DC_LOGGER_INIT(logger)
 
 struct _vcs_dpi_ip_params_st dcn2_0_ip = {
@@ -3060,74 +3062,6 @@ static void dcn20_calculate_wm(
 	context->bw_ctx.bw.dcn.watermarks.a.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
 }
 
-void dcn20_calculate_dlg_params(
-		struct dc *dc, struct dc_state *context,
-		display_e2e_pipe_params_st *pipes,
-		int pipe_cnt,
-		int vlevel)
-{
-	int i, pipe_idx;
-
-	/* Writeback MCIF_WB arbitration parameters */
-	dc->res_pool->funcs->set_mcif_arb_params(dc, context, pipes, pipe_cnt);
-
-	context->bw_ctx.bw.dcn.clk.dispclk_khz = context->bw_ctx.dml.vba.DISPCLK * 1000;
-	context->bw_ctx.bw.dcn.clk.dcfclk_khz = context->bw_ctx.dml.vba.DCFCLK * 1000;
-	context->bw_ctx.bw.dcn.clk.socclk_khz = context->bw_ctx.dml.vba.SOCCLK * 1000;
-	context->bw_ctx.bw.dcn.clk.dramclk_khz = context->bw_ctx.dml.vba.DRAMSpeed * 1000 / 16;
-	context->bw_ctx.bw.dcn.clk.dcfclk_deep_sleep_khz = context->bw_ctx.dml.vba.DCFCLKDeepSleep * 1000;
-	context->bw_ctx.bw.dcn.clk.fclk_khz = context->bw_ctx.dml.vba.FabricClock * 1000;
-	context->bw_ctx.bw.dcn.clk.p_state_change_support =
-		context->bw_ctx.dml.vba.DRAMClockChangeSupport[vlevel][context->bw_ctx.dml.vba.maxMpcComb]
-							!= dm_dram_clock_change_unsupported;
-	context->bw_ctx.bw.dcn.clk.dppclk_khz = 0;
-
-	if (context->bw_ctx.bw.dcn.clk.dispclk_khz < dc->debug.min_disp_clk_khz)
-		context->bw_ctx.bw.dcn.clk.dispclk_khz = dc->debug.min_disp_clk_khz;
-
-	for (i = 0, pipe_idx = 0; i < dc->res_pool->pipe_count; i++) {
-		if (!context->res_ctx.pipe_ctx[i].stream)
-			continue;
-		pipes[pipe_idx].pipe.dest.vstartup_start = get_vstartup(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
-		pipes[pipe_idx].pipe.dest.vupdate_offset = get_vupdate_offset(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
-		pipes[pipe_idx].pipe.dest.vupdate_width = get_vupdate_width(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
-		pipes[pipe_idx].pipe.dest.vready_offset = get_vready_offset(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
-		if (context->bw_ctx.bw.dcn.clk.dppclk_khz < pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000)
-			context->bw_ctx.bw.dcn.clk.dppclk_khz = pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000;
-		context->res_ctx.pipe_ctx[i].plane_res.bw.dppclk_khz =
-						pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000;
-		context->res_ctx.pipe_ctx[i].pipe_dlg_param = pipes[pipe_idx].pipe.dest;
-		pipe_idx++;
-	}
-	/*save a original dppclock copy*/
-	context->bw_ctx.bw.dcn.clk.bw_dppclk_khz = context->bw_ctx.bw.dcn.clk.dppclk_khz;
-	context->bw_ctx.bw.dcn.clk.bw_dispclk_khz = context->bw_ctx.bw.dcn.clk.dispclk_khz;
-	context->bw_ctx.bw.dcn.clk.max_supported_dppclk_khz = context->bw_ctx.dml.soc.clock_limits[vlevel].dppclk_mhz * 1000;
-	context->bw_ctx.bw.dcn.clk.max_supported_dispclk_khz = context->bw_ctx.dml.soc.clock_limits[vlevel].dispclk_mhz * 1000;
-
-	for (i = 0, pipe_idx = 0; i < dc->res_pool->pipe_count; i++) {
-		bool cstate_en = context->bw_ctx.dml.vba.PrefetchMode[vlevel][context->bw_ctx.dml.vba.maxMpcComb] != 2;
-
-		if (!context->res_ctx.pipe_ctx[i].stream)
-			continue;
-
-		context->bw_ctx.dml.funcs.rq_dlg_get_dlg_reg(&context->bw_ctx.dml,
-				&context->res_ctx.pipe_ctx[i].dlg_regs,
-				&context->res_ctx.pipe_ctx[i].ttu_regs,
-				pipes,
-				pipe_cnt,
-				pipe_idx,
-				cstate_en,
-				context->bw_ctx.bw.dcn.clk.p_state_change_support,
-				false, false, true);
-
-		context->bw_ctx.dml.funcs.rq_dlg_get_rq_reg(&context->bw_ctx.dml,
-				&context->res_ctx.pipe_ctx[i].rq_regs,
-				pipes[pipe_idx].pipe);
-		pipe_idx++;
-	}
-}
-
 static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state *context,
 		bool fast_validate)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.h b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.h
index c8f3127bbcdf..11ec655a18e3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.h
@@ -160,11 +160,6 @@ bool dcn20_fast_validate_bw(
 		int *pipe_split_from,
 		int *vlevel_out,
 		bool fast_validate);
-void dcn20_calculate_dlg_params(
-		struct dc *dc, struct dc_state *context,
-		display_e2e_pipe_params_st *pipes,
-		int pipe_cnt,
-		int vlevel);
 
 enum dc_status dcn20_build_mapped_resource(const struct dc *dc, struct dc_state *context, struct dc_stream_state *stream);
 enum dc_status dcn20_add_stream_to_ctx(struct dc *dc, struct dc_state *new_ctx, struct dc_stream_state *dc_stream);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index b000b43a820d..b1e97caf56c7 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -89,6 +89,8 @@
 #include "dce/dmub_psr.h"
 #include "dce/dmub_abm.h"
 
+#include "fpu_operation/fpu_commons.h"
+
 #define DC_LOGGER_INIT(logger)
 
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index b82d616f8a21..c97533b4ad09 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -34,6 +34,7 @@
 #include "dcn20/dcn20_resource.h"
 
 #include "dcn30_resource.h"
+#include "fpu_operation/fpu_commons.h"
 #include "fpu_operation/dcn3x_commons.h"
 
 #include "dcn10/dcn10_ipp.h"
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile b/drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile
index 8a0957bfa8ba..e68536c2e225 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile
@@ -23,7 +23,7 @@
 # 
 # 
 
-DCN3X_COMMONS = dcn3x_commons.o
+DCN3X_COMMONS = dcn3x_commons.o fpu_commons.o
 
 ifdef CONFIG_X86
 dcn3x_commons_ccflags := -mhard-float -msse
@@ -51,6 +51,7 @@ endif
 endif
 
 CFLAGS_$(AMDDALPATH)/dc/fpu_operation/dcn3x_commons.o += $(dcn3x_commons_ccflags)
+CFLAGS_$(AMDDALPATH)/dc/fpu_operation/fpu_commons.o += $(dcn3x_commons_ccflags)
 
 AMD_DAL_DCN3X_COMMONS = $(addprefix $(AMDDALPATH)/dc/fpu_operation/,$(DCN3X_COMMONS))
 
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
index 707f782127a7..06e9bd6d4d28 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
@@ -45,6 +45,7 @@
 #include "resource.h"
 #include "dml/dcn30/display_mode_vba_30.h"
 
+#include "fpu_operation/fpu_commons.h"
 #include "fpu_operation/dcn3x_commons.h"
 
 static void _dcn3x_populate_dml_writeback_from_context(struct dc *dc,
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c b/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c
new file mode 100644
index 000000000000..d2089a1dc630
--- /dev/null
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c
@@ -0,0 +1,102 @@
+/*
+ * Copyright 2021 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: AMD
+ *
+ */
+
+#include "resource.h"
+#include "dml/dcn30/display_mode_vba_30.h"
+
+#include "fpu_operation/fpu_commons.h"
+
+static void _dcn20_calculate_dlg_params(struct dc *dc, struct dc_state *context,
+		display_e2e_pipe_params_st *pipes, int pipe_cnt, int vlevel)
+{
+	int i, pipe_idx;
+
+	/* Writeback MCIF_WB arbitration parameters */
+	dc->res_pool->funcs->set_mcif_arb_params(dc, context, pipes, pipe_cnt);
+
+	context->bw_ctx.bw.dcn.clk.dispclk_khz = context->bw_ctx.dml.vba.DISPCLK * 1000;
+	context->bw_ctx.bw.dcn.clk.dcfclk_khz = context->bw_ctx.dml.vba.DCFCLK * 1000;
+	context->bw_ctx.bw.dcn.clk.socclk_khz = context->bw_ctx.dml.vba.SOCCLK * 1000;
+	context->bw_ctx.bw.dcn.clk.dramclk_khz = context->bw_ctx.dml.vba.DRAMSpeed * 1000 / 16;
+	context->bw_ctx.bw.dcn.clk.dcfclk_deep_sleep_khz = context->bw_ctx.dml.vba.DCFCLKDeepSleep * 1000;
+	context->bw_ctx.bw.dcn.clk.fclk_khz = context->bw_ctx.dml.vba.FabricClock * 1000;
+	context->bw_ctx.bw.dcn.clk.p_state_change_support =
+		context->bw_ctx.dml.vba.DRAMClockChangeSupport[vlevel][context->bw_ctx.dml.vba.maxMpcComb]
+							!= dm_dram_clock_change_unsupported;
+	context->bw_ctx.bw.dcn.clk.dppclk_khz = 0;
+
+	if (context->bw_ctx.bw.dcn.clk.dispclk_khz < dc->debug.min_disp_clk_khz)
+		context->bw_ctx.bw.dcn.clk.dispclk_khz = dc->debug.min_disp_clk_khz;
+
+	for (i = 0, pipe_idx = 0; i < dc->res_pool->pipe_count; i++) {
+		if (!context->res_ctx.pipe_ctx[i].stream)
+			continue;
+		pipes[pipe_idx].pipe.dest.vstartup_start = get_vstartup(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
+		pipes[pipe_idx].pipe.dest.vupdate_offset = get_vupdate_offset(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
+		pipes[pipe_idx].pipe.dest.vupdate_width = get_vupdate_width(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
+		pipes[pipe_idx].pipe.dest.vready_offset = get_vready_offset(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
+		if (context->bw_ctx.bw.dcn.clk.dppclk_khz < pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000)
+			context->bw_ctx.bw.dcn.clk.dppclk_khz = pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000;
+		context->res_ctx.pipe_ctx[i].plane_res.bw.dppclk_khz =
+						pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000;
+		context->res_ctx.pipe_ctx[i].pipe_dlg_param = pipes[pipe_idx].pipe.dest;
+		pipe_idx++;
+	}
+	/*save a original dppclock copy*/
+	context->bw_ctx.bw.dcn.clk.bw_dppclk_khz = context->bw_ctx.bw.dcn.clk.dppclk_khz;
+	context->bw_ctx.bw.dcn.clk.bw_dispclk_khz = context->bw_ctx.bw.dcn.clk.dispclk_khz;
+	context->bw_ctx.bw.dcn.clk.max_supported_dppclk_khz = context->bw_ctx.dml.soc.clock_limits[vlevel].dppclk_mhz * 1000;
+	context->bw_ctx.bw.dcn.clk.max_supported_dispclk_khz = context->bw_ctx.dml.soc.clock_limits[vlevel].dispclk_mhz * 1000;
+
+	for (i = 0, pipe_idx = 0; i < dc->res_pool->pipe_count; i++) {
+		bool cstate_en = context->bw_ctx.dml.vba.PrefetchMode[vlevel][context->bw_ctx.dml.vba.maxMpcComb] != 2;
+
+		if (!context->res_ctx.pipe_ctx[i].stream)
+			continue;
+
+		context->bw_ctx.dml.funcs.rq_dlg_get_dlg_reg(&context->bw_ctx.dml,
+				&context->res_ctx.pipe_ctx[i].dlg_regs,
+				&context->res_ctx.pipe_ctx[i].ttu_regs,
+				pipes,
+				pipe_cnt,
+				pipe_idx,
+				cstate_en,
+				context->bw_ctx.bw.dcn.clk.p_state_change_support,
+				false, false, true);
+
+		context->bw_ctx.dml.funcs.rq_dlg_get_rq_reg(&context->bw_ctx.dml,
+				&context->res_ctx.pipe_ctx[i].rq_regs,
+				pipes[pipe_idx].pipe);
+		pipe_idx++;
+	}
+}
+
+void dcn20_calculate_dlg_params(struct dc *dc, struct dc_state *context,
+		display_e2e_pipe_params_st *pipes, int pipe_cnt, int vlevel)
+{
+	DC_FP_START();
+	_dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel);
+	DC_FP_END();
+}
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h b/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h
new file mode 100644
index 000000000000..8d23fa8e87ee
--- /dev/null
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2021 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: AMD
+ *
+ */
+
+#ifndef _FPU_COMMONS_H_
+#define _FPU_COMMONS_H_
+
+void dcn20_calculate_dlg_params(struct dc *dc, struct dc_state *context,
+		display_e2e_pipe_params_st *pipes,
+		int pipe_cnt,
+		int vlevel);
+
+#endif
-- 
2.25.1


* [RFC 4/7] drm/amd/display: Move calculate_wm_and_dlg to FPU folder
  2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
                   ` (2 preceding siblings ...)
  2021-01-25 13:43 ` [RFC 3/7] drm/amd/display: Add FPU file for functions shared across ASICs Rodrigo Siqueira
@ 2021-01-25 13:43 ` Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 5/7] drm/amd/display: Move patch bounding box " Rodrigo Siqueira
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Rodrigo Siqueira @ 2021-01-25 13:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Harry Wentland, Christian König, Nicholas Kazlauskas

The functions dcn3x_calculate_wm_and_dlg and
dcn30_calculate_wm_and_dlg_fp require FPU access; for this reason, this
commit moves them to the fpu_operation directory.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
---
 .../drm/amd/display/dc/dcn30/dcn30_resource.c | 181 +-----------------
 .../drm/amd/display/dc/dcn30/dcn30_resource.h |   5 -
 .../amd/display/dc/dcn301/dcn301_resource.c   |   2 +-
 .../amd/display/dc/dcn302/dcn302_resource.c   |   2 +-
 .../display/dc/fpu_operation/dcn3x_commons.c  | 177 +++++++++++++++++
 .../display/dc/fpu_operation/dcn3x_commons.h  |   3 +
 6 files changed, 183 insertions(+), 187 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index c97533b4ad09..4edebee00095 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -1938,185 +1938,6 @@ static noinline bool dcn30_internal_validate_bw(
 	return out;
 }
 
-/*
- * This must be noinline to ensure anything that deals with FP registers
- * is contained within this call; previously our compiling with hard-float
- * would result in fp instructions being emitted outside of the boundaries
- * of the DC_FP_START/END macros, which makes sense as the compiler has no
- * idea about what is wrapped and what is not
- *
- * This is largely just a workaround to avoid breakage introduced with 5.6,
- * ideally all fp-using code should be moved into its own file, only that
- * should be compiled with hard-float, and all code exported from there
- * should be strictly wrapped with DC_FP_START/END
- */
-static noinline void dcn30_calculate_wm_and_dlg_fp(
-		struct dc *dc, struct dc_state *context,
-		display_e2e_pipe_params_st *pipes,
-		int pipe_cnt,
-		int vlevel)
-{
-	int i, pipe_idx;
-	double dcfclk = context->bw_ctx.dml.vba.DCFCLKState[vlevel][context->bw_ctx.dml.vba.maxMpcComb];
-	bool pstate_en = context->bw_ctx.dml.vba.DRAMClockChangeSupport[vlevel][context->bw_ctx.dml.vba.maxMpcComb] !=
-			dm_dram_clock_change_unsupported;
-
-	if (context->bw_ctx.dml.soc.min_dcfclk > dcfclk)
-		dcfclk = context->bw_ctx.dml.soc.min_dcfclk;
-
-	pipes[0].clks_cfg.voltage = vlevel;
-	pipes[0].clks_cfg.dcfclk_mhz = dcfclk;
-	pipes[0].clks_cfg.socclk_mhz = context->bw_ctx.dml.soc.clock_limits[vlevel].socclk_mhz;
-
-	/* Set B:
-	 * DCFCLK: 1GHz or min required above 1GHz
-	 * FCLK/UCLK: Max
-	 */
-	if (dc->clk_mgr->bw_params->wm_table.nv_entries[WM_B].valid) {
-		if (vlevel == 0) {
-			pipes[0].clks_cfg.voltage = 1;
-			pipes[0].clks_cfg.dcfclk_mhz = context->bw_ctx.dml.soc.clock_limits[0].dcfclk_mhz;
-		}
-		context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_B].dml_input.pstate_latency_us;
-		context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_B].dml_input.sr_enter_plus_exit_time_us;
-		context->bw_ctx.dml.soc.sr_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_B].dml_input.sr_exit_time_us;
-	}
-	context->bw_ctx.bw.dcn.watermarks.b.urgent_ns = get_wm_urgent(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.b.cstate_pstate.cstate_enter_plus_exit_ns = get_wm_stutter_enter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.b.cstate_pstate.cstate_exit_ns = get_wm_stutter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.b.cstate_pstate.pstate_change_ns = get_wm_dram_clock_change(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.b.pte_meta_urgent_ns = get_wm_memory_trip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.b.frac_urg_bw_nom = get_fraction_of_urgent_bandwidth(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.b.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.b.urgent_latency_ns = get_urgent_latency(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-
-	pipes[0].clks_cfg.voltage = vlevel;
-	pipes[0].clks_cfg.dcfclk_mhz = dcfclk;
-
-	/* Set D:
-	 * DCFCLK: Min Required
-	 * FCLK(proportional to UCLK): 1GHz or Max
-	 * MALL stutter, sr_enter_exit = 4, sr_exit = 2us
-	 */
-	/*
-	if (dc->clk_mgr->bw_params->wm_table.nv_entries[WM_D].valid) {
-		context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_D].dml_input.pstate_latency_us;
-		context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_D].dml_input.sr_enter_plus_exit_time_us;
-		context->bw_ctx.dml.soc.sr_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_D].dml_input.sr_exit_time_us;
-	}
-	context->bw_ctx.bw.dcn.watermarks.d.urgent_ns = get_wm_urgent(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.d.cstate_pstate.cstate_enter_plus_exit_ns = get_wm_stutter_enter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.d.cstate_pstate.cstate_exit_ns = get_wm_stutter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.d.cstate_pstate.pstate_change_ns = get_wm_dram_clock_change(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.d.pte_meta_urgent_ns = get_wm_memory_trip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.d.frac_urg_bw_nom = get_fraction_of_urgent_bandwidth(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.d.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.d.urgent_latency_ns = get_urgent_latency(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	*/
-
-	/* Set C:
-	 * DCFCLK: Min Required
-	 * FCLK(proportional to UCLK): 1GHz or Max
-	 * pstate latency overridden to 5us
-	 */
-	if (dc->clk_mgr->bw_params->wm_table.nv_entries[WM_C].valid) {
-		unsigned int min_dram_speed_mts = context->bw_ctx.dml.vba.DRAMSpeed;
-		unsigned int min_dram_speed_mts_margin = 160;
-
-		context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->dummy_pstate_table[0].dummy_pstate_latency_us;
-
-		if (context->bw_ctx.dml.vba.DRAMClockChangeSupport[vlevel][context->bw_ctx.dml.vba.maxMpcComb] == dm_dram_clock_change_unsupported)
-			min_dram_speed_mts = dc->clk_mgr->bw_params->clk_table.entries[dc->clk_mgr->bw_params->clk_table.num_entries - 1].memclk_mhz * 16;
-
-		for (i = 3; i > 0; i--) {
-			if ((min_dram_speed_mts + min_dram_speed_mts_margin > dc->clk_mgr->bw_params->dummy_pstate_table[i].dram_speed_mts) &&
-					(min_dram_speed_mts - min_dram_speed_mts_margin < dc->clk_mgr->bw_params->dummy_pstate_table[i].dram_speed_mts))
-				context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->dummy_pstate_table[i].dummy_pstate_latency_us;
-		}
-
-		context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_C].dml_input.sr_enter_plus_exit_time_us;
-		context->bw_ctx.dml.soc.sr_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_C].dml_input.sr_exit_time_us;
-	}
-	context->bw_ctx.bw.dcn.watermarks.c.urgent_ns = get_wm_urgent(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.c.cstate_pstate.cstate_enter_plus_exit_ns = get_wm_stutter_enter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.c.cstate_pstate.cstate_exit_ns = get_wm_stutter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.c.cstate_pstate.pstate_change_ns = get_wm_dram_clock_change(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.c.pte_meta_urgent_ns = get_wm_memory_trip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.c.frac_urg_bw_nom = get_fraction_of_urgent_bandwidth(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.c.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	context->bw_ctx.bw.dcn.watermarks.c.urgent_latency_ns = get_urgent_latency(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-
-	if (!pstate_en) {
-		/* The only difference between A and C is p-state latency, if p-state is not supported we want to
-		 * calculate DLG based on dummy p-state latency, and max out the set A p-state watermark
-		 */
-		context->bw_ctx.bw.dcn.watermarks.a = context->bw_ctx.bw.dcn.watermarks.c;
-		context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.pstate_change_ns = 0;
-	} else {
-		/* Set A:
-		 * DCFCLK: Min Required
-		 * FCLK(proportional to UCLK): 1GHz or Max
-		 *
-		 * Set A calculated last so that following calculations are based on Set A
-		 */
-		if (dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].valid) {
-			context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.pstate_latency_us;
-			context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.sr_enter_plus_exit_time_us;
-			context->bw_ctx.dml.soc.sr_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.sr_exit_time_us;
-		}
-		context->bw_ctx.bw.dcn.watermarks.a.urgent_ns = get_wm_urgent(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-		context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.cstate_enter_plus_exit_ns = get_wm_stutter_enter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-		context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.cstate_exit_ns = get_wm_stutter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-		context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.pstate_change_ns = get_wm_dram_clock_change(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-		context->bw_ctx.bw.dcn.watermarks.a.pte_meta_urgent_ns = get_wm_memory_trip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-		context->bw_ctx.bw.dcn.watermarks.a.frac_urg_bw_nom = get_fraction_of_urgent_bandwidth(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-		context->bw_ctx.bw.dcn.watermarks.a.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-		context->bw_ctx.bw.dcn.watermarks.a.urgent_latency_ns = get_urgent_latency(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
-	}
-
-	context->perf_params.stutter_period_us = context->bw_ctx.dml.vba.StutterPeriod;
-
-	/* Make set D = set A until set D is enabled */
-	context->bw_ctx.bw.dcn.watermarks.d = context->bw_ctx.bw.dcn.watermarks.a;
-
-	for (i = 0, pipe_idx = 0; i < dc->res_pool->pipe_count; i++) {
-		if (!context->res_ctx.pipe_ctx[i].stream)
-			continue;
-
-		pipes[pipe_idx].clks_cfg.dispclk_mhz = get_dispclk_calculated(&context->bw_ctx.dml, pipes, pipe_cnt);
-		pipes[pipe_idx].clks_cfg.dppclk_mhz = get_dppclk_calculated(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
-
-		if (dc->config.forced_clocks) {
-			pipes[pipe_idx].clks_cfg.dispclk_mhz = context->bw_ctx.dml.soc.clock_limits[0].dispclk_mhz;
-			pipes[pipe_idx].clks_cfg.dppclk_mhz = context->bw_ctx.dml.soc.clock_limits[0].dppclk_mhz;
-		}
-		if (dc->debug.min_disp_clk_khz > pipes[pipe_idx].clks_cfg.dispclk_mhz * 1000)
-			pipes[pipe_idx].clks_cfg.dispclk_mhz = dc->debug.min_disp_clk_khz / 1000.0;
-		if (dc->debug.min_dpp_clk_khz > pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000)
-			pipes[pipe_idx].clks_cfg.dppclk_mhz = dc->debug.min_dpp_clk_khz / 1000.0;
-
-		pipe_idx++;
-	}
-
-	dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel);
-
-	if (!pstate_en)
-		/* Restore full p-state latency */
-		context->bw_ctx.dml.soc.dram_clock_change_latency_us =
-				dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.pstate_latency_us;
-}
-
-void dcn30_calculate_wm_and_dlg(
-		struct dc *dc, struct dc_state *context,
-		display_e2e_pipe_params_st *pipes,
-		int pipe_cnt,
-		int vlevel)
-{
-	DC_FP_START();
-	dcn30_calculate_wm_and_dlg_fp(dc, context, pipes, pipe_cnt, vlevel);
-	DC_FP_END();
-}
-
 bool dcn30_validate_bandwidth(struct dc *dc,
 		struct dc_state *context,
 		bool fast_validate)
@@ -2317,7 +2138,7 @@ static const struct resource_funcs dcn30_res_pool_funcs = {
 	.link_enc_create = dcn30_link_encoder_create,
 	.panel_cntl_create = dcn30_panel_cntl_create,
 	.validate_bandwidth = dcn30_validate_bandwidth,
-	.calculate_wm_and_dlg = dcn30_calculate_wm_and_dlg,
+	.calculate_wm_and_dlg = dcn3x_calculate_wm_and_dlg,
 	.populate_dml_pipes = dcn30_populate_dml_pipes_from_context,
 	.acquire_idle_pipe_for_layer = dcn20_acquire_idle_pipe_for_layer,
 	.add_stream_to_ctx = dcn30_add_stream_to_ctx,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
index 42960574cce9..694ae4e4e88e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
@@ -44,11 +44,6 @@ struct resource_pool *dcn30_create_resource_pool(
 
 bool dcn30_validate_bandwidth(struct dc *dc, struct dc_state *context,
 		bool fast_validate);
-void dcn30_calculate_wm_and_dlg(
-		struct dc *dc, struct dc_state *context,
-		display_e2e_pipe_params_st *pipes,
-		int pipe_cnt,
-		int vlevel);
 
 int dcn30_populate_dml_pipes_from_context(
 	struct dc *dc, struct dc_state *context,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index 8369b009a853..934df194db38 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -1625,7 +1625,7 @@ static struct resource_funcs dcn301_res_pool_funcs = {
 	.link_enc_create = dcn301_link_encoder_create,
 	.panel_cntl_create = dcn301_panel_cntl_create,
 	.validate_bandwidth = dcn30_validate_bandwidth,
-	.calculate_wm_and_dlg = dcn30_calculate_wm_and_dlg,
+	.calculate_wm_and_dlg = dcn3x_calculate_wm_and_dlg,
 	.populate_dml_pipes = dcn30_populate_dml_pipes_from_context,
 	.acquire_idle_pipe_for_layer = dcn20_acquire_idle_pipe_for_layer,
 	.add_stream_to_ctx = dcn30_add_stream_to_ctx,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
index 0554a3dd3214..539757ec3348 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
@@ -1235,7 +1235,7 @@ static struct resource_funcs dcn302_res_pool_funcs = {
 		.link_enc_create = dcn302_link_encoder_create,
 		.panel_cntl_create = dcn302_panel_cntl_create,
 		.validate_bandwidth = dcn30_validate_bandwidth,
-		.calculate_wm_and_dlg = dcn30_calculate_wm_and_dlg,
+		.calculate_wm_and_dlg = dcn3x_calculate_wm_and_dlg,
 		.populate_dml_pipes = dcn30_populate_dml_pipes_from_context,
 		.acquire_idle_pipe_for_layer = dcn20_acquire_idle_pipe_for_layer,
 		.add_stream_to_ctx = dcn30_add_stream_to_ctx,
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
index 06e9bd6d4d28..c3b06a8cf765 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
@@ -42,6 +42,7 @@
  *   kernel_fpu_begin/end macros.
  */
 
+#include "clk_mgr.h"
 #include "resource.h"
 #include "dml/dcn30/display_mode_vba_30.h"
 
@@ -213,6 +214,182 @@ static void _dcn3x_set_mcif_arb_params(struct dc *dc, struct dc_state *context,
 
 }
 
+/*
+ * This must be noinline to ensure anything that deals with FP registers
+ * is contained within this call; previously our compiling with hard-float
+ * would result in fp instructions being emitted outside of the boundaries
+ * of the DC_FP_START/END macros, which makes sense as the compiler has no
+ * idea about what is wrapped and what is not
+ *
+ * This is largely just a workaround to avoid breakage introduced with 5.6,
+ * ideally all fp-using code should be moved into its own file, only that
+ * should be compiled with hard-float, and all code exported from there
+ * should be strictly wrapped with DC_FP_START/END
+ */
+static noinline void _dcn30_calculate_wm_and_dlg_fp(struct dc *dc,
+		struct dc_state *context, display_e2e_pipe_params_st *pipes,
+		int pipe_cnt, int vlevel)
+{
+	int i, pipe_idx;
+	double dcfclk = context->bw_ctx.dml.vba.DCFCLKState[vlevel][context->bw_ctx.dml.vba.maxMpcComb];
+	bool pstate_en = context->bw_ctx.dml.vba.DRAMClockChangeSupport[vlevel][context->bw_ctx.dml.vba.maxMpcComb] !=
+			dm_dram_clock_change_unsupported;
+
+	if (context->bw_ctx.dml.soc.min_dcfclk > dcfclk)
+		dcfclk = context->bw_ctx.dml.soc.min_dcfclk;
+
+	pipes[0].clks_cfg.voltage = vlevel;
+	pipes[0].clks_cfg.dcfclk_mhz = dcfclk;
+	pipes[0].clks_cfg.socclk_mhz = context->bw_ctx.dml.soc.clock_limits[vlevel].socclk_mhz;
+
+	/* Set B:
+	 * DCFCLK: 1GHz or min required above 1GHz
+	 * FCLK/UCLK: Max
+	 */
+	if (dc->clk_mgr->bw_params->wm_table.nv_entries[WM_B].valid) {
+		if (vlevel == 0) {
+			pipes[0].clks_cfg.voltage = 1;
+			pipes[0].clks_cfg.dcfclk_mhz = context->bw_ctx.dml.soc.clock_limits[0].dcfclk_mhz;
+		}
+		context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_B].dml_input.pstate_latency_us;
+		context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_B].dml_input.sr_enter_plus_exit_time_us;
+		context->bw_ctx.dml.soc.sr_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_B].dml_input.sr_exit_time_us;
+	}
+	context->bw_ctx.bw.dcn.watermarks.b.urgent_ns = get_wm_urgent(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.b.cstate_pstate.cstate_enter_plus_exit_ns = get_wm_stutter_enter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.b.cstate_pstate.cstate_exit_ns = get_wm_stutter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.b.cstate_pstate.pstate_change_ns = get_wm_dram_clock_change(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.b.pte_meta_urgent_ns = get_wm_memory_trip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.b.frac_urg_bw_nom = get_fraction_of_urgent_bandwidth(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.b.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.b.urgent_latency_ns = get_urgent_latency(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+
+	pipes[0].clks_cfg.voltage = vlevel;
+	pipes[0].clks_cfg.dcfclk_mhz = dcfclk;
+
+	/* Set D:
+	 * DCFCLK: Min Required
+	 * FCLK(proportional to UCLK): 1GHz or Max
+	 * MALL stutter, sr_enter_exit = 4, sr_exit = 2us
+	 */
+	/*
+	if (dc->clk_mgr->bw_params->wm_table.nv_entries[WM_D].valid) {
+		context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_D].dml_input.pstate_latency_us;
+		context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_D].dml_input.sr_enter_plus_exit_time_us;
+		context->bw_ctx.dml.soc.sr_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_D].dml_input.sr_exit_time_us;
+	}
+	context->bw_ctx.bw.dcn.watermarks.d.urgent_ns = get_wm_urgent(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.d.cstate_pstate.cstate_enter_plus_exit_ns = get_wm_stutter_enter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.d.cstate_pstate.cstate_exit_ns = get_wm_stutter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.d.cstate_pstate.pstate_change_ns = get_wm_dram_clock_change(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.d.pte_meta_urgent_ns = get_wm_memory_trip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.d.frac_urg_bw_nom = get_fraction_of_urgent_bandwidth(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.d.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.d.urgent_latency_ns = get_urgent_latency(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	*/
+
+	/* Set C:
+	 * DCFCLK: Min Required
+	 * FCLK(proportional to UCLK): 1GHz or Max
+	 * pstate latency overridden to 5us
+	 */
+	if (dc->clk_mgr->bw_params->wm_table.nv_entries[WM_C].valid) {
+		unsigned int min_dram_speed_mts = context->bw_ctx.dml.vba.DRAMSpeed;
+		unsigned int min_dram_speed_mts_margin = 160;
+
+		context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->dummy_pstate_table[0].dummy_pstate_latency_us;
+
+		if (context->bw_ctx.dml.vba.DRAMClockChangeSupport[vlevel][context->bw_ctx.dml.vba.maxMpcComb] == dm_dram_clock_change_unsupported)
+			min_dram_speed_mts = dc->clk_mgr->bw_params->clk_table.entries[dc->clk_mgr->bw_params->clk_table.num_entries - 1].memclk_mhz * 16;
+
+		for (i = 3; i > 0; i--) {
+			if ((min_dram_speed_mts + min_dram_speed_mts_margin > dc->clk_mgr->bw_params->dummy_pstate_table[i].dram_speed_mts) &&
+					(min_dram_speed_mts - min_dram_speed_mts_margin < dc->clk_mgr->bw_params->dummy_pstate_table[i].dram_speed_mts))
+				context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->dummy_pstate_table[i].dummy_pstate_latency_us;
+		}
+
+		context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_C].dml_input.sr_enter_plus_exit_time_us;
+		context->bw_ctx.dml.soc.sr_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_C].dml_input.sr_exit_time_us;
+	}
+	context->bw_ctx.bw.dcn.watermarks.c.urgent_ns = get_wm_urgent(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.c.cstate_pstate.cstate_enter_plus_exit_ns = get_wm_stutter_enter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.c.cstate_pstate.cstate_exit_ns = get_wm_stutter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.c.cstate_pstate.pstate_change_ns = get_wm_dram_clock_change(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.c.pte_meta_urgent_ns = get_wm_memory_trip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.c.frac_urg_bw_nom = get_fraction_of_urgent_bandwidth(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.c.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	context->bw_ctx.bw.dcn.watermarks.c.urgent_latency_ns = get_urgent_latency(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+
+	if (!pstate_en) {
+		/* The only difference between A and C is p-state latency, if p-state is not supported we want to
+		 * calculate DLG based on dummy p-state latency, and max out the set A p-state watermark
+		 */
+		context->bw_ctx.bw.dcn.watermarks.a = context->bw_ctx.bw.dcn.watermarks.c;
+		context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.pstate_change_ns = 0;
+	} else {
+		/* Set A:
+		 * DCFCLK: Min Required
+		 * FCLK(proportional to UCLK): 1GHz or Max
+		 *
+		 * Set A calculated last so that following calculations are based on Set A
+		 */
+		if (dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].valid) {
+			context->bw_ctx.dml.soc.dram_clock_change_latency_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.pstate_latency_us;
+			context->bw_ctx.dml.soc.sr_enter_plus_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.sr_enter_plus_exit_time_us;
+			context->bw_ctx.dml.soc.sr_exit_time_us = dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.sr_exit_time_us;
+		}
+		context->bw_ctx.bw.dcn.watermarks.a.urgent_ns = get_wm_urgent(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+		context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.cstate_enter_plus_exit_ns = get_wm_stutter_enter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+		context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.cstate_exit_ns = get_wm_stutter_exit(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+		context->bw_ctx.bw.dcn.watermarks.a.cstate_pstate.pstate_change_ns = get_wm_dram_clock_change(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+		context->bw_ctx.bw.dcn.watermarks.a.pte_meta_urgent_ns = get_wm_memory_trip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+		context->bw_ctx.bw.dcn.watermarks.a.frac_urg_bw_nom = get_fraction_of_urgent_bandwidth(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+		context->bw_ctx.bw.dcn.watermarks.a.frac_urg_bw_flip = get_fraction_of_urgent_bandwidth_imm_flip(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+		context->bw_ctx.bw.dcn.watermarks.a.urgent_latency_ns = get_urgent_latency(&context->bw_ctx.dml, pipes, pipe_cnt) * 1000;
+	}
+
+	context->perf_params.stutter_period_us = context->bw_ctx.dml.vba.StutterPeriod;
+
+	/* Make set D = set A until set D is enabled */
+	context->bw_ctx.bw.dcn.watermarks.d = context->bw_ctx.bw.dcn.watermarks.a;
+
+	for (i = 0, pipe_idx = 0; i < dc->res_pool->pipe_count; i++) {
+		if (!context->res_ctx.pipe_ctx[i].stream)
+			continue;
+
+		pipes[pipe_idx].clks_cfg.dispclk_mhz = get_dispclk_calculated(&context->bw_ctx.dml, pipes, pipe_cnt);
+		pipes[pipe_idx].clks_cfg.dppclk_mhz = get_dppclk_calculated(&context->bw_ctx.dml, pipes, pipe_cnt, pipe_idx);
+
+		if (dc->config.forced_clocks) {
+			pipes[pipe_idx].clks_cfg.dispclk_mhz = context->bw_ctx.dml.soc.clock_limits[0].dispclk_mhz;
+			pipes[pipe_idx].clks_cfg.dppclk_mhz = context->bw_ctx.dml.soc.clock_limits[0].dppclk_mhz;
+		}
+		if (dc->debug.min_disp_clk_khz > pipes[pipe_idx].clks_cfg.dispclk_mhz * 1000)
+			pipes[pipe_idx].clks_cfg.dispclk_mhz = dc->debug.min_disp_clk_khz / 1000.0;
+		if (dc->debug.min_dpp_clk_khz > pipes[pipe_idx].clks_cfg.dppclk_mhz * 1000)
+			pipes[pipe_idx].clks_cfg.dppclk_mhz = dc->debug.min_dpp_clk_khz / 1000.0;
+
+		pipe_idx++;
+	}
+
+	dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel);
+
+	if (!pstate_en)
+		/* Restore full p-state latency */
+		context->bw_ctx.dml.soc.dram_clock_change_latency_us =
+				dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.pstate_latency_us;
+}
+
+void dcn3x_calculate_wm_and_dlg(struct dc *dc, struct dc_state *context,
+				display_e2e_pipe_params_st *pipes,
+				int pipe_cnt,
+				int vlevel)
+{
+	DC_FP_START();
+	_dcn30_calculate_wm_and_dlg_fp(dc, context, pipes, pipe_cnt, vlevel);
+	DC_FP_END();
+}
+
 void dcn3x_set_mcif_arb_params(struct dc *dc, struct dc_state *context,
 		display_e2e_pipe_params_st *pipes,
 		int pipe_cnt)
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
index b3b6d8a66c28..7521498bbd3b 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
@@ -32,4 +32,7 @@ void dcn3x_populate_dml_writeback_from_context(struct dc *dc,
 void dcn3x_set_mcif_arb_params(struct dc *dc, struct dc_state *context,
 	display_e2e_pipe_params_st *pipes, int pipe_cnt);
 
+void dcn3x_calculate_wm_and_dlg(struct dc *dc, struct dc_state *context,
+	display_e2e_pipe_params_st *pipes, int pipe_cnt, int vlevel);
+
 #endif /* _DCN3X_COMMONS_H_ */
-- 
2.25.1


* [RFC 5/7] drm/amd/display: Move patch bounding box to FPU folder
  2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
                   ` (3 preceding siblings ...)
  2021-01-25 13:43 ` [RFC 4/7] drm/amd/display: Move calculate_wm_and_dlg to FPU folder Rodrigo Siqueira
@ 2021-01-25 13:43 ` Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 6/7] drm/amd/display: Move bounding box functions " Rodrigo Siqueira
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Rodrigo Siqueira @ 2021-01-25 13:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Harry Wentland, Christian König, Nicholas Kazlauskas

The function dcn20_patch_bounding_box is shared between DCN2 and DCN3 and
uses FPU operations. For this reason, this commit moves it to the
fpu_commons file and renames it dcn_patch_bounding_box.
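
For reference, the FPU dependency comes from the ns-to-us conversions the
function performs whenever a bounding-box override is set; every check in
it follows the same shape as this one (copied from the hunk below, only
reflowed):

  if ((int)(bb->sr_exit_time_us * 1000) != dc->bb_overrides.sr_exit_time_ns &&
      dc->bb_overrides.sr_exit_time_ns)
          bb->sr_exit_time_us = dc->bb_overrides.sr_exit_time_ns / 1000.0;

The double arithmetic above is why this code must run between
DC_FP_START/END once the hard-float flags are dropped from the resource
files.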

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
---
 .../drm/amd/display/dc/dcn20/dcn20_resource.c | 36 +---------------
 .../drm/amd/display/dc/dcn20/dcn20_resource.h |  3 --
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |  2 +-
 .../amd/display/dc/dcn301/dcn301_resource.c   |  3 +-
 .../amd/display/dc/dcn302/dcn302_resource.c   |  3 +-
 .../display/dc/fpu_operation/fpu_commons.c    | 43 +++++++++++++++++++
 .../display/dc/fpu_operation/fpu_commons.h    |  3 ++
 7 files changed, 52 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index d5bf740b408c..e9257999148e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -3472,40 +3472,6 @@ void dcn20_update_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_s
 	bb->clock_limits[num_calculated_states].state = bb->num_states;
 }
 
-void dcn20_patch_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_st *bb)
-{
-	if ((int)(bb->sr_exit_time_us * 1000) != dc->bb_overrides.sr_exit_time_ns
-			&& dc->bb_overrides.sr_exit_time_ns) {
-		bb->sr_exit_time_us = dc->bb_overrides.sr_exit_time_ns / 1000.0;
-	}
-
-	if ((int)(bb->sr_enter_plus_exit_time_us * 1000)
-				!= dc->bb_overrides.sr_enter_plus_exit_time_ns
-			&& dc->bb_overrides.sr_enter_plus_exit_time_ns) {
-		bb->sr_enter_plus_exit_time_us =
-				dc->bb_overrides.sr_enter_plus_exit_time_ns / 1000.0;
-	}
-
-	if ((int)(bb->urgent_latency_us * 1000) != dc->bb_overrides.urgent_latency_ns
-			&& dc->bb_overrides.urgent_latency_ns) {
-		bb->urgent_latency_us = dc->bb_overrides.urgent_latency_ns / 1000.0;
-	}
-
-	if ((int)(bb->dram_clock_change_latency_us * 1000)
-				!= dc->bb_overrides.dram_clock_change_latency_ns
-			&& dc->bb_overrides.dram_clock_change_latency_ns) {
-		bb->dram_clock_change_latency_us =
-				dc->bb_overrides.dram_clock_change_latency_ns / 1000.0;
-	}
-
-	if ((int)(bb->dummy_pstate_latency_us * 1000)
-				!= dc->bb_overrides.dummy_clock_change_latency_ns
-			&& dc->bb_overrides.dummy_clock_change_latency_ns) {
-		bb->dummy_pstate_latency_us =
-				dc->bb_overrides.dummy_clock_change_latency_ns / 1000.0;
-	}
-}
-
 static struct _vcs_dpi_soc_bounding_box_st *get_asic_rev_soc_bb(
 	uint32_t hw_internal_rev)
 {
@@ -3580,7 +3546,7 @@ static bool init_soc_bounding_box(struct dc *dc,
 
 	loaded_ip->max_num_otg = pool->base.res_cap->num_timing_generator;
 	loaded_ip->max_num_dpp = pool->base.pipe_count;
-	dcn20_patch_bounding_box(dc, loaded_bb);
+	dcn_patch_bounding_box(dc, loaded_bb);
 
 	return true;
 }
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.h b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.h
index 11ec655a18e3..4e6fda59ec29 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.h
@@ -167,9 +167,6 @@ enum dc_status dcn20_add_dsc_to_stream_resource(struct dc *dc, struct dc_state *
 enum dc_status dcn20_remove_stream_from_ctx(struct dc *dc, struct dc_state *new_ctx, struct dc_stream_state *dc_stream);
 enum dc_status dcn20_patch_unknown_plane_state(struct dc_plane_state *plane_state);
 
-void dcn20_patch_bounding_box(
-		struct dc *dc,
-		struct _vcs_dpi_soc_bounding_box_st *bb);
 void dcn20_cap_soc_clocks(
 		struct _vcs_dpi_soc_bounding_box_st *bb,
 		struct pp_smu_nv_clock_table max_clocks);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 4edebee00095..98acc8be698f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -1563,7 +1563,7 @@ static bool init_soc_bounding_box(struct dc *dc,
 	loaded_ip->max_num_otg = pool->base.res_cap->num_timing_generator;
 	loaded_ip->max_num_dpp = pool->base.pipe_count;
 	loaded_ip->clamp_min_dcfclk = dc->config.clamp_min_dcfclk;
-	dcn20_patch_bounding_box(dc, loaded_bb);
+	dcn_patch_bounding_box(dc, loaded_bb);
 
 	if (dc->ctx->dc_bios->funcs->get_soc_bb_info) {
 		struct bp_soc_bb_info bb_info = {0};
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index 934df194db38..e56925a89e4a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -36,6 +36,7 @@
 
 #include "dcn20/dcn20_resource.h"
 #include "fpu_operation/dcn3x_commons.h"
+#include "fpu_operation/fpu_commons.h"
 
 #include "dcn10/dcn10_ipp.h"
 #include "dcn301/dcn301_hubbub.h"
@@ -1502,7 +1503,7 @@ static bool init_soc_bounding_box(struct dc *dc,
 
 	loaded_ip->max_num_otg = pool->base.res_cap->num_timing_generator;
 	loaded_ip->max_num_dpp = pool->base.pipe_count;
-	dcn20_patch_bounding_box(dc, loaded_bb);
+	dcn_patch_bounding_box(dc, loaded_bb);
 
 	if (dc->ctx->dc_bios->funcs->get_soc_bb_info) {
 		struct bp_soc_bb_info bb_info = {0};
diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
index 539757ec3348..8a62ff16347a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
@@ -43,6 +43,7 @@
 #include "dcn20/dcn20_dsc.h"
 #include "dcn20/dcn20_resource.h"
 #include "fpu_operation/dcn3x_commons.h"
+#include "fpu_operation/fpu_commons.h"
 
 #include "dcn10/dcn10_resource.h"
 
@@ -1098,7 +1099,7 @@ static bool init_soc_bounding_box(struct dc *dc,  struct resource_pool *pool)
 	loaded_ip->max_num_otg = pool->pipe_count;
 	loaded_ip->max_num_dpp = pool->pipe_count;
 	loaded_ip->clamp_min_dcfclk = dc->config.clamp_min_dcfclk;
-	dcn20_patch_bounding_box(dc, loaded_bb);
+	dcn_patch_bounding_box(dc, loaded_bb);
 	return true;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c b/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c
index d2089a1dc630..5ffab6715085 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c
@@ -93,6 +93,41 @@ static void _dcn20_calculate_dlg_params(struct dc *dc, struct dc_state *context,
 	}
 }
 
+static void _dcn_patch_bounding_box(struct dc *dc,
+				     struct _vcs_dpi_soc_bounding_box_st *bb)
+{
+	if ((int)(bb->sr_exit_time_us * 1000) != dc->bb_overrides.sr_exit_time_ns
+			&& dc->bb_overrides.sr_exit_time_ns) {
+		bb->sr_exit_time_us = dc->bb_overrides.sr_exit_time_ns / 1000.0;
+	}
+
+	if ((int)(bb->sr_enter_plus_exit_time_us * 1000)
+				!= dc->bb_overrides.sr_enter_plus_exit_time_ns
+			&& dc->bb_overrides.sr_enter_plus_exit_time_ns) {
+		bb->sr_enter_plus_exit_time_us =
+				dc->bb_overrides.sr_enter_plus_exit_time_ns / 1000.0;
+	}
+
+	if ((int)(bb->urgent_latency_us * 1000) != dc->bb_overrides.urgent_latency_ns
+			&& dc->bb_overrides.urgent_latency_ns) {
+		bb->urgent_latency_us = dc->bb_overrides.urgent_latency_ns / 1000.0;
+	}
+
+	if ((int)(bb->dram_clock_change_latency_us * 1000)
+				!= dc->bb_overrides.dram_clock_change_latency_ns
+			&& dc->bb_overrides.dram_clock_change_latency_ns) {
+		bb->dram_clock_change_latency_us =
+				dc->bb_overrides.dram_clock_change_latency_ns / 1000.0;
+	}
+
+	if ((int)(bb->dummy_pstate_latency_us * 1000)
+				!= dc->bb_overrides.dummy_clock_change_latency_ns
+			&& dc->bb_overrides.dummy_clock_change_latency_ns) {
+		bb->dummy_pstate_latency_us =
+				dc->bb_overrides.dummy_clock_change_latency_ns / 1000.0;
+	}
+}
+
 void dcn20_calculate_dlg_params(struct dc *dc, struct dc_state *context,
 		display_e2e_pipe_params_st *pipes, int pipe_cnt, int vlevel)
 {
@@ -100,3 +135,11 @@ void dcn20_calculate_dlg_params(struct dc *dc, struct dc_state *context,
 	_dcn20_calculate_dlg_params(dc, context, pipes, pipe_cnt, vlevel);
 	DC_FP_END();
 }
+
+void dcn_patch_bounding_box(struct dc *dc,
+			      struct _vcs_dpi_soc_bounding_box_st *bb)
+{
+	DC_FP_START();
+	_dcn_patch_bounding_box(dc, bb);
+	DC_FP_END();
+}
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h b/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h
index 8d23fa8e87ee..1eda6a4399a4 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h
@@ -31,4 +31,7 @@ void dcn20_calculate_dlg_params(struct dc *dc, struct dc_state *context,
 		int pipe_cnt,
 		int vlevel);
 
+void dcn_patch_bounding_box(struct dc *dc,
+			    struct _vcs_dpi_soc_bounding_box_st *bb);
+
 #endif
-- 
2.25.1


* [RFC 6/7] drm/amd/display: Move bounding box functions to FPU folder
  2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
                   ` (4 preceding siblings ...)
  2021-01-25 13:43 ` [RFC 5/7] drm/amd/display: Move patch bounding box " Rodrigo Siqueira
@ 2021-01-25 13:43 ` Rodrigo Siqueira
  2021-01-25 13:43 ` [RFC 7/7] drm/amd/display: Drop float flags from DCN30 files Rodrigo Siqueira
  2021-01-25 15:29 ` [RFC 0/7] Proposal for isolating FPU operation Christian König
  7 siblings, 0 replies; 9+ messages in thread
From: Rodrigo Siqueira @ 2021-01-25 13:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Harry Wentland, Christian König, Nicholas Kazlauskas

This commit moves all operations and data structures related to the DCN30
bounding box to the FPU folder.
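
Note that the hunk in dcn30_resource_construct() replaces the
init_soc_bounding_box() plus dml_init_instance() pair with a single
init_bounding_box_alongside_dml() call. The body of that helper is not
shown in the diff excerpted here; presumably it folds both steps into the
FPU file, roughly along the lines of the sketch below (a guess for
illustration, not code taken from the patch):

  /* fpu_operation/dcn3x_commons.c: hypothetical sketch */
  bool init_bounding_box_alongside_dml(struct dc *dc,
                                       struct dcn30_resource_pool *pool)
  {
          if (!init_soc_bounding_box(dc, pool))
                  return false;

          dml_init_instance(&dc->dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
          return true;
  }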

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
---
 .../drm/amd/display/dc/dcn30/dcn30_resource.c | 325 +----------------
 .../drm/amd/display/dc/dcn30/dcn30_resource.h |   2 -
 .../display/dc/fpu_operation/dcn3x_commons.c  | 335 ++++++++++++++++++
 .../display/dc/fpu_operation/dcn3x_commons.h  |   6 +
 4 files changed, 342 insertions(+), 326 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 98acc8be698f..cd2598bd193f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -93,137 +93,6 @@
 
 #define DC_LOGGER_INIT(logger)
 
-struct _vcs_dpi_ip_params_st dcn3_0_ip = {
-	.use_min_dcfclk = 1,
-	.clamp_min_dcfclk = 0,
-	.odm_capable = 1,
-	.gpuvm_enable = 0,
-	.hostvm_enable = 0,
-	.gpuvm_max_page_table_levels = 4,
-	.hostvm_max_page_table_levels = 4,
-	.hostvm_cached_page_table_levels = 0,
-	.pte_group_size_bytes = 2048,
-	.num_dsc = 6,
-	.rob_buffer_size_kbytes = 184,
-	.det_buffer_size_kbytes = 184,
-	.dpte_buffer_size_in_pte_reqs_luma = 84,
-	.pde_proc_buffer_size_64k_reqs = 48,
-	.dpp_output_buffer_pixels = 2560,
-	.opp_output_buffer_lines = 1,
-	.pixel_chunk_size_kbytes = 8,
-	.pte_enable = 1,
-	.max_page_table_levels = 2,
-	.pte_chunk_size_kbytes = 2,  // ?
-	.meta_chunk_size_kbytes = 2,
-	.writeback_chunk_size_kbytes = 8,
-	.line_buffer_size_bits = 789504,
-	.is_line_buffer_bpp_fixed = 0,  // ?
-	.line_buffer_fixed_bpp = 0,     // ?
-	.dcc_supported = true,
-	.writeback_interface_buffer_size_kbytes = 90,
-	.writeback_line_buffer_buffer_size = 0,
-	.max_line_buffer_lines = 12,
-	.writeback_luma_buffer_size_kbytes = 12,  // writeback_line_buffer_buffer_size = 656640
-	.writeback_chroma_buffer_size_kbytes = 8,
-	.writeback_chroma_line_buffer_width_pixels = 4,
-	.writeback_max_hscl_ratio = 1,
-	.writeback_max_vscl_ratio = 1,
-	.writeback_min_hscl_ratio = 1,
-	.writeback_min_vscl_ratio = 1,
-	.writeback_max_hscl_taps = 1,
-	.writeback_max_vscl_taps = 1,
-	.writeback_line_buffer_luma_buffer_size = 0,
-	.writeback_line_buffer_chroma_buffer_size = 14643,
-	.cursor_buffer_size = 8,
-	.cursor_chunk_size = 2,
-	.max_num_otg = 6,
-	.max_num_dpp = 6,
-	.max_num_wb = 1,
-	.max_dchub_pscl_bw_pix_per_clk = 4,
-	.max_pscl_lb_bw_pix_per_clk = 2,
-	.max_lb_vscl_bw_pix_per_clk = 4,
-	.max_vscl_hscl_bw_pix_per_clk = 4,
-	.max_hscl_ratio = 6,
-	.max_vscl_ratio = 6,
-	.hscl_mults = 4,
-	.vscl_mults = 4,
-	.max_hscl_taps = 8,
-	.max_vscl_taps = 8,
-	.dispclk_ramp_margin_percent = 1,
-	.underscan_factor = 1.11,
-	.min_vblank_lines = 32,
-	.dppclk_delay_subtotal = 46,
-	.dynamic_metadata_vm_enabled = true,
-	.dppclk_delay_scl_lb_only = 16,
-	.dppclk_delay_scl = 50,
-	.dppclk_delay_cnvc_formatter = 27,
-	.dppclk_delay_cnvc_cursor = 6,
-	.dispclk_delay_subtotal = 119,
-	.dcfclk_cstate_latency = 5.2, // SRExitTime
-	.max_inter_dcn_tile_repeaters = 8,
-	.odm_combine_4to1_supported = true,
-
-	.xfc_supported = false,
-	.xfc_fill_bw_overhead_percent = 10.0,
-	.xfc_fill_constant_bytes = 0,
-	.gfx7_compat_tiling_supported = 0,
-	.number_of_cursors = 1,
-};
-
-struct _vcs_dpi_soc_bounding_box_st dcn3_0_soc = {
-	.clock_limits = {
-			{
-				.state = 0,
-				.dispclk_mhz = 562.0,
-				.dppclk_mhz = 300.0,
-				.phyclk_mhz = 300.0,
-				.phyclk_d18_mhz = 667.0,
-				.dscclk_mhz = 405.6,
-			},
-		},
-	.min_dcfclk = 500.0, /* TODO: set this to actual min DCFCLK */
-	.num_states = 1,
-	.sr_exit_time_us = 12,
-	.sr_enter_plus_exit_time_us = 20,
-	.urgent_latency_us = 4.0,
-	.urgent_latency_pixel_data_only_us = 4.0,
-	.urgent_latency_pixel_mixed_with_vm_data_us = 4.0,
-	.urgent_latency_vm_data_only_us = 4.0,
-	.urgent_out_of_order_return_per_channel_pixel_only_bytes = 4096,
-	.urgent_out_of_order_return_per_channel_pixel_and_vm_bytes = 4096,
-	.urgent_out_of_order_return_per_channel_vm_only_bytes = 4096,
-	.pct_ideal_dram_sdp_bw_after_urgent_pixel_only = 80.0,
-	.pct_ideal_dram_sdp_bw_after_urgent_pixel_and_vm = 60.0,
-	.pct_ideal_dram_sdp_bw_after_urgent_vm_only = 40.0,
-	.max_avg_sdp_bw_use_normal_percent = 60.0,
-	.max_avg_dram_bw_use_normal_percent = 40.0,
-	.writeback_latency_us = 12.0,
-	.max_request_size_bytes = 256,
-	.fabric_datapath_to_dcn_data_return_bytes = 64,
-	.dcn_downspread_percent = 0.5,
-	.downspread_percent = 0.38,
-	.dram_page_open_time_ns = 50.0,
-	.dram_rw_turnaround_time_ns = 17.5,
-	.dram_return_buffer_per_channel_bytes = 8192,
-	.round_trip_ping_latency_dcfclk_cycles = 191,
-	.urgent_out_of_order_return_per_channel_bytes = 4096,
-	.channel_interleave_bytes = 256,
-	.num_banks = 8,
-	.gpuvm_min_page_size_bytes = 4096,
-	.hostvm_min_page_size_bytes = 4096,
-	.dram_clock_change_latency_us = 404,
-	.dummy_pstate_latency_us = 5,
-	.writeback_dram_clock_change_latency_us = 23.0,
-	.return_bus_width_bytes = 64,
-	.dispclk_dppclk_vco_speed_mhz = 3650,
-	.xfc_bus_transport_time_us = 20,      // ?
-	.xfc_xbuf_latency_tolerance_us = 4,  // ?
-	.use_urgent_burst_bw = 1,            // ?
-	.do_urgent_latency_adjustment = true,
-	.urgent_latency_adjustment_fabric_clock_component_us = 1.0,
-	.urgent_latency_adjustment_fabric_clock_reference_mhz = 1000,
-};
-
 enum dcn30_clk_src_array_id {
 	DCN30_CLK_SRC_PLL0,
 	DCN30_CLK_SRC_PLL1,
@@ -1537,52 +1406,6 @@ bool dcn30_release_post_bldn_3dlut(
 #define fixed16_to_double(x) (((double) x) / ((double) (1 << 16)))
 #define fixed16_to_double_to_cpu(x) fixed16_to_double(le32_to_cpu(x))
 
-static bool is_soc_bounding_box_valid(struct dc *dc)
-{
-	uint32_t hw_internal_rev = dc->ctx->asic_id.hw_internal_rev;
-
-	if (ASICREV_IS_SIENNA_CICHLID_P(hw_internal_rev))
-		return true;
-
-	return false;
-}
-
-static bool init_soc_bounding_box(struct dc *dc,
-				  struct dcn30_resource_pool *pool)
-{
-	struct _vcs_dpi_soc_bounding_box_st *loaded_bb = &dcn3_0_soc;
-	struct _vcs_dpi_ip_params_st *loaded_ip = &dcn3_0_ip;
-
-	DC_LOGGER_INIT(dc->ctx->logger);
-
-	if (!is_soc_bounding_box_valid(dc)) {
-		DC_LOG_ERROR("%s: not valid soc bounding box/n", __func__);
-		return false;
-	}
-
-	loaded_ip->max_num_otg = pool->base.res_cap->num_timing_generator;
-	loaded_ip->max_num_dpp = pool->base.pipe_count;
-	loaded_ip->clamp_min_dcfclk = dc->config.clamp_min_dcfclk;
-	dcn_patch_bounding_box(dc, loaded_bb);
-
-	if (dc->ctx->dc_bios->funcs->get_soc_bb_info) {
-		struct bp_soc_bb_info bb_info = {0};
-
-		if (dc->ctx->dc_bios->funcs->get_soc_bb_info(dc->ctx->dc_bios, &bb_info) == BP_RESULT_OK) {
-			if (bb_info.dram_clock_change_latency_100ns > 0)
-				dcn3_0_soc.dram_clock_change_latency_us = bb_info.dram_clock_change_latency_100ns * 10;
-
-			if (bb_info.dram_sr_enter_exit_latency_100ns > 0)
-				dcn3_0_soc.sr_enter_plus_exit_time_us = bb_info.dram_sr_enter_exit_latency_100ns * 10;
-
-			if (bb_info.dram_sr_exit_latency_100ns > 0)
-				dcn3_0_soc.sr_exit_time_us = bb_info.dram_sr_exit_latency_100ns * 10;
-		}
-	}
-
-	return true;
-}
-
 static bool dcn30_split_stream_for_mpc_or_odm(
 		const struct dc *dc,
 		struct resource_context *res_ctx,
@@ -1989,150 +1812,6 @@ bool dcn30_validate_bandwidth(struct dc *dc,
 	return out;
 }
 
-static noinline void get_optimal_dcfclk_fclk_for_uclk(unsigned int uclk_mts,
-                                                       unsigned int *optimal_dcfclk,
-                                                       unsigned int *optimal_fclk)
-{
-       double bw_from_dram, bw_from_dram1, bw_from_dram2;
-
-       bw_from_dram1 = uclk_mts * dcn3_0_soc.num_chans *
-                       dcn3_0_soc.dram_channel_width_bytes * (dcn3_0_soc.max_avg_dram_bw_use_normal_percent / 100);
-       bw_from_dram2 = uclk_mts * dcn3_0_soc.num_chans *
-                       dcn3_0_soc.dram_channel_width_bytes * (dcn3_0_soc.max_avg_sdp_bw_use_normal_percent / 100);
-
-       bw_from_dram = (bw_from_dram1 < bw_from_dram2) ? bw_from_dram1 : bw_from_dram2;
-
-       if (optimal_fclk)
-               *optimal_fclk = bw_from_dram /
-               (dcn3_0_soc.fabric_datapath_to_dcn_data_return_bytes * (dcn3_0_soc.max_avg_sdp_bw_use_normal_percent / 100));
-
-       if (optimal_dcfclk)
-               *optimal_dcfclk =  bw_from_dram /
-               (dcn3_0_soc.return_bus_width_bytes * (dcn3_0_soc.max_avg_sdp_bw_use_normal_percent / 100));
-}
-
-void dcn30_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params)
-{
-	unsigned int i, j;
-	unsigned int num_states = 0;
-
-	unsigned int dcfclk_mhz[DC__VOLTAGE_STATES] = {0};
-	unsigned int dram_speed_mts[DC__VOLTAGE_STATES] = {0};
-	unsigned int optimal_uclk_for_dcfclk_sta_targets[DC__VOLTAGE_STATES] = {0};
-	unsigned int optimal_dcfclk_for_uclk[DC__VOLTAGE_STATES] = {0};
-
-	unsigned int dcfclk_sta_targets[DC__VOLTAGE_STATES] = {694, 875, 1000, 1200};
-	unsigned int num_dcfclk_sta_targets = 4;
-	unsigned int num_uclk_states;
-
-	if (dc->ctx->dc_bios->vram_info.num_chans)
-		dcn3_0_soc.num_chans = dc->ctx->dc_bios->vram_info.num_chans;
-
-	if (dc->ctx->dc_bios->vram_info.dram_channel_width_bytes)
-		dcn3_0_soc.dram_channel_width_bytes = dc->ctx->dc_bios->vram_info.dram_channel_width_bytes;
-
-	dcn3_0_soc.dispclk_dppclk_vco_speed_mhz = dc->clk_mgr->dentist_vco_freq_khz / 1000.0;
-	dc->dml.soc.dispclk_dppclk_vco_speed_mhz = dc->clk_mgr->dentist_vco_freq_khz / 1000.0;
-
-	if (bw_params->clk_table.entries[0].memclk_mhz) {
-
-		if (bw_params->clk_table.entries[1].dcfclk_mhz > dcfclk_sta_targets[num_dcfclk_sta_targets-1]) {
-			// If max DCFCLK is greater than the max DCFCLK STA target, insert into the DCFCLK STA target array
-			dcfclk_sta_targets[num_dcfclk_sta_targets] = bw_params->clk_table.entries[1].dcfclk_mhz;
-			num_dcfclk_sta_targets++;
-		} else if (bw_params->clk_table.entries[1].dcfclk_mhz < dcfclk_sta_targets[num_dcfclk_sta_targets-1]) {
-			// If max DCFCLK is less than the max DCFCLK STA target, cap values and remove duplicates
-			for (i = 0; i < num_dcfclk_sta_targets; i++) {
-				if (dcfclk_sta_targets[i] > bw_params->clk_table.entries[1].dcfclk_mhz) {
-					dcfclk_sta_targets[i] = bw_params->clk_table.entries[1].dcfclk_mhz;
-					break;
-				}
-			}
-			// Update size of array since we "removed" duplicates
-			num_dcfclk_sta_targets = i + 1;
-		}
-
-		num_uclk_states = bw_params->clk_table.num_entries;
-
-		// Calculate optimal dcfclk for each uclk
-		for (i = 0; i < num_uclk_states; i++) {
-			DC_FP_START();
-			get_optimal_dcfclk_fclk_for_uclk(bw_params->clk_table.entries[i].memclk_mhz * 16,
-					&optimal_dcfclk_for_uclk[i], NULL);
-			DC_FP_END();
-			if (optimal_dcfclk_for_uclk[i] < bw_params->clk_table.entries[0].dcfclk_mhz) {
-				optimal_dcfclk_for_uclk[i] = bw_params->clk_table.entries[0].dcfclk_mhz;
-			}
-		}
-
-		// Calculate optimal uclk for each dcfclk sta target
-		for (i = 0; i < num_dcfclk_sta_targets; i++) {
-			for (j = 0; j < num_uclk_states; j++) {
-				if (dcfclk_sta_targets[i] < optimal_dcfclk_for_uclk[j]) {
-					optimal_uclk_for_dcfclk_sta_targets[i] =
-							bw_params->clk_table.entries[j].memclk_mhz * 16;
-					break;
-				}
-			}
-		}
-
-		i = 0;
-		j = 0;
-		// create the final dcfclk and uclk table
-		while (i < num_dcfclk_sta_targets && j < num_uclk_states && num_states < DC__VOLTAGE_STATES) {
-			if (dcfclk_sta_targets[i] < optimal_dcfclk_for_uclk[j] && i < num_dcfclk_sta_targets) {
-				dcfclk_mhz[num_states] = dcfclk_sta_targets[i];
-				dram_speed_mts[num_states++] = optimal_uclk_for_dcfclk_sta_targets[i++];
-			} else {
-				if (j < num_uclk_states && optimal_dcfclk_for_uclk[j] <= bw_params->clk_table.entries[1].dcfclk_mhz) {
-					dcfclk_mhz[num_states] = optimal_dcfclk_for_uclk[j];
-					dram_speed_mts[num_states++] = bw_params->clk_table.entries[j++].memclk_mhz * 16;
-				} else {
-					j = num_uclk_states;
-				}
-			}
-		}
-
-		while (i < num_dcfclk_sta_targets && num_states < DC__VOLTAGE_STATES) {
-			dcfclk_mhz[num_states] = dcfclk_sta_targets[i];
-			dram_speed_mts[num_states++] = optimal_uclk_for_dcfclk_sta_targets[i++];
-		}
-
-		while (j < num_uclk_states && num_states < DC__VOLTAGE_STATES &&
-				optimal_dcfclk_for_uclk[j] <= bw_params->clk_table.entries[1].dcfclk_mhz) {
-			dcfclk_mhz[num_states] = optimal_dcfclk_for_uclk[j];
-			dram_speed_mts[num_states++] = bw_params->clk_table.entries[j++].memclk_mhz * 16;
-		}
-
-		for (i = 0; i < dcn3_0_soc.num_states; i++) {
-			dcn3_0_soc.clock_limits[i].state = i;
-			dcn3_0_soc.clock_limits[i].dcfclk_mhz = dcfclk_mhz[i];
-			dcn3_0_soc.clock_limits[i].fabricclk_mhz = dcfclk_mhz[i];
-			dcn3_0_soc.clock_limits[i].dram_speed_mts = dram_speed_mts[i];
-
-			/* Fill all states with max values of all other clocks */
-			dcn3_0_soc.clock_limits[i].dispclk_mhz = bw_params->clk_table.entries[1].dispclk_mhz;
-			dcn3_0_soc.clock_limits[i].dppclk_mhz  = bw_params->clk_table.entries[1].dppclk_mhz;
-			dcn3_0_soc.clock_limits[i].phyclk_mhz  = bw_params->clk_table.entries[1].phyclk_mhz;
-			dcn3_0_soc.clock_limits[i].dtbclk_mhz = dcn3_0_soc.clock_limits[0].dtbclk_mhz;
-			/* These clocks cannot come from bw_params, always fill from dcn3_0_soc[1] */
-			/* FCLK, PHYCLK_D18, SOCCLK, DSCCLK */
-			dcn3_0_soc.clock_limits[i].phyclk_d18_mhz = dcn3_0_soc.clock_limits[0].phyclk_d18_mhz;
-			dcn3_0_soc.clock_limits[i].socclk_mhz = dcn3_0_soc.clock_limits[0].socclk_mhz;
-			dcn3_0_soc.clock_limits[i].dscclk_mhz = dcn3_0_soc.clock_limits[0].dscclk_mhz;
-		}
-		/* re-init DML with updated bb */
-		dml_init_instance(&dc->dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
-		if (dc->current_state)
-			dml_init_instance(&dc->current_state->bw_ctx.dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
-	}
-
-	/* re-init DML with updated bb */
-	dml_init_instance(&dc->dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
-	if (dc->current_state)
-		dml_init_instance(&dc->current_state->bw_ctx.dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
-}
-
 static const struct resource_funcs dcn30_res_pool_funcs = {
 	.destroy = dcn30_destroy_resource_pool,
 	.link_enc_create = dcn30_link_encoder_create,
@@ -2293,9 +1972,7 @@ static bool dcn30_resource_construct(
 	}
 
 	/* PP Lib and SMU interfaces */
-	init_soc_bounding_box(dc, pool);
-
-	dml_init_instance(&dc->dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
+	init_bounding_box_alongside_dml(dc, pool);
 
 	/* IRQ */
 	init_data.ctx = dc->ctx;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
index 694ae4e4e88e..9b63638308f6 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.h
@@ -68,6 +68,4 @@ enum dc_status dcn30_add_stream_to_ctx(
 		struct dc_state *new_ctx,
 		struct dc_stream_state *dc_stream);
 
-void dcn30_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params);
-
 #endif /* _DCN30_RESOURCE_H_ */
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
index c3b06a8cf765..edcfbfcbc71d 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
@@ -44,11 +44,190 @@
 
 #include "clk_mgr.h"
 #include "resource.h"
+#include "dcn30/dcn30_resource.h"
 #include "dml/dcn30/display_mode_vba_30.h"
 
 #include "fpu_operation/fpu_commons.h"
 #include "fpu_operation/dcn3x_commons.h"
 
+static struct _vcs_dpi_ip_params_st dcn3_0_ip = {
+	.use_min_dcfclk = 1,
+	.clamp_min_dcfclk = 0,
+	.odm_capable = 1,
+	.gpuvm_enable = 0,
+	.hostvm_enable = 0,
+	.gpuvm_max_page_table_levels = 4,
+	.hostvm_max_page_table_levels = 4,
+	.hostvm_cached_page_table_levels = 0,
+	.pte_group_size_bytes = 2048,
+	.num_dsc = 6,
+	.rob_buffer_size_kbytes = 184,
+	.det_buffer_size_kbytes = 184,
+	.dpte_buffer_size_in_pte_reqs_luma = 84,
+	.pde_proc_buffer_size_64k_reqs = 48,
+	.dpp_output_buffer_pixels = 2560,
+	.opp_output_buffer_lines = 1,
+	.pixel_chunk_size_kbytes = 8,
+	.pte_enable = 1,
+	.max_page_table_levels = 2,
+	.pte_chunk_size_kbytes = 2,  // ?
+	.meta_chunk_size_kbytes = 2,
+	.writeback_chunk_size_kbytes = 8,
+	.line_buffer_size_bits = 789504,
+	.is_line_buffer_bpp_fixed = 0,  // ?
+	.line_buffer_fixed_bpp = 0,     // ?
+	.dcc_supported = true,
+	.writeback_interface_buffer_size_kbytes = 90,
+	.writeback_line_buffer_buffer_size = 0,
+	.max_line_buffer_lines = 12,
+	.writeback_luma_buffer_size_kbytes = 12,  // writeback_line_buffer_buffer_size = 656640
+	.writeback_chroma_buffer_size_kbytes = 8,
+	.writeback_chroma_line_buffer_width_pixels = 4,
+	.writeback_max_hscl_ratio = 1,
+	.writeback_max_vscl_ratio = 1,
+	.writeback_min_hscl_ratio = 1,
+	.writeback_min_vscl_ratio = 1,
+	.writeback_max_hscl_taps = 1,
+	.writeback_max_vscl_taps = 1,
+	.writeback_line_buffer_luma_buffer_size = 0,
+	.writeback_line_buffer_chroma_buffer_size = 14643,
+	.cursor_buffer_size = 8,
+	.cursor_chunk_size = 2,
+	.max_num_otg = 6,
+	.max_num_dpp = 6,
+	.max_num_wb = 1,
+	.max_dchub_pscl_bw_pix_per_clk = 4,
+	.max_pscl_lb_bw_pix_per_clk = 2,
+	.max_lb_vscl_bw_pix_per_clk = 4,
+	.max_vscl_hscl_bw_pix_per_clk = 4,
+	.max_hscl_ratio = 6,
+	.max_vscl_ratio = 6,
+	.hscl_mults = 4,
+	.vscl_mults = 4,
+	.max_hscl_taps = 8,
+	.max_vscl_taps = 8,
+	.dispclk_ramp_margin_percent = 1,
+	.underscan_factor = 1.11,
+	.min_vblank_lines = 32,
+	.dppclk_delay_subtotal = 46,
+	.dynamic_metadata_vm_enabled = true,
+	.dppclk_delay_scl_lb_only = 16,
+	.dppclk_delay_scl = 50,
+	.dppclk_delay_cnvc_formatter = 27,
+	.dppclk_delay_cnvc_cursor = 6,
+	.dispclk_delay_subtotal = 119,
+	.dcfclk_cstate_latency = 5.2, // SRExitTime
+	.max_inter_dcn_tile_repeaters = 8,
+	.odm_combine_4to1_supported = true,
+
+	.xfc_supported = false,
+	.xfc_fill_bw_overhead_percent = 10.0,
+	.xfc_fill_constant_bytes = 0,
+	.gfx7_compat_tiling_supported = 0,
+	.number_of_cursors = 1,
+};
+
+static struct _vcs_dpi_soc_bounding_box_st dcn3_0_soc = {
+	.clock_limits = {
+			{
+				.state = 0,
+				.dispclk_mhz = 562.0,
+				.dppclk_mhz = 300.0,
+				.phyclk_mhz = 300.0,
+				.phyclk_d18_mhz = 667.0,
+				.dscclk_mhz = 405.6,
+			},
+		},
+	.min_dcfclk = 500.0, /* TODO: set this to actual min DCFCLK */
+	.num_states = 1,
+	.sr_exit_time_us = 12,
+	.sr_enter_plus_exit_time_us = 20,
+	.urgent_latency_us = 4.0,
+	.urgent_latency_pixel_data_only_us = 4.0,
+	.urgent_latency_pixel_mixed_with_vm_data_us = 4.0,
+	.urgent_latency_vm_data_only_us = 4.0,
+	.urgent_out_of_order_return_per_channel_pixel_only_bytes = 4096,
+	.urgent_out_of_order_return_per_channel_pixel_and_vm_bytes = 4096,
+	.urgent_out_of_order_return_per_channel_vm_only_bytes = 4096,
+	.pct_ideal_dram_sdp_bw_after_urgent_pixel_only = 80.0,
+	.pct_ideal_dram_sdp_bw_after_urgent_pixel_and_vm = 60.0,
+	.pct_ideal_dram_sdp_bw_after_urgent_vm_only = 40.0,
+	.max_avg_sdp_bw_use_normal_percent = 60.0,
+	.max_avg_dram_bw_use_normal_percent = 40.0,
+	.writeback_latency_us = 12.0,
+	.max_request_size_bytes = 256,
+	.fabric_datapath_to_dcn_data_return_bytes = 64,
+	.dcn_downspread_percent = 0.5,
+	.downspread_percent = 0.38,
+	.dram_page_open_time_ns = 50.0,
+	.dram_rw_turnaround_time_ns = 17.5,
+	.dram_return_buffer_per_channel_bytes = 8192,
+	.round_trip_ping_latency_dcfclk_cycles = 191,
+	.urgent_out_of_order_return_per_channel_bytes = 4096,
+	.channel_interleave_bytes = 256,
+	.num_banks = 8,
+	.gpuvm_min_page_size_bytes = 4096,
+	.hostvm_min_page_size_bytes = 4096,
+	.dram_clock_change_latency_us = 404,
+	.dummy_pstate_latency_us = 5,
+	.writeback_dram_clock_change_latency_us = 23.0,
+	.return_bus_width_bytes = 64,
+	.dispclk_dppclk_vco_speed_mhz = 3650,
+	.xfc_bus_transport_time_us = 20,      // ?
+	.xfc_xbuf_latency_tolerance_us = 4,  // ?
+	.use_urgent_burst_bw = 1,            // ?
+	.do_urgent_latency_adjustment = true,
+	.urgent_latency_adjustment_fabric_clock_component_us = 1.0,
+	.urgent_latency_adjustment_fabric_clock_reference_mhz = 1000,
+};
+
+static bool is_soc_bounding_box_valid(struct dc *dc)
+{
+	uint32_t hw_internal_rev = dc->ctx->asic_id.hw_internal_rev;
+
+	if (ASICREV_IS_SIENNA_CICHLID_P(hw_internal_rev))
+		return true;
+
+	return false;
+}
+
+
+static bool init_soc_bounding_box(struct dc *dc,
+				  struct dcn30_resource_pool *pool)
+{
+	struct _vcs_dpi_soc_bounding_box_st *loaded_bb = &dcn3_0_soc;
+	struct _vcs_dpi_ip_params_st *loaded_ip = &dcn3_0_ip;
+
+	//DC_LOGGER_INIT(dc->ctx->logger);
+
+	if (!is_soc_bounding_box_valid(dc)) {
+		DC_LOG_ERROR("%s: not valid soc bounding box\n", __func__);
+		return false;
+	}
+
+	loaded_ip->max_num_otg = pool->base.res_cap->num_timing_generator;
+	loaded_ip->max_num_dpp = pool->base.pipe_count;
+	loaded_ip->clamp_min_dcfclk = dc->config.clamp_min_dcfclk;
+	dcn_patch_bounding_box(dc, loaded_bb);
+
+	if (dc->ctx->dc_bios->funcs->get_soc_bb_info) {
+		struct bp_soc_bb_info bb_info = {0};
+
+		if (dc->ctx->dc_bios->funcs->get_soc_bb_info(dc->ctx->dc_bios, &bb_info) == BP_RESULT_OK) {
+			if (bb_info.dram_clock_change_latency_100ns > 0)
+				dcn3_0_soc.dram_clock_change_latency_us = bb_info.dram_clock_change_latency_100ns * 10;
+
+			if (bb_info.dram_sr_enter_exit_latency_100ns > 0)
+				dcn3_0_soc.sr_enter_plus_exit_time_us = bb_info.dram_sr_enter_exit_latency_100ns * 10;
+
+			if (bb_info.dram_sr_exit_latency_100ns > 0)
+				dcn3_0_soc.sr_exit_time_us = bb_info.dram_sr_exit_latency_100ns * 10;
+		}
+	}
+
+	return true;
+}
+
 static void _dcn3x_populate_dml_writeback_from_context(struct dc *dc,
 		struct resource_context *res_ctx,
 		display_e2e_pipe_params_st *pipes)
@@ -380,6 +559,162 @@ static noinline void _dcn30_calculate_wm_and_dlg_fp(struct dc *dc,
 				dc->clk_mgr->bw_params->wm_table.nv_entries[WM_A].dml_input.pstate_latency_us;
 }
 
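+/*
+ * For a given memory speed (uclk_mts, in MT/s), derive the lowest DCFCLK and
+ * FCLK that can still absorb the usable DRAM bandwidth, based on the average
+ * bandwidth-use percentages from the dcn3_0_soc bounding box.
+ */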
+static noinline void get_optimal_dcfclk_fclk_for_uclk(unsigned int uclk_mts,
+						       unsigned int *optimal_dcfclk,
+						       unsigned int *optimal_fclk)
+{
+	double bw_from_dram, bw_from_dram1, bw_from_dram2;
+
+	bw_from_dram1 = uclk_mts * dcn3_0_soc.num_chans *
+			dcn3_0_soc.dram_channel_width_bytes * (dcn3_0_soc.max_avg_dram_bw_use_normal_percent / 100);
+	bw_from_dram2 = uclk_mts * dcn3_0_soc.num_chans *
+			dcn3_0_soc.dram_channel_width_bytes * (dcn3_0_soc.max_avg_sdp_bw_use_normal_percent / 100);
+
+	bw_from_dram = (bw_from_dram1 < bw_from_dram2) ? bw_from_dram1 : bw_from_dram2;
+
+	if (optimal_fclk)
+		*optimal_fclk = bw_from_dram /
+			(dcn3_0_soc.fabric_datapath_to_dcn_data_return_bytes * (dcn3_0_soc.max_avg_sdp_bw_use_normal_percent / 100));
+
+	if (optimal_dcfclk)
+		*optimal_dcfclk = bw_from_dram /
+			(dcn3_0_soc.return_bus_width_bytes * (dcn3_0_soc.max_avg_sdp_bw_use_normal_percent / 100));
+}
+
+void dcn30_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params)
+{
+	unsigned int i, j;
+	unsigned int num_states = 0;
+
+	unsigned int dcfclk_mhz[DC__VOLTAGE_STATES] = {0};
+	unsigned int dram_speed_mts[DC__VOLTAGE_STATES] = {0};
+	unsigned int optimal_uclk_for_dcfclk_sta_targets[DC__VOLTAGE_STATES] = {0};
+	unsigned int optimal_dcfclk_for_uclk[DC__VOLTAGE_STATES] = {0};
+
+	unsigned int dcfclk_sta_targets[DC__VOLTAGE_STATES] = {694, 875, 1000, 1200};
+	unsigned int num_dcfclk_sta_targets = 4;
+	unsigned int num_uclk_states;
+
+	if (dc->ctx->dc_bios->vram_info.num_chans)
+		dcn3_0_soc.num_chans = dc->ctx->dc_bios->vram_info.num_chans;
+
+	if (dc->ctx->dc_bios->vram_info.dram_channel_width_bytes)
+		dcn3_0_soc.dram_channel_width_bytes = dc->ctx->dc_bios->vram_info.dram_channel_width_bytes;
+
+	dcn3_0_soc.dispclk_dppclk_vco_speed_mhz = dc->clk_mgr->dentist_vco_freq_khz / 1000.0;
+	dc->dml.soc.dispclk_dppclk_vco_speed_mhz = dc->clk_mgr->dentist_vco_freq_khz / 1000.0;
+
+	if (bw_params->clk_table.entries[0].memclk_mhz) {
+
+		if (bw_params->clk_table.entries[1].dcfclk_mhz > dcfclk_sta_targets[num_dcfclk_sta_targets-1]) {
+			// If max DCFCLK is greater than the max DCFCLK STA target, insert into the DCFCLK STA target array
+			dcfclk_sta_targets[num_dcfclk_sta_targets] = bw_params->clk_table.entries[1].dcfclk_mhz;
+			num_dcfclk_sta_targets++;
+		} else if (bw_params->clk_table.entries[1].dcfclk_mhz < dcfclk_sta_targets[num_dcfclk_sta_targets-1]) {
+			// If max DCFCLK is less than the max DCFCLK STA target, cap values and remove duplicates
+			for (i = 0; i < num_dcfclk_sta_targets; i++) {
+				if (dcfclk_sta_targets[i] > bw_params->clk_table.entries[1].dcfclk_mhz) {
+					dcfclk_sta_targets[i] = bw_params->clk_table.entries[1].dcfclk_mhz;
+					break;
+				}
+			}
+			// Update size of array since we "removed" duplicates
+			num_dcfclk_sta_targets = i + 1;
+		}
+
+		num_uclk_states = bw_params->clk_table.num_entries;
+
+		// Calculate optimal dcfclk for each uclk
+		for (i = 0; i < num_uclk_states; i++) {
+			DC_FP_START();
+			get_optimal_dcfclk_fclk_for_uclk(bw_params->clk_table.entries[i].memclk_mhz * 16,
+					&optimal_dcfclk_for_uclk[i], NULL);
+			DC_FP_END();
+			if (optimal_dcfclk_for_uclk[i] < bw_params->clk_table.entries[0].dcfclk_mhz) {
+				optimal_dcfclk_for_uclk[i] = bw_params->clk_table.entries[0].dcfclk_mhz;
+			}
+		}
+
+		// Calculate optimal uclk for each dcfclk sta target
+		for (i = 0; i < num_dcfclk_sta_targets; i++) {
+			for (j = 0; j < num_uclk_states; j++) {
+				if (dcfclk_sta_targets[i] < optimal_dcfclk_for_uclk[j]) {
+					optimal_uclk_for_dcfclk_sta_targets[i] =
+							bw_params->clk_table.entries[j].memclk_mhz * 16;
+					break;
+				}
+			}
+		}
+
+		i = 0;
+		j = 0;
+		// create the final dcfclk and uclk table
+		while (i < num_dcfclk_sta_targets && j < num_uclk_states && num_states < DC__VOLTAGE_STATES) {
+			if (dcfclk_sta_targets[i] < optimal_dcfclk_for_uclk[j] && i < num_dcfclk_sta_targets) {
+				dcfclk_mhz[num_states] = dcfclk_sta_targets[i];
+				dram_speed_mts[num_states++] = optimal_uclk_for_dcfclk_sta_targets[i++];
+			} else {
+				if (j < num_uclk_states && optimal_dcfclk_for_uclk[j] <= bw_params->clk_table.entries[1].dcfclk_mhz) {
+					dcfclk_mhz[num_states] = optimal_dcfclk_for_uclk[j];
+					dram_speed_mts[num_states++] = bw_params->clk_table.entries[j++].memclk_mhz * 16;
+				} else {
+					j = num_uclk_states;
+				}
+			}
+		}
+
+		while (i < num_dcfclk_sta_targets && num_states < DC__VOLTAGE_STATES) {
+			dcfclk_mhz[num_states] = dcfclk_sta_targets[i];
+			dram_speed_mts[num_states++] = optimal_uclk_for_dcfclk_sta_targets[i++];
+		}
+
+		while (j < num_uclk_states && num_states < DC__VOLTAGE_STATES &&
+				optimal_dcfclk_for_uclk[j] <= bw_params->clk_table.entries[1].dcfclk_mhz) {
+			dcfclk_mhz[num_states] = optimal_dcfclk_for_uclk[j];
+			dram_speed_mts[num_states++] = bw_params->clk_table.entries[j++].memclk_mhz * 16;
+		}
+
+		dcn3_0_soc.num_states = num_states;
+
+		for (i = 0; i < dcn3_0_soc.num_states; i++) {
+			dcn3_0_soc.clock_limits[i].state = i;
+			dcn3_0_soc.clock_limits[i].dcfclk_mhz = dcfclk_mhz[i];
+			dcn3_0_soc.clock_limits[i].fabricclk_mhz = dcfclk_mhz[i];
+			dcn3_0_soc.clock_limits[i].dram_speed_mts = dram_speed_mts[i];
+
+			/* Fill all states with max values of all other clocks */
+			dcn3_0_soc.clock_limits[i].dispclk_mhz = bw_params->clk_table.entries[1].dispclk_mhz;
+			dcn3_0_soc.clock_limits[i].dppclk_mhz  = bw_params->clk_table.entries[1].dppclk_mhz;
+			dcn3_0_soc.clock_limits[i].phyclk_mhz  = bw_params->clk_table.entries[1].phyclk_mhz;
+			dcn3_0_soc.clock_limits[i].dtbclk_mhz = dcn3_0_soc.clock_limits[0].dtbclk_mhz;
+			/* These clocks cannot come from bw_params, always fill from dcn3_0_soc[1] */
+			/* FCLK, PHYCLK_D18, SOCCLK, DSCCLK */
+			dcn3_0_soc.clock_limits[i].phyclk_d18_mhz = dcn3_0_soc.clock_limits[0].phyclk_d18_mhz;
+			dcn3_0_soc.clock_limits[i].socclk_mhz = dcn3_0_soc.clock_limits[0].socclk_mhz;
+			dcn3_0_soc.clock_limits[i].dscclk_mhz = dcn3_0_soc.clock_limits[0].dscclk_mhz;
+		}
+		/* re-init DML with updated bb */
+		dml_init_instance(&dc->dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
+		if (dc->current_state)
+			dml_init_instance(&dc->current_state->bw_ctx.dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
+	}
+
+	/* re-init DML with updated bb */
+	dml_init_instance(&dc->dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
+	if (dc->current_state)
+		dml_init_instance(&dc->current_state->bw_ctx.dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
+}
+
+bool init_bounding_box_alongside_dml(struct dc *dc, struct dcn30_resource_pool *pool)
+{
+	bool ret = true;
+
+	DC_FP_START();
+	ret = init_soc_bounding_box(dc, pool);
+	dml_init_instance(&dc->dml, &dcn3_0_soc, &dcn3_0_ip, DML_PROJECT_DCN30);
+	DC_FP_END();
+
+	return ret;
+}
+
 void dcn3x_calculate_wm_and_dlg(struct dc *dc, struct dc_state *context,
 				display_e2e_pipe_params_st *pipes,
 				int pipe_cnt,
diff --git a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
index 7521498bbd3b..c5a002faf79f 100644
--- a/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
+++ b/drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
@@ -35,4 +35,10 @@ void dcn3x_set_mcif_arb_params(struct dc *dc, struct dc_state *context,
 void dcn3x_calculate_wm_and_dlg(struct dc *dc, struct dc_state *context,
 	display_e2e_pipe_params_st *pipes, int pipe_cnt, int vlevel);
 
+void dcn30_update_bw_bounding_box(struct dc *dc,
+	struct clk_bw_params *bw_params);
+
+bool init_bounding_box_alongside_dml(struct dc *dc,
+	struct dcn30_resource_pool *pool);
+
 #endif /* _DCN3X_COMMONS_H_ */
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 7/7] drm/amd/display: Drop float flags from DCN30 files
  2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
                   ` (5 preceding siblings ...)
  2021-01-25 13:43 ` [RFC 6/7] drm/amd/display: Move bounding box functions " Rodrigo Siqueira
@ 2021-01-25 13:43 ` Rodrigo Siqueira
  2021-01-25 15:29 ` [RFC 0/7] Proposal for isolating FPU operation Christian König
  7 siblings, 0 replies; 9+ messages in thread
From: Rodrigo Siqueira @ 2021-01-25 13:43 UTC (permalink / raw)
  To: amd-gfx
  Cc: Alex Deucher, Harry Wentland, Christian König, Nicholas Kazlauskas

All functions associated with DCN30 that require FPU access were moved
to their dedicated file inside the FPU directory. For this reason, no
DCN30 file needs to be built with the `-mhard-float -msse` flags
anymore, which means that we can safely drop those flags from the DCN30
and DCN302 Makefiles and remove the leftover FPU-related code.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
---
 drivers/gpu/drm/amd/display/dc/dcn30/Makefile | 30 -------------------
 .../drm/amd/display/dc/dcn30/dcn30_resource.c |  3 --
 .../gpu/drm/amd/display/dc/dcn302/Makefile    | 25 ----------------
 3 files changed, 58 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/Makefile b/drivers/gpu/drm/amd/display/dc/dcn30/Makefile
index c20331eb62e0..b7c2ae9ddfda 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/Makefile
@@ -30,36 +30,6 @@ DCN30 = dcn30_init.o dcn30_hubbub.o dcn30_hubp.o dcn30_dpp.o dcn30_optc.o \
 	dcn30_dpp_cm.o dcn30_dwb_cm.o dcn30_cm_common.o dcn30_mmhubbub.o \
 	dcn30_dio_link_encoder.o dcn30_resource.o
 
-
-ifdef CONFIG_X86
-CFLAGS_$(AMDDALPATH)/dc/dcn30/dcn30_resource.o := -mhard-float -msse
-CFLAGS_$(AMDDALPATH)/dc/dcn30/dcn30_optc.o := -mhard-float -msse
-endif
-
-ifdef CONFIG_PPC64
-CFLAGS_$(AMDDALPATH)/dc/dcn30/dcn30_resource.o := -mhard-float -maltivec
-CFLAGS_$(AMDDALPATH)/dc/dcn30/dcn30_optc.o := -mhard-float -maltivec
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifeq ($(call cc-ifversion, -lt, 0701, y), y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-CFLAGS_$(AMDDALPATH)/dc/dcn30/dcn30_resource.o += -mpreferred-stack-boundary=4
-CFLAGS_$(AMDDALPATH)/dc/dcn30/dcn30_optc.o += -mpreferred-stack-boundary=4
-else
-CFLAGS_$(AMDDALPATH)/dc/dcn30/dcn30_resource.o += -msse2
-CFLAGS_$(AMDDALPATH)/dc/dcn30/dcn30_optc.o += -msse2
-endif
-endif
-
 AMD_DAL_DCN30 = $(addprefix $(AMDDALPATH)/dc/dcn30/,$(DCN30))
 
 AMD_DISPLAY_FILES += $(AMD_DAL_DCN30)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index cd2598bd193f..d00f98174f21 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -1403,9 +1403,6 @@ bool dcn30_release_post_bldn_3dlut(
 	return ret;
 }
 
-#define fixed16_to_double(x) (((double) x) / ((double) (1 << 16)))
-#define fixed16_to_double_to_cpu(x) fixed16_to_double(le32_to_cpu(x))
-
 static bool dcn30_split_stream_for_mpc_or_odm(
 		const struct dc *dc,
 		struct resource_context *res_ctx,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/Makefile b/drivers/gpu/drm/amd/display/dc/dcn302/Makefile
index 8d4924b7dc22..3ea9bff27912 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn302/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dcn302/Makefile
@@ -12,31 +12,6 @@
 
 DCN3_02 = dcn302_init.o dcn302_hwseq.o dcn302_resource.o
 
-ifdef CONFIG_X86
-CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o := -mhard-float -msse
-endif
-
-ifdef CONFIG_PPC64
-CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o := -mhard-float -maltivec
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifeq ($(call cc-ifversion, -lt, 0701, y), y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o += -mpreferred-stack-boundary=4
-else
-CFLAGS_$(AMDDALPATH)/dc/dcn302/dcn302_resource.o += -msse2
-endif
-endif
-
 AMD_DAL_DCN3_02 = $(addprefix $(AMDDALPATH)/dc/dcn302/,$(DCN3_02))
 
 AMD_DISPLAY_FILES += $(AMD_DAL_DCN3_02)
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC 0/7] Proposal for isolating FPU operation
  2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
                   ` (6 preceding siblings ...)
  2021-01-25 13:43 ` [RFC 7/7] drm/amd/display: Drop float flags from DCN30 files Rodrigo Siqueira
@ 2021-01-25 15:29 ` Christian König
  7 siblings, 0 replies; 9+ messages in thread
From: Christian König @ 2021-01-25 15:29 UTC (permalink / raw)
  To: Rodrigo Siqueira, amd-gfx
  Cc: Alex Deucher, Harry Wentland, Nicholas Kazlauskas

Hi Rodrigo,

good to see this finally being tackled. The whole approach looks solid
to me, just one thing I've noted.

> +void dcn3x_populate_dml_writeback_from_context(struct dc *dc,
> +	struct resource_context *res_ctx, display_e2e_pipe_params_st *pipes)
> +{
> +	DC_FP_START();
> +	_dcn3x_populate_dml_writeback_from_context(dc, res_ctx, pipes);
> +	DC_FP_END();
> +}

The calls to DC_FP_START()/DC_FP_END() must be outside of the 
fpu_operation directory.

The problem is that even before the call to DC_FP_START() the compiler 
might think it is a good idea to use FP registers for spilling on some 
architectures.

So my understanding is that all calls to functions declared inside the
fpu_operation directory must be made with DC_FP_START()/DC_FP_END() in
the calling and not the called function.
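
To make that concrete, a rough sketch of the split I have in mind (the
placement and the call site below are only illustrative, not code from
this series):

/* fpu_operation/dcn3x_commons.c: only the FP math, no begin/end here */
void dcn3x_populate_dml_writeback_from_context(struct dc *dc,
		struct resource_context *res_ctx,
		display_e2e_pipe_params_st *pipes)
{
	/* ... double precision calculations ... */
}

/* dcn30_resource.c, built without any FP flags: protect the call at
 * the call site, so the compiler cannot spill FP registers outside
 * the protected region */
	DC_FP_START();
	dcn3x_populate_dml_writeback_from_context(dc, res_ctx, pipes);
	DC_FP_END();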

Regards,
Christian.

Am 25.01.21 um 14:43 schrieb Rodrigo Siqueira:
> Hi,
>
> In the display core, we utilize floats and doubles units for calculating
> modesetting parameters. One side effect of our approach to use double-precision
> is the fact that we spread multiple FPU access across our driver, which means
> that we can accidentally clobber user space FPU state. As an attempt to fix
> this problem, we have the following proposal:
>
> 1. We first need to move functions that deal with FPU to a single place in
>     order to make things more manageable;
> 2. After we isolate these function in a single place, we want to remove any
>     compilation flag that deals with FPU from other files and centralize it only
>     in the files that need it;
> 3. We need to implement an interface for safely calling those FPU functions.
>     The idea is to add a thin function layer where FPU functions are invoked
>     under the protection of kernel_fpu_begin/end.
>
> One of the challenges from the above steps is identifying which function uses
> FPU registers; fortunately, Peter Zijlstra wrote a patch a couple of months ago
> where he introduced an FPU check for objtool. I used the following command for
> identifying the potential FPU usage:
>
>   ./tools/objtool/objtool check -Ffa "drivers/gpu/drm/amd/display/dc/ANY_FILE.o"
>
> Based on the above command output and the step-by-step approach that we want to
> adopt, I decided to start this work focusing on DCN3 and DCN302. I believe that
> the best way to see this RFC is:
>
> 1. The first patch introduces an FPU folder inside display/dc, intending to
>     centralize functions that deal with FPU. Note that I introduced two new C
>     files named dcn3x_commons inside a new folder called fpu_operation; I used
>     the name dcn3x because some of the functions inside this folder are shared
>     with DCN301 and DCN302. In other words, all FPU function which is shared
>     across DCN3x will be placed in that file.
> 2. The next set of patches, start to move some of the function that requires
>     FPU access to the file dcn3x_commons. I did it in a small chunk to make it
>     easy to bisect in case of regressions.
> 3. Note that one of the patch touch DCN2, the reason for that is the fact that
>     the function dcn20_calculate_dlg_params is shared from DCN2 to DCN3. Because
>     of that, I create a new file named fpu_commons for keeping functions that
>     are shared across multiple ASICs.
> 4. When we move some of the functions, notice that I also add an API for
>     accessing it via fpu_kernel_begin/end.
> 5. At the end of the series, I dropped the FPU flags from the files that I
>     initialize refactored.
>
> We are also working on test stress for validating this change from the user
> space and kernel perspective.
>
> Keep in mind that this series is not done yet. I'm looking for feedback about
> this approach because we have plans to use it for trying to fix our FPU
> problems for the next couple of weeks. Finally, we want to do this work
> step-by-step because it is easy to introduce regression when dealing with these
> FPU problems.
>
> Best Regards
>
> Rodrigo Siqueira (7):
>    drm/amd/display: Introduce FPU directory inside DC
>    drm/amd/display: Moves dcn30_set_mcif_arb_params to FPU folder
>    drm/amd/display: Add FPU file for functions shared across ASICs
>    drm/amd/display: Move calculate_wm_and_dlg to FPU folder
>    drm/amd/display: Move patch bounding box to FPU folder
>    drm/amd/display: Move bounding box functions to FPU folder
>    drm/amd/display: Drop float flages from DCN30 files
>
>   drivers/gpu/drm/amd/display/dc/Makefile       |   1 +
>   .../drm/amd/display/dc/dcn20/dcn20_resource.c | 106 +--
>   .../drm/amd/display/dc/dcn20/dcn20_resource.h |   8 -
>   .../drm/amd/display/dc/dcn21/dcn21_resource.c |   2 +
>   drivers/gpu/drm/amd/display/dc/dcn30/Makefile |  30 -
>   .../drm/amd/display/dc/dcn30/dcn30_resource.c | 683 +---------------
>   .../drm/amd/display/dc/dcn30/dcn30_resource.h |  20 -
>   .../amd/display/dc/dcn301/dcn301_resource.c   |  10 +-
>   .../gpu/drm/amd/display/dc/dcn302/Makefile    |  25 -
>   .../amd/display/dc/dcn302/dcn302_resource.c   |  10 +-
>   .../drm/amd/display/dc/fpu_operation/Makefile |  58 ++
>   .../display/dc/fpu_operation/dcn3x_commons.c  | 743 ++++++++++++++++++
>   .../display/dc/fpu_operation/dcn3x_commons.h  |  44 ++
>   .../display/dc/fpu_operation/fpu_commons.c    | 145 ++++
>   .../display/dc/fpu_operation/fpu_commons.h    |  37 +
>   15 files changed, 1051 insertions(+), 871 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/Makefile
>   create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.c
>   create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/dcn3x_commons.h
>   create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.c
>   create mode 100644 drivers/gpu/drm/amd/display/dc/fpu_operation/fpu_commons.h
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-01-25 15:29 UTC | newest]

Thread overview: 9+ messages
2021-01-25 13:43 [RFC 0/7] Proposal for isolating FPU operation Rodrigo Siqueira
2021-01-25 13:43 ` [RFC 1/7] drm/amd/display: Introduce FPU directory inside DC Rodrigo Siqueira
2021-01-25 13:43 ` [RFC 2/7] drm/amd/display: Moves dcn30_set_mcif_arb_params to FPU folder Rodrigo Siqueira
2021-01-25 13:43 ` [RFC 3/7] drm/amd/display: Add FPU file for functions shared across ASICs Rodrigo Siqueira
2021-01-25 13:43 ` [RFC 4/7] drm/amd/display: Move calculate_wm_and_dlg to FPU folder Rodrigo Siqueira
2021-01-25 13:43 ` [RFC 5/7] drm/amd/display: Move patch bounding box " Rodrigo Siqueira
2021-01-25 13:43 ` [RFC 6/7] drm/amd/display: Move bounding box functions " Rodrigo Siqueira
2021-01-25 13:43 ` [RFC 7/7] drm/amd/display: Drop float flags from DCN30 files Rodrigo Siqueira
2021-01-25 15:29 ` [RFC 0/7] Proposal for isolating FPU operation Christian König
