dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
@ 2023-09-21 14:15 Sebastian Andrzej Siewior
  2023-09-21 14:15 ` [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin() Sebastian Andrzej Siewior
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-09-21 14:15 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Leo Li, Pan, Xinhui, Rodrigo Siqueira, Peter Zijlstra,
	Alex Deucher, Thomas Gleixner, Christian König

Hi,

I stumbled uppon the amdgpu driver via a bugzilla report. The actual fix
is #4 + #5 and the rest was made while looking at the code.

Sebastian



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().
  2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
@ 2023-09-21 14:15 ` Sebastian Andrzej Siewior
  2023-10-03 19:53   ` Harry Wentland
  2023-09-21 14:15 ` [PATCH 2/5] drm/amd/display: Simplify the per-CPU usage Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-09-21 14:15 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Tianci Yin, Leo Li, Sebastian Andrzej Siewior, Pan, Xinhui,
	Rodrigo Siqueira, Peter Zijlstra, Aurabindo Pillai, Alex Deucher,
	Thomas Gleixner, Christian König

This is a revert of the commit mentioned below while it is not wrong, as
in the kernel will explode, having migrate_disable() here it is
complete waste of resources.

Additionally commit message is plain wrong the review tag does not make
it any better. The migrate_disable() interface has a fat comment
describing it and it includes the word "undesired" in the headline which
should tickle people to read it before using it.
Initially I assumed it is worded too harsh but now I beg to differ.

The reviewer of the original commit, even not understanding what
migrate_disable() does should ask the following:

- migrate_disable() is added only to the CONFIG_X86 block and it claims
  to protect fpu_recursion_depth. Why are the other the architectures
  excluded?

- migrate_disable() is added after fpu_recursion_depth was modified.
  Shouldn't it be added before the modification or referencing takes
  place?

Moving on.
Disabling preemption DOES prevent CPU migration. A task, that can not be
pushed away from the CPU by the scheduler (due to disabled preemption)
can not be pushed or migrated to another CPU.

Disabling migration DOES NOT ensure consistency of per-CPU variables. It
only ensures that the task acts always on the same per-CPU variable. The
task remains preemptible meaning multiple tasks can access the same
per-CPU variable. This in turn leads to inconsistency for the statement

                  *pcpu -= 1;

with two tasks on one CPU and a preemption point during the RMW
operation:

     Task A                           Task B
     read pcpu to reg  # 0
     inc reg           # 0 -> 1
                                      read pcpu to reg  # 0
                                      inc reg           # 0 -> 1
                                      write reg to pcpu # 1
     write reg to pcpu # 1

At the end pcpu reads 1 but should read 2 instead. Boom.

get_cpu_ptr() already contains a preempt_disable() statement. That means
that the per-CPU variable can only be referenced by a single task which
is currently running. The only inconsistency that can occur if the
variable is additionally accessed from an interrupt.

Remove migrate_disable/enable() from dc_fpu_begin/end().

Cc: Tianci Yin <tianci.yin@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Fixes: 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 172aa10a8800f..86f4c0e046548 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -91,7 +91,6 @@ void dc_fpu_begin(const char *function_name, const int line)
 
 	if (*pcpu == 1) {
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
-		migrate_disable();
 		kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
 		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
@@ -132,7 +131,6 @@ void dc_fpu_end(const char *function_name, const int line)
 	if (*pcpu <= 0) {
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
-		migrate_enable();
 #elif defined(CONFIG_PPC64)
 		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
 			disable_kernel_vsx();
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] drm/amd/display: Simplify the per-CPU usage.
  2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
  2023-09-21 14:15 ` [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin() Sebastian Andrzej Siewior
@ 2023-09-21 14:15 ` Sebastian Andrzej Siewior
  2023-09-21 14:15 ` [PATCH 3/5] drm/amd/display: Add a warning if the FPU is used outside from task context Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-09-21 14:15 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Leo Li, Sebastian Andrzej Siewior, Pan, Xinhui, Rodrigo Siqueira,
	Peter Zijlstra, Alex Deucher, Thomas Gleixner,
	Christian König

The fpu_recursion_depth counter is used to ensure that dc_fpu_begin()
can be invoked multiple times while the FPU-disable function itself is
only invoked once. Also the counter part (dc_fpu_end()) is ballanced
properly.

Instead of using the get_cpu_ptr() dance around the inc it is simpler to
increment the per-CPU variable directly. Also the per-CPU variable has
to be incremented and decremented on the same CPU. This is ensured by
the inner-part which disables preemption. This is kind of not obvious,
works and the preempt-counter is touched a few times for no reason.

Disable preemption before incrementing fpu_recursion_depth for the first
time. Keep preemption disabled until dc_fpu_end() where the counter is
decremented making it obvious that the preemption has to stay disabled
while the counter is non-zero.
Use simple inc/dec functions.
Remove the nested preempt_disable/enable functions which are now not
needed.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c    | 50 ++++++++-----------
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 86f4c0e046548..8bd5926b47e06 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -60,11 +60,9 @@ static DEFINE_PER_CPU(int, fpu_recursion_depth);
  */
 inline void dc_assert_fp_enabled(void)
 {
-	int *pcpu, depth = 0;
+	int depth;
 
-	pcpu = get_cpu_ptr(&fpu_recursion_depth);
-	depth = *pcpu;
-	put_cpu_ptr(&fpu_recursion_depth);
+	depth = __this_cpu_read(fpu_recursion_depth);
 
 	ASSERT(depth >= 1);
 }
@@ -84,32 +82,27 @@ inline void dc_assert_fp_enabled(void)
  */
 void dc_fpu_begin(const char *function_name, const int line)
 {
-	int *pcpu;
+	int depth;
 
-	pcpu = get_cpu_ptr(&fpu_recursion_depth);
-	*pcpu += 1;
+	preempt_disable();
+	depth = __this_cpu_inc_return(fpu_recursion_depth);
 
-	if (*pcpu == 1) {
+	if (depth == 1) {
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
-			preempt_disable();
+		if (cpu_has_feature(CPU_FTR_VSX_COMP))
 			enable_kernel_vsx();
-		} else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) {
-			preempt_disable();
+		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
 			enable_kernel_altivec();
-		} else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) {
-			preempt_disable();
+		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			enable_kernel_fp();
-		}
 #elif defined(CONFIG_ARM64)
 		kernel_neon_begin();
 #endif
 	}
 
-	TRACE_DCN_FPU(true, function_name, line, *pcpu);
-	put_cpu_ptr(&fpu_recursion_depth);
+	TRACE_DCN_FPU(true, function_name, line, depth);
 }
 
 /**
@@ -124,29 +117,26 @@ void dc_fpu_begin(const char *function_name, const int line)
  */
 void dc_fpu_end(const char *function_name, const int line)
 {
-	int *pcpu;
+	int depth;
 
-	pcpu = get_cpu_ptr(&fpu_recursion_depth);
-	*pcpu -= 1;
-	if (*pcpu <= 0) {
+	depth = __this_cpu_dec_return(fpu_recursion_depth);
+	if (depth == 0) {
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
+		if (cpu_has_feature(CPU_FTR_VSX_COMP))
 			disable_kernel_vsx();
-			preempt_enable();
-		} else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) {
+		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
 			disable_kernel_altivec();
-			preempt_enable();
-		} else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) {
+		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			disable_kernel_fp();
-			preempt_enable();
-		}
 #elif defined(CONFIG_ARM64)
 		kernel_neon_end();
 #endif
+	} else {
+		WARN_ON_ONCE(depth < 0);
 	}
 
-	TRACE_DCN_FPU(false, function_name, line, *pcpu);
-	put_cpu_ptr(&fpu_recursion_depth);
+	TRACE_DCN_FPU(false, function_name, line, depth);
+	preempt_enable();
 }
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] drm/amd/display: Add a warning if the FPU is used outside from task context.
  2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
  2023-09-21 14:15 ` [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin() Sebastian Andrzej Siewior
  2023-09-21 14:15 ` [PATCH 2/5] drm/amd/display: Simplify the per-CPU usage Sebastian Andrzej Siewior
@ 2023-09-21 14:15 ` Sebastian Andrzej Siewior
  2023-09-21 14:15 ` [PATCH 4/5] drm/amd/display: Move the memory allocation out of dcn21_validate_bandwidth_fp() Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-09-21 14:15 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Leo Li, Sebastian Andrzej Siewior, Pan, Xinhui, Rodrigo Siqueira,
	Peter Zijlstra, Alex Deucher, Thomas Gleixner,
	Christian König

Add a warning if the FPU is used from any context other than task
context. This is only precaution since the code is not able to be used
from softirq while the API allows it on x86 for instance.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 8bd5926b47e06..4ae4720535a56 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -84,6 +84,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 {
 	int depth;
 
+	WARN_ON_ONCE(!in_task());
 	preempt_disable();
 	depth = __this_cpu_inc_return(fpu_recursion_depth);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] drm/amd/display: Move the memory allocation out of dcn21_validate_bandwidth_fp().
  2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2023-09-21 14:15 ` [PATCH 3/5] drm/amd/display: Add a warning if the FPU is used outside from task context Sebastian Andrzej Siewior
@ 2023-09-21 14:15 ` Sebastian Andrzej Siewior
  2023-09-21 14:15 ` [PATCH 5/5] drm/amd/display: Move the memory allocation out of dcn20_validate_bandwidth_fp() Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-09-21 14:15 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Leo Li, Sebastian Andrzej Siewior, Pan, Xinhui, Rodrigo Siqueira,
	Peter Zijlstra, Alex Deucher, Thomas Gleixner,
	Christian König

dcn21_validate_bandwidth_fp() is invoked while FPU access has been
enabled. FPU access requires disabling preemption even on PREEMPT_RT.
It is not possible to allocate memory with disabled preemption even with
GFP_ATOMIC on PREEMPT_RT.

Move the memory allocation before FPU access is enabled.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217928
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 10 +++++++++-
 drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c  |  7 ++-----
 drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h  |  5 ++---
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index d1a25fe6c44fa..5674c3450fc36 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -953,9 +953,17 @@ static bool dcn21_validate_bandwidth(struct dc *dc, struct dc_state *context,
 		bool fast_validate)
 {
 	bool voltage_supported;
+	display_e2e_pipe_params_st *pipes;
+
+	pipes = kcalloc(dc->res_pool->pipe_count, sizeof(display_e2e_pipe_params_st), GFP_KERNEL);
+	if (!pipes)
+		return false;
+
 	DC_FP_START();
-	voltage_supported = dcn21_validate_bandwidth_fp(dc, context, fast_validate);
+	voltage_supported = dcn21_validate_bandwidth_fp(dc, context, fast_validate, pipes);
 	DC_FP_END();
+
+	kfree(pipes);
 	return voltage_supported;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index 5805fb02af14e..8b2038162a7e1 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -2216,9 +2216,8 @@ static void dcn21_calculate_wm(struct dc *dc, struct dc_state *context,
 						&context->bw_ctx.dml, pipes, pipe_cnt);
 }
 
-bool dcn21_validate_bandwidth_fp(struct dc *dc,
-				 struct dc_state *context,
-				 bool fast_validate)
+bool dcn21_validate_bandwidth_fp(struct dc *dc, struct dc_state *context,
+				 bool fast_validate, display_e2e_pipe_params_st *pipes)
 {
 	bool out = false;
 
@@ -2227,7 +2226,6 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc,
 	int vlevel = 0;
 	int pipe_split_from[MAX_PIPES];
 	int pipe_cnt = 0;
-	display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * sizeof(display_e2e_pipe_params_st), GFP_ATOMIC);
 	DC_LOGGER_INIT(dc->ctx->logger);
 
 	BW_VAL_TRACE_COUNT();
@@ -2267,7 +2265,6 @@ bool dcn21_validate_bandwidth_fp(struct dc *dc,
 	out = false;
 
 validate_out:
-	kfree(pipes);
 
 	BW_VAL_TRACE_FINISH();
 
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
index c51badf7b68a9..a81a0b9e68842 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
@@ -77,9 +77,8 @@ int dcn21_populate_dml_pipes_from_context(struct dc *dc,
 					  struct dc_state *context,
 					  display_e2e_pipe_params_st *pipes,
 					  bool fast_validate);
-bool dcn21_validate_bandwidth_fp(struct dc *dc,
-				 struct dc_state *context,
-				 bool fast_validate);
+bool dcn21_validate_bandwidth_fp(struct dc *dc, struct dc_state *context, bool
+				 fast_validate, display_e2e_pipe_params_st *pipes);
 void dcn21_update_bw_bounding_box(struct dc *dc, struct clk_bw_params *bw_params);
 
 void dcn21_clk_mgr_set_bw_params_wm_table(struct clk_bw_params *bw_params);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] drm/amd/display: Move the memory allocation out of dcn20_validate_bandwidth_fp().
  2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2023-09-21 14:15 ` [PATCH 4/5] drm/amd/display: Move the memory allocation out of dcn21_validate_bandwidth_fp() Sebastian Andrzej Siewior
@ 2023-09-21 14:15 ` Sebastian Andrzej Siewior
  2023-09-22  5:33 ` [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Christian König
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-09-21 14:15 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Leo Li, Sebastian Andrzej Siewior, Pan, Xinhui, Rodrigo Siqueira,
	Peter Zijlstra, Alex Deucher, Thomas Gleixner,
	Christian König

dcn20_validate_bandwidth_fp() is invoked while FPU access has been
enabled. FPU access requires disabling preemption even on PREEMPT_RT.
It is not possible to allocate memory with disabled preemption even with
GFP_ATOMIC on PREEMPT_RT.

Move the memory allocation before FPU access is enabled.
To preserve previous "clean" state of "pipes" add a memset() before the
second invocation of dcn20_validate_bandwidth_internal() where the
variable is used.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 .../drm/amd/display/dc/dcn20/dcn20_resource.c    | 10 +++++++++-
 .../gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c | 16 +++++++---------
 .../gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h |  5 ++---
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index d587f807dfd7c..5036a3e608324 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -2141,9 +2141,17 @@ bool dcn20_validate_bandwidth(struct dc *dc, struct dc_state *context,
 		bool fast_validate)
 {
 	bool voltage_supported;
+	display_e2e_pipe_params_st *pipes;
+
+	pipes = kcalloc(dc->res_pool->pipe_count, sizeof(display_e2e_pipe_params_st), GFP_KERNEL);
+	if (!pipes)
+		return false;
+
 	DC_FP_START();
-	voltage_supported = dcn20_validate_bandwidth_fp(dc, context, fast_validate);
+	voltage_supported = dcn20_validate_bandwidth_fp(dc, context, fast_validate, pipes);
 	DC_FP_END();
+
+	kfree(pipes);
 	return voltage_supported;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
index 8b2038162a7e1..2ad92497b9bf2 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.c
@@ -1923,7 +1923,7 @@ void dcn20_patch_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_st
 }
 
 static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state *context,
-		bool fast_validate)
+		bool fast_validate, display_e2e_pipe_params_st *pipes)
 {
 	bool out = false;
 
@@ -1932,7 +1932,6 @@ static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state *co
 	int vlevel = 0;
 	int pipe_split_from[MAX_PIPES];
 	int pipe_cnt = 0;
-	display_e2e_pipe_params_st *pipes = kzalloc(dc->res_pool->pipe_count * sizeof(display_e2e_pipe_params_st), GFP_ATOMIC);
 	DC_LOGGER_INIT(dc->ctx->logger);
 
 	BW_VAL_TRACE_COUNT();
@@ -1967,16 +1966,14 @@ static bool dcn20_validate_bandwidth_internal(struct dc *dc, struct dc_state *co
 	out = false;
 
 validate_out:
-	kfree(pipes);
 
 	BW_VAL_TRACE_FINISH();
 
 	return out;
 }
 
-bool dcn20_validate_bandwidth_fp(struct dc *dc,
-				 struct dc_state *context,
-				 bool fast_validate)
+bool dcn20_validate_bandwidth_fp(struct dc *dc, struct dc_state *context,
+				 bool fast_validate, display_e2e_pipe_params_st *pipes)
 {
 	bool voltage_supported = false;
 	bool full_pstate_supported = false;
@@ -1995,11 +1992,11 @@ bool dcn20_validate_bandwidth_fp(struct dc *dc,
 	ASSERT(context != dc->current_state);
 
 	if (fast_validate) {
-		return dcn20_validate_bandwidth_internal(dc, context, true);
+		return dcn20_validate_bandwidth_internal(dc, context, true, pipes);
 	}
 
 	// Best case, we support full UCLK switch latency
-	voltage_supported = dcn20_validate_bandwidth_internal(dc, context, false);
+	voltage_supported = dcn20_validate_bandwidth_internal(dc, context, false, pipes);
 	full_pstate_supported = context->bw_ctx.bw.dcn.clk.p_state_change_support;
 
 	if (context->bw_ctx.dml.soc.dummy_pstate_latency_us == 0 ||
@@ -2011,7 +2008,8 @@ bool dcn20_validate_bandwidth_fp(struct dc *dc,
 	// Fallback: Try to only support G6 temperature read latency
 	context->bw_ctx.dml.soc.dram_clock_change_latency_us = context->bw_ctx.dml.soc.dummy_pstate_latency_us;
 
-	voltage_supported = dcn20_validate_bandwidth_internal(dc, context, false);
+	memset(pipes, 0, dc->res_pool->pipe_count * sizeof(display_e2e_pipe_params_st));
+	voltage_supported = dcn20_validate_bandwidth_internal(dc, context, false, pipes);
 	dummy_pstate_supported = context->bw_ctx.bw.dcn.clk.p_state_change_support;
 
 	if (voltage_supported && (dummy_pstate_supported || !(context->stream_count))) {
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
index a81a0b9e68842..b6c34198ddc86 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.h
@@ -61,9 +61,8 @@ void dcn20_update_bounding_box(struct dc *dc,
 			       unsigned int num_states);
 void dcn20_patch_bounding_box(struct dc *dc,
 			      struct _vcs_dpi_soc_bounding_box_st *bb);
-bool dcn20_validate_bandwidth_fp(struct dc *dc,
-				 struct dc_state *context,
-				 bool fast_validate);
+bool dcn20_validate_bandwidth_fp(struct dc *dc, struct dc_state *context,
+				 bool fast_validate, display_e2e_pipe_params_st *pipes);
 void dcn20_fpu_set_wm_ranges(int i,
 			     struct pp_smu_wm_range_sets *ranges,
 			     struct _vcs_dpi_soc_bounding_box_st *loaded_bb);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
  2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2023-09-21 14:15 ` [PATCH 5/5] drm/amd/display: Move the memory allocation out of dcn20_validate_bandwidth_fp() Sebastian Andrzej Siewior
@ 2023-09-22  5:33 ` Christian König
  2023-10-02 10:58   ` Sebastian Andrzej Siewior
  2023-10-04 21:00 ` Rodrigo Siqueira Jordao
  2023-10-04 21:13 ` Hamza Mahfooz
  7 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2023-09-22  5:33 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, dri-devel, amd-gfx
  Cc: Leo Li, Pan, Xinhui, Rodrigo Siqueira, Peter Zijlstra,
	Alex Deucher, Thomas Gleixner, Christian König

Am 21.09.23 um 16:15 schrieb Sebastian Andrzej Siewior:
> Hi,
>
> I stumbled uppon the amdgpu driver via a bugzilla report. The actual fix
> is #4 + #5 and the rest was made while looking at the code.

Oh, yes please :)

Rodrigo and I have been trying to sort those things out previously, but 
that's Sisyphean work.

In general the DC team needs to judge, but of hand it looks good to me.

Christian.

>
> Sebastian
>
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
  2023-09-22  5:33 ` [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Christian König
@ 2023-10-02 10:58   ` Sebastian Andrzej Siewior
  2023-10-03 19:54     ` Harry Wentland
  0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-02 10:58 UTC (permalink / raw)
  To: Christian König
  Cc: Leo Li, Pan, Xinhui, Rodrigo Siqueira, amd-gfx, Peter Zijlstra,
	dri-devel, Alex Deucher, Thomas Gleixner, Christian König

On 2023-09-22 07:33:26 [+0200], Christian König wrote:
> Am 21.09.23 um 16:15 schrieb Sebastian Andrzej Siewior:
> > Hi,
> > 
> > I stumbled uppon the amdgpu driver via a bugzilla report. The actual fix
> > is #4 + #5 and the rest was made while looking at the code.
> 
> Oh, yes please :)
> 
> Rodrigo and I have been trying to sort those things out previously, but
> that's Sisyphean work.
> 
> In general the DC team needs to judge, but of hand it looks good to me.

Any way to get this merged? There was no reply from the DC team… No
reply from the person breaking it either. The bugzilla reporter stated
that it solves his trouble. He didn't report anything new ;)

> Christian.

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().
  2023-09-21 14:15 ` [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin() Sebastian Andrzej Siewior
@ 2023-10-03 19:53   ` Harry Wentland
  2023-10-04  8:53     ` Sebastian Andrzej Siewior
  2023-10-04 12:10     ` Hamza Mahfooz
  0 siblings, 2 replies; 18+ messages in thread
From: Harry Wentland @ 2023-10-03 19:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, dri-devel, amd-gfx
  Cc: Tianci Yin, Leo Li, Pan, Xinhui, Rodrigo Siqueira,
	Peter Zijlstra, Aurabindo Pillai, Alex Deucher, Thomas Gleixner,
	Christian König

On 2023-09-21 10:15, Sebastian Andrzej Siewior wrote:
> This is a revert of the commit mentioned below while it is not wrong, as
> in the kernel will explode, having migrate_disable() here it is
> complete waste of resources.
> 
> Additionally commit message is plain wrong the review tag does not make

Not sure I follow what's unhelpful about the review tag with
0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable")

I do wish the original patch showed the splat it's attempting
to fix. It apparently made a difference for something, whether
inadvertently or not. I wish I knew what that "something" was.

Harry

> it any better. The migrate_disable() interface has a fat comment
> describing it and it includes the word "undesired" in the headline which
> should tickle people to read it before using it.
> Initially I assumed it is worded too harsh but now I beg to differ.
> 
> The reviewer of the original commit, even not understanding what
> migrate_disable() does should ask the following:
> 
> - migrate_disable() is added only to the CONFIG_X86 block and it claims
>   to protect fpu_recursion_depth. Why are the other the architectures
>   excluded?
> 
> - migrate_disable() is added after fpu_recursion_depth was modified.
>   Shouldn't it be added before the modification or referencing takes
>   place?
> 
> Moving on.
> Disabling preemption DOES prevent CPU migration. A task, that can not be
> pushed away from the CPU by the scheduler (due to disabled preemption)
> can not be pushed or migrated to another CPU.
> 
> Disabling migration DOES NOT ensure consistency of per-CPU variables. It
> only ensures that the task acts always on the same per-CPU variable. The
> task remains preemptible meaning multiple tasks can access the same
> per-CPU variable. This in turn leads to inconsistency for the statement
> 
>                   *pcpu -= 1;
> 
> with two tasks on one CPU and a preemption point during the RMW
> operation:
> 
>      Task A                           Task B
>      read pcpu to reg  # 0
>      inc reg           # 0 -> 1
>                                       read pcpu to reg  # 0
>                                       inc reg           # 0 -> 1
>                                       write reg to pcpu # 1
>      write reg to pcpu # 1
> 
> At the end pcpu reads 1 but should read 2 instead. Boom.
> 
> get_cpu_ptr() already contains a preempt_disable() statement. That means
> that the per-CPU variable can only be referenced by a single task which
> is currently running. The only inconsistency that can occur if the
> variable is additionally accessed from an interrupt.
> 
> Remove migrate_disable/enable() from dc_fpu_begin/end().
> 
> Cc: Tianci Yin <tianci.yin@amd.com>
> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> Fixes: 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> index 172aa10a8800f..86f4c0e046548 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> @@ -91,7 +91,6 @@ void dc_fpu_begin(const char *function_name, const int line)
>  
>  	if (*pcpu == 1) {
>  #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> -		migrate_disable();
>  		kernel_fpu_begin();
>  #elif defined(CONFIG_PPC64)
>  		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
> @@ -132,7 +131,6 @@ void dc_fpu_end(const char *function_name, const int line)
>  	if (*pcpu <= 0) {
>  #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>  		kernel_fpu_end();
> -		migrate_enable();
>  #elif defined(CONFIG_PPC64)
>  		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
>  			disable_kernel_vsx();


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
  2023-10-02 10:58   ` Sebastian Andrzej Siewior
@ 2023-10-03 19:54     ` Harry Wentland
  2023-10-04  8:49       ` Sebastian Andrzej Siewior
  2023-10-04 12:44       ` Harry Wentland
  0 siblings, 2 replies; 18+ messages in thread
From: Harry Wentland @ 2023-10-03 19:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Christian König
  Cc: Leo Li, Pan, Xinhui, Rodrigo Siqueira, amd-gfx, Peter Zijlstra,
	dri-devel, Alex Deucher, Thomas Gleixner, Christian König

On 2023-10-02 06:58, Sebastian Andrzej Siewior wrote:
> On 2023-09-22 07:33:26 [+0200], Christian König wrote:
>> Am 21.09.23 um 16:15 schrieb Sebastian Andrzej Siewior:
>>> Hi,
>>>
>>> I stumbled uppon the amdgpu driver via a bugzilla report. The actual fix
>>> is #4 + #5 and the rest was made while looking at the code.
>>
>> Oh, yes please :)
>>
>> Rodrigo and I have been trying to sort those things out previously, but
>> that's Sisyphean work.
>>
>> In general the DC team needs to judge, but of hand it looks good to me.
> 
> Any way to get this merged? There was no reply from the DC team… No
> reply from the person breaking it either. The bugzilla reporter stated
> that it solves his trouble. He didn't report anything new ;)
> 

Apologies for the slow progress. We're feeding it through our CI and
will let you know the verdict soon.

Do you happen to have the bugzilla link that this is fixing? It would
be helpful to include that as a link in the patches as well, to give
them context.

Harry

>> Christian.
> 
> Sebastian


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
  2023-10-03 19:54     ` Harry Wentland
@ 2023-10-04  8:49       ` Sebastian Andrzej Siewior
  2023-10-04 12:44       ` Harry Wentland
  1 sibling, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-04  8:49 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Leo Li, Christian König, Pan, Xinhui, Rodrigo Siqueira,
	amd-gfx, Peter Zijlstra, dri-devel, Alex Deucher,
	Thomas Gleixner, Christian König

On 2023-10-03 15:54:58 [-0400], Harry Wentland wrote:
> On 2023-10-02 06:58, Sebastian Andrzej Siewior wrote:
> > On 2023-09-22 07:33:26 [+0200], Christian König wrote:
> >> Am 21.09.23 um 16:15 schrieb Sebastian Andrzej Siewior:
> >>> Hi,
> >>>
> >>> I stumbled uppon the amdgpu driver via a bugzilla report. The actual fix
> >>> is #4 + #5 and the rest was made while looking at the code.
> >>
> >> Oh, yes please :)
> >>
> >> Rodrigo and I have been trying to sort those things out previously, but
> >> that's Sisyphean work.
> >>
> >> In general the DC team needs to judge, but of hand it looks good to me.
> > 
> > Any way to get this merged? There was no reply from the DC team… No
> > reply from the person breaking it either. The bugzilla reporter stated
> > that it solves his trouble. He didn't report anything new ;)
> > 
> 
> Apologies for the slow progress. We're feeding it through our CI and
> will let you know the verdict soon.
> 
> Do you happen to have the bugzilla link that this is fixing? It would
> be helpful to include that as a link in the patches as well, to give
> them context.
The bugzilla report is at
  https://bugzilla.kernel.org/show_bug.cgi?id=217928

but the patches explain the situation, too. Even more verbose than the
report…

> Harry

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().
  2023-10-03 19:53   ` Harry Wentland
@ 2023-10-04  8:53     ` Sebastian Andrzej Siewior
  2023-10-04 12:10     ` Hamza Mahfooz
  1 sibling, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-04  8:53 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Tianci Yin, Leo Li, Pan, Xinhui, Rodrigo Siqueira, amd-gfx,
	Peter Zijlstra, Aurabindo Pillai, dri-devel, Alex Deucher,
	Thomas Gleixner, Christian König

On 2023-10-03 15:53:41 [-0400], Harry Wentland wrote:
> On 2023-09-21 10:15, Sebastian Andrzej Siewior wrote:
> > This is a revert of the commit mentioned below while it is not wrong, as
> > in the kernel will explode, having migrate_disable() here it is
> > complete waste of resources.
> > 
> > Additionally commit message is plain wrong the review tag does not make
> 
> Not sure I follow what's unhelpful about the review tag with
> 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable")

I explained it below with two points what the reviewer should have
noticed why reading the commit message even if he does not know what
migrate_disable() itself does.

> I do wish the original patch showed the splat it's attempting
> to fix. It apparently made a difference for something, whether
> inadvertently or not. I wish I knew what that "something" was.

As far as I can tell the patch does make a difference.

> Harry

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().
  2023-10-03 19:53   ` Harry Wentland
  2023-10-04  8:53     ` Sebastian Andrzej Siewior
@ 2023-10-04 12:10     ` Hamza Mahfooz
  2023-10-05 12:10       ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 18+ messages in thread
From: Hamza Mahfooz @ 2023-10-04 12:10 UTC (permalink / raw)
  To: Harry Wentland, Sebastian Andrzej Siewior, dri-devel, amd-gfx
  Cc: Tianci Yin, Leo Li, Pan, Xinhui, Rodrigo Siqueira,
	Peter Zijlstra, Aurabindo Pillai, Alex Deucher, Thomas Gleixner,
	Christian König

On 10/3/23 15:53, Harry Wentland wrote:
> On 2023-09-21 10:15, Sebastian Andrzej Siewior wrote:
>> This is a revert of the commit mentioned below while it is not wrong, as
>> in the kernel will explode, having migrate_disable() here it is
>> complete waste of resources.
>>
>> Additionally commit message is plain wrong the review tag does not make
> 
> Not sure I follow what's unhelpful about the review tag with
> 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable")
> 
> I do wish the original patch showed the splat it's attempting
> to fix. It apparently made a difference for something, whether
> inadvertently or not. I wish I knew what that "something" was.

I did some digging, and it seems like the intention of that patch was to
fix the following splat:

WARNING: CPU: 5 PID: 1062 at 
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/dc_fpu.c:71 
dc_assert_fp_enabled+0x1a/0x30 [amdgpu]
[...]
CPU: 5 PID: 1062 Comm: Xorg Tainted: G           OE 
5.15.0-56-generic #62-Ubuntu
Hardware name: ASUS System Product Name/ROG STRIX Z590-F GAMING WIFI, 
BIOS 1202 10/27/2021
RIP: 0010:dc_assert_fp_enabled+0x1a/0x30 [amdgpu]
Code: ff 45 31 f6 0f 0b e9 ca fe ff ff e8 90 1c 1f f7 48 c7 c0 00 30 03 
00 65 48 03 05 b1 aa 86 3f 8b 00 85 c0 7e 05 c3 cc cc cc cc <0f> 0b c3 
cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
RSP: 0000:ffffb89b82a8f118 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8c271cd00000 RCX: 0000000000000000
RDX: ffff8c2708025000 RSI: ffff8c270e8c0000 RDI: ffff8c271cd00000
RBP: ffffb89b82a8f1d0 R08: 0000000000000000 R09: 7fffff6affffffff
R10: ffffb89b82a8f240 R11: 0000000000000000 R12: 0000000000000002
R13: ffff8c271cd00000 R14: ffff8c270e8c0000 R15: ffff8c2708025000
FS:  00007f0570019a80(0000) GS:ffff8c2e3fb40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005594858a0058 CR3: 000000010e204001 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
  <TASK>
  ? dcn20_populate_dml_pipes_from_context+0x47/0x1730 [amdgpu]
  ? __kmalloc_node+0x2cb/0x3a0
  dcn32_populate_dml_pipes_from_context+0x2b/0x450 [amdgpu]
  dcn32_internal_validate_bw+0x15f9/0x2670 [amdgpu]
  dcn32_find_dummy_latency_index_for_fw_based_mclk_switch+0xd0/0x310 
[amdgpu]
  dcn32_calculate_wm_and_dlg_fpu+0xe6/0x1e50 [amdgpu]
  dcn32_calculate_wm_and_dlg+0x46/0x70 [amdgpu]
  dcn32_validate_bandwidth+0x1b7/0x3e0 [amdgpu]
  dc_validate_global_state+0x32c/0x560 [amdgpu]
  dc_validate_with_context+0x6e6/0xd80 [amdgpu]
  dc_commit_streams+0x21b/0x500 [amdgpu]
  dc_commit_state+0xf3/0x150 [amdgpu]
  amdgpu_dm_atomic_commit_tail+0x60d/0x3120 [amdgpu]
  ? dcn32_internal_validate_bw+0xcf8/0x2670 [amdgpu]
  ? fill_plane_buffer_attributes+0x1e5/0x560 [amdgpu]
  ? dcn32_validate_bandwidth+0x1e0/0x3e0 [amdgpu]
  ? kfree+0x1f7/0x250
  ? dcn32_validate_bandwidth+0x1e0/0x3e0 [amdgpu]
  ? dc_validate_global_state+0x32c/0x560 [amdgpu]
  ? __cond_resched+0x1a/0x50
  ? __wait_for_common+0x3e/0x150
  ? fill_plane_buffer_attributes+0x1e5/0x560 [amdgpu]
  ? usleep_range_state+0x90/0x90
  ? wait_for_completion_timeout+0x1d/0x30
  commit_tail+0xc2/0x170 [drm_kms_helper]
  ? drm_atomic_helper_swap_state+0x20f/0x370 [drm_kms_helper]
  drm_atomic_helper_commit+0x12b/0x150 [drm_kms_helper]
  amdgpu_dm_atomic_commit+0x11/0x20 [amdgpu]
  drm_atomic_commit+0x47/0x60 [drm]
  drm_mode_obj_set_property_ioctl+0x16b/0x420 [drm]
  ? mutex_lock+0x13/0x50
  ? drm_mode_createblob_ioctl+0xf6/0x130 [drm]
  ? drm_mode_obj_find_prop_id+0x90/0x90 [drm]
  drm_ioctl_kernel+0xb0/0x100 [drm]
  drm_ioctl+0x268/0x4b0 [drm]
  ? drm_mode_obj_find_prop_id+0x90/0x90 [drm]
  ? ktime_get_mono_fast_ns+0x4a/0xa0
  amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
  __x64_sys_ioctl+0x92/0xd0
  do_syscall_64+0x59/0xc0
  ? do_user_addr_fault+0x1e7/0x670
  ? do_syscall_64+0x69/0xc0
  ? exit_to_user_mode_prepare+0x37/0xb0
  ? irqentry_exit_to_user_mode+0x9/0x20
  ? irqentry_exit+0x1d/0x30
  ? exc_page_fault+0x89/0x170
  entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f05704a2aff
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 
44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 
3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
RSP: 002b:00007ffc8c45a3f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffc8c45a480 RCX: 00007f05704a2aff
RDX: 00007ffc8c45a480 RSI: 00000000c01864ba RDI: 000000000000000e
RBP: 00000000c01864ba R08: 0000000000000077 R09: 00000000cccccccc
R10: 00007f05705a22f0 R11: 0000000000000246 R12: 0000000000000004
R13: 000000000000000e R14: 000000000000000f R15: 00007ffc8c45a8a8
  </TASK>
---[ end trace 4deab30bb69df00f ]---

> 
> Harry
> 
>> it any better. The migrate_disable() interface has a fat comment
>> describing it and it includes the word "undesired" in the headline which
>> should tickle people to read it before using it.
>> Initially I assumed it is worded too harsh but now I beg to differ.
>>
>> The reviewer of the original commit, even not understanding what
>> migrate_disable() does should ask the following:
>>
>> - migrate_disable() is added only to the CONFIG_X86 block and it claims
>>    to protect fpu_recursion_depth. Why are the other the architectures
>>    excluded?
>>
>> - migrate_disable() is added after fpu_recursion_depth was modified.
>>    Shouldn't it be added before the modification or referencing takes
>>    place?
>>
>> Moving on.
>> Disabling preemption DOES prevent CPU migration. A task, that can not be
>> pushed away from the CPU by the scheduler (due to disabled preemption)
>> can not be pushed or migrated to another CPU.
>>
>> Disabling migration DOES NOT ensure consistency of per-CPU variables. It
>> only ensures that the task acts always on the same per-CPU variable. The
>> task remains preemptible meaning multiple tasks can access the same
>> per-CPU variable. This in turn leads to inconsistency for the statement
>>
>>                    *pcpu -= 1;
>>
>> with two tasks on one CPU and a preemption point during the RMW
>> operation:
>>
>>       Task A                           Task B
>>       read pcpu to reg  # 0
>>       inc reg           # 0 -> 1
>>                                        read pcpu to reg  # 0
>>                                        inc reg           # 0 -> 1
>>                                        write reg to pcpu # 1
>>       write reg to pcpu # 1
>>
>> At the end pcpu reads 1 but should read 2 instead. Boom.
>>
>> get_cpu_ptr() already contains a preempt_disable() statement. That means
>> that the per-CPU variable can only be referenced by a single task which
>> is currently running. The only inconsistency that can occur if the
>> variable is additionally accessed from an interrupt.
>>
>> Remove migrate_disable/enable() from dc_fpu_begin/end().
>>
>> Cc: Tianci Yin <tianci.yin@amd.com>
>> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
>> Fixes: 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable")
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> ---
>>   drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
>> index 172aa10a8800f..86f4c0e046548 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
>> @@ -91,7 +91,6 @@ void dc_fpu_begin(const char *function_name, const int line)
>>   
>>   	if (*pcpu == 1) {
>>   #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>> -		migrate_disable();
>>   		kernel_fpu_begin();
>>   #elif defined(CONFIG_PPC64)
>>   		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
>> @@ -132,7 +131,6 @@ void dc_fpu_end(const char *function_name, const int line)
>>   	if (*pcpu <= 0) {
>>   #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>>   		kernel_fpu_end();
>> -		migrate_enable();
>>   #elif defined(CONFIG_PPC64)
>>   		if (cpu_has_feature(CPU_FTR_VSX_COMP)) {
>>   			disable_kernel_vsx();
> 
-- 
Hamza


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
  2023-10-03 19:54     ` Harry Wentland
  2023-10-04  8:49       ` Sebastian Andrzej Siewior
@ 2023-10-04 12:44       ` Harry Wentland
  2023-10-05 12:11         ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 18+ messages in thread
From: Harry Wentland @ 2023-10-04 12:44 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Christian König
  Cc: Leo Li, Pan, Xinhui, Rodrigo Siqueira, dri-devel, Peter Zijlstra,
	amd-gfx, Alex Deucher, Thomas Gleixner, Christian König



On 2023-10-03 15:54, Harry Wentland wrote:
> On 2023-10-02 06:58, Sebastian Andrzej Siewior wrote:
>> On 2023-09-22 07:33:26 [+0200], Christian König wrote:
>>> Am 21.09.23 um 16:15 schrieb Sebastian Andrzej Siewior:
>>>> Hi,
>>>>
>>>> I stumbled uppon the amdgpu driver via a bugzilla report. The actual fix
>>>> is #4 + #5 and the rest was made while looking at the code.
>>>
>>> Oh, yes please :)
>>>
>>> Rodrigo and I have been trying to sort those things out previously, but
>>> that's Sisyphean work.
>>>
>>> In general the DC team needs to judge, but of hand it looks good to me.
>>
>> Any way to get this merged? There was no reply from the DC team… No
>> reply from the person breaking it either. The bugzilla reporter stated
>> that it solves his trouble. He didn't report anything new ;)
>>
> 
> Apologies for the slow progress. We're feeding it through our CI and
> will let you know the verdict soon.
> 
> Do you happen to have the bugzilla link that this is fixing? It would
> be helpful to include that as a link in the patches as well, to give
> them context.
> 

CI passed.

Series is
Acked-by: Harry Wentland <harry.wentland@amd.com>

Harry

> Harry
> 
>>> Christian.
>>
>> Sebastian
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
  2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2023-09-22  5:33 ` [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Christian König
@ 2023-10-04 21:00 ` Rodrigo Siqueira Jordao
  2023-10-04 21:13 ` Hamza Mahfooz
  7 siblings, 0 replies; 18+ messages in thread
From: Rodrigo Siqueira Jordao @ 2023-10-04 21:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, dri-devel, amd-gfx
  Cc: Leo Li, Pan, Xinhui, Peter Zijlstra, Alex Deucher,
	Thomas Gleixner, Christian König



On 9/21/23 08:15, Sebastian Andrzej Siewior wrote:
> Hi,
> 
> I stumbled uppon the amdgpu driver via a bugzilla report. The actual fix
> is #4 + #5 and the rest was made while looking at the code.
> 
> Sebastian
> 
> 

Hi Sebastian,

Thanks a lot for this patchset. We tested it on multiple devices, and 
everything looks good. I also reviewed it and lgtm.

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>

Thanks
Siqueira

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
  2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2023-10-04 21:00 ` Rodrigo Siqueira Jordao
@ 2023-10-04 21:13 ` Hamza Mahfooz
  7 siblings, 0 replies; 18+ messages in thread
From: Hamza Mahfooz @ 2023-10-04 21:13 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, dri-devel, amd-gfx
  Cc: Leo Li, Pan, Xinhui, Rodrigo Siqueira, Peter Zijlstra,
	Alex Deucher, Thomas Gleixner, Christian König

On 9/21/23 10:15, Sebastian Andrzej Siewior wrote:
> Hi,
> 
> I stumbled uppon the amdgpu driver via a bugzilla report. The actual fix
> is #4 + #5 and the rest was made while looking at the code.
> 
> Sebastian

I have applied the series, thanks!

> 
> 
-- 
Hamza


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin().
  2023-10-04 12:10     ` Hamza Mahfooz
@ 2023-10-05 12:10       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-05 12:10 UTC (permalink / raw)
  To: Hamza Mahfooz
  Cc: Tianci Yin, Leo Li, Pan, Xinhui, Rodrigo Siqueira, amd-gfx,
	Peter Zijlstra, Aurabindo Pillai, dri-devel, Alex Deucher,
	Thomas Gleixner, Christian König

On 2023-10-04 08:10:35 [-0400], Hamza Mahfooz wrote:
> I did some digging, and it seems like the intention of that patch was to
> fix the following splat:
> 
> WARNING: CPU: 5 PID: 1062 at
> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/dc_fpu.c:71
> dc_assert_fp_enabled+0x1a/0x30 [amdgpu]
> [...]
> CPU: 5 PID: 1062 Comm: Xorg Tainted: G           OE 5.15.0-56-generic

So it has hard to look this up with a upstream v5.15 kernel since the
dcn32_populate_dml_pipes_from_context() was introduced in v6.0-rc1.
Judging from v6.0-rc1 I don't see how that warning could occur other
than using dc_assert_fp_enabled() without invoking DC_FP_START first.

> Hamza

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation.
  2023-10-04 12:44       ` Harry Wentland
@ 2023-10-05 12:11         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-10-05 12:11 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Leo Li, Christian König, Pan, Xinhui, Rodrigo Siqueira,
	amd-gfx, Peter Zijlstra, dri-devel, Alex Deucher,
	Thomas Gleixner, Christian König

On 2023-10-04 08:44:58 [-0400], Harry Wentland wrote:
> CI passed.
> 
> Series is
> Acked-by: Harry Wentland <harry.wentland@amd.com>

Thank you.

> Harry

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-10-05 12:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-21 14:15 [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Sebastian Andrzej Siewior
2023-09-21 14:15 ` [PATCH 1/5] drm/amd/display: Remove migrate_en/dis from dc_fpu_begin() Sebastian Andrzej Siewior
2023-10-03 19:53   ` Harry Wentland
2023-10-04  8:53     ` Sebastian Andrzej Siewior
2023-10-04 12:10     ` Hamza Mahfooz
2023-10-05 12:10       ` Sebastian Andrzej Siewior
2023-09-21 14:15 ` [PATCH 2/5] drm/amd/display: Simplify the per-CPU usage Sebastian Andrzej Siewior
2023-09-21 14:15 ` [PATCH 3/5] drm/amd/display: Add a warning if the FPU is used outside from task context Sebastian Andrzej Siewior
2023-09-21 14:15 ` [PATCH 4/5] drm/amd/display: Move the memory allocation out of dcn21_validate_bandwidth_fp() Sebastian Andrzej Siewior
2023-09-21 14:15 ` [PATCH 5/5] drm/amd/display: Move the memory allocation out of dcn20_validate_bandwidth_fp() Sebastian Andrzej Siewior
2023-09-22  5:33 ` [PATCH 0/5] drm/amd/display: Remove migrate-disable and move memory allocation Christian König
2023-10-02 10:58   ` Sebastian Andrzej Siewior
2023-10-03 19:54     ` Harry Wentland
2023-10-04  8:49       ` Sebastian Andrzej Siewior
2023-10-04 12:44       ` Harry Wentland
2023-10-05 12:11         ` Sebastian Andrzej Siewior
2023-10-04 21:00 ` Rodrigo Siqueira Jordao
2023-10-04 21:13 ` Hamza Mahfooz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).