* [PATCH v8 0/5] DRM scheduler changes for Xe
From: Matthew Brost @ 2023-10-31  3:24 UTC
  To: dri-devel, intel-xe
  Cc: robdclark, thomas.hellstrom, Matthew Brost, sarah.walker,
	ltuikov, ketil.johnsen, Liviu.Dudau, mcanal, boris.brezillon,
	dakr, donald.robson, lina, christian.koenig, faith.ekstrand

As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
have been asked to merge our common DRM scheduler patches first.

This is a continuation of an RFC [3] with all comments addressed, ready
for a full review, and hopefully in a state which can be merged in the
near future. More details on this series can be found in the cover
letter of the RFC [3].

These changes have been tested with the Xe driver and are based on the drm-tip branch.

A follow-up series will be posted to address some of dakr's requests
for kernel doc changes.

v2:
 - Break run job, free job, and process message into their own work items
 - This might break other drivers, as run job and free job can now run in
   parallel; can fix up if needed

v3:
 - Include missing patch 'drm/sched: Add drm_sched_submit_* helpers'
 - Fix issue with setting timestamp too early
 - Don't dequeue jobs for single entity after calling entity fini
 - Flush pending jobs on entity fini
 - Add documentation for entity teardown
 - Add Matthew Brost to maintainers of DRM scheduler

v4:
 - Drop message interface
 - Drop 'Flush pending jobs on entity fini'
 - Drop 'Add documentation for entity teardown'
 - Address all feedback

v5:
 - Address Luben's feedback
 - Drop starting TDR after calling run_job()
 - Drop adding Matthew Brost to maintainers of DRM scheduler

v6:
 - Address Luben's feedback
 - Include base commit

v7:
 - Drop SINGLE_ENTITY mode; instead, pull in Luben's patch for dynamic run queues
 - Address Luben's feedback for free_job work item patch

v8:
 - Rebase on drm-tip which includes Luben's patch for dynamic run queues
 - Don't adjust comments or change variable/function names twice in the series
 - Don't move existing code to different places in a file to preserve git history

Matt

[1] https://gitlab.freedesktop.org/drm/xe/kernel
[2] https://patchwork.freedesktop.org/series/112188/
[3] https://patchwork.freedesktop.org/series/116055/

Matthew Brost (5):
  drm/sched: Add drm_sched_wqueue_* helpers
  drm/sched: Convert drm scheduler to use a work queue rather than
    kthread
  drm/sched: Split free_job into own work item
  drm/sched: Add drm_sched_start_timeout_unlocked helper
  drm/sched: Add a helper to queue TDR immediately

 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  15 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  14 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c       |   2 +-
 drivers/gpu/drm/lima/lima_sched.c             |   2 +-
 drivers/gpu/drm/msm/adreno/adreno_device.c    |   6 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c          |   2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c       |   2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c       |   2 +-
 drivers/gpu/drm/scheduler/sched_main.c        | 301 ++++++++++++------
 drivers/gpu/drm/v3d/v3d_sched.c               |  10 +-
 include/drm/gpu_scheduler.h                   |  20 +-
 12 files changed, 248 insertions(+), 130 deletions(-)


base-commit: b560681c6bf623db41064ac486dd148d6c103e53
-- 
2.34.1


* [PATCH v8 1/5] drm/sched: Add drm_sched_wqueue_* helpers
From: Matthew Brost @ 2023-10-31  3:24 UTC
  To: dri-devel, intel-xe
  Cc: robdclark, thomas.hellstrom, Matthew Brost, sarah.walker,
	Luben Tuikov, ltuikov, ketil.johnsen, Liviu.Dudau, mcanal,
	boris.brezillon, dakr, donald.robson, lina, christian.koenig,
	faith.ekstrand

Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.
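
For reference, a minimal sketch of how a driver reset path might use
these helpers (my_device_reset() and the ring variable are hypothetical;
the amdgpu and msm hunks below are the real call sites):

	/* Nothing to pause if the scheduler was never set up. */
	if (!drm_sched_wqueue_ready(&ring->sched))
		return;

	/* Pause scheduler submission before touching hardware state. */
	drm_sched_wqueue_stop(&ring->sched);

	my_device_reset(dev);	/* driver-specific reset, assumed */

	/* Resume scheduler submission once the hardware is back. */
	drm_sched_wqueue_start(&ring->sched);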

v2:
  - s/sched_submit/sched_wqueue (Luben)
  - Remove the extra blank line after the return statement (Luben)
  - Update the drm_sched_wqueue_ready comment (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 15 +++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 12 +++---
 drivers/gpu/drm/msm/adreno/adreno_device.c    |  6 ++-
 drivers/gpu/drm/scheduler/sched_main.c        | 39 ++++++++++++++++++-
 include/drm/gpu_scheduler.h                   |  3 ++
 6 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 625db444df1c..10d56979fe3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -290,7 +290,7 @@ static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool sus
 	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
 		struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
 
-		if (!(ring && ring->sched.thread))
+		if (!(ring && drm_sched_wqueue_ready(&ring->sched)))
 			continue;
 
 		/* stop secheduler and drain ring. */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 3136a0774dd9..e20fd9e6c5bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1659,9 +1659,9 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
-		kthread_park(ring->sched.thread);
+		drm_sched_wqueue_stop(&ring->sched);
 	}
 
 	seq_puts(m, "run ib test:\n");
@@ -1675,9 +1675,9 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
 	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
-		kthread_unpark(ring->sched.thread);
+		drm_sched_wqueue_start(&ring->sched);
 	}
 
 	up_write(&adev->reset_domain->sem);
@@ -1897,7 +1897,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 
 	ring = adev->rings[val];
 
-	if (!ring || !ring->funcs->preempt_ib || !ring->sched.thread)
+	if (!ring || !ring->funcs->preempt_ib ||
+	    !drm_sched_wqueue_ready(&ring->sched))
 		return -EINVAL;
 
 	/* the last preemption failed */
@@ -1915,7 +1916,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 		goto pro_end;
 
 	/* stop the scheduler */
-	kthread_park(ring->sched.thread);
+	drm_sched_wqueue_stop(&ring->sched);
 
 	/* preempt the IB */
 	r = amdgpu_ring_preempt_ib(ring);
@@ -1949,7 +1950,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 
 failure:
 	/* restart the scheduler */
-	kthread_unpark(ring->sched.thread);
+	drm_sched_wqueue_start(&ring->sched);
 
 	up_read(&adev->reset_domain->sem);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 186c06756a2c..d20c12aae66b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4861,7 +4861,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
 
 		spin_lock(&ring->sched.job_list_lock);
@@ -5000,7 +5000,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
 
 		/* Clear job fence from fence drv to avoid force_completion
@@ -5489,7 +5489,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = tmp_adev->rings[i];
 
-			if (!ring || !ring->sched.thread)
+			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 				continue;
 
 			drm_sched_stop(&ring->sched, job ? &job->base : NULL);
@@ -5565,7 +5565,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = tmp_adev->rings[i];
 
-			if (!ring || !ring->sched.thread)
+			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 				continue;
 
 			drm_sched_start(&ring->sched, true);
@@ -5892,7 +5892,7 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
 		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 			struct amdgpu_ring *ring = adev->rings[i];
 
-			if (!ring || !ring->sched.thread)
+			if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 				continue;
 
 			drm_sched_stop(&ring->sched, NULL);
@@ -6020,7 +6020,7 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
 
-		if (!ring || !ring->sched.thread)
+		if (!ring || !drm_sched_wqueue_ready(&ring->sched))
 			continue;
 
 		drm_sched_start(&ring->sched, true);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 41b13dec9bef..f62ab5257e66 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -841,7 +841,8 @@ static void suspend_scheduler(struct msm_gpu *gpu)
 	 */
 	for (i = 0; i < gpu->nr_rings; i++) {
 		struct drm_gpu_scheduler *sched = &gpu->rb[i]->sched;
-		kthread_park(sched->thread);
+
+		drm_sched_wqueue_stop(sched);
 	}
 }
 
@@ -851,7 +852,8 @@ static void resume_scheduler(struct msm_gpu *gpu)
 
 	for (i = 0; i < gpu->nr_rings; i++) {
 		struct drm_gpu_scheduler *sched = &gpu->rb[i]->sched;
-		kthread_unpark(sched->thread);
+
+		drm_sched_wqueue_start(sched);
 	}
 }
 
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 99797a8c836a..54c1c5fe01ba 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -439,7 +439,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 {
 	struct drm_sched_job *s_job, *tmp;
 
-	kthread_park(sched->thread);
+	drm_sched_wqueue_stop(sched);
 
 	/*
 	 * Reinsert back the bad job here - now it's safe as
@@ -552,7 +552,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 		spin_unlock(&sched->job_list_lock);
 	}
 
-	kthread_unpark(sched->thread);
+	drm_sched_wqueue_start(sched);
 }
 EXPORT_SYMBOL(drm_sched_start);
 
@@ -1252,3 +1252,38 @@ void drm_sched_increase_karma(struct drm_sched_job *bad)
 	}
 }
 EXPORT_SYMBOL(drm_sched_increase_karma);
+
+/**
+ * drm_sched_wqueue_ready - Is the scheduler ready for submission
+ *
+ * @sched: scheduler instance
+ *
+ * Returns true if submission is ready
+ */
+bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
+{
+	return !!sched->thread;
+}
+EXPORT_SYMBOL(drm_sched_wqueue_ready);
+
+/**
+ * drm_sched_wqueue_stop - stop scheduler submission
+ *
+ * @sched: scheduler instance
+ */
+void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
+{
+	kthread_park(sched->thread);
+}
+EXPORT_SYMBOL(drm_sched_wqueue_stop);
+
+/**
+ * drm_sched_wqueue_start - start scheduler submission
+ *
+ * @sched: scheduler instance
+ */
+void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
+{
+	kthread_unpark(sched->thread);
+}
+EXPORT_SYMBOL(drm_sched_wqueue_start);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d2fb81e34174..1d5a20af4a06 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -552,6 +552,9 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 
 void drm_sched_job_cleanup(struct drm_sched_job *job);
 void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
+bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
+void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
+void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);
 void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
 void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
 void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
-- 
2.34.1


* [PATCH v8 2/5] drm/sched: Convert drm scheduler to use a work queue rather than kthread
From: Matthew Brost @ 2023-10-31  3:24 UTC
  To: dri-devel, intel-xe
  Cc: robdclark, thomas.hellstrom, Matthew Brost, sarah.walker,
	Luben Tuikov, ltuikov, ketil.johnsen, Liviu.Dudau, mcanal,
	boris.brezillon, dakr, donald.robson, lina, christian.koenig,
	faith.ekstrand

In Xe, the new Intel GPU driver, a choice has been made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd, but let us explain the reasoning below.

1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to match the completion order, even when targeting the same
hardware engine. This is because in Xe we have a firmware scheduler, the
GuC, which is allowed to reorder, timeslice, and preempt submissions. If
a drm_gpu_scheduler is shared across multiple drm_sched_entity, the TDR
falls apart, as the TDR expects submission order == completion order.
Using a dedicated drm_gpu_scheduler per drm_sched_entity solves this
problem.

2. In Xe, submissions are done by programming a ring buffer (circular
buffer), and a drm_gpu_scheduler provides a limit on the number of
in-flight jobs. If that limit is set to RING_SIZE / MAX_SIZE_PER_JOB, we
get flow control on the ring for free.

A problem with this design is that currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale when a large
number of drm_gpu_scheduler instances are used. To work around the
scaling issue, use a worker rather than a kthread for submission / job
cleanup.
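
As an illustration, here is a minimal sketch of the resulting 1 to 1
setup (RING_SIZE, MAX_SIZE_PER_JOB, and the xe_* / q names are
hypothetical; only the new submit_wq parameter of drm_sched_init() in
the diff below is real). Passing NULL for submit_wq instead makes the
scheduler allocate its own ordered workqueue:

	/* One driver-owned workqueue shared by many 1:1 schedulers,
	 * instead of one kthread per scheduler.
	 */
	struct workqueue_struct *submit_wq =
		alloc_ordered_workqueue("xe-sched", 0);

	/* One scheduler per entity; a job limit of
	 * RING_SIZE / MAX_SIZE_PER_JOB gives ring flow control for free.
	 */
	err = drm_sched_init(&q->sched, &xe_sched_ops, submit_wq,
			     DRM_SCHED_PRIORITY_COUNT,
			     RING_SIZE / MAX_SIZE_PER_JOB, 0,
			     timeout, NULL, NULL, q->name, dev);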

v2:
  - (Rob Clark) Fix msm build
  - Pass in run work queue
v3:
  - (Boris) don't have loop in worker
v4:
  - (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
  - (Boris) default to ordered work queue
v6:
  - (Luben / checkpatch) fix alignment in msm_ringbuffer.c
  - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
  - (Luben) Update comment for drm_sched_wqueue_enqueue
  - (Luben) Positive check for submit_wq in drm_sched_init
  - (Luben) s/alloc_submit_wq/own_submit_wq
v7:
  - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
  - (Luben) Adjust var names / comments

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   2 +-
 drivers/gpu/drm/lima/lima_sched.c          |   2 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c       |   2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c    |   2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c    |   2 +-
 drivers/gpu/drm/scheduler/sched_main.c     | 131 +++++++++++----------
 drivers/gpu/drm/v3d/v3d_sched.c            |  10 +-
 include/drm/gpu_scheduler.h                |  14 ++-
 9 files changed, 86 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d20c12aae66b..f493ffa1feec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2491,7 +2491,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
 			break;
 		}
 
-		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL,
 				   DRM_SCHED_PRIORITY_COUNT,
 				   ring->num_hw_submission, 0,
 				   timeout, adev->reset_domain->wq,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 9b79f218e21a..c4b04b0dee16 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 {
 	int ret;
 
-	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
+	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
 			     msecs_to_jiffies(500), NULL, NULL,
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 295f0353a02e..aa030e1f7cda 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
 
 	INIT_WORK(&pipe->recover_work, lima_sched_recover_work);
 
-	return drm_sched_init(&pipe->base, &lima_sched_ops,
+	return drm_sched_init(&pipe->base, &lima_sched_ops, NULL,
 			      DRM_SCHED_PRIORITY_COUNT,
 			      1,
 			      lima_job_hang_limit,
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 95257ab0185d..4968568e3b54 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
 	 /* currently managing hangcheck ourselves: */
 	sched_timeout = MAX_SCHEDULE_TIMEOUT;
 
-	ret = drm_sched_init(&ring->sched, &msm_sched_ops,
+	ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     num_hw_submissions, 0, sched_timeout,
 			     NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 7c376c4ccdcf..c4ba56b1a6dd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -435,7 +435,7 @@ int nouveau_sched_init(struct nouveau_drm *drm)
 	if (!drm->sched_wq)
 		return -ENOMEM;
 
-	return drm_sched_init(sched, &nouveau_sched_ops,
+	return drm_sched_init(sched, &nouveau_sched_ops, NULL,
 			      DRM_SCHED_PRIORITY_COUNT,
 			      NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit,
 			      NULL, NULL, "nouveau_sched", drm->dev->dev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index ecd2e035147f..6d89e24322db 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -852,7 +852,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 		js->queue[j].fence_context = dma_fence_context_alloc(1);
 
 		ret = drm_sched_init(&js->queue[j].sched,
-				     &panfrost_sched_ops,
+				     &panfrost_sched_ops, NULL,
 				     DRM_SCHED_PRIORITY_COUNT,
 				     nentries, 0,
 				     msecs_to_jiffies(JOB_TIMEOUT_MS),
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 54c1c5fe01ba..d1ae05bded15 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -48,7 +48,6 @@
  * through the jobs entity pointer.
  */
 
-#include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/completion.h>
@@ -256,6 +255,16 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
 	return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
 }
 
+/**
+ * drm_sched_run_job_queue - enqueue run-job work
+ * @sched: scheduler instance
+ */
+static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
+{
+	if (!READ_ONCE(sched->pause_submit))
+		queue_work(sched->submit_wq, &sched->work_run_job);
+}
+
 /**
  * drm_sched_job_done - complete a job
  * @s_job: pointer to the job which is done
@@ -275,7 +284,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
 	dma_fence_get(&s_fence->finished);
 	drm_sched_fence_finished(s_fence, result);
 	dma_fence_put(&s_fence->finished);
-	wake_up_interruptible(&sched->wake_up_worker);
+	drm_sched_run_job_queue(sched);
 }
 
 /**
@@ -874,7 +883,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
 void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
 {
 	if (drm_sched_can_queue(sched))
-		wake_up_interruptible(&sched->wake_up_worker);
+		drm_sched_run_job_queue(sched);
 }
 
 /**
@@ -985,60 +994,41 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 EXPORT_SYMBOL(drm_sched_pick_best);
 
 /**
- * drm_sched_blocked - check if the scheduler is blocked
+ * drm_sched_run_job_work - main scheduler thread
  *
- * @sched: scheduler instance
- *
- * Returns true if blocked, otherwise false.
+ * @w: run job work
  */
-static bool drm_sched_blocked(struct drm_gpu_scheduler *sched)
+static void drm_sched_run_job_work(struct work_struct *w)
 {
-	if (kthread_should_park()) {
-		kthread_parkme();
-		return true;
-	}
-
-	return false;
-}
-
-/**
- * drm_sched_main - main scheduler thread
- *
- * @param: scheduler instance
- *
- * Returns 0.
- */
-static int drm_sched_main(void *param)
-{
-	struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
+	struct drm_gpu_scheduler *sched =
+		container_of(w, struct drm_gpu_scheduler, work_run_job);
+	struct drm_sched_entity *entity;
+	struct drm_sched_job *cleanup_job;
 	int r;
 
-	sched_set_fifo_low(current);
+	if (READ_ONCE(sched->pause_submit))
+		return;
 
-	while (!kthread_should_stop()) {
-		struct drm_sched_entity *entity = NULL;
-		struct drm_sched_fence *s_fence;
-		struct drm_sched_job *sched_job;
-		struct dma_fence *fence;
-		struct drm_sched_job *cleanup_job = NULL;
+	cleanup_job = drm_sched_get_cleanup_job(sched);
+	entity = drm_sched_select_entity(sched);
 
-		wait_event_interruptible(sched->wake_up_worker,
-					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
-					 (!drm_sched_blocked(sched) &&
-					  (entity = drm_sched_select_entity(sched))) ||
-					 kthread_should_stop());
+	if (!entity && !cleanup_job)
+		return;	/* No more work */
 
-		if (cleanup_job)
-			sched->ops->free_job(cleanup_job);
+	if (cleanup_job)
+		sched->ops->free_job(cleanup_job);
 
-		if (!entity)
-			continue;
+	if (entity) {
+		struct dma_fence *fence;
+		struct drm_sched_fence *s_fence;
+		struct drm_sched_job *sched_job;
 
 		sched_job = drm_sched_entity_pop_job(entity);
-
 		if (!sched_job) {
 			complete_all(&entity->entity_idle);
-			continue;
+			if (!cleanup_job)
+				return;	/* No more work */
+			goto again;
 		}
 
 		s_fence = sched_job->s_fence;
@@ -1069,7 +1059,9 @@ static int drm_sched_main(void *param)
 
 		wake_up(&sched->job_scheduled);
 	}
-	return 0;
+
+again:
+	drm_sched_run_job_queue(sched);
 }
 
 /**
@@ -1077,6 +1069,8 @@ static int drm_sched_main(void *param)
  *
  * @sched: scheduler instance
  * @ops: backend operations for this scheduler
+ * @submit_wq: workqueue to use for submission. If NULL, an ordered wq is
+ *	       allocated and used
  * @num_rqs: number of runqueues, one for each priority, up to DRM_SCHED_PRIORITY_COUNT
  * @hw_submission: number of hw submissions that can be in flight
  * @hang_limit: number of times to allow a job to hang before dropping it
@@ -1091,6 +1085,7 @@ static int drm_sched_main(void *param)
  */
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
+		   struct workqueue_struct *submit_wq,
 		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev)
@@ -1121,14 +1116,22 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		return 0;
 	}
 
+	if (submit_wq) {
+		sched->submit_wq = submit_wq;
+		sched->own_submit_wq = false;
+	} else {
+		sched->submit_wq = alloc_ordered_workqueue(name, 0);
+		if (!sched->submit_wq)
+			return -ENOMEM;
+
+		sched->own_submit_wq = true;
+	}
+	ret = -ENOMEM;
 	sched->sched_rq = kmalloc_array(num_rqs, sizeof(*sched->sched_rq),
 					GFP_KERNEL | __GFP_ZERO);
-	if (!sched->sched_rq) {
-		drm_err(sched, "%s: out of memory for sched_rq\n", __func__);
-		return -ENOMEM;
-	}
+	if (!sched->sched_rq)
+		goto Out_free;
 	sched->num_rqs = num_rqs;
-	ret = -ENOMEM;
 	for (i = DRM_SCHED_PRIORITY_MIN; i < sched->num_rqs; i++) {
 		sched->sched_rq[i] = kzalloc(sizeof(*sched->sched_rq[i]), GFP_KERNEL);
 		if (!sched->sched_rq[i])
@@ -1136,31 +1139,26 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		drm_sched_rq_init(sched, sched->sched_rq[i]);
 	}
 
-	init_waitqueue_head(&sched->wake_up_worker);
 	init_waitqueue_head(&sched->job_scheduled);
 	INIT_LIST_HEAD(&sched->pending_list);
 	spin_lock_init(&sched->job_list_lock);
 	atomic_set(&sched->hw_rq_count, 0);
 	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
+	INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
 	atomic_set(&sched->_score, 0);
 	atomic64_set(&sched->job_id_count, 0);
-
-	/* Each scheduler will run on a seperate kernel thread */
-	sched->thread = kthread_run(drm_sched_main, sched, sched->name);
-	if (IS_ERR(sched->thread)) {
-		ret = PTR_ERR(sched->thread);
-		sched->thread = NULL;
-		DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for %s.\n", name);
-		goto Out_unroll;
-	}
+	sched->pause_submit = false;
 
 	sched->ready = true;
 	return 0;
 Out_unroll:
 	for (--i ; i >= DRM_SCHED_PRIORITY_MIN; i--)
 		kfree(sched->sched_rq[i]);
+Out_free:
 	kfree(sched->sched_rq);
 	sched->sched_rq = NULL;
+	if (sched->own_submit_wq)
+		destroy_workqueue(sched->submit_wq);
 	drm_err(sched, "%s: Failed to setup GPU scheduler--out of memory\n", __func__);
 	return ret;
 }
@@ -1178,8 +1176,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	struct drm_sched_entity *s_entity;
 	int i;
 
-	if (sched->thread)
-		kthread_stop(sched->thread);
+	drm_sched_wqueue_stop(sched);
 
 	for (i = sched->num_rqs - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
 		struct drm_sched_rq *rq = sched->sched_rq[i];
@@ -1202,6 +1199,8 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	/* Confirm no work left behind accessing device structures */
 	cancel_delayed_work_sync(&sched->work_tdr);
 
+	if (sched->own_submit_wq)
+		destroy_workqueue(sched->submit_wq);
 	sched->ready = false;
 	kfree(sched->sched_rq);
 	sched->sched_rq = NULL;
@@ -1262,7 +1261,7 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
  */
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
 {
-	return !!sched->thread;
+	return sched->ready;
 }
 EXPORT_SYMBOL(drm_sched_wqueue_ready);
 
@@ -1273,7 +1272,8 @@ EXPORT_SYMBOL(drm_sched_wqueue_ready);
  */
 void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
 {
-	kthread_park(sched->thread);
+	WRITE_ONCE(sched->pause_submit, true);
+	cancel_work_sync(&sched->work_run_job);
 }
 EXPORT_SYMBOL(drm_sched_wqueue_stop);
 
@@ -1284,6 +1284,7 @@ EXPORT_SYMBOL(drm_sched_wqueue_stop);
  */
 void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
 {
-	kthread_unpark(sched->thread);
+	WRITE_ONCE(sched->pause_submit, false);
+	queue_work(sched->submit_wq, &sched->work_run_job);
 }
 EXPORT_SYMBOL(drm_sched_wqueue_start);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 038e1ae589c7..0b6696b0d882 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -388,7 +388,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 	int ret;
 
 	ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
-			     &v3d_bin_sched_ops,
+			     &v3d_bin_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -397,7 +397,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 		return ret;
 
 	ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
-			     &v3d_render_sched_ops,
+			     &v3d_render_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -406,7 +406,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 		goto fail;
 
 	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
-			     &v3d_tfu_sched_ops,
+			     &v3d_tfu_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -416,7 +416,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 
 	if (v3d_has_csd(v3d)) {
 		ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
-				     &v3d_csd_sched_ops,
+				     &v3d_csd_sched_ops, NULL,
 				     DRM_SCHED_PRIORITY_COUNT,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -425,7 +425,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 			goto fail;
 
 		ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
-				     &v3d_cache_clean_sched_ops,
+				     &v3d_cache_clean_sched_ops, NULL,
 				     DRM_SCHED_PRIORITY_COUNT,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 1d5a20af4a06..e0e7c4eb57d9 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -474,17 +474,16 @@ struct drm_sched_backend_ops {
  * @num_rqs: Number of run-queues. This is at most DRM_SCHED_PRIORITY_COUNT,
  *           as there's usually one run-queue per priority, but could be less.
  * @sched_rq: An allocated array of run-queues of size @num_rqs;
- * @wake_up_worker: the wait queue on which the scheduler sleeps until a job
- *                  is ready to be scheduled.
  * @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
  *                 waits on this wait queue until all the scheduled jobs are
  *                 finished.
  * @hw_rq_count: the number of jobs currently in the hardware queue.
  * @job_id_count: used to assign unique id to the each job.
+ * @submit_wq: workqueue used to queue @work_run_job
  * @timeout_wq: workqueue used to queue @work_tdr
+ * @work_run_job: work which calls run_job op of each scheduler.
  * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
  *            timeout interval is over.
- * @thread: the kthread on which the scheduler which run.
  * @pending_list: the list of jobs which are currently in the job queue.
  * @job_list_lock: lock to protect the pending_list.
  * @hang_limit: once the hangs by a job crosses this limit then it is marked
@@ -493,6 +492,8 @@ struct drm_sched_backend_ops {
  * @_score: score used when the driver doesn't provide one
  * @ready: marks if the underlying HW is ready to work
  * @free_guilty: A hit to time out handler to free the guilty job.
+ * @pause_submit: pause queuing of @work_run_job on @submit_wq
+ * @own_submit_wq: scheduler owns allocation of @submit_wq
  * @dev: system &struct device
  *
  * One scheduler is implemented for each hardware ring.
@@ -504,13 +505,13 @@ struct drm_gpu_scheduler {
 	const char			*name;
 	u32                             num_rqs;
 	struct drm_sched_rq             **sched_rq;
-	wait_queue_head_t		wake_up_worker;
 	wait_queue_head_t		job_scheduled;
 	atomic_t			hw_rq_count;
 	atomic64_t			job_id_count;
+	struct workqueue_struct		*submit_wq;
 	struct workqueue_struct		*timeout_wq;
+	struct work_struct		work_run_job;
 	struct delayed_work		work_tdr;
-	struct task_struct		*thread;
 	struct list_head		pending_list;
 	spinlock_t			job_list_lock;
 	int				hang_limit;
@@ -518,11 +519,14 @@ struct drm_gpu_scheduler {
 	atomic_t                        _score;
 	bool				ready;
 	bool				free_guilty;
+	bool				pause_submit;
+	bool				own_submit_wq;
 	struct device			*dev;
 };
 
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
+		   struct workqueue_struct *submit_wq,
 		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev);
-- 
2.34.1


 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -397,7 +397,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 		return ret;
 
 	ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
-			     &v3d_render_sched_ops,
+			     &v3d_render_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -406,7 +406,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 		goto fail;
 
 	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
-			     &v3d_tfu_sched_ops,
+			     &v3d_tfu_sched_ops, NULL,
 			     DRM_SCHED_PRIORITY_COUNT,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -416,7 +416,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 
 	if (v3d_has_csd(v3d)) {
 		ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
-				     &v3d_csd_sched_ops,
+				     &v3d_csd_sched_ops, NULL,
 				     DRM_SCHED_PRIORITY_COUNT,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
@@ -425,7 +425,7 @@ v3d_sched_init(struct v3d_dev *v3d)
 			goto fail;
 
 		ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
-				     &v3d_cache_clean_sched_ops,
+				     &v3d_cache_clean_sched_ops, NULL,
 				     DRM_SCHED_PRIORITY_COUNT,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms), NULL,
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 1d5a20af4a06..e0e7c4eb57d9 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -474,17 +474,16 @@ struct drm_sched_backend_ops {
  * @num_rqs: Number of run-queues. This is at most DRM_SCHED_PRIORITY_COUNT,
  *           as there's usually one run-queue per priority, but could be less.
  * @sched_rq: An allocated array of run-queues of size @num_rqs;
- * @wake_up_worker: the wait queue on which the scheduler sleeps until a job
- *                  is ready to be scheduled.
  * @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
  *                 waits on this wait queue until all the scheduled jobs are
  *                 finished.
  * @hw_rq_count: the number of jobs currently in the hardware queue.
  * @job_id_count: used to assign unique id to the each job.
+ * @submit_wq: workqueue used to queue @work_run_job
  * @timeout_wq: workqueue used to queue @work_tdr
+ * @work_run_job: work which calls run_job op of each scheduler.
  * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
  *            timeout interval is over.
- * @thread: the kthread on which the scheduler which run.
  * @pending_list: the list of jobs which are currently in the job queue.
  * @job_list_lock: lock to protect the pending_list.
  * @hang_limit: once the hangs by a job crosses this limit then it is marked
@@ -493,6 +492,8 @@ struct drm_sched_backend_ops {
  * @_score: score used when the driver doesn't provide one
  * @ready: marks if the underlying HW is ready to work
  * @free_guilty: A hit to time out handler to free the guilty job.
+ * @pause_submit: pause queuing of @work_run_job on @submit_wq
+ * @own_submit_wq: scheduler owns allocation of @submit_wq
  * @dev: system &struct device
  *
  * One scheduler is implemented for each hardware ring.
@@ -504,13 +505,13 @@ struct drm_gpu_scheduler {
 	const char			*name;
 	u32                             num_rqs;
 	struct drm_sched_rq             **sched_rq;
-	wait_queue_head_t		wake_up_worker;
 	wait_queue_head_t		job_scheduled;
 	atomic_t			hw_rq_count;
 	atomic64_t			job_id_count;
+	struct workqueue_struct		*submit_wq;
 	struct workqueue_struct		*timeout_wq;
+	struct work_struct		work_run_job;
 	struct delayed_work		work_tdr;
-	struct task_struct		*thread;
 	struct list_head		pending_list;
 	spinlock_t			job_list_lock;
 	int				hang_limit;
@@ -518,11 +519,14 @@ struct drm_gpu_scheduler {
 	atomic_t                        _score;
 	bool				ready;
 	bool				free_guilty;
+	bool				pause_submit;
+	bool				own_submit_wq;
 	struct device			*dev;
 };
 
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
+		   struct workqueue_struct *submit_wq,
 		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread
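
The driver-facing change in the patch above is the new submit_wq
argument to drm_sched_init(). A minimal sketch of the two call styles,
assuming a hypothetical my_driver structure and my_sched_ops backend
(those names, the queue depth and the timeout below are illustrative,
not part of this series):

	/* Sketch only; my_driver and my_sched_ops are made-up names. */
	static int my_driver_sched_init(struct my_driver *mdev)
	{
		struct workqueue_struct *shared_wq;
		int ret;

		/*
		 * NULL submit_wq: the scheduler allocates and owns an
		 * ordered workqueue, matching the old one-kthread-per-
		 * scheduler behaviour.
		 */
		ret = drm_sched_init(&mdev->sched_a, &my_sched_ops, NULL,
				     DRM_SCHED_PRIORITY_COUNT, 64, 0,
				     msecs_to_jiffies(500), NULL, NULL,
				     "my-sched-a", mdev->dev);
		if (ret)
			return ret;

		/*
		 * Caller-supplied ordered workqueue: the scheduler does not
		 * take ownership (own_submit_wq is false), so the driver
		 * must destroy it after drm_sched_fini(). Several schedulers
		 * may share it to serialize their submissions.
		 */
		shared_wq = alloc_ordered_workqueue("my-submit", 0);
		if (!shared_wq)
			return -ENOMEM;

		return drm_sched_init(&mdev->sched_b, &my_sched_ops,
				      shared_wq, DRM_SCHED_PRIORITY_COUNT,
				      64, 0, msecs_to_jiffies(500), NULL,
				      NULL, "my-sched-b", mdev->dev);
	}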

* [PATCH v8 3/5] drm/sched: Split free_job into own work item
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
@ 2023-10-31  3:24   ` Matthew Brost
  -1 siblings, 0 replies; 62+ messages in thread
From: Matthew Brost @ 2023-10-31  3:24 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: robdclark, thomas.hellstrom, Matthew Brost, sarah.walker,
	ltuikov, ketil.johnsen, Liviu.Dudau, mcanal, boris.brezillon,
	dakr, donald.robson, lina, christian.koenig, faith.ekstrand

Rather than calling free_job and run_job in the same work item, have a
dedicated work item for each. This aligns with the design and intended
use of work queues.

v2:
   - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
     timestamp in free_job() work item (Danilo)
v3:
  - Drop forward dec of drm_sched_select_entity (Boris)
  - Return in drm_sched_run_job_work if entity NULL (Boris)
v4:
  - Replace dequeue with peek and invert logic (Luben)
  - Wrap to 100 lines (Luben)
  - Update comments for *_queue / *_queue_if_ready functions (Luben)
v5:
  - Drop peek argument, blindly reinit idle (Luben)
  - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
  - Update work_run_job & work_free_job kernel doc (Luben)
v6:
  - Do not move drm_sched_select_entity in file (Luben)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 146 +++++++++++++++++--------
 include/drm/gpu_scheduler.h            |   4 +-
 2 files changed, 101 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d1ae05bded15..3b1b2f8eafe8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -265,6 +265,32 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
 		queue_work(sched->submit_wq, &sched->work_run_job);
 }
 
+/**
+ * drm_sched_free_job_queue - enqueue free-job work
+ * @sched: scheduler instance
+ */
+static void drm_sched_free_job_queue(struct drm_gpu_scheduler *sched)
+{
+	if (!READ_ONCE(sched->pause_submit))
+		queue_work(sched->submit_wq, &sched->work_free_job);
+}
+
+/**
+ * drm_sched_free_job_queue_if_done - enqueue free-job work if ready
+ * @sched: scheduler instance
+ */
+static void drm_sched_free_job_queue_if_done(struct drm_gpu_scheduler *sched)
+{
+	struct drm_sched_job *job;
+
+	spin_lock(&sched->job_list_lock);
+	job = list_first_entry_or_null(&sched->pending_list,
+				       struct drm_sched_job, list);
+	if (job && dma_fence_is_signaled(&job->s_fence->finished))
+		drm_sched_free_job_queue(sched);
+	spin_unlock(&sched->job_list_lock);
+}
+
 /**
  * drm_sched_job_done - complete a job
  * @s_job: pointer to the job which is done
@@ -284,7 +310,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
 	dma_fence_get(&s_fence->finished);
 	drm_sched_fence_finished(s_fence, result);
 	dma_fence_put(&s_fence->finished);
-	drm_sched_run_job_queue(sched);
+	drm_sched_free_job_queue(sched);
 }
 
 /**
@@ -943,8 +969,10 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 						typeof(*next), list);
 
 		if (next) {
-			next->s_fence->scheduled.timestamp =
-				dma_fence_timestamp(&job->s_fence->finished);
+			if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
+				     &next->s_fence->scheduled.flags))
+				next->s_fence->scheduled.timestamp =
+					dma_fence_timestamp(&job->s_fence->finished);
 			/* start TO timer for next job */
 			drm_sched_start_timeout(sched);
 		}
@@ -994,7 +1022,40 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 EXPORT_SYMBOL(drm_sched_pick_best);
 
 /**
- * drm_sched_run_job_work - main scheduler thread
+ * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
+ * @sched: scheduler instance
+ */
+static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
+{
+	if (drm_sched_select_entity(sched))
+		drm_sched_run_job_queue(sched);
+}
+
+/**
+ * drm_sched_free_job_work - worker to call free_job
+ *
+ * @w: free job work
+ */
+static void drm_sched_free_job_work(struct work_struct *w)
+{
+	struct drm_gpu_scheduler *sched =
+		container_of(w, struct drm_gpu_scheduler, work_free_job);
+	struct drm_sched_job *cleanup_job;
+
+	if (READ_ONCE(sched->pause_submit))
+		return;
+
+	cleanup_job = drm_sched_get_cleanup_job(sched);
+	if (cleanup_job) {
+		sched->ops->free_job(cleanup_job);
+
+		drm_sched_free_job_queue_if_done(sched);
+		drm_sched_run_job_queue_if_ready(sched);
+	}
+}
+
+/**
+ * drm_sched_run_job_work - worker to call run_job
  *
  * @w: run job work
  */
@@ -1003,65 +1064,51 @@ static void drm_sched_run_job_work(struct work_struct *w)
 	struct drm_gpu_scheduler *sched =
 		container_of(w, struct drm_gpu_scheduler, work_run_job);
 	struct drm_sched_entity *entity;
-	struct drm_sched_job *cleanup_job;
+	struct dma_fence *fence;
+	struct drm_sched_fence *s_fence;
+	struct drm_sched_job *sched_job;
 	int r;
 
 	if (READ_ONCE(sched->pause_submit))
 		return;
 
-	cleanup_job = drm_sched_get_cleanup_job(sched);
 	entity = drm_sched_select_entity(sched);
+	if (!entity)
+		return;
 
-	if (!entity && !cleanup_job)
+	sched_job = drm_sched_entity_pop_job(entity);
+	if (!sched_job) {
+		complete_all(&entity->entity_idle);
 		return;	/* No more work */
+	}
 
-	if (cleanup_job)
-		sched->ops->free_job(cleanup_job);
-
-	if (entity) {
-		struct dma_fence *fence;
-		struct drm_sched_fence *s_fence;
-		struct drm_sched_job *sched_job;
-
-		sched_job = drm_sched_entity_pop_job(entity);
-		if (!sched_job) {
-			complete_all(&entity->entity_idle);
-			if (!cleanup_job)
-				return;	/* No more work */
-			goto again;
-		}
-
-		s_fence = sched_job->s_fence;
-
-		atomic_inc(&sched->hw_rq_count);
-		drm_sched_job_begin(sched_job);
+	s_fence = sched_job->s_fence;
 
-		trace_drm_run_job(sched_job, entity);
-		fence = sched->ops->run_job(sched_job);
-		complete_all(&entity->entity_idle);
-		drm_sched_fence_scheduled(s_fence, fence);
+	atomic_inc(&sched->hw_rq_count);
+	drm_sched_job_begin(sched_job);
 
-		if (!IS_ERR_OR_NULL(fence)) {
-			/* Drop for original kref_init of the fence */
-			dma_fence_put(fence);
+	trace_drm_run_job(sched_job, entity);
+	fence = sched->ops->run_job(sched_job);
+	complete_all(&entity->entity_idle);
+	drm_sched_fence_scheduled(s_fence, fence);
 
-			r = dma_fence_add_callback(fence, &sched_job->cb,
-						   drm_sched_job_done_cb);
-			if (r == -ENOENT)
-				drm_sched_job_done(sched_job, fence->error);
-			else if (r)
-				DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n",
-					  r);
-		} else {
-			drm_sched_job_done(sched_job, IS_ERR(fence) ?
-					   PTR_ERR(fence) : 0);
-		}
+	if (!IS_ERR_OR_NULL(fence)) {
+		/* Drop for original kref_init of the fence */
+		dma_fence_put(fence);
 
-		wake_up(&sched->job_scheduled);
+		r = dma_fence_add_callback(fence, &sched_job->cb,
+					   drm_sched_job_done_cb);
+		if (r == -ENOENT)
+			drm_sched_job_done(sched_job, fence->error);
+		else if (r)
+			DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
+	} else {
+		drm_sched_job_done(sched_job, IS_ERR(fence) ?
+				   PTR_ERR(fence) : 0);
 	}
 
-again:
-	drm_sched_run_job_queue(sched);
+	wake_up(&sched->job_scheduled);
+	drm_sched_run_job_queue_if_ready(sched);
 }
 
 /**
@@ -1145,6 +1192,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 	atomic_set(&sched->hw_rq_count, 0);
 	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
 	INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
+	INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
 	atomic_set(&sched->_score, 0);
 	atomic64_set(&sched->job_id_count, 0);
 	sched->pause_submit = false;
@@ -1274,6 +1322,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
 {
 	WRITE_ONCE(sched->pause_submit, true);
 	cancel_work_sync(&sched->work_run_job);
+	cancel_work_sync(&sched->work_free_job);
 }
 EXPORT_SYMBOL(drm_sched_wqueue_stop);
 
@@ -1286,5 +1335,6 @@ void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
 {
 	WRITE_ONCE(sched->pause_submit, false);
 	queue_work(sched->submit_wq, &sched->work_run_job);
+	queue_work(sched->submit_wq, &sched->work_free_job);
 }
 EXPORT_SYMBOL(drm_sched_wqueue_start);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e0e7c4eb57d9..677ba96759ab 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -479,9 +479,10 @@ struct drm_sched_backend_ops {
  *                 finished.
  * @hw_rq_count: the number of jobs currently in the hardware queue.
  * @job_id_count: used to assign unique id to the each job.
- * @submit_wq: workqueue used to queue @work_run_job
+ * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
  * @timeout_wq: workqueue used to queue @work_tdr
  * @work_run_job: work which calls run_job op of each scheduler.
+ * @work_free_job: work which calls free_job op of each scheduler.
  * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
  *            timeout interval is over.
  * @pending_list: the list of jobs which are currently in the job queue.
@@ -511,6 +512,7 @@ struct drm_gpu_scheduler {
 	struct workqueue_struct		*submit_wq;
 	struct workqueue_struct		*timeout_wq;
 	struct work_struct		work_run_job;
+	struct work_struct		work_free_job;
 	struct delayed_work		work_tdr;
 	struct list_head		pending_list;
 	spinlock_t			job_list_lock;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread
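
The split above is an instance of a general workqueue pattern: several
short, single-purpose work items sharing one ordered workqueue, so the
items never run concurrently with one another yet can be queued,
requeued and cancelled independently. A self-contained sketch of that
pattern, with every name illustrative rather than taken from the
scheduler:

	#include <linux/workqueue.h>

	struct my_engine {
		struct workqueue_struct *wq;	/* ordered: one item at a time */
		struct work_struct run_work;
		struct work_struct cleanup_work;
	};

	static bool my_more_work(struct my_engine *e)
	{
		return false;	/* stub predicate for the sketch */
	}

	static void my_run_work(struct work_struct *w)
	{
		struct my_engine *e = container_of(w, struct my_engine,
						   run_work);

		/* ...submit exactly one job to the hardware here... */

		if (my_more_work(e))	/* requeue only when needed */
			queue_work(e->wq, &e->run_work);
	}

	static void my_cleanup_work(struct work_struct *w)
	{
		struct my_engine *e = container_of(w, struct my_engine,
						   cleanup_work);

		/* ...retire exactly one finished job here... */

		if (my_more_work(e))
			queue_work(e->wq, &e->cleanup_work);
	}

	static int my_engine_init(struct my_engine *e)
	{
		e->wq = alloc_ordered_workqueue("my-engine", 0);
		if (!e->wq)
			return -ENOMEM;
		INIT_WORK(&e->run_work, my_run_work);
		INIT_WORK(&e->cleanup_work, my_cleanup_work);
		return 0;
	}

In the patch itself, drm_sched_run_job_work() and
drm_sched_free_job_work() play the two roles, with
drm_sched_run_job_queue_if_ready() and
drm_sched_free_job_queue_if_done() as the requeue predicates.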

* [PATCH v8 4/5] drm/sched: Add drm_sched_start_timeout_unlocked helper
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
@ 2023-10-31  3:24   ` Matthew Brost
  -1 siblings, 0 replies; 62+ messages in thread
From: Matthew Brost @ 2023-10-31  3:24 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: robdclark, thomas.hellstrom, Matthew Brost, sarah.walker,
	Luben Tuikov, ltuikov, ketil.johnsen, Liviu.Dudau, mcanal,
	boris.brezillon, dakr, donald.robson, lina, christian.koenig,
	faith.ekstrand

Add a drm_sched_start_timeout_unlocked() helper which takes
job_list_lock around drm_sched_start_timeout(), replacing the
open-coded lock/call/unlock sequences in the timeout and start paths.
Also add a lockdep assert to drm_sched_start_timeout() itself.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 3b1b2f8eafe8..fc387de5a0c7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -334,11 +334,20 @@ static void drm_sched_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
  */
 static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
 {
+	lockdep_assert_held(&sched->job_list_lock);
+
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
 	    !list_empty(&sched->pending_list))
 		queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
 }
 
+static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
+{
+	spin_lock(&sched->job_list_lock);
+	drm_sched_start_timeout(sched);
+	spin_unlock(&sched->job_list_lock);
+}
+
 /**
  * drm_sched_fault - immediately start timeout handler
  *
@@ -451,11 +460,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		spin_unlock(&sched->job_list_lock);
 	}
 
-	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
-		spin_lock(&sched->job_list_lock);
-		drm_sched_start_timeout(sched);
-		spin_unlock(&sched->job_list_lock);
-	}
+	if (status != DRM_GPU_SCHED_STAT_ENODEV)
+		drm_sched_start_timeout_unlocked(sched);
 }
 
 /**
@@ -581,11 +587,8 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 			drm_sched_job_done(s_job, -ECANCELED);
 	}
 
-	if (full_recovery) {
-		spin_lock(&sched->job_list_lock);
-		drm_sched_start_timeout(sched);
-		spin_unlock(&sched->job_list_lock);
-	}
+	if (full_recovery)
+		drm_sched_start_timeout_unlocked(sched);
 
 	drm_sched_wqueue_start(sched);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread
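
The helper above follows a common kernel locking convention: the bare
function asserts that the caller already holds the lock, while an
*_unlocked variant takes and drops the lock around it. A generic sketch
of the convention, with all names illustrative:

	#include <linux/lockdep.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(my_lock);
	static int my_count;

	/* Caller must hold my_lock; lockdep flags any path that does not. */
	static void my_bump(void)
	{
		lockdep_assert_held(&my_lock);
		my_count++;
	}

	/* Wrapper for callers that do not already hold the lock. */
	static void my_bump_unlocked(void)
	{
		spin_lock(&my_lock);
		my_bump();
		spin_unlock(&my_lock);
	}

On debug kernels the assert turns a silent locking-rule violation into
an immediate lockdep splat, which is what makes it safe to collapse the
open-coded sequences in drm_sched_job_timedout() and drm_sched_start()
into drm_sched_start_timeout_unlocked().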

* [PATCH v8 5/5] drm/sched: Add a helper to queue TDR immediately
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
@ 2023-10-31  3:24   ` Matthew Brost
  -1 siblings, 0 replies; 62+ messages in thread
From: Matthew Brost @ 2023-10-31  3:24 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: robdclark, thomas.hellstrom, Matthew Brost, sarah.walker,
	Luben Tuikov, ltuikov, ketil.johnsen, Liviu.Dudau, mcanal,
	boris.brezillon, dakr, donald.robson, lina, christian.koenig,
	faith.ekstrand

Add a helper whereby a driver can invoke TDR (timeout detection and
recovery) immediately.

v2:
 - Drop timeout args, rename function, use mod delayed work (Luben)
v3:
 - s/XE/Xe (Luben)
 - present tense in commit message (Luben)
 - Adjust comment for drm_sched_tdr_queue_imm (Luben)
v4:
 - Adjust commit message (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 18 +++++++++++++++++-
 include/drm/gpu_scheduler.h            |  1 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index fc387de5a0c7..98b2ad54fc70 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -338,7 +338,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
 
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
 	    !list_empty(&sched->pending_list))
-		queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
+		mod_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
 }
 
 static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
@@ -348,6 +348,22 @@ static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
 	spin_unlock(&sched->job_list_lock);
 }
 
+/**
+ * drm_sched_tdr_queue_imm: - immediately start job timeout handler
+ *
+ * @sched: scheduler for which the timeout handling should be started.
+ *
+ * Start timeout handling immediately for the named scheduler.
+ */
+void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched)
+{
+	spin_lock(&sched->job_list_lock);
+	sched->timeout = 0;
+	drm_sched_start_timeout(sched);
+	spin_unlock(&sched->job_list_lock);
+}
+EXPORT_SYMBOL(drm_sched_tdr_queue_imm);
+
 /**
  * drm_sched_fault - immediately start timeout handler
  *
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 677ba96759ab..c1565694c0e9 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -556,6 +556,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 				    struct drm_gpu_scheduler **sched_list,
                                    unsigned int num_sched_list);
 
+void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
 void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread
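
A sketch of how a driver might use the new export, assuming a
hypothetical out-of-band hang notification path (my_notify_hang and the
my_exec_queue layout are illustrative; the Xe driver's real call sites
are not part of this series):

	/*
	 * Illustrative only: when firmware reports a hang out of band,
	 * fire the scheduler's timeout handler at once instead of waiting
	 * for the job's timeout period to elapse.
	 * drm_sched_tdr_queue_imm() zeroes sched->timeout under
	 * job_list_lock and rearms work_tdr to run immediately while jobs
	 * are pending.
	 */
	static void my_notify_hang(struct my_exec_queue *q)
	{
		drm_sched_tdr_queue_imm(&q->sched);
	}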

* [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev10)
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
                   ` (5 preceding siblings ...)
  (?)
@ 2023-10-31  3:31 ` Patchwork
  -1 siblings, 0 replies; 62+ messages in thread
From: Patchwork @ 2023-10-31  3:31 UTC (permalink / raw)
  To: Danilo Krummrich; +Cc: intel-xe

== Series Details ==

Series: DRM scheduler changes for Xe (rev10)
URL   : https://patchwork.freedesktop.org/series/121744/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 1e652bcf7 drm/xe/xelpmp: Extend Wa_22016670082 to Xe_LPM+
=== git am output follows ===
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:290
error: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1659
error: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4861
error: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/msm/adreno/adreno_device.c:841
error: drivers/gpu/drm/msm/adreno/adreno_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/scheduler/sched_main.c:439
error: drivers/gpu/drm/scheduler/sched_main.c: patch does not apply
error: patch failed: include/drm/gpu_scheduler.h:552
error: include/drm/gpu_scheduler.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/sched: Add drm_sched_wqueue_* helpers
Patch failed at 0001 drm/sched: Add drm_sched_wqueue_* helpers
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-xe] [PATCH v8 3/5] drm/sched: Split free_job into own work item
  2023-10-31  3:24   ` [Intel-xe] " Matthew Brost
@ 2023-11-01 22:13     ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-01 22:13 UTC (permalink / raw)
  To: Matthew Brost, dri-devel, intel-xe
  Cc: robdclark, sarah.walker, ltuikov89, ketil.johnsen, Liviu.Dudau,
	mcanal, frank.binns, boris.brezillon, dakr, donald.robson,
	daniel, lina, airlied, christian.koenig, faith.ekstrand


On 2023-10-30 23:24, Matthew Brost wrote:
> Rather than calling free_job and run_job in the same work item, have a
> dedicated work item for each. This aligns with the design and intended
> use of work queues.
> 
> v2:
>    - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
>      timestamp in free_job() work item (Danilo)
> v3:
>   - Drop forward dec of drm_sched_select_entity (Boris)
>   - Return in drm_sched_run_job_work if entity NULL (Boris)
> v4:
>   - Replace dequeue with peek and invert logic (Luben)
>   - Wrap to 100 lines (Luben)
>   - Update comments for *_queue / *_queue_if_ready functions (Luben)
> v5:
>   - Drop peek argument, blindly reinit idle (Luben)
>   - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
>   - Update work_run_job & work_free_job kernel doc (Luben)
> v6:
>   - Do not move drm_sched_select_entity in file (Luben)
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>

Regards,
Luben

> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 146 +++++++++++++++++--------
>  include/drm/gpu_scheduler.h            |   4 +-
>  2 files changed, 101 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index d1ae05bded15..3b1b2f8eafe8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -265,6 +265,32 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>  		queue_work(sched->submit_wq, &sched->work_run_job);
>  }
>  
> +/**
> + * drm_sched_free_job_queue - enqueue free-job work
> + * @sched: scheduler instance
> + */
> +static void drm_sched_free_job_queue(struct drm_gpu_scheduler *sched)
> +{
> +	if (!READ_ONCE(sched->pause_submit))
> +		queue_work(sched->submit_wq, &sched->work_free_job);
> +}
> +
> +/**
> + * drm_sched_free_job_queue_if_done - enqueue free-job work if ready
> + * @sched: scheduler instance
> + */
> +static void drm_sched_free_job_queue_if_done(struct drm_gpu_scheduler *sched)
> +{
> +	struct drm_sched_job *job;
> +
> +	spin_lock(&sched->job_list_lock);
> +	job = list_first_entry_or_null(&sched->pending_list,
> +				       struct drm_sched_job, list);
> +	if (job && dma_fence_is_signaled(&job->s_fence->finished))
> +		drm_sched_free_job_queue(sched);
> +	spin_unlock(&sched->job_list_lock);
> +}
> +
>  /**
>   * drm_sched_job_done - complete a job
>   * @s_job: pointer to the job which is done
> @@ -284,7 +310,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
>  	dma_fence_get(&s_fence->finished);
>  	drm_sched_fence_finished(s_fence, result);
>  	dma_fence_put(&s_fence->finished);
> -	drm_sched_run_job_queue(sched);
> +	drm_sched_free_job_queue(sched);
>  }
>  
>  /**
> @@ -943,8 +969,10 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>  						typeof(*next), list);
>  
>  		if (next) {
> -			next->s_fence->scheduled.timestamp =
> -				dma_fence_timestamp(&job->s_fence->finished);
> +			if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> +				     &next->s_fence->scheduled.flags))
> +				next->s_fence->scheduled.timestamp =
> +					dma_fence_timestamp(&job->s_fence->finished);
>  			/* start TO timer for next job */
>  			drm_sched_start_timeout(sched);
>  		}
> @@ -994,7 +1022,40 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>  EXPORT_SYMBOL(drm_sched_pick_best);
>  
>  /**
> - * drm_sched_run_job_work - main scheduler thread
> + * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
> + * @sched: scheduler instance
> + */
> +static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> +{
> +	if (drm_sched_select_entity(sched))
> +		drm_sched_run_job_queue(sched);
> +}
> +
> +/**
> + * drm_sched_free_job_work - worker to call free_job
> + *
> + * @w: free job work
> + */
> +static void drm_sched_free_job_work(struct work_struct *w)
> +{
> +	struct drm_gpu_scheduler *sched =
> +		container_of(w, struct drm_gpu_scheduler, work_free_job);
> +	struct drm_sched_job *cleanup_job;
> +
> +	if (READ_ONCE(sched->pause_submit))
> +		return;
> +
> +	cleanup_job = drm_sched_get_cleanup_job(sched);
> +	if (cleanup_job) {
> +		sched->ops->free_job(cleanup_job);
> +
> +		drm_sched_free_job_queue_if_done(sched);
> +		drm_sched_run_job_queue_if_ready(sched);
> +	}
> +}
> +
> +/**
> + * drm_sched_run_job_work - worker to call run_job
>   *
>   * @w: run job work
>   */
> @@ -1003,65 +1064,51 @@ static void drm_sched_run_job_work(struct work_struct *w)
>  	struct drm_gpu_scheduler *sched =
>  		container_of(w, struct drm_gpu_scheduler, work_run_job);
>  	struct drm_sched_entity *entity;
> -	struct drm_sched_job *cleanup_job;
> +	struct dma_fence *fence;
> +	struct drm_sched_fence *s_fence;
> +	struct drm_sched_job *sched_job;
>  	int r;
>  
>  	if (READ_ONCE(sched->pause_submit))
>  		return;
>  
> -	cleanup_job = drm_sched_get_cleanup_job(sched);
>  	entity = drm_sched_select_entity(sched);
> +	if (!entity)
> +		return;
>  
> -	if (!entity && !cleanup_job)
> +	sched_job = drm_sched_entity_pop_job(entity);
> +	if (!sched_job) {
> +		complete_all(&entity->entity_idle);
>  		return;	/* No more work */
> +	}
>  
> -	if (cleanup_job)
> -		sched->ops->free_job(cleanup_job);
> -
> -	if (entity) {
> -		struct dma_fence *fence;
> -		struct drm_sched_fence *s_fence;
> -		struct drm_sched_job *sched_job;
> -
> -		sched_job = drm_sched_entity_pop_job(entity);
> -		if (!sched_job) {
> -			complete_all(&entity->entity_idle);
> -			if (!cleanup_job)
> -				return;	/* No more work */
> -			goto again;
> -		}
> -
> -		s_fence = sched_job->s_fence;
> -
> -		atomic_inc(&sched->hw_rq_count);
> -		drm_sched_job_begin(sched_job);
> +	s_fence = sched_job->s_fence;
>  
> -		trace_drm_run_job(sched_job, entity);
> -		fence = sched->ops->run_job(sched_job);
> -		complete_all(&entity->entity_idle);
> -		drm_sched_fence_scheduled(s_fence, fence);
> +	atomic_inc(&sched->hw_rq_count);
> +	drm_sched_job_begin(sched_job);
>  
> -		if (!IS_ERR_OR_NULL(fence)) {
> -			/* Drop for original kref_init of the fence */
> -			dma_fence_put(fence);
> +	trace_drm_run_job(sched_job, entity);
> +	fence = sched->ops->run_job(sched_job);
> +	complete_all(&entity->entity_idle);
> +	drm_sched_fence_scheduled(s_fence, fence);
>  
> -			r = dma_fence_add_callback(fence, &sched_job->cb,
> -						   drm_sched_job_done_cb);
> -			if (r == -ENOENT)
> -				drm_sched_job_done(sched_job, fence->error);
> -			else if (r)
> -				DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n",
> -					  r);
> -		} else {
> -			drm_sched_job_done(sched_job, IS_ERR(fence) ?
> -					   PTR_ERR(fence) : 0);
> -		}
> +	if (!IS_ERR_OR_NULL(fence)) {
> +		/* Drop for original kref_init of the fence */
> +		dma_fence_put(fence);
>  
> -		wake_up(&sched->job_scheduled);
> +		r = dma_fence_add_callback(fence, &sched_job->cb,
> +					   drm_sched_job_done_cb);
> +		if (r == -ENOENT)
> +			drm_sched_job_done(sched_job, fence->error);
> +		else if (r)
> +			DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
> +	} else {
> +		drm_sched_job_done(sched_job, IS_ERR(fence) ?
> +				   PTR_ERR(fence) : 0);
>  	}
>  
> -again:
> -	drm_sched_run_job_queue(sched);
> +	wake_up(&sched->job_scheduled);
> +	drm_sched_run_job_queue_if_ready(sched);
>  }
>  
>  /**
> @@ -1145,6 +1192,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>  	atomic_set(&sched->hw_rq_count, 0);
>  	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
>  	INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
> +	INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
>  	atomic_set(&sched->_score, 0);
>  	atomic64_set(&sched->job_id_count, 0);
>  	sched->pause_submit = false;
> @@ -1274,6 +1322,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
>  {
>  	WRITE_ONCE(sched->pause_submit, true);
>  	cancel_work_sync(&sched->work_run_job);
> +	cancel_work_sync(&sched->work_free_job);
>  }
>  EXPORT_SYMBOL(drm_sched_wqueue_stop);
>  
> @@ -1286,5 +1335,6 @@ void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
>  {
>  	WRITE_ONCE(sched->pause_submit, false);
>  	queue_work(sched->submit_wq, &sched->work_run_job);
> +	queue_work(sched->submit_wq, &sched->work_free_job);
>  }
>  EXPORT_SYMBOL(drm_sched_wqueue_start);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index e0e7c4eb57d9..677ba96759ab 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -479,9 +479,10 @@ struct drm_sched_backend_ops {
>   *                 finished.
>   * @hw_rq_count: the number of jobs currently in the hardware queue.
>   * @job_id_count: used to assign unique id to the each job.
> - * @submit_wq: workqueue used to queue @work_run_job
> + * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
>   * @timeout_wq: workqueue used to queue @work_tdr
>   * @work_run_job: work which calls run_job op of each scheduler.
> + * @work_free_job: work which calls free_job op of each scheduler.
>   * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
>   *            timeout interval is over.
>   * @pending_list: the list of jobs which are currently in the job queue.
> @@ -511,6 +512,7 @@ struct drm_gpu_scheduler {
>  	struct workqueue_struct		*submit_wq;
>  	struct workqueue_struct		*timeout_wq;
>  	struct work_struct		work_run_job;
> +	struct work_struct		work_free_job;
>  	struct delayed_work		work_tdr;
>  	struct list_head		pending_list;
>  	spinlock_t			job_list_lock;


^ permalink raw reply	[flat|nested] 62+ messages in thread


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-xe] [PATCH v8 0/5] DRM scheduler changes for Xe
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
@ 2023-11-01 22:16   ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-01 22:16 UTC (permalink / raw)
  To: Matthew Brost, dri-devel, intel-xe
  Cc: robdclark, sarah.walker, ltuikov89, ketil.johnsen, Liviu.Dudau,
	mcanal, frank.binns, boris.brezillon, dakr, donald.robson,
	daniel, lina, airlied, christian.koenig, faith.ekstrand


On 2023-10-30 23:24, Matthew Brost wrote:
>       As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
> have been asked to merge our common DRM scheduler patches first.
> 
> This a continuation of a RFC [3] with all comments addressed, ready for
> a full review, and hopefully in state which can merged in the near
> future. More details of this series can found in the cover letter of the
> RFC [3].
> 
> These changes have been tested with the Xe driver. Based on drm-tip branch.
> 
> A follow up series will be posted to address some of dakr requets for
> kernel doc changes.
> 
> v2:
>  - Break run job, free job, and process message in own work items
>  - This might break other drivers as run job and free job now can run in
>    parallel, can fix up if needed
> 
> v3:
>  - Include missing patch 'drm/sched: Add drm_sched_submit_* helpers'
>  - Fix issue with setting timestamp to early
>  - Don't dequeue jobs for single entity after calling entity fini
>  - Flush pending jobs on entity fini
>  - Add documentation for entity teardown
>  - Add Matthew Brost to maintainers of DRM scheduler
> 
> v4:
>  - Drop message interface
>  - Drop 'Flush pending jobs on entity fini'
>  - Drop 'Add documentation for entity teardown'
>  - Address all feedback
> 
> v5:
>  - Address Luben's feedback
>  - Drop starting TDR after calling run_job()
>  - Drop adding Matthew Brost to maintainers of DRM scheduler
> 
> v6:
>  - Address Luben's feedback
>  - Include base commit
> 
> v7:
>  - Drop SINGLE_ENTITY mode rather pull in Luben's patch for dynamic run queues
>  - Address Luben's feedback for free_job work item patch
> 
> v8:
>  - Rebase on drm-tip which includes Luben's patch for dynamic run queues
>  - Don't adjust comments, change variable names, function names twice in series
>  - Don't move existing code to different places in a file to preserve git history
> 
> Matt
> 
> [1] https://gitlab.freedesktop.org/drm/xe/kernel
> [2] https://patchwork.freedesktop.org/series/112188/
> [3] https://patchwork.freedesktop.org/series/116055/
> 
> Matthew Brost (5):
>   drm/sched: Add drm_sched_wqueue_* helpers
>   drm/sched: Convert drm scheduler to use a work queue rather than
>     kthread
>   drm/sched: Split free_job into own work item
>   drm/sched: Add drm_sched_start_timeout_unlocked helper
>   drm/sched: Add a helper to queue TDR immediately
> 
>  .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  15 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  14 +-
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c       |   2 +-
>  drivers/gpu/drm/lima/lima_sched.c             |   2 +-
>  drivers/gpu/drm/msm/adreno/adreno_device.c    |   6 +-
>  drivers/gpu/drm/msm/msm_ringbuffer.c          |   2 +-
>  drivers/gpu/drm/nouveau/nouveau_sched.c       |   2 +-
>  drivers/gpu/drm/panfrost/panfrost_job.c       |   2 +-
>  drivers/gpu/drm/scheduler/sched_main.c        | 301 ++++++++++++------
>  drivers/gpu/drm/v3d/v3d_sched.c               |  10 +-
>  include/drm/gpu_scheduler.h                   |  20 +-
>  12 files changed, 248 insertions(+), 130 deletions(-)
> 
> 
> base-commit: b560681c6bf623db41064ac486dd148d6c103e53

Hi Matthew,

I've pushed this series into drm-misc-next -- I've tested it and am running live with it.
Make sure to use "dim update-branches" to get all the resolutions, etc.

Thank you for working through this. Have a nice rest of your week. :-)
-- 
Regards,
Luben

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-xe] [PATCH v8 3/5] drm/sched: Split free_job into own work item
  2023-10-31  3:24   ` [Intel-xe] " Matthew Brost
@ 2023-11-02 11:13     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 62+ messages in thread
From: Tvrtko Ursulin @ 2023-11-02 11:13 UTC (permalink / raw)
  To: Matthew Brost, dri-devel, intel-xe
  Cc: robdclark, sarah.walker, ltuikov, ketil.johnsen, lina, mcanal,
	Liviu.Dudau, boris.brezillon, dakr, donald.robson,
	christian.koenig, faith.ekstrand


On 31/10/2023 03:24, Matthew Brost wrote:
> Rather than call free_job and run_job in same work item have a dedicated
> work item for each. This aligns with the design and intended use of work
> queues.
> 
> v2:
>     - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
>       timestamp in free_job() work item (Danilo)
> v3:
>    - Drop forward dec of drm_sched_select_entity (Boris)
>    - Return in drm_sched_run_job_work if entity NULL (Boris)
> v4:
>    - Replace dequeue with peek and invert logic (Luben)
>    - Wrap to 100 lines (Luben)
>    - Update comments for *_queue / *_queue_if_ready functions (Luben)
> v5:
>    - Drop peek argument, blindly reinit idle (Luben)
>    - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
>    - Update work_run_job & work_free_job kernel doc (Luben)
> v6:
>    - Do not move drm_sched_select_entity in file (Luben)
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 146 +++++++++++++++++--------
>   include/drm/gpu_scheduler.h            |   4 +-
>   2 files changed, 101 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index d1ae05bded15..3b1b2f8eafe8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -265,6 +265,32 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>   		queue_work(sched->submit_wq, &sched->work_run_job);
>   }
>   
> +/**
> + * drm_sched_free_job_queue - enqueue free-job work
> + * @sched: scheduler instance
> + */
> +static void drm_sched_free_job_queue(struct drm_gpu_scheduler *sched)
> +{
> +	if (!READ_ONCE(sched->pause_submit))
> +		queue_work(sched->submit_wq, &sched->work_free_job);
> +}
> +
> +/**
> + * drm_sched_free_job_queue_if_done - enqueue free-job work if ready
> + * @sched: scheduler instance
> + */
> +static void drm_sched_free_job_queue_if_done(struct drm_gpu_scheduler *sched)
> +{
> +	struct drm_sched_job *job;
> +
> +	spin_lock(&sched->job_list_lock);
> +	job = list_first_entry_or_null(&sched->pending_list,
> +				       struct drm_sched_job, list);
> +	if (job && dma_fence_is_signaled(&job->s_fence->finished))
> +		drm_sched_free_job_queue(sched);
> +	spin_unlock(&sched->job_list_lock);
> +}
> +
>   /**
>    * drm_sched_job_done - complete a job
>    * @s_job: pointer to the job which is done
> @@ -284,7 +310,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
>   	dma_fence_get(&s_fence->finished);
>   	drm_sched_fence_finished(s_fence, result);
>   	dma_fence_put(&s_fence->finished);
> -	drm_sched_run_job_queue(sched);
> +	drm_sched_free_job_queue(sched);
>   }
>   
>   /**
> @@ -943,8 +969,10 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   						typeof(*next), list);
>   
>   		if (next) {
> -			next->s_fence->scheduled.timestamp =
> -				dma_fence_timestamp(&job->s_fence->finished);
> +			if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
> +				     &next->s_fence->scheduled.flags))
> +				next->s_fence->scheduled.timestamp =
> +					dma_fence_timestamp(&job->s_fence->finished);
>   			/* start TO timer for next job */
>   			drm_sched_start_timeout(sched);
>   		}
> @@ -994,7 +1022,40 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>   EXPORT_SYMBOL(drm_sched_pick_best);
>   
>   /**
> - * drm_sched_run_job_work - main scheduler thread
> + * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
> + * @sched: scheduler instance
> + */
> +static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> +{
> +	if (drm_sched_select_entity(sched))
> +		drm_sched_run_job_queue(sched);
> +}
> +
> +/**
> + * drm_sched_free_job_work - worker to call free_job
> + *
> + * @w: free job work
> + */
> +static void drm_sched_free_job_work(struct work_struct *w)
> +{
> +	struct drm_gpu_scheduler *sched =
> +		container_of(w, struct drm_gpu_scheduler, work_free_job);
> +	struct drm_sched_job *cleanup_job;
> +
> +	if (READ_ONCE(sched->pause_submit))
> +		return;
> +
> +	cleanup_job = drm_sched_get_cleanup_job(sched);
> +	if (cleanup_job) {
> +		sched->ops->free_job(cleanup_job);
> +
> +		drm_sched_free_job_queue_if_done(sched);
> +		drm_sched_run_job_queue_if_ready(sched);

Are finished jobs now disturbing the round robin selection?

Every time this cleans up a job we get:

drm_sched_run_job_queue_if_ready
  -> drm_sched_select_entity
      -> drm_sched_rq_select_entity_rr
          -> rq->current_entity bumped to next in list

So when the job run worker does:

	entity = drm_sched_select_entity(sched);

It does not pick the same entity as it would have before this patch? If so,
perhaps drm_sched_run_job_queue_if_ready() needs a "peek" helper which does
not modify any state.
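
For illustration, here is a minimal sketch of the kind of helper I mean --
the name drm_sched_rq_peek_entity_rr() is hypothetical, and for brevity it
ignores the RR starting point (rq->current_entity):

	/*
	 * Hypothetical helper: check whether any entity on this run-queue
	 * is ready, without advancing rq->current_entity.
	 */
	static struct drm_sched_entity *
	drm_sched_rq_peek_entity_rr(struct drm_sched_rq *rq)
	{
		struct drm_sched_entity *entity;

		spin_lock(&rq->lock);
		list_for_each_entry(entity, &rq->entities, list) {
			if (drm_sched_entity_is_ready(entity)) {
				spin_unlock(&rq->lock);
				return entity;
			}
		}
		spin_unlock(&rq->lock);

		return NULL;
	}

drm_sched_run_job_queue_if_ready() could then use something like this
instead of drm_sched_select_entity(), leaving the actual selection (and the
rq->current_entity bump) to the run worker.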

Regards,

Tvrtko

> +	}
> +}
> +
> +/**
> + * drm_sched_run_job_work - worker to call run_job
>    *
>    * @w: run job work
>    */
> @@ -1003,65 +1064,51 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   	struct drm_gpu_scheduler *sched =
>   		container_of(w, struct drm_gpu_scheduler, work_run_job);
>   	struct drm_sched_entity *entity;
> -	struct drm_sched_job *cleanup_job;
> +	struct dma_fence *fence;
> +	struct drm_sched_fence *s_fence;
> +	struct drm_sched_job *sched_job;
>   	int r;
>   
>   	if (READ_ONCE(sched->pause_submit))
>   		return;
>   
> -	cleanup_job = drm_sched_get_cleanup_job(sched);
>   	entity = drm_sched_select_entity(sched);
> +	if (!entity)
> +		return;
>   
> -	if (!entity && !cleanup_job)
> +	sched_job = drm_sched_entity_pop_job(entity);
> +	if (!sched_job) {
> +		complete_all(&entity->entity_idle);
>   		return;	/* No more work */
> +	}
>   
> -	if (cleanup_job)
> -		sched->ops->free_job(cleanup_job);
> -
> -	if (entity) {
> -		struct dma_fence *fence;
> -		struct drm_sched_fence *s_fence;
> -		struct drm_sched_job *sched_job;
> -
> -		sched_job = drm_sched_entity_pop_job(entity);
> -		if (!sched_job) {
> -			complete_all(&entity->entity_idle);
> -			if (!cleanup_job)
> -				return;	/* No more work */
> -			goto again;
> -		}
> -
> -		s_fence = sched_job->s_fence;
> -
> -		atomic_inc(&sched->hw_rq_count);
> -		drm_sched_job_begin(sched_job);
> +	s_fence = sched_job->s_fence;
>   
> -		trace_drm_run_job(sched_job, entity);
> -		fence = sched->ops->run_job(sched_job);
> -		complete_all(&entity->entity_idle);
> -		drm_sched_fence_scheduled(s_fence, fence);
> +	atomic_inc(&sched->hw_rq_count);
> +	drm_sched_job_begin(sched_job);
>   
> -		if (!IS_ERR_OR_NULL(fence)) {
> -			/* Drop for original kref_init of the fence */
> -			dma_fence_put(fence);
> +	trace_drm_run_job(sched_job, entity);
> +	fence = sched->ops->run_job(sched_job);
> +	complete_all(&entity->entity_idle);
> +	drm_sched_fence_scheduled(s_fence, fence);
>   
> -			r = dma_fence_add_callback(fence, &sched_job->cb,
> -						   drm_sched_job_done_cb);
> -			if (r == -ENOENT)
> -				drm_sched_job_done(sched_job, fence->error);
> -			else if (r)
> -				DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n",
> -					  r);
> -		} else {
> -			drm_sched_job_done(sched_job, IS_ERR(fence) ?
> -					   PTR_ERR(fence) : 0);
> -		}
> +	if (!IS_ERR_OR_NULL(fence)) {
> +		/* Drop for original kref_init of the fence */
> +		dma_fence_put(fence);
>   
> -		wake_up(&sched->job_scheduled);
> +		r = dma_fence_add_callback(fence, &sched_job->cb,
> +					   drm_sched_job_done_cb);
> +		if (r == -ENOENT)
> +			drm_sched_job_done(sched_job, fence->error);
> +		else if (r)
> +			DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
> +	} else {
> +		drm_sched_job_done(sched_job, IS_ERR(fence) ?
> +				   PTR_ERR(fence) : 0);
>   	}
>   
> -again:
> -	drm_sched_run_job_queue(sched);
> +	wake_up(&sched->job_scheduled);
> +	drm_sched_run_job_queue_if_ready(sched);
>   }
>   
>   /**
> @@ -1145,6 +1192,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   	atomic_set(&sched->hw_rq_count, 0);
>   	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
>   	INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
> +	INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
>   	atomic_set(&sched->_score, 0);
>   	atomic64_set(&sched->job_id_count, 0);
>   	sched->pause_submit = false;
> @@ -1274,6 +1322,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
>   {
>   	WRITE_ONCE(sched->pause_submit, true);
>   	cancel_work_sync(&sched->work_run_job);
> +	cancel_work_sync(&sched->work_free_job);
>   }
>   EXPORT_SYMBOL(drm_sched_wqueue_stop);
>   
> @@ -1286,5 +1335,6 @@ void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
>   {
>   	WRITE_ONCE(sched->pause_submit, false);
>   	queue_work(sched->submit_wq, &sched->work_run_job);
> +	queue_work(sched->submit_wq, &sched->work_free_job);
>   }
>   EXPORT_SYMBOL(drm_sched_wqueue_start);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index e0e7c4eb57d9..677ba96759ab 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -479,9 +479,10 @@ struct drm_sched_backend_ops {
>    *                 finished.
>    * @hw_rq_count: the number of jobs currently in the hardware queue.
>    * @job_id_count: used to assign unique id to the each job.
> - * @submit_wq: workqueue used to queue @work_run_job
> + * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
>    * @timeout_wq: workqueue used to queue @work_tdr
>    * @work_run_job: work which calls run_job op of each scheduler.
> + * @work_free_job: work which calls free_job op of each scheduler.
>    * @work_tdr: schedules a delayed call to @drm_sched_job_timedout after the
>    *            timeout interval is over.
>    * @pending_list: the list of jobs which are currently in the job queue.
> @@ -511,6 +512,7 @@ struct drm_gpu_scheduler {
>   	struct workqueue_struct		*submit_wq;
>   	struct workqueue_struct		*timeout_wq;
>   	struct work_struct		work_run_job;
> +	struct work_struct		work_free_job;
>   	struct delayed_work		work_tdr;
>   	struct list_head		pending_list;
>   	spinlock_t			job_list_lock;

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()
  2023-11-02 11:13     ` Tvrtko Ursulin
@ 2023-11-02 22:46       ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-02 22:46 UTC (permalink / raw)
  To: tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ltuikov, Luben Tuikov,
	ketil.johnsen, lina, mcanal, Liviu.Dudau, dri-devel, intel-xe,
	boris.brezillon, dakr, donald.robson, christian.koenig,
	faith.ekstrand

Eliminate drm_sched_run_job_queue_if_ready() and instead just call
drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
the former function uses drm_sched_select_entity() to determine if the
scheduler had an entity ready in one of its run-queues, and in the case of
Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
just that: it selects the _next_ ready entity, sets up the run-queue and
completion, and returns that entity. The FIFO scheduling algorithm is
unaffected.

Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
the case of RR scheduling this results in calling select_entity() twice, which
may skip a ready entity if more than one entity is ready. This commit fixes
this by eliminating the if_ready() variant.

Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 98b2ad54fc7071..05816e7cae8c8b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 }
 EXPORT_SYMBOL(drm_sched_pick_best);
 
-/**
- * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
- * @sched: scheduler instance
- */
-static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
-{
-	if (drm_sched_select_entity(sched))
-		drm_sched_run_job_queue(sched);
-}
-
 /**
  * drm_sched_free_job_work - worker to call free_job
  *
@@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
 		sched->ops->free_job(cleanup_job);
 
 		drm_sched_free_job_queue_if_done(sched);
-		drm_sched_run_job_queue_if_ready(sched);
+		drm_sched_run_job_queue(sched);
 	}
 }
 
@@ -1127,7 +1117,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
 	}
 
 	wake_up(&sched->job_scheduled);
-	drm_sched_run_job_queue_if_ready(sched);
+	drm_sched_run_job_queue(sched);
 }
 
 /**

base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev11)
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
                   ` (7 preceding siblings ...)
  (?)
@ 2023-11-02 22:49 ` Patchwork
  -1 siblings, 0 replies; 62+ messages in thread
From: Patchwork @ 2023-11-02 22:49 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: intel-xe

== Series Details ==

Series: DRM scheduler changes for Xe (rev11)
URL   : https://patchwork.freedesktop.org/series/121744/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 58dfdb8dc drm/xe: Add Wa_14019821291
=== git am output follows ===
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:290
error: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1659
error: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4861
error: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/msm/adreno/adreno_device.c:841
error: drivers/gpu/drm/msm/adreno/adreno_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/scheduler/sched_main.c:439
error: drivers/gpu/drm/scheduler/sched_main.c: patch does not apply
error: patch failed: include/drm/gpu_scheduler.h:552
error: include/drm/gpu_scheduler.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/sched: Add drm_sched_wqueue_* helpers
Patch failed at 0001 drm/sched: Add drm_sched_wqueue_* helpers
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v8 3/5] drm/sched: Split free_job into own work item
  2023-11-02 11:13     ` Tvrtko Ursulin
@ 2023-11-02 22:58       ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-02 22:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Matthew Brost, dri-devel, intel-xe
  Cc: robdclark, thomas.hellstrom, sarah.walker, ltuikov,
	ketil.johnsen, lina, mcanal, Liviu.Dudau, boris.brezillon, dakr,
	donald.robson, christian.koenig, faith.ekstrand


On 2023-11-02 07:13, Tvrtko Ursulin wrote:
> 
> On 31/10/2023 03:24, Matthew Brost wrote:
>> Rather than call free_job and run_job in same work item have a dedicated
>> work item for each. This aligns with the design and intended use of work
>> queues.
>>
>> v2:
>>     - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
>>       timestamp in free_job() work item (Danilo)
>> v3:
>>    - Drop forward dec of drm_sched_select_entity (Boris)
>>    - Return in drm_sched_run_job_work if entity NULL (Boris)
>> v4:
>>    - Replace dequeue with peek and invert logic (Luben)
>>    - Wrap to 100 lines (Luben)
>>    - Update comments for *_queue / *_queue_if_ready functions (Luben)
>> v5:
>>    - Drop peek argument, blindly reinit idle (Luben)
>>    - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
>>    - Update work_run_job & work_free_job kernel doc (Luben)
>> v6:
>>    - Do not move drm_sched_select_entity in file (Luben)
>>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 146 +++++++++++++++++--------
>>   include/drm/gpu_scheduler.h            |   4 +-
>>   2 files changed, 101 insertions(+), 49 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index d1ae05bded15..3b1b2f8eafe8 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -265,6 +265,32 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>   		queue_work(sched->submit_wq, &sched->work_run_job);
>>   }
>>   
>> +/**
>> + * drm_sched_free_job_queue - enqueue free-job work
>> + * @sched: scheduler instance
>> + */
>> +static void drm_sched_free_job_queue(struct drm_gpu_scheduler *sched)
>> +{
>> +	if (!READ_ONCE(sched->pause_submit))
>> +		queue_work(sched->submit_wq, &sched->work_free_job);
>> +}
>> +
>> +/**
>> + * drm_sched_free_job_queue_if_done - enqueue free-job work if ready
>> + * @sched: scheduler instance
>> + */
>> +static void drm_sched_free_job_queue_if_done(struct drm_gpu_scheduler *sched)
>> +{
>> +	struct drm_sched_job *job;
>> +
>> +	spin_lock(&sched->job_list_lock);
>> +	job = list_first_entry_or_null(&sched->pending_list,
>> +				       struct drm_sched_job, list);
>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished))
>> +		drm_sched_free_job_queue(sched);
>> +	spin_unlock(&sched->job_list_lock);
>> +}
>> +
>>   /**
>>    * drm_sched_job_done - complete a job
>>    * @s_job: pointer to the job which is done
>> @@ -284,7 +310,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
>>   	dma_fence_get(&s_fence->finished);
>>   	drm_sched_fence_finished(s_fence, result);
>>   	dma_fence_put(&s_fence->finished);
>> -	drm_sched_run_job_queue(sched);
>> +	drm_sched_free_job_queue(sched);
>>   }
>>   
>>   /**
>> @@ -943,8 +969,10 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>   						typeof(*next), list);
>>   
>>   		if (next) {
>> -			next->s_fence->scheduled.timestamp =
>> -				dma_fence_timestamp(&job->s_fence->finished);
>> +			if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
>> +				     &next->s_fence->scheduled.flags))
>> +				next->s_fence->scheduled.timestamp =
>> +					dma_fence_timestamp(&job->s_fence->finished);
>>   			/* start TO timer for next job */
>>   			drm_sched_start_timeout(sched);
>>   		}
>> @@ -994,7 +1022,40 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>   EXPORT_SYMBOL(drm_sched_pick_best);
>>   
>>   /**
>> - * drm_sched_run_job_work - main scheduler thread
>> + * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
>> + * @sched: scheduler instance
>> + */
>> +static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
>> +{
>> +	if (drm_sched_select_entity(sched))
>> +		drm_sched_run_job_queue(sched);
>> +}
>> +
>> +/**
>> + * drm_sched_free_job_work - worker to call free_job
>> + *
>> + * @w: free job work
>> + */
>> +static void drm_sched_free_job_work(struct work_struct *w)
>> +{
>> +	struct drm_gpu_scheduler *sched =
>> +		container_of(w, struct drm_gpu_scheduler, work_free_job);
>> +	struct drm_sched_job *cleanup_job;
>> +
>> +	if (READ_ONCE(sched->pause_submit))
>> +		return;
>> +
>> +	cleanup_job = drm_sched_get_cleanup_job(sched);
>> +	if (cleanup_job) {
>> +		sched->ops->free_job(cleanup_job);
>> +
>> +		drm_sched_free_job_queue_if_done(sched);
>> +		drm_sched_run_job_queue_if_ready(sched);
> 
> Are finished jobs now disturbing the round robin selection?
> 
> Every time this cleans up a job we get:
> 
> drm_sched_run_job_queue_if_ready
>   -> drm_sched_select_entity
>       -> drm_sched_rq_select_entity_rr
>           -> rq->current_entity bumped to next in list
> 
> So when the job run worker does:
> 
> 	entity = drm_sched_select_entity(sched);
> 
> It does not pick the same entity as it would have before this patch? If so,
> perhaps drm_sched_run_job_queue_if_ready() needs a "peek" helper which does
> not modify any state.

Hi Tvrtko,

Thank you for reporting this. I've sent out a patch.
-- 
Regards,
Luben

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()
  2023-11-02 22:46       ` [Intel-xe] " Luben Tuikov
@ 2023-11-03 10:39         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 62+ messages in thread
From: Tvrtko Ursulin @ 2023-11-03 10:39 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: matthew.brost, robdclark, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	dakr, donald.robson, christian.koenig, faith.ekstrand


On 02/11/2023 22:46, Luben Tuikov wrote:
> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
> the former function uses drm_sched_select_entity() to determine if the
> scheduler had an entity ready in one of its run-queues, and in the case of
> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
> just that: it selects the _next_ ready entity, sets up the run-queue and
> completion, and returns that entity. The FIFO scheduling algorithm is
> unaffected.
> 
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
> the case of RR scheduling this results in calling select_entity() twice, which
> may skip a ready entity if more than one entity is ready. This commit fixes
> this by eliminating the if_ready() variant.

A Fixes: tag is missing, since the regression has already landed.

> 
> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>   1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 98b2ad54fc7071..05816e7cae8c8b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>   }
>   EXPORT_SYMBOL(drm_sched_pick_best);
>   
> -/**
> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> -{
> -	if (drm_sched_select_entity(sched))
> -		drm_sched_run_job_queue(sched);
> -}
> -
>   /**
>    * drm_sched_free_job_work - worker to call free_job
>    *
> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>   		sched->ops->free_job(cleanup_job);
>   
>   		drm_sched_free_job_queue_if_done(sched);
> -		drm_sched_run_job_queue_if_ready(sched);
> +		drm_sched_run_job_queue(sched);

It works, but it is a bit wasteful, causing needless CPU wake-ups with a
potentially empty queue, both here and in drm_sched_run_job_work() below.

What would be the problem in having a "peek" type helper? It would be easy
to do it in a single spin-lock section instead of dropping and re-acquiring
the lock.
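
For illustration, a sketch of the kind of helper I mean (hypothetical and
untested; the timestamp and TDR handling of drm_sched_get_cleanup_job() is
omitted for brevity):

	/*
	 * Hypothetical helper: dequeue a finished job and, in the same
	 * lock section, report whether the following job is also done.
	 */
	static struct drm_sched_job *
	drm_sched_get_cleanup_job_peek(struct drm_gpu_scheduler *sched,
				       bool *next_done)
	{
		struct drm_sched_job *job, *next;

		spin_lock(&sched->job_list_lock);
		job = list_first_entry_or_null(&sched->pending_list,
					       struct drm_sched_job, list);
		if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
			list_del_init(&job->list);
			next = list_first_entry_or_null(&sched->pending_list,
							struct drm_sched_job,
							list);
			*next_done = next &&
				dma_fence_is_signaled(&next->s_fence->finished);
		} else {
			job = NULL;
			*next_done = false;
		}
		spin_unlock(&sched->job_list_lock);

		return job;
	}

The free worker could then re-queue itself only when *next_done says so,
without taking job_list_lock a second time.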

What is even the point of having the re-queue here _inside_ the if
(cleanup_job) block? See
https://lists.freedesktop.org/archives/dri-devel/2023-November/429037.html.
Because of the lock drop and re-acquire I don't see that it makes sense to
make the potential re-queue depend on the existence of the current finished
job.
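
That is, something along these lines (sketch only, assuming the re-queue
checks are cheap enough to run unconditionally):

	/* Sketch: re-queue checks moved outside the if (job) block. */
	static void drm_sched_free_job_work(struct work_struct *w)
	{
		struct drm_gpu_scheduler *sched =
			container_of(w, struct drm_gpu_scheduler, work_free_job);
		struct drm_sched_job *job;

		if (READ_ONCE(sched->pause_submit))
			return;

		job = drm_sched_get_cleanup_job(sched);
		if (job)
			sched->ops->free_job(job);

		/* Re-check for more work whether or not a job was freed. */
		drm_sched_free_job_queue_if_done(sched);
		drm_sched_run_job_queue(sched);
	}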

Also, what is the point of doing the re-queue of the run-job work from the
free worker?

(I suppose re-queuing the _free_ worker itself is needed in the current 
design, albeit inefficient.)

Regards,

Tvrtko

>   	}
>   }
>   
> @@ -1127,7 +1117,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   	}
>   
>   	wake_up(&sched->job_scheduled);
> -	drm_sched_run_job_queue_if_ready(sched);
> +	drm_sched_run_job_queue(sched);
>   }
>   
>   /**
> 
> base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()
  2023-11-02 22:46       ` [Intel-xe] " Luben Tuikov
@ 2023-11-03 15:13         ` Matthew Brost
  -1 siblings, 0 replies; 62+ messages in thread
From: Matthew Brost @ 2023-11-03 15:13 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: robdclark, tvrtko.ursulin, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	dakr, donald.robson, christian.koenig, faith.ekstrand

On Thu, Nov 02, 2023 at 06:46:54PM -0400, Luben Tuikov wrote:
> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
> the former function uses drm_sched_select_entity() to determine if the
> scheduler had an entity ready in one of its run-queues, and in the case of the
> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
> just that, selects the _next_ entity which is ready, sets up the run-queue and
> completion and returns that entity. The FIFO scheduling algorithm is unaffected.
> 
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
> in the case of RR scheduling, that would result in calling select_entity()
> twice, which may result in skipping a ready entity if more than one entity is
> ready. This commit fixes this by eliminating the if_ready() variant.
> 

Ah, yes, I guess we both missed this. What about reviving the peek
argument [1]? This would avoid unnecessary re-queues.

Matt

[1] https://patchwork.freedesktop.org/patch/562222/?series=121744&rev=7

> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 98b2ad54fc7071..05816e7cae8c8b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>  }
>  EXPORT_SYMBOL(drm_sched_pick_best);
>  
> -/**
> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> -{
> -	if (drm_sched_select_entity(sched))
> -		drm_sched_run_job_queue(sched);
> -}
> -
>  /**
>   * drm_sched_free_job_work - worker to call free_job
>   *
> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>  		sched->ops->free_job(cleanup_job);
>  
>  		drm_sched_free_job_queue_if_done(sched);
> -		drm_sched_run_job_queue_if_ready(sched);
> +		drm_sched_run_job_queue(sched);
>  	}
>  }
>  
> @@ -1127,7 +1117,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>  	}
>  
>  	wake_up(&sched->job_scheduled);
> -	drm_sched_run_job_queue_if_ready(sched);
> +	drm_sched_run_job_queue(sched);
>  }
>  
>  /**
> 
> base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e
> -- 
> 2.42.1
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()
  2023-11-03 15:13         ` [Intel-xe] " Matthew Brost
@ 2023-11-04  0:24           ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-04  0:24 UTC (permalink / raw)
  To: Matthew Brost
  Cc: robdclark, tvrtko.ursulin, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	dakr, donald.robson, christian.koenig, faith.ekstrand


Hi Matt, :-)

On 2023-11-03 11:13, Matthew Brost wrote:
> On Thu, Nov 02, 2023 at 06:46:54PM -0400, Luben Tuikov wrote:
>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>> the former function uses drm_sched_select_entity() to determine if the
>> scheduler had an entity ready in one of its run-queues, and in the case of the
>> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
>> just that, selects the _next_ entity which is ready, sets up the run-queue and
>> completion and returns that entity. The FIFO scheduling algorithm is unaffected.
>>
>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>> in the case of RR scheduling, that would result in calling select_entity()
>> twice, which may result in skipping a ready entity if more than one entity is
>> ready. This commit fixes this by eliminating the if_ready() variant.
>>
> 
> Ah, yes I guess we both missed this. What about reviving the peek
> argument [1]? This would avoid unnecessary re-queues.

So, I really am not too fond of "peek-then-get-and-do" (scheduling) organizations,
because they don't scale. As we've seen in our case, the RR has a side effect,
as Tvrtko pointed out (thanks!), and in the future this
"peek-first, then go-again, to go"-type of organization would only prevent us
from doing more interesting things.

Also, with the GPU scheduler organization, mixing in the "peek", we just get
to carry it around through many a function, only to be used in a leaf function,
and exported way back up (because we don't know the rq at that level).

I'd much rather we just did "consume-until-empty", and if we have one last
empty check (or first), then that's not a breaker. (I mean, we have a
drm_sched_pick_best() which has time complexity O(n), and we execute it every time
we arm a job, so it's not that big of a deal.) Plus, it makes the code concise
and compact.
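
As a sketch of that "consume-until-empty" shape -- simplified, with the
job popping, fence handling and error paths elided -- the worker itself
does the one cheap empty check and unconditionally re-queues otherwise:

static void drm_sched_run_job_work(struct work_struct *w)
{
	struct drm_gpu_scheduler *sched =
		container_of(w, struct drm_gpu_scheduler, work_run_job);
	struct drm_sched_entity *entity;

	entity = drm_sched_select_entity(sched);
	if (!entity)
		return;		/* the single "empty" pass */

	/* ... pop a job from @entity, arm and run it ... */

	drm_sched_run_job_queue(sched);	/* consume until empty */
}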

Let me reconstitute the patch and I'll send it for your review.
-- 
Regards,
Luben

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-xe] [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()
  2023-11-03 10:39         ` [Intel-xe] " Tvrtko Ursulin
@ 2023-11-04  0:25           ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-04  0:25 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: robdclark, sarah.walker, ltuikov, ketil.johnsen, lina, mcanal,
	Liviu.Dudau, dri-devel, intel-xe, boris.brezillon, dakr,
	donald.robson, christian.koenig, faith.ekstrand


Hi Tvrtko,

On 2023-11-03 06:39, Tvrtko Ursulin wrote:
> 
> On 02/11/2023 22:46, Luben Tuikov wrote:
>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>> the former function uses drm_sched_select_entity() to determine if the
>> scheduler had an entity ready in one of its run-queues, and in the case of the
>> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
>> just that, selects the _next_ entity which is ready, sets up the run-queue and
>> completion and returns that entity. The FIFO scheduling algorithm is unaffected.
>>
>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>> in the case of RR scheduling, that would result in calling select_entity()
>> twice, which may result in skipping a ready entity if more than one entity is
>> ready. This commit fixes this by eliminating the if_ready() variant.
> 
> Fixes: is missing since the regression already landed.

Ah, yes, thank you for pointing that out. :-)
I'll add one.

> 
>>
>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>>   1 file changed, 2 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 98b2ad54fc7071..05816e7cae8c8b 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>   }
>>   EXPORT_SYMBOL(drm_sched_pick_best);
>>   
>> -/**
>> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
>> - * @sched: scheduler instance
>> - */
>> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
>> -{
>> -	if (drm_sched_select_entity(sched))
>> -		drm_sched_run_job_queue(sched);
>> -}
>> -
>>   /**
>>    * drm_sched_free_job_work - worker to call free_job
>>    *
>> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>>   		sched->ops->free_job(cleanup_job);
>>   
>>   		drm_sched_free_job_queue_if_done(sched);
>> -		drm_sched_run_job_queue_if_ready(sched);
>> +		drm_sched_run_job_queue(sched);
> 
> It works but is a bit wasteful, causing needless CPU wake ups with a 

I'd not worry about "waking up the CPU" as the CPU scheduler would most likely
put the wq on the same CPU by instruction cache locality.

> potentially empty queue, both here and in drm_sched_run_job_work below.

That's true, but if you were to look at the typical execution of
this code you'd see we get a string of function entries when the incoming queue
is non-empty, followed by one empty entry only to be taken off the CPU. So,
it really isn't a breaker.

So, there's a way to mitigate this in drm_sched_run_job_work(). I'll see that it
makes it in the next version of the patch.

Thanks!
-- 
Regards,
Luben

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()
  2023-11-04  0:25           ` Luben Tuikov
@ 2023-11-06 12:54             ` Tvrtko Ursulin
  -1 siblings, 0 replies; 62+ messages in thread
From: Tvrtko Ursulin @ 2023-11-06 12:54 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: matthew.brost, robdclark, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	dakr, donald.robson, christian.koenig, faith.ekstrand


On 04/11/2023 00:25, Luben Tuikov wrote:
> Hi Tvrtko,
> 
> On 2023-11-03 06:39, Tvrtko Ursulin wrote:
>>
>> On 02/11/2023 22:46, Luben Tuikov wrote:
>>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>>> the former function uses drm_sched_select_entity() to determine if the
>>> scheduler had an entity ready in one of its run-queues, and in the case of the
>>> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
>>> just that, selects the _next_ entity which is ready, sets up the run-queue and
>>> completion and returns that entity. The FIFO scheduling algorithm is unaffected.
>>>
>>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>>> in the case of RR scheduling, that would result in calling select_entity()
>>> twice, which may result in skipping a ready entity if more than one entity is
>>> ready. This commit fixes this by eliminating the if_ready() variant.
>>
>> Fixes: is missing since the regression already landed.
> 
> Ah, yes, thank you for pointing that out. :-)
> I'll add one.
> 
>>
>>>
>>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>>>    1 file changed, 2 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 98b2ad54fc7071..05816e7cae8c8b 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_pick_best);
>>>    
>>> -/**
>>> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
>>> - * @sched: scheduler instance
>>> - */
>>> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
>>> -{
>>> -	if (drm_sched_select_entity(sched))
>>> -		drm_sched_run_job_queue(sched);
>>> -}
>>> -
>>>    /**
>>>     * drm_sched_free_job_work - worker to call free_job
>>>     *
>>> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>>>    		sched->ops->free_job(cleanup_job);
>>>    
>>>    		drm_sched_free_job_queue_if_done(sched);
>>> -		drm_sched_run_job_queue_if_ready(sched);
>>> +		drm_sched_run_job_queue(sched);
>>
>> It works but is a bit wasteful, causing needless CPU wake ups with a
> 
> I'd not worry about "waking up the CPU" as the CPU scheduler would most likely
> put the wq on the same CPU by instruction cache locality.
> 
>> potentially empty queue, both here and in drm_sched_run_job_work below.
> 
> That's true, but if you were to look at the typical execution of
> this code you'd see we get a string of function entries when the incoming queue
> is non-empty, followed by one empty entry only to be taken off the CPU. So,
> it really isn't a breaker.
> 
> So, there's a way to mitigate this in drm_sched_run_job_work(). I'll see that it
> makes it in the next version of the patch.

Okay, I will be keeping an eye on that.

Separately, I might send a patch to do the "re-queue if more pending" in 
one atomic section. (Instead of re-acquiring the lock.)
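
For example -- a sketch only; the have_more out-parameter is hypothetical
at this point, and the timeout-work handling of the real
drm_sched_get_cleanup_job() is elided -- the fetch and the "more pending"
check could share one locked section:

static struct drm_sched_job *
drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched, bool *have_more)
{
	struct drm_sched_job *job, *next;

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
		list_del_init(&job->list);
		next = list_first_entry_or_null(&sched->pending_list,
						struct drm_sched_job, list);
		/* Tell the caller whether to re-queue the free worker. */
		*have_more = next &&
			     dma_fence_is_signaled(&next->s_fence->finished);
	} else {
		job = NULL;
		*have_more = false;
	}
	spin_unlock(&sched->job_list_lock);

	return job;
}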

And also as a heads-up, at some point in the next few months I will 
start looking at the latency and power effects of the "do just one and 
re-queue" conversion. In ChromeOS milli-Watts matter, and some things 
like media playback do have a lot of inter-engine dependencies, so 
keeping CPU C-state residency high might matter. It might matter for 
server media transcode stream-density workloads too, on both power and 
streams-per-socket metrics.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-02 11:13     ` Tvrtko Ursulin
@ 2023-11-07  4:10       ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-07  4:10 UTC (permalink / raw)
  To: tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ltuikov, Luben Tuikov,
	ketil.johnsen, lina, mcanal, Liviu.Dudau, dri-devel, intel-xe,
	boris.brezillon, dakr, donald.robson, christian.koenig,
	faith.ekstrand

Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
it do just that, schedule the work item for execution.

The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
to determine if the scheduler has an entity ready in one of its run-queues,
and in the case of the Round-Robin (RR) scheduling, the function
drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
which is ready, sets up the run-queue and completion and returns that
entity. The FIFO scheduling algorithm is unaffected.

Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
in the case of RR scheduling, that would result in drm_sched_select_entity()
having been called twice, which may result in skipping a ready entity if more
than one entity is ready. This commit fixes this by eliminating the call to
drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
in drm_sched_run_job_work().

v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
    Add fixes-tag. (Tvrtko)

Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
---
 drivers/gpu/drm/scheduler/sched_main.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 27843e37d9b769..cd0dc3f81d05f0 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -256,10 +256,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
 }
 
 /**
- * __drm_sched_run_job_queue - enqueue run-job work
+ * drm_sched_run_job_queue - enqueue run-job work
  * @sched: scheduler instance
  */
-static void __drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
+static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
 {
 	if (!READ_ONCE(sched->pause_submit))
 		queue_work(sched->submit_wq, &sched->work_run_job);
@@ -928,7 +928,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
 {
 	if (drm_sched_can_queue(sched))
-		__drm_sched_run_job_queue(sched);
+		drm_sched_run_job_queue(sched);
 }
 
 /**
@@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 }
 EXPORT_SYMBOL(drm_sched_pick_best);
 
-/**
- * drm_sched_run_job_queue - enqueue run-job work if there are ready entities
- * @sched: scheduler instance
- */
-static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
-{
-	if (drm_sched_select_entity(sched))
-		__drm_sched_run_job_queue(sched);
-}
-
 /**
  * drm_sched_free_job_work - worker to call free_job
  *

base-commit: 27d9620e9a9a6bc27a646b464b85860d91e21af3
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev12)
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
@ 2023-11-07  4:39 ` Patchwork
  -1 siblings, 0 replies; 62+ messages in thread
From: Patchwork @ 2023-11-07  4:39 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: intel-xe

== Series Details ==

Series: DRM scheduler changes for Xe (rev12)
URL   : https://patchwork.freedesktop.org/series/121744/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 5a3a6fdda drm/xe: Fix pagefault and access counter worker functions
=== git am output follows ===
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:290
error: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1659
error: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4861
error: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/msm/adreno/adreno_device.c:841
error: drivers/gpu/drm/msm/adreno/adreno_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/scheduler/sched_main.c:439
error: drivers/gpu/drm/scheduler/sched_main.c: patch does not apply
error: patch failed: include/drm/gpu_scheduler.h:552
error: include/drm/gpu_scheduler.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/sched: Add drm_sched_wqueue_* helpers
Patch failed at 0001 drm/sched: Add drm_sched_wqueue_* helpers
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-07  4:10       ` [Intel-xe] " Luben Tuikov
@ 2023-11-07 11:48         ` Matthew Brost
  -1 siblings, 0 replies; 62+ messages in thread
From: Matthew Brost @ 2023-11-07 11:48 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: robdclark, tvrtko.ursulin, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	dakr, donald.robson, christian.koenig, faith.ekstrand

On Mon, Nov 06, 2023 at 11:10:21PM -0500, Luben Tuikov wrote:
> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
> it do just that, schedule the work item for execution.
> 
> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
> to determine if the scheduler has an entity ready in one of its run-queues,
> and in the case of the Round-Robin (RR) scheduling, the function
> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
> which is ready, sets up the run-queue and completion and returns that
> entity. The FIFO scheduling algorithm is unaffected.
> 
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
> in the case of RR scheduling, that would result in drm_sched_select_entity()
> having been called twice, which may result in skipping a ready entity if more
> than one entity is ready. This commit fixes this by eliminating the call to
> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
> in drm_sched_run_job_work().
> 
> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>     Add fixes-tag. (Tvrtko)
> 
> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 16 +++-------------
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 27843e37d9b769..cd0dc3f81d05f0 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -256,10 +256,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>  }
>  
>  /**
> - * __drm_sched_run_job_queue - enqueue run-job work
> + * drm_sched_run_job_queue - enqueue run-job work
>   * @sched: scheduler instance
>   */
> -static void __drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>  {
>  	if (!READ_ONCE(sched->pause_submit))
>  		queue_work(sched->submit_wq, &sched->work_run_job);
> @@ -928,7 +928,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
>  void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
>  {
>  	if (drm_sched_can_queue(sched))
> -		__drm_sched_run_job_queue(sched);
> +		drm_sched_run_job_queue(sched);
>  }
>  
>  /**
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>  }
>  EXPORT_SYMBOL(drm_sched_pick_best);
>  
> -/**
> - * drm_sched_run_job_queue - enqueue run-job work if there are ready entities
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> -{
> -	if (drm_sched_select_entity(sched))
> -		__drm_sched_run_job_queue(sched);
> -}
> -
>  /**
>   * drm_sched_free_job_work - worker to call free_job
>   *
> 
> base-commit: 27d9620e9a9a6bc27a646b464b85860d91e21af3
> -- 
> 2.42.1
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-07  4:10       ` [Intel-xe] " Luben Tuikov
@ 2023-11-07 17:53         ` Danilo Krummrich
  -1 siblings, 0 replies; 62+ messages in thread
From: Danilo Krummrich @ 2023-11-07 17:53 UTC (permalink / raw)
  To: Luben Tuikov, tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	donald.robson, christian.koenig, faith.ekstrand

On 11/7/23 05:10, Luben Tuikov wrote:
> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
> it do just that, schedule the work item for execution.
> 
> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
> to determine if the scheduler has an entity ready in one of its run-queues,
> and in the case of the Round-Robin (RR) scheduling, the function
> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
> which is ready, sets up the run-queue and completion and returns that
> entity. The FIFO scheduling algorithm is unaffected.
> 
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
> in the case of RR scheduling, that would result in drm_sched_select_entity()
> having been called twice, which may result in skipping a ready entity if more
> than one entity is ready. This commit fixes this by eliminating the call to
> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
> in drm_sched_run_job_work().
> 
> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>      Add fixes-tag. (Tvrtko)
> 
> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")

Reviewed-by: Danilo Krummrich <dakr@redhat.com>

> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 16 +++-------------
>   1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 27843e37d9b769..cd0dc3f81d05f0 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -256,10 +256,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>   }
>   
>   /**
> - * __drm_sched_run_job_queue - enqueue run-job work
> + * drm_sched_run_job_queue - enqueue run-job work
>    * @sched: scheduler instance
>    */
> -static void __drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>   {
>   	if (!READ_ONCE(sched->pause_submit))
>   		queue_work(sched->submit_wq, &sched->work_run_job);
> @@ -928,7 +928,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
>   void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
>   {
>   	if (drm_sched_can_queue(sched))
> -		__drm_sched_run_job_queue(sched);
> +		drm_sched_run_job_queue(sched);
>   }
>   
>   /**
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>   }
>   EXPORT_SYMBOL(drm_sched_pick_best);
>   
> -/**
> - * drm_sched_run_job_queue - enqueue run-job work if there are ready entities
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> -{
> -	if (drm_sched_select_entity(sched))
> -		__drm_sched_run_job_queue(sched);
> -}
> -
>   /**
>    * drm_sched_free_job_work - worker to call free_job
>    *
> 
> base-commit: 27d9620e9a9a6bc27a646b464b85860d91e21af3


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-07  4:10       ` [Intel-xe] " Luben Tuikov
@ 2023-11-08  0:41         ` Danilo Krummrich
  -1 siblings, 0 replies; 62+ messages in thread
From: Danilo Krummrich @ 2023-11-08  0:41 UTC (permalink / raw)
  To: Luben Tuikov, tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	donald.robson, christian.koenig, faith.ekstrand

On 11/7/23 05:10, Luben Tuikov wrote:
> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
> it do just that, schedule the work item for execution.
> 
> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
> to determine if the scheduler has an entity ready in one of its run-queues,
> and in the case of the Round-Robin (RR) scheduling, the function
> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
> which is ready, sets up the run-queue and completion and returns that
> entity. The FIFO scheduling algorithm is unaffected.
> 
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
> in the case of RR scheduling, that would result in drm_sched_select_entity()
> having been called twice, which may result in skipping a ready entity if more
> than one entity is ready. This commit fixes this by eliminating the call to
> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
> in drm_sched_run_job_work().
> 
> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>      Add fixes-tag. (Tvrtko)
> 
> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 16 +++-------------
>   1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 27843e37d9b769..cd0dc3f81d05f0 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -256,10 +256,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>   }
>   
>   /**
> - * __drm_sched_run_job_queue - enqueue run-job work
> + * drm_sched_run_job_queue - enqueue run-job work
>    * @sched: scheduler instance
>    */
> -static void __drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>   {
>   	if (!READ_ONCE(sched->pause_submit))
>   		queue_work(sched->submit_wq, &sched->work_run_job);
> @@ -928,7 +928,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
>   void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
>   {
>   	if (drm_sched_can_queue(sched))
> -		__drm_sched_run_job_queue(sched);
> +		drm_sched_run_job_queue(sched);
>   }
>   
>   /**
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>   }
>   EXPORT_SYMBOL(drm_sched_pick_best);
>   
> -/**
> - * drm_sched_run_job_queue - enqueue run-job work if there are ready entities
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> -{
> -	if (drm_sched_select_entity(sched))

Hm, now that I'm rebasing my patch to implement dynamic job-flow control, I realize
that we probably need peek semantics here. If we do not select an entity here, we
also no longer check whether the corresponding job fits on the ring.

Alternatively, we simply can't do this check in drm_sched_wakeup(). The consequence
would be that we don't detect that we need to wait for credits to free up until the
run work is already executing and selects an entity.
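
For illustration only, a rough sketch of such peek semantics. The helper
drm_sched_peek_entity() is hypothetical (no posted patch adds it), and
drm_sched_can_queue() is the two-argument form from the flow-control series:

static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_entity *entity;

	/* Peek only: find a ready entity without updating
	 * rq->current_entity, so RR fairness is preserved.
	 */
	entity = drm_sched_peek_entity(sched);	/* hypothetical */

	/* Queue the run-job work only if the entity's next job also
	 * fits within the scheduler's credit limit.
	 */
	if (entity && drm_sched_can_queue(sched, entity))
		__drm_sched_run_job_queue(sched);
}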

- Danilo

> -		__drm_sched_run_job_queue(sched);
> -}
> -
>   /**
>    * drm_sched_free_job_work - worker to call free_job
>    *
> 
> base-commit: 27d9620e9a9a6bc27a646b464b85860d91e21af3


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-07 11:48         ` [Intel-xe] " Matthew Brost
@ 2023-11-08  3:28           ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-08  3:28 UTC (permalink / raw)
  To: Matthew Brost
  Cc: robdclark, tvrtko.ursulin, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	dakr, donald.robson, christian.koenig, faith.ekstrand


[-- Attachment #1.1.1: Type: text/plain, Size: 1558 bytes --]

On 2023-11-07 06:48, Matthew Brost wrote:
> On Mon, Nov 06, 2023 at 11:10:21PM -0500, Luben Tuikov wrote:
>> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
>> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
>> it do just that, schedule the work item for execution.
>>
>> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
>> to determine if the scheduler has an entity ready in one of its run-queues,
>> and in the case of the Round-Robin (RR) scheduling, the function
>> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
>> which is ready, sets up the run-queue and completion and returns that
>> entity. The FIFO scheduling algorithm is unaffected.
>>
>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>> in the case of RR scheduling, that would result in drm_sched_select_entity()
>> having been called twice, which may result in skipping a ready entity if more
>> than one entity is ready. This commit fixes this by eliminating the call to
>> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
>> in drm_sched_run_job_work().
>>
>> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>>     Add fixes-tag. (Tvrtko)
>>
>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
> 
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>

Thank you, sir!
-- 
Regards,
Luben

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 677 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-07 17:53         ` [Intel-xe] " Danilo Krummrich
@ 2023-11-08  3:29           ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-08  3:29 UTC (permalink / raw)
  To: Danilo Krummrich, tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ltuikov, ketil.johnsen,
	lina, mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	donald.robson, christian.koenig, faith.ekstrand


[-- Attachment #1.1.1: Type: text/plain, Size: 1533 bytes --]

On 2023-11-07 12:53, Danilo Krummrich wrote:
> On 11/7/23 05:10, Luben Tuikov wrote:
>> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
>> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
>> it do just that, schedule the work item for execution.
>>
>> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
>> to determine if the scheduler has an entity ready in one of its run-queues,
>> and in the case of the Round-Robin (RR) scheduling, the function
>> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
>> which is ready, sets up the run-queue and completion and returns that
>> entity. The FIFO scheduling algorithm is unaffected.
>>
>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>> in the case of RR scheduling, that would result in drm_sched_select_entity()
>> having been called twice, which may result in skipping a ready entity if more
>> than one entity is ready. This commit fixes this by eliminating the call to
>> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
>> in drm_sched_run_job_work().
>>
>> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>>      Add fixes-tag. (Tvrtko)
>>
>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
> 
> Reviewed-by: Danilo Krummrich <dakr@redhat.com>

Thank you, sir!
-- 
Regards,
Luben

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 677 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-08  0:41         ` [Intel-xe] " Danilo Krummrich
@ 2023-11-09  6:52           ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-09  6:52 UTC (permalink / raw)
  To: Danilo Krummrich, tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ketil.johnsen, lina,
	mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	donald.robson, christian.koenig, faith.ekstrand


[-- Attachment #1.1.1: Type: text/plain, Size: 4070 bytes --]

Hi,

On 2023-11-07 19:41, Danilo Krummrich wrote:
> On 11/7/23 05:10, Luben Tuikov wrote:
>> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
>> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
>> it do just that, schedule the work item for execution.
>>
>> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
>> to determine if the scheduler has an entity ready in one of its run-queues,
>> and in the case of the Round-Robin (RR) scheduling, the function
>> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
>> which is ready, sets up the run-queue and completion and returns that
>> entity. The FIFO scheduling algorithm is unaffected.
>>
>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>> in the case of RR scheduling, that would result in drm_sched_select_entity()
>> having been called twice, which may result in skipping a ready entity if more
>> than one entity is ready. This commit fixes this by eliminating the call to
>> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
>> in drm_sched_run_job_work().
>>
>> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>>      Add fixes-tag. (Tvrtko)
>>
>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 16 +++-------------
>>   1 file changed, 3 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 27843e37d9b769..cd0dc3f81d05f0 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -256,10 +256,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>>   }
>>   
>>   /**
>> - * __drm_sched_run_job_queue - enqueue run-job work
>> + * drm_sched_run_job_queue - enqueue run-job work
>>    * @sched: scheduler instance
>>    */
>> -static void __drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>> +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>   {
>>   	if (!READ_ONCE(sched->pause_submit))
>>   		queue_work(sched->submit_wq, &sched->work_run_job);
>> @@ -928,7 +928,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
>>   void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
>>   {
>>   	if (drm_sched_can_queue(sched))
>> -		__drm_sched_run_job_queue(sched);
>> +		drm_sched_run_job_queue(sched);
>>   }
>>   
>>   /**
>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>   }
>>   EXPORT_SYMBOL(drm_sched_pick_best);
>>   
>> -/**
>> - * drm_sched_run_job_queue - enqueue run-job work if there are ready entities
>> - * @sched: scheduler instance
>> - */
>> -static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>> -{
>> -	if (drm_sched_select_entity(sched))
> 
> Hm, now that I'm rebasing my patch to implement dynamic job-flow control, I realize
> that we probably need peek semantics here. If we do not select an entity here, we
> also no longer check whether the corresponding job fits on the ring.
> 
> Alternatively, we simply can't do this check in drm_sched_wakeup(). The consequence
> would be that we don't detect that we need to wait for credits to free up until the
> run work is already executing and selects an entity.

So I rebased v5 on top of the latest drm-misc-next, looked around, and found that
drm_sched_wakeup() is missing a drm_sched_entity_is_ready() check. It should look like the following:

void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
		      struct drm_sched_entity *entity)
{
	if (drm_sched_entity_is_ready(entity))
		if (drm_sched_can_queue(sched, entity))
			drm_sched_run_job_queue(sched);
}
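
Call sites then pass the entity along; the attached patch updates
drm_sched_entity_wakeup() and drm_sched_entity_push_job() accordingly:

	drm_sched_wakeup(entity->rq->sched, entity);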

See the attached patch. (I'm currently running the base-commit with the attached patch applied.)
-- 
Regards,
Luben

[-- Attachment #1.1.2: 0001-drm-sched-implement-dynamic-job-flow-control.patch --]
[-- Type: text/x-patch, Size: 26479 bytes --]

From 65b8b8be52e8c112d7350397cb54b4fb3470b008 Mon Sep 17 00:00:00 2001
From: Danilo Krummrich <dakr@redhat.com>
Date: Thu, 2 Nov 2023 01:10:34 +0100
Subject: [PATCH] drm/sched: implement dynamic job-flow control

Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.

This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.

However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring running dry.

In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.

v2: Check that the entity is ready before checking drm_sched_can_queue()
    in drm_sched_wakeup(). (Luben)

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102001038.5076-1-dakr@redhat.com
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
---
 Documentation/gpu/drm-mm.rst                  |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c  |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c         |   2 +-
 drivers/gpu/drm/lima/lima_device.c            |   2 +-
 drivers/gpu/drm/lima/lima_sched.c             |   2 +-
 drivers/gpu/drm/msm/msm_gem_submit.c          |   2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c       |   2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c       |   2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c       |   2 +-
 .../gpu/drm/scheduler/gpu_scheduler_trace.h   |   2 +-
 drivers/gpu/drm/scheduler/sched_entity.c      |   4 +-
 drivers/gpu/drm/scheduler/sched_main.c        | 171 ++++++++++++++----
 drivers/gpu/drm/v3d/v3d_gem.c                 |   2 +-
 include/drm/gpu_scheduler.h                   |  31 +++-
 15 files changed, 177 insertions(+), 57 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 602010cb6894c3..acc5901ac84088 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -552,6 +552,12 @@ Overview
 .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
    :doc: Overview
 
+Flow Control
+------------
+
+.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
+   :doc: Flow Control
+
 Scheduler Function References
 -----------------------------
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 1f357198533f3e..62bb7fc7448ad9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -115,7 +115,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (!entity)
 		return 0;
 
-	return drm_sched_job_init(&(*job)->base, entity, owner);
+	return drm_sched_job_init(&(*job)->base, entity, 1, owner);
 }
 
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 2416c526f9b067..3d0f8d182506e4 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -535,7 +535,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 	ret = drm_sched_job_init(&submit->sched_job,
 				 &ctx->sched_entity[args->pipe],
-				 submit->ctx);
+				 1, submit->ctx);
 	if (ret)
 		goto err_submit_put;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 9276756e1397d3..5105d290e72e2e 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1917,7 +1917,7 @@ static int etnaviv_gpu_rpm_suspend(struct device *dev)
 	u32 idle, mask;
 
 	/* If there are any jobs in the HW queue, we're not idle */
-	if (atomic_read(&gpu->sched.hw_rq_count))
+	if (atomic_read(&gpu->sched.credit_count))
 		return -EBUSY;
 
 	/* Check whether the hardware (except FE and MC) is idle */
diff --git a/drivers/gpu/drm/lima/lima_device.c b/drivers/gpu/drm/lima/lima_device.c
index 02cef0cea6572b..0bf7105c8748b4 100644
--- a/drivers/gpu/drm/lima/lima_device.c
+++ b/drivers/gpu/drm/lima/lima_device.c
@@ -514,7 +514,7 @@ int lima_device_suspend(struct device *dev)
 
 	/* check any task running */
 	for (i = 0; i < lima_pipe_num; i++) {
-		if (atomic_read(&ldev->pipe[i].base.hw_rq_count))
+		if (atomic_read(&ldev->pipe[i].base.credit_count))
 			return -EBUSY;
 	}
 
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index aa030e1f7cdaec..c3bf8cda84982c 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -123,7 +123,7 @@ int lima_sched_task_init(struct lima_sched_task *task,
 	for (i = 0; i < num_bos; i++)
 		drm_gem_object_get(&bos[i]->base.base);
 
-	err = drm_sched_job_init(&task->base, &context->base, vm);
+	err = drm_sched_job_init(&task->base, &context->base, 1, vm);
 	if (err) {
 		kfree(task->bos);
 		return err;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 99744de6c05a1b..c002cabe7b9c50 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -48,7 +48,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 		return ERR_PTR(ret);
 	}
 
-	ret = drm_sched_job_init(&submit->base, queue->entity, queue);
+	ret = drm_sched_job_init(&submit->base, queue->entity, 1, queue);
 	if (ret) {
 		kfree(submit->hw_fence);
 		kfree(submit);
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 7e64b5ef90fb2b..1b2cc3f2e1c7e8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -89,7 +89,7 @@ nouveau_job_init(struct nouveau_job *job,
 
 	}
 
-	ret = drm_sched_job_init(&job->base, &entity->base, NULL);
+	ret = drm_sched_job_init(&job->base, &entity->base, 1, NULL);
 	if (ret)
 		goto err_free_chains;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b834777b409b07..54d1c19bea84dd 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -274,7 +274,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 
 	ret = drm_sched_job_init(&job->base,
 				 &file_priv->sched_entity[slot],
-				 NULL);
+				 1, NULL);
 	if (ret)
 		goto out_put_job;
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6d89e24322dbf0..f9446e197428d0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -963,7 +963,7 @@ int panfrost_job_is_idle(struct panfrost_device *pfdev)
 
 	for (i = 0; i < NUM_JOB_SLOTS; i++) {
 		/* If there are any jobs in the HW queue, we're not idle */
-		if (atomic_read(&js->queue[i].sched.hw_rq_count))
+		if (atomic_read(&js->queue[i].sched.credit_count))
 			return false;
 	}
 
diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
index 3143ecaaff8628..f8ed093b7356eb 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
@@ -51,7 +51,7 @@ DECLARE_EVENT_CLASS(drm_sched_job,
 			   __assign_str(name, sched_job->sched->name);
 			   __entry->job_count = spsc_queue_count(&entity->job_queue);
 			   __entry->hw_job_count = atomic_read(
-				   &sched_job->sched->hw_rq_count);
+				   &sched_job->sched->credit_count);
 			   ),
 	    TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
 		      __entry->entity, __entry->id,
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index f1db63cc819812..4d42b1e4daa67f 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -370,7 +370,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 		container_of(cb, struct drm_sched_entity, cb);
 
 	drm_sched_entity_clear_dep(f, cb);
-	drm_sched_wakeup(entity->rq->sched);
+	drm_sched_wakeup(entity->rq->sched, entity);
 }
 
 /**
@@ -602,7 +602,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
 			drm_sched_rq_update_fifo(entity, submit_ts);
 
-		drm_sched_wakeup(entity->rq->sched);
+		drm_sched_wakeup(entity->rq->sched, entity);
 	}
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index cd0dc3f81d05f0..dbb0a0b64cad8c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -48,6 +48,30 @@
  * through the jobs entity pointer.
  */
 
+/**
+ * DOC: Flow Control
+ *
+ * The DRM GPU scheduler provides a flow control mechanism to regulate the rate
+ * in which the jobs fetched from scheduler entities are executed.
+ *
+ * In this context the &drm_gpu_scheduler keeps track of a driver specified
+ * credit limit representing the capacity of this scheduler and a credit count;
+ * every &drm_sched_job carries a driver specified number of credits.
+ *
+ * Once a job is executed (but not yet finished), the job's credits contribute
+ * to the scheduler's credit count until the job is finished. If by executing
+ * one more job the scheduler's credit count would exceed the scheduler's
+ * credit limit, the job won't be executed. Instead, the scheduler will wait
+ * until the credit count has decreased enough to not overflow its credit limit.
+ * This implies waiting for previously executed jobs.
+ *
+ * Optionally, drivers may register a callback (update_job_credits) provided by
+ * struct drm_sched_backend_ops to update the job's credits dynamically. The
+ * scheduler executes this callback every time the scheduler considers a job for
+ * execution and subsequently checks whether the job fits the scheduler's credit
+ * limit.
+ */
+
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/completion.h>
@@ -75,6 +99,46 @@ int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
 MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
 module_param_named(sched_policy, drm_sched_policy, int, 0444);
 
+static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
+{
+	u32 credits;
+
+	drm_WARN_ON(sched, check_sub_overflow(sched->credit_limit,
+					      atomic_read(&sched->credit_count),
+					      &credits));
+
+	return credits;
+}
+
+/**
+ * drm_sched_can_queue -- Can we queue more to the hardware?
+ * @sched: scheduler instance
+ * @entity: the scheduler entity
+ *
+ * Return true if we can push at least one more job from @entity, false
+ * otherwise.
+ */
+static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
+				struct drm_sched_entity *entity)
+{
+	struct drm_sched_job *s_job;
+
+	s_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+	if (!s_job)
+		return false;
+
+	if (sched->ops->update_job_credits) {
+		s_job->credits = sched->ops->update_job_credits(s_job);
+
+		drm_WARN(sched, !s_job->credits,
+			 "Jobs with zero credits bypass job-flow control\n");
+	}
+
+	drm_WARN_ON(sched, s_job->credits > sched->credit_limit);
+
+	return drm_sched_available_credits(sched) >= s_job->credits;
+}
+
 static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
 							    const struct rb_node *b)
 {
@@ -186,12 +250,18 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 /**
  * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
  *
+ * @sched: the gpu scheduler
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find the next ready entity.
+ *
+ * Return an entity if one is found; return an error-pointer (!NULL) if an
+ * entity was ready, but the scheduler had insufficient credits to accommodate
+ * its job; return NULL, if no ready entity was found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
+			      struct drm_sched_rq *rq)
 {
 	struct drm_sched_entity *entity;
 
@@ -201,6 +271,14 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 	if (entity) {
 		list_for_each_entry_continue(entity, &rq->entities, list) {
 			if (drm_sched_entity_is_ready(entity)) {
+				/* If we can't queue yet, preserve the current
+				 * entity in terms of fairness.
+				 */
+				if (!drm_sched_can_queue(sched, entity)) {
+					spin_unlock(&rq->lock);
+					return ERR_PTR(-ENOSPC);
+				}
+
 				rq->current_entity = entity;
 				reinit_completion(&entity->entity_idle);
 				spin_unlock(&rq->lock);
@@ -210,8 +288,15 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 	}
 
 	list_for_each_entry(entity, &rq->entities, list) {
-
 		if (drm_sched_entity_is_ready(entity)) {
+			/* If we can't queue yet, preserve the current entity in
+			 * terms of fairness.
+			 */
+			if (!drm_sched_can_queue(sched, entity)) {
+				spin_unlock(&rq->lock);
+				return ERR_PTR(-ENOSPC);
+			}
+
 			rq->current_entity = entity;
 			reinit_completion(&entity->entity_idle);
 			spin_unlock(&rq->lock);
@@ -230,12 +315,18 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 /**
  * drm_sched_rq_select_entity_fifo - Select an entity which provides a job to run
  *
+ * @sched: the gpu scheduler
  * @rq: scheduler run queue to check.
  *
- * Find oldest waiting ready entity, returns NULL if none found.
+ * Find oldest waiting ready entity.
+ *
+ * Return an entity if one is found; return an error-pointer (!NULL) if an
+ * entity was ready, but the scheduler had insufficient credits to accommodate
+ * its job; return NULL, if no ready entity was found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
+				struct drm_sched_rq *rq)
 {
 	struct rb_node *rb;
 
@@ -245,6 +336,14 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
 
 		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
 		if (drm_sched_entity_is_ready(entity)) {
+			/* If we can't queue yet, preserve the current entity in
+			 * terms of fairness.
+			 */
+			if (!drm_sched_can_queue(sched, entity)) {
+				spin_unlock(&rq->lock);
+				return ERR_PTR(-ENOSPC);
+			}
+
 			rq->current_entity = entity;
 			reinit_completion(&entity->entity_idle);
 			break;
@@ -302,7 +401,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
 	struct drm_sched_fence *s_fence = s_job->s_fence;
 	struct drm_gpu_scheduler *sched = s_fence->sched;
 
-	atomic_dec(&sched->hw_rq_count);
+	atomic_sub(s_job->credits, &sched->credit_count);
 	atomic_dec(sched->score);
 
 	trace_drm_sched_process_job(s_fence);
@@ -525,7 +624,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 					      &s_job->cb)) {
 			dma_fence_put(s_job->s_fence->parent);
 			s_job->s_fence->parent = NULL;
-			atomic_dec(&sched->hw_rq_count);
+			atomic_sub(s_job->credits, &sched->credit_count);
 		} else {
 			/*
 			 * remove job from pending_list.
@@ -586,7 +685,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 	list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
 		struct dma_fence *fence = s_job->s_fence->parent;
 
-		atomic_inc(&sched->hw_rq_count);
+		atomic_add(s_job->credits, &sched->credit_count);
 
 		if (!full_recovery)
 			continue;
@@ -667,6 +766,8 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * drm_sched_job_init - init a scheduler job
  * @job: scheduler job to init
  * @entity: scheduler entity to use
+ * @credits: the number of credits this job contributes to the scheduler's
+ * credit limit
  * @owner: job owner for debugging
  *
  * Refer to drm_sched_entity_push_job() documentation
@@ -684,7 +785,7 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  */
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
-		       void *owner)
+		       u32 credits, void *owner)
 {
 	if (!entity->rq) {
 		/* This will most likely be followed by missing frames
@@ -701,6 +802,10 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&job->list);
+	job->credits = credits;
+
+	drm_WARN(job->sched, !credits,
+		 "Jobs with zero credits bypass job-flow control\n");
 
 	xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
 
@@ -908,27 +1013,18 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
 EXPORT_SYMBOL(drm_sched_job_cleanup);
 
 /**
- * drm_sched_can_queue -- Can we queue more to the hardware?
- * @sched: scheduler instance
- *
- * Return true if we can push more jobs to the hw, otherwise false.
- */
-static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
-{
-	return atomic_read(&sched->hw_rq_count) <
-		sched->hw_submission_limit;
-}
-
-/**
- * drm_sched_wakeup - Wake up the scheduler if it is ready to queue
+ * drm_sched_wakeup - Wake up the scheduler
  * @sched: scheduler instance
+ * @entity: the scheduler entity
  *
  * Wake up the scheduler if we can queue jobs.
  */
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
+		      struct drm_sched_entity *entity)
 {
-	if (drm_sched_can_queue(sched))
-		drm_sched_run_job_queue(sched);
+	if (drm_sched_entity_is_ready(entity))
+		if (drm_sched_can_queue(sched, entity))
+			drm_sched_run_job_queue(sched);
 }
 
 /**
@@ -936,7 +1032,11 @@ void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
  *
  * @sched: scheduler instance
  *
- * Returns the entity to process or NULL if none are found.
+ * Return an entity to process or NULL if none are found.
+ *
+ * Note that we break out of the for-loop when "entity" is non-null, which can
+ * also be an error-pointer--this ensures we don't process lower priority
+ * run-queues. See comments in the respectively called functions.
  */
 static struct drm_sched_entity *
 drm_sched_select_entity(struct drm_gpu_scheduler *sched)
@@ -944,19 +1044,16 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 	struct drm_sched_entity *entity;
 	int i;
 
-	if (!drm_sched_can_queue(sched))
-		return NULL;
-
 	/* Kernel run queue has higher priority than normal run queue*/
 	for (i = sched->num_rqs - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
 		entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
-			drm_sched_rq_select_entity_fifo(sched->sched_rq[i]) :
-			drm_sched_rq_select_entity_rr(sched->sched_rq[i]);
+			drm_sched_rq_select_entity_fifo(sched, sched->sched_rq[i]) :
+			drm_sched_rq_select_entity_rr(sched, sched->sched_rq[i]);
 		if (entity)
 			break;
 	}
 
-	return entity;
+	return IS_ERR(entity) ? NULL : entity;
 }
 
 /**
@@ -1092,7 +1189,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
 
 	s_fence = sched_job->s_fence;
 
-	atomic_inc(&sched->hw_rq_count);
+	atomic_add(sched_job->credits, &sched->credit_count);
 	drm_sched_job_begin(sched_job);
 
 	trace_drm_run_job(sched_job, entity);
@@ -1127,7 +1224,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
  * @submit_wq: workqueue to use for submission. If NULL, an ordered wq is
  *	       allocated and used
  * @num_rqs: number of runqueues, one for each priority, up to DRM_SCHED_PRIORITY_COUNT
- * @hw_submission: number of hw submissions that can be in flight
+ * @credit_limit: the number of credits this scheduler can hold from all jobs
  * @hang_limit: number of times to allow a job to hang before dropping it
  * @timeout: timeout value in jiffies for the scheduler
  * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is
@@ -1141,14 +1238,14 @@ static void drm_sched_run_job_work(struct work_struct *w)
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
 		   struct workqueue_struct *submit_wq,
-		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
+		   u32 num_rqs, u32 credit_limit, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev)
 {
 	int i, ret;
 
 	sched->ops = ops;
-	sched->hw_submission_limit = hw_submission;
+	sched->credit_limit = credit_limit;
 	sched->name = name;
 	sched->timeout = timeout;
 	sched->timeout_wq = timeout_wq ? : system_wq;
@@ -1197,7 +1294,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 	init_waitqueue_head(&sched->job_scheduled);
 	INIT_LIST_HEAD(&sched->pending_list);
 	spin_lock_init(&sched->job_list_lock);
-	atomic_set(&sched->hw_rq_count, 0);
+	atomic_set(&sched->credit_count, 0);
 	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
 	INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
 	INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 712675134c048d..9d2ac23c29e33e 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -418,7 +418,7 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	job->file = file_priv;
 
 	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
-				 v3d_priv);
+				 1, v3d_priv);
 	if (ret)
 		goto fail;
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 754fd2217334e5..783872d29a6d71 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -321,6 +321,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * @sched: the scheduler instance on which this job is scheduled.
  * @s_fence: contains the fences for the scheduling of job.
  * @finish_cb: the callback for the finished fence.
+ * @credits: the number of credits this job contributes to the scheduler
  * @work: Helper to reschdeule job kill to different context.
  * @id: a unique id assigned to each job scheduled on the scheduler.
  * @karma: increment on every hang caused by this job. If this exceeds the hang
@@ -340,6 +341,8 @@ struct drm_sched_job {
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
 
+	u32				credits;
+
 	/*
 	 * work is used only after finish_cb has been used and will not be
 	 * accessed anymore.
@@ -463,13 +466,27 @@ struct drm_sched_backend_ops {
          * and it's time to clean it up.
 	 */
 	void (*free_job)(struct drm_sched_job *sched_job);
+
+	/**
+	 * @update_job_credits: Called when the scheduler is considering this
+	 * job for execution.
+	 *
+	 * This callback returns the number of credits the job would take if
+	 * pushed to the hardware. Drivers may use this to dynamically update
+	 * the job's credit count. For instance, deduct the number of credits
+	 * for already signalled native fences.
+	 *
+	 * This callback is optional.
+	 */
+	u32 (*update_job_credits)(struct drm_sched_job *sched_job);
 };
 
 /**
  * struct drm_gpu_scheduler - scheduler instance-specific data
  *
  * @ops: backend operations provided by the driver.
- * @hw_submission_limit: the max size of the hardware queue.
+ * @credit_limit: the credit limit of this scheduler
+ * @credit_count: the current credit count of this scheduler
  * @timeout: the time after which a job is removed from the scheduler.
  * @name: name of the ring for which this scheduler is being used.
  * @num_rqs: Number of run-queues. This is at most DRM_SCHED_PRIORITY_COUNT,
@@ -478,7 +495,6 @@ struct drm_sched_backend_ops {
  * @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
  *                 waits on this wait queue until all the scheduled jobs are
  *                 finished.
- * @hw_rq_count: the number of jobs currently in the hardware queue.
  * @job_id_count: used to assign unique id to the each job.
  * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
  * @timeout_wq: workqueue used to queue @work_tdr
@@ -502,13 +518,13 @@ struct drm_sched_backend_ops {
  */
 struct drm_gpu_scheduler {
 	const struct drm_sched_backend_ops	*ops;
-	uint32_t			hw_submission_limit;
+	u32				credit_limit;
+	atomic_t			credit_count;
 	long				timeout;
 	const char			*name;
 	u32                             num_rqs;
 	struct drm_sched_rq             **sched_rq;
 	wait_queue_head_t		job_scheduled;
-	atomic_t			hw_rq_count;
 	atomic64_t			job_id_count;
 	struct workqueue_struct		*submit_wq;
 	struct workqueue_struct		*timeout_wq;
@@ -530,14 +546,14 @@ struct drm_gpu_scheduler {
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
 		   struct workqueue_struct *submit_wq,
-		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
+		   u32 num_rqs, u32 credit_limit, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev);
 
 void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
-		       void *owner);
+		       u32 credits, void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
 int drm_sched_job_add_dependency(struct drm_sched_job *job,
 				 struct dma_fence *fence);
@@ -559,7 +575,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 
 void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
+		      struct drm_sched_entity *entity);
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
 void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
 void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);

base-commit: 8d88e4cdce4f5c56de55174a4d32ea9c06f7fa66
-- 
2.42.1


[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 677 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
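
For reference, a minimal driver-side sketch of the credit interface from the
attached patch. All my_* names and the dword-based accounting are hypothetical;
a driver may pick any unit, as long as credit_limit and the per-job credits
agree on it:

/* Hypothetical driver code illustrating the flow-control API only. */
static u32 my_update_job_credits(struct drm_sched_job *sched_job)
{
	struct my_job *job = container_of(sched_job, struct my_job, base);

	/* Recompute the ring space this job still needs; a return value
	 * of 0 would bypass flow control, so clamp to at least 1.
	 */
	return max_t(u32, job->ring_dwords, 1);
}

static const struct drm_sched_backend_ops my_sched_ops = {
	.run_job		= my_run_job,
	.timedout_job		= my_timedout_job,
	.free_job		= my_free_job,
	.update_job_credits	= my_update_job_credits,	/* optional */
};

/* credit_limit replaces hw_submission: here, the ring size in dwords. */
ret = drm_sched_init(&sched, &my_sched_ops, NULL,
		     DRM_SCHED_PRIORITY_COUNT, MY_RING_SIZE_DWORDS,
		     hang_limit, timeout, NULL, NULL, "my-ring", dev);

/* Each job declares its credit count at init time. */
ret = drm_sched_job_init(&job->base, &entity, job->ring_dwords, NULL);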

^ permalink raw reply related	[flat|nested] 62+ messages in thread

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index cd0dc3f81d05f0..dbb0a0b64cad8c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -48,6 +48,30 @@
  * through the jobs entity pointer.
  */
 
+/**
+ * DOC: Flow Control
+ *
+ * The DRM GPU scheduler provides a flow control mechanism to regulate the rate
+ * at which the jobs fetched from scheduler entities are executed.
+ *
+ * In this context the &drm_gpu_scheduler keeps track of a driver-specified
+ * credit limit representing the capacity of this scheduler and a credit count;
+ * every &drm_sched_job carries a driver-specified number of credits.
+ *
+ * Once a job is executed (but not yet finished), the job's credits contribute
+ * to the scheduler's credit count until the job is finished. If by executing
+ * one more job the scheduler's credit count would exceed the scheduler's
+ * credit limit, the job won't be executed. Instead, the scheduler will wait
+ * until the credit count has decreased enough to not overflow its credit limit.
+ * This implies waiting for previously executed jobs.
+ *
+ * Optionally, drivers may register a callback (update_job_credits) provided by
+ * struct drm_sched_backend_ops to update the job's credits dynamically. The
+ * scheduler executes this callback every time it considers a job for execution
+ * and subsequently checks whether the job fits the scheduler's credit
+ * limit.
+ */
+
 #include <linux/wait.h>
 #include <linux/sched.h>
 #include <linux/completion.h>
@@ -75,6 +99,46 @@ int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
 MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " __stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
 module_param_named(sched_policy, drm_sched_policy, int, 0444);
 
+static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
+{
+	u32 credits;
+
+	drm_WARN_ON(sched, check_sub_overflow(sched->credit_limit,
+					      atomic_read(&sched->credit_count),
+					      &credits));
+
+	return credits;
+}
+
+/**
+ * drm_sched_can_queue - Can we queue more to the hardware?
+ * @sched: scheduler instance
+ * @entity: the scheduler entity
+ *
+ * Return true if we can push at least one more job from @entity, false
+ * otherwise.
+ */
+static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
+				struct drm_sched_entity *entity)
+{
+	struct drm_sched_job *s_job;
+
+	s_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+	if (!s_job)
+		return false;
+
+	if (sched->ops->update_job_credits) {
+		s_job->credits = sched->ops->update_job_credits(s_job);
+
+		drm_WARN(sched, !s_job->credits,
+			 "Jobs with zero credits bypass job-flow control\n");
+	}
+
+	drm_WARN_ON(sched, s_job->credits > sched->credit_limit);
+
+	return drm_sched_available_credits(sched) >= s_job->credits;
+}
+
 static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
 							    const struct rb_node *b)
 {
@@ -186,12 +250,18 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 /**
  * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
  *
+ * @sched: the gpu scheduler
  * @rq: scheduler run queue to check.
  *
- * Try to find a ready entity, returns NULL if none found.
+ * Try to find the next ready entity.
+ *
+ * Return an entity if one is found; return an error pointer (non-NULL) if an
+ * entity was ready, but the scheduler had insufficient credits to accommodate
+ * its job; return NULL if no ready entity was found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_rr(struct drm_gpu_scheduler *sched,
+			      struct drm_sched_rq *rq)
 {
 	struct drm_sched_entity *entity;
 
@@ -201,6 +271,14 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 	if (entity) {
 		list_for_each_entry_continue(entity, &rq->entities, list) {
 			if (drm_sched_entity_is_ready(entity)) {
+				/* If we can't queue yet, preserve the current
+				 * entity in terms of fairness.
+				 */
+				if (!drm_sched_can_queue(sched, entity)) {
+					spin_unlock(&rq->lock);
+					return ERR_PTR(-ENOSPC);
+				}
+
 				rq->current_entity = entity;
 				reinit_completion(&entity->entity_idle);
 				spin_unlock(&rq->lock);
@@ -210,8 +288,15 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 	}
 
 	list_for_each_entry(entity, &rq->entities, list) {
-
 		if (drm_sched_entity_is_ready(entity)) {
+			/* If we can't queue yet, preserve the current entity in
+			 * terms of fairness.
+			 */
+			if (!drm_sched_can_queue(sched, entity)) {
+				spin_unlock(&rq->lock);
+				return ERR_PTR(-ENOSPC);
+			}
+
 			rq->current_entity = entity;
 			reinit_completion(&entity->entity_idle);
 			spin_unlock(&rq->lock);
@@ -230,12 +315,18 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
 /**
  * drm_sched_rq_select_entity_fifo - Select an entity which provides a job to run
  *
+ * @sched: the gpu scheduler
  * @rq: scheduler run queue to check.
  *
- * Find oldest waiting ready entity, returns NULL if none found.
+ * Find oldest waiting ready entity.
+ *
+ * Return an entity if one is found; return an error pointer (non-NULL) if an
+ * entity was ready, but the scheduler had insufficient credits to accommodate
+ * its job; return NULL if no ready entity was found.
  */
 static struct drm_sched_entity *
-drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
+drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
+				struct drm_sched_rq *rq)
 {
 	struct rb_node *rb;
 
@@ -245,6 +336,14 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
 
 		entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
 		if (drm_sched_entity_is_ready(entity)) {
+			/* If we can't queue yet, preserve the current entity in
+			 * terms of fairness.
+			 */
+			if (!drm_sched_can_queue(sched, entity)) {
+				spin_unlock(&rq->lock);
+				return ERR_PTR(-ENOSPC);
+			}
+
 			rq->current_entity = entity;
 			reinit_completion(&entity->entity_idle);
 			break;
@@ -302,7 +401,7 @@ static void drm_sched_job_done(struct drm_sched_job *s_job, int result)
 	struct drm_sched_fence *s_fence = s_job->s_fence;
 	struct drm_gpu_scheduler *sched = s_fence->sched;
 
-	atomic_dec(&sched->hw_rq_count);
+	atomic_sub(s_job->credits, &sched->credit_count);
 	atomic_dec(sched->score);
 
 	trace_drm_sched_process_job(s_fence);
@@ -525,7 +624,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 					      &s_job->cb)) {
 			dma_fence_put(s_job->s_fence->parent);
 			s_job->s_fence->parent = NULL;
-			atomic_dec(&sched->hw_rq_count);
+			atomic_sub(s_job->credits, &sched->credit_count);
 		} else {
 			/*
 			 * remove job from pending_list.
@@ -586,7 +685,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 	list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
 		struct dma_fence *fence = s_job->s_fence->parent;
 
-		atomic_inc(&sched->hw_rq_count);
+		atomic_add(s_job->credits, &sched->credit_count);
 
 		if (!full_recovery)
 			continue;
@@ -667,6 +766,8 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * drm_sched_job_init - init a scheduler job
  * @job: scheduler job to init
  * @entity: scheduler entity to use
+ * @credits: the number of credits this job contributes to the scheduler's
+ * credit limit
  * @owner: job owner for debugging
  *
  * Refer to drm_sched_entity_push_job() documentation
@@ -684,7 +785,7 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  */
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
-		       void *owner)
+		       u32 credits, void *owner)
 {
 	if (!entity->rq) {
 		/* This will most likely be followed by missing frames
@@ -701,6 +802,10 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&job->list);
+	job->credits = credits;
+
+	drm_WARN(job->sched, !credits,
+		 "Jobs with zero credits bypass job-flow control\n");
 
 	xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
 
@@ -908,27 +1013,18 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
 EXPORT_SYMBOL(drm_sched_job_cleanup);
 
 /**
- * drm_sched_can_queue -- Can we queue more to the hardware?
- * @sched: scheduler instance
- *
- * Return true if we can push more jobs to the hw, otherwise false.
- */
-static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
-{
-	return atomic_read(&sched->hw_rq_count) <
-		sched->hw_submission_limit;
-}
-
-/**
- * drm_sched_wakeup - Wake up the scheduler if it is ready to queue
+ * drm_sched_wakeup - Wake up the scheduler
  * @sched: scheduler instance
+ * @entity: the scheduler entity
  *
  * Wake up the scheduler if we can queue jobs.
  */
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
+		      struct drm_sched_entity *entity)
 {
-	if (drm_sched_can_queue(sched))
-		drm_sched_run_job_queue(sched);
+	if (drm_sched_entity_is_ready(entity))
+		if (drm_sched_can_queue(sched, entity))
+			drm_sched_run_job_queue(sched);
 }
 
 /**
@@ -936,7 +1032,11 @@ void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
  *
  * @sched: scheduler instance
  *
- * Returns the entity to process or NULL if none are found.
+ * Return an entity to process or NULL if none are found.
+ *
+ * Note that we break out of the for-loop when "entity" is non-NULL, which can
+ * also be an error pointer; this ensures we don't process lower-priority
+ * run-queues. See comments in the respective selection functions.
  */
 static struct drm_sched_entity *
 drm_sched_select_entity(struct drm_gpu_scheduler *sched)
@@ -944,19 +1044,16 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
 	struct drm_sched_entity *entity;
 	int i;
 
-	if (!drm_sched_can_queue(sched))
-		return NULL;
-
 	/* Kernel run queue has higher priority than normal run queue*/
 	for (i = sched->num_rqs - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
 		entity = drm_sched_policy == DRM_SCHED_POLICY_FIFO ?
-			drm_sched_rq_select_entity_fifo(sched->sched_rq[i]) :
-			drm_sched_rq_select_entity_rr(sched->sched_rq[i]);
+			drm_sched_rq_select_entity_fifo(sched, sched->sched_rq[i]) :
+			drm_sched_rq_select_entity_rr(sched, sched->sched_rq[i]);
 		if (entity)
 			break;
 	}
 
-	return entity;
+	return IS_ERR(entity) ? NULL : entity;
 }
 
 /**
@@ -1092,7 +1189,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
 
 	s_fence = sched_job->s_fence;
 
-	atomic_inc(&sched->hw_rq_count);
+	atomic_add(sched_job->credits, &sched->credit_count);
 	drm_sched_job_begin(sched_job);
 
 	trace_drm_run_job(sched_job, entity);
@@ -1127,7 +1224,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
  * @submit_wq: workqueue to use for submission. If NULL, an ordered wq is
  *	       allocated and used
  * @num_rqs: number of runqueues, one for each priority, up to DRM_SCHED_PRIORITY_COUNT
- * @hw_submission: number of hw submissions that can be in flight
+ * @credit_limit: the number of credits this scheduler can hold from all jobs
  * @hang_limit: number of times to allow a job to hang before dropping it
  * @timeout: timeout value in jiffies for the scheduler
  * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is
@@ -1141,14 +1238,14 @@ static void drm_sched_run_job_work(struct work_struct *w)
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
 		   struct workqueue_struct *submit_wq,
-		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
+		   u32 num_rqs, u32 credit_limit, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev)
 {
 	int i, ret;
 
 	sched->ops = ops;
-	sched->hw_submission_limit = hw_submission;
+	sched->credit_limit = credit_limit;
 	sched->name = name;
 	sched->timeout = timeout;
 	sched->timeout_wq = timeout_wq ? : system_wq;
@@ -1197,7 +1294,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 	init_waitqueue_head(&sched->job_scheduled);
 	INIT_LIST_HEAD(&sched->pending_list);
 	spin_lock_init(&sched->job_list_lock);
-	atomic_set(&sched->hw_rq_count, 0);
+	atomic_set(&sched->credit_count, 0);
 	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
 	INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
 	INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 712675134c048d..9d2ac23c29e33e 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -418,7 +418,7 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	job->file = file_priv;
 
 	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
-				 v3d_priv);
+				 1, v3d_priv);
 	if (ret)
 		goto fail;
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 754fd2217334e5..783872d29a6d71 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -321,6 +321,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * @sched: the scheduler instance on which this job is scheduled.
  * @s_fence: contains the fences for the scheduling of job.
  * @finish_cb: the callback for the finished fence.
+ * @credits: the number of credits this job contributes to the scheduler
  * @work: Helper to reschdeule job kill to different context.
  * @id: a unique id assigned to each job scheduled on the scheduler.
  * @karma: increment on every hang caused by this job. If this exceeds the hang
@@ -340,6 +341,8 @@ struct drm_sched_job {
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
 
+	u32				credits;
+
 	/*
 	 * work is used only after finish_cb has been used and will not be
 	 * accessed anymore.
@@ -463,13 +466,27 @@ struct drm_sched_backend_ops {
          * and it's time to clean it up.
 	 */
 	void (*free_job)(struct drm_sched_job *sched_job);
+
+	/**
+	 * @update_job_credits: Called when the scheduler is considering this
+	 * job for execution.
+	 *
+	 * This callback returns the number of credits the job would take if
+	 * pushed to the hardware. Drivers may use this to dynamically update
+	 * the job's credit count, for instance to deduct the number of
+	 * credits for already signalled native fences.
+	 *
+	 * This callback is optional.
+	 */
+	u32 (*update_job_credits)(struct drm_sched_job *sched_job);
 };
 
 /**
  * struct drm_gpu_scheduler - scheduler instance-specific data
  *
  * @ops: backend operations provided by the driver.
- * @hw_submission_limit: the max size of the hardware queue.
+ * @credit_limit: the credit limit of this scheduler
+ * @credit_count: the current credit count of this scheduler
  * @timeout: the time after which a job is removed from the scheduler.
  * @name: name of the ring for which this scheduler is being used.
  * @num_rqs: Number of run-queues. This is at most DRM_SCHED_PRIORITY_COUNT,
@@ -478,7 +495,6 @@ struct drm_sched_backend_ops {
  * @job_scheduled: once @drm_sched_entity_do_release is called the scheduler
  *                 waits on this wait queue until all the scheduled jobs are
  *                 finished.
- * @hw_rq_count: the number of jobs currently in the hardware queue.
  * @job_id_count: used to assign unique id to the each job.
  * @submit_wq: workqueue used to queue @work_run_job and @work_free_job
  * @timeout_wq: workqueue used to queue @work_tdr
@@ -502,13 +518,13 @@ struct drm_sched_backend_ops {
  */
 struct drm_gpu_scheduler {
 	const struct drm_sched_backend_ops	*ops;
-	uint32_t			hw_submission_limit;
+	u32				credit_limit;
+	atomic_t			credit_count;
 	long				timeout;
 	const char			*name;
 	u32                             num_rqs;
 	struct drm_sched_rq             **sched_rq;
 	wait_queue_head_t		job_scheduled;
-	atomic_t			hw_rq_count;
 	atomic64_t			job_id_count;
 	struct workqueue_struct		*submit_wq;
 	struct workqueue_struct		*timeout_wq;
@@ -530,14 +546,14 @@ struct drm_gpu_scheduler {
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
 		   struct workqueue_struct *submit_wq,
-		   u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
+		   u32 num_rqs, u32 credit_limit, unsigned int hang_limit,
 		   long timeout, struct workqueue_struct *timeout_wq,
 		   atomic_t *score, const char *name, struct device *dev);
 
 void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
-		       void *owner);
+		       u32 credits, void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
 int drm_sched_job_add_dependency(struct drm_sched_job *job,
 				 struct dma_fence *fence);
@@ -559,7 +575,8 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 
 void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
+		      struct drm_sched_entity *entity);
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
 void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
 void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);

base-commit: 8d88e4cdce4f5c56de55174a4d32ea9c06f7fa66
-- 
2.42.1
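
To make the credit semantics concrete: with a credit limit of 8, two queued
3-credit jobs can be in flight together (6 <= 8), while a third 3-credit job
waits (9 > 8) until one of the first two finishes. Below is a minimal sketch
of how a driver might plug into the new interface; the foo_* names, the
ring-dword costing, and FOO_RING_DWORDS/FOO_HANG_LIMIT are hypothetical, and
foo_run_job(), foo_timedout_job() and foo_free_job() stand in for a driver's
existing callbacks:

#include <linux/container_of.h>
#include <linux/jiffies.h>
#include <drm/gpu_scheduler.h>

struct foo_device {
	struct drm_gpu_scheduler sched;
	struct device *dev;
};

struct foo_job {
	struct drm_sched_job base;
	u32 ring_dwords;	/* cost of this job, in ring dwords */
};

/* Optional: re-evaluated every time the scheduler considers the job. */
static u32 foo_update_job_credits(struct drm_sched_job *sched_job)
{
	struct foo_job *job = container_of(sched_job, struct foo_job, base);

	/* Must never be 0; zero-credit jobs bypass job-flow control. */
	return job->ring_dwords;
}

static const struct drm_sched_backend_ops foo_sched_ops = {
	.run_job		= foo_run_job,
	.timedout_job		= foo_timedout_job,
	.free_job		= foo_free_job,
	.update_job_credits	= foo_update_job_credits,
};

static int foo_sched_init(struct foo_device *foo)
{
	/* The credit limit takes the place of the old hw_submission
	 * count; here the scheduler's capacity is the ring size in dwords.
	 */
	return drm_sched_init(&foo->sched, &foo_sched_ops, NULL,
			      DRM_SCHED_PRIORITY_COUNT, FOO_RING_DWORDS,
			      FOO_HANG_LIMIT, msecs_to_jiffies(500), NULL,
			      NULL, "foo_ring", foo->dev);
}

static int foo_job_init(struct foo_job *job, struct drm_sched_entity *entity,
			void *owner)
{
	/* Each job declares its cost up front; drivers that keep the old
	 * one-credit-per-job accounting simply pass 1, as the tree-wide
	 * conversion above does.
	 */
	return drm_sched_job_init(&job->base, entity, job->ring_dwords, owner);
}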


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev13)
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
                   ` (9 preceding siblings ...)
  (?)
@ 2023-11-09  7:12 ` Patchwork
  -1 siblings, 0 replies; 62+ messages in thread
From: Patchwork @ 2023-11-09  7:12 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: intel-xe

== Series Details ==

Series: DRM scheduler changes for Xe (rev13)
URL   : https://patchwork.freedesktop.org/series/121744/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 58c0bd519 drm/xe: Add Wa_14019877138
=== git am output follows ===
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:290
error: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1659
error: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4861
error: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/msm/adreno/adreno_device.c:841
error: drivers/gpu/drm/msm/adreno/adreno_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/scheduler/sched_main.c:439
error: drivers/gpu/drm/scheduler/sched_main.c: patch does not apply
error: patch failed: include/drm/gpu_scheduler.h:552
error: include/drm/gpu_scheduler.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/sched: Add drm_sched_wqueue_* helpers
Patch failed at 0001 drm/sched: Add drm_sched_wqueue_* helpers
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-09  6:52           ` [Intel-xe] " Luben Tuikov
@ 2023-11-09 19:24             ` Danilo Krummrich
  -1 siblings, 0 replies; 62+ messages in thread
From: Danilo Krummrich @ 2023-11-09 19:24 UTC (permalink / raw)
  To: Luben Tuikov, tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ketil.johnsen, lina,
	mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	donald.robson, christian.koenig, faith.ekstrand

On 11/9/23 07:52, Luben Tuikov wrote:
> Hi,
> 
> On 2023-11-07 19:41, Danilo Krummrich wrote:
>> On 11/7/23 05:10, Luben Tuikov wrote:
>>> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
>>> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
>>> it do just that, schedule the work item for execution.
>>>
>>> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
>>> to determine if the scheduler has an entity ready in one of its run-queues,
>>> and in the case of the Round-Robin (RR) scheduling, the function
>>> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
>>> which is ready, sets up the run-queue and completion and returns that
>>> entity. The FIFO scheduling algorithm is unaffected.
>>>
>>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>>> in the case of RR scheduling, that would result in drm_sched_select_entity()
>>> having been called twice, which may result in skipping a ready entity if more
>>> than one entity is ready. This commit fixes this by eliminating the call to
>>> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
>>> in drm_sched_run_job_work().
>>>
>>> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>>>       Add fixes-tag. (Tvrtko)
>>>
>>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>>> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 16 +++-------------
>>>    1 file changed, 3 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 27843e37d9b769..cd0dc3f81d05f0 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -256,10 +256,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>>>    }
>>>    
>>>    /**
>>> - * __drm_sched_run_job_queue - enqueue run-job work
>>> + * drm_sched_run_job_queue - enqueue run-job work
>>>     * @sched: scheduler instance
>>>     */
>>> -static void __drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>> +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>    {
>>>    	if (!READ_ONCE(sched->pause_submit))
>>>    		queue_work(sched->submit_wq, &sched->work_run_job);
>>> @@ -928,7 +928,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
>>>    void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
>>>    {
>>>    	if (drm_sched_can_queue(sched))
>>> -		__drm_sched_run_job_queue(sched);
>>> +		drm_sched_run_job_queue(sched);
>>>    }
>>>    
>>>    /**
>>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_pick_best);
>>>    
>>> -/**
>>> - * drm_sched_run_job_queue - enqueue run-job work if there are ready entities
>>> - * @sched: scheduler instance
>>> - */
>>> -static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>> -{
>>> -	if (drm_sched_select_entity(sched))
>>
>> Hm, now that I rebase my patch to implement dynamic job-flow control I recognize that
>> we probably need the peek semantics here. If we do not select an entity here, we also
>> do not check whether the corresponding job fits on the ring.
>>
>> Alternatively, we simply can't do this check in drm_sched_wakeup(). The consequence would
>> be that we don't detect that we need to wait for credits to free up before the run work is
>> already executing and the run work selects an entity.
> 
> So I rebased v5 on top of the latest drm-misc-next, and looked around and found out that
> drm_sched_wakeup() is missing drm_sched_entity_is_ready(). It should look like the following,

Yeah, but that's just the consequence of re-basing it onto Tvrtko's patch.

My point is that by removing drm_sched_select_entity() from drm_sched_run_job_queue() we not
only lose the check whether the selected entity is ready, but also the check whether we have
enough credits to actually run a new job. This can lead to queuing up work that does nothing
but call drm_sched_select_entity() and return.

By peeking at the entity we could know this *before* scheduling work and hence avoid some CPU
scheduler overhead.

However, since this patch already landed and we can fail the same way if the selected entity isn't
ready, I don't consider this a blocker for the credit patch, hence I will send out a v6.

> 
> void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
> 		      struct drm_sched_entity *entity)
> {
> 	if (drm_sched_entity_is_ready(entity))
> 		if (drm_sched_can_queue(sched, entity))
> 			drm_sched_run_job_queue(sched);
> }
> 
> See the attached patch. (Currently running with base-commit and the attached patch.)


^ permalink raw reply	[flat|nested] 62+ messages in thread
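
For reference, the peek semantics discussed above are exactly what
drm_sched_can_queue() in the credit patch earlier in this thread provides;
condensed, with the drm_WARN checks elided:

static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
				struct drm_sched_entity *entity)
{
	struct drm_sched_job *s_job;

	/* Peek, don't pop: the job stays queued if it doesn't fit. */
	s_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
	if (!s_job)
		return false;

	if (sched->ops->update_job_credits)
		s_job->credits = sched->ops->update_job_credits(s_job);

	return drm_sched_available_credits(sched) >= s_job->credits;
}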

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-09 19:24             ` [Intel-xe] " Danilo Krummrich
@ 2023-11-09 23:41               ` Danilo Krummrich
  -1 siblings, 0 replies; 62+ messages in thread
From: Danilo Krummrich @ 2023-11-09 23:41 UTC (permalink / raw)
  To: Luben Tuikov, tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ketil.johnsen, lina,
	mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	donald.robson, christian.koenig, faith.ekstrand

On 11/9/23 20:24, Danilo Krummrich wrote:
> On 11/9/23 07:52, Luben Tuikov wrote:
>> Hi,
>>
>> On 2023-11-07 19:41, Danilo Krummrich wrote:
>>> On 11/7/23 05:10, Luben Tuikov wrote:
>>>> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
>>>> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
>>>> it do just that, schedule the work item for execution.
>>>>
>>>> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
>>>> to determine if the scheduler has an entity ready in one of its run-queues,
>>>> and in the case of the Round-Robin (RR) scheduling, the function
>>>> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
>>>> which is ready, sets up the run-queue and completion and returns that
>>>> entity. The FIFO scheduling algorithm is unaffected.
>>>>
>>>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>>>> in the case of RR scheduling, that would result in drm_sched_select_entity()
>>>> having been called twice, which may result in skipping a ready entity if more
>>>> than one entity is ready. This commit fixes this by eliminating the call to
>>>> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
>>>> in drm_sched_run_job_work().
>>>>
>>>> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>>>>       Add fixes-tag. (Tvrtko)
>>>>
>>>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>>>> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
>>>> ---
>>>>    drivers/gpu/drm/scheduler/sched_main.c | 16 +++-------------
>>>>    1 file changed, 3 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 27843e37d9b769..cd0dc3f81d05f0 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -256,10 +256,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>>>>    }
>>>>    /**
>>>> - * __drm_sched_run_job_queue - enqueue run-job work
>>>> + * drm_sched_run_job_queue - enqueue run-job work
>>>>     * @sched: scheduler instance
>>>>     */
>>>> -static void __drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>> +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>>    {
>>>>        if (!READ_ONCE(sched->pause_submit))
>>>>            queue_work(sched->submit_wq, &sched->work_run_job);
>>>> @@ -928,7 +928,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
>>>>    void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
>>>>    {
>>>>        if (drm_sched_can_queue(sched))
>>>> -        __drm_sched_run_job_queue(sched);
>>>> +        drm_sched_run_job_queue(sched);
>>>>    }
>>>>    /**
>>>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>>>    }
>>>>    EXPORT_SYMBOL(drm_sched_pick_best);
>>>> -/**
>>>> - * drm_sched_run_job_queue - enqueue run-job work if there are ready entities
>>>> - * @sched: scheduler instance
>>>> - */
>>>> -static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>> -{
>>>> -    if (drm_sched_select_entity(sched))
>>>
>>> Hm, now that I rebase my patch to implement dynamic job-flow control I recognize that
>>> we probably need the peek semantics here. If we do not select an entity here, we also
>>> do not check whether the corresponding job fits on the ring.
>>>
>>> Alternatively, we simply can't do this check in drm_sched_wakeup(). The consequence would
>>> be that we don't detect that we need to wait for credits to free up before the run work is
>>> already executing and the run work selects an entity.
>>
>> So I rebased v5 on top of the latest drm-misc-next, and looked around and found out that
>> drm_sched_wakeup() is missing drm_sched_entity_is_ready(). It should look like the following,
> 
> Yeah, but that's just the consequence of re-basing it onto Tvrtko's patch.
> 
> My point is that by removing drm_sched_select_entity() from drm_sched_run_job_queue() we not
> only lose the check whether the selected entity is ready, but also the check whether we have
> enough credits to actually run a new job. This can lead to queuing up work that does nothing
> but call drm_sched_select_entity() and return.

Ok, I see it now.  We don't need to peek, we know the entity at drm_sched_wakeup().

However, the missing drm_sched_entity_is_ready() check should have been added already when
drm_sched_select_entity() was removed. Gonna send a fix for that as well.

- Danilo

> 
> By peeking at the entity we could know this *before* scheduling work and hence avoid some CPU
> scheduler overhead.
> 
> However, since this patch already landed and we can fail the same way if the selected entity isn't
> ready, I don't consider this a blocker for the credit patch, hence I will send out a v6.
> 
>>
>> void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
>>               struct drm_sched_entity *entity)
>> {
>>     if (drm_sched_entity_is_ready(entity))
>>         if (drm_sched_can_queue(sched, entity))
>>             drm_sched_run_job_queue(sched);
>> }
>>
>> See the attached patch. (Currently running with base-commit and the attached patch.)


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling
  2023-11-09 23:41               ` [Intel-xe] " Danilo Krummrich
@ 2023-11-09 23:49                 ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-09 23:49 UTC (permalink / raw)
  To: Danilo Krummrich, tvrtko.ursulin
  Cc: matthew.brost, robdclark, sarah.walker, ketil.johnsen, lina,
	mcanal, Liviu.Dudau, dri-devel, intel-xe, boris.brezillon,
	donald.robson, christian.koenig, faith.ekstrand


On 2023-11-09 18:41, Danilo Krummrich wrote:
> On 11/9/23 20:24, Danilo Krummrich wrote:
>> On 11/9/23 07:52, Luben Tuikov wrote:
>>> Hi,
>>>
>>> On 2023-11-07 19:41, Danilo Krummrich wrote:
>>>> On 11/7/23 05:10, Luben Tuikov wrote:
>>>>> Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
>>>>> rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
>>>>> it do just that, schedule the work item for execution.
>>>>>
>>>>> The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
>>>>> to determine if the scheduler has an entity ready in one of its run-queues,
>>>>> and in the case of the Round-Robin (RR) scheduling, the function
>>>>> drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
>>>>> which is ready, sets up the run-queue and completion and returns that
>>>>> entity. The FIFO scheduling algorithm is unaffected.
>>>>>
>>>>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>>>>> in the case of RR scheduling, that would result in drm_sched_select_entity()
>>>>> having been called twice, which may result in skipping a ready entity if more
>>>>> than one entity is ready. This commit fixes this by eliminating the call to
>>>>> drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
>>>>> in drm_sched_run_job_work().
>>>>>
>>>>> v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
>>>>>       Add fixes-tag. (Tvrtko)
>>>>>
>>>>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>>>>> Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
>>>>> ---
>>>>>    drivers/gpu/drm/scheduler/sched_main.c | 16 +++-------------
>>>>>    1 file changed, 3 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 27843e37d9b769..cd0dc3f81d05f0 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -256,10 +256,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>>>>>    }
>>>>>    /**
>>>>> - * __drm_sched_run_job_queue - enqueue run-job work
>>>>> + * drm_sched_run_job_queue - enqueue run-job work
>>>>>     * @sched: scheduler instance
>>>>>     */
>>>>> -static void __drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>>> +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>>>    {
>>>>>        if (!READ_ONCE(sched->pause_submit))
>>>>>            queue_work(sched->submit_wq, &sched->work_run_job);
>>>>> @@ -928,7 +928,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
>>>>>    void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
>>>>>    {
>>>>>        if (drm_sched_can_queue(sched))
>>>>> -        __drm_sched_run_job_queue(sched);
>>>>> +        drm_sched_run_job_queue(sched);
>>>>>    }
>>>>>    /**
>>>>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>>>>    }
>>>>>    EXPORT_SYMBOL(drm_sched_pick_best);
>>>>> -/**
>>>>> - * drm_sched_run_job_queue - enqueue run-job work if there are ready entities
>>>>> - * @sched: scheduler instance
>>>>> - */
>>>>> -static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
>>>>> -{
>>>>> -    if (drm_sched_select_entity(sched))
>>>>
>>>> Hm, now that I rebase my patch to implement dynamic job-flow control I recognize that
>>>> we probably need the peek semantics here. If we do not select an entity here, we also
>>>> do not check whether the corresponding job fits on the ring.
>>>>
>>>> Alternatively, we simply can't do this check in drm_sched_wakeup(). The consequence would
>>>> be that we don't detect that we need to wait for credits to free up before the run work is
>>>> already executing and the run work selects an entity.
>>>
>>> So I rebased v5 on top of the latest drm-misc-next, and looked around and found out that
>>> drm_sched_wakeup() is missing drm_sched_entity_is_ready(). It should look like the following,
>>
>> Yeah, but that's just the consequence of re-basing it onto Tvrtko's patch.
>>
>> My point is that by removing drm_sched_select_entity() from drm_sched_run_job_queue() we not
>> only lose the check whether the selected entity is ready, but also the check whether we have
>> enough credits to actually run a new job. This can lead to queuing up work that does nothing
>> but call drm_sched_select_entity() and return.
> 
> Ok, I see it now.  We don't need to peek, we know the entity at drm_sched_wakeup().
> 
> However, the missing drm_sched_entity_is_ready() check should have been added already when
> drm_sched_select_entity() was removed. Gonna send a fix for that as well.

Let me do that, since I added it to your patch.
Then you can rebase your credits patch onto mine.

-- 
Regards,
Luben

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH] Revert "drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()"
  2023-11-09 23:49                 ` [Intel-xe] " Luben Tuikov
@ 2023-11-27 13:30                   ` Bert Karwatzki
  -1 siblings, 0 replies; 62+ messages in thread
From: Bert Karwatzki @ 2023-11-27 13:30 UTC (permalink / raw)
  To: ltuikov89
  Cc: matthew.brost, robdclark, sarah.walker, tvrtko.ursulin,
	ketil.johnsen, lina, mcanal, Liviu.Dudau, dri-devel, intel-xe,
	boris.brezillon, dakr, donald.robson, christian.koenig,
	faith.ekstrand, Bert Karwatzki

Commit f3123c25 (in combination with the use of work queues by the gpu
scheduler) leads to random lock ups of the GUI [1,2].

This is not a complete revert of commit f3123c25 as drm_sched_wakeup
still needs its entity argument to pass it to drm_sched_can_queue.

[1] https://gitlab.freedesktop.org/drm/amd/-/issues/2994
[2] https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html

This reverts commit f3123c2590005c5ff631653d31428e40cd10c618.
---
 drivers/gpu/drm/scheduler/sched_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 682aebe96db7..550492a7a031 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1029,9 +1029,8 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
 		      struct drm_sched_entity *entity)
 {
-	if (drm_sched_entity_is_ready(entity))
-		if (drm_sched_can_queue(sched, entity))
-			drm_sched_run_job_queue(sched);
+	if (drm_sched_can_queue(sched, entity))
+		drm_sched_run_job_queue(sched);
 }

 /**
--
2.43.0


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH] Revert "drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()"
  2023-11-27 13:30                   ` [Intel-xe] " Bert Karwatzki
@ 2023-11-27 15:14                     ` Luben Tuikov
  -1 siblings, 0 replies; 62+ messages in thread
From: Luben Tuikov @ 2023-11-27 15:14 UTC (permalink / raw)
  To: Bert Karwatzki
  Cc: matthew.brost, robdclark, sarah.walker, tvrtko.ursulin,
	ketil.johnsen, lina, mcanal, Liviu.Dudau, dri-devel, intel-xe,
	boris.brezillon, dakr, donald.robson, christian.koenig,
	faith.ekstrand


Hi Bert,

# The title of the patch should be:

drm/sched: Partial revert of "Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()"

On 2023-11-27 08:30, Bert Karwatzki wrote:
> Commit f3123c25 (in combination with the use of work queues by the gpu

Commit f3123c2590005c, in combination with the use of work queues by the GPU
scheduler, leads to random lock-ups of the GUI.

> scheduler) leads to random lock ups of the GUI [1,2].
> 
> This is not a complete revert of commit f3123c25 as drm_sched_wakeup

This is a partial revert of commit f3123c2590005c since drm_sched_wakeup()

> still needs its entity argument to pass it to drm_sched_can_queue.

... drm_sched_can_queue().

# Don't forget a SoB line!

Signed-off-by: Bert ...

> [1] https://gitlab.freedesktop.org/drm/amd/-/issues/2994

# Use a Link: tag instead, like this:
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2994

> [2] https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html

# Use a Link: tag instead, like this:
Link: https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html

> 
> This reverts commit f3123c2590005c5ff631653d31428e40cd10c618.

# The line above is *not* necessary, since this is a partial revert. Instead we need
# a Fixes: line, like this:

Fixes: f3123c2590005c ("drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()")

#######---

Then after you do "git format-patch", post it like this:

git send-email \
    --in-reply-to=c5292d06-2e37-4715-96dc-699f369111fa@gmail.com \
    --to=ltuikov89@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=dakr@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=spasswolf@web.de \
    --cc=tvrtko.ursulin@intel.com \
    /path/to/PATCH

This follows your thread where all the information is stored.
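
Putting the corrections above together, the resent patch's commit message
could look roughly like this; a sketch only, with the Signed-off-by address
inferred from the Cc list above and to be confirmed by Bert:

    drm/sched: Partial revert of "Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()"

    Commit f3123c2590005c, in combination with the use of work queues by the GPU
    scheduler, leads to random lock-ups of the GUI.

    This is a partial revert of commit f3123c2590005c since drm_sched_wakeup()
    still needs its entity argument to pass it to drm_sched_can_queue().

    Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2994
    Link: https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html
    Fixes: f3123c2590005c ("drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()")
    Signed-off-by: Bert Karwatzki <spasswolf@web.de>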

Thanks!
-- 
Regards,
Luben

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev14)
  2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
                   ` (10 preceding siblings ...)
  (?)
@ 2023-11-27 16:18 ` Patchwork
  2023-11-27 17:15   ` Bert Karwatzki
  -1 siblings, 1 reply; 62+ messages in thread
From: Patchwork @ 2023-11-27 16:18 UTC (permalink / raw)
  To: Bert Karwatzki; +Cc: intel-xe

== Series Details ==

Series: DRM scheduler changes for Xe (rev14)
URL   : https://patchwork.freedesktop.org/series/121744/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 7c4b213b5 drm/xe: Internally change the compute_mode and no_dma_fence mode naming
=== git am output follows ===
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:290
error: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1659
error: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c: patch does not apply
error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4861
error: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/msm/adreno/adreno_device.c:841
error: drivers/gpu/drm/msm/adreno/adreno_device.c: patch does not apply
error: patch failed: drivers/gpu/drm/scheduler/sched_main.c:439
error: drivers/gpu/drm/scheduler/sched_main.c: patch does not apply
error: patch failed: include/drm/gpu_scheduler.h:552
error: include/drm/gpu_scheduler.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/sched: Add drm_sched_wqueue_* helpers
Patch failed at 0001 drm/sched: Add drm_sched_wqueue_* helpers
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Intel-xe]  ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev14)
  2023-11-27 16:18 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev14) Patchwork
@ 2023-11-27 17:15   ` Bert Karwatzki
  0 siblings, 0 replies; 62+ messages in thread
From: Bert Karwatzki @ 2023-11-27 17:15 UTC (permalink / raw)
  To: intel-xe

On Monday, 2023-11-27 at 16:18 +0000, Patchwork wrote:
> == Series Details ==
>
> Series: DRM scheduler changes for Xe (rev14)
> URL   : https://patchwork.freedesktop.org/series/121744/
> State : failure
>
> == Summary ==
>
> === Applying kernel patches on branch 'drm-xe-next' with base: ===
> Base commit: 7c4b213b5 drm/xe: Internally change the compute_mode and
> no_dma_fence mode naming
> === git am output follows ===
> error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:290
> error: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c: patch does not
> apply
> error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1659
> error: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c: patch does not apply
> error: patch failed: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4861
> error: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: patch does not apply
> error: patch failed: drivers/gpu/drm/msm/adreno/adreno_device.c:841
> error: drivers/gpu/drm/msm/adreno/adreno_device.c: patch does not apply
> error: patch failed: drivers/gpu/drm/scheduler/sched_main.c:439
> error: drivers/gpu/drm/scheduler/sched_main.c: patch does not apply
> error: patch failed: include/drm/gpu_scheduler.h:552
> error: include/drm/gpu_scheduler.h: patch does not apply
> hint: Use 'git am --show-current-patch' to see the failed patch
> Applying: drm/sched: Add drm_sched_wqueue_* helpers
> Patch failed at 0001 drm/sched: Add drm_sched_wqueue_* helpers
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
>
>

The patch was targeting the for-linux-next branch of the drm-misc tree
(url = https://anongit.freedesktop.org/git/drm/drm-misc.git).
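
For anyone reproducing this, a rough sequence to apply the series against
that base could be (the local branch name is just an example):

    git remote add drm-misc https://anongit.freedesktop.org/git/drm/drm-misc.git
    git fetch drm-misc
    git checkout -b sched-test drm-misc/for-linux-next
    git am /path/to/series/*.patch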

Bert Karwatzki

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2023-11-27 17:34 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-31  3:24 [PATCH v8 0/5] DRM scheduler changes for Xe Matthew Brost
2023-10-31  3:24 ` [Intel-xe] " Matthew Brost
2023-10-31  3:24 ` [PATCH v8 1/5] drm/sched: Add drm_sched_wqueue_* helpers Matthew Brost
2023-10-31  3:24   ` [Intel-xe] " Matthew Brost
2023-10-31  3:24 ` [PATCH v8 2/5] drm/sched: Convert drm scheduler to use a work queue rather than kthread Matthew Brost
2023-10-31  3:24   ` [Intel-xe] " Matthew Brost
2023-10-31  3:24 ` [PATCH v8 3/5] drm/sched: Split free_job into own work item Matthew Brost
2023-10-31  3:24   ` [Intel-xe] " Matthew Brost
2023-11-01 22:13   ` Luben Tuikov
2023-11-01 22:13     ` Luben Tuikov
2023-11-02 11:13   ` [Intel-xe] " Tvrtko Ursulin
2023-11-02 11:13     ` Tvrtko Ursulin
2023-11-02 22:46     ` [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready() Luben Tuikov
2023-11-02 22:46       ` [Intel-xe] " Luben Tuikov
2023-11-03 10:39       ` Tvrtko Ursulin
2023-11-03 10:39         ` [Intel-xe] " Tvrtko Ursulin
2023-11-04  0:25         ` Luben Tuikov
2023-11-04  0:25           ` Luben Tuikov
2023-11-06 12:54           ` Tvrtko Ursulin
2023-11-06 12:54             ` [Intel-xe] " Tvrtko Ursulin
2023-11-03 15:13       ` Matthew Brost
2023-11-03 15:13         ` [Intel-xe] " Matthew Brost
2023-11-04  0:24         ` Luben Tuikov
2023-11-04  0:24           ` [Intel-xe] " Luben Tuikov
2023-11-02 22:58     ` [PATCH v8 3/5] drm/sched: Split free_job into own work item Luben Tuikov
2023-11-02 22:58       ` [Intel-xe] " Luben Tuikov
2023-11-07  4:10     ` [PATCH] drm/sched: Don't disturb the entity when in RR-mode scheduling Luben Tuikov
2023-11-07  4:10       ` [Intel-xe] " Luben Tuikov
2023-11-07 11:48       ` Matthew Brost
2023-11-07 11:48         ` [Intel-xe] " Matthew Brost
2023-11-08  3:28         ` Luben Tuikov
2023-11-08  3:28           ` [Intel-xe] " Luben Tuikov
2023-11-07 17:53       ` Danilo Krummrich
2023-11-07 17:53         ` [Intel-xe] " Danilo Krummrich
2023-11-08  3:29         ` Luben Tuikov
2023-11-08  3:29           ` [Intel-xe] " Luben Tuikov
2023-11-08  0:41       ` Danilo Krummrich
2023-11-08  0:41         ` [Intel-xe] " Danilo Krummrich
2023-11-09  6:52         ` Luben Tuikov
2023-11-09  6:52           ` [Intel-xe] " Luben Tuikov
2023-11-09 19:24           ` Danilo Krummrich
2023-11-09 19:24             ` [Intel-xe] " Danilo Krummrich
2023-11-09 23:41             ` Danilo Krummrich
2023-11-09 23:41               ` [Intel-xe] " Danilo Krummrich
2023-11-09 23:49               ` Luben Tuikov
2023-11-09 23:49                 ` [Intel-xe] " Luben Tuikov
2023-11-27 13:30                 ` [PATCH] Revert "drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()" Bert Karwatzki
2023-11-27 13:30                   ` [Intel-xe] " Bert Karwatzki
2023-11-27 15:14                   ` Luben Tuikov
2023-11-27 15:14                     ` [Intel-xe] " Luben Tuikov
2023-10-31  3:24 ` [PATCH v8 4/5] drm/sched: Add drm_sched_start_timeout_unlocked helper Matthew Brost
2023-10-31  3:24   ` [Intel-xe] " Matthew Brost
2023-10-31  3:24 ` [PATCH v8 5/5] drm/sched: Add a helper to queue TDR immediately Matthew Brost
2023-10-31  3:24   ` [Intel-xe] " Matthew Brost
2023-10-31  3:31 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev10) Patchwork
2023-11-01 22:16 ` [Intel-xe] [PATCH v8 0/5] DRM scheduler changes for Xe Luben Tuikov
2023-11-01 22:16   ` Luben Tuikov
2023-11-02 22:49 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev11) Patchwork
2023-11-07  4:39 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev12) Patchwork
2023-11-09  7:12 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev13) Patchwork
2023-11-27 16:18 ` [Intel-xe] ✗ CI.Patch_applied: failure for DRM scheduler changes for Xe (rev14) Patchwork
2023-11-27 17:15   ` Bert Karwatzki
