All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] drm/i915: PREEMPT_RT related fixups.
@ 2021-12-14 14:02 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Rodrigo Vivi, Thomas Gleixner


Hi,

The following patches are from the PREEMPT_RT queue. One patch was
applied, one added so here are eight again. I can post them in smaller
batches if that is preferred.
It is mostly about disabling interrupts/preemption which leads to
problems.  Unfortunately DRM_I915_LOW_LEVEL_TRACEPOINTS had to be
disabled because it acquires locks from within trace points. Making the
lock a raw_spinlock_t led to higher latencies during video playback
  https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/

and I'm not sure if I hit the worse case here.
I tested it on a SandyBridge with built-in i915 by using X, OpenGL and
playing videos without noticing any warnings. However, some code paths
were not entered.

Sebastian


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 0/8] drm/i915: PREEMPT_RT related fixups.
@ 2021-12-14 14:02 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: David Airlie, Thomas Gleixner


Hi,

The following patches are from the PREEMPT_RT queue. One patch was
applied, one added so here are eight again. I can post them in smaller
batches if that is preferred.
It is mostly about disabling interrupts/preemption which leads to
problems.  Unfortunately DRM_I915_LOW_LEVEL_TRACEPOINTS had to be
disabled because it acquires locks from within trace points. Making the
lock a raw_spinlock_t led to higher latencies during video playback
  https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/

and I'm not sure if I hit the worse case here.
I tested it on a SandyBridge with built-in i915 by using X, OpenGL and
playing videos without noticing any warnings. However, some code paths
were not entered.

Sebastian


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/8] drm/i915: Drop the irqs_disabled() check
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Sebastian Andrzej Siewior,
	Rodrigo Vivi, Thomas Gleixner

The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.

There is no need to check for disabled interrupts. The lockdep
annotation below already check if the lock has been acquired by the
caller and will yell if the interrupts are not disabled.

Remove the !irqs_disabled() check.

Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_request.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index fe682b6902aae..304565d567a1a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -581,7 +581,6 @@ bool __i915_request_submit(struct i915_request *request)
 
 	RQ_TRACE(request, "\n");
 
-	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->sched_engine->lock);
 
 	/*
@@ -690,7 +689,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 	 */
 	RQ_TRACE(request, "\n");
 
-	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->sched_engine->lock);
 
 	/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 1/8] drm/i915: Drop the irqs_disabled() check
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: David Airlie, Sebastian Andrzej Siewior, Thomas Gleixner

The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.

There is no need to check for disabled interrupts. The lockdep
annotation below already check if the lock has been acquired by the
caller and will yell if the interrupts are not disabled.

Remove the !irqs_disabled() check.

Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_request.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index fe682b6902aae..304565d567a1a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -581,7 +581,6 @@ bool __i915_request_submit(struct i915_request *request)
 
 	RQ_TRACE(request, "\n");
 
-	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->sched_engine->lock);
 
 	/*
@@ -690,7 +689,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 	 */
 	RQ_TRACE(request, "\n");
 
-	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->sched_engine->lock);
 
 	/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/8] drm/i915/gt: Queue and wait for the irq_work item.
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Sebastian Andrzej Siewior,
	Clark Williams, Rodrigo Vivi, Thomas Gleixner

Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the user have spinlock_t locking in the callback function.
These locks are then turned into a sleeping locks which can not be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead invoking the callback
directly.

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 209cf265bf746..6e1b9068d944c 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -311,10 +311,9 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	/* Kick the work once more to drain the signalers, and disarm the irq */
 	irq_work_sync(&b->irq_work);
 	while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-		local_irq_disable();
-		signal_irq_work(&b->irq_work);
-		local_irq_enable();
+		irq_work_queue(&b->irq_work);
 		cond_resched();
+		irq_work_sync(&b->irq_work);
 	}
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 2/8] drm/i915/gt: Queue and wait for the irq_work item.
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: David Airlie, Sebastian Andrzej Siewior, Clark Williams, Thomas Gleixner

Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the user have spinlock_t locking in the callback function.
These locks are then turned into a sleeping locks which can not be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead invoking the callback
directly.

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 209cf265bf746..6e1b9068d944c 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -311,10 +311,9 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	/* Kick the work once more to drain the signalers, and disarm the irq */
 	irq_work_sync(&b->irq_work);
 	while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-		local_irq_disable();
-		signal_irq_work(&b->irq_work);
-		local_irq_enable();
+		irq_work_queue(&b->irq_work);
 		cond_resched();
+		irq_work_sync(&b->irq_work);
 	}
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/8] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Sebastian Andrzej Siewior,
	Clark Williams, Rodrigo Vivi, Thomas Gleixner

execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().

execlists_dequeue_irq() and execlists_dequeue() has each one caller
only. If intel_engine_cs::active::lock is acquired and released with the
_irq suffix then it behaves almost as if execlists_dequeue() would be
invoked with disabled interrupts. The difference is the last part of the
function which is then invoked with enabled interrupts.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock at the end of the function as I didn't find
anything that would acquire the lock again.

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 .../drm/i915/gt/intel_execlists_submission.c    | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index a69df5e9e77af..2d5f0c226ad66 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1284,7 +1284,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	spin_lock(&sched_engine->lock);
+	spin_lock_irq(&sched_engine->lock);
 
 	/*
 	 * If the queue is higher priority than the last
@@ -1384,7 +1384,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * Even if ELSP[1] is occupied and not worthy
 				 * of timeslices, our queue might be.
 				 */
-				spin_unlock(&sched_engine->lock);
+				spin_unlock_irq(&sched_engine->lock);
 				return;
 			}
 		}
@@ -1410,7 +1410,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		if (last && !can_merge_rq(last, rq)) {
 			spin_unlock(&ve->base.sched_engine->lock);
-			spin_unlock(&engine->sched_engine->lock);
+			spin_unlock_irq(&engine->sched_engine->lock);
 			return; /* leave this for another sibling */
 		}
 
@@ -1572,7 +1572,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 */
 	sched_engine->queue_priority_hint = queue_prio(sched_engine);
 	i915_sched_engine_reset_on_empty(sched_engine);
-	spin_unlock(&sched_engine->lock);
+	spin_unlock_irq(&sched_engine->lock);
 
 	/*
 	 * We can skip poking the HW if we ended up with exactly the same set
@@ -1598,13 +1598,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	}
 }
 
-static void execlists_dequeue_irq(struct intel_engine_cs *engine)
-{
-	local_irq_disable(); /* Suspend interrupts across request submission */
-	execlists_dequeue(engine);
-	local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
-}
-
 static void clear_ports(struct i915_request **ports, int count)
 {
 	memset_p((void **)ports, NULL, count);
@@ -2425,7 +2418,7 @@ static void execlists_submission_tasklet(struct tasklet_struct *t)
 	}
 
 	if (!engine->execlists.pending[0]) {
-		execlists_dequeue_irq(engine);
+		execlists_dequeue(engine);
 		start_timeslice(engine);
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 3/8] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: David Airlie, Sebastian Andrzej Siewior, Clark Williams, Thomas Gleixner

execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().

execlists_dequeue_irq() and execlists_dequeue() has each one caller
only. If intel_engine_cs::active::lock is acquired and released with the
_irq suffix then it behaves almost as if execlists_dequeue() would be
invoked with disabled interrupts. The difference is the last part of the
function which is then invoked with enabled interrupts.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock at the end of the function as I didn't find
anything that would acquire the lock again.

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 .../drm/i915/gt/intel_execlists_submission.c    | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index a69df5e9e77af..2d5f0c226ad66 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1284,7 +1284,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	spin_lock(&sched_engine->lock);
+	spin_lock_irq(&sched_engine->lock);
 
 	/*
 	 * If the queue is higher priority than the last
@@ -1384,7 +1384,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * Even if ELSP[1] is occupied and not worthy
 				 * of timeslices, our queue might be.
 				 */
-				spin_unlock(&sched_engine->lock);
+				spin_unlock_irq(&sched_engine->lock);
 				return;
 			}
 		}
@@ -1410,7 +1410,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		if (last && !can_merge_rq(last, rq)) {
 			spin_unlock(&ve->base.sched_engine->lock);
-			spin_unlock(&engine->sched_engine->lock);
+			spin_unlock_irq(&engine->sched_engine->lock);
 			return; /* leave this for another sibling */
 		}
 
@@ -1572,7 +1572,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 */
 	sched_engine->queue_priority_hint = queue_prio(sched_engine);
 	i915_sched_engine_reset_on_empty(sched_engine);
-	spin_unlock(&sched_engine->lock);
+	spin_unlock_irq(&sched_engine->lock);
 
 	/*
 	 * We can skip poking the HW if we ended up with exactly the same set
@@ -1598,13 +1598,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	}
 }
 
-static void execlists_dequeue_irq(struct intel_engine_cs *engine)
-{
-	local_irq_disable(); /* Suspend interrupts across request submission */
-	execlists_dequeue(engine);
-	local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
-}
-
 static void clear_ports(struct i915_request **ports, int count)
 {
 	memset_p((void **)ports, NULL, count);
@@ -2425,7 +2418,7 @@ static void execlists_submission_tasklet(struct tasklet_struct *t)
 	}
 
 	if (!engine->execlists.pending[0]) {
-		execlists_dequeue_irq(engine);
+		execlists_dequeue(engine);
 		start_timeslice(engine);
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Sebastian Andrzej Siewior,
	Mike Galbraith, Rodrigo Vivi, Thomas Gleixner

From: Mike Galbraith <umgwanakikbuti@gmail.com>

Mario Kleiner suggest in commit
  ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms driver.")

a spots where preemption should be disabled on PREEMPT_RT. The
difference is that on PREEMPT_RT the intel_uncore::lock disables neither
preemption nor interrupts and so region remains preemptible.

The area covers only register reads and writes. The part that worries me
is:
- __intel_get_crtc_scanline() the worst case is 100us if no match is
  found.

- intel_crtc_scanlines_since_frame_timestamp() not sure how long this
  may take in the worst case.

It was in the RT queue for a while and nobody complained.
Disable preemption on PREEPMPT_RT during timestamping.

[bigeasy: patch description.]

Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_irq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 038a9ec563c10..8e9ff0bcbc7e4 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -916,7 +916,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
 	 */
 	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-	/* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 
 	/* Get optional system timestamp before query. */
 	if (stime)
@@ -980,7 +981,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
 	if (etime)
 		*etime = ktime_get();
 
-	/* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: David Airlie, Sebastian Andrzej Siewior, Mike Galbraith, Thomas Gleixner

From: Mike Galbraith <umgwanakikbuti@gmail.com>

Mario Kleiner suggest in commit
  ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into kms driver.")

a spots where preemption should be disabled on PREEMPT_RT. The
difference is that on PREEMPT_RT the intel_uncore::lock disables neither
preemption nor interrupts and so region remains preemptible.

The area covers only register reads and writes. The part that worries me
is:
- __intel_get_crtc_scanline() the worst case is 100us if no match is
  found.

- intel_crtc_scanlines_since_frame_timestamp() not sure how long this
  may take in the worst case.

It was in the RT queue for a while and nobody complained.
Disable preemption on PREEPMPT_RT during timestamping.

[bigeasy: patch description.]

Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_irq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 038a9ec563c10..8e9ff0bcbc7e4 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -916,7 +916,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
 	 */
 	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
 
-	/* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 
 	/* Get optional system timestamp before query. */
 	if (stime)
@@ -980,7 +981,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc,
 	if (etime)
 		*etime = ktime_get();
 
-	/* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/8] drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Sebastian Andrzej Siewior,
	Mike Galbraith, Rodrigo Vivi, Thomas Gleixner

From: Mike Galbraith <umgwanakikbuti@gmail.com>

Commit
   8d7849db3eab7 ("drm/i915: Make sprite updates atomic")

started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempt to acquire spinlock_t locks which
are sleeping locks on PREEMPT_RT.

According to the comment the interrupts are disabled to avoid random delays and
not required for protection or synchronisation.
If this needs to happen with disabled interrupts on PREEMPT_RT, and the
whole section is restricted to register access then all sleeping locks
need to be acquired before interrupts are disabled and some function
maybe moved after enabling interrupts again.
This includes:
- prepare_to_wait() + finish_wait() due its wake queue.
- drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock.
- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
  maybe others due to intel_uncore::lock
- drm_crtc_arm_vblank_event() due to drm_device::event_lock and
  drm_device::vblank_time_lock.

Don't disable interrupts on PREEMPT_RT during atomic updates.

[bigeasy: drop local locks, commit message]

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/display/intel_crtc.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_crtc.c b/drivers/gpu/drm/i915/display/intel_crtc.c
index 254e67141a776..7a39029b083f4 100644
--- a/drivers/gpu/drm/i915/display/intel_crtc.c
+++ b/drivers/gpu/drm/i915/display/intel_crtc.c
@@ -425,7 +425,8 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 	 */
 	intel_psr_wait_for_idle(new_crtc_state);
 
-	local_irq_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
 
 	crtc->debug.min_vbl = min;
 	crtc->debug.max_vbl = max;
@@ -450,11 +451,13 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 			break;
 		}
 
-		local_irq_enable();
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_enable();
 
 		timeout = schedule_timeout(timeout);
 
-		local_irq_disable();
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_disable();
 	}
 
 	finish_wait(wq, &wait);
@@ -487,7 +490,8 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 	return;
 
 irq_disable:
-	local_irq_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_VBLANK_EVADE)
@@ -566,7 +570,8 @@ void intel_pipe_update_end(struct intel_crtc_state *new_crtc_state)
 		new_crtc_state->uapi.event = NULL;
 	}
 
-	local_irq_enable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_enable();
 
 	/* Send VRR Push to terminate Vblank */
 	intel_vrr_send_push(new_crtc_state);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 5/8] drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: David Airlie, Sebastian Andrzej Siewior, Mike Galbraith, Thomas Gleixner

From: Mike Galbraith <umgwanakikbuti@gmail.com>

Commit
   8d7849db3eab7 ("drm/i915: Make sprite updates atomic")

started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
because within this section the code attempt to acquire spinlock_t locks which
are sleeping locks on PREEMPT_RT.

According to the comment the interrupts are disabled to avoid random delays and
not required for protection or synchronisation.
If this needs to happen with disabled interrupts on PREEMPT_RT, and the
whole section is restricted to register access then all sleeping locks
need to be acquired before interrupts are disabled and some function
maybe moved after enabling interrupts again.
This includes:
- prepare_to_wait() + finish_wait() due its wake queue.
- drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock.
- skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
  maybe others due to intel_uncore::lock
- drm_crtc_arm_vblank_event() due to drm_device::event_lock and
  drm_device::vblank_time_lock.

Don't disable interrupts on PREEMPT_RT during atomic updates.

[bigeasy: drop local locks, commit message]

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/display/intel_crtc.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_crtc.c b/drivers/gpu/drm/i915/display/intel_crtc.c
index 254e67141a776..7a39029b083f4 100644
--- a/drivers/gpu/drm/i915/display/intel_crtc.c
+++ b/drivers/gpu/drm/i915/display/intel_crtc.c
@@ -425,7 +425,8 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 	 */
 	intel_psr_wait_for_idle(new_crtc_state);
 
-	local_irq_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
 
 	crtc->debug.min_vbl = min;
 	crtc->debug.max_vbl = max;
@@ -450,11 +451,13 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 			break;
 		}
 
-		local_irq_enable();
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_enable();
 
 		timeout = schedule_timeout(timeout);
 
-		local_irq_disable();
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+			local_irq_disable();
 	}
 
 	finish_wait(wq, &wait);
@@ -487,7 +490,8 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 	return;
 
 irq_disable:
-	local_irq_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_VBLANK_EVADE)
@@ -566,7 +570,8 @@ void intel_pipe_update_end(struct intel_crtc_state *new_crtc_state)
 		new_crtc_state->uapi.event = NULL;
 	}
 
-	local_irq_enable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_enable();
 
 	/* Send VRR Push to terminate Vblank */
 	intel_vrr_send_push(new_crtc_state);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/8] drm/i915: Don't check for atomic context on PREEMPT_RT
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Sebastian Andrzej Siewior,
	Rodrigo Vivi, Thomas Gleixner

The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.

Changing the uncore:lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing. Therefore I'm
currently unsure about changing this.

Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_utils.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 7a5925072466a..b7b56fb1e2fc7 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -344,7 +344,7 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
 #define wait_for(COND, MS)		_wait_for((COND), (MS) * 1000, 10, 1000)
 
 /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
-#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
+#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT) && !defined(CONFIG_PREEMPT_RT)
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
 #else
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 6/8] drm/i915: Don't check for atomic context on PREEMPT_RT
@ 2021-12-14 14:02   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:02 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: David Airlie, Sebastian Andrzej Siewior, Thomas Gleixner

The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.

Changing the uncore:lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing. Therefore I'm
currently unsure about changing this.

Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_utils.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 7a5925072466a..b7b56fb1e2fc7 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -344,7 +344,7 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
 #define wait_for(COND, MS)		_wait_for((COND), (MS) * 1000, 10, 1000)
 
 /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
-#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
+#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT) && !defined(CONFIG_PREEMPT_RT)
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
 #else
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:03   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:03 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Sebastian Andrzej Siewior,
	Luca Abeni, Steven Rostedt, Rodrigo Vivi, Thomas Gleixner

Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
|  rt_spin_lock+0x3f/0x50
|  gen6_read32+0x45/0x1d0 [i915]
|  g4x_get_vblank_counter+0x36/0x40 [i915]
|  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events use trace_i915_pipe_update_start() among other events
use functions acquire spinlock_t locks which are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
might acquire a sleeping locks on PREEMPT_RT.
At the time the arguments are evaluated within trace point, preemption
is disabled and so the locks must not be acquired on PREEMPT_RT.

Based on this I don't see any other way than disable trace points on
PREMPT_RT.

Reported-by: Luca Abeni <lucabe72@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_trace.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 8104981a66044..64c3fa7cc05df 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -2,6 +2,10 @@
 #if !defined(_I915_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
 #define _I915_TRACE_H_
 
+#ifdef CONFIG_PREEMPT_RT
+#define NOTRACE
+#endif
+
 #include <linux/stringify.h>
 #include <linux/types.h>
 #include <linux/tracepoint.h>
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
@ 2021-12-14 14:03   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:03 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: David Airlie, Sebastian Andrzej Siewior, Luca Abeni,
	Steven Rostedt, Thomas Gleixner

Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
|  rt_spin_lock+0x3f/0x50
|  gen6_read32+0x45/0x1d0 [i915]
|  g4x_get_vblank_counter+0x36/0x40 [i915]
|  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events use trace_i915_pipe_update_start() among other events
use functions acquire spinlock_t locks which are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
might acquire a sleeping locks on PREEMPT_RT.
At the time the arguments are evaluated within trace point, preemption
is disabled and so the locks must not be acquired on PREEMPT_RT.

Based on this I don't see any other way than disable trace points on
PREMPT_RT.

Reported-by: Luca Abeni <lucabe72@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/gpu/drm/i915/i915_trace.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 8104981a66044..64c3fa7cc05df 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -2,6 +2,10 @@
 #if !defined(_I915_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
 #define _I915_TRACE_H_
 
+#ifdef CONFIG_PREEMPT_RT
+#define NOTRACE
+#endif
+
 #include <linux/stringify.h>
 #include <linux/types.h>
 #include <linux/tracepoint.h>
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 8/8] drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:03   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:03 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tvrtko Ursulin, David Airlie, Sebastian Andrzej Siewior,
	Steven Rostedt, Rodrigo Vivi, Thomas Gleixner

The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoitns
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/gpu/drm/i915/i915_trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 64c3fa7cc05df..89a4089bc4baf 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -823,7 +823,7 @@ DEFINE_EVENT(i915_request, i915_request_add,
 	     TP_ARGS(rq)
 );
 
-#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) && !defined(NOTRACE)
 DEFINE_EVENT(i915_request, i915_request_guc_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH 8/8] drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
@ 2021-12-14 14:03   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 14:03 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: David Airlie, Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner

The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoitns
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/gpu/drm/i915/i915_trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 64c3fa7cc05df..89a4089bc4baf 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -823,7 +823,7 @@ DEFINE_EVENT(i915_request, i915_request_add,
 	     TP_ARGS(rq)
 );
 
-#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) && !defined(NOTRACE)
 DEFINE_EVENT(i915_request, i915_request_guc_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
  2021-12-14 14:03   ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2021-12-14 14:36     ` Steven Rostedt
  -1 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2021-12-14 14:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, Luca Abeni, dri-devel,
	Rodrigo Vivi, Thomas Gleixner

On Tue, 14 Dec 2021 15:03:00 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Luca Abeni reported this:
> | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
> | CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
> | Call Trace:
> |  rt_spin_lock+0x3f/0x50
> |  gen6_read32+0x45/0x1d0 [i915]
> |  g4x_get_vblank_counter+0x36/0x40 [i915]
> |  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
> 
> The tracing events use trace_i915_pipe_update_start() among other events
> use functions acquire spinlock_t locks which are transformed into
> sleeping locks on PREEMPT_RT. A few trace points use
> intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
> might acquire a sleeping locks on PREEMPT_RT.
> At the time the arguments are evaluated within trace point, preemption
> is disabled and so the locks must not be acquired on PREEMPT_RT.
> 
> Based on this I don't see any other way than disable trace points on
> PREMPT_RT.

Another way around this that I can see is if the data for the tracepoints
can fit on the stack and add wrappers around the tracepoints. For example,
looking at the first tracepoint in i915_trace.h:

TRACE_EVENT(intel_pipe_enable,
	    TP_PROTO(struct intel_crtc *crtc),
	    TP_ARGS(crtc),

	    TP_STRUCT__entry(
			     __array(u32, frame, 3)
			     __array(u32, scanline, 3)
			     __field(enum pipe, pipe)
			     ),
	    TP_fast_assign(
			   struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
			   struct intel_crtc *it__;
			   for_each_intel_crtc(&dev_priv->drm, it__) {
				   __entry->frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
				   __entry->scanline[it__->pipe] = intel_get_crtc_scanline(it__);
			   }
			   __entry->pipe = crtc->pipe;
			   ),

	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
		      pipe_name(__entry->pipe),
		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
);

We could modify this to be:

TRACE_EVENT(intel_pipe_enable,
	    TP_PROTO(u32 *frame, u32 *scanline, enum pipe),
	    TP_ARGS(frame, scanline, pipe),

	    TP_STRUCT__entry(
			     __array(u32, frame, 3)
			     __array(u32, scanline, 3)
			     __field(enum pipe, pipe)
			     ),
	    TP_fast_assign(
			   int i;
			   for (i = 0; i < 3; i++) {
			      __entry->frame[i] = frame[i];
			      __entry->scanline[i] = scanline[i];
			   }
			   __entry->pipe = pipe;
			   ),

	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
		      pipe_name(__entry->pipe),
		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
);


static inline void do_trace_intel_pipe(struct intel_crtc *crtc)
{
	u32 frame[3];
	u32 scanline[3];
	enum pipe pipe;

	if (!trace_intel_pipe_enable_enabled())
		return;

	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
	struct intel_crtc *it__;
	for_each_intel_crtc(&dev_priv->drm, it__) {
		frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
		scanline[it__->pipe] = intel_get_crtc_scanline(it__);
	}

	trace_intel_pipe(frame, scanline, crtc->pipe);
}


The trace_intel_pipe_enable_enabled() is a static_branch that will act the
same as the nop of a trace event, so this will still not add overhead when
not enabled.

All the processing will be done outside the trace event allowing it to be
preempted, and then when the trace event is executed, it will run quickly
without taking any locks.

Then have the code call do_trace_intel_pipe() instead of trace_intel_pipe()
and this should fix the issue with preempt rt.

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
@ 2021-12-14 14:36     ` Steven Rostedt
  0 siblings, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2021-12-14 14:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: David Airlie, intel-gfx, Luca Abeni, dri-devel, Thomas Gleixner

On Tue, 14 Dec 2021 15:03:00 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Luca Abeni reported this:
> | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
> | CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
> | Call Trace:
> |  rt_spin_lock+0x3f/0x50
> |  gen6_read32+0x45/0x1d0 [i915]
> |  g4x_get_vblank_counter+0x36/0x40 [i915]
> |  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
> 
> The tracing events use trace_i915_pipe_update_start() among other events
> use functions acquire spinlock_t locks which are transformed into
> sleeping locks on PREEMPT_RT. A few trace points use
> intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
> might acquire a sleeping locks on PREEMPT_RT.
> At the time the arguments are evaluated within trace point, preemption
> is disabled and so the locks must not be acquired on PREEMPT_RT.
> 
> Based on this I don't see any other way than disable trace points on
> PREMPT_RT.

Another way around this that I can see is if the data for the tracepoints
can fit on the stack and add wrappers around the tracepoints. For example,
looking at the first tracepoint in i915_trace.h:

TRACE_EVENT(intel_pipe_enable,
	    TP_PROTO(struct intel_crtc *crtc),
	    TP_ARGS(crtc),

	    TP_STRUCT__entry(
			     __array(u32, frame, 3)
			     __array(u32, scanline, 3)
			     __field(enum pipe, pipe)
			     ),
	    TP_fast_assign(
			   struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
			   struct intel_crtc *it__;
			   for_each_intel_crtc(&dev_priv->drm, it__) {
				   __entry->frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
				   __entry->scanline[it__->pipe] = intel_get_crtc_scanline(it__);
			   }
			   __entry->pipe = crtc->pipe;
			   ),

	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
		      pipe_name(__entry->pipe),
		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
);

We could modify this to be:

TRACE_EVENT(intel_pipe_enable,
	    TP_PROTO(u32 *frame, u32 *scanline, enum pipe),
	    TP_ARGS(frame, scanline, pipe),

	    TP_STRUCT__entry(
			     __array(u32, frame, 3)
			     __array(u32, scanline, 3)
			     __field(enum pipe, pipe)
			     ),
	    TP_fast_assign(
			   int i;
			   for (i = 0; i < 3; i++) {
			      __entry->frame[i] = frame[i];
			      __entry->scanline[i] = scanline[i];
			   }
			   __entry->pipe = pipe;
			   ),

	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
		      pipe_name(__entry->pipe),
		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
);


static inline void do_trace_intel_pipe(struct intel_crtc *crtc)
{
	u32 frame[3];
	u32 scanline[3];
	enum pipe pipe;

	if (!trace_intel_pipe_enable_enabled())
		return;

	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
	struct intel_crtc *it__;
	for_each_intel_crtc(&dev_priv->drm, it__) {
		frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
		scanline[it__->pipe] = intel_get_crtc_scanline(it__);
	}

	trace_intel_pipe(frame, scanline, crtc->pipe);
}


The trace_intel_pipe_enable_enabled() is a static_branch that will act the
same as the nop of a trace event, so this will still not add overhead when
not enabled.

All the processing will be done outside the trace event allowing it to be
preempted, and then when the trace event is executed, it will run quickly
without taking any locks.

Then have the code call do_trace_intel_pipe() instead of trace_intel_pipe()
and this should fix the issue with preempt rt.

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
  2021-12-14 14:36     ` [Intel-gfx] " Steven Rostedt
@ 2021-12-14 15:41       ` Jani Nikula
  -1 siblings, 0 replies; 35+ messages in thread
From: Jani Nikula @ 2021-12-14 15:41 UTC (permalink / raw)
  To: Steven Rostedt, Sebastian Andrzej Siewior
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, dri-devel, Luca Abeni,
	Rodrigo Vivi, Thomas Gleixner

On Tue, 14 Dec 2021, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 14 Dec 2021 15:03:00 +0100
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>
>> Luca Abeni reported this:
>> | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
>> | CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
>> | Call Trace:
>> |  rt_spin_lock+0x3f/0x50
>> |  gen6_read32+0x45/0x1d0 [i915]
>> |  g4x_get_vblank_counter+0x36/0x40 [i915]
>> |  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
>> 
>> The tracing events use trace_i915_pipe_update_start() among other events
>> use functions acquire spinlock_t locks which are transformed into
>> sleeping locks on PREEMPT_RT. A few trace points use
>> intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
>> might acquire a sleeping locks on PREEMPT_RT.
>> At the time the arguments are evaluated within trace point, preemption
>> is disabled and so the locks must not be acquired on PREEMPT_RT.
>> 
>> Based on this I don't see any other way than disable trace points on
>> PREMPT_RT.
>
> Another way around this that I can see is if the data for the tracepoints
> can fit on the stack and add wrappers around the tracepoints. For example,
> looking at the first tracepoint in i915_trace.h:

FYI display portions of the file have been split to
display/intel_display_trace.[ch] in current drm-intel-next, headed for
v5.17 merge window.

BR,
Jani.


>
> TRACE_EVENT(intel_pipe_enable,
> 	    TP_PROTO(struct intel_crtc *crtc),
> 	    TP_ARGS(crtc),
>
> 	    TP_STRUCT__entry(
> 			     __array(u32, frame, 3)
> 			     __array(u32, scanline, 3)
> 			     __field(enum pipe, pipe)
> 			     ),
> 	    TP_fast_assign(
> 			   struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> 			   struct intel_crtc *it__;
> 			   for_each_intel_crtc(&dev_priv->drm, it__) {
> 				   __entry->frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
> 				   __entry->scanline[it__->pipe] = intel_get_crtc_scanline(it__);
> 			   }
> 			   __entry->pipe = crtc->pipe;
> 			   ),
>
> 	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
> 		      pipe_name(__entry->pipe),
> 		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
> 		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
> 		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
> );
>
> We could modify this to be:
>
> TRACE_EVENT(intel_pipe_enable,
> 	    TP_PROTO(u32 *frame, u32 *scanline, enum pipe),
> 	    TP_ARGS(frame, scanline, pipe),
>
> 	    TP_STRUCT__entry(
> 			     __array(u32, frame, 3)
> 			     __array(u32, scanline, 3)
> 			     __field(enum pipe, pipe)
> 			     ),
> 	    TP_fast_assign(
> 			   int i;
> 			   for (i = 0; i < 3; i++) {
> 			      __entry->frame[i] = frame[i];
> 			      __entry->scanline[i] = scanline[i];
> 			   }
> 			   __entry->pipe = pipe;
> 			   ),
>
> 	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
> 		      pipe_name(__entry->pipe),
> 		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
> 		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
> 		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
> );
>
>
> static inline void do_trace_intel_pipe(struct intel_crtc *crtc)
> {
> 	u32 frame[3];
> 	u32 scanline[3];
> 	enum pipe pipe;
>
> 	if (!trace_intel_pipe_enable_enabled())
> 		return;
>
> 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> 	struct intel_crtc *it__;
> 	for_each_intel_crtc(&dev_priv->drm, it__) {
> 		frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
> 		scanline[it__->pipe] = intel_get_crtc_scanline(it__);
> 	}
>
> 	trace_intel_pipe(frame, scanline, crtc->pipe);
> }
>
>
> The trace_intel_pipe_enable_enabled() is a static_branch that will act the
> same as the nop of a trace event, so this will still not add overhead when
> not enabled.
>
> All the processing will be done outside the trace event allowing it to be
> preempted, and then when the trace event is executed, it will run quickly
> without taking any locks.
>
> Then have the code call do_trace_intel_pipe() instead of trace_intel_pipe()
> and this should fix the issue with preempt rt.
>
> -- Steve

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
@ 2021-12-14 15:41       ` Jani Nikula
  0 siblings, 0 replies; 35+ messages in thread
From: Jani Nikula @ 2021-12-14 15:41 UTC (permalink / raw)
  To: Steven Rostedt, Sebastian Andrzej Siewior
  Cc: David Airlie, intel-gfx, dri-devel, Luca Abeni, Thomas Gleixner

On Tue, 14 Dec 2021, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 14 Dec 2021 15:03:00 +0100
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>
>> Luca Abeni reported this:
>> | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
>> | CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
>> | Call Trace:
>> |  rt_spin_lock+0x3f/0x50
>> |  gen6_read32+0x45/0x1d0 [i915]
>> |  g4x_get_vblank_counter+0x36/0x40 [i915]
>> |  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
>> 
>> The tracing events use trace_i915_pipe_update_start() among other events
>> use functions acquire spinlock_t locks which are transformed into
>> sleeping locks on PREEMPT_RT. A few trace points use
>> intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
>> might acquire a sleeping locks on PREEMPT_RT.
>> At the time the arguments are evaluated within trace point, preemption
>> is disabled and so the locks must not be acquired on PREEMPT_RT.
>> 
>> Based on this I don't see any other way than disable trace points on
>> PREMPT_RT.
>
> Another way around this that I can see is if the data for the tracepoints
> can fit on the stack and add wrappers around the tracepoints. For example,
> looking at the first tracepoint in i915_trace.h:

FYI display portions of the file have been split to
display/intel_display_trace.[ch] in current drm-intel-next, headed for
v5.17 merge window.

BR,
Jani.


>
> TRACE_EVENT(intel_pipe_enable,
> 	    TP_PROTO(struct intel_crtc *crtc),
> 	    TP_ARGS(crtc),
>
> 	    TP_STRUCT__entry(
> 			     __array(u32, frame, 3)
> 			     __array(u32, scanline, 3)
> 			     __field(enum pipe, pipe)
> 			     ),
> 	    TP_fast_assign(
> 			   struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> 			   struct intel_crtc *it__;
> 			   for_each_intel_crtc(&dev_priv->drm, it__) {
> 				   __entry->frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
> 				   __entry->scanline[it__->pipe] = intel_get_crtc_scanline(it__);
> 			   }
> 			   __entry->pipe = crtc->pipe;
> 			   ),
>
> 	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
> 		      pipe_name(__entry->pipe),
> 		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
> 		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
> 		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
> );
>
> We could modify this to be:
>
> TRACE_EVENT(intel_pipe_enable,
> 	    TP_PROTO(u32 *frame, u32 *scanline, enum pipe),
> 	    TP_ARGS(frame, scanline, pipe),
>
> 	    TP_STRUCT__entry(
> 			     __array(u32, frame, 3)
> 			     __array(u32, scanline, 3)
> 			     __field(enum pipe, pipe)
> 			     ),
> 	    TP_fast_assign(
> 			   int i;
> 			   for (i = 0; i < 3; i++) {
> 			      __entry->frame[i] = frame[i];
> 			      __entry->scanline[i] = scanline[i];
> 			   }
> 			   __entry->pipe = pipe;
> 			   ),
>
> 	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
> 		      pipe_name(__entry->pipe),
> 		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
> 		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
> 		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
> );
>
>
> static inline void do_trace_intel_pipe(struct intel_crtc *crtc)
> {
> 	u32 frame[3];
> 	u32 scanline[3];
> 	enum pipe pipe;
>
> 	if (!trace_intel_pipe_enable_enabled())
> 		return;
>
> 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> 	struct intel_crtc *it__;
> 	for_each_intel_crtc(&dev_priv->drm, it__) {
> 		frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
> 		scanline[it__->pipe] = intel_get_crtc_scanline(it__);
> 	}
>
> 	trace_intel_pipe(frame, scanline, crtc->pipe);
> }
>
>
> The trace_intel_pipe_enable_enabled() is a static_branch that will act the
> same as the nop of a trace event, so this will still not add overhead when
> not enabled.
>
> All the processing will be done outside the trace event allowing it to be
> preempted, and then when the trace event is executed, it will run quickly
> without taking any locks.
>
> Then have the code call do_trace_intel_pipe() instead of trace_intel_pipe()
> and this should fix the issue with preempt rt.
>
> -- Steve

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
  2021-12-14 14:36     ` [Intel-gfx] " Steven Rostedt
@ 2021-12-14 15:56       ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 15:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, Luca Abeni, dri-devel,
	Rodrigo Vivi, Thomas Gleixner

On 2021-12-14 09:36:52 [-0500], Steven Rostedt wrote:
> Another way around this that I can see is if the data for the tracepoints
> can fit on the stack and add wrappers around the tracepoints. For example,
> looking at the first tracepoint in i915_trace.h:
…

Nice.

> We could modify this to be:
> static inline void do_trace_intel_pipe(struct intel_crtc *crtc)
> {
> 	u32 frame[3];
> 	u32 scanline[3];
> 	enum pipe pipe;
> 
> 	if (!trace_intel_pipe_enable_enabled())
> 		return;
> 
> 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> 	struct intel_crtc *it__;
> 	for_each_intel_crtc(&dev_priv->drm, it__) {
> 		frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
> 		scanline[it__->pipe] = intel_get_crtc_scanline(it__);
> 	}
> 
> 	trace_intel_pipe(frame, scanline, crtc->pipe);
> }
> Then have the code call do_trace_intel_pipe() instead of trace_intel_pipe()
> and this should fix the issue with preempt rt.

Is this is something, that the i915 devs would accept?

> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
@ 2021-12-14 15:56       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 15:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Airlie, intel-gfx, Luca Abeni, dri-devel, Thomas Gleixner

On 2021-12-14 09:36:52 [-0500], Steven Rostedt wrote:
> Another way around this that I can see is if the data for the tracepoints
> can fit on the stack and add wrappers around the tracepoints. For example,
> looking at the first tracepoint in i915_trace.h:
…

Nice.

> We could modify this to be:
> static inline void do_trace_intel_pipe(struct intel_crtc *crtc)
> {
> 	u32 frame[3];
> 	u32 scanline[3];
> 	enum pipe pipe;
> 
> 	if (!trace_intel_pipe_enable_enabled())
> 		return;
> 
> 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> 	struct intel_crtc *it__;
> 	for_each_intel_crtc(&dev_priv->drm, it__) {
> 		frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
> 		scanline[it__->pipe] = intel_get_crtc_scanline(it__);
> 	}
> 
> 	trace_intel_pipe(frame, scanline, crtc->pipe);
> }
> Then have the code call do_trace_intel_pipe() instead of trace_intel_pipe()
> and this should fix the issue with preempt rt.

Is this is something, that the i915 devs would accept?

> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915: PREEMPT_RT related fixups. (rev4)
  2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
                   ` (8 preceding siblings ...)
  (?)
@ 2021-12-14 15:58 ` Patchwork
  2021-12-14 16:12   ` Sebastian Andrzej Siewior
  -1 siblings, 1 reply; 35+ messages in thread
From: Patchwork @ 2021-12-14 15:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: PREEMPT_RT related fixups. (rev4)
URL   : https://patchwork.freedesktop.org/series/95463/
State : failure

== Summary ==

Applying: drm/i915: Drop the irqs_disabled() check
Applying: drm/i915/gt: Queue and wait for the irq_work item.
Applying: drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()
Applying: drm/i915: Use preempt_disable/enable_rt() where recommended
Applying: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/display/intel_crtc.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/display/intel_crtc.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/display/intel_crtc.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0005 drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BUILD: failure for drm/i915: PREEMPT_RT related fixups. (rev4)
  2021-12-14 15:58 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915: PREEMPT_RT related fixups. (rev4) Patchwork
@ 2021-12-14 16:12   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-14 16:12 UTC (permalink / raw)
  To: intel-gfx

On 2021-12-14 15:58:45 [-0000], Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: PREEMPT_RT related fixups. (rev4)
> URL   : https://patchwork.freedesktop.org/series/95463/
> State : failure
> 
> == Summary ==
> 
> Applying: drm/i915: Drop the irqs_disabled() check
> Applying: drm/i915/gt: Queue and wait for the irq_work item.
> Applying: drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()
> Applying: drm/i915: Use preempt_disable/enable_rt() where recommended
> Applying: drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
> Using index info to reconstruct a base tree...
> M	drivers/gpu/drm/i915/display/intel_crtc.c
> Falling back to patching base and 3-way merge...
> Auto-merging drivers/gpu/drm/i915/display/intel_crtc.c
> CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/display/intel_crtc.c
> error: Failed to merge in the changes.
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> Patch failed at 0005 drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

I used the drm-intel-gt-next branch. Which one would be preferred?

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
  2021-12-14 14:36     ` [Intel-gfx] " Steven Rostedt
                       ` (2 preceding siblings ...)
  (?)
@ 2021-12-14 16:34     ` Ville Syrjälä
  2021-12-14 16:58       ` Steven Rostedt
  -1 siblings, 1 reply; 35+ messages in thread
From: Ville Syrjälä @ 2021-12-14 16:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Airlie, Sebastian Andrzej Siewior, dri-devel, Luca Abeni,
	Thomas Gleixner, intel-gfx

On Tue, Dec 14, 2021 at 09:36:52AM -0500, Steven Rostedt wrote:
> On Tue, 14 Dec 2021 15:03:00 +0100
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > Luca Abeni reported this:
> > | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
> > | CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
> > | Call Trace:
> > |  rt_spin_lock+0x3f/0x50
> > |  gen6_read32+0x45/0x1d0 [i915]
> > |  g4x_get_vblank_counter+0x36/0x40 [i915]
> > |  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
> > 
> > The tracing events use trace_i915_pipe_update_start() among other events
> > use functions acquire spinlock_t locks which are transformed into
> > sleeping locks on PREEMPT_RT. A few trace points use
> > intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
> > might acquire a sleeping locks on PREEMPT_RT.
> > At the time the arguments are evaluated within trace point, preemption
> > is disabled and so the locks must not be acquired on PREEMPT_RT.
> > 
> > Based on this I don't see any other way than disable trace points on
> > PREMPT_RT.
> 
> Another way around this that I can see is if the data for the tracepoints
> can fit on the stack and add wrappers around the tracepoints. For example,
> looking at the first tracepoint in i915_trace.h:
> 
> TRACE_EVENT(intel_pipe_enable,
> 	    TP_PROTO(struct intel_crtc *crtc),
> 	    TP_ARGS(crtc),
> 
> 	    TP_STRUCT__entry(
> 			     __array(u32, frame, 3)
> 			     __array(u32, scanline, 3)
> 			     __field(enum pipe, pipe)
> 			     ),
> 	    TP_fast_assign(
> 			   struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> 			   struct intel_crtc *it__;
> 			   for_each_intel_crtc(&dev_priv->drm, it__) {
> 				   __entry->frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
> 				   __entry->scanline[it__->pipe] = intel_get_crtc_scanline(it__);
> 			   }
> 			   __entry->pipe = crtc->pipe;
> 			   ),
> 
> 	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
> 		      pipe_name(__entry->pipe),
> 		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
> 		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
> 		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
> );
> 
> We could modify this to be:
> 
> TRACE_EVENT(intel_pipe_enable,
> 	    TP_PROTO(u32 *frame, u32 *scanline, enum pipe),
> 	    TP_ARGS(frame, scanline, pipe),
> 
> 	    TP_STRUCT__entry(
> 			     __array(u32, frame, 3)
> 			     __array(u32, scanline, 3)
> 			     __field(enum pipe, pipe)
> 			     ),
> 	    TP_fast_assign(
> 			   int i;
> 			   for (i = 0; i < 3; i++) {
> 			      __entry->frame[i] = frame[i];
> 			      __entry->scanline[i] = scanline[i];
> 			   }
> 			   __entry->pipe = pipe;
> 			   ),
> 
> 	    TP_printk("pipe %c enable, pipe A: frame=%u, scanline=%u, pipe B: frame=%u, scanline=%u, pipe C: frame=%u, scanline=%u",
> 		      pipe_name(__entry->pipe),
> 		      __entry->frame[PIPE_A], __entry->scanline[PIPE_A],
> 		      __entry->frame[PIPE_B], __entry->scanline[PIPE_B],
> 		      __entry->frame[PIPE_C], __entry->scanline[PIPE_C])
> );
> 
> 
> static inline void do_trace_intel_pipe(struct intel_crtc *crtc)
> {
> 	u32 frame[3];
> 	u32 scanline[3];
> 	enum pipe pipe;
> 
> 	if (!trace_intel_pipe_enable_enabled())
> 		return;
> 
> 	struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
> 	struct intel_crtc *it__;
> 	for_each_intel_crtc(&dev_priv->drm, it__) {
> 		frame[it__->pipe] = intel_crtc_get_vblank_counter(it__);
> 		scanline[it__->pipe] = intel_get_crtc_scanline(it__);
> 	}
> 
> 	trace_intel_pipe(frame, scanline, crtc->pipe);
> }

Looks lightly tedious. Can't we have "slow" (or whatever) versions of
the trace macros so we could just declare these the same was as before
without having to manually write that wrapper for every event?

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
  2021-12-14 16:34     ` Ville Syrjälä
@ 2021-12-14 16:58       ` Steven Rostedt
  2022-02-08 17:32         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2021-12-14 16:58 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: David Airlie, Sebastian Andrzej Siewior, dri-devel, Luca Abeni,
	Thomas Gleixner, intel-gfx

On Tue, 14 Dec 2021 18:34:50 +0200
Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:

> Looks lightly tedious. Can't we have "slow" (or whatever) versions of
> the trace macros so we could just declare these the same was as before
> without having to manually write that wrapper for every event?

That would be quite tedious as well ;-)

There's a couple of problems with doing it as a macro. One is that the data
would need to be saved on stack. There's no guarantee that there will be
enough stack available. We could probably add a way to limit the size. That
is, adding something like:

#define MAX_SLOW_TRACE_ENTRY_SIZE	256

	BUILD_BUG_ON(sizeof(trace_event_raw_##call) > MAX_SLOW_TRACE_ENTRY_SIZE);

and then have the entry done outside the trace event. But even with that,
this is specific to the perf and ftrace code, and not to the trace point
that is called. We would need to figure out a way to make the macro work
for all the users.

It may be possible to do, but it will be far from trivial, and I'm not sure
I want this to be an easy option. Locks should not be taken from trace
events in general, as they are not tested with lockdep when the trace
points are not enabled, and could hide deadlocks that may not appear until
running on production.

-- Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended
  2021-12-14 14:02   ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2022-01-26 23:29     ` Mario Kleiner
  -1 siblings, 0 replies; 35+ messages in thread
From: Mario Kleiner @ 2022-01-26 23:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, Mike Galbraith,
	dri-devel, Rodrigo Vivi, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 9080 bytes --]

On Tue, Dec 14, 2021 at 3:03 PM Sebastian Andrzej Siewior <
bigeasy@linutronix.de> wrote:

> From: Mike Galbraith <umgwanakikbuti@gmail.com>
>
> Mario Kleiner suggest in commit
>   ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into
> kms driver.")
>
> a spots where preemption should be disabled on PREEMPT_RT. The
> difference is that on PREEMPT_RT the intel_uncore::lock disables neither
> preemption nor interrupts and so region remains preemptible.
>
>
Hi, first thank you for implementing these preempt disables according to
the markers i left long ago. And sorry for the rather late reply.

I had a look at the code, as of Linux 5.16, and did also a little test run
(of a standard kernel, not with PREEMPT_RT, only
CONFIG_PREEMPT_VOLUNTARY=y) on my Intel Kabylake GT2, so some thoughts:

The area covers only register reads and writes. The part that worries me
> is:
> - __intel_get_crtc_scanline() the worst case is 100us if no match is
>   found.
>

This one can be a problem indeed on (maybe all?) modern Intel gpu's since
Haswell, ie. the last ~10 years. I was able to reproduce it on my Kabylake
Intel gpu.

Most of the time that for-loop with up to 100 repetitions (~ 100
udelay(1) + one mmio register read) (cfe.
https://elixir.bootlin.com/linux/v5.17-rc1/source/drivers/gpu/drm/i915/i915_irq.c#L856)
will not execute, because most of the time that function gets called from
the vblank irq handler and then that trigger condition (if
(HAS_DDI(dev_priv) && !position)) is not true. However, it also gets called
as part of power-saving on behalf of userspace context, whenever the
desktop graphics goes idle for two video refresh cycles. If the desktop
shows graphics activity again, and vblank interrupts need to get reenabled,
the probability of hitting that case is then ~1-4% depending on video mode.
How many loops it runs also varies.

On my little Intel(R) Core(TM) i5-8250U CPU machine with a mostly idle
desktop, I observed about one hit every couple of seconds of regular use,
and each hit took between 125 usecs and almost 250 usecs. I guess udelay(1)
can take a bit longer than 1 usec?

So that's too much for preempt-rt. What one could do is the following:

1. In the for-loop in __intel_get_crtc_scanline(), add a preempt_enable()
before the udelay(1); and a preempt_disable() again after it. Or
potentially around the whole for-loop if the overhead of
preempt_en/disable() is significant?

2. In intel_get_crtc_scanline() also wrap the call to
__intel_get_crtc_scanline() into a preempt_disable() and preempt_enable(),
so we can be sure that __intel_get_crtc_scanline() always gets called with
preemption disabled.

Why should this work ok'ish? The point of the original preempt disable
inside i915_get_crtc_scanoutpos
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/i915_get_crtc_scanoutpos>
is that those two *stime = ktime_get() and *etime = ktime_get() clock
queries happen as close to the scanout position query as possible to get a
small confidence interval for when exactly the scanoutpos was
read/determined from the display hardware. error = (etime - stime) is the
error margin. If that margin becomes greater than 20 usecs, then the
higher-level code will consider the measurement invalid and repeat the
whole procedure up to 3 times before giving up.

Normally, in my experience with different graphics chips, one would observe
error < 3 usecs, so the measurement almost always succeeds at first try,
only very rarely takes two attempts. The preempt disable is meant to make
sure that this stays the case on a PREEMPT_RT kernel.

The problem here are the relatively rare cases where we hit that up to 100
iterations for-loop. Here even on a regular kernel, due to hardware quirks,
we already exceed the 20 usecs tolerance by a huge amount of more than 100
usecs, leading to a retry of the measurement. And my tests showed that
often the two succeeding retries also fail, because of hardware quirks can
apparently create a blackout situation approaching 1 msec, so we lose
anyway, regardless if we get preempted on a RT kernel or not. That's why
enabling preemption on RT again during that for-loop should not make the
situation worse and at least keep RT as real-time as intended.

In practice I would also expect that this failure case is the one least
likely to impair userspace applications greatly in practice. The cases that
mostly matter are the ones executed during vblank hardware irq, where the
for-loop never executes and error margin and preempt off time is only about
1 usec. My own software which depends on very precise timestamps from the
mechanism never reported >> 20 usecs errors during startup tests or runtime
tests.


> - intel_crtc_scanlines_since_frame_timestamp() not sure how long this
>   may take in the worst case.
>
>
intel_crtc_scanlines_since_frame_timestamp() should be harmless. That
do-while loop just tries to make sure that two register reads that should
happen within the same video refresh cycle are happening in the same
refresh cycle. As such, the while loop will almost always just execute only
once, and at most two times, so that's at most 6 mmio register reads for
two loop iterations.

In the long run one could try to test if
__intel_get_crtc_scanline_from_timestamp
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/__intel_get_crtc_scanline_from_timestamp>()
wouldn't be the better choice for affected hardware always. Code and
register descriptions suggest the feature is supported by all potentially
affected hardware, so if it would turn out that that method works as
accurate and reliable as the old one, we could save the overhead and time
delays for that 100 for-loop iterations and make the whole timestamping
more reliable on modern hw.

It was in the RT queue for a while and nobody complained.
> Disable preemption on PREEPMPT_RT during timestamping.
>
>
Do other patches exist to implement the preempt_dis/enable() also for AMD
and NVidia / nouveau / vc4? I left corresponding markers for
radeon/amdgpu-kms and RaspberryPi's vc4 driver. Ideally all kms drivers
which use scanout position queries should have such code. Always a
preempt_disable() before the "if (stime) *stime = ktime_get();" statement,
and a preempt_enable() after the "if (etime) *etime = ktime_get();"
statement.

Checking Linux 5.16 code, this should be safe - short preempt off interval
with only a few mmio register reads - for all kms drivers that support it
atm. I found the following functions to modify:

amdgpu: amdgpu_display_get_crtc_scanoutpos()
radeon: radeon_get_crtc_scanoutpos()
msm: mdp5_crtc_get_scanout_position() and dpu_crtc_get_scanout_position()
ltdc: ltdc_crtc_get_scanout_position()
vc4: vc4_crtc_get_scanout_position()

nouveau: In nvkm_head_mthd_scanoutpos(), one needs to put the
preempt_disable() right before

args <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/args>->v0
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/v0>.time
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/time>[0] =
ktime_to_ns <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/ktime_to_ns>(ktime_get
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/ktime_get>());

and the preempt_enable() right after
args <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/args>->v0
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/v0>.time
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/time>[1] =
ktime_to_ns <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/ktime_to_ns>(ktime_get
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/ktime_get>());


Is the plan to integrate these patches into the mainline kernel soon, as
part of ongoing preempt-rt upstreaming?

thanks,
-mario

[bigeasy: patch description.]
>
> Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c
> b/drivers/gpu/drm/i915/i915_irq.c
> index 038a9ec563c10..8e9ff0bcbc7e4 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -916,7 +916,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc
> *_crtc,
>          */
>         spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
>
> -       /* preempt_disable_rt() should go right here in PREEMPT_RT
> patchset. */
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
>
>         /* Get optional system timestamp before query. */
>         if (stime)
> @@ -980,7 +981,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc
> *_crtc,
>         if (etime)
>                 *etime = ktime_get();
>
> -       /* preempt_enable_rt() should go right here in PREEMPT_RT
> patchset. */
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>
>         spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
>
> --
> 2.34.1
>
>

[-- Attachment #2: Type: text/html, Size: 12262 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended
@ 2022-01-26 23:29     ` Mario Kleiner
  0 siblings, 0 replies; 35+ messages in thread
From: Mario Kleiner @ 2022-01-26 23:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: David Airlie, intel-gfx, Mike Galbraith, dri-devel, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 9080 bytes --]

On Tue, Dec 14, 2021 at 3:03 PM Sebastian Andrzej Siewior <
bigeasy@linutronix.de> wrote:

> From: Mike Galbraith <umgwanakikbuti@gmail.com>
>
> Mario Kleiner suggest in commit
>   ad3543ede630f ("drm/intel: Push get_scanout_position() timestamping into
> kms driver.")
>
> a spots where preemption should be disabled on PREEMPT_RT. The
> difference is that on PREEMPT_RT the intel_uncore::lock disables neither
> preemption nor interrupts and so region remains preemptible.
>
>
Hi, first thank you for implementing these preempt disables according to
the markers i left long ago. And sorry for the rather late reply.

I had a look at the code, as of Linux 5.16, and did also a little test run
(of a standard kernel, not with PREEMPT_RT, only
CONFIG_PREEMPT_VOLUNTARY=y) on my Intel Kabylake GT2, so some thoughts:

The area covers only register reads and writes. The part that worries me
> is:
> - __intel_get_crtc_scanline() the worst case is 100us if no match is
>   found.
>

This one can be a problem indeed on (maybe all?) modern Intel gpu's since
Haswell, ie. the last ~10 years. I was able to reproduce it on my Kabylake
Intel gpu.

Most of the time that for-loop with up to 100 repetitions (~ 100
udelay(1) + one mmio register read) (cfe.
https://elixir.bootlin.com/linux/v5.17-rc1/source/drivers/gpu/drm/i915/i915_irq.c#L856)
will not execute, because most of the time that function gets called from
the vblank irq handler and then that trigger condition (if
(HAS_DDI(dev_priv) && !position)) is not true. However, it also gets called
as part of power-saving on behalf of userspace context, whenever the
desktop graphics goes idle for two video refresh cycles. If the desktop
shows graphics activity again, and vblank interrupts need to get reenabled,
the probability of hitting that case is then ~1-4% depending on video mode.
How many loops it runs also varies.

On my little Intel(R) Core(TM) i5-8250U CPU machine with a mostly idle
desktop, I observed about one hit every couple of seconds of regular use,
and each hit took between 125 usecs and almost 250 usecs. I guess udelay(1)
can take a bit longer than 1 usec?

So that's too much for preempt-rt. What one could do is the following:

1. In the for-loop in __intel_get_crtc_scanline(), add a preempt_enable()
before the udelay(1); and a preempt_disable() again after it. Or
potentially around the whole for-loop if the overhead of
preempt_en/disable() is significant?

2. In intel_get_crtc_scanline() also wrap the call to
__intel_get_crtc_scanline() into a preempt_disable() and preempt_enable(),
so we can be sure that __intel_get_crtc_scanline() always gets called with
preemption disabled.

Why should this work ok'ish? The point of the original preempt disable
inside i915_get_crtc_scanoutpos
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/i915_get_crtc_scanoutpos>
is that those two *stime = ktime_get() and *etime = ktime_get() clock
queries happen as close to the scanout position query as possible to get a
small confidence interval for when exactly the scanoutpos was
read/determined from the display hardware. error = (etime - stime) is the
error margin. If that margin becomes greater than 20 usecs, then the
higher-level code will consider the measurement invalid and repeat the
whole procedure up to 3 times before giving up.

Normally, in my experience with different graphics chips, one would observe
error < 3 usecs, so the measurement almost always succeeds at first try,
only very rarely takes two attempts. The preempt disable is meant to make
sure that this stays the case on a PREEMPT_RT kernel.

The problem here are the relatively rare cases where we hit that up to 100
iterations for-loop. Here even on a regular kernel, due to hardware quirks,
we already exceed the 20 usecs tolerance by a huge amount of more than 100
usecs, leading to a retry of the measurement. And my tests showed that
often the two succeeding retries also fail, because of hardware quirks can
apparently create a blackout situation approaching 1 msec, so we lose
anyway, regardless if we get preempted on a RT kernel or not. That's why
enabling preemption on RT again during that for-loop should not make the
situation worse and at least keep RT as real-time as intended.

In practice I would also expect that this failure case is the one least
likely to impair userspace applications greatly in practice. The cases that
mostly matter are the ones executed during vblank hardware irq, where the
for-loop never executes and error margin and preempt off time is only about
1 usec. My own software which depends on very precise timestamps from the
mechanism never reported >> 20 usecs errors during startup tests or runtime
tests.


> - intel_crtc_scanlines_since_frame_timestamp() not sure how long this
>   may take in the worst case.
>
>
intel_crtc_scanlines_since_frame_timestamp() should be harmless. That
do-while loop just tries to make sure that two register reads that should
happen within the same video refresh cycle are happening in the same
refresh cycle. As such, the while loop will almost always just execute only
once, and at most two times, so that's at most 6 mmio register reads for
two loop iterations.

In the long run one could try to test if
__intel_get_crtc_scanline_from_timestamp
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/__intel_get_crtc_scanline_from_timestamp>()
wouldn't be the better choice for affected hardware always. Code and
register descriptions suggest the feature is supported by all potentially
affected hardware, so if it would turn out that that method works as
accurate and reliable as the old one, we could save the overhead and time
delays for that 100 for-loop iterations and make the whole timestamping
more reliable on modern hw.

It was in the RT queue for a while and nobody complained.
> Disable preemption on PREEPMPT_RT during timestamping.
>
>
Do other patches exist to implement the preempt_dis/enable() also for AMD
and NVidia / nouveau / vc4? I left corresponding markers for
radeon/amdgpu-kms and RaspberryPi's vc4 driver. Ideally all kms drivers
which use scanout position queries should have such code. Always a
preempt_disable() before the "if (stime) *stime = ktime_get();" statement,
and a preempt_enable() after the "if (etime) *etime = ktime_get();"
statement.

Checking Linux 5.16 code, this should be safe - short preempt off interval
with only a few mmio register reads - for all kms drivers that support it
atm. I found the following functions to modify:

amdgpu: amdgpu_display_get_crtc_scanoutpos()
radeon: radeon_get_crtc_scanoutpos()
msm: mdp5_crtc_get_scanout_position() and dpu_crtc_get_scanout_position()
ltdc: ltdc_crtc_get_scanout_position()
vc4: vc4_crtc_get_scanout_position()

nouveau: In nvkm_head_mthd_scanoutpos(), one needs to put the
preempt_disable() right before

args <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/args>->v0
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/v0>.time
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/time>[0] =
ktime_to_ns <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/ktime_to_ns>(ktime_get
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/ktime_get>());

and the preempt_enable() right after
args <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/args>->v0
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/v0>.time
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/time>[1] =
ktime_to_ns <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/ktime_to_ns>(ktime_get
<https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/ktime_get>());


Is the plan to integrate these patches into the mainline kernel soon, as
part of ongoing preempt-rt upstreaming?

thanks,
-mario

[bigeasy: patch description.]
>
> Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c
> b/drivers/gpu/drm/i915/i915_irq.c
> index 038a9ec563c10..8e9ff0bcbc7e4 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -916,7 +916,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc
> *_crtc,
>          */
>         spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
>
> -       /* preempt_disable_rt() should go right here in PREEMPT_RT
> patchset. */
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_disable();
>
>         /* Get optional system timestamp before query. */
>         if (stime)
> @@ -980,7 +981,8 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc
> *_crtc,
>         if (etime)
>                 *etime = ktime_get();
>
> -       /* preempt_enable_rt() should go right here in PREEMPT_RT
> patchset. */
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +               preempt_enable();
>
>         spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
>
> --
> 2.34.1
>
>

[-- Attachment #2: Type: text/html, Size: 12262 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Disable tracing points on PREEMPT_RT
  2021-12-14 16:58       ` Steven Rostedt
@ 2022-02-08 17:32         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-08 17:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Airlie, intel-gfx, dri-devel, Luca Abeni, Thomas Gleixner

On 2021-12-14 11:58:37 [-0500], Steven Rostedt wrote:
> On Tue, 14 Dec 2021 18:34:50 +0200
> Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> 
> > Looks lightly tedious. Can't we have "slow" (or whatever) versions of
> > the trace macros so we could just declare these the same was as before
> > without having to manually write that wrapper for every event?
> 
> That would be quite tedious as well ;-)
> It may be possible to do, but it will be far from trivial, and I'm not sure
> I want this to be an easy option. Locks should not be taken from trace
> events in general, as they are not tested with lockdep when the trace
> points are not enabled, and could hide deadlocks that may not appear until
> running on production.

So we disable the tracing points or try what Steven suggested?

> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended
  2022-01-26 23:29     ` [Intel-gfx] " Mario Kleiner
@ 2022-02-11  8:44       ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-11  8:44 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, Mike Galbraith,
	dri-devel, Rodrigo Vivi, Thomas Gleixner

On 2022-01-27 00:29:37 [+0100], Mario Kleiner wrote:
> Hi, first thank you for implementing these preempt disables according to
Hi Mario,

> the markers i left long ago. And sorry for the rather late reply.
> 
> I had a look at the code, as of Linux 5.16, and did also a little test run
> (of a standard kernel, not with PREEMPT_RT, only
> CONFIG_PREEMPT_VOLUNTARY=y) on my Intel Kabylake GT2, so some thoughts:
> 
> The area covers only register reads and writes. The part that worries me
> > is:
> > - __intel_get_crtc_scanline() the worst case is 100us if no match is
> >   found.
> >
> 
> This one can be a problem indeed on (maybe all?) modern Intel gpu's since
> Haswell, ie. the last ~10 years. I was able to reproduce it on my Kabylake
> Intel gpu.
> 
> Most of the time that for-loop with up to 100 repetitions (~ 100
> udelay(1) + one mmio register read) (cfe.
> https://elixir.bootlin.com/linux/v5.17-rc1/source/drivers/gpu/drm/i915/i915_irq.c#L856)
> will not execute, because most of the time that function gets called from
> the vblank irq handler and then that trigger condition (if
> (HAS_DDI(dev_priv) && !position)) is not true. However, it also gets called
> as part of power-saving on behalf of userspace context, whenever the
> desktop graphics goes idle for two video refresh cycles. If the desktop
> shows graphics activity again, and vblank interrupts need to get reenabled,
> the probability of hitting that case is then ~1-4% depending on video mode.
> How many loops it runs also varies.
> 
> On my little Intel(R) Core(TM) i5-8250U CPU machine with a mostly idle
> desktop, I observed about one hit every couple of seconds of regular use,
> and each hit took between 125 usecs and almost 250 usecs. I guess udelay(1)
> can take a bit longer than 1 usec?

it should get very close to this. Maybe something else extended the time
depending on what you observe.

> So that's too much for preempt-rt. What one could do is the following:
> 
> 1. In the for-loop in __intel_get_crtc_scanline(), add a preempt_enable()
> before the udelay(1); and a preempt_disable() again after it. Or
> potentially around the whole for-loop if the overhead of
> preempt_en/disable() is significant?

It is very optimized on x86 ;)

> 2. In intel_get_crtc_scanline() also wrap the call to
> __intel_get_crtc_scanline() into a preempt_disable() and preempt_enable(),
> so we can be sure that __intel_get_crtc_scanline() always gets called with
> preemption disabled.
> 
> Why should this work ok'ish? The point of the original preempt disable
> inside i915_get_crtc_scanoutpos
> <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/i915_get_crtc_scanoutpos>
> is that those two *stime = ktime_get() and *etime = ktime_get() clock
> queries happen as close to the scanout position query as possible to get a
> small confidence interval for when exactly the scanoutpos was
> read/determined from the display hardware. error = (etime - stime) is the
> error margin. If that margin becomes greater than 20 usecs, then the
> higher-level code will consider the measurement invalid and repeat the
> whole procedure up to 3 times before giving up.

The preempt-disable is needed then? The task is preemptible here on
PREEMPT_RT but it _may_ not come to this. The difference vs !RT is that
an interrupt will preempt this code without it.

> Normally, in my experience with different graphics chips, one would observe
> error < 3 usecs, so the measurement almost always succeeds at first try,
> only very rarely takes two attempts. The preempt disable is meant to make
> sure that this stays the case on a PREEMPT_RT kernel.

Was it needed?

> The problem here are the relatively rare cases where we hit that up to 100
> iterations for-loop. Here even on a regular kernel, due to hardware quirks,
> we already exceed the 20 usecs tolerance by a huge amount of more than 100
> usecs, leading to a retry of the measurement. And my tests showed that
> often the two succeeding retries also fail, because of hardware quirks can
> apparently create a blackout situation approaching 1 msec, so we lose
> anyway, regardless if we get preempted on a RT kernel or not. That's why
> enabling preemption on RT again during that for-loop should not make the
> situation worse and at least keep RT as real-time as intended.
> 
> In practice I would also expect that this failure case is the one least
> likely to impair userspace applications greatly in practice. The cases that
> mostly matter are the ones executed during vblank hardware irq, where the
> for-loop never executes and error margin and preempt off time is only about
> 1 usec. My own software which depends on very precise timestamps from the
> mechanism never reported >> 20 usecs errors during startup tests or runtime
> tests.

That is without RT I assume?

> > - intel_crtc_scanlines_since_frame_timestamp() not sure how long this
> >   may take in the worst case.
> >
> >
> intel_crtc_scanlines_since_frame_timestamp() should be harmless. That
> do-while loop just tries to make sure that two register reads that should
> happen within the same video refresh cycle are happening in the same
> refresh cycle. As such, the while loop will almost always just execute only
> once, and at most two times, so that's at most 6 mmio register reads for
> two loop iterations.
> 
> In the long run one could try to test if
> __intel_get_crtc_scanline_from_timestamp
> <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/__intel_get_crtc_scanline_from_timestamp>()
> wouldn't be the better choice for affected hardware always. Code and
> register descriptions suggest the feature is supported by all potentially
> affected hardware, so if it would turn out that that method works as
> accurate and reliable as the old one, we could save the overhead and time
> delays for that 100 for-loop iterations and make the whole timestamping
> more reliable on modern hw.
> 
> It was in the RT queue for a while and nobody complained.
> > Disable preemption on PREEPMPT_RT during timestamping.
> >
> >
> Do other patches exist to implement the preempt_dis/enable() also for AMD
> and NVidia / nouveau / vc4? I left corresponding markers for

No, nobody complained. Most likely the i915 is wider used since it is
built-in into many chipsets which then run RT and some of them use the
display in production.

> radeon/amdgpu-kms and RaspberryPi's vc4 driver. Ideally all kms drivers
> which use scanout position queries should have such code. Always a
> preempt_disable() before the "if (stime) *stime = ktime_get();" statement,
> and a preempt_enable() after the "if (etime) *etime = ktime_get();"
> statement.
> 
> Checking Linux 5.16 code, this should be safe - short preempt off interval
> with only a few mmio register reads - for all kms drivers that support it
> atm. I found the following functions to modify:
> 
> amdgpu: amdgpu_display_get_crtc_scanoutpos()
> radeon: radeon_get_crtc_scanoutpos()
> msm: mdp5_crtc_get_scanout_position() and dpu_crtc_get_scanout_position()
> ltdc: ltdc_crtc_get_scanout_position()
> vc4: vc4_crtc_get_scanout_position()

I that is "small" with locks and such, then it should work

> nouveau: In nvkm_head_mthd_scanoutpos(), one needs to put the
> preempt_disable() right before
> 
> 
> Is the plan to integrate these patches into the mainline kernel soon, as
> part of ongoing preempt-rt upstreaming?

I want to get the i915 in as part of RT upstreaming. But now I've been
thinking to not allowing i915 on RT via Kconfig and worry about it
afterwards.

> thanks,
> -mario

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended
@ 2022-02-11  8:44       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-02-11  8:44 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: David Airlie, intel-gfx, Mike Galbraith, dri-devel, Thomas Gleixner

On 2022-01-27 00:29:37 [+0100], Mario Kleiner wrote:
> Hi, first thank you for implementing these preempt disables according to
Hi Mario,

> the markers i left long ago. And sorry for the rather late reply.
> 
> I had a look at the code, as of Linux 5.16, and did also a little test run
> (of a standard kernel, not with PREEMPT_RT, only
> CONFIG_PREEMPT_VOLUNTARY=y) on my Intel Kabylake GT2, so some thoughts:
> 
> The area covers only register reads and writes. The part that worries me
> > is:
> > - __intel_get_crtc_scanline() the worst case is 100us if no match is
> >   found.
> >
> 
> This one can be a problem indeed on (maybe all?) modern Intel gpu's since
> Haswell, ie. the last ~10 years. I was able to reproduce it on my Kabylake
> Intel gpu.
> 
> Most of the time that for-loop with up to 100 repetitions (~ 100
> udelay(1) + one mmio register read) (cfe.
> https://elixir.bootlin.com/linux/v5.17-rc1/source/drivers/gpu/drm/i915/i915_irq.c#L856)
> will not execute, because most of the time that function gets called from
> the vblank irq handler and then that trigger condition (if
> (HAS_DDI(dev_priv) && !position)) is not true. However, it also gets called
> as part of power-saving on behalf of userspace context, whenever the
> desktop graphics goes idle for two video refresh cycles. If the desktop
> shows graphics activity again, and vblank interrupts need to get reenabled,
> the probability of hitting that case is then ~1-4% depending on video mode.
> How many loops it runs also varies.
> 
> On my little Intel(R) Core(TM) i5-8250U CPU machine with a mostly idle
> desktop, I observed about one hit every couple of seconds of regular use,
> and each hit took between 125 usecs and almost 250 usecs. I guess udelay(1)
> can take a bit longer than 1 usec?

it should get very close to this. Maybe something else extended the time
depending on what you observe.

> So that's too much for preempt-rt. What one could do is the following:
> 
> 1. In the for-loop in __intel_get_crtc_scanline(), add a preempt_enable()
> before the udelay(1); and a preempt_disable() again after it. Or
> potentially around the whole for-loop if the overhead of
> preempt_en/disable() is significant?

It is very optimized on x86 ;)

> 2. In intel_get_crtc_scanline() also wrap the call to
> __intel_get_crtc_scanline() into a preempt_disable() and preempt_enable(),
> so we can be sure that __intel_get_crtc_scanline() always gets called with
> preemption disabled.
> 
> Why should this work ok'ish? The point of the original preempt disable
> inside i915_get_crtc_scanoutpos
> <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/i915_get_crtc_scanoutpos>
> is that those two *stime = ktime_get() and *etime = ktime_get() clock
> queries happen as close to the scanout position query as possible to get a
> small confidence interval for when exactly the scanoutpos was
> read/determined from the display hardware. error = (etime - stime) is the
> error margin. If that margin becomes greater than 20 usecs, then the
> higher-level code will consider the measurement invalid and repeat the
> whole procedure up to 3 times before giving up.

The preempt-disable is needed then? The task is preemptible here on
PREEMPT_RT but it _may_ not come to this. The difference vs !RT is that
an interrupt will preempt this code without it.

> Normally, in my experience with different graphics chips, one would observe
> error < 3 usecs, so the measurement almost always succeeds at first try,
> only very rarely takes two attempts. The preempt disable is meant to make
> sure that this stays the case on a PREEMPT_RT kernel.

Was it needed?

> The problem here are the relatively rare cases where we hit that up to 100
> iterations for-loop. Here even on a regular kernel, due to hardware quirks,
> we already exceed the 20 usecs tolerance by a huge amount of more than 100
> usecs, leading to a retry of the measurement. And my tests showed that
> often the two succeeding retries also fail, because of hardware quirks can
> apparently create a blackout situation approaching 1 msec, so we lose
> anyway, regardless if we get preempted on a RT kernel or not. That's why
> enabling preemption on RT again during that for-loop should not make the
> situation worse and at least keep RT as real-time as intended.
> 
> In practice I would also expect that this failure case is the one least
> likely to impair userspace applications greatly in practice. The cases that
> mostly matter are the ones executed during vblank hardware irq, where the
> for-loop never executes and error margin and preempt off time is only about
> 1 usec. My own software which depends on very precise timestamps from the
> mechanism never reported >> 20 usecs errors during startup tests or runtime
> tests.

That is without RT I assume?

> > - intel_crtc_scanlines_since_frame_timestamp() not sure how long this
> >   may take in the worst case.
> >
> >
> intel_crtc_scanlines_since_frame_timestamp() should be harmless. That
> do-while loop just tries to make sure that two register reads that should
> happen within the same video refresh cycle are happening in the same
> refresh cycle. As such, the while loop will almost always just execute only
> once, and at most two times, so that's at most 6 mmio register reads for
> two loop iterations.
> 
> In the long run one could try to test if
> __intel_get_crtc_scanline_from_timestamp
> <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/__intel_get_crtc_scanline_from_timestamp>()
> wouldn't be the better choice for affected hardware always. Code and
> register descriptions suggest the feature is supported by all potentially
> affected hardware, so if it would turn out that that method works as
> accurate and reliable as the old one, we could save the overhead and time
> delays for that 100 for-loop iterations and make the whole timestamping
> more reliable on modern hw.
> 
> It was in the RT queue for a while and nobody complained.
> > Disable preemption on PREEPMPT_RT during timestamping.
> >
> >
> Do other patches exist to implement the preempt_dis/enable() also for AMD
> and NVidia / nouveau / vc4? I left corresponding markers for

No, nobody complained. Most likely the i915 is wider used since it is
built-in into many chipsets which then run RT and some of them use the
display in production.

> radeon/amdgpu-kms and RaspberryPi's vc4 driver. Ideally all kms drivers
> which use scanout position queries should have such code. Always a
> preempt_disable() before the "if (stime) *stime = ktime_get();" statement,
> and a preempt_enable() after the "if (etime) *etime = ktime_get();"
> statement.
> 
> Checking Linux 5.16 code, this should be safe - short preempt off interval
> with only a few mmio register reads - for all kms drivers that support it
> atm. I found the following functions to modify:
> 
> amdgpu: amdgpu_display_get_crtc_scanoutpos()
> radeon: radeon_get_crtc_scanoutpos()
> msm: mdp5_crtc_get_scanout_position() and dpu_crtc_get_scanout_position()
> ltdc: ltdc_crtc_get_scanout_position()
> vc4: vc4_crtc_get_scanout_position()

I that is "small" with locks and such, then it should work

> nouveau: In nvkm_head_mthd_scanoutpos(), one needs to put the
> preempt_disable() right before
> 
> 
> Is the plan to integrate these patches into the mainline kernel soon, as
> part of ongoing preempt-rt upstreaming?

I want to get the i915 in as part of RT upstreaming. But now I've been
thinking to not allowing i915 on RT via Kconfig and worry about it
afterwards.

> thanks,
> -mario

Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended
  2022-02-11  8:44       ` [Intel-gfx] " Sebastian Andrzej Siewior
@ 2022-02-14 18:38         ` Mario Kleiner
  -1 siblings, 0 replies; 35+ messages in thread
From: Mario Kleiner @ 2022-02-14 18:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, Mike Galbraith,
	dri-devel, Rodrigo Vivi, Thomas Gleixner

On Fri, Feb 11, 2022 at 9:44 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-01-27 00:29:37 [+0100], Mario Kleiner wrote:
> > Hi, first thank you for implementing these preempt disables according to
> Hi Mario,
>
> > the markers i left long ago. And sorry for the rather late reply.
> >
> > I had a look at the code, as of Linux 5.16, and did also a little test run
> > (of a standard kernel, not with PREEMPT_RT, only
> > CONFIG_PREEMPT_VOLUNTARY=y) on my Intel Kabylake GT2, so some thoughts:
> >
> > The area covers only register reads and writes. The part that worries me
> > > is:
> > > - __intel_get_crtc_scanline() the worst case is 100us if no match is
> > >   found.
> > >
> >
> > This one can be a problem indeed on (maybe all?) modern Intel gpu's since
> > Haswell, ie. the last ~10 years. I was able to reproduce it on my Kabylake
> > Intel gpu.
> >
> > Most of the time that for-loop with up to 100 repetitions (~ 100
> > udelay(1) + one mmio register read) (cfe.
> > https://elixir.bootlin.com/linux/v5.17-rc1/source/drivers/gpu/drm/i915/i915_irq.c#L856)
> > will not execute, because most of the time that function gets called from
> > the vblank irq handler and then that trigger condition (if
> > (HAS_DDI(dev_priv) && !position)) is not true. However, it also gets called
> > as part of power-saving on behalf of userspace context, whenever the
> > desktop graphics goes idle for two video refresh cycles. If the desktop
> > shows graphics activity again, and vblank interrupts need to get reenabled,
> > the probability of hitting that case is then ~1-4% depending on video mode.
> > How many loops it runs also varies.
> >
> > On my little Intel(R) Core(TM) i5-8250U CPU machine with a mostly idle
> > desktop, I observed about one hit every couple of seconds of regular use,
> > and each hit took between 125 usecs and almost 250 usecs. I guess udelay(1)
> > can take a bit longer than 1 usec?
>
> it should get very close to this. Maybe something else extended the time
> depending on what you observe.
>

Probably all the other stuff in that for-loop adds a microsecond. I
don't have a good feeling how long a typical mmio register read is
expected to take, except for quite a bit less than 1 usec from my
experience.

> > So that's too much for preempt-rt. What one could do is the following:
> >
> > 1. In the for-loop in __intel_get_crtc_scanline(), add a preempt_enable()
> > before the udelay(1); and a preempt_disable() again after it. Or
> > potentially around the whole for-loop if the overhead of
> > preempt_en/disable() is significant?
>
> It is very optimized on x86 ;)

Good! So adding a disable/enable pair into each of those loop
iterations won't hurt.

>
> > 2. In intel_get_crtc_scanline() also wrap the call to
> > __intel_get_crtc_scanline() into a preempt_disable() and preempt_enable(),
> > so we can be sure that __intel_get_crtc_scanline() always gets called with
> > preemption disabled.
> >
> > Why should this work ok'ish? The point of the original preempt disable
> > inside i915_get_crtc_scanoutpos
> > <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/i915_get_crtc_scanoutpos>
> > is that those two *stime = ktime_get() and *etime = ktime_get() clock
> > queries happen as close to the scanout position query as possible to get a
> > small confidence interval for when exactly the scanoutpos was
> > read/determined from the display hardware. error = (etime - stime) is the
> > error margin. If that margin becomes greater than 20 usecs, then the
> > higher-level code will consider the measurement invalid and repeat the
> > whole procedure up to 3 times before giving up.
>
> The preempt-disable is needed then? The task is preemptible here on
> PREEMPT_RT but it _may_ not come to this. The difference vs !RT is that
> an interrupt will preempt this code without it.
>

Yes, it is needed, as that chunk of code between the two ktime_get()
requires should ideally not get interrupted by anything.
The "try up to three times" higher level logic in calling code is just
to cover the hopefully rare cases where something still preempts,
e.g., a NMI or such.

I have not ever tested this on a PREEMPT_RT kernel in at least a
decade, but on regular kernels, e.g., Ubuntu generic or Ubuntu
low-latency kernels I haven't observed more than one retry when it
mattered, and usually the code executes in 0-2 usecs on my test
machines, way below the limit of 20 usecs at which a measurement is
considered failed and then retried. So the retries are sufficient as
long as all preventable preemption is prevented. Hence the
preempt_disable() annotations to make sure it continues to work on
PREEMPT_RT.

The exception, as I learned after your mail, is when that for-loop is
hit. Then it can take hundreds of microseconds, and even spoil all
three retries, resulting in a rather inaccurate measurement. But the
"good news" is that that for-loop will very likely only be hit in
cases where the code is not called from (Vblank/Pageflip-completion)
hardware interrupt handlers, but triggered by a userspace ioctl. Those
are the cases where a higher measurement error is imho tolerable in
typical use cases and luckily getting preempted won't make things
worse than they are already. That's why it is acceptable to have those
preempt_enable/disable around each udelay(1) in that one specific up
to 100 iterations for-loop -- It helps RT to reach its goals, but
doesn't hurt the graphics case further. Basically we got lucky.

> > Normally, in my experience with different graphics chips, one would observe
> > error < 3 usecs, so the measurement almost always succeeds at first try,
> > only very rarely takes two attempts. The preempt disable is meant to make
> > sure that this stays the case on a PREEMPT_RT kernel.
>
> Was it needed?
>

The point of that code is that in an ideal world, the mmio registers
and ktime_get() host time read should happen all at exactly the same
time. Given that that is impossible, the second best is to try to do
both as close in time together as possible, hence the measures to try
to prevent any preemption as much as possible and have some
mitigations for the hopefully rare cases when it fails.

> > The problem here are the relatively rare cases where we hit that up to 100
> > iterations for-loop. Here even on a regular kernel, due to hardware quirks,
> > we already exceed the 20 usecs tolerance by a huge amount of more than 100
> > usecs, leading to a retry of the measurement. And my tests showed that
> > often the two succeeding retries also fail, because of hardware quirks can
> > apparently create a blackout situation approaching 1 msec, so we lose
> > anyway, regardless if we get preempted on a RT kernel or not. That's why
> > enabling preemption on RT again during that for-loop should not make the
> > situation worse and at least keep RT as real-time as intended.
> >
> > In practice I would also expect that this failure case is the one least
> > likely to impair userspace applications greatly in practice. The cases that
> > mostly matter are the ones executed during vblank hardware irq, where the
> > for-loop never executes and error margin and preempt off time is only about
> > 1 usec. My own software which depends on very precise timestamps from the
> > mechanism never reported >> 20 usecs errors during startup tests or runtime
> > tests.
>
> That is without RT I assume?

Yes.

I haven't tested PREEMPT_RT in a decade. But all the same would apply
for RT. And there are interesting use cases at least in my field
(neuroscience/medical science) if one can have graphics applications
running on a RT kernel which are very timing sensitive. Setups which
need to present pictures/animations on a monitor with very reliable
and precise timing and timestamps, but also do some simultaneous
realtime data collection or stimulation via ADC/DAC boards or digital
i/o boards, e.g., realtime virtual dynamic patch-clamp techniques in
electrophysiology (cfe. this link for an idea:
https://vivo.weill.cornell.edu/display/pubid11764320).

And a whole host of other applications, once the RT patch-set is upstreamed...

>
> > > - intel_crtc_scanlines_since_frame_timestamp() not sure how long this
> > >   may take in the worst case.
> > >
> > >
> > intel_crtc_scanlines_since_frame_timestamp() should be harmless. That
> > do-while loop just tries to make sure that two register reads that should
> > happen within the same video refresh cycle are happening in the same
> > refresh cycle. As such, the while loop will almost always just execute only
> > once, and at most two times, so that's at most 6 mmio register reads for
> > two loop iterations.
> >
> > In the long run one could try to test if
> > __intel_get_crtc_scanline_from_timestamp
> > <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/__intel_get_crtc_scanline_from_timestamp>()
> > wouldn't be the better choice for affected hardware always. Code and
> > register descriptions suggest the feature is supported by all potentially
> > affected hardware, so if it would turn out that that method works as
> > accurate and reliable as the old one, we could save the overhead and time
> > delays for that 100 for-loop iterations and make the whole timestamping
> > more reliable on modern hw.
> >
> > It was in the RT queue for a while and nobody complained.
> > > Disable preemption on PREEPMPT_RT during timestamping.
> > >
> > >
> > Do other patches exist to implement the preempt_dis/enable() also for AMD
> > and NVidia / nouveau / vc4? I left corresponding markers for
>
> No, nobody complained. Most likely the i915 is wider used since it is
> built-in into many chipsets which then run RT and some of them use the
> display in production.
>
> > radeon/amdgpu-kms and RaspberryPi's vc4 driver. Ideally all kms drivers
> > which use scanout position queries should have such code. Always a
> > preempt_disable() before the "if (stime) *stime = ktime_get();" statement,
> > and a preempt_enable() after the "if (etime) *etime = ktime_get();"
> > statement.
> >
> > Checking Linux 5.16 code, this should be safe - short preempt off interval
> > with only a few mmio register reads - for all kms drivers that support it
> > atm. I found the following functions to modify:
> >
> > amdgpu: amdgpu_display_get_crtc_scanoutpos()
> > radeon: radeon_get_crtc_scanoutpos()
> > msm: mdp5_crtc_get_scanout_position() and dpu_crtc_get_scanout_position()
> > ltdc: ltdc_crtc_get_scanout_position()
> > vc4: vc4_crtc_get_scanout_position()
>
> I that is "small" with locks and such, then it should work
>

Yes. I looked over all of those functions implementations, so unless i
overlooked something, it should be easy.

> > nouveau: In nvkm_head_mthd_scanoutpos(), one needs to put the
> > preempt_disable() right before
> >
> …
> >
> > Is the plan to integrate these patches into the mainline kernel soon, as
> > part of ongoing preempt-rt upstreaming?
>
> I want to get the i915 in as part of RT upstreaming. But now I've been
> thinking to not allowing i915 on RT via Kconfig and worry about it
> afterwards.
>

Well, my sketched out approach should be ok, and a start. We may be
able to do without that for-loop in the future for a more elegant
solution on affected hw, but that will require quite a bit of testing
(if it wouldn't exchange one problematic hardware quirk for another),
for which i don't have the time right now or in the next few months,
although i put it on to my todo list for laters, if nobody else does
it beforehand. The Intel gfx people would have a way larger sample of
hw to test though, my sample is n=1 atm.

-mario

> > thanks,
> > -mario
>
> Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended
@ 2022-02-14 18:38         ` Mario Kleiner
  0 siblings, 0 replies; 35+ messages in thread
From: Mario Kleiner @ 2022-02-14 18:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: David Airlie, intel-gfx, Mike Galbraith, dri-devel, Thomas Gleixner

On Fri, Feb 11, 2022 at 9:44 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-01-27 00:29:37 [+0100], Mario Kleiner wrote:
> > Hi, first thank you for implementing these preempt disables according to
> Hi Mario,
>
> > the markers i left long ago. And sorry for the rather late reply.
> >
> > I had a look at the code, as of Linux 5.16, and did also a little test run
> > (of a standard kernel, not with PREEMPT_RT, only
> > CONFIG_PREEMPT_VOLUNTARY=y) on my Intel Kabylake GT2, so some thoughts:
> >
> > The area covers only register reads and writes. The part that worries me
> > > is:
> > > - __intel_get_crtc_scanline() the worst case is 100us if no match is
> > >   found.
> > >
> >
> > This one can be a problem indeed on (maybe all?) modern Intel gpu's since
> > Haswell, ie. the last ~10 years. I was able to reproduce it on my Kabylake
> > Intel gpu.
> >
> > Most of the time that for-loop with up to 100 repetitions (~ 100
> > udelay(1) + one mmio register read) (cfe.
> > https://elixir.bootlin.com/linux/v5.17-rc1/source/drivers/gpu/drm/i915/i915_irq.c#L856)
> > will not execute, because most of the time that function gets called from
> > the vblank irq handler and then that trigger condition (if
> > (HAS_DDI(dev_priv) && !position)) is not true. However, it also gets called
> > as part of power-saving on behalf of userspace context, whenever the
> > desktop graphics goes idle for two video refresh cycles. If the desktop
> > shows graphics activity again, and vblank interrupts need to get reenabled,
> > the probability of hitting that case is then ~1-4% depending on video mode.
> > How many loops it runs also varies.
> >
> > On my little Intel(R) Core(TM) i5-8250U CPU machine with a mostly idle
> > desktop, I observed about one hit every couple of seconds of regular use,
> > and each hit took between 125 usecs and almost 250 usecs. I guess udelay(1)
> > can take a bit longer than 1 usec?
>
> it should get very close to this. Maybe something else extended the time
> depending on what you observe.
>

Probably all the other stuff in that for-loop adds a microsecond. I
don't have a good feeling how long a typical mmio register read is
expected to take, except for quite a bit less than 1 usec from my
experience.

> > So that's too much for preempt-rt. What one could do is the following:
> >
> > 1. In the for-loop in __intel_get_crtc_scanline(), add a preempt_enable()
> > before the udelay(1); and a preempt_disable() again after it. Or
> > potentially around the whole for-loop if the overhead of
> > preempt_en/disable() is significant?
>
> It is very optimized on x86 ;)

Good! So adding a disable/enable pair into each of those loop
iterations won't hurt.

>
> > 2. In intel_get_crtc_scanline() also wrap the call to
> > __intel_get_crtc_scanline() into a preempt_disable() and preempt_enable(),
> > so we can be sure that __intel_get_crtc_scanline() always gets called with
> > preemption disabled.
> >
> > Why should this work ok'ish? The point of the original preempt disable
> > inside i915_get_crtc_scanoutpos
> > <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/i915_get_crtc_scanoutpos>
> > is that those two *stime = ktime_get() and *etime = ktime_get() clock
> > queries happen as close to the scanout position query as possible to get a
> > small confidence interval for when exactly the scanoutpos was
> > read/determined from the display hardware. error = (etime - stime) is the
> > error margin. If that margin becomes greater than 20 usecs, then the
> > higher-level code will consider the measurement invalid and repeat the
> > whole procedure up to 3 times before giving up.
>
> The preempt-disable is needed then? The task is preemptible here on
> PREEMPT_RT but it _may_ not come to this. The difference vs !RT is that
> an interrupt will preempt this code without it.
>

Yes, it is needed, as that chunk of code between the two ktime_get()
requires should ideally not get interrupted by anything.
The "try up to three times" higher level logic in calling code is just
to cover the hopefully rare cases where something still preempts,
e.g., a NMI or such.

I have not ever tested this on a PREEMPT_RT kernel in at least a
decade, but on regular kernels, e.g., Ubuntu generic or Ubuntu
low-latency kernels I haven't observed more than one retry when it
mattered, and usually the code executes in 0-2 usecs on my test
machines, way below the limit of 20 usecs at which a measurement is
considered failed and then retried. So the retries are sufficient as
long as all preventable preemption is prevented. Hence the
preempt_disable() annotations to make sure it continues to work on
PREEMPT_RT.

The exception, as I learned after your mail, is when that for-loop is
hit. Then it can take hundreds of microseconds, and even spoil all
three retries, resulting in a rather inaccurate measurement. But the
"good news" is that that for-loop will very likely only be hit in
cases where the code is not called from (Vblank/Pageflip-completion)
hardware interrupt handlers, but triggered by a userspace ioctl. Those
are the cases where a higher measurement error is imho tolerable in
typical use cases and luckily getting preempted won't make things
worse than they are already. That's why it is acceptable to have those
preempt_enable/disable around each udelay(1) in that one specific up
to 100 iterations for-loop -- It helps RT to reach its goals, but
doesn't hurt the graphics case further. Basically we got lucky.

> > Normally, in my experience with different graphics chips, one would observe
> > error < 3 usecs, so the measurement almost always succeeds at first try,
> > only very rarely takes two attempts. The preempt disable is meant to make
> > sure that this stays the case on a PREEMPT_RT kernel.
>
> Was it needed?
>

The point of that code is that in an ideal world, the mmio registers
and ktime_get() host time read should happen all at exactly the same
time. Given that that is impossible, the second best is to try to do
both as close in time together as possible, hence the measures to try
to prevent any preemption as much as possible and have some
mitigations for the hopefully rare cases when it fails.

> > The problem here are the relatively rare cases where we hit that up to 100
> > iterations for-loop. Here even on a regular kernel, due to hardware quirks,
> > we already exceed the 20 usecs tolerance by a huge amount of more than 100
> > usecs, leading to a retry of the measurement. And my tests showed that
> > often the two succeeding retries also fail, because of hardware quirks can
> > apparently create a blackout situation approaching 1 msec, so we lose
> > anyway, regardless if we get preempted on a RT kernel or not. That's why
> > enabling preemption on RT again during that for-loop should not make the
> > situation worse and at least keep RT as real-time as intended.
> >
> > In practice I would also expect that this failure case is the one least
> > likely to impair userspace applications greatly in practice. The cases that
> > mostly matter are the ones executed during vblank hardware irq, where the
> > for-loop never executes and error margin and preempt off time is only about
> > 1 usec. My own software which depends on very precise timestamps from the
> > mechanism never reported >> 20 usecs errors during startup tests or runtime
> > tests.
>
> That is without RT I assume?

Yes.

I haven't tested PREEMPT_RT in a decade. But all the same would apply
for RT. And there are interesting use cases at least in my field
(neuroscience/medical science) if one can have graphics applications
running on a RT kernel which are very timing sensitive. Setups which
need to present pictures/animations on a monitor with very reliable
and precise timing and timestamps, but also do some simultaneous
realtime data collection or stimulation via ADC/DAC boards or digital
i/o boards, e.g., realtime virtual dynamic patch-clamp techniques in
electrophysiology (cfe. this link for an idea:
https://vivo.weill.cornell.edu/display/pubid11764320).

And a whole host of other applications, once the RT patch-set is upstreamed...

>
> > > - intel_crtc_scanlines_since_frame_timestamp() not sure how long this
> > >   may take in the worst case.
> > >
> > >
> > intel_crtc_scanlines_since_frame_timestamp() should be harmless. That
> > do-while loop just tries to make sure that two register reads that should
> > happen within the same video refresh cycle are happening in the same
> > refresh cycle. As such, the while loop will almost always just execute only
> > once, and at most two times, so that's at most 6 mmio register reads for
> > two loop iterations.
> >
> > In the long run one could try to test if
> > __intel_get_crtc_scanline_from_timestamp
> > <https://elixir.bootlin.com/linux/v5.17-rc1/C/ident/__intel_get_crtc_scanline_from_timestamp>()
> > wouldn't be the better choice for affected hardware always. Code and
> > register descriptions suggest the feature is supported by all potentially
> > affected hardware, so if it would turn out that that method works as
> > accurate and reliable as the old one, we could save the overhead and time
> > delays for that 100 for-loop iterations and make the whole timestamping
> > more reliable on modern hw.
> >
> > It was in the RT queue for a while and nobody complained.
> > > Disable preemption on PREEPMPT_RT during timestamping.
> > >
> > >
> > Do other patches exist to implement the preempt_dis/enable() also for AMD
> > and NVidia / nouveau / vc4? I left corresponding markers for
>
> No, nobody complained. Most likely the i915 is wider used since it is
> built-in into many chipsets which then run RT and some of them use the
> display in production.
>
> > radeon/amdgpu-kms and RaspberryPi's vc4 driver. Ideally all kms drivers
> > which use scanout position queries should have such code. Always a
> > preempt_disable() before the "if (stime) *stime = ktime_get();" statement,
> > and a preempt_enable() after the "if (etime) *etime = ktime_get();"
> > statement.
> >
> > Checking Linux 5.16 code, this should be safe - short preempt off interval
> > with only a few mmio register reads - for all kms drivers that support it
> > atm. I found the following functions to modify:
> >
> > amdgpu: amdgpu_display_get_crtc_scanoutpos()
> > radeon: radeon_get_crtc_scanoutpos()
> > msm: mdp5_crtc_get_scanout_position() and dpu_crtc_get_scanout_position()
> > ltdc: ltdc_crtc_get_scanout_position()
> > vc4: vc4_crtc_get_scanout_position()
>
> I that is "small" with locks and such, then it should work
>

Yes. I looked over all of those functions implementations, so unless i
overlooked something, it should be easy.

> > nouveau: In nvkm_head_mthd_scanoutpos(), one needs to put the
> > preempt_disable() right before
> >
> …
> >
> > Is the plan to integrate these patches into the mainline kernel soon, as
> > part of ongoing preempt-rt upstreaming?
>
> I want to get the i915 in as part of RT upstreaming. But now I've been
> thinking to not allowing i915 on RT via Kconfig and worry about it
> afterwards.
>

Well, my sketched out approach should be ok, and a start. We may be
able to do without that for-loop in the future for a more elegant
solution on affected hw, but that will require quite a bit of testing
(if it wouldn't exchange one problematic hardware quirk for another),
for which i don't have the time right now or in the next few months,
although i put it on to my todo list for laters, if nobody else does
it beforehand. The Intel gfx people would have a way larger sample of
hw to test though, my sample is n=1 atm.

-mario

> > thanks,
> > -mario
>
> Sebastian

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-02-14 18:39 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-14 14:02 [PATCH 0/8] drm/i915: PREEMPT_RT related fixups Sebastian Andrzej Siewior
2021-12-14 14:02 ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 14:02 ` [PATCH 1/8] drm/i915: Drop the irqs_disabled() check Sebastian Andrzej Siewior
2021-12-14 14:02   ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 14:02 ` [PATCH 2/8] drm/i915/gt: Queue and wait for the irq_work item Sebastian Andrzej Siewior
2021-12-14 14:02   ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 14:02 ` [PATCH 3/8] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock() Sebastian Andrzej Siewior
2021-12-14 14:02   ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 14:02 ` [PATCH 4/8] drm/i915: Use preempt_disable/enable_rt() where recommended Sebastian Andrzej Siewior
2021-12-14 14:02   ` [Intel-gfx] " Sebastian Andrzej Siewior
2022-01-26 23:29   ` Mario Kleiner
2022-01-26 23:29     ` [Intel-gfx] " Mario Kleiner
2022-02-11  8:44     ` Sebastian Andrzej Siewior
2022-02-11  8:44       ` [Intel-gfx] " Sebastian Andrzej Siewior
2022-02-14 18:38       ` Mario Kleiner
2022-02-14 18:38         ` [Intel-gfx] " Mario Kleiner
2021-12-14 14:02 ` [PATCH 5/8] drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates Sebastian Andrzej Siewior
2021-12-14 14:02   ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 14:02 ` [PATCH 6/8] drm/i915: Don't check for atomic context on PREEMPT_RT Sebastian Andrzej Siewior
2021-12-14 14:02   ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 14:03 ` [PATCH 7/8] drm/i915: Disable tracing points " Sebastian Andrzej Siewior
2021-12-14 14:03   ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 14:36   ` Steven Rostedt
2021-12-14 14:36     ` [Intel-gfx] " Steven Rostedt
2021-12-14 15:41     ` Jani Nikula
2021-12-14 15:41       ` [Intel-gfx] " Jani Nikula
2021-12-14 15:56     ` Sebastian Andrzej Siewior
2021-12-14 15:56       ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 16:34     ` Ville Syrjälä
2021-12-14 16:58       ` Steven Rostedt
2022-02-08 17:32         ` Sebastian Andrzej Siewior
2021-12-14 14:03 ` [PATCH 8/8] drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE Sebastian Andrzej Siewior
2021-12-14 14:03   ` [Intel-gfx] " Sebastian Andrzej Siewior
2021-12-14 15:58 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915: PREEMPT_RT related fixups. (rev4) Patchwork
2021-12-14 16:12   ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.