All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lucas De Marchi <lucas.demarchi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Subject: Re: [PATCH] drm/i915/dg2: Catch and log more unexpected values in DG1_MSTR_TILE_INTR
Date: Wed, 25 May 2022 11:14:17 -0700	[thread overview]
Message-ID: <20220525181417.wnilka4bigyi5fle@ldmartin-desk2> (raw)
In-Reply-To: <f37468b3-1066-ee4b-fb5b-7664fd180fd6@linux.intel.com>

On Wed, May 25, 2022 at 05:03:13PM +0100, Tvrtko Ursulin wrote:
>
>On 24/05/2022 18:51, Matt Roper wrote:
>>On Tue, May 24, 2022 at 10:43:39AM +0100, Tvrtko Ursulin wrote:
>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>>Catch and log any garbage in the register, including no tiles marked, or
>>>multiple tiles marked.
>>>
>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>Cc: Matt Roper <matthew.d.roper@intel.com>
>>>---
>>>We caught garbage in DG1_MSTR_TILE_INTR with DG2 (actual value 0xF9D2C008)
>>>during glmark and more badness. So I thought lets log all possible failure
>>>modes from here and also use per device logging.
>>>---
>>>  drivers/gpu/drm/i915/i915_irq.c | 33 ++++++++++++++++++++++-----------
>>>  drivers/gpu/drm/i915/i915_reg.h |  1 +
>>>  2 files changed, 23 insertions(+), 11 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>>index 73cebc6aa650..79853d3fc1ed 100644
>>>--- a/drivers/gpu/drm/i915/i915_irq.c
>>>+++ b/drivers/gpu/drm/i915/i915_irq.c
>>>@@ -2778,24 +2778,30 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg)
>>>  	u32 gu_misc_iir;
>>>  	if (!intel_irqs_enabled(i915))
>>>-		return IRQ_NONE;
>>>+		goto none;
>>>  	master_tile_ctl = dg1_master_intr_disable(regs);
>>>-	if (!master_tile_ctl) {
>>>-		dg1_master_intr_enable(regs);
>>>-		return IRQ_NONE;
>>>+	if (!master_tile_ctl)
>>>+		goto enable_none;
>>>+
>>>+	if (master_tile_ctl & ~(DG1_MSTR_IRQ | DG1_MSTR_TILE_MASK)) {
>>>+		drm_warn(&i915->drm, "Garbage in master_tile_ctl: 0x%08x!\n",
>>>+			 master_tile_ctl);
>>
>>I know we have a bunch of them already, but shouldn't we be avoiding
>>printk-based stuff like this inside interrupt handlers?  Should we be
>>migrating all these error messages over to trace_printk or something
>>similar that's safer to use?
>
>Not sure - I kind of think some really unexpected and worrying 
>situations should be loud and on by default. Risk is then spam if not 
>ratelimited. Maybe we should instead ratelimit most errors/warnings 
>coming for irq handlers?
>
>In this particular case at least DRM_ERROR with no device info is the 
>odd one out in the entire file so I'd suggest changing at least that, 
>if the rest of my changes is of questionable benefit.

I'd rather remove the printk's from irq rather than adding more. At the very
least, they should be the _once variant or ratelimited. One of the few
cases to even deserve a unlikely(), even to document this shouldn't
really be happening.

Our irq handlers (particularly on dgfx and multi-gt) are already too
long running... I don't like making them any onger or slower.


Lucas De Marchi

WARNING: multiple messages have this Message-ID (diff)
From: Lucas De Marchi <lucas.demarchi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/dg2: Catch and log more unexpected values in DG1_MSTR_TILE_INTR
Date: Wed, 25 May 2022 11:14:17 -0700	[thread overview]
Message-ID: <20220525181417.wnilka4bigyi5fle@ldmartin-desk2> (raw)
In-Reply-To: <f37468b3-1066-ee4b-fb5b-7664fd180fd6@linux.intel.com>

On Wed, May 25, 2022 at 05:03:13PM +0100, Tvrtko Ursulin wrote:
>
>On 24/05/2022 18:51, Matt Roper wrote:
>>On Tue, May 24, 2022 at 10:43:39AM +0100, Tvrtko Ursulin wrote:
>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>>Catch and log any garbage in the register, including no tiles marked, or
>>>multiple tiles marked.
>>>
>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>Cc: Matt Roper <matthew.d.roper@intel.com>
>>>---
>>>We caught garbage in DG1_MSTR_TILE_INTR with DG2 (actual value 0xF9D2C008)
>>>during glmark and more badness. So I thought lets log all possible failure
>>>modes from here and also use per device logging.
>>>---
>>>  drivers/gpu/drm/i915/i915_irq.c | 33 ++++++++++++++++++++++-----------
>>>  drivers/gpu/drm/i915/i915_reg.h |  1 +
>>>  2 files changed, 23 insertions(+), 11 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>>index 73cebc6aa650..79853d3fc1ed 100644
>>>--- a/drivers/gpu/drm/i915/i915_irq.c
>>>+++ b/drivers/gpu/drm/i915/i915_irq.c
>>>@@ -2778,24 +2778,30 @@ static irqreturn_t dg1_irq_handler(int irq, void *arg)
>>>  	u32 gu_misc_iir;
>>>  	if (!intel_irqs_enabled(i915))
>>>-		return IRQ_NONE;
>>>+		goto none;
>>>  	master_tile_ctl = dg1_master_intr_disable(regs);
>>>-	if (!master_tile_ctl) {
>>>-		dg1_master_intr_enable(regs);
>>>-		return IRQ_NONE;
>>>+	if (!master_tile_ctl)
>>>+		goto enable_none;
>>>+
>>>+	if (master_tile_ctl & ~(DG1_MSTR_IRQ | DG1_MSTR_TILE_MASK)) {
>>>+		drm_warn(&i915->drm, "Garbage in master_tile_ctl: 0x%08x!\n",
>>>+			 master_tile_ctl);
>>
>>I know we have a bunch of them already, but shouldn't we be avoiding
>>printk-based stuff like this inside interrupt handlers?  Should we be
>>migrating all these error messages over to trace_printk or something
>>similar that's safer to use?
>
>Not sure - I kind of think some really unexpected and worrying 
>situations should be loud and on by default. Risk is then spam if not 
>ratelimited. Maybe we should instead ratelimit most errors/warnings 
>coming for irq handlers?
>
>In this particular case at least DRM_ERROR with no device info is the 
>odd one out in the entire file so I'd suggest changing at least that, 
>if the rest of my changes is of questionable benefit.

I'd rather remove the printk's from irq rather than adding more. At the very
least, they should be the _once variant or ratelimited. One of the few
cases to even deserve a unlikely(), even to document this shouldn't
really be happening.

Our irq handlers (particularly on dgfx and multi-gt) are already too
long running... I don't like making them any onger or slower.


Lucas De Marchi

  parent reply	other threads:[~2022-05-25 18:14 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-24  9:43 [PATCH] drm/i915/dg2: Catch and log more unexpected values in DG1_MSTR_TILE_INTR Tvrtko Ursulin
2022-05-24  9:43 ` [Intel-gfx] " Tvrtko Ursulin
2022-05-24 10:01 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
2022-05-24 10:22 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-05-24 11:46 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-05-24 17:51 ` [PATCH] " Matt Roper
2022-05-24 17:51   ` [Intel-gfx] " Matt Roper
2022-05-25 16:03   ` Tvrtko Ursulin
2022-05-25 16:03     ` [Intel-gfx] " Tvrtko Ursulin
2022-05-25 18:05     ` Matt Roper
2022-05-25 18:05       ` [Intel-gfx] " Matt Roper
2022-05-26 10:18       ` Tvrtko Ursulin
2022-05-26 10:18         ` [Intel-gfx] " Tvrtko Ursulin
2022-05-27 18:42         ` Matt Roper
2022-05-27 18:42           ` Matt Roper
2022-06-06 11:55           ` Tvrtko Ursulin
2022-06-06 11:55             ` [Intel-gfx] " Tvrtko Ursulin
2022-06-06 15:21             ` Matt Roper
2022-06-06 15:21               ` [Intel-gfx] " Matt Roper
2022-06-07  9:20               ` Tvrtko Ursulin
2022-06-07  9:20                 ` [Intel-gfx] " Tvrtko Ursulin
2022-05-25 18:14     ` Lucas De Marchi [this message]
2022-05-25 18:14       ` Lucas De Marchi
2022-05-26 10:29       ` Tvrtko Ursulin
2022-05-26 10:29         ` [Intel-gfx] " Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220525181417.wnilka4bigyi5fle@ldmartin-desk2 \
    --to=lucas.demarchi@intel.com \
    --cc=Intel-gfx@lists.freedesktop.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=tvrtko.ursulin@intel.com \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.