All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>,
	"Brost, Matthew" <matthew.brost@intel.com>
Cc: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [RFC 7/7] drm/i915/guc: Print the GuC error capture output register list.
Date: Fri, 7 Jan 2022 09:03:26 +0000	[thread overview]
Message-ID: <3bdab0a4-f182-28bb-0206-8d6d4b34c690@linux.intel.com> (raw)
In-Reply-To: <798a8740d1692541ee822535dbe880ce17b2bbbc.camel@intel.com>


On 06/01/2022 18:33, Teres Alexis, Alan Previn wrote:
> 
> On Thu, 2022-01-06 at 09:38 +0000, Tvrtko Ursulin wrote:
>> On 05/01/2022 17:30, Teres Alexis, Alan Previn wrote:
>>> On Tue, 2022-01-04 at 13:56 +0000, Tvrtko Ursulin wrote:
>>>>> The flow of events are as below:
>>>>>
>>>>> 1. guc sends notification that an error capture was done and ready to take.
>>>>> 	- at this point we copy the guc error captured dump into an interim store
>>>>> 	  (larger buffer that can hold multiple captures).
>>>>> 2. guc sends notification that a context was reset (after the prior)
>>>>> 	- this triggers a call to i915_gpu_coredump with the corresponding engine-mask
>>>>>              from the context that was reset
>>>>> 	- i915_gpu_coredump proceeds to gather entire gpu state including driver state,
>>>>>              global gpu state, engine state, context vmas and also engine registers. For the
>>>>>              engine registers now call into the guc_capture code which merely needs to verify
>>>>> 	  that GuC had already done a step 1 and we have data ready to be parsed.
>>>>
>>>> What about the time between the actual reset and receiving the context
>>>> reset notification? Latter will contain intel_context->guc_id - can that
>>>> be re-assigned or "retired" in between the two and so cause problems for
>>>> matching the correct (or any) vmas?
>>>>
>>> Not it cannot because its only after the context reset notification that i915 starts
>>> taking action against that cotnext - and even that happens after the i915_gpu_codedump (engine-mask-of-context) happens.
>>> That's what i've observed in the code flow.
>>
>> The fact it is "only after" is exactly why I asked.
>>
>> Reset notification is in a CT queue with other stuff, right? So can be
>> some unrelated time after the actual reset. Could have context be
>> retired in the meantime and guc_id released is the question.
>>
>> Because i915 has no idea there was a reset until this delayed message
>> comes over, but it could see user interrupt signaling end of batch,
>> after the reset has happened, unbeknown to i915, right?
>>
>> Perhaps the answer is guc_id cannot be released via the request retire
>> flows. Or GuC signaling release of guc_id is a thing, which is then
>> ordered via the same CT buffer.
>>
>> I don't know, just asking.
>>
> As long as the context is pinned, the guc-id wont be re-assigned. After a bit of offline brain-dump
> from John Harrison, there are many factors that can keep the context pinned (recounts) including
> new or oustanding requests. So a guc-id can't get re-assigned between a capture-notify and a
> context-reset even if that outstanding request is the only refcount left since it would still
> be considered outstanding by the driver. I also think we may also be talking past each other
> in the sense that the guc-id is something the driver assigns to a context being pinned and only
> the driver can un-assign it (both assigning and unasigning is via H2G interactions).
> I get the sense you are assuming the GuC can un-assign the guc-id's on its own - which isn't
> the case. Apologies if i mis-assumed.

I did not think GuC can re-assign ce->guc_id. I asked about request/context complete/retire happening before reset/capture notification is received.

That would be the time window between the last intel_context_put, so last i915_request_put from retire, at which point AFAICT GuC code releases the guc_id. Execution timeline like:

|------ rq1 ------|------ rq2 ------|
    ^ engine reset		    ^ rq2, rq1 retire, guc id released

                                                           		^ GuC reset notify received - guc_id not known any more?
  
You are saying something is guaranteed to be holding onto the guc_id at the point of receiving the notification? "There are many factors that can keep the context pinned" - what is it in this case? Or the case cannot happen?

Regards,

Tvrtko

  reply	other threads:[~2022-01-07  9:03 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-22 23:03 [RFC 0/7] Add GuC Error Capture Support Alan Previn
2021-11-22 23:03 ` [Intel-gfx] " Alan Previn
2021-11-22 23:03 ` [Intel-gfx] [RFC 1/7] drm/i915/guc: Add basic support for error capture lists Alan Previn
2021-11-23 21:12   ` Michal Wajdeczko
2021-12-08 18:23     ` Teres Alexis, Alan Previn
2021-11-22 23:03 ` [Intel-gfx] [RFC 2/7] drm/i915/guc: Update GuC ADS size " Alan Previn
2021-11-23 21:46   ` Michal Wajdeczko
2021-11-24  9:52     ` Jani Nikula
2021-11-24 17:34     ` Teres Alexis, Alan Previn
2021-12-21 23:15       ` Teres Alexis, Alan Previn
2021-12-22  1:49       ` Teres Alexis, Alan Previn
2021-12-22 20:13     ` Teres Alexis, Alan Previn
2021-11-24 10:06   ` Jani Nikula
2021-11-24 17:37     ` Teres Alexis, Alan Previn
2021-11-22 23:03 ` [Intel-gfx] [RFC 3/7] drm/i915/guc: Populate XE_LP register lists for GuC error state capture Alan Previn
2021-11-23  1:59   ` kernel test robot
2021-11-23 21:55   ` Michal Wajdeczko
2021-11-24 17:16     ` Teres Alexis, Alan Previn
2021-11-22 23:03 ` [Intel-gfx] [RFC 4/7] drm/i915/guc: Add GuC's error state capture output structures Alan Previn
2021-11-24 10:08   ` Jani Nikula
2021-11-24 17:37     ` Teres Alexis, Alan Previn
2021-12-07 21:01   ` Matthew Brost
2021-12-07 23:35     ` Teres Alexis, Alan Previn
2021-11-22 23:04 ` [Intel-gfx] [RFC 5/7] drm/i915/guc: Update GuC's log-buffer-state access for error capture Alan Previn
2021-12-07 22:31   ` Matthew Brost
2021-12-07 23:33     ` Teres Alexis, Alan Previn
2021-12-07 23:30       ` Matthew Brost
2021-11-22 23:04 ` [Intel-gfx] [RFC 6/7] drm/i915/guc: Copy new GuC error capture logs upon G2H notification Alan Previn
2021-12-07 22:58   ` Matthew Brost
2021-12-08  5:14     ` Teres Alexis, Alan Previn
2021-12-08 18:22       ` Teres Alexis, Alan Previn
2021-11-22 23:04 ` [Intel-gfx] [RFC 7/7] drm/i915/guc: Print the GuC error capture output register list Alan Previn
2021-11-23  0:25   ` Teres Alexis, Alan Previn
2021-12-08  0:22   ` Matthew Brost
2021-12-08  6:31     ` Teres Alexis, Alan Previn
2021-12-23 18:54       ` Teres Alexis, Alan Previn
2021-12-24 12:09         ` Tvrtko Ursulin
2021-12-24 13:34           ` Teres Alexis, Alan Previn
2022-01-04 13:56             ` Tvrtko Ursulin
2022-01-05 17:30               ` Teres Alexis, Alan Previn
2022-01-06  9:38                 ` Tvrtko Ursulin
2022-01-06 18:33                   ` Teres Alexis, Alan Previn
2022-01-07  9:03                     ` Tvrtko Ursulin [this message]
2022-01-07 17:03                       ` Teres Alexis, Alan Previn
2022-01-10  8:07                         ` Tvrtko Ursulin
2022-01-10 18:19                           ` Teres Alexis, Alan Previn
2022-01-11 10:08                             ` Tvrtko Ursulin
2022-01-14  7:16                               ` Teres Alexis, Alan Previn
2021-11-22 23:44 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add GuC Error Capture Support Patchwork
2021-11-22 23:45 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-11-23  0:16 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2021-11-23  0:40 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add GuC Error Capture Support (rev2) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3bdab0a4-f182-28bb-0206-8d6d4b34c690@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=alan.previn.teres.alexis@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.