All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
To: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Copy new GuC error capture logs upon G2H notification.
Date: Wed, 19 Jan 2022 01:36:18 +0000	[thread overview]
Message-ID: <304ed4cc83bf360a2d8c7b89abe1fdb63a88bc10.camel@intel.com> (raw)
In-Reply-To: <20220118100358.1329655-7-alan.previn.teres.alexis@intel.com>

A fresh round of offline design discussions with internal team has decided 
that:

 - we dont want to use an interim circular buffer to copy all of the GuC
   generated register dumps of one or more 'engine-capture-groups'.
 - instead, we shall dynamically allocate multiple nodes, each node being
   a single "engine-capture-dump". A link list of one or many engine-capture-
   dumps would result from a single engine-capture-group.
 - this dynamic allocation should happen during the G2H error-capture
   notification event which happens before the corresponding G2H context-
   reset that triggers the i915_gpu_coredump (where we want to avoid
   memory allocation moving forward).
 - we also realize that during the link-list allocation we would need
   a first-parse of the guc-log-error-state-capture head-to-tail entries
   in order to duplicate global and engine-class register dumps for each
   each engine instance register dump if we find dependent-engine resets
   in a engine-capture-group.
 - later when i915_gpu_coredump calls into capture_engine, we finally
   attach the corresponding node from the link list above (detaching it
   from that link list) into i915_gpu_coredump's intel_engine_coredump
   structure when have matching LRCA/guc-id/engine-instace.
 - we would also have to add a flag through i915_gpu_coredump top level
   trigger through to capture_engine to indicate if the capture was triggered
   via a guc context reset or a forced user reset or gt-reset. In the latter
   case (user/gt reset), we should capture the register values directly
   from the HW, i.e. the pre-guc behavior without matching against GuC.

...alan


On Tue, 2022-01-18 at 02:03 -0800, Alan Previn wrote:

> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index b637628905ec..fc80c5f31915 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c+static void __guc_capture_store_snapshot_work(struct intel_guc *guc)
..
..
> +{
> +	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> +	unsigned int buffer_size, read_offset, write_offset, bytes_to_copy, full_count;
> +	struct guc_log_buffer_state *log_buf_state;
> +	struct guc_log_buffer_state log_buf_state_local;
> +	void *src_data, *dst_data = NULL;
> +	bool new_overflow;
> +
> +	/* Lock to get the pointer to GuC capture-log-buffer-state */
> +	mutex_lock(&guc->log_state[GUC_CAPTURE_LOG_BUFFER].lock);
> +	log_buf_state = guc->log.buf_addr +
> +			(sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER);
> +	src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER);
> +
> +	/*
> +	 * Make a copy of the state structure, inside GuC log buffer
> +	 * (which is uncached mapped), on the stack to avoid reading
> +	 * from it multiple times.
> +	 */
> +	memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state));
> +	buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER);
> +	read_offset = log_buf_state_local.read_ptr;
> +	write_offset = log_buf_state_local.sampled_write_ptr;
> +	full_count = log_buf_state_local.buffer_full_cnt;
> +
> +	/* Bookkeeping stuff */
> +	guc->log_state[GUC_CAPTURE_LOG_BUFFER].flush += log_buf_state_local.flush_to_file;
> +	new_overflow = intel_guc_check_log_buf_overflow(guc,
> +							&guc->log_state[GUC_CAPTURE_LOG_BUFFER],
> +							full_count);
> +
> +	/* Update the state of shared log buffer */
> +	log_buf_state->read_ptr = write_offset;
> +	log_buf_state->flush_to_file = 0;
> +
> +	mutex_unlock(&guc->log_state[GUC_CAPTURE_LOG_BUFFER].lock);
> +
> +	dst_data = guc->capture.priv->out_store.addr;
> +	if (dst_data) {
> +		mutex_lock(&guc->capture.priv->out_store.lock);
> +
> +		/* Now copy the actual logs. */
> +		if (unlikely(new_overflow)) {
> +			/* copy the whole buffer in case of overflow */
> +			read_offset = 0;
> +			write_offset = buffer_size;
> +		} else if (unlikely((read_offset > buffer_size) ||
> +			   (write_offset > buffer_size))) {
> +			drm_err(&i915->drm, "invalid GuC log capture buffer state!\n");
> +			/* copy whole buffer as offsets are unreliable */
> +			read_offset = 0;
> +			write_offset = buffer_size;
> +		}
> +
> +		/* first copy from the tail end of the GuC log capture buffer */
> +		if (read_offset > write_offset) {
> +			guc_capture_store_insert(guc, &guc->capture.priv->out_store, src_data,
> +						 write_offset);
> +			bytes_to_copy = buffer_size - read_offset;
> +		} else {
> +			bytes_to_copy = write_offset - read_offset;
> +		}
> +		guc_capture_store_insert(guc, &guc->capture.priv->out_store, src_data + read_offset,
> +					 bytes_to_copy);
> +
> +		mutex_unlock(&guc->capture.priv->out_store.lock);
> +	}
> +}
> +
> +void intel_guc_capture_store_snapshot(struct intel_guc *guc)
> +{
> +	if (guc->capture.priv)
> +		__guc_capture_store_snapshot_work(guc);
> +}
> +
> +static void guc_capture_store_destroy(struct intel_guc *guc)
> +{
> +	mutex_destroy(&guc->capture.priv->out_store.lock);
> +	guc->capture.priv->out_store.size = 0;
> +	kfree(guc->capture.priv->out_store.addr);
> +	guc->capture.priv->out_store.addr = NULL;
> +}
> +
> +static int guc_capture_store_create(struct intel_guc *guc)
> +{
> +	/*
> +	 * Make this interim buffer larger than GuC capture output buffer so that we can absorb
> +	 * a little delay when processing the raw capture dumps into text friendly logs
> +	 * for the i915_gpu_coredump output
> +	 */
> +	size_t max_dump_size;
> +
> +	GEM_BUG_ON(guc->capture.priv->out_store.addr);
> +
> +	max_dump_size = PAGE_ALIGN(intel_guc_capture_output_min_size_est(guc));
> +	max_dump_size = roundup_pow_of_two(max_dump_size);
> +
> +	guc->capture.priv->out_store.addr = kzalloc(max_dump_size, GFP_KERNEL);
> +	if (!guc->capture.priv->out_store.addr)
> +		return -ENOMEM;
> +
> +	guc->capture.priv->out_store.size = max_dump_size;
> +	mutex_init(&guc->capture.priv->out_store.lock);
> +
> +	return 0;
> +}
> +

  reply	other threads:[~2022-01-19  1:36 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-18 10:03 [PATCH 0/7] Add GuC Error Capture Support Alan Previn
2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 1/7] drm/i915/guc: Update GuC ADS size for error capture lists Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 2/7] drm/i915/guc: Add XE_LP registers for GuC error state capture Alan Previn
2022-01-24 19:33   ` Teres Alexis, Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add DG2 " Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 4/7] drm/i915/guc: Add GuC's error state capture output structures Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 5/7] drm/i915/guc: Update GuC's log-buffer-state access for error capture Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 6/7] drm/i915/guc: Copy new GuC error capture logs upon G2H notification Alan Previn
2022-01-19  1:36   ` Teres Alexis, Alan Previn [this message]
2022-01-18 10:03 ` [Intel-gfx] [PATCH 7/7] drm/i915/guc: Print the GuC error capture output register list Alan Previn
2022-01-18 10:16 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add GuC Error Capture Support (rev4) Patchwork
2022-01-18 10:17 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-18 10:49 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=304ed4cc83bf360a2d8c7b89abe1fdb63a88bc10.camel@intel.com \
    --to=alan.previn.teres.alexis@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.