[PATCH 0/7] Add GuC Error Capture Support

From: Alan Previn @ 2022-01-18 10:03 UTC
  To: intel-gfx
  Cc: Matthew Brost, Tvrtko Ursulin, John Harrison, dri-devel, Alan Previn

This series:
  1. Enables GuC to perform error-state capture based on
     lists of MMIO registers that the driver registers
     with it. GuC dumps these registers and reports them
     back right before a GuC-triggered engine-reset event.
  2. Updates the ADS blob creation to register lists
     of global and engine registers with GuC.
  3. Defines tables of register lists that are global,
     engine-class or engine-instance in scope.
  4. Separates GuC log-buffer access locks for relay logging
     vs the new region for the error state capture data.
  5. Allocates an additional interim circular buffer store
     to copy snapshots of new GuC reported error-state-capture
     dumps in response to the G2H notification.
  6. Connects the i915_gpu_coredump reporting function
     to the GuC error capture module to print all GuC
     error-state-capture dumps that are reported.
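
As a rough illustration of item 3, register lists can be kept in
tables keyed by scope. The sketch below is a standalone example and
not the code in this series: every type name, function name, offset
and register name in it is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative scope of a register list; names are hypothetical. */
enum capture_scope {
	SCOPE_GLOBAL,
	SCOPE_ENGINE_CLASS,
	SCOPE_ENGINE_INSTANCE,
};

/* One register to capture: MMIO offset plus a printable name. */
struct reg_desc {
	uint32_t offset;
	const char *name;
};

/* A list of registers sharing one scope and, if relevant, one engine class. */
struct reg_list {
	enum capture_scope scope;
	int engine_class;	/* -1 when the list is global */
	const struct reg_desc *regs;
	size_t num_regs;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder offsets and names, not real hardware registers. */
static const struct reg_desc example_global_regs[] = {
	{ 0x1000, "EXAMPLE_GLOBAL_STATUS" },
};

static const struct reg_desc example_class0_regs[] = {
	{ 0x2000, "EXAMPLE_CLASS_CTRL" },
	{ 0x2004, "EXAMPLE_CLASS_FAULT" },
};

static const struct reg_desc example_class0_inst_regs[] = {
	{ 0x3000, "EXAMPLE_INST_HEAD" },
};

static const struct reg_list example_lists[] = {
	{ SCOPE_GLOBAL, -1, example_global_regs,
	  ARRAY_LEN(example_global_regs) },
	{ SCOPE_ENGINE_CLASS, 0, example_class0_regs,
	  ARRAY_LEN(example_class0_regs) },
	{ SCOPE_ENGINE_INSTANCE, 0, example_class0_inst_regs,
	  ARRAY_LEN(example_class0_inst_regs) },
};

/* Return the list for a scope and engine class, or NULL if none exists. */
static const struct reg_list *find_reg_list(enum capture_scope scope,
					    int engine_class)
{
	size_t i;

	for (i = 0; i < ARRAY_LEN(example_lists); i++) {
		const struct reg_list *l = &example_lists[i];

		if (l->scope == scope && l->engine_class == engine_class)
			return l;
	}
	return NULL;
}
```

A caller would look up one list per scope, for example
find_reg_list(SCOPE_ENGINE_CLASS, 0), and skip any scope for which
the lookup returns NULL.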

This is the 4th rev of this series; the first 3 revs
were labelled as RFC.

Reviewed-by tags received so far:
  - Patch #4 has received a Reviewed-by from Matthew Brost
    <matthew.brost@intel.com>

Changes from prior revs:
  v4:
      - Rebased on latest drm-tip that has been merged with the
        support of GuC firmware version 69.0.3 that is required
        for GuC error-state-capture to work.
      - Added register list for DG2 which is the same as XE_LP
        except for an additional steering register set.
      - Fixed a bug in the end of capture parsing loop in
        intel_guc_capture_out_print_next_group that was not
        properly comparing the engine-instance and engine-
        class being parsed against the one that triggered
        the i915_gpu_coredump.
  v3:
      - Fixed all review comments from rev2 except the following:
          - Michal Wajdeczko proposed adding a separate function
            to look up register string names (based on offset),
            but this was decided against because of offset conflicts
            and because the current table layout is easier to maintain.
          - Last set of checkpatch errors pertaining to "COMPLEX
            MACROS" should be fixed in the next rev.
      - Abstracted internal-to-guc-capture information into a new
        __guc_state_capture_priv structure that allows the exclusion
        of intel_guc.h and intel_guc_fwif.h from intel_guc_capture.h.
        Now, only the first 2 patches have a wider build time
        impact because of the changes to intel_guc_fwif.h but
        subsequent changes to guc-capture internal structures
        or firmware interfaces used solely by guc-capture module
        shouldn't impact the rest of the driver build.
      - Added missing Gen12LP registers and added slice+subslice
        indices when reporting extended steered registers.
      - Added additional checks to ensure that the GuC reported
        error capture information matches the i915_gpu_coredump
        that is being printed before we print out the corresponding
        VMA dumps such as the batch buffer.
  v2:
      - Ignore - failed CI retest.
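
For readers unfamiliar with the v4 parsing fix listed above, the
comparison it describes amounts to requiring that both the engine
class and the engine instance of a parsed capture group match the
engine that triggered the coredump. A minimal standalone sketch
follows; the struct and function names are hypothetical and are not
the ones used in intel_guc_capture.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical engine identifier; not the driver's real type. */
struct engine_id {
	uint8_t engine_class;
	uint8_t instance;
};

/*
 * A parsed capture group belongs to the triggering engine only when
 * both the class and the instance match. Checking one field alone
 * would accept captures from a different engine.
 */
static bool capture_matches_engine(const struct engine_id *parsed,
				   const struct engine_id *triggered)
{
	return parsed->engine_class == triggered->engine_class &&
	       parsed->instance == triggered->instance;
}
```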

Alan Previn (7):
  drm/i915/guc: Update GuC ADS size for error capture lists
  drm/i915/guc: Add XE_LP registers for GuC error state capture.
  drm/i915/guc: Add DG2 registers for GuC error state capture.
  drm/i915/guc: Add GuC's error state capture output structures.
  drm/i915/guc: Update GuC's log-buffer-state access for error capture.
  drm/i915/guc: Copy new GuC error capture logs upon G2H notification.
  drm/i915/guc: Print the GuC error capture output register list.

 drivers/gpu/drm/i915/Makefile                 |    1 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |    4 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |    7 +
 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h |   85 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   13 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   13 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |   36 +-
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 1310 +++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |   30 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   21 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    |  155 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |   20 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   14 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         |   65 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |   14 +
 15 files changed, 1669 insertions(+), 119 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h

-- 
2.25.1
