[PATCH v9 00/18] Support for sustained capturing of GuC firmware logs

From: akash.goel @ 2016-09-08 10:39 UTC
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

GuC firmware logs its debug messages into a Host-GuC shared memory buffer
and when the buffer is half full it sends a Flush interrupt to Host.
GuC firmware follows the half-full draining protocol, where it expects that
while it is writing to the 2nd half of the buffer, the 1st half will be
consumed by Host, which then sends a flush completed acknowledgment, so that
the firmware does not end up overwriting unread data and losing logs.
So far the flush interrupt wasn't enabled on Host side & User could only
capture a snapshot of the log buffer through the 'i915_guc_log_dump' debugfs
iface. But this couldn't meet a couple of key requirements, especially of
Validation: the first is to ensure capturing of all boot time logs even with
high verbosity level, and the second is to enable capturing of logs in a
sustained manner, like for the entire duration of a workload.
Now Driver will enable the flush interrupt and, on receiving it, will copy the
contents of the log buffer into its local buffer. The local buffer is big
enough to contain multiple snapshots of the log buffer, giving ample time to
User to pull boot time messages.
Added a debugfs interface '/sys/kernel/debug/dri/guc_log' for User to
collect the logs. It is implemented with the relay framework, where Driver
just has to use a relay API to store snapshots of the GuC log buffer in a
buffer managed by relay. The relay buffer can be operated in a mode
equivalent to 'dmesg -c', where the old data, not yet collected by User, will
be overwritten if the buffer becomes full, or it can be operated in
no-overwrite mode, where relay will stop accepting new data if all sub buffers
are full. The latter mode is used to avoid the possibility of getting garbled
data.
Besides the mmap method, through which User can directly access the relay
buffer contents, relay also supports the 'poll' method. Through the 'poll'
call on the log file, User gets to know whenever a new snapshot of the log
buffer is taken by Driver, so it can run in tandem with the Driver and thus
capture logs in a sustained/streaming manner, without any loss of data.
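
To illustrate the half-full draining protocol described at the top of this
letter, here is a toy user-space model. It is only a sketch under stated
assumptions: the structure, function names and the 8 byte buffer size are
hypothetical and are not taken from the GuC firmware or from the patches.
It shows that no data is lost while Host acknowledges every flush, and that
the firmware starts overwriting unread data once Host stops responding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE 8                  /* toy size, hypothetical */
#define HALF_SIZE    (LOG_BUF_SIZE / 2)

struct log_model {
	unsigned char shared[LOG_BUF_SIZE]; /* stands in for the shared buffer */
	size_t wr;                          /* firmware write offset */
	int half_pending[2];                /* flush raised, not yet acked */
	size_t lost;                        /* unread bytes that got overwritten */
};

/*
 * Firmware side: write one byte. Returns the index of the half that just
 * became full (this models the flush interrupt), or -1 otherwise.
 */
static int fw_write(struct log_model *m, unsigned char byte)
{
	size_t half = m->wr / HALF_SIZE;

	if (m->half_pending[half])
		m->lost++;              /* Host never acked this half */

	m->shared[m->wr] = byte;
	m->wr = (m->wr + 1) % LOG_BUF_SIZE;

	if (m->wr % HALF_SIZE == 0) {
		m->half_pending[half] = 1;
		return (int)half;
	}
	return -1;
}

/* Host side: copy the full half out, then send the flush completed ack. */
static void host_flush(struct log_model *m, int half, unsigned char *dst)
{
	assert(half == 0 || half == 1);
	memcpy(dst, &m->shared[(size_t)half * HALF_SIZE], HALF_SIZE);
	m->half_pending[half] = 0;
}

/* Drive the model for nbytes; returns the number of bytes lost. */
static size_t run_model(size_t nbytes, int host_responds,
			unsigned char *out, size_t out_cap)
{
	struct log_model m;
	size_t copied = 0;

	memset(&m, 0, sizeof(m));
	for (size_t i = 0; i < nbytes; i++) {
		int half = fw_write(&m, (unsigned char)i);

		if (half >= 0 && host_responds &&
		    copied + HALF_SIZE <= out_cap) {
			host_flush(&m, half, out + copied);
			copied += HALF_SIZE;
		}
	}
	return m.lost;
}
```

In this model the Host copy happens synchronously with the interrupt; in the
Driver the copy is deferred, which is why the sampling time matters.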

The logs can be captured from relay backed debugfs file through the utility
igt/tools/intel_guc_logger.
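
For reference, a poll based consumer of such a log file could look roughly
like the sketch below. This is not the code of intel_guc_logger; the function
name, the 4096 byte chunk size and the timeout handling are assumptions made
for illustration. The caller is expected to open the debugfs file and the
output file and pass in the two descriptors.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Copy data from in_fd to out_fd, waiting with poll() for new data.
 * Stops on end of stream, on timeout or on error.
 * Returns the number of bytes copied, or -1 on error.
 */
static long drain_log_fd(int in_fd, int out_fd, int timeout_ms)
{
	char chunk[4096];               /* hypothetical chunk size */
	long total = 0;

	assert(in_fd >= 0 && out_fd >= 0);

	for (;;) {
		struct pollfd pfd = { .fd = in_fd, .events = POLLIN };
		int ready = poll(&pfd, 1, timeout_ms);
		ssize_t n;

		if (ready < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ready == 0)
			return total;   /* nothing new within the timeout */

		n = read(in_fd, chunk, sizeof(chunk));
		if (n < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return -1;
		}
		if (n == 0)
			return total;   /* treated as end of stream here */

		/* write() may be partial, so loop until the chunk is out */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, chunk + off,
					  (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

A long running logger would probably keep polling instead of returning on a
zero length read, and would size its reads to the relay sub buffer size.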

v2: Rebased to the latest drm-intel-nightly.

v3: Aligned with the modification of late debugfs registration, at the end of
    i915 Driver load. Did cleanup as per Tvrtko's review comments, added 3
    new patches to optimize the log-buffer flush interrupt handling, gather
    and report the logging related stats.

v4: Added 2 new patches to further optimize the log-buffer flush interrupt
    handling. Did cleanup as per Chris's review comments, fixed a couple of
    issues related to clearing of Guc2Host message register. Switched to
    no-overwrite mode for the relay.

v5: Added a new patch to use the MOVNTDQA instruction based fast memcpy provided
    by a patch from Chris. Dropped the rt priority kthread patch, after
    evaluating all the optimizations with certain benchmarks like
    synmark_oglmultithread, synmark_oglbatch5 which generate flush interrupts
    almost every ms or even more often. Updated the older patches as per the review
    comments from Tvrtko and Chris W. Added a new patch to augment i915 error
    state with the GuC log buffer contents. Fixed the issue of User interrupt
    getting disabled for VEBOX ring, causing failure for certain IGTs.
    Also included 2 patches to support early logging for capturing boot
    time logs and use per CPU constructs on the relay side so as to address
    a WARNING issue with the call to relay_reserve(), without disabling
    preemption.

v6: Mainly did the rebasing, refactoring, cleanup as per the review comments
    and fixed error/warnings reported by checkpatch.

v7: Added a new patch to complete the pending log buffer flush work item in
    system suspend case. Cleaned up the irq handler & work item function
    by removing the check for GuC interrupts.

v8: Replaced the patch added in last version with a patch which marks the
    GuC log buffer flush interrupt handling WQ as freezable, as per the inputs
    from Imre. Refactored the log buffer sampling function and added a new
    helper function to improve the readability as per suggestions from Tvrtko.

v9: As per Chris's comment, removed the forceful flush of GuC log buffer from
    the error state capture path as that could have disturbed the atomicity
    required in error state path. Squashed the wc type vmalloc mapping patch
    with SSE4.1 movntdqa based memcpy patch. Added a BUG_ON for the relay
    buffer allocation size.

Akash Goel (12):
  drm/i915: New structure to contain GuC logging related fields
  drm/i915: Add low level set of routines for programming PM IER/IIR/IMR
    register set
  relay: Use per CPU constructs for the relay channel buffer pointers
  drm/i915: Add a relay backed debugfs interface for capturing GuC logs
  drm/i915: New lock to serialize the Host2GuC actions
  drm/i915: Add stats for GuC log buffer flush interrupts
  drm/i915: Optimization to reduce the sampling time of GuC log buffer
  drm/i915: Increase GuC log buffer size to reduce flush interrupts
  drm/i915: Augment i915 error state to include the dump of GuC log
    buffer
  drm/i915: Use SSE4.1 movntdqa based memcpy for sampling GuC log buffer
  drm/i915: Early creation of relay channel for capturing boot time logs
  drm/i915: Mark the GuC log buffer flush interrupts handling WQ as
    freezable

Sagar Arun Kamble (6):
  drm/i915: Decouple GuC log setup from verbosity parameter
  drm/i915: Add GuC ukernel logging related fields to fw interface file
  drm/i915: Support for GuC interrupts
  drm/i915: Handle log buffer flush interrupt event from GuC
  drm/i915: Support for forceful flush of GuC log buffer
  drm/i915: Debugfs support for GuC logging control

 drivers/gpu/drm/i915/Kconfig               |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c        |  73 +++-
 drivers/gpu/drm/i915/i915_drv.c            |   2 +
 drivers/gpu/drm/i915/i915_drv.h            |   5 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  20 +
 drivers/gpu/drm/i915/i915_guc_submission.c | 603 ++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_irq.c            | 159 ++++++--
 drivers/gpu/drm/i915/i915_reg.h            |  11 +
 drivers/gpu/drm/i915/intel_drv.h           |   6 +
 drivers/gpu/drm/i915/intel_guc.h           |  30 +-
 drivers/gpu/drm/i915/intel_guc_fwif.h      |  82 +++-
 drivers/gpu/drm/i915/intel_guc_loader.c    |  10 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   4 +-
 include/linux/relay.h                      |  17 +-
 kernel/relay.c                             |  74 ++--
 15 files changed, 1010 insertions(+), 87 deletions(-)

-- 
1.9.2