[RFC 00/22] Gen7 batch buffer command parser

* [RFC 00/22] Gen7 batch buffer command parser
@ 2013-11-26 16:51 bradley.d.volkin
  2013-11-26 16:51 ` [RFC 01/22] drm/i915: Add data structures for " bradley.d.volkin
                   ` (26 more replies)
  0 siblings, 27 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Certain OpenGL features (e.g. transform feedback, performance monitoring)
require userspace code to submit batches containing commands such as
MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
generations of the hardware will noop these commands in "unsecure" batches
(which includes all userspace batches submitted via i915) even though the
commands may be safe and represent the intended programming model of the device.

This series introduces a software command parser similar in operation to the
command parsing done in hardware for unsecure batches. However, the software
parser allows some operations that would be noop'd by hardware, if the parser
determines the operation is safe, and submits the batch as "secure" to prevent
hardware parsing. Currently the series implements this on IVB and HSW.

The series is divided into several phases:

patches 01-09: These implement infrastructure and the command parsing algorithm,
               all behind a module parameter. I expect some discussion and
	       rework, but hopefully there's nothing too controversial.
patches 10-17: These define the checks performed by the parser.
               I expect much discussion :)
patches 18-20: In a final pass over the command checks, I found some issues with
               the definitions. They looked painful to rebase in, so I've added
	       them here.
patches 21-22: These enable the parser by default. It runs on all batches except
               those that set the I915_EXEC_SECURE flag in the execbuffer2 call.

There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
basic and do not test all of the commands used by the parser on the assumption
that I'm likely to make the same mistakes in both the parser and the test.

I've run the i-g-t gem_* tests, the piglit quick tests (w/Mesa git from a few
days ago), and generally used an Ubuntu 13.10 IVB system with the parser
running. Aside from a failure described below, I don't think there are any
regressions. That is, piglit claims some regressions, but from manually running
the tests I think these are false positives. However, I could use help in
getting broader testing, particularly around performance. In general, I see less
than 3% performance impact on HSW, with more like 10% impact for pathological
batch sizes. But we'll certainly want to run relevant benchmarks beyond what
I've done.

At this point there are a couple of known issues and potential improvements.

1) VLV. The parser is currently disabled for VLV. One type of check performed by
   the parser is that commands which access memory do so via PPGTT. VLV does not
   have PPGTT enabled at this time. I chose to implement the PPGTT checks via
   generic bit checking infrastructure in the parser, so they are not easily
   disabled for VLV. For now, I'm disabling parsing altogether in the hope that
   PPGTT can be enabled for VLV in the near future.
2) Coherency. I've found two types of coherency issues when reading the batch
   buffer from the CPU during execbuffer2. Looking for help with both issues.
    i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the
       parser isn't properly waiting for the operation to complete before
       parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL])
       but that actually caused more failures.
   ii. Second, on VLV, I've seen cache coherency issues when userspace writes
       the batch via pwrite fast path before calling execbuffer2. The parser
       reads stale data. This works fine on IVB and HSW, so I believe it's an
       LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or
       synchronization is for this scenario.
3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands
   in userspace batches without parsing them. The media driver uses 2nd-level
   batches, so a solution is required. I have some ideas but don't want to delay
   the review process for what I have so far. It may be that the 2nd-level
   parsing is only needed for VCS and the current code (plus rejecting BBS)
   would be sufficient for RCS.
4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
   to avoid GPU modifications to buffers via EUs or commands in the batch, we
   should copy the userspace batch buffer to memory that userspace does not
   have access to, map it into GGTT, and execute that batch buffer. I have a
   sense of how to do this for 1st-level batches, but it would need changes to
   tie in with the 2nd-level batch parsing I think, so I've again held off.

Brad Volkin (22):
  drm/i915: Add data structures for command parser
  drm/i915: Initial command parser table definitions
  drm/i915: Hook command parser tables up to rings
  drm/i915: Add per-ring command length decode functions
  drm/i915: Implement command parsing
  drm/i915: Add a HAS_CMD_PARSER getparam
  drm/i915: Add support for rejecting commands during parsing
  drm/i915: Add support for checking register accesses
  drm/i915: Add support for rejecting commands via bitmasks
  drm/i915: Reject unsafe commands
  drm/i915: Add register whitelists for mesa
  drm/i915: Enable register whitelist checks
  drm/i915: Enable bit checking for some commands
  drm/i915: Enable PPGTT command parser checks
  drm/i915: Reject commands that would store to global HWS page
  drm/i915: Reject additional commands
  drm/i915: Add parser data for perf monitoring GL extensions
  drm/i915: Reject MI_ARB_ON_OFF on VECS
  drm/i915: Fix length handling for MFX_WAIT
  drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
  drm/i915: Clean up command parser enable decision
  drm/i915: Enable command parsing by default

 drivers/gpu/drm/i915/Makefile              |   3 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 712 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_dma.c            |   3 +
 drivers/gpu/drm/i915/i915_drv.c            |   5 +
 drivers/gpu/drm/i915/i915_drv.h            |  96 ++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +
 drivers/gpu/drm/i915/i915_reg.h            |  66 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  25 +
 include/uapi/drm/i915_drm.h                |   1 +
 10 files changed, 927 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c

-- 
1.8.4.4

^ permalink raw reply	[flat|nested] 138+ messages in thread