* [RFC 00/22] Gen7 batch buffer command parser
@ 2013-11-26 16:51 bradley.d.volkin
  2013-11-26 16:51 ` [RFC 01/22] drm/i915: Add data structures for " bradley.d.volkin
                   ` (26 more replies)
  0 siblings, 27 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Certain OpenGL features (e.g. transform feedback, performance monitoring)
require userspace code to submit batches containing commands such as
MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
generations of the hardware will noop these commands in "unsecure" batches
(which includes all userspace batches submitted via i915) even though the
commands may be safe and represent the intended programming model of the device.

This series introduces a software command parser similar in operation to the
command parsing done in hardware for unsecure batches. However, the software
parser allows some operations that would be noop'd by hardware, if the parser
determines the operation is safe, and submits the batch as "secure" to prevent
hardware parsing. Currently the series implements this on IVB and HSW.

The series is divided into several phases:

patches 01-09: These implement infrastructure and the command parsing algorithm,
               all behind a module parameter. I expect some discussion and
               rework, but hopefully there's nothing too controversial.
patches 10-17: These define the checks performed by the parser.
               I expect much discussion :)
patches 18-20: In a final pass over the command checks, I found some issues with
               the definitions. They looked painful to rebase in, so I've added
               them here.
patches 21-22: These enable the parser by default. It runs on all batches except
               those that set the I915_EXEC_SECURE flag in the execbuffer2 call.

There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
basic and do not test all of the commands used by the parser on the assumption
that I'm likely to make the same mistakes in both the parser and the test.

I've run the i-g-t gem_* tests, the piglit quick tests (w/Mesa git from a few
days ago), and generally used an Ubuntu 13.10 IVB system with the parser
running. Aside from a failure described below, I don't think there are any
regressions. That is, piglit claims some regressions, but from manually running
the tests I think these are false positives. However, I could use help in
getting broader testing, particularly around performance. In general, I see less
than 3% performance impact on HSW, with more like 10% impact for pathological
batch sizes. But we'll certainly want to run relevant benchmarks beyond what
I've done.

At this point there are a couple of known issues and potential improvements.

1) VLV. The parser is currently disabled for VLV. One type of check performed by
   the parser is that commands which access memory do so via PPGTT. VLV does not
   have PPGTT enabled at this time. I chose to implement the PPGTT checks via
   generic bit checking infrastructure in the parser, so they are not easily
   disabled for VLV. For now, I'm disabling parsing altogether in the hope that
   PPGTT can be enabled for VLV in the near future.
2) Coherency. I've found two types of coherency issues when reading the batch
   buffer from the CPU during execbuffer2. Looking for help with both issues.
    i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the
       parser isn't properly waiting for the operation to complete before
       parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL])
       but that actually caused more failures.
   ii. Second, on VLV, I've seen cache coherency issues when userspace writes
       the batch via pwrite fast path before calling execbuffer2. The parser
       reads stale data. This works fine on IVB and HSW, so I believe it's an
       LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or
       synchronization is for this scenario.
3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands
   in userspace batches without parsing them. The media driver uses 2nd-level
   batches, so a solution is required. I have some ideas but don't want to delay
   the review process for what I have so far. It may be that the 2nd-level
   parsing is only needed for VCS and the current code (plus rejecting BBS)
   would be sufficient for RCS.
4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
   to avoid GPU modifications to buffers via EUs or commands in the batch, we
   should copy the userspace batch buffer to memory that userspace does not
   have access to, map it into GGTT, and execute that batch buffer. I have a
   sense of how to do this for 1st-level batches, but it would need changes to
   tie in with the 2nd-level batch parsing I think, so I've again held off.
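
To make the copy-then-parse idea in item 4 concrete, here is a minimal userspace
sketch. It is not code from this series: malloc/memcpy stand in for the kernel
allocation and GGTT mapping, and shadow_and_validate() and
ends_with_terminator() are hypothetical names. The point it illustrates is that
validation runs on the private copy, so later writes to the original buffer
cannot change what was checked.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical validator signature: returns 0 if the batch is acceptable. */
typedef int (*validate_fn)(const uint32_t *batch, size_t count);

/*
 * Toy validator (assumption): accept a batch whose last dword is
 * MI_BATCH_BUFFER_END, taken here to be MI opcode 0x0A << 23.
 */
static int ends_with_terminator(const uint32_t *batch, size_t count)
{
	return (count > 0 && batch[count - 1] == 0x05000000u) ? 0 : -1;
}

/*
 * Copy the user batch into private memory, then validate the copy.
 * Returns the private copy (caller frees) or NULL on any failure.
 */
static uint32_t *shadow_and_validate(const uint32_t *user_batch, size_t count,
				     validate_fn validate)
{
	uint32_t *shadow;

	if (count == 0 || count > SIZE_MAX / sizeof(*shadow))
		return NULL;

	shadow = malloc(count * sizeof(*shadow));
	if (!shadow)
		return NULL;

	/* Snapshot first, so the checked bytes are the executed bytes. */
	memcpy(shadow, user_batch, count * sizeof(*shadow));

	if (validate(shadow, count) != 0) {
		free(shadow);
		return NULL;
	}

	return shadow;
}
```

In the real driver the copy would have to live in memory that userspace cannot
map, which this sketch does not attempt to model.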

Brad Volkin (22):
  drm/i915: Add data structures for command parser
  drm/i915: Initial command parser table definitions
  drm/i915: Hook command parser tables up to rings
  drm/i915: Add per-ring command length decode functions
  drm/i915: Implement command parsing
  drm/i915: Add a HAS_CMD_PARSER getparam
  drm/i915: Add support for rejecting commands during parsing
  drm/i915: Add support for checking register accesses
  drm/i915: Add support for rejecting commands via bitmasks
  drm/i915: Reject unsafe commands
  drm/i915: Add register whitelists for mesa
  drm/i915: Enable register whitelist checks
  drm/i915: Enable bit checking for some commands
  drm/i915: Enable PPGTT command parser checks
  drm/i915: Reject commands that would store to global HWS page
  drm/i915: Reject additional commands
  drm/i915: Add parser data for perf monitoring GL extensions
  drm/i915: Reject MI_ARB_ON_OFF on VECS
  drm/i915: Fix length handling for MFX_WAIT
  drm/i915: Fix MI_STORE_DWORD_IMM parser definition
  drm/i915: Clean up command parser enable decision
  drm/i915: Enable command parsing by default

 drivers/gpu/drm/i915/Makefile              |   3 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 712 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_dma.c            |   3 +
 drivers/gpu/drm/i915/i915_drv.c            |   5 +
 drivers/gpu/drm/i915/i915_drv.h            |  96 ++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +
 drivers/gpu/drm/i915/i915_reg.h            |  66 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  25 +
 include/uapi/drm/i915_drm.h                |   1 +
 10 files changed, 927 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c

-- 
1.8.4.4

* [RFC 01/22] drm/i915: Add data structures for command parser
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 02/22] drm/i915: Initial command parser table definitions bradley.d.volkin
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

The command parser needs to know a few things about certain commands
in order to process them correctly. Add structures for storing that
information.
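
As a rough illustration of how the value/mask pair in a descriptor is meant to
be used, here is a small userspace sketch. struct cmd_desc is a simplified
mirror of the structure added below, and desc_matches() is a hypothetical
helper name, not something this patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace mirror of the descriptor; names are illustrative. */
struct cmd_desc {
	uint32_t flags;
	struct {
		uint32_t value;	/* identification bits of the command */
		uint32_t mask;	/* which header bits carry them */
	} cmd;
	union {
		uint32_t fixed;	/* length in dwords for fixed-length commands */
		uint32_t mask;	/* mask of the header's length field otherwise */
	} length;
};

/* True when the header's identification bits equal the descriptor's. */
static bool desc_matches(const struct cmd_desc *desc, uint32_t header)
{
	return (header & desc->cmd.mask) == (desc->cmd.value & desc->cmd.mask);
}
```

Bits outside cmd.mask (for example the length field) are ignored by the
comparison, which is why one descriptor can cover every length of a command.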

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 51 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 14f250a..ff1e201 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1731,6 +1731,57 @@ struct drm_i915_file_private {
 	atomic_t rps_wait_boost;
 };
 
+/**
+ * A command that requires special handling by the command parser.
+ */
+struct drm_i915_cmd_descriptor {
+	/**
+	 * Flags describing how the command parser processes the command.
+	 *
+	 * CMD_DESC_FIXED: The command has a fixed length if this is set,
+	 *                 a length mask if not set
+	 * CMD_DESC_SKIP: The command is allowed but does not follow the
+	 *                standard length encoding for the opcode range in
+	 *                which it falls
+	 */
+	u32 flags;
+#define CMD_DESC_FIXED (1<<0)
+#define CMD_DESC_SKIP  (1<<1)
+
+	/**
+	 * The command's unique identification bits and the bitmask to get them.
+	 * This isn't strictly the opcode field as defined in the spec and may
+	 * also include type, subtype, and/or subop fields.
+	 */
+	struct {
+		u32 value;
+		u32 mask;
+	} cmd;
+
+	/**
+	 * The command's length. The command is either fixed length (i.e. does
+	 * not include a length field) or has a length field mask. The flag
+	 * CMD_DESC_FIXED indicates a fixed length. Otherwise, the command has
+	 * a length mask. All command entries in a command table must include
+	 * length information.
+	 */
+	union {
+		u32 fixed;
+		u32 mask;
+	} length;
+};
+
+/**
+ * A table of commands requiring special handling by the command parser.
+ *
+ * Each ring has an array of tables. Each table consists of an array of command
+ * descriptors, which must be sorted with command opcodes in ascending order.
+ */
+struct drm_i915_cmd_table {
+	const struct drm_i915_cmd_descriptor *table;
+	int count;
+};
+
 #define INTEL_INFO(dev)	(to_i915(dev)->info)
 
 #define IS_I830(dev)		((dev)->pdev->device == 0x3577)
-- 
1.8.4.4

* [RFC 02/22] drm/i915: Initial command parser table definitions
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
  2013-11-26 16:51 ` [RFC 01/22] drm/i915: Add data structures for " bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 03/22] drm/i915: Hook command parser tables up to rings bradley.d.volkin
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Add command tables defining irregular length commands for each ring.
This requires a few new command opcode definitions.

OTC-Tracker: AXIA-4631
Change-Id: I064bceb457e15f46928058352afe76d918c58ef5
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/Makefile          |   3 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c | 138 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h        |  34 ++++++++
 3 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 41838ea..7b7450b 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -47,7 +47,8 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o \
 	  dvo_tfp410.o \
 	  dvo_sil164.o \
 	  dvo_ns2501.o \
-	  i915_gem_dmabuf.o
+	  i915_gem_dmabuf.o \
+	  i915_cmd_parser.o
 
 i915-$(CONFIG_COMPAT)   += i915_ioc32.o
 
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
new file mode 100644
index 0000000..deb77c8c
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Brad Volkin <bradley.d.volkin@intel.com>
+ *
+ */
+
+#include "i915_drv.h"
+
+#define STD_MI_OPCODE_MASK  0xFF800000
+#define STD_3D_OPCODE_MASK  0xFFFF0000
+#define STD_2D_OPCODE_MASK  0xFFC00000
+#define STD_MFX_OPCODE_MASK 0xFFFF0000
+
+#define CMD(op, opm, f, lm, fl, ...)		\
+	{					\
+		.flags = (fl) | (f),		\
+		.cmd = { (op), (opm) },		\
+		.length = { (lm) },		\
+		__VA_ARGS__			\
+	}
+
+/* Convenience macros to compress the tables */
+#define SMI STD_MI_OPCODE_MASK
+#define S3D STD_3D_OPCODE_MASK
+#define S2D STD_2D_OPCODE_MASK
+#define SMFX STD_MFX_OPCODE_MASK
+#define F CMD_DESC_FIXED
+#define S CMD_DESC_SKIP
+
+/*            Command                          Mask   Fixed Len   Action
+	      ---------------------------------------------------------- */
+static const struct drm_i915_cmd_descriptor common_cmds[] = {
+	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
+	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      S  ),
+	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      S  ),
+	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
+	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
+	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
+	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
+	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
+};
+
+static const struct drm_i915_cmd_descriptor render_cmds[] = {
+	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
+	CMD(  MI_TOPOLOGY_FILTER,               SMI,    F,  1,      S  ),
+	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
+	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
+	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
+	CMD(  GPGPU_OBJECT,                     S3D,   !F,  0xFF,   S  ),
+	CMD(  GPGPU_WALKER,                     S3D,   !F,  0xFF,   S  ),
+	CMD(  GFX_OP_3DSTATE_SO_DECL_LIST,      S3D,   !F,  0x1FF,  S  ),
+};
+
+static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
+	CMD(  MI_SET_PREDICATE,                 SMI,    F,  1,      S  ),
+	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
+	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
+	CMD(  MI_RS_CONTEXT,                    SMI,    F,  1,      S  ),
+	CMD(  MI_MATH,                          SMI,   !F,  0x3F,   S  ),
+	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_URB_MEM,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_URB_MEM,                 SMI,   !F,  0xFF,   S  ),
+	CMD(  GFX_OP_3DSTATE_DX9_CONSTANTF_VS,  S3D,   !F,  0x7FF,  S  ),
+	CMD(  GFX_OP_3DSTATE_DX9_CONSTANTF_PS,  S3D,   !F,  0x7FF,  S  ),
+};
+
+static const struct drm_i915_cmd_descriptor video_cmds[] = {
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MFX_WAIT,                         SMFX,  !F,  0x3F,   S  ),
+};
+
+static const struct drm_i915_cmd_descriptor blt_cmds[] = {
+	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
+	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
+};
+
+#undef CMD
+#undef SMI
+#undef S3D
+#undef S2D
+#undef SMFX
+#undef F
+#undef S
+
+static const struct drm_i915_cmd_table gen7_render_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ render_cmds, ARRAY_SIZE(render_cmds) },
+};
+
+static const struct drm_i915_cmd_table hsw_render_ring_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ render_cmds, ARRAY_SIZE(render_cmds) },
+	{ hsw_render_cmds, ARRAY_SIZE(hsw_render_cmds) },
+};
+
+static const struct drm_i915_cmd_table gen7_video_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ video_cmds, ARRAY_SIZE(video_cmds) },
+};
+
+static const struct drm_i915_cmd_table hsw_vebox_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+};
+
+static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
+};
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index f2104f5..5d52569 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -335,6 +335,40 @@
 #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH		(1<<0)
 #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
 
+/*
+ * Commands used only by the command parser
+ */
+#define MI_SET_PREDICATE       MI_INSTR(0x01, 0)
+#define MI_ARB_CHECK           MI_INSTR(0x05, 0)
+#define MI_RS_CONTROL          MI_INSTR(0x06, 0)
+#define MI_URB_ATOMIC_ALLOC    MI_INSTR(0x09, 0)
+#define MI_PREDICATE           MI_INSTR(0x0C, 0)
+#define MI_TOPOLOGY_FILTER     MI_INSTR(0x0D, 0)
+#define MI_RS_CONTEXT          MI_INSTR(0x0F, 0)
+#define MI_MATH                MI_INSTR(0x1A, 0)
+#define MI_UPDATE_GTT          MI_INSTR(0x23, 0)
+#define MI_CLFLUSH             MI_INSTR(0x27, 0)
+#define MI_LOAD_REGISTER_MEM   MI_INSTR(0x29, 0)
+#define MI_LOAD_REGISTER_REG   MI_INSTR(0x2A, 0)
+#define MI_LOAD_URB_MEM        MI_INSTR(0x2C, 0)
+#define MI_STORE_URB_MEM       MI_INSTR(0x2D, 0)
+
+#define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
+#define GFX_OP_3DSTATE_VF_STATISTICS   ((0x3<<29)|(0x1<<27)|(0x0<<24)|(0xB<<16))
+#define MEDIA_VFE_STATE                ((0x3<<29)|(0x2<<27)|(0x0<<24)|(0x0<<16))
+#define GPGPU_OBJECT                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x4<<16))
+#define GPGPU_WALKER                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x5<<16))
+#define GFX_OP_3DSTATE_DX9_CONSTANTF_VS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x39<<16))
+#define GFX_OP_3DSTATE_DX9_CONSTANTF_PS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x3A<<16))
+#define GFX_OP_3DSTATE_SO_DECL_LIST \
+	((0x3<<29)|(0x3<<27)|(0x1<<24)|(0x17<<16))
+
+#define MFX_WAIT  ((0x3<<29)|(0x1<<27)|(0x0<<16))
+
+#define COLOR_BLT     ((0x2<<29)|(0x40<<22))
+#define SRC_COPY_BLT  ((0x2<<29)|(0x43<<22))
 
 /*
  * Reset registers
-- 
1.8.4.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [RFC 03/22] drm/i915: Hook command parser tables up to rings
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
  2013-11-26 16:51 ` [RFC 01/22] drm/i915: Add data structures for " bradley.d.volkin
  2013-11-26 16:51 ` [RFC 02/22] drm/i915: Initial command parser table definitions bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 04/22] drm/i915: Add per-ring command length decode functions bradley.d.volkin
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

OTC-Tracker: AXIA-4631
Change-Id: Id178f67338d00c23ca4e8fc9499313dba93d2c5c
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c  | 34 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h         |  3 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
 4 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index deb77c8c..014e661 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -136,3 +136,37 @@ static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
 	{ common_cmds, ARRAY_SIZE(common_cmds) },
 	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
 };
+
+void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
+{
+	if (!IS_GEN7(ring->dev))
+		return;
+
+	switch (ring->id) {
+	case RCS:
+		if (IS_HASWELL(ring->dev)) {
+			ring->cmd_tables = hsw_render_ring_cmds;
+			ring->cmd_table_count =
+				ARRAY_SIZE(hsw_render_ring_cmds);
+		} else {
+			ring->cmd_tables = gen7_render_cmds;
+			ring->cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
+		}
+		break;
+	case VCS:
+		ring->cmd_tables = gen7_video_cmds;
+		ring->cmd_table_count = ARRAY_SIZE(gen7_video_cmds);
+		break;
+	case BCS:
+		ring->cmd_tables = gen7_blt_cmds;
+		ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
+		break;
+	case VECS:
+		ring->cmd_tables = hsw_vebox_cmds;
+		ring->cmd_table_count = ARRAY_SIZE(hsw_vebox_cmds);
+		break;
+	default:
+		DRM_DEBUG("CMD: cmd_parser_init with unknown ring: %d\n",
+			  ring->id);
+	}
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ff1e201..7de4e59 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2372,6 +2372,9 @@ void i915_destroy_error_state(struct drm_device *dev);
 void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
 const char *i915_cache_level_str(int type);
 
+/* i915_cmd_parser.c */
+void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
+
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
 extern int i915_restore_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 69589e4..553b889 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1382,6 +1382,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	if (IS_I830(ring->dev) || IS_845G(ring->dev))
 		ring->effective_size -= 128;
 
+	i915_cmd_parser_init_ring(ring);
+
 	return 0;
 
 err_unmap:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 71a73f4..67305d3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -162,6 +162,13 @@ struct  intel_ring_buffer {
 		u32 gtt_offset;
 		volatile u32 *cpu_page;
 	} scratch;
+
+	/**
+	 * Tables of commands the command parser needs to know about
+	 * for this ring.
+	 */
+	const struct drm_i915_cmd_table *cmd_tables;
+	int cmd_table_count;
 };
 
 static inline bool
-- 
1.8.4.4

* [RFC 04/22] drm/i915: Add per-ring command length decode functions
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (2 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 03/22] drm/i915: Hook command parser tables up to rings bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 05/22] drm/i915: Implement command parsing bradley.d.volkin
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

For commands that aren't in the parser's tables, we get the length
based on standard per-ring command encodings for specific opcode ranges.

These functions just return the bitmask and the parser will extract the
actual length value.
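
For reference, the extraction step amounts to the following. This is a
standalone sketch, not code from this patch: cmd_length_dwords() is a
hypothetical helper name, and the bias of 2 is the LENGTH_BIAS value that the
parsing patch later in the series uses.

```c
#include <stdint.h>

#define LENGTH_BIAS 2u	/* header length fields encode (total dwords - 2) */

/*
 * Total command length in dwords, given the header and the length-field
 * mask returned for it. A zero mask means the command was not recognized,
 * which is reported here as length 0.
 */
static uint32_t cmd_length_dwords(uint32_t header, uint32_t length_mask)
{
	if (!length_mask)
		return 0;

	return (header & length_mask) + LENGTH_BIAS;
}
```

For example, a header whose low bits hold 2 under a 0x3FF mask describes a
4-dword command.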

OTC-Tracker: AXIA-4631
Change-Id: I2729d4483931cb4aea9403fd43710c4d4e8e5e89
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c  | 62 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 12 +++++++
 2 files changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 014e661..247d530 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -137,6 +137,62 @@ static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
 	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
 };
 
+#define CLIENT_MASK      0xE0000000
+#define SUBCLIENT_MASK   0x18000000
+#define MI_CLIENT        0x00000000
+#define RC_CLIENT        0x60000000
+#define BC_CLIENT        0x40000000
+#define MEDIA_SUBCLIENT  0x10000000
+
+static u32 gen7_render_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+	u32 subclient = cmd_header & SUBCLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == RC_CLIENT) {
+		if (subclient == MEDIA_SUBCLIENT)
+			return 0xFFFF;
+		else
+			return 0xFF;
+	}
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal rcs cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
+static u32 gen7_bsd_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+	u32 subclient = cmd_header & SUBCLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == RC_CLIENT) {
+		if (subclient == MEDIA_SUBCLIENT)
+			return 0xFFF;
+		else
+			return 0xFF;
+	}
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal bsd cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
+static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == BC_CLIENT)
+		return 0xFF;
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal blt cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
 void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 {
 	if (!IS_GEN7(ring->dev))
@@ -152,18 +208,24 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 			ring->cmd_tables = gen7_render_cmds;
 			ring->cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
 		}
+
+		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
 		break;
 	case VCS:
 		ring->cmd_tables = gen7_video_cmds;
 		ring->cmd_table_count = ARRAY_SIZE(gen7_video_cmds);
+		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
 		break;
 	case BCS:
 		ring->cmd_tables = gen7_blt_cmds;
 		ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
+		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
 		break;
 	case VECS:
 		ring->cmd_tables = hsw_vebox_cmds;
 		ring->cmd_table_count = ARRAY_SIZE(hsw_vebox_cmds);
+		/* VECS can use the same length_mask function as VCS */
+		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
 		break;
 	default:
 		DRM_DEBUG("CMD: cmd_parser_init with unknown ring: %d\n",
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 67305d3..8e71b59 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -169,6 +169,18 @@ struct  intel_ring_buffer {
 	 */
 	const struct drm_i915_cmd_table *cmd_tables;
 	int cmd_table_count;
+
+	/**
+	 * Returns the bitmask for the length field of the specified command.
+	 * Return 0 for an unrecognized/invalid command.
+	 *
+	 * If the command parser finds an entry for a command in the ring's
+	 * cmd_tables, it gets the command's length based on the table entry.
+	 * If not, it calls this function to determine the per-ring length field
+	 * encoding for the command (i.e. certain opcode ranges use certain bits
+	 * to encode the command length in the header).
+	 */
+	u32 (*get_cmd_length_mask)(u32 cmd_header);
 };
 
 static inline bool
-- 
1.8.4.4

* [RFC 05/22] drm/i915: Implement command parsing
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (3 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 04/22] drm/i915: Add per-ring command length decode functions bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 17:29   ` Chris Wilson
  2013-11-26 16:51 ` [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam bradley.d.volkin
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

At this point the parser just implements the mechanics of finding
commands in the batch buffer and looking them up in the parser tables.

It rejects malformed batches (e.g. batches with no MI_BATCH_BUFFER_END command
or with unrecognized commands) but does no other checks.
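
The walk itself can be sketched in userspace as below. This is a simplification
under stated assumptions, not the code in this patch: every command is assumed
to carry its length in the low 8 bits of the header (the real parser looks the
mask up per command), BATCH_END assumes MI_BATCH_BUFFER_END is MI opcode 0x0A
shifted left by 23, and walk_batch() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_END 0x05000000u	/* assumed MI_BATCH_BUFFER_END encoding */

/*
 * Step through `count` dwords one command at a time.
 * Returns 0 if a terminator is reached, -1 if a command would run past
 * the buffer or the buffer ends without a terminator.
 */
static int walk_batch(const uint32_t *batch, size_t count)
{
	size_t i = 0;

	while (i < count) {
		uint32_t length;

		if (batch[i] == BATCH_END)
			return 0;

		/* Simplification: one length mask (0xFF) and bias (2) for all. */
		length = (batch[i] & 0xFFu) + 2u;
		if (length > count - i)
			return -1;	/* command exceeds remaining buffer */

		i += length;
	}

	return -1;			/* ran off the end without a terminator */
}
```

Only dwords at command boundaries are compared against the terminator; payload
dwords are skipped by the length, so a payload value that happens to equal the
terminator encoding does not end the walk.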

The parser is enabled by a module parameter, which currently defaults to off.
We'll enable it by default at the end of the series.

The code to look up commands in the per-ring tables implements a linear
search. The tables are small so in practice this does not appear to cause
a performance issue. However, the tables are sorted by command opcode such
that we can move to something like binary search if this code becomes a
bottleneck in the future.
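
If it ever comes to that, a binary search could look roughly like the sketch
below. It rests on an assumption the current tables do not guarantee: every
entry in a given table shares one opcode mask and the entries are sorted by
masked value. struct entry, cmp_entry() and lookup() are hypothetical names for
illustration only; bsearch() is the standard C library routine.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened table entry: masked opcode value plus length mask. */
struct entry {
	uint32_t value;
	uint32_t length_mask;
};

/* bsearch() comparator: first argument is the key, second a table element. */
static int cmp_entry(const void *key, const void *elem)
{
	uint32_t k = *(const uint32_t *)key;
	uint32_t v = ((const struct entry *)elem)->value;

	return (k > v) - (k < v);
}

/*
 * Look up a header in a table sorted by ascending value, assuming all
 * entries use the same opcode mask. Returns NULL when nothing matches.
 */
static const struct entry *lookup(const struct entry *table, size_t count,
				  uint32_t mask, uint32_t header)
{
	uint32_t key = header & mask;

	return bsearch(&key, table, count, sizeof(*table), cmp_entry);
}
```

Tables that mix masks would need either one sorted table per mask or a
different keying scheme, which is part of why a linear search is the simpler
starting point.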

OTC-Tracker: AXIA-4631
Change-Id: If71f50d28578d27325c3438abf4b229c7cf695cd
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 148 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.c            |   5 +
 drivers/gpu/drm/i915/i915_drv.h            |   4 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +++
 4 files changed, 172 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 247d530..b01628e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -232,3 +232,151 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 			  ring->id);
 	}
 }
+
+static const struct drm_i915_cmd_descriptor*
+find_cmd_in_table(const struct drm_i915_cmd_table *table,
+		  u32 cmd_header)
+{
+	int i;
+
+	for (i = 0; i < table->count; i++) {
+		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
+		u32 masked_cmd = desc->cmd.mask & cmd_header;
+		u32 masked_value = desc->cmd.value & desc->cmd.mask;
+
+		if (masked_cmd == masked_value)
+			return desc;
+	}
+
+	return NULL;
+}
+
+/* Returns a pointer to a descriptor for the command specified by cmd_header.
+ *
+ * The caller must supply space for a default descriptor via the default_desc
+ * parameter. If no descriptor for the specified command exists in the ring's
+ * command parser tables, this function fills in default_desc based on the
+ * ring's default length encoding and returns default_desc.
+ */
+static const struct drm_i915_cmd_descriptor*
+find_cmd(struct intel_ring_buffer *ring,
+	 u32 cmd_header,
+	 struct drm_i915_cmd_descriptor *default_desc)
+{
+	u32 mask;
+	int i;
+
+	for (i = 0; i < ring->cmd_table_count; i++) {
+		const struct drm_i915_cmd_descriptor *desc;
+
+		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
+		if (desc)
+			return desc;
+	}
+
+	mask = ring->get_cmd_length_mask(cmd_header);
+	if (!mask)
+		return NULL;
+
+	BUG_ON(!default_desc);
+	default_desc->flags = CMD_DESC_SKIP;
+	default_desc->length.mask = mask;
+
+	return default_desc;
+}
+
+static u32 *vmap_batch(struct drm_i915_gem_object *obj)
+{
+	int i;
+	void *addr = NULL;
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL) {
+		DRM_DEBUG("Failed to get space for pages\n");
+		goto finish;
+	}
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
+		pages[i] = sg_page_iter_page(&sg_iter);
+		i++;
+	}
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	if (addr == NULL) {
+		DRM_DEBUG("Failed to vmap pages\n");
+		goto finish;
+	}
+
+finish:
+	if (pages)
+		drm_free_large(pages);
+	return (u32*)addr;
+}
+
+#define LENGTH_BIAS 2
+
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset)
+{
+	int ret = 0;
+	u32 *cmd, *batch_base, *batch_end;
+	struct drm_i915_cmd_descriptor default_desc = { 0 };
+
+	/* No command tables currently indicates a platform without parsing */
+	if (!ring->cmd_tables)
+		return 0;
+
+	batch_base = vmap_batch(batch_obj);
+	if (!batch_base) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
+		return -ENOMEM;
+	}
+
+	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
+	batch_end = batch_base + (batch_obj->base.size / sizeof(*batch_end));
+
+	while (cmd < batch_end) {
+		const struct drm_i915_cmd_descriptor *desc;
+		u32 length;
+
+		if (*cmd == MI_BATCH_BUFFER_END)
+			break;
+
+		desc = find_cmd(ring, *cmd, &default_desc);
+		if (!desc) {
+			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+					 *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_FIXED)
+			length = desc->length.fixed;
+		else
+			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
+
+		if ((batch_end - cmd) < length) {
+			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%u batchlen=%td\n",
+					 *cmd,
+					 length,
+					 batch_end - cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		cmd += length;
+	}
+
+	if (cmd >= batch_end) {
+		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
+		ret = -EINVAL;
+	}
+
+	vunmap(batch_base);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 13076db..90d7db0 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -154,6 +154,11 @@ module_param_named(prefault_disable, i915_prefault_disable, bool, 0600);
 MODULE_PARM_DESC(prefault_disable,
 		"Disable page prefaulting for pread/pwrite/reloc (default:false). For developers only.");
 
+int i915_enable_cmd_parser __read_mostly = 0;
+module_param_named(enable_cmd_parser, i915_enable_cmd_parser, int, 0600);
+MODULE_PARM_DESC(enable_cmd_parser,
+		"Enable command parsing (default: false)");
+
 static struct drm_driver driver;
 
 static const struct intel_device_info intel_i830_info = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7de4e59..81ef047 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1926,6 +1926,7 @@ extern bool i915_fastboot __read_mostly;
 extern int i915_enable_pc8 __read_mostly;
 extern int i915_pc8_timeout __read_mostly;
 extern bool i915_prefault_disable __read_mostly;
+extern int i915_enable_cmd_parser __read_mostly;
 
 extern int i915_suspend(struct drm_device *dev, pm_message_t state);
 extern int i915_resume(struct drm_device *dev);
@@ -2374,6 +2375,9 @@ const char *i915_cache_level_str(int type);
 
 /* i915_cmd_parser.c */
 void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset);
 
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b800fe4..06975c7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1143,6 +1143,21 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
+	if (i915_enable_cmd_parser && !(flags & I915_DISPATCH_SECURE)) {
+		ret = i915_parse_cmds(ring,
+				      batch_obj,
+				      args->batch_start_offset);
+		if (ret)
+			goto err;
+
+		/* Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
+		 * from MI_BATCH_BUFFER_START commands issued in the
+		 * dispatch_execbuffer implementations. We specifically don't
+		 * want that set when the command parser is enabled.
+		 */
+		flags |= I915_DISPATCH_SECURE;
+	}
+
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
-- 
1.8.4.4

* [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (4 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 05/22] drm/i915: Implement command parsing bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-27 12:51   ` Daniel Vetter
  2013-11-26 16:51 ` [RFC 07/22] drm/i915: Add support for rejecting commands during parsing bradley.d.volkin
                   ` (20 subsequent siblings)
  26 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Add an I915_PARAM_HAS_CMD_PARSER getparam so that userspace can query the
kernel for command parser support.
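
As a rough sketch of how userspace might consume this (not part of the
patch): a kernel that predates the parameter returns -EINVAL from getparam,
so a caller would treat any error as "no parser". The getparam_fn type and
the two simulated kernels below are hypothetical stand-ins for the real
getparam ioctl, used only to keep the example self-contained.

```c
#include <errno.h>

#define PARAM_HAS_CMD_PARSER 28	/* value introduced by this patch */

/* Hypothetical stand-in for the getparam ioctl: 0 on success, -errno on error. */
typedef int (*getparam_fn)(int param, int *value);

/* Simulated kernel without this patch: the parameter is unknown. */
static int old_kernel_getparam(int param, int *value)
{
	(void)param;
	(void)value;
	return -EINVAL;
}

/* Simulated kernel with this patch: the parameter reports 1. */
static int new_kernel_getparam(int param, int *value)
{
	if (param != PARAM_HAS_CMD_PARSER)
		return -EINVAL;
	*value = 1;
	return 0;
}

/* Return 1 if the kernel advertises the command parser, 0 otherwise. */
static int has_cmd_parser(getparam_fn getparam)
{
	int value = 0;

	if (getparam(PARAM_HAS_CMD_PARSER, &value) != 0)
		return 0;	/* unknown parameter: assume no parser */
	return value != 0;
}
```

Treating every error as "unsupported" keeps the caller working on kernels
both with and without the patch.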

OTC-Tracker: AXIA-4631
Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c | 3 +++
 include/uapi/drm/i915_drm.h     | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 5aeb103..f0a4638 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1003,6 +1003,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
 		value = 1;
 		break;
+	case I915_PARAM_HAS_CMD_PARSER:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 52aed89..48cc277 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -337,6 +337,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
+#define I915_PARAM_HAS_CMD_PARSER	 28
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
1.8.4.4

* [RFC 07/22] drm/i915: Add support for rejecting commands during parsing
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (5 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 08/22] drm/i915: Add support for checking register accesses bradley.d.volkin
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Certain commands are always disallowed from userspace. This adds
the ability for the command parser to detect such commands and
reject batch buffers containing them.
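
The idea can be sketched in isolation as follows. This is a simplified
illustration, not the driver code: the struct, the DESC_* flag names, and the
opcodes in demo_table are hypothetical, and unlike the parser in this series
the sketch treats an unknown command as an error instead of falling back to a
default length encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the descriptor flags. */
#define DESC_FIXED  (1u << 0)
#define DESC_REJECT (1u << 2)

struct cmd_desc {
	uint32_t flags;
	uint32_t value;	/* opcode bits identifying the command */
	uint32_t mask;	/* header bits that hold the opcode */
};

/* Illustrative table: one allowed and one rejected opcode. */
static const struct cmd_desc demo_table[] = {
	{ DESC_FIXED,  0x01000000u, 0xFF800000u },
	{ DESC_REJECT, 0x02000000u, 0xFF800000u },
};

/* Return the first descriptor whose opcode bits match, or NULL. */
static const struct cmd_desc *
lookup(const struct cmd_desc *table, size_t count, uint32_t header)
{
	for (size_t i = 0; i < count; i++) {
		if ((header & table[i].mask) ==
		    (table[i].value & table[i].mask))
			return &table[i];
	}
	return NULL;
}

/* Return 0 if the command may run, -1 if it is unknown or rejected. */
static int check_cmd(const struct cmd_desc *table, size_t count,
		     uint32_t header)
{
	const struct cmd_desc *desc = lookup(table, count, header);

	if (!desc)
		return -1;
	if (desc->flags & DESC_REJECT)
		return -1;
	return 0;
}
```

A single rejected command fails the whole check, which matches the intent
of refusing the entire batch buffer.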

OTC-Tracker: AXIA-4631
Change-Id: I000b0df4d441ec80b607a50d35e83418cdfd38b3
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 6 ++++++
 drivers/gpu/drm/i915/i915_drv.h        | 6 ++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index b01628e..c64f640 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -368,6 +368,12 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
 			break;
 		}
 
+		if (desc->flags & CMD_DESC_REJECT) {
+			DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
 		cmd += length;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 81ef047..6ace856 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1743,10 +1743,12 @@ struct drm_i915_cmd_descriptor {
 	 * CMD_DESC_SKIP: The command is allowed but does not follow the
 	 *                standard length encoding for the opcode range in
 	 *                which it falls
+	 * CMD_DESC_REJECT: The command is never allowed
 	 */
 	u32 flags;
-#define CMD_DESC_FIXED (1<<0)
-#define CMD_DESC_SKIP  (1<<1)
+#define CMD_DESC_FIXED  (1<<0)
+#define CMD_DESC_SKIP   (1<<1)
+#define CMD_DESC_REJECT (1<<2)
 
 	/**
 	 * The command's unique identification bits and the bitmask to get them.
-- 
1.8.4.4

* [RFC 08/22] drm/i915: Add support for checking register accesses
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (6 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 07/22] drm/i915: Add support for rejecting commands during parsing bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 09/22] drm/i915: Add support for rejecting commands via bitmasks bradley.d.volkin
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Some OpenGL/media features require userspace to perform register
accesses from batch buffers on Gen hardware.

To enable this, each ring gets a whitelist of registers that userspace
may access from a batch buffer. With this patch, no whitelists are defined,
so no access is allowed.
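
A minimal sketch of the lookup, assuming a flat array of allowed offsets per
ring. The function name and the contents of demo_regs are illustrative only;
the point is that a ring with no table (NULL) rejects every register, which
is the state this patch leaves every ring in.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative whitelist: both 32-bit halves of one 64-bit register. */
static const uint32_t demo_regs[] = { 0x2338, 0x233C };

/* Return 1 if addr appears in the whitelist, 0 otherwise. */
static int reg_allowed(const uint32_t *table, size_t count, uint32_t addr)
{
	if (!table)
		return 0;	/* no whitelist: nothing is allowed */

	for (size_t i = 0; i < count; i++) {
		if (table[i] == addr)
			return 1;
	}
	return 0;
}
```

Defaulting to "deny" means a ring that never gets a table stays locked down
without any extra code.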

OTC-Tracker: AXIA-4631
Change-Id: Ibf607ec3b1df0076f6acbd9aee34a2ee48cfac6b
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c  | 26 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h         | 19 ++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h |  6 ++++++
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c64f640..2dbca01 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -285,6 +285,20 @@ find_cmd(struct intel_ring_buffer *ring,
 	return default_desc;
 }
 
+static int valid_reg(const u32 *table, int count, u32 addr)
+{
+	if (table && count != 0) {
+		int i;
+
+		for (i = 0; i < count; i++) {
+			if (table[i] == addr)
+				return 1;
+		}
+	}
+
+	return 0;
+}
+
 static u32 *vmap_batch(struct drm_i915_gem_object *obj)
 {
 	int i;
@@ -374,6 +388,18 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
 			break;
 		}
 
+		if (desc->flags & CMD_DESC_REGISTER) {
+			u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
+
+			if (!valid_reg(ring->reg_table,
+				       ring->reg_count, reg_addr)) {
+				DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (ring=%d)\n",
+						 reg_addr, *cmd, ring->id);
+				ret = -EINVAL;
+				break;
+			}
+		}
+
 		cmd += length;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6ace856..83b6031 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1744,11 +1744,14 @@ struct drm_i915_cmd_descriptor {
 	 *                standard length encoding for the opcode range in
 	 *                which it falls
 	 * CMD_DESC_REJECT: The command is never allowed
+	 * CMD_DESC_REGISTER: The command should be checked against the
+	 *                    register whitelist for the appropriate ring
 	 */
 	u32 flags;
-#define CMD_DESC_FIXED  (1<<0)
-#define CMD_DESC_SKIP   (1<<1)
-#define CMD_DESC_REJECT (1<<2)
+#define CMD_DESC_FIXED    (1<<0)
+#define CMD_DESC_SKIP     (1<<1)
+#define CMD_DESC_REJECT   (1<<2)
+#define CMD_DESC_REGISTER (1<<3)
 
 	/**
 	 * The command's unique identification bits and the bitmask to get them.
@@ -1771,6 +1774,16 @@ struct drm_i915_cmd_descriptor {
 		u32 fixed;
 		u32 mask;
 	} length;
+
+	/**
+	 * Describes where to find a register address in the command to check
+	 * against the ring's register whitelist. Only valid if flags has the
+	 * CMD_DESC_REGISTER bit set.
+	 */
+	struct {
+		u32 offset;
+		u32 mask;
+	} reg;
 };
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8e71b59..b898105a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -171,6 +171,12 @@ struct  intel_ring_buffer {
 	int cmd_table_count;
 
 	/**
+	 * Table of registers allowed in commands that read/write registers.
+	 */
+	const u32 *reg_table;
+	int reg_count;
+
+	/**
 	 * Returns the bitmask for the length field of the specified command.
 	 * Return 0 for an unrecognized/invalid command.
 	 *
-- 
1.8.4.4

* [RFC 09/22] drm/i915: Add support for rejecting commands via bitmasks
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (7 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 08/22] drm/i915: Add support for checking register accesses bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 10/22] drm/i915: Reject unsafe commands bradley.d.volkin
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

A variety of checks we want to do amount to verifying that a given
bit or bits are set/clear in a given dword of a command. For now,
allow a small but arbitrary number of bitmasks for each command.
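
The check can be sketched as below, under a few assumptions: the struct and
function names are hypothetical, the demo dwords are illustrative, and the
explicit bounds test on offset is an extra guard added for the sketch that
the patch itself does not perform at this point.

```c
#include <stddef.h>
#include <stdint.h>

/* One masked comparison against a dword of the command. */
struct bit_check {
	uint32_t offset;	/* dword index within the command */
	uint32_t mask;		/* bits to examine */
	uint32_t expected;	/* required value of those bits */
};

/* Illustrative check: bit 23 of dword 1 must be clear. */
static const struct bit_check demo_checks[] = {
	{ .offset = 1, .mask = 1u << 23, .expected = 0 },
};

static const uint32_t demo_ok[]  = { 0x7A000003u, 0x00000000u };
static const uint32_t demo_bad[] = { 0x7A000003u, 1u << 23 };

/* Return 1 if every check passes, 0 as soon as one fails. */
static int bits_ok(const uint32_t *cmd, size_t len,
		   const struct bit_check *checks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (checks[i].offset >= len)
			return 0;	/* check refers past the command */
		if ((cmd[checks[i].offset] & checks[i].mask) !=
		    checks[i].expected)
			return 0;
	}
	return 1;
}
```

Setting expected to 0 requires the masked bits to be clear; setting it equal
to the mask requires them to be set.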

OTC-Tracker: AXIA-4631
Change-Id: Icc77316c243b6e218774c15e2c090cc470d59317
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 22 ++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h        | 16 ++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 2dbca01..99d15f3 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -400,6 +400,28 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
 			}
 		}
 
+		if (desc->flags & CMD_DESC_BITMASK) {
+			int i;
+
+			for (i = 0; i < desc->bits_count; i++) {
+				u32 dword = cmd[desc->bits[i].offset] &
+					desc->bits[i].mask;
+
+				if (dword != desc->bits[i].expected) {
+					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
+							 *cmd,
+							 desc->bits[i].mask,
+							 desc->bits[i].expected,
+							 dword, ring->id);
+					ret = -EINVAL;
+					break;
+				}
+			}
+
+			if (ret)
+				break;
+		}
+
 		cmd += length;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 83b6031..f31fc68 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1752,6 +1752,7 @@ struct drm_i915_cmd_descriptor {
 #define CMD_DESC_SKIP     (1<<1)
 #define CMD_DESC_REJECT   (1<<2)
 #define CMD_DESC_REGISTER (1<<3)
+#define CMD_DESC_BITMASK  (1<<4)
 
 	/**
 	 * The command's unique identification bits and the bitmask to get them.
@@ -1784,6 +1785,21 @@ struct drm_i915_cmd_descriptor {
 		u32 offset;
 		u32 mask;
 	} reg;
+
+#define MAX_CMD_DESC_BITMASKS 3
+	/**
+	 * Describes command checks where a particular dword is masked and
+	 * compared against an expected value. If the command does not match
+	 * the expected value, the parser rejects it. Only valid if flags has
+	 * the CMD_DESC_BITMASK bit set.
+	 */
+	struct {
+		u32 offset;
+		u32 mask;
+		u32 expected;
+	} bits[MAX_CMD_DESC_BITMASKS];
+	/** Number of valid entries in the bits array */
+	int bits_count;
 };
 
 /**
-- 
1.8.4.4

* [RFC 10/22] drm/i915: Reject unsafe commands
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (8 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 09/22] drm/i915: Add support for rejecting commands via bitmasks bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 11/22] drm/i915: Add register whitelists for mesa bradley.d.volkin
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Reject MI_SEMAPHORE_MBOX, MI_STORE_DWORD_INDEX, MI_UPDATE_GTT,
MI_ARB_ON_OFF, and MI_DISPLAY_FLIP. These commands allow userspace to
affect global state.

OTC-Tracker: AXIA-4631
Change-Id: I80a22c9cd83181790d2a9064e70ea09326691b66
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 99d15f3..8ee4cda 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -47,6 +47,7 @@
 #define SMFX STD_MFX_OPCODE_MASK
 #define F CMD_DESC_FIXED
 #define S CMD_DESC_SKIP
+#define R CMD_DESC_REJECT
 
 /*            Command                          Mask   Fixed Len   Action
 	      ---------------------------------------------------------- */
@@ -57,10 +58,11 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
 	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
 	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
-	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
-	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   S  ),
 	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   S  ),
 	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
@@ -68,8 +70,8 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 
 static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
-	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
-	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
+	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
 	CMD(  MI_TOPOLOGY_FILTER,               SMI,    F,  1,      S  ),
 	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
@@ -94,12 +96,12 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 };
 
 static const struct drm_i915_cmd_descriptor video_cmds[] = {
-	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MFX_WAIT,                         SMFX,  !F,  0x3F,   S  ),
 };
 
 static const struct drm_i915_cmd_descriptor blt_cmds[] = {
-	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
 	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
@@ -111,6 +113,7 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 #undef SMFX
 #undef F
 #undef S
+#undef R
 
 static const struct drm_i915_cmd_table gen7_render_cmds[] = {
 	{ common_cmds, ARRAY_SIZE(common_cmds) },
-- 
1.8.4.4

* [RFC 11/22] drm/i915: Add register whitelists for mesa
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (9 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 10/22] drm/i915: Reject unsafe commands bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 12/22] drm/i915: Enable register whitelist checks bradley.d.volkin
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

These registers are currently used by mesa for blitting and
for transform feedback extensions.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 33 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h        |  9 +++++++++
 2 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 8ee4cda..1decff9 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -140,6 +140,34 @@ static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
 	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
 };
 
+/* Register whitelists, sorted by increasing register offset.
+ *
+ * Some registers that userspace accesses are 64 bits. The register
+ * access commands only allow 32-bit accesses. Hence, we have to include
+ * entries for both halves of the 64-bit registers.
+ */
+
+static const u32 gen7_render_regs[] = {
+	CL_INVOCATION_COUNT,
+	CL_INVOCATION_COUNT + sizeof(u32),
+	GEN7_SO_NUM_PRIMS_WRITTEN(0),
+	GEN7_SO_NUM_PRIMS_WRITTEN(0) + sizeof(u32),
+	GEN7_SO_NUM_PRIMS_WRITTEN(1),
+	GEN7_SO_NUM_PRIMS_WRITTEN(1) + sizeof(u32),
+	GEN7_SO_NUM_PRIMS_WRITTEN(2),
+	GEN7_SO_NUM_PRIMS_WRITTEN(2) + sizeof(u32),
+	GEN7_SO_NUM_PRIMS_WRITTEN(3),
+	GEN7_SO_NUM_PRIMS_WRITTEN(3) + sizeof(u32),
+	GEN7_SO_WRITE_OFFSET(0),
+	GEN7_SO_WRITE_OFFSET(1),
+	GEN7_SO_WRITE_OFFSET(2),
+	GEN7_SO_WRITE_OFFSET(3),
+};
+
+static const u32 gen7_blt_regs[] = {
+	BCS_SWCTRL,
+};
+
 #define CLIENT_MASK      0xE0000000
 #define SUBCLIENT_MASK   0x18000000
 #define MI_CLIENT        0x00000000
@@ -212,6 +240,9 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 			ring->cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
 		}
 
+		ring->reg_table = gen7_render_regs;
+		ring->reg_count = ARRAY_SIZE(gen7_render_regs);
+
 		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
 		break;
 	case VCS:
@@ -222,6 +253,8 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 	case BCS:
 		ring->cmd_tables = gen7_blt_cmds;
 		ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
+		ring->reg_table = gen7_blt_regs;
+		ring->reg_count = ARRAY_SIZE(gen7_blt_regs);
 		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
 		break;
 	case VECS:
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 5d52569..aa43624 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -371,6 +371,15 @@
 #define SRC_COPY_BLT  ((0x2<<29)|(0x43<<22))
 
 /*
+ * Registers used only by the command parser
+ */
+#define BCS_SWCTRL 0x22200
+
+#define CL_INVOCATION_COUNT 0x2338
+/* These are the four 64-bit counter registers, one for each stream output */
+#define GEN7_SO_NUM_PRIMS_WRITTEN(n) (0x5200 + (n) * 8)
+
+/*
  * Reset registers
  */
 #define DEBUG_RESET_I830		0x6070
-- 
1.8.4.4

* [RFC 12/22] drm/i915: Enable register whitelist checks
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (10 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 11/22] drm/i915: Add register whitelists for mesa bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 13/22] drm/i915: Enable bit checking for some commands bradley.d.volkin
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

MI_STORE_REGISTER_MEM and MI_LOAD_REGISTER_* commands allow userspace
access to registers. Only certain registers should be allowed for such
access, so enable checking for those commands.
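
For illustration, the address extraction these descriptors request looks
roughly like this. The helper name and the demo command are hypothetical;
only the offset (dword 1) and the 0x007FFFFC mask come from the patch. The
mask keeps bits 22:2, so stray high bits and the two low alignment bits are
dropped before the whitelist comparison.

```c
#include <stdint.h>

#define REG_ADDR_MASK 0x007FFFFCu	/* mask used by the descriptors */

/* Illustrative command: header, register dword with stray bits, value. */
static const uint32_t demo_lri[] = { 0x11000001u, 0xFF80233Bu, 0x0u };

/* Return the register address encoded in the given dword of the command. */
static uint32_t reg_addr_from_cmd(const uint32_t *cmd, uint32_t offset,
				  uint32_t mask)
{
	return cmd[offset] & mask;
}
```

The extracted address is what gets compared against the ring's whitelist,
so an offset that differs only in the masked-out bits resolves to the same
entry.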

OTC-Tracker: AXIA-4631
Change-Id: Ie614a2f0eb2e5917de809e5a17957175d24cc44f
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 1decff9..df5424b 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -48,6 +48,7 @@
 #define F CMD_DESC_FIXED
 #define S CMD_DESC_SKIP
 #define R CMD_DESC_REJECT
+#define W CMD_DESC_REGISTER
 
 /*            Command                          Mask   Fixed Len   Action
 	      ---------------------------------------------------------- */
@@ -61,10 +62,13 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
 	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   S  ),
-	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
+	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
 	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
 };
 
@@ -88,7 +92,8 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
 	CMD(  MI_RS_CONTEXT,                    SMI,    F,  1,      S  ),
 	CMD(  MI_MATH,                          SMI,   !F,  0x3F,   S  ),
-	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   W,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
 	CMD(  MI_LOAD_URB_MEM,                  SMI,   !F,  0xFF,   S  ),
 	CMD(  MI_STORE_URB_MEM,                 SMI,   !F,  0xFF,   S  ),
 	CMD(  GFX_OP_3DSTATE_DX9_CONSTANTF_VS,  S3D,   !F,  0x7FF,  S  ),
@@ -114,6 +119,7 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 #undef F
 #undef S
 #undef R
+#undef W
 
 static const struct drm_i915_cmd_table gen7_render_cmds[] = {
 	{ common_cmds, ARRAY_SIZE(common_cmds) },
-- 
1.8.4.4

* [RFC 13/22] drm/i915: Enable bit checking for some commands
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (11 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 12/22] drm/i915: Enable register whitelist checks bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 14/22] drm/i915: Enable PPGTT command parser checks bradley.d.volkin
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

These checks prevent userspace from using certain commands to
access registers or generate interrupts.

OTC-Tracker: AXIA-4631
Change-Id: Ic6367ae98272495ba874c22abd4824fbced0abca
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 41 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h        |  3 +++
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index df5424b..b881d39 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -49,6 +49,7 @@
 #define S CMD_DESC_SKIP
 #define R CMD_DESC_REJECT
 #define W CMD_DESC_REGISTER
+#define B CMD_DESC_BITMASK
 
 /*            Command                          Mask   Fixed Len   Action
 	      ---------------------------------------------------------- */
@@ -81,9 +82,23 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
 	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
 	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
+	CMD(  MEDIA_VFE_STATE,			S3D,   !F,  0xFFFF, B,
+	      .bits = {{
+			.offset = 2,
+			.mask = MEDIA_VFE_STATE_MMIO_ACCESS_MASK,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  GPGPU_OBJECT,                     S3D,   !F,  0xFF,   S  ),
 	CMD(  GPGPU_WALKER,                     S3D,   !F,  0xFF,   S  ),
 	CMD(  GFX_OP_3DSTATE_SO_DECL_LIST,      S3D,   !F,  0x1FF,  S  ),
+	CMD(  GFX_OP_PIPE_CONTROL(5),           S3D,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 1,
+			.mask = (PIPE_CONTROL_MMIO_WRITE | PIPE_CONTROL_NOTIFY),
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 };
 
 static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
@@ -102,11 +117,35 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 
 static const struct drm_i915_cmd_descriptor video_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
+	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_FLUSH_DW_NOTIFY,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  MFX_WAIT,                         SMFX,  !F,  0x3F,   S  ),
 };
 
+static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
+	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_FLUSH_DW_NOTIFY,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
+};
+
 static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
+	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_FLUSH_DW_NOTIFY,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
@@ -120,6 +159,7 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 #undef S
 #undef R
 #undef W
+#undef B
 
 static const struct drm_i915_cmd_table gen7_render_cmds[] = {
 	{ common_cmds, ARRAY_SIZE(common_cmds) },
@@ -139,6 +179,7 @@ static const struct drm_i915_cmd_table gen7_video_cmds[] = {
 
 static const struct drm_i915_cmd_table hsw_vebox_cmds[] = {
 	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ vecs_cmds, ARRAY_SIZE(vecs_cmds) },
 };
 
 static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index aa43624..0e504b9 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -240,6 +240,7 @@
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT		(0<<2)
@@ -318,6 +319,7 @@
 #define   DISPLAY_PLANE_B           (1<<20)
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
+#define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
 #define   PIPE_CONTROL_CS_STALL				(1<<20)
 #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
 #define   PIPE_CONTROL_QW_WRITE				(1<<14)
@@ -356,6 +358,7 @@
 #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
 #define GFX_OP_3DSTATE_VF_STATISTICS   ((0x3<<29)|(0x1<<27)|(0x0<<24)|(0xB<<16))
 #define MEDIA_VFE_STATE                ((0x3<<29)|(0x2<<27)|(0x0<<24)|(0x0<<16))
+#define  MEDIA_VFE_STATE_MMIO_ACCESS_MASK (0x18)
 #define GPGPU_OBJECT                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x4<<16))
 #define GPGPU_WALKER                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x5<<16))
 #define GFX_OP_3DSTATE_DX9_CONSTANTF_VS \
-- 
1.8.4.4

* [RFC 14/22] drm/i915: Enable PPGTT command parser checks
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Various commands that access memory have a bit to determine whether
the graphics address specified in the command should use the GGTT or
PPGTT for translation. These checks ensure that the bit indicates
PPGTT translation.

Most of these checks use the existing bit-checking infrastructure.
The PIPE_CONTROL and MI_FLUSH_DW commands, however, are multi-function
commands. The GGTT/PPGTT bit is only relevant for certain uses of the
command. As such, this change also extends the bit-checking code to
include a "condition" mask and offset. If the condition mask is non-zero
then the parser only performs the bit check when the bits specified by
the condition mask/offset are also non-zero.
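
As a rough illustration of the conditional check described above, here is
a standalone userspace sketch (not the driver code). The struct and
function names are made up for the example; the bit values are the ones
defined in i915_reg.h by this series, and the check entry mirrors the
MI_FLUSH_DW entry added below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the descriptor's bits[] array. */
struct bit_check {
	uint32_t offset;           /* dword holding the bits to test */
	uint32_t mask;             /* bits to test */
	uint32_t expected;         /* required value of (dword & mask) */
	uint32_t condition_offset; /* dword holding the condition bits */
	uint32_t condition_mask;   /* 0 means "always perform the check" */
};

/* Values as defined in i915_reg.h in this series. */
#define MI_FLUSH_DW_OP_MASK    (3u << 14)
#define MI_FLUSH_DW_OP_STOREDW (1u << 14)
#define MI_FLUSH_DW_USE_GTT    (1u << 2)

/* Same shape as the MI_FLUSH_DW GGTT/PPGTT entry in the tables. */
const struct bit_check flush_dw_ppgtt_check = {
	.offset = 1,
	.mask = MI_FLUSH_DW_USE_GTT,
	.expected = 0,
	.condition_offset = 0,
	.condition_mask = MI_FLUSH_DW_OP_MASK,
};

/* Returns true if the command is acceptable with respect to this check. */
bool bit_check_passes(const uint32_t *cmd, const struct bit_check *check)
{
	/* A non-zero condition mask gates the check on the condition bits. */
	if (check->condition_mask != 0 &&
	    (cmd[check->condition_offset] & check->condition_mask) == 0)
		return true; /* condition bits clear: check is skipped */

	return (cmd[check->offset] & check->mask) == check->expected;
}
```

With this entry, an MI_FLUSH_DW with no post-sync operation passes
regardless of the GTT bit, while one that requests a store must leave
MI_FLUSH_DW_USE_GTT clear.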

NOTE: At this point in the series PPGTT must be enabled for the parser
to work correctly. If it's not enabled, userspace will not be setting
the PPGTT bits the way the parser requires. There's a WARN_ON to detect
this case.

OTC-Tracker: AXIA-4631
Change-Id: I3f4c76b6734f1956ec47e698230f97d0998ff92b
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 110 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_drv.h        |   6 ++
 drivers/gpu/drm/i915/i915_reg.h        |   5 ++
 3 files changed, 111 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index b881d39..7b30a03 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -61,15 +61,33 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
 	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
 	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
 	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W,
-	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
-	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W,
-	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
+	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W | B,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC },
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
+	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W | B,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC },
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
 };
 
@@ -79,7 +97,20 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
 	CMD(  MI_TOPOLOGY_FILTER,               SMI,    F,  1,      S  ),
-	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
+	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
 	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
 	CMD(  MEDIA_VFE_STATE,			S3D,   !F,  0xFFFF, B,
@@ -97,8 +128,15 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 			.offset = 1,
 			.mask = (PIPE_CONTROL_MMIO_WRITE | PIPE_CONTROL_NOTIFY),
 			.expected = 0
+	      },
+	      {
+			.offset = 1,
+			.mask = PIPE_CONTROL_GLOBAL_GTT_IVB,
+			.expected = 0,
+			.condition_offset = 1,
+			.condition_mask = PIPE_CONTROL_POST_SYNC_OP_MASK
 	      }},
-	      .bits_count = 1					       ),
+	      .bits_count = 2					       ),
 };
 
 static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
@@ -122,8 +160,22 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 			.offset = 0,
 			.mask = MI_FLUSH_DW_NOTIFY,
 			.expected = 0
+	      },
+	      {
+			.offset = 1,
+			.mask = MI_FLUSH_DW_USE_GTT,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 1					       ),
+	      .bits_count = 2					       ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  MFX_WAIT,                         SMFX,  !F,  0x3F,   S  ),
 };
 
@@ -133,8 +185,22 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
 			.offset = 0,
 			.mask = MI_FLUSH_DW_NOTIFY,
 			.expected = 0
+	      },
+	      {
+			.offset = 1,
+			.mask = MI_FLUSH_DW_USE_GTT,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 1					       ),
+	      .bits_count = 2					       ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 };
 
 static const struct drm_i915_cmd_descriptor blt_cmds[] = {
@@ -144,8 +210,15 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 			.offset = 0,
 			.mask = MI_FLUSH_DW_NOTIFY,
 			.expected = 0
+	      },
+	      {
+			.offset = 1,
+			.mask = MI_FLUSH_DW_USE_GTT,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 1					       ),
+	      .bits_count = 2					       ),
 	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
@@ -422,6 +495,13 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
 	int ret = 0;
 	u32 *cmd, *batch_base, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
+	drm_i915_private_t *dev_priv =
+		(drm_i915_private_t *)ring->dev->dev_private;
+
+	/* XXX: this breaks VLV, which is Gen7, but no PPGTT
+	 * Replace with better checks for when to call i915_parse_cmds?
+	 */
+	WARN_ON(!dev_priv->mm.aliasing_ppgtt);
 
 	/* No command tables currently indicates a platform without parsing */
 	if (!ring->cmd_tables)
@@ -490,6 +570,16 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
 				u32 dword = cmd[desc->bits[i].offset] &
 					desc->bits[i].mask;
 
+				if (desc->bits[i].condition_mask != 0) {
+					u32 offset =
+						desc->bits[i].condition_offset;
+					u32 condition = cmd[offset] &
+						desc->bits[i].condition_mask;
+
+					if (condition == 0)
+						continue;
+				}
+
 				if (dword != desc->bits[i].expected) {
 					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
 							 *cmd,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f31fc68..161d9cd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1792,11 +1792,17 @@ struct drm_i915_cmd_descriptor {
 	 * compared against an expected value. If the command does not match
 	 * the expected value, the parser rejects it. Only valid if flags has
 	 * the CMD_DESC_BITMASK bit set.
+	 *
+	 * If the check specifies a non-zero condition_mask then the parser
+	 * only performs the check when the bits specified by condition_mask
+	 * are non-zero.
 	 */
 	struct {
 		u32 offset;
 		u32 mask;
 		u32 expected;
+		u32 condition_offset;
+		u32 condition_mask;
 	} bits[MAX_CMD_DESC_BITMASKS];
 	/** Number of valid entries in the bits array */
 	int bits_count;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0e504b9..3f64d41 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -178,6 +178,8 @@
  * Memory interface instructions used by the kernel
  */
 #define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
+/* Many MI commands use bit 22 of the header dword for GGTT vs PPGTT */
+#define  MI_GLOBAL_GTT    (1<<22)
 
 #define MI_NOOP			MI_INSTR(0, 0)
 #define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)
@@ -240,6 +242,7 @@
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_FLUSH_DW_OP_MASK		(3<<14)
 #define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
@@ -323,6 +326,7 @@
 #define   PIPE_CONTROL_CS_STALL				(1<<20)
 #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
 #define   PIPE_CONTROL_QW_WRITE				(1<<14)
+#define   PIPE_CONTROL_POST_SYNC_OP_MASK                (3<<14)
 #define   PIPE_CONTROL_DEPTH_STALL			(1<<13)
 #define   PIPE_CONTROL_WRITE_FLUSH			(1<<12)
 #define   PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH	(1<<12) /* gen6+ */
@@ -354,6 +358,7 @@
 #define MI_LOAD_REGISTER_REG   MI_INSTR(0x2A, 0)
 #define MI_LOAD_URB_MEM        MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEM       MI_INSTR(0x2D, 0)
+#define MI_CONDITIONAL_BATCH_BUFFER_END MI_INSTR(0x36, 0)
 
 #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
 #define GFX_OP_3DSTATE_VF_STATISTICS   ((0x3<<29)|(0x1<<27)|(0x0<<24)|(0xB<<16))
-- 
1.8.4.4

* [RFC 15/22] drm/i915: Reject commands that would store to global HWS page
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

PIPE_CONTROL and MI_FLUSH_DW have store-index bits that direct the
post-sync write to the hardware status page. Nothing uses this today
and it seems unsafe, so reject commands that set these bits.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_reg.h        |  1 +
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 7b30a03..f32dc69 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -131,7 +131,8 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	      },
 	      {
 			.offset = 1,
-			.mask = PIPE_CONTROL_GLOBAL_GTT_IVB,
+		        .mask = (PIPE_CONTROL_GLOBAL_GTT_IVB |
+				 PIPE_CONTROL_STORE_DATA_INDEX),
 			.expected = 0,
 			.condition_offset = 1,
 			.condition_mask = PIPE_CONTROL_POST_SYNC_OP_MASK
@@ -167,8 +168,15 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 			.expected = 0,
 			.condition_offset = 0,
 			.condition_mask = MI_FLUSH_DW_OP_MASK
+	      },
+	      {
+			.offset = 0,
+			.mask = MI_FLUSH_DW_STORE_INDEX,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 2					       ),
+	      .bits_count = 3					       ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
 	      .bits = {{
 			.offset = 0,
@@ -192,8 +200,15 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
 			.expected = 0,
 			.condition_offset = 0,
 			.condition_mask = MI_FLUSH_DW_OP_MASK
+	      },
+	      {
+			.offset = 0,
+			.mask = MI_FLUSH_DW_STORE_INDEX,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 2					       ),
+	      .bits_count = 3					       ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
 	      .bits = {{
 			.offset = 0,
@@ -217,8 +232,15 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 			.expected = 0,
 			.condition_offset = 0,
 			.condition_mask = MI_FLUSH_DW_OP_MASK
+	      },
+	      {
+			.offset = 0,
+			.mask = MI_FLUSH_DW_STORE_INDEX,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 2					       ),
+	      .bits_count = 3					       ),
 	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3f64d41..919d1a6 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -323,6 +323,7 @@
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
 #define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
+#define   PIPE_CONTROL_STORE_DATA_INDEX			(1<<21)
 #define   PIPE_CONTROL_CS_STALL				(1<<20)
 #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
 #define   PIPE_CONTROL_QW_WRITE				(1<<14)
-- 
1.8.4.4

* [RFC 16/22] drm/i915: Reject additional commands
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

MI_USER_INTERRUPT: We already reject the other interrupt-generating
mechanisms (the PIPE_CONTROL and MI_FLUSH_DW notify bits), so reject
this one as well.

MI_LOAD_SCAN_LINES_*: The DDX is the only user of these and protects
them behind the I915_EXEC_SECURE flag, so we probably shouldn't let others
use these.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 26 +++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_reg.h        | 29 +++++++++++++++--------------
 2 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index f32dc69..3ad2a1e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -55,7 +55,7 @@
 	      ---------------------------------------------------------- */
 static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
-	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      S  ),
+	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      R  ),
 	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      S  ),
 	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
 	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
@@ -145,6 +145,8 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
 	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
 	CMD(  MI_RS_CONTEXT,                    SMI,    F,  1,      S  ),
+	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
 	CMD(  MI_MATH,                          SMI,   !F,  0x3F,   S  ),
 	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   W,
 	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
@@ -245,6 +247,11 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
 
+static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
+	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
+};
+
 #undef CMD
 #undef SMI
 #undef S3D
@@ -282,6 +289,12 @@ static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
 	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
 };
 
+static const struct drm_i915_cmd_table hsw_blt_ring_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
+	{ hsw_blt_cmds, ARRAY_SIZE(hsw_blt_cmds) },
+};
+
 /* Register whitelists, sorted by increasing register offset.
  *
  * Some registers that userspace accesses are 64 bits. The register
@@ -393,10 +406,17 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
 		break;
 	case BCS:
-		ring->cmd_tables = gen7_blt_cmds;
-		ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
+		if (IS_HASWELL(ring->dev)) {
+			ring->cmd_tables = hsw_blt_ring_cmds;
+			ring->cmd_table_count = ARRAY_SIZE(hsw_blt_ring_cmds);
+		} else {
+			ring->cmd_tables = gen7_blt_cmds;
+			ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
+		}
+
 		ring->reg_table = gen7_blt_regs;
 		ring->reg_count = ARRAY_SIZE(gen7_blt_regs);
+
 		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
 		break;
 	case VECS:
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 919d1a6..232ad0c 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -345,20 +345,21 @@
 /*
  * Commands used only by the command parser
  */
-#define MI_SET_PREDICATE       MI_INSTR(0x01, 0)
-#define MI_ARB_CHECK           MI_INSTR(0x05, 0)
-#define MI_RS_CONTROL          MI_INSTR(0x06, 0)
-#define MI_URB_ATOMIC_ALLOC    MI_INSTR(0x09, 0)
-#define MI_PREDICATE           MI_INSTR(0x0C, 0)
-#define MI_RS_CONTEXT          MI_INSTR(0x0F, 0)
-#define MI_TOPOLOGY_FILTER     MI_INSTR(0x0D, 0)
-#define MI_MATH                MI_INSTR(0x1A, 0)
-#define MI_UPDATE_GTT          MI_INSTR(0x23, 0)
-#define MI_CLFLUSH             MI_INSTR(0x27, 0)
-#define MI_LOAD_REGISTER_MEM   MI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REG   MI_INSTR(0x2A, 0)
-#define MI_LOAD_URB_MEM        MI_INSTR(0x2C, 0)
-#define MI_STORE_URB_MEM       MI_INSTR(0x2D, 0)
+#define MI_SET_PREDICATE        MI_INSTR(0x01, 0)
+#define MI_ARB_CHECK            MI_INSTR(0x05, 0)
+#define MI_RS_CONTROL           MI_INSTR(0x06, 0)
+#define MI_URB_ATOMIC_ALLOC     MI_INSTR(0x09, 0)
+#define MI_PREDICATE            MI_INSTR(0x0C, 0)
+#define MI_RS_CONTEXT           MI_INSTR(0x0F, 0)
+#define MI_TOPOLOGY_FILTER      MI_INSTR(0x0D, 0)
+#define MI_LOAD_SCAN_LINES_EXCL MI_INSTR(0x13, 0)
+#define MI_MATH                 MI_INSTR(0x1A, 0)
+#define MI_UPDATE_GTT           MI_INSTR(0x23, 0)
+#define MI_CLFLUSH              MI_INSTR(0x27, 0)
+#define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
+#define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
+#define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
+#define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
 #define MI_CONDITIONAL_BATCH_BUFFER_END MI_INSTR(0x36, 0)
 
 #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
-- 
1.8.4.4

* [RFC 17/22] drm/i915: Add parser data for perf monitoring GL extensions
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

These registers and commands will be used by mesa for the
GL_AMD_performance_monitor extension.
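
A minimal sketch of how a whitelist like this might be consulted. The
lookup function, the linear scan and the trimmed two-register table are
assumptions for illustration only; the register offsets and the
0x007FFFFC offset mask are taken from this series. Each 64-bit counter
appears as two entries, one per 32-bit half.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets as defined in i915_reg.h by this patch. */
#define HS_INVOCATION_COUNT 0x2300
#define PS_DEPTH_COUNT      0x2350

/* Register offset mask used by the .reg entries in the command tables. */
#define REG_OFFSET_MASK     0x007FFFFC

/* Trimmed example table, sorted by increasing offset. */
static const uint32_t example_regs[] = {
	HS_INVOCATION_COUNT,
	HS_INVOCATION_COUNT + 4, /* upper dword of the 64-bit counter */
	PS_DEPTH_COUNT,
	PS_DEPTH_COUNT + 4,
};

/* Hypothetical helper: is the register named by this dword whitelisted? */
bool reg_is_whitelisted(uint32_t reg_dword)
{
	uint32_t reg = reg_dword & REG_OFFSET_MASK;
	size_t i;

	for (i = 0; i < sizeof(example_regs) / sizeof(example_regs[0]); i++) {
		if (example_regs[i] == reg)
			return true;
	}
	return false;
}
```

Listing both halves lets a batch access either dword of a counter, while
an offset that is not in the table (0x2308 in this trimmed example) is
refused.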

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 27 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h        | 13 +++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 3ad2a1e..c8426af 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -104,6 +104,13 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 			.expected = 0
 	      }},
 	      .bits_count = 1                                          ),
+	CMD(  MI_REPORT_PERF_COUNT,             SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 1,
+			.mask = MI_REPORT_PERF_COUNT_GGTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
 	      .bits = {{
 			.offset = 0,
@@ -303,8 +310,28 @@ static const struct drm_i915_cmd_table hsw_blt_ring_cmds[] = {
  */
 
 static const u32 gen7_render_regs[] = {
+	HS_INVOCATION_COUNT,
+	HS_INVOCATION_COUNT + sizeof(u32),
+	DS_INVOCATION_COUNT,
+	DS_INVOCATION_COUNT + sizeof(u32),
+	IA_VERTICES_COUNT,
+	IA_VERTICES_COUNT + sizeof(u32),
+	IA_PRIMITIVES_COUNT,
+	IA_PRIMITIVES_COUNT + sizeof(u32),
+	VS_INVOCATION_COUNT,
+	VS_INVOCATION_COUNT + sizeof(u32),
+	GS_INVOCATION_COUNT,
+	GS_INVOCATION_COUNT + sizeof(u32),
+	GS_PRIMITIVES_COUNT,
+	GS_PRIMITIVES_COUNT + sizeof(u32),
 	CL_INVOCATION_COUNT,
 	CL_INVOCATION_COUNT + sizeof(u32),
+	CL_PRIMITIVES_COUNT,
+	CL_PRIMITIVES_COUNT + sizeof(u32),
+	PS_INVOCATION_COUNT,
+	PS_INVOCATION_COUNT + sizeof(u32),
+	PS_DEPTH_COUNT,
+	PS_DEPTH_COUNT + sizeof(u32),
 	GEN7_SO_NUM_PRIMS_WRITTEN(0),
 	GEN7_SO_NUM_PRIMS_WRITTEN(0) + sizeof(u32),
 	GEN7_SO_NUM_PRIMS_WRITTEN(1),
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 232ad0c..4dd5541 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -356,6 +356,8 @@
 #define MI_MATH                 MI_INSTR(0x1A, 0)
 #define MI_UPDATE_GTT           MI_INSTR(0x23, 0)
 #define MI_CLFLUSH              MI_INSTR(0x27, 0)
+#define MI_REPORT_PERF_COUNT    MI_INSTR(0x28, 0)
+#define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
 #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
 #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
 #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
@@ -385,7 +387,18 @@
  */
 #define BCS_SWCTRL 0x22200
 
+#define HS_INVOCATION_COUNT 0x2300
+#define DS_INVOCATION_COUNT 0x2308
+#define IA_VERTICES_COUNT   0x2310
+#define IA_PRIMITIVES_COUNT 0x2318
+#define VS_INVOCATION_COUNT 0x2320
+#define GS_INVOCATION_COUNT 0x2328
+#define GS_PRIMITIVES_COUNT 0x2330
 #define CL_INVOCATION_COUNT 0x2338
+#define CL_PRIMITIVES_COUNT 0x2340
+#define PS_INVOCATION_COUNT 0x2348
+#define PS_DEPTH_COUNT      0x2350
+
 /* There are the 4 64-bit counter registers, one for each stream output */
 #define GEN7_SO_NUM_PRIMS_WRITTEN(n) (0x5200 + (n) * 8)
 
-- 
1.8.4.4

* [RFC 18/22] drm/i915: Reject MI_ARB_ON_OFF on VECS
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index c8426af..5593740 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -197,6 +197,7 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 };
 
 static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
 	      .bits = {{
 			.offset = 0,
-- 
1.8.4.4

* [RFC 19/22] drm/i915: Fix length handling for MFX_WAIT
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 5593740..adc7d94 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -193,7 +193,11 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 			.expected = 0
 	      }},
 	      .bits_count = 1                                          ),
-	CMD(  MFX_WAIT,                         SMFX,  !F,  0x3F,   S  ),
+	/* MFX_WAIT doesn't fit the way we handle length for most commands.
+	 * It has a length field but it uses a non-standard length bias.
+	 * It is always 1 dword though, so just treat it as fixed length.
+	 */
+	CMD(  MFX_WAIT,                         SMFX,   F,  1,      S  ),
 };
 
 static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
-- 
1.8.4.4

* [RFC 20/22] drm/i915: Fix MI_STORE_DWORD_IMM parser definition
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

The length mask is different for each ring and the size can vary,
so we should duplicate the definition with the correct encoding
for each ring.
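
To see why the mask matters, here is a small sketch of decoding a
variable-length command's size from its header. It assumes the length is
the masked header field plus a bias of 2, which matches the (len-2)
encoding in GFX_OP_PIPE_CONTROL(len); the function name and the header
value used below are hypothetical.

```c
#include <stdint.h>

/* Assumed bias: header length field = total dwords - 2. */
#define LENGTH_BIAS 2u

/* Hypothetical helper: total command length in dwords. */
uint32_t cmd_length_dwords(uint32_t header, uint32_t length_mask)
{
	return (header & length_mask) + LENGTH_BIAS;
}
```

For a header whose low bits are 0x42, the 0x3F mask gives a 4-dword
command, while the 0x3FF mask would treat bit 6 as part of the length and
give 68 dwords. Using one shared mask therefore risks mis-sizing the
command on rings where the length field is narrower.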

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 35 +++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index adc7d94..8481ef0 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -61,13 +61,6 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
 	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
 	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  B,
-	      .bits = {{
-			.offset = 0,
-			.mask = MI_GLOBAL_GTT,
-			.expected = 0
-	      }},
-	      .bits_count = 1					       ),
 	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
 	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
@@ -97,6 +90,13 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
 	CMD(  MI_TOPOLOGY_FILTER,               SMI,    F,  1,      S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  B,
 	      .bits = {{
 			.offset = 0,
@@ -165,6 +165,13 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 
 static const struct drm_i915_cmd_descriptor video_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
 	      .bits = {{
 			.offset = 0,
@@ -202,6 +209,13 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 
 static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
 	      .bits = {{
 			.offset = 0,
@@ -234,6 +248,13 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
 
 static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x1FF,  B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
 	      .bits = {{
 			.offset = 0,
-- 
1.8.4.4


* [RFC 21/22] drm/i915: Clean up command parser enable decision
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (19 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 20/22] drm/i915: Fix MI_STORE_DWORD_IMM parser defintion bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 16:51 ` [RFC 22/22] drm/i915: Enable command parsing by default bradley.d.volkin
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 30 +++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 8481ef0..b3525ce 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -581,6 +581,25 @@ finish:
 	return (u32*)addr;
 }
 
+int i915_needs_cmd_parser(struct intel_ring_buffer *ring)
+{
+	drm_i915_private_t *dev_priv =
+		(drm_i915_private_t *)ring->dev->dev_private;
+
+	/* No command tables indicates a platform without parsing */
+	if (!ring->cmd_tables)
+		return 0;
+
+	/* XXX: VLV is Gen7 and therefore has cmd_tables, but has PPGTT
+	 * disabled. That will cause all of the parser's PPGTT checks to
+	 * fail. For now, disable parsing when PPGTT is off.
+	 */
+	if (!dev_priv->mm.aliasing_ppgtt)
+		return 0;
+
+	return i915_enable_cmd_parser;
+}
+
 #define LENGTH_BIAS 2
 
 int i915_parse_cmds(struct intel_ring_buffer *ring,
@@ -590,17 +609,6 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
 	int ret = 0;
 	u32 *cmd, *batch_base, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
-	drm_i915_private_t *dev_priv =
-		(drm_i915_private_t *)ring->dev->dev_private;
-
-	/* XXX: this breaks VLV, which is Gen7, but no PPGTT
-	 * Replace with better checks for when to call i915_parse_cmds?
-	 */
-	WARN_ON(!dev_priv->mm.aliasing_ppgtt);
-
-	/* No command tables currently indicates a platform without parsing */
-	if (!ring->cmd_tables)
-		return 0;
 
 	batch_base = vmap_batch(batch_obj);
 	if (!batch_base) {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 161d9cd..e7fc31c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2412,6 +2412,7 @@ const char *i915_cache_level_str(int type);
 
 /* i915_cmd_parser.c */
 void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
+int i915_needs_cmd_parser(struct intel_ring_buffer *ring);
 int i915_parse_cmds(struct intel_ring_buffer *ring,
 		    struct drm_i915_gem_object *batch_obj,
 		    u32 batch_start_offset);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 06975c7..7b1453e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1143,7 +1143,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
-	if (i915_enable_cmd_parser && !(flags & I915_DISPATCH_SECURE)) {
+	if (i915_needs_cmd_parser(ring) && !(flags & I915_DISPATCH_SECURE)) {
 		ret = i915_parse_cmds(ring,
 				      batch_obj,
 				      args->batch_start_offset);
-- 
1.8.4.4


* [RFC 22/22] drm/i915: Enable command parsing by default
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (20 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 21/22] drm/i915: Clean up command parser enable decision bradley.d.volkin
@ 2013-11-26 16:51 ` bradley.d.volkin
  2013-11-26 19:35 ` [RFC 00/22] Gen7 batch buffer command parser Daniel Vetter
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2013-11-26 16:51 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

OTC-Tracker: AXIA-4631
Change-Id: I6747457e1fe7494bd42787af51198fcba398ad78
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 90d7db0..8c0d91b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -154,10 +154,10 @@ module_param_named(prefault_disable, i915_prefault_disable, bool, 0600);
 MODULE_PARM_DESC(prefault_disable,
 		"Disable page prefaulting for pread/pwrite/reloc (default:false). For developers only.");
 
-int i915_enable_cmd_parser __read_mostly = 0;
+int i915_enable_cmd_parser __read_mostly = 1;
 module_param_named(enable_cmd_parser, i915_enable_cmd_parser, int, 0600);
 MODULE_PARM_DESC(enable_cmd_parser,
-		"Enable command parsing (default: false)");
+		"Enable command parsing (default: true)");
 
 static struct drm_driver driver;
 
-- 
1.8.4.4


* Re: [RFC 05/22] drm/i915: Implement command parsing
  2013-11-26 16:51 ` [RFC 05/22] drm/i915: Implement command parsing bradley.d.volkin
@ 2013-11-26 17:29   ` Chris Wilson
  2013-11-26 17:38     ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2013-11-26 17:29 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> +		  u32 cmd_header)
> +{
> +	int i;
> +
> +	for (i = 0; i < table->count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> +
> +		if (masked_cmd == masked_value)
> +			return desc;

Maybe pre-sort the cmd table and use bsearch? And a runtime test on
module load to check that the table is sorted correctly.
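
To make the suggestion concrete, here is a rough userspace sketch of a binary-search lookup. It assumes a simplified, hypothetical descriptor in which every entry of a table shares one opcode mask (OPCODE_MASK below); the real descriptors carry a per-entry mask, so this is an illustration of the idea rather than a drop-in replacement for find_cmd_in_table(). The struct, the mask value, and the function names are all made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified descriptor: the opcode bits are stored
 * pre-masked, and one mask is assumed for the whole table. */
struct cmd_desc {
	uint32_t value;
	const char *name;
};

#define OPCODE_MASK 0xFF800000u	/* assumed common mask, illustration only */

/* bsearch() comparator: key first, table element second. */
static int cmp_desc(const void *key, const void *elem)
{
	uint32_t k = *(const uint32_t *)key;
	uint32_t v = ((const struct cmd_desc *)elem)->value;

	return (k > v) - (k < v);
}

/* Look up a command header in a table sorted by ascending value.
 * Returns NULL when no descriptor matches. */
static const struct cmd_desc *
find_cmd(const struct cmd_desc *table, size_t count, uint32_t header)
{
	uint32_t key = header & OPCODE_MASK;

	return bsearch(&key, table, count, sizeof(*table), cmp_desc);
}
```

The lookup is only correct if the table really is sorted by value, which is why the load-time check matters.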
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 05/22] drm/i915: Implement command parsing
  2013-11-26 17:29   ` Chris Wilson
@ 2013-11-26 17:38     ` Volkin, Bradley D
  2013-11-26 17:56       ` Chris Wilson
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2013-11-26 17:38 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > +		  u32 cmd_header)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < table->count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > +
> > +		if (masked_cmd == masked_value)
> > +			return desc;
> 
> Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> module load to check that the table is sorted correctly.

So far it doesn't look like the search is a bottleneck, but I've tried to keep
the tables sorted so that we could easily switch to bsearch if needed. Would
you prefer to just use bsearch from the start?

I agree that the module load check makes sense if we do switch.

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 05/22] drm/i915: Implement command parsing
  2013-11-26 17:38     ` Volkin, Bradley D
@ 2013-11-26 17:56       ` Chris Wilson
  2013-11-26 18:55         ` Volkin, Bradley D
  2013-12-05 21:10         ` Volkin, Bradley D
  0 siblings, 2 replies; 138+ messages in thread
From: Chris Wilson @ 2013-11-26 17:56 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > +static const struct drm_i915_cmd_descriptor*
> > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > +		  u32 cmd_header)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; i < table->count; i++) {
> > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > +
> > > +		if (masked_cmd == masked_value)
> > > +			return desc;
> > 
> > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > module load to check that the table is sorted correctly.
> 
> So far it doesn't look like the search is a bottleneck, but I've tried to keep
> the tables sorted so that we could easily switch to bsearch if needed. Would
> you prefer to just use bsearch from the start?

Yes. I think it will be easier (especially if the code does the runtime
check) to keep the table sorted as commands are incrementally added in the
future, rather than having to retrofit the bsearch when it becomes a
significant problem.
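
As a sketch of what that runtime check could look like (the descriptor layout and function name are hypothetical, and it assumes the sort key is a single pre-masked value per entry):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor reduced to its sort key. */
struct cmd_desc {
	uint32_t value;
};

/* Return true if the table is strictly ascending by value, i.e. sorted
 * and free of duplicate keys. Empty and one-entry tables pass. */
static bool table_is_sorted(const struct cmd_desc *table, size_t count)
{
	size_t i;

	for (i = 1; i < count; i++) {
		if (table[i - 1].value >= table[i].value)
			return false;
	}
	return true;
}
```

Something along these lines could run once per table at init time and complain loudly if it fails, so a misplaced new entry is caught immediately instead of showing up as a missed lookup.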
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 20/22] drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
  2013-11-26 16:51 ` [RFC 20/22] drm/i915: Fix MI_STORE_DWORD_IMM parser defintion bradley.d.volkin
@ 2013-11-26 18:08   ` Chris Wilson
  2013-11-26 18:55     ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2013-11-26 18:08 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 08:51:37AM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> The length mask is different for each ring and the size can vary,
> so we should duplicate the definition with the correct encoding
> for each ring.

Jumping in here since this highlights the most confusing aspect of this
series - the meta patching. Please implement the command parsing
infrastructure upfront and in a very small number of patches (most
preferably one) that avoids having to add fixes late in the series.

I think using
s/S/ALLOW/
s/R/REJECT/
s/B/BLACKLIST/
s/W/WHITELIST/
makes the action much more clear, and would rather that every unsafe
command starts off as REJECT. (With the whitelist/blacklisting being
added as separate patches with justification as they are now.)

Since we do disable the security, I would also reject all
unknown/unmatched commands and make ALLOW explicit.
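
Roughly what I have in mind, as a standalone sketch. The enum, struct, and function names are hypothetical and not taken from the series; the point is only that the fall-through case is REJECT, so a command is allowed solely because a table entry says so.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical spelled-out actions replacing the S/R/B/W letters. */
enum cmd_action {
	CMD_ALLOW,	/* pass through unchanged */
	CMD_REJECT,	/* fail the batch */
	CMD_BLACKLIST,	/* needs further bitmask checks by the caller */
	CMD_WHITELIST,	/* needs a register whitelist check by the caller */
};

struct cmd_desc {
	uint32_t mask;
	uint32_t value;
	enum cmd_action action;
};

/* Return the action for a command header. Anything that matches no
 * descriptor is rejected rather than silently allowed. */
static enum cmd_action
classify_cmd(const struct cmd_desc *table, size_t count, uint32_t header)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if ((header & table[i].mask) ==
		    (table[i].value & table[i].mask))
			return table[i].action;
	}
	return CMD_REJECT;
}
```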
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 05/22] drm/i915: Implement command parsing
  2013-11-26 17:56       ` Chris Wilson
@ 2013-11-26 18:55         ` Volkin, Bradley D
  2013-12-05 21:10         ` Volkin, Bradley D
  1 sibling, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2013-11-26 18:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 09:56:09AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> > On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > > +static const struct drm_i915_cmd_descriptor*
> > > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > > +		  u32 cmd_header)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < table->count; i++) {
> > > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > > +
> > > > +		if (masked_cmd == masked_value)
> > > > +			return desc;
> > > 
> > > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > > module load to check that the table is sorted correctly.
> > 
> > So far it doesn't look like the search is a bottleneck, but I've tried to keep
> > the tables sorted so that we could easily switch to bsearch if needed. Would
> > you prefer to just use bsearch from the start?
> 
> Yes. I think it will be easier (especially if the code does the runtime
> check) to keep the table sorted as commands are incrementally added in the
> future, rather than having to retrofit the bsearch when it becomes a
> significant problem.

Ok, makes sense. I'll assume the same comment applies to the register whitelists and
make similar changes there.

Brad

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 20/22] drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
  2013-11-26 18:08   ` Chris Wilson
@ 2013-11-26 18:55     ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2013-11-26 18:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 10:08:48AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 08:51:37AM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > The length mask is different for each ring and the size can vary,
> > so we should duplicate the definition with the correct encoding
> > for each ring.
> 
> Jumping in here since this highlights the most confusing aspect of this
> series - the meta patching. Please implement the command parsing
> infrastructure upfront and in a very small number of patches (most
> preferably one) that avoids having to add fixes late in the series.

Sure. As this is my first contribution, I'm still working on how to best
split up a series, so I'm happy to clean up that aspect. It sounds like
the series should look more like:
- All parser infrastructure and implementation (basically squash current 1-9,
  plus the bsearch changes, plus REJECT changes)
- N patches to set commands for register whitelist and bitmask checking
- Enable the parser

Correct?

> 
> I think using
> s/S/ALLOW/
> s/R/REJECT/
> s/B/BLACKLIST/
> s/W/WHITELIST/
> makes the action much more clear, and would rather that every unsafe
> command starts off as REJECT. (With the whitelist/blacklisting being
> added as separate patches with justification as they are now.)

I had split out the REJECTs to make the justification easier to find
in the commit message, but I can reject them from the start too.

For 'B', the term 'blacklist' to me implies a set of things that are
unconditionally not allowed, whereas the 'B' commands are conditionally
allowed based on the bitmask checks. Are you just asking for a readability
change in expanding the action names used in the table, or are you looking
for any semantic changes to what the parser checks? Or am I overthinking
this comment?
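
To illustrate what I mean by "conditionally allowed", here is a simplified userspace sketch of the bitmask check, modelled on the .offset/.mask/.expected fields in the table entries. The struct and function names are hypothetical, and the MI_GLOBAL_GTT value is assumed to be bit 22 for the purpose of the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MI_GLOBAL_GTT (1u << 22)	/* assumed value, illustration only */

/* One check: the dword at 'offset' within the command, masked with
 * 'mask', must equal 'expected'. */
struct bit_check {
	uint32_t offset;
	uint32_t mask;
	uint32_t expected;
};

/* Return true if the command passes the check. A check that points
 * past the end of the command fails rather than reading out of bounds. */
static bool cmd_bits_ok(const uint32_t *cmd, size_t len_dwords,
			const struct bit_check *check)
{
	if (check->offset >= len_dwords)
		return false;

	return (cmd[check->offset] & check->mask) == check->expected;
}
```

So with { .offset = 0, .mask = MI_GLOBAL_GTT, .expected = 0 }, a store that targets PPGTT passes and the same command with the GGTT bit set is refused.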

Brad

> 
> Since we do disable the security, I would also reject all
> unknown/unmatched commands and make ALLOW explicit.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (21 preceding siblings ...)
  2013-11-26 16:51 ` [RFC 22/22] drm/i915: Enable command parsing by default bradley.d.volkin
@ 2013-11-26 19:35 ` Daniel Vetter
  2013-11-26 20:24   ` Volkin, Bradley D
                     ` (2 more replies)
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                   ` (3 subsequent siblings)
  26 siblings, 3 replies; 138+ messages in thread
From: Daniel Vetter @ 2013-11-26 19:35 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

Hi Brad,

On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> Certain OpenGL features (e.g. transform feedback, performance monitoring)
> require userspace code to submit batches containing commands such as
> MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> generations of the hardware will noop these commands in "unsecure" batches
> (which includes all userspace batches submitted via i915) even though the
> commands may be safe and represent the intended programming model of the device.
> 
> This series introduces a software command parser similar in operation to the
> command parsing done in hardware for unsecure batches. However, the software
> parser allows some operations that would be noop'd by hardware, if the parser
> determines the operation is safe, and submits the batch as "secure" to prevent
> hardware parsing. Currently the series implements this on IVB and HSW.
> 
> The series is divided into several phases:
> 
> patches 01-09: These implement infrastructure and the command parsing algorithm,
>                all behind a module parameter. I expect some discussion and
> 	       rework, but hopefully there's nothing too controversial.
> patches 10-17: These define the checks performed by the parser.
>                I expect much discussion :)
> patches 18-20: In a final pass over the command checks, I found some issues with
>                the definitions. They looked painful to rebase in, so I've added
> 	       them here.
> patches 21-22: These enable the parser by default. It runs on all batches except
>                those that set the I915_EXEC_SECURE flag in the execbuffer2 call.

I think long-term we should even scan secure batches. We'd need to allow
some registers which only the drm master (i.e. owner of the display
hardware) is allowed to do, e.g. for scanline waits. But once we have that
we should be able to port all current users of secure batches over to
scanned batches and so enforce this everywhere by default.

The other issue is that the igt tests assume they can run some evil
tests, so maybe we don't actually want this.

> There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
> basic and do not test all of the commands used by the parser on the assumption
> that I'm likely to make the same mistakes in both the parser and the test.

Yeah, I agree that just checking whether commands all go through (or not)
as expected adds very little value on top of the few tests you have done.
I think we should take a look at some corner cases which might trip up
your checker a bit though:
- I think we should check batchbuffer chaining and make sure it works on
  the vcs ring and not anywhere else (we can't ever break shipping libva
  which uses this).
- Some tests to trip up your parser should be done, like 3D commands that
  fall off the end of the batch bo. Or commands that span page boundaries.
  The latter isn't an issue atm since you use vmap, but we should switch to
  per-page kmap since the vmap overhead is fairly horrible.
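
For the "falls off the end" case, the property such a test would exercise can be sketched in plain C as below. This is a deliberately simplified walk: it assumes every command is variable-length and uses a single length mask, whereas the real parser picks the length per descriptor. LENGTH_BIAS matches the define in the series; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LENGTH_BIAS 2

/* Walk a batch of 'count_dwords' dwords. Each header encodes its
 * length as (header & length_mask) + LENGTH_BIAS dwords. Returns 0 if
 * every command fits inside the batch, -1 if one would run past the
 * end. */
static int walk_batch(const uint32_t *batch, size_t count_dwords,
		      uint32_t length_mask)
{
	size_t pos = 0;

	while (pos < count_dwords) {
		size_t len = (size_t)(batch[pos] & length_mask) + LENGTH_BIAS;

		/* Compare against the space left, not pos + len, so the
		 * check cannot be defeated by overflow. */
		if (len > count_dwords - pos)
			return -1;

		pos += len;
	}
	return 0;
}
```

A test batch would then end with a header whose encoded length is larger than the dwords remaining in the bo, and the execbuffer call would be expected to fail.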

> I've run the i-g-t gem_* tests, the piglit quick tests (w/Mesa git from a few
> days ago), and generally used an Ubuntu 13.10 IVB system with the parser
> running. Aside from a failure described below, I don't think there are any
> regressions. That is, piglit claims some regressions, but from manually running
> the tests I think these are false positives. However, I could use help in
> getting broader testing, particularly around performance. In general, I see less
> than 3% performance impact on HSW, with more like 10% impact for pathological
> batch sizes. But we'll certainly want to run relevant benchmarks beyond what
> I've done.

Yeah, a microbenchmark that just shovels MI_NOP batches of various sizes
through the checker and bypassing it (with EXEC_SECURE) would be really
good. Maybe even some variable-sized commands (all the state setup stuff
should be useful for that) to keep things interesting. Some variation is
also important to have some good cache thrashing going on (since your check
tables are fairly large I think).

> At this point there are a couple of known issues and potential improvements.
> 
> 1) VLV. The parser is currently disabled for VLV. One type of check performed by
>    the parser is that commands which access memory do so via PPGTT. VLV does not
>    have PPGTT enabled at this time. I chose to implement the PPGTT checks via
>    generic bit checking infrastructure in the parser, so they are not easily
>    disabled for VLV. For now, I'm disabling parsing altogether in the hope that
>    PPGTT can be enabled for VLV in the near future.

We need ppgtt for the parser anyway, since otherwise userspace can submit
a self-modifying batch. Checking for that is impossible, so allowing sw
checked batches without the ppgtt/ggtt split would be a decent security
hole.

> 2) Coherency. I've found two types of coherency issues when reading the batch
>    buffer from the CPU during execbuffer2. Looking for help with both issues.
>     i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the
>        parser isn't properly waiting for the operation to complete before
>        parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL])
>        but that actually caused more failures.

This synchronization should happen when processing the relocations. The
batch itself isn't modified by the gpu, we simply upload it using the
blitter. So this going wrong indicates there's some issue somewhere ...


>    ii. Second, on VLV, I've seen cache coherency issues when userspace writes
>        the batch via pwrite fast path before calling execbuffer2. The parser
>        reads stale data. This works fine on IVB and HSW, so I believe it's an
>        LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or
>        synchronization is for this scenario.

Imo we should take a good look at the optimized buffer read/write code from
i915_gem_shmem_pread (for reading the userspace batch) and
i915_gem_shmem_pwrite (for writing to the checked buffer). If we do the
checking in-line with the reading this should also bring down the overhead
massively compared to the current solution (those shmem read/write
functions are fairly optimized).

> 3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands
>    in userspace batches without parsing them. The media driver uses 2nd-level
>    batches, so a solution is required. I have some ideas but don't want to delay
>    the review process for what I have so far. It may be that the 2nd-level
>    parsing is only needed for VCS and the current code (plus rejecting BBS)
>    would be sufficient for RCS.

Afaik only libva uses second-level batches, and only on the vcs. So I hope
we can just submit those as unprivileged batches if possible. If that's
not possible it'll get fairly ugly I fear :(

> 4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
>    to avoid GPU modifications to buffers via EUs or commands in the batch, we
>    should copy the userspace batch buffer to memory that userspace does not
>    have access to, map it into GGTT, and execute that batch buffer. I have a
>    sense of how to do this for 1st-level batches, but it would need changes to
>    tie in with the 2nd-level batch parsing I think, so I've again held off.

Yeah, we need the copying, since otherwise the parsing is fairly pointless.
I've stumbled over some of your internal patches and had a quick look at
them. Two high-level comments:

- Using the existing active buffer lru instead of manual pinning would
  better integrate with the eviction code. For an example of in-kernel
  objects (and not userspace objects) using this look at the hw context
  code.
- Imo we should tag all buffers as purgeable while they're in the cache.
  That way the shrinker will automatically drop the backing storage if
  memory runs low (and thanks to the active list lru only when the gpu has
  stopped processing the batch). That way we can just keep on allocating
  buffers if they're all busy without any concern for running out of
  memory.
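
On the copying itself, the ordering that matters is copy first, then parse the copy, then execute only the copy. A userspace sketch of that ordering, with malloc standing in for a kernel-owned buffer and all names (shadow_and_parse, all_noops, parse_fn) hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Validator callback: returns 0 if the batch is acceptable. */
typedef int (*parse_fn)(const uint32_t *batch, size_t count_dwords);

/* Stand-in validator for the example: accept only all-zero dwords. */
static int all_noops(const uint32_t *batch, size_t count_dwords)
{
	size_t i;

	for (i = 0; i < count_dwords; i++) {
		if (batch[i] != 0)
			return -1;
	}
	return 0;
}

/* Copy the caller's batch into private memory and validate the copy.
 * Returns the private copy on success (caller frees it), or NULL on
 * failure. The original buffer is never parsed or executed, so later
 * writes to it cannot change what was validated. */
static uint32_t *shadow_and_parse(const uint32_t *user_batch, size_t count,
				  parse_fn parse)
{
	uint32_t *shadow;

	if (count == 0 || count > SIZE_MAX / sizeof(*shadow))
		return NULL;

	shadow = malloc(count * sizeof(*shadow));
	if (!shadow)
		return NULL;

	memcpy(shadow, user_batch, count * sizeof(*shadow));

	if (parse(shadow, count) != 0) {
		free(shadow);
		return NULL;
	}
	return shadow;
}
```

In the driver the shadow buffer would come from the cache discussed above rather than a fresh allocation per batch.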

I'll try to read through your patch pile in the next few days, this is
just the very-very high-level stuff that came to mind immediately.

Cheers, Daniel
> 
> Brad Volkin (22):
>   drm/i915: Add data structures for command parser
>   drm/i915: Initial command parser table definitions
>   drm/i915: Hook command parser tables up to rings
>   drm/i915: Add per-ring command length decode functions
>   drm/i915: Implement command parsing
>   drm/i915: Add a HAS_CMD_PARSER getparam
>   drm/i915: Add support for rejecting commands during parsing
>   drm/i915: Add support for checking register accesses
>   drm/i915: Add support for rejecting commands via bitmasks
>   drm/i915: Reject unsafe commands
>   drm/i915: Add register whitelists for mesa
>   drm/i915: Enable register whitelist checks
>   drm/i915: Enable bit checking for some commands
>   drm/i915: Enable PPGTT command parser checks
>   drm/i915: Reject commands that would store to global HWS page
>   drm/i915: Reject additional commands
>   drm/i915: Add parser data for perf monitoring GL extensions
>   drm/i915: Reject MI_ARB_ON_OFF on VECS
>   drm/i915: Fix length handling for MFX_WAIT
>   drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
>   drm/i915: Clean up command parser enable decision
>   drm/i915: Enable command parsing by default
> 
>  drivers/gpu/drm/i915/Makefile              |   3 +-
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 712 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_dma.c            |   3 +
>  drivers/gpu/drm/i915/i915_drv.c            |   5 +
>  drivers/gpu/drm/i915/i915_drv.h            |  96 ++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +
>  drivers/gpu/drm/i915/i915_reg.h            |  66 +++
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |  25 +
>  include/uapi/drm/i915_drm.h                |   1 +
>  10 files changed, 927 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
> 
> -- 
> 1.8.4.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-26 19:35 ` [RFC 00/22] Gen7 batch buffer command parser Daniel Vetter
@ 2013-11-26 20:24   ` Volkin, Bradley D
  2013-11-27  1:32     ` ykzhao
                       ` (2 more replies)
  2013-11-27  1:26   ` Xiang, Haihao
  2013-12-11  0:58   ` Volkin, Bradley D
  2 siblings, 3 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2013-11-26 20:24 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 11:35:38AM -0800, Daniel Vetter wrote:
> Hi Brad,
> 
> On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > Certain OpenGL features (e.g. transform feedback, performance monitoring)
> > require userspace code to submit batches containing commands such as
> > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> > generations of the hardware will noop these commands in "unsecure" batches
> > (which includes all userspace batches submitted via i915) even though the
> > commands may be safe and represent the intended programming model of the device.
> > 
> > This series introduces a software command parser similar in operation to the
> > command parsing done in hardware for unsecure batches. However, the software
> > parser allows some operations that would be noop'd by hardware, if the parser
> > determines the operation is safe, and submits the batch as "secure" to prevent
> > hardware parsing. Currently the series implements this on IVB and HSW.
> > 
> > The series is divided into several phases:
> > 
> > patches 01-09: These implement infrastructure and the command parsing algorithm,
> >                all behind a module parameter. I expect some discussion and
> > 	       rework, but hopefully there's nothing too controversial.
> > patches 10-17: These define the checks performed by the parser.
> >                I expect much discussion :)
> > patches 18-20: In a final pass over the command checks, I found some issues with
> >                the definitions. They looked painful to rebase in, so I've added
> > 	       them here.
> > patches 21-22: These enable the parser by default. It runs on all batches except
> >                those that set the I915_EXEC_SECURE flag in the execbuffer2 call.
> 
> I think long-term we should even scan secure batches. We'd need to allow
> some registers which only the drm master (i.e. owner of the display
> hardware) is allowed to do, e.g. for scanline waits. But once we have that
> we should be able to port all current users of secure batches over to
> scanned batches and so enforce this everywhere by default.
> 
> The other issue is that the igt tests assume they can run some evil
> tests, so maybe we don't actually want this.

Agreed. I thought we could handle this as a follow-up task once the basic stuff is
in place, particularly given that we'd want to modify at least some users to test.
I also wasn't sure if we would want the check to be root && master, as in the current
secure flag, or just master.

W.r.t. the tests, I suppose we can just turn checking on for secure batches and see
what happens.

> 
> > There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
> > basic and do not test all of the commands used by the parser on the assumption
> > that I'm likely to make the same mistakes in both the parser and the test.
> 
> Yeah, I agree that just checking whether commands all go through (or not)
> as expected adds very little value on top of the few tests you have done.
> I think we should take a look at some corner cases which might trip up
> your checker a bit though:
> - I think we should check batchbuffer chaining and make sure it works on
>   the vcs ring and not anywhere else (we can't ever break shipping libva
>   which uses this).
> - Some tests to trip up your parser should be done, like 3D commands that
>   fall off the end of the batch bo. Or commands that span page boundaries.
>   The latter isn't an issue atm since you use vmap, but we should switch to
>   per-page kmap since the vmap overhead is fairly horrible.

Good suggestions. I'll look into these.
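
As a rough illustration of the two corner cases above (a command that
falls off the end of the batch bo, and a command that spans a page
boundary), here is a minimal sketch of the checks involved. This is not
the code from the series: walk_batch(), cmd_spans_page() and toy_length()
are hypothetical names, and the toy length encoding is made up for the
example rather than taken from the hardware command format.

```c
#include <stddef.h>
#include <stdint.h>

#define DWORDS_PER_PAGE (4096 / sizeof(uint32_t))

/* Hypothetical decoder: command length in dwords, 0 if not recognised. */
typedef uint32_t (*cmd_length_fn)(uint32_t header);

/*
 * Walk a batch of batch_len dwords. Returns 0 if every command lies
 * entirely inside the batch, -1 if a command is not recognised or
 * would run past the end of the buffer.
 */
static int walk_batch(const uint32_t *batch, size_t batch_len,
		      cmd_length_fn get_len)
{
	size_t pos = 0;

	while (pos < batch_len) {
		uint32_t len = get_len(batch[pos]);

		if (len == 0)
			return -1;	/* unknown command */
		if (len > batch_len - pos)
			return -1;	/* falls off the end of the batch */
		pos += len;
	}
	return 0;
}

/* True if a command of len dwords starting at dword pos crosses a 4K page. */
static int cmd_spans_page(size_t pos, uint32_t len)
{
	return pos / DWORDS_PER_PAGE != (pos + len - 1) / DWORDS_PER_PAGE;
}

/* Toy encoding for illustration only: low 8 bits hold (length - 1). */
static uint32_t toy_length(uint32_t header)
{
	return (header & 0xff) + 1;
}
```

With a single vmap of the whole object, cmd_spans_page() never matters;
with per-page mapping, a command for which it returns true would have to
be read across two mappings.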

> 
> > I've run the i-g-t gem_* tests, the piglit quick tests (w/Mesa git from a few
> > days ago), and generally used an Ubuntu 13.10 IVB system with the parser
> > running. Aside from a failure described below, I don't think there are any
> > regressions. That is, piglit claims some regressions, but from manually running
> > the tests I think these are false positives. However, I could use help in
> > getting broader testing, particularly around performance. In general, I see less
> > than 3% performance impact on HSW, with more like 10% impact for pathological
> > batch sizes. But we'll certainly want to run relevant benchmarks beyond what
> > I've done.
> 
> Yeah, a microbenchmark that just shovels MI_NOP batches of various sizes
> through the checker and bypassing it (with EXEC_SECURE) would be really
> good. Maybe even some variable-sized commands (all the state setup stuff
> should be useful for that) to keep things interesting. Some variation is
> also important to have some good cache thrashing going on (since your check
> tables are fairly large I think).

Ok. I'd be interested in some comment from the mesa and libva guys here for real world
workloads, but a microbenchmark would be a good start.

Which "state setup stuff" are you referring to? Something specific in i-g-t or something
more general?
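
For the MI_NOP part of such a microbenchmark, building the batch
contents could look roughly like the sketch below. The opcode encodings
follow the MI_INSTR() convention used in i915_reg.h (where the no-op is
spelled MI_NOOP); fill_nop_batch() is a hypothetical helper, and the
even-dword requirement assumes that execbuffer2 wants an 8-byte aligned
batch length. Submission and timing are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MI_INSTR(opcode, flags)	(((uint32_t)(opcode) << 23) | (flags))
#define MI_NOOP			MI_INSTR(0x00, 0)
#define MI_BATCH_BUFFER_END	MI_INSTR(0x0a, 0)

/*
 * Fill buf with ndwords dwords: MI_NOOPs followed by a final
 * MI_BATCH_BUFFER_END. Returns 0 on success, or -1 if ndwords is
 * smaller than 2 or odd (assumed 8-byte length alignment).
 */
static int fill_nop_batch(uint32_t *buf, size_t ndwords)
{
	size_t i;

	if (ndwords < 2 || (ndwords & 1))
		return -1;

	for (i = 0; i < ndwords - 1; i++)
		buf[i] = MI_NOOP;
	buf[ndwords - 1] = MI_BATCH_BUFFER_END;

	return 0;
}
```

The benchmark would then call this for a range of sizes and submit each
batch twice, once through the checker and once with EXEC_SECURE, and
compare the times.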

> 
> > At this point there are a couple of known issues and potential improvements.
> > 
> > 1) VLV. The parser is currently disabled for VLV. One type of check performed by
> >    the parser is that commands which access memory do so via PPGTT. VLV does not
> >    have PPGTT enabled at this time. I chose to implement the PPGTT checks via
> >    generic bit checking infrastructure in the parser, so they are not easily
> >    disabled for VLV. For now, I'm disabling parsing altogether in the hope that
> >    PPGTT can be enabled for VLV in the near future.
> 
> We need ppgtt for the parser anyway, since otherwise userspace can submit
> a self-modifying batch. Checking for that is impossible, so allowing sw
> checked batches without the ppgtt/ggtt split would be a decent security
> hole.
> 
> > 2) Coherency. I've found two types of coherency issues when reading the batch
> >    buffer from the CPU during execbuffer2. Looking for help with both issues.
> >     i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the
> >        parser isn't properly waiting for the operation to complete before
> >        parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL])
> >        but that actually caused more failures.
> 
> This synchronization should happen when processing the relocations. The
> batch itself isn't modified by the gpu, we simply upload it using the
> blitter. So this going wrong indicates there's some issue somewhere ...

Ok. I didn't debug too far. Putting a gem_sync() in the test between the upload and
exec fixed the issue. Since I wasn't doing any explicit synchronization I assumed it
was my issue.

> 
> 
> >    ii. Second, on VLV, I've seen cache coherency issues when userspace writes
> >        the batch via pwrite fast path before calling execbuffer2. The parser
> >        reads stale data. This works fine on IVB and HSW, so I believe it's an
> >        LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or
> >        synchronization is for this scenario.
> 
> Imo we take a good look at the optimized buffer read/write code from
> i915_gem_shmem_pread (for reading the userspace batch) and
> i915_gem_shmem_pwrite (for writing to the checked buffer). If we do the
> checking in-line with the reading this should also bring down the overhead
> massively compared to the current solution (those shmem read/write
> functions are fairly optimized).

I'll take a look. So far I'm not seeing the vmap as a bottleneck and I'm a bit concerned
about the complexity of trying to do mapping/parsing per-page, but I agree it's worth a
more detailed analysis.

> 
> > 3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands
> >    in userspace batches without parsing them. The media driver uses 2nd-level
> >    batches, so a solution is required. I have some ideas but don't want to delay
> >    the review process for what I have so far. It may be that the 2nd-level
> >    parsing is only needed for VCS and the current code (plus rejecting BBS)
> >    would be sufficient for RCS.
> 
> Afaik only libva uses second-level batches, and only on the vcs. So I hope

I would be very happy if that was the case. I couldn't easily tell from reading the libva
code which ring they were submitting to. I definitely did not find uses in mesa or the ddx.

> we can just submit those as unprivileged batches if possible. If that's
> not possible it'll get fairly ugly I fear :(
> 
> > 4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
> >    to avoid GPU modifications to buffers via EUs or commands in the batch, we
> >    should copy the userspace batch buffer to memory that userspace does not
> >    have access to, map it into GGTT, and execute that batch buffer. I have a
> >    sense of how to do this for 1st-level batches, but it would need changes to
> >    tie in with the 2nd-level batch parsing I think, so I've again held off.
> 
> Yeah, we need the copying for otherwise the parsing is fairly pointless.
> I've stumbled over some of your internal patches and had a quick look at
> them. Two high-level comments:
> 
> - Using the existing active buffer lru instead of manual pinning would
>   better integrate with the eviction code. For an example of in-kernel
>   objects (and not userspace objects) using this look at the hw context
>   code.
> - Imo we should tag all buffers as purgeable while they're in the cache.
>   That way the shrinker will automatically drop the backing storage if
>   memory runs low (and thanks to the active list lru only when the gpu has
>   stopped processing the batch). That way we can just keep on allocating
>   buffers if they're all busy without any concern for running out of
>   memory.

Ok, I'll look at the hw context code for buffer mgmt. For "purgeable", just via the
madv field in the i915 gem object?

Also, there are a couple iterations of the work-in-progress patches. Do you prefer a
cache per ring or a single cache shared by all rings?

Thanks,
Brad

> 
> I'll try to read through your patch pile in the next few days, this is
> just the very-very high-level stuff that came to mind immediately.
> 
> Cheers, Daniel
> > 
> > Brad Volkin (22):
> >   drm/i915: Add data structures for command parser
> >   drm/i915: Initial command parser table definitions
> >   drm/i915: Hook command parser tables up to rings
> >   drm/i915: Add per-ring command length decode functions
> >   drm/i915: Implement command parsing
> >   drm/i915: Add a HAS_CMD_PARSER getparam
> >   drm/i915: Add support for rejecting commands during parsing
> >   drm/i915: Add support for checking register accesses
> >   drm/i915: Add support for rejecting commands via bitmasks
> >   drm/i915: Reject unsafe commands
> >   drm/i915: Add register whitelists for mesa
> >   drm/i915: Enable register whitelist checks
> >   drm/i915: Enable bit checking for some commands
> >   drm/i915: Enable PPGTT command parser checks
> >   drm/i915: Reject commands that would store to global HWS page
> >   drm/i915: Reject additional commands
> >   drm/i915: Add parser data for perf monitoring GL extensions
> >   drm/i915: Reject MI_ARB_ON_OFF on VECS
> >   drm/i915: Fix length handling for MFX_WAIT
> >   drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
> >   drm/i915: Clean up command parser enable decision
> >   drm/i915: Enable command parsing by default
> > 
> >  drivers/gpu/drm/i915/Makefile              |   3 +-
> >  drivers/gpu/drm/i915/i915_cmd_parser.c     | 712 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_dma.c            |   3 +
> >  drivers/gpu/drm/i915/i915_drv.c            |   5 +
> >  drivers/gpu/drm/i915/i915_drv.h            |  96 ++++
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +
> >  drivers/gpu/drm/i915/i915_reg.h            |  66 +++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.h    |  25 +
> >  include/uapi/drm/i915_drm.h                |   1 +
> >  10 files changed, 927 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
> > 
> > -- 
> > 1.8.4.4
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-26 19:35 ` [RFC 00/22] Gen7 batch buffer command parser Daniel Vetter
  2013-11-26 20:24   ` Volkin, Bradley D
@ 2013-11-27  1:26   ` Xiang, Haihao
  2013-12-11  0:58   ` Volkin, Bradley D
  2 siblings, 0 replies; 138+ messages in thread
From: Xiang, Haihao @ 2013-11-27  1:26 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, 2013-11-26 at 20:35 +0100, Daniel Vetter wrote: 
> Hi Brad,
> 
> On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > Certain OpenGL features (e.g. transform feedback, performance monitoring)
> > require userspace code to submit batches containing commands such as
> > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> > generations of the hardware will noop these commands in "unsecure" batches
> > (which includes all userspace batches submitted via i915) even though the
> > commands may be safe and represent the intended programming model of the device.
> > 
> > This series introduces a software command parser similar in operation to the
> > command parsing done in hardware for unsecure batches. However, the software
> > parser allows some operations that would be noop'd by hardware, if the parser
> > determines the operation is safe, and submits the batch as "secure" to prevent
> > hardware parsing. Currently the series implements this on IVB and HSW.
> > 
> > The series is divided into several phases:
> > 
> > patches 01-09: These implement infrastructure and the command parsing algorithm,
> >                all behind a module parameter. I expect some discussion and
> > 	       rework, but hopefully there's nothing too controversial.
> > patches 10-17: These define the checks performed by the parser.
> >                I expect much discussion :)
> > patches 18-20: In a final pass over the command checks, I found some issues with
> >                the definitions. They looked painful to rebase in, so I've added
> > 	       them here.
> > patches 21-22: These enable the parser by default. It runs on all batches except
> >                those that set the I915_EXEC_SECURE flag in the execbuffer2 call.
> 
> I think long-term we should even scan secure batches. We'd need to allow
> some registers which only the drm master (i.e. owner of the display
> hardware) is allowed to do, e.g. for scanline waits. But once we have that
> we should be able to port all current users of secure batches over to
> scanned batches and so enforce this everywhere by default.
> 
> The other issue is that igt tests assume to be able to run some evil
> tests, so maybe we don't actually want this.
> 
> > There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
> > basic and do not test all of the commands used by the parser on the assumption
> > that I'm likely to make the same mistakes in both the parser and the test.
> 
> Yeah, I agree that just checking whether commands all go through (or not)
> as expected adds very little value on top of the few tests you have done.
> I think we should take a look at some corner cases which might trip up
> your checker a bit though:
> - I think we should check batchbuffer chaining and make sure it works on
>   the vcs ring and not anywhere else (we can't ever break shipping libva
>   which uses this).

Besides the vcs ring, we also use batchbuffer chaining on the render
ring for video post processing, video motion estimation and motion
compensation (on ILK). A fixed-length batch buffer isn't suitable for
those operations, as they work on a per-macroblock basis rather than
per-frame. It would be better to make sure batchbuffer chaining works
on the render ring too.


> - Some tests to trip up your parser should be done, like 3D commands that
>   fall off the end of the batch bo. Or commands that span page boundaries.
>   The latter isn't an issue atm since you use vmap, but we should switch to
>   per-page kmap since the vmap overhead is fairly horrible.
> 
> > I've run the i-g-t gem_* tests, the piglit quick tests (w/Mesa git from a few
> > days ago), and generally used an Ubuntu 13.10 IVB system with the parser
> > running. Aside from a failure described below, I don't think there are any
> > regressions. That is, piglit claims some regressions, but from manually running
> > the tests I think these are false positives. However, I could use help in
> > getting broader testing, particularly around performance. In general, I see less
> > than 3% performance impact on HSW, with more like 10% impact for pathological
> > batch sizes. But we'll certainly want to run relevant benchmarks beyond what
> > I've done.
> 
> Yeah, a microbenchmark that just shovels MI_NOP batches of various sizes
> through the checker and bypassing it (with EXEC_SECURE) would be really
> good. Maybe even some variable-sized commands (all the state setup stuff
> should be useful for that) to keep things interesting. Some variation is
> also important to have some good cache thrashing going on (since your check
> tables are fairly large I think).
> 
> > At this point there are a couple of known issues and potential improvements.
> > 
> > 1) VLV. The parser is currently disabled for VLV. One type of check performed by
> >    the parser is that commands which access memory do so via PPGTT. VLV does not
> >    have PPGTT enabled at this time. I chose to implement the PPGTT checks via
> >    generic bit checking infrastructure in the parser, so they are not easily
> >    disabled for VLV. For now, I'm disabling parsing altogether in the hope that
> >    PPGTT can be enabled for VLV in the near future.
> 
> We need ppgtt for the parser anyway, since otherwise userspace can submit
> a self-modifying batch. Checking for that is impossible, so allowing sw
> checked batches without the ppgtt/ggtt split would be a decent security
> hole.
> 
> > 2) Coherency. I've found two types of coherency issues when reading the batch
> >    buffer from the CPU during execbuffer2. Looking for help with both issues.
> >     i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the
> >        parser isn't properly waiting for the operation to complete before
> >        parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL])
> >        but that actually caused more failures.
> 
> This synchronization should happen when processing the relocations. The
> batch itself isn't modified by the gpu, we simply upload it using the
> blitter. So this going wrong indicates there's some issue somewhere ...
> 
> 
> >    ii. Second, on VLV, I've seen cache coherency issues when userspace writes
> >        the batch via pwrite fast path before calling execbuffer2. The parser
> >        reads stale data. This works fine on IVB and HSW, so I believe it's an
> >        LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or
> >        synchronization is for this scenario.
> 
> Imo we take a good look at the optimized buffer read/write code from
> i915_gem_shmem_pread (for reading the userspace batch) and
> i915_gem_shmem_pwrite (for writing to the checked buffer). If we do the
> checking in-line with the reading this should also bring down the overhead
> massively compared to the current solution (those shmem read/write
> functions are fairly optimized).
> 
> > 3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands
> >    in userspace batches without parsing them. The media driver uses 2nd-level
> >    batches, so a solution is required. I have some ideas but don't want to delay
> >    the review process for what I have so far. It may be that the 2nd-level
> >    parsing is only needed for VCS and the current code (plus rejecting BBS)
> >    would be sufficient for RCS.
> 
> Afaik only libva uses second-level batches, and only on the vcs. So I hope
> we can just submit those as unprivileged batches if possible. If that's
> not possible it'll get fairly ugly I fear :(
> 
> > 4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
> >    to avoid GPU modifications to buffers via EUs or commands in the batch, we
> >    should copy the userspace batch buffer to memory that userspace does not
> >    have access to, map it into GGTT, and execute that batch buffer. I have a
> >    sense of how to do this for 1st-level batches, but it would need changes to
> >    tie in with the 2nd-level batch parsing I think, so I've again held off.
> 
> Yeah, we need the copying for otherwise the parsing is fairly pointless.
> I've stumbled over some of your internal patches and had a quick look at
> them. Two high-level comments:
> 
> - Using the existing active buffer lru instead of manual pinning would
>   better integrate with the eviction code. For an example of in-kernel
>   objects (and not userspace objects) using this look at the hw context
>   code.
> - Imo we should tag all buffers as purgeable while they're in the cache.
>   That way the shrinker will automatically drop the backing storage if
>   memory runs low (and thanks to the active list lru only when the gpu has
>   stopped processing the batch). That way we can just keep on allocating
>   buffers if they're all busy without any concern for running out of
>   memory.
> 
> I'll try to read through your patch pile in the next few days, this is
> just the very-very high-level stuff that came to mind immediately.
> 
> Cheers, Daniel
> > 
> > Brad Volkin (22):
> >   drm/i915: Add data structures for command parser
> >   drm/i915: Initial command parser table definitions
> >   drm/i915: Hook command parser tables up to rings
> >   drm/i915: Add per-ring command length decode functions
> >   drm/i915: Implement command parsing
> >   drm/i915: Add a HAS_CMD_PARSER getparam
> >   drm/i915: Add support for rejecting commands during parsing
> >   drm/i915: Add support for checking register accesses
> >   drm/i915: Add support for rejecting commands via bitmasks
> >   drm/i915: Reject unsafe commands
> >   drm/i915: Add register whitelists for mesa
> >   drm/i915: Enable register whitelist checks
> >   drm/i915: Enable bit checking for some commands
> >   drm/i915: Enable PPGTT command parser checks
> >   drm/i915: Reject commands that would store to global HWS page
> >   drm/i915: Reject additional commands
> >   drm/i915: Add parser data for perf monitoring GL extensions
> >   drm/i915: Reject MI_ARB_ON_OFF on VECS
> >   drm/i915: Fix length handling for MFX_WAIT
> >   drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
> >   drm/i915: Clean up command parser enable decision
> >   drm/i915: Enable command parsing by default
> > 
> >  drivers/gpu/drm/i915/Makefile              |   3 +-
> >  drivers/gpu/drm/i915/i915_cmd_parser.c     | 712 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_dma.c            |   3 +
> >  drivers/gpu/drm/i915/i915_drv.c            |   5 +
> >  drivers/gpu/drm/i915/i915_drv.h            |  96 ++++
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +
> >  drivers/gpu/drm/i915/i915_reg.h            |  66 +++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.h    |  25 +
> >  include/uapi/drm/i915_drm.h                |   1 +
> >  10 files changed, 927 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
> > 
> > -- 
> > 1.8.4.4
> > 
> 


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-26 20:24   ` Volkin, Bradley D
@ 2013-11-27  1:32     ` ykzhao
  2013-11-27  8:10       ` Daniel Vetter
  2013-12-04  8:13     ` Daniel Vetter
  2013-12-05 20:47     ` Volkin, Bradley D
  2 siblings, 1 reply; 138+ messages in thread
From: ykzhao @ 2013-11-27  1:32 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Tue, 2013-11-26 at 13:24 -0700, Volkin, Bradley D wrote:
> On Tue, Nov 26, 2013 at 11:35:38AM -0800, Daniel Vetter wrote:
> > Hi Brad,
> > 
> > On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 
> > > Certain OpenGL features (e.g. transform feedback, performance monitoring)
> > > require userspace code to submit batches containing commands such as
> > > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> > > generations of the hardware will noop these commands in "unsecure" batches
> > > (which includes all userspace batches submitted via i915) even though the
> > > commands may be safe and represent the intended programming model of the device.
> > > 
> > > This series introduces a software command parser similar in operation to the
> > > command parsing done in hardware for unsecure batches. However, the software
> > > parser allows some operations that would be noop'd by hardware, if the parser
> > > determines the operation is safe, and submits the batch as "secure" to prevent
> > > hardware parsing. Currently the series implements this on IVB and HSW.
> > > 
> > > The series is divided into several phases:
> > > 
> > > patches 01-09: These implement infrastructure and the command parsing algorithm,
> > >                all behind a module parameter. I expect some discussion and
> > > 	       rework, but hopefully there's nothing too controversial.
> > > patches 10-17: These define the checks performed by the parser.
> > >                I expect much discussion :)
> > > patches 18-20: In a final pass over the command checks, I found some issues with
> > >                the definitions. They looked painful to rebase in, so I've added
> > > 	       them here.
> > > patches 21-22: These enable the parser by default. It runs on all batches except
> > >                those that set the I915_EXEC_SECURE flag in the execbuffer2 call.
> > 
> > I think long-term we should even scan secure batches. We'd need to allow
> > some registers which only the drm master (i.e. owner of the display
> > hardware) is allowed to do, e.g. for scanline waits. But once we have that
> > we should be able to port all current users of secure batches over to
> > scanned batches and so enforce this everywhere by default.
> > 
> > The other issue is that igt tests assume to be able to run some evil
> > tests, so maybe we don't actually want this.
> 
> Agreed. I thought we could handle this as a follow-up task once the basic stuff is
> in place, particularly given that we'd want to modify at least some users to test.
> I also wasn't sure if we would want the check to be root && master, as in the current
> secure flag, or just master.
> 
> W.r.t. the tests, I suppose we can just turn checking on for secure batches and see
> what happens.
> 
> > 
> > > There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
> > > basic and do not test all of the commands used by the parser on the assumption
> > > that I'm likely to make the same mistakes in both the parser and the test.
> > 
> > Yeah, I agree that just checking whether commands all go through (or not)
> > as expected adds very little value on top of the few tests you have done.
> > I think we should take a look at some corner cases which might trip up
> > your checker a bit though:
> > - I think we should check batchbuffer chaining and make sure it works on
> >   the vcs ring and not anywhere else (we can't ever break shipping libva
> >   which uses this).
> > - Some tests to trip up your parser should be done, like 3D commands that
> >   fall off the end of the batch bo. Or commands that span page boundaries.
> >   The latter isn't an issue atm since you use vmap, but we should switch to
> >   per-page kmap since the vmap overhead is fairly horrible.
> 
> Good suggestions. I'll look into these.
Hi, Brad
      Some more input from libva about batchbuffer chaining.

      Batchbuffer chaining is widely used in the libva driver. This
follows from how the libva driver processes an image. For encoding, the
image is handled one macroblock (16x16) at a time, and every macroblock
needs its own group of GPU commands. The GPU commands for all the
macroblocks are therefore constructed in a second-level batchbuffer.
Batchbuffer chaining brings the following benefits:
      a. The size of the second-level batch buffer can be chosen based
on the size of the image being handled. For example, 1080p/720p/480p can
each use a different size.
      b. The GPU commands in the second-level batchbuffer can be
constructed by the GPU instead of the CPU, which helps performance.

      At the same time, both the VCS and the render ring are used by the
libva driver. For example, encoding uses both the VCS and RCS rings:
first the RCS ring executes the GPU commands for motion vector/mode
prediction, and then the VCS ring executes the GPU commands that
generate the bit-stream. So batchbuffer chaining is used not only on the
VCS ring but also on the render ring.
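
      To make point a. concrete, here is a small sketch of how the
second-level batch size scales with the image size. It is not taken from
the libva driver: mb_count() and second_level_batch_bytes() are
hypothetical helpers, and dwords_per_mb is an assumed per-macroblock
command size that in practice depends on the codec and the hardware
generation.

```c
#include <stddef.h>
#include <stdint.h>

#define MB_SIZE 16	/* macroblock is 16x16 pixels */

/* Number of 16x16 macroblocks covering a width x height image. */
static uint32_t mb_count(uint32_t width, uint32_t height)
{
	uint32_t mbs_x = (width + MB_SIZE - 1) / MB_SIZE;
	uint32_t mbs_y = (height + MB_SIZE - 1) / MB_SIZE;

	return mbs_x * mbs_y;
}

/*
 * Bytes needed for a second-level batch holding dwords_per_mb dwords
 * of commands per macroblock (hypothetical figure), plus one dword
 * for MI_BATCH_BUFFER_END, rounded up to an even number of dwords.
 */
static size_t second_level_batch_bytes(uint32_t width, uint32_t height,
				       uint32_t dwords_per_mb)
{
	size_t dwords = (size_t)mb_count(width, height) * dwords_per_mb + 1;

	dwords = (dwords + 1) & ~(size_t)1;
	return dwords * sizeof(uint32_t);
}
```

      A 1080p frame covers 8160 macroblocks, 720p covers 3600 and 480p
(720x480) covers 1350, so a fixed-size buffer would either waste memory
for small images or be too small for large ones.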
      
Thanks.
    Yakui
> 
> > 
> > > I've run the i-g-t gem_* tests, the piglit quick tests (w/Mesa git from a few
> > > days ago), and generally used an Ubuntu 13.10 IVB system with the parser
> > > running. Aside from a failure described below, I don't think there are any
> > > regressions. That is, piglit claims some regressions, but from manually running
> > > the tests I think these are false positives. However, I could use help in
> > > getting broader testing, particularly around performance. In general, I see less
> > > than 3% performance impact on HSW, with more like 10% impact for pathological
> > > batch sizes. But we'll certainly want to run relevant benchmarks beyond what
> > > I've done.
> > 
> > Yeah, a microbenchmark that just shovels MI_NOP batches of various sizes
> > through the checker and bypassing it (with EXEC_SECURE) would be really
> > good. Maybe even some variable-sized commands (all the state setup stuff
> > should be useful for that) to keep things interesting. Some variation is
> > also important to have some good cache thrashing going on (since your check
> > tables are fairly large I think).
> 
> Ok. I'd be interested in some comment from the mesa and libva guys here for real world
> workloads, but a microbenchmark would be a good start.
> 
> Which "state setup stuff" are you referring to? Something specific in i-g-t or something
> more general?
> 
> > 
> > > At this point there are a couple of known issues and potential improvements.
> > > 
> > > 1) VLV. The parser is currently disabled for VLV. One type of check performed by
> > >    the parser is that commands which access memory do so via PPGTT. VLV does not
> > >    have PPGTT enabled at this time. I chose to implement the PPGTT checks via
> > >    generic bit checking infrastructure in the parser, so they are not easily
> > >    disabled for VLV. For now, I'm disabling parsing altogether in the hope that
> > >    PPGTT can be enabled for VLV in the near future.
> > 
> > We need ppgtt for the parser anyway, since otherwise userspace can submit
> > a self-modifying batch. Checking for that is impossible, so allowing sw
> > checked batches without the ppgtt/ggtt split would be a decent security
> > hole.
> > 
> > > 2) Coherency. I've found two types of coherency issues when reading the batch
> > >    buffer from the CPU during execbuffer2. Looking for help with both issues.
> > >     i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the
> > >        parser isn't properly waiting for the operation to complete before
> > >        parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL])
> > >        but that actually caused more failures.
> > 
> > This synchronization should happen when processing the relocations. The
> > batch itself isn't modified by the gpu, we simply upload it using the
> > blitter. So this going wrong indicates there's some issue somewhere ...
> 
> Ok. I didn't debug too far. Putting a gem_sync() in the test between the upload and
> exec fixed the issue. Since I wasn't doing any explicit synchronization I assumed it
> was my issue.
> 
> > 
> > 
> > >    ii. Second, on VLV, I've seen cache coherency issues when userspace writes
> > >        the batch via pwrite fast path before calling execbuffer2. The parser
> > >        reads stale data. This works fine on IVB and HSW, so I believe it's an
> > >        LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or
> > >        synchronization is for this scenario.
> > 
> > Imo we take a good look at the optimized buffer read/write code from
> > i915_gem_shmem_pread (for reading the userspace batch) and
> > i915_gem_shmem_pwrite (for writing to the checked buffer). If we do the
> > checking in-line with the reading this should also bring down the overhead
> > massively compared to the current solution (those shmem read/write
> > functions are fairly optimized).
> 
> I'll take a look. So far I'm not seeing the vmap as a bottleneck and I'm a bit concerned
> about the complexity of trying to do mapping/parsing per-page, but I agree it's worth a
> more detailed analysis.
> 
> > 
> > > 3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands
> > >    in userspace batches without parsing them. The media driver uses 2nd-level
> > >    batches, so a solution is required. I have some ideas but don't want to delay
> > >    the review process for what I have so far. It may be that the 2nd-level
> > >    parsing is only needed for VCS and the current code (plus rejecting BBS)
> > >    would be sufficient for RCS.
> > 
> > Afaik only libva uses second-level batches, and only on the vcs. So I hope
> 
> I would be very happy if that was the case. I couldn't easily tell from reading the libva
> code which ring they were submitting to. I definitely did not find uses in mesa or the ddx.
> 
> > we can just submit those as unprivileged batches if possible. If that's
> > not possible it'll get fairly ugly I fear :(
> > 
> > > 4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
> > >    to avoid GPU modifications to buffers via EUs or commands in the batch, we
> > >    should copy the userspace batch buffer to memory that userspace does not
> > >    have access to, map it into GGTT, and execute that batch buffer. I have a
> > >    sense of how to do this for 1st-level batches, but it would need changes to
> > >    tie in with the 2nd-level batch parsing I think, so I've again held off.
> > 
> > Yeah, we need the copying for otherwise the parsing is fairly pointless.
> > I've stumbled over some of your internal patches and had a quick look at
> > them. Two high-level comments:
> > 
> > - Using the existing active buffer lru instead of manual pinning would
> >   better integrate with the eviction code. For an example of in-kernel
> >   objects (and not userspace objects) using this look at the hw context
> >   code.
> > - Imo we should tag all buffers as purgeable while they're in the cache.
> >   That way the shrinker will automatically drop the backing storage if
> >   memory runs low (and thanks to the active list lru only when the gpu has
> >   stopped processing the batch). That way we can just keep on allocating
> >   buffers if they're all busy without any concern for running out of
> >   memory.
> 
> Ok, I'll look at the hw context code for buffer mgmt. For "purgeable", just via the
> madv field in the i915 gem object?
> 
> Also, there are a couple iterations of the work-in-progress patches. Do you prefer a
> cache per ring or a single cache shared by all rings?
> 
> Thanks,
> Brad
> 
> > 
> > I'll try to read through your patch pile in the next few days, this is
> > just the very-very high-level stuff that came to mind immediately.
> > 
> > Cheers, Daniel
> > > 
> > > Brad Volkin (22):
> > >   drm/i915: Add data structures for command parser
> > >   drm/i915: Initial command parser table definitions
> > >   drm/i915: Hook command parser tables up to rings
> > >   drm/i915: Add per-ring command length decode functions
> > >   drm/i915: Implement command parsing
> > >   drm/i915: Add a HAS_CMD_PARSER getparam
> > >   drm/i915: Add support for rejecting commands during parsing
> > >   drm/i915: Add support for checking register accesses
> > >   drm/i915: Add support for rejecting commands via bitmasks
> > >   drm/i915: Reject unsafe commands
> > >   drm/i915: Add register whitelists for mesa
> > >   drm/i915: Enable register whitelist checks
> > >   drm/i915: Enable bit checking for some commands
> > >   drm/i915: Enable PPGTT command parser checks
> > >   drm/i915: Reject commands that would store to global HWS page
> > >   drm/i915: Reject additional commands
> > >   drm/i915: Add parser data for perf monitoring GL extensions
> > >   drm/i915: Reject MI_ARB_ON_OFF on VECS
> > >   drm/i915: Fix length handling for MFX_WAIT
> > >   drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
> > >   drm/i915: Clean up command parser enable decision
> > >   drm/i915: Enable command parsing by default
> > > 
> > >  drivers/gpu/drm/i915/Makefile              |   3 +-
> > >  drivers/gpu/drm/i915/i915_cmd_parser.c     | 712 +++++++++++++++++++++++++++++
> > >  drivers/gpu/drm/i915/i915_dma.c            |   3 +
> > >  drivers/gpu/drm/i915/i915_drv.c            |   5 +
> > >  drivers/gpu/drm/i915/i915_drv.h            |  96 ++++
> > >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +
> > >  drivers/gpu/drm/i915/i915_reg.h            |  66 +++
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
> > >  drivers/gpu/drm/i915/intel_ringbuffer.h    |  25 +
> > >  include/uapi/drm/i915_drm.h                |   1 +
> > >  10 files changed, 927 insertions(+), 1 deletion(-)
> > >  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
> > > 
> > > -- 
> > > 1.8.4.4
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-27  1:32     ` ykzhao
@ 2013-11-27  8:10       ` Daniel Vetter
  2013-11-27  8:23         ` Xiang, Haihao
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2013-11-27  8:10 UTC (permalink / raw)
  To: ykzhao; +Cc: intel-gfx

On Wed, Nov 27, 2013 at 09:32:32AM +0800, ykzhao wrote:
> On Tue, 2013-11-26 at 13:24 -0700, Volkin, Bradley D wrote:
> > On Tue, Nov 26, 2013 at 11:35:38AM -0800, Daniel Vetter wrote:
> > > Hi Brad,
> > > 
> > > On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> > > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > > 
> > > > Certain OpenGL features (e.g. transform feedback, performance monitoring)
> > > > require userspace code to submit batches containing commands such as
> > > > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> > > > generations of the hardware will noop these commands in "unsecure" batches
> > > > (which includes all userspace batches submitted via i915) even though the
> > > > commands may be safe and represent the intended programming model of the device.
> > > > 
> > > > This series introduces a software command parser similar in operation to the
> > > > command parsing done in hardware for unsecure batches. However, the software
> > > > parser allows some operations that would be noop'd by hardware, if the parser
> > > > determines the operation is safe, and submits the batch as "secure" to prevent
> > > > hardware parsing. Currently the series implements this on IVB and HSW.
> > > > 
> > > > The series is divided into several phases:
> > > > 
> > > > patches 01-09: These implement infrastructure and the command parsing algorithm,
> > > >                all behind a module parameter. I expect some discussion and
> > > > 	       rework, but hopefully there's nothing too controversial.
> > > > patches 10-17: These define the checks performed by the parser.
> > > >                I expect much discussion :)
> > > > patches 18-20: In a final pass over the command checks, I found some issues with
> > > >                the definitions. They looked painful to rebase in, so I've added
> > > > 	       them here.
> > > > patches 21-22: These enable the parser by default. It runs on all batches except
> > > >                those that set the I915_EXEC_SECURE flag in the execbuffer2 call.
> > > 
> > > I think long-term we should even scan secure batches. We'd need to allow
> > > some registers which only the drm master (i.e. owner of the display
> > > hardware) is allowed to do, e.g. for scanline waits. But once we have that
> > > we should be able to port all current users of secure batches over to
> > > scanned batches and so enforce this everywhere by default.
> > > 
> > > The other issue is that igt tests assume to be able to run some evil
> > > tests, so maybe we don't actually want this.
> > 
> > Agreed. I thought we could handle this as a follow-up task once the basic stuff is
> > in place, particularly given that we'd want to modify at least some users to test.
> > I also wasn't sure if we would want the check to be root && master, as in the current
> > secure flag, or just master.
> > 
> > W.r.t. the tests, I suppose we can just turn checking on for secure batches and see
> > what happens.
> > 
> > > 
> > > > There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
> > > > basic and do not test all of the commands used by the parser on the assumption
> > > > that I'm likely to make the same mistakes in both the parser and the test.
> > > 
> > > Yeah, I agree that just checking whether commands all go through (or not)
> > > as expected adds very little value on top of the few tests you have done.
> > > I think we should take a look at some corner cases which might trip up
> > > your checker a bit though:
> > > - I think we should check batchbuffer chaining and make sure it works on
> > >   the vcs ring and not anywhere else (we can't ever break shipping libva
> > >   which uses this).
> > > - Some tests to trip up your parser should be done, like 3D commands that
> > >   fall off the end of the batch bo. Or commands that span page boundaries.
> > > >   The latter isn't an issue atm since you use vmap, but we should switch to
> > >   per-page kmap since the vmap overhead is fairly horrible.
> > 
> > Good suggestions. I'll look into these.
> Hi, Brad
>       More inputs from libva about the batchbuffer chaining.
> 
>       Now the batchbuffer chaining is widely used in libva driver. This
> is related to how the libva driver processes the image. For the
> encoding purpose, it needs to be handled based on macroblock (16x16). And
> every macroblock needs a group of GPU commands. So the GPU commands for
> all the macroblocks will be constructed in the second-level batchbuffer.
> The mode of batchbuffer chaining will bring the following benefits:
>       a. The size of second-level batch buffer can be allocated based on
> the size of handled image. For example: 1080p/720p/480p can use the
> different size.
>       b. The gpu commands in second-level batchbuffer can be constructed
> by using GPU instead of CPU, which is helpful to improve the
> performance. 
> 
>       At the same time both VCS and Render Ring are used in libva
> driver. For example: The encoding will use VCS and RCS ring. Firstly the
> RCS ring is used to execute GPU command for the motion vector/mode
> prediction. And then the VCS Ring is used to execute the GPU command for
> generating the bit-stream. So not only VCS ring uses the mode of
> batchbuffer chaining, but also the Render Ring uses the mode of
> batchbuffer chaining.

So are these 2nd level batches constructed by the gpu in some cases? That
would be fairly horrible to take into account with the batch checker ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
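
One of the corner cases quoted above, a command that falls off the end of
the batch bo, can be pictured with a minimal sketch. Everything here is
hypothetical: the toy_ names are illustrative, and the length decode (low 8
bits of the header plus a bias of 2) is a simplifying assumption, not the
per-ring decode from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed toy decode: total dwords = low 8 bits of the header + 2. */
static size_t toy_cmd_length(uint32_t header)
{
	return (size_t)(header & 0xff) + 2;
}

/*
 * Walk the batch one command at a time.
 * Returns 0 if every command fits inside the buffer, -1 if a command
 * claims more dwords than remain before the end of the batch.
 */
static int toy_scan_batch(const uint32_t *batch, size_t ndwords)
{
	size_t pos = 0;

	while (pos < ndwords) {
		size_t len = toy_cmd_length(batch[pos]);

		if (len > ndwords - pos)
			return -1;	/* command runs past the end */
		pos += len;
	}
	return 0;
}
```

A test along these lines would place a long command header in the last
dword of the bo and expect the submission to be rejected.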

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-27  8:10       ` Daniel Vetter
@ 2013-11-27  8:23         ` Xiang, Haihao
  2013-11-27  8:31           ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Xiang, Haihao @ 2013-11-27  8:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, 2013-11-27 at 09:10 +0100, Daniel Vetter wrote: 
> On Wed, Nov 27, 2013 at 09:32:32AM +0800, ykzhao wrote:
> > On Tue, 2013-11-26 at 13:24 -0700, Volkin, Bradley D wrote:
> > > On Tue, Nov 26, 2013 at 11:35:38AM -0800, Daniel Vetter wrote:
> > > > Hi Brad,
> > > > 
> > > > On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> > > > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > > > 
> > > > > Certain OpenGL features (e.g. transform feedback, performance monitoring)
> > > > > require userspace code to submit batches containing commands such as
> > > > > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> > > > > generations of the hardware will noop these commands in "unsecure" batches
> > > > > (which includes all userspace batches submitted via i915) even though the
> > > > > commands may be safe and represent the intended programming model of the device.
> > > > > 
> > > > > This series introduces a software command parser similar in operation to the
> > > > > command parsing done in hardware for unsecure batches. However, the software
> > > > > parser allows some operations that would be noop'd by hardware, if the parser
> > > > > determines the operation is safe, and submits the batch as "secure" to prevent
> > > > > hardware parsing. Currently the series implements this on IVB and HSW.
> > > > > 
> > > > > The series is divided into several phases:
> > > > > 
> > > > > patches 01-09: These implement infrastructure and the command parsing algorithm,
> > > > >                all behind a module parameter. I expect some discussion and
> > > > > 	       rework, but hopefully there's nothing too controversial.
> > > > > patches 10-17: These define the checks performed by the parser.
> > > > >                I expect much discussion :)
> > > > > patches 18-20: In a final pass over the command checks, I found some issues with
> > > > >                the definitions. They looked painful to rebase in, so I've added
> > > > > 	       them here.
> > > > > patches 21-22: These enable the parser by default. It runs on all batches except
> > > > >                those that set the I915_EXEC_SECURE flag in the execbuffer2 call.
> > > > 
> > > > I think long-term we should even scan secure batches. We'd need to allow
> > > > some registers which only the drm master (i.e. owner of the display
> > > > hardware) is allowed to do, e.g. for scanline waits. But once we have that
> > > > we should be able to port all current users of secure batches over to
> > > > scanned batches and so enforce this everywhere by default.
> > > > 
> > > > The other issue is that igt tests assume to be able to run some evil
> > > > tests, so maybe we don't actually want this.
> > > 
> > > Agreed. I thought we could handle this as a follow-up task once the basic stuff is
> > > in place, particularly given that we'd want to modify at least some users to test.
> > > I also wasn't sure if we would want the check to be root && master, as in the current
> > > secure flag, or just master.
> > > 
> > > W.r.t. the tests, I suppose we can just turn checking on for secure batches and see
> > > what happens.
> > > 
> > > > 
> > > > > There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very
> > > > > basic and do not test all of the commands used by the parser on the assumption
> > > > > that I'm likely to make the same mistakes in both the parser and the test.
> > > > 
> > > > Yeah, I agree that just checking whether commands all go through (or not)
> > > > as expected adds very little value on top of the few tests you have done.
> > > > I think we should take a look at some corner cases which might trip up
> > > > your checker a bit though:
> > > > - I think we should check batchbuffer chaining and make sure it works on
> > > >   the vcs ring and not anywhere else (we can't ever break shipping libva
> > > >   which uses this).
> > > > - Some tests to trip up your parser should be done, like 3D commands that
> > > >   fall off the end of the batch bo. Or commands that span page boundaries.
> > > > >   The latter isn't an issue atm since you use vmap, but we should switch to
> > > >   per-page kmap since the vmap overhead is fairly horrible.
> > > 
> > > Good suggestions. I'll look into these.
> > Hi, Brad
> >       More inputs from libva about the batchbuffer chaining.
> > 
> >       Now the batchbuffer chaining is widely used in libva driver. This
> > is related to how the libva driver processes the image. For the
> > encoding purpose, it needs to be handled based on macroblock (16x16). And
> > every macroblock needs a group of GPU commands. So the GPU commands for
> > all the macroblocks will be constructed in the second-level batchbuffer.
> > The mode of batchbuffer chaining will bring the following benefits:
> >       a. The size of second-level batch buffer can be allocated based on
> > the size of handled image. For example: 1080p/720p/480p can use the
> > different size.
> >       b. The gpu commands in second-level batchbuffer can be constructed
> > by using GPU instead of CPU, which is helpful to improve the
> > performance. 
> > 
> >       At the same time both VCS and Render Ring are used in libva
> > driver. For example: The encoding will use VCS and RCS ring. Firstly the
> > RCS ring is used to execute GPU command for the motion vector/mode
> > prediction. And then the VCS Ring is used to execute the GPU command for
> > generating the bit-stream. So not only VCS ring uses the mode of
> > batchbuffer chaining, but also the Render Ring uses the mode of
> > batchbuffer chaining.
> 
> So are these 2nd level batches constructed by the gpu in some cases? That
> would be fairly horrible to take into account with the batch checker ...

It is *not* the 2nd level batch buffer (bit 22 isn't set). Only batch
buffer chain is used.


> -Daniel
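
The distinction drawn above (bit 22 set means a 2nd-level batch, bit 22
clear means plain chaining) can be sketched as a header check. This is only
an illustration: the toy_ names are hypothetical, and the opcode value and
field positions are assumptions about the MI_BATCH_BUFFER_START encoding
rather than something stated in this thread, apart from bit 22 itself.

```c
#include <stdint.h>

/* Assumed encoding: command type in bits 31:29 (0 = MI), opcode in 28:23. */
#define TOY_CMD_TYPE(h)		((h) >> 29)
#define TOY_MI_OPCODE(h)	(((h) >> 23) & 0x3f)
#define TOY_MI_BBS_OPCODE	0x31u		/* assumed opcode value */
#define TOY_BBS_2ND_LEVEL	(1u << 22)	/* bit 22, as named above */

enum toy_bbs_kind {
	TOY_BBS_NONE,		/* not a batch buffer start at all */
	TOY_BBS_CHAINED,	/* batch buffer start, bit 22 clear */
	TOY_BBS_SECOND_LEVEL,	/* batch buffer start, bit 22 set */
};

static enum toy_bbs_kind toy_classify_bbs(uint32_t header)
{
	if (TOY_CMD_TYPE(header) != 0 ||
	    TOY_MI_OPCODE(header) != TOY_MI_BBS_OPCODE)
		return TOY_BBS_NONE;

	return (header & TOY_BBS_2ND_LEVEL) ? TOY_BBS_SECOND_LEVEL
					    : TOY_BBS_CHAINED;
}
```

Either kind still leaves the parser with the question of who wrote the
target buffer, which is the harder problem.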

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-27  8:23         ` Xiang, Haihao
@ 2013-11-27  8:31           ` Daniel Vetter
  2013-11-27  8:42             ` Xiang, Haihao
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2013-11-27  8:31 UTC (permalink / raw)
  To: Xiang, Haihao; +Cc: intel-gfx

On Wed, Nov 27, 2013 at 9:23 AM, Xiang, Haihao <haihao.xiang@intel.com> wrote:
>> So are these 2nd level batches constructed by the gpu in some cases? That
>> would be fairly horrible to take into account with the batch checker ...
>
> It is *not* the 2nd level batch buffer (bit 22 isn't set). Only batch
> buffer chain is used.

That's not really the hard part for the command checker, the important
question is whether the gpu generates these batches or whether they're
constructed by the cpu.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-27  8:31           ` Daniel Vetter
@ 2013-11-27  8:42             ` Xiang, Haihao
  2013-11-27  8:47               ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Xiang, Haihao @ 2013-11-27  8:42 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, 2013-11-27 at 09:31 +0100, Daniel Vetter wrote: 
> On Wed, Nov 27, 2013 at 9:23 AM, Xiang, Haihao <haihao.xiang@intel.com> wrote:
> >> So are these 2nd level batches constructed by the gpu in some cases? That
> >> would be fairly horrible to take into account with the batch checker ...
> >
> > It is *not* the 2nd level batch buffer (bit 22 isn't set). Only batch
> > buffer chain is used.
> 
> That's not really the hard part for the command checker, the important
> question is whether the gpu generates these batches or whether they're
> constructed by the cpu.

Yes, some batches are generated by GPU, either by EU shaders or by BSD
unit (batch buffer for MC on ILK).


> -Daniel

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-27  8:42             ` Xiang, Haihao
@ 2013-11-27  8:47               ` Daniel Vetter
  2013-11-27  8:54                 ` Xiang, Haihao
  2013-11-27  8:55                 ` ykzhao
  0 siblings, 2 replies; 138+ messages in thread
From: Daniel Vetter @ 2013-11-27  8:47 UTC (permalink / raw)
  To: Xiang, Haihao; +Cc: intel-gfx

On Wed, Nov 27, 2013 at 04:42:11PM +0800, Xiang, Haihao wrote:
> On Wed, 2013-11-27 at 09:31 +0100, Daniel Vetter wrote: 
> > On Wed, Nov 27, 2013 at 9:23 AM, Xiang, Haihao <haihao.xiang@intel.com> wrote:
> > >> So are these 2nd level batches constructed by the gpu in some cases? That
> > >> would be fairly horrible to take into account with the batch checker ...
> > >
> > > It is *not* the 2nd level batch buffer (bit 22 isn't set). Only batch
> > > buffer chain is used.
> > 
> > That's not really the hard part for the command checker, the important
> > question is whether the gpu generates these batches or whether they're
> > constructed by the cpu.
> 
> Yes, some batches are generated by GPU, either by EU shaders or by BSD
> unit (batch buffer for MC on ILK).

So is ilk the only platform which does that? The command checker is only
for gen7+ (and maybe gen6).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-27  8:47               ` Daniel Vetter
@ 2013-11-27  8:54                 ` Xiang, Haihao
  2013-11-27  8:55                 ` ykzhao
  1 sibling, 0 replies; 138+ messages in thread
From: Xiang, Haihao @ 2013-11-27  8:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, 2013-11-27 at 09:47 +0100, Daniel Vetter wrote: 
> On Wed, Nov 27, 2013 at 04:42:11PM +0800, Xiang, Haihao wrote:
> > On Wed, 2013-11-27 at 09:31 +0100, Daniel Vetter wrote: 
> > > On Wed, Nov 27, 2013 at 9:23 AM, Xiang, Haihao <haihao.xiang@intel.com> wrote:
> > > >> So are these 2nd level batches constructed by the gpu in some cases? That
> > > >> would be fairly horrible to take into account with the batch checker ...
> > > >
> > > > It is *not* the 2nd level batch buffer (bit 22 isn't set). Only batch
> > > > buffer chain is used.
> > > 
> > > That's not really the hard part for the command checker, the important
> > > question is whether the gpu generates these batches or whether they're
> > > constructed by the cpu.
> > 
> > Yes, some batches are generated by GPU, either by EU shaders or by BSD
> > unit (batch buffer for MC on ILK).
> 
> So is ilk the only platform which does that? The command checker is only
> for gen7+ (and maybe gen6).

No. In libva some batches are generated by the BSD unit on ILK. On Gen6+,
some batches are constructed by GPU shaders.


> -Daniel

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-27  8:47               ` Daniel Vetter
  2013-11-27  8:54                 ` Xiang, Haihao
@ 2013-11-27  8:55                 ` ykzhao
  1 sibling, 0 replies; 138+ messages in thread
From: ykzhao @ 2013-11-27  8:55 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, 2013-11-27 at 09:47 +0100, Daniel Vetter wrote:
> On Wed, Nov 27, 2013 at 04:42:11PM +0800, Xiang, Haihao wrote:
> > On Wed, 2013-11-27 at 09:31 +0100, Daniel Vetter wrote: 
> > > On Wed, Nov 27, 2013 at 9:23 AM, Xiang, Haihao <haihao.xiang@intel.com> wrote:
> > > >> So are these 2nd level batches constructed by the gpu in some cases? That
> > > >> would be fairly horrible to take into account with the batch checker ...
> > > >
> > > > It is *not* the 2nd level batch buffer (bit 22 isn't set). Only batch
> > > > buffer chain is used.
> > > 
> > > That's not really the hard part for the command checker, the important
> > > question is whether the gpu generates these batches or whether they're
> > > constructed by the cpu.
> > 
> > Yes, some batches are generated by GPU, either by EU shaders or by BSD
> > unit (batch buffer for MC on ILK).
> 
> So is ilk the only platform which does that? The command checker is only
> for gen7+ (and maybe gen6).

No. Ivy Bridge and Haswell also use the GPU to construct the batchbuffer.
This is mainly a performance optimization.

Thanks.
     Yakui

> -Daniel

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam
  2013-11-26 16:51 ` [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam bradley.d.volkin
@ 2013-11-27 12:51   ` Daniel Vetter
  2013-12-05  9:38     ` Kenneth Graunke
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2013-11-27 12:51 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 08:51:23AM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> So userspace can query the kernel for command parser support.
> 
> OTC-Tracker: AXIA-4631
> Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c | 3 +++
>  include/uapi/drm/i915_drm.h     | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 5aeb103..f0a4638 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1003,6 +1003,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
>  		value = 1;
>  		break;
> +	case I915_PARAM_HAS_CMD_PARSER:
> +		value = 1;

I think we need to have a CMD_PARSER_VER flag here which we can increment
every time we add new registers for additional features. Examples would be
extensions for OA, or the L3 cache control stuff the media/compute guys
want. I think we should also add a comment (maybe right here) which
explains for each version what has been added. Otherwise userspace has no
idea what kind of additional restricted commands it can submit.
-Daniel

> +		break;
>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 52aed89..48cc277 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -337,6 +337,7 @@ typedef struct drm_i915_irq_wait {
>  #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
>  #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
>  #define I915_PARAM_HAS_WT     	 	 27
> +#define I915_PARAM_HAS_CMD_PARSER	 28
>  
>  typedef struct drm_i915_getparam {
>  	int param;
> -- 
> 1.8.4.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
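
A minimal sketch of the versioned getparam idea, assuming a single integer
that is bumped whenever the parser accepts more registers. All names here
(toy_getparam, TOY_PARAM_CMD_PARSER_VERSION, toy_feature_allowed) are
hypothetical and the version log is a placeholder, not the actual history.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical parameter number; the patch uses 28 for HAS_CMD_PARSER. */
#define TOY_PARAM_CMD_PARSER_VERSION	28

/*
 * Hypothetical version log, kept next to the getparam:
 *   1: initial parser and register whitelist
 * Bump the number and add a line here whenever more registers are allowed.
 */
#define TOY_CMD_PARSER_VERSION		1

/* Kernel side: report the parser version instead of a plain 0/1 flag. */
static int toy_getparam(int param, int *value)
{
	switch (param) {
	case TOY_PARAM_CMD_PARSER_VERSION:
		*value = TOY_CMD_PARSER_VERSION;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Userspace side: only emit commands whose minimum version is met. */
static bool toy_feature_allowed(int kernel_version, int required_version)
{
	return kernel_version >= required_version;
}
```

With this shape userspace compares the reported number against the version
that introduced the registers it needs, rather than guessing.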

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-26 20:24   ` Volkin, Bradley D
  2013-11-27  1:32     ` ykzhao
@ 2013-12-04  8:13     ` Daniel Vetter
  2013-12-04  8:22       ` Daniel Vetter
  2013-12-05  1:40       ` Volkin, Bradley D
  2013-12-05 20:47     ` Volkin, Bradley D
  2 siblings, 2 replies; 138+ messages in thread
From: Daniel Vetter @ 2013-12-04  8:13 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 9:24 PM, Volkin, Bradley D <bradley.d.volkin@intel.com> wrote:

[snip]

> Which "state setup stuff" are you referring to? Something specific in i-g-t or something
> more general?

The state setup 3D commands as opposed to doing actual rendering commands
(with 3D_PRIM). Just to have a bit more realistic cs opcode lengths for
micro-benchmarking.

[snip]

> Ok, I'll look at the hw context code for buffer mgmt. For "purgeable", just via the
> madv field in the i915 gem object?

Yeah, though I'd extract two tiny helpers (maybe shared with the madvise
ioctl) to set an object to purgeable and then resurrect it. The latter can
obviously fail. The helpers are just so we have a place to throw debug
asserts into, maybe there are other needs for in-kernel caches.

> Also, there are a couple iterations of the work-in-progress patches. Do you prefer a
> cache per ring or a single cache shared by all rings?

I've pondered a bunch of reasons for/against the two approaches and I
think it won't really matter. Maybe slightly leaning towards per-ring
caches since then objects retire in order. Well, until we do preemption
;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-12-04  8:13     ` Daniel Vetter
@ 2013-12-04  8:22       ` Daniel Vetter
  2013-12-05  1:40       ` Volkin, Bradley D
  1 sibling, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2013-12-04  8:22 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Dec 4, 2013 at 9:13 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> Ok, I'll look at the hw context code for buffer mgmt. For "purgeable", just via the
>> madv field in the i915 gem object?
>
> Yeah, though I'd extract two tiny helpers (maybe shared with the madvise
> ioctl) to set an object to purgeable and then resurrect it. The latter can
> obviously fail. The helpers are just so we have a place to throw debug
> asserts into, maybe there are other needs for in-kernel caches.

Since we've had tons of fun in the past few months with low memory
handling I think an evil testcase to exercise this cache purging logic
a bit would be good. Quick idea would be to submit dummy batches that
double in size every time (to ensure we have a miss in the cache),
with the last batch carefully sized so that the userspace batch plus the
kernel copy will still fit, but not together with all the older cached
batches.

I expect that our other memory thrashing tests will also exercise the
cache purging, but having a specific test which doesn't exercise
anything else really helps to separate issues.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
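
The sizing for such a test could be planned roughly as follows. This is a
sketch under stated assumptions: the helper names are hypothetical, sizes
are page multiples, "budget" stands for whatever memory the test can fill,
and only the kernel copies of the older batches are assumed to stay cached.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u

/*
 * Fill sizes[] with doubling batch sizes followed by one final batch.
 * Doubling stops while a larger final batch still fits twice over.
 * Returns the number of entries written.
 */
static size_t toy_plan_batch_sizes(uint64_t budget, uint64_t *sizes,
				   size_t max)
{
	size_t n = 0;
	uint64_t size = TOY_PAGE_SIZE;

	while (n + 1 < max && size * 4 <= budget) {
		sizes[n++] = size;
		size *= 2;
	}

	if (n < max) {
		/* Largest page multiple whose two copies fit in the budget. */
		uint64_t last = (budget / 2) / TOY_PAGE_SIZE * TOY_PAGE_SIZE;

		if (last > 0)
			sizes[n++] = last;
	}
	return n;
}

/*
 * True if the last batch plus its kernel copy fits in the budget on its
 * own, but not together with the cached copies of all older batches.
 */
static int toy_final_batch_forces_purge(const uint64_t *sizes, size_t n,
					uint64_t budget)
{
	uint64_t older = 0;
	size_t i;

	if (n == 0)
		return 0;
	for (i = 0; i + 1 < n; i++)
		older += sizes[i];

	return 2 * sizes[n - 1] <= budget &&
	       2 * sizes[n - 1] + older > budget;
}
```

Each step is a new size, so every submission misses the cache, and the
final submission can only succeed if older cached copies get purged.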

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-12-04  8:13     ` Daniel Vetter
  2013-12-04  8:22       ` Daniel Vetter
@ 2013-12-05  1:40       ` Volkin, Bradley D
  2013-12-05  7:48         ` Daniel Vetter
  1 sibling, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2013-12-05  1:40 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Dec 04, 2013 at 12:13:39AM -0800, Daniel Vetter wrote:
> On Tue, Nov 26, 2013 at 9:24 PM, Volkin, Bradley D <bradley.d.volkin@intel.com> wrote:
> 
> [snip]
> 
> > Which "state setup stuff" are you referring to? Something specific in i-g-t or something
> > more general?
> 
> The state setup 3D commands as opposed to doing actual rendering commands
> (with 3D_PRIM). Just to have a bit more realistic cs opcode lengths for
> micro-benchmarking.
> 
> [snip]
> 
> > Ok, I'll look at the hw context code for buffer mgmt. For "purgeable", just via the
> > madv field in the i915 gem object?
> 
> Yeah, though I'd extract two tiny helpers (maybe shared with the madvise
> > ioctl) to set an object to purgeable and then resurrect it. The latter can
> obviously fail. The helpers are just so we have a place to throw debug
> asserts into, maybe there are other needs for in-kernel caches.

Hmm. Not sure how much use the helpers would be. The ioctl has one case where it will
truncate() the object, but beyond that it just sets madv appropriately. In general,
purging and resurrecting appear to happen via a put_pages() from the shrinker and a
get_pages() the next time the object is needed.

Also, I haven't looked too closely at the shrinker code. Could there be potential races
between the shrinker purging a buffer and an execbuf call choosing it? I would expect
that synchronization is already in place. Just curious if you know off the top of your head.

Brad

> 
> > Also, there are a couple iterations of the work-in-progress patches. Do you prefer a
> > cache per ring or a single cache shared by all rings?
> 
> I've pondered a bunch of reasons for/against the two approaches and I
> think it won't really matter. Maybe slightly leaning towards per-ring
> caches since then objects retire in order. Well, until we do preemption
> ;-)
> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-12-05  1:40       ` Volkin, Bradley D
@ 2013-12-05  7:48         ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2013-12-05  7:48 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Thu, Dec 5, 2013 at 2:40 AM, Volkin, Bradley D
<bradley.d.volkin@intel.com> wrote:
>> > Ok, I'll look at the hw context code for buffer mgmt. For "purgeable", just via the
>> > madv field in the i915 gem object?
>>
>> Yeah, though I'd extract two tiny helpers (maybe shared with the madvise
>> ioctl) to set an object to purgeable and then resurrect it. The latter can
>> obviously fail. The helpers are just so we have a place to throw debug
>> asserts into, maybe there are other needs for in-kernel caches.
>
> Hmm. Not sure how much use the helpers would be. The ioctl has one case where it will
> truncate() the object, but beyond that it just sets madv appropriately. In general,
> purging and resurrecting appear to happen via a put_pages() from the shrinker and a
> get_pages() the next time the object is needed.

Yeah, the helpers would indeed be minimal. But they'd allow us to
shovel checks like obj->pin_count or similar stuff in there to
sanity-check users.

> Also, I haven't looked too closely at the shrinker code. Could there be potential races
> between the shrinker purging a buffer and an execbuf call choosing it? I would expect
> that synchronization is already in place. Just curious if you know off the top of your head.

As long as you try to set the object back to WILLNEED before you start
using it and free your reference if that fails (i.e. its state is
PURGED) then the magic just happens.

I just reviewed the code a bit and it seems like we're missing a few
sanity checks and don't properly reject purgeable objects in e.g.
bind_to_vm.

For marking the object purgeable you only have to take care to do that
after it's on the active list. The shrinker will then make sure to wait
for the gpu to complete processing before it drops the backing
storage. Misordering here would be caught by an obj->pin_count check,
hence why I've suggested that.
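
As an illustration only (not code from the i915 driver), the flow described
above can be sketched with a toy model. All names here (toy_obj,
toy_mark_purgeable, toy_shrink, toy_resurrect) are hypothetical, and the
asserts reflect one reading of where the suggested pin_count sanity check
would sit:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a cached buffer's madvise state; not the i915 structures. */
enum toy_madv { TOY_WILLNEED, TOY_DONTNEED, TOY_PURGED };

struct toy_obj {
	enum toy_madv madv;
	int pin_count;	/* nonzero while pinned for use */
	bool active;	/* modelled "on the active list" */
	int refcount;	/* reference held by the cache */
};

/* Mark a cached object purgeable; only valid once active and unpinned. */
void toy_mark_purgeable(struct toy_obj *obj)
{
	assert(obj->active);		/* must already be on the active list */
	assert(obj->pin_count == 0);	/* catches misordered callers */
	if (obj->madv != TOY_PURGED)
		obj->madv = TOY_DONTNEED;
}

/* Models the shrinker: only idle purgeable objects lose their storage. */
void toy_shrink(struct toy_obj *obj)
{
	if (obj->madv == TOY_DONTNEED && !obj->active)
		obj->madv = TOY_PURGED;
}

/*
 * Try to reuse a cached object. Returns true if it is usable again;
 * if it was purged, the cache's reference is dropped and false is returned.
 */
bool toy_resurrect(struct toy_obj *obj)
{
	if (obj->madv == TOY_PURGED) {
		obj->refcount--;
		return false;
	}
	obj->madv = TOY_WILLNEED;
	return true;
}
```

A caller that gets false from toy_resurrect() would allocate a fresh buffer
instead of reusing the purged one.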
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam
  2013-11-27 12:51   ` Daniel Vetter
@ 2013-12-05  9:38     ` Kenneth Graunke
  2013-12-05 17:22       ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Kenneth Graunke @ 2013-12-05  9:38 UTC (permalink / raw)
  To: Daniel Vetter, bradley.d.volkin; +Cc: intel-gfx

On 11/27/2013 04:51 AM, Daniel Vetter wrote:
> On Tue, Nov 26, 2013 at 08:51:23AM -0800, bradley.d.volkin@intel.com wrote:
>> From: Brad Volkin <bradley.d.volkin@intel.com>
>>
>> So userspace can query the kernel for command parser support.
>>
>> OTC-Tracker: AXIA-4631
>> Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
>> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_dma.c | 3 +++
>>  include/uapi/drm/i915_drm.h     | 1 +
>>  2 files changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>> index 5aeb103..f0a4638 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -1003,6 +1003,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>>  	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
>>  		value = 1;
>>  		break;
>> +	case I915_PARAM_HAS_CMD_PARSER:
>> +		value = 1;
> 
> I think we need to have a CMD_PARSER_VER flag here which we can increment
> every time we add new registers for additional features. Examples would be
> extensions for OA, or the L3 cache control stuff the media/compute guys
> want. I think we should also add a comment (maybe right here) which
> explains for each version what has been added. Otherwise userspace has no
> idea what kind of additional restricted commands it can submit.
> -Daniel

The CMD_PARSER_VER idea makes sense to me, too.  That way userspace can
easily check whether it's allowed to do something, and we can extend
that in the future.

--Ken


* Re: [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam
  2013-12-05  9:38     ` Kenneth Graunke
@ 2013-12-05 17:22       ` Volkin, Bradley D
  2013-12-05 17:26         ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2013-12-05 17:22 UTC (permalink / raw)
  To: Kenneth Graunke; +Cc: intel-gfx

On Thu, Dec 05, 2013 at 01:38:00AM -0800, Kenneth Graunke wrote:
> On 11/27/2013 04:51 AM, Daniel Vetter wrote:
> > On Tue, Nov 26, 2013 at 08:51:23AM -0800, bradley.d.volkin@intel.com wrote:
> >> From: Brad Volkin <bradley.d.volkin@intel.com>
> >>
> >> So userspace can query the kernel for command parser support.
> >>
> >> OTC-Tracker: AXIA-4631
> >> Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
> >> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/i915_dma.c | 3 +++
> >>  include/uapi/drm/i915_drm.h     | 1 +
> >>  2 files changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> >> index 5aeb103..f0a4638 100644
> >> --- a/drivers/gpu/drm/i915/i915_dma.c
> >> +++ b/drivers/gpu/drm/i915/i915_dma.c
> >> @@ -1003,6 +1003,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
> >>  	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
> >>  		value = 1;
> >>  		break;
> >> +	case I915_PARAM_HAS_CMD_PARSER:
> >> +		value = 1;
> > 
> > I think we need to have a CMD_PARSER_VER flag here which we can increment
> > every time we add new registers for additional features. Examples would be
> > extensions for OA, or the L3 cache control stuff the media/compute guys
> > want. I think we should also add a comment (maybe right here) which
> > explains for each version what has been added. Otherwise userspace has no
> > idea what kind of additional restricted commands it can submit.
> > -Daniel
> 
> The CMD_PARSER_VER idea makes sense to me, too.  That way userspace can
> easily check whether it's allowed to do something, and we can extend
> that in the future.

Ok. Any reason to keep both HAS_CMD_PARSER and CMD_PARSER_VER? Seems like
just the version should be enough.

Brad

> 
> --Ken


* Re: [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam
  2013-12-05 17:22       ` Volkin, Bradley D
@ 2013-12-05 17:26         ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2013-12-05 17:26 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Thu, Dec 5, 2013 at 6:22 PM, Volkin, Bradley D
<bradley.d.volkin@intel.com> wrote:
> Ok. Any reason to keep both HAS_CMD_PARSER and CMD_PARSER_VER? Seems like
> just the version should be enough.

Just the version should be good enough, if it's not there userspace
will get an -EINVAL. And we can just say that if the version is 0
there's no command parser. So no need for an additional flag.
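
For illustration, the userspace side of that convention might look like the
sketch below. It deliberately avoids real ioctl calls: 'ret' stands for the
result of a getparam query (0 or a negative errno) and 'value' for the number
the kernel returned. Both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Map the result of a (hypothetical) CMD_PARSER_VERSION getparam query to
 * a version number. Version 0 means "no command parser".
 */
int toy_cmd_parser_version(int ret, int value)
{
	if (ret == -EINVAL)	/* kernel does not know the parameter */
		return 0;
	if (ret < 0)		/* any other failure: assume no parser */
		return 0;
	return value;
}

/* True if the parser is new enough for a feature added in min_version. */
bool toy_parser_allows(int version, int min_version)
{
	assert(min_version >= 1);	/* 0 is reserved for "no parser" */
	return version >= min_version;
}
```

With a single versioned parameter, an old kernel (-EINVAL) and a kernel
without a parser (version 0) end up on the same code path in userspace.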
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-26 20:24   ` Volkin, Bradley D
  2013-11-27  1:32     ` ykzhao
  2013-12-04  8:13     ` Daniel Vetter
@ 2013-12-05 20:47     ` Volkin, Bradley D
  2013-12-05 23:42       ` Daniel Vetter
  2 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2013-12-05 20:47 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 12:24:14PM -0800, Volkin, Bradley D wrote:
> On Tue, Nov 26, 2013 at 11:35:38AM -0800, Daniel Vetter wrote:
> > I think long-term we should even scan secure batches. We'd need to allow
> > some registers which only the drm master (i.e. owner of the display
> > hardware) is allowed to do, e.g. for scanline waits. But once we have that
> > we should be able to port all current users of secure batches over to
> > scanned batches and so enforce this everywhere by default.
> > 
> > The other issue is that igt tests assume to be able to run some evil
> > tests, so maybe we don't actually want this.
> 
> Agreed. I thought we could handle this as a follow-up task once the basic stuff is
> in place, particularly given that we'd want to modify at least some users to test.
> I also wasn't sure if we would want the check to be root && master, as in the current
> secure flag, or just master.

So my plan to initially not parse secure batches might be shot. During further testing,
I found that it looks like Ubuntu 13.10 ships with fdo bug 71328 out of the box (sna
doesn't set the EXEC_SECURE flag when doing scanline waits). Sooo...

If we parse all batches and allow extra commands/registers from the drm master, should
that list just be the commands/registers used for scanline waits? Are there others you
can think of?

Brad


* Re: [RFC 05/22] drm/i915: Implement command parsing
  2013-11-26 17:56       ` Chris Wilson
  2013-11-26 18:55         ` Volkin, Bradley D
@ 2013-12-05 21:10         ` Volkin, Bradley D
  1 sibling, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2013-12-05 21:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 09:56:09AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 09:38:55AM -0800, Volkin, Bradley D wrote:
> > On Tue, Nov 26, 2013 at 09:29:32AM -0800, Chris Wilson wrote:
> > > On Tue, Nov 26, 2013 at 08:51:22AM -0800, bradley.d.volkin@intel.com wrote:
> > > > +static const struct drm_i915_cmd_descriptor*
> > > > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > > > +		  u32 cmd_header)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < table->count; i++) {
> > > > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > > > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > > > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > > > +
> > > > +		if (masked_cmd == masked_value)
> > > > +			return desc;
> > > 
> > > Maybe pre-sort the cmd table and use bsearch? And a runtime test on
> > > module load to check that the table is sorted correctly.
> > 
> > So far it doesn't look like the search is a bottleneck, but I've tried to keep
> > the tables sorted so that we could easily switch to bsearch if needed. Would
> > you prefer to just use bsearch from the start?
> 
> Yes. I think it will be easier (especially if the code does the runtime
> check) to keep the table sorted as commands are incrementally added in the
> future, rather than having to retrofit the bsearch when it becomes a
> significant problem.
> -Chris

A quick test is showing that bsearch() with the current table definitions is worse
than the linear search by ~7%. If I flatten the tables so there's one table with
all of the commands for a given ring, bsearch() is ~2% better than linear search.

So I'm inclined to add the code to check that the list is sorted but stick with linear
search for now. I'll revisit when we have more complete performance data.
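
A minimal sketch of that combination (a sortedness check at init plus the
existing linear lookup), using a simplified stand-in for the descriptor
struct. The sort key (masked command value, strictly ascending) and all
toy_* names are assumptions for illustration, not what the series does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the series' command descriptor. */
struct toy_cmd_desc {
	uint32_t mask;	/* header bits that identify the command */
	uint32_t value;	/* expected value of those bits */
};

/* True if the table is in strictly ascending order of masked value. */
bool toy_table_is_sorted(const struct toy_cmd_desc *table, size_t count)
{
	size_t i;

	for (i = 1; i < count; i++) {
		uint32_t prev = table[i - 1].value & table[i - 1].mask;
		uint32_t curr = table[i].value & table[i].mask;

		if (curr <= prev)
			return false;
	}
	return true;
}

/* Linear lookup: first descriptor whose masked bits match the header. */
const struct toy_cmd_desc *
toy_find_cmd(const struct toy_cmd_desc *table, size_t count, uint32_t header)
{
	size_t i;

	assert(table != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if ((header & table[i].mask) ==
		    (table[i].value & table[i].mask))
			return &table[i];
	}
	return NULL;
}
```

Keeping the check in place means a later switch to bsearch() would only
change the lookup function, not the table definitions.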

Brad

> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-12-05 20:47     ` Volkin, Bradley D
@ 2013-12-05 23:42       ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2013-12-05 23:42 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Thu, Dec 5, 2013 at 9:47 PM, Volkin, Bradley D
<bradley.d.volkin@intel.com> wrote:
> On Tue, Nov 26, 2013 at 12:24:14PM -0800, Volkin, Bradley D wrote:
>> On Tue, Nov 26, 2013 at 11:35:38AM -0800, Daniel Vetter wrote:
>> > I think long-term we should even scan secure batches. We'd need to allow
>> > some registers which only the drm master (i.e. owner of the display
>> > hardware) is allowed to do, e.g. for scanline waits. But once we have that
>> > we should be able to port all current users of secure batches over to
>> > scanned batches and so enforce this everywhere by default.
>> >
>> > The other issue is that igt tests assume to be able to run some evil
>> > tests, so maybe we don't actually want this.
>>
>> Agreed. I thought we could handle this as a follow-up task once the basic stuff is
>> in place, particularly given that we'd want to modify at least some users to test.
>> I also wasn't sure if we would want the check to be root && master, as in the current
>> secure flag, or just master.
>
> So my plan to initially not parse secure batches might be shot. During further testing,
> I found that it looks like Ubuntu 13.10 ships with fdo bug 71328 out of the box (sna
> doesn't set the EXEC_SECURE flag when doing scanline waits). Sooo...
>
> If we parse all batches and allow extra commands/registers from the drm master, should
> that list just be the commands/registers used for scanline waits? Are there others you
> can think of?

No, I think the additional stuff the drm master is allowed to do is
just the scanline waits. Everything else that the ddx might or might
not be doing should also be possible as a normal process (e.g. the
BTILE register to switch the blitter into y-tiled mode). But Chris
should know if there's anything else really.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-26 19:35 ` [RFC 00/22] Gen7 batch buffer command parser Daniel Vetter
  2013-11-26 20:24   ` Volkin, Bradley D
  2013-11-27  1:26   ` Xiang, Haihao
@ 2013-12-11  0:58   ` Volkin, Bradley D
  2013-12-11  9:54     ` Daniel Vetter
  2 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2013-12-11  0:58 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

[snip]

On Tue, Nov 26, 2013 at 11:35:38AM -0800, Daniel Vetter wrote:
> > 2) Coherency. I've found two types of coherency issues when reading the batch
> >    buffer from the CPU during execbuffer2. Looking for help with both issues.
> >     i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the
> >        parser isn't properly waiting for the operation to complete before
> >        parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL])
> >        but that actually caused more failures.
> 
> This synchronization should happen when processing the relocations. The
> batch itself isn't modified by the gpu, we simply upload it using the
> blitter. So this going wrong indicates there's some issue somewhere ...

I looked at this again. The blitter copy uploads a batch full of noops and there
are no relocations associated with the failing batch, as far as I can tell. I
suspect that would explain why relocation handling wouldn't catch this. In any case,
I copied the relevant setup code from shmem pread, and it resolved the issue.

> 
> 
> >    ii. Second, on VLV, I've seen cache coherency issues when userspace writes
> >        the batch via pwrite fast path before calling execbuffer2. The parser
> >        reads stale data. This works fine on IVB and HSW, so I believe it's an
> >        LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or
> >        synchronization is for this scenario.
> 
> Imo we should take a good look at the optimized buffer read/write code from
> i915_gem_shmem_pread (for reading the userspace batch) and
> i915_gem_shmem_pwrite (for writing to the checked buffer). If we do the
> checking in-line with the reading this should also bring down the overhead
> massively compared to the current solution (those shmem read/write
> functions are fairly optimized).

So, I have a functioning kmap_atomic based parser using an sg_mapping_iter, and in the
tests I'm running, it's worse than the vmap approach. This is still without the batch
copy, but I think it's relevant anyhow. I haven't done much in the way of analysis as to
why we're getting these results. At a high level, the vmap approach leads to simple
code with a few function calls and control structures. The per-page approach has
somewhat more complex logic around mapping the next page at the right time and checking
for commands that cross a page boundary. I'd guess that difference factors into it.
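
To make the page-boundary case concrete, here is a small sketch of the kind
of check a per-page parser needs before reading a command. The 4096-byte page
size, the dword-based length and the function name are assumptions for
illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/*
 * True if a command of len_dwords 32-bit words starting at byte 'offset'
 * in the batch extends past the end of the page it starts in.
 */
bool toy_cmd_crosses_page(size_t offset, uint32_t len_dwords)
{
	size_t first_page, last_page;

	assert(len_dwords > 0);
	first_page = offset / TOY_PAGE_SIZE;
	last_page = (offset + (size_t)len_dwords * 4 - 1) / TOY_PAGE_SIZE;
	return first_page != last_page;
}
```

With a vmap of the whole object the parser never needs this test, which is
part of why that code path is simpler.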

Due to the risk of regressions, I think it would be better not to delay getting
broader functional and performance testing by waiting to sort this out. I'd rather
stick with vmap for now and revisit that overhead once we have more concrete
performance data to work from.

I'll propose that I send an updated series later this week or next that addresses
feedback from the review, including better handling for secure and chained batches,
the sync fixes for gem_cpu_reloc, some of the additional tests you suggested,
and possibly an attempt at batch copy. If that goes well, could we see about
getting the patches into the hands of your QA team for further testing?

Brad

> 
> > 3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands
> >    in userspace batches without parsing them. The media driver uses 2nd-level
> >    batches, so a solution is required. I have some ideas but don't want to delay
> >    the review process for what I have so far. It may be that the 2nd-level
> >    parsing is only needed for VCS and the current code (plus rejecting BBS)
> >    would be sufficient for RCS.
> 
> Afaik only libva uses second-level batches, and only on the vcs. So I hope
> we can just submit those as unprivileged batches if possible. If that's
> not possible it'll get fairly ugly I fear :(
> 
> > 4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
> >    to avoid GPU modifications to buffers via EUs or commands in the batch, we
> >    should copy the userspace batch buffer to memory that userspace does not
> >    have access to, map it into GGTT, and execute that batch buffer. I have a
> >    sense of how to do this for 1st-level batches, but it would need changes to
> >    tie in with the 2nd-level batch parsing I think, so I've again held off.
> 
> Yeah, we need the copying for otherwise the parsing is fairly pointless.
> I've stumbled over some of your internal patches and had a quick look at
> them. Two high-level comments:
> 
> - Using the existing active buffer lru instead of manual pinning would
>   better integrate with the eviction code. For an example of in-kernel
>   objects (and not userspace objects) using this look at the hw context
>   code.
> - Imo we should tag all buffers as purgeable while they're in the cache.
>   That way the shrinker will automatically drop the backing storage if
>   memory runs low (and thanks to the active list lru only when the gpu has
>   stopped processing the batch). That way we can just keep on allocating
>   buffers if they're all busy without any concern for running out of
>   memory.
> 
> I'll try to read through your patch pile in the next few days, this is
> just the very-very high-level stuff that came to mind immediately.
> 
> Cheers, Daniel
> > 
> > Brad Volkin (22):
> >   drm/i915: Add data structures for command parser
> >   drm/i915: Initial command parser table definitions
> >   drm/i915: Hook command parser tables up to rings
> >   drm/i915: Add per-ring command length decode functions
> >   drm/i915: Implement command parsing
> >   drm/i915: Add a HAS_CMD_PARSER getparam
> >   drm/i915: Add support for rejecting commands during parsing
> >   drm/i915: Add support for checking register accesses
> >   drm/i915: Add support for rejecting commands via bitmasks
> >   drm/i915: Reject unsafe commands
> >   drm/i915: Add register whitelists for mesa
> >   drm/i915: Enable register whitelist checks
> >   drm/i915: Enable bit checking for some commands
> >   drm/i915: Enable PPGTT command parser checks
> >   drm/i915: Reject commands that would store to global HWS page
> >   drm/i915: Reject additional commands
> >   drm/i915: Add parser data for perf monitoring GL extensions
> >   drm/i915: Reject MI_ARB_ON_OFF on VECS
> >   drm/i915: Fix length handling for MFX_WAIT
> >   drm/i915: Fix MI_STORE_DWORD_IMM parser defintion
> >   drm/i915: Clean up command parser enable decision
> >   drm/i915: Enable command parsing by default
> > 
> >  drivers/gpu/drm/i915/Makefile              |   3 +-
> >  drivers/gpu/drm/i915/i915_cmd_parser.c     | 712 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_dma.c            |   3 +
> >  drivers/gpu/drm/i915/i915_drv.c            |   5 +
> >  drivers/gpu/drm/i915/i915_drv.h            |  96 ++++
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  15 +
> >  drivers/gpu/drm/i915/i915_reg.h            |  66 +++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.h    |  25 +
> >  include/uapi/drm/i915_drm.h                |   1 +
> >  10 files changed, 927 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
> > 
> > -- 
> > 1.8.4.4
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-12-11  0:58   ` Volkin, Bradley D
@ 2013-12-11  9:54     ` Daniel Vetter
  2013-12-11 18:04       ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2013-12-11  9:54 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Tue, Dec 10, 2013 at 04:58:18PM -0800, Volkin, Bradley D wrote:
> So, I have a functioning kmap_atomic based parser using an sg_mapping_iter, and in the
> tests I'm running, it's worse than the vmap approach. This is still without the batch
> copy, but I think it's relevant anyhow. I haven't done much in the way of analysis as to
> why we're getting these results. At a high level, the vmap approach leads to simple
> code with a few function calls and control structures. The per-page approach has
> somewhat more complex logic around mapping the next page at the right time and checking
> for commands that cross a page boundary. I'd guess that difference factors into it.
>
> Due to the risk of regressions, I think it would be better not to delay getting
> broader functional and performance testing by waiting to sort this out. I'd rather
> stick with vmap for now and revisit that overhead once we have more concrete
> performance data to work from.

Yeah, makes sense. Just to check: Was that on hsw with llc coherency or on
vlv? Might be that the clflushing shifts the picture around a bit.

> I'll propose that I send an updated series later this week or next that addresses
> feedback from the review, including better handling for secure and chained batches,
> the sync fixes for gem_cpu_reloc, some of the additional tests you suggested,
> and possibly an attempt at batch copy. If that goes well, could we see about
> getting the patches into the hands of your QA team for further testing?

We unfortunately don't really have tons of spare cycles from our QA team
for testing branches (pretty much none actually), so the usual approach is
to review and merge patches without first going through QA. If we pull in
your new i-g-ts first we should have decent assurance that nothing blows
up. And since kernel patch series should always be fully bisectable we can
stop at any point in time if something goes wrong.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-12-11  9:54     ` Daniel Vetter
@ 2013-12-11 18:04       ` Volkin, Bradley D
  2013-12-11 18:46         ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2013-12-11 18:04 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Dec 11, 2013 at 01:54:40AM -0800, Daniel Vetter wrote:
> On Tue, Dec 10, 2013 at 04:58:18PM -0800, Volkin, Bradley D wrote:
> > So, I have a functioning kmap_atomic based parser using an sg_mapping_iter, and in the
> > tests I'm running, it's worse than the vmap approach. This is still without the batch
> > copy, but I think it's relevant anyhow. I haven't done much in the way of analysis as to
> > why we're getting these results. At a high level, the vmap approach leads to simple
> > code with a few function calls and control structures. The per-page approach has
> > somewhat more complex logic around mapping the next page at the right time and checking
> > for commands that cross a page boundary. I'd guess that difference factors into it.
> >
> > Due to the risk of regressions, I think it would be better not to delay getting
> > broader functional and performance testing by waiting to sort this out. I'd rather
> > stick with vmap for now and revisit that overhead once we have more concrete
> > performance data to work from.
> 
> Yeah, makes sense. Just to check: Was that on hsw with llc coherency or on
> vlv? Might be that the clflushing shifts the picture around a bit.

IVB and HSW. There's now a wait_rendering() call that should cover the gem_cpu_reloc
case. I haven't had a chance to go back and test on VLV to see if the clflushing helps
with the other coherency issue.

> 
> > I'll propose that I send an updated series later this week or next that addresses
> > feedback from the review, including better handling for secure and chained batches,
> > the sync fixes for gem_cpu_reloc, some of the additional tests you suggested,
> > and possibly an attempt at batch copy. If that goes well, could we see about
> > getting the patches into the hands of your QA team for further testing?
> 
> We unfortunately don't really have tons of spare cycles from our QA team
> for testing branches (pretty much none actually), so the usual approach is
> to review and merge patches without first going through QA. If we pull in
> your new i-g-ts first we should have decent assurance that nothing blows
> up. And since kernel patch series should always be fully bisectable we can
> stop at any point in time if something goes wrong.

Ok, sounds good. I'm fine with whatever approach gets us the test coverage soonest.

Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-12-11 18:04       ` Volkin, Bradley D
@ 2013-12-11 18:46         ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2013-12-11 18:46 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Dec 11, 2013 at 7:04 PM, Volkin, Bradley D
<bradley.d.volkin@intel.com> wrote:
>> We unfortunately don't really have tons of spare cycles from our QA team
>> for testing branches (pretty much none actually), so the usual approach is
>> to review and merge patches without first going through QA. If we pull in
>> your new i-g-ts first we should have decent assurance that nothing blows
>> up. And since kernel patch series should always be fully bisectable we can
>> stop at any point in time if something goes wrong.
>
> Ok, sounds good. I'm fine with whatever approach gets us the test coverage soonest.

One thing that's always important is to get tangential prep work to
the beginning of your patch series as much as possible. That way we
can merge&test those glue patches (and their impact on the rest of the
driver) even when review is blocked on some contentious topic that
affects the core of a new feature.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* [PATCH 00/13] Gen7 batch buffer command parser
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (22 preceding siblings ...)
  2013-11-26 19:35 ` [RFC 00/22] Gen7 batch buffer command parser Daniel Vetter
@ 2014-01-29 21:55 ` bradley.d.volkin
  2014-01-29 21:55   ` [PATCH 01/13] drm/i915: Refactor shmem pread setup bradley.d.volkin
                     ` (14 more replies)
  2014-01-29 21:57 ` [PATCH] intel: Merge i915_drm.h with cmd parser define bradley.d.volkin
                   ` (2 subsequent siblings)
  26 siblings, 15 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Certain OpenGL features (e.g. transform feedback, performance monitoring)
require userspace code to submit batches containing commands such as
MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
generations of the hardware will noop these commands in "unsecure" batches
(which includes all userspace batches submitted via i915) even though the
commands may be safe and represent the intended programming model of the device.

This series introduces a software command parser similar in operation to the
command parsing done in hardware for unsecure batches. However, the software
parser allows some operations that would be noop'd by hardware, if the parser
determines the operation is safe, and submits the batch as "secure" to prevent
hardware parsing. Currently the series implements this on IVB and HSW.

The series has one piece of prep work, one patch for the parser logic, and a
handful of patches to fill out the tables which drive the parser. There are
follow-up patches to libdrm and to i-g-t. The i-g-t tests are basic and do not
test all of the commands used by the parser on the assumption that I'm likely
to make the same mistakes in both the parser and the test.

WARNING!!!
I've previously run the i-g-t gem_* tests, the piglit quick tests, and generally
used Ubuntu 13.10 IVB and HSW systems with the parser running. Aside from a
failure described below, I did not see any regressions. However, the series
currently hits a BUG_ON() if you enable the parser due to a regression in secure
batch handling on -nightly.

At this point there are a couple of required/potential improvements.

1) Chained batches. The parser currently allows MI_BATCH_BUFFER_START commands
   in userspace batches without parsing them. The media driver uses chained
   batches, so a solution is required. I'm still working through the
   requirements but don't want to continue delaying the review process for what
   I have so far.
2) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
   to avoid GPU modifications to buffers via EUs or commands in the batch, we
   should copy the userspace batch buffer to memory that userspace does not
   have access to, map it into GGTT, and execute that batch buffer. I have a
   sense of how to do this for 1st-level batches, but it may need changes to
   tie in with the chained batch parsing, so I've again held off.
3) Coherency. I've found a coherency issue on VLV when reading the batch buffer
   from the CPU during execbuffer2. Userspace writes the batch via pwrite fast
   path before calling execbuffer2. The parser reads stale data. This works fine
   on IVB and HSW, so I believe it's an LLC vs. non-LLC issue. I'm just unclear
   on what the correct flushing or synchronization is for this scenario. This
   only matters if we get PPGTT working on VLV and enable the parser there.

v2:
- Significantly reorder series
- Scan secure batches (i.e. I915_EXEC_SECURE)
- Check that parser tables are sorted during init
- Fixed gem_cpu_reloc regression
- HAS_CMD_PARSER -> CMD_PARSER_VERSION getparam
- Additional tests

Brad Volkin (13):
  drm/i915: Refactor shmem pread setup
  drm/i915: Implement command buffer parsing logic
  drm/i915: Initial command parser table definitions
  drm/i915: Reject privileged commands
  drm/i915: Allow some privileged commands from master
  drm/i915: Add register whitelists for mesa
  drm/i915: Add register whitelist for DRM master
  drm/i915: Enable register whitelist checks
  drm/i915: Reject commands that explicitly generate interrupts
  drm/i915: Enable PPGTT command parser checks
  drm/i915: Reject commands that would store to global HWS page
  drm/i915: Add a CMD_PARSER_VERSION getparam
  drm/i915: Enable command parsing by default

 drivers/gpu/drm/i915/Makefile              |   3 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 845 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_dma.c            |   4 +
 drivers/gpu/drm/i915/i915_drv.h            | 103 ++++
 drivers/gpu/drm/i915/i915_gem.c            |  48 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  17 +
 drivers/gpu/drm/i915/i915_params.c         |   5 +
 drivers/gpu/drm/i915/i915_reg.h            |  78 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  32 ++
 include/uapi/drm/i915_drm.h                |   1 +
 11 files changed, 1123 insertions(+), 15 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c

-- 
1.8.5.2


* [PATCH 01/13] drm/i915: Refactor shmem pread setup
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-01-30  8:36     ` Daniel Vetter
  2014-01-29 21:55   ` [PATCH 02/13] drm/i915: Implement command buffer parsing logic bradley.d.volkin
                     ` (13 subsequent siblings)
  14 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

The command parser is going to need the same synchronization and
setup logic, so factor it out for reuse.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  3 +++
 drivers/gpu/drm/i915/i915_gem.c | 48 +++++++++++++++++++++++++++++------------
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3673ba1..bfb30df 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2045,6 +2045,9 @@ void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
 void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
 void i915_gem_lastclose(struct drm_device *dev);
 
+int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
+				    int *needs_clflush);
+
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
 static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 39770f7..fdc1f40 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -332,6 +332,39 @@ __copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
 	return 0;
 }
 
+/*
+ * Pins the specified object's pages and synchronizes the object with
+ * GPU accesses. Sets needs_clflush to non-zero if the caller should
+ * flush the object from the CPU cache.
+ */
+int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
+				    int *needs_clflush)
+{
+	int ret;
+
+	*needs_clflush = 0;
+
+	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
+		/* If we're not in the cpu read domain, set ourself into the gtt
+		 * read domain and manually flush cachelines (if required). This
+		 * optimizes for the case when the gpu will dirty the data
+		 * anyway again before the next pread happens. */
+		*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
+							obj->cache_level);
+		ret = i915_gem_object_wait_rendering(obj, true);
+		if (ret)
+			return ret;
+	}
+
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
+
+	i915_gem_object_pin_pages(obj);
+
+	return ret;
+}
+
 /* Per-page copy function for the shmem pread fastpath.
  * Flushes invalid cachelines before reading the target if
  * needs_clflush is set. */
@@ -429,23 +462,10 @@ i915_gem_shmem_pread(struct drm_device *dev,
 
 	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
 
-	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
-		/* If we're not in the cpu read domain, set ourself into the gtt
-		 * read domain and manually flush cachelines (if required). This
-		 * optimizes for the case when the gpu will dirty the data
-		 * anyway again before the next pread happens. */
-		needs_clflush = !cpu_cache_is_coherent(dev, obj->cache_level);
-		ret = i915_gem_object_wait_rendering(obj, true);
-		if (ret)
-			return ret;
-	}
-
-	ret = i915_gem_object_get_pages(obj);
+	ret = i915_gem_obj_prepare_shmem_read(obj, &needs_clflush);
 	if (ret)
 		return ret;
 
-	i915_gem_object_pin_pages(obj);
-
 	offset = args->offset;
 
 	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents,
-- 
1.8.5.2


* [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
  2014-01-29 21:55   ` [PATCH 01/13] drm/i915: Refactor shmem pread setup bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-01-29 22:28     ` Chris Wilson
                       ` (3 more replies)
  2014-01-29 21:55   ` [PATCH 03/13] drm/i915: Initial command parser table definitions bradley.d.volkin
                     ` (12 subsequent siblings)
  14 siblings, 4 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:

1) Commands which are explicitly defined as privileged or which should only be
   used by the kernel driver. The parser generally rejects such commands, with
   the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
   functionality, particularly certain OpenGL extensions, the parser provides a
   whitelist of registers which userspace may safely access (for both normal and
   drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
   parser always rejects such commands.
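
As a rough illustration of the register check in item 2, the sketch below
looks a register up in a general whitelist first and falls back to a
master-only whitelist when the submitter is the drm master. The table
contents and the names user_regs, master_regs, reg_in_table and reg_allowed
are hypothetical placeholders for this example, not the tables or helpers
added by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical whitelists: the offsets are placeholders, not real registers. */
static const uint32_t user_regs[]   = { 0x1000, 0x1004 };
static const uint32_t master_regs[] = { 0x2000 };

/* Linear search of one whitelist. */
static bool reg_in_table(const uint32_t *table, size_t count, uint32_t addr)
{
	size_t i;

	assert(table != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (table[i] == addr)
			return true;
	}

	return false;
}

/*
 * A register is allowed if it is on the general whitelist, or if the
 * submitter is the drm master and it is on the master-only whitelist.
 */
static bool reg_allowed(uint32_t addr, bool is_master)
{
	if (reg_in_table(user_regs,
			 sizeof(user_regs) / sizeof(user_regs[0]), addr))
		return true;

	return is_master &&
	       reg_in_table(master_regs,
			    sizeof(master_regs) / sizeof(master_regs[0]), addr);
}
```

With these placeholder tables, 0x2000 is accepted only when is_master is
true, and an address on neither list is rejected for every submitter.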

Each ring maintains tables of commands and registers which the parser uses in
scanning batch buffers submitted to that ring.

The set of commands that the parser must check for is significantly smaller
than the number of commands supported, especially on the render ring. As such,
the parser tables (built up in subsequent patches) contain only those commands
required by the parser. This generally works because command opcode ranges have
standard command length encodings. So for commands that the parser does not need
to check, it can easily skip them. This is implemented via a per-ring length
decoding vfunc.
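
To make the skipping concrete, here is a minimal userspace sketch of default
length decoding, assuming the blitter-style encoding used later in this patch
(client in bits 31:29, a length field in the low bits, and a bias of 2). The
helper names blt_length_mask, cmd_length_dwords and count_cmds are
hypothetical, and MI_BATCH_BUFFER_END is written out as opcode 0x0A shifted
into the MI opcode field, which I believe matches i915_reg.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLIENT_MASK          0xE0000000u
#define MI_CLIENT            0x00000000u
#define BC_CLIENT            0x40000000u
#define LENGTH_BIAS          2
#define MI_BATCH_BUFFER_END  (0x0Au << 23)

/* Which header bits hold the length field; 0 means "unknown client". */
static uint32_t blt_length_mask(uint32_t header)
{
	uint32_t client = header & CLIENT_MASK;

	if (client == MI_CLIENT)
		return 0x3F;
	if (client == BC_CLIENT)
		return 0xFF;

	return 0;
}

/* Total command length in dwords, or 0 if the header cannot be decoded. */
static uint32_t cmd_length_dwords(uint32_t header)
{
	uint32_t mask = blt_length_mask(header);

	if (!mask)
		return 0;

	return (header & mask) + LENGTH_BIAS;
}

/*
 * Walk a batch, skipping each command by its decoded length. Returns the
 * number of commands seen before MI_BATCH_BUFFER_END, or -1 on a bad
 * length or a missing terminator.
 */
static int count_cmds(const uint32_t *batch, size_t ndwords)
{
	size_t pos = 0;
	int count = 0;

	assert(batch != NULL);

	while (pos < ndwords) {
		uint32_t len;

		if (batch[pos] == MI_BATCH_BUFFER_END)
			return count;

		len = cmd_length_dwords(batch[pos]);
		if (len == 0 || len > ndwords - pos)
			return -1;

		pos += len;
		count++;
	}

	return -1;
}
```

Note that the default MI mask would decode a one-dword command such as
MI_NOOP as two dwords, which is the kind of mismatch the explicit table
entries described next are meant to cover.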

Unfortunately, there are a number of commands that do not follow the standard
length encoding for their opcode range, primarily amongst the MI_* commands. To
handle this, the parser provides a way to define explicit "skip" entries in the
per-ring command tables.

Other command table entries will map fairly directly to high level categories
mentioned above: rejected, master-only, register whitelist. A number of checks,
including the privileged memory checks, are implemented via a general bitmasking
mechanism.
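
The bitmask mechanism can be sketched as follows: each check names a dword
within the command, a mask, and the value the masked dword must equal. This
is a simplified userspace model; struct bit_check and bits_ok are hypothetical
names, and the bounds check against the command length is an addition of this
sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct bit_check {
	uint32_t offset;   /* dword index within the command */
	uint32_t mask;     /* bits to examine */
	uint32_t expected; /* required value of the masked dword */
};

/* Returns true only if every check passes for the given command. */
static bool bits_ok(const uint32_t *cmd, size_t cmd_len,
		    const struct bit_check *checks, size_t nchecks)
{
	size_t i;

	assert(cmd != NULL && checks != NULL);

	for (i = 0; i < nchecks; i++) {
		/* Never read past the end of the command. */
		if (checks[i].offset >= cmd_len)
			return false;

		if ((cmd[checks[i].offset] & checks[i].mask) !=
		    checks[i].expected)
			return false;
	}

	return true;
}
```

A privileged memory check then becomes a table entry such as "bit 22 of
dword 0 must be clear", where the bit position here is only a placeholder
for whichever bit selects the global address space in a given command.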

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/Makefile              |   3 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 404 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h            |  94 +++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  17 ++
 drivers/gpu/drm/i915/i915_params.c         |   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  32 +++
 7 files changed, 556 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 4850494..2da81bf 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -47,7 +47,8 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o \
 	  dvo_tfp410.o \
 	  dvo_sil164.o \
 	  dvo_ns2501.o \
-	  i915_gem_dmabuf.o
+	  i915_gem_dmabuf.o \
+	  i915_cmd_parser.o
 
 i915-$(CONFIG_COMPAT)   += i915_ioc32.o
 
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
new file mode 100644
index 0000000..7639dbc
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -0,0 +1,404 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Brad Volkin <bradley.d.volkin@intel.com>
+ *
+ */
+
+#include "i915_drv.h"
+
+#define CLIENT_MASK      0xE0000000
+#define SUBCLIENT_MASK   0x18000000
+#define MI_CLIENT        0x00000000
+#define RC_CLIENT        0x60000000
+#define BC_CLIENT        0x40000000
+#define MEDIA_SUBCLIENT  0x10000000
+
+static u32 gen7_render_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+	u32 subclient = cmd_header & SUBCLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == RC_CLIENT) {
+		if (subclient == MEDIA_SUBCLIENT)
+			return 0xFFFF;
+		else
+			return 0xFF;
+	}
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal rcs cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
+static u32 gen7_bsd_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+	u32 subclient = cmd_header & SUBCLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == RC_CLIENT) {
+		if (subclient == MEDIA_SUBCLIENT)
+			return 0xFFF;
+		else
+			return 0xFF;
+	}
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal bsd cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
+static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == BC_CLIENT)
+		return 0xFF;
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal blt cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
+static void validate_cmds_sorted(struct intel_ring_buffer *ring)
+{
+	int i;
+
+	if (!ring->cmd_tables || ring->cmd_table_count == 0)
+		return;
+
+	for (i = 0; i < ring->cmd_table_count; i++) {
+		const struct drm_i915_cmd_table *table = &ring->cmd_tables[i];
+		u32 previous = 0;
+		int j;
+
+		for (j = 0; j < table->count; j++) {
+			const struct drm_i915_cmd_descriptor *desc =
+				&table->table[j];
+			u32 curr = desc->cmd.value & desc->cmd.mask;
+
+			if (curr < previous) {
+				DRM_ERROR("CMD: table not sorted ring=%d table=%d entry=%d cmd=0x%08X\n",
+					  ring->id, i, j, curr);
+				return;
+			}
+
+			previous = curr;
+		}
+	}
+}
+
+static void check_sorted(int ring_id, const u32 *reg_table, int reg_count)
+{
+	int i;
+	u32 previous = 0;
+
+	for (i = 0; i < reg_count; i++) {
+		u32 curr = reg_table[i];
+
+		if (curr < previous) {
+			DRM_ERROR("CMD: table not sorted ring=%d entry=%d reg=0x%08X\n",
+				  ring_id, i, curr);
+			return;
+		}
+
+		previous = curr;
+	}
+}
+
+static void validate_regs_sorted(struct intel_ring_buffer *ring)
+{
+	if (ring->reg_table && ring->reg_count > 0)
+		check_sorted(ring->id, ring->reg_table, ring->reg_count);
+
+	if (ring->master_reg_table && ring->master_reg_count > 0)
+		check_sorted(ring->id, ring->master_reg_table,
+			     ring->master_reg_count);
+}
+
+void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
+{
+	if (!IS_GEN7(ring->dev))
+		return;
+
+	switch (ring->id) {
+	case RCS:
+		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
+		break;
+	case VCS:
+		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
+		break;
+	case BCS:
+		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
+		break;
+	case VECS:
+		/* VECS can use the same length_mask function as VCS */
+		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
+		break;
+	default:
+		DRM_ERROR("CMD: cmd_parser_init with unknown ring: %d\n",
+			  ring->id);
+	}
+
+	validate_cmds_sorted(ring);
+	validate_regs_sorted(ring);
+}
+
+static const struct drm_i915_cmd_descriptor*
+find_cmd_in_table(const struct drm_i915_cmd_table *table,
+		  u32 cmd_header)
+{
+	int i;
+
+	for (i = 0; i < table->count; i++) {
+		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
+		u32 masked_cmd = desc->cmd.mask & cmd_header;
+		u32 masked_value = desc->cmd.value & desc->cmd.mask;
+
+		if (masked_cmd == masked_value)
+			return desc;
+	}
+
+	return NULL;
+}
+
+/*
+ * Returns a pointer to a descriptor for the command specified by cmd_header.
+ *
+ * The caller must supply space for a default descriptor via the default_desc
+ * parameter. If no descriptor for the specified command exists in the ring's
+ * command parser tables, this function fills in default_desc based on the
+ * ring's default length encoding and returns default_desc.
+ */
+static const struct drm_i915_cmd_descriptor*
+find_cmd(struct intel_ring_buffer *ring,
+	 u32 cmd_header,
+	 struct drm_i915_cmd_descriptor *default_desc)
+{
+	u32 mask;
+	int i;
+
+	for (i = 0; i < ring->cmd_table_count; i++) {
+		const struct drm_i915_cmd_descriptor *desc;
+
+		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
+		if (desc)
+			return desc;
+	}
+
+	mask = ring->get_cmd_length_mask(cmd_header);
+	if (!mask)
+		return NULL;
+
+	BUG_ON(!default_desc);
+	default_desc->flags = CMD_DESC_SKIP;
+	default_desc->length.mask = mask;
+
+	return default_desc;
+}
+
+static int valid_reg(const u32 *table, int count, u32 addr)
+{
+	if (table && count != 0) {
+		int i;
+
+		for (i = 0; i < count; i++) {
+			if (table[i] == addr)
+				return 1;
+		}
+	}
+
+	return 0;
+}
+
+static u32 *vmap_batch(struct drm_i915_gem_object *obj)
+{
+	int i;
+	void *addr = NULL;
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL) {
+		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
+		goto finish;
+	}
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
+		pages[i] = sg_page_iter_page(&sg_iter);
+		i++;
+	}
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	if (addr == NULL) {
+		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
+		goto finish;
+	}
+
+finish:
+	if (pages)
+		drm_free_large(pages);
+	return (u32*)addr;
+}
+
+int i915_needs_cmd_parser(struct intel_ring_buffer *ring)
+{
+	/* No command tables indicates a platform without parsing */
+	if (!ring->cmd_tables)
+		return 0;
+
+	return i915.enable_cmd_parser;
+}
+
+#define LENGTH_BIAS 2
+
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset,
+		    bool is_master)
+{
+	int ret = 0;
+	u32 *cmd, *batch_base, *batch_end;
+	struct drm_i915_cmd_descriptor default_desc = { 0 };
+	int needs_clflush = 0;
+
+	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
+		return ret;
+	}
+
+	batch_base = vmap_batch(batch_obj);
+	if (!batch_base) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
+		i915_gem_object_unpin_pages(batch_obj);
+		return -ENOMEM;
+	}
+
+	if (needs_clflush)
+		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
+
+	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
+	batch_end = batch_base + (batch_obj->base.size / sizeof(*batch_end));
+
+	while (cmd < batch_end) {
+		const struct drm_i915_cmd_descriptor *desc;
+		u32 length;
+
+		if (*cmd == MI_BATCH_BUFFER_END)
+			break;
+
+		desc = find_cmd(ring, *cmd, &default_desc);
+		if (!desc) {
+			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+					 *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_FIXED)
+			length = desc->length.fixed;
+		else
+			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
+
+		if ((batch_end - cmd) < length) {
+			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
+					 *cmd,
+					 length,
+					 batch_end - cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_REJECT) {
+			DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
+			DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
+					 *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_REGISTER) {
+			u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
+
+			if (!valid_reg(ring->reg_table,
+				       ring->reg_count, reg_addr)) {
+				if (!is_master ||
+				    !valid_reg(ring->master_reg_table,
+					       ring->master_reg_count,
+					       reg_addr)) {
+					DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (ring=%d)\n",
+							 reg_addr,
+							 *cmd,
+							 ring->id);
+					ret = -EINVAL;
+					break;
+				}
+			}
+		}
+
+		if (desc->flags & CMD_DESC_BITMASK) {
+			int i;
+
+			for (i = 0; i < desc->bits_count; i++) {
+				u32 dword = cmd[desc->bits[i].offset] &
+					desc->bits[i].mask;
+
+				if (dword != desc->bits[i].expected) {
+					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
+							 *cmd,
+							 desc->bits[i].mask,
+							 desc->bits[i].expected,
+							 dword, ring->id);
+					ret = -EINVAL;
+					break;
+				}
+			}
+
+			if (ret)
+				break;
+		}
+
+		cmd += length;
+	}
+
+	if (cmd >= batch_end) {
+		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
+		ret = -EINVAL;
+	}
+
+	vunmap(batch_base);
+
+	i915_gem_object_unpin_pages(batch_obj);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bfb30df..8aed80f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1765,6 +1765,91 @@ struct drm_i915_file_private {
 	atomic_t rps_wait_boost;
 };
 
+/**
+ * A command that requires special handling by the command parser.
+ */
+struct drm_i915_cmd_descriptor {
+	/**
+	 * Flags describing how the command parser processes the command.
+	 *
+	 * CMD_DESC_FIXED: The command has a fixed length if this is set,
+	 *                 a length mask if not set
+	 * CMD_DESC_SKIP: The command is allowed but does not follow the
+	 *                standard length encoding for the opcode range in
+	 *                which it falls
+	 * CMD_DESC_REJECT: The command is never allowed
+	 * CMD_DESC_REGISTER: The command should be checked against the
+	 *                    register whitelist for the appropriate ring
+	 * CMD_DESC_MASTER: The command is allowed if the submitting process
+	 *                  is the DRM master
+	 */
+	u32 flags;
+#define CMD_DESC_FIXED    (1<<0)
+#define CMD_DESC_SKIP     (1<<1)
+#define CMD_DESC_REJECT   (1<<2)
+#define CMD_DESC_REGISTER (1<<3)
+#define CMD_DESC_BITMASK  (1<<4)
+#define CMD_DESC_MASTER   (1<<5)
+
+	/**
+	 * The command's unique identification bits and the bitmask to get them.
+	 * This isn't strictly the opcode field as defined in the spec and may
+	 * also include type, subtype, and/or subop fields.
+	 */
+	struct {
+		u32 value;
+		u32 mask;
+	} cmd;
+
+	/**
+	 * The command's length. The command is either fixed length (i.e. does
+	 * not include a length field) or has a length field mask. The flag
+	 * CMD_DESC_FIXED indicates a fixed length. Otherwise, the command has
+	 * a length mask. All command entries in a command table must include
+	 * length information.
+	 */
+	union {
+		u32 fixed;
+		u32 mask;
+	} length;
+
+	/**
+	 * Describes where to find a register address in the command to check
+	 * against the ring's register whitelist. Only valid if flags has the
+	 * CMD_DESC_REGISTER bit set.
+	 */
+	struct {
+		u32 offset;
+		u32 mask;
+	} reg;
+
+#define MAX_CMD_DESC_BITMASKS 3
+	/**
+	 * Describes command checks where a particular dword is masked and
+	 * compared against an expected value. If the command does not match
+	 * the expected value, the parser rejects it. Only valid if flags has
+	 * the CMD_DESC_BITMASK bit set.
+	 */
+	struct {
+		u32 offset;
+		u32 mask;
+		u32 expected;
+	} bits[MAX_CMD_DESC_BITMASKS];
+	/** Number of valid entries in the bits array */
+	int bits_count;
+};
+
+/**
+ * A table of commands requiring special handling by the command parser.
+ *
+ * Each ring has an array of tables. Each table consists of an array of command
+ * descriptors, which must be sorted with command opcodes in ascending order.
+ */
+struct drm_i915_cmd_table {
+	const struct drm_i915_cmd_descriptor *table;
+	int count;
+};
+
 #define INTEL_INFO(dev)	(to_i915(dev)->info)
 
 #define IS_I830(dev)		((dev)->pdev->device == 0x3577)
@@ -1923,6 +2008,7 @@ struct i915_params {
 	bool prefault_disable;
 	bool reset;
 	int invert_brightness;
+	int enable_cmd_parser;
 };
 extern struct i915_params i915 __read_mostly;
 
@@ -2428,6 +2514,14 @@ void i915_destroy_error_state(struct drm_device *dev);
 void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
 const char *i915_cache_level_str(int type);
 
+/* i915_cmd_parser.c */
+void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
+int i915_needs_cmd_parser(struct intel_ring_buffer *ring);
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset,
+		    bool is_master);
+
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
 extern int i915_restore_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 032def9..c953445 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1180,6 +1180,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
+	if (i915_needs_cmd_parser(ring)) {
+		ret = i915_parse_cmds(ring,
+				      batch_obj,
+				      args->batch_start_offset,
+				      file->is_master);
+		if (ret)
+			goto err;
+
+		/*
+		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
+		 * from MI_BATCH_BUFFER_START commands issued in the
+		 * dispatch_execbuffer implementations. We specifically don't
+		 * want that set when the command parser is enabled.
+		 */
+		flags |= I915_DISPATCH_SECURE;
+	}
+
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index c743057..6d3d906 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -47,6 +47,7 @@ struct i915_params i915 __read_mostly = {
 	.prefault_disable = 0,
 	.reset = true,
 	.invert_brightness = 0,
+	.enable_cmd_parser = 0
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -153,3 +154,7 @@ MODULE_PARM_DESC(invert_brightness,
 	"report PCI device ID, subsystem vendor and subsystem device ID "
 	"to dri-devel@lists.freedesktop.org, if your machine needs it. "
 	"It will then be included in an upcoming module version.");
+
+module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
+MODULE_PARM_DESC(enable_cmd_parser,
+		"Enable command parsing (default: false)");
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a0d61f8..77fc61d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1388,6 +1388,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	if (IS_I830(ring->dev) || IS_845G(ring->dev))
 		ring->effective_size -= 128;
 
+	i915_cmd_parser_init_ring(ring);
+
 	return 0;
 
 err_unmap:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 71a73f4..cff2b35 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -162,6 +162,38 @@ struct  intel_ring_buffer {
 		u32 gtt_offset;
 		volatile u32 *cpu_page;
 	} scratch;
+
+	/**
+	 * Tables of commands the command parser needs to know about
+	 * for this ring.
+	 */
+	const struct drm_i915_cmd_table *cmd_tables;
+	int cmd_table_count;
+
+	/**
+	 * Table of registers allowed in commands that read/write registers.
+	 */
+	const u32 *reg_table;
+	int reg_count;
+
+	/**
+	 * Table of registers allowed in commands that read/write registers, but
+	 * only from the DRM master.
+	 */
+	const u32 *master_reg_table;
+	int master_reg_count;
+
+	/**
+	 * Returns the bitmask for the length field of the specified command.
+	 * Return 0 for an unrecognized/invalid command.
+	 *
+	 * If the command parser finds an entry for a command in the ring's
+	 * cmd_tables, it gets the command's length based on the table entry.
+	 * If not, it calls this function to determine the per-ring length field
+	 * encoding for the command (i.e. certain opcode ranges use certain bits
+	 * to encode the command length in the header).
+	 */
+	u32 (*get_cmd_length_mask)(u32 cmd_header);
 };
 
 static inline bool
-- 
1.8.5.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 03/13] drm/i915: Initial command parser table definitions
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
  2014-01-29 21:55   ` [PATCH 01/13] drm/i915: Refactor shmem pread setup bradley.d.volkin
  2014-01-29 21:55   ` [PATCH 02/13] drm/i915: Implement command buffer parsing logic bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-02-05 14:22     ` Jani Nikula
  2014-01-29 21:55   ` [PATCH 04/13] drm/i915: Reject privileged commands bradley.d.volkin
                     ` (11 subsequent siblings)
  14 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Add command tables defining irregular length commands for each ring.
This requires a few new command opcode definitions.
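
For readers new to the table format, the sketch below shows how an entry is
matched and how its length is then taken, assuming the masked-opcode
comparison and the length bias of 2 from the previous patch. The struct
layout, the names cmd_desc, demo_table, lookup and desc_length, and the two
opcode values are simplified placeholders, not the definitions added here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_FIXED   0x1u
#define OPCODE_MASK  0xFF800000u
#define LENGTH_BIAS  2

struct cmd_desc {
	uint32_t flags;
	uint32_t value;  /* identifying bits of the command header */
	uint32_t mask;   /* mask selecting those bits */
	uint32_t length; /* fixed dword count, or mask of the length field */
};

/* Sorted by masked opcode, as the parser expects. */
static const struct cmd_desc demo_table[] = {
	{ DESC_FIXED, 0x00000000u, OPCODE_MASK, 1 },    /* one-dword command */
	{ 0,          0x11000000u, OPCODE_MASK, 0xFF }, /* variable length */
};

/* Returns the matching entry, or NULL if the header is not in the table. */
static const struct cmd_desc *lookup(uint32_t header)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		const struct cmd_desc *d = &demo_table[i];

		if ((header & d->mask) == (d->value & d->mask))
			return d;
	}

	return NULL;
}

/* Length in dwords for a header that matched descriptor d. */
static uint32_t desc_length(const struct cmd_desc *d, uint32_t header)
{
	assert(d != NULL);

	if (d->flags & DESC_FIXED)
		return d->length;

	return (header & d->length) + LENGTH_BIAS;
}
```

A header that matches no entry falls through to the ring's default length
mask, so the tables only need to list the commands that deviate from it.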

OTC-Tracker: AXIA-4631
Change-Id: I064bceb457e15f46928058352afe76d918c58ef5
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 157 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h        |  46 ++++++++++
 2 files changed, 203 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 7639dbc..2e27bad 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -27,6 +27,148 @@
 
 #include "i915_drv.h"
 
+#define STD_MI_OPCODE_MASK  0xFF800000
+#define STD_3D_OPCODE_MASK  0xFFFF0000
+#define STD_2D_OPCODE_MASK  0xFFC00000
+#define STD_MFX_OPCODE_MASK 0xFFFF0000
+
+#define CMD(op, opm, f, lm, fl, ...)		\
+	{					\
+		.flags = (fl) | (f),		\
+		.cmd = { (op), (opm) }, 	\
+		.length = { (lm) },		\
+		__VA_ARGS__			\
+	}
+
+/* Convenience macros to compress the tables */
+#define SMI STD_MI_OPCODE_MASK
+#define S3D STD_3D_OPCODE_MASK
+#define S2D STD_2D_OPCODE_MASK
+#define SMFX STD_MFX_OPCODE_MASK
+#define F CMD_DESC_FIXED
+#define S CMD_DESC_SKIP
+#define R CMD_DESC_REJECT
+#define W CMD_DESC_REGISTER
+#define B CMD_DESC_BITMASK
+#define M CMD_DESC_MASTER
+
+/*            Command                          Mask   Fixed Len   Action
+	      ---------------------------------------------------------- */
+static const struct drm_i915_cmd_descriptor common_cmds[] = {
+	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
+	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      S  ),
+	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      S  ),
+	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
+	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
+	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
+	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
+};
+
+static const struct drm_i915_cmd_descriptor render_cmds[] = {
+	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
+	CMD(  MI_TOPOLOGY_FILTER,               SMI,    F,  1,      S  ),
+	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_SET_CONTEXT,                   SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_URB_CLEAR,                     SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
+	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
+	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
+	CMD(  GPGPU_OBJECT,                     S3D,   !F,  0xFF,   S  ),
+	CMD(  GPGPU_WALKER,                     S3D,   !F,  0xFF,   S  ),
+	CMD(  GFX_OP_3DSTATE_SO_DECL_LIST,      S3D,   !F,  0x1FF,  S  ),
+};
+
+static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
+	CMD(  MI_SET_PREDICATE,                 SMI,    F,  1,      S  ),
+	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
+	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
+	CMD(  MI_RS_CONTEXT,                    SMI,    F,  1,      S  ),
+	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_RS_STORE_DATA_IMM,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_URB_MEM,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_URB_MEM,                 SMI,   !F,  0xFF,   S  ),
+	CMD(  GFX_OP_3DSTATE_DX9_CONSTANTF_VS,  S3D,   !F,  0x7FF,  S  ),
+	CMD(  GFX_OP_3DSTATE_DX9_CONSTANTF_PS,  S3D,   !F,  0x7FF,  S  ),
+
+	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_VS,  S3D,   !F,  0x1FF,  S  ),
+	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_GS,  S3D,   !F,  0x1FF,  S  ),
+	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_HS,  S3D,   !F,  0x1FF,  S  ),
+	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_DS,  S3D,   !F,  0x1FF,  S  ),
+	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS,  S3D,   !F,  0x1FF,  S  ),
+};
+
+static const struct drm_i915_cmd_descriptor video_cmds[] = {
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
+	/*
+	 * MFX_WAIT doesn't fit the way we handle length for most commands.
+	 * It has a length field but it uses a non-standard length bias.
+	 * It is always 1 dword though, so just treat it as fixed length.
+	 */
+	CMD(  MFX_WAIT,                         SMFX,   F,  1,      S  ),
+};
+
+static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
+};
+
+static const struct drm_i915_cmd_descriptor blt_cmds[] = {
+	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
+	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
+	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
+};
+
+#undef CMD
+#undef SMI
+#undef S3D
+#undef S2D
+#undef SMFX
+#undef F
+#undef S
+#undef R
+#undef W
+#undef B
+#undef M
+
+static const struct drm_i915_cmd_table gen7_render_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ render_cmds, ARRAY_SIZE(render_cmds) },
+};
+
+static const struct drm_i915_cmd_table hsw_render_ring_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ render_cmds, ARRAY_SIZE(render_cmds) },
+	{ hsw_render_cmds, ARRAY_SIZE(hsw_render_cmds) },
+};
+
+static const struct drm_i915_cmd_table gen7_video_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ video_cmds, ARRAY_SIZE(video_cmds) },
+};
+
+static const struct drm_i915_cmd_table hsw_vebox_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ vecs_cmds, ARRAY_SIZE(vecs_cmds) },
+};
+
+static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
+};
+
 #define CLIENT_MASK      0xE0000000
 #define SUBCLIENT_MASK   0x18000000
 #define MI_CLIENT        0x00000000
@@ -146,15 +288,30 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 
 	switch (ring->id) {
 	case RCS:
+		if (IS_HASWELL(ring->dev)) {
+			ring->cmd_tables = hsw_render_ring_cmds;
+			ring->cmd_table_count =
+				ARRAY_SIZE(hsw_render_ring_cmds);
+		} else {
+			ring->cmd_tables = gen7_render_cmds;
+			ring->cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
+		}
+
 		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
 		break;
 	case VCS:
+		ring->cmd_tables = gen7_video_cmds;
+		ring->cmd_table_count = ARRAY_SIZE(gen7_video_cmds);
 		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
 		break;
 	case BCS:
+		ring->cmd_tables = gen7_blt_cmds;
+		ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
 		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
 		break;
 	case VECS:
+		ring->cmd_tables = hsw_vebox_cmds;
+		ring->cmd_table_count = ARRAY_SIZE(hsw_vebox_cmds);
 		/* VECS can use the same length_mask function as VCS */
 		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index cbbaf26..13ed6ed 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -336,6 +336,52 @@
 #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH		(1<<0)
 #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
 
+/*
+ * Commands used only by the command parser
+ */
+#define MI_SET_PREDICATE       MI_INSTR(0x01, 0)
+#define MI_ARB_CHECK           MI_INSTR(0x05, 0)
+#define MI_RS_CONTROL          MI_INSTR(0x06, 0)
+#define MI_URB_ATOMIC_ALLOC    MI_INSTR(0x09, 0)
+#define MI_PREDICATE           MI_INSTR(0x0C, 0)
+#define MI_RS_CONTEXT          MI_INSTR(0x0F, 0)
+#define MI_TOPOLOGY_FILTER     MI_INSTR(0x0D, 0)
+#define MI_URB_CLEAR           MI_INSTR(0x19, 0)
+#define MI_UPDATE_GTT          MI_INSTR(0x23, 0)
+#define MI_CLFLUSH             MI_INSTR(0x27, 0)
+#define MI_LOAD_REGISTER_MEM   MI_INSTR(0x29, 0)
+#define MI_LOAD_REGISTER_REG   MI_INSTR(0x2A, 0)
+#define MI_RS_STORE_DATA_IMM   MI_INSTR(0x2B, 0)
+#define MI_LOAD_URB_MEM        MI_INSTR(0x2C, 0)
+#define MI_STORE_URB_MEM       MI_INSTR(0x2D, 0)
+#define MI_CONDITIONAL_BATCH_BUFFER_END MI_INSTR(0x36, 0)
+
+#define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
+#define GFX_OP_3DSTATE_VF_STATISTICS   ((0x3<<29)|(0x1<<27)|(0x0<<24)|(0xB<<16))
+#define GPGPU_OBJECT                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x4<<16))
+#define GPGPU_WALKER                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x5<<16))
+#define GFX_OP_3DSTATE_DX9_CONSTANTF_VS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x39<<16))
+#define GFX_OP_3DSTATE_DX9_CONSTANTF_PS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x3A<<16))
+#define GFX_OP_3DSTATE_SO_DECL_LIST \
+	((0x3<<29)|(0x3<<27)|(0x1<<24)|(0x17<<16))
+
+#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_VS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x43<<16))
+#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_GS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x44<<16))
+#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_HS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x45<<16))
+#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_DS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x46<<16))
+#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS \
+	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x47<<16))
+
+#define MFX_WAIT  ((0x3<<29)|(0x1<<27)|(0x0<<16))
+
+#define COLOR_BLT     ((0x2<<29)|(0x40<<22))
+#define SRC_COPY_BLT  ((0x2<<29)|(0x43<<22))
 
 /*
  * Reset registers
-- 
1.8.5.2
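
Reader's note, not part of the patch: the opcode definitions above pack a
client identifier into the top three bits of the command header, which is what
CLIENT_MASK selects. Below is a minimal sketch of that extraction, assuming the
field sits in bits 31:29 as the mask suggests. CLIENT_MASK and the two opcode
values are copied from the hunks above; CLIENT_SHIFT and cmd_client() are
hypothetical names used only for illustration.

```c
#include <stdint.h>

#define CLIENT_MASK   0xE0000000u
#define CLIENT_SHIFT  29   /* hypothetical name: bit position of the client field */

/* Opcode values copied from the i915_reg.h hunk above. */
#define PIPELINE_SELECT ((0x3u<<29)|(0x1u<<27)|(0x1u<<24)|(0x4u<<16))
#define COLOR_BLT       ((0x2u<<29)|(0x40u<<22))

/* Hypothetical helper: return the 3-bit client field of a command header. */
static inline uint32_t cmd_client(uint32_t header)
{
	return (header & CLIENT_MASK) >> CLIENT_SHIFT;
}
```

With these definitions, cmd_client(PIPELINE_SELECT) yields 0x3 and
cmd_client(COLOR_BLT) yields 0x2, while an MI command such as MI_NOOP yields 0,
matching MI_CLIENT.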


* [PATCH 04/13] drm/i915: Reject privileged commands
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (2 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 03/13] drm/i915: Initial command parser table definitions bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-02-05 15:22     ` Jani Nikula
  2014-01-29 21:55   ` [PATCH 05/13] drm/i915: Allow some privileged commands from master bradley.d.volkin
                     ` (10 subsequent siblings)
  14 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

The spec defines most of these commands as privileged. A few others,
like the semaphore mbox command and some display commands, are also
reserved for the driver's use. Subsequent patches relax some of
these restrictions.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
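Reader's note, not part of the patch: a minimal sketch of what switching a
table entry from S to R is meant to achieve, assuming the parser looks up each
command by its opcode bits and refuses the batch when the matching descriptor
carries the reject flag. DESC_SKIP, DESC_REJECT, MI_OPCODE_MASK, struct cmd_desc
and check_cmd() are hypothetical stand-ins, and letting unlisted commands pass
is an assumption of this sketch rather than a statement about the driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (flags))
#define MI_OPCODE_MASK 0xFF800000u  /* assumed: MI opcode occupies bits 31:23 */

#define DESC_SKIP   (1u << 0)       /* hypothetical stand-in for the S column */
#define DESC_REJECT (1u << 1)       /* hypothetical stand-in for the R column */

struct cmd_desc {
	uint32_t opcode;  /* opcode bits of the command header */
	uint32_t flags;   /* DESC_* bits */
};

static const struct cmd_desc example_cmds[] = {
	{ MI_INSTR(0x00, 0), DESC_SKIP   },  /* MI_NOOP */
	{ MI_INSTR(0x23, 0), DESC_REJECT },  /* MI_UPDATE_GTT */
};

/* Return 0 if the command may stay in the batch, -EINVAL if it is rejected. */
static int check_cmd(const struct cmd_desc *table, size_t count,
		     uint32_t header)
{
	for (size_t i = 0; i < count; i++) {
		if ((header & MI_OPCODE_MASK) != table[i].opcode)
			continue;
		return (table[i].flags & DESC_REJECT) ? -EINVAL : 0;
	}
	return 0;  /* assumption of this sketch: unlisted commands pass */
}
```

Only the opcode bits take part in the match, so the length or flag bits in the
low part of the header do not let a rejected command slip through.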
 drivers/gpu/drm/i915/i915_cmd_parser.c | 54 ++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_reg.h        | 31 +++++++++----------
 2 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 2e27bad..cc2f68c 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -57,27 +57,27 @@
 static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
 	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      S  ),
-	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      S  ),
+	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      R  ),
 	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
 	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
 	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
-	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   S  ),
-	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   S  ),
-	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   S  ),
-	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   S  ),
-	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
+	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
+	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   R  ),
+	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   R  ),
+	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
 };
 
 static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
-	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
 	CMD(  MI_TOPOLOGY_FILTER,               SMI,    F,  1,      S  ),
-	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
-	CMD(  MI_SET_CONTEXT,                   SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
+	CMD(  MI_SET_CONTEXT,                   SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_URB_CLEAR,                     SMI,   !F,  0xFF,   S  ),
-	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
 	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
@@ -92,7 +92,9 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
 	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
 	CMD(  MI_RS_CONTEXT,                    SMI,    F,  1,      S  ),
-	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_RS_STORE_DATA_IMM,             SMI,   !F,  0xFF,   S  ),
 	CMD(  MI_LOAD_URB_MEM,                  SMI,   !F,  0xFF,   S  ),
 	CMD(  MI_STORE_URB_MEM,                 SMI,   !F,  0xFF,   S  ),
@@ -107,8 +109,9 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 };
 
 static const struct drm_i915_cmd_descriptor video_cmds[] = {
-	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
 	/*
 	 * MFX_WAIT doesn't fit the way we handle length for most commands.
@@ -119,18 +122,25 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 };
 
 static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
-	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
+	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
 };
 
 static const struct drm_i915_cmd_descriptor blt_cmds[] = {
-	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
+	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
 	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
 
+static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
+	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
+};
+
 #undef CMD
 #undef SMI
 #undef S3D
@@ -169,6 +179,12 @@ static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
 	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
 };
 
+static const struct drm_i915_cmd_table hsw_blt_ring_cmds[] = {
+	{ common_cmds, ARRAY_SIZE(common_cmds) },
+	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
+	{ hsw_blt_cmds, ARRAY_SIZE(hsw_blt_cmds) },
+};
+
 #define CLIENT_MASK      0xE0000000
 #define SUBCLIENT_MASK   0x18000000
 #define MI_CLIENT        0x00000000
@@ -305,8 +321,14 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
 		break;
 	case BCS:
-		ring->cmd_tables = gen7_blt_cmds;
-		ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
+		if (IS_HASWELL(ring->dev)) {
+			ring->cmd_tables = hsw_blt_ring_cmds;
+			ring->cmd_table_count = ARRAY_SIZE(hsw_blt_ring_cmds);
+		} else {
+			ring->cmd_tables = gen7_blt_cmds;
+			ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
+		}
+
 		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
 		break;
 	case VECS:
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 13ed6ed..2b7c26e 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -339,21 +339,22 @@
 /*
  * Commands used only by the command parser
  */
-#define MI_SET_PREDICATE       MI_INSTR(0x01, 0)
-#define MI_ARB_CHECK           MI_INSTR(0x05, 0)
-#define MI_RS_CONTROL          MI_INSTR(0x06, 0)
-#define MI_URB_ATOMIC_ALLOC    MI_INSTR(0x09, 0)
-#define MI_PREDICATE           MI_INSTR(0x0C, 0)
-#define MI_RS_CONTEXT          MI_INSTR(0x0F, 0)
-#define MI_TOPOLOGY_FILTER     MI_INSTR(0x0D, 0)
-#define MI_URB_CLEAR           MI_INSTR(0x19, 0)
-#define MI_UPDATE_GTT          MI_INSTR(0x23, 0)
-#define MI_CLFLUSH             MI_INSTR(0x27, 0)
-#define MI_LOAD_REGISTER_MEM   MI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REG   MI_INSTR(0x2A, 0)
-#define MI_RS_STORE_DATA_IMM   MI_INSTR(0x2B, 0)
-#define MI_LOAD_URB_MEM        MI_INSTR(0x2C, 0)
-#define MI_STORE_URB_MEM       MI_INSTR(0x2D, 0)
+#define MI_SET_PREDICATE        MI_INSTR(0x01, 0)
+#define MI_ARB_CHECK            MI_INSTR(0x05, 0)
+#define MI_RS_CONTROL           MI_INSTR(0x06, 0)
+#define MI_URB_ATOMIC_ALLOC     MI_INSTR(0x09, 0)
+#define MI_PREDICATE            MI_INSTR(0x0C, 0)
+#define MI_RS_CONTEXT           MI_INSTR(0x0F, 0)
+#define MI_TOPOLOGY_FILTER      MI_INSTR(0x0D, 0)
+#define MI_LOAD_SCAN_LINES_EXCL MI_INSTR(0x13, 0)
+#define MI_URB_CLEAR            MI_INSTR(0x19, 0)
+#define MI_UPDATE_GTT           MI_INSTR(0x23, 0)
+#define MI_CLFLUSH              MI_INSTR(0x27, 0)
+#define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
+#define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
+#define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
+#define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
+#define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
 #define MI_CONDITIONAL_BATCH_BUFFER_END MI_INSTR(0x36, 0)
 
 #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
-- 
1.8.5.2


* [PATCH 05/13] drm/i915: Allow some privileged commands from master
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (3 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 04/13] drm/i915: Reject privileged commands bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-01-29 21:55   ` [PATCH 06/13] drm/i915: Add register whitelists for mesa bradley.d.volkin
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

The Intel DDX uses these to implement scanline waits in the X server.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
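Reader's note, not part of the patch: a minimal sketch of the difference
between the R and M columns, assuming M means "allowed only when the submitting
client is the DRM master". DESC_REJECT, DESC_MASTER and cmd_permitted() are
hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_REJECT (1u << 0)  /* hypothetical stand-in for the R column */
#define DESC_MASTER (1u << 1)  /* hypothetical stand-in for the M column */

/* Decide whether a command with the given descriptor flags may be used. */
static bool cmd_permitted(uint32_t flags, bool is_master)
{
	if (flags & DESC_REJECT)
		return false;           /* never allowed from userspace */
	if ((flags & DESC_MASTER) && !is_master)
		return false;           /* allowed, but only for the master */
	return true;
}
```

Under this reading, MI_WAIT_FOR_EVENT and MI_LOAD_SCAN_LINES_INCL become usable
by the X server while remaining rejected for every other client.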
 drivers/gpu/drm/i915/i915_cmd_parser.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index cc2f68c..88456638 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -57,7 +57,7 @@
 static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
 	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      S  ),
-	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      R  ),
+	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      M  ),
 	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
 	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
 	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
@@ -92,7 +92,7 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
 	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
 	CMD(  MI_RS_CONTEXT,                    SMI,    F,  1,      S  ),
-	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   M  ),
 	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
 	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_RS_STORE_DATA_IMM,             SMI,   !F,  0xFF,   S  ),
@@ -137,7 +137,7 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 };
 
 static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
-	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   M  ),
 	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
 };
 
-- 
1.8.5.2


* [PATCH 06/13] drm/i915: Add register whitelists for mesa
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (4 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 05/13] drm/i915: Allow some privileged commands from master bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-02-05 15:29     ` Jani Nikula
  2014-01-29 21:55   ` [PATCH 07/13] drm/i915: Add register whitelist for DRM master bradley.d.volkin
                     ` (8 subsequent siblings)
  14 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

These registers are currently used by mesa for blitting,
transform feedback extensions, and performance monitoring
extensions.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
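Reader's note, not part of the patch: because the whitelists are kept sorted by
offset, a lookup can be done with a binary search. The sketch below assumes
that approach; the patch itself only defines the tables. It also shows why both
halves of a 64-bit counter need an entry. The two register offsets are copied
from the i915_reg.h hunk below; example_regs and reg_whitelisted() are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HS_INVOCATION_COUNT 0x2300
#define DS_INVOCATION_COUNT 0x2308

/* Sorted by increasing offset; each 64-bit register contributes two entries. */
static const uint32_t example_regs[] = {
	HS_INVOCATION_COUNT,
	HS_INVOCATION_COUNT + sizeof(uint32_t),
	DS_INVOCATION_COUNT,
	DS_INVOCATION_COUNT + sizeof(uint32_t),
};

/* Binary search over a sorted whitelist; true if addr is listed. */
static bool reg_whitelisted(const uint32_t *table, size_t count, uint32_t addr)
{
	size_t lo = 0, hi = count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid] == addr)
			return true;
		if (table[mid] < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return false;
}
```

A 32-bit access to the upper half of HS_INVOCATION_COUNT uses offset 0x2304,
which is found only because the table lists it explicitly.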
 drivers/gpu/drm/i915/i915_cmd_parser.c | 55 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h        | 20 +++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 88456638..18d5b05 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -185,6 +185,55 @@ static const struct drm_i915_cmd_table hsw_blt_ring_cmds[] = {
 	{ hsw_blt_cmds, ARRAY_SIZE(hsw_blt_cmds) },
 };
 
+/*
+ * Register whitelists, sorted by increasing register offset.
+ *
+ * Some registers that userspace accesses are 64 bits. The register
+ * access commands only allow 32-bit accesses. Hence, we have to include
+ * entries for both halves of the 64-bit registers.
+ */
+
+static const u32 gen7_render_regs[] = {
+	HS_INVOCATION_COUNT,
+	HS_INVOCATION_COUNT + sizeof(u32),
+	DS_INVOCATION_COUNT,
+	DS_INVOCATION_COUNT + sizeof(u32),
+	IA_VERTICES_COUNT,
+	IA_VERTICES_COUNT + sizeof(u32),
+	IA_PRIMITIVES_COUNT,
+	IA_PRIMITIVES_COUNT + sizeof(u32),
+	VS_INVOCATION_COUNT,
+	VS_INVOCATION_COUNT + sizeof(u32),
+	GS_INVOCATION_COUNT,
+	GS_INVOCATION_COUNT + sizeof(u32),
+	GS_PRIMITIVES_COUNT,
+	GS_PRIMITIVES_COUNT + sizeof(u32),
+	CL_INVOCATION_COUNT,
+	CL_INVOCATION_COUNT + sizeof(u32),
+	CL_PRIMITIVES_COUNT,
+	CL_PRIMITIVES_COUNT + sizeof(u32),
+	PS_INVOCATION_COUNT,
+	PS_INVOCATION_COUNT + sizeof(u32),
+	PS_DEPTH_COUNT,
+	PS_DEPTH_COUNT + sizeof(u32),
+	GEN7_SO_NUM_PRIMS_WRITTEN(0),
+	GEN7_SO_NUM_PRIMS_WRITTEN(0) + sizeof(u32),
+	GEN7_SO_NUM_PRIMS_WRITTEN(1),
+	GEN7_SO_NUM_PRIMS_WRITTEN(1) + sizeof(u32),
+	GEN7_SO_NUM_PRIMS_WRITTEN(2),
+	GEN7_SO_NUM_PRIMS_WRITTEN(2) + sizeof(u32),
+	GEN7_SO_NUM_PRIMS_WRITTEN(3),
+	GEN7_SO_NUM_PRIMS_WRITTEN(3) + sizeof(u32),
+	GEN7_SO_WRITE_OFFSET(0),
+	GEN7_SO_WRITE_OFFSET(1),
+	GEN7_SO_WRITE_OFFSET(2),
+	GEN7_SO_WRITE_OFFSET(3),
+};
+
+static const u32 gen7_blt_regs[] = {
+	BCS_SWCTRL,
+};
+
 #define CLIENT_MASK      0xE0000000
 #define SUBCLIENT_MASK   0x18000000
 #define MI_CLIENT        0x00000000
@@ -313,6 +362,9 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 			ring->cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
 		}
 
+		ring->reg_table = gen7_render_regs;
+		ring->reg_count = ARRAY_SIZE(gen7_render_regs);
+
 		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
 		break;
 	case VCS:
@@ -329,6 +381,9 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 			ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
 		}
 
+		ring->reg_table = gen7_blt_regs;
+		ring->reg_count = ARRAY_SIZE(gen7_blt_regs);
+
 		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
 		break;
 	case VECS:
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 2b7c26e..b99bacf 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -385,6 +385,26 @@
 #define SRC_COPY_BLT  ((0x2<<29)|(0x43<<22))
 
 /*
+ * Registers used only by the command parser
+ */
+#define BCS_SWCTRL 0x22200
+
+#define HS_INVOCATION_COUNT 0x2300
+#define DS_INVOCATION_COUNT 0x2308
+#define IA_VERTICES_COUNT   0x2310
+#define IA_PRIMITIVES_COUNT 0x2318
+#define VS_INVOCATION_COUNT 0x2320
+#define GS_INVOCATION_COUNT 0x2328
+#define GS_PRIMITIVES_COUNT 0x2330
+#define CL_INVOCATION_COUNT 0x2338
+#define CL_PRIMITIVES_COUNT 0x2340
+#define PS_INVOCATION_COUNT 0x2348
+#define PS_DEPTH_COUNT      0x2350
+
+/* These are the 4 64-bit counter registers, one for each stream output */
+#define GEN7_SO_NUM_PRIMS_WRITTEN(n) (0x5200 + (n) * 8)
+
+/*
  * Reset registers
  */
 #define DEBUG_RESET_I830		0x6070
-- 
1.8.5.2


* [PATCH 07/13] drm/i915: Add register whitelist for DRM master
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (5 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 06/13] drm/i915: Add register whitelists for mesa bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-01-29 22:37     ` Chris Wilson
  2014-01-29 21:55   ` [PATCH 08/13] drm/i915: Enable register whitelist checks bradley.d.volkin
                     ` (7 subsequent siblings)
  14 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

These registers are used to implement scanline waits in the X server.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
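Reader's note, not part of the patch: a minimal sketch of how a second,
master-only whitelist could be consulted, assuming the regular table is tried
first and the master table is tried only for the DRM master. The offsets are
copied from this series; user_regs, master_regs, in_table() and reg_allowed()
are hypothetical names, and the lookup order is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const uint32_t user_regs[]   = { 0x22200 };        /* BCS_SWCTRL */
static const uint32_t master_regs[] = { 0xa188, 0x44050 }; /* FORCEWAKE_MT, DERRMR */

/* Linear scan; the tables in this sketch are tiny. */
static bool in_table(const uint32_t *table, size_t count, uint32_t addr)
{
	for (size_t i = 0; i < count; i++)
		if (table[i] == addr)
			return true;
	return false;
}

/* Any client may touch user_regs; only the master may touch master_regs. */
static bool reg_allowed(uint32_t addr, bool is_master)
{
	if (in_table(user_regs, ARRAY_SIZE(user_regs), addr))
		return true;
	return is_master &&
	       in_table(master_regs, ARRAY_SIZE(master_regs), addr);
}
```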
 drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 18d5b05..296e322 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -234,6 +234,20 @@ static const u32 gen7_blt_regs[] = {
 	BCS_SWCTRL,
 };
 
+/* Whitelists for the DRM master. Magic numbers are taken from sna, to match. */
+static const u32 ivb_master_regs[] = {
+	0xa188, /* FORCEWAKE_MT */
+	0x44050, /* DERRMR */
+	0x70068,
+	0x71068,
+	0x72068,
+};
+
+static const u32 hsw_master_regs[] = {
+	0xa188, /* FORCEWAKE_MT */
+	0x44050, /* DERRMR */
+};
+
 #define CLIENT_MASK      0xE0000000
 #define SUBCLIENT_MASK   0x18000000
 #define MI_CLIENT        0x00000000
@@ -365,6 +379,14 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 		ring->reg_table = gen7_render_regs;
 		ring->reg_count = ARRAY_SIZE(gen7_render_regs);
 
+		if (IS_HASWELL(ring->dev)) {
+			ring->master_reg_table = hsw_master_regs;
+			ring->master_reg_count = ARRAY_SIZE(hsw_master_regs);
+		} else {
+			ring->master_reg_table = ivb_master_regs;
+			ring->master_reg_count = ARRAY_SIZE(ivb_master_regs);
+		}
+
 		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
 		break;
 	case VCS:
@@ -384,6 +406,14 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
 		ring->reg_table = gen7_blt_regs;
 		ring->reg_count = ARRAY_SIZE(gen7_blt_regs);
 
+		if (IS_HASWELL(ring->dev)) {
+			ring->master_reg_table = hsw_master_regs;
+			ring->master_reg_count = ARRAY_SIZE(hsw_master_regs);
+		} else {
+			ring->master_reg_table = ivb_master_regs;
+			ring->master_reg_count = ARRAY_SIZE(ivb_master_regs);
+		}
+
 		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
 		break;
 	case VECS:
-- 
1.8.5.2


* [PATCH 08/13] drm/i915: Enable register whitelist checks
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (6 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 07/13] drm/i915: Add register whitelist for DRM master bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-02-05 15:33     ` Jani Nikula
  2014-01-29 21:55   ` [PATCH 09/13] drm/i915: Reject commands that explicitly generate interrupts bradley.d.volkin
                     ` (6 subsequent siblings)
  14 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

MI_STORE_REGISTER_MEM, MI_LOAD_REGISTER_MEM, and MI_LOAD_REGISTER_IMM
commands allow userspace access to registers. Only certain registers
should be allowed for such access, so enable checking for those commands.
Each ring gets its own register whitelist.

MI_LOAD_REGISTER_REG on HSW also allows register access but is currently
unused by userspace components. Leave it rejected.

PIPE_CONTROL and MEDIA_VFE_STATE allow register access when certain
bits are set. Reject those commands when any of those bits is set.

OTC-Tracker: AXIA-4631
Change-Id: Ie614a2f0eb2e5917de809e5a17957175d24cc44f
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
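Reader's note, not part of the patch: a minimal sketch of how the .reg and
.bits annotations added below could be evaluated, assuming .offset is a dword
index into the command and that a bitmask check passes when the masked dword
equals .expected. struct reg_field, struct bits_check, cmd_reg_addr() and
bits_ok() are hypothetical names; the mask values are copied from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

struct reg_field {
	uint32_t offset;    /* dword index holding the register address */
	uint32_t mask;      /* bits of that dword that form the address */
};

struct bits_check {
	uint32_t offset;    /* dword index to inspect */
	uint32_t mask;      /* bits to test */
	uint32_t expected;  /* required value of the masked bits */
};

/* Extract the register address that a command refers to. */
static uint32_t cmd_reg_addr(const uint32_t *cmd, const struct reg_field *reg)
{
	return cmd[reg->offset] & reg->mask;
}

/* True if the masked bits of the selected dword have the expected value. */
static bool bits_ok(const uint32_t *cmd, const struct bits_check *check)
{
	return (cmd[check->offset] & check->mask) == check->expected;
}
```

For MI_LOAD_REGISTER_IMM the address extracted with mask 0x007FFFFC would then
be looked up in the ring's whitelist. For PIPE_CONTROL, a check with mask
PIPE_CONTROL_MMIO_WRITE and expected value 0 fails as soon as that bit is set
in dword 1.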
 drivers/gpu/drm/i915/i915_cmd_parser.c | 23 ++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_reg.h        |  3 +++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 296e322..5d3e303 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -63,9 +63,12 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
 	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   R  ),
+	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
+	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
+	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
 	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
 };
 
@@ -82,9 +85,23 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
 	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
 	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
+	CMD(  MEDIA_VFE_STATE,			S3D,   !F,  0xFFFF, B,
+	      .bits = {{
+			.offset = 2,
+			.mask = MEDIA_VFE_STATE_MMIO_ACCESS_MASK,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  GPGPU_OBJECT,                     S3D,   !F,  0xFF,   S  ),
 	CMD(  GPGPU_WALKER,                     S3D,   !F,  0xFF,   S  ),
 	CMD(  GFX_OP_3DSTATE_SO_DECL_LIST,      S3D,   !F,  0x1FF,  S  ),
+	CMD(  GFX_OP_PIPE_CONTROL(5),           S3D,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 1,
+			.mask = PIPE_CONTROL_MMIO_WRITE,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 };
 
 static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index b99bacf..6592d0d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -319,6 +319,7 @@
 #define   DISPLAY_PLANE_B           (1<<20)
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
+#define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
 #define   PIPE_CONTROL_CS_STALL				(1<<20)
 #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
 #define   PIPE_CONTROL_QW_WRITE				(1<<14)
@@ -359,6 +360,8 @@
 
 #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
 #define GFX_OP_3DSTATE_VF_STATISTICS   ((0x3<<29)|(0x1<<27)|(0x0<<24)|(0xB<<16))
+#define MEDIA_VFE_STATE                ((0x3<<29)|(0x2<<27)|(0x0<<24)|(0x0<<16))
+#define  MEDIA_VFE_STATE_MMIO_ACCESS_MASK (0x18)
 #define GPGPU_OBJECT                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x4<<16))
 #define GPGPU_WALKER                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x5<<16))
 #define GFX_OP_3DSTATE_DX9_CONSTANTF_VS \
-- 
1.8.5.2


* [PATCH 09/13] drm/i915: Reject commands that explicitly generate interrupts
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (7 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 08/13] drm/i915: Enable register whitelist checks bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-01-29 21:55   ` [PATCH 10/13] drm/i915: Enable PPGTT command parser checks bradley.d.volkin
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

The driver leaves most interrupts masked during normal operation,
so there would have to be additional work to enable userspace to
safely request/receive an interrupt.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
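Reader's note, not part of the patch: a minimal sketch of the effect of the
notify checks added below, assuming a command is refused whenever any bit in
the descriptor's mask is set. MI_FLUSH_DW_NOTIFY, PIPE_CONTROL_MMIO_WRITE and
PIPE_CONTROL_CS_STALL are copied from this series; the value used for
PIPE_CONTROL_NOTIFY is an assumption about the existing i915_reg.h definition,
which this patch does not show. The two helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MI_FLUSH_DW_NOTIFY       (1u << 8)
#define PIPE_CONTROL_MMIO_WRITE  (1u << 23)
#define PIPE_CONTROL_CS_STALL    (1u << 20)
#define PIPE_CONTROL_NOTIFY      (1u << 8)   /* assumed value, see note above */

/* MI_FLUSH_DW: the notify bit lives in the header (dword 0). */
static bool flush_dw_allowed(uint32_t dw0)
{
	return (dw0 & MI_FLUSH_DW_NOTIFY) == 0;
}

/* PIPE_CONTROL: both forbidden bits live in dword 1. */
static bool pipe_control_allowed(uint32_t dw1)
{
	return (dw1 & (PIPE_CONTROL_MMIO_WRITE | PIPE_CONTROL_NOTIFY)) == 0;
}
```

Combining the two PIPE_CONTROL bits into one mask with an expected value of 0
means either bit alone is enough to fail the check, while unrelated flags such
as PIPE_CONTROL_CS_STALL are unaffected.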
 drivers/gpu/drm/i915/i915_cmd_parser.c | 25 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h        |  1 +
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 5d3e303..7de7c6a 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -56,7 +56,7 @@
 	      ---------------------------------------------------------- */
 static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
-	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      S  ),
+	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      R  ),
 	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      M  ),
 	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
 	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
@@ -98,7 +98,7 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  GFX_OP_PIPE_CONTROL(5),           S3D,   !F,  0xFF,   B,
 	      .bits = {{
 			.offset = 1,
-			.mask = PIPE_CONTROL_MMIO_WRITE,
+			.mask = (PIPE_CONTROL_MMIO_WRITE | PIPE_CONTROL_NOTIFY),
 			.expected = 0
 	      }},
 	      .bits_count = 1					       ),
@@ -129,6 +129,13 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_FLUSH_DW_NOTIFY,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
 	/*
 	 * MFX_WAIT doesn't fit the way we handle length for most commands.
@@ -142,6 +149,13 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_FLUSH_DW_NOTIFY,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
 };
 
@@ -149,6 +163,13 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
+	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_FLUSH_DW_NOTIFY,
+			.expected = 0
+	      }},
+	      .bits_count = 1					       ),
 	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 6592d0d..c2e4898 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -258,6 +258,7 @@
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT		(0<<2)
-- 
1.8.5.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 10/13] drm/i915: Enable PPGTT command parser checks
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (8 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 09/13] drm/i915: Reject commands that explicitly generate interrupts bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-01-29 22:33     ` Chris Wilson
  2014-02-05 15:37     ` Jani Nikula
  2014-01-29 21:55   ` [PATCH 11/13] drm/i915: Reject commands that would store to global HWS page bradley.d.volkin
                     ` (4 subsequent siblings)
  14 siblings, 2 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Various commands that access memory have a bit to determine whether
the graphics address specified in the command should use the GGTT or
PPGTT for translation. These checks ensure that the bit indicates
PPGTT translation.

Most of these checks use the existing bit-checking infrastructure.
The PIPE_CONTROL and MI_FLUSH_DW commands, however, are multi-function
commands. The GGTT/PPGTT bit is only relevant for certain uses of the
command. As such, this change also extends the bit-checking code to
include a "condition" mask and offset. If the condition mask is non-zero
then the parser only performs the bit check when the bits specified by
the condition mask/offset are also non-zero.
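
A minimal, self-contained sketch of the conditional check described above,
for illustration only. `struct bit_check` and `cmd_passes_checks()` are
hypothetical names that mirror the fields of the bits[] array in this patch;
they are not the driver's actual code. The MI_FLUSH_DW constants reuse the
values from i915_reg.h as shown in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in i915_reg.h by this series. */
#define MI_FLUSH_DW_OP_STOREDW	(1u << 14)
#define MI_FLUSH_DW_OP_MASK	(3u << 14)
#define MI_FLUSH_DW_USE_GTT	(1u << 2)

/* Hypothetical, simplified stand-in for one entry of desc->bits[]. */
struct bit_check {
	uint32_t offset;		/* dword holding the bits to test */
	uint32_t mask;
	uint32_t expected;
	uint32_t condition_offset;	/* dword holding the condition bits */
	uint32_t condition_mask;	/* 0 means the check is unconditional */
};

/* Returns 1 if cmd passes every check, 0 if any check rejects it. */
static int cmd_passes_checks(const uint32_t *cmd,
			     const struct bit_check *checks, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		const struct bit_check *c = &checks[i];

		/* Skip the check when none of the condition bits are set. */
		if (c->condition_mask != 0 &&
		    (cmd[c->condition_offset] & c->condition_mask) == 0)
			continue;

		if ((cmd[c->offset] & c->mask) != c->expected)
			return 0;
	}
	return 1;
}

/* Same shape as the GGTT/PPGTT entry added for MI_FLUSH_DW. */
static const struct bit_check flush_dw_ggtt_check = {
	.offset = 1,
	.mask = MI_FLUSH_DW_USE_GTT,
	.expected = 0,
	.condition_offset = 0,
	.condition_mask = MI_FLUSH_DW_OP_MASK,
};

/* Usage example covering the three interesting cases. */
static void flush_dw_example(void)
{
	const uint32_t no_op[2] = { 0, MI_FLUSH_DW_USE_GTT };
	const uint32_t ggtt_store[2] = { MI_FLUSH_DW_OP_STOREDW,
					 MI_FLUSH_DW_USE_GTT };
	const uint32_t ppgtt_store[2] = { MI_FLUSH_DW_OP_STOREDW, 0 };

	assert(cmd_passes_checks(no_op, &flush_dw_ggtt_check, 1) == 1);
	assert(cmd_passes_checks(ggtt_store, &flush_dw_ggtt_check, 1) == 0);
	assert(cmd_passes_checks(ppgtt_store, &flush_dw_ggtt_check, 1) == 1);
}
```

With this entry, an MI_FLUSH_DW without a post-sync operation is accepted
regardless of the address-space bit in dword 1. Once a post-sync operation
is selected, that bit must be clear.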

NOTE: At this point in the series PPGTT must be enabled for the parser
to work correctly. If it's not enabled, userspace will not be setting
the PPGTT bits the way the parser requires. VLV is the only platform
where this is a problem, so at this point, we disable parsing for VLV.

OTC-Tracker: AXIA-4631
Change-Id: I3f4c76b6734f1956ec47e698230f97d0998ff92b
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 147 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_drv.h        |   6 ++
 drivers/gpu/drm/i915/i915_reg.h        |   6 ++
 3 files changed, 144 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 7de7c6a..26072a2 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -65,10 +65,22 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
 	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
-	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W,
-	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
-	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W,
-	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
+	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W | B,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC },
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
+	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W | B,
+	      .reg = { .offset = 1, .mask = 0x007FFFFC },
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
 };
 
@@ -80,9 +92,35 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_SET_CONTEXT,                   SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_URB_CLEAR,                     SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
-	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
+	CMD(  MI_REPORT_PERF_COUNT,             SMI,   !F,  0x3F,   B,
+	      .bits = {{
+			.offset = 1,
+			.mask = MI_REPORT_PERF_COUNT_GGTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
 	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
 	CMD(  MEDIA_VFE_STATE,			S3D,   !F,  0xFFFF, B,
@@ -100,8 +138,15 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 			.offset = 1,
 			.mask = (PIPE_CONTROL_MMIO_WRITE | PIPE_CONTROL_NOTIFY),
 			.expected = 0
+	      },
+	      {
+			.offset = 1,
+		        .mask = PIPE_CONTROL_GLOBAL_GTT_IVB,
+			.expected = 0,
+			.condition_offset = 1,
+			.condition_mask = PIPE_CONTROL_POST_SYNC_OP_MASK
 	      }},
-	      .bits_count = 1					       ),
+	      .bits_count = 2					       ),
 };
 
 static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
@@ -127,16 +172,35 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
 
 static const struct drm_i915_cmd_descriptor video_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
-	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
 	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
 	      .bits = {{
 			.offset = 0,
 			.mask = MI_FLUSH_DW_NOTIFY,
 			.expected = 0
+	      },
+	      {
+			.offset = 1,
+			.mask = MI_FLUSH_DW_USE_GTT,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 1					       ),
-	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
+	      .bits_count = 2                                          ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	/*
 	 * MFX_WAIT doesn't fit the way we handle length for most commands.
 	 * It has a length field but it uses a non-standard length bias.
@@ -147,29 +211,61 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 
 static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
 	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
-	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
 	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
 	      .bits = {{
 			.offset = 0,
 			.mask = MI_FLUSH_DW_NOTIFY,
 			.expected = 0
+	      },
+	      {
+			.offset = 1,
+			.mask = MI_FLUSH_DW_USE_GTT,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 1					       ),
-	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
+	      .bits_count = 2					       ),
+	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 };
 
 static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
+	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  B,
+	      .bits = {{
+			.offset = 0,
+			.mask = MI_GLOBAL_GTT,
+			.expected = 0
+	      }},
+	      .bits_count = 1                                          ),
 	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
 	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
 	      .bits = {{
 			.offset = 0,
 			.mask = MI_FLUSH_DW_NOTIFY,
 			.expected = 0
+	      },
+	      {
+			.offset = 1,
+			.mask = MI_FLUSH_DW_USE_GTT,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 1					       ),
+	      .bits_count = 2					       ),
 	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
@@ -569,10 +665,21 @@ finish:
 
 int i915_needs_cmd_parser(struct intel_ring_buffer *ring)
 {
+	drm_i915_private_t *dev_priv =
+		(drm_i915_private_t *)ring->dev->dev_private;
+
 	/* No command tables indicates a platform without parsing */
 	if (!ring->cmd_tables)
 		return 0;
 
+	/*
+	 * XXX: VLV is Gen7 and therefore has cmd_tables, but has PPGTT
+	 * disabled. That will cause all of the parser's PPGTT checks to
+	 * fail. For now, disable parsing when PPGTT is off.
+	 */
+	if (!dev_priv->mm.aliasing_ppgtt)
+		return 0;
+
 	return i915.enable_cmd_parser;
 }
 
@@ -675,6 +782,16 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
 				u32 dword = cmd[desc->bits[i].offset] &
 					desc->bits[i].mask;
 
+				if (desc->bits[i].condition_mask != 0) {
+					u32 offset =
+						desc->bits[i].condition_offset;
+					u32 condition = cmd[offset] &
+						desc->bits[i].condition_mask;
+
+					if (condition == 0)
+						continue;
+				}
+
 				if (dword != desc->bits[i].expected) {
 					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
 							 *cmd,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8aed80f..2d1d2ef 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1829,11 +1829,17 @@ struct drm_i915_cmd_descriptor {
 	 * compared against an expected value. If the command does not match
 	 * the expected value, the parser rejects it. Only valid if flags has
 	 * the CMD_DESC_BITMASK bit set.
+	 *
+	 * If the check specifies a non-zero condition_mask then the parser
+	 * only performs the check when the bits specified by condition_mask
+	 * are non-zero.
 	 */
 	struct {
 		u32 offset;
 		u32 mask;
 		u32 expected;
+		u32 condition_offset;
+		u32 condition_mask;
 	} bits[MAX_CMD_DESC_BITMASKS];
 	/** Number of valid entries in the bits array */
 	int bits_count;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c2e4898..ff263f4 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -179,6 +179,8 @@
  * Memory interface instructions used by the kernel
  */
 #define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
+/* Many MI commands use bit 22 of the header dword for GGTT vs PPGTT */
+#define  MI_GLOBAL_GTT    (1<<22)
 
 #define MI_NOOP			MI_INSTR(0, 0)
 #define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)
@@ -258,6 +260,7 @@
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_FLUSH_DW_OP_MASK		(3<<14)
 #define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
@@ -324,6 +327,7 @@
 #define   PIPE_CONTROL_CS_STALL				(1<<20)
 #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
 #define   PIPE_CONTROL_QW_WRITE				(1<<14)
+#define   PIPE_CONTROL_POST_SYNC_OP_MASK                (3<<14)
 #define   PIPE_CONTROL_DEPTH_STALL			(1<<13)
 #define   PIPE_CONTROL_WRITE_FLUSH			(1<<12)
 #define   PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH	(1<<12) /* gen6+ */
@@ -352,6 +356,8 @@
 #define MI_URB_CLEAR            MI_INSTR(0x19, 0)
 #define MI_UPDATE_GTT           MI_INSTR(0x23, 0)
 #define MI_CLFLUSH              MI_INSTR(0x27, 0)
+#define MI_REPORT_PERF_COUNT    MI_INSTR(0x28, 0)
+#define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
 #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
 #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
 #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
-- 
1.8.5.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 11/13] drm/i915: Reject commands that would store to global HWS page
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (9 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 10/13] drm/i915: Enable PPGTT command parser checks bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-02-05 15:39     ` Jani Nikula
  2014-01-29 21:55   ` [PATCH 12/13] drm/i915: Add a CMD_PARSER_VERSION getparam bradley.d.volkin
                     ` (3 subsequent siblings)
  14 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

PIPE_CONTROL and MI_FLUSH_DW have bits that would write to the
hardware status page. The driver stores request tracking info
there, so don't let userspace overwrite it.
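
As a rough illustration of the PIPE_CONTROL case, the sketch below applies
the combined mask to dword 1 only when a post-sync operation is selected.
`pipe_control_dw1_allowed()` is a hypothetical helper, not a function in the
driver, and it covers only this conditional entry, not the separate
MMIO_WRITE/NOTIFY check. The bit values are the ones from i915_reg.h in
this series.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in i915_reg.h by this series. */
#define PIPE_CONTROL_GLOBAL_GTT_IVB	(1u << 24)
#define PIPE_CONTROL_STORE_DATA_INDEX	(1u << 21)
#define PIPE_CONTROL_QW_WRITE		(1u << 14)
#define PIPE_CONTROL_POST_SYNC_OP_MASK	(3u << 14)

/* Hypothetical helper: returns 1 if dword 1 of a PIPE_CONTROL is acceptable. */
static int pipe_control_dw1_allowed(uint32_t dw1)
{
	const uint32_t forbidden = PIPE_CONTROL_GLOBAL_GTT_IVB |
				   PIPE_CONTROL_STORE_DATA_INDEX;

	/* Without a post-sync operation nothing is written, so skip the check. */
	if ((dw1 & PIPE_CONTROL_POST_SYNC_OP_MASK) == 0)
		return 1;

	/* expected == 0: neither forbidden bit may be set. */
	return (dw1 & forbidden) == 0;
}

/* Usage example: a plain qword write passes, an indexed store is rejected. */
static void pipe_control_example(void)
{
	assert(pipe_control_dw1_allowed(PIPE_CONTROL_QW_WRITE) == 1);
	assert(pipe_control_dw1_allowed(PIPE_CONTROL_QW_WRITE |
					PIPE_CONTROL_STORE_DATA_INDEX) == 0);
}
```

Because both bits share one mask with an expected value of 0, setting either
of them alongside a post-sync operation is enough to reject the command.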

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_reg.h        |  1 +
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 26072a2..b93df1c 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -141,7 +141,8 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
 	      },
 	      {
 			.offset = 1,
-		        .mask = PIPE_CONTROL_GLOBAL_GTT_IVB,
+		        .mask = (PIPE_CONTROL_GLOBAL_GTT_IVB |
+				 PIPE_CONTROL_STORE_DATA_INDEX),
 			.expected = 0,
 			.condition_offset = 1,
 			.condition_mask = PIPE_CONTROL_POST_SYNC_OP_MASK
@@ -192,8 +193,15 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
 			.expected = 0,
 			.condition_offset = 0,
 			.condition_mask = MI_FLUSH_DW_OP_MASK
+	      },
+	      {
+			.offset = 0,
+			.mask = MI_FLUSH_DW_STORE_INDEX,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 2                                          ),
+	      .bits_count = 3                                          ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
 	      .bits = {{
 			.offset = 0,
@@ -231,8 +239,15 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
 			.expected = 0,
 			.condition_offset = 0,
 			.condition_mask = MI_FLUSH_DW_OP_MASK
+	      },
+	      {
+			.offset = 0,
+			.mask = MI_FLUSH_DW_STORE_INDEX,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 2					       ),
+	      .bits_count = 3					       ),
 	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
 	      .bits = {{
 			.offset = 0,
@@ -264,8 +279,15 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
 			.expected = 0,
 			.condition_offset = 0,
 			.condition_mask = MI_FLUSH_DW_OP_MASK
+	      },
+	      {
+			.offset = 0,
+			.mask = MI_FLUSH_DW_STORE_INDEX,
+			.expected = 0,
+			.condition_offset = 0,
+			.condition_mask = MI_FLUSH_DW_OP_MASK
 	      }},
-	      .bits_count = 2					       ),
+	      .bits_count = 3					       ),
 	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
 	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
 };
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index ff263f4..5f77cb6 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -324,6 +324,7 @@
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
 #define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
+#define   PIPE_CONTROL_STORE_DATA_INDEX			(1<<21)
 #define   PIPE_CONTROL_CS_STALL				(1<<20)
 #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
 #define   PIPE_CONTROL_QW_WRITE				(1<<14)
-- 
1.8.5.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 12/13] drm/i915: Add a CMD_PARSER_VERSION getparam
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (10 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 11/13] drm/i915: Reject commands that would store to global HWS page bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-01-30  9:19     ` Daniel Vetter
  2014-01-29 21:55   ` [PATCH 13/13] drm/i915: Enable command parsing by default bradley.d.volkin
                     ` (2 subsequent siblings)
  14 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Add an I915_PARAM_CMD_PARSER_VERSION getparam so that userspace can
query the kernel for command parser support.

OTC-Tracker: AXIA-4631
Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c | 4 ++++
 include/uapi/drm/i915_drm.h     | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 258b1be..34ba199 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1013,6 +1013,10 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
 		value = 1;
 		break;
+	case I915_PARAM_CMD_PARSER_VERSION:
+		/* TODO: version info (e.g. what is allowed?) */
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 126bfaa..8a3e4ef00 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -337,6 +337,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
+#define I915_PARAM_CMD_PARSER_VERSION	 28
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
1.8.5.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 13/13] drm/i915: Enable command parsing by default
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (11 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 12/13] drm/i915: Add a CMD_PARSER_VERSION getparam bradley.d.volkin
@ 2014-01-29 21:55   ` bradley.d.volkin
  2014-01-29 22:11   ` [PATCH 00/13] Gen7 batch buffer command parser Daniel Vetter
  2014-02-05 15:41   ` Jani Nikula
  14 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:55 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

OTC-Tracker: AXIA-4631
Change-Id: I6747457e1fe7494bd42787af51198fcba398ad78
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_params.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 6d3d906..981b635 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -47,7 +47,7 @@ struct i915_params i915 __read_mostly = {
 	.prefault_disable = 0,
 	.reset = true,
 	.invert_brightness = 0,
-	.enable_cmd_parser = 0
+	.enable_cmd_parser = 1
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -157,4 +157,4 @@ MODULE_PARM_DESC(invert_brightness,
 
 module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
 MODULE_PARM_DESC(enable_cmd_parser,
-		"Enable command parsing (default: false)");
+		"Enable command parsing (default: true)");
-- 
1.8.5.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH] intel: Merge i915_drm.h with cmd parser define
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (23 preceding siblings ...)
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
@ 2014-01-29 21:57 ` bradley.d.volkin
  2014-01-29 22:13   ` Chris Wilson
  2014-01-29 21:58 ` [PATCH 1/6] tests: Add a test for the command parser bradley.d.volkin
  2014-02-05 10:28 ` [RFC 00/22] Gen7 batch buffer command parser Chris Wilson
  26 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:57 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 include/drm/i915_drm.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 2f4eb8c..ba863c4 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -27,7 +27,7 @@
 #ifndef _I915_DRM_H_
 #define _I915_DRM_H_
 
-#include <drm.h>
+#include <drm/drm.h>
 
 /* Please note that modifications to all structs defined here are
  * subject to backwards-compatibility constraints.
@@ -337,6 +337,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
+#define I915_PARAM_CMD_PARSER_VERSION	 28
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -721,7 +722,7 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_IS_PINNED		(1<<10)
 
-/** Provide a hint to the kernel that the command stream and auxilliary
+/** Provide a hint to the kernel that the command stream and auxiliary
  * state buffers already holds the correct presumed addresses and so the
  * relocation process may be skipped if no buffers need to be moved in
  * preparation for the execbuffer.
-- 
1.8.5.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 1/6] tests: Add a test for the command parser
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (24 preceding siblings ...)
  2014-01-29 21:57 ` [PATCH] intel: Merge i915_drm.h with cmd parser define bradley.d.volkin
@ 2014-01-29 21:58 ` bradley.d.volkin
  2014-01-29 21:58   ` [PATCH 2/6] tests/gem_exec_parse: Add tests for rejected commands bradley.d.volkin
                     ` (4 more replies)
  2014-02-05 10:28 ` [RFC 00/22] Gen7 batch buffer command parser Chris Wilson
  26 siblings, 5 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:58 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Start with a simple testcase that should pass.

v2: Switch to I915_PARAM_CMD_PARSER_VERSION

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 tests/.gitignore       |   1 +
 tests/Makefile.sources |   1 +
 tests/gem_exec_parse.c | 140 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 142 insertions(+)
 create mode 100644 tests/gem_exec_parse.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 7377275..f2356fb 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -35,6 +35,7 @@ gem_exec_blt
 gem_exec_faulting_reloc
 gem_exec_lut_handle
 gem_exec_nop
+gem_exec_parse
 gem_fd_exhaustion
 gem_fenced_exec_thrash
 gem_fence_thrash
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index a8c0c96..90a5322 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -29,6 +29,7 @@ TESTS_progs_M = \
 	gem_exec_bad_domains \
 	gem_exec_faulting_reloc \
 	gem_exec_nop \
+	gem_exec_parse \
 	gem_fenced_exec_thrash \
 	gem_fence_thrash \
 	gem_flink \
diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
new file mode 100644
index 0000000..c71e478
--- /dev/null
+++ b/tests/gem_exec_parse.c
@@ -0,0 +1,140 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+
+#ifndef I915_PARAM_CMD_PARSER_VERSION
+#define I915_PARAM_CMD_PARSER_VERSION       28
+#endif
+
+static int exec_batch_patched(int fd, uint32_t cmd_bo, uint32_t *cmds,
+			      int size, int patch_offset, uint64_t expected_value)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 objs[2];
+	struct drm_i915_gem_relocation_entry reloc[1];
+
+	uint32_t target_bo = gem_create(fd, 4096);
+	uint64_t actual_value = 0;
+
+	gem_write(fd, cmd_bo, 0, cmds, size);
+
+	reloc[0].offset = patch_offset;
+	reloc[0].delta = 0;
+	reloc[0].target_handle = target_bo;
+	reloc[0].read_domains = I915_GEM_DOMAIN_RENDER;
+	reloc[0].write_domain = I915_GEM_DOMAIN_RENDER;
+	reloc[0].presumed_offset = 0;
+
+	objs[0].handle = target_bo;
+	objs[0].relocation_count = 0;
+	objs[0].relocs_ptr = 0;
+	objs[0].alignment = 0;
+	objs[0].offset = 0;
+	objs[0].flags = 0;
+	objs[0].rsvd1 = 0;
+	objs[0].rsvd2 = 0;
+
+	objs[1].handle = cmd_bo;
+	objs[1].relocation_count = 1;
+	objs[1].relocs_ptr = (uintptr_t)reloc;
+	objs[1].alignment = 0;
+	objs[1].offset = 0;
+	objs[1].flags = 0;
+	objs[1].rsvd1 = 0;
+	objs[1].rsvd2 = 0;
+
+	execbuf.buffers_ptr = (uintptr_t)objs;
+	execbuf.buffer_count = 2;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = size;
+	execbuf.cliprects_ptr = 0;
+	execbuf.num_cliprects = 0;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = 0;
+	execbuf.flags = I915_EXEC_RENDER;
+	i915_execbuffer2_set_context_id(execbuf, 0);
+	execbuf.rsvd2 = 0;
+
+	gem_execbuf(fd, &execbuf);
+	gem_sync(fd, cmd_bo);
+
+	gem_read(fd,target_bo, 0, &actual_value, sizeof(actual_value));
+	igt_assert(expected_value == actual_value);
+
+	gem_close(fd, target_bo);
+
+	return 1;
+}
+
+uint32_t handle;
+int fd;
+
+#define GFX_OP_PIPE_CONTROL	((0x3<<29)|(0x3<<27)|(0x2<<24)|2)
+#define   PIPE_CONTROL_QW_WRITE	(1<<14)
+
+igt_main
+{
+	igt_fixture {
+		int parser_version = 0;
+                drm_i915_getparam_t gp;
+		int rc;
+
+		fd = drm_open_any();
+
+		gp.param = I915_PARAM_CMD_PARSER_VERSION;
+		gp.value = &parser_version;
+		rc = drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
+		igt_require(!rc && parser_version > 0);
+
+		handle = gem_create(fd, 4096);
+	}
+
+	igt_subtest("basic-allowed") {
+		uint32_t pc[] = {
+			GFX_OP_PIPE_CONTROL,
+			PIPE_CONTROL_QW_WRITE,
+			0, // To be patched
+			0x12000000,
+			0,
+			MI_BATCH_BUFFER_END,
+		};
+		igt_assert(
+			exec_batch_patched(fd, handle,
+					   pc, sizeof(pc),
+					   8, // patch offset,
+					   0x12000000));
+	}
+
+	igt_fixture {
+		gem_close(fd, handle);
+
+		close(fd);
+	}
+}
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 2/6] tests/gem_exec_parse: Add tests for rejected commands
  2014-01-29 21:58 ` [PATCH 1/6] tests: Add a test for the command parser bradley.d.volkin
@ 2014-01-29 21:58   ` bradley.d.volkin
  2014-01-29 21:58   ` [PATCH 3/6] tests/gem_exec_parse: Add tests for register whitelist bradley.d.volkin
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:58 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 tests/gem_exec_parse.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
index c71e478..ebf7116 100644
--- a/tests/gem_exec_parse.c
+++ b/tests/gem_exec_parse.c
@@ -93,9 +93,55 @@ static int exec_batch_patched(int fd, uint32_t cmd_bo, uint32_t *cmds,
 	return 1;
 }
 
+static int exec_batch(int fd, uint32_t cmd_bo, uint32_t *cmds,
+		      int size, int ring, int expected_ret)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 objs[1];
+	int ret;
+
+	gem_write(fd, cmd_bo, 0, cmds, size);
+
+	objs[0].handle = cmd_bo;
+	objs[0].relocation_count = 0;
+	objs[0].relocs_ptr = 0;
+	objs[0].alignment = 0;
+	objs[0].offset = 0;
+	objs[0].flags = 0;
+	objs[0].rsvd1 = 0;
+	objs[0].rsvd2 = 0;
+
+	execbuf.buffers_ptr = (uintptr_t)objs;
+	execbuf.buffer_count = 1;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = size;
+	execbuf.cliprects_ptr = 0;
+	execbuf.num_cliprects = 0;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = 0;
+	execbuf.flags = ring;
+	i915_execbuffer2_set_context_id(execbuf, 0);
+	execbuf.rsvd2 = 0;
+
+	ret = drmIoctl(fd,
+		       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+		       &execbuf);
+	if (ret == 0)
+		igt_assert(expected_ret == 0);
+	else
+		igt_assert(-errno == expected_ret);
+
+	gem_sync(fd, cmd_bo);
+
+	return 1;
+}
+
 uint32_t handle;
 int fd;
 
+#define MI_ARB_ON_OFF (0x8 << 23)
+#define MI_DISPLAY_FLIP ((0x14 << 23) | 1)
+
 #define GFX_OP_PIPE_CONTROL	((0x3<<29)|(0x3<<27)|(0x2<<24)|2)
 #define   PIPE_CONTROL_QW_WRITE	(1<<14)
 
@@ -132,6 +178,41 @@ igt_main
 					   0x12000000));
 	}
 
+	igt_subtest("basic-rejected") {
+		uint32_t arb_on_off[] = {
+			MI_ARB_ON_OFF,
+			MI_BATCH_BUFFER_END,
+		};
+		uint32_t display_flip[] = {
+			MI_DISPLAY_FLIP,
+			0, 0, 0,
+			MI_BATCH_BUFFER_END,
+			0
+		};
+		igt_assert(
+			   exec_batch(fd, handle,
+				      arb_on_off, sizeof(arb_on_off),
+				      I915_EXEC_RENDER,
+				      -EINVAL));
+		igt_assert(
+			   exec_batch(fd, handle,
+				      arb_on_off, sizeof(arb_on_off),
+				      I915_EXEC_BSD,
+				      -EINVAL));
+		if (gem_has_vebox(fd)) {
+			igt_assert(
+				   exec_batch(fd, handle,
+					      arb_on_off, sizeof(arb_on_off),
+					      I915_EXEC_VEBOX,
+					      -EINVAL));
+		}
+		igt_assert(
+			   exec_batch(fd, handle,
+				      display_flip, sizeof(display_flip),
+				      I915_EXEC_BLT,
+				      -EINVAL));
+	}
+
 	igt_fixture {
 		gem_close(fd, handle);
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 3/6] tests/gem_exec_parse: Add tests for register whitelist
  2014-01-29 21:58 ` [PATCH 1/6] tests: Add a test for the command parser bradley.d.volkin
  2014-01-29 21:58   ` [PATCH 2/6] tests/gem_exec_parse: Add tests for rejected commands bradley.d.volkin
@ 2014-01-29 21:58   ` bradley.d.volkin
  2014-01-29 21:58   ` [PATCH 4/6] tests/gem_exec_parse: Add tests for bitmask checks bradley.d.volkin
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:58 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 tests/gem_exec_parse.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
index ebf7116..48fde25 100644
--- a/tests/gem_exec_parse.c
+++ b/tests/gem_exec_parse.c
@@ -141,6 +141,7 @@ int fd;
 
 #define MI_ARB_ON_OFF (0x8 << 23)
 #define MI_DISPLAY_FLIP ((0x14 << 23) | 1)
+#define MI_LOAD_REGISTER_IMM ((0x22 << 23) | 1)
 
 #define GFX_OP_PIPE_CONTROL	((0x3<<29)|(0x3<<27)|(0x2<<24)|2)
 #define   PIPE_CONTROL_QW_WRITE	(1<<14)
@@ -213,6 +214,31 @@ igt_main
 				      -EINVAL));
 	}
 
+	igt_subtest("registers") {
+		uint32_t lri_bad[] = {
+			MI_LOAD_REGISTER_IMM,
+			0, // disallowed register address
+			0x12000000,
+			MI_BATCH_BUFFER_END,
+		};
+		uint32_t lri_ok[] = {
+			MI_LOAD_REGISTER_IMM,
+			0x5280, // allowed register address (SO_WRITE_OFFSET[0])
+			0x1,
+			MI_BATCH_BUFFER_END,
+		};
+		igt_assert(
+			   exec_batch(fd, handle,
+				      lri_bad, sizeof(lri_bad),
+				      I915_EXEC_RENDER,
+				      -EINVAL));
+		igt_assert(
+			   exec_batch(fd, handle,
+				      lri_ok, sizeof(lri_ok),
+				      I915_EXEC_RENDER,
+				      0));
+	}
+
 	igt_fixture {
 		gem_close(fd, handle);
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 4/6] tests/gem_exec_parse: Add tests for bitmask checks
  2014-01-29 21:58 ` [PATCH 1/6] tests: Add a test for the command parser bradley.d.volkin
  2014-01-29 21:58   ` [PATCH 2/6] tests/gem_exec_parse: Add tests for rejected commands bradley.d.volkin
  2014-01-29 21:58   ` [PATCH 3/6] tests/gem_exec_parse: Add tests for register whitelist bradley.d.volkin
@ 2014-01-29 21:58   ` bradley.d.volkin
  2014-01-29 21:58   ` [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END bradley.d.volkin
  2014-01-29 21:58   ` [PATCH 6/6] tests/gem_exec_parse: Test a command crossing a page boundary bradley.d.volkin
  4 siblings, 0 replies; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:58 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 tests/gem_exec_parse.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
index 48fde25..9e90408 100644
--- a/tests/gem_exec_parse.c
+++ b/tests/gem_exec_parse.c
@@ -145,6 +145,7 @@ int fd;
 
 #define GFX_OP_PIPE_CONTROL	((0x3<<29)|(0x3<<27)|(0x2<<24)|2)
 #define   PIPE_CONTROL_QW_WRITE	(1<<14)
+#define   PIPE_CONTROL_LRI_POST_OP (1<<23)
 
 igt_main
 {
@@ -239,6 +240,23 @@ igt_main
 				      0));
 	}
 
+	igt_subtest("bitmasks") {
+		uint32_t pc[] = {
+			GFX_OP_PIPE_CONTROL,
+			(PIPE_CONTROL_QW_WRITE |
+			 PIPE_CONTROL_LRI_POST_OP),
+			0, // To be patched
+			0x12000000,
+			0,
+			MI_BATCH_BUFFER_END,
+		};
+		igt_assert(
+			   exec_batch(fd, handle,
+				      pc, sizeof(pc),
+				      I915_EXEC_RENDER,
+				      -EINVAL));
+	}
+
 	igt_fixture {
 		gem_close(fd, handle);
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END
  2014-01-29 21:58 ` [PATCH 1/6] tests: Add a test for the command parser bradley.d.volkin
                     ` (2 preceding siblings ...)
  2014-01-29 21:58   ` [PATCH 4/6] tests/gem_exec_parse: Add tests for bitmask checks bradley.d.volkin
@ 2014-01-29 21:58   ` bradley.d.volkin
  2014-01-29 22:10     ` Chris Wilson
  2014-01-29 21:58   ` [PATCH 6/6] tests/gem_exec_parse: Test a command crossing a page boundary bradley.d.volkin
  4 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:58 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 tests/gem_exec_parse.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
index 9e90408..004c3bf 100644
--- a/tests/gem_exec_parse.c
+++ b/tests/gem_exec_parse.c
@@ -257,6 +257,15 @@ igt_main
 				      -EINVAL));
 	}
 
+	igt_subtest("batch-without-end") {
+		uint32_t noop[1024] = { 0 };
+		igt_assert(
+			   exec_batch(fd, handle,
+				      noop, sizeof(noop),
+				      I915_EXEC_RENDER,
+				      -EINVAL));
+	}
+
 	igt_fixture {
 		gem_close(fd, handle);
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 6/6] tests/gem_exec_parse: Test a command crossing a page boundary
  2014-01-29 21:58 ` [PATCH 1/6] tests: Add a test for the command parser bradley.d.volkin
                     ` (3 preceding siblings ...)
  2014-01-29 21:58   ` [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END bradley.d.volkin
@ 2014-01-29 21:58   ` bradley.d.volkin
  2014-01-29 22:12     ` Chris Wilson
  4 siblings, 1 reply; 138+ messages in thread
From: bradley.d.volkin @ 2014-01-29 21:58 UTC (permalink / raw)
  To: intel-gfx

From: Brad Volkin <bradley.d.volkin@intel.com>

This is a speculative test in that it's not particularly relevant
today, but is important if we switch the parser implementation to
use kmap_atomic instead of vmap.

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 tests/gem_exec_parse.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
index 004c3bf..455bfbf 100644
--- a/tests/gem_exec_parse.c
+++ b/tests/gem_exec_parse.c
@@ -136,6 +136,60 @@ static int exec_batch(int fd, uint32_t cmd_bo, uint32_t *cmds,
 	return 1;
 }
 
+static int exec_split_batch(int fd, uint32_t *cmds,
+			    int size, int ring, int expected_ret)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 objs[1];
+	uint32_t cmd_bo;
+	uint32_t noop[1024] = { 0 };
+	int ret;
+
+	// Allocate and fill a 2-page batch with noops
+	cmd_bo = gem_create(fd, 4096 * 2);
+	gem_write(fd, cmd_bo, 0, noop, sizeof(noop));
+	gem_write(fd, cmd_bo, 4096, noop, sizeof(noop));
+
+	// Write the provided commands such that the first dword
+	// of the command buffer is the last dword of the first
+	// page (i.e. the command is split across the two pages).
+	gem_write(fd, cmd_bo, 4096-sizeof(uint32_t), cmds, size);
+
+	objs[0].handle = cmd_bo;
+	objs[0].relocation_count = 0;
+	objs[0].relocs_ptr = 0;
+	objs[0].alignment = 0;
+	objs[0].offset = 0;
+	objs[0].flags = 0;
+	objs[0].rsvd1 = 0;
+	objs[0].rsvd2 = 0;
+
+	execbuf.buffers_ptr = (uintptr_t)objs;
+	execbuf.buffer_count = 1;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = size;
+	execbuf.cliprects_ptr = 0;
+	execbuf.num_cliprects = 0;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = 0;
+	execbuf.flags = ring;
+	i915_execbuffer2_set_context_id(execbuf, 0);
+	execbuf.rsvd2 = 0;
+
+	ret = drmIoctl(fd,
+		       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+		       &execbuf);
+	if (ret == 0)
+		igt_assert(expected_ret == 0);
+	else
+		igt_assert(-errno == expected_ret);
+
+	gem_sync(fd, cmd_bo);
+	gem_close(fd, cmd_bo);
+
+	return 1;
+}
+
 uint32_t handle;
 int fd;
 
@@ -266,6 +320,20 @@ igt_main
 				      -EINVAL));
 	}
 
+	igt_subtest("cmd-crossing-page") {
+		uint32_t lri_ok[] = {
+			MI_LOAD_REGISTER_IMM,
+			0x5280, // allowed register address (SO_WRITE_OFFSET[0])
+			0x1,
+			MI_BATCH_BUFFER_END,
+		};
+		igt_assert(
+			   exec_split_batch(fd,
+					    lri_ok, sizeof(lri_ok),
+					    I915_EXEC_RENDER,
+					    0));
+	}
+
 	igt_fixture {
 		gem_close(fd, handle);
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END
  2014-01-29 21:58   ` [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END bradley.d.volkin
@ 2014-01-29 22:10     ` Chris Wilson
  2014-01-30 11:46       ` Chris Wilson
  0 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2014-01-29 22:10 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:58:29PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  tests/gem_exec_parse.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
> index 9e90408..004c3bf 100644
> --- a/tests/gem_exec_parse.c
> +++ b/tests/gem_exec_parse.c
> @@ -257,6 +257,15 @@ igt_main
>  				      -EINVAL));
>  	}
>  
> +	igt_subtest("batch-without-end") {
> +		uint32_t noop[1024] = { 0 };
> +		igt_assert(
> +			   exec_batch(fd, handle,
> +				      noop, sizeof(noop),
> +				      I915_EXEC_RENDER,
> +				      -EINVAL));

Cheekier would be
uint32_t empty[] = { MI_NOOP, MI_NOOP, MI_BATCH_BUFFER_END, 0 };
for_each_ring() {
	igt_assert(exec_batch(fd, handle, empty, sizeof(empty), ring, 0));
	igt_assert(exec_batch(fd, handle, empty, 8, ring, -EINVAL));
}
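
A rough illustration of why the 8-byte submission should fail: with a batch
length of 8 only the two MI_NOOP dwords are visible, so no terminator is ever
seen. The sketch below is a standalone approximation, not the kernel parser.
batch_has_end() is a hypothetical helper, and it assumes every dword is a
one-dword command, which holds for MI_NOOP but not in general.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MI_NOOP             0u
#define MI_BATCH_BUFFER_END (0xAu << 23)

/*
 * Hypothetical helper: returns true if a terminator lies within the
 * first batch_len bytes. Assumes single-dword commands only; a real
 * parser would step by each command's decoded length.
 */
static bool batch_has_end(const uint32_t *cmds, size_t batch_len)
{
	size_t i;

	for (i = 0; i < batch_len / sizeof(uint32_t); i++) {
		if (cmds[i] == MI_BATCH_BUFFER_END)
			return true;
	}

	return false;
}
```

Called on the `empty` array above, the full-length case finds the terminator
at dword 2, while the truncated case stops after dword 1.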

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 00/13] Gen7 batch buffer command parser
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (12 preceding siblings ...)
  2014-01-29 21:55   ` [PATCH 13/13] drm/i915: Enable command parsing by default bradley.d.volkin
@ 2014-01-29 22:11   ` Daniel Vetter
  2014-01-29 22:22     ` Volkin, Bradley D
  2014-02-05 15:41   ` Jani Nikula
  14 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-01-29 22:11 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:55:01PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 3) Coherency. I've found a coherency issue on VLV when reading the batch buffer
>    from the CPU during execbuffer2. Userspace writes the batch via pwrite fast
>    path before calling execbuffer2. The parser reads stale data. This works fine
>    on IVB and HSW, so I believe it's an LLC vs. non-LLC issue. I'm just unclear
>    on what the correct flushing or synchronization is for this scenario. This
>    only matters if we get PPGTT working on VLV and enable the parser there.

Hm, adopting the shmem_read clflushing didn't help for this? That would be
fairly shocking, since it means our shmem read paths are broken. Which are
used e.g. by the libva readback code for the encoded bitstream.

One thing aside: When resending the complete series (even if it's just a
subset) it's better to start a new thread. We tend to use in-reply-to only
when resending individual patches, while the review discussion is still
ongoing. That way the discussion stays together. But when there's been
a bit of a longer break it's imo better to start a new thread.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 6/6] tests/gem_exec_parse: Test a command crossing a page boundary
  2014-01-29 21:58   ` [PATCH 6/6] tests/gem_exec_parse: Test a command crossing a page boundary bradley.d.volkin
@ 2014-01-29 22:12     ` Chris Wilson
  2014-03-25 13:20       ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2014-01-29 22:12 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:58:30PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> This is a speculative test in that it's not particularly relevant
> today, but is important if we switch the parser implementation to
> use kmap_atomic instead of vmap.

Do you not want to iterate over all (or some combination of)
valid/invalid commands to better fuzz the handling of boundaries?
-Chris
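
One possible way to cover more boundary cases is to vary how many dwords of
the command land on the first page. The sketch below only computes the write
offset. split_offset() and PAGE_SIZE_BYTES are hypothetical names, and the
4096-byte page size is taken from the test in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u

/*
 * Hypothetical helper: byte offset at which to write a command so that
 * exactly dwords_before of its dwords land on the first page and the
 * rest spill onto the second page.
 */
static size_t split_offset(size_t dwords_before)
{
	return PAGE_SIZE_BYTES - dwords_before * sizeof(uint32_t);
}
```

The patch's single case corresponds to split_offset(1). Looping dwords_before
from 1 up to one less than the command length would exercise every split
point for a given command.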

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] intel: Merge i915_drm.h with cmd parser define
  2014-01-29 21:57 ` [PATCH] intel: Merge i915_drm.h with cmd parser define bradley.d.volkin
@ 2014-01-29 22:13   ` Chris Wilson
  2014-01-29 22:26     ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2014-01-29 22:13 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:57:28PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  include/drm/i915_drm.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
> index 2f4eb8c..ba863c4 100644
> --- a/include/drm/i915_drm.h
> +++ b/include/drm/i915_drm.h
> @@ -27,7 +27,7 @@
>  #ifndef _I915_DRM_H_
>  #define _I915_DRM_H_
>  
> -#include <drm.h>
> +#include <drm/drm.h>

Something about this patch smells very fishy....

>  
>  /* Please note that modifications to all structs defined here are
>   * subject to backwards-compatibility constraints.
> @@ -337,6 +337,7 @@ typedef struct drm_i915_irq_wait {
>  #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
>  #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
>  #define I915_PARAM_HAS_WT     	 	 27
> +#define I915_PARAM_CMD_PARSER_VERSION	 28
>  
>  typedef struct drm_i915_getparam {
>  	int param;
> @@ -721,7 +722,7 @@ struct drm_i915_gem_execbuffer2 {
>   */
>  #define I915_EXEC_IS_PINNED		(1<<10)
>  
> -/** Provide a hint to the kernel that the command stream and auxilliary
> +/** Provide a hint to the kernel that the command stream and auxiliary
>   * state buffers already holds the correct presumed addresses and so the
>   * relocation process may be skipped if no buffers need to be moved in
>   * preparation for the execbuffer.

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 00/13] Gen7 batch buffer command parser
  2014-01-29 22:11   ` [PATCH 00/13] Gen7 batch buffer command parser Daniel Vetter
@ 2014-01-29 22:22     ` Volkin, Bradley D
  2014-01-29 23:31       ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-01-29 22:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 02:11:17PM -0800, Daniel Vetter wrote:
> On Wed, Jan 29, 2014 at 01:55:01PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 3) Coherency. I've found a coherency issue on VLV when reading the batch buffer
> >    from the CPU during execbuffer2. Userspace writes the batch via pwrite fast
> >    path before calling execbuffer2. The parser reads stale data. This works fine
> >    on IVB and HSW, so I believe it's an LLC vs. non-LLC issue. I'm just unclear
> >    on what the correct flushing or synchronization is for this scenario. This
> >    only matters if we get PPGTT working on VLV and enable the parser there.
> 
> Hm, adopting the shmem_read clflushing didn't help for this? That would be
> fairly shocking, since it means our shmem read paths are broken. Which are
> used e.g. by the libva readback code for the encoded bitstream.

Sorry, not clear enough. I actually haven't retested that part with the clflushing
added since the opinion seemed to be that leaving the parser disabled for VLV was ok.
Just left the note for now so it doesn't get lost.

> 
> One thing aside: When resending the complete series (even if it's just a
> subset) it's better to start a new thread. We tend to use in-reply-to only
> when resending individual patches, while the review discussion is still
> ongoing. That way the discussion stays together. But when there's been
> a bit of a longer break it's imo better to start a new thread.

Ok, I thought it was more of a blanket thing. Noted.
-Brad

> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] intel: Merge i915_drm.h with cmd parser define
  2014-01-29 22:13   ` Chris Wilson
@ 2014-01-29 22:26     ` Volkin, Bradley D
  2014-01-30  9:20       ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-01-29 22:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 02:13:21PM -0800, Chris Wilson wrote:
> On Wed, Jan 29, 2014 at 01:57:28PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  include/drm/i915_drm.h | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
> > index 2f4eb8c..ba863c4 100644
> > --- a/include/drm/i915_drm.h
> > +++ b/include/drm/i915_drm.h
> > @@ -27,7 +27,7 @@
> >  #ifndef _I915_DRM_H_
> >  #define _I915_DRM_H_
> >  
> > -#include <drm.h>
> > +#include <drm/drm.h>
> 
> Something about this patch smells very fishy....

Yeah, I wasn't completely sure about this one. I followed what I thought was
the procedure for updating the header (i.e. make headers_install in kernel,
copy to libdrm) and this is what I got.
-Brad

> 
> >  
> >  /* Please note that modifications to all structs defined here are
> >   * subject to backwards-compatibility constraints.
> > @@ -337,6 +337,7 @@ typedef struct drm_i915_irq_wait {
> >  #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
> >  #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
> >  #define I915_PARAM_HAS_WT     	 	 27
> > +#define I915_PARAM_CMD_PARSER_VERSION	 28
> >  
> >  typedef struct drm_i915_getparam {
> >  	int param;
> > @@ -721,7 +722,7 @@ struct drm_i915_gem_execbuffer2 {
> >   */
> >  #define I915_EXEC_IS_PINNED		(1<<10)
> >  
> > -/** Provide a hint to the kernel that the command stream and auxilliary
> > +/** Provide a hint to the kernel that the command stream and auxiliary
> >   * state buffers already holds the correct presumed addresses and so the
> >   * relocation process may be skipped if no buffers need to be moved in
> >   * preparation for the execbuffer.
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-29 21:55   ` [PATCH 02/13] drm/i915: Implement command buffer parsing logic bradley.d.volkin
@ 2014-01-29 22:28     ` Chris Wilson
  2014-01-30  8:53       ` Daniel Vetter
  2014-01-30  9:07     ` Daniel Vetter
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2014-01-29 22:28 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> +/*
> + * Returns a pointer to a descriptor for the command specified by cmd_header.
> + *
> + * The caller must supply space for a default descriptor via the default_desc
> + * parameter. If no descriptor for the specified command exists in the ring's
> + * command parser tables, this function fills in default_desc based on the
> + * ring's default length encoding and returns default_desc.
> + */
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd(struct intel_ring_buffer *ring,
> +	 u32 cmd_header,
> +	 struct drm_i915_cmd_descriptor *default_desc)
> +{
> +	u32 mask;
> +	int i;
> +
> +	for (i = 0; i < ring->cmd_table_count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc;
> +
> +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> +		if (desc)
> +			return desc;
> +	}
> +
> +	mask = ring->get_cmd_length_mask(cmd_header);
> +	if (!mask)
> +		return NULL;
> +
> +	BUG_ON(!default_desc);
> +	default_desc->flags = CMD_DESC_SKIP;
> +	default_desc->length.mask = mask;

If we turn off all hw validation (through use of the secure bit) should
we not default to a whitelist of commands? Otherwise it just seems to be
a case of running a fuzzer until we kill the machine.
-Chris
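
For comparison, a default-deny lookup might look roughly like the sketch
below. This is not the code from the patch: classify_cmd(), the verdict enum
and the contents of allowed_cmds[] are all hypothetical, and the opcode mask
assumes MI commands keep their opcode in bits 31:23.

```c
#include <stddef.h>
#include <stdint.h>

#define MI_OPCODE_MASK       0xFF800000u  /* assumption: bits 31:23 */
#define MI_NOOP              0u
#define MI_ARB_ON_OFF        (0x8u << 23)
#define MI_BATCH_BUFFER_END  (0xAu << 23)
#define MI_LOAD_REGISTER_IMM (0x22u << 23)

enum verdict { CMD_REJECT, CMD_ALLOW };

/* Illustrative whitelist only; a real table would be per ring. */
static const uint32_t allowed_cmds[] = {
	MI_NOOP,
	MI_BATCH_BUFFER_END,
	MI_LOAD_REGISTER_IMM,
};

/*
 * Hypothetical default-deny classifier: anything not found in the
 * whitelist is rejected instead of being skipped.
 */
static enum verdict classify_cmd(uint32_t header)
{
	size_t i;

	for (i = 0; i < sizeof(allowed_cmds) / sizeof(allowed_cmds[0]); i++) {
		if ((header & MI_OPCODE_MASK) == allowed_cmds[i])
			return CMD_ALLOW;
	}

	return CMD_REJECT;
}
```

The trade-off is that every command userspace legitimately needs has to be
listed up front, whereas the skip-by-default approach only needs entries for
commands that require checks.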

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 10/13] drm/i915: Enable PPGTT command parser checks
  2014-01-29 21:55   ` [PATCH 10/13] drm/i915: Enable PPGTT command parser checks bradley.d.volkin
@ 2014-01-29 22:33     ` Chris Wilson
  2014-01-29 23:00       ` Volkin, Bradley D
  2014-02-05 15:37     ` Jani Nikula
  1 sibling, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2014-01-29 22:33 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:55:11PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> Various commands that access memory have a bit to determine whether
> the graphics address specified in the command should use the GGTT or
> PPGTT for translation. These checks ensure that the bit indicates
> PPGTT translation.
> 
> Most of these checks use the existing bit-checking infrastructure.
> The PIPE_CONTROL and MI_FLUSH_DW commands, however, are multi-function
> commands. The GGTT/PPGTT bit is only relevant for certain uses of the
> command. As such, this change also extends the bit-checking code to
> include a "condition" mask and offset. If the condition mask is non-zero
> then the parser only performs the bit check when the bits specified by
> the condition mask/offset are also non-zero.
> 
> NOTE: At this point in the series PPGTT must be enabled for the parser
> to work correctly. If it's not enabled, userspace will not be setting
> the PPGTT bits the way the parser requires. VLV is the only platform
> where this is a problem, so at this point, we disable parsing for VLV.

That doesn't make sense. Are we not verifying that userspace has set the
bits as appropriate for the hardware setup? So the value we expect
depends upon how we have enabled ppgtt (or not).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 07/13] drm/i915: Add register whitelist for DRM master
  2014-01-29 21:55   ` [PATCH 07/13] drm/i915: Add register whitelist for DRM master bradley.d.volkin
@ 2014-01-29 22:37     ` Chris Wilson
  2014-01-29 23:18       ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2014-01-29 22:37 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:55:08PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> These are used to implement scanline waits in the X server.
> 
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 18d5b05..296e322 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -234,6 +234,20 @@ static const u32 gen7_blt_regs[] = {
>  	BCS_SWCTRL,
>  };
>  
> +/* Whitelists for the DRM master. Magic numbers are taken from sna, to match. */

It would be wiser to use the kernel defines, makes it look like we are
actually in charge. ;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 10/13] drm/i915: Enable PPGTT command parser checks
  2014-01-29 22:33     ` Chris Wilson
@ 2014-01-29 23:00       ` Volkin, Bradley D
  2014-01-29 23:08         ` Chris Wilson
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-01-29 23:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 02:33:55PM -0800, Chris Wilson wrote:
> On Wed, Jan 29, 2014 at 01:55:11PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > Various commands that access memory have a bit to determine whether
> > the graphics address specified in the command should use the GGTT or
> > PPGTT for translation. These checks ensure that the bit indicates
> > PPGTT translation.
> > 
> > Most of these checks use the existing bit-checking infrastructure.
> > The PIPE_CONTROL and MI_FLUSH_DW commands, however, are multi-function
> > commands. The GGTT/PPGTT bit is only relevant for certain uses of the
> > command. As such, this change also extends the bit-checking code to
> > include a "condition" mask and offset. If the condition mask is non-zero
> > then the parser only performs the bit check when the bits specified by
> > the condition mask/offset are also non-zero.
> > 
> > NOTE: At this point in the series PPGTT must be enabled for the parser
> > to work correctly. If it's not enabled, userspace will not be setting
> > the PPGTT bits the way the parser requires. VLV is the only platform
> > where this is a problem, so at this point, we disable parsing for VLV.
> 
> That doesn't make sense. Are we not verifying that userspace has set the
> bits as appropriate for the hardware setup? So the value we expect
> depends upon how we have enabled ppgtt (or not).

We could but don't currently. I was under the impression the parser wasn't
seen as having as much benefit without ppgtt and that we're generally moving
towards ppgtt as the default for all relevant platforms.
- Brad

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 10/13] drm/i915: Enable PPGTT command parser checks
  2014-01-29 23:00       ` Volkin, Bradley D
@ 2014-01-29 23:08         ` Chris Wilson
  0 siblings, 0 replies; 138+ messages in thread
From: Chris Wilson @ 2014-01-29 23:08 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 03:00:11PM -0800, Volkin, Bradley D wrote:
> On Wed, Jan 29, 2014 at 02:33:55PM -0800, Chris Wilson wrote:
> > On Wed, Jan 29, 2014 at 01:55:11PM -0800, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 
> > > Various commands that access memory have a bit to determine whether
> > > the graphics address specified in the command should use the GGTT or
> > > PPGTT for translation. These checks ensure that the bit indicates
> > > PPGTT translation.
> > > 
> > > Most of these checks use the existing bit-checking infrastructure.
> > > The PIPE_CONTROL and MI_FLUSH_DW commands, however, are multi-function
> > > commands. The GGTT/PPGTT bit is only relevant for certain uses of the
> > > command. As such, this change also extends the bit-checking code to
> > > include a "condition" mask and offset. If the condition mask is non-zero
> > > then the parser only performs the bit check when the bits specified by
> > > the condition mask/offset are also non-zero.
> > > 
> > > NOTE: At this point in the series PPGTT must be enabled for the parser
> > > to work correctly. If it's not enabled, userspace will not be setting
> > > the PPGTT bits the way the parser requires. VLV is the only platform
> > > where this is a problem, so at this point, we disable parsing for VLV.
> > 
> > That doesn't make sense. Are we not verifying that userspace has set the
> > bits as appropriate for the hardware setup? So the value we expect
> > depends upon how we have enabled ppgtt (or not).
> 
> We could but don't currently. I was under the impression the parser wasn't
> seen as having as much benefit without ppgtt and that we're generally moving
> towards ppgtt as the default for all relevant platforms.

Oh, I remember that argument. It's just the way you phrased the note
made me think that it was a limitation of the patch.

Personally I would implement the checks against the hardware state as we
know it. It's a nice pedagogical example, and removes a buried
assumption from the code.
-Chris
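
A minimal sketch of a conditional bit check that takes the hardware state
into account, under stated assumptions: struct bit_check and bit_check_ok()
are hypothetical names rather than the structures from the patch, and the
polarity (a set bit selects the GGTT, so it must be clear with PPGTT enabled
and set otherwise) is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor; field names are illustrative only. */
struct bit_check {
	unsigned int offset;      /* dword holding the bit(s) to check */
	uint32_t mask;            /* bit(s) selecting GGTT vs PPGTT */
	unsigned int cond_offset; /* dword holding the condition bits */
	uint32_t cond_mask;       /* 0 means the check always applies */
};

/*
 * Returns true if the command passes the check. When cond_mask is
 * non-zero the check only applies if the condition bits are set.
 */
static bool bit_check_ok(const uint32_t *cmd, const struct bit_check *bc,
			 bool ppgtt_enabled)
{
	/* Assumption: set bit means GGTT, clear bit means PPGTT. */
	uint32_t expected = ppgtt_enabled ? 0 : bc->mask;

	if (bc->cond_mask && !(cmd[bc->cond_offset] & bc->cond_mask))
		return true;

	return (cmd[bc->offset] & bc->mask) == expected;
}
```

With this shape the expected value follows the driver's configuration instead
of being hard-coded to the PPGTT case.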

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 07/13] drm/i915: Add register whitelist for DRM master
  2014-01-29 22:37     ` Chris Wilson
@ 2014-01-29 23:18       ` Volkin, Bradley D
  2014-01-30  9:02         ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-01-29 23:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 02:37:25PM -0800, Chris Wilson wrote:
> On Wed, Jan 29, 2014 at 01:55:08PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > These are used to implement scanline waits in the X server.
> > 
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index 18d5b05..296e322 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -234,6 +234,20 @@ static const u32 gen7_blt_regs[] = {
> >  	BCS_SWCTRL,
> >  };
> >  
> > +/* Whitelists for the DRM master. Magic numbers are taken from sna, to match. */
> 
> It would be wiser to use the kernel defines, makes it look like we are
> actually in charge. ;-)

Will fix, though based on the sna commit history, it looks like you're in charge
either way :)

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 00/13] Gen7 batch buffer command parser
  2014-01-29 22:22     ` Volkin, Bradley D
@ 2014-01-29 23:31       ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2014-01-29 23:31 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 02:22:49PM -0800, Volkin, Bradley D wrote:
> On Wed, Jan 29, 2014 at 02:11:17PM -0800, Daniel Vetter wrote:
> > On Wed, Jan 29, 2014 at 01:55:01PM -0800, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 3) Coherency. I've found a coherency issue on VLV when reading the batch buffer
> > >    from the CPU during execbuffer2. Userspace writes the batch via pwrite fast
> > >    path before calling execbuffer2. The parser reads stale data. This works fine
> > >    on IVB and HSW, so I believe it's an LLC vs. non-LLC issue. I'm just unclear
> > >    on what the correct flushing or synchronization is for this scenario. This
> > >    only matters if we get PPGTT working on VLV and enable the parser there.
> > 
> > Hm, adopting the shmem_read clflushing didn't help for this? That would be
> > fairly shocking, since it means our shmem read paths are broken. Which are
> > used e.g. by the libva readback code for the encoded bitstream.
> 
> Sorry, not clear enough. I actually haven't retested that part with the clflushing
> added since the opinion seemed to be that leaving the parser disabled for VLV was ok.
> Just left the note for now so it doesn't get lost.

Would be nice to give it a spin though, since afaik this issue might persist
on vlv+1. And I guess we can't keep on sticking our heads in the sand about
ppgtt not really working on soc platforms ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 01/13] drm/i915: Refactor shmem pread setup
  2014-01-29 21:55   ` [PATCH 01/13] drm/i915: Refactor shmem pread setup bradley.d.volkin
@ 2014-01-30  8:36     ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30  8:36 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:55:02PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> The command parser is going to need the same synchronization and
> setup logic, so factor it out for reuse.
> 
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |  3 +++
>  drivers/gpu/drm/i915/i915_gem.c | 48 +++++++++++++++++++++++++++++------------
>  2 files changed, 37 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3673ba1..bfb30df 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2045,6 +2045,9 @@ void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
>  void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
>  void i915_gem_lastclose(struct drm_device *dev);
>  
> +int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> +				    int *needs_clflush);
> +
>  int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
>  static inline struct page *i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 39770f7..fdc1f40 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -332,6 +332,39 @@ __copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
>  	return 0;
>  }
>  
> +/*
> + * Pins the specified object's pages and synchronizes the object with
> + * GPU accesses. Sets needs_clflush to non-zero if the caller should
> + * flush the object from the CPU cache.
> + */
> +int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> +				    int *needs_clflush)
> +{
> +	int ret;

I think for safety reasons (userspace can blow up the kernel otherwise
with the execbuf cmd parser) we need to replicate the "is this really
shmem backed" check from i915_gem_pread_ioctl to this place here.
-Daniel
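
A minimal sketch of the guard suggested above, assuming the check in
question is the test for a missing shmem backing file (obj->base.filp)
that the pread ioctl performs. The structures below are simplified,
hypothetical stand-ins for the kernel types, and the function name is
made up for illustration; it is not the code from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct file;

struct drm_gem_object {
	struct file *filp;	/* shmem backing file; NULL if not shmem backed */
};

struct drm_i915_gem_object {
	struct drm_gem_object base;
};

/*
 * Sketch only: refuse objects without a shmem backing file before any
 * page-level access is attempted, so that a caller such as the command
 * parser cannot be handed an object it is unable to read via shmem.
 */
static int prepare_shmem_read_checked(struct drm_i915_gem_object *obj,
				      int *needs_clflush)
{
	*needs_clflush = 0;

	if (!obj->base.filp)
		return -EINVAL;

	/* ... domain synchronization and page pinning as in the patch ... */
	return 0;
}
```

Putting the test inside the shared helper, rather than in each caller,
means the pread path and the parser path cannot drift apart.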

> +
> +	*needs_clflush = 0;
> +
> +	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
> +		/* If we're not in the cpu read domain, set ourself into the gtt
> +		 * read domain and manually flush cachelines (if required). This
> +		 * optimizes for the case when the gpu will dirty the data
> +		 * anyway again before the next pread happens. */
> +		*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
> +							obj->cache_level);
> +		ret = i915_gem_object_wait_rendering(obj, true);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ret = i915_gem_object_get_pages(obj);
> +	if (ret)
> +		return ret;
> +
> +	i915_gem_object_pin_pages(obj);
> +
> +	return ret;
> +}
> +
>  /* Per-page copy function for the shmem pread fastpath.
>   * Flushes invalid cachelines before reading the target if
>   * needs_clflush is set. */
> @@ -429,23 +462,10 @@ i915_gem_shmem_pread(struct drm_device *dev,
>  
>  	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
>  
> -	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
> -		/* If we're not in the cpu read domain, set ourself into the gtt
> -		 * read domain and manually flush cachelines (if required). This
> -		 * optimizes for the case when the gpu will dirty the data
> -		 * anyway again before the next pread happens. */
> -		needs_clflush = !cpu_cache_is_coherent(dev, obj->cache_level);
> -		ret = i915_gem_object_wait_rendering(obj, true);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	ret = i915_gem_object_get_pages(obj);
> +	ret = i915_gem_obj_prepare_shmem_read(obj, &needs_clflush);
>  	if (ret)
>  		return ret;
>  
> -	i915_gem_object_pin_pages(obj);
> -
>  	offset = args->offset;
>  
>  	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents,
> -- 
> 1.8.5.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-29 22:28     ` Chris Wilson
@ 2014-01-30  8:53       ` Daniel Vetter
  2014-01-30  9:05         ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30  8:53 UTC (permalink / raw)
  To: Chris Wilson, bradley.d.volkin, intel-gfx

On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > +/*
> > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > + *
> > + * The caller must supply space for a default descriptor via the default_desc
> > + * parameter. If no descriptor for the specified command exists in the ring's
> > + * command parser tables, this function fills in default_desc based on the
> > + * ring's default length encoding and returns default_desc.
> > + */
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd(struct intel_ring_buffer *ring,
> > +	 u32 cmd_header,
> > +	 struct drm_i915_cmd_descriptor *default_desc)
> > +{
> > +	u32 mask;
> > +	int i;
> > +
> > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc;
> > +
> > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > +		if (desc)
> > +			return desc;
> > +	}
> > +
> > +	mask = ring->get_cmd_length_mask(cmd_header);
> > +	if (!mask)
> > +		return NULL;
> > +
> > +	BUG_ON(!default_desc);
> > +	default_desc->flags = CMD_DESC_SKIP;
> > +	default_desc->length.mask = mask;
> 
> If we turn off all hw validation (through use of the secure bit) should
> we not default to a whitelist of commands? Otherwise it just seems to be
> a case of running a fuzzer until we kill the machine.

Preventing hangs and DoS is imo not the attack model, gpus are too fickle
for that. The attack model here is to prevent privilege escalation and
information leaks. I.e. we just want containment of all read/write access
to the gtt space.

I think for that purpose an explicit whitelist of commands which target
things outside of the (pp)gtt is sufficient. radeon's checker design is
completely different, but pretty much the only command they have is
to load register values. Intel gpus otoh have a big set of special-purpose
commands to load (most) of the rendering pipeline state. So we have
hw built-in register whitelists for all that stuff since you just can't
load arbitrary registers and state with those commands.

Also note that for raw register access Bradley's scanner _is_ whitelist
based. And for general reads/writes gpu designers confirmed that those are
all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
as long as we check for the exceptions and otherwise only whitelist MI_
commands we know about we should be covered.

So I think this is sound.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 07/13] drm/i915: Add register whitelist for DRM master
  2014-01-29 23:18       ` Volkin, Bradley D
@ 2014-01-30  9:02         ` Daniel Vetter
       [not found]           ` <20140130172206.GA26611@vpg-ubuntu-bdvolkin>
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30  9:02 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 03:18:21PM -0800, Volkin, Bradley D wrote:
> On Wed, Jan 29, 2014 at 02:37:25PM -0800, Chris Wilson wrote:
> > On Wed, Jan 29, 2014 at 01:55:08PM -0800, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 
> > > These are used to implement scanline waits in the X server.
> > > 
> > > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++++++++++++++
> > >  1 file changed, 30 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > index 18d5b05..296e322 100644
> > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > > @@ -234,6 +234,20 @@ static const u32 gen7_blt_regs[] = {
> > >  	BCS_SWCTRL,
> > >  };
> > >  
> > > +/* Whitelists for the DRM master. Magic numbers are taken from sna, to match. */
> > 
> > It would be wiser to use the kernel defines, makes it look like we are
> > actually in charge. ;-)
> 
> Will fix, though based on the sna commit history, it looks like you're in charge
> either way :)

Yeah, for the register tables I think we really should use symbolic values
consistently, adding new ones if i915_reg.h has them lacking. At least as
long as the lists are this short.

Aside: The bkm (best known method) for landing big feature work which adds
lots of register #defines like this is to split out patches with just the
#defines. That way those can be reviewed independently from any discussions
about the code itself and so merged early. Helps with rebasing pains ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-30  8:53       ` Daniel Vetter
@ 2014-01-30  9:05         ` Daniel Vetter
  2014-01-30  9:12           ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30  9:05 UTC (permalink / raw)
  To: Chris Wilson, bradley.d.volkin, intel-gfx

On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > +/*
> > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > + *
> > > + * The caller must supply space for a default descriptor via the default_desc
> > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > + * command parser tables, this function fills in default_desc based on the
> > > + * ring's default length encoding and returns default_desc.
> > > + */
> > > +static const struct drm_i915_cmd_descriptor*
> > > +find_cmd(struct intel_ring_buffer *ring,
> > > +	 u32 cmd_header,
> > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > +{
> > > +	u32 mask;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > +		const struct drm_i915_cmd_descriptor *desc;
> > > +
> > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > +		if (desc)
> > > +			return desc;
> > > +	}
> > > +
> > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > +	if (!mask)
> > > +		return NULL;
> > > +
> > > +	BUG_ON(!default_desc);
> > > +	default_desc->flags = CMD_DESC_SKIP;
> > > +	default_desc->length.mask = mask;
> > 
> > If we turn off all hw validation (through use of the secure bit) should
> > we not default to a whitelist of commands? Otherwise it just seems to be
> > a case of running a fuzzer until we kill the machine.
> 
> Preventing hangs and DoS is imo not the attack model, gpus are too fickle
> for that. The attack model here is to prevent privilege escalation and
> information leaks. I.e. we just want containment of all read/write access
> to the gtt space.
> 
> I think for that purpose an explicit whitelist of commands which target
> things outside of the (pp)gtt is sufficient. radeon's checker design is
> completely different, but pretty much the only command they have is
> to load register values. Intel gpus otoh have a big set of special-purpose
> commands to load (most) of the rendering pipeline state. So we have
> hw built-in register whitelists for all that stuff since you just can't
> load arbitrary registers and state with those commands.
> 
> Also note that for raw register access Bradley's scanner _is_ whitelist
> based. And for general reads/writes gpu designers confirmed that those are
> all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> as long as we check for the exceptions and otherwise only whitelist MI_
> commands we know about we should be covered.
> 
> So I think this is sound.

Hm, but while scrolling through the checker I haven't spotted a "reject
everything unknown" for MI_CLIENT commands. Bradley, have I missed that?

I think submitting an invented MI_CLIENT command would also be a good
testcase.
-Daniel
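
A rough sketch of the "reject everything unknown" rule for MI_CLIENT
commands, under stated assumptions: CLIENT_MASK and MI_CLIENT are taken
from the patch, the MI opcode is assumed to sit in header bits 28:23,
and the allow-list contents and function names are illustrative only.
This is not the logic in the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLIENT_MASK      0xE0000000u
#define MI_CLIENT        0x00000000u

/* Assumption: MI opcode occupies bits 28:23 of the command header. */
#define MI_OPCODE_SHIFT  23
#define MI_OPCODE_MASK   (0x3Fu << MI_OPCODE_SHIFT)

/* Illustrative allow-list of MI opcodes; a real table would be per-ring. */
static const uint32_t known_mi_opcodes[] = { 0x00, 0x0A, 0x22 };

static bool mi_cmd_is_known(uint32_t cmd_header)
{
	uint32_t opcode = (cmd_header & MI_OPCODE_MASK) >> MI_OPCODE_SHIFT;
	size_t i;

	for (i = 0; i < sizeof(known_mi_opcodes) / sizeof(known_mi_opcodes[0]); i++) {
		if (known_mi_opcodes[i] == opcode)
			return true;
	}
	return false;
}

/* Returns true if the command should be rejected outright. */
static bool reject_unknown_mi(uint32_t cmd_header)
{
	/* Non-MI clients keep using the default length decoding. */
	if ((cmd_header & CLIENT_MASK) != MI_CLIENT)
		return false;

	return !mi_cmd_is_known(cmd_header);
}
```

A testcase along the lines suggested above would submit a header such as
(0x3F << 23), an MI opcode absent from the table, and expect the
execbuffer call to fail.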
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-29 21:55   ` [PATCH 02/13] drm/i915: Implement command buffer parsing logic bradley.d.volkin
  2014-01-29 22:28     ` Chris Wilson
@ 2014-01-30  9:07     ` Daniel Vetter
  2014-01-30 10:57       ` Chris Wilson
  2014-02-05 15:15     ` Jani Nikula
  2014-02-07 13:58     ` Jani Nikula
  3 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30  9:07 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> The command parser scans batch buffers submitted via execbuffer ioctls before
> the driver submits them to hardware. At a high level, it looks for several
> things:
> 
> 1) Commands which are explicitly defined as privileged or which should only be
>    used by the kernel driver. The parser generally rejects such commands, with
>    the provision that it may allow some from the drm master process.
> 2) Commands which access registers. To support correct/enhanced userspace
>    functionality, particularly certain OpenGL extensions, the parser provides a
>    whitelist of registers which userspace may safely access (for both normal and
>    drm master processes).
> 3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
>    parser always rejects such commands.
> 
> Each ring maintains tables of commands and registers which the parser uses in
> scanning batch buffers submitted to that ring.
> 
> The set of commands that the parser must check for is significantly smaller
> than the number of commands supported, especially on the render ring. As such,
> the parser tables (built up in subsequent patches) contain only those commands
> required by the parser. This generally works because command opcode ranges have
> standard command length encodings. So for commands that the parser does not need
> to check, it can easily skip them. This is implemented via a per-ring length
> decoding vfunc.
> 
> Unfortunately, there are a number of commands that do not follow the standard
> length encoding for their opcode range, primarily amongst the MI_* commands. To
> handle this, the parser provides a way to define explicit "skip" entries in the
> per-ring command tables.
> 
> Other command table entries will map fairly directly to high level categories
> mentioned above: rejected, master-only, register whitelist. A number of checks,
> including the privileged memory checks, are implemented via a general bitmasking
> mechanism.
> 
> OTC-Tracker: AXIA-4631
> Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>


> +#include "i915_drv.h"
> +
> +#define CLIENT_MASK      0xE0000000
> +#define SUBCLIENT_MASK   0x18000000
> +#define MI_CLIENT        0x00000000
> +#define RC_CLIENT        0x60000000
> +#define BC_CLIENT        0x40000000
> +#define MEDIA_SUBCLIENT  0x10000000

I think these would fit neatly right next to all the other MI_* #defines
in i915_reg.h. The other idea that just crossed my mind is to extract all
the command #defines into a new i915_cmd.h (included by i915_drv.h by
default) since i915_reg.h is already giant.

But that should be done as a follow-up patch to avoid patch shuffling
hell.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-30  9:05         ` Daniel Vetter
@ 2014-01-30  9:12           ` Daniel Vetter
  2014-01-30 11:07             ` Daniel Vetter
  2014-01-30 17:55             ` Volkin, Bradley D
  0 siblings, 2 replies; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30  9:12 UTC (permalink / raw)
  To: Chris Wilson, bradley.d.volkin, intel-gfx

On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > +/*
> > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > + *
> > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > + * command parser tables, this function fills in default_desc based on the
> > > > + * ring's default length encoding and returns default_desc.
> > > > + */
> > > > +static const struct drm_i915_cmd_descriptor*
> > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > +	 u32 cmd_header,
> > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > +{
> > > > +	u32 mask;
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > +
> > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > +		if (desc)
> > > > +			return desc;
> > > > +	}
> > > > +
> > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > +	if (!mask)
> > > > +		return NULL;
> > > > +
> > > > +	BUG_ON(!default_desc);
> > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > +	default_desc->length.mask = mask;
> > > 
> > > If we turn off all hw validation (through use of the secure bit) should
> > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > a case of running a fuzzer until we kill the machine.
> > 
> > Preventing hangs and DoS is imo not the attack model, gpus are too fickle
> > for that. The attack model here is to prevent privilege escalation and
> > information leaks. I.e. we just want containment of all read/write access
> > to the gtt space.
> > 
> > I think for that purpose an explicit whitelist of commands which target
> > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > completely different, but pretty much the only command they have is
> > to load register values. Intel gpus otoh have a big set of special-purpose
> > commands to load (most) of the rendering pipeline state. So we have
> > hw built-in register whitelists for all that stuff since you just can't
> > load arbitrary registers and state with those commands.
> > 
> > Also note that for raw register access Bradley's scanner _is_ whitelist
> > based. And for general reads/writes gpu designers confirmed that those are
> > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > as long as we check for the exceptions and otherwise only whitelist MI_
> > commands we know about we should be covered.
> > 
> > So I think this is sound.
> 
> Hm, but while scrolling through the checker I haven't spotted a "reject
> everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> 
> I think submitting an invented MI_CLIENT command would also be a good
> testcase.

One more: I think it would be good to have an overview comment at the top
of i915_cmd_parser.c which details the security attack model and the
overall blacklist/whitelist design of the checker. We don't (yet) have
autogenerated documentation for i915, but that's something I'm working on.
And the kerneldoc system can also pull in multi-paragraph overview
comments besides the usual api documentation, so it's good to have things
ready.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 12/13] drm/i915: Add a CMD_PARSER_VERSION getparam
  2014-01-29 21:55   ` [PATCH 12/13] drm/i915: Add a CMD_PARSER_VERSION getparam bradley.d.volkin
@ 2014-01-30  9:19     ` Daniel Vetter
  2014-01-30 17:25       ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30  9:19 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 01:55:13PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> So userspace can query the kernel for command parser support.
> 
> OTC-Tracker: AXIA-4631
> Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c | 4 ++++
>  include/uapi/drm/i915_drm.h     | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 258b1be..34ba199 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1013,6 +1013,10 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
>  		value = 1;
>  		break;
> +	case I915_PARAM_CMD_PARSER_VERSION:
> +		/* TODO: version info (e.g. what is allowed?) */
> +		value = 1;

I think an increasing integer without any special grouping (like
major/minor) is good enough. What we need though is a small revision log
with one-line blurbs that explain what has been added, e.g.

1: Initial version.
2: Allow streamout related registers
3: Add gen8 support

...

I think it would be good to have this as a comment right next to the
parser code itself, so what about adding an i915_cmd_parser_get_version
function to i915_cmd_parser.c whose only job is to return an integer and
contain this comment?
-Daniel
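
A minimal sketch of that idea. The function name comes from the
suggestion above; the revision log holds only the initial entry, since
the later entries in the mail are examples rather than decided versions.
The cmd_parser_at_least() helper is hypothetical and only shows how
userspace might consume the value.

```c
#include <stdbool.h>

/*
 * Sketch: the one place that records what each parser version permits.
 * Bump the return value and append a one-line entry for every change
 * that userspace can observe.
 *
 * Revision log:
 * 1. Initial version.
 */
static int i915_cmd_parser_get_version(void)
{
	return 1;
}

/*
 * Hypothetical userspace-side helper: with a plain increasing integer,
 * a feature check reduces to a single comparison against the version
 * that introduced the feature.
 */
static bool cmd_parser_at_least(int reported, int needed)
{
	return reported >= needed;
}
```

The getparam case would then return i915_cmd_parser_get_version()
instead of a literal, keeping the number and its log side by side.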

> +		break;
>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 126bfaa..8a3e4ef00 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -337,6 +337,7 @@ typedef struct drm_i915_irq_wait {
>  #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
>  #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
>  #define I915_PARAM_HAS_WT     	 	 27
> +#define I915_PARAM_CMD_PARSER_VERSION	 28
>  
>  typedef struct drm_i915_getparam {
>  	int param;
> -- 
> 1.8.5.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] intel: Merge i915_drm.h with cmd parser define
  2014-01-29 22:26     ` Volkin, Bradley D
@ 2014-01-30  9:20       ` Daniel Vetter
  2014-01-30 17:28         ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30  9:20 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Jan 29, 2014 at 02:26:12PM -0800, Volkin, Bradley D wrote:
> On Wed, Jan 29, 2014 at 02:13:21PM -0800, Chris Wilson wrote:
> > On Wed, Jan 29, 2014 at 01:57:28PM -0800, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 
> > > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > > ---
> > >  include/drm/i915_drm.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
> > > index 2f4eb8c..ba863c4 100644
> > > --- a/include/drm/i915_drm.h
> > > +++ b/include/drm/i915_drm.h
> > > @@ -27,7 +27,7 @@
> > >  #ifndef _I915_DRM_H_
> > >  #define _I915_DRM_H_
> > >  
> > > -#include <drm.h>
> > > +#include <drm/drm.h>
> > 
> > Something about this patch smells very fishy....
> 
> Yeah, I wasn't completely sure about this one. I followed what I thought was
> the procedure for updating the header (i.e. make headers_install in kernel,
> copy to libdrm) and this is what I got.

I guess either works, so maybe just add a note to the commit message about
the little change. Imo it's better to have a 1:1 copy of the header
generated by the kernel.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-30  9:07     ` Daniel Vetter
@ 2014-01-30 10:57       ` Chris Wilson
  0 siblings, 0 replies; 138+ messages in thread
From: Chris Wilson @ 2014-01-30 10:57 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jan 30, 2014 at 10:07:42AM +0100, Daniel Vetter wrote:
> On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > The command parser scans batch buffers submitted via execbuffer ioctls before
> > the driver submits them to hardware. At a high level, it looks for several
> > things:
> > 
> > 1) Commands which are explicitly defined as privileged or which should only be
> >    used by the kernel driver. The parser generally rejects such commands, with
> >    the provision that it may allow some from the drm master process.
> > 2) Commands which access registers. To support correct/enhanced userspace
> >    functionality, particularly certain OpenGL extensions, the parser provides a
> >    whitelist of registers which userspace may safely access (for both normal and
> >    drm master processes).
> > 3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
> >    parser always rejects such commands.
> > 
> > Each ring maintains tables of commands and registers which the parser uses in
> > scanning batch buffers submitted to that ring.
> > 
> > The set of commands that the parser must check for is significantly smaller
> > than the number of commands supported, especially on the render ring. As such,
> > the parser tables (built up in subsequent patches) contain only those commands
> > required by the parser. This generally works because command opcode ranges have
> > standard command length encodings. So for commands that the parser does not need
> > to check, it can easily skip them. This is implemented via a per-ring length
> > decoding vfunc.
> > 
> > Unfortunately, there are a number of commands that do not follow the standard
> > length encoding for their opcode range, primarily amongst the MI_* commands. To
> > handle this, the parser provides a way to define explicit "skip" entries in the
> > per-ring command tables.
> > 
> > Other command table entries will map fairly directly to high level categories
> > mentioned above: rejected, master-only, register whitelist. A number of checks,
> > including the privileged memory checks, are implemented via a general bitmasking
> > mechanism.
> > 
> > OTC-Tracker: AXIA-4631
> > Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> 
> 
> > +#include "i915_drv.h"
> > +
> > +#define CLIENT_MASK      0xE0000000
> > +#define SUBCLIENT_MASK   0x18000000
> > +#define MI_CLIENT        0x00000000
> > +#define RC_CLIENT        0x60000000
> > +#define BC_CLIENT        0x40000000
> > +#define MEDIA_SUBCLIENT  0x10000000
> 
> I think these would fit neatly right next to all the other MI_* #defines
> in i915_reg.h. The other idea that just crossed my mind is to extract all
> the command #defines into a new i915_cmd.h (included by i915_drv.h by
> default) since i915_reg.h is already giant.

i915_cmd.h please. (i915_cs.h? i915_command_stream.h?)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-30  9:12           ` Daniel Vetter
@ 2014-01-30 11:07             ` Daniel Vetter
  2014-01-30 18:05               ` Volkin, Bradley D
  2014-01-30 17:55             ` Volkin, Bradley D
  1 sibling, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30 11:07 UTC (permalink / raw)
  To: Chris Wilson, bradley.d.volkin, intel-gfx

On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > +/*
> > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > + *
> > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > + * ring's default length encoding and returns default_desc.
> > > > > + */
> > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > +	 u32 cmd_header,
> > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > +{
> > > > > +	u32 mask;
> > > > > +	int i;
> > > > > +
> > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > +
> > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > +		if (desc)
> > > > > +			return desc;
> > > > > +	}
> > > > > +
> > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > +	if (!mask)
> > > > > +		return NULL;
> > > > > +
> > > > > +	BUG_ON(!default_desc);
> > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > +	default_desc->length.mask = mask;
> > > > 
> > > > If we turn off all hw validation (through use of the secure bit) should
> > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > a case of running a fuzzer until we kill the machine.
> > > 
> > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > for that. The attack model here is to prevent privilege escalation and
> > > information leaks. I.e. we want just containment of all read/write access
> > > to the gtt space.
> > > 
> > > I think for that purpose an explicit whitelist of commands which target
> > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > completely different, but pretty much the only command they have is
> > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > commands to load (most) of the rendering pipeline state. So we have
> > > hw built-in register whitelists for all that stuff since you just can't
> > > load arbitrary registers and state with those commands.
> > > 
> > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > based. And for general reads/writes gpu designers confirmed that those are
> > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > commands we know about we should be covered.
> > > 
> > > So I think this is sound.
> > 
> > Hm, but while scrolling through the checker I haven't spotted a "reject
> > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > 
> > I think submitting an invented MI_CLIENT command would also be a good
> > testcase.
> 
> One more: I think it would be good to have an overview comment at the top
> of i915_cmd_parser.c which details the security attack model and the
> overall blacklist/whitelist design of the checker. We don't (yet) have
> autogenerated documentation for i915, but that's something I'm working on.
> And the kerneldoc system can also pull in multi-paragraph overview
> comments besides the usual api documentation, so it's good to have things
> ready.

Chatted with Chris a bit more on irc about this, and for more paranoia I
guess we should also reject any unknown client and media subclient
commands.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END
  2014-01-29 22:10     ` Chris Wilson
@ 2014-01-30 11:46       ` Chris Wilson
  2014-03-25 13:17         ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2014-01-30 11:46 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx

On Wed, Jan 29, 2014 at 10:10:47PM +0000, Chris Wilson wrote:
> On Wed, Jan 29, 2014 at 01:58:29PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  tests/gem_exec_parse.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
> > index 9e90408..004c3bf 100644
> > --- a/tests/gem_exec_parse.c
> > +++ b/tests/gem_exec_parse.c
> > @@ -257,6 +257,15 @@ igt_main
> >  				      -EINVAL));
> >  	}
> >  
> > +	igt_subtest("batch-without-end") {
> > +		uint32_t noop[1024] = { 0 };
> > +		igt_assert(
> > +			   exec_batch(fd, handle,
> > +				      noop, sizeof(noop),
> > +				      I915_EXEC_RENDER,
> > +				      -EINVAL));
> 
> Cheekier would be
> uint32_t empty[] = { MI_NOOP, MI_NOOP, MI_BATCH_BUFFER_END, 0 };
> for_each_ring() {
> 	igt_assert(exec_batch(fd, handle, empty, sizeof(empty), ring, 0));
> 	igt_assert(exec_batch(fd, handle, empty, 8, ring, -EINVAL));
> }

On this subject, it should be
{ INVALID, INVALID, NOOP, NOOP, END, 0}
assert(exec(0,  4) == -EINVAL);
assert(exec(0,  8) == -EINVAL);
assert(exec(0, 12) == -EINVAL);
assert(exec(4,  8) == -EINVAL);
assert(exec(4, 12) == 0);
assert(exec(8, 12) == 0);
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 12/13] drm/i915: Add a CMD_PARSER_VERSION getparam
  2014-01-30  9:19     ` Daniel Vetter
@ 2014-01-30 17:25       ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-01-30 17:25 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jan 30, 2014 at 01:19:15AM -0800, Daniel Vetter wrote:
> On Wed, Jan 29, 2014 at 01:55:13PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > So userspace can query the kernel for command parser support.
> > 
> > OTC-Tracker: AXIA-4631
> > Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_dma.c | 4 ++++
> >  include/uapi/drm/i915_drm.h     | 1 +
> >  2 files changed, 5 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index 258b1be..34ba199 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1013,6 +1013,10 @@ static int i915_getparam(struct drm_device *dev, void *data,
> >  	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
> >  		value = 1;
> >  		break;
> > +	case I915_PARAM_CMD_PARSER_VERSION:
> > +		/* TODO: version info (e.g. what is allowed?) */
> > +		value = 1;
> 
> I think an increasing integer without any special grouping (like
> major/minor) is good enough. What we need though is a small revision log
> with one-line blurbs that explain what has been added, e.g.
> 
> 1: Initial version.
> 2: Allow streamout related registers
> 3: Add gen8 support
> 
> ...
> 
> I think it would be good to have this as a comment right next to the
> parser code itself, so what about adding an i915_cmd_parser_get_version
> function to i915_cmd_parser.c whose only job is to return an integer and
> contain this comment?
> -Daniel

Agree on the revision log. I forgot that I left writing it as a todo.
I'll add i915_cmd_parser_get_version and fill that in.
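
A minimal sketch of what that helper could look like, assuming it takes no
arguments and returns a plain int (the signature is an assumption, nothing
is settled yet). Only the "Initial version" entry exists so far; later
entries would be appended as the parser changes:

```c
/*
 * Sketch only: the signature (no arguments, int return) is an assumption.
 *
 * Command parser version history, one line per revision:
 *
 * 1. Initial version.
 */
int i915_cmd_parser_get_version(void)
{
	return 1;
}
```

The getparam case would then presumably read
value = i915_cmd_parser_get_version(); instead of hard-coding 1.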
- Brad
 
> 
> > +		break;
> >  	default:
> >  		DRM_DEBUG("Unknown parameter %d\n", param->param);
> >  		return -EINVAL;
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 126bfaa..8a3e4ef00 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -337,6 +337,7 @@ typedef struct drm_i915_irq_wait {
> >  #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
> >  #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
> >  #define I915_PARAM_HAS_WT     	 	 27
> > +#define I915_PARAM_CMD_PARSER_VERSION	 28
> >  
> >  typedef struct drm_i915_getparam {
> >  	int param;
> > -- 
> > 1.8.5.2
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] intel: Merge i915_drm.h with cmd parser define
  2014-01-30  9:20       ` Daniel Vetter
@ 2014-01-30 17:28         ` Volkin, Bradley D
  2014-02-04 10:26           ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-01-30 17:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jan 30, 2014 at 01:20:57AM -0800, Daniel Vetter wrote:
> On Wed, Jan 29, 2014 at 02:26:12PM -0800, Volkin, Bradley D wrote:
> > On Wed, Jan 29, 2014 at 02:13:21PM -0800, Chris Wilson wrote:
> > > On Wed, Jan 29, 2014 at 01:57:28PM -0800, bradley.d.volkin@intel.com wrote:
> > > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > > 
> > > > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > > > ---
> > > >  include/drm/i915_drm.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
> > > > index 2f4eb8c..ba863c4 100644
> > > > --- a/include/drm/i915_drm.h
> > > > +++ b/include/drm/i915_drm.h
> > > > @@ -27,7 +27,7 @@
> > > >  #ifndef _I915_DRM_H_
> > > >  #define _I915_DRM_H_
> > > >  
> > > > -#include <drm.h>
> > > > +#include <drm/drm.h>
> > > 
> > > Something about this patch smells very fishy....
> > 
> > Yeah, I wasn't completely sure about this one. I followed what I thought was
> > the procedure for updating the header (i.e. make headers_install in kernel,
> > copy to libdrm) and this is what I got.
> 
> I guess either works, so maybe just add a note to the commit message about
> the little change. Imo it's better to have a 1:1 copy of the header
> generated by the kernel.

Sorry, I'm a bit confused. Did I follow the right procedure for updating
the header?

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-30  9:12           ` Daniel Vetter
  2014-01-30 11:07             ` Daniel Vetter
@ 2014-01-30 17:55             ` Volkin, Bradley D
  1 sibling, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-01-30 17:55 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jan 30, 2014 at 01:12:06AM -0800, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > +/*
> > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > + *
> > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > + * ring's default length encoding and returns default_desc.
> > > > > + */
> > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > +	 u32 cmd_header,
> > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > +{
> > > > > +	u32 mask;
> > > > > +	int i;
> > > > > +
> > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > +
> > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > +		if (desc)
> > > > > +			return desc;
> > > > > +	}
> > > > > +
> > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > +	if (!mask)
> > > > > +		return NULL;
> > > > > +
> > > > > +	BUG_ON(!default_desc);
> > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > +	default_desc->length.mask = mask;
> > > > 
> > > > If we turn off all hw validation (through use of the secure bit) should
> > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > a case of running a fuzzer until we kill the machine.
> > > 
> > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > for that. The attack model here is to prevent privilege escalation and
> > > information leaks. I.e. we want just containment of all read/write access
> > > to the gtt space.
> > > 
> > > I think for that purpose an explicit whitelist of commands which target
> > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > completely different, but pretty much the only command they have is
> > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > commands to load (most) of the rendering pipeline state. So we have
> > > hw built-in register whitelists for all that stuff since you just can't
> > > load arbitrary registers and state with those commands.
> > > 
> > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > based. And for general reads/writes gpu designers confirmed that those are
> > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > commands we know about we should be covered.
> > > 
> > > So I think this is sound.
> > 
> > Hm, but while scrolling through the checker I haven't spotted a "reject
> > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > 
> > I think submitting an invented MI_CLIENT command would also be a good
> > testcase.
> 
> One more: I think it would be good to have an overview comment at the top
> of i915_cmd_parser.c which details the security attack model and the
> overall blacklist/whitelist design of the checker. We don't (yet) have
> autogenerated documentation for i915, but that's something I'm working on.
> And the kerneldoc system can also pull in multi-paragraph overview
> comments besides the usual api documentation, so it's good to have things
> ready.

Ok, I'll add that and maybe kerneldoc for the non-static functions to the
next revision.
- Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-30 11:07             ` Daniel Vetter
@ 2014-01-30 18:05               ` Volkin, Bradley D
  2014-02-03 23:00                 ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-01-30 18:05 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Jan 30, 2014 at 03:07:15AM -0800, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > > +/*
> > > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > > + *
> > > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > > + * ring's default length encoding and returns default_desc.
> > > > > > + */
> > > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > > +	 u32 cmd_header,
> > > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > > +{
> > > > > > +	u32 mask;
> > > > > > +	int i;
> > > > > > +
> > > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > > +
> > > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > > +		if (desc)
> > > > > > +			return desc;
> > > > > > +	}
> > > > > > +
> > > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > > +	if (!mask)
> > > > > > +		return NULL;
> > > > > > +
> > > > > > +	BUG_ON(!default_desc);
> > > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > > +	default_desc->length.mask = mask;
> > > > > 
> > > > > If we turn off all hw validation (through use of the secure bit) should
> > > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > > a case of running a fuzzer until we kill the machine.
> > > > 
> > > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > > for that. The attack model here is to prevent privilege escalation and
> > > > information leaks. I.e. we want just containment of all read/write access
> > > > to the gtt space.
> > > > 
> > > > I think for that purpose an explicit whitelist of commands which target
> > > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > > completely different, but pretty much the only command they have is
> > > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > > commands to load (most) of the rendering pipeline state. So we have
> > > > hw built-in register whitelists for all that stuff since you just can't
> > > > load arbitrary registers and state with those commands.
> > > > 
> > > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > > based. And for general reads/writes gpu designers confirmed that those are
> > > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > > commands we know about we should be covered.
> > > > 
> > > > So I think this is sound.
> > > 
> > > Hm, but while scrolling through the checker I haven't spotted a "reject
> > > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > > 
> > > I think submitting an invented MI_CLIENT command would also be a good
> > > testcase.
> > 
> > One more: I think it would be good to have an overview comment at the top
> > of i915_cmd_parser.c which details the security attack model and the
> > overall blacklist/whitelist design of the checker. We don't (yet) have
> > autogenerated documentation for i915, but that's something I'm working on.
> > And the kerneldoc system can also pull in multi-paragraph overview
> > comments besides the usual api documentation, so it's good to have things
> > ready.
> 
> Chatted with Chris a bit more on irc about this, and for more paranoia I
> guess we should also reject any unknown client and media subclient
> commands.

Hmm, not sure I follow. Can you elaborate?

Are you suggesting we add all the MI and Media commands to the tables and reject
any command from those client/subclients that is not found in the table? Or that
we look at the client and subclient fields of the command and reject if they are
not from a set of expected values? Or other?

- Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 07/13] drm/i915: Add register whitelist for DRM master
       [not found]           ` <20140130172206.GA26611@vpg-ubuntu-bdvolkin>
@ 2014-01-30 20:41             ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2014-01-30 20:41 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: Nikula, Jani, intel-gfx

Re-adding intel-gfx, this seems generally useful.

On Thu, Jan 30, 2014 at 6:22 PM, Volkin, Bradley D
<bradley.d.volkin@intel.com> wrote:
> Ok. There's still a couple ways I could see doing it.
> - one big patch with all the new #defines, up front
> - split the #defines from their current patches, all up front
> - split the #defines from their current patches, just before the patch that uses them
>
> I guess I would tend towards the 3rd option, but maybe that's overkill. Which do you suggest?

Whatever you like better, and tbh I'd wait for review and only split
things if you need to. In essence this was just a suggestion for the
future for smoother merging.

For the patches themselves the riskiest thing from my maintainer pov
is breaking old userspace. So I'd like to get the command streamer
merged as soon as possible, and enabled in enforcing mode. We can then
plug in all the missing pieces while getting test coverage in the wild
(lots of people tend to use -nightly). I've signed up Jani to review
the current series in detail, including the tests.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-30 18:05               ` Volkin, Bradley D
@ 2014-02-03 23:00                 ` Volkin, Bradley D
  2014-02-04 10:20                   ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-03 23:00 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

Ping. Daniel or Chris, can one of you clarify this request? Thanks.

On Thu, Jan 30, 2014 at 10:05:27AM -0800, Volkin, Bradley D wrote:
> On Thu, Jan 30, 2014 at 03:07:15AM -0800, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > > > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > > > +/*
> > > > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > > > + *
> > > > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > > > + * ring's default length encoding and returns default_desc.
> > > > > > > + */
> > > > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > > > +	 u32 cmd_header,
> > > > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > > > +{
> > > > > > > +	u32 mask;
> > > > > > > +	int i;
> > > > > > > +
> > > > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > > > +
> > > > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > > > +		if (desc)
> > > > > > > +			return desc;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > > > +	if (!mask)
> > > > > > > +		return NULL;
> > > > > > > +
> > > > > > > +	BUG_ON(!default_desc);
> > > > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > > > +	default_desc->length.mask = mask;
> > > > > > 
> > > > > > If we turn off all hw validation (through use of the secure bit) should
> > > > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > > > a case of running a fuzzer until we kill the machine.
> > > > > 
> > > > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > > > for that. The attack model here is to prevent privilege escalation and
> > > > > information leaks. I.e. we want just containment of all read/write access
> > > > > to the gtt space.
> > > > > 
> > > > > I think for that purpose an explicit whitelist of commands which target
> > > > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > > > completely different, but pretty much the only command they have is
> > > > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > > > commands to load (most) of the rendering pipeline state. So we have
> > > > > hw built-in register whitelists for all that stuff since you just can't
> > > > > load arbitrary registers and state with those commands.
> > > > > 
> > > > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > > > based. And for general reads/writes gpu designers confirmed that those are
> > > > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > > > commands we know about we should be covered.
> > > > > 
> > > > > So I think this is sound.
> > > > 
> > > > Hm, but while scrolling through the checker I haven't spotted a "reject
> > > > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > > > 
> > > > I think submitting an invented MI_CLIENT command would also be a good
> > > > testcase.
> > > 
> > > One more: I think it would be good to have an overview comment at the top
> > > of i915_cmd_parser.c which details the security attack model and the
> > > overall blacklist/whitelist design of the checker. We don't (yet) have
> > > autogenerated documentation for i915, but that's something I'm working on.
> > > And the kerneldoc system can also pull in multi-paragraph overview
> > > comments besides the usual api documentation, so it's good to have things
> > > ready.
> > 
> > Chatted with Chris a bit more on irc about this, and for more paranoia I
> > guess we should also reject any unknown client and media subclient
> > commands.
> 
> Hmm, not sure I follow. Can you elaborate?
> 
> Are you suggesting we add all the MI and Media commands to the tables and reject
> any command from those client/subclients that is not found in the table? Or that
> we look at the client and subclient fields of the command and reject if they are
> not from a set of expected values? Or other?
> 
> - Brad
> 
> > -Daniel
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-02-03 23:00                 ` Volkin, Bradley D
@ 2014-02-04 10:20                   ` Daniel Vetter
  2014-02-04 18:45                     ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-02-04 10:20 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Mon, Feb 03, 2014 at 03:00:19PM -0800, Volkin, Bradley D wrote:
> Ping. Daniel or Chris, can one of you clarify this request? Thanks.

I've been enjoying fosdem ...

> On Thu, Jan 30, 2014 at 10:05:27AM -0800, Volkin, Bradley D wrote:
> > On Thu, Jan 30, 2014 at 03:07:15AM -0800, Daniel Vetter wrote:
> > > On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> > > > On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > > > > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > > > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > > > > +/*
> > > > > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > > > > + *
> > > > > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > > > > + * ring's default length encoding and returns default_desc.
> > > > > > > > + */
> > > > > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > > > > +	 u32 cmd_header,
> > > > > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > > > > +{
> > > > > > > > +	u32 mask;
> > > > > > > > +	int i;
> > > > > > > > +
> > > > > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > > > > +
> > > > > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > > > > +		if (desc)
> > > > > > > > +			return desc;
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > > > > +	if (!mask)
> > > > > > > > +		return NULL;
> > > > > > > > +
> > > > > > > > +	BUG_ON(!default_desc);
> > > > > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > > > > +	default_desc->length.mask = mask;
> > > > > > > 
> > > > > > > If we turn off all hw validation (through use of the secure bit) should
> > > > > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > > > > a case of running a fuzzer until we kill the machine.
> > > > > > 
> > > > > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > > > > for that. The attack model here is to prevent privilege escalation and
> > > > > > information leaks. I.e. we want just containment of all read/write access
> > > > > > to the gtt space.
> > > > > > 
> > > > > > I think for that purpose an explicit whitelist of commands which target
> > > > > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > > > > completely different, but pretty much the only command they have is
> > > > > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > > > > commands to load (most) of the rendering pipeline state. So we have
> > > > > > hw built-in register whitelists for all that stuff since you just can't
> > > > > > load arbitrary registers and state with those commands.
> > > > > > 
> > > > > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > > > > based. And for general reads/writes gpu designers confirmed that those are
> > > > > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > > > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > > > > commands we know about we should be covered.
> > > > > > 
> > > > > > So I think this is sound.
> > > > > 
> > > > > Hm, but while scrolling through the checker I haven't spotted a "reject
> > > > > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > > > > 
> > > > > I think submitting an invented MI_CLIENT command would also be a good
> > > > > testcase.
> > > > 
> > > > One more: I think it would be good to have an overview comment at the top
> > > > of i915_cmd_parser.c which details the security attack model and the
> > > > overall blacklist/whitelist design of the checker. We don't (yet) have
> > > > autogenerated documentation for i915, but that's something I'm working on.
> > > > And the kerneldoc system can also pull in multi-paragraph overview
> > > > comments besides the usual api documentation, so it's good to have things
> > > > ready.
> > > 
> > > Chatted with Chris a bit more on irc about this, and for more paranoia I
> > > guess we should also reject any unknown client and media subclient
> > > commands.
> > 
> > Hmm, not sure I follow. Can you elaborate?
> > 
> > Are you suggesting we add all the MI and Media commands to the tables and reject
> > any command from those client/subclients that is not found in the table? Or that
> > we look at the client and subclient fields of the command and reject if they are
> > not from a set of expected values? Or other?

Yeah, I think we should check the client/subclient fields for known values,
but not have explicit lists for each command. So overall control-flow
would be:

1. Check client/subclient against whitelist (per-ring obviously, so reject
blitter commands on non-blt rings ofc).

2. If the client indicates an MI command, check against MI_ whitelist.

3. For all other clients check against blacklist/greylist (e.g.
pipe_control where we just need to forbid global gtt writes).

Iirc we need a special-cases list for blitter commands anyway due to their
irregular length, so maybe we could do a full whitelist for that one, too.
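
A rough sketch of the three-step control flow, in standalone C rather than
driver code. Everything named here is hypothetical (check_cmd,
client_allowed, the sketch_ring enum, the whitelist contents), and the
header bit positions, MI opcodes and PIPE_CONTROL encoding are assumptions
for illustration. The subclient check and the blitter special cases are
left out to keep it short:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: client in bits 31:29, MI opcode in bits 28:23. */
#define CMD_CLIENT(hdr)    ((uint32_t)(hdr) >> 29)
#define CMD_MI_OPCODE(hdr) (((uint32_t)(hdr) >> 23) & 0x3f)

enum { CLIENT_MI = 0x0, CLIENT_BLT = 0x2, CLIENT_RENDER = 0x3 };
enum sketch_ring { SKETCH_RING_RENDER, SKETCH_RING_BLT };

/* Hypothetical MI whitelist: NOOP, BATCH_BUFFER_END, LOAD_REGISTER_IMM. */
static const uint32_t mi_whitelist[] = { 0x00, 0x0a, 0x22 };

/* Assumed PIPE_CONTROL encoding and its "global GTT" bit in dword 1. */
#define PIPE_CONTROL_HDR   0x7a000000u
#define PIPE_CONTROL_MASK  0xffff0000u
#define PIPE_CONTROL_GGTT  (1u << 24)

/* Step 1: per-ring client whitelist. */
static bool client_allowed(enum sketch_ring ring, uint32_t client)
{
	if (client == CLIENT_MI)
		return true;
	if (ring == SKETCH_RING_BLT)
		return client == CLIENT_BLT;
	return client == CLIENT_RENDER;
}

/* cmd must point at a complete command (header plus its payload dwords). */
static int check_cmd(enum sketch_ring ring, const uint32_t *cmd)
{
	uint32_t hdr = cmd[0];
	size_t i;

	if (!client_allowed(ring, CMD_CLIENT(hdr)))
		return -EINVAL;

	/* Step 2: MI commands must be on the whitelist. */
	if (CMD_CLIENT(hdr) == CLIENT_MI) {
		for (i = 0; i < sizeof(mi_whitelist) / sizeof(mi_whitelist[0]); i++)
			if (mi_whitelist[i] == CMD_MI_OPCODE(hdr))
				return 0;
		return -EINVAL;
	}

	/* Step 3: everything else passes unless a greylist check objects. */
	if ((hdr & PIPE_CONTROL_MASK) == PIPE_CONTROL_HDR &&
	    (cmd[1] & PIPE_CONTROL_GGTT))
		return -EINVAL;

	return 0;
}
```

The point of the ordering is that an unknown client is rejected before any
table lookup, and only the MI client pays for a full whitelist; the other
clients get the cheaper "forbid the few dangerous cases" check.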

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] intel: Merge i915_drm.h with cmd parser define
  2014-01-30 17:28         ` Volkin, Bradley D
@ 2014-02-04 10:26           ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2014-02-04 10:26 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Thu, Jan 30, 2014 at 09:28:25AM -0800, Volkin, Bradley D wrote:
> On Thu, Jan 30, 2014 at 01:20:57AM -0800, Daniel Vetter wrote:
> > On Wed, Jan 29, 2014 at 02:26:12PM -0800, Volkin, Bradley D wrote:
> > > On Wed, Jan 29, 2014 at 02:13:21PM -0800, Chris Wilson wrote:
> > > > On Wed, Jan 29, 2014 at 01:57:28PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > > > 
> > > > > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > > > > ---
> > > > >  include/drm/i915_drm.h | 5 +++--
> > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
> > > > > index 2f4eb8c..ba863c4 100644
> > > > > --- a/include/drm/i915_drm.h
> > > > > +++ b/include/drm/i915_drm.h
> > > > > @@ -27,7 +27,7 @@
> > > > >  #ifndef _I915_DRM_H_
> > > > >  #define _I915_DRM_H_
> > > > >  
> > > > > -#include <drm.h>
> > > > > +#include <drm/drm.h>
> > > > 
> > > > Something about this patch smells very fishy....
> > > 
> > > Yeah, I wasn't completely sure about this one. I followed what I thought was
> > > the procedure for updating the header (i.e. make headers_install in kernel,
> > > copy to libdrm) and this is what I got.
> > 
> > I guess either works, so maybe just add a note to the commit message about
> > the little change. Imo it's better to have a 1:1 copy of the header
> > generated by the kernel.
> 
> Sorry, I'm a bit confused. Did I follow the right procedure for updating
> the header?

Yes, the procedure is

$ cd $KERNEL_REPO
$ make headers_install
$ cp usr/include/drm/i915_drm.h $DRM_REPO/drm/include/drm/

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-02-04 10:20                   ` Daniel Vetter
@ 2014-02-04 18:45                     ` Volkin, Bradley D
  2014-02-04 19:33                       ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-04 18:45 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Feb 04, 2014 at 02:20:36AM -0800, Daniel Vetter wrote:
> On Mon, Feb 03, 2014 at 03:00:19PM -0800, Volkin, Bradley D wrote:
> > Ping. Daniel or Chris, can one of you clarify this request? Thanks.
> 
> I've been enjoying fosdem ...
> 
> > On Thu, Jan 30, 2014 at 10:05:27AM -0800, Volkin, Bradley D wrote:
> > > On Thu, Jan 30, 2014 at 03:07:15AM -0800, Daniel Vetter wrote:
> > > > On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> > > > > On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > > > > > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > > > > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > > > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > > > > > +/*
> > > > > > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > > > > > + *
> > > > > > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > > > > > + * ring's default length encoding and returns default_desc.
> > > > > > > > > + */
> > > > > > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > > > > > +	 u32 cmd_header,
> > > > > > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > > > > > +{
> > > > > > > > > +	u32 mask;
> > > > > > > > > +	int i;
> > > > > > > > > +
> > > > > > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > > > > > +
> > > > > > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > > > > > +		if (desc)
> > > > > > > > > +			return desc;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > > > > > +	if (!mask)
> > > > > > > > > +		return NULL;
> > > > > > > > > +
> > > > > > > > > +	BUG_ON(!default_desc);
> > > > > > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > > > > > +	default_desc->length.mask = mask;
> > > > > > > > 
> > > > > > > > If we turn off all hw validation (through use of the secure bit) should
> > > > > > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > > > > > a case of running a fuzzer until we kill the machine.
> > > > > > > 
> > > > > > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > > > > > for that. The attack model here is to prevent privilege escalation and
> > > > > > > information leaks. I.e. we want just containment of all read/write access
> > > > > > > to the gtt space.
> > > > > > > 
> > > > > > > I think for that purpose an explicit whitelist of commands which target
> > > > > > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > > > > > completely different, but pretty much the only command they have is
> > > > > > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > > > > > commands to load (most) of the rendering pipeline state. So we have
> > > > > > > hw built-in register whitelists for all that stuff since you just can't
> > > > > > > load arbitrary registers and state with those commands.
> > > > > > > 
> > > > > > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > > > > > based. And for general reads/writes gpu designers confirmed that those are
> > > > > > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > > > > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > > > > > commands we know about we should be covered.
> > > > > > > 
> > > > > > > So I think this is sound.
> > > > > > 
> > > > > > Hm, but while scrolling through the checker I haven't spotted a "reject
> > > > > > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > > > > > 
> > > > > > I think submitting an invented MI_CLIENT command would also be a good
> > > > > > testcase.
> > > > > 
> > > > > One more: I think it would be good to have an overview comment at the top
> > > > > of i915_cmd_parser.c which details the security attack model and the
> > > > > overall blacklist/whitelist design of the checker. We don't (yet) have
> > > > > autogenerated documentation for i915, but that's something I'm working on.
> > > > > And the kerneldoc system can also pull in multi-paragraph overview
> > > > > comments besides the usual api documentation, so it's good to have things
> > > > > ready.
> > > > 
> > > > Chatted with Chris a bit more on irc about this, and for more paranoia I
> > > > guess we should also reject any unknown client and media subclient
> > > > commands.
> > > 
> > > Hmm, not sure I follow. Can you elaborate?
> > > 
> > > Are you suggesting we add all the MI and Media commands to the tables and reject
> > > any command from those client/subclients that is not found in the table? Or that
> > > we look at the client and subclient fields of the command and reject if they are
> > > not from a set of expected values? Or other?
> 
> Yeah, I think we should check the client/subclient fields for known values,
> but not have explicit lists for each command. So overall control-flow
> would be:
> 
> 1. Check client/subclient against whitelist (per-ring obviously, so reject
> blitter commands on non-blt rings ofc).

Ok, that's easy enough. I'm not sure the benefit is that large, but I can add this.

> 
> 2. If the client indicates an MI command, check against MI_ whitelist.
> 
> 3. For all other clients check against blacklist/greylist (e.g.
> pipe_control where we just need to forbid global gtt writes).

I'm a bit concerned about 2 and 3 because the behavior and table structures seem
fairly different from what we have now. I'm hesitant to make too big a change here
so close to having something functional.

Please correct me if I've missed or misunderstood anything, but my thinking is...

The current table structure is that we have tables per-ring and per-gen (plus the table
for common MI commands) and all tables are treated as blacklist/greylist. The proposed
flow here would indicate that we need tables per-ring, per-client, per-gen and that some
would be treated as a whitelist and some as a blacklist/greylist.
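
To make the difference concrete, here is a rough sketch (not code from the
series) of a lookup across several tables where the only thing that changes
between the two policies is how a miss is handled. The struct layouts, enum
action, find_desc() and classify() are illustrative assumptions; the masked
compare is assumed to mirror what find_cmd_in_table() does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum action { ACT_SKIP, ACT_REJECT };

/* Simplified stand-ins for the descriptor and table types in the series. */
struct cmd_desc {
	uint32_t value, mask;
	enum action action;
};

struct cmd_table {
	const struct cmd_desc *table;
	size_t count;
};

/* Return the first descriptor whose masked opcode matches the header. */
static const struct cmd_desc *
find_desc(const struct cmd_table *tables, size_t ntables, uint32_t header)
{
	size_t i, j;

	for (i = 0; i < ntables; i++) {
		for (j = 0; j < tables[i].count; j++) {
			const struct cmd_desc *d = &tables[i].table[j];

			if ((header & d->mask) == (d->value & d->mask))
				return d;
		}
	}
	return NULL;
}

/*
 * Blacklist/greylist tables: a command that is not listed is skipped.
 * Whitelist tables: a command that is not listed is rejected.
 */
static enum action classify(const struct cmd_table *tables, size_t ntables,
			    uint32_t header, bool whitelist)
{
	const struct cmd_desc *d = find_desc(tables, ntables, header);

	if (d)
		return d->action;
	return whitelist ? ACT_REJECT : ACT_SKIP;
}
```

Under this reading, the current design calls classify() with whitelist set to
false for every table, and the proposed flow would set it to true only for
the MI client.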

I think the benefit to these changes amounts to preventing clients from issuing invalid
MI commands, but clients can do that today, so it's not a regression right? It could also
make it easier to safely cover new platforms if MI commands were added, though the parser
is strictly limited to gen7 today and would need additional work to enable it for new gens.

The set of MI commands for current gens is small compared to the other command ranges.
On the one hand, that makes it easier to create a whitelist of the commands. On the other
hand, it also makes it easy to just audit the commands in the spec and validate that the
blacklist/greylist covers the potential issues.

So, if we maintain the conservative parser enabling and re-audit the lists when enabling
new gens, and if we can live with clients issuing totally invalid commands (and I get the
feeling that we have to), then I think the current solution gets us the benefits we want
with less complexity.

Let me know what you think. I'll continue working through the other feedback either way.

> 
> Iirc we need a special-cases list for blitter commands anyway due to their
> irregular length, so maybe we could do a full whitelist for that one, too.

All rings have commands with irregular length encodings. I believe they also all have
commands with non-fixed lengths (e.g. XY_TEXT_IMMEDIATE_BLT on blt, MEDIA_OBJECT on render).

Thanks,
Brad

> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-02-04 18:45                     ` Volkin, Bradley D
@ 2014-02-04 19:33                       ` Daniel Vetter
  2014-02-05  0:56                         ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-02-04 19:33 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Tue, Feb 04, 2014 at 10:45:45AM -0800, Volkin, Bradley D wrote:
> The current table structure is that we have tables per-ring and per-gen (plus the table
> for common MI commands) and all tables are treated as blacklist/greylist. The proposed
> flow here would indicate that we need tables per-ring, per-client, per-gen and that some
> would be treated as a whitelist and some as a blacklist/greylist.
> 
> I think the benefit to these changes amounts to preventing clients from issuing invalid
> MI commands, but clients can do that today, so it's not a regression right? It could also
> make it easier to safely cover new platforms if MI commands were added, though the parser
> is strictly limited to gen7 today and would need additional work to enable it for new gens.

The benefit is in enabling future platforms - with an explicit MI
whitelist we'd be forced to re-audit MI_ commands fully and there's no chance
to miss something. Also the MI_ commands are the tricky ones, so if one
slips through we have a problem.

Relying on the command streamer hw to catch invalid opcodes is something
we need to do anyway, so no benefit for that really.

> The set of MI commands for current gens is small compared to the other command ranges.
> On the one hand, that makes it easier to create a whitelist of the commands. On the other
> hand, it also makes it easy to just audit the commands in the spec and validate that the
> blacklist/greylist covers the potential issues.

Since we only care about the set of allowed MI commands the list probably
even shrinks a bit.

> So, if we maintain the conservative parser enabling and re-audit the lists when enabling
> new gens, and if we can live with clients issuing totally invalid commands (and I get the
> feeling that we have to), then I think the current solution gets us the benefits we want
> with less complexity.
> 
> Let me know what you think. I'll continue working through the other feedback either way.

I didn't really consider that the MI whitelist would have a bigger impact
on the code. So if you expect this change to cause a bit of a delay in
getting the next round ready then we can postpone this a bit - for me the
important part is to get the parser in soonish so that we can start to
catch regressions (if there are any we haven't considered yet). But
longer-term I think switching to a whitelist for MI commands is the right
approach.

> > Iirc we need a special-cases list for blitter commands anyway due to their
> > irregular length, so maybe we could do a full whitelist for that one, too.
> 
> All rings have commands with irregular length encodings. I believe they also all have
> commands with non-fixed lengths (e.g. XY_TEXT_IMMEDIATE_BLT on blt, MEDIA_OBJECT on render).

Yeah, variable length commands are everywhere, but I thought commands
which don't use the usual length-2 encoding (i.e. those which are just 1
dword) are restricted to MI commands and the blitter. Am I mistaken on
this?
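
For reference, a small sketch of what "length-2 encoding" means here: the
length field in the header holds the total dword count minus 2, so a
variable-length command is decoded by masking the field and adding the bias
back, while a 1-dword command has no usable length field and is treated as
fixed. struct len_desc, LENGTH_BIAS and cmd_length() are illustrative names
and not necessarily what the series uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define LENGTH_BIAS 2	/* assumed name for the standard bias */

/* Hypothetical, simplified view of a descriptor's length information. */
struct len_desc {
	bool fixed;		/* true: length is a constant */
	uint32_t fixed_len;	/* total dwords when fixed */
	uint32_t mask;		/* length field mask when not fixed */
};

/* Total command length in dwords, including the header dword. */
static uint32_t cmd_length(const struct len_desc *d, uint32_t header)
{
	if (d->fixed)
		return d->fixed_len;
	return (header & d->mask) + LENGTH_BIAS;
}
```

For example, a header whose low byte is 1 with a 0xFF mask decodes to 3
dwords. A command that encodes its length with a different bias has to be
listed as fixed-length or handled as a special case.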

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-02-04 19:33                       ` Daniel Vetter
@ 2014-02-05  0:56                         ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05  0:56 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Feb 04, 2014 at 11:33:31AM -0800, Daniel Vetter wrote:
> On Tue, Feb 04, 2014 at 10:45:45AM -0800, Volkin, Bradley D wrote:
> > The current table structure is that we have tables per-ring and per-gen (plus the table
> > for common MI commands) and all tables are treated as blacklist/greylist. The proposed
> > flow here would indicate that we need tables per-ring, per-client, per-gen and that some
> > would be treated as a whitelist and some as a blacklist/greylist.
> > 
> > I think the benefit to these changes amounts to preventing clients from issuing invalid
> > MI commands, but clients can do that today, so it's not a regression right? It could also
> > make it easier to safely cover new platforms if MI commands were added, though the parser
> > is strictly limited to gen7 today and would need additional work to enable it for new gens.
> 
> The benefit is in enabling future platforms - with an explicit MI
> whitelist we'd be forced to re-audit MI_ commands fully and there's no chance
> to miss something. Also the MI_ commands are the tricky ones, so if one
> slips through we have a problem.
> 
> Relying on the command streamer hw to catch invalid opcodes is something
> we need to do anyway, so no benefit for that really.
> 
> > The set of MI commands for current gens is small compared to the other command ranges.
> > On the one hand, that makes it easier to create a whitelist of the commands. On the other
> > hand, it also makes it easy to just audit the commands in the spec and validate that the
> > blacklist/greylist covers the potential issues.
> 
> Since we only care about the set of allowed MI commands the list probably
> even shrinks a bit.
> 
> > So, if we maintain the conservative parser enabling and re-audit the lists when enabling
> > new gens, and if we can live with clients issuing totally invalid commands (and I get the
> > feeling that we have to), then I think the current solution gets us the benefits we want
> > with less complexity.
> > 
> > Let me know what you think. I'll continue working through the other feedback either way.
> 
> I didn't really consider that the MI whitelist would have a bigger impact
> on the code. So if you expect this change to cause a bit of a delay in
> getting the next round ready then we can postpone this a bit - for me the
> important part is to get the parser in soonish so that we can start to
> catch regressions (if there are any we haven't considered yet). But
> longer-term I think switching to a whitelist for MI commands is the right
> approach.

Ok, let me get the next round out and then I'll look at this in more detail.

> 
> > > Iirc we need a special-cases list for blitter commands anyway due to their
> > > irregular length, so maybe we could do a full whitelist for that one, too.
> > 
> > All rings have commands with irregular length encodings. I believe they also all have
> > commands with non-fixed lengths (e.g. XY_TEXT_IMMEDIATE_BLT on blt, MEDIA_OBJECT on render).
> 
> Yeah, variable length commands are everywhere, but I thought commands
> which don't use the usual length-2 encoding (i.e. those which are just 1
> dword) are restricted to MI commands and the blitter. Am I mistaken on
> this?

From what I see, there's the 1-dword MI commands on all rings. MFX_WAIT uses
length-1 on VCS. The blitter looks like length-2 everywhere.

Also, the MI commands and a few places on RCS and BCS use more/fewer bits for
the length field than other commands for that client/subclient.
-Brad

> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
                   ` (25 preceding siblings ...)
  2014-01-29 21:58 ` [PATCH 1/6] tests: Add a test for the command parser bradley.d.volkin
@ 2014-02-05 10:28 ` Chris Wilson
  2014-02-05 18:18   ` Volkin, Bradley D
  26 siblings, 1 reply; 138+ messages in thread
From: Chris Wilson @ 2014-02-05 10:28 UTC (permalink / raw)
  To: bradley.d.volkin; +Cc: intel-gfx

On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> Certain OpenGL features (e.g. transform feedback, performance monitoring)
> require userspace code to submit batches containing commands such as
> MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> generations of the hardware will noop these commands in "unsecure" batches
> (which includes all userspace batches submitted via i915) even though the
> commands may be safe and represent the intended programming model of the device.
> 
> This series introduces a software command parser similar in operation to the
> command parsing done in hardware for unsecure batches. However, the software
> parser allows some operations that would be noop'd by hardware, if the parser
> determines the operation is safe, and submits the batch as "secure" to prevent
> hardware parsing. Currently the series implements this on IVB and HSW.

Just one more question... Do you have a branch for people to test?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 03/13] drm/i915: Initial command parser table definitions
  2014-01-29 21:55   ` [PATCH 03/13] drm/i915: Initial command parser table definitions bradley.d.volkin
@ 2014-02-05 14:22     ` Jani Nikula
  0 siblings, 0 replies; 138+ messages in thread
From: Jani Nikula @ 2014-02-05 14:22 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx


N.B. I'll likely make multiple passes on the patches while reviewing,
for example I did not check any of the #define values here.


On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> Add command tables defining irregular length commands for each ring.
> This requires a few new command opcode definitions.
>
> OTC-Tracker: AXIA-4631
> Change-Id: I064bceb457e15f46928058352afe76d918c58ef5
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 157 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_reg.h        |  46 ++++++++++
>  2 files changed, 203 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 7639dbc..2e27bad 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -27,6 +27,148 @@
>  
>  #include "i915_drv.h"
>  
> +#define STD_MI_OPCODE_MASK  0xFF800000
> +#define STD_3D_OPCODE_MASK  0xFFFF0000
> +#define STD_2D_OPCODE_MASK  0xFFC00000
> +#define STD_MFX_OPCODE_MASK 0xFFFF0000
> +
> +#define CMD(op, opm, f, lm, fl, ...)		\
> +	{					\
> +		.flags = (fl) | (f),		\

Sparse gives me 

drivers/gpu/drm/i915/i915_cmd_parser.c:64:9: warning: dubious: x | !y

for the !F cases (bitwise OR with a logical NOT). I can see it's not a
bug here, but we want to keep those warnings down. Maybe just s/!F/0/ in
the tables? Or make the f argument to CMD a boolean, and make that

	.flags = (fl) | (f ? CMD_DESC_FIXED : 0),
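
A reduced illustration of the two forms, assuming made-up flag values and a
cut-down descriptor struct (struct desc, CMD_ORIG and CMD_BOOL are not names
from the patch). It shows that !F simply evaluates to 0, so both forms
produce the same flags, and only the boolean form avoids the bitwise-OR with
a logical NOT that sparse complains about.

```c
#include <stdint.h>

/* Flag values assumed for illustration; the real ones live in the driver. */
#define CMD_DESC_FIXED (1u << 0)
#define CMD_DESC_SKIP  (1u << 1)

/* Cut-down descriptor, just enough to show the flags initializer. */
struct desc {
	uint32_t flags;
	uint32_t op;
	uint32_t len;
};

/* Form used in the patch: the table passes F or !F, and !F evaluates to 0. */
#define CMD_ORIG(op_, f, lm, fl) \
	{ .flags = (fl) | (f), .op = (op_), .len = (lm) }

/* Boolean form: f is 1 or 0 and is mapped onto the flag explicitly. */
#define CMD_BOOL(op_, f, lm, fl) \
	{ .flags = (fl) | ((f) ? CMD_DESC_FIXED : 0), .op = (op_), .len = (lm) }

static const struct desc orig_fixed =
	CMD_ORIG(0x00000000u, CMD_DESC_FIXED, 1, CMD_DESC_SKIP);
static const struct desc orig_var =
	CMD_ORIG(0x11000000u, !CMD_DESC_FIXED, 0xFF, CMD_DESC_SKIP);
static const struct desc bool_fixed =
	CMD_BOOL(0x00000000u, 1, 1, CMD_DESC_SKIP);
static const struct desc bool_var =
	CMD_BOOL(0x11000000u, 0, 0xFF, CMD_DESC_SKIP);
```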

> +		.cmd = { (op), (opm) }, 	\
> +		.length = { (lm) },		\
> +		__VA_ARGS__			\
> +	}
> +
> +/* Convenience macros to compress the tables */
> +#define SMI STD_MI_OPCODE_MASK
> +#define S3D STD_3D_OPCODE_MASK
> +#define S2D STD_2D_OPCODE_MASK
> +#define SMFX STD_MFX_OPCODE_MASK
> +#define F CMD_DESC_FIXED
> +#define S CMD_DESC_SKIP
> +#define R CMD_DESC_REJECT
> +#define W CMD_DESC_REGISTER
> +#define B CMD_DESC_BITMASK
> +#define M CMD_DESC_MASTER
> +
> +/*            Command                          Mask   Fixed Len   Action
> +	      ---------------------------------------------------------- */
> +static const struct drm_i915_cmd_descriptor common_cmds[] = {
> +	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
> +	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      S  ),
> +	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      S  ),
> +	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
> +	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
> +	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
> +	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
> +};
> +
> +static const struct drm_i915_cmd_descriptor render_cmds[] = {
> +	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
> +	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
> +	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
> +	CMD(  MI_TOPOLOGY_FILTER,               SMI,    F,  1,      S  ),
> +	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_SET_CONTEXT,                   SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_URB_CLEAR,                     SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
> +	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
> +	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
> +	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
> +	CMD(  GPGPU_OBJECT,                     S3D,   !F,  0xFF,   S  ),
> +	CMD(  GPGPU_WALKER,                     S3D,   !F,  0xFF,   S  ),
> +	CMD(  GFX_OP_3DSTATE_SO_DECL_LIST,      S3D,   !F,  0x1FF,  S  ),
> +};
> +
> +static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
> +	CMD(  MI_SET_PREDICATE,                 SMI,    F,  1,      S  ),
> +	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
> +	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
> +	CMD(  MI_RS_CONTEXT,                    SMI,    F,  1,      S  ),
> +	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_RS_STORE_DATA_IMM,             SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_LOAD_URB_MEM,                  SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_STORE_URB_MEM,                 SMI,   !F,  0xFF,   S  ),
> +	CMD(  GFX_OP_3DSTATE_DX9_CONSTANTF_VS,  S3D,   !F,  0x7FF,  S  ),
> +	CMD(  GFX_OP_3DSTATE_DX9_CONSTANTF_PS,  S3D,   !F,  0x7FF,  S  ),
> +
> +	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_VS,  S3D,   !F,  0x1FF,  S  ),
> +	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_GS,  S3D,   !F,  0x1FF,  S  ),
> +	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_HS,  S3D,   !F,  0x1FF,  S  ),
> +	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_DS,  S3D,   !F,  0x1FF,  S  ),
> +	CMD(  GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS,  S3D,   !F,  0x1FF,  S  ),
> +};
> +
> +static const struct drm_i915_cmd_descriptor video_cmds[] = {
> +	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
> +	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
> +	/*
> +	 * MFX_WAIT doesn't fit the way we handle length for most commands.
> +	 * It has a length field but it uses a non-standard length bias.
> +	 * It is always 1 dword though, so just treat it as fixed length.
> +	 */
> +	CMD(  MFX_WAIT,                         SMFX,   F,  1,      S  ),
> +};
> +
> +static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
> +	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
> +	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
> +};
> +
> +static const struct drm_i915_cmd_descriptor blt_cmds[] = {
> +	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
> +	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
> +	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
> +};
> +
> +#undef CMD
> +#undef SMI
> +#undef S3D
> +#undef S2D
> +#undef SMFX
> +#undef F
> +#undef S
> +#undef R
> +#undef W
> +#undef B
> +#undef M
> +
> +static const struct drm_i915_cmd_table gen7_render_cmds[] = {
> +	{ common_cmds, ARRAY_SIZE(common_cmds) },
> +	{ render_cmds, ARRAY_SIZE(render_cmds) },
> +};
> +
> +static const struct drm_i915_cmd_table hsw_render_ring_cmds[] = {
> +	{ common_cmds, ARRAY_SIZE(common_cmds) },
> +	{ render_cmds, ARRAY_SIZE(render_cmds) },
> +	{ hsw_render_cmds, ARRAY_SIZE(hsw_render_cmds) },
> +};
> +
> +static const struct drm_i915_cmd_table gen7_video_cmds[] = {
> +	{ common_cmds, ARRAY_SIZE(common_cmds) },
> +	{ video_cmds, ARRAY_SIZE(video_cmds) },
> +};
> +
> +static const struct drm_i915_cmd_table hsw_vebox_cmds[] = {
> +	{ common_cmds, ARRAY_SIZE(common_cmds) },
> +	{ vecs_cmds, ARRAY_SIZE(vecs_cmds) },
> +};
> +
> +static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
> +	{ common_cmds, ARRAY_SIZE(common_cmds) },
> +	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
> +};

A thought, if you added an end-of-array cell to all of these tables, I
think a lot of the initialization would be neater. If that seems like
too much trouble for too little gain, feel free to file this in the
bikeshedding bin.
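
One possible reading of the end-of-array idea, sketched with simplified
stand-in types: each per-ring list of tables ends in a NULL entry, so ring
init would only need to store the pointer and the walker stops at the
sentinel instead of relying on a separately stored count. TABLE, TABLE_END,
count_tables() and count_descriptors() are hypothetical helpers, not part of
the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the descriptor and table types in the patch. */
struct cmd_desc {
	uint32_t value, mask;
};

struct cmd_table {
	const struct cmd_desc *table;	/* NULL marks the end of the array */
	size_t count;
};

static const struct cmd_desc common_cmds[] = {
	{ 0x00000000u, 0xFF800000u },	/* MI_NOOP */
};

static const struct cmd_desc blt_cmds[] = {
	{ 0x50000000u, 0xFFC00000u },	/* COLOR_BLT */
	{ 0x50C00000u, 0xFFC00000u },	/* SRC_COPY_BLT */
};

#define TABLE(t)   { (t), sizeof(t) / sizeof((t)[0]) }
#define TABLE_END  { NULL, 0 }

static const struct cmd_table blt_tables[] = {
	TABLE(common_cmds),
	TABLE(blt_cmds),
	TABLE_END,
};

/* Number of tables before the sentinel. */
static size_t count_tables(const struct cmd_table *tables)
{
	size_t n = 0;

	while (tables[n].table)
		n++;
	return n;
}

/* Total descriptors across all tables, walking until the sentinel. */
static size_t count_descriptors(const struct cmd_table *tables)
{
	size_t total = 0;

	for (; tables->table; tables++)
		total += tables->count;
	return total;
}
```

The trade-off is one extra entry per array against dropping the
cmd_table_count assignments in the init switch.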

> +
>  #define CLIENT_MASK      0xE0000000
>  #define SUBCLIENT_MASK   0x18000000
>  #define MI_CLIENT        0x00000000
> @@ -146,15 +288,30 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
>  
>  	switch (ring->id) {
>  	case RCS:
> +		if (IS_HASWELL(ring->dev)) {
> +			ring->cmd_tables = hsw_render_ring_cmds;
> +			ring->cmd_table_count =
> +				ARRAY_SIZE(hsw_render_ring_cmds);
> +		} else {
> +			ring->cmd_tables = gen7_render_cmds;
> +			ring->cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
> +		}
> +
>  		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
>  		break;
>  	case VCS:
> +		ring->cmd_tables = gen7_video_cmds;
> +		ring->cmd_table_count = ARRAY_SIZE(gen7_video_cmds);
>  		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
>  		break;
>  	case BCS:
> +		ring->cmd_tables = gen7_blt_cmds;
> +		ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
>  		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
>  		break;
>  	case VECS:
> +		ring->cmd_tables = hsw_vebox_cmds;
> +		ring->cmd_table_count = ARRAY_SIZE(hsw_vebox_cmds);
>  		/* VECS can use the same length_mask function as VCS */
>  		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
>  		break;
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index cbbaf26..13ed6ed 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -336,6 +336,52 @@
>  #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH		(1<<0)
>  #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
>  
> +/*
> + * Commands used only by the command parser
> + */
> +#define MI_SET_PREDICATE       MI_INSTR(0x01, 0)
> +#define MI_ARB_CHECK           MI_INSTR(0x05, 0)
> +#define MI_RS_CONTROL          MI_INSTR(0x06, 0)
> +#define MI_URB_ATOMIC_ALLOC    MI_INSTR(0x09, 0)
> +#define MI_PREDICATE           MI_INSTR(0x0C, 0)
> +#define MI_RS_CONTEXT          MI_INSTR(0x0F, 0)
> +#define MI_TOPOLOGY_FILTER     MI_INSTR(0x0D, 0)
> +#define MI_URB_CLEAR           MI_INSTR(0x19, 0)
> +#define MI_UPDATE_GTT          MI_INSTR(0x23, 0)
> +#define MI_CLFLUSH             MI_INSTR(0x27, 0)
> +#define MI_LOAD_REGISTER_MEM   MI_INSTR(0x29, 0)
> +#define MI_LOAD_REGISTER_REG   MI_INSTR(0x2A, 0)
> +#define MI_RS_STORE_DATA_IMM   MI_INSTR(0x2B, 0)
> +#define MI_LOAD_URB_MEM        MI_INSTR(0x2C, 0)
> +#define MI_STORE_URB_MEM       MI_INSTR(0x2D, 0)
> +#define MI_CONDITIONAL_BATCH_BUFFER_END MI_INSTR(0x36, 0)
> +
> +#define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
> +#define GFX_OP_3DSTATE_VF_STATISTICS   ((0x3<<29)|(0x1<<27)|(0x0<<24)|(0xB<<16))
> +#define GPGPU_OBJECT                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x4<<16))
> +#define GPGPU_WALKER                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x5<<16))
> +#define GFX_OP_3DSTATE_DX9_CONSTANTF_VS \
> +	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x39<<16))
> +#define GFX_OP_3DSTATE_DX9_CONSTANTF_PS \
> +	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x3A<<16))
> +#define GFX_OP_3DSTATE_SO_DECL_LIST \
> +	((0x3<<29)|(0x3<<27)|(0x1<<24)|(0x17<<16))
> +
> +#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_VS \
> +	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x43<<16))
> +#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_GS \
> +	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x44<<16))
> +#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_HS \
> +	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x45<<16))
> +#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_DS \
> +	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x46<<16))
> +#define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS \
> +	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x47<<16))
> +
> +#define MFX_WAIT  ((0x3<<29)|(0x1<<27)|(0x0<<16))
> +
> +#define COLOR_BLT     ((0x2<<29)|(0x40<<22))
> +#define SRC_COPY_BLT  ((0x2<<29)|(0x43<<22))
>  
>  /*
>   * Reset registers
> -- 
> 1.8.5.2
>

-- 
Jani Nikula, Intel Open Source Technology Center


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-29 21:55   ` [PATCH 02/13] drm/i915: Implement command buffer parsing logic bradley.d.volkin
  2014-01-29 22:28     ` Chris Wilson
  2014-01-30  9:07     ` Daniel Vetter
@ 2014-02-05 15:15     ` Jani Nikula
  2014-02-05 18:36       ` Volkin, Bradley D
  2014-02-07 13:58     ` Jani Nikula
  3 siblings, 1 reply; 138+ messages in thread
From: Jani Nikula @ 2014-02-05 15:15 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx

On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> The command parser scans batch buffers submitted via execbuffer ioctls before
> the driver submits them to hardware. At a high level, it looks for several
> things:
>
> 1) Commands which are explicitly defined as privileged or which should only be
>    used by the kernel driver. The parser generally rejects such commands, with
>    the provision that it may allow some from the drm master process.
> 2) Commands which access registers. To support correct/enhanced userspace
>    functionality, particularly certain OpenGL extensions, the parser provides a
>    whitelist of registers which userspace may safely access (for both normal and
>    drm master processes).
> 3) Commands which access privileged memory (e.g. GGTT, HWS page). The
>    parser always rejects such commands.
>
> Each ring maintains tables of commands and registers which the parser uses in
> scanning batch buffers submitted to that ring.
>
> The set of commands that the parser must check for is significantly smaller
> than the number of commands supported, especially on the render ring. As such,
> the parser tables (built up in subsequent patches) contain only those commands
> required by the parser. This generally works because command opcode ranges have
> standard command length encodings. So for commands that the parser does not need
> to check, it can easily skip them. This is implemented via a per-ring length
> decoding vfunc.
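
To illustrate the default length decoding described above, here is a
minimal userspace sketch. The masks and LENGTH_BIAS are taken from the
patch below; render_length_mask(), decode_length() and
length_examples() are hypothetical names used only for this example,
and uint32_t stands in for the kernel's u32.

```c
#include <assert.h>
#include <stdint.h>

#define CLIENT_MASK      0xE0000000u
#define SUBCLIENT_MASK   0x18000000u
#define MI_CLIENT        0x00000000u
#define RC_CLIENT        0x60000000u
#define MEDIA_SUBCLIENT  0x10000000u
#define LENGTH_BIAS      2

/* Mirrors gen7_render_get_cmd_length_mask() from the patch; 0 means unknown. */
static uint32_t render_length_mask(uint32_t cmd_header)
{
	uint32_t client = cmd_header & CLIENT_MASK;
	uint32_t subclient = cmd_header & SUBCLIENT_MASK;

	if (client == MI_CLIENT)
		return 0x3F;
	if (client == RC_CLIENT)
		return subclient == MEDIA_SUBCLIENT ? 0xFFFF : 0xFF;
	return 0;
}

/* Hypothetical helper: total length in dwords, or 0 for an unknown client. */
static uint32_t decode_length(uint32_t cmd_header)
{
	uint32_t mask = render_length_mask(cmd_header);

	if (!mask)
		return 0;
	return (cmd_header & mask) + LENGTH_BIAS;
}

/* Usage: a 3D-pipeline header whose length field is 2 spans 4 dwords. */
static void length_examples(void)
{
	assert(decode_length(0x78000002u) == 4);
}
```

A command that is not in the tables is skipped by advancing the parse
pointer by the decoded length; a zero return corresponds to the
"unrecognized command" path.
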
>
> Unfortunately, there are a number of commands that do not follow the standard
> length encoding for their opcode range, primarily amongst the MI_* commands. To
> handle this, the parser provides a way to define explicit "skip" entries in the
> per-ring command tables.
>
> Other command table entries map fairly directly to the high-level categories
> mentioned above: rejected, master-only, register whitelist. A number of checks,
> including the privileged memory checks, are implemented via a general bitmasking
> mechanism.
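
The bitmasking mechanism mentioned above amounts to a masked compare on
selected dwords of a command. A minimal userspace sketch, assuming a
rule layout like the bits[] entries in the patch below; struct
bit_check and bits_ok() are hypothetical names for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One masked-compare rule, modelled on the bits[] entries in the patch. */
struct bit_check {
	uint32_t offset;   /* dword index within the command */
	uint32_t mask;     /* bits to inspect */
	uint32_t expected; /* required value of the masked bits */
};

/*
 * Returns true only if every rule matches. The caller must guarantee
 * that each offset lies within the command pointed to by cmd.
 */
static bool bits_ok(const uint32_t *cmd, const struct bit_check *checks,
		    size_t count)
{
	size_t i;

	/* assert() stands in for the kernel's BUG_ON() */
	assert(cmd != NULL || count == 0);

	for (i = 0; i < count; i++) {
		uint32_t dword = cmd[checks[i].offset] & checks[i].mask;

		if (dword != checks[i].expected)
			return false;
	}

	return true;
}
```

A rule such as { 0, 1 << 22, 0 } (an arbitrary bit chosen for
illustration) would reject any command that sets bit 22 of its header.
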
>
> OTC-Tracker: AXIA-4631
> Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile              |   3 +-
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 404 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_drv.h            |  94 +++++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  17 ++
>  drivers/gpu/drm/i915/i915_params.c         |   5 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |  32 +++
>  7 files changed, 556 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 4850494..2da81bf 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -47,7 +47,8 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o \
>  	  dvo_tfp410.o \
>  	  dvo_sil164.o \
>  	  dvo_ns2501.o \
> -	  i915_gem_dmabuf.o
> +	  i915_gem_dmabuf.o \
> +	  i915_cmd_parser.o

If you add this anywhere but last, you only need to touch one line
instead of two. It's nitpicky, but helps with things like git blame
(which would now point at you for i915_gem_dmabuf.o too ;).

>  
>  i915-$(CONFIG_COMPAT)   += i915_ioc32.o
>  
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> new file mode 100644
> index 0000000..7639dbc
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -0,0 +1,404 @@
> +/*
> + * Copyright © 2013 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Brad Volkin <bradley.d.volkin@intel.com>
> + *
> + */
> +
> +#include "i915_drv.h"
> +
> +#define CLIENT_MASK      0xE0000000
> +#define SUBCLIENT_MASK   0x18000000
> +#define MI_CLIENT        0x00000000
> +#define RC_CLIENT        0x60000000
> +#define BC_CLIENT        0x40000000
> +#define MEDIA_SUBCLIENT  0x10000000
> +
> +static u32 gen7_render_get_cmd_length_mask(u32 cmd_header)
> +{
> +	u32 client = cmd_header & CLIENT_MASK;
> +	u32 subclient = cmd_header & SUBCLIENT_MASK;
> +
> +	if (client == MI_CLIENT)
> +		return 0x3F;
> +	else if (client == RC_CLIENT) {
> +		if (subclient == MEDIA_SUBCLIENT)
> +			return 0xFFFF;
> +		else
> +			return 0xFF;
> +	}
> +
> +	DRM_DEBUG_DRIVER("CMD: Abnormal rcs cmd length! 0x%08X\n", cmd_header);
> +	return 0;
> +}
> +
> +static u32 gen7_bsd_get_cmd_length_mask(u32 cmd_header)
> +{
> +	u32 client = cmd_header & CLIENT_MASK;
> +	u32 subclient = cmd_header & SUBCLIENT_MASK;
> +
> +	if (client == MI_CLIENT)
> +		return 0x3F;
> +	else if (client == RC_CLIENT) {
> +		if (subclient == MEDIA_SUBCLIENT)
> +			return 0xFFF;
> +		else
> +			return 0xFF;
> +	}
> +
> +	DRM_DEBUG_DRIVER("CMD: Abnormal bsd cmd length! 0x%08X\n", cmd_header);
> +	return 0;
> +}
> +
> +static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
> +{
> +	u32 client = cmd_header & CLIENT_MASK;
> +
> +	if (client == MI_CLIENT)
> +		return 0x3F;
> +	else if (client == BC_CLIENT)
> +		return 0xFF;
> +
> +	DRM_DEBUG_DRIVER("CMD: Abnormal blt cmd length! 0x%08X\n", cmd_header);
> +	return 0;
> +}
> +
> +static void validate_cmds_sorted(struct intel_ring_buffer *ring)
> +{
> +	int i;
> +
> +	if (!ring->cmd_tables || ring->cmd_table_count == 0)
> +		return;
> +
> +	for (i = 0; i < ring->cmd_table_count; i++) {
> +		const struct drm_i915_cmd_table *table = &ring->cmd_tables[i];
> +		u32 previous = 0;
> +		int j;
> +
> +		for (j = 0; j < table->count; j++) {
> +			const struct drm_i915_cmd_descriptor *desc =
> +				&table->table[j];
> +			u32 curr = desc->cmd.value & desc->cmd.mask;
> +
> +			if (curr < previous) {
> +				DRM_ERROR("CMD: table not sorted ring=%d table=%d entry=%d cmd=0x%08X\n",
> +					  ring->id, i, j, curr);
> +				return;

So this checks the hand-filled tables, right?

I think this should not stop at the first error, but rather scan the
whole table and DRM_ERROR all cases where curr < previous, and after the
full scan BUG_ON() if there were any errors.
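
Roughly what I have in mind, as a userspace sketch rather than a
drop-in replacement: count_unsorted() and validate_sorted() are
hypothetical names, fprintf() stands in for DRM_ERROR() and assert()
for BUG_ON().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reports every out-of-order entry instead of stopping at the first. */
static int count_unsorted(const uint32_t *table, int count)
{
	uint32_t previous = 0;
	int errors = 0;
	int i;

	for (i = 0; i < count; i++) {
		uint32_t curr = table[i];

		if (curr < previous) {
			fprintf(stderr, "table not sorted entry=%d value=0x%08X\n",
				i, (unsigned int)curr);
			errors++;
		}

		previous = curr;
	}

	return errors;
}

/* Fail hard only after the whole table has been scanned. */
static void validate_sorted(const uint32_t *table, int count)
{
	int errors = count_unsorted(table, count);

	assert(errors == 0);
	(void)errors; /* silences the unused warning when NDEBUG is set */
}
```

That way one boot reports every misplaced entry, not just the first.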

> +			}
> +
> +			previous = curr;
> +		}
> +	}
> +}
> +
> +static void check_sorted(int ring_id, const u32 *reg_table, int reg_count)
> +{
> +	int i;
> +	u32 previous = 0;
> +
> +	for (i = 0; i < reg_count; i++) {
> +		u32 curr = reg_table[i];
> +
> +		if (curr < previous) {
> +			DRM_ERROR("CMD: table not sorted ring=%d entry=%d reg=0x%08X\n",
> +				  ring_id, i, curr);
> +			return;

Same here.

> +		}
> +
> +		previous = curr;
> +	}
> +}
> +
> +static void validate_regs_sorted(struct intel_ring_buffer *ring)
> +{
> +	if (ring->reg_table && ring->reg_count > 0)
> +		check_sorted(ring->id, ring->reg_table, ring->reg_count);
> +
> +	if (ring->master_reg_table && ring->master_reg_count > 0)
> +		check_sorted(ring->id, ring->master_reg_table,
> +			     ring->master_reg_count);

Somehow I think the ifs here are redundant. check_sorted() is a no-op if
reg_count == 0, and if reg_count > 0 while reg_table == NULL, it
deserves to oops!

> +}
> +
> +void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
> +{
> +	if (!IS_GEN7(ring->dev))
> +		return;
> +
> +	switch (ring->id) {
> +	case RCS:
> +		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
> +		break;
> +	case VCS:
> +		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> +		break;
> +	case BCS:
> +		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
> +		break;
> +	case VECS:
> +		/* VECS can use the same length_mask function as VCS */
> +		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> +		break;
> +	default:
> +		DRM_ERROR("CMD: cmd_parser_init with unknown ring: %d\n",
> +			  ring->id);

You'll oops later for calling NULL ring->get_cmd_length_mask(), so might
as well BUG() here.

> +	}
> +
> +	validate_cmds_sorted(ring);
> +	validate_regs_sorted(ring);
> +}
> +
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> +		  u32 cmd_header)
> +{
> +	int i;
> +
> +	for (i = 0; i < table->count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> +
> +		if (masked_cmd == masked_value)
> +			return desc;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Returns a pointer to a descriptor for the command specified by cmd_header.
> + *
> + * The caller must supply space for a default descriptor via the default_desc
> + * parameter. If no descriptor for the specified command exists in the ring's
> + * command parser tables, this function fills in default_desc based on the
> + * ring's default length encoding and returns default_desc.
> + */
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd(struct intel_ring_buffer *ring,
> +	 u32 cmd_header,
> +	 struct drm_i915_cmd_descriptor *default_desc)
> +{
> +	u32 mask;
> +	int i;
> +
> +	for (i = 0; i < ring->cmd_table_count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc;
> +
> +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> +		if (desc)
> +			return desc;
> +	}
> +
> +	mask = ring->get_cmd_length_mask(cmd_header);
> +	if (!mask)
> +		return NULL;
> +
> +	BUG_ON(!default_desc);
> +	default_desc->flags = CMD_DESC_SKIP;
> +	default_desc->length.mask = mask;
> +
> +	return default_desc;
> +}
> +
> +static int valid_reg(const u32 *table, int count, u32 addr)

I like bools for boolean stuff.
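
Something along these lines, sketched in userspace C with uint32_t in
place of u32 and assert() in place of BUG_ON(). The explicit
precondition is my assumption about what the caller should guarantee,
not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool valid_reg(const uint32_t *table, int count, uint32_t addr)
{
	int i;

	/* A non-empty count with a NULL table is a caller bug. */
	assert(table != NULL || count == 0);

	/* count == 0 skips the loop, so no separate guard is needed. */
	for (i = 0; i < count; i++) {
		if (table[i] == addr)
			return true;
	}

	return false;
}
```

The callers already use the result as a truth value, so they would not
need to change.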

> +{
> +	if (table && count != 0) {
> +		int i;
> +
> +		for (i = 0; i < count; i++) {
> +			if (table[i] == addr)
> +				return 1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static u32 *vmap_batch(struct drm_i915_gem_object *obj)
> +{
> +	int i;
> +	void *addr = NULL;
> +	struct sg_page_iter sg_iter;
> +	struct page **pages;
> +
> +	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
> +	if (pages == NULL) {
> +		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
> +		goto finish;
> +	}
> +
> +	i = 0;
> +	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> +		pages[i] = sg_page_iter_page(&sg_iter);
> +		i++;
> +	}
> +
> +	addr = vmap(pages, i, 0, PAGE_KERNEL);
> +	if (addr == NULL) {
> +		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
> +		goto finish;
> +	}
> +
> +finish:
> +	if (pages)
> +		drm_free_large(pages);
> +	return (u32*)addr;
> +}
> +
> +int i915_needs_cmd_parser(struct intel_ring_buffer *ring)

bool

> +{
> +	/* No command tables indicates a platform without parsing */
> +	if (!ring->cmd_tables)
> +		return 0;
> +
> +	return i915.enable_cmd_parser;
> +}
> +
> +#define LENGTH_BIAS 2
> +
> +int i915_parse_cmds(struct intel_ring_buffer *ring,
> +		    struct drm_i915_gem_object *batch_obj,
> +		    u32 batch_start_offset,
> +		    bool is_master)
> +{
> +	int ret = 0;
> +	u32 *cmd, *batch_base, *batch_end;
> +	struct drm_i915_cmd_descriptor default_desc = { 0 };
> +	int needs_clflush = 0;
> +
> +	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> +		return ret;
> +	}
> +
> +	batch_base = vmap_batch(batch_obj);
> +	if (!batch_base) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> +		i915_gem_object_unpin_pages(batch_obj);
> +		return -ENOMEM;
> +	}
> +
> +	if (needs_clflush)
> +		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
> +
> +	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
> +	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
> +
> +	while (cmd < batch_end) {
> +		const struct drm_i915_cmd_descriptor *desc;
> +		u32 length;
> +
> +		if (*cmd == MI_BATCH_BUFFER_END)
> +			break;
> +
> +		desc = find_cmd(ring, *cmd, &default_desc);
> +		if (!desc) {
> +			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> +					 *cmd);
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (desc->flags & CMD_DESC_FIXED)
> +			length = desc->length.fixed;
> +		else
> +			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
> +
> +		if ((batch_end - cmd) < length) {
> +			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
> +					 *cmd,
> +					 length,
> +					 batch_end - cmd);
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (desc->flags & CMD_DESC_REJECT) {
> +			DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
> +			DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
> +					 *cmd);
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (desc->flags & CMD_DESC_REGISTER) {
> +			u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
> +
> +			if (!valid_reg(ring->reg_table,
> +				       ring->reg_count, reg_addr)) {
> +				if (!is_master ||
> +				    !valid_reg(ring->master_reg_table,
> +					       ring->master_reg_count,
> +					       reg_addr)) {
> +					DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (ring=%d)\n",
> +							 reg_addr,
> +							 *cmd,
> +							 ring->id);
> +					ret = -EINVAL;
> +					break;
> +				}
> +			}
> +		}
> +
> +		if (desc->flags & CMD_DESC_BITMASK) {
> +			int i;
> +
> +			for (i = 0; i < desc->bits_count; i++) {
> +				u32 dword = cmd[desc->bits[i].offset] &
> +					desc->bits[i].mask;
> +
> +				if (dword != desc->bits[i].expected) {
> +					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
> +							 *cmd,
> +							 desc->bits[i].mask,
> +							 desc->bits[i].expected,
> +							 dword, ring->id);
> +					ret = -EINVAL;
> +					break;
> +				}
> +			}
> +
> +			if (ret)
> +				break;
> +		}
> +
> +		cmd += length;
> +	}
> +
> +	if (cmd >= batch_end) {
> +		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
> +		ret = -EINVAL;
> +	}
> +
> +	vunmap(batch_base);
> +
> +	i915_gem_object_unpin_pages(batch_obj);
> +
> +	return ret;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index bfb30df..8aed80f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1765,6 +1765,91 @@ struct drm_i915_file_private {
>  	atomic_t rps_wait_boost;
>  };
>  
> +/**
> + * A command that requires special handling by the command parser.
> + */

You have plenty of kernel-doc comments here and in other patches. They
do expect a certain format, however. Please either make them regular
comments (the easy way) or adhere to proper kernel-doc format.
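
For reference, a kernel-doc comment names the struct or function on
its first line and documents each member or parameter with an @name:
line. A small sketch of the layout follows; struct cmd_id and
cmd_id_matches() are made up for illustration and are not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/**
 * struct cmd_id - identification bits of a command
 * @value: expected header bits once @mask has been applied
 * @mask: header bits that identify the command
 *
 * This isn't strictly the opcode field as defined in the spec and may
 * also include type, subtype, and/or subop fields.
 */
struct cmd_id {
	uint32_t value;
	uint32_t mask;
};

/**
 * cmd_id_matches() - test a command header against an identifier
 * @id: identifier to compare against
 * @cmd_header: first dword of the command
 *
 * Return: non-zero if the masked header equals the masked value.
 */
static int cmd_id_matches(const struct cmd_id *id, uint32_t cmd_header)
{
	assert(id);
	return (cmd_header & id->mask) == (id->value & id->mask);
}
```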

> +struct drm_i915_cmd_descriptor {
> +	/**
> +	 * Flags describing how the command parser processes the command.
> +	 *
> +	 * CMD_DESC_FIXED: The command has a fixed length if this is set,
> +	 *                 a length mask if not set
> +	 * CMD_DESC_SKIP: The command is allowed but does not follow the
> +	 *                standard length encoding for the opcode range in
> +	 *                which it falls
> +	 * CMD_DESC_REJECT: The command is never allowed
> +	 * CMD_DESC_REGISTER: The command should be checked against the
> +	 *                    register whitelist for the appropriate ring
> +	 * CMD_DESC_MASTER: The command is allowed if the submitting process
> +	 *                  is the DRM master
> +	 */
> +	u32 flags;
> +#define CMD_DESC_FIXED    (1<<0)
> +#define CMD_DESC_SKIP     (1<<1)
> +#define CMD_DESC_REJECT   (1<<2)
> +#define CMD_DESC_REGISTER (1<<3)
> +#define CMD_DESC_BITMASK  (1<<4)
> +#define CMD_DESC_MASTER   (1<<5)

Feels like flags should be named FLAG, not DESC. *shrug*.

> +
> +	/**
> +	 * The command's unique identification bits and the bitmask to get them.
> +	 * This isn't strictly the opcode field as defined in the spec and may
> +	 * also include type, subtype, and/or subop fields.
> +	 */
> +	struct {
> +		u32 value;
> +		u32 mask;
> +	} cmd;
> +
> +	/**
> +	 * The command's length. The command is either fixed length (i.e. does
> +	 * not include a length field) or has a length field mask. The flag
> +	 * CMD_DESC_FIXED indicates a fixed length. Otherwise, the command has
> +	 * a length mask. All command entries in a command table must include
> +	 * length information.
> +	 */
> +	union {
> +		u32 fixed;
> +		u32 mask;
> +	} length;
> +
> +	/**
> +	 * Describes where to find a register address in the command to check
> +	 * against the ring's register whitelist. Only valid if flags has the
> +	 * CMD_DESC_REGISTER bit set.
> +	 */
> +	struct {
> +		u32 offset;
> +		u32 mask;
> +	} reg;
> +
> +#define MAX_CMD_DESC_BITMASKS 3
> +	/**
> +	 * Describes command checks where a particular dword is masked and
> +	 * compared against an expected value. If the command does not match
> +	 * the expected value, the parser rejects it. Only valid if flags has
> +	 * the CMD_DESC_BITMASK bit set.
> +	 */
> +	struct {
> +		u32 offset;
> +		u32 mask;
> +		u32 expected;
> +	} bits[MAX_CMD_DESC_BITMASKS];
> +	/** Number of valid entries in the bits array */
> +	int bits_count;
> +};
> +
> +/**
> + * A table of commands requiring special handling by the command parser.
> + *
> + * Each ring has an array of tables. Each table consists of an array of command
> + * descriptors, which must be sorted with command opcodes in ascending order.
> + */
> +struct drm_i915_cmd_table {
> +	const struct drm_i915_cmd_descriptor *table;
> +	int count;
> +};
> +
>  #define INTEL_INFO(dev)	(to_i915(dev)->info)
>  
>  #define IS_I830(dev)		((dev)->pdev->device == 0x3577)
> @@ -1923,6 +2008,7 @@ struct i915_params {
>  	bool prefault_disable;
>  	bool reset;
>  	int invert_brightness;
> +	int enable_cmd_parser;
>  };
>  extern struct i915_params i915 __read_mostly;
>  
> @@ -2428,6 +2514,14 @@ void i915_destroy_error_state(struct drm_device *dev);
>  void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
>  const char *i915_cache_level_str(int type);
>  
> +/* i915_cmd_parser.c */
> +void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
> +int i915_needs_cmd_parser(struct intel_ring_buffer *ring);
> +int i915_parse_cmds(struct intel_ring_buffer *ring,
> +		    struct drm_i915_gem_object *batch_obj,
> +		    u32 batch_start_offset,
> +		    bool is_master);
> +
>  /* i915_suspend.c */
>  extern int i915_save_state(struct drm_device *dev);
>  extern int i915_restore_state(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 032def9..c953445 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1180,6 +1180,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	}
>  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
>  
> +	if (i915_needs_cmd_parser(ring)) {
> +		ret = i915_parse_cmds(ring,
> +				      batch_obj,
> +				      args->batch_start_offset,
> +				      file->is_master);
> +		if (ret)
> +			goto err;
> +
> +		/*
> +		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
> +		 * from MI_BATCH_BUFFER_START commands issued in the
> +		 * dispatch_execbuffer implementations. We specifically don't
> +		 * want that set when the command parser is enabled.
> +		 */
> +		flags |= I915_DISPATCH_SECURE;
> +	}
> +
>  	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
>  	 * batch" bit. Hence we need to pin secure batches into the global gtt.
>  	 * hsw should have this fixed, but bdw mucks it up again. */
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index c743057..6d3d906 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -47,6 +47,7 @@ struct i915_params i915 __read_mostly = {
>  	.prefault_disable = 0,
>  	.reset = true,
>  	.invert_brightness = 0,
> +	.enable_cmd_parser = 0

Please add a comma at the end so the next addition won't have to, just
like this one doesn't have to touch the previous line.

>  };
>  
>  module_param_named(modeset, i915.modeset, int, 0400);
> @@ -153,3 +154,7 @@ MODULE_PARM_DESC(invert_brightness,
>  	"report PCI device ID, subsystem vendor and subsystem device ID "
>  	"to dri-devel@lists.freedesktop.org, if your machine needs it. "
>  	"It will then be included in an upcoming module version.");
> +
> +module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
> +MODULE_PARM_DESC(enable_cmd_parser,
> +		"Enable command parsing (default: false)");

If it's a bool, make it a bool, or change the default text to 0.

> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index a0d61f8..77fc61d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1388,6 +1388,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	if (IS_I830(ring->dev) || IS_845G(ring->dev))
>  		ring->effective_size -= 128;
>  
> +	i915_cmd_parser_init_ring(ring);
> +
>  	return 0;
>  
>  err_unmap:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 71a73f4..cff2b35 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -162,6 +162,38 @@ struct  intel_ring_buffer {
>  		u32 gtt_offset;
>  		volatile u32 *cpu_page;
>  	} scratch;
> +
> +	/**
> +	 * Tables of commands the command parser needs to know about
> +	 * for this ring.
> +	 */
> +	const struct drm_i915_cmd_table *cmd_tables;
> +	int cmd_table_count;
> +
> +	/**
> +	 * Table of registers allowed in commands that read/write registers.
> +	 */
> +	const u32 *reg_table;
> +	int reg_count;
> +
> +	/**
> +	 * Table of registers allowed in commands that read/write registers, but
> +	 * only from the DRM master.
> +	 */
> +	const u32 *master_reg_table;
> +	int master_reg_count;
> +
> +	/**
> +	 * Returns the bitmask for the length field of the specified command.
> +	 * Return 0 for an unrecognized/invalid command.
> +	 *
> +	 * If the command parser finds an entry for a command in the ring's
> +	 * cmd_tables, it gets the command's length based on the table entry.
> +	 * If not, it calls this function to determine the per-ring length field
> +	 * encoding for the command (i.e. certain opcode ranges use certain bits
> +	 * to encode the command length in the header).
> +	 */
> +	u32 (*get_cmd_length_mask)(u32 cmd_header);
>  };

Plenty of non-conforming kernel-doc comments here too.

>  
>  static inline bool
> -- 
> 1.8.5.2

-- 
Jani Nikula, Intel Open Source Technology Center


* Re: [PATCH 04/13] drm/i915: Reject privileged commands
  2014-01-29 21:55   ` [PATCH 04/13] drm/i915: Reject privileged commands bradley.d.volkin
@ 2014-02-05 15:22     ` Jani Nikula
  2014-02-05 18:42       ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Jani Nikula @ 2014-02-05 15:22 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx

On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> The spec defines most of these commands as privileged. A few others,
> like the semaphore mbox command and some display commands, are also
> reserved for the driver's use. Subsequent patches relax some of
> these restrictions.
>
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 54 ++++++++++++++++++++++++----------
>  drivers/gpu/drm/i915/i915_reg.h        | 31 +++++++++----------
>  2 files changed, 54 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 2e27bad..cc2f68c 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -57,27 +57,27 @@
>  static const struct drm_i915_cmd_descriptor common_cmds[] = {
>  	CMD(  MI_NOOP,                          SMI,    F,  1,      S  ),
>  	CMD(  MI_USER_INTERRUPT,                SMI,    F,  1,      S  ),
> -	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      S  ),
> +	CMD(  MI_WAIT_FOR_EVENT,                SMI,    F,  1,      R  ),
>  	CMD(  MI_ARB_CHECK,                     SMI,    F,  1,      S  ),
>  	CMD(  MI_REPORT_HEAD,                   SMI,    F,  1,      S  ),
>  	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
> -	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   S  ),
> -	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   S  ),
> -	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   S  ),
> -	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   S  ),
> -	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
> +	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
> +	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   R  ),
> +	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   R  ),
> +	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
>  };
>  
>  static const struct drm_i915_cmd_descriptor render_cmds[] = {
>  	CMD(  MI_FLUSH,                         SMI,    F,  1,      S  ),
> -	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
> +	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
>  	CMD(  MI_PREDICATE,                     SMI,    F,  1,      S  ),
>  	CMD(  MI_TOPOLOGY_FILTER,               SMI,    F,  1,      S  ),
> -	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
> -	CMD(  MI_SET_CONTEXT,                   SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
> +	CMD(  MI_SET_CONTEXT,                   SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_URB_CLEAR,                     SMI,   !F,  0xFF,   S  ),
> -	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
>  	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
>  	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
> @@ -92,7 +92,9 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
>  	CMD(  MI_RS_CONTROL,                    SMI,    F,  1,      S  ),
>  	CMD(  MI_URB_ATOMIC_ALLOC,              SMI,    F,  1,      S  ),
>  	CMD(  MI_RS_CONTEXT,                    SMI,    F,  1,      S  ),
> -	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   R  ),
> +	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
> +	CMD(  MI_LOAD_REGISTER_REG,             SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_RS_STORE_DATA_IMM,             SMI,   !F,  0xFF,   S  ),
>  	CMD(  MI_LOAD_URB_MEM,                  SMI,   !F,  0xFF,   S  ),
>  	CMD(  MI_STORE_URB_MEM,                 SMI,   !F,  0xFF,   S  ),
> @@ -107,8 +109,9 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
>  };
>  
>  static const struct drm_i915_cmd_descriptor video_cmds[] = {
> -	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
> +	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
>  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
>  	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
>  	/*
>  	 * MFX_WAIT doesn't fit the way we handle length for most commands.
> @@ -119,18 +122,25 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
>  };
>  
>  static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
> -	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      S  ),
> +	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
>  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
>  	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
>  };
>  
>  static const struct drm_i915_cmd_descriptor blt_cmds[] = {
> -	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
> +	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
>  	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
>  	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
>  };
>  
> +static const struct drm_i915_cmd_descriptor hsw_blt_cmds[] = {
> +	CMD(  MI_LOAD_SCAN_LINES_INCL,          SMI,   !F,  0x3F,   R  ),
> +	CMD(  MI_LOAD_SCAN_LINES_EXCL,          SMI,   !F,  0x3F,   R  ),
> +};
> +
>  #undef CMD
>  #undef SMI
>  #undef S3D
> @@ -169,6 +179,12 @@ static const struct drm_i915_cmd_table gen7_blt_cmds[] = {
>  	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
>  };
>  
> +static const struct drm_i915_cmd_table hsw_blt_ring_cmds[] = {
> +	{ common_cmds, ARRAY_SIZE(common_cmds) },
> +	{ blt_cmds, ARRAY_SIZE(blt_cmds) },
> +	{ hsw_blt_cmds, ARRAY_SIZE(hsw_blt_cmds) },
> +};
> +
>  #define CLIENT_MASK      0xE0000000
>  #define SUBCLIENT_MASK   0x18000000
>  #define MI_CLIENT        0x00000000
> @@ -305,8 +321,14 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
>  		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
>  		break;
>  	case BCS:
> -		ring->cmd_tables = gen7_blt_cmds;
> -		ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
> +		if (IS_HASWELL(ring->dev)) {
> +			ring->cmd_tables = hsw_blt_ring_cmds;
> +			ring->cmd_table_count = ARRAY_SIZE(hsw_blt_ring_cmds);
> +		} else {
> +			ring->cmd_tables = gen7_blt_cmds;
> +			ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
> +		}
> +
>  		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
>  		break;
>  	case VECS:
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 13ed6ed..2b7c26e 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -339,21 +339,22 @@
>  /*
>   * Commands used only by the command parser
>   */
> -#define MI_SET_PREDICATE       MI_INSTR(0x01, 0)
> -#define MI_ARB_CHECK           MI_INSTR(0x05, 0)
> -#define MI_RS_CONTROL          MI_INSTR(0x06, 0)
> -#define MI_URB_ATOMIC_ALLOC    MI_INSTR(0x09, 0)
> -#define MI_PREDICATE           MI_INSTR(0x0C, 0)
> -#define MI_RS_CONTEXT          MI_INSTR(0x0F, 0)
> -#define MI_TOPOLOGY_FILTER     MI_INSTR(0x0D, 0)
> -#define MI_URB_CLEAR           MI_INSTR(0x19, 0)
> -#define MI_UPDATE_GTT          MI_INSTR(0x23, 0)
> -#define MI_CLFLUSH             MI_INSTR(0x27, 0)
> -#define MI_LOAD_REGISTER_MEM   MI_INSTR(0x29, 0)
> -#define MI_LOAD_REGISTER_REG   MI_INSTR(0x2A, 0)
> -#define MI_RS_STORE_DATA_IMM   MI_INSTR(0x2B, 0)
> -#define MI_LOAD_URB_MEM        MI_INSTR(0x2C, 0)
> -#define MI_STORE_URB_MEM       MI_INSTR(0x2D, 0)
> +#define MI_SET_PREDICATE        MI_INSTR(0x01, 0)
> +#define MI_ARB_CHECK            MI_INSTR(0x05, 0)
> +#define MI_RS_CONTROL           MI_INSTR(0x06, 0)
> +#define MI_URB_ATOMIC_ALLOC     MI_INSTR(0x09, 0)
> +#define MI_PREDICATE            MI_INSTR(0x0C, 0)
> +#define MI_RS_CONTEXT           MI_INSTR(0x0F, 0)
> +#define MI_TOPOLOGY_FILTER      MI_INSTR(0x0D, 0)
> +#define MI_LOAD_SCAN_LINES_EXCL MI_INSTR(0x13, 0)
> +#define MI_URB_CLEAR            MI_INSTR(0x19, 0)
> +#define MI_UPDATE_GTT           MI_INSTR(0x23, 0)
> +#define MI_CLFLUSH              MI_INSTR(0x27, 0)
> +#define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
> +#define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
> +#define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
> +#define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
> +#define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)

Superfluous whitespace change hunk.


>  #define MI_CONDITIONAL_BATCH_BUFFER_END MI_INSTR(0x36, 0)
>  
>  #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
> -- 
> 1.8.5.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 06/13] drm/i915: Add register whitelists for mesa
  2014-01-29 21:55   ` [PATCH 06/13] drm/i915: Add register whitelists for mesa bradley.d.volkin
@ 2014-02-05 15:29     ` Jani Nikula
  2014-02-05 18:47       ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Jani Nikula @ 2014-02-05 15:29 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx

On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> These registers are currently used by mesa for blitting,
> transform feedback extensions, and performance monitoring
> extensions.
>
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 55 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_reg.h        | 20 +++++++++++++
>  2 files changed, 75 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 88456638..18d5b05 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -185,6 +185,55 @@ static const struct drm_i915_cmd_table hsw_blt_ring_cmds[] = {
>  	{ hsw_blt_cmds, ARRAY_SIZE(hsw_blt_cmds) },
>  };
>  
> +/*
> + * Register whitelists, sorted by increasing register offset.
> + *
> + * Some registers that userspace accesses are 64 bits. The register
> + * access commands only allow 32-bit accesses. Hence, we have to include
> + * entries for both halves of the 64-bit registers.
> + */

Seems like it would be useful to have a helper macro here.

	#define FOO64(addr) (addr), (addr + 4)

With a better name, hopefully. My imagination fails me now.
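
A minimal sketch of what such a helper could look like. The name REG64 and the example_* functions are placeholders for illustration, not anything from the patch; the offsets are copied from the i915_reg.h hunk below, and uint32_t stands in for the kernel's u32 so the snippet builds in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name, not from the patch: emits both 32-bit halves of a
 * 64-bit register so each whitelist entry is written only once. */
#define REG64(addr) (addr), ((addr) + sizeof(uint32_t))

/* Offsets copied from the i915_reg.h hunk of this patch. */
#define HS_INVOCATION_COUNT 0x2300
#define DS_INVOCATION_COUNT 0x2308
#define GEN7_SO_NUM_PRIMS_WRITTEN(n) (0x5200 + (n) * 8)

/* Illustrative subset of the whitelist, still sorted by offset. */
static const uint32_t example_render_regs[] = {
	REG64(HS_INVOCATION_COUNT),
	REG64(DS_INVOCATION_COUNT),
	REG64(GEN7_SO_NUM_PRIMS_WRITTEN(0)),
};

/* Number of 32-bit entries the macro expanded to. */
size_t example_reg_count(void)
{
	return sizeof(example_render_regs) / sizeof(example_render_regs[0]);
}

/* Plain linear lookup; returns 1 if offset is in the table, else 0. */
int reg_in_table(const uint32_t *table, size_t count, uint32_t offset)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i < count; i++)
		if (table[i] == offset)
			return 1;
	return 0;
}

int example_reg_allowed(uint32_t offset)
{
	return reg_in_table(example_render_regs, example_reg_count(), offset);
}
```

Each REG64() line expands to two array elements, so ARRAY_SIZE() on the table keeps counting 32-bit entries and the lookup code would not need to change.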

> +
> +static const u32 gen7_render_regs[] = {
> +	HS_INVOCATION_COUNT,
> +	HS_INVOCATION_COUNT + sizeof(u32),
> +	DS_INVOCATION_COUNT,
> +	DS_INVOCATION_COUNT + sizeof(u32),
> +	IA_VERTICES_COUNT,
> +	IA_VERTICES_COUNT + sizeof(u32),
> +	IA_PRIMITIVES_COUNT,
> +	IA_PRIMITIVES_COUNT + sizeof(u32),
> +	VS_INVOCATION_COUNT,
> +	VS_INVOCATION_COUNT + sizeof(u32),
> +	GS_INVOCATION_COUNT,
> +	GS_INVOCATION_COUNT + sizeof(u32),
> +	GS_PRIMITIVES_COUNT,
> +	GS_PRIMITIVES_COUNT + sizeof(u32),
> +	CL_INVOCATION_COUNT,
> +	CL_INVOCATION_COUNT + sizeof(u32),
> +	CL_PRIMITIVES_COUNT,
> +	CL_PRIMITIVES_COUNT + sizeof(u32),
> +	PS_INVOCATION_COUNT,
> +	PS_INVOCATION_COUNT + sizeof(u32),
> +	PS_DEPTH_COUNT,
> +	PS_DEPTH_COUNT + sizeof(u32),
> +	GEN7_SO_NUM_PRIMS_WRITTEN(0),
> +	GEN7_SO_NUM_PRIMS_WRITTEN(0) + sizeof(u32),
> +	GEN7_SO_NUM_PRIMS_WRITTEN(1),
> +	GEN7_SO_NUM_PRIMS_WRITTEN(1) + sizeof(u32),
> +	GEN7_SO_NUM_PRIMS_WRITTEN(2),
> +	GEN7_SO_NUM_PRIMS_WRITTEN(2) + sizeof(u32),
> +	GEN7_SO_NUM_PRIMS_WRITTEN(3),
> +	GEN7_SO_NUM_PRIMS_WRITTEN(3) + sizeof(u32),
> +	GEN7_SO_WRITE_OFFSET(0),
> +	GEN7_SO_WRITE_OFFSET(1),
> +	GEN7_SO_WRITE_OFFSET(2),
> +	GEN7_SO_WRITE_OFFSET(3),
> +};
> +
> +static const u32 gen7_blt_regs[] = {
> +	BCS_SWCTRL,
> +};
> +
>  #define CLIENT_MASK      0xE0000000
>  #define SUBCLIENT_MASK   0x18000000
>  #define MI_CLIENT        0x00000000
> @@ -313,6 +362,9 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
>  			ring->cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
>  		}
>  
> +		ring->reg_table = gen7_render_regs;
> +		ring->reg_count = ARRAY_SIZE(gen7_render_regs);
> +
>  		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
>  		break;
>  	case VCS:
> @@ -329,6 +381,9 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
>  			ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
>  		}
>  
> +		ring->reg_table = gen7_blt_regs;
> +		ring->reg_count = ARRAY_SIZE(gen7_blt_regs);
> +
>  		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
>  		break;
>  	case VECS:
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 2b7c26e..b99bacf 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -385,6 +385,26 @@
>  #define SRC_COPY_BLT  ((0x2<<29)|(0x43<<22))
>  
>  /*
> + * Registers used only by the command parser
> + */
> +#define BCS_SWCTRL 0x22200
> +
> +#define HS_INVOCATION_COUNT 0x2300
> +#define DS_INVOCATION_COUNT 0x2308
> +#define IA_VERTICES_COUNT   0x2310
> +#define IA_PRIMITIVES_COUNT 0x2318
> +#define VS_INVOCATION_COUNT 0x2320
> +#define GS_INVOCATION_COUNT 0x2328
> +#define GS_PRIMITIVES_COUNT 0x2330
> +#define CL_INVOCATION_COUNT 0x2338
> +#define CL_PRIMITIVES_COUNT 0x2340
> +#define PS_INVOCATION_COUNT 0x2348
> +#define PS_DEPTH_COUNT      0x2350
> +
> +/* There are the 4 64-bit counter registers, one for each stream output */
> +#define GEN7_SO_NUM_PRIMS_WRITTEN(n) (0x5200 + (n) * 8)
> +
> +/*
>   * Reset registers
>   */
>  #define DEBUG_RESET_I830		0x6070
> -- 
> 1.8.5.2
>

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 08/13] drm/i915: Enable register whitelist checks
  2014-01-29 21:55   ` [PATCH 08/13] drm/i915: Enable register whitelist checks bradley.d.volkin
@ 2014-02-05 15:33     ` Jani Nikula
  2014-02-05 18:49       ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Jani Nikula @ 2014-02-05 15:33 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx

On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> MI_STORE_REGISTER_MEM, MI_LOAD_REGISTER_MEM, and MI_LOAD_REGISTER_IMM
> commands allow userspace access to registers. Only certain registers
> should be allowed for such access, so enable checking for those commands.
> Each ring gets its own register whitelist.
>
> MI_LOAD_REGISTER_REG on HSW also allows register access but is currently
> unused by userspace components. Leave it rejected.
>
> PIPE_CONTROL and MEDIA_VFE_STATE allow register access based on certain
> bits being set. Reject those as well.
>
> OTC-Tracker: AXIA-4631
> Change-Id: Ie614a2f0eb2e5917de809e5a17957175d24cc44f
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 23 ++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_reg.h        |  3 +++
>  2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 296e322..5d3e303 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -63,9 +63,12 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
>  	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
>  	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
> -	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   R  ),
> -	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   R  ),
> -	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   R  ),
> +	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
> +	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
> +	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W,
> +	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
> +	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W,
> +	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
>  	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
>  };
>  
> @@ -82,9 +85,23 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
>  	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
>  	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
>  	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
> +	CMD(  MEDIA_VFE_STATE,			S3D,   !F,  0xFFFF, B,
> +	      .bits = {{
> +			.offset = 2,
> +			.mask = MEDIA_VFE_STATE_MMIO_ACCESS_MASK,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1					       ),

From my bikeshedding dept.: here too I think it would be beneficial to
have the count decided by an empty element, or a .valid = 1 field or
something.
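
A rough sketch of the "empty element" variant, assuming a zero .mask can act as the terminator (a zero mask checks nothing, so it is never a useful entry). The struct and function names here are made up for illustration and only mirror the fields used in the patch; MAX_CMD_DESC_BITMASKS is given an arbitrary value of 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CMD_DESC_BITMASKS 3	/* arbitrary value for this sketch */
#define PIPE_CONTROL_MMIO_WRITE (1u << 23)

struct bit_check_sketch {
	uint32_t offset;
	uint32_t mask;
	uint32_t expected;
};

/* Hypothetical descriptor: no bits_count field at all. */
struct cmd_desc_sketch {
	struct bit_check_sketch bits[MAX_CMD_DESC_BITMASKS];
};

/* Entries not named in the initializer are zero-filled by the compiler,
 * so the first entry with mask == 0 marks the end of the list. */
const struct cmd_desc_sketch pipe_control_sketch = {
	.bits = {{
		.offset = 1,
		.mask = PIPE_CONTROL_MMIO_WRITE,
		.expected = 0,
	}},
};

/* Count valid entries by walking up to the terminator or the array end. */
size_t count_bit_checks(const struct cmd_desc_sketch *desc)
{
	size_t n = 0;

	assert(desc != NULL);
	while (n < MAX_CMD_DESC_BITMASKS && desc->bits[n].mask != 0)
		n++;
	return n;
}

/* Returns 1 if every bit check passes for the command dwords in cmd. */
int cmd_passes_bit_checks(const struct cmd_desc_sketch *desc,
			  const uint32_t *cmd)
{
	size_t i;

	assert(desc != NULL && cmd != NULL);
	for (i = 0; i < MAX_CMD_DESC_BITMASKS && desc->bits[i].mask != 0; i++) {
		uint32_t dword = cmd[desc->bits[i].offset] & desc->bits[i].mask;

		if (dword != desc->bits[i].expected)
			return 0;
	}
	return 1;
}
```

With this shape, adding a second or third check to a descriptor touches only the .bits initializer, and there is no count to forget to bump.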


>  	CMD(  GPGPU_OBJECT,                     S3D,   !F,  0xFF,   S  ),
>  	CMD(  GPGPU_WALKER,                     S3D,   !F,  0xFF,   S  ),
>  	CMD(  GFX_OP_3DSTATE_SO_DECL_LIST,      S3D,   !F,  0x1FF,  S  ),
> +	CMD(  GFX_OP_PIPE_CONTROL(5),           S3D,   !F,  0xFF,   B,
> +	      .bits = {{
> +			.offset = 1,
> +			.mask = PIPE_CONTROL_MMIO_WRITE,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1					       ),
>  };
>  
>  static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index b99bacf..6592d0d 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -319,6 +319,7 @@
>  #define   DISPLAY_PLANE_B           (1<<20)
>  #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
>  #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
> +#define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
>  #define   PIPE_CONTROL_CS_STALL				(1<<20)
>  #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
>  #define   PIPE_CONTROL_QW_WRITE				(1<<14)
> @@ -359,6 +360,8 @@
>  
>  #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
>  #define GFX_OP_3DSTATE_VF_STATISTICS   ((0x3<<29)|(0x1<<27)|(0x0<<24)|(0xB<<16))
> +#define MEDIA_VFE_STATE                ((0x3<<29)|(0x2<<27)|(0x0<<24)|(0x0<<16))
> +#define  MEDIA_VFE_STATE_MMIO_ACCESS_MASK (0x18)
>  #define GPGPU_OBJECT                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x4<<16))
>  #define GPGPU_WALKER                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x5<<16))
>  #define GFX_OP_3DSTATE_DX9_CONSTANTF_VS \
> -- 
> 1.8.5.2
>

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 10/13] drm/i915: Enable PPGTT command parser checks
  2014-01-29 21:55   ` [PATCH 10/13] drm/i915: Enable PPGTT command parser checks bradley.d.volkin
  2014-01-29 22:33     ` Chris Wilson
@ 2014-02-05 15:37     ` Jani Nikula
  2014-02-05 18:54       ` Volkin, Bradley D
  1 sibling, 1 reply; 138+ messages in thread
From: Jani Nikula @ 2014-02-05 15:37 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx

On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> Various commands that access memory have a bit to determine whether
> the graphics address specified in the command should use the GGTT or
> PPGTT for translation. These checks ensure that the bit indicates
> PPGTT translation.
>
> Most of these checks use the existing bit-checking infrastructure.
> The PIPE_CONTROL and MI_FLUSH_DW commands, however, are multi-function
> commands. The GGTT/PPGTT bit is only relevant for certain uses of the
> command. As such, this change also extends the bit-checking code to
> include a "condition" mask and offset. If the condition mask is non-zero
> then the parser only performs the bit check when the bits specified by
> the condition mask/offset are also non-zero.
>
> NOTE: At this point in the series PPGTT must be enabled for the parser
> to work correctly. If it's not enabled, userspace will not be setting
> the PPGTT bits the way the parser requires. VLV is the only platform
> where this is a problem, so at this point, we disable parsing for VLV.
>
> OTC-Tracker: AXIA-4631
> Change-Id: I3f4c76b6734f1956ec47e698230f97d0998ff92b
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 147 +++++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/i915_drv.h        |   6 ++
>  drivers/gpu/drm/i915/i915_reg.h        |   6 ++
>  3 files changed, 144 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 7de7c6a..26072a2 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -65,10 +65,22 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
>  	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
>  	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
> -	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W,
> -	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
> -	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W,
> -	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
> +	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W | B,
> +	      .reg = { .offset = 1, .mask = 0x007FFFFC },
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0

Not specific to this patch or this field, but all around I think you
should add the comma to the last line too. It's a pretty universal way
of doing things in the kernel, both for array and struct initialization.

> +	      }},
> +	      .bits_count = 1                                          ),
> +	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W | B,
> +	      .reg = { .offset = 1, .mask = 0x007FFFFC },
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
>  	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
>  };
>  
> @@ -80,9 +92,35 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
>  	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_SET_CONTEXT,                   SMI,   !F,  0xFF,   R  ),
>  	CMD(  MI_URB_CLEAR,                     SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3F,   B,
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
>  	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0xFF,   R  ),
> -	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  S  ),
> -	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_CLFLUSH,                       SMI,   !F,  0x3FF,  B,
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
> +	CMD(  MI_REPORT_PERF_COUNT,             SMI,   !F,  0x3F,   B,
> +	      .bits = {{
> +			.offset = 1,
> +			.mask = MI_REPORT_PERF_COUNT_GGTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
> +	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
>  	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
>  	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
>  	CMD(  MEDIA_VFE_STATE,			S3D,   !F,  0xFFFF, B,
> @@ -100,8 +138,15 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
>  			.offset = 1,
>  			.mask = (PIPE_CONTROL_MMIO_WRITE | PIPE_CONTROL_NOTIFY),
>  			.expected = 0
> +	      },
> +	      {
> +			.offset = 1,
> +		        .mask = PIPE_CONTROL_GLOBAL_GTT_IVB,
> +			.expected = 0,
> +			.condition_offset = 1,
> +			.condition_mask = PIPE_CONTROL_POST_SYNC_OP_MASK
>  	      }},
> -	      .bits_count = 1					       ),
> +	      .bits_count = 2					       ),
>  };
>  
>  static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
> @@ -127,16 +172,35 @@ static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
>  
>  static const struct drm_i915_cmd_descriptor video_cmds[] = {
>  	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
> -	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
>  	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
>  	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
>  	      .bits = {{
>  			.offset = 0,
>  			.mask = MI_FLUSH_DW_NOTIFY,
>  			.expected = 0
> +	      },
> +	      {
> +			.offset = 1,
> +			.mask = MI_FLUSH_DW_USE_GTT,
> +			.expected = 0,
> +			.condition_offset = 0,
> +			.condition_mask = MI_FLUSH_DW_OP_MASK
>  	      }},
> -	      .bits_count = 1					       ),
> -	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
> +	      .bits_count = 2                                          ),
> +	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
>  	/*
>  	 * MFX_WAIT doesn't fit the way we handle length for most commands.
>  	 * It has a length field but it uses a non-standard length bias.
> @@ -147,29 +211,61 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
>  
>  static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
>  	CMD(  MI_ARB_ON_OFF,                    SMI,    F,  1,      R  ),
> -	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   S  ),
> +	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0xFF,   B,
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
>  	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
>  	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
>  	      .bits = {{
>  			.offset = 0,
>  			.mask = MI_FLUSH_DW_NOTIFY,
>  			.expected = 0
> +	      },
> +	      {
> +			.offset = 1,
> +			.mask = MI_FLUSH_DW_USE_GTT,
> +			.expected = 0,
> +			.condition_offset = 0,
> +			.condition_mask = MI_FLUSH_DW_OP_MASK
>  	      }},
> -	      .bits_count = 1					       ),
> -	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
> +	      .bits_count = 2					       ),
> +	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
>  };
>  
>  static const struct drm_i915_cmd_descriptor blt_cmds[] = {
>  	CMD(  MI_DISPLAY_FLIP,                  SMI,   !F,  0xFF,   R  ),
> -	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  S  ),
> +	CMD(  MI_STORE_DWORD_IMM,               SMI,   !F,  0x3FF,  B,
> +	      .bits = {{
> +			.offset = 0,
> +			.mask = MI_GLOBAL_GTT,
> +			.expected = 0
> +	      }},
> +	      .bits_count = 1                                          ),
>  	CMD(  MI_UPDATE_GTT,                    SMI,   !F,  0x3F,   R  ),
>  	CMD(  MI_FLUSH_DW,                      SMI,   !F,  0x3F,   B,
>  	      .bits = {{
>  			.offset = 0,
>  			.mask = MI_FLUSH_DW_NOTIFY,
>  			.expected = 0
> +	      },
> +	      {
> +			.offset = 1,
> +			.mask = MI_FLUSH_DW_USE_GTT,
> +			.expected = 0,
> +			.condition_offset = 0,
> +			.condition_mask = MI_FLUSH_DW_OP_MASK
>  	      }},
> -	      .bits_count = 1					       ),
> +	      .bits_count = 2					       ),
>  	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
>  	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
>  };
> @@ -569,10 +665,21 @@ finish:
>  
>  int i915_needs_cmd_parser(struct intel_ring_buffer *ring)
>  {
> +	drm_i915_private_t *dev_priv =
> +		(drm_i915_private_t *)ring->dev->dev_private;
> +
>  	/* No command tables indicates a platform without parsing */
>  	if (!ring->cmd_tables)
>  		return 0;
>  
> +	/*
> +	 * XXX: VLV is Gen7 and therefore has cmd_tables, but has PPGTT
> +	 * disabled. That will cause all of the parser's PPGTT checks to
> +	 * fail. For now, disable parsing when PPGTT is off.
> +	 */
> +	if(!dev_priv->mm.aliasing_ppgtt)
   	  ^ missing space.

> +		return 0;
> +

Hmm, shouldn't this belong to some other patch, much earlier in the
series? Like patch 2 or 3?

>  	return i915.enable_cmd_parser;
>  }
>  
> @@ -675,6 +782,16 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
>  				u32 dword = cmd[desc->bits[i].offset] &
>  					desc->bits[i].mask;
>  
> +				if (desc->bits[i].condition_mask != 0) {
> +					u32 offset =
> +						desc->bits[i].condition_offset;
> +					u32 condition = cmd[offset] &
> +						desc->bits[i].condition_mask;
> +
> +					if (condition == 0)
> +						continue;
> +				}
> +
>  				if (dword != desc->bits[i].expected) {
>  					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
>  							 *cmd,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8aed80f..2d1d2ef 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1829,11 +1829,17 @@ struct drm_i915_cmd_descriptor {
>  	 * compared against an expected value. If the command does not match
>  	 * the expected value, the parser rejects it. Only valid if flags has
>  	 * the CMD_DESC_BITMASK bit set.
> +	 *
> +	 * If the check specifies a non-zero condition_mask then the parser
> +	 * only performs the check when the bits specified by condition_mask
> +	 * are non-zero.
>  	 */
>  	struct {
>  		u32 offset;
>  		u32 mask;
>  		u32 expected;
> +		u32 condition_offset;
> +		u32 condition_mask;
>  	} bits[MAX_CMD_DESC_BITMASKS];
>  	/** Number of valid entries in the bits array */
>  	int bits_count;
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index c2e4898..ff263f4 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -179,6 +179,8 @@
>   * Memory interface instructions used by the kernel
>   */
>  #define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
> +/* Many MI commands use bit 22 of the header dword for GGTT vs PPGTT */
> +#define  MI_GLOBAL_GTT    (1<<22)
>  
>  #define MI_NOOP			MI_INSTR(0, 0)
>  #define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)
> @@ -258,6 +260,7 @@
>  #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
>  #define   MI_INVALIDATE_TLB		(1<<18)
>  #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
> +#define   MI_FLUSH_DW_OP_MASK		(3<<14)
>  #define   MI_FLUSH_DW_NOTIFY		(1<<8)
>  #define   MI_INVALIDATE_BSD		(1<<7)
>  #define   MI_FLUSH_DW_USE_GTT		(1<<2)
> @@ -324,6 +327,7 @@
>  #define   PIPE_CONTROL_CS_STALL				(1<<20)
>  #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
>  #define   PIPE_CONTROL_QW_WRITE				(1<<14)
> +#define   PIPE_CONTROL_POST_SYNC_OP_MASK                (3<<14)
>  #define   PIPE_CONTROL_DEPTH_STALL			(1<<13)
>  #define   PIPE_CONTROL_WRITE_FLUSH			(1<<12)
>  #define   PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH	(1<<12) /* gen6+ */
> @@ -352,6 +356,8 @@
>  #define MI_URB_CLEAR            MI_INSTR(0x19, 0)
>  #define MI_UPDATE_GTT           MI_INSTR(0x23, 0)
>  #define MI_CLFLUSH              MI_INSTR(0x27, 0)
> +#define MI_REPORT_PERF_COUNT    MI_INSTR(0x28, 0)
> +#define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
>  #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>  #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
>  #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
> -- 
> 1.8.5.2
>

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 11/13] drm/i915: Reject commands that would store to global HWS page
  2014-01-29 21:55   ` [PATCH 11/13] drm/i915: Reject commands that would store to global HWS page bradley.d.volkin
@ 2014-02-05 15:39     ` Jani Nikula
  0 siblings, 0 replies; 138+ messages in thread
From: Jani Nikula @ 2014-02-05 15:39 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx

On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> PIPE_CONTROL and MI_FLUSH_DW have bits that would write to the
> hardware status page. The driver stores request tracking info
> there, so don't let userspace overwrite it.
>
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c | 30 ++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/i915_reg.h        |  1 +
>  2 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 26072a2..b93df1c 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -141,7 +141,8 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
>  	      },
>  	      {
>  			.offset = 1,
> -		        .mask = PIPE_CONTROL_GLOBAL_GTT_IVB,
> +		        .mask = (PIPE_CONTROL_GLOBAL_GTT_IVB |
> +				 PIPE_CONTROL_STORE_DATA_INDEX),
>  			.expected = 0,
>  			.condition_offset = 1,
>  			.condition_mask = PIPE_CONTROL_POST_SYNC_OP_MASK
> @@ -192,8 +193,15 @@ static const struct drm_i915_cmd_descriptor video_cmds[] = {
>  			.expected = 0,
>  			.condition_offset = 0,
>  			.condition_mask = MI_FLUSH_DW_OP_MASK
> +	      },
> +	      {
> +			.offset = 0,
> +			.mask = MI_FLUSH_DW_STORE_INDEX,
> +			.expected = 0,
> +			.condition_offset = 0,
> +			.condition_mask = MI_FLUSH_DW_OP_MASK
>  	      }},
> -	      .bits_count = 2                                          ),
> +	      .bits_count = 3                                          ),

I'm disliking this separate .bits_count more at every change...

>  	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
>  	      .bits = {{
>  			.offset = 0,
> @@ -231,8 +239,15 @@ static const struct drm_i915_cmd_descriptor vecs_cmds[] = {
>  			.expected = 0,
>  			.condition_offset = 0,
>  			.condition_mask = MI_FLUSH_DW_OP_MASK
> +	      },
> +	      {
> +			.offset = 0,
> +			.mask = MI_FLUSH_DW_STORE_INDEX,
> +			.expected = 0,
> +			.condition_offset = 0,
> +			.condition_mask = MI_FLUSH_DW_OP_MASK
>  	      }},
> -	      .bits_count = 2					       ),
> +	      .bits_count = 3					       ),
>  	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   B,
>  	      .bits = {{
>  			.offset = 0,
> @@ -264,8 +279,15 @@ static const struct drm_i915_cmd_descriptor blt_cmds[] = {
>  			.expected = 0,
>  			.condition_offset = 0,
>  			.condition_mask = MI_FLUSH_DW_OP_MASK
> +	      },
> +	      {
> +			.offset = 0,
> +			.mask = MI_FLUSH_DW_STORE_INDEX,
> +			.expected = 0,
> +			.condition_offset = 0,
> +			.condition_mask = MI_FLUSH_DW_OP_MASK
>  	      }},
> -	      .bits_count = 2					       ),
> +	      .bits_count = 3					       ),
>  	CMD(  COLOR_BLT,                        S2D,   !F,  0x3F,   S  ),
>  	CMD(  SRC_COPY_BLT,                     S2D,   !F,  0x3F,   S  ),
>  };
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index ff263f4..5f77cb6 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -324,6 +324,7 @@
>  #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
>  #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
>  #define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
> +#define   PIPE_CONTROL_STORE_DATA_INDEX			(1<<21)
>  #define   PIPE_CONTROL_CS_STALL				(1<<20)
>  #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
>  #define   PIPE_CONTROL_QW_WRITE				(1<<14)
> -- 
> 1.8.5.2
>

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH 00/13] Gen7 batch buffer command parser
  2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
                     ` (13 preceding siblings ...)
  2014-01-29 22:11   ` [PATCH 00/13] Gen7 batch buffer command parser Daniel Vetter
@ 2014-02-05 15:41   ` Jani Nikula
  14 siblings, 0 replies; 138+ messages in thread
From: Jani Nikula @ 2014-02-05 15:41 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx


FYI, I did an initial review "sweep" of this. Will focus more on the
logic and registers etc. next.

BR,
Jani.


On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> Certain OpenGL features (e.g. transform feedback, performance monitoring)
> require userspace code to submit batches containing commands such as
> MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> generations of the hardware will noop these commands in "unsecure" batches
> (which includes all userspace batches submitted via i915) even though the
> commands may be safe and represent the intended programming model of the device.
>
> This series introduces a software command parser similar in operation to the
> command parsing done in hardware for unsecure batches. However, the software
> parser allows some operations that would be noop'd by hardware, if the parser
> determines the operation is safe, and submits the batch as "secure" to prevent
> hardware parsing. Currently the series implements this on IVB and HSW.
>
> The series has one piece of prep work, one patch for the parser logic, and a
> handful of patches to fill out the tables which drive the parser. There are
> follow-up patches to libdrm and to i-g-t. The i-g-t tests are basic and do not
> test all of the commands used by the parser on the assumption that I'm likely
> to make the same mistakes in both the parser and the test.
>
> WARNING!!!
> I've previously run the i-g-t gem_* tests, the piglit quick tests, and generally
> used Ubuntu 13.10 IVB and HSW systems with the parser running. Aside from a
> failure described below, I did not see any regressions. However, the series
> currently hits a BUG_ON() if you enable the parser due to a regression in secure
> batch handling on -nightly.
>
> At this point there are a couple of required/potential improvements.
>
> 1) Chained batches. The parser currently allows MI_BATCH_BUFFER_START commands
>    in userspace batches without parsing them. The media driver uses chained
>    batches, so a solution is required. I'm still working through the
>    requirements but don't want to continue delaying the review process for what
>    I have so far.
> 2) Command buffer copy. To avoid CPU modifications to buffers after parsing, and
>    to avoid GPU modifications to buffers via EUs or commands in the batch, we
>    should copy the userspace batch buffer to memory that userspace does not
>    have access to, map it into GGTT, and execute that batch buffer. I have a
>    sense of how to do this for 1st-level batches, but it may need changes to
>    tie in with the chained batch parsing, so I've again held off.
> 3) Coherency. I've found a coherency issue on VLV when reading the batch buffer
>    from the CPU during execbuffer2. Userspace writes the batch via pwrite fast
>    path before calling execbuffer2. The parser reads stale data. This works fine
>    on IVB and HSW, so I believe it's an LLC vs. non-LLC issue. I'm just unclear
>    on what the correct flushing or synchronization is for this scenario. This
>    only matters if we get PPGTT working on VLV and enable the parser there.
>
> v2:
> - Significantly reorder series
> - Scan secure batches (i.e. I915_EXEC_SECURE)
> - Check that parser tables are sorted during init
> - Fixed gem_cpu_reloc regression
> - HAS_CMD_PARSER -> CMD_PARSER_VERSION getparam
> - Additional tests
>
> Brad Volkin (13):
>   drm/i915: Refactor shmem pread setup
>   drm/i915: Implement command buffer parsing logic
>   drm/i915: Initial command parser table definitions
>   drm/i915: Reject privileged commands
>   drm/i915: Allow some privileged commands from master
>   drm/i915: Add register whitelists for mesa
>   drm/i915: Add register whitelist for DRM master
>   drm/i915: Enable register whitelist checks
>   drm/i915: Reject commands that explicitly generate interrupts
>   drm/i915: Enable PPGTT command parser checks
>   drm/i915: Reject commands that would store to global HWS page
>   drm/i915: Add a CMD_PARSER_VERSION getparam
>   drm/i915: Enable command parsing by default
>
>  drivers/gpu/drm/i915/Makefile              |   3 +-
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 845 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_dma.c            |   4 +
>  drivers/gpu/drm/i915/i915_drv.h            | 103 ++++
>  drivers/gpu/drm/i915/i915_gem.c            |  48 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  17 +
>  drivers/gpu/drm/i915/i915_params.c         |   5 +
>  drivers/gpu/drm/i915/i915_reg.h            |  78 +++
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |  32 ++
>  include/uapi/drm/i915_drm.h                |   1 +
>  11 files changed, 1123 insertions(+), 15 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
>
> -- 
> 1.8.5.2
>

-- 
Jani Nikula, Intel Open Source Technology Center


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2014-02-05 10:28 ` [RFC 00/22] Gen7 batch buffer command parser Chris Wilson
@ 2014-02-05 18:18   ` Volkin, Bradley D
  2014-02-05 18:25     ` Chris Wilson
  2014-02-05 18:30     ` Daniel Vetter
  0 siblings, 2 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05 18:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Feb 05, 2014 at 02:28:29AM -0800, Chris Wilson wrote:
> On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > Certain OpenGL features (e.g. transform feedback, performance monitoring)
> > require userspace code to submit batches containing commands such as
> > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> > generations of the hardware will noop these commands in "unsecure" batches
> > (which includes all userspace batches submitted via i915) even though the
> > commands may be safe and represent the intended programming model of the device.
> > 
> > This series introduces a software command parser similar in operation to the
> > command parsing done in hardware for unsecure batches. However, the software
> > parser allows some operations that would be noop'd by hardware, if the parser
> > determines the operation is safe, and submits the batch as "secure" to prevent
> > hardware parsing. Currently the series implements this on IVB and HSW.
> 
> Just one more question... Do you have a branch for people to test?

Not at the moment. And as mentioned in the v2 cover letter, it's actually not
particularly testable (or mergeable for that matter) right now because of a
regression in secure dispatch on nightly.

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2014-02-05 18:18   ` Volkin, Bradley D
@ 2014-02-05 18:25     ` Chris Wilson
  2014-02-05 18:30     ` Daniel Vetter
  1 sibling, 0 replies; 138+ messages in thread
From: Chris Wilson @ 2014-02-05 18:25 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Feb 05, 2014 at 10:18:44AM -0800, Volkin, Bradley D wrote:
> On Wed, Feb 05, 2014 at 02:28:29AM -0800, Chris Wilson wrote:
> > On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 
> > > Certain OpenGL features (e.g. transform feedback, performance monitoring)
> > > require userspace code to submit batches containing commands such as
> > > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> > > generations of the hardware will noop these commands in "unsecure" batches
> > > (which includes all userspace batches submitted via i915) even though the
> > > commands may be safe and represent the intended programming model of the device.
> > > 
> > > This series introduces a software command parser similar in operation to the
> > > command parsing done in hardware for unsecure batches. However, the software
> > > parser allows some operations that would be noop'd by hardware, if the parser
> > > determines the operation is safe, and submits the batch as "secure" to prevent
> > > hardware parsing. Currently the series implements this on IVB and HSW.
> > 
> > Just one more question... Do you have a branch for people to test?
> 
> Not at the moment. And as mentioned in the v2 cover letter, it's actually not
> particularly testable (or mergeable for that matter) right now because of a
> regression in secure dispatch on nightly.

At this moment, I just want to be sure that the fixed dispatch overhead has
been minimised.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2014-02-05 18:18   ` Volkin, Bradley D
  2014-02-05 18:25     ` Chris Wilson
@ 2014-02-05 18:30     ` Daniel Vetter
  2014-02-05 19:00       ` Volkin, Bradley D
  1 sibling, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-02-05 18:30 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Feb 5, 2014 at 7:18 PM, Volkin, Bradley D
<bradley.d.volkin@intel.com> wrote:
> On Wed, Feb 05, 2014 at 02:28:29AM -0800, Chris Wilson wrote:
>> On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
>> > From: Brad Volkin <bradley.d.volkin@intel.com>
>> >
>> > Certain OpenGL features (e.g. transform feedback, performance monitoring)
>> > require userspace code to submit batches containing commands such as
>> > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
>> > generations of the hardware will noop these commands in "unsecure" batches
>> > (which includes all userspace batches submitted via i915) even though the
>> > commands may be safe and represent the intended programming model of the device.
>> >
>> > This series introduces a software command parser similar in operation to the
>> > command parsing done in hardware for unsecure batches. However, the software
>> > parser allows some operations that would be noop'd by hardware, if the parser
>> > determines the operation is safe, and submits the batch as "secure" to prevent
>> > hardware parsing. Currently the series implements this on IVB and HSW.
>>
>> Just one more question... Do you have a branch for people to test?
>
> Not at the moment. And as mentioned in the v2 cover letter, it's actually not
> particularly testable (or mergeable for that matter) right now because of a
> regression in secure dispatch on nightly.

The command parser itself should still work, even with the regression
in -nightly. The copying and secure dispatch obviously fail atm.
That still leaves regression testing of current userspace and
micro-optimizing the checker itself as possible things to do. Otoh not
sure what exactly Chris wanted to test.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-02-05 15:15     ` Jani Nikula
@ 2014-02-05 18:36       ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05 18:36 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx

On Wed, Feb 05, 2014 at 07:15:35AM -0800, Jani Nikula wrote:
> On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> >
> > The command parser scans batch buffers submitted via execbuffer ioctls before
> > the driver submits them to hardware. At a high level, it looks for several
> > things:
> >
> > 1) Commands which are explicitly defined as privileged or which should only be
> >    used by the kernel driver. The parser generally rejects such commands, with
> >    the provision that it may allow some from the drm master process.
> > 2) Commands which access registers. To support correct/enhanced userspace
> >    functionality, particularly certain OpenGL extensions, the parser provides a
> >    whitelist of registers which userspace may safely access (for both normal and
> >    drm master processes).
> > 3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
> >    parser always rejects such commands.
> >
> > Each ring maintains tables of commands and registers which the parser uses in
> > scanning batch buffers submitted to that ring.
> >
> > The set of commands that the parser must check for is significantly smaller
> > than the number of commands supported, especially on the render ring. As such,
> > the parser tables (built up in subsequent patches) contain only those commands
> > required by the parser. This generally works because command opcode ranges have
> > standard command length encodings. So for commands that the parser does not need
> > to check, it can easily skip them. This is implemented via a per-ring length
> > decoding vfunc.
> >
> > Unfortunately, there are a number of commands that do not follow the standard
> > length encoding for their opcode range, primarily amongst the MI_* commands. To
> > handle this, the parser provides a way to define explicit "skip" entries in the
> > per-ring command tables.
> >
> > Other command table entries will map fairly directly to high level categories
> > mentioned above: rejected, master-only, register whitelist. A number of checks,
> > including the privileged memory checks, are implemented via a general bitmasking
> > mechanism.
> >
> > OTC-Tracker: AXIA-4631
> > Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/Makefile              |   3 +-
> >  drivers/gpu/drm/i915/i915_cmd_parser.c     | 404 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_drv.h            |  94 +++++++
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  17 ++
> >  drivers/gpu/drm/i915/i915_params.c         |   5 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.h    |  32 +++
> >  7 files changed, 556 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 4850494..2da81bf 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -47,7 +47,8 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o \
> >  	  dvo_tfp410.o \
> >  	  dvo_sil164.o \
> >  	  dvo_ns2501.o \
> > -	  i915_gem_dmabuf.o
> > +	  i915_gem_dmabuf.o \
> > +	  i915_cmd_parser.o
> 
> If you add this anywhere but last, you only need to touch one line
> instead of two. It's nitpicky, but helps with things like git blame
> (which would now point at you for i915_gem_dmabuf.o too ;).

Sounds good

> 
> >  
> >  i915-$(CONFIG_COMPAT)   += i915_ioc32.o
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > new file mode 100644
> > index 0000000..7639dbc
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -0,0 +1,404 @@
> > +/*
> > + * Copyright © 2013 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the next
> > + * paragraph) shall be included in all copies or substantial portions of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > + * IN THE SOFTWARE.
> > + *
> > + * Authors:
> > + *    Brad Volkin <bradley.d.volkin@intel.com>
> > + *
> > + */
> > +
> > +#include "i915_drv.h"
> > +
> > +#define CLIENT_MASK      0xE0000000
> > +#define SUBCLIENT_MASK   0x18000000
> > +#define MI_CLIENT        0x00000000
> > +#define RC_CLIENT        0x60000000
> > +#define BC_CLIENT        0x40000000
> > +#define MEDIA_SUBCLIENT  0x10000000
> > +
> > +static u32 gen7_render_get_cmd_length_mask(u32 cmd_header)
> > +{
> > +	u32 client = cmd_header & CLIENT_MASK;
> > +	u32 subclient = cmd_header & SUBCLIENT_MASK;
> > +
> > +	if (client == MI_CLIENT)
> > +		return 0x3F;
> > +	else if (client == RC_CLIENT) {
> > +		if (subclient == MEDIA_SUBCLIENT)
> > +			return 0xFFFF;
> > +		else
> > +			return 0xFF;
> > +	}
> > +
> > +	DRM_DEBUG_DRIVER("CMD: Abnormal rcs cmd length! 0x%08X\n", cmd_header);
> > +	return 0;
> > +}
> > +
> > +static u32 gen7_bsd_get_cmd_length_mask(u32 cmd_header)
> > +{
> > +	u32 client = cmd_header & CLIENT_MASK;
> > +	u32 subclient = cmd_header & SUBCLIENT_MASK;
> > +
> > +	if (client == MI_CLIENT)
> > +		return 0x3F;
> > +	else if (client == RC_CLIENT) {
> > +		if (subclient == MEDIA_SUBCLIENT)
> > +			return 0xFFF;
> > +		else
> > +			return 0xFF;
> > +	}
> > +
> > +	DRM_DEBUG_DRIVER("CMD: Abnormal bsd cmd length! 0x%08X\n", cmd_header);
> > +	return 0;
> > +}
> > +
> > +static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
> > +{
> > +	u32 client = cmd_header & CLIENT_MASK;
> > +
> > +	if (client == MI_CLIENT)
> > +		return 0x3F;
> > +	else if (client == BC_CLIENT)
> > +		return 0xFF;
> > +
> > +	DRM_DEBUG_DRIVER("CMD: Abnormal blt cmd length! 0x%08X\n", cmd_header);
> > +	return 0;
> > +}
> > +
> > +static void validate_cmds_sorted(struct intel_ring_buffer *ring)
> > +{
> > +	int i;
> > +
> > +	if (!ring->cmd_tables || ring->cmd_table_count == 0)
> > +		return;
> > +
> > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > +		const struct drm_i915_cmd_table *table = &ring->cmd_tables[i];
> > +		u32 previous = 0;
> > +		int j;
> > +
> > +		for (j = 0; j < table->count; j++) {
> > +			const struct drm_i915_cmd_descriptor *desc =
> > +				&table->table[j];
> > +			u32 curr = desc->cmd.value & desc->cmd.mask;
> > +
> > +			if (curr < previous) {
> > +				DRM_ERROR("CMD: table not sorted ring=%d table=%d entry=%d cmd=0x%08X\n",
> > +					  ring->id, i, j, curr);
> > +				return;
> 
> So this checks the hand-filled tables, right?
> 
> I think this should not stop at the first error, but rather scan the
> whole table and DRM_ERROR all cases where curr < previous, and after the
> full scan BUG_ON() if there were any errors.

Will change, and below
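
As a rough illustration of the scan-everything approach suggested above, here
is a minimal userspace sketch. The helper name count_unsorted() and the use of
fprintf() in place of DRM_ERROR() are assumptions for illustration only; in the
driver, the caller would presumably BUG_ON() a non-zero result once the whole
table has been scanned.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: report every out-of-order entry rather than
 * returning at the first one, and hand the error count back to the caller.
 */
static int count_unsorted(const uint32_t *table, int count)
{
	uint32_t previous = 0;
	int errors = 0;
	int i;

	for (i = 0; i < count; i++) {
		uint32_t curr = table[i];

		if (curr < previous) {
			/* Stand-in for DRM_ERROR() in this userspace sketch */
			fprintf(stderr, "table not sorted entry=%d value=0x%08X\n",
				i, (unsigned int)curr);
			errors++;
		}

		previous = curr;
	}

	return errors;
}
```

Note that a count of zero makes the loop a no-op, so no separate guard is
needed before calling it.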

> 
> > +			}
> > +
> > +			previous = curr;
> > +		}
> > +	}
> > +}
> > +
> > +static void check_sorted(int ring_id, const u32 *reg_table, int reg_count)
> > +{
> > +	int i;
> > +	u32 previous = 0;
> > +
> > +	for (i = 0; i < reg_count; i++) {
> > +		u32 curr = reg_table[i];
> > +
> > +		if (curr < previous) {
> > +			DRM_ERROR("CMD: table not sorted ring=%d entry=%d reg=0x%08X\n",
> > +				  ring_id, i, curr);
> > +			return;
> 
> Same here.
> 
> > +		}
> > +
> > +		previous = curr;
> > +	}
> > +}
> > +
> > +static void validate_regs_sorted(struct intel_ring_buffer *ring)
> > +{
> > +	if (ring->reg_table && ring->reg_count > 0)
> > +		check_sorted(ring->id, ring->reg_table, ring->reg_count);
> > +
> > +	if (ring->master_reg_table && ring->master_reg_count > 0)
> > +		check_sorted(ring->id, ring->master_reg_table,
> > +			     ring->master_reg_count);
> 
> Somehow I think the ifs here are redundant. check_sorted() is a no-op if
> reg_count == 0, and if reg_count > 0 while reg_table == NULL, it
> deserves to oops!

Agreed

> 
> > +}
> > +
> > +void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
> > +{
> > +	if (!IS_GEN7(ring->dev))
> > +		return;
> > +
> > +	switch (ring->id) {
> > +	case RCS:
> > +		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
> > +		break;
> > +	case VCS:
> > +		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> > +		break;
> > +	case BCS:
> > +		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
> > +		break;
> > +	case VECS:
> > +		/* VECS can use the same length_mask function as VCS */
> > +		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> > +		break;
> > +	default:
> > +		DRM_ERROR("CMD: cmd_parser_init with unknown ring: %d\n",
> > +			  ring->id);
> 
> You'll oops later for calling NULL ring->get_cmd_length_mask(), so might
> as well BUG() here.

Agreed
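
For context on why a missing get_cmd_length_mask matters: every command that is
not in the tables goes through that hook to find its length. Below is a minimal
standalone sketch of that default decode path for the render ring. The mask
constants and LENGTH_BIAS are copied from the patch; the function names
render_length_mask() and decode_length() are hypothetical, and returning 0 for
"unknown" is an assumption of this sketch.

```c
#include <stdint.h>

#define CLIENT_MASK      0xE0000000u
#define SUBCLIENT_MASK   0x18000000u
#define MI_CLIENT        0x00000000u
#define RC_CLIENT        0x60000000u
#define MEDIA_SUBCLIENT  0x10000000u
#define LENGTH_BIAS      2

/* Mirrors the render-ring mask selection in the patch, minus logging */
static uint32_t render_length_mask(uint32_t cmd_header)
{
	uint32_t client = cmd_header & CLIENT_MASK;
	uint32_t subclient = cmd_header & SUBCLIENT_MASK;

	if (client == MI_CLIENT)
		return 0x3F;
	if (client == RC_CLIENT)
		return (subclient == MEDIA_SUBCLIENT) ? 0xFFFF : 0xFF;

	return 0;
}

/* Hypothetical helper: length in dwords, or 0 if the client is unknown */
static uint32_t decode_length(uint32_t cmd_header)
{
	uint32_t mask = render_length_mask(cmd_header);

	if (!mask)
		return 0;

	return (cmd_header & mask) + LENGTH_BIAS;
}
```

With a header of 0x11000001 (MI client, length field 1) this gives 3 dwords,
and a blitter-client header on the render ring yields 0, which is the case the
parser treats as an unrecognized command.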

> 
> > +	}
> > +
> > +	validate_cmds_sorted(ring);
> > +	validate_regs_sorted(ring);
> > +}
> > +
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > +		  u32 cmd_header)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < table->count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > +
> > +		if (masked_cmd == masked_value)
> > +			return desc;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +/*
> > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > + *
> > + * The caller must supply space for a default descriptor via the default_desc
> > + * parameter. If no descriptor for the specified command exists in the ring's
> > + * command parser tables, this function fills in default_desc based on the
> > + * ring's default length encoding and returns default_desc.
> > + */
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd(struct intel_ring_buffer *ring,
> > +	 u32 cmd_header,
> > +	 struct drm_i915_cmd_descriptor *default_desc)
> > +{
> > +	u32 mask;
> > +	int i;
> > +
> > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc;
> > +
> > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > +		if (desc)
> > +			return desc;
> > +	}
> > +
> > +	mask = ring->get_cmd_length_mask(cmd_header);
> > +	if (!mask)
> > +		return NULL;
> > +
> > +	BUG_ON(!default_desc);
> > +	default_desc->flags = CMD_DESC_SKIP;
> > +	default_desc->length.mask = mask;
> > +
> > +	return default_desc;
> > +}
> > +
> > +static int valid_reg(const u32 *table, int count, u32 addr)
> 
> I like bools for boolean stuff.

I'll reevaluate int vs bool throughout. I think usage is a bit inconsistent
across the driver at the moment, but I don't mind improving it.
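
A minimal sketch of what the bool version might look like, written as
standalone C with stdint types instead of the kernel's u32. It also drops the
table/count guard, on the assumption that the same reasoning as for
check_sorted() applies (a zero count makes the loop a no-op). This is
illustrative, not the final patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear whitelist lookup returning bool instead of int */
static bool valid_reg(const uint32_t *table, int count, uint32_t addr)
{
	int i;

	for (i = 0; i < count; i++) {
		if (table[i] == addr)
			return true;
	}

	return false;
}
```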

> 
> > +{
> > +	if (table && count != 0) {
> > +		int i;
> > +
> > +		for (i = 0; i < count; i++) {
> > +			if (table[i] == addr)
> > +				return 1;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static u32 *vmap_batch(struct drm_i915_gem_object *obj)
> > +{
> > +	int i;
> > +	void *addr = NULL;
> > +	struct sg_page_iter sg_iter;
> > +	struct page **pages;
> > +
> > +	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
> > +	if (pages == NULL) {
> > +		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
> > +		goto finish;
> > +	}
> > +
> > +	i = 0;
> > +	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> > +		pages[i] = sg_page_iter_page(&sg_iter);
> > +		i++;
> > +	}
> > +
> > +	addr = vmap(pages, i, 0, PAGE_KERNEL);
> > +	if (addr == NULL) {
> > +		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
> > +		goto finish;
> > +	}
> > +
> > +finish:
> > +	if (pages)
> > +		drm_free_large(pages);
> > +	return (u32*)addr;
> > +}
> > +
> > +int i915_needs_cmd_parser(struct intel_ring_buffer *ring)
> 
> bool
> 
> > +{
> > +	/* No command tables indicates a platform without parsing */
> > +	if (!ring->cmd_tables)
> > +		return 0;
> > +
> > +	return i915.enable_cmd_parser;
> > +}
> > +
> > +#define LENGTH_BIAS 2
> > +
> > +int i915_parse_cmds(struct intel_ring_buffer *ring,
> > +		    struct drm_i915_gem_object *batch_obj,
> > +		    u32 batch_start_offset,
> > +		    bool is_master)
> > +{
> > +	int ret = 0;
> > +	u32 *cmd, *batch_base, *batch_end;
> > +	struct drm_i915_cmd_descriptor default_desc = { 0 };
> > +	int needs_clflush = 0;
> > +
> > +	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> > +		return ret;
> > +	}
> > +
> > +	batch_base = vmap_batch(batch_obj);
> > +	if (!batch_base) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> > +		i915_gem_object_unpin_pages(batch_obj);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	if (needs_clflush)
> > +		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
> > +
> > +	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
> > +	batch_end = batch_base + (batch_obj->base.size / sizeof(*batch_end));
> > +
> > +	while (cmd < batch_end) {
> > +		const struct drm_i915_cmd_descriptor *desc;
> > +		u32 length;
> > +
> > +		if (*cmd == MI_BATCH_BUFFER_END)
> > +			break;
> > +
> > +		desc = find_cmd(ring, *cmd, &default_desc);
> > +		if (!desc) {
> > +			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> > +					 *cmd);
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (desc->flags & CMD_DESC_FIXED)
> > +			length = desc->length.fixed;
> > +		else
> > +			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
> > +
> > +		if ((batch_end - cmd) < length) {
> > +			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
> > +					 *cmd,
> > +					 length,
> > +					 batch_end - cmd);
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (desc->flags & CMD_DESC_REJECT) {
> > +			DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
> > +			DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
> > +					 *cmd);
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (desc->flags & CMD_DESC_REGISTER) {
> > +			u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
> > +
> > +			if (!valid_reg(ring->reg_table,
> > +				       ring->reg_count, reg_addr)) {
> > +				if (!is_master ||
> > +				    !valid_reg(ring->master_reg_table,
> > +					       ring->master_reg_count,
> > +					       reg_addr)) {
> > +					DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (ring=%d)\n",
> > +							 reg_addr,
> > +							 *cmd,
> > +							 ring->id);
> > +					ret = -EINVAL;
> > +					break;
> > +				}
> > +			}
> > +		}
> > +
> > +		if (desc->flags & CMD_DESC_BITMASK) {
> > +			int i;
> > +
> > +			for (i = 0; i < desc->bits_count; i++) {
> > +				u32 dword = cmd[desc->bits[i].offset] &
> > +					desc->bits[i].mask;
> > +
> > +				if (dword != desc->bits[i].expected) {
> > +					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
> > +							 *cmd,
> > +							 desc->bits[i].mask,
> > +							 desc->bits[i].expected,
> > +							 dword, ring->id);
> > +					ret = -EINVAL;
> > +					break;
> > +				}
> > +			}
> > +
> > +			if (ret)
> > +				break;
> > +		}
> > +
> > +		cmd += length;
> > +	}
> > +
> > +	if (cmd >= batch_end) {
> > +		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
> > +		ret = -EINVAL;
> > +	}
> > +
> > +	vunmap(batch_base);
> > +
> > +	i915_gem_object_unpin_pages(batch_obj);
> > +
> > +	return ret;
> > +}
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index bfb30df..8aed80f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1765,6 +1765,91 @@ struct drm_i915_file_private {
> >  	atomic_t rps_wait_boost;
> >  };
> >  
> > +/**
> > + * A command that requires special handling by the command parser.
> > + */
> 
> You have plenty of kernel-doc comments here and in other patches. They
> do expect a certain format, however. Please either make them regular
> comments (the easy way) or adhere to proper kernel-doc format.

Again, current use is inconsistent in the driver, but I'll reevaluate throughout.
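
For reference, a minimal sketch of the kernel-doc layout for a struct: a
"struct name - summary" line, one "@member:" line per field, then an optional
longer description. The struct name cmd_table_sketch and the void pointer
member are placeholders for illustration, not the definitions from the patch.

```c
/**
 * struct cmd_table_sketch - table of commands needing special handling
 * @table: array of command descriptors, sorted by opcode in ascending order
 * @count: number of entries in @table
 *
 * Each ring has an array of these tables. The parser consults them before
 * falling back to the ring's default length encoding.
 */
struct cmd_table_sketch {
	const void *table;
	int count;
};
```

The alternative is to open the comment with a single asterisk, which makes it a
regular comment that the kernel-doc tooling ignores.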

> 
> > +struct drm_i915_cmd_descriptor {
> > +	/**
> > +	 * Flags describing how the command parser processes the command.
> > +	 *
> > +	 * CMD_DESC_FIXED: The command has a fixed length if this is set,
> > +	 *                 a length mask if not set
> > +	 * CMD_DESC_SKIP: The command is allowed but does not follow the
> > +	 *                standard length encoding for the opcode range in
> > +	 *                which it falls
> > +	 * CMD_DESC_REJECT: The command is never allowed
> > +	 * CMD_DESC_REGISTER: The command should be checked against the
> > +	 *                    register whitelist for the appropriate ring
> > +	 * CMD_DESC_MASTER: The command is allowed if the submitting process
> > +	 *                  is the DRM master
> > +	 */
> > +	u32 flags;
> > +#define CMD_DESC_FIXED    (1<<0)
> > +#define CMD_DESC_SKIP     (1<<1)
> > +#define CMD_DESC_REJECT   (1<<2)
> > +#define CMD_DESC_REGISTER (1<<3)
> > +#define CMD_DESC_BITMASK  (1<<4)
> > +#define CMD_DESC_MASTER   (1<<5)
> 
> Feels like flags should be named FLAG, not DESC. *shrug*.
> 
> > +
> > +	/**
> > +	 * The command's unique identification bits and the bitmask to get them.
> > +	 * This isn't strictly the opcode field as defined in the spec and may
> > +	 * also include type, subtype, and/or subop fields.
> > +	 */
> > +	struct {
> > +		u32 value;
> > +		u32 mask;
> > +	} cmd;
> > +
> > +	/**
> > +	 * The command's length. The command is either fixed length (i.e. does
> > +	 * not include a length field) or has a length field mask. The flag
> > +	 * CMD_DESC_FIXED indicates a fixed length. Otherwise, the command has
> > +	 * a length mask. All command entries in a command table must include
> > +	 * length information.
> > +	 */
> > +	union {
> > +		u32 fixed;
> > +		u32 mask;
> > +	} length;
> > +
> > +	/**
> > +	 * Describes where to find a register address in the command to check
> > +	 * against the ring's register whitelist. Only valid if flags has the
> > +	 * CMD_DESC_REGISTER bit set.
> > +	 */
> > +	struct {
> > +		u32 offset;
> > +		u32 mask;
> > +	} reg;
> > +
> > +#define MAX_CMD_DESC_BITMASKS 3
> > +	/**
> > +	 * Describes command checks where a particular dword is masked and
> > +	 * compared against an expected value. If the command does not match
> > +	 * the expected value, the parser rejects it. Only valid if flags has
> > +	 * the CMD_DESC_BITMASK bit set.
> > +	 */
> > +	struct {
> > +		u32 offset;
> > +		u32 mask;
> > +		u32 expected;
> > +	} bits[MAX_CMD_DESC_BITMASKS];
> > +	/** Number of valid entries in the bits array */
> > +	int bits_count;
> > +};
> > +
> > +/**
> > + * A table of commands requiring special handling by the command parser.
> > + *
> > + * Each ring has an array of tables. Each table consists of an array of command
> > + * descriptors, which must be sorted with command opcodes in ascending order.
> > + */
> > +struct drm_i915_cmd_table {
> > +	const struct drm_i915_cmd_descriptor *table;
> > +	int count;
> > +};
> > +
> >  #define INTEL_INFO(dev)	(to_i915(dev)->info)
> >  
> >  #define IS_I830(dev)		((dev)->pdev->device == 0x3577)
> > @@ -1923,6 +2008,7 @@ struct i915_params {
> >  	bool prefault_disable;
> >  	bool reset;
> >  	int invert_brightness;
> > +	int enable_cmd_parser;
> >  };
> >  extern struct i915_params i915 __read_mostly;
> >  
> > @@ -2428,6 +2514,14 @@ void i915_destroy_error_state(struct drm_device *dev);
> >  void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
> >  const char *i915_cache_level_str(int type);
> >  
> > +/* i915_cmd_parser.c */
> > +void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
> > +int i915_needs_cmd_parser(struct intel_ring_buffer *ring);
> > +int i915_parse_cmds(struct intel_ring_buffer *ring,
> > +		    struct drm_i915_gem_object *batch_obj,
> > +		    u32 batch_start_offset,
> > +		    bool is_master);
> > +
> >  /* i915_suspend.c */
> >  extern int i915_save_state(struct drm_device *dev);
> >  extern int i915_restore_state(struct drm_device *dev);
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 032def9..c953445 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1180,6 +1180,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	}
> >  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
> >  
> > +	if (i915_needs_cmd_parser(ring)) {
> > +		ret = i915_parse_cmds(ring,
> > +				      batch_obj,
> > +				      args->batch_start_offset,
> > +				      file->is_master);
> > +		if (ret)
> > +			goto err;
> > +
> > +		/*
> > +		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
> > +		 * from MI_BATCH_BUFFER_START commands issued in the
> > +		 * dispatch_execbuffer implementations. We specifically don't
> > +		 * want that set when the command parser is enabled.
> > +		 */
> > +		flags |= I915_DISPATCH_SECURE;
> > +	}
> > +
> >  	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
> >  	 * batch" bit. Hence we need to pin secure batches into the global gtt.
> >  	 * hsw should have this fixed, but bdw mucks it up again. */
> > diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> > index c743057..6d3d906 100644
> > --- a/drivers/gpu/drm/i915/i915_params.c
> > +++ b/drivers/gpu/drm/i915/i915_params.c
> > @@ -47,6 +47,7 @@ struct i915_params i915 __read_mostly = {
> >  	.prefault_disable = 0,
> >  	.reset = true,
> >  	.invert_brightness = 0,
> > +	.enable_cmd_parser = 0
> 
> Please add a comma in the end so the next addition won't have to, just
> like this doesn't have to touch the previous line.

Will do throughout

> 
> >  };
> >  
> >  module_param_named(modeset, i915.modeset, int, 0400);
> > @@ -153,3 +154,7 @@ MODULE_PARM_DESC(invert_brightness,
> >  	"report PCI device ID, subsystem vendor and subsystem device ID "
> >  	"to dri-devel@lists.freedesktop.org, if your machine needs it. "
> >  	"It will then be included in an upcoming module version.");
> > +
> > +module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
> > +MODULE_PARM_DESC(enable_cmd_parser,
> > +		"Enable command parsing (default: false)");
> 
> If it's a bool, make it a bool, or change the default text to 0.

I'll change it to 0 for now. We might want to do the -1/0/1 style thing at some
point, though I don't necessarily have a good case for it right now.
- Brad
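
For readers unfamiliar with the "-1/0/1 style" mentioned above, here is a minimal userspace sketch of how such a tri-state parameter is typically resolved. The helper name and the notion of a per-platform default are assumptions for illustration; nothing like this is in the patch.

```c
/*
 * Hypothetical helper, not from the patch: resolve a tri-state module
 * parameter. A negative value means "use the per-platform default",
 * 0 forces the feature off, and any positive value forces it on.
 */
static int example_resolve_tristate(int param, int platform_default)
{
	if (param < 0)
		return platform_default;

	return param != 0;
}
```

With this shape, the parameter's default could be -1 and each platform could pick its own behaviour without changing the parameter's meaning.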

> 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index a0d61f8..77fc61d 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1388,6 +1388,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
> >  	if (IS_I830(ring->dev) || IS_845G(ring->dev))
> >  		ring->effective_size -= 128;
> >  
> > +	i915_cmd_parser_init_ring(ring);
> > +
> >  	return 0;
> >  
> >  err_unmap:
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index 71a73f4..cff2b35 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -162,6 +162,38 @@ struct  intel_ring_buffer {
> >  		u32 gtt_offset;
> >  		volatile u32 *cpu_page;
> >  	} scratch;
> > +
> > +	/**
> > +	 * Tables of commands the command parser needs to know about
> > +	 * for this ring.
> > +	 */
> > +	const struct drm_i915_cmd_table *cmd_tables;
> > +	int cmd_table_count;
> > +
> > +	/**
> > +	 * Table of registers allowed in commands that read/write registers.
> > +	 */
> > +	const u32 *reg_table;
> > +	int reg_count;
> > +
> > +	/**
> > +	 * Table of registers allowed in commands that read/write registers, but
> > +	 * only from the DRM master.
> > +	 */
> > +	const u32 *master_reg_table;
> > +	int master_reg_count;
> > +
> > +	/**
> > +	 * Returns the bitmask for the length field of the specified command.
> > +	 * Return 0 for an unrecognized/invalid command.
> > +	 *
> > +	 * If the command parser finds an entry for a command in the ring's
> > +	 * cmd_tables, it gets the command's length based on the table entry.
> > +	 * If not, it calls this function to determine the per-ring length field
> > +	 * encoding for the command (i.e. certain opcode ranges use certain bits
> > +	 * to encode the command length in the header).
> > +	 */
> > +	u32 (*get_cmd_length_mask)(u32 cmd_header);
> >  };
> 
> Plenty of non-conforming kernel-doc comments here too.
> 
> >  
> >  static inline bool
> > -- 
> > 1.8.5.2
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center
> 


* Re: [PATCH 04/13] drm/i915: Reject privileged commands
  2014-02-05 15:22     ` Jani Nikula
@ 2014-02-05 18:42       ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05 18:42 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx

[snip]

On Wed, Feb 05, 2014 at 07:22:33AM -0800, Jani Nikula wrote:
> On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 13ed6ed..2b7c26e 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -339,21 +339,22 @@
> >  /*
> >   * Commands used only by the command parser
> >   */
> > -#define MI_SET_PREDICATE       MI_INSTR(0x01, 0)
> > -#define MI_ARB_CHECK           MI_INSTR(0x05, 0)
> > -#define MI_RS_CONTROL          MI_INSTR(0x06, 0)
> > -#define MI_URB_ATOMIC_ALLOC    MI_INSTR(0x09, 0)
> > -#define MI_PREDICATE           MI_INSTR(0x0C, 0)
> > -#define MI_RS_CONTEXT          MI_INSTR(0x0F, 0)
> > -#define MI_TOPOLOGY_FILTER     MI_INSTR(0x0D, 0)
> > -#define MI_URB_CLEAR           MI_INSTR(0x19, 0)
> > -#define MI_UPDATE_GTT          MI_INSTR(0x23, 0)
> > -#define MI_CLFLUSH             MI_INSTR(0x27, 0)
> > -#define MI_LOAD_REGISTER_MEM   MI_INSTR(0x29, 0)
> > -#define MI_LOAD_REGISTER_REG   MI_INSTR(0x2A, 0)
> > -#define MI_RS_STORE_DATA_IMM   MI_INSTR(0x2B, 0)
> > -#define MI_LOAD_URB_MEM        MI_INSTR(0x2C, 0)
> > -#define MI_STORE_URB_MEM       MI_INSTR(0x2D, 0)
> > +#define MI_SET_PREDICATE        MI_INSTR(0x01, 0)
> > +#define MI_ARB_CHECK            MI_INSTR(0x05, 0)
> > +#define MI_RS_CONTROL           MI_INSTR(0x06, 0)
> > +#define MI_URB_ATOMIC_ALLOC     MI_INSTR(0x09, 0)
> > +#define MI_PREDICATE            MI_INSTR(0x0C, 0)
> > +#define MI_RS_CONTEXT           MI_INSTR(0x0F, 0)
> > +#define MI_TOPOLOGY_FILTER      MI_INSTR(0x0D, 0)
> > +#define MI_LOAD_SCAN_LINES_EXCL MI_INSTR(0x13, 0)
> > +#define MI_URB_CLEAR            MI_INSTR(0x19, 0)
> > +#define MI_UPDATE_GTT           MI_INSTR(0x23, 0)
> > +#define MI_CLFLUSH              MI_INSTR(0x27, 0)
> > +#define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
> > +#define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
> > +#define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
> > +#define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
> > +#define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
> 
> Superfluous whitespace change hunk.

It adds MI_LOAD_SCAN_LINES_EXCL and adjusts the whitespace to line up. I see
that the whitespace change makes the actual change less obvious. I'll try to
clean that up.
- Brad

> 
> 
> >  #define MI_CONDITIONAL_BATCH_BUFFER_END MI_INSTR(0x36, 0)
> >  
> >  #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
> > -- 
> > 1.8.5.2
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center


* Re: [PATCH 06/13] drm/i915: Add register whitelists for mesa
  2014-02-05 15:29     ` Jani Nikula
@ 2014-02-05 18:47       ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05 18:47 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx

On Wed, Feb 05, 2014 at 07:29:12AM -0800, Jani Nikula wrote:
> On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> >
> > These registers are currently used by mesa for blitting,
> > transform feedback extensions, and performance monitoring
> > extensions.
> >
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 55 ++++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_reg.h        | 20 +++++++++++++
> >  2 files changed, 75 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index 88456638..18d5b05 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -185,6 +185,55 @@ static const struct drm_i915_cmd_table hsw_blt_ring_cmds[] = {
> >  	{ hsw_blt_cmds, ARRAY_SIZE(hsw_blt_cmds) },
> >  };
> >  
> > +/*
> > + * Register whitelists, sorted by increasing register offset.
> > + *
> > + * Some registers that userspace accesses are 64 bits. The register
> > + * access commands only allow 32-bit accesses. Hence, we have to include
> > + * entries for both halves of the 64-bit registers.
> > + */
> 
> Seems like it would be useful to have a helper macro here.
> 
> 	#define FOO64(addr) (addr), (addr + 4)
> 
> With a better name, hopefully. My imagination fails me now.

REG64(addr)?
Or maybe just
	#define REG_UPPER_DW(addr) (addr + 4)

- Brad
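
A minimal userspace sketch of the REG64 idea, for illustration only. REG64 is just the name floated above and is not in the patch; the two offsets are copied from the i915_reg.h hunk, and the table name is made up.

```c
#include <stdint.h>

typedef uint32_t u32; /* userspace stand-in for the kernel's u32 */

/* Offsets as defined in the i915_reg.h hunk of this patch. */
#define HS_INVOCATION_COUNT 0x2300
#define DS_INVOCATION_COUNT 0x2308

/* Hypothetical helper: expands to both 32-bit halves of a 64-bit register. */
#define REG64(addr) (addr), ((addr) + 4)

/* Made-up table; each REG64() contributes two whitelist entries. */
static const u32 example_render_regs[] = {
	REG64(HS_INVOCATION_COUNT),
	REG64(DS_INVOCATION_COUNT),
};

#define EXAMPLE_REG_COUNT \
	(sizeof(example_render_regs) / sizeof(example_render_regs[0]))
```

One point in favour of the two-entry macro over REG_UPPER_DW is that it keeps the low and high halves adjacent, so a sorted table stays sorted as long as the registers themselves are listed in order.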

> 
> > +
> > +static const u32 gen7_render_regs[] = {
> > +	HS_INVOCATION_COUNT,
> > +	HS_INVOCATION_COUNT + sizeof(u32),
> > +	DS_INVOCATION_COUNT,
> > +	DS_INVOCATION_COUNT + sizeof(u32),
> > +	IA_VERTICES_COUNT,
> > +	IA_VERTICES_COUNT + sizeof(u32),
> > +	IA_PRIMITIVES_COUNT,
> > +	IA_PRIMITIVES_COUNT + sizeof(u32),
> > +	VS_INVOCATION_COUNT,
> > +	VS_INVOCATION_COUNT + sizeof(u32),
> > +	GS_INVOCATION_COUNT,
> > +	GS_INVOCATION_COUNT + sizeof(u32),
> > +	GS_PRIMITIVES_COUNT,
> > +	GS_PRIMITIVES_COUNT + sizeof(u32),
> > +	CL_INVOCATION_COUNT,
> > +	CL_INVOCATION_COUNT + sizeof(u32),
> > +	CL_PRIMITIVES_COUNT,
> > +	CL_PRIMITIVES_COUNT + sizeof(u32),
> > +	PS_INVOCATION_COUNT,
> > +	PS_INVOCATION_COUNT + sizeof(u32),
> > +	PS_DEPTH_COUNT,
> > +	PS_DEPTH_COUNT + sizeof(u32),
> > +	GEN7_SO_NUM_PRIMS_WRITTEN(0),
> > +	GEN7_SO_NUM_PRIMS_WRITTEN(0) + sizeof(u32),
> > +	GEN7_SO_NUM_PRIMS_WRITTEN(1),
> > +	GEN7_SO_NUM_PRIMS_WRITTEN(1) + sizeof(u32),
> > +	GEN7_SO_NUM_PRIMS_WRITTEN(2),
> > +	GEN7_SO_NUM_PRIMS_WRITTEN(2) + sizeof(u32),
> > +	GEN7_SO_NUM_PRIMS_WRITTEN(3),
> > +	GEN7_SO_NUM_PRIMS_WRITTEN(3) + sizeof(u32),
> > +	GEN7_SO_WRITE_OFFSET(0),
> > +	GEN7_SO_WRITE_OFFSET(1),
> > +	GEN7_SO_WRITE_OFFSET(2),
> > +	GEN7_SO_WRITE_OFFSET(3),
> > +};
> > +
> > +static const u32 gen7_blt_regs[] = {
> > +	BCS_SWCTRL,
> > +};
> > +
> >  #define CLIENT_MASK      0xE0000000
> >  #define SUBCLIENT_MASK   0x18000000
> >  #define MI_CLIENT        0x00000000
> > @@ -313,6 +362,9 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
> >  			ring->cmd_table_count = ARRAY_SIZE(gen7_render_cmds);
> >  		}
> >  
> > +		ring->reg_table = gen7_render_regs;
> > +		ring->reg_count = ARRAY_SIZE(gen7_render_regs);
> > +
> >  		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
> >  		break;
> >  	case VCS:
> > @@ -329,6 +381,9 @@ void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
> >  			ring->cmd_table_count = ARRAY_SIZE(gen7_blt_cmds);
> >  		}
> >  
> > +		ring->reg_table = gen7_blt_regs;
> > +		ring->reg_count = ARRAY_SIZE(gen7_blt_regs);
> > +
> >  		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
> >  		break;
> >  	case VECS:
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 2b7c26e..b99bacf 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -385,6 +385,26 @@
> >  #define SRC_COPY_BLT  ((0x2<<29)|(0x43<<22))
> >  
> >  /*
> > + * Registers used only by the command parser
> > + */
> > +#define BCS_SWCTRL 0x22200
> > +
> > +#define HS_INVOCATION_COUNT 0x2300
> > +#define DS_INVOCATION_COUNT 0x2308
> > +#define IA_VERTICES_COUNT   0x2310
> > +#define IA_PRIMITIVES_COUNT 0x2318
> > +#define VS_INVOCATION_COUNT 0x2320
> > +#define GS_INVOCATION_COUNT 0x2328
> > +#define GS_PRIMITIVES_COUNT 0x2330
> > +#define CL_INVOCATION_COUNT 0x2338
> > +#define CL_PRIMITIVES_COUNT 0x2340
> > +#define PS_INVOCATION_COUNT 0x2348
> > +#define PS_DEPTH_COUNT      0x2350
> > +
> > +/* There are the 4 64-bit counter registers, one for each stream output */
> > +#define GEN7_SO_NUM_PRIMS_WRITTEN(n) (0x5200 + (n) * 8)
> > +
> > +/*
> >   * Reset registers
> >   */
> >  #define DEBUG_RESET_I830		0x6070
> > -- 
> > 1.8.5.2
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center


* Re: [PATCH 08/13] drm/i915: Enable register whitelist checks
  2014-02-05 15:33     ` Jani Nikula
@ 2014-02-05 18:49       ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05 18:49 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx

On Wed, Feb 05, 2014 at 07:33:28AM -0800, Jani Nikula wrote:
> On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> >
> > MI_STORE_REGISTER_MEM, MI_LOAD_REGISTER_MEM, and MI_LOAD_REGISTER_IMM
> > commands allow userspace access to registers. Only certain registers
> > should be allowed for such access, so enable checking for those commands.
> > Each ring gets its own register whitelist.
> >
> > MI_LOAD_REGISTER_REG on HSW also allows register access but is currently
> > unused by userspace components. Leave it rejected.
> >
> > PIPE_CONTROL and MEDIA_VFE_STATE allow register access based on certain
> > bits being set. Reject those as well.
> >
> > OTC-Tracker: AXIA-4631
> > Change-Id: Ie614a2f0eb2e5917de809e5a17957175d24cc44f
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_cmd_parser.c | 23 ++++++++++++++++++++---
> >  drivers/gpu/drm/i915/i915_reg.h        |  3 +++
> >  2 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > index 296e322..5d3e303 100644
> > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -63,9 +63,12 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
> >  	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
> >  	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
> >  	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
> > -	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   R  ),
> > -	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   R  ),
> > -	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   R  ),
> > +	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
> > +	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
> > +	CMD(  MI_STORE_REGISTER_MEM(1),         SMI,   !F,  0xFF,   W,
> > +	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
> > +	CMD(  MI_LOAD_REGISTER_MEM,             SMI,   !F,  0xFF,   W,
> > +	      .reg = { .offset = 1, .mask = 0x007FFFFC }               ),
> >  	CMD(  MI_BATCH_BUFFER_START,            SMI,   !F,  0xFF,   S  ),
> >  };
> >  
> > @@ -82,9 +85,23 @@ static const struct drm_i915_cmd_descriptor render_cmds[] = {
> >  	CMD(  MI_CONDITIONAL_BATCH_BUFFER_END,  SMI,   !F,  0xFF,   S  ),
> >  	CMD(  GFX_OP_3DSTATE_VF_STATISTICS,     S3D,    F,  1,      S  ),
> >  	CMD(  PIPELINE_SELECT,                  S3D,    F,  1,      S  ),
> > +	CMD(  MEDIA_VFE_STATE,			S3D,   !F,  0xFFFF, B,
> > +	      .bits = {{
> > +			.offset = 2,
> > +			.mask = MEDIA_VFE_STATE_MMIO_ACCESS_MASK,
> > +			.expected = 0
> > +	      }},
> > +	      .bits_count = 1					       ),
> 
> From my bikeshedding dept.: here too I think it would be beneficial to
> have the count decided by an empty element, or a .valid = 1 field or
> something.

I see your point. I'll look at doing a .valid=1 or .mask!=0 check.
- Brad
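
A rough userspace sketch of the ".mask != 0" variant, assuming unused entries of the bits array are left zero-initialized. The struct and function names are made up for illustration, and the caller is assumed to have already checked that cmd is long enough for every offset used.

```c
#include <stdint.h>

typedef uint32_t u32; /* userspace stand-in for the kernel's u32 */

#define MAX_CMD_DESC_BITMASKS 3

/* Trimmed-down, hypothetical copies of the descriptor fields. */
struct example_bitmask {
	u32 offset;
	u32 mask;
	u32 expected;
};

struct example_desc {
	struct example_bitmask bits[MAX_CMD_DESC_BITMASKS];
};

/* Returns 1 if every populated check passes, 0 otherwise. */
static int example_check_bits(const struct example_desc *desc, const u32 *cmd)
{
	int i;

	for (i = 0; i < MAX_CMD_DESC_BITMASKS; i++) {
		u32 dword;

		/* A zero mask marks the first unused entry; no count needed. */
		if (desc->bits[i].mask == 0)
			break;

		dword = cmd[desc->bits[i].offset] & desc->bits[i].mask;
		if (dword != desc->bits[i].expected)
			return 0;
	}

	return 1;
}
```

The trade-off is that a check with an all-zero mask can no longer be expressed, which seems harmless since such a check would compare nothing.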

> 
> 
> >  	CMD(  GPGPU_OBJECT,                     S3D,   !F,  0xFF,   S  ),
> >  	CMD(  GPGPU_WALKER,                     S3D,   !F,  0xFF,   S  ),
> >  	CMD(  GFX_OP_3DSTATE_SO_DECL_LIST,      S3D,   !F,  0x1FF,  S  ),
> > +	CMD(  GFX_OP_PIPE_CONTROL(5),           S3D,   !F,  0xFF,   B,
> > +	      .bits = {{
> > +			.offset = 1,
> > +			.mask = PIPE_CONTROL_MMIO_WRITE,
> > +			.expected = 0
> > +	      }},
> > +	      .bits_count = 1					       ),
> >  };
> >  
> >  static const struct drm_i915_cmd_descriptor hsw_render_cmds[] = {
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index b99bacf..6592d0d 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -319,6 +319,7 @@
> >  #define   DISPLAY_PLANE_B           (1<<20)
> >  #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
> >  #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
> > +#define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
> >  #define   PIPE_CONTROL_CS_STALL				(1<<20)
> >  #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
> >  #define   PIPE_CONTROL_QW_WRITE				(1<<14)
> > @@ -359,6 +360,8 @@
> >  
> >  #define PIPELINE_SELECT                ((0x3<<29)|(0x1<<27)|(0x1<<24)|(0x4<<16))
> >  #define GFX_OP_3DSTATE_VF_STATISTICS   ((0x3<<29)|(0x1<<27)|(0x0<<24)|(0xB<<16))
> > +#define MEDIA_VFE_STATE                ((0x3<<29)|(0x2<<27)|(0x0<<24)|(0x0<<16))
> > +#define  MEDIA_VFE_STATE_MMIO_ACCESS_MASK (0x18)
> >  #define GPGPU_OBJECT                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x4<<16))
> >  #define GPGPU_WALKER                   ((0x3<<29)|(0x2<<27)|(0x1<<24)|(0x5<<16))
> >  #define GFX_OP_3DSTATE_DX9_CONSTANTF_VS \
> > -- 
> > 1.8.5.2
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center


* Re: [PATCH 10/13] drm/i915: Enable PPGTT command parser checks
  2014-02-05 15:37     ` Jani Nikula
@ 2014-02-05 18:54       ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05 18:54 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx

[snip]

On Wed, Feb 05, 2014 at 07:37:51AM -0800, Jani Nikula wrote:
> On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> >  int i915_needs_cmd_parser(struct intel_ring_buffer *ring)
> >  {
> > +	drm_i915_private_t *dev_priv =
> > +		(drm_i915_private_t *)ring->dev->dev_private;
> > +
> >  	/* No command tables indicates a platform without parsing */
> >  	if (!ring->cmd_tables)
> >  		return 0;
> >  
> > +	/*
> > +	 * XXX: VLV is Gen7 and therefore has cmd_tables, but has PPGTT
> > +	 * disabled. That will cause all of the parser's PPGTT checks to
> > +	 * fail. For now, disable parsing when PPGTT is off.
> > +	 */
> > +	if(!dev_priv->mm.aliasing_ppgtt)
>    	  ^ missing space.

Oops

> 
> > +		return 0;
> > +
> 
> Hmm, shouldn't this belong to some other patch, much earlier in the
> series? Like patch 2 or 3?

Not necessarily. It's only because we've added the PPGTT checks without
somehow making them conditional on aliasing_ppgtt==true that we have a problem,
and that only happens with this patch. The parser works, though is less useful,
on !aliasing_ppgtt platforms up to this point.

Chris suggested that we just fix it up so that the PPGTT checks are conditional
on PPGTT actually enabled, so I'm going to look at that.
- Brad

> 
> >  	return i915.enable_cmd_parser;
> >  }
> >  
> > @@ -675,6 +782,16 @@ int i915_parse_cmds(struct intel_ring_buffer *ring,
> >  				u32 dword = cmd[desc->bits[i].offset] &
> >  					desc->bits[i].mask;
> >  
> > +				if (desc->bits[i].condition_mask != 0) {
> > +					u32 offset =
> > +						desc->bits[i].condition_offset;
> > +					u32 condition = cmd[offset] &
> > +						desc->bits[i].condition_mask;
> > +
> > +					if (condition == 0)
> > +						continue;
> > +				}
> > +
> >  				if (dword != desc->bits[i].expected) {
> >  					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
> >  							 *cmd,
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 8aed80f..2d1d2ef 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1829,11 +1829,17 @@ struct drm_i915_cmd_descriptor {
> >  	 * compared against an expected value. If the command does not match
> >  	 * the expected value, the parser rejects it. Only valid if flags has
> >  	 * the CMD_DESC_BITMASK bit set.
> > +	 *
> > +	 * If the check specifies a non-zero condition_mask then the parser
> > +	 * only performs the check when the bits specified by condition_mask
> > +	 * are non-zero.
> >  	 */
> >  	struct {
> >  		u32 offset;
> >  		u32 mask;
> >  		u32 expected;
> > +		u32 condition_offset;
> > +		u32 condition_mask;
> >  	} bits[MAX_CMD_DESC_BITMASKS];
> >  	/** Number of valid entries in the bits array */
> >  	int bits_count;
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index c2e4898..ff263f4 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -179,6 +179,8 @@
> >   * Memory interface instructions used by the kernel
> >   */
> >  #define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
> > +/* Many MI commands use bit 22 of the header dword for GGTT vs PPGTT */
> > +#define  MI_GLOBAL_GTT    (1<<22)
> >  
> >  #define MI_NOOP			MI_INSTR(0, 0)
> >  #define MI_USER_INTERRUPT	MI_INSTR(0x02, 0)
> > @@ -258,6 +260,7 @@
> >  #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
> >  #define   MI_INVALIDATE_TLB		(1<<18)
> >  #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
> > +#define   MI_FLUSH_DW_OP_MASK		(3<<14)
> >  #define   MI_FLUSH_DW_NOTIFY		(1<<8)
> >  #define   MI_INVALIDATE_BSD		(1<<7)
> >  #define   MI_FLUSH_DW_USE_GTT		(1<<2)
> > @@ -324,6 +327,7 @@
> >  #define   PIPE_CONTROL_CS_STALL				(1<<20)
> >  #define   PIPE_CONTROL_TLB_INVALIDATE			(1<<18)
> >  #define   PIPE_CONTROL_QW_WRITE				(1<<14)
> > +#define   PIPE_CONTROL_POST_SYNC_OP_MASK                (3<<14)
> >  #define   PIPE_CONTROL_DEPTH_STALL			(1<<13)
> >  #define   PIPE_CONTROL_WRITE_FLUSH			(1<<12)
> >  #define   PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH	(1<<12) /* gen6+ */
> > @@ -352,6 +356,8 @@
> >  #define MI_URB_CLEAR            MI_INSTR(0x19, 0)
> >  #define MI_UPDATE_GTT           MI_INSTR(0x23, 0)
> >  #define MI_CLFLUSH              MI_INSTR(0x27, 0)
> > +#define MI_REPORT_PERF_COUNT    MI_INSTR(0x28, 0)
> > +#define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
> >  #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
> >  #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
> >  #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
> > -- 
> > 1.8.5.2
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center
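
To make the condition_mask semantics from the quoted kernel-doc concrete, here is a minimal userspace sketch. The bit definitions are the ones visible in the quoted i915_reg.h hunks; the struct, the function, and in particular the example descriptor entry are assumptions for illustration and are not taken from the patch.

```c
#include <stdint.h>

typedef uint32_t u32; /* userspace stand-in for the kernel's u32 */

/* Bit definitions as they appear in the i915_reg.h hunks quoted above. */
#define MI_FLUSH_DW_OP_MASK	(3 << 14)
#define MI_FLUSH_DW_USE_GTT	(1 << 2)

/* Hypothetical stand-alone copy of one entry of the bits array. */
struct example_cond_check {
	u32 offset;
	u32 mask;
	u32 expected;
	u32 condition_offset;
	u32 condition_mask;
};

/* Returns 1 if the command passes the check (or the check is skipped). */
static int example_cond_check_passes(const struct example_cond_check *c,
				     const u32 *cmd)
{
	if (c->condition_mask != 0 &&
	    (cmd[c->condition_offset] & c->condition_mask) == 0)
		return 1; /* condition bits clear: check does not apply */

	return (cmd[c->offset] & c->mask) == c->expected;
}

/*
 * Illustrative entry (an assumption, not taken from the patch): if a
 * post-sync operation is requested in dword 0, the USE_GTT bit in
 * dword 1 must be clear.
 */
static const struct example_cond_check example_flush_dw_check = {
	.offset = 1,
	.mask = MI_FLUSH_DW_USE_GTT,
	.expected = 0,
	.condition_offset = 0,
	.condition_mask = MI_FLUSH_DW_OP_MASK,
};
```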


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2014-02-05 18:30     ` Daniel Vetter
@ 2014-02-05 19:00       ` Volkin, Bradley D
  2014-02-05 19:17         ` Daniel Vetter
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05 19:00 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Feb 05, 2014 at 10:30:00AM -0800, Daniel Vetter wrote:
> On Wed, Feb 5, 2014 at 7:18 PM, Volkin, Bradley D
> <bradley.d.volkin@intel.com> wrote:
> > On Wed, Feb 05, 2014 at 02:28:29AM -0800, Chris Wilson wrote:
> >> On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@intel.com wrote:
> >> > From: Brad Volkin <bradley.d.volkin@intel.com>
> >> >
> >> > Certain OpenGL features (e.g. transform feedback, performance monitoring)
> >> > require userspace code to submit batches containing commands such as
> >> > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
> >> > generations of the hardware will noop these commands in "unsecure" batches
> >> > (which includes all userspace batches submitted via i915) even though the
> >> > commands may be safe and represent the intended programming model of the device.
> >> >
> >> > This series introduces a software command parser similar in operation to the
> >> > command parsing done in hardware for unsecure batches. However, the software
> >> > parser allows some operations that would be noop'd by hardware, if the parser
> >> > determines the operation is safe, and submits the batch as "secure" to prevent
> >> > hardware parsing. Currently the series implements this on IVB and HSW.
> >>
> >> Just one more question... Do you have a branch for people to test?
> >
> > Not at the moment. And as mentioned in the v2 cover letter, it's actually not
> > particularly testable (or mergeable for that matter) right now because of a
> > regression in secure dispatch on nightly.
> 
> The command parser itself should still work, even with the regression
> in -nightly. The copying and secure dispatch are obviously fail atm.
> That still leaves regression testing of current userspace and
> micro-optimizing the checker itself as possible things to do. Otoh not
> sure what exactly Chris wanted to test.

To test/merge, we'd have to change the series to take out the part where
patch 02/13 sets I915_DISPATCH_SECURE to avoid a BUG_ON() when i915.enable_cmd_parser=1.
But yes, otherwise the parsing works and I think should be sufficient for
what Chris indicated he wants to test.

- Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2014-02-05 19:00       ` Volkin, Bradley D
@ 2014-02-05 19:17         ` Daniel Vetter
  2014-02-05 19:55           ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-02-05 19:17 UTC (permalink / raw)
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Feb 5, 2014 at 8:00 PM, Volkin, Bradley D
<bradley.d.volkin@intel.com> wrote:
> To test/merge, we'd have to change the series to take out the part where
> patch 02/13 sets I915_DISPATCH_SECURE to avoid a BUG_ON() when i915.enable_cmd_parser=1.
> But yes, otherwise the parsing works and I think should be sufficient for
> what Chris indicated he wants to test.

Oh, I didn't spot this but this needs to be moved way back in the
series - we can only set the bit once we have the batchbuffer copy
logic in place. Otherwise there's a security hole open since userspace
is free to frob the batch residing in the ppgtt, which we just can't
prevent.
-Daniel
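
The ordering concern can be sketched in plain userspace C: validate a private copy and execute only that copy, so a later write to the original cannot change what was checked. This is an illustration under assumptions, not the batch copy logic itself; every name below is made up, and malloc() merely stands in for memory the submitter cannot reach.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t u32; /* userspace stand-in for the kernel's u32 */

/* Made-up dword that the example validator rejects. */
#define EXAMPLE_FORBIDDEN_DWORD 0xDEADBEEFu

/* Hypothetical validator: returns 0 if no forbidden dword is present. */
static int example_validate(const u32 *batch, size_t len_dwords)
{
	size_t i;

	for (i = 0; i < len_dwords; i++) {
		if (batch[i] == EXAMPLE_FORBIDDEN_DWORD)
			return -1;
	}

	return 0;
}

/*
 * Copy the batch into memory the submitter cannot reach, then validate
 * the copy. On success (return 0) *shadow holds the validated copy; the
 * caller must execute that copy, not the original, and free() it later.
 */
static int example_copy_and_validate(const u32 *user_batch, size_t len_dwords,
				     u32 **shadow)
{
	u32 *copy;

	if (len_dwords == 0)
		return -1;

	copy = malloc(len_dwords * sizeof(*copy));
	if (!copy)
		return -1;

	memcpy(copy, user_batch, len_dwords * sizeof(*copy));

	if (example_validate(copy, len_dwords) != 0) {
		free(copy);
		return -1;
	}

	*shadow = copy;
	return 0;
}
```

Validating the original in place and then dispatching it as secure would leave a window between the check and execution, which is the hole described above.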
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [RFC 00/22] Gen7 batch buffer command parser
  2014-02-05 19:17         ` Daniel Vetter
@ 2014-02-05 19:55           ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-05 19:55 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Feb 05, 2014 at 11:17:25AM -0800, Daniel Vetter wrote:
> On Wed, Feb 5, 2014 at 8:00 PM, Volkin, Bradley D
> <bradley.d.volkin@intel.com> wrote:
> > To test/merge, we'd have to change the series to take out the part where
> > patch 02/13 sets I915_DISPATCH_SECURE to avoid a BUG_ON() when i915.enable_cmd_parser=1.
> > But yes, otherwise the parsing works and I think should be sufficient for
> > what Chris indicated he wants to test.
> 
> Oh, I didn't spot this but this needs to be moved way back in the
> series - we can only set the bit once we have the batchbuffer copy
> logic in place. Otherwise there's a security hole open since userspace
> is free to frob the batch residing in the ppgtt, which we just can't
> prevent.

Good point. I'll take it out and we can add it as part of the batch copy work.

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-01-29 21:55   ` [PATCH 02/13] drm/i915: Implement command buffer parsing logic bradley.d.volkin
                       ` (2 preceding siblings ...)
  2014-02-05 15:15     ` Jani Nikula
@ 2014-02-07 13:58     ` Jani Nikula
  2014-02-07 14:45       ` Daniel Vetter
  3 siblings, 1 reply; 138+ messages in thread
From: Jani Nikula @ 2014-02-07 13:58 UTC (permalink / raw)
  To: bradley.d.volkin, intel-gfx

On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> +static int valid_reg(const u32 *table, int count, u32 addr)
> +{
> +	if (table && count != 0) {
> +		int i;
> +
> +		for (i = 0; i < count; i++) {
> +			if (table[i] == addr)
> +				return 1;
> +		}
> +	}

You go to great lengths to validate the register tables are sorted, but
in the end you don't take advantage of this fact by bailing out early if
the lookup goes past the addr.

Is this optimization the main reason for having the tables sorted, or
are there other reasons too (I couldn't find any)?

I'm beginning to wonder if this is a premature optimization that adds
extra code. For master restricted registers you will always scan the
regular reg table completely first. Perhaps a better option would be to
have all registers in the same table, with a separate master flag,
ordered by how frequently they are expected to be used. We do want to
optimize for the happy day scenario. But maybe it's too early to tell.

I'm inclined to rip out the sort requirement and check, if the sole
purpose is optimization, for simplicity's sake.
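
For reference, the early bail-out I mean would look roughly like the sketch below. It is a userspace approximation, assuming the table is sorted in ascending order; the function name is made up and this is not a proposed replacement for valid_reg().

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32; /* userspace stand-in for the kernel's u32 */

/*
 * Linear lookup in a table sorted by ascending offset. Returns 1 if addr
 * is present, 0 otherwise. Stops as soon as the scan passes addr.
 */
static int example_valid_reg_sorted(const u32 *table, int count, u32 addr)
{
	int i;

	if (!table)
		return 0;

	for (i = 0; i < count; i++) {
		if (table[i] == addr)
			return 1;
		if (table[i] > addr)
			break; /* sorted: addr cannot appear later */
	}

	return 0;
}
```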


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-02-07 13:58     ` Jani Nikula
@ 2014-02-07 14:45       ` Daniel Vetter
  2014-02-11 18:12         ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-02-07 14:45 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx

On Fri, Feb 07, 2014 at 03:58:46PM +0200, Jani Nikula wrote:
> On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > +static int valid_reg(const u32 *table, int count, u32 addr)
> > +{
> > +	if (table && count != 0) {
> > +		int i;
> > +
> > +		for (i = 0; i < count; i++) {
> > +			if (table[i] == addr)
> > +				return 1;
> > +		}
> > +	}
> 
> You go to great lengths to validate the register tables are sorted, but
> in the end you don't take advantage of this fact by bailing out early if
> the lookup goes past the addr.
> 
> Is this optimization the main reason for having the tables sorted, or
> are there other reasons too (I couldn't find any)?
> 
> I'm beginning to wonder if this is a premature optimization that adds
> extra code. For master restricted registers you will always scan the
> regular reg table completely first. Perhaps a better option would be to
> have all registers in the same table, with a separate master flag,
> ordered by how frequently they are expected to be used. We do want to
> optimize for the happy day scenario. But maybe it's too early to tell.
> 
> I'm inclined to rip out the sort requirement and check, if the sole
> purpose is optimization, for simplicity's sake.

tbh I don't mind the sorting requirement, and iirc Brad has patches
already for binary search. Once we start to rely on the sorting we can
easily add a little function which checks for that at ring
initialization, so I also don't see any concerns wrt code fragility.
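
A sketch of what such a check could look like, written as plain userspace
C. The function name is hypothetical, and treating duplicate entries as an
error (strictly ascending order) is an assumption, not something stated in
this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the table is in strictly ascending order.
 * An empty or single-entry table is trivially sorted.
 */
static bool reg_table_is_sorted(const uint32_t *table, int count)
{
	int i;

	for (i = 1; i < count; i++) {
		/* Equal neighbours are rejected too (assumed policy). */
		if (table[i] <= table[i - 1])
			return false;
	}

	return true;
}
```

Run once per table at initialization, this costs one pass over each table
and nothing at lookup time.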
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-02-07 14:45       ` Daniel Vetter
@ 2014-02-11 18:12         ` Volkin, Bradley D
  2014-02-11 18:21           ` Jani Nikula
  0 siblings, 1 reply; 138+ messages in thread
From: Volkin, Bradley D @ 2014-02-11 18:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Feb 07, 2014 at 06:45:48AM -0800, Daniel Vetter wrote:
> On Fri, Feb 07, 2014 at 03:58:46PM +0200, Jani Nikula wrote:
> > On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > > +static int valid_reg(const u32 *table, int count, u32 addr)
> > > +{
> > > +	if (table && count != 0) {
> > > +		int i;
> > > +
> > > +		for (i = 0; i < count; i++) {
> > > +			if (table[i] == addr)
> > > +				return 1;
> > > +		}
> > > +	}
> > 
> > You go to great lengths to validate the register tables are sorted, but
> > in the end you don't take advantage of this fact by bailing out early if
> > the lookup goes past the addr.
> > 
> > Is this optimization the main reason for having the tables sorted, or
> > are there other reasons too (I couldn't find any)?
> > 
> > I'm beginning to wonder if this is a premature optimization that adds
> > extra code. For master restricted registers you will always scan the
> > regular reg table completely first. Perhaps a better option would be to
> > have all registers in the same table, with a separate master flag,
> > ordered by how frequently they are expected to be used. We do want to
> > optimize for the happy day scenario. But maybe it's too early to tell.
> > 
> > I'm inclined to rip out the sort requirement and check, if the sole
> > purpose is optimization, for simplicity's sake.
> 
> tbh I don't mind the sorting requirement, and iirc Brad has patches
> already for binary search. Once we start to rely on the sorting we can
> > easily add a little function which checks for that at ring
> initialization, so I also don't see any concerns wrt code fragility.

Sorry for the delayed response. The background here is that I originally
just had the tables sorted with a comment to say as much. The idea was that
if the linear search became an issue, switching algorithms would be easier.
Chris suggested just moving to bsearch and checking that the tables are sorted
as part of the v1 series review. I implemented bsearch and found that
performance ranged from unchanged to slightly worse. So I added the sorting
check and kept the linear search until we have better data.
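
For reference, a minimal sketch of a binary-search lookup over a sorted
table. This is a hand-rolled illustration with a hypothetical function
name, not the bsearch patch discussed above, and it assumes the table is
sorted in ascending order.

```c
#include <stdint.h>

/* Returns 1 if addr is in the sorted table, 0 otherwise. */
static int valid_reg_bsearch(const uint32_t *table, int count, uint32_t addr)
{
	int lo = 0;
	int hi = count - 1;

	if (!table)
		return 0;

	while (lo <= hi) {
		/* Written this way to avoid overflow in lo + hi. */
		int mid = lo + (hi - lo) / 2;

		if (table[mid] == addr)
			return 1;
		if (table[mid] < addr)
			lo = mid + 1;
		else
			hi = mid - 1;
	}

	return 0;
}
```

For tables with only a handful of entries a linear scan can be just as
fast, which would be consistent with the measurement above, though that
has not been verified as the cause here.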

Thanks,
Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 02/13] drm/i915: Implement command buffer parsing logic
  2014-02-11 18:12         ` Volkin, Bradley D
@ 2014-02-11 18:21           ` Jani Nikula
  0 siblings, 0 replies; 138+ messages in thread
From: Jani Nikula @ 2014-02-11 18:21 UTC (permalink / raw)
  To: Volkin, Bradley D, Daniel Vetter; +Cc: intel-gfx

On Tue, 11 Feb 2014, "Volkin, Bradley D" <bradley.d.volkin@intel.com> wrote:
> On Fri, Feb 07, 2014 at 06:45:48AM -0800, Daniel Vetter wrote:
>> On Fri, Feb 07, 2014 at 03:58:46PM +0200, Jani Nikula wrote:
>> > On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
>> > > +static int valid_reg(const u32 *table, int count, u32 addr)
>> > > +{
>> > > +	if (table && count != 0) {
>> > > +		int i;
>> > > +
>> > > +		for (i = 0; i < count; i++) {
>> > > +			if (table[i] == addr)
>> > > +				return 1;
>> > > +		}
>> > > +	}
>> > 
>> > You go to great lengths to validate the register tables are sorted, but
>> > in the end you don't take advantage of this fact by bailing out early if
>> > the lookup goes past the addr.
>> > 
>> > Is this optimization the main reason for having the tables sorted, or
>> > are there other reasons too (I couldn't find any)?
>> > 
>> > I'm beginning to wonder if this is a premature optimization that adds
>> > extra code. For master restricted registers you will always scan the
>> > regular reg table completely first. Perhaps a better option would be to
>> > have all registers in the same table, with a separate master flag,
>> > ordered by how frequently they are expected to be used. We do want to
>> > optimize for the happy day scenario. But maybe it's too early to tell.
>> > 
>> > I'm inclined to rip out the sort requirement and check, if the sole
>> > purpose is optimization, for simplicity's sake.
>> 
>> tbh I don't mind the sorting requirement, and iirc Brad has patches
>> already for binary search. Once we start to rely on the sorting we can
>> easily add a little function which checks for that at ring
>> initialization, so I also don't see any concerns wrt code fragility.
>
> Sorry for the delayed response. The background here is that I originally
> just had the tables sorted with a comment to say as much. The idea was that
> if the linear search became an issue, switching algorithms would be easier.
> Chris suggested just moving to bsearch and checking that the tables are sorted
> as part of the v1 series review. I implemented bsearch and found that
> performance ranged from unchanged to slightly worse. So I added the sorting
> check and kept the linear search until we have better data.

Ok. For the linear search, I think you could check whether you've
iterated past the register address and bail out early, then gather the
data with that.
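
Roughly like this, as a sketch. It follows the shape of valid_reg() from
the patch, but the function name is hypothetical and it assumes the table
is sorted in ascending order.

```c
#include <stdint.h>

/* Returns 1 if addr is in the sorted table, 0 otherwise. */
static int valid_reg_early_exit(const uint32_t *table, int count,
				uint32_t addr)
{
	int i;

	if (!table)
		return 0;

	for (i = 0; i < count; i++) {
		if (table[i] == addr)
			return 1;
		/* Sorted ascending: every later entry is larger still. */
		if (table[i] > addr)
			break;
	}

	return 0;
}
```

A miss on a low address then stops at the first larger entry instead of
walking the whole table.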

BR,
Jani.


>
> Thanks,
> Brad
>
>> -Daniel
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Jani Nikula, Intel Open Source Technology Center


* Re: [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END
  2014-01-30 11:46       ` Chris Wilson
@ 2014-03-25 13:17         ` Daniel Vetter
  2014-03-25 19:49           ` Volkin, Bradley D
  0 siblings, 1 reply; 138+ messages in thread
From: Daniel Vetter @ 2014-03-25 13:17 UTC (permalink / raw)
  To: Chris Wilson, bradley.d.volkin, intel-gfx

On Thu, Jan 30, 2014 at 11:46:15AM +0000, Chris Wilson wrote:
> On Wed, Jan 29, 2014 at 10:10:47PM +0000, Chris Wilson wrote:
> > On Wed, Jan 29, 2014 at 01:58:29PM -0800, bradley.d.volkin@intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > 
> > > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > > ---
> > >  tests/gem_exec_parse.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
> > > index 9e90408..004c3bf 100644
> > > --- a/tests/gem_exec_parse.c
> > > +++ b/tests/gem_exec_parse.c
> > > @@ -257,6 +257,15 @@ igt_main
> > >  				      -EINVAL));
> > >  	}
> > >  
> > > +	igt_subtest("batch-without-end") {
> > > +		uint32_t noop[1024] = { 0 };
> > > +		igt_assert(
> > > +			   exec_batch(fd, handle,
> > > +				      noop, sizeof(noop),
> > > +				      I915_EXEC_RENDER,
> > > +				      -EINVAL));
> > 
> > Cheekier would be
> > uint32_t empty[] = { MI_NOOP, MI_NOOP, MI_BATCH_BUFFER_END, 0 };
> > for_each_ring() {
> > 	igt_assert(exec_batch(fd, handle, empty, sizeof(empty), ring, 0));
> > 	igt_assert(exec_batch(fd, handle, empty, 8, ring, -EINVAL));
> > }
> 
> On this subject, it should be
> { INVALID, INVALID, NOOP, NOOP, END, 0}
> assert(exec(0,  4) == -EINVAL);
> assert(exec(0,  8) == -EINVAL);
> assert(exec(0, 12) == -EINVAL);
> assert(exec(4,  8) == -EINVAL);
> assert(exec(4, 12) == 0);
> assert(exec(8, 12) == 0);

Brad, care to throw these nasties into the test pond, too?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 6/6] tests/gem_exec_parse: Test a command crossing a page boundary
  2014-01-29 22:12     ` Chris Wilson
@ 2014-03-25 13:20       ` Daniel Vetter
  0 siblings, 0 replies; 138+ messages in thread
From: Daniel Vetter @ 2014-03-25 13:20 UTC (permalink / raw)
  To: Chris Wilson, bradley.d.volkin, intel-gfx

On Wed, Jan 29, 2014 at 10:12:08PM +0000, Chris Wilson wrote:
> On Wed, Jan 29, 2014 at 01:58:30PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > This is a speculative test in that it's not particularly relevant
> > today, but is important if we switch the parser implementation to
> > use kmap_atomic instead of vmap.
> 
> Do you not want to iterate over all (or some combination of)
> valid/invalid commands to better fuzz the handling of boundaries?

I think we can look into that once we decide that kmap_atomic is indeed
the right way forward. This here seems good enough to at least have the
basics ready for a quick test.

Pulled all six patches into igt. I think adding some of the additional
cases Chris suggested for invalid handling might be useful.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END
  2014-03-25 13:17         ` Daniel Vetter
@ 2014-03-25 19:49           ` Volkin, Bradley D
  0 siblings, 0 replies; 138+ messages in thread
From: Volkin, Bradley D @ 2014-03-25 19:49 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Mar 25, 2014 at 06:17:55AM -0700, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 11:46:15AM +0000, Chris Wilson wrote:
> > On Wed, Jan 29, 2014 at 10:10:47PM +0000, Chris Wilson wrote:
> > > On Wed, Jan 29, 2014 at 01:58:29PM -0800, bradley.d.volkin@intel.com wrote:
> > > > From: Brad Volkin <bradley.d.volkin@intel.com>
> > > > 
> > > > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > > > ---
> > > >  tests/gem_exec_parse.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > > 
> > > > diff --git a/tests/gem_exec_parse.c b/tests/gem_exec_parse.c
> > > > index 9e90408..004c3bf 100644
> > > > --- a/tests/gem_exec_parse.c
> > > > +++ b/tests/gem_exec_parse.c
> > > > @@ -257,6 +257,15 @@ igt_main
> > > >  				      -EINVAL));
> > > >  	}
> > > >  
> > > > +	igt_subtest("batch-without-end") {
> > > > +		uint32_t noop[1024] = { 0 };
> > > > +		igt_assert(
> > > > +			   exec_batch(fd, handle,
> > > > +				      noop, sizeof(noop),
> > > > +				      I915_EXEC_RENDER,
> > > > +				      -EINVAL));
> > > 
> > > Cheekier would be
> > > uint32_t empty[] = { MI_NOOP, MI_NOOP, MI_BATCH_BUFFER_END, 0 };
> > > for_each_ring() {
> > > 	igt_assert(exec_batch(fd, handle, empty, sizeof(empty), ring, 0));
> > > 	igt_assert(exec_batch(fd, handle, empty, 8, ring, -EINVAL));
> > > }
> > 
> > On this subject, it should be
> > { INVALID, INVALID, NOOP, NOOP, END, 0}
> > assert(exec(0,  4) == -EINVAL);
> > assert(exec(0,  8) == -EINVAL);
> > assert(exec(0, 12) == -EINVAL);
> > assert(exec(4,  8) == -EINVAL);
> > assert(exec(4, 12) == 0);
> > assert(exec(8, 12) == 0);
> 
> Brad, care to throw these nasties into the test pond, too?

Yeah, I can add that.

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


end of thread, last message 2014-03-25 19:49 UTC

Thread overview: 138+ messages
2013-11-26 16:51 [RFC 00/22] Gen7 batch buffer command parser bradley.d.volkin
2013-11-26 16:51 ` [RFC 01/22] drm/i915: Add data structures for " bradley.d.volkin
2013-11-26 16:51 ` [RFC 02/22] drm/i915: Initial command parser table definitions bradley.d.volkin
2013-11-26 16:51 ` [RFC 03/22] drm/i915: Hook command parser tables up to rings bradley.d.volkin
2013-11-26 16:51 ` [RFC 04/22] drm/i915: Add per-ring command length decode functions bradley.d.volkin
2013-11-26 16:51 ` [RFC 05/22] drm/i915: Implement command parsing bradley.d.volkin
2013-11-26 17:29   ` Chris Wilson
2013-11-26 17:38     ` Volkin, Bradley D
2013-11-26 17:56       ` Chris Wilson
2013-11-26 18:55         ` Volkin, Bradley D
2013-12-05 21:10         ` Volkin, Bradley D
2013-11-26 16:51 ` [RFC 06/22] drm/i915: Add a HAS_CMD_PARSER getparam bradley.d.volkin
2013-11-27 12:51   ` Daniel Vetter
2013-12-05  9:38     ` Kenneth Graunke
2013-12-05 17:22       ` Volkin, Bradley D
2013-12-05 17:26         ` Daniel Vetter
2013-11-26 16:51 ` [RFC 07/22] drm/i915: Add support for rejecting commands during parsing bradley.d.volkin
2013-11-26 16:51 ` [RFC 08/22] drm/i915: Add support for checking register accesses bradley.d.volkin
2013-11-26 16:51 ` [RFC 09/22] drm/i915: Add support for rejecting commands via bitmasks bradley.d.volkin
2013-11-26 16:51 ` [RFC 10/22] drm/i915: Reject unsafe commands bradley.d.volkin
2013-11-26 16:51 ` [RFC 11/22] drm/i915: Add register whitelists for mesa bradley.d.volkin
2013-11-26 16:51 ` [RFC 12/22] drm/i915: Enable register whitelist checks bradley.d.volkin
2013-11-26 16:51 ` [RFC 13/22] drm/i915: Enable bit checking for some commands bradley.d.volkin
2013-11-26 16:51 ` [RFC 14/22] drm/i915: Enable PPGTT command parser checks bradley.d.volkin
2013-11-26 16:51 ` [RFC 15/22] drm/i915: Reject commands that would store to global HWS page bradley.d.volkin
2013-11-26 16:51 ` [RFC 16/22] drm/i915: Reject additional commands bradley.d.volkin
2013-11-26 16:51 ` [RFC 17/22] drm/i915: Add parser data for perf monitoring GL extensions bradley.d.volkin
2013-11-26 16:51 ` [RFC 18/22] drm/i915: Reject MI_ARB_ON_OFF on VECS bradley.d.volkin
2013-11-26 16:51 ` [RFC 19/22] drm/i915: Fix length handling for MFX_WAIT bradley.d.volkin
2013-11-26 16:51 ` [RFC 20/22] drm/i915: Fix MI_STORE_DWORD_IMM parser defintion bradley.d.volkin
2013-11-26 18:08   ` Chris Wilson
2013-11-26 18:55     ` Volkin, Bradley D
2013-11-26 16:51 ` [RFC 21/22] drm/i915: Clean up command parser enable decision bradley.d.volkin
2013-11-26 16:51 ` [RFC 22/22] drm/i915: Enable command parsing by default bradley.d.volkin
2013-11-26 19:35 ` [RFC 00/22] Gen7 batch buffer command parser Daniel Vetter
2013-11-26 20:24   ` Volkin, Bradley D
2013-11-27  1:32     ` ykzhao
2013-11-27  8:10       ` Daniel Vetter
2013-11-27  8:23         ` Xiang, Haihao
2013-11-27  8:31           ` Daniel Vetter
2013-11-27  8:42             ` Xiang, Haihao
2013-11-27  8:47               ` Daniel Vetter
2013-11-27  8:54                 ` Xiang, Haihao
2013-11-27  8:55                 ` ykzhao
2013-12-04  8:13     ` Daniel Vetter
2013-12-04  8:22       ` Daniel Vetter
2013-12-05  1:40       ` Volkin, Bradley D
2013-12-05  7:48         ` Daniel Vetter
2013-12-05 20:47     ` Volkin, Bradley D
2013-12-05 23:42       ` Daniel Vetter
2013-11-27  1:26   ` Xiang, Haihao
2013-12-11  0:58   ` Volkin, Bradley D
2013-12-11  9:54     ` Daniel Vetter
2013-12-11 18:04       ` Volkin, Bradley D
2013-12-11 18:46         ` Daniel Vetter
2014-01-29 21:55 ` [PATCH 00/13] " bradley.d.volkin
2014-01-29 21:55   ` [PATCH 01/13] drm/i915: Refactor shmem pread setup bradley.d.volkin
2014-01-30  8:36     ` Daniel Vetter
2014-01-29 21:55   ` [PATCH 02/13] drm/i915: Implement command buffer parsing logic bradley.d.volkin
2014-01-29 22:28     ` Chris Wilson
2014-01-30  8:53       ` Daniel Vetter
2014-01-30  9:05         ` Daniel Vetter
2014-01-30  9:12           ` Daniel Vetter
2014-01-30 11:07             ` Daniel Vetter
2014-01-30 18:05               ` Volkin, Bradley D
2014-02-03 23:00                 ` Volkin, Bradley D
2014-02-04 10:20                   ` Daniel Vetter
2014-02-04 18:45                     ` Volkin, Bradley D
2014-02-04 19:33                       ` Daniel Vetter
2014-02-05  0:56                         ` Volkin, Bradley D
2014-01-30 17:55             ` Volkin, Bradley D
2014-01-30  9:07     ` Daniel Vetter
2014-01-30 10:57       ` Chris Wilson
2014-02-05 15:15     ` Jani Nikula
2014-02-05 18:36       ` Volkin, Bradley D
2014-02-07 13:58     ` Jani Nikula
2014-02-07 14:45       ` Daniel Vetter
2014-02-11 18:12         ` Volkin, Bradley D
2014-02-11 18:21           ` Jani Nikula
2014-01-29 21:55   ` [PATCH 03/13] drm/i915: Initial command parser table definitions bradley.d.volkin
2014-02-05 14:22     ` Jani Nikula
2014-01-29 21:55   ` [PATCH 04/13] drm/i915: Reject privileged commands bradley.d.volkin
2014-02-05 15:22     ` Jani Nikula
2014-02-05 18:42       ` Volkin, Bradley D
2014-01-29 21:55   ` [PATCH 05/13] drm/i915: Allow some privileged commands from master bradley.d.volkin
2014-01-29 21:55   ` [PATCH 06/13] drm/i915: Add register whitelists for mesa bradley.d.volkin
2014-02-05 15:29     ` Jani Nikula
2014-02-05 18:47       ` Volkin, Bradley D
2014-01-29 21:55   ` [PATCH 07/13] drm/i915: Add register whitelist for DRM master bradley.d.volkin
2014-01-29 22:37     ` Chris Wilson
2014-01-29 23:18       ` Volkin, Bradley D
2014-01-30  9:02         ` Daniel Vetter
     [not found]           ` <20140130172206.GA26611@vpg-ubuntu-bdvolkin>
2014-01-30 20:41             ` Daniel Vetter
2014-01-29 21:55   ` [PATCH 08/13] drm/i915: Enable register whitelist checks bradley.d.volkin
2014-02-05 15:33     ` Jani Nikula
2014-02-05 18:49       ` Volkin, Bradley D
2014-01-29 21:55   ` [PATCH 09/13] drm/i915: Reject commands that explicitly generate interrupts bradley.d.volkin
2014-01-29 21:55   ` [PATCH 10/13] drm/i915: Enable PPGTT command parser checks bradley.d.volkin
2014-01-29 22:33     ` Chris Wilson
2014-01-29 23:00       ` Volkin, Bradley D
2014-01-29 23:08         ` Chris Wilson
2014-02-05 15:37     ` Jani Nikula
2014-02-05 18:54       ` Volkin, Bradley D
2014-01-29 21:55   ` [PATCH 11/13] drm/i915: Reject commands that would store to global HWS page bradley.d.volkin
2014-02-05 15:39     ` Jani Nikula
2014-01-29 21:55   ` [PATCH 12/13] drm/i915: Add a CMD_PARSER_VERSION getparam bradley.d.volkin
2014-01-30  9:19     ` Daniel Vetter
2014-01-30 17:25       ` Volkin, Bradley D
2014-01-29 21:55   ` [PATCH 13/13] drm/i915: Enable command parsing by default bradley.d.volkin
2014-01-29 22:11   ` [PATCH 00/13] Gen7 batch buffer command parser Daniel Vetter
2014-01-29 22:22     ` Volkin, Bradley D
2014-01-29 23:31       ` Daniel Vetter
2014-02-05 15:41   ` Jani Nikula
2014-01-29 21:57 ` [PATCH] intel: Merge i915_drm.h with cmd parser define bradley.d.volkin
2014-01-29 22:13   ` Chris Wilson
2014-01-29 22:26     ` Volkin, Bradley D
2014-01-30  9:20       ` Daniel Vetter
2014-01-30 17:28         ` Volkin, Bradley D
2014-02-04 10:26           ` Daniel Vetter
2014-01-29 21:58 ` [PATCH 1/6] tests: Add a test for the command parser bradley.d.volkin
2014-01-29 21:58   ` [PATCH 2/6] tests/gem_exec_parse: Add tests for rejected commands bradley.d.volkin
2014-01-29 21:58   ` [PATCH 3/6] tests/gem_exec_parse: Add tests for register whitelist bradley.d.volkin
2014-01-29 21:58   ` [PATCH 4/6] tests/gem_exec_parse: Add tests for bitmask checks bradley.d.volkin
2014-01-29 21:58   ` [PATCH 5/6] tests/gem_exec_parse: Test for batches w/o MI_BATCH_BUFFER_END bradley.d.volkin
2014-01-29 22:10     ` Chris Wilson
2014-01-30 11:46       ` Chris Wilson
2014-03-25 13:17         ` Daniel Vetter
2014-03-25 19:49           ` Volkin, Bradley D
2014-01-29 21:58   ` [PATCH 6/6] tests/gem_exec_parse: Test a command crossing a page boundary bradley.d.volkin
2014-01-29 22:12     ` Chris Wilson
2014-03-25 13:20       ` Daniel Vetter
2014-02-05 10:28 ` [RFC 00/22] Gen7 batch buffer command parser Chris Wilson
2014-02-05 18:18   ` Volkin, Bradley D
2014-02-05 18:25     ` Chris Wilson
2014-02-05 18:30     ` Daniel Vetter
2014-02-05 19:00       ` Volkin, Bradley D
2014-02-05 19:17         ` Daniel Vetter
2014-02-05 19:55           ` Volkin, Bradley D
