* [PATCH v2 0/2] render state initialization (bdw rc6)
@ 2014-05-06 13:26 Mika Kuoppala
  2014-05-06 13:26 ` [PATCH v2 1/2] drm/i915: add render state initialization Mika Kuoppala
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Mika Kuoppala @ 2014-05-06 13:26 UTC
  To: intel-gfx; +Cc: ben, miku, kristen

Hi,

V2 series of the render state initialization patches.

I decided not to pursue copying the context object, as the ctx
is quite big, at least on bdw. As discussed on IRC, the copying
could be done with the blitter at context creation time, but even then we
would need to wait for it to complete. Pushing 1 kbyte of commands doesn't
sound so bad when the alternative is to copy 18 pages.

The state generators can be found here, but they are not needed for testing:
http://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=null_state_gen

Here is the branch for testing:
http://cgit.freedesktop.org/~miku/drm-intel/log/?h=render_state

Thank you to all who provided feedback.
-Mika

Mika Kuoppala (2):
  drm/i915: add render state initialization
  drm/i915: add null render states for gen6, gen7 and gen8

 drivers/gpu/drm/i915/Makefile                 |    6 +
 drivers/gpu/drm/i915/i915_drv.h               |    2 +
 drivers/gpu/drm/i915/i915_gem_context.c       |    6 +
 drivers/gpu/drm/i915/i915_gem_render_state.c  |  186 ++++++++++
 drivers/gpu/drm/i915/intel_renderstate.h      |   48 +++
 drivers/gpu/drm/i915/intel_renderstate_gen6.c |  289 +++++++++++++++
 drivers/gpu/drm/i915/intel_renderstate_gen7.c |  253 +++++++++++++
 drivers/gpu/drm/i915/intel_renderstate_gen8.c |  479 +++++++++++++++++++++++++
 8 files changed, 1269 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate.h
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen6.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen7.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen8.c

-- 
1.7.9.5

* [PATCH v2 1/2] drm/i915: add render state initialization
  2014-05-06 13:26 [PATCH v2 0/2] render state initialization (bdw rc6) Mika Kuoppala
@ 2014-05-06 13:26 ` Mika Kuoppala
  2014-05-06 13:41   ` Chris Wilson
  2014-05-06 13:26 ` [PATCH v2 2/2] drm/i915: add null render states for gen6, gen7 and gen8 Mika Kuoppala
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Mika Kuoppala @ 2014-05-06 13:26 UTC
  To: intel-gfx; +Cc: ben, miku, kristen

HW guys say that it is not a cool idea to let the device
go into rc6 without a proper 3D pipeline state.

For each new uninitialized context, generate a
valid null render state to be run on context
creation.

This patch introduces a skeleton with empty states.

v2: - No need to vmap (Chris Wilson)
    - use .c files for state (Daniel Vetter)
    - no need to flush as i915_add_request does it
    - remove parameter for batch alloc size
    - don't wait for the init (Ben Widawsky)

Tested-by: Kristen Carlson Accardi <kristen@linux.intel.com> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |    6 +
 drivers/gpu/drm/i915/i915_drv.h               |    2 +
 drivers/gpu/drm/i915/i915_gem_context.c       |    6 +
 drivers/gpu/drm/i915/i915_gem_render_state.c  |  186 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_renderstate.h      |   48 +++++++
 drivers/gpu/drm/i915/intel_renderstate_gen6.c |   10 ++
 drivers/gpu/drm/i915/intel_renderstate_gen7.c |   10 ++
 drivers/gpu/drm/i915/intel_renderstate_gen8.c |   10 ++
 8 files changed, 278 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate.h
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen6.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen7.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen8.c
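
The relocation pass in render_state_setup() below can be hard to follow
inside the diff, so here is a minimal userspace sketch of the same idea.
The struct and function names (null_state, apply_relocs) and the "wide"
flag are assumptions made for illustration only; they are not part of the
patch. The sketch copies a template batch, adds a base address to every
dword listed in the relocation table, and for the 64-bit case writes the
upper half into the following dword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace sketch of the relocation pass in render_state_setup().
 * All names here are illustrative; only the logic mirrors the patch.
 */
struct null_state {
	const uint32_t *reloc;	/* byte offsets into batch, ascending */
	size_t reloc_items;
	const uint32_t *batch;	/* template dwords */
	size_t batch_items;
};

/*
 * Copy the template into dst, adding base to every relocated dword.
 * With wide != 0 (gen >= 8 in the patch) each relocation spans two
 * dwords: the low 32 bits first, then the high 32 bits, and the
 * template's upper dword must be zero.
 * Returns 0 on success, -1 on a malformed template or a short dst.
 */
static int apply_relocs(const struct null_state *ns, uint64_t base, int wide,
			uint32_t *dst, size_t dst_items)
{
	size_t i = 0, r = 0;

	assert(ns != NULL && dst != NULL);

	if (ns->batch_items > dst_items)
		return -1;

	while (i < ns->batch_items) {
		uint32_t s = ns->batch[i];

		if (r < ns->reloc_items && i * 4 == ns->reloc[r]) {
			s += (uint32_t)(base & 0xffffffffu);

			if (wide) {
				if (i + 1 >= ns->batch_items ||
				    ns->batch[i + 1] != 0)
					return -1;

				dst[i++] = s;
				s = (uint32_t)(base >> 32);
			}

			r++;
		}

		dst[i++] = s;
	}

	/* Every relocation entry must have matched a dword. */
	return r == ns->reloc_items ? 0 : -1;
}
```

With a four-dword template whose only relocation is at byte offset 4,
calling apply_relocs() with base 0x10000 and wide == 0 turns a template
dword of 0x00000001 into 0x00010001 and leaves the other dwords unchanged.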

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index b1445b7..2446916 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -18,6 +18,7 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
 # GEM code
 i915-y += i915_cmd_parser.o \
 	  i915_gem_context.o \
+	  i915_gem_render_state.o \
 	  i915_gem_debug.o \
 	  i915_gem_dmabuf.o \
 	  i915_gem_evict.o \
@@ -32,6 +33,11 @@ i915-y += i915_cmd_parser.o \
 	  intel_ringbuffer.o \
 	  intel_uncore.o
 
+# autogenerated null render state
+i915-y += intel_renderstate_gen6.o \
+	  intel_renderstate_gen7.o \
+	  intel_renderstate_gen8.o
+
 # modesetting core code
 i915-y += intel_bios.o \
 	  intel_display.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3fc2e3d..a2fc605 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2336,6 +2336,8 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 				   struct drm_file *file);
 
+/* i915_gem_render_state.c */
+int i915_gem_render_state_init(struct intel_ring_buffer *ring);
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f77b4c1..f7ad59e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -699,6 +699,12 @@ static int do_switch(struct intel_ring_buffer *ring,
 		/* obj is kept alive until the next request by its active ref */
 		i915_gem_object_ggtt_unpin(from->obj);
 		i915_gem_context_unreference(from);
+	} else {
+		if (to->is_initialized == false) {
+			ret = i915_gem_render_state_init(ring);
+			if (ret)
+				DRM_ERROR("init render state: %d\n", ret);
+		}
 	}
 
 	to->is_initialized = true;
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
new file mode 100644
index 0000000..48907ab
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -0,0 +1,186 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Mika Kuoppala <mika.kuoppala@intel.com>
+ *
+ */
+
+#include "i915_drv.h"
+#include "intel_renderstate.h"
+
+struct i915_render_state {
+	struct drm_i915_gem_object *obj;
+	unsigned long ggtt_offset;
+	void *batch;
+	u32 size;
+	u32 len;
+};
+
+static struct i915_render_state *render_state_alloc(struct drm_device *dev)
+{
+	struct i915_render_state *so;
+	struct page *page;
+	int ret;
+
+	so = kzalloc(sizeof(*so), GFP_KERNEL);
+	if (!so)
+		return ERR_PTR(-ENOMEM);
+
+	so->obj = i915_gem_alloc_object(dev, 4096);
+	if (so->obj == NULL) {
+		ret = -ENOMEM;
+		goto free;
+	}
+	so->size = 4096;
+
+	ret = i915_gem_obj_ggtt_pin(so->obj, 4096, 0);
+	if (ret)
+		goto free_gem;
+
+	BUG_ON(so->obj->pages->nents != 1);
+	page = sg_page(so->obj->pages->sgl);
+
+	so->batch = kmap(page);
+	if (!so->batch) {
+		ret = -ENOMEM;
+		goto unpin;
+	}
+
+	so->ggtt_offset = i915_gem_obj_ggtt_offset(so->obj);
+
+	return so;
+unpin:
+	i915_gem_object_ggtt_unpin(so->obj);
+free_gem:
+	drm_gem_object_unreference(&so->obj->base);
+free:
+	kfree(so);
+	return ERR_PTR(ret);
+}
+
+static void render_state_free(struct i915_render_state *so)
+{
+	kunmap(kmap_to_page(so->batch));
+	i915_gem_object_ggtt_unpin(so->obj);
+	drm_gem_object_unreference(&so->obj->base);
+	kfree(so);
+}
+
+static const struct intel_renderstate_rodata *
+render_state_get_rodata(const int gen)
+{
+	switch (gen) {
+	case 6:
+		return &gen6_null_state;
+	case 7:
+		return &gen7_null_state;
+	case 8:
+		return &gen8_null_state;
+	}
+
+	return NULL;
+}
+
+static int render_state_setup(const int gen, struct i915_render_state *so)
+{
+	const struct intel_renderstate_rodata *rodata;
+	const u64 goffset = i915_gem_obj_ggtt_offset(so->obj);
+	u32 reloc_index = 0;
+	u32 * const d = so->batch;
+	unsigned int i = 0;
+
+	rodata = render_state_get_rodata(gen);
+	if (rodata == NULL)
+		return -ENOENT;
+
+	if (rodata->batch_items * 4 > so->size)
+		return -EINVAL;
+
+	while (i < rodata->batch_items) {
+		u32 s = rodata->batch[i];
+
+		if (reloc_index < rodata->reloc_items &&
+		    i * 4  == rodata->reloc[reloc_index]) {
+
+			s += goffset & 0xffffffff;
+
+			/* We keep batch offsets max 32bit */
+			if (gen >= 8) {
+				if (i + 1 >= rodata->batch_items ||
+				    rodata->batch[i + 1] != 0)
+					return -EINVAL;
+
+				d[i] = s;
+				i++;
+				s = (goffset & 0xffffffff00000000ull) >> 32;
+			}
+
+			reloc_index++;
+		}
+
+		d[i] = s;
+		i++;
+	}
+
+	if (rodata->reloc_items != reloc_index) {
+		DRM_ERROR("not all relocs resolved, %d out of %d\n",
+			  reloc_index, rodata->reloc_items);
+		return -EINVAL;
+	}
+
+	so->len = rodata->batch_items * 4;
+
+	return 0;
+}
+
+int i915_gem_render_state_init(struct intel_ring_buffer *ring)
+{
+	const int gen = INTEL_INFO(ring->dev)->gen;
+	struct i915_render_state *so;
+	u32 seqno;
+	int ret;
+
+	if (gen < 6)
+		return 0;
+
+	so = render_state_alloc(ring->dev);
+	if (IS_ERR(so))
+		return PTR_ERR(so);
+
+	ret = render_state_setup(gen, so);
+	if (ret)
+		goto out;
+
+	ret = ring->dispatch_execbuffer(ring,
+					i915_gem_obj_ggtt_offset(so->obj),
+					so->len,
+					I915_DISPATCH_SECURE);
+	if (ret)
+		goto out;
+
+	ret = i915_add_request(ring, &seqno);
+
+out:
+	render_state_free(so);
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
new file mode 100644
index 0000000..a5e783a
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _INTEL_RENDERSTATE_H
+#define _INTEL_RENDERSTATE_H
+
+#include <linux/types.h>
+
+struct intel_renderstate_rodata {
+	const u32 *reloc;
+	const u32 reloc_items;
+	const u32 *batch;
+	const u32 batch_items;
+};
+
+extern const struct intel_renderstate_rodata gen6_null_state;
+extern const struct intel_renderstate_rodata gen7_null_state;
+extern const struct intel_renderstate_rodata gen8_null_state;
+
+#define RO_RENDERSTATE(_g)						\
+	const struct intel_renderstate_rodata gen ## _g ## _null_state = { \
+		.reloc = gen ## _g ## _null_state_relocs,		\
+		.reloc_items = sizeof(gen ## _g ## _null_state_relocs)/4, \
+		.batch = gen ## _g ## _null_state_batch,		\
+		.batch_items = sizeof(gen ## _g ## _null_state_batch)/4, \
+	}
+
+#endif /* INTEL_RENDERSTATE_H */
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen6.c b/drivers/gpu/drm/i915/intel_renderstate_gen6.c
new file mode 100644
index 0000000..5ed251a
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen6.c
@@ -0,0 +1,10 @@
+#include "intel_renderstate.h"
+
+static const u32 gen6_null_state_relocs[] = {
+};
+
+static const u32 gen6_null_state_batch[] = {
+	0x0a << 23, /* MI_BATCH_BUFFER_END */
+};
+
+RO_RENDERSTATE(6);
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen7.c b/drivers/gpu/drm/i915/intel_renderstate_gen7.c
new file mode 100644
index 0000000..5333f44
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen7.c
@@ -0,0 +1,10 @@
+#include "intel_renderstate.h"
+
+static const u32 gen7_null_state_relocs[] = {
+};
+
+static const u32 gen7_null_state_batch[] = {
+	0x0a << 23, /* MI_BATCH_BUFFER_END */
+};
+
+RO_RENDERSTATE(7);
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen8.c b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
new file mode 100644
index 0000000..88c3733
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
@@ -0,0 +1,10 @@
+#include "intel_renderstate.h"
+
+static const u32 gen8_null_state_relocs[] = {
+};
+
+static const u32 gen8_null_state_batch[] = {
+	0x0a << 23, /* MI_BATCH_BUFFER_END */
+};
+
+RO_RENDERSTATE(8);
-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH v2 2/2] drm/i915: add null render states for gen6, gen7 and gen8
  2014-05-06 13:26 [PATCH v2 0/2] render state initialization (bdw rc6) Mika Kuoppala
  2014-05-06 13:26 ` [PATCH v2 1/2] drm/i915: add render state initialization Mika Kuoppala
@ 2014-05-06 13:26 ` Mika Kuoppala
  2014-05-06 13:39 ` [PATCH] tools/null_state_gen: generate null render state Mika Kuoppala
  2014-05-14 10:08 ` [PATCH v2 0/2] render state initialization (bdw rc6) Damien Lespiau
  3 siblings, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2014-05-06 13:26 UTC
  To: intel-gfx; +Cc: ben, miku, kristen

These are generated with intel-gpu-tools/tools/null_state_gen

v2: Don't use header file for states (Daniel Vetter)

Tested-by: Kristen Carlson Accardi <kristen@linux.intel.com> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_renderstate_gen6.c |  281 ++++++++++++++-
 drivers/gpu/drm/i915/intel_renderstate_gen7.c |  245 ++++++++++++-
 drivers/gpu/drm/i915/intel_renderstate_gen8.c |  471 ++++++++++++++++++++++++-
 3 files changed, 994 insertions(+), 3 deletions(-)
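
A note for readers of the tables: the setup code in patch 1/2 compares
each relocation entry against i * 4, so the entries are byte offsets into
the batch array, not dword indices. The sketch below is an illustration
only; the helper name reloc_to_index and the "_head" arrays are made up
here, and the data is just the first few values of the gen6 tables in
this patch. It shows how an offset such as 0x20 maps to the dword marked
"reloc".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 12 dwords of gen6_null_state_batch, copied from this patch. */
static const uint32_t gen6_batch_head[] = {
	0x69040000, 0x790d0001, 0x00000000, 0x00000000,
	0x78180000, 0x00000001, 0x61010008, 0x00000000,
	0x00000001,	/* reloc */
	0x00000001,	/* reloc */
	0x00000000,
	0x00000001,	/* reloc */
};

/* First three entries of gen6_null_state_relocs (byte offsets). */
static const uint32_t gen6_relocs_head[] = { 0x20, 0x24, 0x2c };

/*
 * Translate a relocation byte offset into a dword index.
 * Returns 0 and stores the index on success, -1 if the offset is not
 * dword aligned or lies outside a batch of batch_items dwords.
 */
static int reloc_to_index(uint32_t byte_offset, size_t batch_items,
			  size_t *index)
{
	assert(index != NULL);

	if (byte_offset % 4 != 0 || byte_offset / 4 >= batch_items)
		return -1;

	*index = byte_offset / 4;
	return 0;
}
```

For the gen6 table, 0x20, 0x24 and 0x2c resolve to dwords 8, 9 and 11,
which are the three entries annotated with "reloc" near the top of the
batch.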

diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen6.c b/drivers/gpu/drm/i915/intel_renderstate_gen6.c
index 5ed251a..740538a 100644
--- a/drivers/gpu/drm/i915/intel_renderstate_gen6.c
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen6.c
@@ -1,10 +1,289 @@
 #include "intel_renderstate.h"
 
 static const u32 gen6_null_state_relocs[] = {
+	0x00000020,
+	0x00000024,
+	0x0000002c,
+	0x000001e0,
+	0x000001e4,
 };
 
 static const u32 gen6_null_state_batch[] = {
-	0x0a << 23, /* MI_BATCH_BUFFER_END */
+	0x69040000,
+	0x790d0001,
+	0x00000000,
+	0x00000000,
+	0x78180000,
+	0x00000001,
+	0x61010008,
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000001,
+	0x00000000,
+	0x00000001,
+	0x61020000,
+	0x00000000,
+	0x78050001,
+	0x00000018,
+	0x00000000,
+	0x780d1002,
+	0x00000000,
+	0x00000000,
+	0x00000420,
+	0x78150003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78100004,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78160003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78110005,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78120002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78170003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79050005,
+	0xe0040000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x79100000,
+	0x00000000,
+	0x79000002,
+	0xffffffff,
+	0x00000000,
+	0x00000000,
+	0x780e0002,
+	0x00000441,
+	0x00000401,
+	0x00000401,
+	0x78021002,
+	0x00000000,
+	0x00000000,
+	0x00000400,
+	0x78130012,
+	0x00400810,
+	0x00000000,
+	0x20000000,
+	0x04000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78140007,
+	0x00000280,
+	0x08080000,
+	0x00000000,
+	0x00060000,
+	0x4e080002,
+	0x00100400,
+	0x00000000,
+	0x00000000,
+	0x78090005,
+	0x02000000,
+	0x22220000,
+	0x02f60000,
+	0x11330000,
+	0x02850004,
+	0x11220000,
+	0x78011002,
+	0x00000000,
+	0x00000000,
+	0x00000200,
+	0x78080003,
+	0x00002000,
+	0x00000448,	 /* reloc */
+	0x00000448,	 /* reloc */
+	0x00000000,
+	0x05000000,	 /* cmds end */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000220,	 /* state start */
+	0x00000240,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0060005a,
+	0x204077be,
+	0x000000c0,
+	0x008d0040,
+	0x0060005a,
+	0x206077be,
+	0x000000c0,
+	0x008d0080,
+	0x0060005a,
+	0x208077be,
+	0x000000d0,
+	0x008d0040,
+	0x0060005a,
+	0x20a077be,
+	0x000000d0,
+	0x008d0080,
+	0x00000201,
+	0x20080061,
+	0x00000000,
+	0x00000000,
+	0x00600001,
+	0x20200022,
+	0x008d0000,
+	0x00000000,
+	0x02800031,
+	0x21c01cc9,
+	0x00000020,
+	0x0a8a0001,
+	0x00600001,
+	0x204003be,
+	0x008d01c0,
+	0x00000000,
+	0x00600001,
+	0x206003be,
+	0x008d01e0,
+	0x00000000,
+	0x00600001,
+	0x208003be,
+	0x008d0200,
+	0x00000000,
+	0x00600001,
+	0x20a003be,
+	0x008d0220,
+	0x00000000,
+	0x00600001,
+	0x20c003be,
+	0x008d0240,
+	0x00000000,
+	0x00600001,
+	0x20e003be,
+	0x008d0260,
+	0x00000000,
+	0x00600001,
+	0x210003be,
+	0x008d0280,
+	0x00000000,
+	0x00600001,
+	0x212003be,
+	0x008d02a0,
+	0x00000000,
+	0x05800031,
+	0x24001cc8,
+	0x00000040,
+	0x90019000,
+	0x0000007e,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0000007e,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0000007e,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0000007e,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0000007e,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0000007e,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0000007e,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0000007e,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x30000000,
+	0x00000124,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0xf99a130c,
+	0x799a130c,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x80000031,
+	0x00000003,
+	0x00000000,	 /* state end */
 };
 
 RO_RENDERSTATE(6);
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen7.c b/drivers/gpu/drm/i915/intel_renderstate_gen7.c
index 5333f44..6fa7ff2 100644
--- a/drivers/gpu/drm/i915/intel_renderstate_gen7.c
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen7.c
@@ -1,10 +1,253 @@
 #include "intel_renderstate.h"
 
 static const u32 gen7_null_state_relocs[] = {
+	0x0000000c,
+	0x00000010,
+	0x00000018,
+	0x000001ec,
 };
 
 static const u32 gen7_null_state_batch[] = {
-	0x0a << 23, /* MI_BATCH_BUFFER_END */
+	0x69040000,
+	0x61010008,
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000001,
+	0x00000000,
+	0x00000001,
+	0x790d0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78180000,
+	0x00000001,
+	0x79160000,
+	0x00000008,
+	0x78300000,
+	0x02010040,
+	0x78310000,
+	0x04000000,
+	0x78320000,
+	0x04000000,
+	0x78330000,
+	0x02000000,
+	0x78100004,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781b0005,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781c0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781d0004,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78110005,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78120002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78210000,
+	0x00000000,
+	0x78130005,
+	0x00000000,
+	0x20000000,
+	0x04000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78140001,
+	0x20000800,
+	0x00000000,
+	0x781e0001,
+	0x00000000,
+	0x00000000,
+	0x78050005,
+	0xe0040000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78040001,
+	0x00000000,
+	0x00000000,
+	0x78240000,
+	0x00000240,
+	0x78230000,
+	0x00000260,
+	0x782f0000,
+	0x00000280,
+	0x781f000c,
+	0x00400810,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78200006,
+	0x000002c0,
+	0x08080000,
+	0x00000000,
+	0x28000402,
+	0x00060000,
+	0x00000000,
+	0x00000000,
+	0x78090005,
+	0x02000000,
+	0x22220000,
+	0x02f60000,
+	0x11230000,
+	0x02f60004,
+	0x11230000,
+	0x78080003,
+	0x00006008,
+	0x00000340,	 /* reloc */
+	0xffffffff,
+	0x00000000,
+	0x782a0000,
+	0x00000360,
+	0x79000002,
+	0xffffffff,
+	0x00000000,
+	0x00000000,
+	0x7b000005,
+	0x0000000f,
+	0x00000003,
+	0x00000000,
+	0x00000001,
+	0x00000000,
+	0x00000000,
+	0x05000000,	 /* cmds end */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000031,	 /* state start */
+	0x00000003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0xf99a130c,
+	0x799a130c,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000492,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0080005a,
+	0x2e2077bd,
+	0x000000c0,
+	0x008d0040,
+	0x0080005a,
+	0x2e6077bd,
+	0x000000d0,
+	0x008d0040,
+	0x02800031,
+	0x21801fa9,
+	0x008d0e20,
+	0x08840001,
+	0x00800001,
+	0x2e2003bd,
+	0x008d0180,
+	0x00000000,
+	0x00800001,
+	0x2e6003bd,
+	0x008d01c0,
+	0x00000000,
+	0x00800001,
+	0x2ea003bd,
+	0x008d0200,
+	0x00000000,
+	0x00800001,
+	0x2ee003bd,
+	0x008d0240,
+	0x00000000,
+	0x05800031,
+	0x20001fa8,
+	0x008d0e20,
+	0x90031000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000380,
+	0x000003a0,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,	 /* state end */
 };
 
 RO_RENDERSTATE(7);
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen8.c b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
index 88c3733..7b10ed3 100644
--- a/drivers/gpu/drm/i915/intel_renderstate_gen8.c
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
@@ -1,10 +1,479 @@
 #include "intel_renderstate.h"
 
 static const u32 gen8_null_state_relocs[] = {
+	0x00000048,
+	0x00000050,
+	0x00000060,
+	0x000003ec,
 };
 
 static const u32 gen8_null_state_batch[] = {
-	0x0a << 23, /* MI_BATCH_BUFFER_END */
+	0x69040000,
+	0x61020001,
+	0x00000000,
+	0x00000000,
+	0x79120000,
+	0x00000000,
+	0x79130000,
+	0x00000000,
+	0x79140000,
+	0x00000000,
+	0x79150000,
+	0x00000000,
+	0x79160000,
+	0x00000000,
+	0x6101000e,
+	0x00000001,
+	0x00000000,
+	0x00000001,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000001,	 /* reloc */
+	0x00000000,
+	0xfffff001,
+	0x00001001,
+	0xfffff001,
+	0x00001001,
+	0x78230000,
+	0x000006e0,
+	0x78210000,
+	0x00000700,
+	0x78300000,
+	0x04010040,
+	0x78330000,
+	0x04000000,
+	0x78310000,
+	0x04000000,
+	0x78320000,
+	0x04000000,
+	0x78240000,
+	0x00000641,
+	0x780e0000,
+	0x00000601,
+	0x780d0000,
+	0x00000000,
+	0x78180000,
+	0x00000001,
+	0x78520003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78190009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781b0007,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78270000,
+	0x00000000,
+	0x782c0000,
+	0x00000000,
+	0x781c0002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78160009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78110008,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78290000,
+	0x00000000,
+	0x782e0000,
+	0x00000000,
+	0x781a0009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781d0007,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78280000,
+	0x00000000,
+	0x782d0000,
+	0x00000000,
+	0x78260000,
+	0x00000000,
+	0x782b0000,
+	0x00000000,
+	0x78150009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78100007,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781e0003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78120002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x781f0002,
+	0x30400820,
+	0x00000000,
+	0x00000000,
+	0x78510009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78500003,
+	0x00210000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78130002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x782a0000,
+	0x00000480,
+	0x782f0000,
+	0x00000540,
+	0x78140000,
+	0x00000800,
+	0x78170009,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x7820000a,
+	0x00000580,
+	0x00000000,
+	0x08080000,
+	0x00000000,
+	0x00000000,
+	0x1f000002,
+	0x00060000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x784d0000,
+	0x40000000,
+	0x784f0000,
+	0x80000100,
+	0x780f0000,
+	0x00000740,
+	0x78050006,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78070003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78060003,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x78040001,
+	0x00000000,
+	0x00000001,
+	0x79000002,
+	0xffffffff,
+	0x00000000,
+	0x00000000,
+	0x78080003,
+	0x00006000,
+	0x000005e0,	 /* reloc */
+	0x00000000,
+	0x00000000,
+	0x78090005,
+	0x02000000,
+	0x22220000,
+	0x02f60000,
+	0x11230000,
+	0x02850004,
+	0x11230000,
+	0x784b0000,
+	0x0000000f,
+	0x78490001,
+	0x00000000,
+	0x00000000,
+	0x7b000005,
+	0x00000000,
+	0x00000003,
+	0x00000000,
+	0x00000001,
+	0x00000000,
+	0x00000000,
+	0x05000000,	 /* cmds end */
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x000004c0,	 /* state start */
+	0x00000500,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000092,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x0060005a,
+	0x21403ae8,
+	0x3a0000c0,
+	0x008d0040,
+	0x0060005a,
+	0x21603ae8,
+	0x3a0000c0,
+	0x008d0080,
+	0x0060005a,
+	0x21803ae8,
+	0x3a0000d0,
+	0x008d0040,
+	0x0060005a,
+	0x21a03ae8,
+	0x3a0000d0,
+	0x008d0080,
+	0x02800031,
+	0x2e0022e8,
+	0x0e000140,
+	0x08840001,
+	0x05800031,
+	0x200022e0,
+	0x0e000e00,
+	0x90031000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x06200000,
+	0x00000002,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0xf99a130c,
+	0x799a130c,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x3f800000,
+	0x00000000,
+	0x3f800000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,
+	0x00000000,	 /* state end */
 };
 
 RO_RENDERSTATE(8);
-- 
1.7.9.5

* [PATCH] tools/null_state_gen: generate null render state
  2014-05-06 13:26 [PATCH v2 0/2] render state initialization (bdw rc6) Mika Kuoppala
  2014-05-06 13:26 ` [PATCH v2 1/2] drm/i915: add render state initialization Mika Kuoppala
  2014-05-06 13:26 ` [PATCH v2 2/2] drm/i915: add null render states for gen6, gen7 and gen8 Mika Kuoppala
@ 2014-05-06 13:39 ` Mika Kuoppala
  2014-05-06 13:47   ` Chris Wilson
                     ` (4 more replies)
  2014-05-14 10:08 ` [PATCH v2 0/2] render state initialization (bdw rc6) Damien Lespiau
  3 siblings, 5 replies; 19+ messages in thread
From: Mika Kuoppala @ 2014-05-06 13:39 UTC
  To: intel-gfx

Generate a valid (null) render state for each gen. Output
it as a C source file containing the batch and its relocations.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
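Note for reviewers: below is a rough sketch of how a consumer of the
generated arrays might apply the relocation list. Each entry in the
relocs array is a byte offset into the batch, and the dword stored at
that offset is a delta relative to the start of the buffer (see
intel_batch_emit_reloc()). The helper name apply_relocs() and the 32-bit
gpu_base parameter are assumptions made for illustration only; this is
not the kernel-side code from the i915 series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of this patch: add the buffer's base
 * address to every dword that the generator marked as a relocation.
 * Returns 0 on success, -1 if an offset is unaligned or out of range.
 */
static int apply_relocs(uint8_t *batch, size_t batch_size,
			const uint32_t *relocs, size_t num_relocs,
			uint32_t gpu_base)
{
	size_t i;

	assert(batch != NULL);
	assert(relocs != NULL || num_relocs == 0);

	for (i = 0; i < num_relocs; i++) {
		uint32_t value;

		/* Relocations always point at a whole dword in the batch */
		if (relocs[i] % 4 != 0 || batch_size < 4 ||
		    relocs[i] > batch_size - 4)
			return -1;

		/* memcpy avoids alignment and aliasing assumptions */
		memcpy(&value, batch + relocs[i], sizeof(value));
		value += gpu_base;
		memcpy(batch + relocs[i], &value, sizeof(value));
	}

	return 0;
}
```

Offsets that are unaligned or fall outside the buffer are rejected
rather than patched, which mirrors the range check the generator does
on the delta when it records a relocation.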
 configure.ac                                  |    1 +
 lib/gen6_render.h                             |    1 +
 lib/gen7_render.h                             |    1 +
 tools/Makefile.am                             |    4 +-
 tools/null_state_gen/Makefile.am              |   16 +
 tools/null_state_gen/intel_batchbuffer.c      |  173 ++++++
 tools/null_state_gen/intel_batchbuffer.h      |   91 +++
 tools/null_state_gen/intel_null_state_gen.c   |  151 +++++
 tools/null_state_gen/intel_renderstate_gen7.c |  505 ++++++++++++++++
 tools/null_state_gen/intel_renderstate_gen8.c |  764 +++++++++++++++++++++++++
 10 files changed, 1706 insertions(+), 1 deletion(-)
 create mode 100644 tools/null_state_gen/Makefile.am
 create mode 100644 tools/null_state_gen/intel_batchbuffer.c
 create mode 100644 tools/null_state_gen/intel_batchbuffer.h
 create mode 100644 tools/null_state_gen/intel_null_state_gen.c
 create mode 100644 tools/null_state_gen/intel_renderstate_gen7.c
 create mode 100644 tools/null_state_gen/intel_renderstate_gen8.c
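
A note on the output layout: do_generate() runs the generator twice. The
first pass uses an oversized buffer split in half, only to measure how
many bytes of commands and state are needed. The second pass places the
state right after the commands, rounded up to STATE_ALIGN. The sketch
below restates that arithmetic on its own; struct layout and
final_layout() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define STATE_ALIGN 64
/* Same rounding macro as intel_batchbuffer.h; y must be a power of two */
#define ALIGN(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* Hypothetical result type for illustration */
struct layout {
	uint32_t state_start;	/* byte offset where the state area begins */
	uint32_t total;		/* total bytes emitted into the C array */
};

/* Compute the second-pass layout from the sizes measured in pass one */
static struct layout final_layout(uint32_t cmd_len, uint32_t state_len)
{
	struct layout l;

	assert((STATE_ALIGN & (STATE_ALIGN - 1)) == 0);

	l.state_start = ALIGN(cmd_len, STATE_ALIGN);
	l.total = l.state_start + state_len;

	return l;
}
```

This is equivalent to the expression in do_generate(), since
cmd_len + state_len + ALIGN(cmd_len, STATE_ALIGN) - cmd_len reduces to
ALIGN(cmd_len, STATE_ALIGN) + state_len.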

diff --git a/configure.ac b/configure.ac
index b71b100..b848ac3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -211,6 +211,7 @@ AC_CONFIG_FILES([
 		 tests/Makefile
 		 tools/Makefile
 		 tools/quick_dump/Makefile
+		 tools/null_state_gen/Makefile
 		 debugger/Makefile
 		 debugger/system_routine/Makefile
 		 assembler/Makefile
diff --git a/lib/gen6_render.h b/lib/gen6_render.h
index 60dc93e..495cc2e 100644
--- a/lib/gen6_render.h
+++ b/lib/gen6_render.h
@@ -152,6 +152,7 @@
 #define VB0_VERTEXDATA			(0 << 20)
 #define VB0_INSTANCEDATA		(1 << 20)
 #define VB0_BUFFER_PITCH_SHIFT		0
+#define VB0_NULL_VERTEX_BUFFER          (1 << 13)
 
 /* VERTEX_ELEMENT_STATE Structure */
 #define VE0_VERTEX_BUFFER_INDEX_SHIFT	26 /* for GEN6 */
diff --git a/lib/gen7_render.h b/lib/gen7_render.h
index 1661d4c..992d839 100644
--- a/lib/gen7_render.h
+++ b/lib/gen7_render.h
@@ -165,6 +165,7 @@
 #define GEN7_VB0_VERTEXDATA		(0 << 20)
 #define GEN7_VB0_INSTANCEDATA		(1 << 20)
 #define GEN7_VB0_BUFFER_PITCH_SHIFT	0
+#define GEN7_VB0_NULL_VERTEX_BUFFER	(1 << 13)
 #define GEN7_VB0_ADDRESS_MODIFY_ENABLE	(1 << 14)
 
 /* VERTEX_ELEMENT_STATE Structure */
diff --git a/tools/Makefile.am b/tools/Makefile.am
index 151092b..64fa060 100644
--- a/tools/Makefile.am
+++ b/tools/Makefile.am
@@ -1,7 +1,9 @@
 include Makefile.sources
 
+SUBDIRS = null_state_gen
+
 if HAVE_DUMPER
-SUBDIRS = quick_dump
+SUBDIRS += quick_dump
 endif
 
 AM_CPPFLAGS = -I$(top_srcdir) -I$(top_srcdir)/lib
diff --git a/tools/null_state_gen/Makefile.am b/tools/null_state_gen/Makefile.am
new file mode 100644
index 0000000..40d2237
--- /dev/null
+++ b/tools/null_state_gen/Makefile.am
@@ -0,0 +1,16 @@
+bin_PROGRAMS = intel_null_state_gen
+
+intel_null_state_gen_SOURCES = 	\
+	intel_batchbuffer.c \
+	intel_renderstate_gen6.c \
+	intel_renderstate_gen7.c \
+	intel_renderstate_gen8.c \
+	intel_null_state_gen.c
+
+gens := 6 7 8
+
+h = /tmp/intel_renderstate_gen$$gen.c
+state_headers: intel_null_state_gen
+	for gen in $(gens); do \
+		./intel_null_state_gen $$gen >$(h) ;\
+	done
diff --git a/tools/null_state_gen/intel_batchbuffer.c b/tools/null_state_gen/intel_batchbuffer.c
new file mode 100644
index 0000000..62e052a
--- /dev/null
+++ b/tools/null_state_gen/intel_batchbuffer.c
@@ -0,0 +1,173 @@
+/**************************************************************************
+ *
+ * Copyright 2006 Tungsten Graphics, Inc., Cedar Park, Texas.
+ * All Rights Reserved.
+ *
+ * Copyright 2014 Intel Corporation
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **************************************************************************/
+
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+
+#include "intel_batchbuffer.h"
+
+int intel_batch_reset(struct intel_batchbuffer *batch,
+		      void *p,
+		      uint32_t size,
+		      uint32_t off)
+{
+	batch->err = -EINVAL;
+	batch->base = batch->base_ptr = p;
+	batch->state_base = batch->state_ptr = p;
+
+	if (off >= size || ALIGN(off, 4) != off)
+		return -EINVAL;
+
+	batch->size = size;
+
+	batch->state_base = batch->state_ptr = &batch->base[off];
+
+	batch->num_relocs = 0;
+	batch->err = 0;
+
+	return batch->err;
+}
+
+uint32_t intel_batch_state_used(struct intel_batchbuffer *batch)
+{
+	return batch->state_ptr - batch->state_base;
+}
+
+uint32_t intel_batch_state_offset(struct intel_batchbuffer *batch)
+{
+	return batch->state_ptr - batch->base;
+}
+
+void *intel_batch_state_alloc(struct intel_batchbuffer *batch,
+			      uint32_t size,
+			      uint32_t align)
+{
+	uint32_t cur;
+	uint32_t offset;
+
+	if (batch->err)
+		return NULL;
+
+	cur  = intel_batch_state_offset(batch);
+	offset = ALIGN(cur, align);
+
+	if (offset + size > batch->size) {
+		batch->err = -ENOSPC;
+		return NULL;
+	}
+
+	batch->state_ptr = batch->base + offset + size;
+
+	memset(batch->base + cur, 0, offset + size - cur);
+
+	return batch->base + offset;
+}
+
+int intel_batch_offset(struct intel_batchbuffer *batch, const void *ptr)
+{
+	return (uint8_t *)ptr - batch->base;
+}
+
+int intel_batch_state_copy(struct intel_batchbuffer *batch,
+			   const void *ptr,
+			   const uint32_t size,
+			   const uint32_t align)
+{
+	void * const p = intel_batch_state_alloc(batch, size, align);
+
+	if (p == NULL)
+		return -1;
+
+	return intel_batch_offset(batch, memcpy(p, ptr, size));
+}
+
+uint32_t intel_batch_cmds_used(struct intel_batchbuffer *batch)
+{
+	return batch->base_ptr - batch->base;
+}
+
+uint32_t intel_batch_total_used(struct intel_batchbuffer *batch)
+{
+	return batch->state_ptr - batch->base;
+}
+
+static uint32_t intel_batch_space(struct intel_batchbuffer *batch)
+{
+	return batch->state_base - batch->base_ptr;
+}
+
+int intel_batch_emit_dword(struct intel_batchbuffer *batch, uint32_t dword)
+{
+	uint32_t offset;
+
+	if (batch->err)
+		return -1;
+
+	if (intel_batch_space(batch) < 4) {
+		batch->err = -ENOSPC;
+		return -1;
+	}
+
+	offset = intel_batch_offset(batch, batch->base_ptr);
+
+	*(uint32_t *) (batch->base_ptr) = dword;
+	batch->base_ptr += 4;
+
+	return offset;
+}
+
+int intel_batch_emit_reloc(struct intel_batchbuffer *batch,
+			   const uint32_t delta)
+{
+	uint32_t offset;
+
+	if (batch->err)
+		return -1;
+
+	if (delta >= batch->size) {
+		batch->err = -EINVAL;
+		return -1;
+	}
+
+	offset = intel_batch_emit_dword(batch, delta);
+
+	if (batch->err)
+		return -1;
+
+	if (batch->num_relocs >= MAX_RELOCS) {
+		batch->err = -ENOSPC;
+		return -1;
+	}
+
+	batch->relocs[batch->num_relocs++] = offset;
+
+	return offset;
+}
diff --git a/tools/null_state_gen/intel_batchbuffer.h b/tools/null_state_gen/intel_batchbuffer.h
new file mode 100644
index 0000000..f5c29db
--- /dev/null
+++ b/tools/null_state_gen/intel_batchbuffer.h
@@ -0,0 +1,91 @@
+/**************************************************************************
+ *
+ * Copyright 2006 Tungsten Graphics, Inc., Cedar Park, Texas.
+ * All Rights Reserved.
+ *
+ * Copyright 2014 Intel Corporation
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **************************************************************************/
+
+#ifndef _INTEL_BATCHBUFFER_H
+#define _INTEL_BATCHBUFFER_H
+
+#include <stdint.h>
+
+#define MAX_RELOCS 64
+#define ALIGN(x, y) (((x) + (y)-1) & ~((y)-1))
+
+struct intel_batchbuffer {
+	int err;
+	uint8_t *base;
+	uint8_t *base_ptr;
+	uint8_t *state_base;
+	uint8_t *state_ptr;
+	int size;
+
+	uint32_t relocs[MAX_RELOCS];
+	uint32_t num_relocs;
+};
+
+#define OUT_BATCH(d) intel_batch_emit_dword(batch, d)
+#define OUT_RELOC(batch, read_domains, write_domain, delta) \
+	intel_batch_emit_reloc(batch, delta)
+
+int intel_batch_reset(struct intel_batchbuffer *batch,
+		       void *p,
+		       uint32_t size, uint32_t split_off);
+
+uint32_t intel_batch_state_used(struct intel_batchbuffer *batch);
+
+void *intel_batch_state_alloc(struct intel_batchbuffer *batch,
+			      uint32_t size,
+			      uint32_t align);
+
+int intel_batch_offset(struct intel_batchbuffer *batch, const void *ptr);
+
+int intel_batch_state_copy(struct intel_batchbuffer *batch,
+			   const void *ptr,
+			   const uint32_t size,
+			   const uint32_t align);
+
+uint32_t intel_batch_cmds_used(struct intel_batchbuffer *batch);
+
+int intel_batch_emit_dword(struct intel_batchbuffer *batch, uint32_t dword);
+
+int intel_batch_emit_reloc(struct intel_batchbuffer *batch,
+			   const uint32_t delta);
+
+uint32_t intel_batch_total_used(struct intel_batchbuffer *batch);
+
+static inline int intel_batch_error(struct intel_batchbuffer *batch)
+{
+	return batch->err;
+}
+
+static inline uint32_t intel_batch_state_start(struct intel_batchbuffer *batch)
+{
+	return batch->state_base - batch->base;
+}
+
+#endif
diff --git a/tools/null_state_gen/intel_null_state_gen.c b/tools/null_state_gen/intel_null_state_gen.c
new file mode 100644
index 0000000..14f45d3
--- /dev/null
+++ b/tools/null_state_gen/intel_null_state_gen.c
@@ -0,0 +1,151 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <assert.h>
+
+#include "intel_batchbuffer.h"
+
+#define STATE_ALIGN 64
+
+extern int gen6_setup_null_render_state(struct intel_batchbuffer *batch);
+extern int gen7_setup_null_render_state(struct intel_batchbuffer *batch);
+extern int gen8_setup_null_render_state(struct intel_batchbuffer *batch);
+
+static void print_usage(char *s)
+{
+	fprintf(stderr, "%s: <gen>\n"
+		"     gen:     gen to generate for (6,7,8)\n",
+	       s);
+}
+
+static int is_reloc(struct intel_batchbuffer *batch, uint32_t offset)
+{
+	int i;
+
+	for (i = 0; i < batch->num_relocs; i++)
+		if (batch->relocs[i] == offset)
+			return 1;
+
+	return 0;
+}
+
+static int print_state(int gen, struct intel_batchbuffer *batch)
+{
+	int i;
+
+	printf("#include \"intel_renderstate.h\"\n\n");
+
+	printf("static const u32 gen%d_null_state_relocs[] = {\n", gen);
+	for (i = 0; i < batch->num_relocs; i++) {
+		printf("\t0x%08x,\n", batch->relocs[i]);
+	}
+	printf("};\n\n");
+
+	printf("static const u32 gen%d_null_state_batch[] = {\n", gen);
+	for (i = 0; i < batch->size; i += 4) {
+		const uint32_t *p = (const uint32_t *)(batch->base + i);
+		printf("\t0x%08x,", *p);
+
+		if (i == intel_batch_cmds_used(batch) - 4)
+			printf("\t /* cmds end */");
+
+		if (i == intel_batch_state_start(batch))
+			printf("\t /* state start */");
+
+
+		if (i == intel_batch_state_start(batch) +
+		    intel_batch_state_used(batch) - 4)
+			printf("\t /* state end */");
+
+		if (is_reloc(batch, i))
+			printf("\t /* reloc */");
+
+		printf("\n");
+	}
+	printf("};\n\nRO_RENDERSTATE(%d);\n", gen);
+
+	return 0;
+}
+
+static int do_generate(int gen)
+{
+	int initial_size = 8192;
+	struct intel_batchbuffer batch;
+	void *p;
+	int ret = -EINVAL;
+	uint32_t cmd_len, state_len, size;
+	int (*null_state_gen)(struct intel_batchbuffer *batch) = NULL;
+
+	p = malloc(initial_size);
+	if (p == NULL)
+		return -ENOMEM;
+
+	assert(ALIGN(initial_size/2, STATE_ALIGN) == initial_size/2);
+
+	ret = intel_batch_reset(&batch, p, initial_size, initial_size/2);
+	if (ret)
+		goto out;
+
+	switch (gen) {
+	case 6:
+		null_state_gen = gen6_setup_null_render_state;
+		break;
+
+	case 7:
+		null_state_gen = gen7_setup_null_render_state;
+		break;
+
+	case 8:
+		null_state_gen = gen8_setup_null_render_state;
+		break;
+	}
+
+	if (null_state_gen == NULL) {
+		fprintf(stderr, "no generator found for %d\n", gen);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = null_state_gen(&batch);
+	if (ret < 0)
+		goto out;
+
+	cmd_len = intel_batch_cmds_used(&batch);
+	state_len = intel_batch_state_used(&batch);
+
+	size = cmd_len + state_len + ALIGN(cmd_len, STATE_ALIGN) - cmd_len;
+
+	ret = intel_batch_reset(&batch, p, size, ALIGN(cmd_len, STATE_ALIGN));
+	if (ret)
+		goto out;
+
+	ret = null_state_gen(&batch);
+	if (ret < 0)
+		goto out;
+
+	assert(cmd_len == intel_batch_cmds_used(&batch));
+	assert(state_len == intel_batch_state_used(&batch));
+	assert(size == ret);
+
+	/* Batch buffer needs to end */
+	assert(*(uint32_t *)(p + cmd_len - 4) == (0xA << 23));
+
+	ret = print_state(gen, &batch);
+out:
+	free(p);
+
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	if (argc != 2) {
+		print_usage(argv[0]);
+		return 1;
+	}
+
+	return do_generate(atoi(argv[1]));
+}
diff --git a/tools/null_state_gen/intel_renderstate_gen7.c b/tools/null_state_gen/intel_renderstate_gen7.c
new file mode 100644
index 0000000..8fe8a80
--- /dev/null
+++ b/tools/null_state_gen/intel_renderstate_gen7.c
@@ -0,0 +1,505 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+
+#include "intel_batchbuffer.h"
+#include <lib/gen7_render.h>
+#include <lib/intel_reg.h>
+#include <stdio.h>
+
+static const uint32_t ps_kernel[][4] = {
+	{ 0x0080005a, 0x2e2077bd, 0x000000c0, 0x008d0040 },
+	{ 0x0080005a, 0x2e6077bd, 0x000000d0, 0x008d0040 },
+	{ 0x02800031, 0x21801fa9, 0x008d0e20, 0x08840001 },
+	{ 0x00800001, 0x2e2003bd, 0x008d0180, 0x00000000 },
+	{ 0x00800001, 0x2e6003bd, 0x008d01c0, 0x00000000 },
+	{ 0x00800001, 0x2ea003bd, 0x008d0200, 0x00000000 },
+	{ 0x00800001, 0x2ee003bd, 0x008d0240, 0x00000000 },
+	{ 0x05800031, 0x20001fa8, 0x008d0e20, 0x90031000 },
+};
+
+static uint32_t
+gen7_bind_buf_null(struct intel_batchbuffer *batch)
+{
+	uint32_t *ss;
+
+	ss = intel_batch_state_alloc(batch, 8 * sizeof(*ss), 32);
+	if (ss == NULL)
+		return -1;
+
+	ss[0] = 0;
+	ss[1] = 0;
+	ss[2] = 0;
+	ss[3] = 0;
+	ss[4] = 0;
+	ss[5] = 0;
+	ss[6] = 0;
+	ss[7] = 0;
+
+	return intel_batch_offset(batch, ss);
+}
+
+static void
+gen7_emit_vertex_elements(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_VERTEX_ELEMENTS |
+		  ((2 * (1 + 2)) + 1 - 2));
+
+	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
+		  GEN7_SURFACEFORMAT_R32G32B32A32_FLOAT <<
+		  GEN7_VE0_FORMAT_SHIFT |
+		  0 << GEN7_VE0_OFFSET_SHIFT);
+
+	OUT_BATCH(GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_0_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_1_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_3_SHIFT);
+
+	/* x,y */
+	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
+		  GEN7_SURFACEFORMAT_R16G16_SSCALED << GEN7_VE0_FORMAT_SHIFT |
+		  0 << GEN7_VE0_OFFSET_SHIFT); /* offsets vb in bytes */
+	OUT_BATCH(GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_0_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_1_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_1_FLT << GEN7_VE1_VFCOMPONENT_3_SHIFT);
+
+	/* s,t */
+	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
+		  GEN7_SURFACEFORMAT_R16G16_SSCALED << GEN7_VE0_FORMAT_SHIFT |
+		  4 << GEN7_VE0_OFFSET_SHIFT);  /* offset vb in bytes */
+	OUT_BATCH(GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_0_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_1_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_1_FLT << GEN7_VE1_VFCOMPONENT_3_SHIFT);
+}
+
+static uint32_t
+gen7_create_vertex_buffer(struct intel_batchbuffer *batch)
+{
+	uint16_t *v;
+
+	v = intel_batch_state_alloc(batch, 12*sizeof(*v), 8);
+	if (v == NULL)
+		return -1;
+
+	v[0] = 0;
+	v[1] = 0;
+	v[2] = 0;
+	v[3] = 0;
+
+	v[4] = 0;
+	v[5] = 0;
+	v[6] = 0;
+	v[7] = 0;
+
+	v[8] = 0;
+	v[9] = 0;
+	v[10] = 0;
+	v[11] = 0;
+
+	return intel_batch_offset(batch, v);
+}
+
+static void gen7_emit_vertex_buffer(struct intel_batchbuffer *batch)
+{
+	uint32_t offset;
+
+	offset = gen7_create_vertex_buffer(batch);
+
+	OUT_BATCH(GEN7_3DSTATE_VERTEX_BUFFERS | (5 - 2));
+	OUT_BATCH(0 << GEN7_VB0_BUFFER_INDEX_SHIFT |
+		  GEN7_VB0_VERTEXDATA |
+		  GEN7_VB0_ADDRESS_MODIFY_ENABLE |
+		  GEN7_VB0_NULL_VERTEX_BUFFER |
+		  4*2 << GEN7_VB0_BUFFER_PITCH_SHIFT);
+
+	OUT_RELOC(batch, I915_GEM_DOMAIN_VERTEX, 0, offset);
+	OUT_BATCH(~0);
+	OUT_BATCH(0);
+}
+
+static uint32_t
+gen7_bind_surfaces(struct intel_batchbuffer *batch)
+{
+	uint32_t *binding_table;
+
+	binding_table = intel_batch_state_alloc(batch, 8, 32);
+	if (binding_table == NULL)
+		return -1;
+
+	binding_table[0] = gen7_bind_buf_null(batch);
+	binding_table[1] = gen7_bind_buf_null(batch);
+
+	return intel_batch_offset(batch, binding_table);
+}
+
+static void
+gen7_emit_binding_table(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS | (2 - 2));
+	OUT_BATCH(gen7_bind_surfaces(batch));
+}
+
+static void
+gen7_emit_drawing_rectangle(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_DRAWING_RECTANGLE | (4 - 2));
+	/* Purposely set min > max for null rectangle */
+	OUT_BATCH(0xffffffff);
+	OUT_BATCH(0 | 0);
+	OUT_BATCH(0);
+}
+
+static uint32_t
+gen7_create_blend_state(struct intel_batchbuffer *batch)
+{
+	struct gen7_blend_state *blend;
+
+	blend = intel_batch_state_alloc(batch, sizeof(*blend), 64);
+	if (blend == NULL)
+		return -1;
+
+	blend->blend0.dest_blend_factor = GEN7_BLENDFACTOR_ZERO;
+	blend->blend0.source_blend_factor = GEN7_BLENDFACTOR_ONE;
+	blend->blend0.blend_func = GEN7_BLENDFUNCTION_ADD;
+	blend->blend1.post_blend_clamp_enable = 1;
+	blend->blend1.pre_blend_clamp_enable = 1;
+
+	return intel_batch_offset(batch, blend);
+}
+
+static void
+gen7_emit_state_base_address(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_STATE_BASE_ADDRESS | (10 - 2));
+	OUT_BATCH(0);
+	OUT_RELOC(batch, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_RELOC(batch, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+	OUT_RELOC(batch, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+
+	OUT_BATCH(0);
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+}
+
+static uint32_t
+gen7_create_cc_viewport(struct intel_batchbuffer *batch)
+{
+	struct gen7_cc_viewport *vp;
+
+	vp = intel_batch_state_alloc(batch, sizeof(*vp), 32);
+	if (vp == NULL)
+		return -1;
+
+	vp->min_depth = -1.e35;
+	vp->max_depth = 1.e35;
+
+	return intel_batch_offset(batch, vp);
+}
+
+static void
+gen7_emit_cc(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_BLEND_STATE_POINTERS | (2 - 2));
+	OUT_BATCH(gen7_create_blend_state(batch));
+
+	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_CC | (2 - 2));
+	OUT_BATCH(gen7_create_cc_viewport(batch));
+}
+
+static uint32_t
+gen7_create_sampler(struct intel_batchbuffer *batch)
+{
+	struct gen7_sampler_state *ss;
+
+	ss = intel_batch_state_alloc(batch, sizeof(*ss), 32);
+	if (ss == NULL)
+		return -1;
+
+	ss->ss0.min_filter = GEN7_MAPFILTER_NEAREST;
+	ss->ss0.mag_filter = GEN7_MAPFILTER_NEAREST;
+
+	ss->ss3.r_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
+	ss->ss3.s_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
+	ss->ss3.t_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
+
+	ss->ss3.non_normalized_coord = 1;
+
+	return intel_batch_offset(batch, ss);
+}
+
+static void
+gen7_emit_sampler(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_PS | (2 - 2));
+	OUT_BATCH(gen7_create_sampler(batch));
+}
+
+static void
+gen7_emit_multisample(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_MULTISAMPLE | (4 - 2));
+	OUT_BATCH(GEN7_3DSTATE_MULTISAMPLE_PIXEL_LOCATION_CENTER |
+		  GEN7_3DSTATE_MULTISAMPLE_NUMSAMPLES_1); /* 1 sample/pixel */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_SAMPLE_MASK | (2 - 2));
+	OUT_BATCH(1);
+}
+
+static void
+gen7_emit_urb(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_PS | (2 - 2));
+	OUT_BATCH(8); /* in 1KBs */
+
+	/* num of VS entries must be divisible by 8 if size < 9 */
+	OUT_BATCH(GEN7_3DSTATE_URB_VS | (2 - 2));
+	OUT_BATCH((64 << GEN7_URB_ENTRY_NUMBER_SHIFT) |
+		  (2 - 1) << GEN7_URB_ENTRY_SIZE_SHIFT |
+		  (1 << GEN7_URB_STARTING_ADDRESS_SHIFT));
+
+	OUT_BATCH(GEN7_3DSTATE_URB_HS | (2 - 2));
+	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
+		  (2 << GEN7_URB_STARTING_ADDRESS_SHIFT));
+
+	OUT_BATCH(GEN7_3DSTATE_URB_DS | (2 - 2));
+	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
+		  (2 << GEN7_URB_STARTING_ADDRESS_SHIFT));
+
+	OUT_BATCH(GEN7_3DSTATE_URB_GS | (2 - 2));
+	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
+		  (1 << GEN7_URB_STARTING_ADDRESS_SHIFT));
+}
+
+static void
+gen7_emit_vs(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_VS | (6 - 2));
+	OUT_BATCH(0); /* no VS kernel */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0); /* pass-through */
+}
+
+static void
+gen7_emit_hs(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_HS | (7 - 2));
+	OUT_BATCH(0); /* no HS kernel */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0); /* pass-through */
+}
+
+static void
+gen7_emit_te(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_TE | (4 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_ds(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_DS | (6 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_gs(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_GS | (7 - 2));
+	OUT_BATCH(0); /* no GS kernel */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0); /* pass-through  */
+}
+
+static void
+gen7_emit_streamout(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_STREAMOUT | (3 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_sf(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_SF | (7 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(GEN7_3DSTATE_SF_CULL_NONE);
+	OUT_BATCH(2 << GEN7_3DSTATE_SF_TRIFAN_PROVOKE_SHIFT);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_sbe(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_SBE | (14 - 2));
+	OUT_BATCH(1 << GEN7_SBE_NUM_OUTPUTS_SHIFT |
+		  1 << GEN7_SBE_URB_ENTRY_READ_LENGTH_SHIFT |
+		  1 << GEN7_SBE_URB_ENTRY_READ_OFFSET_SHIFT);
+	OUT_BATCH(0);
+	OUT_BATCH(0); /* dw4 */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0); /* dw8 */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0); /* dw12 */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_ps(struct intel_batchbuffer *batch)
+{
+	int threads;
+
+#if 0 /* XXX: Do we need separate state for hsw or not */
+	if (IS_HASWELL(batch->dev))
+		threads = 40 << HSW_PS_MAX_THREADS_SHIFT |
+			1 << HSW_PS_SAMPLE_MASK_SHIFT;
+	else
+#endif
+		threads = 40 << IVB_PS_MAX_THREADS_SHIFT;
+
+	OUT_BATCH(GEN7_3DSTATE_PS | (8 - 2));
+	OUT_BATCH(intel_batch_state_copy(batch, ps_kernel,
+					 sizeof(ps_kernel), 64));
+	OUT_BATCH(1 << GEN7_PS_SAMPLER_COUNT_SHIFT |
+		  2 << GEN7_PS_BINDING_TABLE_ENTRY_COUNT_SHIFT);
+	OUT_BATCH(0); /* scratch address */
+	OUT_BATCH(threads |
+		  GEN7_PS_16_DISPATCH_ENABLE |
+		  GEN7_PS_ATTRIBUTE_ENABLE);
+	OUT_BATCH(6 << GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_clip(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_CLIP | (4 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0); /* pass-through */
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CL | (2 - 2));
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_wm(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_WM | (3 - 2));
+	OUT_BATCH(GEN7_WM_DISPATCH_ENABLE |
+		  GEN7_WM_PERSPECTIVE_PIXEL_BARYCENTRIC);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_null_depth_buffer(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_DEPTH_BUFFER | (7 - 2));
+	OUT_BATCH(GEN7_SURFACE_NULL << GEN7_3DSTATE_DEPTH_BUFFER_TYPE_SHIFT |
+		  GEN7_DEPTHFORMAT_D32_FLOAT <<
+		  GEN7_3DSTATE_DEPTH_BUFFER_FORMAT_SHIFT);
+	OUT_BATCH(0); /* disable depth, stencil and hiz */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_CLEAR_PARAMS | (3 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+int gen7_setup_null_render_state(struct intel_batchbuffer *batch)
+{
+	int ret;
+
+	OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_3D);
+
+	gen7_emit_state_base_address(batch);
+	gen7_emit_multisample(batch);
+	gen7_emit_urb(batch);
+	gen7_emit_vs(batch);
+	gen7_emit_hs(batch);
+	gen7_emit_te(batch);
+	gen7_emit_ds(batch);
+	gen7_emit_gs(batch);
+	gen7_emit_clip(batch);
+	gen7_emit_sf(batch);
+	gen7_emit_wm(batch);
+	gen7_emit_streamout(batch);
+	gen7_emit_null_depth_buffer(batch);
+
+	gen7_emit_cc(batch);
+	gen7_emit_sampler(batch);
+	gen7_emit_sbe(batch);
+	gen7_emit_ps(batch);
+	gen7_emit_vertex_elements(batch);
+	gen7_emit_vertex_buffer(batch);
+	gen7_emit_binding_table(batch);
+	gen7_emit_drawing_rectangle(batch);
+
+	OUT_BATCH(GEN7_3DPRIMITIVE | (7 - 2));
+	OUT_BATCH(GEN7_3DPRIMITIVE_VERTEX_SEQUENTIAL | _3DPRIM_RECTLIST);
+	OUT_BATCH(3);
+	OUT_BATCH(0);
+	OUT_BATCH(1);   /* single instance */
+	OUT_BATCH(0);   /* start instance location */
+	OUT_BATCH(0);   /* index buffer offset, ignored */
+
+	OUT_BATCH(MI_BATCH_BUFFER_END);
+
+	ret = intel_batch_error(batch);
+	if (ret == 0)
+		ret = intel_batch_total_used(batch);
+
+	return ret;
+}
diff --git a/tools/null_state_gen/intel_renderstate_gen8.c b/tools/null_state_gen/intel_renderstate_gen8.c
new file mode 100644
index 0000000..7e22b24
--- /dev/null
+++ b/tools/null_state_gen/intel_renderstate_gen8.c
@@ -0,0 +1,764 @@
+#include "intel_batchbuffer.h"
+#include <lib/gen8_render.h>
+#include <lib/intel_reg.h>
+#include <string.h>
+
+struct {
+	uint32_t cc_state;
+	uint32_t blend_state;
+} cc;
+
+struct {
+	uint32_t cc_state;
+	uint32_t sf_clip_state;
+} viewport;
+
+/* see shaders/ps/blit.g7a */
+static const uint32_t ps_kernel[][4] = {
+#if 1
+   { 0x0060005a, 0x21403ae8, 0x3a0000c0, 0x008d0040 },
+   { 0x0060005a, 0x21603ae8, 0x3a0000c0, 0x008d0080 },
+   { 0x0060005a, 0x21803ae8, 0x3a0000d0, 0x008d0040 },
+   { 0x0060005a, 0x21a03ae8, 0x3a0000d0, 0x008d0080 },
+   { 0x02800031, 0x2e0022e8, 0x0e000140, 0x08840001 },
+   { 0x05800031, 0x200022e0, 0x0e000e00, 0x90031000 },
+#else
+   /* Write all -1 */
+   { 0x00600001, 0x2e000608, 0x00000000, 0x3f800000 },
+   { 0x00600001, 0x2e200608, 0x00000000, 0x3f800000 },
+   { 0x00600001, 0x2e400608, 0x00000000, 0x3f800000 },
+   { 0x00600001, 0x2e600608, 0x00000000, 0x3f800000 },
+   { 0x00600001, 0x2e800608, 0x00000000, 0x3f800000 },
+   { 0x00600001, 0x2ea00608, 0x00000000, 0x3f800000 },
+   { 0x00600001, 0x2ec00608, 0x00000000, 0x3f800000 },
+   { 0x00600001, 0x2ee00608, 0x00000000, 0x3f800000 },
+   { 0x05800031, 0x200022e0, 0x0e000e00, 0x90031000 },
+#endif
+};
+
+static uint32_t
+gen8_bind_buf_null(struct intel_batchbuffer *batch)
+{
+	struct gen8_surface_state *ss;
+
+	ss = intel_batch_state_alloc(batch, sizeof(*ss), 64);
+	if (ss == NULL)
+		return -1;
+
+	memset(ss, 0, sizeof(*ss));
+
+	return intel_batch_offset(batch, ss);
+}
+
+static uint32_t
+gen8_bind_surfaces(struct intel_batchbuffer *batch)
+{
+	uint32_t *binding_table, offset;
+
+	binding_table = intel_batch_state_alloc(batch, 8, 32);
+	if (binding_table == NULL)
+		return -1;
+
+	offset = intel_batch_offset(batch, binding_table);
+
+	binding_table[0] =
+		gen8_bind_buf_null(batch);
+	binding_table[1] =
+		gen8_bind_buf_null(batch);
+
+	return offset;
+}
+
+/* Mostly copy+paste from gen6, except wrap modes moved */
+static uint32_t
+gen8_create_sampler(struct intel_batchbuffer *batch) {
+	struct gen8_sampler_state *ss;
+	uint32_t offset;
+
+	ss = intel_batch_state_alloc(batch, sizeof(*ss), 64);
+	if (ss == NULL)
+		return -1;
+
+	offset = intel_batch_offset(batch, ss);
+
+	ss->ss0.min_filter = GEN6_MAPFILTER_NEAREST;
+	ss->ss0.mag_filter = GEN6_MAPFILTER_NEAREST;
+	ss->ss3.r_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
+	ss->ss3.s_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
+	ss->ss3.t_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
+
+	/* I've experimented with non-normalized coordinates and using the LD
+	 * sampler fetch, but couldn't make it work. */
+	ss->ss3.non_normalized_coord = 0;
+
+	return offset;
+}
+
+static uint32_t
+gen8_fill_ps(struct intel_batchbuffer *batch,
+	     const uint32_t kernel[][4],
+	     size_t size)
+{
+	return intel_batch_state_copy(batch, kernel, size, 64);
+}
+
+/**
+ * gen7_fill_vertex_buffer_data - populate vertex buffer with data.
+ *
+ * The vertex buffer consists of 3 vertices to construct a RECTLIST. The 4th
+ * vertex is implied (automatically derived by the HW). Each element has the
+ * destination offset, and the normalized texture offset (src). The rectangle
+ * itself will span the entire subsurface to be copied.
+ *
+ * see gen6_emit_vertex_elements
+ */
+static uint32_t
+gen7_fill_vertex_buffer_data(struct intel_batchbuffer *batch)
+{
+	uint16_t *start;
+
+	start = intel_batch_state_alloc(batch, 2 * sizeof(*start), 8);
+	start[0] = 0;
+	start[1] = 0;
+
+	return intel_batch_offset(batch, start);
+}
+
+/**
+ * gen6_emit_vertex_elements - The vertex elements describe the contents of the
+ * vertex buffer. We pack the vertex buffer in a semi weird way, conforming to
+ * what gen6_rendercopy did. The most straightforward would be to store
+ * everything as floats.
+ *
+ * see gen7_fill_vertex_buffer_data() for where the corresponding elements are
+ * packed.
+ */
+static void
+gen6_emit_vertex_elements(struct intel_batchbuffer *batch) {
+	/*
+	 * The VUE layout
+	 *    dword 0-3: pad (0, 0, 0, 0)
+	 *    dword 4-7: position (x, y, 0, 1.0),
+	 *    dword 8-11: texture coordinate 0 (u0, v0, 0, 1.0)
+	 */
+	OUT_BATCH(GEN6_3DSTATE_VERTEX_ELEMENTS | (3 * 2 + 1 - 2));
+
+	/* Element state 0. These are 4 dwords of 0 required for the VUE format.
+	 * We don't really know or care what they do.
+	 */
+	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
+		  GEN6_SURFACEFORMAT_R32G32B32A32_FLOAT << VE0_FORMAT_SHIFT |
+		  0 << VE0_OFFSET_SHIFT); /* we specify 0, but it really does not exist */
+	OUT_BATCH(GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_0_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_1_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_3_SHIFT);
+
+	/* Element state 1 - Our "destination" vertices. These are passed down
+	 * through the pipeline, and eventually make it to the pixel shader as
+	 * the offsets in the destination surface. It's packed as 16-bit
+	 * signed/scaled because of gen6 rendercopy. I see no particular reason
+	 * for doing this though.
+	 */
+	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
+		  GEN6_SURFACEFORMAT_R16G16_SSCALED << VE0_FORMAT_SHIFT |
+		  0 << VE0_OFFSET_SHIFT); /* offsets vb in bytes */
+	OUT_BATCH(GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_0_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_1_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_1_FLT << VE1_VFCOMPONENT_3_SHIFT);
+
+	/* Element state 2. Last but not least we store the U,V components as
+	 * normalized floats. These will be used in the pixel shader to sample
+	 * from the source buffer.
+	 */
+	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
+		  GEN6_SURFACEFORMAT_R32G32_FLOAT << VE0_FORMAT_SHIFT |
+		  4 << VE0_OFFSET_SHIFT);	/* offset vb in bytes */
+	OUT_BATCH(GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_0_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_1_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
+		  GEN6_VFCOMPONENT_STORE_1_FLT << VE1_VFCOMPONENT_3_SHIFT);
+}
+
+/**
+ * gen7_emit_vertex_buffer - emit the vertex buffers command
+ *
+ * @batch
+ * @offset - byte offset within the @batch where the vertex buffer starts.
+ */
+static void gen7_emit_vertex_buffer(struct intel_batchbuffer *batch,
+				    uint32_t offset) {
+	OUT_BATCH(GEN6_3DSTATE_VERTEX_BUFFERS | (1 + (4 * 1) - 2));
+	OUT_BATCH(0 << VB0_BUFFER_INDEX_SHIFT | /* VB 0th index */
+		  GEN7_VB0_BUFFER_ADDR_MOD_EN | /* Address Modify Enable */
+		  VB0_NULL_VERTEX_BUFFER |
+		  0 << VB0_BUFFER_PITCH_SHIFT);
+	OUT_RELOC(batch, I915_GEM_DOMAIN_VERTEX, 0, offset);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static uint32_t
+gen6_create_cc_state(struct intel_batchbuffer *batch)
+{
+	struct gen6_color_calc_state *cc_state;
+	uint32_t offset;
+
+	cc_state = intel_batch_state_alloc(batch, sizeof(*cc_state), 64);
+	if (cc_state == NULL)
+		return -1;
+
+	offset = intel_batch_offset(batch, cc_state);
+
+	return offset;
+}
+
+static uint32_t
+gen8_create_blend_state(struct intel_batchbuffer *batch)
+{
+	struct gen8_blend_state *blend;
+	int i;
+	uint32_t offset;
+
+	blend = intel_batch_state_alloc(batch, sizeof(*blend), 64);
+	if (blend == NULL)
+		return -1;
+
+	offset = intel_batch_offset(batch, blend);
+
+	for (i = 0; i < 16; i++) {
+		blend->bs[i].dest_blend_factor = GEN6_BLENDFACTOR_ZERO;
+		blend->bs[i].source_blend_factor = GEN6_BLENDFACTOR_ONE;
+		blend->bs[i].color_blend_func = GEN6_BLENDFUNCTION_ADD;
+		blend->bs[i].pre_blend_color_clamp = 1;
+		blend->bs[i].color_buffer_blend = 0;
+	}
+
+	return offset;
+}
+
+static uint32_t
+gen6_create_cc_viewport(struct intel_batchbuffer *batch)
+{
+	struct gen6_cc_viewport *vp;
+	uint32_t offset;
+
+	vp = intel_batch_state_alloc(batch, sizeof(*vp), 32);
+	if (vp == NULL)
+		return -1;
+
+	offset = intel_batch_offset(batch, vp);
+
+	/* XXX I don't understand this */
+	vp->min_depth = -1.e35;
+	vp->max_depth = 1.e35;
+
+	return offset;
+}
+
+static uint32_t
+gen7_create_sf_clip_viewport(struct intel_batchbuffer *batch) {
+	/* XXX these are likely not needed */
+	struct gen7_sf_clip_viewport *scv_state;
+	uint32_t offset;
+
+	scv_state = intel_batch_state_alloc(batch, sizeof(*scv_state), 64);
+	if (scv_state == NULL)
+		return -1;
+
+	offset = intel_batch_offset(batch, scv_state);
+
+	scv_state->guardband.xmin = 0;
+	scv_state->guardband.xmax = 1.0f;
+	scv_state->guardband.ymin = 0;
+	scv_state->guardband.ymax = 1.0f;
+
+	return offset;
+}
+
+static uint32_t
+gen6_create_scissor_rect(struct intel_batchbuffer *batch)
+{
+	struct gen6_scissor_rect *scissor;
+	uint32_t offset;
+
+	scissor = intel_batch_state_alloc(batch, sizeof(*scissor), 64);
+	if (scissor == NULL)
+		return -1;
+
+	offset = intel_batch_offset(batch, scissor);
+
+	return offset;
+}
+
+static void
+gen8_emit_sip(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN6_STATE_SIP | (3 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_push_constants(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_VS);
+	OUT_BATCH(0);
+	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_HS);
+	OUT_BATCH(0);
+	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_DS);
+	OUT_BATCH(0);
+	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_GS);
+	OUT_BATCH(0);
+	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_PS);
+	OUT_BATCH(0);
+}
+
+static void
+gen8_emit_state_base_address(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN6_STATE_BASE_ADDRESS | (16 - 2));
+
+	/* general */
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+
+	/* stateless data port */
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+
+	/* surface */
+	OUT_RELOC(batch, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+
+	/* dynamic */
+	OUT_RELOC(batch, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
+		  0, BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+
+	/* indirect */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	/* instruction */
+	OUT_RELOC(batch, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+
+	/* general state buffer size */
+	OUT_BATCH(0xfffff000 | 1);
+	/* dynamic state buffer size */
+	OUT_BATCH(1 << 12 | 1);
+	/* indirect object buffer size */
+	OUT_BATCH(0xfffff000 | 1);
+	/* instruction buffer size */
+	OUT_BATCH(1 << 12 | 1);
+}
+
+static void
+gen7_emit_urb(struct intel_batchbuffer *batch) {
+	/* XXX: Min valid values from mesa */
+	const int vs_entries = 64;
+	const int vs_size = 2;
+	const int vs_start = 2;
+
+	OUT_BATCH(GEN7_3DSTATE_URB_VS);
+	OUT_BATCH(vs_entries | ((vs_size - 1) << 16) | (vs_start << 25));
+	OUT_BATCH(GEN7_3DSTATE_URB_GS);
+	OUT_BATCH(vs_start << 25);
+	OUT_BATCH(GEN7_3DSTATE_URB_HS);
+	OUT_BATCH(vs_start << 25);
+	OUT_BATCH(GEN7_3DSTATE_URB_DS);
+	OUT_BATCH(vs_start << 25);
+}
+
+static void
+gen8_emit_cc(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN7_3DSTATE_BLEND_STATE_POINTERS);
+	OUT_BATCH(cc.blend_state | 1);
+
+	OUT_BATCH(GEN6_3DSTATE_CC_STATE_POINTERS);
+	OUT_BATCH(cc.cc_state | 1);
+}
+
+static void
+gen8_emit_multisample(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN8_3DSTATE_MULTISAMPLE);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN6_3DSTATE_SAMPLE_MASK);
+	OUT_BATCH(1);
+}
+
+static void
+gen8_emit_vs(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_VS);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_VS);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN6_3DSTATE_CONSTANT_VS | (11 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN6_3DSTATE_VS | (9-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen8_emit_hs(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN7_3DSTATE_CONSTANT_HS | (11 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_HS | (9-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_HS);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_HS);
+	OUT_BATCH(0);
+}
+
+static void
+gen8_emit_gs(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN7_3DSTATE_CONSTANT_GS | (11 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_GS | (10-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_GS);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_GS);
+	OUT_BATCH(0);
+}
+
+static void
+gen8_emit_ds(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN7_3DSTATE_CONSTANT_DS | (11 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_DS | (9-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_DS);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_DS);
+	OUT_BATCH(0);
+}
+
+static void
+gen8_emit_wm_hz_op(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN8_3DSTATE_WM_HZ_OP | (5-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen8_emit_null_state(struct intel_batchbuffer *batch) {
+	gen8_emit_wm_hz_op(batch);
+	gen8_emit_hs(batch);
+	OUT_BATCH(GEN7_3DSTATE_TE | (4-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	gen8_emit_gs(batch);
+	gen8_emit_ds(batch);
+	gen8_emit_vs(batch);
+}
+
+static void
+gen7_emit_clip(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN6_3DSTATE_CLIP | (4 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0); /*  pass-through */
+	OUT_BATCH(0);
+}
+
+static void
+gen8_emit_sf(struct intel_batchbuffer *batch)
+{
+	int i;
+
+	OUT_BATCH(GEN7_3DSTATE_SBE | (4 - 2));
+	OUT_BATCH(1 << GEN7_SBE_NUM_OUTPUTS_SHIFT |
+		  GEN8_SBE_FORCE_URB_ENTRY_READ_LENGTH |
+		  GEN8_SBE_FORCE_URB_ENTRY_READ_OFFSET |
+		  1 << GEN7_SBE_URB_ENTRY_READ_LENGTH_SHIFT |
+		  1 << GEN8_SBE_URB_ENTRY_READ_OFFSET_SHIFT);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN8_3DSTATE_SBE_SWIZ | (11 - 2));
+	for (i = 0; i < 8; i++)
+		OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN8_3DSTATE_RASTER | (5 - 2));
+	OUT_BATCH(GEN8_RASTER_FRONT_WINDING_CCW | GEN8_RASTER_CULL_NONE);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN6_3DSTATE_SF | (4 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen8_emit_ps(struct intel_batchbuffer *batch, uint32_t kernel) {
+	const int max_threads = 63;
+
+	OUT_BATCH(GEN6_3DSTATE_WM | (2 - 2));
+	OUT_BATCH(/* XXX: I don't understand the BARYCENTRIC stuff, but it
+		   * appears we need it to put our setup data in the place we
+		   * expect (g6, see below) */
+		  GEN7_3DSTATE_PS_PERSPECTIVE_PIXEL_BARYCENTRIC);
+
+	OUT_BATCH(GEN6_3DSTATE_CONSTANT_PS | (11-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_PS | (12-2));
+	OUT_BATCH(kernel);
+	OUT_BATCH(0); /* kernel hi */
+	OUT_BATCH(1 << GEN6_3DSTATE_WM_SAMPLER_COUNT_SHITF |
+		  2 << GEN6_3DSTATE_WM_BINDING_TABLE_ENTRY_COUNT_SHIFT);
+	OUT_BATCH(0); /* scratch space stuff */
+	OUT_BATCH(0); /* scratch hi */
+	OUT_BATCH((max_threads - 1) << GEN8_3DSTATE_PS_MAX_THREADS_SHIFT |
+		  GEN6_3DSTATE_WM_16_DISPATCH_ENABLE);
+	OUT_BATCH(6 << GEN6_3DSTATE_WM_DISPATCH_START_GRF_0_SHIFT);
+	OUT_BATCH(0); /* kernel 1 */
+	OUT_BATCH(0); /* kernel 1 hi */
+	OUT_BATCH(0); /* kernel 2 */
+	OUT_BATCH(0); /* kernel 2 hi */
+
+	OUT_BATCH(GEN8_3DSTATE_PS_BLEND | (2 - 2));
+	OUT_BATCH(GEN8_PS_BLEND_HAS_WRITEABLE_RT);
+
+	OUT_BATCH(GEN8_3DSTATE_PS_EXTRA | (2 - 2));
+	OUT_BATCH(GEN8_PSX_PIXEL_SHADER_VALID | GEN8_PSX_ATTRIBUTE_ENABLE);
+}
+
+static void
+gen8_emit_depth(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN7_3DSTATE_DEPTH_BUFFER | (8-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_HIER_DEPTH_BUFFER | (5 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN7_3DSTATE_STENCIL_BUFFER | (5 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+static void
+gen7_emit_clear(struct intel_batchbuffer *batch) {
+	OUT_BATCH(GEN7_3DSTATE_CLEAR_PARAMS | (3-2));
+	OUT_BATCH(0);
+	OUT_BATCH(1); /* clear valid */
+}
+
+static void
+gen6_emit_drawing_rectangle(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN6_3DSTATE_DRAWING_RECTANGLE | (4 - 2));
+	OUT_BATCH(0xffffffff);
+	OUT_BATCH(0 | 0);
+	OUT_BATCH(0);
+}
+
+static void gen8_emit_vf_topology(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN8_3DSTATE_VF_TOPOLOGY);
+	OUT_BATCH(_3DPRIM_RECTLIST);
+}
+
+/* Vertex elements MUST be defined before this according to spec */
+static void gen8_emit_primitive(struct intel_batchbuffer *batch, uint32_t offset)
+{
+	OUT_BATCH(GEN8_3DSTATE_VF_INSTANCING | (3 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	OUT_BATCH(GEN6_3DPRIMITIVE | (7-2));
+	OUT_BATCH(0);	/* gen8+ ignore the topology type field */
+	OUT_BATCH(3);	/* vertex count */
+	OUT_BATCH(0);	/*  We're specifying this instead with offset in GEN6_3DSTATE_VERTEX_BUFFERS */
+	OUT_BATCH(1);	/* single instance */
+	OUT_BATCH(0);	/* start instance location */
+	OUT_BATCH(0);	/* index buffer offset, ignored */
+}
+
+int gen8_setup_null_render_state(struct intel_batchbuffer *batch)
+{
+	uint32_t ps_sampler_state, ps_kernel_off, ps_binding_table;
+	uint32_t scissor_state;
+	uint32_t vertex_buffer;
+	uint32_t batch_end;
+	int ret;
+
+	ps_binding_table  = gen8_bind_surfaces(batch);
+	ps_sampler_state  = gen8_create_sampler(batch);
+	ps_kernel_off = gen8_fill_ps(batch, ps_kernel, sizeof(ps_kernel));
+	vertex_buffer = gen7_fill_vertex_buffer_data(batch);
+	cc.cc_state = gen6_create_cc_state(batch);
+	cc.blend_state = gen8_create_blend_state(batch);
+	viewport.cc_state = gen6_create_cc_viewport(batch);
+	viewport.sf_clip_state = gen7_create_sf_clip_viewport(batch);
+	scissor_state = gen6_create_scissor_rect(batch);
+	/* TODO: there is other state which isn't set up */
+
+	/* Start emitting the commands. The order roughly follows the mesa blorp
+	 * order */
+	OUT_BATCH(GEN6_PIPELINE_SELECT | PIPELINE_SELECT_3D);
+
+	gen8_emit_sip(batch);
+
+	gen7_emit_push_constants(batch);
+
+	gen8_emit_state_base_address(batch);
+
+	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_CC);
+	OUT_BATCH(viewport.cc_state);
+	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP);
+	OUT_BATCH(viewport.sf_clip_state);
+
+	gen7_emit_urb(batch);
+
+	gen8_emit_cc(batch);
+
+	gen8_emit_multisample(batch);
+
+	gen8_emit_null_state(batch);
+
+	OUT_BATCH(GEN7_3DSTATE_STREAMOUT | (5-2));
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	gen7_emit_clip(batch);
+
+	gen8_emit_sf(batch);
+
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS);
+	OUT_BATCH(ps_binding_table);
+
+	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_PS);
+	OUT_BATCH(ps_sampler_state);
+
+	gen8_emit_ps(batch, ps_kernel_off);
+
+	OUT_BATCH(GEN6_3DSTATE_SCISSOR_STATE_POINTERS);
+	OUT_BATCH(scissor_state);
+
+	gen8_emit_depth(batch);
+
+	gen7_emit_clear(batch);
+
+	gen6_emit_drawing_rectangle(batch);
+
+	gen7_emit_vertex_buffer(batch, vertex_buffer);
+	gen6_emit_vertex_elements(batch);
+
+	gen8_emit_vf_topology(batch);
+	gen8_emit_primitive(batch, vertex_buffer);
+
+	OUT_BATCH(MI_BATCH_BUFFER_END);
+
+	ret = intel_batch_error(batch);
+	if (ret == 0)
+		ret = intel_batch_total_used(batch);
+
+	return ret;
+}
-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] drm/i915: add render state initialization
  2014-05-06 13:26 ` [PATCH v2 1/2] drm/i915: add render state initialization Mika Kuoppala
@ 2014-05-06 13:41   ` Chris Wilson
  2014-05-06 14:30     ` [PATCH v3 " Mika Kuoppala
  2014-05-06 14:34     ` [PATCH v2 " Mika Kuoppala
  0 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2014-05-06 13:41 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku, ben, kristen

On Tue, May 06, 2014 at 04:26:05PM +0300, Mika Kuoppala wrote:
> HW guys say that it is not a cool idea to let device
> go into rc6 without proper 3d pipeline state.

* shrug

What's improper 3d state and what prevents userspace from triggering
badness later?

The only problem I see in the patch is that you don't move the so->obj
to the GPU before execution - the code is only coherent thanks to LLC
atm.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tools/null_state_gen: generate null render state
  2014-05-06 13:39 ` [PATCH] tools/null_state_gen: generate null render state Mika Kuoppala
@ 2014-05-06 13:47   ` Chris Wilson
  2014-05-06 14:44     ` Mika Kuoppala
  2014-05-09 15:15     ` Damien Lespiau
  2014-05-08 14:37   ` Damien Lespiau
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2014-05-06 13:47 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

Why does this work? It is neither the most minimal batch, nor the
maximal. Which state is truly required? It looks like cargo-culted
Chinese.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/2] drm/i915: add render state initialization
  2014-05-06 13:41   ` Chris Wilson
@ 2014-05-06 14:30     ` Mika Kuoppala
  2014-05-14 10:24       ` Mateo Lozano, Oscar
  2014-05-06 14:34     ` [PATCH v2 " Mika Kuoppala
  1 sibling, 1 reply; 19+ messages in thread
From: Mika Kuoppala @ 2014-05-06 14:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: ben, miku, kristen

HW guys say that it is not a cool idea to let device
go into rc6 without proper 3d pipeline state.

For each new uninitialized context, generate a
valid null render state to be run on context
creation.

This patch introduces a skeleton with empty states.

v2: - No need to vmap (Chris Wilson)
    - use .c files for state (Daniel Vetter)
    - no need to flush as i915_add_request does it
    - remove parameter for batch alloc size
    - don't wait for the init (Ben Widawsky)

v3: - move to cpu/gpu (Chris Wilson)

Tested-by: Kristen Carlson Accardi <kristen@linux.intel.com> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |    6 +
 drivers/gpu/drm/i915/i915_drv.h               |    2 +
 drivers/gpu/drm/i915/i915_gem_context.c       |    6 +
 drivers/gpu/drm/i915/i915_gem_render_state.c  |  194 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_renderstate.h      |   48 ++++++
 drivers/gpu/drm/i915/intel_renderstate_gen6.c |   10 ++
 drivers/gpu/drm/i915/intel_renderstate_gen7.c |   10 ++
 drivers/gpu/drm/i915/intel_renderstate_gen8.c |   10 ++
 8 files changed, 286 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate.h
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen6.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen7.c
 create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen8.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index b1445b7..2446916 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -18,6 +18,7 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o
 # GEM code
 i915-y += i915_cmd_parser.o \
 	  i915_gem_context.o \
+	  i915_gem_render_state.o \
 	  i915_gem_debug.o \
 	  i915_gem_dmabuf.o \
 	  i915_gem_evict.o \
@@ -32,6 +33,11 @@ i915-y += i915_cmd_parser.o \
 	  intel_ringbuffer.o \
 	  intel_uncore.o
 
+# autogenerated null render state
+i915-y += intel_renderstate_gen6.o \
+	  intel_renderstate_gen7.o \
+	  intel_renderstate_gen8.o
+
 # modesetting core code
 i915-y += intel_bios.o \
 	  intel_display.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3fc2e3d..a2fc605 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2336,6 +2336,8 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 				   struct drm_file *file);
 
+/* i915_gem_render_state.c */
+int i915_gem_render_state_init(struct intel_ring_buffer *ring);
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f77b4c1..f7ad59e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -699,6 +699,12 @@ static int do_switch(struct intel_ring_buffer *ring,
 		/* obj is kept alive until the next request by its active ref */
 		i915_gem_object_ggtt_unpin(from->obj);
 		i915_gem_context_unreference(from);
+	} else {
+		if (to->is_initialized == false) {
+			ret = i915_gem_render_state_init(ring);
+			if (ret)
+				DRM_ERROR("init render state: %d\n", ret);
+		}
 	}
 
 	to->is_initialized = true;
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
new file mode 100644
index 0000000..392aa7b
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -0,0 +1,194 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Mika Kuoppala <mika.kuoppala@intel.com>
+ *
+ */
+
+#include "i915_drv.h"
+#include "intel_renderstate.h"
+
+struct i915_render_state {
+	struct drm_i915_gem_object *obj;
+	unsigned long ggtt_offset;
+	void *batch;
+	u32 size;
+	u32 len;
+};
+
+static struct i915_render_state *render_state_alloc(struct drm_device *dev)
+{
+	struct i915_render_state *so;
+	struct page *page;
+	int ret;
+
+	so = kzalloc(sizeof(*so), GFP_KERNEL);
+	if (!so)
+		return ERR_PTR(-ENOMEM);
+
+	so->obj = i915_gem_alloc_object(dev, 4096);
+	if (so->obj == NULL) {
+		ret = -ENOMEM;
+		goto free;
+	}
+	so->size = 4096;
+
+	ret = i915_gem_obj_ggtt_pin(so->obj, 4096, 0);
+	if (ret)
+		goto free_gem;
+
+	BUG_ON(so->obj->pages->nents != 1);
+	page = sg_page(so->obj->pages->sgl);
+
+	so->batch = kmap(page);
+	if (!so->batch) {
+		ret = -ENOMEM;
+		goto unpin;
+	}
+
+	so->ggtt_offset = i915_gem_obj_ggtt_offset(so->obj);
+
+	return so;
+unpin:
+	i915_gem_object_ggtt_unpin(so->obj);
+free_gem:
+	drm_gem_object_unreference(&so->obj->base);
+free:
+	kfree(so);
+	return ERR_PTR(ret);
+}
+
+static void render_state_free(struct i915_render_state *so)
+{
+	kunmap(kmap_to_page(so->batch));
+	i915_gem_object_ggtt_unpin(so->obj);
+	drm_gem_object_unreference(&so->obj->base);
+	kfree(so);
+}
+
+static const struct intel_renderstate_rodata *
+render_state_get_rodata(struct drm_device *dev, const int gen)
+{
+	switch (gen) {
+	case 6:
+		return &gen6_null_state;
+	case 7:
+		return &gen7_null_state;
+	case 8:
+		return &gen8_null_state;
+	}
+
+	return NULL;
+}
+
+static int render_state_setup(const int gen,
+			      const struct intel_renderstate_rodata *rodata,
+			      struct i915_render_state *so)
+{
+	const u64 goffset = i915_gem_obj_ggtt_offset(so->obj);
+	u32 reloc_index = 0;
+	u32 * const d = so->batch;
+	unsigned int i = 0;
+	int ret;
+
+	if (!rodata || rodata->batch_items * 4 > so->size)
+		return -EINVAL;
+
+	ret = i915_gem_object_set_to_cpu_domain(so->obj, true);
+	if (ret)
+		return ret;
+
+	while (i < rodata->batch_items) {
+		u32 s = rodata->batch[i];
+
+		if (reloc_index < rodata->reloc_items &&
+		    i * 4  == rodata->reloc[reloc_index]) {
+
+			s += goffset & 0xffffffff;
+
+			/* We keep batch offsets max 32bit */
+			if (gen >= 8) {
+				if (i + 1 >= rodata->batch_items ||
+				    rodata->batch[i + 1] != 0)
+					return -EINVAL;
+
+				d[i] = s;
+				i++;
+				s = (goffset & 0xffffffff00000000ull) >> 32;
+			}
+
+			reloc_index++;
+		}
+
+		d[i] = s;
+		i++;
+	}
+
+	ret = i915_gem_object_set_to_gtt_domain(so->obj, false);
+	if (ret)
+		return ret;
+
+	if (rodata->reloc_items != reloc_index) {
+		DRM_ERROR("not all relocs resolved, %d out of %d\n",
+			  reloc_index, rodata->reloc_items);
+		return -EINVAL;
+	}
+
+	so->len = rodata->batch_items * 4;
+
+	return 0;
+}
+
+int i915_gem_render_state_init(struct intel_ring_buffer *ring)
+{
+	const int gen = INTEL_INFO(ring->dev)->gen;
+	struct i915_render_state *so;
+	const struct intel_renderstate_rodata *rodata;
+	u32 seqno;
+	int ret;
+
+	rodata = render_state_get_rodata(ring->dev, gen);
+	if (rodata == NULL)
+		return 0;
+
+	so = render_state_alloc(ring->dev);
+	if (IS_ERR(so))
+		return PTR_ERR(so);
+
+	ret = render_state_setup(gen, rodata, so);
+	if (ret)
+		goto out;
+
+	ret = ring->dispatch_execbuffer(ring,
+					i915_gem_obj_ggtt_offset(so->obj),
+					so->len,
+					I915_DISPATCH_SECURE);
+	if (ret)
+		goto out;
+
+	ret = i915_add_request(ring, &seqno);
+
+out:
+	render_state_free(so);
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/intel_renderstate.h b/drivers/gpu/drm/i915/intel_renderstate.h
new file mode 100644
index 0000000..a5e783a
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _INTEL_RENDERSTATE_H
+#define _INTEL_RENDERSTATE_H
+
+#include <linux/types.h>
+
+struct intel_renderstate_rodata {
+	const u32 *reloc;
+	const u32 reloc_items;
+	const u32 *batch;
+	const u32 batch_items;
+};
+
+extern const struct intel_renderstate_rodata gen6_null_state;
+extern const struct intel_renderstate_rodata gen7_null_state;
+extern const struct intel_renderstate_rodata gen8_null_state;
+
+#define RO_RENDERSTATE(_g)						\
+	const struct intel_renderstate_rodata gen ## _g ## _null_state = { \
+		.reloc = gen ## _g ## _null_state_relocs,		\
+		.reloc_items = sizeof(gen ## _g ## _null_state_relocs)/4, \
+		.batch = gen ## _g ## _null_state_batch,		\
+		.batch_items = sizeof(gen ## _g ## _null_state_batch)/4, \
+	}
+
+#endif /* _INTEL_RENDERSTATE_H */
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen6.c b/drivers/gpu/drm/i915/intel_renderstate_gen6.c
new file mode 100644
index 0000000..5ed251a
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen6.c
@@ -0,0 +1,10 @@
+#include "intel_renderstate.h"
+
+static const u32 gen6_null_state_relocs[] = {
+};
+
+static const u32 gen6_null_state_batch[] = {
+	0x0a << 23, /* MI_BATCH_BUFFER_END */
+};
+
+RO_RENDERSTATE(6);
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen7.c b/drivers/gpu/drm/i915/intel_renderstate_gen7.c
new file mode 100644
index 0000000..5333f44
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen7.c
@@ -0,0 +1,10 @@
+#include "intel_renderstate.h"
+
+static const u32 gen7_null_state_relocs[] = {
+};
+
+static const u32 gen7_null_state_batch[] = {
+	0x0a << 23, /* MI_BATCH_BUFFER_END */
+};
+
+RO_RENDERSTATE(7);
diff --git a/drivers/gpu/drm/i915/intel_renderstate_gen8.c b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
new file mode 100644
index 0000000..88c3733
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_renderstate_gen8.c
@@ -0,0 +1,10 @@
+#include "intel_renderstate.h"
+
+static const u32 gen8_null_state_relocs[] = {
+};
+
+static const u32 gen8_null_state_batch[] = {
+	0x0a << 23, /* MI_BATCH_BUFFER_END */
+};
+
+RO_RENDERSTATE(8);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] drm/i915: add render state initialization
  2014-05-06 13:41   ` Chris Wilson
  2014-05-06 14:30     ` [PATCH v3 " Mika Kuoppala
@ 2014-05-06 14:34     ` Mika Kuoppala
  1 sibling, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2014-05-06 14:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, miku, ben, kristen

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Tue, May 06, 2014 at 04:26:05PM +0300, Mika Kuoppala wrote:
>> HW guys say that it is not a cool idea to let device
>> go into rc6 without proper 3d pipeline state.
>
> * shrug
>
> What's improper 3d state and what prevents userspace from triggering
> badness later?

I would guess improper is 'what is there after powerup'. But yes,
we don't even know (yet?) what the proper minimal state is :P

As for userspace triggering badness later: the ring will hang and
hangcheck will clean up the mess.

> The only problem I see in the patch is that you don't move the so->obj
> to the GPU before execution - the code is only coherent thanks to LLC
> atm.

Fixed in v3.

Thanks,
-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tools/null_state_gen: generate null render state
  2014-05-06 13:47   ` Chris Wilson
@ 2014-05-06 14:44     ` Mika Kuoppala
  2014-05-09 15:15     ` Damien Lespiau
  1 sibling, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2014-05-06 14:44 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Why does this work? It is neither the most minimal batch, nor the
> maximal. Which state is truly required? It looks like cargo-culted
> Chinese.

These are just stripped-down versions of rendercopy for each gen.

My guess is that the key to understanding the issue at hand
would be to start from an empty batch and then add
one stage initialization at a time, until the bug disappears.

But as our bdw doesn't seem to hit the issue reliably, I went
for setting up everything (used rendercopy).
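
The stage-at-a-time search described above could be sketched roughly as
follows. This is only an illustration: hang_test_fn, find_minimal_prefix()
and demo_hangs() are hypothetical names, the stage count is made up, and
the "hardware" here is a stand-in function, not a real rc6 test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stage count; the real generators emit more state than this. */
#define NUM_STAGES 6

/*
 * Hypothetical test hook: submit a batch that programs only the stages
 * whose bits are set in stage_mask, cycle through rc6, and report whether
 * the GPU hung.
 */
typedef bool (*hang_test_fn)(unsigned int stage_mask);

/*
 * Start from an empty batch and add one stage at a time.
 * Returns how many leading stages were programmed when the hang first
 * went away, or -1 if it still hangs with every stage programmed.
 */
int find_minimal_prefix(hang_test_fn hangs)
{
	unsigned int mask = 0;
	int n;

	assert(hangs != NULL);

	for (n = 0; n <= NUM_STAGES; n++) {
		if (!hangs(mask))
			return n;
		if (n < NUM_STAGES)
			mask |= 1u << n;	/* add the next stage */
	}

	return -1;
}

/* Stand-in for real hardware: pretend the first three stages are required. */
bool demo_hangs(unsigned int stage_mask)
{
	return (stage_mask & 0x7) != 0x7;
}
```

A linear walk like this only finds a sufficient prefix, not the true
minimal set, and it assumes the hang reproduces reliably, which is exactly
what was not the case on the bdw mentioned above.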

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tools/null_state_gen: generate null render state
  2014-05-06 13:39 ` [PATCH] tools/null_state_gen: generate null render state Mika Kuoppala
  2014-05-06 13:47   ` Chris Wilson
@ 2014-05-08 14:37   ` Damien Lespiau
  2014-05-08 14:43   ` Damien Lespiau
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Damien Lespiau @ 2014-05-08 14:37 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, May 06, 2014 at 04:39:01PM +0300, Mika Kuoppala wrote:
> Generate valid (null) render state for each gen. Output
> it as a c source file with batch and relocations.
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  configure.ac                                  |    1 +
>  lib/gen6_render.h                             |    1 +
>  lib/gen7_render.h                             |    1 +
>  tools/Makefile.am                             |    4 +-
>  tools/null_state_gen/Makefile.am              |   16 +
>  tools/null_state_gen/intel_batchbuffer.c      |  173 ++++++
>  tools/null_state_gen/intel_batchbuffer.h      |   91 +++
>  tools/null_state_gen/intel_null_state_gen.c   |  151 +++++
>  tools/null_state_gen/intel_renderstate_gen7.c |  505 ++++++++++++++++
>  tools/null_state_gen/intel_renderstate_gen8.c |  764 +++++++++++++++++++++++++
>  10 files changed, 1706 insertions(+), 1 deletion(-)


Missing a git add on intel_renderstate_gen6.c? Your branch doesn't
compile here.

-- 
Damien

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tools/null_state_gen: generate null render state
  2014-05-06 13:39 ` [PATCH] tools/null_state_gen: generate null render state Mika Kuoppala
  2014-05-06 13:47   ` Chris Wilson
  2014-05-08 14:37   ` Damien Lespiau
@ 2014-05-08 14:43   ` Damien Lespiau
  2014-05-08 15:10     ` Mika Kuoppala
  2014-05-09 14:46   ` Damien Lespiau
  2014-05-14 10:06   ` Damien Lespiau
  4 siblings, 1 reply; 19+ messages in thread
From: Damien Lespiau @ 2014-05-08 14:43 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, May 06, 2014 at 04:39:01PM +0300, Mika Kuoppala wrote:
> diff --git a/tools/null_state_gen/Makefile.am b/tools/null_state_gen/Makefile.am
> new file mode 100644
> index 0000000..40d2237
> --- /dev/null
> +++ b/tools/null_state_gen/Makefile.am
> @@ -0,0 +1,16 @@
> +bin_PROGRAMS = intel_null_state_gen

Can intel_null_state_gen be 'noinst' instead? I don't think we need to
install it as a tool.

-- 
Damien

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tools/null_state_gen: generate null render state
  2014-05-08 14:43   ` Damien Lespiau
@ 2014-05-08 15:10     ` Mika Kuoppala
  0 siblings, 0 replies; 19+ messages in thread
From: Mika Kuoppala @ 2014-05-08 15:10 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: intel-gfx

Damien Lespiau <damien.lespiau@intel.com> writes:

> On Tue, May 06, 2014 at 04:39:01PM +0300, Mika Kuoppala wrote:
>> diff --git a/tools/null_state_gen/Makefile.am b/tools/null_state_gen/Makefile.am
>> new file mode 100644
>> index 0000000..40d2237
>> --- /dev/null
>> +++ b/tools/null_state_gen/Makefile.am
>> @@ -0,0 +1,16 @@
>> +bin_PROGRAMS = intel_null_state_gen
>
> Can intel_null_state_gen be 'noinst' instead? I don't think we need to
> install it as a tool.

Definitely no need to install it.

noinst and missing gen6 file added and results pushed here:
http://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=null_state_gen

-Mika

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tools/null_state_gen: generate null render state
  2014-05-06 13:39 ` [PATCH] tools/null_state_gen: generate null render state Mika Kuoppala
                     ` (2 preceding siblings ...)
  2014-05-08 14:43   ` Damien Lespiau
@ 2014-05-09 14:46   ` Damien Lespiau
  2014-05-14 10:06   ` Damien Lespiau
  4 siblings, 0 replies; 19+ messages in thread
From: Damien Lespiau @ 2014-05-09 14:46 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, May 06, 2014 at 04:39:01PM +0300, Mika Kuoppala wrote:
> diff --git a/tools/null_state_gen/intel_renderstate_gen8.c b/tools/null_state_gen/intel_renderstate_gen8.c
> new file mode 100644
> index 0000000..7e22b24
> --- /dev/null
> +++ b/tools/null_state_gen/intel_renderstate_gen8.c

[...]

> +static void
> +gen7_emit_urb(struct intel_batchbuffer *batch) {
> +	/* XXX: Min valid values from mesa */
> +	const int vs_entries = 64;
> +	const int vs_size = 2;
> +	const int vs_start = 2;
> +
> +	OUT_BATCH(GEN7_3DSTATE_URB_VS);
> +	OUT_BATCH(vs_entries | ((vs_size - 1) << 16) | (vs_start << 25));
> +	OUT_BATCH(GEN7_3DSTATE_URB_GS);
> +	OUT_BATCH(vs_start << 25);
> +	OUT_BATCH(GEN7_3DSTATE_URB_HS);
> +	OUT_BATCH(vs_start << 25);
> +	OUT_BATCH(GEN7_3DSTATE_URB_DS);
> +	OUT_BATCH(vs_start << 25);
> +}

It seems that for BDW GT3, the minimal start is documented as 4. Mesa
has actually been updated to do the right thing now (push constants take
32KB on GT3) and vs_start is 4 on GT3.

Same story for the other URB allocations. But as they are disabled, it may
not matter much. We don't set up the PUSH_CONSTANT state, so it's
possible the VS is able to address the start of the URB. Meh?

I'd still put vs_start to 4 I guess.
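
For illustration, the dword packing from the quoted gen7_emit_urb() with
the start offset made a parameter might look like the sketch below. The
shifts (16 and 25) and the values 64/2/2 are taken from the quoted patch,
and the GT3 value of 4 from the paragraph above. The is_bdw_gt3 flag and
both function names are hypothetical; how GT3 would actually be detected is
not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions as used by gen7_emit_urb() in the quoted patch. */
#define URB_ENTRY_SIZE_SHIFT 16
#define URB_START_SHIFT      25

/* Hypothetical helper: pick the VS URB start, 4 on BDW GT3, otherwise 2. */
uint32_t urb_vs_start(int is_bdw_gt3)
{
	return is_bdw_gt3 ? 4 : 2;
}

/* Pack the payload dword of 3DSTATE_URB_VS the same way the patch does. */
uint32_t urb_vs_dword(uint32_t entries, uint32_t size, uint32_t start)
{
	assert(size >= 1);	/* the field stores size - 1 */

	return entries |
	       ((size - 1) << URB_ENTRY_SIZE_SHIFT) |
	       (start << URB_START_SHIFT);
}
```

With entries = 64 and size = 2 as in the patch, a start of 2 packs to
0x04010040 and a start of 4 packs to 0x08010040.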

I'm quite puzzled by why we need to do all that, but let's not go there
again.

-- 
Damien

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tools/null_state_gen: generate null render state
  2014-05-06 13:47   ` Chris Wilson
  2014-05-06 14:44     ` Mika Kuoppala
@ 2014-05-09 15:15     ` Damien Lespiau
  1 sibling, 0 replies; 19+ messages in thread
From: Damien Lespiau @ 2014-05-09 15:15 UTC (permalink / raw)
  To: Chris Wilson, Mika Kuoppala, intel-gfx

On Tue, May 06, 2014 at 02:47:40PM +0100, Chris Wilson wrote:
> Why does this work? It is neither the most minimal batch, nor the
> maximal. Which state is truly required? It looks like cargo-culted
> Chinese.

I'll have to echo this. It's really not obvious why this is needed.
If you look at the render engine power context for instance, it's just a
list of registers. So if we do:

  - init_clock_gating() (bad name!)
  - enable_rc6()

The render power context should contain the W/A we set up.

If we instead did:

  - enable_rc6()
      -> enter rc6
      -> power context save
  - init_clock_gating()
      -> exit rc6
      -> power context restore

We'd end up restoring the reset values of the registers we touch in
init_clock_gating() (or really the values left by the BIOS), i.e. the
restored state may not contain all W/As and could hang?

Note that init_clock_gating() is not the only place where we touch the
registers that are part of the power context(s), so we do need to ensure
rc6 is only enabled after we set up those registers.
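
The ordering concern can be made concrete with a toy model. Everything in
it is an assumption made for illustration: the toy_gpu struct, the four
registers, the all-zero reset values, and in particular the idea that a
stale saved image overwrites the live registers on rc6 exit. It models the
hypothesis above, not documented hardware behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_CTX_REGS 4
#define WA_VALUE     0x1u	/* stands for "workaround bit set" */

/* Toy model: a few registers plus the image saved on rc6 entry. */
struct toy_gpu {
	uint32_t regs[NUM_CTX_REGS];		/* live register values */
	uint32_t power_ctx[NUM_CTX_REGS];	/* saved power context */
	int rc6_enabled;
	int in_rc6;
};

void toy_init_clock_gating(struct toy_gpu *g)
{
	int i;

	assert(g != NULL);
	for (i = 0; i < NUM_CTX_REGS; i++)
		g->regs[i] = WA_VALUE;
}

void toy_enter_rc6(struct toy_gpu *g)
{
	if (!g->rc6_enabled)
		return;
	memcpy(g->power_ctx, g->regs, sizeof(g->regs));	/* context save */
	g->in_rc6 = 1;
}

void toy_exit_rc6(struct toy_gpu *g)
{
	if (!g->in_rc6)
		return;
	memcpy(g->regs, g->power_ctx, sizeof(g->regs));	/* context restore */
	g->in_rc6 = 0;
}

/* Returns 1 if every register still holds the W/A value after one rc6 cycle. */
int workarounds_survive(int gating_before_rc6)
{
	struct toy_gpu g;
	int i;

	memset(&g, 0, sizeof(g));	/* reset values are all zero in this model */

	if (gating_before_rc6)
		toy_init_clock_gating(&g);
	g.rc6_enabled = 1;
	toy_enter_rc6(&g);
	if (!gating_before_rc6)
		toy_init_clock_gating(&g);	/* saved image is now stale */
	toy_exit_rc6(&g);

	for (i = 0; i < NUM_CTX_REGS; i++)
		if (g.regs[i] != WA_VALUE)
			return 0;
	return 1;
}
```

In this model workarounds_survive(1) returns 1 and workarounds_survive(0)
returns 0, which is the failure mode being speculated about. Whether real
hardware behaves this way is the open question for the hardware engineers.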

It could also be that something other than saving/restoring the power
contexts is happening at rc6 entry/exit, but the documentation is rather
sparse here and so we need to try talking to the hardware engineers
again.

So yes, this very much feels like cargo culting. It'd be nice to really
understand what's happening.

Now, a rather pragmatic approach would be to take those patches if they
actually paper over an issue, but the Tested-by: tags are not legion,
Mika didn't reproduce the issue on his BDW (IIRC) and Ben was saying
Kristen didn't confirm it was these exact patches that were solving
hangs for her (If I understood correctly on the call).

I do have to point out that it's a lot of code to review and rather full
of details, ie, we'll get it wrong-ish (but not enough to break
anything, hopefully).

-- 
Damien

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tools/null_state_gen: generate null render state
  2014-05-06 13:39 ` [PATCH] tools/null_state_gen: generate null render state Mika Kuoppala
                     ` (3 preceding siblings ...)
  2014-05-09 14:46   ` Damien Lespiau
@ 2014-05-14 10:06   ` Damien Lespiau
  4 siblings, 0 replies; 19+ messages in thread
From: Damien Lespiau @ 2014-05-14 10:06 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, May 06, 2014 at 04:39:01PM +0300, Mika Kuoppala wrote:
> Generate valid (null) render state for each gen. Output
> it as a c source file with batch and relocations.
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

With the GT3 URB allocation restriction added, this series is

Acked-by: Damien Lespiau <damien.lespiau@intel.com>

Can't really promote that to a r-b tag in good faith.

-- 
Damien

> ---
>  configure.ac                                  |    1 +
>  lib/gen6_render.h                             |    1 +
>  lib/gen7_render.h                             |    1 +
>  tools/Makefile.am                             |    4 +-
>  tools/null_state_gen/Makefile.am              |   16 +
>  tools/null_state_gen/intel_batchbuffer.c      |  173 ++++++
>  tools/null_state_gen/intel_batchbuffer.h      |   91 +++
>  tools/null_state_gen/intel_null_state_gen.c   |  151 +++++
>  tools/null_state_gen/intel_renderstate_gen7.c |  505 ++++++++++++++++
>  tools/null_state_gen/intel_renderstate_gen8.c |  764 +++++++++++++++++++++++++
>  10 files changed, 1706 insertions(+), 1 deletion(-)
>  create mode 100644 tools/null_state_gen/Makefile.am
>  create mode 100644 tools/null_state_gen/intel_batchbuffer.c
>  create mode 100644 tools/null_state_gen/intel_batchbuffer.h
>  create mode 100644 tools/null_state_gen/intel_null_state_gen.c
>  create mode 100644 tools/null_state_gen/intel_renderstate_gen7.c
>  create mode 100644 tools/null_state_gen/intel_renderstate_gen8.c
> 
> diff --git a/configure.ac b/configure.ac
> index b71b100..b848ac3 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -211,6 +211,7 @@ AC_CONFIG_FILES([
>  		 tests/Makefile
>  		 tools/Makefile
>  		 tools/quick_dump/Makefile
> +		 tools/null_state_gen/Makefile
>  		 debugger/Makefile
>  		 debugger/system_routine/Makefile
>  		 assembler/Makefile
> diff --git a/lib/gen6_render.h b/lib/gen6_render.h
> index 60dc93e..495cc2e 100644
> --- a/lib/gen6_render.h
> +++ b/lib/gen6_render.h
> @@ -152,6 +152,7 @@
>  #define VB0_VERTEXDATA			(0 << 20)
>  #define VB0_INSTANCEDATA		(1 << 20)
>  #define VB0_BUFFER_PITCH_SHIFT		0
> +#define VB0_NULL_VERTEX_BUFFER          (1 << 13)
>  
>  /* VERTEX_ELEMENT_STATE Structure */
>  #define VE0_VERTEX_BUFFER_INDEX_SHIFT	26 /* for GEN6 */
> diff --git a/lib/gen7_render.h b/lib/gen7_render.h
> index 1661d4c..992d839 100644
> --- a/lib/gen7_render.h
> +++ b/lib/gen7_render.h
> @@ -165,6 +165,7 @@
>  #define GEN7_VB0_VERTEXDATA		(0 << 20)
>  #define GEN7_VB0_INSTANCEDATA		(1 << 20)
>  #define GEN7_VB0_BUFFER_PITCH_SHIFT	0
> +#define GEN7_VB0_NULL_VERTEX_BUFFER	(1 << 13)
>  #define GEN7_VB0_ADDRESS_MODIFY_ENABLE	(1 << 14)
>  
>  /* VERTEX_ELEMENT_STATE Structure */
> diff --git a/tools/Makefile.am b/tools/Makefile.am
> index 151092b..64fa060 100644
> --- a/tools/Makefile.am
> +++ b/tools/Makefile.am
> @@ -1,7 +1,9 @@
>  include Makefile.sources
>  
> +SUBDIRS = null_state_gen
> +
>  if HAVE_DUMPER
> -SUBDIRS = quick_dump
> +SUBDIRS += quick_dump
>  endif
>  
>  AM_CPPFLAGS = -I$(top_srcdir) -I$(top_srcdir)/lib
> diff --git a/tools/null_state_gen/Makefile.am b/tools/null_state_gen/Makefile.am
> new file mode 100644
> index 0000000..40d2237
> --- /dev/null
> +++ b/tools/null_state_gen/Makefile.am
> @@ -0,0 +1,16 @@
> +bin_PROGRAMS = intel_null_state_gen
> +
> +intel_null_state_gen_SOURCES = 	\
> +	intel_batchbuffer.c \
> +	intel_renderstate_gen6.c \
> +	intel_renderstate_gen7.c \
> +	intel_renderstate_gen8.c \
> +	intel_null_state_gen.c
> +
> +gens := 6 7 8
> +
> +h = /tmp/intel_renderstate_gen$$gen.c
> +state_headers: intel_null_state_gen
> +	for gen in $(gens); do \
> +		./intel_null_state_gen $$gen >$(h) ;\
> +	done
> diff --git a/tools/null_state_gen/intel_batchbuffer.c b/tools/null_state_gen/intel_batchbuffer.c
> new file mode 100644
> index 0000000..62e052a
> --- /dev/null
> +++ b/tools/null_state_gen/intel_batchbuffer.c
> @@ -0,0 +1,173 @@
> +/**************************************************************************
> + *
> + * Copyright 2006 Tungsten Graphics, Inc., Cedar Park, Texas.
> + * All Rights Reserved.
> + *
> + * Copyright 2014 Intel Corporation
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> + * "Software"), to deal in the Software without restriction, including
> + * without limitation the rights to use, copy, modify, merge, publish,
> + * distribute, sub license, and/or sell copies of the Software, and to
> + * permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> + * next paragraph) shall be included in all copies or substantial portions
> + * of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
> + * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
> + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> + *
> + **************************************************************************/
> +
> +#include <stdio.h>
> +#include <string.h>
> +#include <errno.h>
> +
> +#include "intel_batchbuffer.h"
> +
> +int intel_batch_reset(struct intel_batchbuffer *batch,
> +		      void *p,
> +		      uint32_t size,
> +		      uint32_t off)
> +{
> +	batch->err = -EINVAL;
> +	batch->base = batch->base_ptr = p;
> +	batch->state_base = batch->state_ptr = p;
> +
> +	if (off >= size || ALIGN(off, 4) != off)
> +		return -EINVAL;
> +
> +	batch->size = size;
> +
> +	batch->state_base = batch->state_ptr = &batch->base[off];
> +
> +	batch->num_relocs = 0;
> +	batch->err = 0;
> +
> +	return batch->err;
> +}
> +
> +uint32_t intel_batch_state_used(struct intel_batchbuffer *batch)
> +{
> +	return batch->state_ptr - batch->state_base;
> +}
> +
> +uint32_t intel_batch_state_offset(struct intel_batchbuffer *batch)
> +{
> +	return batch->state_ptr - batch->base;
> +}
> +
> +void *intel_batch_state_alloc(struct intel_batchbuffer *batch,
> +			      uint32_t size,
> +			      uint32_t align)
> +{
> +	uint32_t cur;
> +	uint32_t offset;
> +
> +	if (batch->err)
> +		return NULL;
> +
> +	cur  = intel_batch_state_offset(batch);
> +	offset = ALIGN(cur, align);
> +
> +	if (offset + size > batch->size) {
> +		batch->err = -ENOSPC;
> +		return NULL;
> +	}
> +
> +	batch->state_ptr = batch->base + offset + size;
> +
> +	memset(batch->base + cur, 0, size);
> +
> +	return batch->base + offset;
> +}
> +
> +int intel_batch_offset(struct intel_batchbuffer *batch, const void *ptr)
> +{
> +	return (uint8_t *)ptr - batch->base;
> +}
> +
> +int intel_batch_state_copy(struct intel_batchbuffer *batch,
> +			   const void *ptr,
> +			   const uint32_t size,
> +			   const uint32_t align)
> +{
> +	void * const p = intel_batch_state_alloc(batch, size, align);
> +
> +	if (p == NULL)
> +		return -1;
> +
> +	return intel_batch_offset(batch, memcpy(p, ptr, size));
> +}
> +
> +uint32_t intel_batch_cmds_used(struct intel_batchbuffer *batch)
> +{
> +	return batch->base_ptr - batch->base;
> +}
> +
> +uint32_t intel_batch_total_used(struct intel_batchbuffer *batch)
> +{
> +	return batch->state_ptr - batch->base;
> +}
> +
> +static uint32_t intel_batch_space(struct intel_batchbuffer *batch)
> +{
> +	return batch->state_base - batch->base_ptr;
> +}
> +
> +int intel_batch_emit_dword(struct intel_batchbuffer *batch, uint32_t dword)
> +{
> +	uint32_t offset;
> +
> +	if (batch->err)
> +		return -1;
> +
> +	if (intel_batch_space(batch) < 4) {
> +		batch->err = -ENOSPC;
> +		return -1;
> +	}
> +
> +	offset = intel_batch_offset(batch, batch->base_ptr);
> +
> +	*(uint32_t *) (batch->base_ptr) = dword;
> +	batch->base_ptr += 4;
> +
> +	return offset;
> +}
> +
> +int intel_batch_emit_reloc(struct intel_batchbuffer *batch,
> +			   const uint32_t delta)
> +{
> +	uint32_t offset;
> +
> +	if (batch->err)
> +		return -1;
> +
> +	if (delta >= batch->size) {
> +		batch->err = -EINVAL;
> +		return -1;
> +	}
> +
> +	offset = intel_batch_emit_dword(batch, delta);
> +
> +	if (batch->err)
> +		return -1;
> +
> +	if (batch->num_relocs >= MAX_RELOCS) {
> +		batch->err = -ENOSPC;
> +		return -1;
> +	}
> +
> +	batch->relocs[batch->num_relocs++] = offset;
> +
> +	return offset;
> +}
> diff --git a/tools/null_state_gen/intel_batchbuffer.h b/tools/null_state_gen/intel_batchbuffer.h
> new file mode 100644
> index 0000000..f5c29db
> --- /dev/null
> +++ b/tools/null_state_gen/intel_batchbuffer.h
> @@ -0,0 +1,91 @@
> +/**************************************************************************
> + *
> + * Copyright 2006 Tungsten Graphics, Inc., Cedar Park, Texas.
> + * All Rights Reserved.
> + *
> + * Copyright 2014 Intel Corporation
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> + * "Software"), to deal in the Software without restriction, including
> + * without limitation the rights to use, copy, modify, merge, publish,
> + * distribute, sub license, and/or sell copies of the Software, and to
> + * permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> + * next paragraph) shall be included in all copies or substantial portions
> + * of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
> + * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
> + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> + *
> + **************************************************************************/
> +
> +#ifndef _INTEL_BATCHBUFFER_H
> +#define _INTEL_BATCHBUFFER_H
> +
> +#include <stdint.h>
> +
> +#define MAX_RELOCS 64
> +#define ALIGN(x, y) (((x) + (y)-1) & ~((y)-1))
> +
> +struct intel_batchbuffer {
> +	int err;
> +	uint8_t *base;
> +	uint8_t *base_ptr;
> +	uint8_t *state_base;
> +	uint8_t *state_ptr;
> +	int size;
> +
> +	uint32_t relocs[MAX_RELOCS];
> +	uint32_t num_relocs;
> +};
> +
> +#define OUT_BATCH(d) intel_batch_emit_dword(batch, d)
> +#define OUT_RELOC(batch, read_domains, write_domain, delta) \
> +	intel_batch_emit_reloc(batch, delta)
> +
> +int intel_batch_reset(struct intel_batchbuffer *batch,
> +		       void *p,
> +		       uint32_t size, uint32_t split_off);
> +
> +uint32_t intel_batch_state_used(struct intel_batchbuffer *batch);
> +
> +void *intel_batch_state_alloc(struct intel_batchbuffer *batch,
> +			      uint32_t size,
> +			      uint32_t align);
> +
> +int intel_batch_offset(struct intel_batchbuffer *batch, const void *ptr);
> +
> +int intel_batch_state_copy(struct intel_batchbuffer *batch,
> +			   const void *ptr,
> +			   const uint32_t size,
> +			   const uint32_t align);
> +
> +uint32_t intel_batch_cmds_used(struct intel_batchbuffer *batch);
> +
> +int intel_batch_emit_dword(struct intel_batchbuffer *batch, uint32_t dword);
> +
> +int intel_batch_emit_reloc(struct intel_batchbuffer *batch,
> +			   const uint32_t delta);
> +
> +uint32_t intel_batch_total_used(struct intel_batchbuffer *batch);
> +
> +static inline int intel_batch_error(struct intel_batchbuffer *batch)
> +{
> +	return batch->err;
> +}
> +
> +static inline uint32_t intel_batch_state_start(struct intel_batchbuffer *batch)
> +{
> +	return batch->state_base - batch->base;
> +}
> +
> +#endif
> diff --git a/tools/null_state_gen/intel_null_state_gen.c b/tools/null_state_gen/intel_null_state_gen.c
> new file mode 100644
> index 0000000..14f45d3
> --- /dev/null
> +++ b/tools/null_state_gen/intel_null_state_gen.c
> @@ -0,0 +1,151 @@
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <errno.h>
> +#include <assert.h>
> +
> +#include "intel_batchbuffer.h"
> +
> +#define STATE_ALIGN 64
> +
> +extern int gen6_setup_null_render_state(struct intel_batchbuffer *batch);
> +extern int gen7_setup_null_render_state(struct intel_batchbuffer *batch);
> +extern int gen8_setup_null_render_state(struct intel_batchbuffer *batch);
> +
> +static void print_usage(char *s)
> +{
> +	fprintf(stderr, "%s: <gen>\n"
> +		"     gen:     gen to generate for (6,7,8)\n",
> +	       s);
> +}
> +
> +static int is_reloc(struct intel_batchbuffer *batch, uint32_t offset)
> +{
> +	int i;
> +
> +	for (i = 0; i < batch->num_relocs; i++)
> +		if (batch->relocs[i] == offset)
> +			return 1;
> +
> +	return 0;
> +}
> +
> +static int print_state(int gen, struct intel_batchbuffer *batch)
> +{
> +	int i;
> +
> +	printf("#include \"intel_renderstate.h\"\n\n");
> +
> +	printf("static const u32 gen%d_null_state_relocs[] = {\n", gen);
> +	for (i = 0; i < batch->num_relocs; i++) {
> +		printf("\t0x%08x,\n", batch->relocs[i]);
> +	}
> +	printf("};\n\n");
> +
> +	printf("static const u32 gen%d_null_state_batch[] = {\n", gen);
> +	for (i = 0; i < batch->size; i += 4) {
> +		const uint32_t *p = (void *)batch->base + i;
> +		printf("\t0x%08x,", *p);
> +
> +		if (i == intel_batch_cmds_used(batch) - 4)
> +			printf("\t /* cmds end */");
> +
> +		if (i == intel_batch_state_start(batch))
> +			printf("\t /* state start */");
> +
> +
> +		if (i == intel_batch_state_start(batch) +
> +		    intel_batch_state_used(batch) - 4)
> +			printf("\t /* state end */");
> +
> +		if (is_reloc(batch, i))
> +			printf("\t /* reloc */");
> +
> +		printf("\n");
> +	}
> +	printf("};\n\nRO_RENDERSTATE(%d);\n", gen);
> +
> +	return 0;
> +}
> +
> +static int do_generate(int gen)
> +{
> +	int initial_size = 8192;
> +	struct intel_batchbuffer batch;
> +	void *p;
> +	int ret = -EINVAL;
> +	uint32_t cmd_len, state_len, size;
> +	int (*null_state_gen)(struct intel_batchbuffer *batch) = NULL;
> +
> +	p = malloc(initial_size);
> +	if (p == NULL)
> +		return -ENOMEM;
> +
> +	assert(ALIGN(initial_size/2, STATE_ALIGN) == initial_size/2);
> +
> +	ret = intel_batch_reset(&batch, p, initial_size, initial_size/2);
> +	if (ret)
> +		goto out;
> +
> +	switch (gen) {
> +	case 6:
> +		null_state_gen = gen6_setup_null_render_state;
> +		break;
> +
> +	case 7:
> +		null_state_gen = gen7_setup_null_render_state;
> +		break;
> +
> +	case 8:
> +		null_state_gen = gen8_setup_null_render_state;
> +		break;
> +	}
> +
> +	if (null_state_gen == NULL) {
> +		printf("no generator found for %d\n", gen);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = null_state_gen(&batch);
> +	if (ret < 0)
> +		goto out;
> +
> +	cmd_len = intel_batch_cmds_used(&batch);
> +	state_len = intel_batch_state_used(&batch);
> +
> +	size = cmd_len + state_len + ALIGN(cmd_len, STATE_ALIGN) - cmd_len;
> +
> +	ret = intel_batch_reset(&batch, p, size, ALIGN(cmd_len, STATE_ALIGN));
> +	if (ret)
> +		goto out;
> +
> +	ret = null_state_gen(&batch);
> +	if (ret < 0)
> +		goto out;
> +
> +	assert(cmd_len == intel_batch_cmds_used(&batch));
> +	assert(state_len == intel_batch_state_used(&batch));
> +	assert(size == ret);
> +
> +	/* Batch buffer needs to end */
> +	assert(*(uint32_t *)(p + cmd_len - 4) == (0xA << 23));
> +
> +	ret = print_state(gen, &batch);
> +out:
> +	free(p);
> +
> +	if (ret < 0)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	if (argc != 2) {
> +		print_usage(argv[0]);
> +		return 1;
> +	}
> +
> +	return do_generate(atoi(argv[1]));
> +}
> diff --git a/tools/null_state_gen/intel_renderstate_gen7.c b/tools/null_state_gen/intel_renderstate_gen7.c
> new file mode 100644
> index 0000000..8fe8a80
> --- /dev/null
> +++ b/tools/null_state_gen/intel_renderstate_gen7.c
> @@ -0,0 +1,505 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +
> +#include "intel_batchbuffer.h"
> +#include <lib/gen7_render.h>
> +#include <lib/intel_reg.h>
> +#include <stdio.h>
> +
> +static const uint32_t ps_kernel[][4] = {
> +	{ 0x0080005a, 0x2e2077bd, 0x000000c0, 0x008d0040 },
> +	{ 0x0080005a, 0x2e6077bd, 0x000000d0, 0x008d0040 },
> +	{ 0x02800031, 0x21801fa9, 0x008d0e20, 0x08840001 },
> +	{ 0x00800001, 0x2e2003bd, 0x008d0180, 0x00000000 },
> +	{ 0x00800001, 0x2e6003bd, 0x008d01c0, 0x00000000 },
> +	{ 0x00800001, 0x2ea003bd, 0x008d0200, 0x00000000 },
> +	{ 0x00800001, 0x2ee003bd, 0x008d0240, 0x00000000 },
> +	{ 0x05800031, 0x20001fa8, 0x008d0e20, 0x90031000 },
> +};
> +
> +static uint32_t
> +gen7_bind_buf_null(struct intel_batchbuffer *batch)
> +{
> +	uint32_t *ss;
> +
> +	ss = intel_batch_state_alloc(batch, 8 * sizeof(*ss), 32);
> +	if (ss == NULL)
> +		return -1;
> +
> +	ss[0] = 0;
> +	ss[1] = 0;
> +	ss[2] = 0;
> +	ss[3] = 0;
> +	ss[4] = 0;
> +	ss[5] = 0;
> +	ss[6] = 0;
> +	ss[7] = 0;
> +
> +	return intel_batch_offset(batch, ss);
> +}
> +
> +static void
> +gen7_emit_vertex_elements(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_VERTEX_ELEMENTS |
> +		  ((2 * (1 + 2)) + 1 - 2));
> +
> +	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
> +		  GEN7_SURFACEFORMAT_R32G32B32A32_FLOAT <<
> +		  GEN7_VE0_FORMAT_SHIFT |
> +		  0 << GEN7_VE0_OFFSET_SHIFT);
> +
> +	OUT_BATCH(GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_0_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_1_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_3_SHIFT);
> +
> +	/* x,y */
> +	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
> +		  GEN7_SURFACEFORMAT_R16G16_SSCALED << GEN7_VE0_FORMAT_SHIFT |
> +		  0 << GEN7_VE0_OFFSET_SHIFT); /* offsets vb in bytes */
> +	OUT_BATCH(GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_0_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_1_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_1_FLT << GEN7_VE1_VFCOMPONENT_3_SHIFT);
> +
> +	/* s,t */
> +	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
> +		  GEN7_SURFACEFORMAT_R16G16_SSCALED << GEN7_VE0_FORMAT_SHIFT |
> +		  4 << GEN7_VE0_OFFSET_SHIFT);  /* offset vb in bytes */
> +	OUT_BATCH(GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_0_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_1_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
> +		  GEN7_VFCOMPONENT_STORE_1_FLT << GEN7_VE1_VFCOMPONENT_3_SHIFT);
> +}
> +
> +static uint32_t
> +gen7_create_vertex_buffer(struct intel_batchbuffer *batch)
> +{
> +	uint16_t *v;
> +
> +	v = intel_batch_state_alloc(batch, 12*sizeof(*v), 8);
> +	if (v == NULL)
> +		return -1;
> +
> +	v[0] = 0;
> +	v[1] = 0;
> +	v[2] = 0;
> +	v[3] = 0;
> +
> +	v[4] = 0;
> +	v[5] = 0;
> +	v[6] = 0;
> +	v[7] = 0;
> +
> +	v[8] = 0;
> +	v[9] = 0;
> +	v[10] = 0;
> +	v[11] = 0;
> +
> +	return intel_batch_offset(batch, v);
> +}
> +
> +static void gen7_emit_vertex_buffer(struct intel_batchbuffer *batch)
> +{
> +	uint32_t offset;
> +
> +	offset = gen7_create_vertex_buffer(batch);
> +
> +	OUT_BATCH(GEN7_3DSTATE_VERTEX_BUFFERS | (5 - 2));
> +	OUT_BATCH(0 << GEN7_VB0_BUFFER_INDEX_SHIFT |
> +		  GEN7_VB0_VERTEXDATA |
> +		  GEN7_VB0_ADDRESS_MODIFY_ENABLE |
> +		  GEN7_VB0_NULL_VERTEX_BUFFER |
> +		  4*2 << GEN7_VB0_BUFFER_PITCH_SHIFT);
> +
> +	OUT_RELOC(batch, I915_GEM_DOMAIN_VERTEX, 0, offset);
> +	OUT_BATCH(~0);
> +	OUT_BATCH(0);
> +}
> +
> +static uint32_t
> +gen7_bind_surfaces(struct intel_batchbuffer *batch)
> +{
> +	uint32_t *binding_table;
> +
> +	binding_table = intel_batch_state_alloc(batch, 8, 32);
> +	if (binding_table == NULL)
> +		return -1;
> +
> +	binding_table[0] = gen7_bind_buf_null(batch);
> +	binding_table[1] = gen7_bind_buf_null(batch);
> +
> +	return intel_batch_offset(batch, binding_table);
> +}
> +
> +static void
> +gen7_emit_binding_table(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS | (2 - 2));
> +	OUT_BATCH(gen7_bind_surfaces(batch));
> +}
> +
> +static void
> +gen7_emit_drawing_rectangle(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_DRAWING_RECTANGLE | (4 - 2));
> +	/* Purposedly set min > max for null rectangle */
> +	OUT_BATCH(0xffffffff);
> +	OUT_BATCH(0 | 0);
> +	OUT_BATCH(0);
> +}
> +
> +static uint32_t
> +gen7_create_blend_state(struct intel_batchbuffer *batch)
> +{
> +	struct gen7_blend_state *blend;
> +
> +	blend = intel_batch_state_alloc(batch, sizeof(*blend), 64);
> +	if (blend == NULL)
> +		return -1;
> +
> +	blend->blend0.dest_blend_factor = GEN7_BLENDFACTOR_ZERO;
> +	blend->blend0.source_blend_factor = GEN7_BLENDFACTOR_ONE;
> +	blend->blend0.blend_func = GEN7_BLENDFUNCTION_ADD;
> +	blend->blend1.post_blend_clamp_enable = 1;
> +	blend->blend1.pre_blend_clamp_enable = 1;
> +
> +	return intel_batch_offset(batch, blend);
> +}
> +
> +static void
> +gen7_emit_state_base_address(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_STATE_BASE_ADDRESS | (10 - 2));
> +	OUT_BATCH(0);
> +	OUT_RELOC(batch, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
> +	OUT_RELOC(batch, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
> +	OUT_BATCH(0);
> +	OUT_RELOC(batch, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
> +
> +	OUT_BATCH(0);
> +	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
> +}
> +
> +static uint32_t
> +gen7_create_cc_viewport(struct intel_batchbuffer *batch)
> +{
> +	struct gen7_cc_viewport *vp;
> +
> +	vp = intel_batch_state_alloc(batch, sizeof(*vp), 32);
> +	if (vp == NULL)
> +		return -1;
> +
> +	vp->min_depth = -1.e35;
> +	vp->max_depth = 1.e35;
> +
> +	return intel_batch_offset(batch, vp);
> +}
> +
> +static void
> +gen7_emit_cc(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_BLEND_STATE_POINTERS | (2 - 2));
> +	OUT_BATCH(gen7_create_blend_state(batch));
> +
> +	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_CC | (2 - 2));
> +	OUT_BATCH(gen7_create_cc_viewport(batch));
> +}
> +
> +static uint32_t
> +gen7_create_sampler(struct intel_batchbuffer *batch)
> +{
> +	struct gen7_sampler_state *ss;
> +
> +	ss = intel_batch_state_alloc(batch, sizeof(*ss), 32);
> +	if (ss == NULL)
> +		return -1;
> +
> +	ss->ss0.min_filter = GEN7_MAPFILTER_NEAREST;
> +	ss->ss0.mag_filter = GEN7_MAPFILTER_NEAREST;
> +
> +	ss->ss3.r_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
> +	ss->ss3.s_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
> +	ss->ss3.t_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
> +
> +	ss->ss3.non_normalized_coord = 1;
> +
> +	return intel_batch_offset(batch, ss);
> +}
> +
> +static void
> +gen7_emit_sampler(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_PS | (2 - 2));
> +	OUT_BATCH(gen7_create_sampler(batch));
> +}
> +
> +static void
> +gen7_emit_multisample(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_MULTISAMPLE | (4 - 2));
> +	OUT_BATCH(GEN7_3DSTATE_MULTISAMPLE_PIXEL_LOCATION_CENTER |
> +		  GEN7_3DSTATE_MULTISAMPLE_NUMSAMPLES_1); /* 1 sample/pixel */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_SAMPLE_MASK | (2 - 2));
> +	OUT_BATCH(1);
> +}
> +
> +static void
> +gen7_emit_urb(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_PS | (2 - 2));
> +	OUT_BATCH(8); /* in 1KBs */
> +
> +	/* num of VS entries must be divisible by 8 if size < 9 */
> +	OUT_BATCH(GEN7_3DSTATE_URB_VS | (2 - 2));
> +	OUT_BATCH((64 << GEN7_URB_ENTRY_NUMBER_SHIFT) |
> +		  (2 - 1) << GEN7_URB_ENTRY_SIZE_SHIFT |
> +		  (1 << GEN7_URB_STARTING_ADDRESS_SHIFT));
> +
> +	OUT_BATCH(GEN7_3DSTATE_URB_HS | (2 - 2));
> +	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
> +		  (2 << GEN7_URB_STARTING_ADDRESS_SHIFT));
> +
> +	OUT_BATCH(GEN7_3DSTATE_URB_DS | (2 - 2));
> +	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
> +		  (2 << GEN7_URB_STARTING_ADDRESS_SHIFT));
> +
> +	OUT_BATCH(GEN7_3DSTATE_URB_GS | (2 - 2));
> +	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
> +		  (1 << GEN7_URB_STARTING_ADDRESS_SHIFT));
> +}
> +
> +static void
> +gen7_emit_vs(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_VS | (6 - 2));
> +	OUT_BATCH(0); /* no VS kernel */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0); /* pass-through */
> +}
> +
> +static void
> +gen7_emit_hs(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_HS | (7 - 2));
> +	OUT_BATCH(0); /* no HS kernel */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0); /* pass-through */
> +}
> +
> +static void
> +gen7_emit_te(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_TE | (4 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_ds(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_DS | (6 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_gs(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_GS | (7 - 2));
> +	OUT_BATCH(0); /* no GS kernel */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0); /* pass-through  */
> +}
> +
> +static void
> +gen7_emit_streamout(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_STREAMOUT | (3 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_sf(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_SF | (7 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(GEN7_3DSTATE_SF_CULL_NONE);
> +	OUT_BATCH(2 << GEN7_3DSTATE_SF_TRIFAN_PROVOKE_SHIFT);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_sbe(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_SBE | (14 - 2));
> +	OUT_BATCH(1 << GEN7_SBE_NUM_OUTPUTS_SHIFT |
> +		  1 << GEN7_SBE_URB_ENTRY_READ_LENGTH_SHIFT |
> +		  1 << GEN7_SBE_URB_ENTRY_READ_OFFSET_SHIFT);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0); /* dw4 */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0); /* dw8 */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0); /* dw12 */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_ps(struct intel_batchbuffer *batch)
> +{
> +	int threads;
> +
> +#if 0 /* XXX: Do we need separate state for hsw or not */
> +	if (IS_HASWELL(batch->dev))
> +		threads = 40 << HSW_PS_MAX_THREADS_SHIFT |
> +			1 << HSW_PS_SAMPLE_MASK_SHIFT;
> +	else
> +#endif
> +		threads = 40 << IVB_PS_MAX_THREADS_SHIFT;
> +
> +	OUT_BATCH(GEN7_3DSTATE_PS | (8 - 2));
> +	OUT_BATCH(intel_batch_state_copy(batch, ps_kernel,
> +					 sizeof(ps_kernel), 64));
> +	OUT_BATCH(1 << GEN7_PS_SAMPLER_COUNT_SHIFT |
> +		  2 << GEN7_PS_BINDING_TABLE_ENTRY_COUNT_SHIFT);
> +	OUT_BATCH(0); /* scratch address */
> +	OUT_BATCH(threads |
> +		  GEN7_PS_16_DISPATCH_ENABLE |
> +		  GEN7_PS_ATTRIBUTE_ENABLE);
> +	OUT_BATCH(6 << GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_clip(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_CLIP | (4 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0); /* pass-through */
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CL | (2 - 2));
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_wm(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_WM | (3 - 2));
> +	OUT_BATCH(GEN7_WM_DISPATCH_ENABLE |
> +		  GEN7_WM_PERSPECTIVE_PIXEL_BARYCENTRIC);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_null_depth_buffer(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN7_3DSTATE_DEPTH_BUFFER | (7 - 2));
> +	OUT_BATCH(GEN7_SURFACE_NULL << GEN7_3DSTATE_DEPTH_BUFFER_TYPE_SHIFT |
> +		  GEN7_DEPTHFORMAT_D32_FLOAT <<
> +		  GEN7_3DSTATE_DEPTH_BUFFER_FORMAT_SHIFT);
> +	OUT_BATCH(0); /* disable depth, stencil and hiz */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_CLEAR_PARAMS | (3 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +int gen7_setup_null_render_state(struct intel_batchbuffer *batch)
> +{
> +	int ret;
> +
> +	OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_3D);
> +
> +	gen7_emit_state_base_address(batch);
> +	gen7_emit_multisample(batch);
> +	gen7_emit_urb(batch);
> +	gen7_emit_vs(batch);
> +	gen7_emit_hs(batch);
> +	gen7_emit_te(batch);
> +	gen7_emit_ds(batch);
> +	gen7_emit_gs(batch);
> +	gen7_emit_clip(batch);
> +	gen7_emit_sf(batch);
> +	gen7_emit_wm(batch);
> +	gen7_emit_streamout(batch);
> +	gen7_emit_null_depth_buffer(batch);
> +
> +	gen7_emit_cc(batch);
> +	gen7_emit_sampler(batch);
> +	gen7_emit_sbe(batch);
> +	gen7_emit_ps(batch);
> +	gen7_emit_vertex_elements(batch);
> +	gen7_emit_vertex_buffer(batch);
> +	gen7_emit_binding_table(batch);
> +	gen7_emit_drawing_rectangle(batch);
> +
> +	OUT_BATCH(GEN7_3DPRIMITIVE | (7 - 2));
> +	OUT_BATCH(GEN7_3DPRIMITIVE_VERTEX_SEQUENTIAL | _3DPRIM_RECTLIST);
> +	OUT_BATCH(3);
> +	OUT_BATCH(0);
> +	OUT_BATCH(1);   /* single instance */
> +	OUT_BATCH(0);   /* start instance location */
> +	OUT_BATCH(0);   /* index buffer offset, ignored */
> +
> +	OUT_BATCH(MI_BATCH_BUFFER_END);
> +
> +	ret = intel_batch_error(batch);
> +	if (ret == 0)
> +		ret = intel_batch_total_used(batch);
> +
> +	return ret;
> +}
> diff --git a/tools/null_state_gen/intel_renderstate_gen8.c b/tools/null_state_gen/intel_renderstate_gen8.c
> new file mode 100644
> index 0000000..7e22b24
> --- /dev/null
> +++ b/tools/null_state_gen/intel_renderstate_gen8.c
> @@ -0,0 +1,764 @@
> +#include "intel_batchbuffer.h"
> +#include <lib/gen8_render.h>
> +#include <lib/intel_reg.h>
> +#include <string.h>
> +
> +struct {
> +	uint32_t cc_state;
> +	uint32_t blend_state;
> +} cc;
> +
> +struct {
> +	uint32_t cc_state;
> +	uint32_t sf_clip_state;
> +} viewport;
> +
> +/* see shaders/ps/blit.g7a */
> +static const uint32_t ps_kernel[][4] = {
> +#if 1
> +   { 0x0060005a, 0x21403ae8, 0x3a0000c0, 0x008d0040 },
> +   { 0x0060005a, 0x21603ae8, 0x3a0000c0, 0x008d0080 },
> +   { 0x0060005a, 0x21803ae8, 0x3a0000d0, 0x008d0040 },
> +   { 0x0060005a, 0x21a03ae8, 0x3a0000d0, 0x008d0080 },
> +   { 0x02800031, 0x2e0022e8, 0x0e000140, 0x08840001 },
> +   { 0x05800031, 0x200022e0, 0x0e000e00, 0x90031000 },
> +#else
> +   /* Write all -1 */
> +   { 0x00600001, 0x2e000608, 0x00000000, 0x3f800000 },
> +   { 0x00600001, 0x2e200608, 0x00000000, 0x3f800000 },
> +   { 0x00600001, 0x2e400608, 0x00000000, 0x3f800000 },
> +   { 0x00600001, 0x2e600608, 0x00000000, 0x3f800000 },
> +   { 0x00600001, 0x2e800608, 0x00000000, 0x3f800000 },
> +   { 0x00600001, 0x2ea00608, 0x00000000, 0x3f800000 },
> +   { 0x00600001, 0x2ec00608, 0x00000000, 0x3f800000 },
> +   { 0x00600001, 0x2ee00608, 0x00000000, 0x3f800000 },
> +   { 0x05800031, 0x200022e0, 0x0e000e00, 0x90031000 },
> +#endif
> +};
> +
> +static uint32_t
> +gen8_bind_buf_null(struct intel_batchbuffer *batch)
> +{
> +	struct gen8_surface_state *ss;
> +
> +	ss = intel_batch_state_alloc(batch, sizeof(*ss), 64);
> +	if (ss == NULL)
> +		return -1;
> +
> +	memset(ss, 0, sizeof(*ss));
> +
> +	return intel_batch_offset(batch, ss);
> +}
> +
> +static uint32_t
> +gen8_bind_surfaces(struct intel_batchbuffer *batch)
> +{
> +	uint32_t *binding_table, offset;
> +
> +	binding_table = intel_batch_state_alloc(batch, 8, 32);
> +	if (binding_table == NULL)
> +		return -1;
> +
> +	offset = intel_batch_offset(batch, binding_table);
> +
> +	binding_table[0] =
> +		gen8_bind_buf_null(batch);
> +	binding_table[1] =
> +		gen8_bind_buf_null(batch);
> +
> +	return offset;
> +}
> +
> +/* Mostly copy+paste from gen6, except wrap modes moved */
> +static uint32_t
> +gen8_create_sampler(struct intel_batchbuffer *batch) {
> +	struct gen8_sampler_state *ss;
> +	uint32_t offset;
> +
> +	ss = intel_batch_state_alloc(batch, sizeof(*ss), 64);
> +	if (ss == NULL)
> +		return -1;
> +
> +	offset = intel_batch_offset(batch, ss);
> +
> +	ss->ss0.min_filter = GEN6_MAPFILTER_NEAREST;
> +	ss->ss0.mag_filter = GEN6_MAPFILTER_NEAREST;
> +	ss->ss3.r_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
> +	ss->ss3.s_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
> +	ss->ss3.t_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
> +
> +	/* I've experimented with non-normalized coordinates and using the LD
> +	 * sampler fetch, but couldn't make it work. */
> +	ss->ss3.non_normalized_coord = 0;
> +
> +	return offset;
> +}
> +
> +static uint32_t
> +gen8_fill_ps(struct intel_batchbuffer *batch,
> +	     const uint32_t kernel[][4],
> +	     size_t size)
> +{
> +	return intel_batch_state_copy(batch, kernel, size, 64);
> +}
> +
> +/**
> + * gen7_fill_vertex_buffer_data populate vertex buffer with data.
> + *
> + * The vertex buffer consists of 3 vertices to construct a RECTLIST. The 4th
> + * vertex is implied (automatically derived by the HW). Each element has the
> + * destination offset, and the normalized texture offset (src). The rectangle
> + * itself will span the entire subsurface to be copied.
> + *
> + * see gen6_emit_vertex_elements
> + */
> +static uint32_t
> +gen7_fill_vertex_buffer_data(struct intel_batchbuffer *batch)
> +{
> +	uint16_t *start;
> +
> +	start = intel_batch_state_alloc(batch, 2 * sizeof(*start), 8);
> +	if (start == NULL)
> +		return -1;
> +
> +	start[0] = 0;
> +	start[1] = 0;
> +
> +	return intel_batch_offset(batch, start);
> +}
> +
> +/**
> + * gen6_emit_vertex_elements - The vertex elements describe the contents of the
> + * vertex buffer. We pack the vertex buffer in a semi weird way, conforming to
> + * what gen6_rendercopy did. The most straightforward would be to store
> + * everything as floats.
> + *
> + * see gen7_fill_vertex_buffer_data() for where the corresponding elements are
> + * packed.
> + */
> +static void
> +gen6_emit_vertex_elements(struct intel_batchbuffer *batch) {
> +	/*
> +	 * The VUE layout
> +	 *    dword 0-3: pad (0, 0, 0, 0)
> +	 *    dword 4-7: position (x, y, 0, 1.0),
> +	 *    dword 8-11: texture coordinate 0 (u0, v0, 0, 1.0)
> +	 */
> +	OUT_BATCH(GEN6_3DSTATE_VERTEX_ELEMENTS | (3 * 2 + 1 - 2));
> +
> +	/* Element state 0. These are 4 dwords of 0 required for the VUE format.
> +	 * We don't really know or care what they do.
> +	 */
> +	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
> +		  GEN6_SURFACEFORMAT_R32G32B32A32_FLOAT << VE0_FORMAT_SHIFT |
> +		  0 << VE0_OFFSET_SHIFT); /* we specify 0, but it really does not exist */
> +	OUT_BATCH(GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_0_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_1_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_3_SHIFT);
> +
> +	/* Element state 1 - Our "destination" vertices. These are passed down
> +	 * through the pipeline, and eventually make it to the pixel shader as
> +	 * the offsets in the destination surface. It's packed as 16-bit
> +	 * signed/scaled because of gen6 rendercopy. I see no particular reason
> +	 * for doing this though.
> +	 */
> +	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
> +		  GEN6_SURFACEFORMAT_R16G16_SSCALED << VE0_FORMAT_SHIFT |
> +		  0 << VE0_OFFSET_SHIFT); /* offsets vb in bytes */
> +	OUT_BATCH(GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_0_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_1_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_1_FLT << VE1_VFCOMPONENT_3_SHIFT);
> +
> +	/* Element state 2. Last but not least we store the U,V components as
> +	 * normalized floats. These will be used in the pixel shader to sample
> +	 * from the source buffer.
> +	 */
> +	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
> +		  GEN6_SURFACEFORMAT_R32G32_FLOAT << VE0_FORMAT_SHIFT |
> +		  4 << VE0_OFFSET_SHIFT);	/* offset vb in bytes */
> +	OUT_BATCH(GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_0_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_1_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
> +		  GEN6_VFCOMPONENT_STORE_1_FLT << VE1_VFCOMPONENT_3_SHIFT);
> +}
> +
> +/**
> + * gen7_emit_vertex_buffer emit the vertex buffers command
> + *
> + * @batch
> + * @offset - byte offset within the @batch where the vertex buffer starts.
> + */
> +static void gen7_emit_vertex_buffer(struct intel_batchbuffer *batch,
> +				    uint32_t offset) {
> +	OUT_BATCH(GEN6_3DSTATE_VERTEX_BUFFERS | (1 + (4 * 1) - 2));
> +	OUT_BATCH(0 << VB0_BUFFER_INDEX_SHIFT | /* VB 0th index */
> +		  GEN7_VB0_BUFFER_ADDR_MOD_EN | /* Address Modify Enable */
> +		  VB0_NULL_VERTEX_BUFFER |
> +		  0 << VB0_BUFFER_PITCH_SHIFT);
> +	OUT_RELOC(batch, I915_GEM_DOMAIN_VERTEX, 0, offset);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static uint32_t
> +gen6_create_cc_state(struct intel_batchbuffer *batch)
> +{
> +	struct gen6_color_calc_state *cc_state;
> +	uint32_t offset;
> +
> +	cc_state = intel_batch_state_alloc(batch, sizeof(*cc_state), 64);
> +	if (cc_state == NULL)
> +		return -1;
> +
> +	offset = intel_batch_offset(batch, cc_state);
> +
> +	return offset;
> +}
> +
> +static uint32_t
> +gen8_create_blend_state(struct intel_batchbuffer *batch)
> +{
> +	struct gen8_blend_state *blend;
> +	int i;
> +	uint32_t offset;
> +
> +	blend = intel_batch_state_alloc(batch, sizeof(*blend), 64);
> +	if (blend == NULL)
> +		return -1;
> +
> +	offset = intel_batch_offset(batch, blend);
> +
> +	for (i = 0; i < 16; i++) {
> +		blend->bs[i].dest_blend_factor = GEN6_BLENDFACTOR_ZERO;
> +		blend->bs[i].source_blend_factor = GEN6_BLENDFACTOR_ONE;
> +		blend->bs[i].color_blend_func = GEN6_BLENDFUNCTION_ADD;
> +		blend->bs[i].pre_blend_color_clamp = 1;
> +		blend->bs[i].color_buffer_blend = 0;
> +	}
> +
> +	return offset;
> +}
> +
> +static uint32_t
> +gen6_create_cc_viewport(struct intel_batchbuffer *batch)
> +{
> +	struct gen6_cc_viewport *vp;
> +	uint32_t offset;
> +
> +	vp = intel_batch_state_alloc(batch, sizeof(*vp), 32);
> +	if (vp == NULL)
> +		return -1;
> +
> +	offset = intel_batch_offset(batch, vp);
> +
> +	/* XXX I don't understand this */
> +	vp->min_depth = -1.e35;
> +	vp->max_depth = 1.e35;
> +
> +	return offset;
> +}
> +
> +static uint32_t
> +gen7_create_sf_clip_viewport(struct intel_batchbuffer *batch) {
> +	/* XXX these are likely not needed */
> +	struct gen7_sf_clip_viewport *scv_state;
> +	uint32_t offset;
> +
> +	scv_state = intel_batch_state_alloc(batch, sizeof(*scv_state), 64);
> +	if (scv_state == NULL)
> +		return -1;
> +
> +	offset = intel_batch_offset(batch, scv_state);
> +
> +	scv_state->guardband.xmin = 0;
> +	scv_state->guardband.xmax = 1.0f;
> +	scv_state->guardband.ymin = 0;
> +	scv_state->guardband.ymax = 1.0f;
> +
> +	return offset;
> +}
> +
> +static uint32_t
> +gen6_create_scissor_rect(struct intel_batchbuffer *batch)
> +{
> +	struct gen6_scissor_rect *scissor;
> +	uint32_t offset;
> +
> +	scissor = intel_batch_state_alloc(batch, sizeof(*scissor), 64);
> +	if (scissor == NULL)
> +		return -1;
> +
> +	offset = intel_batch_offset(batch, scissor);
> +
> +	return offset;
> +}
> +
> +static void
> +gen8_emit_sip(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN6_STATE_SIP | (3 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_push_constants(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_VS);
> +	OUT_BATCH(0);
> +	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_HS);
> +	OUT_BATCH(0);
> +	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_DS);
> +	OUT_BATCH(0);
> +	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_GS);
> +	OUT_BATCH(0);
> +	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_PS);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen8_emit_state_base_address(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN6_STATE_BASE_ADDRESS | (16 - 2));
> +
> +	/* general */
> +	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
> +	OUT_BATCH(0);
> +
> +	/* stateless data port */
> +	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
> +
> +	/* surface */
> +	OUT_RELOC(batch, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
> +	OUT_BATCH(0);
> +
> +	/* dynamic */
> +	OUT_RELOC(batch, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
> +		  0, BASE_ADDRESS_MODIFY);
> +	OUT_BATCH(0);
> +
> +	/* indirect */
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	/* instruction */
> +	OUT_RELOC(batch, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
> +	OUT_BATCH(0);
> +
> +	/* general state buffer size */
> +	OUT_BATCH(0xfffff000 | 1);
> +	/* dynamic state buffer size */
> +	OUT_BATCH(1 << 12 | 1);
> +	/* indirect object buffer size */
> +	OUT_BATCH(0xfffff000 | 1);
> +	/* instruction buffer size */
> +	OUT_BATCH(1 << 12 | 1);
> +}
> +
> +static void
> +gen7_emit_urb(struct intel_batchbuffer *batch) {
> +	/* XXX: Min valid values from mesa */
> +	const int vs_entries = 64;
> +	const int vs_size = 2;
> +	const int vs_start = 2;
> +
> +	OUT_BATCH(GEN7_3DSTATE_URB_VS);
> +	OUT_BATCH(vs_entries | ((vs_size - 1) << 16) | (vs_start << 25));
> +	OUT_BATCH(GEN7_3DSTATE_URB_GS);
> +	OUT_BATCH(vs_start << 25);
> +	OUT_BATCH(GEN7_3DSTATE_URB_HS);
> +	OUT_BATCH(vs_start << 25);
> +	OUT_BATCH(GEN7_3DSTATE_URB_DS);
> +	OUT_BATCH(vs_start << 25);
> +}
> +
> +static void
> +gen8_emit_cc(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN7_3DSTATE_BLEND_STATE_POINTERS);
> +	OUT_BATCH(cc.blend_state | 1);
> +
> +	OUT_BATCH(GEN6_3DSTATE_CC_STATE_POINTERS);
> +	OUT_BATCH(cc.cc_state | 1);
> +}
> +
> +static void
> +gen8_emit_multisample(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN8_3DSTATE_MULTISAMPLE);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN6_3DSTATE_SAMPLE_MASK);
> +	OUT_BATCH(1);
> +}
> +
> +static void
> +gen8_emit_vs(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_VS);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_VS);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN6_3DSTATE_CONSTANT_VS | (11 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN6_3DSTATE_VS | (9-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen8_emit_hs(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN7_3DSTATE_CONSTANT_HS | (11 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_HS | (9-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_HS);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_HS);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen8_emit_gs(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN7_3DSTATE_CONSTANT_GS | (11 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_GS | (10-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_GS);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_GS);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen8_emit_ds(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN7_3DSTATE_CONSTANT_DS | (11 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_DS | (9-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_DS);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_DS);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen8_emit_wm_hz_op(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN8_3DSTATE_WM_HZ_OP | (5-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen8_emit_null_state(struct intel_batchbuffer *batch) {
> +	gen8_emit_wm_hz_op(batch);
> +	gen8_emit_hs(batch);
> +	OUT_BATCH(GEN7_3DSTATE_TE | (4-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	gen8_emit_gs(batch);
> +	gen8_emit_ds(batch);
> +	gen8_emit_vs(batch);
> +}
> +
> +static void
> +gen7_emit_clip(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN6_3DSTATE_CLIP | (4 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0); /*  pass-through */
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen8_emit_sf(struct intel_batchbuffer *batch)
> +{
> +	int i;
> +
> +	OUT_BATCH(GEN7_3DSTATE_SBE | (4 - 2));
> +	OUT_BATCH(1 << GEN7_SBE_NUM_OUTPUTS_SHIFT |
> +		  GEN8_SBE_FORCE_URB_ENTRY_READ_LENGTH |
> +		  GEN8_SBE_FORCE_URB_ENTRY_READ_OFFSET |
> +		  1 << GEN7_SBE_URB_ENTRY_READ_LENGTH_SHIFT |
> +		  1 << GEN8_SBE_URB_ENTRY_READ_OFFSET_SHIFT);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN8_3DSTATE_SBE_SWIZ | (11 - 2));
> +	for (i = 0; i < 8; i++)
> +		OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN8_3DSTATE_RASTER | (5 - 2));
> +	OUT_BATCH(GEN8_RASTER_FRONT_WINDING_CCW | GEN8_RASTER_CULL_NONE);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN6_3DSTATE_SF | (4 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen8_emit_ps(struct intel_batchbuffer *batch, uint32_t kernel) {
> +	const int max_threads = 63;
> +
> +	OUT_BATCH(GEN6_3DSTATE_WM | (2 - 2));
> +	OUT_BATCH(/* XXX: I don't understand the BARYCENTRIC stuff, but it
> +		   * appears we need it to put our setup data in the place we
> +		   * expect (g6, see below) */
> +		  GEN7_3DSTATE_PS_PERSPECTIVE_PIXEL_BARYCENTRIC);
> +
> +	OUT_BATCH(GEN6_3DSTATE_CONSTANT_PS | (11-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_PS | (12-2));
> +	OUT_BATCH(kernel);
> +	OUT_BATCH(0); /* kernel hi */
> +	OUT_BATCH(1 << GEN6_3DSTATE_WM_SAMPLER_COUNT_SHITF |
> +		  2 << GEN6_3DSTATE_WM_BINDING_TABLE_ENTRY_COUNT_SHIFT);
> +	OUT_BATCH(0); /* scratch space stuff */
> +	OUT_BATCH(0); /* scratch hi */
> +	OUT_BATCH((max_threads - 1) << GEN8_3DSTATE_PS_MAX_THREADS_SHIFT |
> +		  GEN6_3DSTATE_WM_16_DISPATCH_ENABLE);
> +	OUT_BATCH(6 << GEN6_3DSTATE_WM_DISPATCH_START_GRF_0_SHIFT);
> +	OUT_BATCH(0); /* kernel 1 */
> +	OUT_BATCH(0); /* kernel 1 hi */
> +	OUT_BATCH(0); /* kernel 2 */
> +	OUT_BATCH(0); /* kernel 2 hi */
> +
> +	OUT_BATCH(GEN8_3DSTATE_PS_BLEND | (2 - 2));
> +	OUT_BATCH(GEN8_PS_BLEND_HAS_WRITEABLE_RT);
> +
> +	OUT_BATCH(GEN8_3DSTATE_PS_EXTRA | (2 - 2));
> +	OUT_BATCH(GEN8_PSX_PIXEL_SHADER_VALID | GEN8_PSX_ATTRIBUTE_ENABLE);
> +}
> +
> +static void
> +gen8_emit_depth(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN7_3DSTATE_DEPTH_BUFFER | (8-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_HIER_DEPTH_BUFFER | (5 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN7_3DSTATE_STENCIL_BUFFER | (5 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +}
> +
> +static void
> +gen7_emit_clear(struct intel_batchbuffer *batch) {
> +	OUT_BATCH(GEN7_3DSTATE_CLEAR_PARAMS | (3-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(1); /* clear valid */
> +}
> +
> +static void
> +gen6_emit_drawing_rectangle(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN6_3DSTATE_DRAWING_RECTANGLE | (4 - 2));
> +	OUT_BATCH(0xffffffff);
> +	OUT_BATCH(0 | 0);
> +	OUT_BATCH(0);
> +}
> +
> +static void gen8_emit_vf_topology(struct intel_batchbuffer *batch)
> +{
> +	OUT_BATCH(GEN8_3DSTATE_VF_TOPOLOGY);
> +	OUT_BATCH(_3DPRIM_RECTLIST);
> +}
> +
> +/* Vertex elements MUST be defined before this according to spec */
> +static void gen8_emit_primitive(struct intel_batchbuffer *batch, uint32_t offset)
> +{
> +	OUT_BATCH(GEN8_3DSTATE_VF_INSTANCING | (3 - 2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	OUT_BATCH(GEN6_3DPRIMITIVE | (7-2));
> +	OUT_BATCH(0);	/* gen8+ ignore the topology type field */
> +	OUT_BATCH(3);	/* vertex count */
> +	OUT_BATCH(0);	/*  We're specifying this instead with offset in GEN6_3DSTATE_VERTEX_BUFFERS */
> +	OUT_BATCH(1);	/* single instance */
> +	OUT_BATCH(0);	/* start instance location */
> +	OUT_BATCH(0);	/* index buffer offset, ignored */
> +}
> +
> +int gen8_setup_null_render_state(struct intel_batchbuffer *batch)
> +{
> +	uint32_t ps_sampler_state, ps_kernel_off, ps_binding_table;
> +	uint32_t scissor_state;
> +	uint32_t vertex_buffer;
> +	uint32_t batch_end;
> +	int ret;
> +
> +	ps_binding_table  = gen8_bind_surfaces(batch);
> +	ps_sampler_state  = gen8_create_sampler(batch);
> +	ps_kernel_off = gen8_fill_ps(batch, ps_kernel, sizeof(ps_kernel));
> +	vertex_buffer = gen7_fill_vertex_buffer_data(batch);
> +	cc.cc_state = gen6_create_cc_state(batch);
> +	cc.blend_state = gen8_create_blend_state(batch);
> +	viewport.cc_state = gen6_create_cc_viewport(batch);
> +	viewport.sf_clip_state = gen7_create_sf_clip_viewport(batch);
> +	scissor_state = gen6_create_scissor_rect(batch);
> +	/* TODO: there is other state which isn't set up */
> +
> +	/* Start emitting the commands. The order roughly follows the mesa blorp
> +	 * order */
> +	OUT_BATCH(GEN6_PIPELINE_SELECT | PIPELINE_SELECT_3D);
> +
> +	gen8_emit_sip(batch);
> +
> +	gen7_emit_push_constants(batch);
> +
> +	gen8_emit_state_base_address(batch);
> +
> +	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_CC);
> +	OUT_BATCH(viewport.cc_state);
> +	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP);
> +	OUT_BATCH(viewport.sf_clip_state);
> +
> +	gen7_emit_urb(batch);
> +
> +	gen8_emit_cc(batch);
> +
> +	gen8_emit_multisample(batch);
> +
> +	gen8_emit_null_state(batch);
> +
> +	OUT_BATCH(GEN7_3DSTATE_STREAMOUT | (5-2));
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +	OUT_BATCH(0);
> +
> +	gen7_emit_clip(batch);
> +
> +	gen8_emit_sf(batch);
> +
> +	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS);
> +	OUT_BATCH(ps_binding_table);
> +
> +	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_PS);
> +	OUT_BATCH(ps_sampler_state);
> +
> +	gen8_emit_ps(batch, ps_kernel_off);
> +
> +	OUT_BATCH(GEN6_3DSTATE_SCISSOR_STATE_POINTERS);
> +	OUT_BATCH(scissor_state);
> +
> +	gen8_emit_depth(batch);
> +
> +	gen7_emit_clear(batch);
> +
> +	gen6_emit_drawing_rectangle(batch);
> +
> +	gen7_emit_vertex_buffer(batch, vertex_buffer);
> +	gen6_emit_vertex_elements(batch);
> +
> +	gen8_emit_vf_topology(batch);
> +	gen8_emit_primitive(batch, vertex_buffer);
> +
> +	OUT_BATCH(MI_BATCH_BUFFER_END);
> +
> +	ret = intel_batch_error(batch);
> +	if (ret == 0)
> +		ret = intel_batch_total_used(batch);
> +
> +	return ret;
> +}
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
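
A note on the "(N - 2)" expressions that recur in the quoted generator code: each command header ORs in the command's total length in dwords minus two, which is why two-dword commands are written as "(2 - 2)" or with no length term at all. The sketch below illustrates that convention together with the "sticky error, else bytes used" return pattern seen at the end of the setup functions. It is only a toy under stated assumptions: struct toy_batch, toy_out_batch(), toy_emit_zero_cmd() and toy_batch_finish() are hypothetical names, not the intel_batchbuffer API from the patch; the opcode value used with them is made up; and returning the size in bytes is an assumption, since the patch does not show what intel_batch_total_used() counts.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BATCH_DWORDS 64

/* Hypothetical stand-in for the tool's struct intel_batchbuffer. */
struct toy_batch {
	uint32_t dw[TOY_BATCH_DWORDS];
	unsigned int used;	/* dwords written so far */
	int err;		/* sticky error: 0 until a write fails */
};

/* Append one dword; on overflow, record the error instead of writing. */
static void toy_out_batch(struct toy_batch *b, uint32_t dw)
{
	if (b->used >= TOY_BATCH_DWORDS) {
		b->err = -1;
		return;
	}
	b->dw[b->used++] = dw;
}

/*
 * Emit a command that is total_dwords long with an all-zero payload.
 * The header's low bits carry (total_dwords - 2), matching the
 * "(N - 2)" expressions in the quoted patch.
 */
static void toy_emit_zero_cmd(struct toy_batch *b, uint32_t opcode,
			      unsigned int total_dwords)
{
	unsigned int i;

	assert(total_dwords >= 2);
	toy_out_batch(b, opcode | (total_dwords - 2));
	for (i = 1; i < total_dwords; i++)
		toy_out_batch(b, 0);
}

/* Negative error code, or the number of bytes used on success. */
static int toy_batch_finish(const struct toy_batch *b)
{
	if (b->err)
		return b->err;
	return (int)(b->used * sizeof(uint32_t));
}
```

With a zero-initialized struct toy_batch, emitting a 7-dword command this way writes a header whose low bits are 5 followed by six zero dwords, much like the disabled-stage commands in the patch. Keeping the error sticky lets the emit helpers stay void, so the caller checks once at the end.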


* Re: [PATCH v2 0/2] render state initialization (bdw rc6)
  2014-05-06 13:26 [PATCH v2 0/2] render state initialization (bdw rc6) Mika Kuoppala
                   ` (2 preceding siblings ...)
  2014-05-06 13:39 ` [PATCH] tools/null_state_gen: generate null render state Mika Kuoppala
@ 2014-05-14 10:08 ` Damien Lespiau
  3 siblings, 0 replies; 19+ messages in thread
From: Damien Lespiau @ 2014-05-14 10:08 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku, ben, kristen

On Tue, May 06, 2014 at 04:26:04PM +0300, Mika Kuoppala wrote:
> Hi,
> 
> V2 series of the render state initialization patches.
> 
> I decided not to pursue the copying of the context object as the ctx
> is quite big, at least on bdw. As discussed in irc, the copying
> could be done with blitter, on context creation time. But even then we would 
> need to wait for it to complete. Pushing 1kbytes of commands doesn't
> sound so bad when the alternative is to copy 18 pages.
> 
> The state generators can be found here but they are not needed for testing.
> http://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=null_state_gen
> 
> Here is the branch for testing:
> http://cgit.freedesktop.org/~miku/drm-intel/log/?h=render_state
> 
> Thank you to all who provided feedback.
> -Mika
> 
> Mika Kuoppala (2):
>   drm/i915: add render state initialization
>   drm/i915: add null render states for gen6, gen7 and gen8

Acked-by: Damien Lespiau <damien.lespiau@intel.com>

> 
>  drivers/gpu/drm/i915/Makefile                 |    6 +
>  drivers/gpu/drm/i915/i915_drv.h               |    2 +
>  drivers/gpu/drm/i915/i915_gem_context.c       |    6 +
>  drivers/gpu/drm/i915/i915_gem_render_state.c  |  186 ++++++++++
>  drivers/gpu/drm/i915/intel_renderstate.h      |   48 +++
>  drivers/gpu/drm/i915/intel_renderstate_gen6.c |  289 +++++++++++++++
>  drivers/gpu/drm/i915/intel_renderstate_gen7.c |  253 +++++++++++++
>  drivers/gpu/drm/i915/intel_renderstate_gen8.c |  479 +++++++++++++++++++++++++
>  8 files changed, 1269 insertions(+)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_render_state.c
>  create mode 100644 drivers/gpu/drm/i915/intel_renderstate.h
>  create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen6.c
>  create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen7.c
>  create mode 100644 drivers/gpu/drm/i915/intel_renderstate_gen8.c
> 
> -- 
> 1.7.9.5

* Re: [PATCH v3 1/2] drm/i915: add render state initialization
  2014-05-06 14:30     ` [PATCH v3 " Mika Kuoppala
@ 2014-05-14 10:24       ` Mateo Lozano, Oscar
  2014-05-14 11:13         ` Damien Lespiau
  0 siblings, 1 reply; 19+ messages in thread
From: Mateo Lozano, Oscar @ 2014-05-14 10:24 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx; +Cc: ben, miku, kristen

Hi Mika,

> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf Of
> Mika Kuoppala
> Sent: Tuesday, May 06, 2014 3:30 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: ben@bwidawsk.net; miku@iki.fi; kristen@linux.intel.com
> Subject: [Intel-gfx] [PATCH v3 1/2] drm/i915: add render state initialization
> 
> HW guys say that it is not a cool idea to let device go into rc6 without proper 3d
> pipeline state.
> 
> For each new uninitialized context, generate a valid null render state to be run
> on context creation.

In Android, we have been seeing a problem in BDW D0 stepping (C0 is fine), in which actual rendering does not happen, even though everything seems to be healthy. The only "tell" seems to be that the pixel shader invocation count does not go up.
I wouldn't dare say I understand why this fixes our problem, but it clearly does, so feel free to add:

Tested-by: Oscar Mateo <oscar.mateo@intel.com>

to both patches...

* Re: [PATCH v3 1/2] drm/i915: add render state initialization
  2014-05-14 10:24       ` Mateo Lozano, Oscar
@ 2014-05-14 11:13         ` Damien Lespiau
  2014-05-14 11:24           ` Mateo Lozano, Oscar
  0 siblings, 1 reply; 19+ messages in thread
From: Damien Lespiau @ 2014-05-14 11:13 UTC (permalink / raw)
  To: Mateo Lozano, Oscar; +Cc: intel-gfx, miku, ben, kristen

On Wed, May 14, 2014 at 10:24:53AM +0000, Mateo Lozano, Oscar wrote:
> Hi Mika,
> 
> > -----Original Message-----
> > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf Of
> > Mika Kuoppala
> > Sent: Tuesday, May 06, 2014 3:30 PM
> > To: intel-gfx@lists.freedesktop.org
> > Cc: ben@bwidawsk.net; miku@iki.fi; kristen@linux.intel.com
> > Subject: [Intel-gfx] [PATCH v3 1/2] drm/i915: add render state initialization
> > 
> > HW guys say that it is not a cool idea to let device go into rc6 without proper 3d
> > pipeline state.
> > 
> > For each new uninitialized context, generate a valid null render state to be run
> > on context creation.
> 
> In Android, we have been seeing a problem in BDW D0 stepping (C0 is
> fine), in which actual rendering does not happen, even though
> everything seems to be healthy. The only "tell" seems to be that the
> pixel shader invocation count does not go up.  I wouldn't dare say I
> understand why this fixes our problem, but it clearly does, so feel
> free to add:
> 
> Tested-by: Oscar Mateo <oscar.mateo@intel.com>

Stating the obvious here, mainly for my own understanding :)

FWIW, that looks like the userspace driver you are using is actually
relying on the kernel setting up a golden state, and is missing "one bit"
or one packet. For instance, it could be missing a
3DSTATE_WM_HZ_OP command (that's the last fix Ken did in the gen8 render
copy state).

Out of sheer luck (or almost :) we happen to have this working.

So that's another kind of papering over than the one "needed" for rc6.

It seems to potentially be a useful service the kernel is providing to
user space apps though, trying to set up a sane state so user space
batches don't hang the GPU if they are missing one command. Of course
hangs can still happen if the batches themselves have bugs.

How to be sure that's the correct golden state is another interesting
question we need to answer.

-- 
Damien

* Re: [PATCH v3 1/2] drm/i915: add render state initialization
  2014-05-14 11:13         ` Damien Lespiau
@ 2014-05-14 11:24           ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 19+ messages in thread
From: Mateo Lozano, Oscar @ 2014-05-14 11:24 UTC (permalink / raw)
  To: Lespiau, Damien; +Cc: intel-gfx, miku, ben, kristen

> -----Original Message-----
> From: Lespiau, Damien
> Sent: Wednesday, May 14, 2014 12:14 PM
> To: Mateo Lozano, Oscar
> Cc: Mika Kuoppala; intel-gfx@lists.freedesktop.org; ben@bwidawsk.net;
> miku@iki.fi; kristen@linux.intel.com
> Subject: Re: [Intel-gfx] [PATCH v3 1/2] drm/i915: add render state initialization
> 
> On Wed, May 14, 2014 at 10:24:53AM +0000, Mateo Lozano, Oscar wrote:
> > Hi Mika,
> >
> > > -----Original Message-----
> > > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On
> > > Behalf Of Mika Kuoppala
> > > Sent: Tuesday, May 06, 2014 3:30 PM
> > > To: intel-gfx@lists.freedesktop.org
> > > Cc: ben@bwidawsk.net; miku@iki.fi; kristen@linux.intel.com
> > > Subject: [Intel-gfx] [PATCH v3 1/2] drm/i915: add render state
> > > initialization
> > >
> > > HW guys say that it is not a cool idea to let device go into rc6
> > > without proper 3d pipeline state.
> > >
> > > For each new uninitialized context, generate a valid null render
> > > state to be run on context creation.
> >
> > In Android, we have been seeing a problem in BDW D0 stepping (C0 is
> > fine), in which actual rendering does not happen, even though
> > everything seems to be healthy. The only "tell" seems to be that the
> > pixel shader invocation count does not go up.  I wouldn't dare say I
> > understand why this fixes our problem, but it clearly does, so feel
> > free to add:
> >
> > Tested-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> Stating the obvious here, mainly for my own understanding :)
> 
> FWIW, that looks like the userspace driver you are using is actually relying on
> the kernel setting up a golden state, and is missing "one bit" or one packet. For
> instance, it could be missing a 3DSTATE_WM_HZ_OP command (that's
> the last fix Ken did in the gen8 render copy state).
> 
> Out of sheer luck (or almost :) we happen to have this working.
> 
> So that's another kind of papering over than the one "needed" for rc6.
> 
> It seems to potentially be a useful service the kernel is providing to user space
> apps though, trying to set up a sane state so user space batches don't hang the
> GPU if they are missing one command. Of course hangs can still happen if the
> batches themselves have bugs.
> 
> How to be sure that's the correct golden state is another interesting question
> we need to answer.

Oops, sorry, I forgot a very important detail: the same userspace driver works just fine with a 3.10 kernel. I attribute this to the fact that the 3.10 kernel didn't do MI_SET_CONTEXT unless you were explicitly creating contexts, but there could be another explanation, of course.

end of thread, other threads:[~2014-05-14 11:25 UTC | newest]

Thread overview: 19+ messages
2014-05-06 13:26 [PATCH v2 0/2] render state initialization (bdw rc6) Mika Kuoppala
2014-05-06 13:26 ` [PATCH v2 1/2] drm/i915: add render state initialization Mika Kuoppala
2014-05-06 13:41   ` Chris Wilson
2014-05-06 14:30     ` [PATCH v3 " Mika Kuoppala
2014-05-14 10:24       ` Mateo Lozano, Oscar
2014-05-14 11:13         ` Damien Lespiau
2014-05-14 11:24           ` Mateo Lozano, Oscar
2014-05-06 14:34     ` [PATCH v2 " Mika Kuoppala
2014-05-06 13:26 ` [PATCH v2 2/2] drm/i915: add null render states for gen6, gen7 and gen8 Mika Kuoppala
2014-05-06 13:39 ` [PATCH] tools/null_state_gen: generate null render state Mika Kuoppala
2014-05-06 13:47   ` Chris Wilson
2014-05-06 14:44     ` Mika Kuoppala
2014-05-09 15:15     ` Damien Lespiau
2014-05-08 14:37   ` Damien Lespiau
2014-05-08 14:43   ` Damien Lespiau
2014-05-08 15:10     ` Mika Kuoppala
2014-05-09 14:46   ` Damien Lespiau
2014-05-14 10:06   ` Damien Lespiau
2014-05-14 10:08 ` [PATCH v2 0/2] render state initialization (bdw rc6) Damien Lespiau
