All of lore.kernel.org
 help / color / mirror / Atom feed
* [igt-dev] [PATCH i-g-t v3 0/4] Refactoring of *_fill libraries
@ 2018-04-09 14:34 Katarzyna Dec
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 1/4] lib: Move common gpgpu/media fill functions to gpu_fill library Katarzyna Dec
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Katarzyna Dec @ 2018-04-09 14:34 UTC (permalink / raw)
  To: igt-dev

This series is removing duplications in gpgpu_fill and media_fill
libraries. As a first step I moved gpgpu and media helper functions
to gpu_fill library. In second patch I adjusted code to our coding
style. In the third not obvious duplications were removed (like
adding in gen7 functions conditions for future gens). Last patch
adds missing parameters that make GPU hang on gen9 and gen9+.

In first version of this series there was a comment about moving
batch_alloc/copy etc. functions to intel_batchbuffer library.
Because there is a lot of code to review already this change will
be introduced in another series (rendercopy, media_fill, gpgpu_fill
and media_spin code is affected by this).

It is possible that more changes around gen*_media.h and media_spin
is needed, but this will be done as a next step.

v2: Removed not obvious duplications. Adjusted code to review comments.
v3: Series needed reorganization because it introduced bug to ALP,
which was hard to find. That is why patch 1 is now almost only moving functions to gpu_fill with removing duplications, such as the same functions. Also applied comments from review.

Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Katarzyna Dec (4):
  lib: Move common gpgpu/media fill functions to gpu_fill library
  lib: Remove duplications in gpu_fill library
  lib/gpgpu_fill: Add missing configuration parameters for gpgpu_fill
  lib: Adjust refactored gpu_fill library to our coding style

 lib/Makefile.sources    |   3 +-
 lib/gpgpu_fill.c        | 600 ++----------------------------------------
 lib/gpgpu_fill.h        |  12 +-
 lib/gpu_fill.c          | 674 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/gpu_fill.h          | 112 ++++++++
 lib/intel_batchbuffer.c |   4 +-
 lib/media_fill.h        |  23 +-
 lib/media_fill_gen7.c   | 278 +-------------------
 lib/media_fill_gen8.c   | 301 +--------------------
 lib/media_fill_gen8lp.c | 367 --------------------------
 lib/media_fill_gen9.c   | 309 +---------------------
 lib/meson.build         |   2 +-
 12 files changed, 841 insertions(+), 1844 deletions(-)
 create mode 100644 lib/gpu_fill.c
 create mode 100644 lib/gpu_fill.h
 delete mode 100644 lib/media_fill_gen8lp.c

-- 
2.14.3

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [igt-dev] [PATCH i-g-t v3 1/4] lib: Move common gpgpu/media fill functions to gpu_fill library
  2018-04-09 14:34 [igt-dev] [PATCH i-g-t v3 0/4] Refactoring of *_fill libraries Katarzyna Dec
@ 2018-04-09 14:34 ` Katarzyna Dec
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 2/4] lib: Remove duplications in " Katarzyna Dec
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Katarzyna Dec @ 2018-04-09 14:34 UTC (permalink / raw)
  To: igt-dev

Gpgpu_fill and media_fill libraries are very similar and many
functions can be shared. I have created library gpu_fill with
all functions needed for implementing gpgpu_fill and media_fill
tests for all Gens. For reviewing and debugging purposes this patch
should be only moving functions from few libraries to one removing
functions identical for both media and gpgpu.
Places in the code that required more changes:
  Removing gen7_fill_gpgpu_kernel function that is identical to
gen7_fill_media_kernel and introduces conflict with moving
genX_fill_interface_descriptor, which are the same for media and gpgpu.
  Function gen8_fill_media_kernel is not removed in this patch
(although it is identical with gen7 version), because this patch
should be as much as possible functions movement.
  gen8_fill_interface_descriptor was unified for media and gpgpu
by adding kernel and its size as a parameter (this parameters
were missing in media gen8, gen8lp and gen9 functions)
  gen8_emit_state_base_address was unified, the one for gpgpu was
configured like it would be using indirect state (while we are
using CURBE). I have checked that media fill version
(OUT_BATCH(0 | BASE_ADDRESS_MODIFY)) works fine on gpgpu gen8 and newer.

v2: Changed code layout. GenX_fill_media_kernel was identical to
genX_fill_gpgpu_kernel so this function was unified to
gen7_fill_kernel. There were 2 very similar functions
gen8_emit_state_base_address for media and gpgpu, where the one
for gpgpu was configured like it would be using indirect state
(while we are using CURBE). I have checked if media fill version
works fine in gpgpu test on Gen8 and unified them.

v3: Made patch easier for reviewing moving changes unifying code for
various gens (that were included v1) to other patch, leaving only
the most critical code changes.

Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 lib/Makefile.sources    |   2 +
 lib/gpgpu_fill.c        | 571 +-----------------------------------
 lib/gpu_fill.c          | 758 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/gpu_fill.h          | 138 +++++++++
 lib/media_fill_gen7.c   | 271 +----------------
 lib/media_fill_gen8.c   | 290 +-----------------
 lib/media_fill_gen8lp.c | 284 +-----------------
 lib/media_fill_gen9.c   | 298 +------------------
 lib/meson.build         |   1 +
 9 files changed, 908 insertions(+), 1705 deletions(-)
 create mode 100644 lib/gpu_fill.c
 create mode 100644 lib/gpu_fill.h

diff --git a/lib/Makefile.sources b/lib/Makefile.sources
index 3d37ef1d..45e65dd7 100644
--- a/lib/Makefile.sources
+++ b/lib/Makefile.sources
@@ -64,6 +64,8 @@ lib_source_list =	 	\
 	media_spin.c		\
 	gpgpu_fill.h		\
 	gpgpu_fill.c		\
+	gpu_fill.h		\
+	gpu_fill.c		\
 	gen7_media.h            \
 	gen8_media.h            \
 	rendercopy_i915.c	\
diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c
index 4d98643d..f2765fd6 100644
--- a/lib/gpgpu_fill.c
+++ b/lib/gpgpu_fill.c
@@ -30,10 +30,9 @@
 
 #include "intel_reg.h"
 #include "drmtest.h"
-#include "intel_batchbuffer.h"
-#include "gen7_media.h"
-#include "gen8_media.h"
+
 #include "gpgpu_fill.h"
+#include "gpu_fill.h"
 
 /* shaders/gpgpu/gpgpu_fill.gxa */
 static const uint32_t gen7_gpgpu_kernel[][4] = {
@@ -75,572 +74,6 @@ static const uint32_t gen9_gpgpu_kernel[][4] = {
 	{ 0x07800031, 0x20000a40, 0x06000e00, 0x82000010 },
 };
 
-static uint32_t
-batch_used(struct intel_batchbuffer *batch)
-{
-	return batch->ptr - batch->buffer;
-}
-
-static uint32_t
-batch_align(struct intel_batchbuffer *batch, uint32_t align)
-{
-	uint32_t offset = batch_used(batch);
-	offset = ALIGN(offset, align);
-	batch->ptr = batch->buffer + offset;
-	return offset;
-}
-
-static void *
-batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align)
-{
-	uint32_t offset = batch_align(batch, align);
-	batch->ptr += size;
-	return memset(batch->buffer + offset, 0, size);
-}
-
-static uint32_t
-batch_offset(struct intel_batchbuffer *batch, void *ptr)
-{
-	return (uint8_t *)ptr - batch->buffer;
-}
-
-static uint32_t
-batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size,
-	   uint32_t align)
-{
-	return batch_offset(batch, memcpy(batch_alloc(batch, size, align), ptr, size));
-}
-
-static void
-gen7_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
-{
-	int ret;
-
-	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
-	if (ret == 0)
-		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
-					NULL, 0, 0, 0);
-	igt_assert(ret == 0);
-}
-
-static uint32_t
-gen7_fill_curbe_buffer_data(struct intel_batchbuffer *batch, uint8_t color)
-{
-	uint8_t *curbe_buffer;
-	uint32_t offset;
-
-	curbe_buffer = batch_alloc(batch, sizeof(uint32_t) * 8, 64);
-	offset = batch_offset(batch, curbe_buffer);
-	*curbe_buffer = color;
-
-	return offset;
-}
-
-static uint32_t
-gen7_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst)
-{
-	struct gen7_surface_state *ss;
-	uint32_t write_domain, read_domain, offset;
-	int ret;
-
-	if (is_dst) {
-		write_domain = read_domain = I915_GEM_DOMAIN_RENDER;
-	} else {
-		write_domain = 0;
-		read_domain = I915_GEM_DOMAIN_SAMPLER;
-	}
-
-	ss = batch_alloc(batch, sizeof(*ss), 64);
-	offset = batch_offset(batch, ss);
-
-	ss->ss0.surface_type = GEN7_SURFACE_2D;
-	ss->ss0.surface_format = format;
-	ss->ss0.render_cache_read_write = 1;
-
-	if (buf->tiling == I915_TILING_X)
-		ss->ss0.tiled_mode = 2;
-	else if (buf->tiling == I915_TILING_Y)
-		ss->ss0.tiled_mode = 3;
-
-	ss->ss1.base_addr = buf->bo->offset;
-	ret = drm_intel_bo_emit_reloc(batch->bo,
-				batch_offset(batch, ss) + 4,
-				buf->bo, 0,
-				read_domain, write_domain);
-	igt_assert(ret == 0);
-
-	ss->ss2.height = igt_buf_height(buf) - 1;
-	ss->ss2.width  = igt_buf_width(buf) - 1;
-
-	ss->ss3.pitch  = buf->stride - 1;
-
-	ss->ss7.shader_chanel_select_r = 4;
-	ss->ss7.shader_chanel_select_g = 5;
-	ss->ss7.shader_chanel_select_b = 6;
-	ss->ss7.shader_chanel_select_a = 7;
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst)
-{
-	struct gen8_surface_state *ss;
-	uint32_t write_domain, read_domain, offset;
-	int ret;
-
-	if (is_dst) {
-		write_domain = read_domain = I915_GEM_DOMAIN_RENDER;
-	} else {
-		write_domain = 0;
-		read_domain = I915_GEM_DOMAIN_SAMPLER;
-	}
-
-	ss = batch_alloc(batch, sizeof(*ss), 64);
-	offset = batch_offset(batch, ss);
-
-	ss->ss0.surface_type = GEN8_SURFACE_2D;
-	ss->ss0.surface_format = format;
-	ss->ss0.render_cache_read_write = 1;
-	ss->ss0.vertical_alignment = 1; /* align 4 */
-	ss->ss0.horizontal_alignment = 1; /* align 4 */
-
-	if (buf->tiling == I915_TILING_X)
-		ss->ss0.tiled_mode = 2;
-	else if (buf->tiling == I915_TILING_Y)
-		ss->ss0.tiled_mode = 3;
-
-	ss->ss8.base_addr = buf->bo->offset;
-
-	ret = drm_intel_bo_emit_reloc(batch->bo,
-				batch_offset(batch, ss) + 8 * 4,
-				buf->bo, 0,
-				read_domain, write_domain);
-	igt_assert_eq(ret, 0);
-
-	ss->ss2.height = igt_buf_height(buf) - 1;
-	ss->ss2.width  = igt_buf_width(buf) - 1;
-	ss->ss3.pitch  = buf->stride - 1;
-
-	ss->ss7.shader_chanel_select_r = 4;
-	ss->ss7.shader_chanel_select_g = 5;
-	ss->ss7.shader_chanel_select_b = 6;
-	ss->ss7.shader_chanel_select_a = 7;
-
-	return offset;
-
-}
-
-static uint32_t
-gen7_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst)
-{
-	uint32_t *binding_table, offset;
-
-	binding_table = batch_alloc(batch, 32, 64);
-	offset = batch_offset(batch, binding_table);
-
-	binding_table[0] = gen7_fill_surface_state(batch, dst, GEN7_SURFACEFORMAT_R8_UNORM, 1);
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst)
-{
-	uint32_t *binding_table, offset;
-
-	binding_table = batch_alloc(batch, 32, 64);
-	offset = batch_offset(batch, binding_table);
-
-	binding_table[0] = gen8_fill_surface_state(batch, dst, GEN8_SURFACEFORMAT_R8_UNORM, 1);
-
-	return offset;
-}
-
-static uint32_t
-gen7_fill_gpgpu_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size)
-{
-	uint32_t offset;
-
-	offset = batch_copy(batch, kernel, size, 64);
-
-	return offset;
-}
-
-static uint32_t
-gen7_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,
-			       const uint32_t kernel[][4], size_t size)
-{
-	struct gen7_interface_descriptor_data *idd;
-	uint32_t offset;
-	uint32_t binding_table_offset, kernel_offset;
-
-	binding_table_offset = gen7_fill_binding_table(batch, dst);
-	kernel_offset = gen7_fill_gpgpu_kernel(batch, kernel, size);
-
-	idd = batch_alloc(batch, sizeof(*idd), 64);
-	offset = batch_offset(batch, idd);
-
-	idd->desc0.kernel_start_pointer = (kernel_offset >> 6);
-
-	idd->desc1.single_program_flow = 1;
-	idd->desc1.floating_point_mode = GEN7_FLOATING_POINT_IEEE_754;
-
-	idd->desc2.sampler_count = 0;      /* 0 samplers used */
-	idd->desc2.sampler_state_pointer = 0;
-
-	idd->desc3.binding_table_entry_count = 0;
-	idd->desc3.binding_table_pointer = (binding_table_offset >> 5);
-
-	idd->desc4.constant_urb_entry_read_offset = 0;
-	idd->desc4.constant_urb_entry_read_length = 1; /* grf 1 */
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,
-			       const uint32_t kernel[][4], size_t size)
-{
-	struct gen8_interface_descriptor_data *idd;
-	uint32_t offset;
-	uint32_t binding_table_offset, kernel_offset;
-
-	binding_table_offset = gen8_fill_binding_table(batch, dst);
-	kernel_offset = gen7_fill_gpgpu_kernel(batch, kernel, size);
-
-	idd = batch_alloc(batch, sizeof(*idd), 64);
-	offset = batch_offset(batch, idd);
-
-	idd->desc0.kernel_start_pointer = (kernel_offset >> 6);
-
-	idd->desc2.single_program_flow = 1;
-	idd->desc2.floating_point_mode = GEN8_FLOATING_POINT_IEEE_754;
-
-	idd->desc3.sampler_count = 0;      /* 0 samplers used */
-	idd->desc3.sampler_state_pointer = 0;
-
-	idd->desc4.binding_table_entry_count = 0;
-	idd->desc4.binding_table_pointer = (binding_table_offset >> 5);
-
-	idd->desc5.constant_urb_entry_read_offset = 0;
-	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
-
-	return offset;
-}
-
-static void
-gen7_emit_state_base_address(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN7_STATE_BASE_ADDRESS | (10 - 2));
-
-	/* general */
-	OUT_BATCH(0);
-
-	/* surface */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* indirect */
-	OUT_BATCH(0);
-
-	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* general/dynamic/indirect/instruction access Bound */
-	OUT_BATCH(0);
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-}
-
-static void
-gen8_emit_state_base_address(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_STATE_BASE_ADDRESS | (16 - 2));
-
-	/* general */
-	OUT_BATCH(0 | (0x78 << 4) | (0 << 1) |  BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-
-	/* stateless data port */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-
-	/* surface */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
-
-	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
-		  0, BASE_ADDRESS_MODIFY);
-
-	/* indirect */
-	OUT_BATCH(0);
-	OUT_BATCH(0 );
-
-	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* general state buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* dynamic state buffer size */
-	OUT_BATCH(1 << 12 | 1);
-	/* indirect object buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
-	OUT_BATCH(1 << 12 | 1);
-}
-
-static void
-gen9_emit_state_base_address(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_STATE_BASE_ADDRESS | (19 - 2));
-
-	/* general */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-
-	/* stateless data port */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-
-	/* surface */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
-
-	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
-		0, BASE_ADDRESS_MODIFY);
-
-	/* indirect */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* general state buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* dynamic state buffer size */
-	OUT_BATCH(1 << 12 | 1);
-	/* indirect object buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
-	OUT_BATCH(1 << 12 | 1);
-
-	/* Bindless surface state base address */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-	OUT_BATCH(0xfffff000);
-}
-
-static void
-gen7_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN7_MEDIA_VFE_STATE | (8 - 2));
-
-	/* scratch buffer */
-	OUT_BATCH(0);
-
-	/* number of threads & urb entries */
-	OUT_BATCH(1 << 16 | /* max num of threads */
-		  0 << 8 | /* num of URB entry */
-		  1 << 2); /* GPGPU mode */
-
-	OUT_BATCH(0);
-
-	/* urb entry size & curbe size */
-	OUT_BATCH(0 << 16 | 	/* URB entry size in 256 bits unit */
-		  1);		/* CURBE entry size in 256 bits unit */
-
-	/* scoreboard */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-}
-
-static void
-gen8_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_MEDIA_VFE_STATE | (9 - 2));
-
-	/* scratch buffer */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	/* number of threads & urb entries */
-	OUT_BATCH(1 << 16 | 1 << 8);
-
-	OUT_BATCH(0);
-
-	/* urb entry size & curbe size */
-	OUT_BATCH(0 << 16 | 1);
-
-	/* scoreboard */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-}
-
-static void
-gen7_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer)
-{
-	OUT_BATCH(GEN7_MEDIA_CURBE_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* curbe total data length */
-	OUT_BATCH(64);
-	/* curbe data start address, is relative to the dynamics base address */
-	OUT_BATCH(curbe_buffer);
-}
-
-static void
-gen7_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
-{
-	OUT_BATCH(GEN7_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* interface descriptor data length */
-	OUT_BATCH(sizeof(struct gen7_interface_descriptor_data));
-	/* interface descriptor address, is relative to the dynamics base address */
-	OUT_BATCH(interface_descriptor);
-}
-
-static void
-gen8_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
-{
-	OUT_BATCH(GEN8_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* interface descriptor data length */
-	OUT_BATCH(sizeof(struct gen8_interface_descriptor_data));
-	/* interface descriptor address, is relative to the dynamics base address */
-	OUT_BATCH(interface_descriptor);
-}
-
-static void
-gen7_emit_gpgpu_walk(struct intel_batchbuffer *batch,
-		     unsigned x, unsigned y,
-		     unsigned width, unsigned height)
-{
-	uint32_t x_dim, y_dim, tmp, right_mask;
-
-	/*
-	 * Simply do SIMD16 based dispatch, so every thread uses
-	 * SIMD16 channels.
-	 *
-	 * Define our own thread group size, e.g 16x1 for every group, then
-	 * will have 1 thread each group in SIMD16 dispatch. So thread
-	 * width/height/depth are all 1.
-	 *
-	 * Then thread group X = width / 16 (aligned to 16)
-	 * thread group Y = height;
-	 */
-	x_dim = (width + 15) / 16;
-	y_dim = height;
-
-	tmp = width & 15;
-	if (tmp == 0)
-		right_mask = (1 << 16) - 1;
-	else
-		right_mask = (1 << tmp) - 1;
-
-	OUT_BATCH(GEN7_GPGPU_WALKER | 9);
-
-	/* interface descriptor offset */
-	OUT_BATCH(0);
-
-	/* SIMD size, thread w/h/d */
-	OUT_BATCH(1 << 30 | /* SIMD16 */
-		  0 << 16 | /* depth:1 */
-		  0 << 8 | /* height:1 */
-		  0); /* width:1 */
-
-	/* thread group X */
-	OUT_BATCH(0);
-	OUT_BATCH(x_dim);
-
-	/* thread group Y */
-	OUT_BATCH(0);
-	OUT_BATCH(y_dim);
-
-	/* thread group Z */
-	OUT_BATCH(0);
-	OUT_BATCH(1);
-
-	/* right mask */
-	OUT_BATCH(right_mask);
-
-	/* bottom mask, height 1, always 0xffffffff */
-	OUT_BATCH(0xffffffff);
-}
-
-static void
-gen8_emit_gpgpu_walk(struct intel_batchbuffer *batch,
-		     unsigned x, unsigned y,
-		     unsigned width, unsigned height)
-{
-	uint32_t x_dim, y_dim, tmp, right_mask;
-
-	/*
-	 * Simply do SIMD16 based dispatch, so every thread uses
-	 * SIMD16 channels.
-	 *
-	 * Define our own thread group size, e.g 16x1 for every group, then
-	 * will have 1 thread each group in SIMD16 dispatch. So thread
-	 * width/height/depth are all 1.
-	 *
-	 * Then thread group X = width / 16 (aligned to 16)
-	 * thread group Y = height;
-	 */
-	x_dim = (width + 15) / 16;
-	y_dim = height;
-
-	tmp = width & 15;
-	if (tmp == 0)
-		right_mask = (1 << 16) - 1;
-	else
-		right_mask = (1 << tmp) - 1;
-
-	OUT_BATCH(GEN7_GPGPU_WALKER | 13);
-
-	OUT_BATCH(0); /* kernel offset */
-	OUT_BATCH(0); /* indirect data length */
-	OUT_BATCH(0); /* indirect data offset */
-
-	/* SIMD size, thread w/h/d */
-	OUT_BATCH(1 << 30 | /* SIMD16 */
-		  0 << 16 | /* depth:1 */
-		  0 << 8 | /* height:1 */
-		  0); /* width:1 */
-
-	/* thread group X */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(x_dim);
-
-	/* thread group Y */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(y_dim);
-
-	/* thread group Z */
-	OUT_BATCH(0);
-	OUT_BATCH(1);
-
-	/* right mask */
-	OUT_BATCH(right_mask);
-
-	/* bottom mask, height 1, always 0xffffffff */
-	OUT_BATCH(0xffffffff);
-}
-
 /*
  * This sets up the gpgpu pipeline,
  *
diff --git a/lib/gpu_fill.c b/lib/gpu_fill.c
new file mode 100644
index 00000000..172c6db6
--- /dev/null
+++ b/lib/gpu_fill.c
@@ -0,0 +1,758 @@
+#include "gpu_fill.h"
+
+uint32_t
+batch_used(struct intel_batchbuffer *batch)
+{
+	return batch->ptr - batch->buffer;
+}
+
+uint32_t
+batch_align(struct intel_batchbuffer *batch, uint32_t align)
+{
+	uint32_t offset = batch_used(batch);
+	offset = ALIGN(offset, align);
+	batch->ptr = batch->buffer + offset;
+	return offset;
+}
+
+void *
+batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align)
+{
+	uint32_t offset = batch_align(batch, align);
+	batch->ptr += size;
+	return memset(batch->buffer + offset, 0, size);
+}
+
+uint32_t
+batch_offset(struct intel_batchbuffer *batch, void *ptr)
+{
+	return (uint8_t *)ptr - batch->buffer;
+}
+
+uint32_t
+batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size, uint32_t align)
+{
+	return batch_offset(batch, memcpy(batch_alloc(batch, size, align), ptr, size));
+}
+
+void
+gen7_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
+{
+	int ret;
+
+	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
+	if (ret == 0)
+		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
+					NULL, 0, 0, 0);
+	igt_assert(ret == 0);
+}
+
+uint32_t
+gen7_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
+			uint8_t color)
+{
+	uint8_t *curbe_buffer;
+	uint32_t offset;
+
+	curbe_buffer = batch_alloc(batch, sizeof(uint32_t) * 8, 64);
+	offset = batch_offset(batch, curbe_buffer);
+	*curbe_buffer = color;
+
+	return offset;
+}
+
+uint32_t
+gen7_fill_surface_state(struct intel_batchbuffer *batch,
+			struct igt_buf *buf,
+			uint32_t format,
+			int is_dst)
+{
+	struct gen7_surface_state *ss;
+	uint32_t write_domain, read_domain, offset;
+	int ret;
+
+	if (is_dst) {
+		write_domain = read_domain = I915_GEM_DOMAIN_RENDER;
+	} else {
+		write_domain = 0;
+		read_domain = I915_GEM_DOMAIN_SAMPLER;
+	}
+
+	ss = batch_alloc(batch, sizeof(*ss), 64);
+	offset = batch_offset(batch, ss);
+
+	ss->ss0.surface_type = GEN7_SURFACE_2D;
+	ss->ss0.surface_format = format;
+	ss->ss0.render_cache_read_write = 1;
+
+	if (buf->tiling == I915_TILING_X)
+		ss->ss0.tiled_mode = 2;
+	else if (buf->tiling == I915_TILING_Y)
+		ss->ss0.tiled_mode = 3;
+
+	ss->ss1.base_addr = buf->bo->offset;
+	ret = drm_intel_bo_emit_reloc(batch->bo,
+				batch_offset(batch, ss) + 4,
+				buf->bo, 0,
+				read_domain, write_domain);
+	igt_assert(ret == 0);
+
+	ss->ss2.height = igt_buf_height(buf) - 1;
+	ss->ss2.width  = igt_buf_width(buf) - 1;
+
+	ss->ss3.pitch  = buf->stride - 1;
+
+	ss->ss7.shader_chanel_select_r = 4;
+	ss->ss7.shader_chanel_select_g = 5;
+	ss->ss7.shader_chanel_select_b = 6;
+	ss->ss7.shader_chanel_select_a = 7;
+
+	return offset;
+}
+
+uint32_t
+gen7_fill_binding_table(struct intel_batchbuffer *batch,
+			struct igt_buf *dst)
+{
+	uint32_t *binding_table, offset;
+
+	binding_table = batch_alloc(batch, 32, 64);
+	offset = batch_offset(batch, binding_table);
+	binding_table[0] = gen7_fill_surface_state(batch, dst,
+						GEN7_SURFACEFORMAT_R8_UNORM, 1);
+
+	return offset;
+}
+
+uint32_t
+gen7_fill_media_kernel(struct intel_batchbuffer *batch,
+		const uint32_t kernel[][4],
+		size_t size)
+{
+	uint32_t offset;
+
+	offset = batch_copy(batch, kernel, size, 64);
+
+	return offset;
+}
+
+uint32_t
+gen8_fill_media_kernel(struct intel_batchbuffer *batch,
+		const uint32_t kernel[][4],
+		size_t size)
+{
+	uint32_t offset;
+
+	offset = batch_copy(batch, kernel, size, 64);
+
+	return offset;
+}
+
+uint32_t
+gen7_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,
+			       const uint32_t kernel[][4], size_t size)
+{
+	struct gen7_interface_descriptor_data *idd;
+	uint32_t offset;
+	uint32_t binding_table_offset, kernel_offset;
+
+	binding_table_offset = gen7_fill_binding_table(batch, dst);
+	kernel_offset = gen7_fill_media_kernel(batch, kernel, size);
+
+	idd = batch_alloc(batch, sizeof(*idd), 64);
+	offset = batch_offset(batch, idd);
+
+	idd->desc0.kernel_start_pointer = (kernel_offset >> 6);
+
+	idd->desc1.single_program_flow = 1;
+	idd->desc1.floating_point_mode = GEN7_FLOATING_POINT_IEEE_754;
+
+	idd->desc2.sampler_count = 0;      /* 0 samplers used */
+	idd->desc2.sampler_state_pointer = 0;
+
+	idd->desc3.binding_table_entry_count = 0;
+	idd->desc3.binding_table_pointer = (binding_table_offset >> 5);
+
+	idd->desc4.constant_urb_entry_read_offset = 0;
+	idd->desc4.constant_urb_entry_read_length = 1; /* grf 1 */
+
+	return offset;
+}
+
+void
+gen7_emit_state_base_address(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_STATE_BASE_ADDRESS | (10 - 2));
+
+	/* general */
+	OUT_BATCH(0);
+
+	/* surface */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+
+	/* dynamic */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+
+	/* indirect */
+	OUT_BATCH(0);
+
+	/* instruction */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+
+	/* general/dynamic/indirect/instruction access Bound */
+	OUT_BATCH(0);
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+}
+
+void
+gen7_emit_vfe_state(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_MEDIA_VFE_STATE | (8 - 2));
+
+	/* scratch buffer */
+	OUT_BATCH(0);
+
+	/* number of threads & urb entries */
+	OUT_BATCH(1 << 16 |
+		2 << 8);
+
+	OUT_BATCH(0);
+
+	/* urb entry size & curbe size */
+	OUT_BATCH(2 << 16 | 	/* in 256 bits unit */
+		2);		/* in 256 bits unit */
+
+	/* scoreboard */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+void
+gen7_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_MEDIA_VFE_STATE | (8 - 2));
+
+	/* scratch buffer */
+	OUT_BATCH(0);
+
+	/* number of threads & urb entries */
+	OUT_BATCH(1 << 16 | /* max num of threads */
+		  0 << 8 | /* num of URB entry */
+		  1 << 2); /* GPGPU mode */
+
+	OUT_BATCH(0);
+
+	/* urb entry size & curbe size */
+	OUT_BATCH(0 << 16 | 	/* URB entry size in 256 bits unit */
+		  1);		/* CURBE entry size in 256 bits unit */
+
+	/* scoreboard */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+void
+gen7_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer)
+{
+	OUT_BATCH(GEN7_MEDIA_CURBE_LOAD | (4 - 2));
+	OUT_BATCH(0);
+	/* curbe total data length */
+	OUT_BATCH(64);
+	/* curbe data start address, is relative to the dynamics base address */
+	OUT_BATCH(curbe_buffer);
+}
+
+void
+gen7_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
+{
+	OUT_BATCH(GEN7_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
+	OUT_BATCH(0);
+	/* interface descriptor data length */
+	OUT_BATCH(sizeof(struct gen7_interface_descriptor_data));
+	/* interface descriptor address, is relative to the dynamics base address */
+	OUT_BATCH(interface_descriptor);
+}
+
+void
+gen7_emit_media_objects(struct intel_batchbuffer *batch,
+			unsigned x, unsigned y,
+			unsigned width, unsigned height)
+{
+	int i, j;
+
+	for (i = 0; i < width / 16; i++) {
+		for (j = 0; j < height / 16; j++) {
+			OUT_BATCH(GEN7_MEDIA_OBJECT | (8 - 2));
+
+			/* interface descriptor offset */
+			OUT_BATCH(0);
+
+			/* without indirect data */
+			OUT_BATCH(0);
+			OUT_BATCH(0);
+
+			/* scoreboard */
+			OUT_BATCH(0);
+			OUT_BATCH(0);
+
+			/* inline data (xoffset, yoffset) */
+			OUT_BATCH(x + i * 16);
+			OUT_BATCH(y + j * 16);
+		}
+	}
+}
+
+void
+gen7_emit_gpgpu_walk(struct intel_batchbuffer *batch,
+		     unsigned x, unsigned y,
+		     unsigned width, unsigned height)
+{
+	uint32_t x_dim, y_dim, tmp, right_mask;
+
+	/*
+	 * Simply do SIMD16 based dispatch, so every thread uses
+	 * SIMD16 channels.
+	 *
+	 * Define our own thread group size, e.g 16x1 for every group, then
+	 * will have 1 thread each group in SIMD16 dispatch. So thread
+	 * width/height/depth are all 1.
+	 *
+	 * Then thread group X = width / 16 (aligned to 16)
+	 * thread group Y = height;
+	 */
+	x_dim = (width + 15) / 16;
+	y_dim = height;
+
+	tmp = width & 15;
+	if (tmp == 0)
+		right_mask = (1 << 16) - 1;
+	else
+		right_mask = (1 << tmp) - 1;
+
+	OUT_BATCH(GEN7_GPGPU_WALKER | 9);
+
+	/* interface descriptor offset */
+	OUT_BATCH(0);
+
+	/* SIMD size, thread w/h/d */
+	OUT_BATCH(1 << 30 | /* SIMD16 */
+		  0 << 16 | /* depth:1 */
+		  0 << 8 | /* height:1 */
+		  0); /* width:1 */
+
+	/* thread group X */
+	OUT_BATCH(0);
+	OUT_BATCH(x_dim);
+
+	/* thread group Y */
+	OUT_BATCH(0);
+	OUT_BATCH(y_dim);
+
+	/* thread group Z */
+	OUT_BATCH(0);
+	OUT_BATCH(1);
+
+	/* right mask */
+	OUT_BATCH(right_mask);
+
+	/* bottom mask, height 1, always 0xffffffff */
+	OUT_BATCH(0xffffffff);
+}
+
+void
+gen8_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
+{
+	int ret;
+
+	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
+	if (ret == 0)
+		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
+					NULL, 0, 0, 0);
+	igt_assert(ret == 0);
+}
+
+
+uint32_t
+gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
+			uint8_t color)
+{
+	uint8_t *curbe_buffer;
+	uint32_t offset;
+
+	curbe_buffer = batch_alloc(batch, sizeof(uint32_t) * 8, 64);
+	offset = batch_offset(batch, curbe_buffer);
+	*curbe_buffer = color;
+
+	return offset;
+}
+
+uint32_t
+gen8_fill_surface_state(struct intel_batchbuffer *batch,
+			struct igt_buf *buf,
+			uint32_t format,
+			int is_dst)
+{
+	struct gen8_surface_state *ss;
+	uint32_t write_domain, read_domain, offset;
+	int ret;
+
+	if (is_dst) {
+		write_domain = read_domain = I915_GEM_DOMAIN_RENDER;
+	} else {
+		write_domain = 0;
+		read_domain = I915_GEM_DOMAIN_SAMPLER;
+	}
+
+	ss = batch_alloc(batch, sizeof(*ss), 64);
+	offset = batch_offset(batch, ss);
+
+	ss->ss0.surface_type = GEN8_SURFACE_2D;
+	ss->ss0.surface_format = format;
+	ss->ss0.render_cache_read_write = 1;
+	ss->ss0.vertical_alignment = 1; /* align 4 */
+	ss->ss0.horizontal_alignment = 1; /* align 4 */
+
+	if (buf->tiling == I915_TILING_X)
+		ss->ss0.tiled_mode = 2;
+	else if (buf->tiling == I915_TILING_Y)
+		ss->ss0.tiled_mode = 3;
+
+	ss->ss8.base_addr = buf->bo->offset;
+
+	ret = drm_intel_bo_emit_reloc(batch->bo,
+				batch_offset(batch, ss) + 8 * 4,
+				buf->bo, 0,
+				read_domain, write_domain);
+	igt_assert(ret == 0);
+
+	ss->ss2.height = igt_buf_height(buf) - 1;
+	ss->ss2.width  = igt_buf_width(buf) - 1;
+	ss->ss3.pitch  = buf->stride - 1;
+
+	ss->ss7.shader_chanel_select_r = 4;
+	ss->ss7.shader_chanel_select_g = 5;
+	ss->ss7.shader_chanel_select_b = 6;
+	ss->ss7.shader_chanel_select_a = 7;
+
+	return offset;
+}
+
+uint32_t
+gen8_fill_binding_table(struct intel_batchbuffer *batch,
+			struct igt_buf *dst)
+{
+	uint32_t *binding_table, offset;
+
+	binding_table = batch_alloc(batch, 32, 64);
+	offset = batch_offset(batch, binding_table);
+
+	binding_table[0] = gen8_fill_surface_state(batch, dst,
+						GEN8_SURFACEFORMAT_R8_UNORM, 1);
+
+	return offset;
+}
+
+uint32_t
+gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,  const uint32_t kernel[][4], size_t size)
+{
+	struct gen8_interface_descriptor_data *idd;
+	uint32_t offset;
+	uint32_t binding_table_offset, kernel_offset;
+
+	binding_table_offset = gen8_fill_binding_table(batch, dst);
+	kernel_offset = gen8_fill_media_kernel(batch, kernel, size);
+
+	idd = batch_alloc(batch, sizeof(*idd), 64);
+	offset = batch_offset(batch, idd);
+
+	idd->desc0.kernel_start_pointer = (kernel_offset >> 6);
+
+	idd->desc2.single_program_flow = 1;
+	idd->desc2.floating_point_mode = GEN8_FLOATING_POINT_IEEE_754;
+
+	idd->desc3.sampler_count = 0;      /* 0 samplers used */
+	idd->desc3.sampler_state_pointer = 0;
+
+	idd->desc4.binding_table_entry_count = 0;
+	idd->desc4.binding_table_pointer = (binding_table_offset >> 5);
+
+	idd->desc5.constant_urb_entry_read_offset = 0;
+	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
+
+	return offset;
+}
+
+void
+gen8_emit_state_base_address(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN8_STATE_BASE_ADDRESS | (16 - 2));
+
+	/* general */
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+
+	/* stateless data port */
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+
+	/* surface */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
+
+	/* dynamic */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
+		0, BASE_ADDRESS_MODIFY);
+
+	/* indirect */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	/* instruction */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+
+	/* general state buffer size */
+	OUT_BATCH(0xfffff000 | 1);
+	/* dynamic state buffer size */
+	OUT_BATCH(1 << 12 | 1);
+	/* indirect object buffer size */
+	OUT_BATCH(0xfffff000 | 1);
+	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
+	OUT_BATCH(1 << 12 | 1);
+}
+
+void
+gen8_emit_vfe_state(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN8_MEDIA_VFE_STATE | (9 - 2));
+
+	/* scratch buffer */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	/* number of threads & urb entries */
+	OUT_BATCH(1 << 16 |
+		2 << 8);
+
+	OUT_BATCH(0);
+
+	/* urb entry size & curbe size */
+	OUT_BATCH(2 << 16 |
+		2);
+
+	/* scoreboard */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+void
+gen8_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN8_MEDIA_VFE_STATE | (9 - 2));
+
+	/* scratch buffer */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	/* number of threads & urb entries */
+	OUT_BATCH(1 << 16 | 1 << 8);
+
+	OUT_BATCH(0);
+
+	/* urb entry size & curbe size */
+	OUT_BATCH(0 << 16 | 1);
+
+	/* scoreboard */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+}
+
+void
+gen8_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer)
+{
+	OUT_BATCH(GEN8_MEDIA_CURBE_LOAD | (4 - 2));
+	OUT_BATCH(0);
+	/* curbe total data length */
+	OUT_BATCH(64);
+	/* curbe data start address, is relative to the dynamics base address */
+	OUT_BATCH(curbe_buffer);
+}
+
+void
+gen8_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
+{
+	OUT_BATCH(GEN8_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
+	OUT_BATCH(0);
+	/* interface descriptor data length */
+	OUT_BATCH(sizeof(struct gen8_interface_descriptor_data));
+	/* interface descriptor address, is relative to the dynamics base address */
+	OUT_BATCH(interface_descriptor);
+}
+
+void
+gen8_emit_media_state_flush(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN8_MEDIA_STATE_FLUSH | (2 - 2));
+	OUT_BATCH(0);
+}
+
+void
+gen8_emit_media_objects(struct intel_batchbuffer *batch,
+			unsigned x, unsigned y,
+			unsigned width, unsigned height)
+{
+	int i, j;
+
+	for (i = 0; i < width / 16; i++) {
+		for (j = 0; j < height / 16; j++) {
+			OUT_BATCH(GEN8_MEDIA_OBJECT | (8 - 2));
+
+			/* interface descriptor offset */
+			OUT_BATCH(0);
+
+			/* without indirect data */
+			OUT_BATCH(0);
+			OUT_BATCH(0);
+
+			/* scoreboard */
+			OUT_BATCH(0);
+			OUT_BATCH(0);
+
+			/* inline data (xoffset, yoffset) */
+			OUT_BATCH(x + i * 16);
+			OUT_BATCH(y + j * 16);
+			gen8_emit_media_state_flush(batch);
+		}
+	}
+}
+void
+gen8lp_emit_media_objects(struct intel_batchbuffer *batch,
+			unsigned x, unsigned y,
+			unsigned width, unsigned height)
+{
+	int i, j;
+
+	for (i = 0; i < width / 16; i++) {
+		for (j = 0; j < height / 16; j++) {
+			OUT_BATCH(GEN8_MEDIA_OBJECT | (8 - 2));
+
+			/* interface descriptor offset */
+			OUT_BATCH(0);
+
+			/* without indirect data */
+			OUT_BATCH(0);
+			OUT_BATCH(0);
+
+			/* scoreboard */
+			OUT_BATCH(0);
+			OUT_BATCH(0);
+
+			/* inline data (xoffset, yoffset) */
+			OUT_BATCH(x + i * 16);
+			OUT_BATCH(y + j * 16);
+		}
+	}
+}
+void
+gen8_emit_gpgpu_walk(struct intel_batchbuffer *batch,
+		     unsigned x, unsigned y,
+		     unsigned width, unsigned height)
+{
+	uint32_t x_dim, y_dim, tmp, right_mask;
+
+	/*
+	 * Simply do SIMD16 based dispatch, so every thread uses
+	 * SIMD16 channels.
+	 *
+	 * Define our own thread group size, e.g 16x1 for every group, then
+	 * will have 1 thread each group in SIMD16 dispatch. So thread
+	 * width/height/depth are all 1.
+	 *
+	 * Then thread group X = width / 16 (aligned to 16)
+	 * thread group Y = height;
+	 */
+	x_dim = (width + 15) / 16;
+	y_dim = height;
+
+	tmp = width & 15;
+	if (tmp == 0)
+		right_mask = (1 << 16) - 1;
+	else
+		right_mask = (1 << tmp) - 1;
+
+	OUT_BATCH(GEN7_GPGPU_WALKER | 13);
+
+	OUT_BATCH(0); /* kernel offset */
+	OUT_BATCH(0); /* indirect data length */
+	OUT_BATCH(0); /* indirect data offset */
+
+	/* SIMD size, thread w/h/d */
+	OUT_BATCH(1 << 30 | /* SIMD16 */
+		  0 << 16 | /* depth:1 */
+		  0 << 8 | /* height:1 */
+		  0); /* width:1 */
+
+	/* thread group X */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(x_dim);
+
+	/* thread group Y */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+	OUT_BATCH(y_dim);
+
+	/* thread group Z */
+	OUT_BATCH(0);
+	OUT_BATCH(1);
+
+	/* right mask */
+	OUT_BATCH(right_mask);
+
+	/* bottom mask, height 1, always 0xffffffff */
+	OUT_BATCH(0xffffffff);
+}
+
+void
+gen9_emit_state_base_address(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN8_STATE_BASE_ADDRESS | (19 - 2));
+
+	/* general */
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+
+	/* stateless data port */
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+
+	/* surface */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
+
+	/* dynamic */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
+		0, BASE_ADDRESS_MODIFY);
+
+	/* indirect */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
+
+	/* instruction */
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+
+	/* general state buffer size */
+	OUT_BATCH(0xfffff000 | 1);
+	/* dynamic state buffer size */
+	OUT_BATCH(1 << 12 | 1);
+	/* indirect object buffer size */
+	OUT_BATCH(0xfffff000 | 1);
+	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
+	OUT_BATCH(1 << 12 | 1);
+
+	/* Bindless surface state base address */
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+	OUT_BATCH(0xfffff000);
+}
diff --git a/lib/gpu_fill.h b/lib/gpu_fill.h
new file mode 100644
index 00000000..973513b0
--- /dev/null
+++ b/lib/gpu_fill.h
@@ -0,0 +1,138 @@
+#include <intel_bufmgr.h>
+#include <i915_drm.h>
+
+#include "media_fill.h"
+#include "gen7_media.h"
+#include "gen8_media.h"
+#include "intel_reg.h"
+#include "drmtest.h"
+#include "intel_batchbuffer.h"
+#include "intel_chipset.h"
+#include <assert.h>
+
+uint32_t
+batch_used(struct intel_batchbuffer *batch);
+
+uint32_t
+batch_align(struct intel_batchbuffer *batch, uint32_t align);
+
+void *
+batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align);
+
+uint32_t
+batch_offset(struct intel_batchbuffer *batch, void *ptr);
+
+uint32_t
+batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size, uint32_t align);
+
+void
+gen7_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end);
+
+uint32_t
+gen7_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
+			uint8_t color);
+
+uint32_t
+gen7_fill_surface_state(struct intel_batchbuffer *batch,
+			struct igt_buf *buf,
+			uint32_t format,
+			int is_dst);
+
+uint32_t
+gen7_fill_binding_table(struct intel_batchbuffer *batch,
+			struct igt_buf *dst);
+
+uint32_t
+gen7_fill_media_kernel(struct intel_batchbuffer *batch,
+		const uint32_t kernel[][4],
+		size_t size);
+
+uint32_t
+gen8_fill_media_kernel(struct intel_batchbuffer *batch,
+		const uint32_t kernel[][4],
+		size_t size);
+
+uint32_t
+gen7_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,
+			       const uint32_t kernel[][4], size_t size);
+
+void
+gen7_emit_state_base_address(struct intel_batchbuffer *batch);
+
+void
+gen7_emit_vfe_state(struct intel_batchbuffer *batch);
+
+void
+gen7_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch);
+
+void
+gen7_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer);
+
+void
+gen7_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor);
+
+void
+gen7_emit_media_objects(struct intel_batchbuffer *batch,
+			unsigned x, unsigned y,
+			unsigned width, unsigned height);
+
+void
+gen7_emit_gpgpu_walk(struct intel_batchbuffer *batch,
+		     unsigned x, unsigned y,
+		     unsigned width, unsigned height);
+
+void
+gen8_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end);
+
+uint32_t
+gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
+			uint8_t color);
+
+uint32_t
+gen8_fill_surface_state(struct intel_batchbuffer *batch,
+			struct igt_buf *buf,
+			uint32_t format,
+			int is_dst);
+
+uint32_t
+gen8_fill_binding_table(struct intel_batchbuffer *batch,
+			struct igt_buf *dst);
+
+uint32_t
+gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,  const uint32_t kernel[][4], size_t size);
+
+void
+gen8_emit_state_base_address(struct intel_batchbuffer *batch);
+
+void
+gen8_emit_vfe_state(struct intel_batchbuffer *batch);
+
+void
+gen8_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch);
+
+void
+gen8_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer);
+
+void
+gen8_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor);
+
+void
+gen8_emit_media_state_flush(struct intel_batchbuffer *batch);
+
+void
+gen8_emit_media_objects(struct intel_batchbuffer *batch,
+			unsigned x, unsigned y,
+			unsigned width, unsigned height);
+
+void
+gen8lp_emit_media_objects(struct intel_batchbuffer *batch,
+			unsigned x, unsigned y,
+			unsigned width, unsigned height);
+
+void
+gen8_emit_gpgpu_walk(struct intel_batchbuffer *batch,
+		     unsigned x, unsigned y,
+		     unsigned width, unsigned height);
+
+void
+gen9_emit_state_base_address(struct intel_batchbuffer *batch);
diff --git a/lib/media_fill_gen7.c b/lib/media_fill_gen7.c
index 6fb44798..c97555a6 100644
--- a/lib/media_fill_gen7.c
+++ b/lib/media_fill_gen7.c
@@ -5,7 +5,7 @@
 #include "gen7_media.h"
 #include "intel_reg.h"
 #include "drmtest.h"
-
+#include "gpu_fill.h"
 #include <assert.h>
 
 static const uint32_t media_kernel[][4] = {
@@ -22,275 +22,6 @@ static const uint32_t media_kernel[][4] = {
 	{ 0x07800031, 0x20001ca8, 0x00000e00, 0x82000010 },
 };
 
-static uint32_t
-batch_used(struct intel_batchbuffer *batch)
-{
-	return batch->ptr - batch->buffer;
-}
-
-static uint32_t
-batch_align(struct intel_batchbuffer *batch, uint32_t align)
-{
-	uint32_t offset = batch_used(batch);
-	offset = ALIGN(offset, align);
-	batch->ptr = batch->buffer + offset;
-	return offset;
-}
-
-static void *
-batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align)
-{
-	uint32_t offset = batch_align(batch, align);
-	batch->ptr += size;
-	return memset(batch->buffer + offset, 0, size);
-}
-
-static uint32_t
-batch_offset(struct intel_batchbuffer *batch, void *ptr)
-{
-	return (uint8_t *)ptr - batch->buffer;
-}
-
-static uint32_t
-batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size, uint32_t align)
-{
-	return batch_offset(batch, memcpy(batch_alloc(batch, size, align), ptr, size));
-}
-
-static void
-gen7_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
-{
-	int ret;
-
-	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
-	if (ret == 0)
-		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
-					NULL, 0, 0, 0);
-	igt_assert(ret == 0);
-}
-
-static uint32_t
-gen7_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
-			uint8_t color)
-{
-	uint8_t *curbe_buffer;
-	uint32_t offset;
-
-	curbe_buffer = batch_alloc(batch, sizeof(uint32_t) * 8, 64);
-	offset = batch_offset(batch, curbe_buffer);
-	*curbe_buffer = color;
-
-	return offset;
-}
-
-static uint32_t
-gen7_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst)
-{
-	struct gen7_surface_state *ss;
-	uint32_t write_domain, read_domain, offset;
-	int ret;
-
-	if (is_dst) {
-		write_domain = read_domain = I915_GEM_DOMAIN_RENDER;
-	} else {
-		write_domain = 0;
-		read_domain = I915_GEM_DOMAIN_SAMPLER;
-	}
-
-	ss = batch_alloc(batch, sizeof(*ss), 64);
-	offset = batch_offset(batch, ss);
-
-	ss->ss0.surface_type = GEN7_SURFACE_2D;
-	ss->ss0.surface_format = format;
-	ss->ss0.render_cache_read_write = 1;
-
-	if (buf->tiling == I915_TILING_X)
-		ss->ss0.tiled_mode = 2;
-	else if (buf->tiling == I915_TILING_Y)
-		ss->ss0.tiled_mode = 3;
-
-	ss->ss1.base_addr = buf->bo->offset;
-	ret = drm_intel_bo_emit_reloc(batch->bo,
-				batch_offset(batch, ss) + 4,
-				buf->bo, 0,
-				read_domain, write_domain);
-	igt_assert(ret == 0);
-
-	ss->ss2.height = igt_buf_height(buf) - 1;
-	ss->ss2.width  = igt_buf_width(buf) - 1;
-
-	ss->ss3.pitch  = buf->stride - 1;
-
-	ss->ss7.shader_chanel_select_r = 4;
-	ss->ss7.shader_chanel_select_g = 5;
-	ss->ss7.shader_chanel_select_b = 6;
-	ss->ss7.shader_chanel_select_a = 7;
-
-	return offset;
-}
-
-static uint32_t
-gen7_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst)
-{
-	uint32_t *binding_table, offset;
-
-	binding_table = batch_alloc(batch, 32, 64);
-	offset = batch_offset(batch, binding_table);
-
-	binding_table[0] = gen7_fill_surface_state(batch, dst, GEN7_SURFACEFORMAT_R8_UNORM, 1);
-
-	return offset;
-}
-
-static uint32_t
-gen7_fill_media_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size)
-{
-	uint32_t offset;
-
-	offset = batch_copy(batch, kernel, size, 64);
-
-	return offset;
-}
-
-static uint32_t
-gen7_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,
-			       const uint32_t kernel[][4], size_t size)
-{
-	struct gen7_interface_descriptor_data *idd;
-	uint32_t offset;
-	uint32_t binding_table_offset, kernel_offset;
-
-	binding_table_offset = gen7_fill_binding_table(batch, dst);
-	kernel_offset = gen7_fill_media_kernel(batch, kernel, size);
-
-	idd = batch_alloc(batch, sizeof(*idd), 64);
-	offset = batch_offset(batch, idd);
-
-	idd->desc0.kernel_start_pointer = (kernel_offset >> 6);
-
-	idd->desc1.single_program_flow = 1;
-	idd->desc1.floating_point_mode = GEN7_FLOATING_POINT_IEEE_754;
-
-	idd->desc2.sampler_count = 0;      /* 0 samplers used */
-	idd->desc2.sampler_state_pointer = 0;
-
-	idd->desc3.binding_table_entry_count = 0;
-	idd->desc3.binding_table_pointer = (binding_table_offset >> 5);
-
-	idd->desc4.constant_urb_entry_read_offset = 0;
-	idd->desc4.constant_urb_entry_read_length = 1; /* grf 1 */
-
-	return offset;
-}
-
-static void
-gen7_emit_state_base_address(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN7_STATE_BASE_ADDRESS | (10 - 2));
-
-	/* general */
-	OUT_BATCH(0);
-
-	/* surface */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* indirect */
-	OUT_BATCH(0);
-
-	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* general/dynamic/indirect/instruction access Bound */
-	OUT_BATCH(0);
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-}
-
-static void
-gen7_emit_vfe_state(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN7_MEDIA_VFE_STATE | (8 - 2));
-
-	/* scratch buffer */
-	OUT_BATCH(0);
-
-	/* number of threads & urb entries */
-	OUT_BATCH(1 << 16 |
-		2 << 8);
-
-	OUT_BATCH(0);
-
-	/* urb entry size & curbe size */
-	OUT_BATCH(2 << 16 | 	/* in 256 bits unit */
-		2);		/* in 256 bits unit */
-
-	/* scoreboard */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-}
-
-static void
-gen7_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer)
-{
-	OUT_BATCH(GEN7_MEDIA_CURBE_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* curbe total data length */
-	OUT_BATCH(64);
-	/* curbe data start address, is relative to the dynamics base address */
-	OUT_BATCH(curbe_buffer);
-}
-
-static void
-gen7_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
-{
-	OUT_BATCH(GEN7_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* interface descriptor data length */
-	OUT_BATCH(sizeof(struct gen7_interface_descriptor_data));
-	/* interface descriptor address, is relative to the dynamics base address */
-	OUT_BATCH(interface_descriptor);
-}
-
-static void
-gen7_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height)
-{
-	int i, j;
-
-	for (i = 0; i < width / 16; i++) {
-		for (j = 0; j < height / 16; j++) {
-			OUT_BATCH(GEN7_MEDIA_OBJECT | (8 - 2));
-
-			/* interface descriptor offset */
-			OUT_BATCH(0);
-
-			/* without indirect data */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* scoreboard */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* inline data (xoffset, yoffset) */
-			OUT_BATCH(x + i * 16);
-			OUT_BATCH(y + j * 16);
-		}
-	}
-}
-
 /*
  * This sets up the media pipeline,
  *
diff --git a/lib/media_fill_gen8.c b/lib/media_fill_gen8.c
index 4a8fe5a2..4270997e 100644
--- a/lib/media_fill_gen8.c
+++ b/lib/media_fill_gen8.c
@@ -5,7 +5,7 @@
 #include "gen8_media.h"
 #include "intel_reg.h"
 #include "drmtest.h"
-
+#include "gpu_fill.h"
 #include <assert.h>
 
 
@@ -23,293 +23,7 @@ static const uint32_t media_kernel[][4] = {
 	{ 0x07800031, 0x20000a40, 0x0e000e00, 0x82000010 },
 };
 
-static uint32_t
-batch_used(struct intel_batchbuffer *batch)
-{
-	return batch->ptr - batch->buffer;
-}
-
-static uint32_t
-batch_align(struct intel_batchbuffer *batch, uint32_t align)
-{
-	uint32_t offset = batch_used(batch);
-	offset = ALIGN(offset, align);
-	batch->ptr = batch->buffer + offset;
-	return offset;
-}
-
-static void *
-batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align)
-{
-	uint32_t offset = batch_align(batch, align);
-	batch->ptr += size;
-	return memset(batch->buffer + offset, 0, size);
-}
-
-static uint32_t
-batch_offset(struct intel_batchbuffer *batch, void *ptr)
-{
-	return (uint8_t *)ptr - batch->buffer;
-}
-
-static uint32_t
-batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size, uint32_t align)
-{
-	return batch_offset(batch, memcpy(batch_alloc(batch, size, align), ptr, size));
-}
-
-static void
-gen8_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
-{
-	int ret;
-
-	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
-	if (ret == 0)
-		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
-					NULL, 0, 0, 0);
-	igt_assert(ret == 0);
-}
-
-static uint32_t
-gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
-			uint8_t color)
-{
-	uint8_t *curbe_buffer;
-	uint32_t offset;
-
-	curbe_buffer = batch_alloc(batch, sizeof(uint32_t) * 8, 64);
-	offset = batch_offset(batch, curbe_buffer);
-	*curbe_buffer = color;
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst)
-{
-	struct gen8_surface_state *ss;
-	uint32_t write_domain, read_domain, offset;
-	int ret;
-
-	if (is_dst) {
-		write_domain = read_domain = I915_GEM_DOMAIN_RENDER;
-	} else {
-		write_domain = 0;
-		read_domain = I915_GEM_DOMAIN_SAMPLER;
-	}
-
-	ss = batch_alloc(batch, sizeof(*ss), 64);
-	offset = batch_offset(batch, ss);
-
-	ss->ss0.surface_type = GEN8_SURFACE_2D;
-	ss->ss0.surface_format = format;
-	ss->ss0.render_cache_read_write = 1;
-	ss->ss0.vertical_alignment = 1; /* align 4 */
-	ss->ss0.horizontal_alignment = 1; /* align 4 */
-
-	if (buf->tiling == I915_TILING_X)
-		ss->ss0.tiled_mode = 2;
-	else if (buf->tiling == I915_TILING_Y)
-		ss->ss0.tiled_mode = 3;
-
-	ss->ss8.base_addr = buf->bo->offset;
-
-	ret = drm_intel_bo_emit_reloc(batch->bo,
-				batch_offset(batch, ss) + 8 * 4,
-				buf->bo, 0,
-				read_domain, write_domain);
-	igt_assert(ret == 0);
-
-	ss->ss2.height = igt_buf_height(buf) - 1;
-	ss->ss2.width  = igt_buf_width(buf) - 1;
-	ss->ss3.pitch  = buf->stride - 1;
-
-	ss->ss7.shader_chanel_select_r = 4;
-	ss->ss7.shader_chanel_select_g = 5;
-	ss->ss7.shader_chanel_select_b = 6;
-	ss->ss7.shader_chanel_select_a = 7;
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst)
-{
-	uint32_t *binding_table, offset;
-
-	binding_table = batch_alloc(batch, 32, 64);
-	offset = batch_offset(batch, binding_table);
-
-	binding_table[0] = gen8_fill_surface_state(batch, dst, GEN8_SURFACEFORMAT_R8_UNORM, 1);
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_media_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size)
-{
-	uint32_t offset;
-
-	offset = batch_copy(batch, kernel, size, 64);
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst)
-{
-	struct gen8_interface_descriptor_data *idd;
-	uint32_t offset;
-	uint32_t binding_table_offset, kernel_offset;
-
-	binding_table_offset = gen8_fill_binding_table(batch, dst);
-	kernel_offset = gen8_fill_media_kernel(batch, media_kernel, sizeof(media_kernel));
-
-	idd = batch_alloc(batch, sizeof(*idd), 64);
-	offset = batch_offset(batch, idd);
-
-	idd->desc0.kernel_start_pointer = (kernel_offset >> 6);
-
-	idd->desc2.single_program_flow = 1;
-	idd->desc2.floating_point_mode = GEN8_FLOATING_POINT_IEEE_754;
-
-	idd->desc3.sampler_count = 0;      /* 0 samplers used */
-	idd->desc3.sampler_state_pointer = 0;
-
-	idd->desc4.binding_table_entry_count = 0;
-	idd->desc4.binding_table_pointer = (binding_table_offset >> 5);
-
-	idd->desc5.constant_urb_entry_read_offset = 0;
-	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
-
-	return offset;
-}
-
-static void
-gen8_emit_state_base_address(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_STATE_BASE_ADDRESS | (16 - 2));
-
-	/* general */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-
-	/* stateless data port */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-
-	/* surface */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
-
-	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
-		0, BASE_ADDRESS_MODIFY);
-
-	/* indirect */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* general state buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* dynamic state buffer size */
-	OUT_BATCH(1 << 12 | 1);
-	/* indirect object buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
-	OUT_BATCH(1 << 12 | 1);
-}
-
-static void
-gen8_emit_vfe_state(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_MEDIA_VFE_STATE | (9 - 2));
-
-	/* scratch buffer */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	/* number of threads & urb entries */
-	OUT_BATCH(1 << 16 |
-		2 << 8);
-
-	OUT_BATCH(0);
-
-	/* urb entry size & curbe size */
-	OUT_BATCH(2 << 16 |
-		2);
-
-	/* scoreboard */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-}
-
-static void
-gen8_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer)
-{
-	OUT_BATCH(GEN8_MEDIA_CURBE_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* curbe total data length */
-	OUT_BATCH(64);
-	/* curbe data start address, is relative to the dynamics base address */
-	OUT_BATCH(curbe_buffer);
-}
 
-static void
-gen8_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
-{
-	OUT_BATCH(GEN8_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* interface descriptor data length */
-	OUT_BATCH(sizeof(struct gen8_interface_descriptor_data));
-	/* interface descriptor address, is relative to the dynamics base address */
-	OUT_BATCH(interface_descriptor);
-}
-
-static void
-gen8_emit_media_state_flush(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_MEDIA_STATE_FLUSH | (2 - 2));
-	OUT_BATCH(0);
-}
-
-static void
-gen8_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height)
-{
-	int i, j;
-
-	for (i = 0; i < width / 16; i++) {
-		for (j = 0; j < height / 16; j++) {
-			OUT_BATCH(GEN8_MEDIA_OBJECT | (8 - 2));
-
-			/* interface descriptor offset */
-			OUT_BATCH(0);
-
-			/* without indirect data */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* scoreboard */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* inline data (xoffset, yoffset) */
-			OUT_BATCH(x + i * 16);
-			OUT_BATCH(y + j * 16);
-			gen8_emit_media_state_flush(batch);
-		}
-	}
-}
 
 /*
  * This sets up the media pipeline,
@@ -349,7 +63,7 @@ gen8_media_fillfunc(struct intel_batchbuffer *batch,
 	batch->ptr = &batch->buffer[BATCH_STATE_SPLIT];
 
 	curbe_buffer = gen8_fill_curbe_buffer_data(batch, color);
-	interface_descriptor = gen8_fill_interface_descriptor(batch, dst);
+	interface_descriptor = gen8_fill_interface_descriptor(batch, dst, media_kernel, sizeof(media_kernel));
 	igt_assert(batch->ptr < &batch->buffer[4095]);
 
 	/* media pipeline */
diff --git a/lib/media_fill_gen8lp.c b/lib/media_fill_gen8lp.c
index 1f8a4adc..dcc11982 100644
--- a/lib/media_fill_gen8lp.c
+++ b/lib/media_fill_gen8lp.c
@@ -5,7 +5,7 @@
 #include "gen8_media.h"
 #include "intel_reg.h"
 #include "drmtest.h"
-
+#include "gpu_fill.h"
 #include <assert.h>
 
 
@@ -23,286 +23,6 @@ static const uint32_t media_kernel[][4] = {
 	{ 0x07800031, 0x20000a40, 0x0e000e00, 0x82000010 },
 };
 
-static uint32_t
-batch_used(struct intel_batchbuffer *batch)
-{
-	return batch->ptr - batch->buffer;
-}
-
-static uint32_t
-batch_align(struct intel_batchbuffer *batch, uint32_t align)
-{
-	uint32_t offset = batch_used(batch);
-	offset = ALIGN(offset, align);
-	batch->ptr = batch->buffer + offset;
-	return offset;
-}
-
-static void *
-batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align)
-{
-	uint32_t offset = batch_align(batch, align);
-	batch->ptr += size;
-	return memset(batch->buffer + offset, 0, size);
-}
-
-static uint32_t
-batch_offset(struct intel_batchbuffer *batch, void *ptr)
-{
-	return (uint8_t *)ptr - batch->buffer;
-}
-
-static uint32_t
-batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size, uint32_t align)
-{
-	return batch_offset(batch, memcpy(batch_alloc(batch, size, align), ptr, size));
-}
-
-static void
-gen8_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
-{
-	int ret;
-
-	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
-	if (ret == 0)
-		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
-					NULL, 0, 0, 0);
-	igt_assert(ret == 0);
-}
-
-static uint32_t
-gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
-			uint8_t color)
-{
-	uint8_t *curbe_buffer;
-	uint32_t offset;
-
-	curbe_buffer = batch_alloc(batch, sizeof(uint32_t) * 8, 64);
-	offset = batch_offset(batch, curbe_buffer);
-	*curbe_buffer = color;
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst)
-{
-	struct gen8_surface_state *ss;
-	uint32_t write_domain, read_domain, offset;
-	int ret;
-
-	if (is_dst) {
-		write_domain = read_domain = I915_GEM_DOMAIN_RENDER;
-	} else {
-		write_domain = 0;
-		read_domain = I915_GEM_DOMAIN_SAMPLER;
-	}
-
-	ss = batch_alloc(batch, sizeof(*ss), 64);
-	offset = batch_offset(batch, ss);
-
-	ss->ss0.surface_type = GEN8_SURFACE_2D;
-	ss->ss0.surface_format = format;
-	ss->ss0.render_cache_read_write = 1;
-	ss->ss0.vertical_alignment = 1; /* align 4 */
-	ss->ss0.horizontal_alignment = 1; /* align 4 */
-
-	if (buf->tiling == I915_TILING_X)
-		ss->ss0.tiled_mode = 2;
-	else if (buf->tiling == I915_TILING_Y)
-		ss->ss0.tiled_mode = 3;
-
-	ss->ss8.base_addr = buf->bo->offset;
-
-	ret = drm_intel_bo_emit_reloc(batch->bo,
-				batch_offset(batch, ss) + 8 * 4,
-				buf->bo, 0,
-				read_domain, write_domain);
-	igt_assert(ret == 0);
-
-	ss->ss2.height = igt_buf_height(buf) - 1;
-	ss->ss2.width  = igt_buf_width(buf) - 1;
-	ss->ss3.pitch  = buf->stride - 1;
-
-	ss->ss7.shader_chanel_select_r = 4;
-	ss->ss7.shader_chanel_select_g = 5;
-	ss->ss7.shader_chanel_select_b = 6;
-	ss->ss7.shader_chanel_select_a = 7;
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst)
-{
-	uint32_t *binding_table, offset;
-
-	binding_table = batch_alloc(batch, 32, 64);
-	offset = batch_offset(batch, binding_table);
-
-	binding_table[0] = gen8_fill_surface_state(batch, dst, GEN8_SURFACEFORMAT_R8_UNORM, 1);
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_media_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size)
-{
-	uint32_t offset;
-
-	offset = batch_copy(batch, kernel, size, 64);
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst)
-{
-	struct gen8_interface_descriptor_data *idd;
-	uint32_t offset;
-	uint32_t binding_table_offset, kernel_offset;
-
-	binding_table_offset = gen8_fill_binding_table(batch, dst);
-	kernel_offset = gen8_fill_media_kernel(batch, media_kernel, sizeof(media_kernel));
-
-	idd = batch_alloc(batch, sizeof(*idd), 64);
-	offset = batch_offset(batch, idd);
-
-	idd->desc0.kernel_start_pointer = (kernel_offset >> 6);
-
-	idd->desc2.single_program_flow = 1;
-	idd->desc2.floating_point_mode = GEN8_FLOATING_POINT_IEEE_754;
-
-	idd->desc3.sampler_count = 0;      /* 0 samplers used */
-	idd->desc3.sampler_state_pointer = 0;
-
-	idd->desc4.binding_table_entry_count = 0;
-	idd->desc4.binding_table_pointer = (binding_table_offset >> 5);
-
-	idd->desc5.constant_urb_entry_read_offset = 0;
-	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
-
-	return offset;
-}
-
-static void
-gen8_emit_state_base_address(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_STATE_BASE_ADDRESS | (16 - 2));
-
-	/* general */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-
-	/* stateless data port */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-
-	/* surface */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
-
-	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
-		0, BASE_ADDRESS_MODIFY);
-
-	/* indirect */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* general state buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* dynamic state buffer size */
-	OUT_BATCH(1 << 12 | 1);
-	/* indirect object buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
-	OUT_BATCH(1 << 12 | 1);
-}
-
-static void
-gen8_emit_vfe_state(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_MEDIA_VFE_STATE | (9 - 2));
-
-	/* scratch buffer */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	/* number of threads & urb entries */
-	OUT_BATCH(1 << 16 |
-		2 << 8);
-
-	OUT_BATCH(0);
-
-	/* urb entry size & curbe size */
-	OUT_BATCH(2 << 16 |
-		2);
-
-	/* scoreboard */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-}
-
-static void
-gen8_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer)
-{
-	OUT_BATCH(GEN8_MEDIA_CURBE_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* curbe total data length */
-	OUT_BATCH(64);
-	/* curbe data start address, is relative to the dynamics base address */
-	OUT_BATCH(curbe_buffer);
-}
-
-static void
-gen8_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
-{
-	OUT_BATCH(GEN8_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* interface descriptor data length */
-	OUT_BATCH(sizeof(struct gen8_interface_descriptor_data));
-	/* interface descriptor address, is relative to the dynamics base address */
-	OUT_BATCH(interface_descriptor);
-}
-
-static void
-gen8lp_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height)
-{
-	int i, j;
-
-	for (i = 0; i < width / 16; i++) {
-		for (j = 0; j < height / 16; j++) {
-			OUT_BATCH(GEN8_MEDIA_OBJECT | (8 - 2));
-
-			/* interface descriptor offset */
-			OUT_BATCH(0);
-
-			/* without indirect data */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* scoreboard */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* inline data (xoffset, yoffset) */
-			OUT_BATCH(x + i * 16);
-			OUT_BATCH(y + j * 16);
-		}
-	}
-}
-
 /*
  * This sets up the media pipeline,
  *
@@ -341,7 +61,7 @@ gen8lp_media_fillfunc(struct intel_batchbuffer *batch,
 	batch->ptr = &batch->buffer[BATCH_STATE_SPLIT];
 
 	curbe_buffer = gen8_fill_curbe_buffer_data(batch, color);
-	interface_descriptor = gen8_fill_interface_descriptor(batch, dst);
+	interface_descriptor = gen8_fill_interface_descriptor(batch, dst, media_kernel, sizeof(media_kernel));
 	igt_assert(batch->ptr < &batch->buffer[4095]);
 
 	/* media pipeline */
diff --git a/lib/media_fill_gen9.c b/lib/media_fill_gen9.c
index 3fd21819..6accdbe4 100644
--- a/lib/media_fill_gen9.c
+++ b/lib/media_fill_gen9.c
@@ -4,11 +4,9 @@
 #include "media_fill.h"
 #include "gen8_media.h"
 #include "intel_reg.h"
-
+#include "gpu_fill.h"
 #include <assert.h>
 
-#define ALIGN(x, y) (((x) + (y)-1) & ~((y)-1))
-
 static const uint32_t media_kernel[][4] = {
 	{ 0x00400001, 0x20202288, 0x00000020, 0x00000000 },
 	{ 0x00600001, 0x20800208, 0x008d0000, 0x00000000 },
@@ -23,298 +21,6 @@ static const uint32_t media_kernel[][4] = {
 	{ 0x07800031, 0x20000a40, 0x0e000e00, 0x82000010 },
 };
 
-static uint32_t
-batch_used(struct intel_batchbuffer *batch)
-{
-	return batch->ptr - batch->buffer;
-}
-
-static uint32_t
-batch_align(struct intel_batchbuffer *batch, uint32_t align)
-{
-	uint32_t offset = batch_used(batch);
-	offset = ALIGN(offset, align);
-	batch->ptr = batch->buffer + offset;
-	return offset;
-}
-
-static void *
-batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align)
-{
-	uint32_t offset = batch_align(batch, align);
-	batch->ptr += size;
-	return memset(batch->buffer + offset, 0, size);
-}
-
-static uint32_t
-batch_offset(struct intel_batchbuffer *batch, void *ptr)
-{
-	return (uint8_t *)ptr - batch->buffer;
-}
-
-static uint32_t
-batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size, uint32_t align)
-{
-	return batch_offset(batch, memcpy(batch_alloc(batch, size, align), ptr, size));
-}
-
-static void
-gen8_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
-{
-	int ret;
-
-	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
-	if (ret == 0)
-		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
-					NULL, 0, 0, 0);
-	assert(ret == 0);
-}
-
-static uint32_t
-gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
-			uint8_t color)
-{
-	uint8_t *curbe_buffer;
-	uint32_t offset;
-
-	curbe_buffer = batch_alloc(batch, sizeof(uint32_t) * 8, 64);
-	offset = batch_offset(batch, curbe_buffer);
-	*curbe_buffer = color;
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst)
-{
-	struct gen8_surface_state *ss;
-	uint32_t write_domain, read_domain, offset;
-	int ret;
-
-	if (is_dst) {
-		write_domain = read_domain = I915_GEM_DOMAIN_RENDER;
-	} else {
-		write_domain = 0;
-		read_domain = I915_GEM_DOMAIN_SAMPLER;
-	}
-
-	ss = batch_alloc(batch, sizeof(*ss), 64);
-	offset = batch_offset(batch, ss);
-
-	ss->ss0.surface_type = GEN8_SURFACE_2D;
-	ss->ss0.surface_format = format;
-	ss->ss0.render_cache_read_write = 1;
-	ss->ss0.vertical_alignment = 1; /* align 4 */
-	ss->ss0.horizontal_alignment = 1; /* align 4 */
-
-	if (buf->tiling == I915_TILING_X)
-		ss->ss0.tiled_mode = 2;
-	else if (buf->tiling == I915_TILING_Y)
-		ss->ss0.tiled_mode = 3;
-
-	ss->ss8.base_addr = buf->bo->offset;
-
-	ret = drm_intel_bo_emit_reloc(batch->bo,
-				batch_offset(batch, ss) + 8 * 4,
-				buf->bo, 0,
-				read_domain, write_domain);
-	assert(ret == 0);
-
-	ss->ss2.height = igt_buf_height(buf) - 1;
-	ss->ss2.width  = igt_buf_width(buf) - 1;
-	ss->ss3.pitch  = buf->stride - 1;
-
-	ss->ss7.shader_chanel_select_r = 4;
-	ss->ss7.shader_chanel_select_g = 5;
-	ss->ss7.shader_chanel_select_b = 6;
-	ss->ss7.shader_chanel_select_a = 7;
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst)
-{
-	uint32_t *binding_table, offset;
-
-	binding_table = batch_alloc(batch, 32, 64);
-	offset = batch_offset(batch, binding_table);
-
-	binding_table[0] = gen8_fill_surface_state(batch, dst, GEN8_SURFACEFORMAT_R8_UNORM, 1);
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_media_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size)
-{
-	uint32_t offset;
-
-	offset = batch_copy(batch, kernel, size, 64);
-
-	return offset;
-}
-
-static uint32_t
-gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst)
-{
-	struct gen8_interface_descriptor_data *idd;
-	uint32_t offset;
-	uint32_t binding_table_offset, kernel_offset;
-
-	binding_table_offset = gen8_fill_binding_table(batch, dst);
-	kernel_offset = gen8_fill_media_kernel(batch, media_kernel, sizeof(media_kernel));
-
-	idd = batch_alloc(batch, sizeof(*idd), 64);
-	offset = batch_offset(batch, idd);
-
-	idd->desc0.kernel_start_pointer = (kernel_offset >> 6);
-
-	idd->desc2.single_program_flow = 1;
-	idd->desc2.floating_point_mode = GEN8_FLOATING_POINT_IEEE_754;
-
-	idd->desc3.sampler_count = 0;      /* 0 samplers used */
-	idd->desc3.sampler_state_pointer = 0;
-
-	idd->desc4.binding_table_entry_count = 0;
-	idd->desc4.binding_table_pointer = (binding_table_offset >> 5);
-
-	idd->desc5.constant_urb_entry_read_offset = 0;
-	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
-
-	return offset;
-}
-
-static void
-gen9_emit_state_base_address(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_STATE_BASE_ADDRESS | (19 - 2));
-
-	/* general */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-
-	/* stateless data port */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-
-	/* surface */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
-
-	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
-		0, BASE_ADDRESS_MODIFY);
-
-	/* indirect */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
-
-	/* general state buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* dynamic state buffer size */
-	OUT_BATCH(1 << 12 | 1);
-	/* indirect object buffer size */
-	OUT_BATCH(0xfffff000 | 1);
-	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
-	OUT_BATCH(1 << 12 | 1);
-
-	/* Bindless surface state base address */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0);
-	OUT_BATCH(0xfffff000);
-}
-
-static void
-gen8_emit_vfe_state(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_MEDIA_VFE_STATE | (9 - 2));
-
-	/* scratch buffer */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	/* number of threads & urb entries */
-	OUT_BATCH(1 << 16 |
-		2 << 8);
-
-	OUT_BATCH(0);
-
-	/* urb entry size & curbe size */
-	OUT_BATCH(2 << 16 |
-		2);
-
-	/* scoreboard */
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-}
-
-static void
-gen8_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer)
-{
-	OUT_BATCH(GEN8_MEDIA_CURBE_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* curbe total data length */
-	OUT_BATCH(64);
-	/* curbe data start address, is relative to the dynamics base address */
-	OUT_BATCH(curbe_buffer);
-}
-
-static void
-gen8_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
-{
-	OUT_BATCH(GEN8_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* interface descriptor data length */
-	OUT_BATCH(sizeof(struct gen8_interface_descriptor_data));
-	/* interface descriptor address, is relative to the dynamics base address */
-	OUT_BATCH(interface_descriptor);
-}
-
-static void
-gen8_emit_media_state_flush(struct intel_batchbuffer *batch)
-{
-	OUT_BATCH(GEN8_MEDIA_STATE_FLUSH | (2 - 2));
-	OUT_BATCH(0);
-}
-
-static void
-gen8_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height)
-{
-	int i, j;
-
-	for (i = 0; i < width / 16; i++) {
-		for (j = 0; j < height / 16; j++) {
-			OUT_BATCH(GEN8_MEDIA_OBJECT | (8 - 2));
-
-			/* interface descriptor offset */
-			OUT_BATCH(0);
-
-			/* without indirect data */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* scoreboard */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* inline data (xoffset, yoffset) */
-			OUT_BATCH(x + i * 16);
-			OUT_BATCH(y + j * 16);
-			gen8_emit_media_state_flush(batch);
-		}
-	}
-}
 
 /*
  * This sets up the media pipeline,
@@ -354,7 +60,7 @@ gen9_media_fillfunc(struct intel_batchbuffer *batch,
 	batch->ptr = &batch->buffer[BATCH_STATE_SPLIT];
 
 	curbe_buffer = gen8_fill_curbe_buffer_data(batch, color);
-	interface_descriptor = gen8_fill_interface_descriptor(batch, dst);
+	interface_descriptor = gen8_fill_interface_descriptor(batch, dst, media_kernel, sizeof(media_kernel));
 	assert(batch->ptr < &batch->buffer[4095]);
 
 	/* media pipeline */
diff --git a/lib/meson.build b/lib/meson.build
index b3b8b14a..385e08b9 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -30,6 +30,7 @@ lib_sources = [
 	'media_fill_gen9.c',
 	'media_spin.c',
 	'gpgpu_fill.c',
+	'gpu_fill.c',
 	'rendercopy_i915.c',
 	'rendercopy_i830.c',
 	'rendercopy_gen6.c',
-- 
2.14.3

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [igt-dev] [PATCH i-g-t v3 2/4] lib: Remove duplications in gpu_fill library
  2018-04-09 14:34 [igt-dev] [PATCH i-g-t v3 0/4] Refactoring of *_fill libraries Katarzyna Dec
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 1/4] lib: Move common gpgpu/media fill functions to gpu_fill library Katarzyna Dec
@ 2018-04-09 14:34 ` Katarzyna Dec
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 3/4] lib/gpgpu_fill: Add missing configuration parameters for gpgpu_fill Katarzyna Dec
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Katarzyna Dec @ 2018-04-09 14:34 UTC (permalink / raw)
  To: igt-dev

After moving all functions needed for gpgpu and media fill testing
there is a lot of duplications which can be removed:
  Library media_fill_gen8 and media_fill_gen8lp for CHT was removed,
media state flush for !CHT was added to gen7_emit_media_objects.
  Many gen8 functions were replaced with gen7 version with devid
parameter (gen7_fill_curbe_load, gen7_emit_interface_descriptor,
gen7_fill_binding_table, gen7_emit_media_objects). Unified fill kernel
function so it is applicable to all gens and both media and gpgpu
(merged gen7_fill_media_kernel and gen8_fill_media_kernel).
  Duplicated constants like GEN8_MEDIA_VFE_STATE, GEN8_MEDIA_CURBE_LOAD,
GEN8_MEDIA_INTERFACE_DESCRIPTOR_LOAD, GEN8_MEDIA_OBJECT were
replaced by GEN7 version. However this constants were not removed
from gen8_media.h library, because they are used by other tests
for Gen8+. More refactoring in this gen*_media.h libraries is needed.

It seems that further unification of *_fillfunc functions will
introduce more confusion in understanding what the tests are doing
and what were changes between Gens.

v2: Moved some reduntant changes from Move gpgpu/media fill to gpu_fill...
to this patch. Applied comments from review.

Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 lib/Makefile.sources    |   1 -
 lib/gpgpu_fill.c        |   2 +-
 lib/gpu_fill.c          | 130 +++++++-----------------------------------------
 lib/gpu_fill.h          |  27 +---------
 lib/intel_batchbuffer.c |   4 +-
 lib/media_fill.h        |   7 ---
 lib/media_fill_gen8.c   |   6 +--
 lib/media_fill_gen8lp.c |  87 --------------------------------
 lib/media_fill_gen9.c   |   6 +--
 lib/meson.build         |   1 -
 10 files changed, 26 insertions(+), 245 deletions(-)
 delete mode 100644 lib/media_fill_gen8lp.c

diff --git a/lib/Makefile.sources b/lib/Makefile.sources
index 45e65dd7..9c0150c1 100644
--- a/lib/Makefile.sources
+++ b/lib/Makefile.sources
@@ -58,7 +58,6 @@ lib_source_list =	 	\
 	media_fill.h            \
 	media_fill_gen7.c       \
 	media_fill_gen8.c       \
-	media_fill_gen8lp.c     \
 	media_fill_gen9.c       \
 	media_spin.h		\
 	media_spin.c		\
diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c
index f2765fd6..579ce78d 100644
--- a/lib/gpgpu_fill.c
+++ b/lib/gpgpu_fill.c
@@ -180,7 +180,7 @@ gen8_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 	gen8_emit_state_base_address(batch);
 	gen8_emit_vfe_state_gpgpu(batch);
 	gen7_emit_curbe_load(batch, curbe_buffer);
-	gen8_emit_interface_descriptor_load(batch, interface_descriptor);
+	gen7_emit_interface_descriptor_load(batch, interface_descriptor);
 	gen8_emit_gpgpu_walk(batch, x, y, width, height);
 
 	OUT_BATCH(MI_BATCH_BUFFER_END);
diff --git a/lib/gpu_fill.c b/lib/gpu_fill.c
index 172c6db6..f86e3414 100644
--- a/lib/gpu_fill.c
+++ b/lib/gpu_fill.c
@@ -118,26 +118,18 @@ gen7_fill_binding_table(struct intel_batchbuffer *batch,
 
 	binding_table = batch_alloc(batch, 32, 64);
 	offset = batch_offset(batch, binding_table);
-	binding_table[0] = gen7_fill_surface_state(batch, dst,
+	if (IS_GEN7(batch->devid))
+		binding_table[0] = gen7_fill_surface_state(batch, dst,
 						GEN7_SURFACEFORMAT_R8_UNORM, 1);
+	else
+		binding_table[0] = gen8_fill_surface_state(batch, dst,
+						GEN8_SURFACEFORMAT_R8_UNORM, 1);
 
 	return offset;
 }
 
 uint32_t
-gen7_fill_media_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size)
-{
-	uint32_t offset;
-
-	offset = batch_copy(batch, kernel, size, 64);
-
-	return offset;
-}
-
-uint32_t
-gen8_fill_media_kernel(struct intel_batchbuffer *batch,
+gen7_fill_kernel(struct intel_batchbuffer *batch,
 		const uint32_t kernel[][4],
 		size_t size)
 {
@@ -157,7 +149,7 @@ gen7_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *
 	uint32_t binding_table_offset, kernel_offset;
 
 	binding_table_offset = gen7_fill_binding_table(batch, dst);
-	kernel_offset = gen7_fill_media_kernel(batch, kernel, size);
+	kernel_offset = gen7_fill_kernel(batch, kernel, size);
 
 	idd = batch_alloc(batch, sizeof(*idd), 64);
 	offset = batch_offset(batch, idd);
@@ -272,7 +264,10 @@ gen7_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t in
 	OUT_BATCH(GEN7_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
 	OUT_BATCH(0);
 	/* interface descriptor data length */
-	OUT_BATCH(sizeof(struct gen7_interface_descriptor_data));
+	if (IS_GEN7(batch->devid))
+		OUT_BATCH(sizeof(struct gen7_interface_descriptor_data));
+	else
+		OUT_BATCH(sizeof(struct gen8_interface_descriptor_data));
 	/* interface descriptor address, is relative to the dynamics base address */
 	OUT_BATCH(interface_descriptor);
 }
@@ -302,6 +297,8 @@ gen7_emit_media_objects(struct intel_batchbuffer *batch,
 			/* inline data (xoffset, yoffset) */
 			OUT_BATCH(x + i * 16);
 			OUT_BATCH(y + j * 16);
+			if (AT_LEAST_GEN(batch->devid, 8) && !IS_CHERRYVIEW(batch->devid))
+				gen8_emit_media_state_flush(batch);
 		}
 	}
 }
@@ -441,21 +438,6 @@ gen8_fill_surface_state(struct intel_batchbuffer *batch,
 	return offset;
 }
 
-uint32_t
-gen8_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst)
-{
-	uint32_t *binding_table, offset;
-
-	binding_table = batch_alloc(batch, 32, 64);
-	offset = batch_offset(batch, binding_table);
-
-	binding_table[0] = gen8_fill_surface_state(batch, dst,
-						GEN8_SURFACEFORMAT_R8_UNORM, 1);
-
-	return offset;
-}
-
 uint32_t
 gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,  const uint32_t kernel[][4], size_t size)
 {
@@ -463,8 +445,8 @@ gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *
 	uint32_t offset;
 	uint32_t binding_table_offset, kernel_offset;
 
-	binding_table_offset = gen8_fill_binding_table(batch, dst);
-	kernel_offset = gen8_fill_media_kernel(batch, kernel, size);
+	binding_table_offset = gen7_fill_binding_table(batch, dst);
+	kernel_offset = gen7_fill_kernel(batch, kernel, size);
 
 	idd = batch_alloc(batch, sizeof(*idd), 64);
 	offset = batch_offset(batch, idd);
@@ -525,7 +507,7 @@ gen8_emit_state_base_address(struct intel_batchbuffer *batch)
 void
 gen8_emit_vfe_state(struct intel_batchbuffer *batch)
 {
-	OUT_BATCH(GEN8_MEDIA_VFE_STATE | (9 - 2));
+	OUT_BATCH(GEN7_MEDIA_VFE_STATE | (9 - 2));
 
 	/* scratch buffer */
 	OUT_BATCH(0);
@@ -550,7 +532,7 @@ gen8_emit_vfe_state(struct intel_batchbuffer *batch)
 void
 gen8_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch)
 {
-	OUT_BATCH(GEN8_MEDIA_VFE_STATE | (9 - 2));
+	OUT_BATCH(GEN7_MEDIA_VFE_STATE | (9 - 2));
 
 	/* scratch buffer */
 	OUT_BATCH(0);
@@ -570,28 +552,6 @@ gen8_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch)
 	OUT_BATCH(0);
 }
 
-void
-gen8_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer)
-{
-	OUT_BATCH(GEN8_MEDIA_CURBE_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* curbe total data length */
-	OUT_BATCH(64);
-	/* curbe data start address, is relative to the dynamics base address */
-	OUT_BATCH(curbe_buffer);
-}
-
-void
-gen8_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor)
-{
-	OUT_BATCH(GEN8_MEDIA_INTERFACE_DESCRIPTOR_LOAD | (4 - 2));
-	OUT_BATCH(0);
-	/* interface descriptor data length */
-	OUT_BATCH(sizeof(struct gen8_interface_descriptor_data));
-	/* interface descriptor address, is relative to the dynamics base address */
-	OUT_BATCH(interface_descriptor);
-}
-
 void
 gen8_emit_media_state_flush(struct intel_batchbuffer *batch)
 {
@@ -599,63 +559,7 @@ gen8_emit_media_state_flush(struct intel_batchbuffer *batch)
 	OUT_BATCH(0);
 }
 
-void
-gen8_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height)
-{
-	int i, j;
-
-	for (i = 0; i < width / 16; i++) {
-		for (j = 0; j < height / 16; j++) {
-			OUT_BATCH(GEN8_MEDIA_OBJECT | (8 - 2));
-
-			/* interface descriptor offset */
-			OUT_BATCH(0);
-
-			/* without indirect data */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* scoreboard */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* inline data (xoffset, yoffset) */
-			OUT_BATCH(x + i * 16);
-			OUT_BATCH(y + j * 16);
-			gen8_emit_media_state_flush(batch);
-		}
-	}
-}
-void
-gen8lp_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height)
-{
-	int i, j;
-
-	for (i = 0; i < width / 16; i++) {
-		for (j = 0; j < height / 16; j++) {
-			OUT_BATCH(GEN8_MEDIA_OBJECT | (8 - 2));
-
-			/* interface descriptor offset */
-			OUT_BATCH(0);
-
-			/* without indirect data */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
-
-			/* scoreboard */
-			OUT_BATCH(0);
-			OUT_BATCH(0);
 
-			/* inline data (xoffset, yoffset) */
-			OUT_BATCH(x + i * 16);
-			OUT_BATCH(y + j * 16);
-		}
-	}
-}
 void
 gen8_emit_gpgpu_walk(struct intel_batchbuffer *batch,
 		     unsigned x, unsigned y,
diff --git a/lib/gpu_fill.h b/lib/gpu_fill.h
index 973513b0..9200fe4f 100644
--- a/lib/gpu_fill.h
+++ b/lib/gpu_fill.h
@@ -43,12 +43,7 @@ gen7_fill_binding_table(struct intel_batchbuffer *batch,
 			struct igt_buf *dst);
 
 uint32_t
-gen7_fill_media_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size);
-
-uint32_t
-gen8_fill_media_kernel(struct intel_batchbuffer *batch,
+gen7_fill_kernel(struct intel_batchbuffer *batch,
 		const uint32_t kernel[][4],
 		size_t size);
 
@@ -94,10 +89,6 @@ gen8_fill_surface_state(struct intel_batchbuffer *batch,
 			uint32_t format,
 			int is_dst);
 
-uint32_t
-gen8_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst);
-
 uint32_t
 gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,  const uint32_t kernel[][4], size_t size);
 
@@ -110,25 +101,9 @@ gen8_emit_vfe_state(struct intel_batchbuffer *batch);
 void
 gen8_emit_vfe_state_gpgpu(struct intel_batchbuffer *batch);
 
-void
-gen8_emit_curbe_load(struct intel_batchbuffer *batch, uint32_t curbe_buffer);
-
-void
-gen8_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t interface_descriptor);
-
 void
 gen8_emit_media_state_flush(struct intel_batchbuffer *batch);
 
-void
-gen8_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height);
-
-void
-gen8lp_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height);
-
 void
 gen8_emit_gpgpu_walk(struct intel_batchbuffer *batch,
 		     unsigned x, unsigned y,
diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
index 7c04ccf3..10d4dce8 100644
--- a/lib/intel_batchbuffer.c
+++ b/lib/intel_batchbuffer.c
@@ -796,12 +796,10 @@ igt_fillfunc_t igt_get_media_fillfunc(int devid)
 
 	if (IS_GEN9(devid))
 		fill = gen9_media_fillfunc;
-	else if (IS_BROADWELL(devid))
+	else if (IS_GEN8(devid))
 		fill = gen8_media_fillfunc;
 	else if (IS_GEN7(devid))
 		fill = gen7_media_fillfunc;
-	else if (IS_CHERRYVIEW(devid))
-		fill = gen8lp_media_fillfunc;
 
 	return fill;
 }
diff --git a/lib/media_fill.h b/lib/media_fill.h
index 226489cb..161af8cf 100644
--- a/lib/media_fill.h
+++ b/lib/media_fill.h
@@ -18,13 +18,6 @@ gen7_media_fillfunc(struct intel_batchbuffer *batch,
                 unsigned width, unsigned height,
                 uint8_t color);
 
-void
-gen8lp_media_fillfunc(struct intel_batchbuffer *batch,
-		struct igt_buf *dst,
-		unsigned x, unsigned y,
-		unsigned width, unsigned height,
-		uint8_t color);
-
 void
 gen9_media_fillfunc(struct intel_batchbuffer *batch,
                 struct igt_buf *dst,
diff --git a/lib/media_fill_gen8.c b/lib/media_fill_gen8.c
index 4270997e..e398c8ea 100644
--- a/lib/media_fill_gen8.c
+++ b/lib/media_fill_gen8.c
@@ -73,11 +73,11 @@ gen8_media_fillfunc(struct intel_batchbuffer *batch,
 
 	gen8_emit_vfe_state(batch);
 
-	gen8_emit_curbe_load(batch, curbe_buffer);
+	gen7_emit_curbe_load(batch, curbe_buffer);
 
-	gen8_emit_interface_descriptor_load(batch, interface_descriptor);
+	gen7_emit_interface_descriptor_load(batch, interface_descriptor);
 
-	gen8_emit_media_objects(batch, x, y, width, height);
+	gen7_emit_media_objects(batch, x, y, width, height);
 
 	OUT_BATCH(MI_BATCH_BUFFER_END);
 
diff --git a/lib/media_fill_gen8lp.c b/lib/media_fill_gen8lp.c
deleted file mode 100644
index dcc11982..00000000
--- a/lib/media_fill_gen8lp.c
+++ /dev/null
@@ -1,87 +0,0 @@
-#include <intel_bufmgr.h>
-#include <i915_drm.h>
-
-#include "media_fill.h"
-#include "gen8_media.h"
-#include "intel_reg.h"
-#include "drmtest.h"
-#include "gpu_fill.h"
-#include <assert.h>
-
-
-static const uint32_t media_kernel[][4] = {
-	{ 0x00400001, 0x20202288, 0x00000020, 0x00000000 },
-	{ 0x00600001, 0x20800208, 0x008d0000, 0x00000000 },
-	{ 0x00200001, 0x20800208, 0x00450040, 0x00000000 },
-	{ 0x00000001, 0x20880608, 0x00000000, 0x000f000f },
-	{ 0x00800001, 0x20a00208, 0x00000020, 0x00000000 },
-	{ 0x00800001, 0x20e00208, 0x00000020, 0x00000000 },
-	{ 0x00800001, 0x21200208, 0x00000020, 0x00000000 },
-	{ 0x00800001, 0x21600208, 0x00000020, 0x00000000 },
-	{ 0x0c800031, 0x24000a40, 0x0e000080, 0x120a8000 },
-	{ 0x00600001, 0x2e000208, 0x008d0000, 0x00000000 },
-	{ 0x07800031, 0x20000a40, 0x0e000e00, 0x82000010 },
-};
-
-/*
- * This sets up the media pipeline,
- *
- * +---------------+ <---- 4096
- * |       ^       |
- * |       |       |
- * |    various    |
- * |      state    |
- * |       |       |
- * |_______|_______| <---- 2048 + ?
- * |       ^       |
- * |       |       |
- * |   batch       |
- * |    commands   |
- * |       |       |
- * |       |       |
- * +---------------+ <---- 0 + ?
- *
- */
-
-#define BATCH_STATE_SPLIT 2048
-
-void
-gen8lp_media_fillfunc(struct intel_batchbuffer *batch,
-		struct igt_buf *dst,
-		unsigned x, unsigned y,
-		unsigned width, unsigned height,
-		uint8_t color)
-{
-	uint32_t curbe_buffer, interface_descriptor;
-	uint32_t batch_end;
-
-	intel_batchbuffer_flush(batch);
-
-	/* setup states */
-	batch->ptr = &batch->buffer[BATCH_STATE_SPLIT];
-
-	curbe_buffer = gen8_fill_curbe_buffer_data(batch, color);
-	interface_descriptor = gen8_fill_interface_descriptor(batch, dst, media_kernel, sizeof(media_kernel));
-	igt_assert(batch->ptr < &batch->buffer[4095]);
-
-	/* media pipeline */
-	batch->ptr = batch->buffer;
-	OUT_BATCH(GEN8_PIPELINE_SELECT | PIPELINE_SELECT_MEDIA);
-	gen8_emit_state_base_address(batch);
-
-	gen8_emit_vfe_state(batch);
-
-	gen8_emit_curbe_load(batch, curbe_buffer);
-
-	gen8_emit_interface_descriptor_load(batch, interface_descriptor);
-
-	gen8lp_emit_media_objects(batch, x, y, width, height);
-
-	OUT_BATCH(MI_BATCH_BUFFER_END);
-
-	batch_end = batch_align(batch, 8);
-	igt_assert(batch_end < BATCH_STATE_SPLIT);
-
-	gen8_render_flush(batch, batch_end);
-	intel_batchbuffer_reset(batch);
-}
diff --git a/lib/media_fill_gen9.c b/lib/media_fill_gen9.c
index 6accdbe4..a492466a 100644
--- a/lib/media_fill_gen9.c
+++ b/lib/media_fill_gen9.c
@@ -75,11 +75,11 @@ gen9_media_fillfunc(struct intel_batchbuffer *batch,
 
 	gen8_emit_vfe_state(batch);
 
-	gen8_emit_curbe_load(batch, curbe_buffer);
+	gen7_emit_curbe_load(batch, curbe_buffer);
 
-	gen8_emit_interface_descriptor_load(batch, interface_descriptor);
+	gen7_emit_interface_descriptor_load(batch, interface_descriptor);
 
-	gen8_emit_media_objects(batch, x, y, width, height);
+	gen7_emit_media_objects(batch, x, y, width, height);
 
 	OUT_BATCH(GEN8_PIPELINE_SELECT | PIPELINE_SELECT_MEDIA |
 			GEN9_FORCE_MEDIA_AWAKE_DISABLE |
diff --git a/lib/meson.build b/lib/meson.build
index 385e08b9..5f2567fb 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,7 +26,6 @@ lib_sources = [
 	'ioctl_wrappers.c',
 	'media_fill_gen7.c',
 	'media_fill_gen8.c',
-	'media_fill_gen8lp.c',
 	'media_fill_gen9.c',
 	'media_spin.c',
 	'gpgpu_fill.c',
-- 
2.14.3

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [igt-dev] [PATCH i-g-t v3 3/4] lib/gpgpu_fill: Add missing configuration parameters for gpgpu_fill
  2018-04-09 14:34 [igt-dev] [PATCH i-g-t v3 0/4] Refactoring of *_fill libraries Katarzyna Dec
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 1/4] lib: Move common gpgpu/media fill functions to gpu_fill library Katarzyna Dec
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 2/4] lib: Remove duplications in " Katarzyna Dec
@ 2018-04-09 14:34 ` Katarzyna Dec
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 4/4] lib: Adjust refactored gpu_fill library to our coding style Katarzyna Dec
  2018-04-09 15:04 ` [igt-dev] ✗ Fi.CI.BAT: failure for Refactoring of *_fill libraries (rev2) Patchwork
  4 siblings, 0 replies; 6+ messages in thread
From: Katarzyna Dec @ 2018-04-09 14:34 UTC (permalink / raw)
  To: igt-dev

There are missing parameters for Gen8 configuration of gpgpu_fill
that are causing GPU hangs on newer hardware. We need to set the
number of threads in TG in gen8_fill_interface_descriptor. This
field was omitted (apparently without any side effects), but
according to bspec from BDW this field cannot be set to 0. We also
need to use pipeline selection mask to gen9_gpgpu_fillfunc, which
is necessary from SKL.

v2: rebased on refactored library
v3: Removed replacing gen7_emit_interface_descriptor_load with gen8
version in gen9_gpgpgu_fillfunc, because during refactoring gen8
function was removed.
v4: rebase on series new version

Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 lib/gpgpu_fill.c | 3 ++-
 lib/gpu_fill.c   | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c
index 579ce78d..5a77ebd4 100644
--- a/lib/gpgpu_fill.c
+++ b/lib/gpgpu_fill.c
@@ -223,7 +223,8 @@ gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 	batch->ptr = batch->buffer;
 
 	/* GPGPU pipeline */
-	OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_GPGPU);
+	OUT_BATCH(GEN7_PIPELINE_SELECT | GEN9_PIPELINE_SELECTION_MASK |
+		  PIPELINE_SELECT_GPGPU);
 
 	gen9_emit_state_base_address(batch);
 	gen8_emit_vfe_state_gpgpu(batch);
diff --git a/lib/gpu_fill.c b/lib/gpu_fill.c
index f86e3414..39c103f6 100644
--- a/lib/gpu_fill.c
+++ b/lib/gpu_fill.c
@@ -465,6 +465,8 @@ gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *
 	idd->desc5.constant_urb_entry_read_offset = 0;
 	idd->desc5.constant_urb_entry_read_length = 1; /* grf 1 */
 
+	idd->desc6.num_threads_in_tg = 1;
+
 	return offset;
 }
 
-- 
2.14.3

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [igt-dev] [PATCH i-g-t v3 4/4] lib: Adjust refactored gpu_fill library to our coding style
  2018-04-09 14:34 [igt-dev] [PATCH i-g-t v3 0/4] Refactoring of *_fill libraries Katarzyna Dec
                   ` (2 preceding siblings ...)
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 3/4] lib/gpgpu_fill: Add missing configuration parameters for gpgpu_fill Katarzyna Dec
@ 2018-04-09 14:34 ` Katarzyna Dec
  2018-04-09 15:04 ` [igt-dev] ✗ Fi.CI.BAT: failure for Refactoring of *_fill libraries (rev2) Patchwork
  4 siblings, 0 replies; 6+ messages in thread
From: Katarzyna Dec @ 2018-04-09 14:34 UTC (permalink / raw)
  To: igt-dev

While I am making changes in gpgpu and media fill area let's
adjust code to our coding style.

v2: rebased on series new version (patch is now last from
series so change seems larger)

Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
Cc: Lukasz Kalamarz <lukasz.kalamarz@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 lib/gpgpu_fill.c      |  24 ++++++------
 lib/gpgpu_fill.h      |  12 +++---
 lib/gpu_fill.c        | 102 +++++++++++++++++++++++++++-----------------------
 lib/gpu_fill.h        |  51 +++++++++++++------------
 lib/media_fill.h      |  20 +++++-----
 lib/media_fill_gen7.c |   7 ++--
 lib/media_fill_gen8.c |   7 ++--
 lib/media_fill_gen9.c |   7 ++--
 8 files changed, 120 insertions(+), 110 deletions(-)

diff --git a/lib/gpgpu_fill.c b/lib/gpgpu_fill.c
index 5a77ebd4..72a1445a 100644
--- a/lib/gpgpu_fill.c
+++ b/lib/gpgpu_fill.c
@@ -99,8 +99,8 @@ static const uint32_t gen9_gpgpu_kernel[][4] = {
 void
 gen7_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 		    struct igt_buf *dst,
-		    unsigned x, unsigned y,
-		    unsigned width, unsigned height,
+		    unsigned int x, unsigned int y,
+		    unsigned int width, unsigned int height,
 		    uint8_t color)
 {
 	uint32_t curbe_buffer, interface_descriptor;
@@ -120,8 +120,8 @@ gen7_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 	curbe_buffer = gen7_fill_curbe_buffer_data(batch, color);
 
 	interface_descriptor = gen7_fill_interface_descriptor(batch, dst,
-							      gen7_gpgpu_kernel,
-							      sizeof(gen7_gpgpu_kernel));
+				gen7_gpgpu_kernel, sizeof(gen7_gpgpu_kernel));
+
 	igt_assert(batch->ptr < &batch->buffer[4095]);
 
 	batch->ptr = batch->buffer;
@@ -147,8 +147,8 @@ gen7_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 void
 gen8_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 		    struct igt_buf *dst,
-		    unsigned x, unsigned y,
-		    unsigned width, unsigned height,
+		    unsigned int x, unsigned int y,
+		    unsigned int width, unsigned int height,
 		    uint8_t color)
 {
 	uint32_t curbe_buffer, interface_descriptor;
@@ -168,8 +168,8 @@ gen8_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 	curbe_buffer = gen7_fill_curbe_buffer_data(batch, color);
 
 	interface_descriptor = gen8_fill_interface_descriptor(batch, dst,
-							      gen8_gpgpu_kernel,
-							      sizeof(gen8_gpgpu_kernel));
+				gen8_gpgpu_kernel, sizeof(gen8_gpgpu_kernel));
+
 	igt_assert(batch->ptr < &batch->buffer[4095]);
 
 	batch->ptr = batch->buffer;
@@ -195,8 +195,8 @@ gen8_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 void
 gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 		    struct igt_buf *dst,
-		    unsigned x, unsigned y,
-		    unsigned width, unsigned height,
+		    unsigned int x, unsigned int y,
+		    unsigned int width, unsigned int height,
 		    uint8_t color)
 {
 	uint32_t curbe_buffer, interface_descriptor;
@@ -216,8 +216,8 @@ gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 	curbe_buffer = gen7_fill_curbe_buffer_data(batch, color);
 
 	interface_descriptor = gen8_fill_interface_descriptor(batch, dst,
-							      gen9_gpgpu_kernel,
-							      sizeof(gen9_gpgpu_kernel));
+				gen9_gpgpu_kernel, sizeof(gen9_gpgpu_kernel));
+
 	igt_assert(batch->ptr < &batch->buffer[4095]);
 
 	batch->ptr = batch->buffer;
diff --git a/lib/gpgpu_fill.h b/lib/gpgpu_fill.h
index 7b5c8322..f0d188ae 100644
--- a/lib/gpgpu_fill.h
+++ b/lib/gpgpu_fill.h
@@ -30,22 +30,22 @@
 void
 gen7_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 		    struct igt_buf *dst,
-		    unsigned x, unsigned y,
-		    unsigned width, unsigned height,
+		    unsigned int x, unsigned int y,
+		    unsigned int width, unsigned int height,
 		    uint8_t color);
 
 void
 gen8_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 		    struct igt_buf *dst,
-		    unsigned x, unsigned y,
-		    unsigned width, unsigned height,
+		    unsigned int x, unsigned int y,
+		    unsigned int width, unsigned int height,
 		    uint8_t color);
 
 void
 gen9_gpgpu_fillfunc(struct intel_batchbuffer *batch,
 		    struct igt_buf *dst,
-		    unsigned x, unsigned y,
-		    unsigned width, unsigned height,
+		    unsigned int x, unsigned int y,
+		    unsigned int width, unsigned int height,
 		    uint8_t color);
 
 #endif /* GPGPU_FILL_H */
diff --git a/lib/gpu_fill.c b/lib/gpu_fill.c
index 39c103f6..43cfa142 100644
--- a/lib/gpu_fill.c
+++ b/lib/gpu_fill.c
@@ -10,6 +10,7 @@ uint32_t
 batch_align(struct intel_batchbuffer *batch, uint32_t align)
 {
 	uint32_t offset = batch_used(batch);
+
 	offset = ALIGN(offset, align);
 	batch->ptr = batch->buffer + offset;
 	return offset;
@@ -19,6 +20,7 @@ void *
 batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align)
 {
 	uint32_t offset = batch_align(batch, align);
+
 	batch->ptr += size;
 	return memset(batch->buffer + offset, 0, size);
 }
@@ -30,9 +32,11 @@ batch_offset(struct intel_batchbuffer *batch, void *ptr)
 }
 
 uint32_t
-batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size, uint32_t align)
+batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size,
+	   uint32_t align)
 {
-	return batch_offset(batch, memcpy(batch_alloc(batch, size, align), ptr, size));
+	return batch_offset(batch, memcpy(batch_alloc(batch, size, align),
+			    ptr, size));
 }
 
 void
@@ -43,13 +47,13 @@ gen7_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
 	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
 	if (ret == 0)
 		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
-					NULL, 0, 0, 0);
+					    NULL, 0, 0, 0);
 	igt_assert(ret == 0);
 }
 
 uint32_t
 gen7_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
-			uint8_t color)
+			    uint8_t color)
 {
 	uint8_t *curbe_buffer;
 	uint32_t offset;
@@ -62,10 +66,8 @@ gen7_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
 }
 
 uint32_t
-gen7_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst)
+gen7_fill_surface_state(struct intel_batchbuffer *batch, struct igt_buf *buf,
+			uint32_t format, int is_dst)
 {
 	struct gen7_surface_state *ss;
 	uint32_t write_domain, read_domain, offset;
@@ -111,8 +113,7 @@ gen7_fill_surface_state(struct intel_batchbuffer *batch,
 }
 
 uint32_t
-gen7_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst)
+gen7_fill_binding_table(struct intel_batchbuffer *batch, struct igt_buf *dst)
 {
 	uint32_t *binding_table, offset;
 
@@ -129,9 +130,8 @@ gen7_fill_binding_table(struct intel_batchbuffer *batch,
 }
 
 uint32_t
-gen7_fill_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size)
+gen7_fill_kernel(struct intel_batchbuffer *batch, const uint32_t kernel[][4],
+		 size_t size)
 {
 	uint32_t offset;
 
@@ -141,8 +141,9 @@ gen7_fill_kernel(struct intel_batchbuffer *batch,
 }
 
 uint32_t
-gen7_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,
-			       const uint32_t kernel[][4], size_t size)
+gen7_fill_interface_descriptor(struct intel_batchbuffer *batch,
+			       struct igt_buf *dst, const uint32_t kernel[][4],
+			       size_t size)
 {
 	struct gen7_interface_descriptor_data *idd;
 	uint32_t offset;
@@ -180,16 +181,19 @@ gen7_emit_state_base_address(struct intel_batchbuffer *batch)
 	OUT_BATCH(0);
 
 	/* surface */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
+		  BASE_ADDRESS_MODIFY);
 
 	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
+		  BASE_ADDRESS_MODIFY);
 
 	/* indirect */
 	OUT_BATCH(0);
 
 	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
+		  BASE_ADDRESS_MODIFY);
 
 	/* general/dynamic/indirect/instruction access Bound */
 	OUT_BATCH(0);
@@ -214,7 +218,7 @@ gen7_emit_vfe_state(struct intel_batchbuffer *batch)
 
 	/* urb entry size & curbe size */
 	OUT_BATCH(2 << 16 | 	/* in 256 bits unit */
-		2);		/* in 256 bits unit */
+		  2);		/* in 256 bits unit */
 
 	/* scoreboard */
 	OUT_BATCH(0);
@@ -268,14 +272,16 @@ gen7_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t in
 		OUT_BATCH(sizeof(struct gen7_interface_descriptor_data));
 	else
 		OUT_BATCH(sizeof(struct gen8_interface_descriptor_data));
-	/* interface descriptor address, is relative to the dynamics base address */
+	/* interface descriptor address, is relative to the dynamics base
+	 * address
+	 */
 	OUT_BATCH(interface_descriptor);
 }
 
 void
 gen7_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height)
+			unsigned int x, unsigned int y,
+			unsigned int width, unsigned int height)
 {
 	int i, j;
 
@@ -297,7 +303,8 @@ gen7_emit_media_objects(struct intel_batchbuffer *batch,
 			/* inline data (xoffset, yoffset) */
 			OUT_BATCH(x + i * 16);
 			OUT_BATCH(y + j * 16);
-			if (AT_LEAST_GEN(batch->devid, 8) && !IS_CHERRYVIEW(batch->devid))
+			if (AT_LEAST_GEN(batch->devid, 8) &&
+			    !IS_CHERRYVIEW(batch->devid))
 				gen8_emit_media_state_flush(batch);
 		}
 	}
@@ -305,8 +312,8 @@ gen7_emit_media_objects(struct intel_batchbuffer *batch,
 
 void
 gen7_emit_gpgpu_walk(struct intel_batchbuffer *batch,
-		     unsigned x, unsigned y,
-		     unsigned width, unsigned height)
+		     unsigned int x, unsigned int y,
+		     unsigned int width, unsigned int height)
 {
 	uint32_t x_dim, y_dim, tmp, right_mask;
 
@@ -368,14 +375,13 @@ gen8_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
 	ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
 	if (ret == 0)
 		ret = drm_intel_bo_mrb_exec(batch->bo, batch_end,
-					NULL, 0, 0, 0);
+					    NULL, 0, 0, 0);
 	igt_assert(ret == 0);
 }
 
 
 uint32_t
-gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
-			uint8_t color)
+gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch, uint8_t color)
 {
 	uint8_t *curbe_buffer;
 	uint32_t offset;
@@ -388,10 +394,8 @@ gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
 }
 
 uint32_t
-gen8_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst)
+gen8_fill_surface_state(struct intel_batchbuffer *batch, struct igt_buf *buf,
+			uint32_t format, int is_dst)
 {
 	struct gen8_surface_state *ss;
 	uint32_t write_domain, read_domain, offset;
@@ -439,7 +443,9 @@ gen8_fill_surface_state(struct intel_batchbuffer *batch,
 }
 
 uint32_t
-gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,  const uint32_t kernel[][4], size_t size)
+gen8_fill_interface_descriptor(struct intel_batchbuffer *batch,
+			       struct igt_buf *dst, const uint32_t kernel[][4],
+			       size_t size)
 {
 	struct gen8_interface_descriptor_data *idd;
 	uint32_t offset;
@@ -456,7 +462,7 @@ gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *
 	idd->desc2.single_program_flow = 1;
 	idd->desc2.floating_point_mode = GEN8_FLOATING_POINT_IEEE_754;
 
-	idd->desc3.sampler_count = 0;      /* 0 samplers used */
+	idd->desc3.sampler_count = 0;	/* 0 samplers used */
 	idd->desc3.sampler_state_pointer = 0;
 
 	idd->desc4.binding_table_entry_count = 0;
@@ -486,8 +492,8 @@ gen8_emit_state_base_address(struct intel_batchbuffer *batch)
 	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
 
 	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
-		0, BASE_ADDRESS_MODIFY);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER |
+		  I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
 
 	/* indirect */
 	OUT_BATCH(0);
@@ -502,7 +508,9 @@ gen8_emit_state_base_address(struct intel_batchbuffer *batch)
 	OUT_BATCH(1 << 12 | 1);
 	/* indirect object buffer size */
 	OUT_BATCH(0xfffff000 | 1);
-	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
+	/* instruction buffer size, must set modify enable bit, otherwise it may
+	 * result in GPU hang
+	 */
 	OUT_BATCH(1 << 12 | 1);
 }
 
@@ -517,13 +525,13 @@ gen8_emit_vfe_state(struct intel_batchbuffer *batch)
 
 	/* number of threads & urb entries */
 	OUT_BATCH(1 << 16 |
-		2 << 8);
+		  2 << 8);
 
 	OUT_BATCH(0);
 
 	/* urb entry size & curbe size */
 	OUT_BATCH(2 << 16 |
-		2);
+		  2);
 
 	/* scoreboard */
 	OUT_BATCH(0);
@@ -561,11 +569,10 @@ gen8_emit_media_state_flush(struct intel_batchbuffer *batch)
 	OUT_BATCH(0);
 }
 
-
 void
 gen8_emit_gpgpu_walk(struct intel_batchbuffer *batch,
-		     unsigned x, unsigned y,
-		     unsigned width, unsigned height)
+		     unsigned int x, unsigned int y,
+		     unsigned int width, unsigned int height)
 {
 	uint32_t x_dim, y_dim, tmp, right_mask;
 
@@ -638,15 +645,16 @@ gen9_emit_state_base_address(struct intel_batchbuffer *batch)
 	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
 
 	/* dynamic */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
-		0, BASE_ADDRESS_MODIFY);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER |
+		  I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
 
 	/* indirect */
 	OUT_BATCH(0);
 	OUT_BATCH(0);
 
 	/* instruction */
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
+		  BASE_ADDRESS_MODIFY);
 
 	/* general state buffer size */
 	OUT_BATCH(0xfffff000 | 1);
@@ -654,7 +662,9 @@ gen9_emit_state_base_address(struct intel_batchbuffer *batch)
 	OUT_BATCH(1 << 12 | 1);
 	/* indirect object buffer size */
 	OUT_BATCH(0xfffff000 | 1);
-	/* intruction buffer size, must set modify enable bit, otherwise it may result in GPU hang */
+	/* intruction buffer size, must set modify enable bit, otherwise it may
+	 * result in GPU hang
+	 */
 	OUT_BATCH(1 << 12 | 1);
 
 	/* Bindless surface state base address */
diff --git a/lib/gpu_fill.h b/lib/gpu_fill.h
index 9200fe4f..59829be4 100644
--- a/lib/gpu_fill.h
+++ b/lib/gpu_fill.h
@@ -10,6 +10,8 @@
 #include "intel_chipset.h"
 #include <assert.h>
 
+#include "gpu_fill.h"
+
 uint32_t
 batch_used(struct intel_batchbuffer *batch);
 
@@ -23,33 +25,31 @@ uint32_t
 batch_offset(struct intel_batchbuffer *batch, void *ptr);
 
 uint32_t
-batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size, uint32_t align);
+batch_copy(struct intel_batchbuffer *batch, const void *ptr, uint32_t size,
+	   uint32_t align);
 
 void
 gen7_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end);
 
 uint32_t
 gen7_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
-			uint8_t color);
+			    uint8_t color);
 
 uint32_t
-gen7_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst);
+gen7_fill_surface_state(struct intel_batchbuffer *batch, struct igt_buf *buf,
+			uint32_t format, int is_dst);
 
 uint32_t
-gen7_fill_binding_table(struct intel_batchbuffer *batch,
-			struct igt_buf *dst);
+gen7_fill_binding_table(struct intel_batchbuffer *batch, struct igt_buf *dst);
 
 uint32_t
-gen7_fill_kernel(struct intel_batchbuffer *batch,
-		const uint32_t kernel[][4],
-		size_t size);
+gen7_fill_kernel(struct intel_batchbuffer *batch, const uint32_t kernel[][4],
+		 size_t size);
 
 uint32_t
-gen7_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,
-			       const uint32_t kernel[][4], size_t size);
+gen7_fill_interface_descriptor(struct intel_batchbuffer *batch,
+			       struct igt_buf *dst, const uint32_t kernel[][4],
+			       size_t size);
 
 void
 gen7_emit_state_base_address(struct intel_batchbuffer *batch);
@@ -68,29 +68,28 @@ gen7_emit_interface_descriptor_load(struct intel_batchbuffer *batch, uint32_t in
 
 void
 gen7_emit_media_objects(struct intel_batchbuffer *batch,
-			unsigned x, unsigned y,
-			unsigned width, unsigned height);
+			unsigned int x, unsigned int y,
+			unsigned int width, unsigned int height);
 
 void
 gen7_emit_gpgpu_walk(struct intel_batchbuffer *batch,
-		     unsigned x, unsigned y,
-		     unsigned width, unsigned height);
+		     unsigned int x, unsigned int y,
+		     unsigned int width, unsigned int height);
 
 void
 gen8_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end);
 
 uint32_t
-gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch,
-			uint8_t color);
+gen8_fill_curbe_buffer_data(struct intel_batchbuffer *batch, uint8_t color);
 
 uint32_t
-gen8_fill_surface_state(struct intel_batchbuffer *batch,
-			struct igt_buf *buf,
-			uint32_t format,
-			int is_dst);
+gen8_fill_surface_state(struct intel_batchbuffer *batch, struct igt_buf *buf,
+			uint32_t format, int is_dst);
 
 uint32_t
-gen8_fill_interface_descriptor(struct intel_batchbuffer *batch, struct igt_buf *dst,  const uint32_t kernel[][4], size_t size);
+gen8_fill_interface_descriptor(struct intel_batchbuffer *batch,
+			       struct igt_buf *dst, const uint32_t kernel[][4],
+			       size_t size);
 
 void
 gen8_emit_state_base_address(struct intel_batchbuffer *batch);
@@ -106,8 +105,8 @@ gen8_emit_media_state_flush(struct intel_batchbuffer *batch);
 
 void
 gen8_emit_gpgpu_walk(struct intel_batchbuffer *batch,
-		     unsigned x, unsigned y,
-		     unsigned width, unsigned height);
+		     unsigned int x, unsigned int y,
+		     unsigned int width, unsigned int height);
 
 void
 gen9_emit_state_base_address(struct intel_batchbuffer *batch);
diff --git a/lib/media_fill.h b/lib/media_fill.h
index 161af8cf..f6db734e 100644
--- a/lib/media_fill.h
+++ b/lib/media_fill.h
@@ -7,22 +7,22 @@
 void
 gen8_media_fillfunc(struct intel_batchbuffer *batch,
 		struct igt_buf *dst,
-		unsigned x, unsigned y,
-		unsigned width, unsigned height,
+		unsigned int x, unsigned int y,
+		unsigned int width, unsigned int height,
 		uint8_t color);
 
 void
 gen7_media_fillfunc(struct intel_batchbuffer *batch,
-                struct igt_buf *dst,
-                unsigned x, unsigned y,
-                unsigned width, unsigned height,
-                uint8_t color);
+		struct igt_buf *dst,
+		unsigned int x, unsigned int y,
+		unsigned int width, unsigned int height,
+		uint8_t color);
 
 void
 gen9_media_fillfunc(struct intel_batchbuffer *batch,
-                struct igt_buf *dst,
-                unsigned x, unsigned y,
-                unsigned width, unsigned height,
-                uint8_t color);
+		struct igt_buf *dst,
+		unsigned int x, unsigned int y,
+		unsigned int width, unsigned int height,
+		uint8_t color);
 
 #endif /* RENDE_MEDIA_FILL_H */
diff --git a/lib/media_fill_gen7.c b/lib/media_fill_gen7.c
index c97555a6..5a8c32fb 100644
--- a/lib/media_fill_gen7.c
+++ b/lib/media_fill_gen7.c
@@ -47,8 +47,8 @@ static const uint32_t media_kernel[][4] = {
 void
 gen7_media_fillfunc(struct intel_batchbuffer *batch,
 		struct igt_buf *dst,
-		unsigned x, unsigned y,
-		unsigned width, unsigned height,
+		unsigned int x, unsigned int y,
+		unsigned int width, unsigned int height,
 		uint8_t color)
 {
 	uint32_t curbe_buffer, interface_descriptor;
@@ -61,8 +61,7 @@ gen7_media_fillfunc(struct intel_batchbuffer *batch,
 
 	curbe_buffer = gen7_fill_curbe_buffer_data(batch, color);
 	interface_descriptor = gen7_fill_interface_descriptor(batch, dst,
-							      media_kernel,
-							      sizeof(media_kernel));
+					media_kernel, sizeof(media_kernel));
 	igt_assert(batch->ptr < &batch->buffer[4095]);
 
 	/* media pipeline */
diff --git a/lib/media_fill_gen8.c b/lib/media_fill_gen8.c
index e398c8ea..ee94510c 100644
--- a/lib/media_fill_gen8.c
+++ b/lib/media_fill_gen8.c
@@ -50,8 +50,8 @@ static const uint32_t media_kernel[][4] = {
 void
 gen8_media_fillfunc(struct intel_batchbuffer *batch,
 		struct igt_buf *dst,
-		unsigned x, unsigned y,
-		unsigned width, unsigned height,
+		unsigned int x, unsigned int y,
+		unsigned int width, unsigned int height,
 		uint8_t color)
 {
 	uint32_t curbe_buffer, interface_descriptor;
@@ -63,7 +63,8 @@ gen8_media_fillfunc(struct intel_batchbuffer *batch,
 	batch->ptr = &batch->buffer[BATCH_STATE_SPLIT];
 
 	curbe_buffer = gen8_fill_curbe_buffer_data(batch, color);
-	interface_descriptor = gen8_fill_interface_descriptor(batch, dst, media_kernel, sizeof(media_kernel));
+	interface_descriptor = gen8_fill_interface_descriptor(batch, dst,
+					media_kernel, sizeof(media_kernel));
 	igt_assert(batch->ptr < &batch->buffer[4095]);
 
 	/* media pipeline */
diff --git a/lib/media_fill_gen9.c b/lib/media_fill_gen9.c
index a492466a..cbc7a7ac 100644
--- a/lib/media_fill_gen9.c
+++ b/lib/media_fill_gen9.c
@@ -47,8 +47,8 @@ static const uint32_t media_kernel[][4] = {
 void
 gen9_media_fillfunc(struct intel_batchbuffer *batch,
 		struct igt_buf *dst,
-		unsigned x, unsigned y,
-		unsigned width, unsigned height,
+		unsigned int x, unsigned int y,
+		unsigned int width, unsigned int height,
 		uint8_t color)
 {
 	uint32_t curbe_buffer, interface_descriptor;
@@ -60,7 +60,8 @@ gen9_media_fillfunc(struct intel_batchbuffer *batch,
 	batch->ptr = &batch->buffer[BATCH_STATE_SPLIT];
 
 	curbe_buffer = gen8_fill_curbe_buffer_data(batch, color);
-	interface_descriptor = gen8_fill_interface_descriptor(batch, dst, media_kernel, sizeof(media_kernel));
+	interface_descriptor = gen8_fill_interface_descriptor(batch, dst,
+					media_kernel, sizeof(media_kernel));
 	assert(batch->ptr < &batch->buffer[4095]);
 
 	/* media pipeline */
-- 
2.14.3

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [igt-dev] ✗ Fi.CI.BAT: failure for Refactoring of *_fill libraries (rev2)
  2018-04-09 14:34 [igt-dev] [PATCH i-g-t v3 0/4] Refactoring of *_fill libraries Katarzyna Dec
                   ` (3 preceding siblings ...)
  2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 4/4] lib: Adjust refactored gpu_fill library to our coding style Katarzyna Dec
@ 2018-04-09 15:04 ` Patchwork
  4 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-04-09 15:04 UTC (permalink / raw)
  To: Katarzyna Dec; +Cc: igt-dev

== Series Details ==

Series: Refactoring of *_fill libraries (rev2)
URL   : https://patchwork.freedesktop.org/series/41212/
State : failure

== Summary ==

IGT patchset build failed on latest successful build
7c474e011548d35df6b80ceed81d3e6ca560c71d tests/perf: fix gen8 small cores whitelist expectation

                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/gpu_fill.h:13,
                 from ../lib/media_fill_gen9.c:7:
../lib/gpu_fill.h:1:26: error: #include nested too deeply
 #include <intel_bufmgr.h>
                          ^
../lib/gpu_fill.h:2:22: error: #include nested too deeply
 #include <i915_drm.h>
                      ^
../lib/gpu_fill.h:4:24: error: #include nested too deeply
 #include "media_fill.h"
                        ^
../lib/gpu_fill.h:5:24: error: #include nested too deeply
 #include "gen7_media.h"
                        ^
../lib/gpu_fill.h:6:24: error: #include nested too deeply
 #include "gen8_media.h"
                        ^
../lib/gpu_fill.h:7:23: error: #include nested too deeply
 #include "intel_reg.h"
                       ^
../lib/gpu_fill.h:8:21: error: #include nested too deeply
 #include "drmtest.h"
                     ^
../lib/gpu_fill.h:9:31: error: #include nested too deeply
 #include "intel_batchbuffer.h"
                               ^
../lib/gpu_fill.h:10:27: error: #include nested too deeply
 #include "intel_chipset.h"
                           ^
../lib/gpu_fill.h:11:20: error: #include nested too deeply
 #include <assert.h>
                    ^
../lib/gpu_fill.h:13:22: error: #include nested too deeply
 #include "gpu_fill.h"
                      ^
ninja: build stopped: subcommand failed.

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-04-09 15:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-09 14:34 [igt-dev] [PATCH i-g-t v3 0/4] Refactoring of *_fill libraries Katarzyna Dec
2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 1/4] lib: Move common gpgpu/media fill functions to gpu_fill library Katarzyna Dec
2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 2/4] lib: Remove duplications in " Katarzyna Dec
2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 3/4] lib/gpgpu_fill: Add missing configuration parameters for gpgpu_fill Katarzyna Dec
2018-04-09 14:34 ` [igt-dev] [PATCH i-g-t v3 4/4] lib: Adjust refactored gpu_fill library to our coding style Katarzyna Dec
2018-04-09 15:04 ` [igt-dev] ✗ Fi.CI.BAT: failure for Refactoring of *_fill libraries (rev2) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.