intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] intel: Fix stencil buffer to be W tiled
@ 2011-07-18  7:55 Chad Versace
  2011-07-18  7:55 ` [PATCH] dri: Do not tile stencil buffer Chad Versace
  2011-07-18  7:55 ` [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
  0 siblings, 2 replies; 18+ messages in thread
From: Chad Versace @ 2011-07-18  7:55 UTC (permalink / raw)
  To: mesa-dev, intel-gfx; +Cc: Chad Versace

Chad Versace (2):

  xf86-video-intel
      dri: Do not tile stencil buffer

      src/intel_dri.c |   16 ++++++++++++----

  mesa
      intel: Fix stencil buffer to be W tiled

      src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
      src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
      src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
      src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
      src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
      5 files changed, 89 insertions(+), 33 deletions(-)

-- 
1.7.6

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] dri: Do not tile stencil buffer
  2011-07-18  7:55 [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
@ 2011-07-18  7:55 ` Chad Versace
  2011-07-18  8:15   ` Paul Menzel
  2011-07-18 18:45   ` [Mesa-dev] " Ian Romanick
  2011-07-18  7:55 ` [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
  1 sibling, 2 replies; 18+ messages in thread
From: Chad Versace @ 2011-07-18  7:55 UTC (permalink / raw)
  To: mesa-dev, intel-gfx; +Cc: Chad Versace

Until now, the stencil buffer was allocated as a Y tiled buffer, because
in several locations the PRM states that it is. However, it is actually
W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
4.5.2.1 W-Major Format:
    W-Major Tile Format is used for separate stencil.

The GTT is incapable of W fencing, so we allocate the stencil buffer with
I915_TILING_NONE and decode the tile's layout in software.

This commit mutually depends on the mesa commit:
    intel: Fix stencil buffer to be W tiled
    Author: Chad Versace <chad@chad-versace.us>
    Date:   Mon Jul 18 00:37:45 2011 -0700

CC: Eric Anholt <eric@anholt.net>
CC: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
---
 src/intel_dri.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/intel_dri.c b/src/intel_dri.c
index 5ea7c2c..4652dc7 100644
--- a/src/intel_dri.c
+++ b/src/intel_dri.c
@@ -336,7 +336,6 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
 			switch (attachment) {
 			case DRI2BufferDepth:
 			case DRI2BufferDepthStencil:
-			case DRI2BufferStencil:
 			case DRI2BufferHiz:
 				if (SUPPORTS_YTILING(intel)) {
 					hint |= INTEL_CREATE_PIXMAP_TILING_Y;
@@ -351,6 +350,14 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
 			case DRI2BufferFrontRight:
 				hint |= INTEL_CREATE_PIXMAP_TILING_X;
 				break;
+			case DRI2BufferStencil:
+				/*
+				 * The stencil buffer is W tiled. However, we
+				 * request from the kernel a non-tiled buffer
+				 * because the GTT is incapable of W fencing.
+				 */
+				hint |= INTEL_CREATE_PIXMAP_TILING_NONE;
+				break;
 			default:
 				free(privates);
 				free(buffer);
@@ -368,11 +375,12 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
 		 * To accomplish this, we resort to the nasty hack of doubling
 		 * the drm region's cpp and halving its height.
 		 *
-		 * If we neglect to double the pitch, then
-		 * drm_intel_gem_bo_map_gtt() maps the memory incorrectly.
+		 * If we neglect to double the pitch, then render corruption
+                 * occurs.
 		 */
 		if (attachment == DRI2BufferStencil) {
-			pixmap_height /= 2;
+			pixmap_width = ALIGN(pixmap_width, 64);
+			pixmap_height = ALIGN(pixmap_height / 2, 64);
 			pixmap_cpp *= 2;
 		}
 
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18  7:55 [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
  2011-07-18  7:55 ` [PATCH] dri: Do not tile stencil buffer Chad Versace
@ 2011-07-18  7:55 ` Chad Versace
  2011-07-18  8:20   ` Paul Menzel
                     ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Chad Versace @ 2011-07-18  7:55 UTC (permalink / raw)
  To: mesa-dev, intel-gfx; +Cc: Chad Versace

Until now, the stencil buffer was allocated as a Y tiled buffer, because
in several locations the PRM states that it is. However, it is actually
W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
4.5.2.1 W-Major Format:
    W-Major Tile Format is used for separate stencil.

The GTT is incapable of W fencing, so we allocate the stencil buffer with
I915_TILING_NONE and decode the tile's layout in software.

This fix touches the following portions of code:
    - In intel_allocate_renderbuffer_storage(), allocate the stencil
      buffer with I915_TILING_NONE.
    - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
      not tiled.
    - In the stencil buffer's span functions, the tile's layout must be
      decoded in software.

This commit mutually depends on the xf86-video-intel commit
    dri: Do not tile stencil buffer
    Author: Chad Versace <chad@chad-versace.us>
    Date:   Mon Jul 18 00:38:00 2011 -0700

On Gen6 with separate stencil enabled, fixes the following Piglit tests:
    bugs/fdo23670-drawpix_stencil
    general/stencil-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
    spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
    spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
    spec/EXT_packed_depth_stencil/readpixels-24_8

Note: This is a candidate for the 7.11 branch.

CC: Eric Anholt <eric@anholt.net>
CC: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
---
 src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
 src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
 src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
 src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
 src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
 5 files changed, 89 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
index dfca03c..5ab9873 100644
--- a/src/mesa/drivers/dri/intel/intel_clear.c
+++ b/src/mesa/drivers/dri/intel/intel_clear.c
@@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
 	     */
             tri_mask |= BUFFER_BIT_STENCIL;
          }
+	 else if (intel->has_separate_stencil &&
+	       stencilRegion->tiling == I915_TILING_NONE) {
+	    /* The stencil buffer is actually W tiled, which the hardware
+	     * cannot blit to. */
+	    tri_mask |= BUFFER_BIT_STENCIL;
+	 }
          else {
             /* clearing all stencil bits, use blitting */
             blit_mask |= BUFFER_BIT_STENCIL;
diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
index 2ba1363..fe8be08 100644
--- a/src/mesa/drivers/dri/intel/intel_context.c
+++ b/src/mesa/drivers/dri/intel/intel_context.c
@@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
       assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
       assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
 
-      if (stencil_rb->region->tiling == I915_TILING_Y) {
+      if (stencil_rb->region->tiling == I915_TILING_NONE) {
+	 /*
+	  * The stencil buffer is actually W tiled. The region's tiling is
+	  * I915_TILING_NONE, however, because the GTT is incapable of W
+	  * fencing.
+	  */
 	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
 	 return;
       } else {
@@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
        * Presently, however, no verification or clean up is necessary, and
        * execution should not reach here. If the framebuffer still has a hiz
        * region, then we have already set dri2_has_hiz to true after
-       * confirming above that the stencil buffer is Y tiled.
+       * confirming above that the stencil buffer is W tiled.
        */
       assert(0);
    }
diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
index 1669af2..507cc33 100644
--- a/src/mesa/drivers/dri/intel/intel_fbo.c
+++ b/src/mesa/drivers/dri/intel/intel_fbo.c
@@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
 
    if (irb->Base.Format == MESA_FORMAT_S8) {
       /*
+       * The stencil buffer is W tiled. However, we request from the kernel a
+       * non-tiled buffer because the GTT is incapable of W fencing.
+       *
        * The stencil buffer has quirky pitch requirements.  From Vol 2a,
        * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
        *    The pitch must be set to 2x the value computed based on width, as
@@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
        * To accomplish this, we resort to the nasty hack of doubling the drm
        * region's cpp and halving its height.
        *
-       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
-       * maps the memory incorrectly.
+       * If we neglect to double the pitch, then render corruption occurs.
        */
       irb->region = intel_region_alloc(intel->intelScreen,
-				       I915_TILING_Y,
+				       I915_TILING_NONE,
 				       cpp * 2,
-				       width,
-				       height / 2,
+				       ALIGN(width, 64),
+				       /* XXX: Maybe align to 128? */
+				       ALIGN(height / 2, 64),
 				       GL_TRUE);
       if (!irb->region)
 	return false;
diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
index b2013af..9dd6a52 100644
--- a/src/mesa/drivers/dri/intel/intel_screen.h
+++ b/src/mesa/drivers/dri/intel/intel_screen.h
@@ -63,9 +63,12 @@
  * x8_z24 and s8).
  *
  * Eventually, intel_update_renderbuffers() makes a DRI2 request for
- * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
- * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
- * nothing happend.
+ * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
+ * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
+ * true and continue as if nothing happend.
+ *
+ * [1] The stencil buffer is actually W tiled. However, we request from the
+ *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
  *
  * If the buffers are X tiled, however, the handshake has failed and we must
  * clean up.
diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
index 153803f..d306432 100644
--- a/src/mesa/drivers/dri/intel/intel_span.c
+++ b/src/mesa/drivers/dri/intel/intel_span.c
@@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
    int miny = 0;							\
    int maxx = rb->Width;						\
    int maxy = rb->Height;						\
-   int stride = rb->RowStride;						\
-   uint8_t *buf = rb->Data;						\
+									\
+   /*									\
+    * Here we ignore rb->Data and rb->RowStride as set by		\
+    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
+    * manually, the region's *real* base address and stride is		\
+    * required.								\
+    */									\
+   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
+   uint8_t *buf = irb->region->buffer->virtual;				\
+   unsigned stride = irb->region->pitch;				\
+   unsigned height = 2 * irb->region->height;				\
+   bool flip = rb->Name == 0;						\
 
-/* Don't flip y. */
 #undef Y_FLIP
-#define Y_FLIP(y) y
+#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
 
 /**
  * \brief Get pointer offset into stencil buffer.
  *
- * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
- * The table below demonstrates how the pointer arithmetic behaves for a buffer
- * with positive stride (s=stride).
- *
- *     x    | y     | byte offset
- *     --------------------------
- *     0    | 0     | 0
- *     0    | 1     | 1
- *     1    | 0     | 2
- *     1    | 1     | 3
- *     ...  | ...   | ...
- *     0    | 2     | s
- *     0    | 3     | s + 1
- *     1    | 2     | s + 2
- *     1    | 3     | s + 3
- *
+ * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
+ * must decode the tile's layout in software.
  *
+ * See
+ *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
+ *     Format.
+ *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
  */
-static inline intptr_t
-intel_offset_S8(int stride, GLint x, GLint y)
+static inline uintptr_t
+intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
 {
-   return 2 * ((y / 2) * stride + x) + y % 2;
+   uint32_t tile_size = 4096;
+   uint32_t tile_width = 64;
+   uint32_t tile_height = 64;
+   uint32_t row_size = 64 * stride;
+
+   uint32_t tile_x = x / tile_width;
+   uint32_t tile_y = y / tile_height;
+
+   /* The byte's address relative to the tile's base addres. */
+   uint32_t byte_x = x % tile_width;
+   uint32_t byte_y = y % tile_height;
+
+   uintptr_t u = tile_y * row_size
+               + tile_x * tile_size
+               + 512 * (byte_x / 8)
+               +  64 * (byte_y / 8)
+               +  32 * ((byte_y / 4) % 2)
+               +  16 * ((byte_x / 4) % 2)
+               +   8 * ((byte_y / 2) % 2)
+               +   4 * ((byte_x / 2) % 2)
+               +   2 * (byte_y % 2)
+               +   1 * (byte_x % 2);
+
+   /*
+    * Errata for Gen5:
+    *
+    * An additional offset is needed which is not documented in the PRM.
+    *
+    * if ((byte_x / 8) % 2 == 1) {
+    *    if ((byte_y / 8) % 2) == 0) {
+    *       u += 64;
+    *    } else {
+    *       u -= 64;
+    *    }
+    * }
+    *
+    * The offset is expressed more tersely as
+    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
+    */
+
+   return u;
 }
 
 #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] dri: Do not tile stencil buffer
  2011-07-18  7:55 ` [PATCH] dri: Do not tile stencil buffer Chad Versace
@ 2011-07-18  8:15   ` Paul Menzel
  2011-07-18 18:45   ` [Mesa-dev] " Ian Romanick
  1 sibling, 0 replies; 18+ messages in thread
From: Paul Menzel @ 2011-07-18  8:15 UTC (permalink / raw)
  To: intel-gfx; +Cc: mesa-dev


[-- Attachment #1.1: Type: text/plain, Size: 2668 bytes --]

Am Montag, den 18.07.2011, 00:55 -0700 schrieb Chad Versace:
> Until now, the stencil buffer was allocated as a Y tiled buffer, because
> in several locations the PRM states that it is. However, it is actually
> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
> 4.5.2.1 W-Major Format:
>     W-Major Tile Format is used for separate stencil.
> 
> The GTT is incapable of W fencing, so we allocate the stencil buffer with
> I915_TILING_NONE and decode the tile's layout in software.
> 
> This commit mutually depends on the mesa commit:
>     intel: Fix stencil buffer to be W tiled
>     Author: Chad Versace <chad@chad-versace.us>
>     Date:   Mon Jul 18 00:37:45 2011 -0700
> 
> CC: Eric Anholt <eric@anholt.net>
> CC: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Chad Versace <chad@chad-versace.us>
> ---
>  src/intel_dri.c |   16 ++++++++++++----
>  1 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel_dri.c b/src/intel_dri.c
> index 5ea7c2c..4652dc7 100644
> --- a/src/intel_dri.c
> +++ b/src/intel_dri.c
> @@ -336,7 +336,6 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>  			switch (attachment) {
>  			case DRI2BufferDepth:
>  			case DRI2BufferDepthStencil:
> -			case DRI2BufferStencil:
>  			case DRI2BufferHiz:
>  				if (SUPPORTS_YTILING(intel)) {
>  					hint |= INTEL_CREATE_PIXMAP_TILING_Y;
> @@ -351,6 +350,14 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>  			case DRI2BufferFrontRight:
>  				hint |= INTEL_CREATE_PIXMAP_TILING_X;
>  				break;
> +			case DRI2BufferStencil:
> +				/*
> +				 * The stencil buffer is W tiled. However, we
> +				 * request from the kernel a non-tiled buffer
> +				 * because the GTT is incapable of W fencing.
> +				 */
> +				hint |= INTEL_CREATE_PIXMAP_TILING_NONE;
> +				break;
>  			default:
>  				free(privates);
>  				free(buffer);
> @@ -368,11 +375,12 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>  		 * To accomplish this, we resort to the nasty hack of doubling
>  		 * the drm region's cpp and halving its height.
>  		 *
> -		 * If we neglect to double the pitch, then
> -		 * drm_intel_gem_bo_map_gtt() maps the memory incorrectly.
> +		 * If we neglect to double the pitch, then render corruption
> +                 * occurs.

The alignment does not seem to match.

>  		 */
>  		if (attachment == DRI2BufferStencil) {
> -			pixmap_height /= 2;
> +			pixmap_width = ALIGN(pixmap_width, 64);
> +			pixmap_height = ALIGN(pixmap_height / 2, 64);
>  			pixmap_cpp *= 2;
>  		}


Thanks,

Paul

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18  7:55 ` [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
@ 2011-07-18  8:20   ` Paul Menzel
  2011-07-19 20:03     ` [Mesa-dev] " Chad Versace
  2011-07-18 15:57   ` Eric Anholt
  2011-07-18 18:49   ` Ian Romanick
  2 siblings, 1 reply; 18+ messages in thread
From: Paul Menzel @ 2011-07-18  8:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: mesa-dev


[-- Attachment #1.1: Type: text/plain, Size: 1210 bytes --]

Am Montag, den 18.07.2011, 00:55 -0700 schrieb Chad Versace:

[…]

> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
> index 153803f..d306432 100644
> --- a/src/mesa/drivers/dri/intel/intel_span.c
> +++ b/src/mesa/drivers/dri/intel/intel_span.c
> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>     int miny = 0;							\
>     int maxx = rb->Width;						\
>     int maxy = rb->Height;						\
> -   int stride = rb->RowStride;						\
> -   uint8_t *buf = rb->Data;						\
> +									\
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.								\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;				\

There are alignment/white space issues above.

> +   unsigned stride = irb->region->pitch;				\
> +   unsigned height = 2 * irb->region->height;				\
> +   bool flip = rb->Name == 0;						\

[…]


Thanks,

Paul

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18  7:55 ` [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
  2011-07-18  8:20   ` Paul Menzel
@ 2011-07-18 15:57   ` Eric Anholt
  2011-07-18 21:05     ` [Mesa-dev] " Chad Versace
  2011-07-19  0:00     ` Chad Versace
  2011-07-18 18:49   ` Ian Romanick
  2 siblings, 2 replies; 18+ messages in thread
From: Eric Anholt @ 2011-07-18 15:57 UTC (permalink / raw)
  To: mesa-dev, intel-gfx; +Cc: Chad Versace


[-- Attachment #1.1: Type: text/plain, Size: 3233 bytes --]

On Mon, 18 Jul 2011 00:55:03 -0700, Chad Versace <chad@chad-versace.us> wrote:
> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
> index 1669af2..507cc33 100644
> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>  
>     if (irb->Base.Format == MESA_FORMAT_S8) {
>        /*
> +       * The stencil buffer is W tiled. However, we request from the kernel a
> +       * non-tiled buffer because the GTT is incapable of W fencing.
> +       *
>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>         *    The pitch must be set to 2x the value computed based on width, as
> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>         * To accomplish this, we resort to the nasty hack of doubling the drm
>         * region's cpp and halving its height.
>         *
> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
> -       * maps the memory incorrectly.
> +       * If we neglect to double the pitch, then render corruption occurs.
>         */
>        irb->region = intel_region_alloc(intel->intelScreen,
> -				       I915_TILING_Y,
> +				       I915_TILING_NONE,
>  				       cpp * 2,
> -				       width,
> -				       height / 2,
> +				       ALIGN(width, 64),
> +				       /* XXX: Maybe align to 128? */
> +				       ALIGN(height / 2, 64),
>  				       GL_TRUE);
>        if (!irb->region)
>  	return false;

This looks like it would fail on a buffer with height = 129.  Doesn't
seem like 128 pitch requirement would be needed -- has it been tested?

> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
> index 153803f..d306432 100644
> --- a/src/mesa/drivers/dri/intel/intel_span.c
> +++ b/src/mesa/drivers/dri/intel/intel_span.c
> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>     int miny = 0;							\
>     int maxx = rb->Width;						\
>     int maxy = rb->Height;						\
> -   int stride = rb->RowStride;						\
> -   uint8_t *buf = rb->Data;						\
> +									\
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.								\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;				\
> +   unsigned stride = irb->region->pitch;				\
> +   unsigned height = 2 * irb->region->height;				\
> +   bool flip = rb->Name == 0;						\
>  
> -/* Don't flip y. */
>  #undef Y_FLIP
> -#define Y_FLIP(y) y
> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))

The flip is usually handled by a scale and bias variable, so that Y_FLIP
is ((y) * scale + bias).  I think it'll produce less code, since Y_FLIP
is used a lot.


[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Mesa-dev] [PATCH] dri: Do not tile stencil buffer
  2011-07-18  7:55 ` [PATCH] dri: Do not tile stencil buffer Chad Versace
  2011-07-18  8:15   ` Paul Menzel
@ 2011-07-18 18:45   ` Ian Romanick
  2011-07-18 19:28     ` Chad Versace
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Romanick @ 2011-07-18 18:45 UTC (permalink / raw)
  To: Chad Versace; +Cc: mesa-dev, intel-gfx

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 12:55 AM, Chad Versace wrote:
> Until now, the stencil buffer was allocated as a Y tiled buffer, because
> in several locations the PRM states that it is. However, it is actually
> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
> 4.5.2.1 W-Major Format:
>     W-Major Tile Format is used for separate stencil.
> 
> The GTT is incapable of W fencing, so we allocate the stencil buffer with
> I915_TILING_NONE and decode the tile's layout in software.
> 
> This commit mutually depends on the mesa commit:
>     intel: Fix stencil buffer to be W tiled
>     Author: Chad Versace <chad@chad-versace.us>
>     Date:   Mon Jul 18 00:37:45 2011 -0700
> 
> CC: Eric Anholt <eric@anholt.net>
> CC: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Chad Versace <chad@chad-versace.us>
> ---
>  src/intel_dri.c |   16 ++++++++++++----
>  1 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/src/intel_dri.c b/src/intel_dri.c
> index 5ea7c2c..4652dc7 100644
> --- a/src/intel_dri.c
> +++ b/src/intel_dri.c
> @@ -336,7 +336,6 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>  			switch (attachment) {
>  			case DRI2BufferDepth:
>  			case DRI2BufferDepthStencil:
> -			case DRI2BufferStencil:
>  			case DRI2BufferHiz:
>  				if (SUPPORTS_YTILING(intel)) {
>  					hint |= INTEL_CREATE_PIXMAP_TILING_Y;
> @@ -351,6 +350,14 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>  			case DRI2BufferFrontRight:
>  				hint |= INTEL_CREATE_PIXMAP_TILING_X;
>  				break;
> +			case DRI2BufferStencil:
> +				/*
> +				 * The stencil buffer is W tiled. However, we
> +				 * request from the kernel a non-tiled buffer
> +				 * because the GTT is incapable of W fencing.
> +				 */
> +				hint |= INTEL_CREATE_PIXMAP_TILING_NONE;
> +				break;
>  			default:
>  				free(privates);
>  				free(buffer);

Eh... it seems like this will break compatibility with older versions of
Mesa.  I haven't dug around, but there used to be a hack in DRI2 where a
client would request a depth buffer and a stencil buffer, but it would
get the same packed depth-stencil buffer for both.  I guess that might
all be up in the DRI2 layer and not in the driver...

> @@ -368,11 +375,12 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>  		 * To accomplish this, we resort to the nasty hack of doubling
>  		 * the drm region's cpp and halving its height.
>  		 *
> -		 * If we neglect to double the pitch, then
> -		 * drm_intel_gem_bo_map_gtt() maps the memory incorrectly.
> +		 * If we neglect to double the pitch, then render corruption
> +                 * occurs.

Mangled whitespace?  Probably mixed tabs and spaces...

>  		 */
>  		if (attachment == DRI2BufferStencil) {
> -			pixmap_height /= 2;
> +			pixmap_width = ALIGN(pixmap_width, 64);
> +			pixmap_height = ALIGN(pixmap_height / 2, 64);
>  			pixmap_cpp *= 2;
>  		}
>  
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4kf10ACgkQX1gOwKyEAw9jVACeOfy5BjD4Nb/YN3vyOoR5MOOv
IOAAn20Lh7GAN7KSAkBFs/u5uaHq8/Sm
=ICWC
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18  7:55 ` [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
  2011-07-18  8:20   ` Paul Menzel
  2011-07-18 15:57   ` Eric Anholt
@ 2011-07-18 18:49   ` Ian Romanick
  2011-07-18 20:54     ` [Mesa-dev] " Chad Versace
  2 siblings, 1 reply; 18+ messages in thread
From: Ian Romanick @ 2011-07-18 18:49 UTC (permalink / raw)
  To: Chad Versace; +Cc: mesa-dev, intel-gfx

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 12:55 AM, Chad Versace wrote:
> Until now, the stencil buffer was allocated as a Y tiled buffer, because
> in several locations the PRM states that it is. However, it is actually
> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
> 4.5.2.1 W-Major Format:
>     W-Major Tile Format is used for separate stencil.
> 
> The GTT is incapable of W fencing, so we allocate the stencil buffer with
> I915_TILING_NONE and decode the tile's layout in software.
> 
> This fix touches the following portions of code:
>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>       buffer with I915_TILING_NONE.
>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>       not tiled.
>     - In the stencil buffer's span functions, the tile's layout must be
>       decoded in software.
> 
> This commit mutually depends on the xf86-video-intel commit
>     dri: Do not tile stencil buffer
>     Author: Chad Versace <chad@chad-versace.us>
>     Date:   Mon Jul 18 00:38:00 2011 -0700
> 
> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>     bugs/fdo23670-drawpix_stencil
>     general/stencil-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>     spec/EXT_packed_depth_stencil/readpixels-24_8
> 
> Note: This is a candidate for the 7.11 branch.
> 
> CC: Eric Anholt <eric@anholt.net>
> CC: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Chad Versace <chad@chad-versace.us>
> ---
>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>  5 files changed, 89 insertions(+), 33 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
> index dfca03c..5ab9873 100644
> --- a/src/mesa/drivers/dri/intel/intel_clear.c
> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>  	     */
>              tri_mask |= BUFFER_BIT_STENCIL;
>           }
> +	 else if (intel->has_separate_stencil &&
> +	       stencilRegion->tiling == I915_TILING_NONE) {
> +	    /* The stencil buffer is actually W tiled, which the hardware
> +	     * cannot blit to. */
> +	    tri_mask |= BUFFER_BIT_STENCIL;
> +	 }
>           else {
>              /* clearing all stencil bits, use blitting */
>              blit_mask |= BUFFER_BIT_STENCIL;
> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
> index 2ba1363..fe8be08 100644
> --- a/src/mesa/drivers/dri/intel/intel_context.c
> +++ b/src/mesa/drivers/dri/intel/intel_context.c
> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>  
> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
> +	 /*
> +	  * The stencil buffer is actually W tiled. The region's tiling is
> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
> +	  * fencing.
> +	  */
>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>  	 return;
>        } else {
> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>         * Presently, however, no verification or clean up is necessary, and
>         * execution should not reach here. If the framebuffer still has a hiz
>         * region, then we have already set dri2_has_hiz to true after
> -       * confirming above that the stencil buffer is Y tiled.
> +       * confirming above that the stencil buffer is W tiled.
>         */
>        assert(0);
>     }
> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
> index 1669af2..507cc33 100644
> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>  
>     if (irb->Base.Format == MESA_FORMAT_S8) {
>        /*
> +       * The stencil buffer is W tiled. However, we request from the kernel a
> +       * non-tiled buffer because the GTT is incapable of W fencing.
> +       *
>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>         *    The pitch must be set to 2x the value computed based on width, as
> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>         * To accomplish this, we resort to the nasty hack of doubling the drm
>         * region's cpp and halving its height.
>         *
> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
> -       * maps the memory incorrectly.
> +       * If we neglect to double the pitch, then render corruption occurs.
>         */
>        irb->region = intel_region_alloc(intel->intelScreen,
> -				       I915_TILING_Y,
> +				       I915_TILING_NONE,
>  				       cpp * 2,
> -				       width,
> -				       height / 2,
> +				       ALIGN(width, 64),
> +				       /* XXX: Maybe align to 128? */
> +				       ALIGN(height / 2, 64),
>  				       GL_TRUE);
>        if (!irb->region)
>  	return false;
> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
> index b2013af..9dd6a52 100644
> --- a/src/mesa/drivers/dri/intel/intel_screen.h
> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
> @@ -63,9 +63,12 @@
>   * x8_z24 and s8).
>   *
>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
> - * nothing happend.
> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
> + * true and continue as if nothing happend.
> + *
> + * [1] The stencil buffer is actually W tiled. However, we request from the
> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>   *
>   * If the buffers are X tiled, however, the handshake has failed and we must
>   * clean up.
> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
> index 153803f..d306432 100644
> --- a/src/mesa/drivers/dri/intel/intel_span.c
> +++ b/src/mesa/drivers/dri/intel/intel_span.c
> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>     int miny = 0;							\
>     int maxx = rb->Width;						\
>     int maxy = rb->Height;						\
> -   int stride = rb->RowStride;						\
> -   uint8_t *buf = rb->Data;						\
> +									\
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.								\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;				\
> +   unsigned stride = irb->region->pitch;				\
> +   unsigned height = 2 * irb->region->height;				\
> +   bool flip = rb->Name == 0;						\
>  
> -/* Don't flip y. */
>  #undef Y_FLIP
> -#define Y_FLIP(y) y
> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>  
>  /**
>   * \brief Get pointer offset into stencil buffer.
>   *
> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
> - * with positive stride (s=stride).
> - *
> - *     x    | y     | byte offset
> - *     --------------------------
> - *     0    | 0     | 0
> - *     0    | 1     | 1
> - *     1    | 0     | 2
> - *     1    | 1     | 3
> - *     ...  | ...   | ...
> - *     0    | 2     | s
> - *     0    | 3     | s + 1
> - *     1    | 2     | s + 2
> - *     1    | 3     | s + 3
> - *
> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
> + * must decode the tile's layout in software.
>   *
> + * See
> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
> + *     Format.
> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>   */
> -static inline intptr_t
> -intel_offset_S8(int stride, GLint x, GLint y)
> +static inline uintptr_t

Why the type change?

> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
>  {
> -   return 2 * ((y / 2) * stride + x) + y % 2;
> +   uint32_t tile_size = 4096;
> +   uint32_t tile_width = 64;
> +   uint32_t tile_height = 64;
> +   uint32_t row_size = 64 * stride;
> +
> +   uint32_t tile_x = x / tile_width;
> +   uint32_t tile_y = y / tile_height;
> +
> +   /* The byte's address relative to the tile's base addres. */
> +   uint32_t byte_x = x % tile_width;
> +   uint32_t byte_y = y % tile_height;
> +
> +   uintptr_t u = tile_y * row_size
> +               + tile_x * tile_size
> +               + 512 * (byte_x / 8)
> +               +  64 * (byte_y / 8)
> +               +  32 * ((byte_y / 4) % 2)
> +               +  16 * ((byte_x / 4) % 2)
> +               +   8 * ((byte_y / 2) % 2)
> +               +   4 * ((byte_x / 2) % 2)
> +               +   2 * (byte_y % 2)
> +               +   1 * (byte_x % 2);
> +
> +   /*
> +    * Errata for Gen5:
> +    *
> +    * An additional offset is needed which is not documented in the PRM.
> +    *
> +    * if ((byte_x / 8) % 2 == 1) {
> +    *    if ((byte_y / 8) % 2) == 0) {
> +    *       u += 64;
> +    *    } else {
> +    *       u -= 64;
> +    *    }
> +    * }
> +    *
> +    * The offset is expressed more tersely as
> +    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
> +    */
> +
> +   return u;
>  }
>  
>  #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4kgDUACgkQX1gOwKyEAw+9WgCffPOqrYW9zIZ0114HTmwcXfCP
t7sAnA6cpwuNkL+o6/yc20DI27xzSopm
=66S9
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Mesa-dev] [PATCH] dri: Do not tile stencil buffer
  2011-07-18 18:45   ` [Mesa-dev] " Ian Romanick
@ 2011-07-18 19:28     ` Chad Versace
  2011-07-18 20:58       ` Ian Romanick
  0 siblings, 1 reply; 18+ messages in thread
From: Chad Versace @ 2011-07-18 19:28 UTC (permalink / raw)
  To: Ian Romanick; +Cc: mesa-dev, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 3648 bytes --]

On 07/18/2011 11:45 AM, Ian Romanick wrote:
> On 07/18/2011 12:55 AM, Chad Versace wrote:
>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>> in several locations the PRM states that it is. However, it is actually
>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>> 4.5.2.1 W-Major Format:
>>     W-Major Tile Format is used for separate stencil.
> 
>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>> I915_TILING_NONE and decode the tile's layout in software.
> 
>> This commit mutually depends on the mesa commit:
>>     intel: Fix stencil buffer to be W tiled
>>     Author: Chad Versace <chad@chad-versace.us>
>>     Date:   Mon Jul 18 00:37:45 2011 -0700
> 
>> CC: Eric Anholt <eric@anholt.net>
>> CC: Kenneth Graunke <kenneth@whitecape.org>
>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>> ---
>>  src/intel_dri.c |   16 ++++++++++++----
>>  1 files changed, 12 insertions(+), 4 deletions(-)
> 
>> diff --git a/src/intel_dri.c b/src/intel_dri.c
>> index 5ea7c2c..4652dc7 100644
>> --- a/src/intel_dri.c
>> +++ b/src/intel_dri.c
>> @@ -336,7 +336,6 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>>  			switch (attachment) {
>>  			case DRI2BufferDepth:
>>  			case DRI2BufferDepthStencil:
>> -			case DRI2BufferStencil:
>>  			case DRI2BufferHiz:
>>  				if (SUPPORTS_YTILING(intel)) {
>>  					hint |= INTEL_CREATE_PIXMAP_TILING_Y;
>> @@ -351,6 +350,14 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>>  			case DRI2BufferFrontRight:
>>  				hint |= INTEL_CREATE_PIXMAP_TILING_X;
>>  				break;
>> +			case DRI2BufferStencil:
>> +				/*
>> +				 * The stencil buffer is W tiled. However, we
>> +				 * request from the kernel a non-tiled buffer
>> +				 * because the GTT is incapable of W fencing.
>> +				 */
>> +				hint |= INTEL_CREATE_PIXMAP_TILING_NONE;
>> +				break;
>>  			default:
>>  				free(privates);
>>  				free(buffer);
> 
> Eh... it seems like this will break compatibility with older versions of
> Mesa.  I haven't dug around, but there used to be a hack in DRI2 where a
> client would request a depth buffer and a stencil buffer, but it would
> get the same packed depth-stencil buffer for both.  I guess that might
> all be up in the DRI2 layer and not in the driver...

The 'case DRI2BufferStencil' modified in this hunk was added by me when implementing
separate stencil support. It was never used for this hack.

That hack is implemented by an alternate definition of I830DRI2CreateBuffer() which
is ifdef'd out when DRI2INFOREC_VERSION >= 2. FYI, the line that implements the hack can
be found by grepping xf86-video-intel:src/intel_dri.c for
    } else if (attachments[i] == DRI2BufferStencil && pDepthPixmap) {

> 
>> @@ -368,11 +375,12 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>>  		 * To accomplish this, we resort to the nasty hack of doubling
>>  		 * the drm region's cpp and halving its height.
>>  		 *
>> -		 * If we neglect to double the pitch, then
>> -		 * drm_intel_gem_bo_map_gtt() maps the memory incorrectly.
>> +		 * If we neglect to double the pitch, then render corruption
>> +                 * occurs.
> 
> Mangled whitespace?  Probably mixed tabs and spaces...

Oops. Will fix.

> 
>>  		 */
>>  		if (attachment == DRI2BufferStencil) {
>> -			pixmap_height /= 2;
>> +			pixmap_width = ALIGN(pixmap_width, 64);
>> +			pixmap_height = ALIGN(pixmap_height / 2, 64);
>>  			pixmap_cpp *= 2;
>>  		}


-- 
Chad Versace
chad@chad-versace.us


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Mesa-dev] [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18 18:49   ` Ian Romanick
@ 2011-07-18 20:54     ` Chad Versace
  2011-07-18 21:02       ` Ian Romanick
  0 siblings, 1 reply; 18+ messages in thread
From: Chad Versace @ 2011-07-18 20:54 UTC (permalink / raw)
  To: Ian Romanick; +Cc: mesa-dev, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 12619 bytes --]

On 07/18/2011 11:49 AM, Ian Romanick wrote:
> On 07/18/2011 12:55 AM, Chad Versace wrote:
>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>> in several locations the PRM states that it is. However, it is actually
>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>> 4.5.2.1 W-Major Format:
>>     W-Major Tile Format is used for separate stencil.
> 
>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>> I915_TILING_NONE and decode the tile's layout in software.
> 
>> This fix touches the following portions of code:
>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>       buffer with I915_TILING_NONE.
>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>       not tiled.
>>     - In the stencil buffer's span functions, the tile's layout must be
>>       decoded in software.
> 
>> This commit mutually depends on the xf86-video-intel commit
>>     dri: Do not tile stencil buffer
>>     Author: Chad Versace <chad@chad-versace.us>
>>     Date:   Mon Jul 18 00:38:00 2011 -0700
> 
>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>     bugs/fdo23670-drawpix_stencil
>>     general/stencil-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>     spec/EXT_packed_depth_stencil/readpixels-24_8
> 
>> Note: This is a candidate for the 7.11 branch.
> 
>> CC: Eric Anholt <eric@anholt.net>
>> CC: Kenneth Graunke <kenneth@whitecape.org>
>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>> ---
>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>  5 files changed, 89 insertions(+), 33 deletions(-)
> 
>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>> index dfca03c..5ab9873 100644
>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>  	     */
>>              tri_mask |= BUFFER_BIT_STENCIL;
>>           }
>> +	 else if (intel->has_separate_stencil &&
>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>> +	    /* The stencil buffer is actually W tiled, which the hardware
>> +	     * cannot blit to. */
>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>> +	 }
>>           else {
>>              /* clearing all stencil bits, use blitting */
>>              blit_mask |= BUFFER_BIT_STENCIL;
>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>> index 2ba1363..fe8be08 100644
>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
> 
>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>> +	 /*
>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>> +	  * fencing.
>> +	  */
>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>  	 return;
>>        } else {
>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>         * Presently, however, no verification or clean up is necessary, and
>>         * execution should not reach here. If the framebuffer still has a hiz
>>         * region, then we have already set dri2_has_hiz to true after
>> -       * confirming above that the stencil buffer is Y tiled.
>> +       * confirming above that the stencil buffer is W tiled.
>>         */
>>        assert(0);
>>     }
>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>> index 1669af2..507cc33 100644
>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
> 
>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>        /*
>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>> +       *
>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>         *    The pitch must be set to 2x the value computed based on width, as
>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>         * region's cpp and halving its height.
>>         *
>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>> -       * maps the memory incorrectly.
>> +       * If we neglect to double the pitch, then render corruption occurs.
>>         */
>>        irb->region = intel_region_alloc(intel->intelScreen,
>> -				       I915_TILING_Y,
>> +				       I915_TILING_NONE,
>>  				       cpp * 2,
>> -				       width,
>> -				       height / 2,
>> +				       ALIGN(width, 64),
>> +				       /* XXX: Maybe align to 128? */
>> +				       ALIGN(height / 2, 64),
>>  				       GL_TRUE);
>>        if (!irb->region)
>>  	return false;
>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>> index b2013af..9dd6a52 100644
>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>> @@ -63,9 +63,12 @@
>>   * x8_z24 and s8).
>>   *
>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>> - * nothing happend.
>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>> + * true and continue as if nothing happend.
>> + *
>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>   *
>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>   * clean up.
>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>> index 153803f..d306432 100644
>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>     int miny = 0;							\
>>     int maxx = rb->Width;						\
>>     int maxy = rb->Height;						\
>> -   int stride = rb->RowStride;						\
>> -   uint8_t *buf = rb->Data;						\
>> +									\
>> +   /*									\
>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>> +    * manually, the region's *real* base address and stride is		\
>> +    * required.								\
>> +    */									\
>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>> +   unsigned stride = irb->region->pitch;				\
>> +   unsigned height = 2 * irb->region->height;				\
>> +   bool flip = rb->Name == 0;						\
> 
>> -/* Don't flip y. */
>>  #undef Y_FLIP
>> -#define Y_FLIP(y) y
>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
> 
>>  /**
>>   * \brief Get pointer offset into stencil buffer.
>>   *
>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>> - * with positive stride (s=stride).
>> - *
>> - *     x    | y     | byte offset
>> - *     --------------------------
>> - *     0    | 0     | 0
>> - *     0    | 1     | 1
>> - *     1    | 0     | 2
>> - *     1    | 1     | 3
>> - *     ...  | ...   | ...
>> - *     0    | 2     | s
>> - *     0    | 3     | s + 1
>> - *     1    | 2     | s + 2
>> - *     1    | 3     | s + 3 
>> - *
>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>> + * must decode the tile's layout in software.
>>   *
>> + * See
>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>> + *     Format.
>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>   */
>> -static inline intptr_t
>> -intel_offset_S8(int stride, GLint x, GLint y)
>> +static inline uintptr_t
> 
> Why the type change?

Two reasons.

1. I redeclared the parameters as unsigned so to generate better code.
Since x, y, and stride are unsigned, the division and modulo operators
generate shift and bit-logic instructions rather than the slower arithmetic
instructions.

Below is a comparison of x % 64, signed versus unsigned, compiled with gcc -O3.
In f, the % generates 5 instructions. In g, only one instruction.

int f(int x) { return x % 64; }

f:
	movl	%edi, %edx
	sarl	$31, %edx
	shrl	$26, %edx
	leal	(%rdi,%rdx), %eax
	andl	$63, %eax
	subl	%edx, %eax
	ret

unsigned g(unsigned x) { return x % 64);

g:
	movl	%edi, %eax
	andl	$63, %eax
	ret

2. I redeclared the return type as unsigned to make it explicit that it does
not return a negative offset.

>> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
>>  {
>> -   return 2 * ((y / 2) * stride + x) + y % 2;
>> +   uint32_t tile_size = 4096;
>> +   uint32_t tile_width = 64;
>> +   uint32_t tile_height = 64;
>> +   uint32_t row_size = 64 * stride;
>> +
>> +   uint32_t tile_x = x / tile_width;
>> +   uint32_t tile_y = y / tile_height;
>> +
>> +   /* The byte's address relative to the tile's base addres. */
>> +   uint32_t byte_x = x % tile_width;
>> +   uint32_t byte_y = y % tile_height;
>> +
>> +   uintptr_t u = tile_y * row_size
>> +               + tile_x * tile_size
>> +               + 512 * (byte_x / 8)
>> +               +  64 * (byte_y / 8)
>> +               +  32 * ((byte_y / 4) % 2)
>> +               +  16 * ((byte_x / 4) % 2)
>> +               +   8 * ((byte_y / 2) % 2)
>> +               +   4 * ((byte_x / 2) % 2)
>> +               +   2 * (byte_y % 2)
>> +               +   1 * (byte_x % 2);
>> +
>> +   /*
>> +    * Errata for Gen5:
>> +    *
>> +    * An additional offset is needed which is not documented in the PRM.
>> +    *
>> +    * if ((byte_x / 8) % 2 == 1) {
>> +    *    if ((byte_y / 8) % 2) == 0) {
>> +    *       u += 64;
>> +    *    } else {
>> +    *       u -= 64;
>> +    *    }
>> +    * }
>> +    *
>> +    * The offset is expressed more tersely as
>> +    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
>> +    */
>> +
>> +   return u;
>>  }
> 
>>  #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;

-- 
Chad Versace
chad@chad-versace.us


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Mesa-dev] [PATCH] dri: Do not tile stencil buffer
  2011-07-18 19:28     ` Chad Versace
@ 2011-07-18 20:58       ` Ian Romanick
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Romanick @ 2011-07-18 20:58 UTC (permalink / raw)
  To: Chad Versace; +Cc: mesa-dev, intel-gfx

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 12:28 PM, Chad Versace wrote:
> On 07/18/2011 11:45 AM, Ian Romanick wrote:
>> On 07/18/2011 12:55 AM, Chad Versace wrote:
>>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>>> in several locations the PRM states that it is. However, it is actually
>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>>> 4.5.2.1 W-Major Format:
>>>     W-Major Tile Format is used for separate stencil.
>>
>>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>>> I915_TILING_NONE and decode the tile's layout in software.
>>
>>> This commit mutually depends on the mesa commit:
>>>     intel: Fix stencil buffer to be W tiled
>>>     Author: Chad Versace <chad@chad-versace.us>
>>>     Date:   Mon Jul 18 00:37:45 2011 -0700
>>
>>> CC: Eric Anholt <eric@anholt.net>
>>> CC: Kenneth Graunke <kenneth@whitecape.org>
>>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>>> ---
>>>  src/intel_dri.c |   16 ++++++++++++----
>>>  1 files changed, 12 insertions(+), 4 deletions(-)
>>
>>> diff --git a/src/intel_dri.c b/src/intel_dri.c
>>> index 5ea7c2c..4652dc7 100644
>>> --- a/src/intel_dri.c
>>> +++ b/src/intel_dri.c
>>> @@ -336,7 +336,6 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>>>  			switch (attachment) {
>>>  			case DRI2BufferDepth:
>>>  			case DRI2BufferDepthStencil:
>>> -			case DRI2BufferStencil:
>>>  			case DRI2BufferHiz:
>>>  				if (SUPPORTS_YTILING(intel)) {
>>>  					hint |= INTEL_CREATE_PIXMAP_TILING_Y;
>>> @@ -351,6 +350,14 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>>>  			case DRI2BufferFrontRight:
>>>  				hint |= INTEL_CREATE_PIXMAP_TILING_X;
>>>  				break;
>>> +			case DRI2BufferStencil:
>>> +				/*
>>> +				 * The stencil buffer is W tiled. However, we
>>> +				 * request from the kernel a non-tiled buffer
>>> +				 * because the GTT is incapable of W fencing.
>>> +				 */
>>> +				hint |= INTEL_CREATE_PIXMAP_TILING_NONE;
>>> +				break;
>>>  			default:
>>>  				free(privates);
>>>  				free(buffer);
>>
>> Eh... it seems like this will break compatibility with older versions of
>> Mesa.  I haven't dug around, but there used to be a hack in DRI2 where a
>> client would request a depth buffer and a stencil buffer, but it would
>> get the same packed depth-stencil buffer for both.  I guess that might
>> all be up in the DRI2 layer and not in the driver...
> 
> The 'case DRI2BufferStencil' modified in this hunk was added by me when implementing
> separate stencil support. It was never used for this hack.
> 
> That hack is implemented by an alternate definition of I830DRI2CreateBuffer() which
> is ifdef'd out when DRI2INFOREC_VERSION >= 2. FYI, the line that implements the hack can
> be found by grepping xf86-video-intel:src/intel_dri.c for
>     } else if (attachments[i] == DRI2BufferStencil && pDepthPixmap) {

Ah.  That sounds right.

This patch (with the whitespace fix) is

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

>>
>>> @@ -368,11 +375,12 @@ I830DRI2CreateBuffer(DrawablePtr drawable, unsigned int attachment,
>>>  		 * To accomplish this, we resort to the nasty hack of doubling
>>>  		 * the drm region's cpp and halving its height.
>>>  		 *
>>> -		 * If we neglect to double the pitch, then
>>> -		 * drm_intel_gem_bo_map_gtt() maps the memory incorrectly.
>>> +		 * If we neglect to double the pitch, then render corruption
>>> +                 * occurs.
>>
>> Mangled whitespace?  Probably mixed tabs and spaces...
> 
> Oops. Will fix.
> 
>>
>>>  		 */
>>>  		if (attachment == DRI2BufferStencil) {
>>> -			pixmap_height /= 2;
>>> +			pixmap_width = ALIGN(pixmap_width, 64);
>>> +			pixmap_height = ALIGN(pixmap_height / 2, 64);
>>>  			pixmap_cpp *= 2;
>>>  		}
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4knooACgkQX1gOwKyEAw8mrACdFEbKAyQIntzWec6LVA2ZYYjz
8icAn1Fc+vHRyNeygch2QHnH7Vh9Ilqt
=9yZk
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Mesa-dev] [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18 20:54     ` [Mesa-dev] " Chad Versace
@ 2011-07-18 21:02       ` Ian Romanick
  2011-07-18 21:24         ` Chad Versace
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Romanick @ 2011-07-18 21:02 UTC (permalink / raw)
  To: Chad Versace; +Cc: mesa-dev, intel-gfx

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 01:54 PM, Chad Versace wrote:
> On 07/18/2011 11:49 AM, Ian Romanick wrote:
>> On 07/18/2011 12:55 AM, Chad Versace wrote:
>>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>>> in several locations the PRM states that it is. However, it is actually
>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>>> 4.5.2.1 W-Major Format:
>>>     W-Major Tile Format is used for separate stencil.
>>
>>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>>> I915_TILING_NONE and decode the tile's layout in software.
>>
>>> This fix touches the following portions of code:
>>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>>       buffer with I915_TILING_NONE.
>>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>>       not tiled.
>>>     - In the stencil buffer's span functions, the tile's layout must be
>>>       decoded in software.
>>
>>> This commit mutually depends on the xf86-video-intel commit
>>>     dri: Do not tile stencil buffer
>>>     Author: Chad Versace <chad@chad-versace.us>
>>>     Date:   Mon Jul 18 00:38:00 2011 -0700
>>
>>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>>     bugs/fdo23670-drawpix_stencil
>>>     general/stencil-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>>     spec/EXT_packed_depth_stencil/readpixels-24_8
>>
>>> Note: This is a candidate for the 7.11 branch.
>>
>>> CC: Eric Anholt <eric@anholt.net>
>>> CC: Kenneth Graunke <kenneth@whitecape.org>
>>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>>> ---
>>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>>  5 files changed, 89 insertions(+), 33 deletions(-)
>>
>>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>>> index dfca03c..5ab9873 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>>  	     */
>>>              tri_mask |= BUFFER_BIT_STENCIL;
>>>           }
>>> +	 else if (intel->has_separate_stencil &&
>>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>>> +	    /* The stencil buffer is actually W tiled, which the hardware
>>> +	     * cannot blit to. */
>>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>>> +	 }
>>>           else {
>>>              /* clearing all stencil bits, use blitting */
>>>              blit_mask |= BUFFER_BIT_STENCIL;
>>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>>> index 2ba1363..fe8be08 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>>
>>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>>> +	 /*
>>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>>> +	  * fencing.
>>> +	  */
>>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>>  	 return;
>>>        } else {
>>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>         * Presently, however, no verification or clean up is necessary, and
>>>         * execution should not reach here. If the framebuffer still has a hiz
>>>         * region, then we have already set dri2_has_hiz to true after
>>> -       * confirming above that the stencil buffer is Y tiled.
>>> +       * confirming above that the stencil buffer is W tiled.
>>>         */
>>>        assert(0);
>>>     }
>>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>>> index 1669af2..507cc33 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>
>>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>>        /*
>>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>>> +       *
>>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>>         *    The pitch must be set to 2x the value computed based on width, as
>>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>>         * region's cpp and halving its height.
>>>         *
>>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>>> -       * maps the memory incorrectly.
>>> +       * If we neglect to double the pitch, then render corruption occurs.
>>>         */
>>>        irb->region = intel_region_alloc(intel->intelScreen,
>>> -				       I915_TILING_Y,
>>> +				       I915_TILING_NONE,
>>>  				       cpp * 2,
>>> -				       width,
>>> -				       height / 2,
>>> +				       ALIGN(width, 64),
>>> +				       /* XXX: Maybe align to 128? */
>>> +				       ALIGN(height / 2, 64),
>>>  				       GL_TRUE);
>>>        if (!irb->region)
>>>  	return false;
>>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>>> index b2013af..9dd6a52 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>>> @@ -63,9 +63,12 @@
>>>   * x8_z24 and s8).
>>>   *
>>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>>> - * nothing happend.
>>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>>> + * true and continue as if nothing happend.
>>> + *
>>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>>   *
>>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>>   * clean up.
>>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>>> index 153803f..d306432 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>>     int miny = 0;							\
>>>     int maxx = rb->Width;						\
>>>     int maxy = rb->Height;						\
>>> -   int stride = rb->RowStride;						\
>>> -   uint8_t *buf = rb->Data;						\
>>> +									\
>>> +   /*									\
>>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>>> +    * manually, the region's *real* base address and stride is		\
>>> +    * required.								\
>>> +    */									\
>>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>>> +   unsigned stride = irb->region->pitch;				\
>>> +   unsigned height = 2 * irb->region->height;				\
>>> +   bool flip = rb->Name == 0;						\
>>
>>> -/* Don't flip y. */
>>>  #undef Y_FLIP
>>> -#define Y_FLIP(y) y
>>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>>
>>>  /**
>>>   * \brief Get pointer offset into stencil buffer.
>>>   *
>>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>>> - * with positive stride (s=stride).
>>> - *
>>> - *     x    | y     | byte offset
>>> - *     --------------------------
>>> - *     0    | 0     | 0
>>> - *     0    | 1     | 1
>>> - *     1    | 0     | 2
>>> - *     1    | 1     | 3
>>> - *     ...  | ...   | ...
>>> - *     0    | 2     | s
>>> - *     0    | 3     | s + 1
>>> - *     1    | 2     | s + 2
>>> - *     1    | 3     | s + 3 
>>> - *
>>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>>> + * must decode the tile's layout in software.
>>>   *
>>> + * See
>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>>> + *     Format.
>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>>   */
>>> -static inline intptr_t
>>> -intel_offset_S8(int stride, GLint x, GLint y)
>>> +static inline uintptr_t
>>
>> Why the type change?
> 
> Two reasons.
> 
> 1. I redeclared the parameters as unsigned so to generate better code.
> Since x, y, and stride are unsigned, the division and modulo operators
> generate shift and bit-logic instructions rather than the slower arithmetic
> instructions.

Is stride always unsigned now?  Will it always be in the future?  This
is the problem that we encountered with bug #37351 (fixed by e8b1c6d).

> Below is a comparison of x % 64, signed versus unsigned, compiled with gcc -O3.
> In f, the % generates 5 instructions. In g, only one instruction.
> 
> int f(int x) { return x % 64; }
> 
> f:
> 	movl	%edi, %edx
> 	sarl	$31, %edx
> 	shrl	$26, %edx
> 	leal	(%rdi,%rdx), %eax
> 	andl	$63, %eax
> 	subl	%edx, %eax
> 	ret
> 
> unsigned g(unsigned x) { return x % 64);
> 
> g:
> 	movl	%edi, %eax
> 	andl	$63, %eax
> 	ret
> 
> 2. I redeclared the return type as unsigned to make it explicit that it does
> not return a negative offset.
> 
>>> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
>>>  {
>>> -   return 2 * ((y / 2) * stride + x) + y % 2;
>>> +   uint32_t tile_size = 4096;
>>> +   uint32_t tile_width = 64;
>>> +   uint32_t tile_height = 64;
>>> +   uint32_t row_size = 64 * stride;
>>> +
>>> +   uint32_t tile_x = x / tile_width;
>>> +   uint32_t tile_y = y / tile_height;
>>> +
>>> +   /* The byte's address relative to the tile's base addres. */
>>> +   uint32_t byte_x = x % tile_width;
>>> +   uint32_t byte_y = y % tile_height;
>>> +
>>> +   uintptr_t u = tile_y * row_size
>>> +               + tile_x * tile_size
>>> +               + 512 * (byte_x / 8)
>>> +               +  64 * (byte_y / 8)
>>> +               +  32 * ((byte_y / 4) % 2)
>>> +               +  16 * ((byte_x / 4) % 2)
>>> +               +   8 * ((byte_y / 2) % 2)
>>> +               +   4 * ((byte_x / 2) % 2)
>>> +               +   2 * (byte_y % 2)
>>> +               +   1 * (byte_x % 2);
>>> +
>>> +   /*
>>> +    * Errata for Gen5:
>>> +    *
>>> +    * An additional offset is needed which is not documented in the PRM.
>>> +    *
>>> +    * if ((byte_x / 8) % 2 == 1) {
>>> +    *    if ((byte_y / 8) % 2) == 0) {
>>> +    *       u += 64;
>>> +    *    } else {
>>> +    *       u -= 64;
>>> +    *    }
>>> +    * }
>>> +    *
>>> +    * The offset is expressed more tersely as
>>> +    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
>>> +    */
>>> +
>>> +   return u;
>>>  }
>>
>>>  #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4kn08ACgkQX1gOwKyEAw/jIQCfYBpmMYlhRAWQa6lKs+z68HJo
aXQAn2DGTu/NU92jUfR4wP19PWU+zdOL
=AQiH
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Mesa-dev] [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18 15:57   ` Eric Anholt
@ 2011-07-18 21:05     ` Chad Versace
  2011-07-19  0:00     ` Chad Versace
  1 sibling, 0 replies; 18+ messages in thread
From: Chad Versace @ 2011-07-18 21:05 UTC (permalink / raw)
  To: Eric Anholt; +Cc: mesa-dev, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 3922 bytes --]

On 07/18/2011 08:57 AM, Eric Anholt wrote:
> On Mon, 18 Jul 2011 00:55:03 -0700, Chad Versace <chad@chad-versace.us> wrote:
>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>> index 1669af2..507cc33 100644
>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>  
>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>        /*
>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>> +       *
>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>         *    The pitch must be set to 2x the value computed based on width, as
>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>         * region's cpp and halving its height.
>>         *
>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>> -       * maps the memory incorrectly.
>> +       * If we neglect to double the pitch, then render corruption occurs.
>>         */
>>        irb->region = intel_region_alloc(intel->intelScreen,
>> -				       I915_TILING_Y,
>> +				       I915_TILING_NONE,
>>  				       cpp * 2,
>> -				       width,
>> -				       height / 2,
>> +				       ALIGN(width, 64),
>> +				       /* XXX: Maybe align to 128? */
>> +				       ALIGN(height / 2, 64),
>>  				       GL_TRUE);
>>        if (!irb->region)
>>  	return false;
> 
> This looks like it would fail on a buffer with height = 129.  Doesn't
> seem like 128 pitch requirement would be needed -- has it been tested?
> 
>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>> index 153803f..d306432 100644
>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>     int miny = 0;							\
>>     int maxx = rb->Width;						\
>>     int maxy = rb->Height;						\
>> -   int stride = rb->RowStride;						\
>> -   uint8_t *buf = rb->Data;						\
>> +									\
>> +   /*									\
>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>> +    * manually, the region's *real* base address and stride is		\
>> +    * required.								\
>> +    */									\
>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>> +   unsigned stride = irb->region->pitch;				\
>> +   unsigned height = 2 * irb->region->height;				\
>> +   bool flip = rb->Name == 0;						\
>>  
>> -/* Don't flip y. */
>>  #undef Y_FLIP
>> -#define Y_FLIP(y) y
>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
> 
> The flip is usually handled by a scale and bias variable, so that Y_FLIP
> is ((y) * scale + bias).  I think it'll produce less code, since Y_FLIP
> is used a lot.

Good call. Does changing the hunk like this look good to you?

+   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
+   uint8_t *buf = irb->region->buffer->virtual;			\
+   unsigned stride = irb->region->pitch;				\
+   unsigned height = 2 * irb->region->height;				\
+   bool flip = rb->Name == 0;						\
+   int y_scale = flip ? -1 : 1;					\
+   int y_bias = flip ? (height - 1) : 0;				\

-/* Don't flip y. */
 #undef Y_FLIP
-#define Y_FLIP(y) y
+#define Y_FLIP(y) (y_scale * (y) + y_bias)


-- 
Chad Versace
chad@chad-versace.us


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18 21:02       ` Ian Romanick
@ 2011-07-18 21:24         ` Chad Versace
  2011-07-18 22:34           ` Ian Romanick
  0 siblings, 1 reply; 18+ messages in thread
From: Chad Versace @ 2011-07-18 21:24 UTC (permalink / raw)
  To: Ian Romanick; +Cc: mesa-dev, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 14889 bytes --]

On 07/18/2011 02:02 PM, Ian Romanick wrote:
> On 07/18/2011 01:54 PM, Chad Versace wrote:
>> On 07/18/2011 11:49 AM, Ian Romanick wrote:
>>> On 07/18/2011 12:55 AM, Chad Versace wrote:
>>>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>>>> in several locations the PRM states that it is. However, it is actually
>>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>>>> 4.5.2.1 W-Major Format:
>>>>     W-Major Tile Format is used for separate stencil.
>>>
>>>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>>>> I915_TILING_NONE and decode the tile's layout in software.
>>>
>>>> This fix touches the following portions of code:
>>>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>>>       buffer with I915_TILING_NONE.
>>>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>>>       not tiled.
>>>>     - In the stencil buffer's span functions, the tile's layout must be
>>>>       decoded in software.
>>>
>>>> This commit mutually depends on the xf86-video-intel commit
>>>>     dri: Do not tile stencil buffer
>>>>     Author: Chad Versace <chad@chad-versace.us>
>>>>     Date:   Mon Jul 18 00:38:00 2011 -0700
>>>
>>>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>>>     bugs/fdo23670-drawpix_stencil
>>>>     general/stencil-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>>>     spec/EXT_packed_depth_stencil/readpixels-24_8
>>>
>>>> Note: This is a candidate for the 7.11 branch.
>>>
>>>> CC: Eric Anholt <eric@anholt.net>
>>>> CC: Kenneth Graunke <kenneth@whitecape.org>
>>>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>>>> ---
>>>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>>>  5 files changed, 89 insertions(+), 33 deletions(-)
>>>
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>>>> index dfca03c..5ab9873 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>>>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>>>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>>>  	     */
>>>>              tri_mask |= BUFFER_BIT_STENCIL;
>>>>           }
>>>> +	 else if (intel->has_separate_stencil &&
>>>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>>>> +	    /* The stencil buffer is actually W tiled, which the hardware
>>>> +	     * cannot blit to. */
>>>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>>>> +	 }
>>>>           else {
>>>>              /* clearing all stencil bits, use blitting */
>>>>              blit_mask |= BUFFER_BIT_STENCIL;
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>>>> index 2ba1363..fe8be08 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>>>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>>>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>>>
>>>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>>>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>>>> +	 /*
>>>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>>>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>>>> +	  * fencing.
>>>> +	  */
>>>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>>>  	 return;
>>>>        } else {
>>>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>         * Presently, however, no verification or clean up is necessary, and
>>>>         * execution should not reach here. If the framebuffer still has a hiz
>>>>         * region, then we have already set dri2_has_hiz to true after
>>>> -       * confirming above that the stencil buffer is Y tiled.
>>>> +       * confirming above that the stencil buffer is W tiled.
>>>>         */
>>>>        assert(0);
>>>>     }
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>> index 1669af2..507cc33 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>>>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>
>>>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>>>        /*
>>>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>>>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>>>> +       *
>>>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>>>         *    The pitch must be set to 2x the value computed based on width, as
>>>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>>>         * region's cpp and halving its height.
>>>>         *
>>>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>>>> -       * maps the memory incorrectly.
>>>> +       * If we neglect to double the pitch, then render corruption occurs.
>>>>         */
>>>>        irb->region = intel_region_alloc(intel->intelScreen,
>>>> -				       I915_TILING_Y,
>>>> +				       I915_TILING_NONE,
>>>>  				       cpp * 2,
>>>> -				       width,
>>>> -				       height / 2,
>>>> +				       ALIGN(width, 64),
>>>> +				       /* XXX: Maybe align to 128? */
>>>> +				       ALIGN(height / 2, 64),
>>>>  				       GL_TRUE);
>>>>        if (!irb->region)
>>>>  	return false;
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>>>> index b2013af..9dd6a52 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>>>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>>>> @@ -63,9 +63,12 @@
>>>>   * x8_z24 and s8).
>>>>   *
>>>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>>>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>>>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>>>> - * nothing happend.
>>>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>>>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>>>> + * true and continue as if nothing happend.
>>>> + *
>>>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>>>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>>>   *
>>>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>>>   * clean up.
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>>>> index 153803f..d306432 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>>>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>>>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>>>     int miny = 0;							\
>>>>     int maxx = rb->Width;						\
>>>>     int maxy = rb->Height;						\
>>>> -   int stride = rb->RowStride;						\
>>>> -   uint8_t *buf = rb->Data;						\
>>>> +									\
>>>> +   /*									\
>>>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>>>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>>>> +    * manually, the region's *real* base address and stride is		\
>>>> +    * required.								\
>>>> +    */									\
>>>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>>>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>>>> +   unsigned stride = irb->region->pitch;				\
>>>> +   unsigned height = 2 * irb->region->height;				\
>>>> +   bool flip = rb->Name == 0;						\
>>>
>>>> -/* Don't flip y. */
>>>>  #undef Y_FLIP
>>>> -#define Y_FLIP(y) y
>>>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>>>
>>>>  /**
>>>>   * \brief Get pointer offset into stencil buffer.
>>>>   *
>>>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>>>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>>>> - * with positive stride (s=stride).
>>>> - *
>>>> - *     x    | y     | byte offset
>>>> - *     --------------------------
>>>> - *     0    | 0     | 0
>>>> - *     0    | 1     | 1
>>>> - *     1    | 0     | 2
>>>> - *     1    | 1     | 3
>>>> - *     ...  | ...   | ...
>>>> - *     0    | 2     | s
>>>> - *     0    | 3     | s + 1
>>>> - *     1    | 2     | s + 2
>>>> - *     1    | 3     | s + 3 
>>>> - *
>>>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>>>> + * must decode the tile's layout in software.
>>>>   *
>>>> + * See
>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>>>> + *     Format.
>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>>>   */
>>>> -static inline intptr_t
>>>> -intel_offset_S8(int stride, GLint x, GLint y)
>>>> +static inline uintptr_t
>>>
>>> Why the type change?
> 
>> Two reasons.
> 
>> 1. I redeclared the parameters as unsigned so to generate better code.
>> Since x, y, and stride are unsigned, the division and modulo operators
>> generate shift and bit-logic instructions rather than the slower arithmetic
>> instructions.
> 
> Is stride always unsigned now? 

intelSpanRenderStart still sets rb->RowStride to be negative for system
stencil buffers. But we ignore that and use a positive stride instead.
To quote the hunk above:

+   /*									\
+    * Here we ignore rb->Data and rb->RowStride as set by		\
+    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
+    * manually, the region's *real* base address and stride is		\
+    * required.							\
+    */									\
+   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
+   uint8_t *buf = irb->region->buffer->virtual;			\
+   unsigned stride = irb->region->pitch;

By setting the buf and stride to the *real* base address and stride,
intel_offset_S8 can implement the exact W-tiling algorithm as described
in the bspec. (PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling
Algorithm).

If intel_offset_S8 instead used rb->Data and rb->RowStride as the buffer's
base address and stride, then intel_offset_S8 would essentially
need two implementations, like this:
    intptr_t
    intel_offset_S8(...)
    {
        if (stride > 0) {
            // do normal stuff;
        } else {
            // do upside down stuff;
        }
    }

I'd like to avoid such a bifurcated function.

> Will it always be in the future? 

Yes.

> This is the problem that we encountered with bug #37351 (fixed by e8b1c6d).

Ah... Thanks for recalling that bug. I believe setting the return type of
intel_offset_S8 to be intptr_t will eliminate reoccurrence of this bug. Do you agree?

>> Below is a comparison of x % 64, signed versus unsigned, compiled with gcc -O3.
>> In f, the % generates 5 instructions. In g, only one instruction.
> 
>> int f(int x) { return x % 64; }
> 
>> f:
>> 	movl	%edi, %edx
>> 	sarl	$31, %edx
>> 	shrl	$26, %edx
>> 	leal	(%rdi,%rdx), %eax
>> 	andl	$63, %eax
>> 	subl	%edx, %eax
>> 	ret
> 
>> unsigned g(unsigned x) { return x % 64);
> 
>> g:
>> 	movl	%edi, %eax
>> 	andl	$63, %eax
>> 	ret
> 
>> 2. I redeclared the return type as unsigned to make it explicit that it does
>> not return a negative offset.
> 
>>>> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
>>>>  {
>>>> -   return 2 * ((y / 2) * stride + x) + y % 2;
>>>> +   uint32_t tile_size = 4096;
>>>> +   uint32_t tile_width = 64;
>>>> +   uint32_t tile_height = 64;
>>>> +   uint32_t row_size = 64 * stride;
>>>> +
>>>> +   uint32_t tile_x = x / tile_width;
>>>> +   uint32_t tile_y = y / tile_height;
>>>> +
>>>> +   /* The byte's address relative to the tile's base addres. */
>>>> +   uint32_t byte_x = x % tile_width;
>>>> +   uint32_t byte_y = y % tile_height;
>>>> +
>>>> +   uintptr_t u = tile_y * row_size
>>>> +               + tile_x * tile_size
>>>> +               + 512 * (byte_x / 8)
>>>> +               +  64 * (byte_y / 8)
>>>> +               +  32 * ((byte_y / 4) % 2)
>>>> +               +  16 * ((byte_x / 4) % 2)
>>>> +               +   8 * ((byte_y / 2) % 2)
>>>> +               +   4 * ((byte_x / 2) % 2)
>>>> +               +   2 * (byte_y % 2)
>>>> +               +   1 * (byte_x % 2);
>>>> +
>>>> +   /*
>>>> +    * Errata for Gen5:
>>>> +    *
>>>> +    * An additional offset is needed which is not documented in the PRM.
>>>> +    *
>>>> +    * if ((byte_x / 8) % 2 == 1) {
>>>> +    *    if ((byte_y / 8) % 2) == 0) {
>>>> +    *       u += 64;
>>>> +    *    } else {
>>>> +    *       u -= 64;
>>>> +    *    }
>>>> +    * }
>>>> +    *
>>>> +    * The offset is expressed more tersely as
>>>> +    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
>>>> +    */
>>>> +
>>>> +   return u;
>>>>  }
>>>
>>>>  #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;

-- 
Chad Versace
chad@chad-versace.us


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18 21:24         ` Chad Versace
@ 2011-07-18 22:34           ` Ian Romanick
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Romanick @ 2011-07-18 22:34 UTC (permalink / raw)
  To: Chad Versace; +Cc: mesa-dev, intel-gfx

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 02:24 PM, Chad Versace wrote:
> On 07/18/2011 02:02 PM, Ian Romanick wrote:
>> On 07/18/2011 01:54 PM, Chad Versace wrote:
>>> On 07/18/2011 11:49 AM, Ian Romanick wrote:
>>>> On 07/18/2011 12:55 AM, Chad Versace wrote:
>>>>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>>>>> in several locations the PRM states that it is. However, it is actually
>>>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>>>>> 4.5.2.1 W-Major Format:
>>>>>     W-Major Tile Format is used for separate stencil.
>>>>
>>>>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>>>>> I915_TILING_NONE and decode the tile's layout in software.
>>>>
>>>>> This fix touches the following portions of code:
>>>>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>>>>       buffer with I915_TILING_NONE.
>>>>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>>>>       not tiled.
>>>>>     - In the stencil buffer's span functions, the tile's layout must be
>>>>>       decoded in software.
>>>>
>>>>> This commit mutually depends on the xf86-video-intel commit
>>>>>     dri: Do not tile stencil buffer
>>>>>     Author: Chad Versace <chad@chad-versace.us>
>>>>>     Date:   Mon Jul 18 00:38:00 2011 -0700
>>>>
>>>>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>>>>     bugs/fdo23670-drawpix_stencil
>>>>>     general/stencil-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>>>>     spec/EXT_packed_depth_stencil/readpixels-24_8
>>>>
>>>>> Note: This is a candidate for the 7.11 branch.
>>>>
>>>>> CC: Eric Anholt <eric@anholt.net>
>>>>> CC: Kenneth Graunke <kenneth@whitecape.org>
>>>>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>>>>> ---
>>>>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>>>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>>>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>>>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>>>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>>>>  5 files changed, 89 insertions(+), 33 deletions(-)
>>>>
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> index dfca03c..5ab9873 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>>>>  	     */
>>>>>              tri_mask |= BUFFER_BIT_STENCIL;
>>>>>           }
>>>>> +	 else if (intel->has_separate_stencil &&
>>>>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>>>>> +	    /* The stencil buffer is actually W tiled, which the hardware
>>>>> +	     * cannot blit to. */
>>>>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>>>>> +	 }
>>>>>           else {
>>>>>              /* clearing all stencil bits, use blitting */
>>>>>              blit_mask |= BUFFER_BIT_STENCIL;
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>>>>> index 2ba1363..fe8be08 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>>>>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>>>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>>>>
>>>>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>>>>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>>>>> +	 /*
>>>>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>>>>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>>>>> +	  * fencing.
>>>>> +	  */
>>>>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>>>>  	 return;
>>>>>        } else {
>>>>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>>         * Presently, however, no verification or clean up is necessary, and
>>>>>         * execution should not reach here. If the framebuffer still has a hiz
>>>>>         * region, then we have already set dri2_has_hiz to true after
>>>>> -       * confirming above that the stencil buffer is Y tiled.
>>>>> +       * confirming above that the stencil buffer is W tiled.
>>>>>         */
>>>>>        assert(0);
>>>>>     }
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> index 1669af2..507cc33 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>>
>>>>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>>>>        /*
>>>>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>>>>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>>>>> +       *
>>>>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>>>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>>>>         *    The pitch must be set to 2x the value computed based on width, as
>>>>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>>>>         * region's cpp and halving its height.
>>>>>         *
>>>>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>>>>> -       * maps the memory incorrectly.
>>>>> +       * If we neglect to double the pitch, then render corruption occurs.
>>>>>         */
>>>>>        irb->region = intel_region_alloc(intel->intelScreen,
>>>>> -				       I915_TILING_Y,
>>>>> +				       I915_TILING_NONE,
>>>>>  				       cpp * 2,
>>>>> -				       width,
>>>>> -				       height / 2,
>>>>> +				       ALIGN(width, 64),
>>>>> +				       /* XXX: Maybe align to 128? */
>>>>> +				       ALIGN(height / 2, 64),
>>>>>  				       GL_TRUE);
>>>>>        if (!irb->region)
>>>>>  	return false;
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> index b2013af..9dd6a52 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> @@ -63,9 +63,12 @@
>>>>>   * x8_z24 and s8).
>>>>>   *
>>>>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>>>>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>>>>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>>>>> - * nothing happend.
>>>>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>>>>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>>>>> + * true and continue as if nothing happend.
>>>>> + *
>>>>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>>>>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>>>>   *
>>>>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>>>>   * clean up.
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>>>>> index 153803f..d306432 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>>>>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>>>>     int miny = 0;							\
>>>>>     int maxx = rb->Width;						\
>>>>>     int maxy = rb->Height;						\
>>>>> -   int stride = rb->RowStride;						\
>>>>> -   uint8_t *buf = rb->Data;						\
>>>>> +									\
>>>>> +   /*									\
>>>>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>>>>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>>>>> +    * manually, the region's *real* base address and stride is		\
>>>>> +    * required.								\
>>>>> +    */									\
>>>>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>>>>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>>>>> +   unsigned stride = irb->region->pitch;				\
>>>>> +   unsigned height = 2 * irb->region->height;				\
>>>>> +   bool flip = rb->Name == 0;						\
>>>>
>>>>> -/* Don't flip y. */
>>>>>  #undef Y_FLIP
>>>>> -#define Y_FLIP(y) y
>>>>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>>>>
>>>>>  /**
>>>>>   * \brief Get pointer offset into stencil buffer.
>>>>>   *
>>>>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>>>>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>>>>> - * with positive stride (s=stride).
>>>>> - *
>>>>> - *     x    | y     | byte offset
>>>>> - *     --------------------------
>>>>> - *     0    | 0     | 0
>>>>> - *     0    | 1     | 1
>>>>> - *     1    | 0     | 2
>>>>> - *     1    | 1     | 3
>>>>> - *     ...  | ...   | ...
>>>>> - *     0    | 2     | s
>>>>> - *     0    | 3     | s + 1
>>>>> - *     1    | 2     | s + 2
>>>>> - *     1    | 3     | s + 3 
>>>>> - *
>>>>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>>>>> + * must decode the tile's layout in software.
>>>>>   *
>>>>> + * See
>>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>>>>> + *     Format.
>>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>>>>   */
>>>>> -static inline intptr_t
>>>>> -intel_offset_S8(int stride, GLint x, GLint y)
>>>>> +static inline uintptr_t
>>>>
>>>> Why the type change?
>>
>>> Two reasons.
>>
>>> 1. I redeclared the parameters as unsigned so to generate better code.
>>> Since x, y, and stride are unsigned, the division and modulo operators
>>> generate shift and bit-logic instructions rather than the slower arithmetic
>>> instructions.
>>
>> Is stride always unsigned now? 
> 
> intelSpanRenderStart still sets rb->RowStride to be negative for system
> stencil buffers. But we ignore that and use a positive stride instead.
> To quote the hunk above:
> 
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.							\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;			\
> +   unsigned stride = irb->region->pitch;
> 
> By setting the buf and stride to the *real* base address and stride,
> intel_offset_S8 can implement the exact W-tiling algorithm as described
> in the bspec. (PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling
> Algorithm).
> 
> If intel_offset_S8 instead used rb->Data and rb->RowStride as the buffer's
> base address and stride, then intel_offset_S8 would essentially
> need two implementations, like this:
>     intptr_t
>     intel_offset_S8(...)
>     {
>         if (stride > 0) {
>             // do normal stuff;
>         } else {
>             // do upside down stuff;
>         }
>     }
> 
> I'd like to avoid such a bifurcated function.

That makes perfect sense.  Thanks for clearing that all up.

>> Will it always be in the future? 
> 
> Yes.
> 
>> This is the problem that we encountered with bug #37351 (fixed by e8b1c6d).
> 
> Ah... Thanks for recalling that bug. I believe setting the return type of
> intel_offset_S8 to be intptr_t will eliminate reoccurrence of this bug. Do you agree?

It should.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4ktN4ACgkQX1gOwKyEAw/fxQCeMr45By87f0YxBNU+Fp62C1LV
asMAnitL5MjvaIKXgXUIGFDof8hNHRPL
=5kb4
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Mesa-dev] [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18 15:57   ` Eric Anholt
  2011-07-18 21:05     ` [Mesa-dev] " Chad Versace
@ 2011-07-19  0:00     ` Chad Versace
  2011-07-19 15:34       ` Eric Anholt
  1 sibling, 1 reply; 18+ messages in thread
From: Chad Versace @ 2011-07-19  0:00 UTC (permalink / raw)
  To: Eric Anholt; +Cc: mesa-dev, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2531 bytes --]

On 07/18/2011 08:57 AM, Eric Anholt wrote:
> On Mon, 18 Jul 2011 00:55:03 -0700, Chad Versace <chad@chad-versace.us> wrote:
>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>> index 1669af2..507cc33 100644
>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>  
>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>        /*
>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>> +       *
>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>         *    The pitch must be set to 2x the value computed based on width, as
>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>         * region's cpp and halving its height.
>>         *
>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>> -       * maps the memory incorrectly.
>> +       * If we neglect to double the pitch, then render corruption occurs.
>>         */
>>        irb->region = intel_region_alloc(intel->intelScreen,
>> -				       I915_TILING_Y,
>> +				       I915_TILING_NONE,
>>  				       cpp * 2,
>> -				       width,
>> -				       height / 2,
>> +				       ALIGN(width, 64),
>> +				       /* XXX: Maybe align to 128? */
>> +				       ALIGN(height / 2, 64),
>>  				       GL_TRUE);
>>        if (!irb->region)
>>  	return false;
> 
> This looks like it would fail on a buffer with height = 129.

Oops. It does fail for height = 129. Accordingly, I've changed this hunk to:

       irb->region = intel_region_alloc(intel->intelScreen,
-				       I915_TILING_Y,
+				       I915_TILING_NONE,
 				       cpp * 2,
-				       width,
-				       height / 2,
+				       ALIGN(width, 64),
+				       ALIGN((height + 1)/ 2, 64),
 				       GL_TRUE);

And changed the same line in xf86-video-intel too.

> Doesn't
> seem like 128 pitch requirement would be needed -- has it been tested?

The XXX was left in the patch accidentally. 128-alignment has been tested, and was
found unnecessary.

-- 
Chad Versace
chad@chad-versace.us


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-19  0:00     ` Chad Versace
@ 2011-07-19 15:34       ` Eric Anholt
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Anholt @ 2011-07-19 15:34 UTC (permalink / raw)
  To: Chad Versace; +Cc: mesa-dev, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2755 bytes --]

On Mon, 18 Jul 2011 17:00:54 -0700, Chad Versace <chad@chad-versace.us> wrote:
> On 07/18/2011 08:57 AM, Eric Anholt wrote:
> > On Mon, 18 Jul 2011 00:55:03 -0700, Chad Versace <chad@chad-versace.us> wrote:
> >> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
> >> index 1669af2..507cc33 100644
> >> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
> >> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
> >> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
> >>  
> >>     if (irb->Base.Format == MESA_FORMAT_S8) {
> >>        /*
> >> +       * The stencil buffer is W tiled. However, we request from the kernel a
> >> +       * non-tiled buffer because the GTT is incapable of W fencing.
> >> +       *
> >>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
> >>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
> >>         *    The pitch must be set to 2x the value computed based on width, as
> >> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
> >>         * To accomplish this, we resort to the nasty hack of doubling the drm
> >>         * region's cpp and halving its height.
> >>         *
> >> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
> >> -       * maps the memory incorrectly.
> >> +       * If we neglect to double the pitch, then render corruption occurs.
> >>         */
> >>        irb->region = intel_region_alloc(intel->intelScreen,
> >> -				       I915_TILING_Y,
> >> +				       I915_TILING_NONE,
> >>  				       cpp * 2,
> >> -				       width,
> >> -				       height / 2,
> >> +				       ALIGN(width, 64),
> >> +				       /* XXX: Maybe align to 128? */
> >> +				       ALIGN(height / 2, 64),
> >>  				       GL_TRUE);
> >>        if (!irb->region)
> >>  	return false;
> > 
> > This looks like it would fail on a buffer with height = 129.
> 
> Oops. It does fail for height = 129. Accordingly, I've changed this hunk to:
> 
>        irb->region = intel_region_alloc(intel->intelScreen,
> -				       I915_TILING_Y,
> +				       I915_TILING_NONE,
>  				       cpp * 2,
> -				       width,
> -				       height / 2,
> +				       ALIGN(width, 64),
> +				       ALIGN((height + 1)/ 2, 64),
>  				       GL_TRUE);
> 
> And changed the same line in xf86-video-intel too.
> 
> > Doesn't
> > seem like 128 pitch requirement would be needed -- has it been tested?
> 
> The XXX was left in the patch accidentally. 128-alignment has been tested, and was
> found unnecessary.

Looks good to me, gets the Reviewed-by: Eric Anholt <eric@anholt.net>

[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Mesa-dev] [PATCH] intel: Fix stencil buffer to be W tiled
  2011-07-18  8:20   ` Paul Menzel
@ 2011-07-19 20:03     ` Chad Versace
  0 siblings, 0 replies; 18+ messages in thread
From: Chad Versace @ 2011-07-19 20:03 UTC (permalink / raw)
  To: Paul Menzel; +Cc: mesa-dev, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 446 bytes --]

On 07/18/2011 01:20 AM, Paul Menzel wrote:
> Am Montag, den 18.07.2011, 00:55 -0700 schrieb Chad Versace:
> There are alignment/white space issues above.
> 
>> +   unsigned stride = irb->region->pitch;				\
>> +   unsigned height = 2 * irb->region->height;				\
>> +   bool flip = rb->Name == 0;						\
> 
> […]
> 
> 
> Thanks,
> 
> Paul
> 

Thanks. Fix whitespace in both patches.

-- 
Chad Versace
chad@chad-versace.us


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-07-19 20:03 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-18  7:55 [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
2011-07-18  7:55 ` [PATCH] dri: Do not tile stencil buffer Chad Versace
2011-07-18  8:15   ` Paul Menzel
2011-07-18 18:45   ` [Mesa-dev] " Ian Romanick
2011-07-18 19:28     ` Chad Versace
2011-07-18 20:58       ` Ian Romanick
2011-07-18  7:55 ` [PATCH] intel: Fix stencil buffer to be W tiled Chad Versace
2011-07-18  8:20   ` Paul Menzel
2011-07-19 20:03     ` [Mesa-dev] " Chad Versace
2011-07-18 15:57   ` Eric Anholt
2011-07-18 21:05     ` [Mesa-dev] " Chad Versace
2011-07-19  0:00     ` Chad Versace
2011-07-19 15:34       ` Eric Anholt
2011-07-18 18:49   ` Ian Romanick
2011-07-18 20:54     ` [Mesa-dev] " Chad Versace
2011-07-18 21:02       ` Ian Romanick
2011-07-18 21:24         ` Chad Versace
2011-07-18 22:34           ` Ian Romanick

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).