From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ben Widawsky <ben@bwidawsk.net>
Subject: Re: [RFC] drm/i915: use PIPE_CONTROL for flushing on
 gen6+
Date: Tue, 30 Aug 2011 16:11:29 -0700
Message-ID: <20110830231129.GA7274@bolo_yeung.jf.intel.com>
References: <20110812111845.339aab04@jbarnes-desktop>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org>
Received: from cloud01.chad-versace.us (184-106-247-128.static.cloud-ips.com
	[184.106.247.128])
	by gabe.freedesktop.org (Postfix) with ESMTP id 14F459E8CF
	for <intel-gfx@lists.freedesktop.org>;
	Tue, 30 Aug 2011 16:12:39 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <20110812111845.339aab04@jbarnes-desktop>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On Fri, Aug 12, 2011 at 11:18:45AM -0700, Jesse Barnes wrote:
>  	int space = (ring->head & HEAD_ADDR) - (ring->tail + 8);
> @@ -123,6 +133,112 @@ render_ring_flush(struct intel_ring_buffer *ring,
>  	return 0;
>  }
>  
> +/**
> + * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> + * implementing two workarounds on gen6.  From section 1.4.7.1
> + * "PIPE_CONTROL" of the Sandy Bridge PRM volume 2 part 1:
> + *
> + * [DevSNB-C+{W/A}] Before any depth stall flush (including those
> + * produced by non-pipelined state commands), software needs to first
> + * send a PIPE_CONTROL with no bits set except Post-Sync Operation !=
> + * 0.
> + *
> + * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache Flush Enable
> + * =1, a PIPE_CONTROL with any non-zero post-sync-op is required.
> + *
> + * And the workaround for these two requires this workaround first:
> + *
> + * [Dev-SNB{W/A}]: Pipe-control with CS-stall bit set must be sent
> + * BEFORE the pipe-control with a post-sync op and no write-cache
> + * flushes.
> + *
> + * And this last workaround is tricky because of the requirements on
> + * that bit.  From section 1.4.7.2.3 "Stall" of the Sandy Bridge PRM
> + * volume 2 part 1:
> + *
> + *     "1 of the following must also be set:
> + *      - Render Target Cache Flush Enable ([12] of DW1)
> + *      - Depth Cache Flush Enable ([0] of DW1)
> + *      - Stall at Pixel Scoreboard ([1] of DW1)
> + *      - Depth Stall ([13] of DW1)
> + *      - Post-Sync Operation ([13] of DW1)
> + *      - Notify Enable ([8] of DW1)"
> + *
> + * The cache flushes require the workaround flush that triggered this
> + * one, so we can't use it.  Depth stall would trigger the same.
> + * Post-sync nonzero is what triggered this second workaround, so we
> + * can't use that one either.  Notify enable is IRQs, which aren't
> + * really our business.  That leaves only stall at scoreboard.
> + */
> +static int
> +intel_emit_post_sync_nonzero_flush(struct intel_ring_buffer *ring)
> +{
> +	struct pipe_control *pc = ring->private;
> +	u32 scratch_addr = pc->gtt_offset + 128;
> +	int ret;
> +
> +
> +	ret = intel_ring_begin(ring, 6);
> +	if (ret)
> +		return ret;
> +
> +	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL_GEN6);
> +	intel_ring_emit(ring, PIPE_CONTROL_CS_STALL |
> +			PIPE_CONTROL_STALL_AT_SCOREBOARD);
> +	intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT); /* address */
> +	intel_ring_emit(ring, 0); /* low dword */
> +	intel_ring_emit(ring, 0); /* high dword */
> +	intel_ring_emit(ring, MI_NOOP);
> +	intel_ring_advance(ring);
> +
> +	ret = intel_ring_begin(ring, 6);
> +	if (ret)
> +		return ret;
> +
> +	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL_GEN6);
> +	intel_ring_emit(ring, PIPE_CONTROL_QW_WRITE);
> +	intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT); /* address */
> +	intel_ring_emit(ring, 0);
> +	intel_ring_emit(ring, 0);
> +	intel_ring_emit(ring, MI_NOOP);
> +	intel_ring_advance(ring);
> +
> +	return 0;
> +}
> +
> +static int
> +gen6_render_ring_flush(struct intel_ring_buffer *ring,
> +                         u32 invalidate_domains, u32 flush_domains)
> +{
> +	u32 flags = 0;
> +	struct pipe_control *pc = ring->private;
> +	u32 scratch_addr = pc->gtt_offset + 128;
> +	int ret;
> +
> +	/* Force SNB workarounds for PIPE_CONTROL flushes */
> +	intel_emit_post_sync_nonzero_flush(ring);
> +
> +	/* Just flush everything for now */
> +	flags |= PIPE_CONTROL_WC_FLUSH;
> +	flags |= PIPE_CONTROL_IS_FLUSH;
> +	flags |= PIPE_CONTROL_TC_FLUSH;
> +	flags |= PIPE_CONTROL_DEPTH_FLUSH;
> +
> +	ret = intel_ring_begin(ring, 6);
> +	if (ret)
> +		return ret;
> +
> +	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL_GEN6);
> +	intel_ring_emit(ring, flags);
> +	intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(ring, 0); /* lower dword */
> +	intel_ring_emit(ring, 0); /* upper dword */
> +	intel_ring_emit(ring, MI_NOOP);
> +	intel_ring_advance(ring);
> +
> +	return 0;
> +}
> +
>  static void ring_write_tail(struct intel_ring_buffer *ring,
>  			    u32 value)
>  {

While I'm not convinced this is broken, the final flush emits an address
and two data dwords without setting a post-sync write. I think you want
to either specify a qword write (3 << 14) or use n=2 for the pipe
control, i.e.:

intel_ring_emit(ring, GFX_OP_PIPE_CONTROL_GEN);
intel_ring_emit(ring, flags);
intel_ring_emit(ring, 0); /* ignored if no write */
intel_ring_emit(ring, 0); /* ignored if no write */
intel_ring_advance(ring);
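
To spell out the dword counting behind the n=2 variant, here is a rough
standalone sketch. It assumes n is the length field in the command
header (total dwords minus 2), as the existing GFX_OP_PIPE_CONTROL(len)
macro encodes it. SKETCH_PIPE_CONTROL, struct mock_ring, mock_emit and
mock_emit_short_flush are made-up names for illustration only, not
driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed header encoding, modelled on GFX_OP_PIPE_CONTROL(len):
 * the low bits carry (total dwords - 2).
 */
#define SKETCH_PIPE_CONTROL(len) \
	((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((len) - 2))

/* Hypothetical stand-in for the ring buffer, only used to count dwords. */
struct mock_ring {
	uint32_t dw[16];
	size_t used;
};

static void mock_emit(struct mock_ring *ring, uint32_t value)
{
	/* Refuse to write past the end of the mock buffer. */
	assert(ring->used < sizeof(ring->dw) / sizeof(ring->dw[0]));
	ring->dw[ring->used++] = value;
}

/*
 * Four-dword PIPE_CONTROL with no post-sync write: header, flags,
 * then address and data dwords that are ignored without a write.
 * Returns the number of dwords emitted.
 */
static size_t mock_emit_short_flush(struct mock_ring *ring, uint32_t flags)
{
	size_t start = ring->used;

	mock_emit(ring, SKETCH_PIPE_CONTROL(4));
	mock_emit(ring, flags);
	mock_emit(ring, 0); /* ignored if no write */
	mock_emit(ring, 0); /* ignored if no write */

	return ring->used - start;
}
```

Under that assumption the short form takes four dwords, so the matching
intel_ring_begin() call would presumably reserve 4 instead of 6 and the
trailing MI_NOOP would no longer be needed for padding.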

Ben