* [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin()
@ 2015-06-08 18:51 Dave Gordon
  2015-06-13 20:16 ` shuang.he
  2015-06-15  9:15 ` Chris Wilson
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Gordon @ 2015-06-08 18:51 UTC
  To: intel-gfx

The original idea of preallocating the OLR was implemented in

> 9d773091 drm/i915: Preallocate next seqno before touching the ring

and the sequence of operations was to allocate the OLR, then wrap past
the end of the ring if necessary, then wait for space if necessary.
But subsequently intel_ring_begin() was refactored, in

> 304d695 drm/i915: Flush outstanding requests before allocating new seqno

to ensure that pending work that might need to be flushed used the old
and not the newly-allocated request. This changed the sequence to wrap
and/or wait, then allocate, although the comment still said
	/* Preallocate the olr before touching the ring */
which was no longer true as intel_wrap_ring_buffer() touches the ring.

The reversal didn't introduce any problems until the introduction of
dynamic pinning, in

> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand

With that came the possibility that the ringbuffer might not be pinned
to the GTT or mapped into CPU address space when intel_ring_begin()
is called. It gets pinned when the request is allocated, so it's now
important that this comes before *anything* that can write into the
ringbuffer, specifically intel_wrap_ring_buffer(), as this will fault if
(a) the ringbuffer happens not to be mapped, and (b) tail happens to be
sufficiently close to the end of the ring to trigger wrapping.
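
As an illustration of the ordering problem described above, here is a minimal
sketch in plain C. It is a toy model, not the i915 code: struct toy_ring and
the toy_*() helpers are hypothetical names, request allocation is reduced to
"map the ring", and the fault is modelled as a false return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_DWORDS 16u

/* Hypothetical stand-in for a ringbuffer; not the i915 structures. */
struct toy_ring {
	uint32_t storage[TOY_RING_DWORDS];
	uint32_t *vaddr;	/* NULL while the ring is not pinned/mapped */
	uint32_t tail;		/* index of the next dword to write */
};

/* Models request allocation, which is what pins and maps the ring. */
static void toy_request_alloc(struct toy_ring *r)
{
	r->vaddr = r->storage;
}

/*
 * Models wrapping: pad the rest of the ring with no-ops (0 here) and
 * restart at the beginning. Returns false where real code would fault,
 * i.e. when the ring is not mapped.
 */
static bool toy_wrap(struct toy_ring *r)
{
	assert(r->tail <= TOY_RING_DWORDS);
	if (!r->vaddr)
		return false;
	while (r->tail < TOY_RING_DWORDS)
		r->vaddr[r->tail++] = 0;
	r->tail = 0;
	return true;
}

/* Begin a write of num_dwords; alloc_first selects the ordering. */
static bool toy_ring_begin(struct toy_ring *r, uint32_t num_dwords,
			   bool alloc_first)
{
	bool ok = true;

	if (alloc_first)
		toy_request_alloc(r);
	if (r->tail + num_dwords > TOY_RING_DWORDS)
		ok = toy_wrap(r);
	if (!alloc_first)
		toy_request_alloc(r);
	return ok;
}
```

In this model, toy_ring_begin() on an unmapped ring with tail near the end
fails when alloc_first is false and succeeds when it is true. With tail far
from the end, both orderings succeed, which matches conditions (a) and (b).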

The original rationale for this reversal seems to no longer apply,
as we shouldn't ever have anything in the ringbuffer which is not
associated with a specific request, and therefore shouldn't have anything
to flush.  So it should now be safe to reinstate the original sequence
of allocate-wrap-wait :)

Between the original sequence swap and now, the ringbuffer code got
cloned to become the execbuffer code, in

> 82e104c drm/i915/bdw: New logical ring submission mechanism

So now we have to fix it in both paths ...

Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |    6 +++---
 drivers/gpu/drm/i915/intel_ringbuffer.c |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9b74ffa..4d82d9b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -809,12 +809,12 @@ static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 	if (ret)
 		return ret;
 
-	ret = logical_ring_prepare(ringbuf, ctx, num_dwords * sizeof(uint32_t));
+	/* Preallocate the olr before touching the ring */
+	ret = i915_gem_request_alloc(ring, ctx);
 	if (ret)
 		return ret;
 
-	/* Preallocate the olr before touching the ring */
-	ret = i915_gem_request_alloc(ring, ctx);
+	ret = logical_ring_prepare(ringbuf, ctx, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b70d25b..9f7a4d2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2222,12 +2222,12 @@ int intel_ring_begin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
+	/* Preallocate the olr before touching the ring */
+	ret = i915_gem_request_alloc(ring, ring->default_context);
 	if (ret)
 		return ret;
 
-	/* Preallocate the olr before touching the ring */
-	ret = i915_gem_request_alloc(ring, ring->default_context);
+	ret = __intel_ring_prepare(ring, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
 
-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin()
  2015-06-08 18:51 [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin() Dave Gordon
@ 2015-06-13 20:16 ` shuang.he
  2015-06-15  9:15 ` Chris Wilson
  1 sibling, 0 replies; 8+ messages in thread
From: shuang.he @ 2015-06-13 20:16 UTC
  To: shuang.he, lei.a.liu, intel-gfx, david.s.gordon

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6557
-------------------------------------Summary-------------------------------------
Platform    Delta    drm-intel-nightly    Series Applied
PNV                  276/276              276/276
ILK                  303/303              303/303
SNB                  312/312              312/312
IVB                  343/343              343/343
BYT                  287/287              287/287
BDW                  321/321              321/321
-------------------------------------Detailed-------------------------------------
Platform    Test     drm-intel-nightly    Series Applied
Note: Pay particular attention to lines starting with '*'.

* Re: [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin()
  2015-06-08 18:51 [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin() Dave Gordon
  2015-06-13 20:16 ` shuang.he
@ 2015-06-15  9:15 ` Chris Wilson
  2015-06-15 18:11   ` Dave Gordon
  1 sibling, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2015-06-15  9:15 UTC
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 08, 2015 at 07:51:36PM +0100, Dave Gordon wrote:
> The original idea of preallocating the OLR was implemented in
> 
> > 9d773091 drm/i915: Preallocate next seqno before touching the ring
> 
> and the sequence of operations was to allocate the OLR, then wrap past
> the end of the ring if necessary, then wait for space if necessary.
> But subsequently intel_ring_begin() was refactored, in
> 
> > 304d695 drm/i915: Flush outstanding requests before allocating new seqno
> 
> to ensure that pending work that might need to be flushed used the old
> and not the newly-allocated request. This changed the sequence to wrap
> and/or wait, then allocate, although the comment still said
> 	/* Preallocate the olr before touching the ring */
> which was no longer true as intel_wrap_ring_buffer() touches the ring.
> 
> The reversal didn't introduce any problems until the introduction of
> dynamic pinning, in
> 
> > 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
> 
> With that came the possibility that the ringbuffer might not be pinned
> to the GTT or mapped into CPU address space when intel_ring_begin()
> is called. It gets pinned when the request is allocated, so it's now
> important that this comes before *anything* that can write into the
> ringbuffer, specifically intel_wrap_ring_buffer(), as this will fault if
> (a) the ringbuffer happens not to be mapped, and (b) tail happens to be
> sufficiently close to the end of the ring to trigger wrapping.
> 
> The original rationale for this reversal seems to no longer apply,
> as we shouldn't ever have anything in the ringbuffer which is not
> associated with a specific request, and therefore shouldn't have anything
> to flush.  So it should now be safe to reinstate the original sequence
> of allocate-wrap-wait :)

It still applies. If you submit say 1024 interrupted execbuffers they
all share the same request. Then so does the 1025th. Except the 1025th
(for the sake of argument) requires extra space on the ring. To make
that space it finishes the only request (since all 1024 are one and the
same), then continues onwards, blithely unaware it just lost the
olr/seqno.

To fix this requires request create/commit semantics, where the request
create manages the pinning of the context for itself, and also imposes
the limitation that a single request cannot occupy the full ringbuffer.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin()
  2015-06-15  9:15 ` Chris Wilson
@ 2015-06-15 18:11   ` Dave Gordon
  2015-06-15 20:41     ` Chris Wilson
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Gordon @ 2015-06-15 18:11 UTC
  To: Chris Wilson, intel-gfx, Alex Dai

On 15/06/15 10:15, Chris Wilson wrote:
> On Mon, Jun 08, 2015 at 07:51:36PM +0100, Dave Gordon wrote:
>> The original idea of preallocating the OLR was implemented in
>>
>>> 9d773091 drm/i915: Preallocate next seqno before touching the ring
>>
>> and the sequence of operations was to allocate the OLR, then wrap past
>> the end of the ring if necessary, then wait for space if necessary.
>> But subsequently intel_ring_begin() was refactored, in
>>
>>> 304d695 drm/i915: Flush outstanding requests before allocating new seqno
>>
>> to ensure that pending work that might need to be flushed used the old
>> and not the newly-allocated request. This changed the sequence to wrap
>> and/or wait, then allocate, although the comment still said
>> 	/* Preallocate the olr before touching the ring */
>> which was no longer true as intel_wrap_ring_buffer() touches the ring.
>>
>> The reversal didn't introduce any problems until the introduction of
>> dynamic pinning, in
>>
>>> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
>>
>> With that came the possibility that the ringbuffer might not be pinned
>> to the GTT or mapped into CPU address space when intel_ring_begin()
>> is called. It gets pinned when the request is allocated, so it's now
>> important that this comes before *anything* that can write into the
>> ringbuffer, specifically intel_wrap_ring_buffer(), as this will fault if
>> (a) the ringbuffer happens not to be mapped, and (b) tail happens to be
>> sufficiently close to the end of the ring to trigger wrapping.
>>
>> The original rationale for this reversal seems to no longer apply,
>> as we shouldn't ever have anything in the ringbuffer which is not
>> associated with a specific request, and therefore shouldn't have anything
>> to flush.  So it should now be safe to reinstate the original sequence
>> of allocate-wrap-wait :)
> 
> It still applies. If you submit say 1024 interrupted execbuffers they

What is an interrupted execbuffer? AFAICT we hold the struct_mutex while
stuffing the ringbuffer so we can only ever be in the process of adding
instructions to one ringbuffer at a time, and we don't (now) interleave
any flip commands (execlists mode requires mmio flip). Is there still
something that just adds random stuff to someone else's OLR?

> all share the same request. Then so does the 1025th. Except the 1025th
> (for the sake of argument) requires extra space on the ring. To make
> that space it finishes the only request (since all 1024 are one and the
> same), then continues onwards, blithely unaware it just lost the
> olr/seqno.
> 
> To fix this requires request create/commit semantics, where the request
> create manages the pinning of the context for itself, and also imposes
> the limitation that a single request cannot occupy the full ringbuffer.
> -Chris

Well, I'd very much like create/commit semantics, so that we can roll
back any sequence that breaks rather than leaving half-baked command streams.

I'm pretty sure that no single request can occupy the full ringbuffer
though. Last time I measured it, the maximal sequence for any single
request was ~190 dwords, considerably less than 1KB, and not enough to
fill even the small (4-page) rings used with GuC submission.
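
To put rough numbers on that, a small sketch in C. It assumes 4 KiB pages
(the page size is not stated above) and takes the ~190-dword figure at face
value; TOY_PAGE_BYTES and the toy_*() helpers are hypothetical names used
only for this arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the page size is not stated in the thread. */
#define TOY_PAGE_BYTES 4096u

/* A dword is 32 bits, so 4 bytes. */
static uint32_t toy_dwords_to_bytes(uint32_t dwords)
{
	return dwords * (uint32_t)sizeof(uint32_t);
}

/* How many maximal requests of request_dwords fit in a ring of ring_pages. */
static uint32_t toy_requests_per_ring(uint32_t ring_pages,
				      uint32_t request_dwords)
{
	assert(request_dwords > 0);
	return (ring_pages * TOY_PAGE_BYTES) /
	       toy_dwords_to_bytes(request_dwords);
}
```

Under those assumptions, 190 dwords is 760 bytes, and a 4-page (16384-byte)
ring has room for 21 such requests.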

.Dave.

* Re: [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin()
  2015-06-15 18:11   ` Dave Gordon
@ 2015-06-15 20:41     ` Chris Wilson
  2015-06-16 11:03       ` Dave Gordon
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2015-06-15 20:41 UTC
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Jun 15, 2015 at 07:11:37PM +0100, Dave Gordon wrote:
> > It still applies. If you submit say 1024 interrupted execbuffers they
> 
> What is an interrupted execbuffer? AFAICT we hold the struct_mutex while
> stuffing the ringbuffer so we can only ever be in the process of adding
> instructions to one ringbuffer at a time, and we don't (now) interleave
> any flip commands (execlists mode requires mmio flip). Is there still
> something that just adds random stuff to someone else's OLR?

Write commands to the stream, fail to add the request because the wait
for ring space fails due to a pending signal. Repeat until the entire
ring is filled by a single pending request.
-Chris
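
The failure mode described here can be sketched as a toy model in C. This is
an illustration under simplifying assumptions, not the driver logic:
struct toy_state and the toy_*() helpers are hypothetical, every execbuffer
is assumed to be interrupted after writing its commands, and the model uses
the allocate-first ordering that the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one ring whose contents all belong to a single OLR. */
struct toy_state {
	uint32_t size_dwords;	/* ring capacity */
	uint32_t olr_dwords;	/* dwords written under the outstanding request */
	bool have_olr;		/* is there an outstanding lazy request? */
};

/*
 * One interrupted execbuffer under the allocate-first ordering: take the
 * OLR, make space if needed, write commands, then fail before the request
 * is added. Returns true if making space retired the OLR just taken.
 */
static bool toy_interrupted_execbuf(struct toy_state *s, uint32_t dwords)
{
	bool lost_olr = false;

	s->have_olr = true;	/* allocate, or reuse, the OLR first */

	if (s->olr_dwords + dwords > s->size_dwords) {
		/* The only request that can be finished to free space is the OLR. */
		s->olr_dwords = 0;
		s->have_olr = false;
		lost_olr = true;
	}

	s->olr_dwords += dwords;	/* commands land in the ring regardless */
	return lost_olr;
}

/* Count submissions until one of them loses its OLR. */
static unsigned int toy_submissions_until_loss(uint32_t size_dwords,
					       uint32_t dwords)
{
	struct toy_state s = { .size_dwords = size_dwords };
	unsigned int n = 1;

	assert(dwords > 0 && dwords <= size_dwords);
	while (!toy_interrupted_execbuf(&s, dwords))
		n++;
	return n;
}
```

With a 1024-dword ring and one dword per interrupted submission,
toy_submissions_until_loss(1024, 1) returns 1025: the first 1024 calls fill
the ring under one request, and the next one retires that request to make
room and carries on writing without an OLR.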

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin()
  2015-06-15 20:41     ` Chris Wilson
@ 2015-06-16 11:03       ` Dave Gordon
  2015-06-16 12:18         ` Daniel Vetter
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Gordon @ 2015-06-16 11:03 UTC
  To: Chris Wilson, intel-gfx

On 15/06/15 21:41, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:11:37PM +0100, Dave Gordon wrote:
>>> It still applies. If you submit say 1024 interrupted execbuffers they
>>
>> What is an interrupted execbuffer? AFAICT we hold the struct_mutex while
>> stuffing the ringbuffer so we can only ever be in the process of adding
>> instructions to one ringbuffer at a time, and we don't (now) interleave
>> any flip commands (execlists mode requires mmio flip). Is there still
>> something that just adds random stuff to someone else's OLR?
> 
> Write commands to the stream, fail to add the request because the wait
> for ring space fails due to a pending signal. Repeat until the entire
> ring is filled by a single pending request.
> -Chris

Uuuurgh ... I always said there was something wrong with leaving partial
command streams in the ringbuffer :(

In the Android version we make sure that can't happen by checking that
there's going to be enough space for all the commands generated by the
request before we start writing into the ringbuffer, so no ring_begin()
call thereafter can ever have to wait for more space.

Of course, having a global lock doesn't help. With multiple ringbuffers
per engine, one should really only need a per-ringbuffer lock for
writing into the ringbuffer ... and we should reset the tail pointer if
writing the request is interrupted or aborted.
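
A minimal sketch of those two ideas in C, checking for space up front and
resetting the tail on abort. It is not taken from the Android tree:
struct toy_ring, the toy_*() helpers, the return codes and the abort_after
parameter (which simulates an interruption part-way through) are all
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_DWORDS 32u

/* Hypothetical ring; head is the consumer index, tail the producer index. */
struct toy_ring {
	uint32_t buf[TOY_RING_DWORDS];
	uint32_t head;
	uint32_t tail;
};

/* Free dwords, keeping one slot empty so that head == tail means "empty". */
static uint32_t toy_ring_space(const struct toy_ring *r)
{
	return (r->head + TOY_RING_DWORDS - r->tail - 1) % TOY_RING_DWORDS;
}

/*
 * Write a whole request or nothing at all.
 * Returns 0 on success, -1 if the request does not fit (ring untouched),
 * -2 if writing was aborted (tail rolled back to where it started).
 * abort_after simulates an interruption before the dword at that index;
 * pass UINT32_MAX for no interruption.
 */
static int toy_emit_request(struct toy_ring *r, const uint32_t *cmds,
			    uint32_t n, uint32_t abort_after)
{
	uint32_t saved_tail = r->tail;
	uint32_t i;

	assert(r->tail < TOY_RING_DWORDS);

	/* Check for space once, before the first write. */
	if (n > toy_ring_space(r))
		return -1;

	for (i = 0; i < n; i++) {
		if (i == abort_after) {
			r->tail = saved_tail;	/* discard the partial stream */
			return -2;
		}
		r->buf[r->tail] = cmds[i];
		r->tail = (r->tail + 1) % TOY_RING_DWORDS;
	}
	return 0;
}
```

Because the space check happens before any dword is written, nothing inside
the write loop ever has to wait, and an aborted request leaves the tail
exactly where it was.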

.Dave.

* Re: [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin()
  2015-06-16 11:03       ` Dave Gordon
@ 2015-06-16 12:18         ` Daniel Vetter
  2015-06-16 12:24           ` Chris Wilson
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Vetter @ 2015-06-16 12:18 UTC
  To: Dave Gordon; +Cc: intel-gfx

On Tue, Jun 16, 2015 at 12:03:45PM +0100, Dave Gordon wrote:
> On 15/06/15 21:41, Chris Wilson wrote:
> > On Mon, Jun 15, 2015 at 07:11:37PM +0100, Dave Gordon wrote:
> >>> It still applies. If you submit say 1024 interrupted execbuffers they
> >>
> >> What is an interrupted execbuffer? AFAICT we hold the struct_mutex while
> >> stuffing the ringbuffer so we can only ever be in the process of adding
> >> instructions to one ringbuffer at a time, and we don't (now) interleave
> >> any flip commands (execlists mode requires mmio flip). Is there still
> >> something that just adds random stuff to someone else's OLR?
> > 
> > Write commands to the stream, fail to add the request because the wait
> > for ring space fails due to a pending signal. Repeat until the entire
> > ring is filled by a single pending request.
> > -Chris
> 
> Uuuurgh ... I always said there was something wrong with leaving partial
> command streams in the ringbuffer :(
> 
> In the Android version we make sure that can't happen by checking that
> there's going to be enough space for all the commands generated by the
> request before we start writing into the ringbuffer, so no ring_begin()
> call thereafter can ever have to wait for more space.

Yeah that's the guarantee that the olr removal series should finally put
into place by pre-reserving sufficient amounts of ringspace to guarantee
we can put the execbuf in fully or not at all.

> Of course, having a global lock doesn't help. With multiple ringbuffers
> per engine, one should really only need a per-ringbuffer lock for
> writing into the ringbuffer ... and we should reset the tail pointer if
> writing the request is interrupted or aborted.

Maybe someone should look at per-buffer locks before trying to split up
the low-level hw locks ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Reinstate order of operations in {intel, logical}_ring_begin()
  2015-06-16 12:18         ` Daniel Vetter
@ 2015-06-16 12:24           ` Chris Wilson
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Wilson @ 2015-06-16 12:24 UTC
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Jun 16, 2015 at 02:18:55PM +0200, Daniel Vetter wrote:
> Maybe someone should look at per-buffer locks before trying to split up
> the low-level hw locks ;-)

Per-vm, then per-buffer. Adding more locked operations to execbuf is
scary though: with some workloads, perf highlights the cost of the atomic
for an uncontended mutex as being where the majority of the time is
going.

At this point I am questioning perf, but it is the last remaining locked
instruction in that path...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre