* [PATCH v2] Resolve issues with ringbuffer space management [not found] <1433789441-8295-1-git-send-email-david.s.gordon@intel.com> @ 2015-06-12 17:09 ` Dave Gordon 2015-06-12 17:09 ` [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations Dave Gordon ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Dave Gordon @ 2015-06-12 17:09 UTC (permalink / raw) To: intel-gfx Updates and supersedes the referenced patch, "Reinstate order of operations in {intel,logical}_ring_begin()" _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations 2015-06-12 17:09 ` [PATCH v2] Resolve issues with ringbuffer space management Dave Gordon @ 2015-06-12 17:09 ` Dave Gordon 2015-06-12 18:12 ` Chris Wilson 2015-06-12 17:09 ` [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() Dave Gordon 2015-06-12 20:25 ` (no subject) Dave Gordon 2 siblings, 1 reply; 15+ messages in thread From: Dave Gordon @ 2015-06-12 17:09 UTC (permalink / raw) To: intel-gfx When calculating the available space in a ringbuffer, we should use the effective_size rather than the true size of the ring. v2: rebase to latest drm-intel-nightly v3: rebase to latest drm-intel-nightly Signed-off-by: Dave Gordon <david.s.gordon@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 5 +++-- drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 9b74ffa..454e836 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -699,7 +699,7 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf, /* Would completion of this request free enough space? 
*/ space = __intel_ring_space(request->postfix, ringbuf->tail, - ringbuf->size); + ringbuf->effective_size); if (space >= bytes) break; } @@ -711,7 +711,8 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf, if (ret) return ret; - ringbuf->space = space; + /* Update ring space after wait+retire */ + intel_ring_update_space(ringbuf); return 0; } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index b70d25b..a3406b2 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -66,7 +66,8 @@ void intel_ring_update_space(struct intel_ringbuffer *ringbuf) } ringbuf->space = __intel_ring_space(ringbuf->head & HEAD_ADDR, - ringbuf->tail, ringbuf->size); + ringbuf->tail, + ringbuf->effective_size); } int intel_ring_space(struct intel_ringbuffer *ringbuf) @@ -2117,8 +2118,9 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n) return 0; list_for_each_entry(request, &ring->request_list, list) { + /* Would completion of this request free enough space? */ space = __intel_ring_space(request->postfix, ringbuf->tail, - ringbuf->size); + ringbuf->effective_size); if (space >= n) break; } @@ -2130,7 +2132,8 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n) if (ret) return ret; - ringbuf->space = space; + /* Update ring space after wait+retire */ + intel_ring_update_space(ringbuf); return 0; } -- 1.7.9.5 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
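A minimal sketch of what switching from size to effective_size does to the space calculation. This is a standalone model, not the driver source: ring_space() is a hypothetical stand-in for __intel_ring_space(), and RING_FREE_SPACE and the sizes quoted below are assumptions chosen only for illustration.

```c
#include <assert.h>

/* Assumed reserve kept between tail and head; the value is illustrative. */
#define RING_FREE_SPACE 64

/*
 * Bytes available to the writer when the consumer is at 'head' and the
 * producer is at 'tail', in a ring whose usable length is 'size' bytes.
 */
static int ring_space(int head, int tail, int size)
{
	int space;

	assert(size > 0 && head >= 0 && tail >= 0);

	space = head - tail;
	if (space <= 0)
		space += size;	/* free region wraps past the end */
	return space - RING_FREE_SPACE;
}
```

With a 4096-byte ring and an assumed effective_size of 3968, head = 512 and tail = 1024 give 3520 bytes when measured against the true size but 3392 against the effective size. When the free region does not wrap (head = 1024, tail = 512) the size argument does not matter and both give 448.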
* Re: [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations 2015-06-12 17:09 ` [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations Dave Gordon @ 2015-06-12 18:12 ` Chris Wilson 2015-06-12 19:55 ` Dave Gordon 0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2015-06-12 18:12 UTC (permalink / raw)
To: Dave Gordon; +Cc: intel-gfx

On Fri, Jun 12, 2015 at 06:09:07PM +0100, Dave Gordon wrote:
> When calculating the available space in a ringbuffer, we should
> use the effective_size rather than the true size of the ring.
>
> v2: rebase to latest drm-intel-nightly
> v3: rebase to latest drm-intel-nightly
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 5 +++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++---
>  2 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9b74ffa..454e836 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -699,7 +699,7 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
>
>  		/* Would completion of this request free enough space? */
>  		space = __intel_ring_space(request->postfix, ringbuf->tail,
> -					   ringbuf->size);
> +					   ringbuf->effective_size);
>  		if (space >= bytes)
>  			break;
>  	}
> @@ -711,7 +711,8 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
>  	if (ret)
>  		return ret;
>
> -	ringbuf->space = space;
> +	/* Update ring space after wait+retire */
> +	intel_ring_update_space(ringbuf);

Does the function not do what it says on the tin? At least make it seem
like you are explaining your reasoning, not documenting the following
function.

	/*
	 * Having waited for the request, query the HEAD of the most recent
	 * retired request and use that for our space calculations.
	 */

However, that makes an incorrect assumption about the waiter. Given that
the current code is written such that ringbuf->last_retired_head =
request->postfix and that space is identical to the repeated
calculation, what is your intention exactly?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations 2015-06-12 18:12 ` Chris Wilson @ 2015-06-12 19:55 ` Dave Gordon 2015-06-12 20:41 ` Chris Wilson 0 siblings, 1 reply; 15+ messages in thread From: Dave Gordon @ 2015-06-12 19:55 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 12/06/15 19:12, Chris Wilson wrote: > On Fri, Jun 12, 2015 at 06:09:07PM +0100, Dave Gordon wrote: >> When calculating the available space in a ringbuffer, we should >> use the effective_size rather than the true size of the ring. >> >> v2: rebase to latest drm-intel-nightly >> v3: rebase to latest drm-intel-nightly >> >> Signed-off-by: Dave Gordon <david.s.gordon@intel.com> >> --- >> drivers/gpu/drm/i915/intel_lrc.c | 5 +++-- >> drivers/gpu/drm/i915/intel_ringbuffer.c | 9 ++++++--- >> 2 files changed, 9 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c >> index 9b74ffa..454e836 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -699,7 +699,7 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf, >> >> /* Would completion of this request free enough space? */ >> space = __intel_ring_space(request->postfix, ringbuf->tail, >> - ringbuf->size); >> + ringbuf->effective_size); >> if (space >= bytes) >> break; >> } >> @@ -711,7 +711,8 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf, >> if (ret) >> return ret; >> >> - ringbuf->space = space; >> + /* Update ring space after wait+retire */ >> + intel_ring_update_space(ringbuf); > > Does the function not do what it says on the tin? At least make it seem > like you are explaining your reasoning, not documenting the following > function. > > /* > * Having waited for the request, query the HEAD of most recent retired > * request and use that for our space calcuations. 
> */

That's what the comment means; the important bit is mentioning "retire",
since it's not explicitly called from here but only via wait_request().
We could say,

	/*
	 * After waiting, at least one request must have completed
	 * and been retired, so make sure that the ringbuffer's
	 * space calculations are up to date
	 */
	intel_ring_update_space(ringbuf);
	BUG_ON(ringbuf->space < bytes);

> However, that makes an incorrect assumption about the waiter. Given that
> the current code is written such that ringbuf->last_retired_head =
> request->postfix and that space is identical to the repeated
> calculation, what is your intention exactly?
> -Chris

Three factors:

* firstly, a preference: I find it logically simpler that there should
be one and only one piece of code that writes into ringbuf->space. One
doesn't then have to reason about whether two different calculations are
in fact equivalent.

* secondly, for future proofing: although wait_request() now retires
only up to the waited-for request, that wasn't always the case, nor is
there any reason why it could not in future opportunistically retire
additional requests that have completed while it was waiting.

* thirdly, for correctness: using the function has the additional effect
of consuming the last_retired_head value set by retire_request. If this
is not done, a later call to intel_ring_space() may become confused,
with the result that 'head' (and therefore 'space') will be incorrectly
updated.

.Dave.
* Re: [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations 2015-06-12 19:55 ` Dave Gordon @ 2015-06-12 20:41 ` Chris Wilson 0 siblings, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2015-06-12 20:41 UTC (permalink / raw)
To: Dave Gordon; +Cc: intel-gfx

On Fri, Jun 12, 2015 at 08:55:09PM +0100, Dave Gordon wrote:
> > However, that makes an incorrect assumption about the waiter. Given that
> > the current code is written such that ringbuf->last_retired_head =
> > request->postfix and that space is identical to the repeated
> > calculation, what is your intention exactly?
> > -Chris
>
> Three factors:
>
> * firstly, a preference: I find it logically simpler that there should
> be one and only one piece of code that writes into ringbuf->space. One
> doesn't then have to reason about whether two different calculations are
> in fact equivalent.

I find the opposite: since we compute how much space we want, I think it
is counterintuitive to suggest otherwise. You then need to assert that
the computed space is enough. By saying that if we wait until this
request we must have at least this much space available, and by only
using that space, there is no implicit knowledge.

> * secondly, for future proofing: although wait_request() now retires
> only up to the waited-for request, that wasn't always the case, nor is
> there any reason why it could not in future opportunistically retire
> additional requests that have completed while it was waiting.

Because there is a cost associated with every retirement. See above for
why being explicit is clearer.

> * thirdly, for correctness: using the function has the additional effect
> of consuming the last_retired_head value set by retire_request. If this
> is not done, a later call to intel_ring_space() may become confused,
> with the result that 'head' (and therefore 'space') will be incorrectly
> updated.

Eh. The code is still strictly correct. The biggest issue is that we
have multiple locations that decide how to interpret the amount of space
generated by completing the request. However, we want to keep request
retirement very simple since it is a hot function.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
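For readers following the exchange above, here is a minimal model of the two things intel_ring_update_space() is described as doing: consuming last_retired_head and recomputing space. It is a sketch under assumptions, not the driver code. The struct, the helper names, and the use of -1 as the "nothing pending" sentinel are hypothetical, and the reserved gap is left out for brevity. It does not settle which of the two positions is preferable; it only makes the state transitions concrete.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the ringbuffer bookkeeping. */
struct ring_model {
	int head;
	int tail;
	int effective_size;
	int last_retired_head;	/* -1 means "nothing pending" (assumed) */
	int space;
};

/* Free bytes between tail and head, ignoring any reserved gap. */
static int ring_space(int head, int tail, int size)
{
	int space = head - tail;

	if (space <= 0)
		space += size;
	return space;
}

/* Stand-in for retirement: only records where HEAD will be. */
static void retire_up_to(struct ring_model *r, int postfix)
{
	r->last_retired_head = postfix;
}

/* The single writer of r->space; also consumes last_retired_head. */
static void ring_update_space(struct ring_model *r)
{
	assert(r->effective_size > 0);

	if (r->last_retired_head != -1) {
		r->head = r->last_retired_head;
		r->last_retired_head = -1;
	}
	r->space = ring_space(r->head, r->tail, r->effective_size);
}
```

In this model, assigning r->space directly after a wait would leave last_retired_head set until some later call consumes it, which is the situation the third point above is concerned with.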
* [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() 2015-06-12 17:09 ` [PATCH v2] Resolve issues with ringbuffer space management Dave Gordon 2015-06-12 17:09 ` [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations Dave Gordon @ 2015-06-12 17:09 ` Dave Gordon 2015-06-12 18:05 ` Chris Wilson 2015-06-12 20:25 ` (no subject) Dave Gordon 2 siblings, 1 reply; 15+ messages in thread From: Dave Gordon @ 2015-06-12 17:09 UTC (permalink / raw) To: intel-gfx The original idea of preallocating the OLR was implemented in > 9d773091 drm/i915: Preallocate next seqno before touching the ring and the sequence of operations was to allocate the OLR, then wrap past the end of the ring if necessary, then wait for space if necessary. But subsequently intel_ring_begin() was refactored, in > 304d695 drm/i915: Flush outstanding requests before allocating new seqno to ensure that pending work that might need to be flushed used the old and not the newly-allocated request. This changed the sequence to wrap and/or wait, then allocate, although the comment still said /* Preallocate the olr before touching the ring */ which was no longer true as intel_wrap_ring_buffer() touches the ring. However, with the introduction of dynamic pinning, in > 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand came the possibility that the ringbuffer might not be pinned to the GTT or mapped into CPU address space when intel_ring_begin() is called. It gets pinned when the request is allocated, so it's now important that this comes *before* anything that can write into the ringbuffer, in this case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer happens not to be mapped, and (b) tail happens to be sufficiently close to the end of the ring to trigger wrapping. 
So the correct order is neither the original allocate-wait-pad-wait, nor the subsequent wait-pad-wait-allocate, but wait-allocate-pad, avoiding both the problems described in the two commits mentioned above. As a bonus, we eliminate the special case where a single ring_begin() might end up waiting twice (once to be able to wrap, and then again if that still hadn't actually freed enough space for the request). We just precalculate the total amount of space we'll need *including* any for padding the end of the ring and wait for that much in one go :) In the time since this code was written, it has all been cloned from the original ringbuffer model to become the execbuffer code, in > 82e104c drm/i915/bdw: New logical ring submission mechanism So now we have to fix it in both paths ... Signed-off-by: Dave Gordon <david.s.gordon@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 64 +++++++++++++++---------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++--------------- 2 files changed, 64 insertions(+), 63 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 454e836..3ef5fb6 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -740,39 +740,22 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf, execlists_context_queue(ring, ctx, ringbuf->tail, request); } -static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf, - struct intel_context *ctx) -{ - uint32_t __iomem *virt; - int rem = ringbuf->size - ringbuf->tail; - - if (ringbuf->space < rem) { - int ret = logical_ring_wait_for_space(ringbuf, ctx, rem); - - if (ret) - return ret; - } - - virt = ringbuf->virtual_start + ringbuf->tail; - rem /= 4; - while (rem--) - iowrite32(MI_NOOP, virt++); - - ringbuf->tail = 0; - intel_ring_update_space(ringbuf); - - return 0; -} - static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, struct intel_context *ctx, int bytes) { + int fill = 0; 
int ret; + /* + * If the request will not fit between 'tail' and the effective + * size of the ringbuffer, then we need to pad the end of the + * ringbuffer with NOOPs, then start the request at the top of + * the ring. This increases the total size that we need to check + * for by however much is left at the end of the ring ... + */ if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) { - ret = logical_ring_wrap_buffer(ringbuf, ctx); - if (unlikely(ret)) - return ret; + fill = ringbuf->size - ringbuf->tail; + bytes += fill; } if (unlikely(ringbuf->space < bytes)) { @@ -781,6 +764,28 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf, return ret; } + /* Ensure we have a request before touching the ring */ + if (!ringbuf->ring->outstanding_lazy_request) { + ret = i915_gem_request_alloc(ringbuf->ring, ctx); + if (ret) + return ret; + } + + if (unlikely(fill)) { + uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail; + + /* tail should not have moved */ + if (WARN_ON(fill != ringbuf->size - ringbuf->tail)) + fill = ringbuf->size - ringbuf->tail; + + do + iowrite32(MI_NOOP, virt++); + while ((fill -= 4) > 0); + + ringbuf->tail = 0; + intel_ring_update_space(ringbuf); + } + return 0; } @@ -814,11 +819,6 @@ static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf, if (ret) return ret; - /* Preallocate the olr before touching the ring */ - ret = i915_gem_request_alloc(ring, ctx); - if (ret) - return ret; - ringbuf->space -= num_dwords * sizeof(uint32_t); return 0; } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index a3406b2..4c0bc29 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2137,29 +2137,6 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n) return 0; } -static int intel_wrap_ring_buffer(struct intel_engine_cs *ring) -{ - uint32_t __iomem *virt; - struct intel_ringbuffer *ringbuf = ring->buffer; 
- int rem = ringbuf->size - ringbuf->tail; - - if (ringbuf->space < rem) { - int ret = ring_wait_for_space(ring, rem); - if (ret) - return ret; - } - - virt = ringbuf->virtual_start + ringbuf->tail; - rem /= 4; - while (rem--) - iowrite32(MI_NOOP, virt++); - - ringbuf->tail = 0; - intel_ring_update_space(ringbuf); - - return 0; -} - int intel_ring_idle(struct intel_engine_cs *ring) { struct drm_i915_gem_request *req; @@ -2197,12 +2174,19 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring, int bytes) { struct intel_ringbuffer *ringbuf = ring->buffer; + int fill = 0; int ret; + /* + * If the request will not fit between 'tail' and the effective + * size of the ringbuffer, then we need to pad the end of the + * ringbuffer with NOOPs, then start the request at the top of + * the ring. This increases the total size that we need to check + * for by however much is left at the end of the ring ... + */ if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) { - ret = intel_wrap_ring_buffer(ring); - if (unlikely(ret)) - return ret; + fill = ringbuf->size - ringbuf->tail; + bytes += fill; } if (unlikely(ringbuf->space < bytes)) { @@ -2211,6 +2195,28 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring, return ret; } + /* Ensure we have a request before touching the ring */ + if (!ringbuf->ring->outstanding_lazy_request) { + ret = i915_gem_request_alloc(ringbuf->ring, ctx); + if (ret) + return ret; + } + + if (unlikely(fill)) { + uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail; + + /* tail should not have moved */ + if (WARN_ON(fill != ringbuf->size - ringbuf->tail)) + fill = ringbuf->size - ringbuf->tail; + + do + iowrite32(MI_NOOP, virt++); + while ((fill -= 4) > 0); + + ringbuf->tail = 0; + intel_ring_update_space(ringbuf); + } + return 0; } @@ -2229,11 +2235,6 @@ int intel_ring_begin(struct intel_engine_cs *ring, if (ret) return ret; - /* Preallocate the olr before touching the ring */ - ret = i915_gem_request_alloc(ring, 
ring->default_context); - if (ret) - return ret; - ring->buffer->space -= num_dwords * sizeof(uint32_t); return 0; } -- 1.7.9.5 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() 2015-06-12 17:09 ` [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() Dave Gordon @ 2015-06-12 18:05 ` Chris Wilson 2015-06-12 18:54 ` Dave Gordon 0 siblings, 1 reply; 15+ messages in thread From: Chris Wilson @ 2015-06-12 18:05 UTC (permalink / raw) To: Dave Gordon; +Cc: intel-gfx On Fri, Jun 12, 2015 at 06:09:08PM +0100, Dave Gordon wrote: > The original idea of preallocating the OLR was implemented in > > > 9d773091 drm/i915: Preallocate next seqno before touching the ring > > and the sequence of operations was to allocate the OLR, then wrap past > the end of the ring if necessary, then wait for space if necessary. > But subsequently intel_ring_begin() was refactored, in > > > 304d695 drm/i915: Flush outstanding requests before allocating new seqno > > to ensure that pending work that might need to be flushed used the old > and not the newly-allocated request. This changed the sequence to wrap > and/or wait, then allocate, although the comment still said > /* Preallocate the olr before touching the ring */ > which was no longer true as intel_wrap_ring_buffer() touches the ring. > > However, with the introduction of dynamic pinning, in > > > 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand > > came the possibility that the ringbuffer might not be pinned to the GTT > or mapped into CPU address space when intel_ring_begin() is called. It > gets pinned when the request is allocated, so it's now important that > this comes *before* anything that can write into the ringbuffer, in this > case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer > happens not to be mapped, and (b) tail happens to be sufficiently close > to the end of the ring to trigger wrapping. On the other hand, the request allocation can itself write into the ring. 
This is not the right fix; the right fix is the elimination of the OLR
itself and passing the request into intel_ring_begin(). That way we can
be explicit about the ordering of ring accesses.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() 2015-06-12 18:05 ` Chris Wilson @ 2015-06-12 18:54 ` Dave Gordon 2015-06-12 19:10 ` Chris Wilson 0 siblings, 1 reply; 15+ messages in thread From: Dave Gordon @ 2015-06-12 18:54 UTC (permalink / raw) To: Chris Wilson, intel-gfx On 12/06/15 19:05, Chris Wilson wrote: > On Fri, Jun 12, 2015 at 06:09:08PM +0100, Dave Gordon wrote: >> The original idea of preallocating the OLR was implemented in >> >>> 9d773091 drm/i915: Preallocate next seqno before touching the ring >> >> and the sequence of operations was to allocate the OLR, then wrap past >> the end of the ring if necessary, then wait for space if necessary. >> But subsequently intel_ring_begin() was refactored, in >> >>> 304d695 drm/i915: Flush outstanding requests before allocating new seqno >> >> to ensure that pending work that might need to be flushed used the old >> and not the newly-allocated request. This changed the sequence to wrap >> and/or wait, then allocate, although the comment still said >> /* Preallocate the olr before touching the ring */ >> which was no longer true as intel_wrap_ring_buffer() touches the ring. >> >> However, with the introduction of dynamic pinning, in >> >>> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand >> >> came the possibility that the ringbuffer might not be pinned to the GTT >> or mapped into CPU address space when intel_ring_begin() is called. It >> gets pinned when the request is allocated, so it's now important that >> this comes *before* anything that can write into the ringbuffer, in this >> case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer >> happens not to be mapped, and (b) tail happens to be sufficiently close >> to the end of the ring to trigger wrapping. > > On the other hand, the request allocation can itself write into the > ring. 
This is not the right fix, that is the elimination of olr itself
> and passing the request into intel_ring_begin. That way we can explicit
> in our ordering into ring access.
> -Chris

AFAICS, request allocation can write into the ring only if it actually
has to flush some *pre-existing* OLR. [Aside: it can actually trigger
writing into a completely different ringbuffer, but not the one we're
handling here!] The worst-case sequence is:

    i915_gem_request_alloc finds there's no OLR
      i915_gem_get_seqno finds the seqno is 0
        i915_gem_init_seqno
          for_each_ring do ...
            intel_ring_idle but no OLR, so OK

It only works because i915_gem_request_alloc() allocates the request
early but doesn't store it in the OLR until the end.

OTOH I agree that the long-term answer is the elimination of the OLR;
this is really something of a stopgap until John H's Anti-OLR patchset
is merged. Although, the simplification of the wait-wrap/wait-space
sequence is probably worthwhile in its own right, so if Anti-OLR gets
merged first we can put the rest of the changes on top of that. It's
only code inside the "if(!OLR)" section that would need to be removed.

.Dave.
* Re: [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() 2015-06-12 18:54 ` Dave Gordon @ 2015-06-12 19:10 ` Chris Wilson 0 siblings, 0 replies; 15+ messages in thread From: Chris Wilson @ 2015-06-12 19:10 UTC (permalink / raw) To: Dave Gordon; +Cc: intel-gfx On Fri, Jun 12, 2015 at 07:54:17PM +0100, Dave Gordon wrote: > On 12/06/15 19:05, Chris Wilson wrote: > > On Fri, Jun 12, 2015 at 06:09:08PM +0100, Dave Gordon wrote: > >> The original idea of preallocating the OLR was implemented in > >> > >>> 9d773091 drm/i915: Preallocate next seqno before touching the ring > >> > >> and the sequence of operations was to allocate the OLR, then wrap past > >> the end of the ring if necessary, then wait for space if necessary. > >> But subsequently intel_ring_begin() was refactored, in > >> > >>> 304d695 drm/i915: Flush outstanding requests before allocating new seqno > >> > >> to ensure that pending work that might need to be flushed used the old > >> and not the newly-allocated request. This changed the sequence to wrap > >> and/or wait, then allocate, although the comment still said > >> /* Preallocate the olr before touching the ring */ > >> which was no longer true as intel_wrap_ring_buffer() touches the ring. > >> > >> However, with the introduction of dynamic pinning, in > >> > >>> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand > >> > >> came the possibility that the ringbuffer might not be pinned to the GTT > >> or mapped into CPU address space when intel_ring_begin() is called. It > >> gets pinned when the request is allocated, so it's now important that > >> this comes *before* anything that can write into the ringbuffer, in this > >> case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer > >> happens not to be mapped, and (b) tail happens to be sufficiently close > >> to the end of the ring to trigger wrapping. 
> >
> > On the other hand, the request allocation can itself write into the
> > ring. This is not the right fix, that is the elimination of olr itself
> > and passing the request into intel_ring_begin. That way we can explicit
> > in our ordering into ring access.
> > -Chris
>
> AFAICS, request allocation can write into the ring only if it actually
> has to flush some *pre-existing* OLR. [Aside: it can actually trigger
> writing into a completely different ringbuffer, but not the one we're
> handling here!] The worst-case sequence is:

You forget that ultimately (or rather, as should have been the case in
the design for execlists once the shortcomings of the ad hoc method were
apparent) request allocation will also be responsible for context
management (since they are one and the same).

> It only works because i915_gem_request_alloc() allocates the request
> early but doesn't store it in the OLR until the end.
>
> OTOH I agree that the long-term answer is the elimination of the OLR;
> this is really something of a stopgap until John H's Anti-OLR patchset
> is merged.

See my patches a year ago for a more complete cleanup.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
* (no subject) 2015-06-12 17:09 ` [PATCH v2] Resolve issues with ringbuffer space management Dave Gordon 2015-06-12 17:09 ` [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations Dave Gordon 2015-06-12 17:09 ` [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() Dave Gordon @ 2015-06-12 20:25 ` Dave Gordon 2015-06-12 20:25 ` [PATCH 1/2] drm/i915: Don't wait twice in {__intel, logical}_ring_prepare() Dave Gordon ` (2 more replies) 2 siblings, 3 replies; 15+ messages in thread From: Dave Gordon @ 2015-06-12 20:25 UTC (permalink / raw) To: intel-gfx Updated version split into two. The first tidies up the _ring_prepare() functions and removes the corner case where we might have had to wait twice; the second is a temporary workaround to solve a kernel OOPS that can occur if logical_ring_begin is called while the ringbuffer is not mapped because there's no current request. The latter will be superseded by the Anti-OLR patch series currently in review. But this helps with GuC submission, which is better than the execlist path at exposing the problematic case :( _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] drm/i915: Don't wait twice in {__intel, logical}_ring_prepare()
From: Dave Gordon @ 2015-06-12 20:25 UTC
To: intel-gfx

In the case that the ringbuffer was near-full AND 'tail' was near the
end of the buffer, we could have ended up waiting twice: once to gain
ownership of the space between TAIL and the end (which we just want to
fill with padding, so as not to split a single command sequence across
the end of the ringbuffer), and then again to get enough space for the
new data that the caller wants to add.

Now we just precalculate the total amount of space we'll need *including*
any for padding at the end of the ringbuffer and wait for that much in
one go.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |   52 +++++++++++++++----------------
 drivers/gpu/drm/i915/intel_ringbuffer.c |   51 +++++++++++++++---------------
 2 files changed, 50 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 454e836..a4bb5d3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -740,39 +740,22 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 	execlists_context_queue(ring, ctx, ringbuf->tail, request);
 }
 
-static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
-				    struct intel_context *ctx)
-{
-	uint32_t __iomem *virt;
-	int rem = ringbuf->size - ringbuf->tail;
-
-	if (ringbuf->space < rem) {
-		int ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
-
-		if (ret)
-			return ret;
-	}
-
-	virt = ringbuf->virtual_start + ringbuf->tail;
-	rem /= 4;
-	while (rem--)
-		iowrite32(MI_NOOP, virt++);
-
-	ringbuf->tail = 0;
-	intel_ring_update_space(ringbuf);
-
-	return 0;
-}
-
 static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 				struct intel_context *ctx, int bytes)
 {
+	int fill = 0;
 	int ret;
 
+	/*
+	 * If the request will not fit between 'tail' and the effective
+	 * size of the ringbuffer, then we need to pad the end of the
+	 * ringbuffer with NOOPs, then start the request at the top of
+	 * the ring. This increases the total size that we need to check
+	 * for by however much is left at the end of the ring ...
+	 */
 	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
-		ret = logical_ring_wrap_buffer(ringbuf, ctx);
-		if (unlikely(ret))
-			return ret;
+		fill = ringbuf->size - ringbuf->tail;
+		bytes += fill;
 	}
 
 	if (unlikely(ringbuf->space < bytes)) {
@@ -781,6 +764,21 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 			return ret;
 	}
 
+	if (unlikely(fill)) {
+		uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
+
+		/* tail should not have moved */
+		if (WARN_ON(fill != ringbuf->size - ringbuf->tail))
+			fill = ringbuf->size - ringbuf->tail;
+
+		do
+			iowrite32(MI_NOOP, virt++);
+		while ((fill -= 4) > 0);
+
+		ringbuf->tail = 0;
+		intel_ring_update_space(ringbuf);
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a3406b2..5a1cd20 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2137,29 +2137,6 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	return 0;
 }
 
-static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
-{
-	uint32_t __iomem *virt;
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	int rem = ringbuf->size - ringbuf->tail;
-
-	if (ringbuf->space < rem) {
-		int ret = ring_wait_for_space(ring, rem);
-		if (ret)
-			return ret;
-	}
-
-	virt = ringbuf->virtual_start + ringbuf->tail;
-	rem /= 4;
-	while (rem--)
-		iowrite32(MI_NOOP, virt++);
-
-	ringbuf->tail = 0;
-	intel_ring_update_space(ringbuf);
-
-	return 0;
-}
-
 int intel_ring_idle(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req;
@@ -2197,12 +2174,19 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
 				int bytes)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
+	int fill = 0;
 	int ret;
 
+	/*
+	 * If the request will not fit between 'tail' and the effective
+	 * size of the ringbuffer, then we need to pad the end of the
+	 * ringbuffer with NOOPs, then start the request at the top of
+	 * the ring. This increases the total size that we need to check
+	 * for by however much is left at the end of the ring ...
+	 */
 	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
-		ret = intel_wrap_ring_buffer(ring);
-		if (unlikely(ret))
-			return ret;
+		fill = ringbuf->size - ringbuf->tail;
+		bytes += fill;
 	}
 
 	if (unlikely(ringbuf->space < bytes)) {
@@ -2211,6 +2195,21 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
 			return ret;
 	}
 
+	if (unlikely(fill)) {
+		uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
+
+		/* tail should not have moved */
+		if (WARN_ON(fill != ringbuf->size - ringbuf->tail))
+			fill = ringbuf->size - ringbuf->tail;
+
+		do
+			iowrite32(MI_NOOP, virt++);
+		while ((fill -= 4) > 0);
+
+		ringbuf->tail = 0;
+		intel_ring_update_space(ringbuf);
+	}
+
 	return 0;
 }
 
-- 
1.7.9.5
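
The sizing rule from the commit message above (wait once for the request plus any end-of-ring padding) can be tried outside the kernel. What follows is a minimal user-space sketch, not driver code: `struct ring_model`, `ring_model_bytes_needed()`, `ring_model_pad_and_wrap()` and `MODEL_NOOP` are hypothetical names invented for illustration, the ring is modelled as a plain array of dwords rather than an `__iomem` mapping, and waiting for space is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NOOP 0u	/* stand-in for MI_NOOP; the value is an assumption */

/* Hypothetical user-space model of the fields the patch reads. */
struct ring_model {
	uint32_t *start;	/* models virtual_start, as an array of dwords */
	int size;		/* true size of the ring, in bytes */
	int effective_size;	/* usable size, <= size */
	int tail;		/* byte offset of the next write */
};

/*
 * Return the number of bytes to wait for in one go: the caller's
 * request plus any padding needed to reach the end of the ring.
 * The padding (0 if no wrap is needed) is stored in *fill.
 */
static int ring_model_bytes_needed(const struct ring_model *rb, int bytes,
				   int *fill)
{
	*fill = 0;
	if (rb->tail + bytes > rb->effective_size) {
		*fill = rb->size - rb->tail;
		bytes += *fill;
	}
	return bytes;
}

/* Pad from tail to the end of the ring with NOOPs, then wrap tail to 0. */
static void ring_model_pad_and_wrap(struct ring_model *rb, int fill)
{
	uint32_t *virt = rb->start + rb->tail / 4;

	assert(fill > 0 && fill % 4 == 0);
	assert(fill == rb->size - rb->tail);	/* tail should not have moved */

	do
		*virt++ = MODEL_NOOP;
	while ((fill -= 4) > 0);

	rb->tail = 0;
}
```

With a 64-byte ring, an effective size of 56 and tail at 48, a 12-byte request does not fit before the effective end, so the model asks for 12 + 16 = 28 bytes in a single wait and then pads the last 16 bytes before wrapping.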
* [PATCH 2/2] drm/i915: Allocate OLR more safely (workaround until OLR goes away)
From: Dave Gordon @ 2015-06-12 20:25 UTC
To: intel-gfx

The original idea of preallocating the OLR was implemented in

> 9d773091 drm/i915: Preallocate next seqno before touching the ring

and the sequence of operations was to allocate the OLR, then wrap past
the end of the ring if necessary, then wait for space if necessary.
But subsequently intel_ring_begin() was refactored, in

> 304d695 drm/i915: Flush outstanding requests before allocating new seqno

to ensure that pending work that might need to be flushed used the old
and not the newly-allocated request. This changed the sequence to wrap
and/or wait, then allocate, although the comment still said
	/* Preallocate the olr before touching the ring */
which was no longer true as intel_wrap_ring_buffer() touches the ring.

However, with the introduction of dynamic pinning, in

> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand

came the possibility that the ringbuffer might not be pinned to the GTT
or mapped into CPU address space when intel_ring_begin() is called. It
gets pinned when the request is allocated, so it's now important that
this comes *before* anything that can write into the ringbuffer, in this
case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer
happens not to be mapped, and (b) tail happens to be sufficiently close
to the end of the ring to trigger wrapping.

So the correct order is neither the original allocate-wait-pad-wait, nor
the subsequent wait-pad-wait-allocate, but wait-allocate-pad, avoiding
both the problems described in the two commits mentioned above.

So this commit moves the calls to i915_gem_request_alloc() into the
middle of {__intel,logical}_ring_prepare() rather than either before or
after them.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |   12 +++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c |   12 +++++++-----
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a4bb5d3..3ef5fb6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -764,6 +764,13 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 			return ret;
 	}
 
+	/* Ensure we have a request before touching the ring */
+	if (!ringbuf->ring->outstanding_lazy_request) {
+		ret = i915_gem_request_alloc(ringbuf->ring, ctx);
+		if (ret)
+			return ret;
+	}
+
 	if (unlikely(fill)) {
 		uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
 
@@ -812,11 +819,6 @@ static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 	if (ret)
 		return ret;
 
-	/* Preallocate the olr before touching the ring */
-	ret = i915_gem_request_alloc(ring, ctx);
-	if (ret)
-		return ret;
-
 	ringbuf->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5a1cd20..ddf580d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2195,6 +2195,13 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
 			return ret;
 	}
 
+	/* Ensure we have a request before touching the ring */
+	if (!ringbuf->ring->outstanding_lazy_request) {
+		ret = i915_gem_request_alloc(ringbuf->ring, ring->default_context);
+		if (ret)
+			return ret;
+	}
+
 	if (unlikely(fill)) {
 		uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
 
@@ -2228,11 +2235,6 @@ int intel_ring_begin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	/* Preallocate the olr before touching the ring */
-	ret = i915_gem_request_alloc(ring, ring->default_context);
-	if (ret)
-		return ret;
-
 	ring->buffer->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }
-- 
1.7.9.5
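
The wait-allocate-pad ordering argued for in the commit message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver's implementation: `struct prep_model` and the `model_*()` functions are hypothetical, the `mapped` flag stands in for the ring being pinned and mapped as a side effect of request allocation, and the space bookkeeping is deliberately simplified (the real code recomputes it with `intel_ring_update_space()`).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the driver. */
struct prep_model {
	int size, effective_size, tail, space;
	bool has_request;	/* models outstanding_lazy_request != NULL */
	bool mapped;		/* ring becomes writable once a request exists */
	int waits;		/* how many times we had to wait */
	int padded;		/* bytes of padding written at the last wrap */
};

static int model_wait_for_space(struct prep_model *m, int bytes)
{
	m->waits++;
	m->space = bytes;	/* pretend retirement freed exactly enough */
	return 0;
}

static int model_request_alloc(struct prep_model *m)
{
	m->has_request = true;
	m->mapped = true;	/* assumed side effect: pin and map the ring */
	return 0;
}

static int model_ring_prepare(struct prep_model *m, int bytes)
{
	int fill = 0;
	int ret;

	if (m->tail + bytes > m->effective_size) {
		fill = m->size - m->tail;
		bytes += fill;
	}

	/* 1. wait: a single wait covers the padding and the request */
	if (m->space < bytes) {
		ret = model_wait_for_space(m, bytes);
		if (ret)
			return ret;
	}

	/* 2. allocate: make sure a request exists before any ring write */
	if (!m->has_request) {
		ret = model_request_alloc(m);
		if (ret)
			return ret;
	}

	/* 3. pad: only now is it safe to write into the ring */
	if (fill) {
		if (!m->mapped)
			return -1;	/* the real driver would fault here */
		m->padded = fill;
		m->tail = 0;
		m->space -= fill;	/* simplified space accounting */
	}

	return 0;
}
```

Moving step 2 after step 3 in this model makes the `mapped` check fail whenever a wrap is needed and no request exists yet, which is the fault the commit message describes.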
* Re: (no subject)
From: Daniel Vetter @ 2015-06-17 11:04 UTC
To: Dave Gordon; +Cc: intel-gfx

On Fri, Jun 12, 2015 at 09:25:36PM +0100, Dave Gordon wrote:
> Updated version split into two. The first tidies up the _ring_prepare()
> functions and removes the corner case where we might have had to wait
> twice; the second is a temporary workaround to solve a kernel OOPS that
> can occur if logical_ring_begin is called while the ringbuffer is not
> mapped because there's no current request.
>
> The latter will be superseded by the Anti-OLR patch series currently
> in review. But this helps with GuC submission, which is better than
> the execlist path at exposing the problematic case :(

Maintainer broken record: Lack of changelog makes it hard to figure out
what changed and which patches are the latest version. Even more so when
trying to catch up from vacation ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: (no subject)
From: Jani Nikula @ 2015-06-17 12:41 UTC
To: Daniel Vetter, Dave Gordon; +Cc: intel-gfx

On Wed, 17 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Fri, Jun 12, 2015 at 09:25:36PM +0100, Dave Gordon wrote:
>> Updated version split into two. The first tidies up the _ring_prepare()
>> functions and removes the corner case where we might have had to wait
>> twice; the second is a temporary workaround to solve a kernel OOPS that
>> can occur if logical_ring_begin is called while the ringbuffer is not
>> mapped because there's no current request.
>>
>> The latter will be superseded by the Anti-OLR patch series currently
>> in review. But this helps with GuC submission, which is better than
>> the execlist path at exposing the problematic case :(
>
> Maintainer broken record: Lack of changelog makes it hard to figure out
> what changed and which patches are the latest version. Even more so when
> trying to catch up from vacation ...

Is it time we adopted Greg's <formletter> approach with copy-pasted
snippets from [1]...? See [2] for an example.

BR,
Jani.

[1] https://github.com/gregkh/gregkh-linux/blob/master/forms/patch_bot
[2] http://mid.gmane.org/20150612153842.GA12274@kroah.com

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Jani Nikula, Intel Open Source Technology Center
* Re: (no subject)
From: Dave Gordon @ 2015-06-18 10:30 UTC
To: Daniel Vetter; +Cc: intel-gfx

On 17/06/15 12:04, Daniel Vetter wrote:
> On Fri, Jun 12, 2015 at 09:25:36PM +0100, Dave Gordon wrote:
>> Updated version split into two. The first tidies up the _ring_prepare()
>> functions and removes the corner case where we might have had to wait
>> twice; the second is a temporary workaround to solve a kernel OOPS that
>> can occur if logical_ring_begin is called while the ringbuffer is not
>> mapped because there's no current request.
>>
>> The latter will be superseded by the Anti-OLR patch series currently
>> in review. But this helps with GuC submission, which is better than
>> the execlist path at exposing the problematic case :(
>
> Maintainer broken record: Lack of changelog makes it hard to figure out
> what changed and which patches are the latest version. Even more so when
> trying to catch up from vacation ...
> -Daniel

Oops, that wasn't ready to go to the mailing list, that was just
supposed to go to myself so I could test whether the changes I'd made to
my git-format-patch and git-send-email settings worked! Hence lack of
subject line :(

And the settings obviously /weren't/ right; apart from it going to the
list, it didn't have the proper "Organisation" header, which was the
thing I was trying to update, as well as setting up proper definitions
so I could write "git send-email --identity=external --to=myself ..."

I think I got them all sorted out before sending the GuC submission
sequence though :)

.Dave.
end of thread, other threads:[~2015-06-18 10:31 UTC | newest]

Thread overview: 15+ messages
[not found] <1433789441-8295-1-git-send-email-david.s.gordon@intel.com>
2015-06-12 17:09 ` [PATCH v2] Resolve issues with ringbuffer space management Dave Gordon
2015-06-12 17:09 ` [PATCH 1/2] drm/i915: use effective_size for ringbuffer calculations Dave Gordon
2015-06-12 18:12 ` Chris Wilson
2015-06-12 19:55 ` Dave Gordon
2015-06-12 20:41 ` Chris Wilson
2015-06-12 17:09 ` [PATCH 2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare() Dave Gordon
2015-06-12 18:05 ` Chris Wilson
2015-06-12 18:54 ` Dave Gordon
2015-06-12 19:10 ` Chris Wilson
2015-06-12 20:25 ` (no subject) Dave Gordon
2015-06-12 20:25 ` [PATCH 1/2] drm/i915: Don't wait twice in {__intel, logical}_ring_prepare() Dave Gordon
2015-06-12 20:25 ` [PATCH 2/2] drm/i915: Allocate OLR more safely (workaround until OLR goes away) Dave Gordon
2015-06-17 11:04 ` (no subject) Daniel Vetter
2015-06-17 12:41 ` Jani Nikula
2015-06-18 10:30 ` Dave Gordon