* [PATCH 1/2] drm/i915: Use appropriate spinlock flavour
@ 2016-02-01 11:00 Tvrtko Ursulin
  2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-02-01 11:00 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We know this never runs from interrupt context, so we
don't need to use the flags variant.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a928823507c5..faa9def96917 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2970,11 +2970,9 @@ i915_gem_retire_requests(struct drm_device *dev)
 		i915_gem_retire_requests_ring(ring);
 		idle &= list_empty(&ring->request_list);
 		if (i915.enable_execlists) {
-			unsigned long flags;
-
-			spin_lock_irqsave(&ring->execlist_lock, flags);
+			spin_lock_irq(&ring->execlist_lock);
 			idle &= list_empty(&ring->execlist_queue);
-			spin_unlock_irqrestore(&ring->execlist_lock, flags);
+			spin_unlock_irq(&ring->execlist_lock);
 
 			intel_execlists_retire_requests(ring);
 		}
-- 
1.9.1
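
For readers less familiar with the kernel locking API, a minimal,
self-contained sketch (not code from the driver) of the difference between
the two flavours follows. spin_lock_irqsave() saves the caller's interrupt
state and restores it on unlock, so it is safe from any context, while
spin_lock_irq() unconditionally disables and then re-enables local
interrupts, which is only correct when the caller is known never to run
with interrupts already disabled, i.e. never from interrupt context.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

/* Safe from any context: the previous interrupt state is preserved. */
static void any_context_section(void)
{
        unsigned long flags;

        spin_lock_irqsave(&example_lock, flags);
        /* ... critical section ... */
        spin_unlock_irqrestore(&example_lock, flags);
}

/* Process context only: interrupts are unconditionally re-enabled on unlock. */
static void process_context_section(void)
{
        spin_lock_irq(&example_lock);
        /* ... critical section ... */
        spin_unlock_irq(&example_lock);
}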


* [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control
  2016-02-01 11:00 [PATCH 1/2] drm/i915: Use appropriate spinlock flavour Tvrtko Ursulin
@ 2016-02-01 11:00 ` Tvrtko Ursulin
  2016-02-01 11:12   ` Chris Wilson
  2016-02-01 11:18 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use appropriate spinlock flavour Patchwork
  2016-02-11  8:52 ` [PATCH 1/2] " Daniel Vetter
  2 siblings, 1 reply; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-02-01 11:00 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Where objects are shared across contexts and heavy rendering
is in progress, the execlist retired request queue will grow
unbounded until the GPU is idle enough for the retire worker
to run and call intel_execlists_retire_requests.

With some workloads, such as gem_close_race, that never
happens, causing the shared object's VMA list to grow to
epic proportions and in turn causing retirement call sites to
spend linearly more and more time walking obj->vma_list.

The end result is the above-mentioned test case taking ten
minutes to complete and using more than a GiB of RAM just for
the VMA objects.

If we instead trigger the execlist housekeeping a bit more
often, obj->vma_list will be kept in check by virtue of
context cleanup running and zapping the inactive VMAs.

This makes the test case an order of magnitude faster and brings
memory use back to normal.

This also makes the code more self-contained since the
intel_execlists_retire_requests call-site is now in a more
appropriate place and implementation leakage is somewhat
reduced.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_close_race/gem_close_race
---
 drivers/gpu/drm/i915/i915_gem.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index faa9def96917..c558887b2084 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2955,6 +2955,9 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
 
+	if (i915.enable_execlists)
+		intel_execlists_retire_requests(ring);
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
@@ -2973,8 +2976,6 @@ i915_gem_retire_requests(struct drm_device *dev)
 			spin_lock_irq(&ring->execlist_lock);
 			idle &= list_empty(&ring->execlist_queue);
 			spin_unlock_irq(&ring->execlist_lock);
-
-			intel_execlists_retire_requests(ring);
 		}
 	}
 
-- 
1.9.1
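
As a rough illustration of the cost described in the commit message
(hypothetical types, not the real i915 structures): every call site that
walks obj->vma_list pays a price proportional to the list length, so
leaving inactive VMAs on the list makes each walk linearly slower while
the entries themselves consume memory.

#include <linux/list.h>
#include <linux/types.h>

struct example_vma {
        struct list_head obj_link;      /* entry on the object's vma_list */
        bool active;
};

struct example_object {
        struct list_head vma_list;      /* grows unbounded without periodic cleanup */
};

/* O(n) in the number of VMAs; called from every retirement path. */
static bool example_object_is_active(struct example_object *obj)
{
        struct example_vma *vma;

        list_for_each_entry(vma, &obj->vma_list, obj_link)
                if (vma->active)
                        return true;

        return false;
}

Retiring per ring, as the patch does, gives the context cleanup a chance
to drop inactive entries before such walks get long.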


* Re: [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control
  2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
@ 2016-02-01 11:12   ` Chris Wilson
  2016-02-01 13:29     ` Tvrtko Ursulin
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2016-02-01 11:12 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Feb 01, 2016 at 11:00:08AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Where objects are shared across contexts and heavy rendering
> is in progress, the execlist retired request queue will grow
> unbounded until the GPU is idle enough for the retire worker
> to run and call intel_execlists_retire_requests.
> 
> With some workloads, such as gem_close_race, that never
> happens, causing the shared object's VMA list to grow to
> epic proportions and in turn causing retirement call sites to
> spend linearly more and more time walking obj->vma_list.
> 
> The end result is the above-mentioned test case taking ten
> minutes to complete and using more than a GiB of RAM just for
> the VMA objects.
> 
> If we instead trigger the execlist housekeeping a bit more
> often, obj->vma_list will be kept in check by virtue of
> context cleanup running and zapping the inactive VMAs.
> 
> This makes the test case an order of magnitude faster and brings
> memory use back to normal.
> 
> This also makes the code more self-contained since the
> intel_execlists_retire_requests call-site is now in a more
> appropriate place and implementation leakage is somewhat
> reduced.

However, this then causes a perf regression since we unpin the contexts
too frequently and do not have any mitigation in place yet.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use appropriate spinlock flavour
  2016-02-01 11:00 [PATCH 1/2] drm/i915: Use appropriate spinlock flavour Tvrtko Ursulin
  2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
@ 2016-02-01 11:18 ` Patchwork
  2016-02-11  8:52 ` [PATCH 1/2] " Daniel Vetter
  2 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2016-02-01 11:18 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Summary ==

Series 2977v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/2977/revisions/1/mbox/

Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-c:
                dmesg-warn -> PASS       (bsw-nuc-2)

bdw-ultra        total:159  pass:147  dwarn:0   dfail:0   fail:0   skip:12 
bsw-nuc-2        total:159  pass:129  dwarn:0   dfail:0   fail:0   skip:30 
hsw-brixbox      total:159  pass:146  dwarn:0   dfail:0   fail:0   skip:13 
ilk-hp8440p      total:159  pass:111  dwarn:0   dfail:0   fail:0   skip:48 
ivb-t430s        total:159  pass:145  dwarn:0   dfail:0   fail:0   skip:14 
skl-i5k-2        total:159  pass:144  dwarn:1   dfail:0   fail:0   skip:14 
snb-dellxps      total:159  pass:137  dwarn:0   dfail:0   fail:0   skip:22 
snb-x220t        total:159  pass:137  dwarn:0   dfail:0   fail:1   skip:21 

HANGED hsw-gt2 in igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c

Results at /archive/results/CI_IGT_test/Patchwork_1334/

6b1049b84dcd979f631d15b2ada325d8e5b2c4e1 drm-intel-nightly: 2016y-01m-29d-22h-50m-57s UTC integration manifest
26395446c12cf9776730e10b0715e2967485a8b5 drm/i915: Keep the per-object list of VMAs under control
b5e4821aa4608096047f907772cf1a8451168c1f drm/i915: Use appropriate spinlock flavour


* Re: [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control
  2016-02-01 11:12   ` Chris Wilson
@ 2016-02-01 13:29     ` Tvrtko Ursulin
  2016-02-01 13:41       ` Chris Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-02-01 13:29 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 01/02/16 11:12, Chris Wilson wrote:
> On Mon, Feb 01, 2016 at 11:00:08AM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Where objects are shared across contexts and heavy rendering
>> is in progress, the execlist retired request queue will grow
>> unbounded until the GPU is idle enough for the retire worker
>> to run and call intel_execlists_retire_requests.
>>
>> With some workloads, such as gem_close_race, that never
>> happens, causing the shared object's VMA list to grow to
>> epic proportions and in turn causing retirement call sites to
>> spend linearly more and more time walking obj->vma_list.
>>
>> The end result is the above-mentioned test case taking ten
>> minutes to complete and using more than a GiB of RAM just for
>> the VMA objects.
>>
>> If we instead trigger the execlist housekeeping a bit more
>> often, obj->vma_list will be kept in check by virtue of
>> context cleanup running and zapping the inactive VMAs.
>>
>> This makes the test case an order of magnitude faster and brings
>> memory use back to normal.
>>
>> This also makes the code more self-contained since the
>> intel_execlists_retire_requests call-site is now in a more
>> appropriate place and implementation leakage is somewhat
>> reduced.
>
> However, this then causes a perf regression since we unpin the contexts
> too frequently and do not have any mitigation in place yet.

I suppose it is possible. What takes most time - page table clears on 
VMA unbinds? It is just that this looks so bad at the moment. :( Luckily 
it is just the IGT..

Regards,

Tvrtko


* Re: [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control
  2016-02-01 13:29     ` Tvrtko Ursulin
@ 2016-02-01 13:41       ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2016-02-01 13:41 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Feb 01, 2016 at 01:29:16PM +0000, Tvrtko Ursulin wrote:
> 
> On 01/02/16 11:12, Chris Wilson wrote:
> >On Mon, Feb 01, 2016 at 11:00:08AM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>Where objects are shared across contexts and heavy rendering
> >>is in progress, the execlist retired request queue will grow
> >>unbounded until the GPU is idle enough for the retire worker
> >>to run and call intel_execlists_retire_requests.
> >>
> >>With some workloads, such as gem_close_race, that never
> >>happens, causing the shared object's VMA list to grow to
> >>epic proportions and in turn causing retirement call sites to
> >>spend linearly more and more time walking obj->vma_list.
> >>
> >>The end result is the above-mentioned test case taking ten
> >>minutes to complete and using more than a GiB of RAM just for
> >>the VMA objects.
> >>
> >>If we instead trigger the execlist housekeeping a bit more
> >>often, obj->vma_list will be kept in check by virtue of
> >>context cleanup running and zapping the inactive VMAs.
> >>
> >>This makes the test case an order of magnitude faster and brings
> >>memory use back to normal.
> >>
> >>This also makes the code more self-contained since the
> >>intel_execlists_retire_requests call-site is now in a more
> >>appropriate place and implementation leakage is somewhat
> >>reduced.
> >
> >However, this then causes a perf regression since we unpin the contexts
> >too frequently and do not have any mitigation in place yet.
> 
> I suppose it is possible. What takes most time - page table clears
> on VMA unbinds? It is just that this looks so bad at the moment. :(
> Luckily it is just the IGT..

On Braswell in particular, where it is most noticeable, it is the ioremaps.
Note that we don't unbind the VMA on unpin, we just make it available for
reallocation. The basic mitigation strategy that's been sent in a couple
of different forms is to defer the remapping from the unpin to the
vma unbind (and, along the vmap paths, from the unpin to the put_pages).
Then the context unpin becomes just a matter of dropping a few
individual pin-counts and ref-counts on the various objects used by the
context.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
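
The shape of that mitigation, sketched here with hypothetical helpers
rather than the real i915 code, is to keep the costly teardown (such as
releasing a mapping) out of the frequently hit unpin path and defer it to
the much rarer unbind, so that unpinning a context reduces to dropping a
few pin-counts and ref-counts.

struct example_ctx_vma {
        int pin_count;
        void *mapping;          /* expensive to set up and tear down */
};

/* Stand-in for the costly unmap/remap work. */
static void example_release_mapping(struct example_ctx_vma *vma)
{
        vma->mapping = NULL;
}

/* Hot path: just drop the pin-count and keep the mapping around for reuse. */
static void example_vma_unpin(struct example_ctx_vma *vma)
{
        vma->pin_count--;
}

/* Cold path: the deferred teardown happens only when the VMA is unbound. */
static void example_vma_unbind(struct example_ctx_vma *vma)
{
        if (vma->mapping)
                example_release_mapping(vma);
}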

* Re: [PATCH 1/2] drm/i915: Use appropriate spinlock flavour
  2016-02-01 11:00 [PATCH 1/2] drm/i915: Use appropriate spinlock flavour Tvrtko Ursulin
  2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
  2016-02-01 11:18 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use appropriate spinlock flavour Patchwork
@ 2016-02-11  8:52 ` Daniel Vetter
  2 siblings, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2016-02-11  8:52 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Feb 01, 2016 at 11:00:07AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We know this never runs from interrupt context, so we
> don't need to use the flags variant.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a928823507c5..faa9def96917 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2970,11 +2970,9 @@ i915_gem_retire_requests(struct drm_device *dev)
>  		i915_gem_retire_requests_ring(ring);
>  		idle &= list_empty(&ring->request_list);
>  		if (i915.enable_execlists) {
> -			unsigned long flags;
> -
> -			spin_lock_irqsave(&ring->execlist_lock, flags);
> +			spin_lock_irq(&ring->execlist_lock);
>  			idle &= list_empty(&ring->execlist_queue);
> -			spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +			spin_unlock_irq(&ring->execlist_lock);
>  
>  			intel_execlists_retire_requests(ring);
>  		}
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

end of thread, other threads:[~2016-02-11  8:51 UTC | newest]

Thread overview: 7+ messages
2016-02-01 11:00 [PATCH 1/2] drm/i915: Use appropriate spinlock flavour Tvrtko Ursulin
2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
2016-02-01 11:12   ` Chris Wilson
2016-02-01 13:29     ` Tvrtko Ursulin
2016-02-01 13:41       ` Chris Wilson
2016-02-01 11:18 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use appropriate spinlock flavour Patchwork
2016-02-11  8:52 ` [PATCH 1/2] " Daniel Vetter
