* [PATCH 1/2] drm/i915: Use appropriate spinlock flavour
@ 2016-02-01 11:00 Tvrtko Ursulin
2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-02-01 11:00 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We know this never runs from interrupt context, so we
don't need to use the flags variant.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a928823507c5..faa9def96917 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2970,11 +2970,9 @@ i915_gem_retire_requests(struct drm_device *dev)
i915_gem_retire_requests_ring(ring);
idle &= list_empty(&ring->request_list);
if (i915.enable_execlists) {
- unsigned long flags;
-
- spin_lock_irqsave(&ring->execlist_lock, flags);
+ spin_lock_irq(&ring->execlist_lock);
idle &= list_empty(&ring->execlist_queue);
- spin_unlock_irqrestore(&ring->execlist_lock, flags);
+ spin_unlock_irq(&ring->execlist_lock);
intel_execlists_retire_requests(ring);
}
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control
2016-02-01 11:00 [PATCH 1/2] drm/i915: Use appropriate spinlock flavour Tvrtko Ursulin
@ 2016-02-01 11:00 ` Tvrtko Ursulin
2016-02-01 11:12 ` Chris Wilson
2016-02-01 11:18 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use appropriate spinlock flavour Patchwork
2016-02-11 8:52 ` [PATCH 1/2] " Daniel Vetter
2 siblings, 1 reply; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-02-01 11:00 UTC (permalink / raw)
To: Intel-gfx
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Where objects are shared across contexts and heavy rendering
is in progress, the execlist retired request queue will grow
unbounded until the GPU is idle enough for the retire worker
to run and call intel_execlists_retire_requests.
With some workloads, such as gem_close_race, that never
happens, causing the shared object's VMA list to grow to
epic proportions; this in turn causes retirement call sites to
spend linearly more and more time walking obj->vma_list.
The end result is the above-mentioned test case taking ten
minutes to complete and using up more than a GiB of RAM just
for the VMA objects.
If we instead trigger the execlist housekeeping a bit more
often, obj->vma_list will be kept in check by virtue of
context cleanup running and zapping the inactive VMAs.
This makes the test case an order of magnitude faster and brings
memory use back to normal.
This also makes the code more self-contained, since the
intel_execlists_retire_requests call site is now in a more
appropriate place and implementation leakage is somewhat
reduced.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_close_race/gem_close_race
---
drivers/gpu/drm/i915/i915_gem.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index faa9def96917..c558887b2084 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2955,6 +2955,9 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
i915_gem_request_assign(&ring->trace_irq_req, NULL);
}
+ if (i915.enable_execlists)
+ intel_execlists_retire_requests(ring);
+
WARN_ON(i915_verify_lists(ring->dev));
}
@@ -2973,8 +2976,6 @@ i915_gem_retire_requests(struct drm_device *dev)
spin_lock_irq(&ring->execlist_lock);
idle &= list_empty(&ring->execlist_queue);
spin_unlock_irq(&ring->execlist_lock);
-
- intel_execlists_retire_requests(ring);
}
}
--
1.9.1
* Re: [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control
2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
@ 2016-02-01 11:12 ` Chris Wilson
2016-02-01 13:29 ` Tvrtko Ursulin
0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2016-02-01 11:12 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Intel-gfx
On Mon, Feb 01, 2016 at 11:00:08AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Where objects are shared across contexts and heavy rendering
> is in progress, execlist retired request queue will grow
> unbound until the GPU is idle enough for the retire worker
> to run and call intel_execlists_retire_requests.
>
> With some workloads, like for example gem_close_race, that
> never happens causing the shared object VMA list to grow to
> epic proportions, and in turn causes retirement call sites to
> spend linearly more and more time walking the obj->vma_list.
>
> End result is the above mentioned test case taking ten minutes
> to complete and using up more than a GiB of RAM just for the VMA
> objects.
>
> If we instead trigger the execlist house keeping a bit more
> often, obj->vma_list will be kept in check by the virtue of
> context cleanup running and zapping the inactive VMAs.
>
> This makes the test case an order of magnitude faster and brings
> memory use back to normal.
>
> This also makes the code more self-contained since the
> intel_execlists_retire_requests call-site is now in a more
> appropriate place and implementation leakage is somewhat
> reduced.
However, this then causes a perf regression since we unpin the contexts
too frequently and do not have any mitigation in place yet.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control
2016-02-01 11:12 ` Chris Wilson
@ 2016-02-01 13:29 ` Tvrtko Ursulin
2016-02-01 13:41 ` Chris Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-02-01 13:29 UTC (permalink / raw)
To: Chris Wilson, Intel-gfx
On 01/02/16 11:12, Chris Wilson wrote:
> On Mon, Feb 01, 2016 at 11:00:08AM +0000, Tvrtko Ursulin wrote:
>> [commit message snipped]
>
> However, this then causes a perf regression since we unpin the contexts
> too frequently and do not have any mitigation in place yet.
I suppose it is possible. What takes the most time - page table
clears on VMA unbinds? It is just that this looks so bad at the
moment. :( Luckily it is just the IGT...
Regards,
Tvrtko
* Re: [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control
2016-02-01 13:29 ` Tvrtko Ursulin
@ 2016-02-01 13:41 ` Chris Wilson
0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2016-02-01 13:41 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Intel-gfx
On Mon, Feb 01, 2016 at 01:29:16PM +0000, Tvrtko Ursulin wrote:
>
> On 01/02/16 11:12, Chris Wilson wrote:
> >On Mon, Feb 01, 2016 at 11:00:08AM +0000, Tvrtko Ursulin wrote:
> >>[commit message snipped]
> >
> >However, this then causes a perf regression since we unpin the contexts
> >too frequently and do not have any mitigation in place yet.
>
> I suppose it is possible. What takes most time - page table clears
> on VMA unbinds? It is just that this looks so bad at the moment. :(
> Luckily it is just the IGT..
On Braswell in particular, where it is most noticeable, it is the
ioremaps. Note that we don't unbind the VMAs on unpin, we just make
them available for reallocation. The basic mitigation strategy that
has been sent in a couple of different forms is to defer the remapping
from the unpin to the VMA unbind (and along the vmap paths from the
unpin to the put_pages). Then the context unpin becomes just a matter
of dropping a few individual pin-counts and ref-counts on the various
objects used by the context.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use appropriate spinlock flavour
2016-02-01 11:00 [PATCH 1/2] drm/i915: Use appropriate spinlock flavour Tvrtko Ursulin
2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
@ 2016-02-01 11:18 ` Patchwork
2016-02-11 8:52 ` [PATCH 1/2] " Daniel Vetter
2 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2016-02-01 11:18 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
== Summary ==
Series 2977v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/2977/revisions/1/mbox/
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-c:
dmesg-warn -> PASS (bsw-nuc-2)
bdw-ultra total:159 pass:147 dwarn:0 dfail:0 fail:0 skip:12
bsw-nuc-2 total:159 pass:129 dwarn:0 dfail:0 fail:0 skip:30
hsw-brixbox total:159 pass:146 dwarn:0 dfail:0 fail:0 skip:13
ilk-hp8440p total:159 pass:111 dwarn:0 dfail:0 fail:0 skip:48
ivb-t430s total:159 pass:145 dwarn:0 dfail:0 fail:0 skip:14
skl-i5k-2 total:159 pass:144 dwarn:1 dfail:0 fail:0 skip:14
snb-dellxps total:159 pass:137 dwarn:0 dfail:0 fail:0 skip:22
snb-x220t total:159 pass:137 dwarn:0 dfail:0 fail:1 skip:21
HANGED hsw-gt2 in igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c
Results at /archive/results/CI_IGT_test/Patchwork_1334/
6b1049b84dcd979f631d15b2ada325d8e5b2c4e1 drm-intel-nightly: 2016y-01m-29d-22h-50m-57s UTC integration manifest
26395446c12cf9776730e10b0715e2967485a8b5 drm/i915: Keep the per-object list of VMAs under control
b5e4821aa4608096047f907772cf1a8451168c1f drm/i915: Use appropriate spinlock flavour
* Re: [PATCH 1/2] drm/i915: Use appropriate spinlock flavour
2016-02-01 11:00 [PATCH 1/2] drm/i915: Use appropriate spinlock flavour Tvrtko Ursulin
2016-02-01 11:00 ` [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control Tvrtko Ursulin
2016-02-01 11:18 ` ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915: Use appropriate spinlock flavour Patchwork
@ 2016-02-11 8:52 ` Daniel Vetter
2 siblings, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2016-02-11 8:52 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Intel-gfx
On Mon, Feb 01, 2016 at 11:00:07AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> We know this never runs from interrupt context so
> don't need to use the flags variant.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> [patch snipped]
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch