From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 14/41] drm/i915: Use a radixtree for random access to the object's backing storage
Date: Mon, 17 Oct 2016 11:57:57 +0100
Message-ID: <20161017105757.GD20240@nuc-i3427.alporthouse.com>
In-Reply-To: <223d0ff2-a9bb-374a-3f71-97a7c85304f2@linux.intel.com>

On Mon, Oct 17, 2016 at 10:56:27AM +0100, Tvrtko Ursulin wrote:
> 
> On 14/10/2016 15:07, Chris Wilson wrote:
> >On Fri, Oct 14, 2016 at 02:32:03PM +0100, Tvrtko Ursulin wrote:
> >>On 14/10/2016 13:18, Chris Wilson wrote:
> >>>A while ago we switched from a contiguous array of pages into an sglist,
> >>>for that was both more convenient for mapping to hardware and avoided
> >>>the requirement for a vmalloc array of pages on every object. However,
> >>>certain GEM API calls (like pwrite, pread as well as performing
> >>>relocations) do desire access to individual struct pages. A quick hack
> >>>was to introduce a cache of the last access such that finding the
> >>>following page was quick - this works so long as the caller desired
> >>>sequential access. Walking backwards, or multiple callers, still hits a
> >>>slow linear search for each page. One solution is to store each
> >>>successful lookup in a radix tree.
> >>>
> >>>v2: Rewrite building the radixtree for clarity, hopefully.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_drv.h         |  69 +++++-------
> >>>  drivers/gpu/drm/i915/i915_gem.c         | 179 +++++++++++++++++++++++++++++---
> >>>  drivers/gpu/drm/i915/i915_gem_stolen.c  |   4 +-
> >>>  drivers/gpu/drm/i915/i915_gem_userptr.c |   4 +-
> >>>  4 files changed, 193 insertions(+), 63 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>index 38467dde1efe..53cf4b0e5359 100644
> >>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>@@ -2273,9 +2273,12 @@ struct drm_i915_gem_object {
> >>>  	struct sg_table *pages;
> >>>  	int pages_pin_count;
> >>>-	struct get_page {
> >>>-		struct scatterlist *sg;
> >>>-		int last;
> >>>+	struct i915_gem_object_page_iter {
> >>>+		struct scatterlist *sg_pos;
> >>>+		unsigned long sg_idx;
> >>We are not consistent in the code with type used for number of pages
> >>in an object. sg_alloc_table even takes an unsigned int for it, so
> >>for as long as we build them as we do, unsigned long is a waste.
> >I know. It's worrying, today there is a possibility that we overflow a
> >32bit size. If this was counting in pages, we would have a few more
> >years of grace. All I can say is that we are fortunate that memory
> >remains expensive in the exabyte range.
> >
> >>>@@ -4338,6 +4349,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
> >>>  	obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
> >>>  	obj->madv = I915_MADV_WILLNEED;
> >>>+	INIT_RADIX_TREE(&obj->get_page.radix, GFP_ATOMIC | __GFP_NOWARN);
> >>Pros & cons of GFP_ATOMIC here? Versus perhaps going with the mutex?
> >>I don't know how much data radix tree allocates with this, per node,
> >>but we can have a lot of pages. Would this create extra pressure on
> >>slab shrinking, and in turn our objects?
> >The problem is that we require sg lookup on a !pagefault path, hence
> >mutexes and GFP_KERNEL turn out to be illegal. :|
> 
> Bummer.  I don't know enough about the atomic reserve to judge how
> bad this might be then. Because userspace could drain it easily
> after this work by just pread/pwrite on large objects.

Yes. The pathological case of touching the last page (and only the last
page) will cause a large number of allocations (one page for every 512
pages in the object, and so on recursively - a 64MiB object will require
32 + 1 atomic page allocations, aiui). Not that bad really.
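
As a rough check of that estimate, here is a minimal userspace sketch in
C. It assumes 4KiB pages and takes the "one page for every 512 pages"
figure from the paragraph above at face value; the fan-out of a real
kernel radix tree node may differ. The names PAGE_SIZE_BYTES,
ENTRIES_PER_NODE and index_node_pages() are hypothetical and do not come
from the patch.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES  4096ULL /* assumption: 4KiB pages */
#define ENTRIES_PER_NODE 512ULL  /* assumption: figure quoted in the mail */

/*
 * Count the index pages needed to cover every page of an object:
 * one per 512 entries at the lowest level, then one per 512 of those,
 * and so on until a single root page covers everything.
 */
static uint64_t index_node_pages(uint64_t object_bytes)
{
	uint64_t entries = (object_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
	uint64_t total = 0;

	if (entries == 0)
		return 0;

	do {
		/* Round up: a partially filled index page still costs a page. */
		entries = (entries + ENTRIES_PER_NODE - 1) / ENTRIES_PER_NODE;
		total += entries;
	} while (entries > 1);

	return total;
}
```

Under these assumptions, index_node_pages(64ULL << 20) evaluates to 33,
which matches the "32 + 1" figure: 16384 object pages need 32 index
pages, plus one root page above them.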

> >>Hmm... I assume failures happen then since you added this fallback.
> >>Due to GFP_ATOMIC?
> >No, this was always in the code, because malloc failures happen. Quite
> >often if you run igt ;)
> 
> But GFP_ATOMIC failures primarily, yes?

GFP_ATOMIC certainly increases the likelihood of failure (by removing
the direct reclaim and sleep paths); on the other hand, we do get a
reserve pool.
 
> It is a bit unfortunate that with this fancy lookup we can easily
> and silently fall back to linear lookup and lose the performance
> benefit. Do you know how often this happens? Should we have some
> stats collected and exported via debugfs to evaluate on
> realistically busy/loaded systems?

Hmm, in my head I had it wrongly sketched out as basically falling back
to the old behaviour after allocation failure. Ok, if I store the last
successful allocation (currently sg_idx-1) and the current scan position
separately, we can keep the old-style scheme in place for allocation
failure and so not degrade performance in the worst case.
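
The idea of tracking the cached lookups and the scan position separately
can be illustrated with a small userspace model in C. This is only a
sketch under stated assumptions, not the i915 code: a flat array stands
in for the radix tree, fail_inserts simulates an allocation failure, and
every name (struct page_iter, page_iter_init, page_iter_lookup) is
hypothetical.

```c
#include <stddef.h>

#define MAX_PAGES 64
#define NO_ENTRY  (-1)

/* Hypothetical model: segment i covers seg_len[i] consecutive pages. */
struct page_iter {
	const unsigned int *seg_len;
	size_t nseg;
	size_t scan_seg;        /* segment the linear scan has reached */
	unsigned long scan_idx; /* first page index of scan_seg */
	int cache[MAX_PAGES];   /* stand-in for the radix tree */
	int fail_inserts;       /* nonzero simulates allocation failure */
	unsigned long steps;    /* segments walked, for illustration */
};

static void page_iter_init(struct page_iter *it,
			   const unsigned int *seg_len, size_t nseg)
{
	size_t i;

	it->seg_len = seg_len;
	it->nseg = nseg;
	it->scan_seg = 0;
	it->scan_idx = 0;
	it->fail_inserts = 0;
	it->steps = 0;
	for (i = 0; i < MAX_PAGES; i++)
		it->cache[i] = NO_ENTRY;
}

/* Record the current segment in the cache, unless "allocation" fails. */
static void cache_segment(struct page_iter *it)
{
	unsigned long p, end = it->scan_idx + it->seg_len[it->scan_seg];

	if (it->fail_inserts)
		return;
	for (p = it->scan_idx; p < end && p < MAX_PAGES; p++)
		it->cache[p] = (int)it->scan_seg;
}

/* Return the segment holding page n, or NO_ENTRY if out of range. */
static int page_iter_lookup(struct page_iter *it, unsigned long n)
{
	if (n >= MAX_PAGES)
		return NO_ENTRY;
	if (it->cache[n] != NO_ENTRY)
		return it->cache[n];

	/* Cache miss behind the scan position: restart the linear walk. */
	if (n < it->scan_idx) {
		it->scan_seg = 0;
		it->scan_idx = 0;
	}

	/* Old-style scheme: walk forward from the scan position. */
	while (it->scan_seg < it->nseg &&
	       it->scan_idx + it->seg_len[it->scan_seg] <= n) {
		cache_segment(it);
		it->scan_idx += it->seg_len[it->scan_seg];
		it->scan_seg++;
		it->steps++;
	}
	if (it->scan_seg >= it->nseg)
		return NO_ENTRY;

	cache_segment(it);
	return (int)it->scan_seg;
}
```

In this model, with inserts failing, a sequential caller still resumes
from the scan position at no extra cost, and only a backwards access
pays for a restart of the linear walk. With inserts succeeding, the
backwards access is served from the cache instead.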

I wasn't planning on seeing failures outside of igt/gem_shrink (with the
exception of people running firefox or chrome!)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
