All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 14/15] drm/i915: Async GPU relocation processing
Date: Fri, 17 Mar 2017 13:15:33 +0200	[thread overview]
Message-ID: <1489749333.2892.11.camel@linux.intel.com> (raw)
In-Reply-To: <20170316132006.7976-15-chris@chris-wilson.co.uk>

On to, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
> If the user requires patching of their batch or auxiliary buffers, we
> currently make the alterations on the cpu. If they are active on the GPU
> at the time, we wait under the struct_mutex for them to finish executing
> before we rewrite the contents. This happens if shared relocation trees
> are used between different contexts with separate address space (and the
> buffers then have different addresses in each), the 3D state will need
> to be adjusted between execution on each context. However, we don't need
> to use the CPU to do the relocation patching, as we could queue commands
> to the GPU to perform it and use fences to serialise the operation with
> the current activity and future - so the operation on the GPU appears
> just as atomic as performing it immediately. Performing the relocation
> rewrites on the GPU is not free, in terms of pure throughput, the number
> of relocations/s is about halved - but more importantly so is the time
> under the struct_mutex.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

<SNIP>

>  	/* Must be a variable in the struct to allow GCC to unroll. */
> +	cache->gen = INTEL_GEN(i915);
>  	cache->has_llc = HAS_LLC(i915);
> -	cache->has_fence = INTEL_GEN(i915) < 4;
> -	cache->use_64bit_reloc = HAS_64BIT_RELOC(i915);
> +	cache->has_fence = cache->gen < 4;
> +	cache->use_64bit_reloc = cache->gen >= 8;

Keep using HAS_64BIT_RELOC(i915)...

> +static u32 *reloc_gpu(struct i915_execbuffer *eb,
> +		      struct i915_vma *vma,
> +		      unsigned int len)
> +{
> +	struct reloc_cache *cache = &eb->reloc_cache;
> +	u32 *cmd;
> +
> +	if (cache->rq_size > PAGE_SIZE/sizeof(u32) - (len + 1))

There's no overflow chance here, so I'd rq_size + len + 1. 

> +		reloc_gpu_flush(cache);
> +
> +	if (!cache->rq) {
> +		struct drm_i915_gem_object *obj;
> +		struct drm_i915_gem_request *rq;
> +		struct i915_vma *batch;
> +		int err;
> +
> +		GEM_BUG_ON(vma->obj->base.write_domain & I915_GEM_DOMAIN_CPU);

Use ==.

I just close my eyes for the reminder of this function and expect v2 to
have a proper teardown :P

Also vma / obj pair naming varies from what they usually are, so I'd
consider renaming one of them to lessen confusion.

> @@ -1012,6 +1148,67 @@ relocate_entry(struct i915_vma *vma,
>  	bool wide = eb->reloc_cache.use_64bit_reloc;
>  	void *vaddr;
>  
> +	if (!eb->reloc_cache.vaddr &&
> +	    (DBG_FORCE_RELOC == FORCE_GPU_RELOC ||
> +	     !reservation_object_test_signaled_rcu(obj->resv, true))) {
> +		const unsigned int gen = eb->reloc_cache.gen;
> +		unsigned int len;
> +		u32 *batch;
> +		u64 addr;
> +
> +		if (wide)
> +			len = offset & 7 ? 8 : 5;
> +		else if (gen >= 4)
> +			len = 4;
> +		else if (gen >= 3)
> +			len = 3;
> +		else /* On gen2 MI_STORE_DWORD_IMM uses a physical address */
> +			goto repeat;
> +
> +		batch = reloc_gpu(eb, vma, len);
> +		if (IS_ERR(batch))
> +			goto repeat;
> +
> +		addr = gen8_canonical_addr(vma->node.start + offset);
> +		if (wide) {
> +			if (offset & 7) {
> +				*batch++ = MI_STORE_DWORD_IMM_GEN4;

This indent level calls for a new function. __relocate_entry_gpu

Couldn't we share some of this STORE IMM code we have around? I don't
want to crawl the specs every time :(

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2017-03-17 11:15 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-16 13:19 Make execbuf fast and green Chris Wilson
2017-03-16 13:19 ` [PATCH 01/15] drm/i915: Copy user requested buffers into the error state Chris Wilson
2017-03-16 13:19 ` [PATCH 02/15] drm/i915: Retire an active batch pool object rather than allocate new Chris Wilson
2017-03-17  8:52   ` Joonas Lahtinen
2017-03-17  9:02     ` Chris Wilson
2017-03-16 13:19 ` [PATCH 03/15] drm/i915: Amalgamate execbuffer parameter structures Chris Wilson
2017-03-17 10:04   ` Joonas Lahtinen
2017-03-16 13:19 ` [PATCH 04/15] drm/i915: Use vma->exec_entry as our double-entry placeholder Chris Wilson
2017-03-16 13:19 ` [PATCH 05/15] drm/i915: Split vma exec_link/evict_link Chris Wilson
2017-03-16 13:19 ` [PATCH 06/15] drm/i915: Stop using obj->obj_exec_link outside of execbuf Chris Wilson
2017-03-16 13:19 ` [PATCH 07/15] drm/i915: Store a direct lookup from object handle to vma Chris Wilson
2017-03-22  9:33   ` [PATCH v2] " Chris Wilson
2017-03-22 14:22     ` Joonas Lahtinen
2017-03-23 14:23       ` Daniel Vetter
2017-03-23 14:40         ` Chris Wilson
2017-03-16 13:19 ` [PATCH 08/15] drm/i915: Pass vma to relocate entry Chris Wilson
2017-03-21 14:17   ` Joonas Lahtinen
2017-03-23 13:02     ` [PATCH] " Chris Wilson
2017-03-16 13:20 ` [PATCH 09/15] drm/i915: Eliminate lots of iterations over the execobjects array Chris Wilson
2017-03-16 13:20 ` [PATCH 10/15] drm/i915: First try the previous execbuffer location Chris Wilson
2017-03-21 13:54   ` Joonas Lahtinen
2017-03-16 13:20 ` [PATCH 11/15] drm/i915: Wait upon userptr get-user-pages within execbuffer Chris Wilson
2017-03-16 13:20 ` [PATCH 12/15] drm/i915: Remove superfluous i915_add_request_no_flush() helper Chris Wilson
2017-03-17 11:17   ` Joonas Lahtinen
2017-03-17 13:06     ` Chris Wilson
2017-03-16 13:20 ` [PATCH 13/15] drm/i915: Allow execbuffer to use the first object as the batch Chris Wilson
2017-03-17 11:15   ` Joonas Lahtinen
2017-07-07 10:17     ` Daniel Vetter
2017-07-07 11:58       ` Chris Wilson
2017-07-10  5:55         ` Daniel Vetter
2017-07-08  3:54       ` Kenneth Graunke
2017-03-16 13:20 ` [PATCH 14/15] drm/i915: Async GPU relocation processing Chris Wilson
2017-03-17 11:11   ` [PATCH v2] " Chris Wilson
2017-03-17 11:15   ` Joonas Lahtinen [this message]
2017-03-16 13:20 ` [PATCH 15/15] drm/i915/scheduler: Support user-defined priorities Chris Wilson
2017-03-16 13:57 ` ✗ Fi.CI.BAT: warning for series starting with [01/15] drm/i915: Copy user requested buffers into the error state Patchwork
2017-03-17 11:28 ` ✓ Fi.CI.BAT: success for series starting with [01/15] drm/i915: Copy user requested buffers into the error state (rev2) Patchwork
2017-03-22  9:43 ` ✗ Fi.CI.BAT: failure for series starting with [01/15] drm/i915: Copy user requested buffers into the error state (rev3) Patchwork
2017-03-23 13:51 ` ✗ Fi.CI.BAT: failure for series starting with [01/15] drm/i915: Copy user requested buffers into the error state (rev4) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1489749333.2892.11.camel@linux.intel.com \
    --to=joonas.lahtinen@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.