From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH 08/10] drm/i915: Replace the complex flushing logic with simple invalidate/flush all
Date: Fri, 20 Jul 2012 12:41:06 +0100 [thread overview]
Message-ID: <1342784468-20474-9-git-send-email-chris@chris-wilson.co.uk> (raw)
In-Reply-To: <1342784468-20474-1-git-send-email-chris@chris-wilson.co.uk>
Now that we unconditionally flush and invalidate between every batch
buffer, we no longer need the complex logic to decide which domains
require flushing. Remove it and rejoice.
One side-effect is that this also removes the broken wait-on-pending-flip
logic.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 248 ++--------------------------
1 file changed, 12 insertions(+), 236 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 86aead4..b8d4dc3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -34,180 +34,6 @@
#include "intel_drv.h"
#include <linux/dma_remapping.h>
-struct change_domains {
- uint32_t invalidate_domains;
- uint32_t flush_domains;
- uint32_t flush_rings;
- uint32_t flips;
-};
-
-/*
- * Set the next domain for the specified object. This
- * may not actually perform the necessary flushing/invaliding though,
- * as that may want to be batched with other set_domain operations
- *
- * This is (we hope) the only really tricky part of gem. The goal
- * is fairly simple -- track which caches hold bits of the object
- * and make sure they remain coherent. A few concrete examples may
- * help to explain how it works. For shorthand, we use the notation
- * (read_domains, write_domain), e.g. (CPU, CPU) to indicate the
- * a pair of read and write domain masks.
- *
- * Case 1: the batch buffer
- *
- * 1. Allocated
- * 2. Written by CPU
- * 3. Mapped to GTT
- * 4. Read by GPU
- * 5. Unmapped from GTT
- * 6. Freed
- *
- * Let's take these a step at a time
- *
- * 1. Allocated
- * Pages allocated from the kernel may still have
- * cache contents, so we set them to (CPU, CPU) always.
- * 2. Written by CPU (using pwrite)
- * The pwrite function calls set_domain (CPU, CPU) and
- * this function does nothing (as nothing changes)
- * 3. Mapped by GTT
- * This function asserts that the object is not
- * currently in any GPU-based read or write domains
- * 4. Read by GPU
- * i915_gem_execbuffer calls set_domain (COMMAND, 0).
- * As write_domain is zero, this function adds in the
- * current read domains (CPU+COMMAND, 0).
- * flush_domains is set to CPU.
- * invalidate_domains is set to COMMAND
- * clflush is run to get data out of the CPU caches
- * then i915_dev_set_domain calls i915_gem_flush to
- * emit an MI_FLUSH and drm_agp_chipset_flush
- * 5. Unmapped from GTT
- * i915_gem_object_unbind calls set_domain (CPU, CPU)
- * flush_domains and invalidate_domains end up both zero
- * so no flushing/invalidating happens
- * 6. Freed
- * yay, done
- *
- * Case 2: The shared render buffer
- *
- * 1. Allocated
- * 2. Mapped to GTT
- * 3. Read/written by GPU
- * 4. set_domain to (CPU,CPU)
- * 5. Read/written by CPU
- * 6. Read/written by GPU
- *
- * 1. Allocated
- * Same as last example, (CPU, CPU)
- * 2. Mapped to GTT
- * Nothing changes (assertions find that it is not in the GPU)
- * 3. Read/written by GPU
- * execbuffer calls set_domain (RENDER, RENDER)
- * flush_domains gets CPU
- * invalidate_domains gets GPU
- * clflush (obj)
- * MI_FLUSH and drm_agp_chipset_flush
- * 4. set_domain (CPU, CPU)
- * flush_domains gets GPU
- * invalidate_domains gets CPU
- * wait_rendering (obj) to make sure all drawing is complete.
- * This will include an MI_FLUSH to get the data from GPU
- * to memory
- * clflush (obj) to invalidate the CPU cache
- * Another MI_FLUSH in i915_gem_flush (eliminate this somehow?)
- * 5. Read/written by CPU
- * cache lines are loaded and dirtied
- * 6. Read written by GPU
- * Same as last GPU access
- *
- * Case 3: The constant buffer
- *
- * 1. Allocated
- * 2. Written by CPU
- * 3. Read by GPU
- * 4. Updated (written) by CPU again
- * 5. Read by GPU
- *
- * 1. Allocated
- * (CPU, CPU)
- * 2. Written by CPU
- * (CPU, CPU)
- * 3. Read by GPU
- * (CPU+RENDER, 0)
- * flush_domains = CPU
- * invalidate_domains = RENDER
- * clflush (obj)
- * MI_FLUSH
- * drm_agp_chipset_flush
- * 4. Updated (written) by CPU again
- * (CPU, CPU)
- * flush_domains = 0 (no previous write domain)
- * invalidate_domains = 0 (no new read domains)
- * 5. Read by GPU
- * (CPU+RENDER, 0)
- * flush_domains = CPU
- * invalidate_domains = RENDER
- * clflush (obj)
- * MI_FLUSH
- * drm_agp_chipset_flush
- */
-static void
-i915_gem_object_set_to_gpu_domain(struct drm_i915_gem_object *obj,
- struct intel_ring_buffer *ring,
- struct change_domains *cd)
-{
- uint32_t invalidate_domains = 0, flush_domains = 0;
-
- /*
- * If the object isn't moving to a new write domain,
- * let the object stay in multiple read domains
- */
- if (obj->base.pending_write_domain == 0)
- obj->base.pending_read_domains |= obj->base.read_domains;
-
- /*
- * Flush the current write domain if
- * the new read domains don't match. Invalidate
- * any read domains which differ from the old
- * write domain
- */
- if (obj->base.write_domain &&
- (((obj->base.write_domain != obj->base.pending_read_domains ||
- obj->ring != ring)) ||
- (obj->fenced_gpu_access && !obj->pending_fenced_gpu_access))) {
- flush_domains |= obj->base.write_domain;
- invalidate_domains |=
- obj->base.pending_read_domains & ~obj->base.write_domain;
- }
- /*
- * Invalidate any read caches which may have
- * stale data. That is, any new read domains.
- */
- invalidate_domains |= obj->base.pending_read_domains & ~obj->base.read_domains;
- if ((flush_domains | invalidate_domains) & I915_GEM_DOMAIN_CPU)
- i915_gem_clflush_object(obj);
-
- if (obj->base.pending_write_domain)
- cd->flips |= atomic_read(&obj->pending_flip);
-
- /* The actual obj->write_domain will be updated with
- * pending_write_domain after we emit the accumulated flush for all
- * of our domain changes in execbuffers (which clears objects'
- * write_domains). So if we have a current write domain that we
- * aren't changing, set pending_write_domain to that.
- */
- if (flush_domains == 0 && obj->base.pending_write_domain == 0)
- obj->base.pending_write_domain = obj->base.write_domain;
-
- cd->invalidate_domains |= invalidate_domains;
- cd->flush_domains |= flush_domains;
- if (flush_domains & I915_GEM_GPU_DOMAINS)
- cd->flush_rings |= intel_ring_flag(obj->ring);
- if (invalidate_domains & I915_GEM_GPU_DOMAINS)
- cd->flush_rings |= intel_ring_flag(ring);
-}
-
struct eb_objects {
int and;
struct hlist_head buckets[0];
@@ -810,81 +636,31 @@ err:
return ret;
}
-static void
-i915_gem_execbuffer_flush(struct drm_device *dev,
- uint32_t invalidate_domains,
- uint32_t flush_domains)
-{
- if (flush_domains & I915_GEM_DOMAIN_CPU)
- intel_gtt_chipset_flush();
-
- if (flush_domains & I915_GEM_DOMAIN_GTT)
- wmb();
-}
-
-static int
-i915_gem_execbuffer_wait_for_flips(struct intel_ring_buffer *ring, u32 flips)
-{
- u32 plane, flip_mask;
- int ret;
-
- /* Check for any pending flips. As we only maintain a flip queue depth
- * of 1, we can simply insert a WAIT for the next display flip prior
- * to executing the batch and avoid stalling the CPU.
- */
-
- for (plane = 0; flips >> plane; plane++) {
- if (((flips >> plane) & 1) == 0)
- continue;
-
- if (plane)
- flip_mask = MI_WAIT_FOR_PLANE_B_FLIP;
- else
- flip_mask = MI_WAIT_FOR_PLANE_A_FLIP;
-
- ret = intel_ring_begin(ring, 2);
- if (ret)
- return ret;
-
- intel_ring_emit(ring, MI_WAIT_FOR_EVENT | flip_mask);
- intel_ring_emit(ring, MI_NOOP);
- intel_ring_advance(ring);
- }
-
- return 0;
-}
-
-
static int
i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
struct list_head *objects)
{
struct drm_i915_gem_object *obj;
- struct change_domains cd;
+ uint32_t flush_domains = 0;
int ret;
- memset(&cd, 0, sizeof(cd));
- list_for_each_entry(obj, objects, exec_list)
- i915_gem_object_set_to_gpu_domain(obj, ring, &cd);
-
- if (cd.invalidate_domains | cd.flush_domains) {
- i915_gem_execbuffer_flush(ring->dev,
- cd.invalidate_domains,
- cd.flush_domains);
- }
-
- if (cd.flips) {
- ret = i915_gem_execbuffer_wait_for_flips(ring, cd.flips);
- if (ret)
- return ret;
- }
-
list_for_each_entry(obj, objects, exec_list) {
ret = i915_gem_object_sync(obj, ring);
if (ret)
return ret;
+
+ if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
+ i915_gem_clflush_object(obj);
+
+ flush_domains |= obj->base.write_domain;
}
+ if (flush_domains & I915_GEM_DOMAIN_CPU)
+ intel_gtt_chipset_flush();
+
+ if (flush_domains & I915_GEM_DOMAIN_GTT)
+ wmb();
+
/* Unconditionally invalidate gpu caches and ensure that we do flush
* any residual writes from the previous batch.
*/
--
1.7.10.4
next prev parent reply other threads:[~2012-07-20 11:42 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-20 11:40 Remove the flushing list (v2) Chris Wilson
2012-07-20 11:40 ` [PATCH 01/10] drm/i915: Allow late allocation of request for i915_add_request() Chris Wilson
2012-07-20 11:41 ` [PATCH 02/10] drm/i915: Remove assertion over write domain after i915_gem_object_sync() Chris Wilson
2012-07-20 11:41 ` [PATCH 03/10] drm/i915: Replace the pending_gpu_write flag with an explicit seqno Chris Wilson
2012-07-20 11:41 ` [PATCH 04/10] drm/i915: Remove the defunct flushing list Chris Wilson
2012-07-20 11:41 ` [PATCH 05/10] drm/i915: Remove the per-ring write list Chris Wilson
2012-07-20 11:41 ` [PATCH 06/10] drm/i915: Remove explicit flush from i915_gem_object_flush_fence() Chris Wilson
2012-07-20 11:41 ` [PATCH 07/10] drm/i915: Remove the explicit flush of the GPU write domain Chris Wilson
2012-07-20 11:41 ` Chris Wilson [this message]
2012-07-20 11:41 ` [PATCH 09/10] drm/i915: Clear the pending_gpu_fenced_access flag at the start of execbuffer Chris Wilson
2012-07-20 11:41 ` [PATCH 10/10] drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs Chris Wilson
2012-07-21 10:49 ` Daniel Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1342784468-20474-9-git-send-email-chris@chris-wilson.co.uk \
--to=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).