* i915 next, post-llc
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

So continuing the -next discussion, I have a few more patches for review.
1-8: The last set of post-llc patches, which I think are close to being
     accepted. I need to debate the finer points of fence lifetime with
     Daniel and capture the explanations so that we don't repeat our
     mistakes.
9,10: A couple of patches from Mike Isely for configuring the LVDS in the
      absence of any configuration data. Next on the agenda in this
      particular series is being able to specify the lvds fixed mode.
      Watch this space!
11-13: Minor tweaks to improve code clarity
14,15: Reset the GMBUS controller on init, after first renaming GMBUS0+reg_offset.
16: Completely minor performance improvement (in the 2-3% range for
    glyphs). More importantly, the patch helps clarify the code.
17: A micro-optimisation for request retirement that I can live without.
18,19: Use GPU semaphores to prevent stalls when switching rings for
       page-flipping
20,21: vmap
-Chris


* [PATCH 01/21] drm/i915: Cache GT fifo count for SandyBridge
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

The read back of the available FIFO entries is vital for system
stability, but extremely costly. However, we only need a guide so as to
avoid eating into the reserved entries, and since we are the only
consumer we can cache the last count read and simply decrement it for
each subsequent write.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.c |   15 ++++++++++-----
 drivers/gpu/drm/i915/i915_drv.h |    1 +
 drivers/gpu/drm/i915/i915_reg.h |    1 +
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index c416c1d..5d0d28c 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -287,12 +287,17 @@ void __gen6_gt_force_wake_put(struct drm_i915_private *dev_priv)
 
 void __gen6_gt_wait_for_fifo(struct drm_i915_private *dev_priv)
 {
-	int loop = 500;
-	u32 fifo = I915_READ_NOTRACE(GT_FIFO_FREE_ENTRIES);
-	while (fifo < 20 && loop--) {
-		udelay(10);
-		fifo = I915_READ_NOTRACE(GT_FIFO_FREE_ENTRIES);
+	if (dev_priv->gt_fifo_count < GT_FIFO_NUM_RESERVED_ENTRIES) {
+		int loop = 500;
+		u32 fifo = I915_READ_NOTRACE(GT_FIFO_FREE_ENTRIES);
+		while (fifo <= GT_FIFO_NUM_RESERVED_ENTRIES && loop--) {
+			udelay(10);
+			fifo = I915_READ_NOTRACE(GT_FIFO_FREE_ENTRIES);
+		}
+		WARN_ON(loop < 0 && fifo <= GT_FIFO_NUM_RESERVED_ENTRIES);
+		dev_priv->gt_fifo_count = fifo;
 	}
+	dev_priv->gt_fifo_count--;
 }
 
 static int i915_drm_freeze(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 759045a..9b22f11 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -268,6 +268,7 @@ typedef struct drm_i915_private {
 	int relative_constants_mode;
 
 	void __iomem *regs;
+	u32 gt_fifo_count;
 
 	struct intel_gmbus {
 		struct i2c_adapter adapter;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index f39ac3a..e6bfe94 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3329,6 +3329,7 @@
 #define  FORCEWAKE_ACK				0x130090
 
 #define  GT_FIFO_FREE_ENTRIES			0x120008
+#define    GT_FIFO_NUM_RESERVED_ENTRIES		20
 
 #define GEN6_RPNSWREQ				0xA008
 #define   GEN6_TURBO_DISABLE			(1<<31)
-- 
1.7.4.1
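
For context, a sketch of the pattern this enables on the write side; the
helper below is hypothetical (the real plumbing into the register write
macros is not part of this patch), but every tracked GT write is assumed
to consume one FIFO entry, matching the decrement above:

	/* Illustrative only: wait (cheaply, against the cached count)
	 * before each posted write; __gen6_gt_wait_for_fifo() consumes
	 * one cached entry per call.
	 */
	static void gen6_posting_write(struct drm_i915_private *dev_priv,
				       u32 reg, u32 val)
	{
		__gen6_gt_wait_for_fifo(dev_priv);
		I915_WRITE_NOTRACE(reg, val);
	}

In the common case this is a single decrement of gt_fifo_count instead
of an uncached register read.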


* [PATCH 02/21] drm/i915: Refactor pwrite/pread to use single copy of get_user_pages
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

Replace the three nearly identical copies of the code with a single
function. And take advantage of the opportunity to do some
micro-optimisation: avoid the vmalloc if at all possible and also avoid
dropping the lock unless we are forced to acquire the mm semaphore.

Measuring the impact of the optimisations, it turns out they are not so
micro after all. Running x11perf -aa10text on PineView:

  before 1.28 Mglyph/sec
  after  1.45 Mglyph/sec

(Glyphs, in general and on PineView in particular, are one of the very
few pwrite rate-limited workloads.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Paul Menzel <paulepanter@users.sourceforge.net>
---
 drivers/gpu/drm/i915/i915_gem.c |  149 ++++++++++++++++++++++++---------------
 1 files changed, 92 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 57bc775..140d324 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -257,6 +257,73 @@ static int i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_object *obj)
 		obj->tiling_mode != I915_TILING_NONE;
 }
 
+/**
+ * Magically retrieves the pages for the user addr whilst holding the
+ * dev->struct_mutex.
+ *
+ * Since we cannot take the mm semaphore whilst holding our dev->struct_mutex,
+ * due to the pre-existing lock dependency established by i915_gem_fault(),
+ * we have to perform some sleight-of-hand.
+ *
+ * First, we try the lockless variant of get_user_pages(),
+ * __get_user_pages_fast(), whilst continuing to hold the mutex. If that
+ * fails to get all the user pages, then we have no choice but to acquire
+ * the mm semaphore, thus dropping the lock on dev->struct_mutex to do so.
+ * The dev->struct_mutex is then re-acquired before we return.
+ *
+ * Returns: an error code *and* the number of user pages acquired. Even
+ * on an error, you must iterate over the returned pages and release them.
+ */
+static int
+i915_gem_get_user_pages(struct drm_device *dev,
+			unsigned long addr,
+			bool write,
+			int *num_pages,
+			struct page ***pages_out)
+{
+	struct page **pages;
+	int pinned, ret;
+	int n = *num_pages;
+
+	pages = kmalloc(n*sizeof(struct page *),
+			GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	if (pages == NULL) {
+		pages = drm_malloc_ab(n, sizeof(struct page *));
+		if (pages == NULL) {
+			*pages_out = NULL;
+			*num_pages = 0;
+			return -ENOMEM;
+		}
+	}
+
+	pinned = __get_user_pages_fast(addr, n, write, pages);
+	if (pinned < n) {
+		struct mm_struct *mm = current->mm;
+
+		mutex_unlock(&dev->struct_mutex);
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm,
+				     addr + (pinned << PAGE_SHIFT),
+				     n - pinned,
+				     write, 0,
+				     pages + pinned,
+				     NULL);
+		up_read(&mm->mmap_sem);
+		mutex_lock(&dev->struct_mutex);
+		if (ret > 0)
+			pinned += ret;
+	}
+
+	ret = 0;
+	if (pinned < n)
+		ret = -EFAULT;
+
+	*num_pages = pinned;
+	*pages_out = pages;
+	return ret;
+}
+
 static inline void
 slow_shmem_copy(struct page *dst_page,
 		int dst_offset,
@@ -398,11 +465,11 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 			  struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	struct mm_struct *mm = current->mm;
 	struct page **user_pages;
 	ssize_t remain;
-	loff_t offset, pinned_pages, i;
-	loff_t first_data_page, last_data_page, num_pages;
+	loff_t offset;
+	loff_t first_data_page, last_data_page;
+	int num_pages, i;
 	int shmem_page_offset;
 	int data_page_index, data_page_offset;
 	int page_length;
@@ -420,20 +487,10 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 1, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
+	ret = i915_gem_get_user_pages(dev, data_ptr, true,
+				      &num_pages, &user_pages);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_gem_object_set_cpu_read_domain_range(obj,
 							args->offset,
@@ -494,7 +551,7 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 	}
 
 out:
-	for (i = 0; i < pinned_pages; i++) {
+	for (i = 0; i < num_pages; i++) {
 		SetPageDirty(user_pages[i]);
 		mark_page_accessed(user_pages[i]);
 		page_cache_release(user_pages[i]);
@@ -679,10 +736,9 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	ssize_t remain;
 	loff_t gtt_page_base, offset;
-	loff_t first_data_page, last_data_page, num_pages;
-	loff_t pinned_pages, i;
+	loff_t first_data_page, last_data_page;
+	int num_pages, i;
 	struct page **user_pages;
-	struct mm_struct *mm = current->mm;
 	int gtt_page_offset, data_page_offset, data_page_index, page_length;
 	int ret;
 	uint64_t data_ptr = args->data_ptr;
@@ -697,28 +753,18 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 0, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
-		goto out_unpin_pages;
-	}
+	ret = i915_gem_get_user_pages(dev, data_ptr, false,
+				      &num_pages, &user_pages);
+	if (ret)
+		goto out;
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, true);
 	if (ret)
-		goto out_unpin_pages;
+		goto out;
 
 	ret = i915_gem_object_put_fence(obj);
 	if (ret)
-		goto out_unpin_pages;
+		goto out;
 
 	offset = obj->gtt_offset + args->offset;
 
@@ -753,8 +799,8 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 		data_ptr += page_length;
 	}
 
-out_unpin_pages:
-	for (i = 0; i < pinned_pages; i++)
+out:
+	for (i = 0; i < num_pages; i++)
 		page_cache_release(user_pages[i]);
 	drm_free_large(user_pages);
 
@@ -803,11 +849,11 @@ i915_gem_shmem_pwrite_fast(struct drm_device *dev,
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 
-		vaddr = kmap_atomic(page, KM_USER0);
+		vaddr = kmap_atomic(page);
 		ret = __copy_from_user_inatomic(vaddr + page_offset,
 						user_data,
 						page_length);
-		kunmap_atomic(vaddr, KM_USER0);
+		kunmap_atomic(vaddr);
 
 		set_page_dirty(page);
 		mark_page_accessed(page);
@@ -842,11 +888,10 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 			   struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	struct mm_struct *mm = current->mm;
 	struct page **user_pages;
 	ssize_t remain;
-	loff_t offset, pinned_pages, i;
-	loff_t first_data_page, last_data_page, num_pages;
+	loff_t first_data_page, last_data_page, offset;
+	int num_pages, i;
 	int shmem_page_offset;
 	int data_page_index,  data_page_offset;
 	int page_length;
@@ -864,20 +909,10 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 0, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
+	ret = i915_gem_get_user_pages(dev, data_ptr, false,
+				      &num_pages, &user_pages);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
 	if (ret)
@@ -940,7 +975,7 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	}
 
 out:
-	for (i = 0; i < pinned_pages; i++)
+	for (i = 0; i < num_pages; i++)
 		page_cache_release(user_pages[i]);
 	drm_free_large(user_pages);
 
-- 
1.7.4.1
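
A minimal sketch of the calling convention the new helper imposes (names
taken from the paths above): the pinned pages must be walked and released
even when the helper returns an error, since *num_pages is rewritten to
the count actually acquired:

	struct page **user_pages;
	int i, ret, num_pages = last_data_page - first_data_page + 1;

	ret = i915_gem_get_user_pages(dev, data_ptr, false,
				      &num_pages, &user_pages);
	if (ret == 0) {
		/* ... copy to/from the pinned pages ... */
	}

	for (i = 0; i < num_pages; i++)
		page_cache_release(user_pages[i]);
	drm_free_large(user_pages);

On -ENOMEM the helper returns num_pages == 0 and a NULL array, so the
cleanup above remains safe (drm_free_large(NULL) is a no-op).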


* [PATCH 03/21] drm/i915: s/addr & ~PAGE_MASK/offset_in_page(addr)/
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

Convert our open coded offset_in_page() to the common macro.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |   21 ++++++++++-----------
 1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 140d324..6b6bde0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -423,7 +423,7 @@ i915_gem_shmem_pread_fast(struct drm_device *dev,
 		 * page_offset = offset within page
 		 * page_length = bytes to copy for this page
 		 */
-		page_offset = offset & (PAGE_SIZE-1);
+		page_offset = offset_in_page(offset);
 		page_length = remain;
 		if ((page_offset + remain) > PAGE_SIZE)
 			page_length = PAGE_SIZE - page_offset;
@@ -512,9 +512,9 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 		 * data_page_offset = offset with data_page_index page.
 		 * page_length = bytes to copy for this page
 		 */
-		shmem_page_offset = offset & ~PAGE_MASK;
+		shmem_page_offset = offset_in_page(offset);
 		data_page_index = data_ptr / PAGE_SIZE - first_data_page;
-		data_page_offset = data_ptr & ~PAGE_MASK;
+		data_page_offset = offset_in_page(data_ptr);
 
 		page_length = remain;
 		if ((shmem_page_offset + page_length) > PAGE_SIZE)
@@ -697,8 +697,8 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 		 * page_offset = offset within page
 		 * page_length = bytes to copy for this page
 		 */
-		page_base = (offset & ~(PAGE_SIZE-1));
-		page_offset = offset & (PAGE_SIZE-1);
+		page_base = offset & PAGE_MASK;
+		page_offset = offset_in_page(offset);
 		page_length = remain;
 		if ((page_offset + remain) > PAGE_SIZE)
 			page_length = PAGE_SIZE - page_offset;
@@ -709,7 +709,6 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 		 */
 		if (fast_user_write(dev_priv->mm.gtt_mapping, page_base,
 				    page_offset, user_data, page_length))
-
 			return -EFAULT;
 
 		remain -= page_length;
@@ -778,9 +777,9 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 		 * page_length = bytes to copy for this page
 		 */
 		gtt_page_base = offset & PAGE_MASK;
-		gtt_page_offset = offset & ~PAGE_MASK;
+		gtt_page_offset = offset_in_page(offset);
 		data_page_index = data_ptr / PAGE_SIZE - first_data_page;
-		data_page_offset = data_ptr & ~PAGE_MASK;
+		data_page_offset = offset_in_page(data_ptr);
 
 		page_length = remain;
 		if ((gtt_page_offset + page_length) > PAGE_SIZE)
@@ -839,7 +838,7 @@ i915_gem_shmem_pwrite_fast(struct drm_device *dev,
 		 * page_offset = offset within page
 		 * page_length = bytes to copy for this page
 		 */
-		page_offset = offset & (PAGE_SIZE-1);
+		page_offset = offset_in_page(offset);
 		page_length = remain;
 		if ((page_offset + remain) > PAGE_SIZE)
 			page_length = PAGE_SIZE - page_offset;
@@ -933,9 +932,9 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 		 * data_page_offset = offset with data_page_index page.
 		 * page_length = bytes to copy for this page
 		 */
-		shmem_page_offset = offset & ~PAGE_MASK;
+		shmem_page_offset = offset_in_page(offset);
 		data_page_index = data_ptr / PAGE_SIZE - first_data_page;
-		data_page_offset = data_ptr & ~PAGE_MASK;
+		data_page_offset = offset_in_page(data_ptr);
 
 		page_length = remain;
 		if ((shmem_page_offset + page_length) > PAGE_SIZE)
-- 
1.7.4.1
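
For reference, the common macro (from <linux/mm.h>) is

	#define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)

so the open-coded spellings above all compute the same value: with 4 KiB
pages PAGE_MASK is ~(PAGE_SIZE-1), hence for addr = 0x12345 both
addr & ~PAGE_MASK and addr & (PAGE_SIZE-1) yield 0x345.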


* [PATCH 04/21] drm/i915: Maintain fenced gpu access until we flush the fence
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

We only want an execbuffer to mark the transition from unfenced to
fenced GPU access; clearing the flag must wait until we have flushed
any pending writes through the fence before updating the register.

In applying this fix for a corruption bug, we do lose the ability to
detect the earliest end of GPU fenced access, thus disabling the inherent
optimization.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 20a4cc5..a07911f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -911,7 +911,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *objects,
 
 		obj->base.read_domains = obj->base.pending_read_domains;
 		obj->base.write_domain = obj->base.pending_write_domain;
-		obj->fenced_gpu_access = obj->pending_fenced_gpu_access;
+		obj->fenced_gpu_access |= obj->pending_fenced_gpu_access;
 
 		i915_gem_object_move_to_active(obj, ring, seqno);
 		if (obj->base.write_domain) {
-- 
1.7.4.1
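
The latching semantics, illustrated (these lines are not driver code,
just a sketch of two batches touching one bo):

	obj->fenced_gpu_access |= 1;	/* batch 1: fenced access, write
					 * may still be in flight */
	obj->fenced_gpu_access |= 0;	/* batch 2: unfenced reuse; with
					 * '=' this would have cleared the
					 * flag and lost the pending fenced
					 * write from batch 1 */

Only the fence flush path may now clear the flag.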


* [PATCH 05/21] drm/i915: Invalidate fenced read domains upon flush
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

Whenever we finish reading an object through a fence, for safety we
clear any GPU read domain and so invalidate any TLBs associated with
the fenced region upon its next use. As we now always flush writes
through an existing fence before it is released and then trigger the
invalidation of the GPU domains should we ever re-use it again on the
GPU, we no longer need to compare and force the invalidation if the
fenced access changes in move_to_gpu().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c            |    2 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6b6bde0..0ccd3ab 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2536,6 +2536,8 @@ i915_gem_object_flush_fence(struct drm_i915_gem_object *obj,
 				return ret;
 		}
 
+		/* Invalidate the GPU TLBs for any future reads */
+		obj->base.read_domains &= ~I915_GEM_GPU_DOMAINS;
 		obj->fenced_gpu_access = false;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a07911f..0010aee 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -172,9 +172,8 @@ i915_gem_object_set_to_gpu_domain(struct drm_i915_gem_object *obj,
 	 * write domain
 	 */
 	if (obj->base.write_domain &&
-	    (((obj->base.write_domain != obj->base.pending_read_domains ||
-	       obj->ring != ring)) ||
-	     (obj->fenced_gpu_access && !obj->pending_fenced_gpu_access))) {
+	    (obj->base.write_domain != obj->base.pending_read_domains ||
+	     obj->ring != ring)) {
 		flush_domains |= obj->base.write_domain;
 		invalidate_domains |=
 			obj->base.pending_read_domains & ~obj->base.write_domain;
-- 
1.7.4.1


* [PATCH 06/21] drm/i915: Pass the fence register number to be written
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

This simplifies a later change where we want to successfully write (or
pipeline) the fence update prior to updating the bo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |   61 ++++++++++++++++++--------------------
 drivers/gpu/drm/i915/i915_reg.h |    1 +
 2 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0ccd3ab..8c835de 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2349,12 +2349,12 @@ i915_gpu_idle(struct drm_device *dev)
 }
 
 static int sandybridge_write_fence_reg(struct drm_i915_gem_object *obj,
-				       struct intel_ring_buffer *pipelined)
+				       struct intel_ring_buffer *pipelined,
+				       int reg)
 {
 	struct drm_device *dev = obj->base.dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	u32 size = obj->gtt_space->size;
-	int regnum = obj->fence_reg;
 	uint64_t val;
 
 	val = (uint64_t)((obj->gtt_offset + size - 4096) &
@@ -2374,24 +2374,24 @@ static int sandybridge_write_fence_reg(struct drm_i915_gem_object *obj,
 
 		intel_ring_emit(pipelined, MI_NOOP);
 		intel_ring_emit(pipelined, MI_LOAD_REGISTER_IMM(2));
-		intel_ring_emit(pipelined, FENCE_REG_SANDYBRIDGE_0 + regnum*8);
+		intel_ring_emit(pipelined, FENCE_REG_SANDYBRIDGE_0 + reg*8);
 		intel_ring_emit(pipelined, (u32)val);
-		intel_ring_emit(pipelined, FENCE_REG_SANDYBRIDGE_0 + regnum*8 + 4);
+		intel_ring_emit(pipelined, FENCE_REG_SANDYBRIDGE_0 + reg*8 + 4);
 		intel_ring_emit(pipelined, (u32)(val >> 32));
 		intel_ring_advance(pipelined);
 	} else
-		I915_WRITE64(FENCE_REG_SANDYBRIDGE_0 + regnum * 8, val);
+		I915_WRITE64(FENCE_REG_SANDYBRIDGE_0 + reg * 8, val);
 
 	return 0;
 }
 
 static int i965_write_fence_reg(struct drm_i915_gem_object *obj,
-				struct intel_ring_buffer *pipelined)
+				struct intel_ring_buffer *pipelined,
+				int reg)
 {
 	struct drm_device *dev = obj->base.dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	u32 size = obj->gtt_space->size;
-	int regnum = obj->fence_reg;
 	uint64_t val;
 
 	val = (uint64_t)((obj->gtt_offset + size - 4096) &
@@ -2409,25 +2409,25 @@ static int i965_write_fence_reg(struct drm_i915_gem_object *obj,
 
 		intel_ring_emit(pipelined, MI_NOOP);
 		intel_ring_emit(pipelined, MI_LOAD_REGISTER_IMM(2));
-		intel_ring_emit(pipelined, FENCE_REG_965_0 + regnum*8);
+		intel_ring_emit(pipelined, FENCE_REG_965_0 + reg*8);
 		intel_ring_emit(pipelined, (u32)val);
-		intel_ring_emit(pipelined, FENCE_REG_965_0 + regnum*8 + 4);
+		intel_ring_emit(pipelined, FENCE_REG_965_0 + reg*8 + 4);
 		intel_ring_emit(pipelined, (u32)(val >> 32));
 		intel_ring_advance(pipelined);
 	} else
-		I915_WRITE64(FENCE_REG_965_0 + regnum * 8, val);
+		I915_WRITE64(FENCE_REG_965_0 + reg * 8, val);
 
 	return 0;
 }
 
 static int i915_write_fence_reg(struct drm_i915_gem_object *obj,
-				struct intel_ring_buffer *pipelined)
+				struct intel_ring_buffer *pipelined,
+				int reg)
 {
 	struct drm_device *dev = obj->base.dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	u32 size = obj->gtt_space->size;
-	u32 fence_reg, val, pitch_val;
-	int tile_width;
+	int tile_width, val;
 
 	if (WARN((obj->gtt_offset & ~I915_FENCE_START_MASK) ||
 		 (size & -size) != size ||
@@ -2441,22 +2441,17 @@ static int i915_write_fence_reg(struct drm_i915_gem_object *obj,
 	else
 		tile_width = 512;
 
-	/* Note: pitch better be a power of two tile widths */
-	pitch_val = obj->stride / tile_width;
-	pitch_val = ffs(pitch_val) - 1;
-
 	val = obj->gtt_offset;
 	if (obj->tiling_mode == I915_TILING_Y)
 		val |= 1 << I830_FENCE_TILING_Y_SHIFT;
 	val |= I915_FENCE_SIZE_BITS(size);
-	val |= pitch_val << I830_FENCE_PITCH_SHIFT;
+	val |= I915_FENCE_PITCH_BITS(obj->stride / tile_width);
 	val |= I830_FENCE_REG_VALID;
 
-	fence_reg = obj->fence_reg;
-	if (fence_reg < 8)
-		fence_reg = FENCE_REG_830_0 + fence_reg * 4;
+	if (reg < 8)
+		reg = FENCE_REG_830_0 + reg * 4;
 	else
-		fence_reg = FENCE_REG_945_8 + (fence_reg - 8) * 4;
+		reg = FENCE_REG_945_8 + (reg - 8) * 4;
 
 	if (pipelined) {
 		int ret = intel_ring_begin(pipelined, 4);
@@ -2465,22 +2460,22 @@ static int i915_write_fence_reg(struct drm_i915_gem_object *obj,
 
 		intel_ring_emit(pipelined, MI_NOOP);
 		intel_ring_emit(pipelined, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit(pipelined, fence_reg);
+		intel_ring_emit(pipelined, reg);
 		intel_ring_emit(pipelined, val);
 		intel_ring_advance(pipelined);
 	} else
-		I915_WRITE(fence_reg, val);
+		I915_WRITE(reg, val);
 
 	return 0;
 }
 
 static int i830_write_fence_reg(struct drm_i915_gem_object *obj,
-				struct intel_ring_buffer *pipelined)
+				struct intel_ring_buffer *pipelined,
+				int reg)
 {
 	struct drm_device *dev = obj->base.dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	u32 size = obj->gtt_space->size;
-	int regnum = obj->fence_reg;
 	uint32_t val;
 	uint32_t pitch_val;
 
@@ -2508,11 +2503,11 @@ static int i830_write_fence_reg(struct drm_i915_gem_object *obj,
 
 		intel_ring_emit(pipelined, MI_NOOP);
 		intel_ring_emit(pipelined, MI_LOAD_REGISTER_IMM(1));
-		intel_ring_emit(pipelined, FENCE_REG_830_0 + regnum*4);
+		intel_ring_emit(pipelined, FENCE_REG_830_0 + reg*4);
 		intel_ring_emit(pipelined, val);
 		intel_ring_advance(pipelined);
 	} else
-		I915_WRITE(FENCE_REG_830_0 + regnum * 4, val);
+		I915_WRITE(FENCE_REG_830_0 + reg * 4, val);
 
 	return 0;
 }
@@ -2653,6 +2648,7 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_fence_reg *reg;
+	int regnum;
 	int ret;
 
 	/* XXX disable pipelining. There are bugs. Shocking. */
@@ -2748,19 +2744,20 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj,
 
 update:
 	obj->tiling_changed = false;
+	regnum = reg - dev_priv->fence_regs;
 	switch (INTEL_INFO(dev)->gen) {
 	case 6:
-		ret = sandybridge_write_fence_reg(obj, pipelined);
+		ret = sandybridge_write_fence_reg(obj, pipelined, regnum);
 		break;
 	case 5:
 	case 4:
-		ret = i965_write_fence_reg(obj, pipelined);
+		ret = i965_write_fence_reg(obj, pipelined, regnum);
 		break;
 	case 3:
-		ret = i915_write_fence_reg(obj, pipelined);
+		ret = i915_write_fence_reg(obj, pipelined, regnum);
 		break;
 	case 2:
-		ret = i830_write_fence_reg(obj, pipelined);
+		ret = i830_write_fence_reg(obj, pipelined, regnum);
 		break;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e6bfe94..8848411 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -264,6 +264,7 @@
 
 #define   I915_FENCE_START_MASK		0x0ff00000
 #define   I915_FENCE_SIZE_BITS(size)	((ffs((size) >> 20) - 1) << 8)
+#define   I915_FENCE_PITCH_BITS(stride)	((ffs(stride) - 1) << I830_FENCE_PITCH_SHIFT)
 
 #define FENCE_REG_965_0			0x03000
 #define   I965_FENCE_PITCH_SHIFT	2
-- 
1.7.4.1
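
A worked example of the new pitch helper (values illustrative): the
macro's argument is the stride expressed in tile widths, not bytes, so
for an X-tiled bo with a 2048-byte stride and the pre-965 tile width of
512 bytes:

	I915_FENCE_PITCH_BITS(2048 / 512)
		= (ffs(4) - 1) << I830_FENCE_PITCH_SHIFT
		= 2 << I830_FENCE_PITCH_SHIFT

i.e. the field encodes log2 of the pitch in tile widths, which is why
the pitch must be a power-of-two number of tile widths - the rule the
old open-coded version documented in its now-deleted comment.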


* [PATCH 07/21] drm/i915: Track fence setup separately from fenced object lifetime
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx; +Cc: Andy Whitcroft, Daniel Vetter

This fixes a bookkeeping error causing an OOPS whilst waiting for an
object to finish using a fence. Now we can simply wait for the fence to
be written independent of the objects currently inhabiting it (past,
present and future).

A large amount of the change is to delay updating the information about
the fence on the bo until after we successfully write, or queue the write to,
the register. This avoids the complication of undoing a partial change
should we fail in pipelining the change.

Cc: Andy Whitcroft <apw@canonical.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h |    1 +
 drivers/gpu/drm/i915/i915_gem.c |  155 ++++++++++++++++++++-------------------
 2 files changed, 82 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9b22f11..0296967 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -129,6 +129,7 @@ struct drm_i915_master_private {
 struct drm_i915_fence_reg {
 	struct list_head lru_list;
 	struct drm_i915_gem_object *obj;
+	struct intel_ring_buffer *setup_ring;
 	uint32_t setup_seqno;
 };
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8c835de..b9515ac 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1730,6 +1730,8 @@ i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
 	i915_gem_object_move_off_active(obj);
 	obj->fenced_gpu_access = false;
 
+	obj->last_fenced_seqno = 0;
+
 	obj->active = 0;
 	obj->pending_gpu_write = false;
 	drm_gem_object_unreference(&obj->base);
@@ -1895,7 +1897,6 @@ static void i915_gem_reset_fences(struct drm_device *dev)
 		reg->obj->fence_reg = I915_FENCE_REG_NONE;
 		reg->obj->fenced_gpu_access = false;
 		reg->obj->last_fenced_seqno = 0;
-		reg->obj->last_fenced_ring = NULL;
 		i915_gem_clear_fence_reg(dev, reg);
 	}
 }
@@ -2525,7 +2526,7 @@ i915_gem_object_flush_fence(struct drm_i915_gem_object *obj,
 
 	if (obj->fenced_gpu_access) {
 		if (obj->base.write_domain & I915_GEM_GPU_DOMAINS) {
-			ret = i915_gem_flush_ring(obj->last_fenced_ring,
+			ret = i915_gem_flush_ring(obj->ring,
 						  0, obj->base.write_domain);
 			if (ret)
 				return ret;
@@ -2536,17 +2537,22 @@ i915_gem_object_flush_fence(struct drm_i915_gem_object *obj,
 		obj->fenced_gpu_access = false;
 	}
 
+	if (obj->last_fenced_seqno &&
+	    ring_passed_seqno(obj->last_fenced_ring, obj->last_fenced_seqno))
+		obj->last_fenced_seqno = 0;
+
 	if (obj->last_fenced_seqno && pipelined != obj->last_fenced_ring) {
-		if (!ring_passed_seqno(obj->last_fenced_ring,
-				       obj->last_fenced_seqno)) {
-			ret = i915_wait_request(obj->last_fenced_ring,
-						obj->last_fenced_seqno);
-			if (ret)
-				return ret;
-		}
+		ret = i915_wait_request(obj->last_fenced_ring,
+					obj->last_fenced_seqno);
+		if (ret)
+			return ret;
 
+		/* Since last_fenced_seqno can retire much earlier than
+		 * last_rendering_seqno, we track that here for efficiency.
+		 * (With a catch-all in move_to_inactive() to prevent very
+		 * old seqno from lying around.)
+		 */
 		obj->last_fenced_seqno = 0;
-		obj->last_fenced_ring = NULL;
 	}
 
 	/* Ensure that all CPU reads are completed before installing a fence
@@ -2613,7 +2619,7 @@ i915_find_fence_reg(struct drm_device *dev,
 			first = reg;
 
 		if (!pipelined ||
-		    !reg->obj->last_fenced_ring ||
+		    !reg->obj->last_fenced_seqno ||
 		    reg->obj->last_fenced_ring == pipelined) {
 			avail = reg;
 			break;
@@ -2630,7 +2636,6 @@ i915_find_fence_reg(struct drm_device *dev,
  * i915_gem_object_get_fence - set up a fence reg for an object
  * @obj: object to map through a fence reg
  * @pipelined: ring on which to queue the change, or NULL for CPU access
- * @interruptible: must we wait uninterruptibly for the register to retire?
  *
  * When mapping objects through the GTT, userspace wants to be able to write
  * to them without having to worry about swizzling if the object is tiled.
@@ -2638,6 +2643,10 @@ i915_find_fence_reg(struct drm_device *dev,
  * This function walks the fence regs looking for a free one for @obj,
  * stealing one if it can't find any.
  *
+ * Note: if two fence registers point to the same or overlapping memory region
+ * the results are undefined. This is even more fun with asynchronous updates
+ * via the GPU!
+ *
  * It then sets up the reg based on the object's properties: address, pitch
  * and tiling format.
  */
@@ -2648,6 +2657,7 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_fence_reg *reg;
+	struct drm_i915_gem_object *old = NULL;
 	int regnum;
 	int ret;
 
@@ -2657,45 +2667,21 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj,
 	/* Just update our place in the LRU if our fence is getting reused. */
 	if (obj->fence_reg != I915_FENCE_REG_NONE) {
 		reg = &dev_priv->fence_regs[obj->fence_reg];
-		list_move_tail(&reg->lru_list, &dev_priv->mm.fence_list);
-
-		if (obj->tiling_changed) {
-			ret = i915_gem_object_flush_fence(obj, pipelined);
-			if (ret)
-				return ret;
-
-			if (!obj->fenced_gpu_access && !obj->last_fenced_seqno)
-				pipelined = NULL;
-
-			if (pipelined) {
-				reg->setup_seqno =
-					i915_gem_next_request_seqno(pipelined);
-				obj->last_fenced_seqno = reg->setup_seqno;
-				obj->last_fenced_ring = pipelined;
-			}
 
+		if (obj->tiling_changed)
 			goto update;
-		}
-
-		if (!pipelined) {
-			if (reg->setup_seqno) {
-				if (!ring_passed_seqno(obj->last_fenced_ring,
-						       reg->setup_seqno)) {
-					ret = i915_wait_request(obj->last_fenced_ring,
-								reg->setup_seqno);
-					if (ret)
-						return ret;
-				}
 
-				reg->setup_seqno = 0;
-			}
-		} else if (obj->last_fenced_ring &&
-			   obj->last_fenced_ring != pipelined) {
-			ret = i915_gem_object_flush_fence(obj, pipelined);
+		if (reg->setup_seqno && pipelined != reg->setup_ring) {
+			ret = i915_wait_request(reg->setup_ring,
+						reg->setup_seqno);
 			if (ret)
 				return ret;
+
+			reg->setup_ring = NULL;
+			reg->setup_seqno = 0;
 		}
 
+		list_move_tail(&reg->lru_list, &dev_priv->mm.fence_list);
 		return 0;
 	}
 
@@ -2703,47 +2689,43 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj,
 	if (reg == NULL)
 		return -ENOSPC;
 
-	ret = i915_gem_object_flush_fence(obj, pipelined);
-	if (ret)
-		return ret;
-
-	if (reg->obj) {
-		struct drm_i915_gem_object *old = reg->obj;
-
+	if ((old = reg->obj)) {
 		drm_gem_object_reference(&old->base);
 
 		if (old->tiling_mode)
 			i915_gem_release_mmap(old);
 
-		ret = i915_gem_object_flush_fence(old, pipelined);
-		if (ret) {
-			drm_gem_object_unreference(&old->base);
-			return ret;
-		}
+		ret = i915_gem_object_flush_fence(old, NULL);
+		if (ret)
+			goto err;
 
-		if (old->last_fenced_seqno == 0 && obj->last_fenced_seqno == 0)
-			pipelined = NULL;
+		/* Mark the fence register as in-use if pipelined */
+		reg->setup_ring = old->last_fenced_ring;
+		reg->setup_seqno = old->last_fenced_seqno;
+	}
 
-		old->fence_reg = I915_FENCE_REG_NONE;
-		old->last_fenced_ring = pipelined;
-		old->last_fenced_seqno =
-			pipelined ? i915_gem_next_request_seqno(pipelined) : 0;
+update:
+	ret = i915_gem_object_flush_fence(obj, pipelined);
+	if (ret)
+		goto err;
 
-		drm_gem_object_unreference(&old->base);
-	} else if (obj->last_fenced_seqno == 0)
-		pipelined = NULL;
+	if (reg->setup_seqno && pipelined != reg->setup_ring) {
+		ret = i915_wait_request(reg->setup_ring,
+					reg->setup_seqno);
+		if (ret)
+			goto err;
 
-	reg->obj = obj;
-	list_move_tail(&reg->lru_list, &dev_priv->mm.fence_list);
-	obj->fence_reg = reg - dev_priv->fence_regs;
-	obj->last_fenced_ring = pipelined;
+		reg->setup_ring = NULL;
+		reg->setup_seqno = 0;
+	}
 
-	reg->setup_seqno =
-		pipelined ? i915_gem_next_request_seqno(pipelined) : 0;
-	obj->last_fenced_seqno = reg->setup_seqno;
+	/* If we had a pipelined request, but there is no pending GPU access or
+	 * update to a fence register for this memory region, we can write
+	 * the new fence register immediately.
+	 */
+	if (obj->last_fenced_seqno == 0 && reg->setup_seqno == 0)
+		pipelined = NULL;
 
-update:
-	obj->tiling_changed = false;
 	regnum = reg - dev_priv->fence_regs;
 	switch (INTEL_INFO(dev)->gen) {
 	case 6:
@@ -2760,7 +2742,31 @@ update:
 		ret = i830_write_fence_reg(obj, pipelined, regnum);
 		break;
 	}
+	if (ret)
+		goto err;
+
+	if (pipelined) {
+		reg->setup_seqno = i915_gem_next_request_seqno(pipelined);
+		reg->setup_ring = pipelined;
+		if (old) {
+			old->last_fenced_ring = pipelined;
+			old->last_fenced_seqno = reg->setup_seqno;
+		}
+	}
+
+	if (old) {
+		old->fence_reg = I915_FENCE_REG_NONE;
+		drm_gem_object_unreference(&old->base);
+	}
+
+	list_move_tail(&reg->lru_list, &dev_priv->mm.fence_list);
+	reg->obj = obj;
+	obj->fence_reg = regnum;
+	obj->tiling_changed = false;
+	return 0;
 
+err:
+	if (old)
+		drm_gem_object_unreference(&old->base);
 	return ret;
 }
 
@@ -2799,6 +2805,7 @@ i915_gem_clear_fence_reg(struct drm_device *dev,
 
 	list_del_init(&reg->lru_list);
 	reg->obj = NULL;
+	reg->setup_ring = NULL;
 	reg->setup_seqno = 0;
 }
 
-- 
1.7.4.1
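
The key invariant, restated as a sketch (lifted from the two hunks above
rather than new code): a fence register may have an update still in
flight on reg->setup_ring, and any user on a different ring - or the
CPU, i.e. pipelined == NULL - must wait for that seqno before touching
the register again:

	if (reg->setup_seqno && pipelined != reg->setup_ring) {
		ret = i915_wait_request(reg->setup_ring, reg->setup_seqno);
		if (ret)
			return ret;
		reg->setup_ring = NULL;
		reg->setup_seqno = 0;
	}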


* [PATCH 08/21] drm/i915: Only print out the actual number of fences for i915_error_state
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx; +Cc: Daniel Vetter

From: Daniel Vetter <daniel.vetter@ffwll.ch>

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 52d2306..4c72407 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -776,7 +776,7 @@ static int i915_error_state(struct seq_file *m, void *unused)
 	seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm);
 	seq_printf(m, "  seqno: 0x%08x\n", error->seqno);
 
-	for (i = 0; i < 16; i++)
+	for (i = 0; i < dev_priv->num_fence_regs; i++)
 		seq_printf(m, "  fence[%d] = %08llx\n", i, error->fence[i]);
 
 	if (error->active_bo)
-- 
1.7.4.1


* [PATCH 09/21] drm/i915: Implement direct support for 24 bit LVDS pixel format
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx; +Cc: Mike Isely

From: Mike Isely <isely@isely.net>

LVDS digital data can be formatted as 18 bits/pixel or one of two
24 bits/pixel possibilities.  All are incompatible with one another,
so the GPU's LVDS output has to be configured to match the format
expected by the display device.

In many (most) cases this is generally not a problem because the LVDS
device is integral to the processor board (e.g. a laptop) and thus the
video BIOS already sets this up correctly as part of the boot
process.  Then the i915 drm simply works with what has been already
set up.

But there are cases where the LVDS-driven display and the GPU are
discrete components - this can happen in embedded environments where
the processor board is a COTS device with its own BIOS and the display
is added to it later.  In that situation the video BIOS on the
processor board can't know anything about the LVDS display which leads
to problems if the pixel format is not 18 bit (usually 18 bit is the
default).

This patch implements a new kernel option for the i915 kernel module:
"lvds_24bit".  The default value of zero preserves the previous "don't
change anything" behavior.  If it is set to "1" or "2" then 24 bit
format is enabled - the choice between "1" and "2" selects the
particular 24 bit format.  If it is set to "3" then the 24 bit format is
specifically disabled, which should be an extremely rare case but is
included for completeness.

There was a similar patch back in 2008 to support 24 bit LVDS with the
user-mode xorg intel driver, using a new driver option in the
xorg.conf file.  However that patch was a casualty of the move to
kernel mode setting.  This patch implements the same sort of
solution, just now it's in the kernel drm driver for i915 driven GPUs.

Signed-off-by: Mike Isely <isely@pobox.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c      |    4 ++
 drivers/gpu/drm/i915/i915_drv.h      |    1 +
 drivers/gpu/drm/i915/i915_reg.h      |    8 ++++-
 drivers/gpu/drm/i915/intel_display.c |   52 ++++++++++++++++++++++++++++-----
 4 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5d0d28c..49d38b0 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -67,6 +67,10 @@ module_param_named(vbt_sdvo_panel_type, i915_vbt_sdvo_panel_type, int, 0600);
 static bool i915_try_reset = true;
 module_param_named(reset, i915_try_reset, bool, 0600);
 
+unsigned int i915_lvds_24bit = 0;
+module_param_named(lvds_24bit, i915_lvds_24bit, int, 0600);
+MODULE_PARM_DESC(lvds_24bit, "LVDS 24 bit pixel format: 0=leave untouched (default), 1=24 bit '2.0' format, 2=24 bit '2.1' format, 3=force older 18 bit format");
+
 static struct drm_driver driver;
 extern int intel_agp_enabled;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0296967..7cd63bb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -981,6 +981,7 @@ extern unsigned int i915_lvds_downclock;
 extern unsigned int i915_panel_use_ssc;
 extern int i915_vbt_sdvo_panel_type;
 extern unsigned int i915_enable_rc6;
+extern unsigned int i915_lvds_24bit;
 
 extern int i915_suspend(struct drm_device *dev, pm_message_t state);
 extern int i915_resume(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 8848411..5194499 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1453,7 +1453,13 @@
 /* LVDS sync polarity flags. Set to invert (i.e. negative) */
 #define   LVDS_VSYNC_POLARITY		(1 << 21)
 #define   LVDS_HSYNC_POLARITY		(1 << 20)
-
+/*
+ * Selects between .0 and .1 formats:
+ *
+ * 0 = 1x18.0, 2x18.0, 1x24.0 or 2x24.0
+ * 1 = 1x24.1 or 2x24.1
+ */
+#define   LVDS_DATA_FORMAT_DOT_ONE	(1 << 24)
 /* Enable border for unscaled (or aspect-scaled) display */
 #define   LVDS_BORDER_ENABLE		(1 << 15)
 /*
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 62f9e52..a21d3666 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4224,6 +4224,44 @@ static inline bool intel_panel_use_ssc(struct drm_i915_private *dev_priv)
 	return dev_priv->lvds_use_ssc && i915_panel_use_ssc;
 }
 
+static void intel_lvds_select_pixel_format(u32 *val)
+{
+	/* Control the output pixel format. */
+	switch (i915_lvds_24bit) {
+	default:
+	case 0:
+		/* 0 means don't mess with 18 vs 24 bit LVDS pixel
+		 * format.  Instead we trust whatever the video
+		 * BIOS should have done to set up the panel.
+		 * This is normally the safest choice since most
+		 * LVDS-connected panels are integral to the
+		 * system and thus the video BIOS knows how to set
+		 * it up appropriately. */
+		break;
+	case 1:
+		/* Enable 24 bit pixel mode using the "2.0" format */
+		*val |= LVDS_A3_POWER_UP;
+		*val &= ~LVDS_DATA_FORMAT_DOT_ONE;
+		break;
+	case 2:
+		/* Enable 24 bit pixel mode using the "2.1"
+		 * format; this choice is equivalent to the
+		 * LVDS24BitMode option in the old pre-KMS user
+		 * space driver. */
+		*val |= LVDS_A3_POWER_UP;
+		*val |= LVDS_DATA_FORMAT_DOT_ONE;
+		break;
+	case 3:
+		/* Enable 18 bit pixel mode - this should be a
+		   very rare case since this is usually the
+		   power-up mode if the video BIOS didn't set
+		   things up.  But it's here for completeness. */
+		*val &= ~LVDS_A3_POWER_UP;
+		*val &= ~LVDS_DATA_FORMAT_DOT_ONE;
+		break;
+	}
+}
+
 static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
 			      struct drm_display_mode *mode,
 			      struct drm_display_mode *adjusted_mode,
@@ -4478,10 +4516,8 @@ static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
 		else
 			temp &= ~(LVDS_B0B3_POWER_UP | LVDS_CLKB_POWER_UP);
 
-		/* It would be nice to set 24 vs 18-bit mode (LVDS_A3_POWER_UP)
-		 * appropriately here, but we need to look more thoroughly into how
-		 * panels behave in the two modes.
-		 */
+		intel_lvds_select_pixel_format(&temp);
+
 		/* set the dithering flag on LVDS as needed */
 		if (INTEL_INFO(dev)->gen >= 4) {
 			if (dev_priv->lvds_dither)
@@ -4505,6 +4541,7 @@ static int i9xx_crtc_mode_set(struct drm_crtc *crtc,
 			temp &= ~(LVDS_HSYNC_POLARITY | LVDS_VSYNC_POLARITY);
 			temp |= lvds_sync;
 		}
+
 		I915_WRITE(LVDS, temp);
 	}
 
@@ -5000,10 +5037,8 @@ static int ironlake_crtc_mode_set(struct drm_crtc *crtc,
 		else
 			temp &= ~(LVDS_B0B3_POWER_UP | LVDS_CLKB_POWER_UP);
 
-		/* It would be nice to set 24 vs 18-bit mode (LVDS_A3_POWER_UP)
-		 * appropriately here, but we need to look more thoroughly into how
-		 * panels behave in the two modes.
-		 */
+		intel_lvds_select_pixel_format(&temp);
+
 		if (adjusted_mode->flags & DRM_MODE_FLAG_NHSYNC)
 			lvds_sync |= LVDS_HSYNC_POLARITY;
 		if (adjusted_mode->flags & DRM_MODE_FLAG_NVSYNC)
@@ -5020,6 +5055,7 @@ static int ironlake_crtc_mode_set(struct drm_crtc *crtc,
 			temp &= ~(LVDS_HSYNC_POLARITY | LVDS_VSYNC_POLARITY);
 			temp |= lvds_sync;
 		}
+
 		I915_WRITE(PCH_LVDS, temp);
 	}
 
-- 
1.7.4.1
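
Usage is standard module parameter handling, nothing specific to this
patch beyond the parameter name:

	# built-in driver, on the kernel command line:
	i915.lvds_24bit=2

	# modular driver:
	modprobe i915 lvds_24bit=2

With the 0600 permissions above the value is also root-writable via
/sys/module/i915/parameters/lvds_24bit, although as it is only consumed
from the crtc mode_set paths a fresh mode set is needed for a change to
take effect.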


* [PATCH 10/21] drm/i915: Implement manual override of LVDS single/dual channel mode
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

From: Mike Isely <isely@isely.net>

The logic for LVDS setup in the Intel driver needs to know whether the
LVDS port should be in single or dual channel mode when calculating
video timing.  It had been answering this question by probing the
current hardware configuration, under the assumption that the video
BIOS would have already set it up.  But the video BIOS would actually
have to know how to set up the LVDS port for the connected device,
which is a bad assumption if the display device is not integral to the
processor board - a situation that can exist in embedded applications.
This is yet one more case where the Intel driver had been implicitly
relying on the video BIOS for display configuration.

This change creates a new kernel option, lvds_channels, which can be
used to override the probe.  Setting this allows the user to specify
the number of channels in use, avoiding the possibly erroneous probe
of the hardware.  Almost nobody should ever need to touch this option,
and its default value of zero is interpreted to preserve existing
probe-the-hardware behavior.  But if the video BIOS gets this "wrong",
then it can be overridden by setting lvds_channels to 1 or 2,
indicating single or dual channel LVDS mode, respectively.

Signed-off-by: Mike Isely <isely@isely.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c      |    4 ++++
 drivers/gpu/drm/i915/i915_drv.h      |    1 +
 drivers/gpu/drm/i915/intel_display.c |   32 +++++++++++++++++---------------
 3 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 49d38b0..0acc995 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -71,6 +71,10 @@ unsigned int i915_lvds_24bit = 0;
 module_param_named(lvds_24bit, i915_lvds_24bit, int, 0600);
 MODULE_PARM_DESC(lvds_24bit, "LVDS 24 bit pixel format: 0=leave untouched (default), 1=24 bit '2.0' format, 2=24 bit '2.1' format, 3=force older 18 bit format");
 
+unsigned int i915_lvds_channels = 0;
+module_param_named(lvds_channels, i915_lvds_channels, int, 0600);
+MODULE_PARM_DESC(lvds_channels, "LVDS channels in use: 0=(default) probe hardware 1=single 2=dual");
+
 static struct drm_driver driver;
 extern int intel_agp_enabled;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7cd63bb..76e111c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -977,6 +977,7 @@ extern unsigned int i915_fbpercrtc;
 extern int i915_panel_ignore_lid;
 extern unsigned int i915_powersave;
 extern unsigned int i915_semaphores;
+extern unsigned int i915_lvds_channels;
 extern unsigned int i915_lvds_downclock;
 extern unsigned int i915_panel_use_ssc;
 extern int i915_vbt_sdvo_panel_type;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index a21d3666..29b292c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -354,6 +354,17 @@ static const intel_limit_t intel_limits_ironlake_display_port = {
         .find_pll = intel_find_pll_ironlake_dp,
 };
 
+static bool intel_lvds_is_dual_channel_mode(struct drm_i915_private *dev_priv,
+					    int lvds_reg)
+{
+	/* Did the user specify the number of channels? */
+	if (i915_lvds_channels)
+		return i915_lvds_channels == 2;
+
+	/* No, let's probe the current status of the hardware instead. */
+	return (I915_READ(lvds_reg) & LVDS_CLKB_POWER_MASK) == LVDS_CLKB_POWER_UP;
+}
+
 static const intel_limit_t *intel_ironlake_limit(struct drm_crtc *crtc,
 						int refclk)
 {
@@ -362,8 +373,7 @@ static const intel_limit_t *intel_ironlake_limit(struct drm_crtc *crtc,
 	const intel_limit_t *limit;
 
 	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) {
-		if ((I915_READ(PCH_LVDS) & LVDS_CLKB_POWER_MASK) ==
-		    LVDS_CLKB_POWER_UP) {
+		if (intel_lvds_is_dual_channel_mode(dev_priv, PCH_LVDS)) {
 			/* LVDS dual channel */
 			if (refclk == 100000)
 				limit = &intel_limits_ironlake_dual_lvds_100m;
@@ -391,8 +401,7 @@ static const intel_limit_t *intel_g4x_limit(struct drm_crtc *crtc)
 	const intel_limit_t *limit;
 
 	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS)) {
-		if ((I915_READ(LVDS) & LVDS_CLKB_POWER_MASK) ==
-		    LVDS_CLKB_POWER_UP)
+		if (intel_lvds_is_dual_channel_mode(dev_priv, LVDS))
 			/* LVDS with dual channel */
 			limit = &intel_limits_g4x_dual_channel_lvds;
 		else
@@ -523,14 +532,7 @@ intel_find_best_PLL(const intel_limit_t *limit, struct drm_crtc *crtc,
 
 	if (intel_pipe_has_type(crtc, INTEL_OUTPUT_LVDS) &&
 	    (I915_READ(LVDS)) != 0) {
-		/*
-		 * For LVDS, if the panel is on, just rely on its current
-		 * settings for dual-channel.  We haven't figured out how to
-		 * reliably set up different single/dual channel state, if we
-		 * even can.
-		 */
-		if ((I915_READ(LVDS) & LVDS_CLKB_POWER_MASK) ==
-		    LVDS_CLKB_POWER_UP)
+		if (intel_lvds_is_dual_channel_mode(dev_priv, LVDS))
 			clock.p2 = limit->p2.p2_fast;
 		else
 			clock.p2 = limit->p2.p2_slow;
@@ -594,8 +596,8 @@ intel_g4x_find_best_PLL(const intel_limit_t *limit, struct drm_crtc *crtc,
 			lvds_reg = PCH_LVDS;
 		else
 			lvds_reg = LVDS;
-		if ((I915_READ(lvds_reg) & LVDS_CLKB_POWER_MASK) ==
-		    LVDS_CLKB_POWER_UP)
+
+		if (intel_lvds_is_dual_channel_mode(dev_priv, lvds_reg))
 			clock.p2 = limit->p2.p2_fast;
 		else
 			clock.p2 = limit->p2.p2_slow;
@@ -4913,7 +4915,7 @@ static int ironlake_crtc_mode_set(struct drm_crtc *crtc,
 	if (is_lvds) {
 		if ((intel_panel_use_ssc(dev_priv) &&
 		     dev_priv->lvds_ssc_freq == 100) ||
-		    (I915_READ(PCH_LVDS) & LVDS_CLKB_POWER_MASK) == LVDS_CLKB_POWER_UP)
+		    intel_lvds_is_dual_channel_mode(dev_priv, PCH_LVDS))
 			factor = 25;
 	} else if (is_sdvo && is_tv)
 		factor = 20;
-- 
1.7.4.1
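
As with lvds_24bit, this is an ordinary module parameter:

	i915.lvds_channels=2	# force dual-channel
	i915.lvds_channels=1	# force single-channel
	i915.lvds_channels=0	# default: probe LVDS_CLKB_POWER_MASK

and all the pre-existing probe sites now funnel through the single
intel_lvds_is_dual_channel_mode() helper.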


* [PATCH 11/21] drm/i915/tv: Use a direct pointer for tv_mode
From: Chris Wilson @ 2011-04-16  9:17 UTC
  To: intel-gfx

Instead of scanning through the static array using strcmp to find our
matching tv_mode, just keep a direct link around. The historical reason
for the strcmp was to match the connector property "tv format", but now
that property is translated for us into an index by the drm core.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mathew McKernan <matmckernan@rauland.com.au>
---
 drivers/gpu/drm/i915/intel_tv.c |  118 ++++++++++++++++-----------------------
 1 files changed, 48 insertions(+), 70 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_tv.c b/drivers/gpu/drm/i915/intel_tv.c
index 6b22c1d..2f7c084 100644
--- a/drivers/gpu/drm/i915/intel_tv.c
+++ b/drivers/gpu/drm/i915/intel_tv.c
@@ -43,12 +43,45 @@ enum tv_margin {
 	TV_MARGIN_RIGHT, TV_MARGIN_BOTTOM
 };
 
+struct tv_mode {
+	const char *name;
+	int clock;
+	int refresh; /* in millihertz (for precision) */
+	u32 oversample;
+	int hsync_end, hblank_start, hblank_end, htotal;
+	bool progressive, trilevel_sync, component_only;
+	int vsync_start_f1, vsync_start_f2, vsync_len;
+	bool veq_ena;
+	int veq_start_f1, veq_start_f2, veq_len;
+	int vi_end_f1, vi_end_f2, nbr_end;
+	bool burst_ena;
+	int hburst_start, hburst_len;
+	int vburst_start_f1, vburst_end_f1;
+	int vburst_start_f2, vburst_end_f2;
+	int vburst_start_f3, vburst_end_f3;
+	int vburst_start_f4, vburst_end_f4;
+	/*
+	 * subcarrier programming
+	 */
+	int dda2_size, dda3_size, dda1_inc, dda2_inc, dda3_inc;
+	u32 sc_reset;
+	bool pal_burst;
+	/*
+	 * blank/black levels
+	 */
+	const struct video_levels *composite_levels, *svideo_levels;
+	const struct color_conversion *composite_color, *svideo_color;
+	const u32 *filter_table;
+	int max_srcw;
+};
+
+
 /** Private structure for the integrated TV support */
 struct intel_tv {
 	struct intel_encoder base;
 
 	int type;
-	const char *tv_format;
+	const struct tv_mode *mode;
 	int margin[4];
 	u32 save_TV_H_CTL_1;
 	u32 save_TV_H_CTL_2;
@@ -349,39 +382,6 @@ static const struct video_levels component_levels = {
 };
 
 
-struct tv_mode {
-	const char *name;
-	int clock;
-	int refresh; /* in millihertz (for precision) */
-	u32 oversample;
-	int hsync_end, hblank_start, hblank_end, htotal;
-	bool progressive, trilevel_sync, component_only;
-	int vsync_start_f1, vsync_start_f2, vsync_len;
-	bool veq_ena;
-	int veq_start_f1, veq_start_f2, veq_len;
-	int vi_end_f1, vi_end_f2, nbr_end;
-	bool burst_ena;
-	int hburst_start, hburst_len;
-	int vburst_start_f1, vburst_end_f1;
-	int vburst_start_f2, vburst_end_f2;
-	int vburst_start_f3, vburst_end_f3;
-	int vburst_start_f4, vburst_end_f4;
-	/*
-	 * subcarrier programming
-	 */
-	int dda2_size, dda3_size, dda1_inc, dda2_inc, dda3_inc;
-	u32 sc_reset;
-	bool pal_burst;
-	/*
-	 * blank/black levels
-	 */
-	const struct video_levels *composite_levels, *svideo_levels;
-	const struct color_conversion *composite_color, *svideo_color;
-	const u32 *filter_table;
-	int max_srcw;
-};
-
-
 /*
  * Sub carrier DDA
  *
@@ -928,32 +928,12 @@ intel_tv_dpms(struct drm_encoder *encoder, int mode)
 	}
 }
 
-static const struct tv_mode *
-intel_tv_mode_lookup(const char *tv_format)
-{
-	int i;
-
-	for (i = 0; i < sizeof(tv_modes) / sizeof (tv_modes[0]); i++) {
-		const struct tv_mode *tv_mode = &tv_modes[i];
-
-		if (!strcmp(tv_format, tv_mode->name))
-			return tv_mode;
-	}
-	return NULL;
-}
-
-static const struct tv_mode *
-intel_tv_mode_find(struct intel_tv *intel_tv)
-{
-	return intel_tv_mode_lookup(intel_tv->tv_format);
-}
-
 static enum drm_mode_status
 intel_tv_mode_valid(struct drm_connector *connector,
 		    struct drm_display_mode *mode)
 {
 	struct intel_tv *intel_tv = intel_attached_tv(connector);
-	const struct tv_mode *tv_mode = intel_tv_mode_find(intel_tv);
+	const struct tv_mode *tv_mode = intel_tv->mode;
 
 	/* Ensure TV refresh is close to desired refresh */
 	if (tv_mode && abs(tv_mode->refresh - drm_mode_vrefresh(mode) * 1000)
@@ -971,7 +951,7 @@ intel_tv_mode_fixup(struct drm_encoder *encoder, struct drm_display_mode *mode,
 	struct drm_device *dev = encoder->dev;
 	struct drm_mode_config *drm_config = &dev->mode_config;
 	struct intel_tv *intel_tv = enc_to_intel_tv(encoder);
-	const struct tv_mode *tv_mode = intel_tv_mode_find(intel_tv);
+	const struct tv_mode *tv_mode = intel_tv->mode;
 	struct drm_encoder *other_encoder;
 
 	if (!tv_mode)
@@ -997,7 +977,7 @@ intel_tv_mode_set(struct drm_encoder *encoder, struct drm_display_mode *mode,
 	struct drm_crtc *crtc = encoder->crtc;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	struct intel_tv *intel_tv = enc_to_intel_tv(encoder);
-	const struct tv_mode *tv_mode = intel_tv_mode_find(intel_tv);
+	const struct tv_mode *tv_mode = intel_tv->mode;
 	u32 tv_ctl;
 	u32 hctl1, hctl2, hctl3;
 	u32 vctl1, vctl2, vctl3, vctl4, vctl5, vctl6, vctl7;
@@ -1321,23 +1301,21 @@ intel_tv_detect_type (struct intel_tv *intel_tv,
 static void intel_tv_find_better_format(struct drm_connector *connector)
 {
 	struct intel_tv *intel_tv = intel_attached_tv(connector);
-	const struct tv_mode *tv_mode = intel_tv_mode_find(intel_tv);
+	const struct tv_mode *tv_mode = intel_tv->mode;
+	bool is_component_only = intel_tv->type == DRM_MODE_CONNECTOR_Component;
 	int i;
 
-	if ((intel_tv->type == DRM_MODE_CONNECTOR_Component) ==
-		tv_mode->component_only)
+	if (is_component_only == tv_mode->component_only)
 		return;
 
-
-	for (i = 0; i < sizeof(tv_modes) / sizeof(*tv_modes); i++) {
+	for (i = 0; i < ARRAY_SIZE(tv_modes); i++) {
 		tv_mode = tv_modes + i;
 
-		if ((intel_tv->type == DRM_MODE_CONNECTOR_Component) ==
-			tv_mode->component_only)
+		if (is_component_only == tv_mode->component_only)
 			break;
 	}
 
-	intel_tv->tv_format = tv_mode->name;
+	intel_tv->mode = tv_mode;
 	drm_connector_property_set_value(connector,
 		connector->dev->mode_config.tv_mode_property, i);
 }
@@ -1405,7 +1383,7 @@ intel_tv_chose_preferred_modes(struct drm_connector *connector,
 			       struct drm_display_mode *mode_ptr)
 {
 	struct intel_tv *intel_tv = intel_attached_tv(connector);
-	const struct tv_mode *tv_mode = intel_tv_mode_find(intel_tv);
+	const struct tv_mode *tv_mode = intel_tv->mode;
 
 	if (tv_mode->nbr_end < 480 && mode_ptr->vdisplay == 480)
 		mode_ptr->type |= DRM_MODE_TYPE_PREFERRED;
@@ -1430,7 +1408,7 @@ intel_tv_get_modes(struct drm_connector *connector)
 {
 	struct drm_display_mode *mode_ptr;
 	struct intel_tv *intel_tv = intel_attached_tv(connector);
-	const struct tv_mode *tv_mode = intel_tv_mode_find(intel_tv);
+	const struct tv_mode *tv_mode = intel_tv->mode;
 	int j, count = 0;
 	u64 tmp;
 
@@ -1524,10 +1502,10 @@ intel_tv_set_property(struct drm_connector *connector, struct drm_property *prop
 			ret = -EINVAL;
 			goto out;
 		}
-		if (!strcmp(intel_tv->tv_format, tv_modes[val].name))
+		if (intel_tv->mode == tv_modes + val)
 			goto out;
 
-		intel_tv->tv_format = tv_modes[val].name;
+		intel_tv->mode = &tv_modes[val];
 		changed = true;
 	} else {
 		ret = -EINVAL;
@@ -1694,7 +1672,7 @@ intel_tv_init(struct drm_device *dev)
 	intel_tv->margin[TV_MARGIN_RIGHT] = 46;
 	intel_tv->margin[TV_MARGIN_BOTTOM] = 37;
 
-	intel_tv->tv_format = tv_modes[initial_mode].name;
+	intel_tv->mode = &tv_modes[initial_mode];
 
 	drm_encoder_helper_add(&intel_encoder->base, &intel_tv_helper_funcs);
 	drm_connector_helper_add(connector, &intel_tv_connector_helper_funcs);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 12/21] drm/i915: Replace ironlake_compute_wm0 with g4x_compute_wm0
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (10 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 11/21] drm/i915/tv: Use a direct pointer for tv_mode Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16  9:17 ` [PATCH 13/21] drm/i915/crt: Explicitly return false if connected to a digital monitor Chris Wilson
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

The computation of the first-level watermarks for g4x and gen5+ is
based on the same algorithm, so we can refactor those code paths to
use a single function.
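
For reference, the level-0 plane watermark both paths compute follows
the "small buffer" method; a sketch distilled from the function removed
below (the g4x variant it is replaced by is assumed to match, per the
rationale above):

    /* Bytes fetched during the latency window, padded by any TLB-miss
     * allowance, rounded up to whole cachelines plus a guard band. */
    entries = ((clock * pixel_size / 1000) * display_latency_ns) / 1000;
    tlb_miss = display->fifo_size * display->cacheline_size - hdisplay * 8;
    if (tlb_miss > 0)
            entries += tlb_miss;
    entries = DIV_ROUND_UP(entries, display->cacheline_size);
    *plane_wm = entries + display->guard_size;
    if (*plane_wm > (int)display->max_wm)
            *plane_wm = display->max_wm;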

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_display.c |   88 ++++++++--------------------------
 1 files changed, 20 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 29b292c..6b29abc 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3853,54 +3853,6 @@ static void i830_update_wm(struct drm_device *dev)
 #define ILK_LP0_PLANE_LATENCY		700
 #define ILK_LP0_CURSOR_LATENCY		1300
 
-static bool ironlake_compute_wm0(struct drm_device *dev,
-				 int pipe,
-				 const struct intel_watermark_params *display,
-				 int display_latency_ns,
-				 const struct intel_watermark_params *cursor,
-				 int cursor_latency_ns,
-				 int *plane_wm,
-				 int *cursor_wm)
-{
-	struct drm_crtc *crtc;
-	int htotal, hdisplay, clock, pixel_size;
-	int line_time_us, line_count;
-	int entries, tlb_miss;
-
-	crtc = intel_get_crtc_for_pipe(dev, pipe);
-	if (crtc->fb == NULL || !crtc->enabled)
-		return false;
-
-	htotal = crtc->mode.htotal;
-	hdisplay = crtc->mode.hdisplay;
-	clock = crtc->mode.clock;
-	pixel_size = crtc->fb->bits_per_pixel / 8;
-
-	/* Use the small buffer method to calculate plane watermark */
-	entries = ((clock * pixel_size / 1000) * display_latency_ns) / 1000;
-	tlb_miss = display->fifo_size*display->cacheline_size - hdisplay * 8;
-	if (tlb_miss > 0)
-		entries += tlb_miss;
-	entries = DIV_ROUND_UP(entries, display->cacheline_size);
-	*plane_wm = entries + display->guard_size;
-	if (*plane_wm > (int)display->max_wm)
-		*plane_wm = display->max_wm;
-
-	/* Use the large buffer method to calculate cursor watermark */
-	line_time_us = ((htotal * 1000) / clock);
-	line_count = (cursor_latency_ns / line_time_us + 1000) / 1000;
-	entries = line_count * 64 * pixel_size;
-	tlb_miss = cursor->fifo_size*cursor->cacheline_size - hdisplay * 8;
-	if (tlb_miss > 0)
-		entries += tlb_miss;
-	entries = DIV_ROUND_UP(entries, cursor->cacheline_size);
-	*cursor_wm = entries + cursor->guard_size;
-	if (*cursor_wm > (int)cursor->max_wm)
-		*cursor_wm = (int)cursor->max_wm;
-
-	return true;
-}
-
 /*
  * Check the wm result.
  *
@@ -4009,12 +3961,12 @@ static void ironlake_update_wm(struct drm_device *dev)
 	unsigned int enabled;
 
 	enabled = 0;
-	if (ironlake_compute_wm0(dev, 0,
-				 &ironlake_display_wm_info,
-				 ILK_LP0_PLANE_LATENCY,
-				 &ironlake_cursor_wm_info,
-				 ILK_LP0_CURSOR_LATENCY,
-				 &plane_wm, &cursor_wm)) {
+	if (g4x_compute_wm0(dev, 0,
+			    &ironlake_display_wm_info,
+			    ILK_LP0_PLANE_LATENCY,
+			    &ironlake_cursor_wm_info,
+			    ILK_LP0_CURSOR_LATENCY,
+			    &plane_wm, &cursor_wm)) {
 		I915_WRITE(WM0_PIPEA_ILK,
 			   (plane_wm << WM0_PIPE_PLANE_SHIFT) | cursor_wm);
 		DRM_DEBUG_KMS("FIFO watermarks For pipe A -"
@@ -4023,12 +3975,12 @@ static void ironlake_update_wm(struct drm_device *dev)
 		enabled |= 1;
 	}
 
-	if (ironlake_compute_wm0(dev, 1,
-				 &ironlake_display_wm_info,
-				 ILK_LP0_PLANE_LATENCY,
-				 &ironlake_cursor_wm_info,
-				 ILK_LP0_CURSOR_LATENCY,
-				 &plane_wm, &cursor_wm)) {
+	if (g4x_compute_wm0(dev, 1,
+			    &ironlake_display_wm_info,
+			    ILK_LP0_PLANE_LATENCY,
+			    &ironlake_cursor_wm_info,
+			    ILK_LP0_CURSOR_LATENCY,
+			    &plane_wm, &cursor_wm)) {
 		I915_WRITE(WM0_PIPEB_ILK,
 			   (plane_wm << WM0_PIPE_PLANE_SHIFT) | cursor_wm);
 		DRM_DEBUG_KMS("FIFO watermarks For pipe B -"
@@ -4093,10 +4045,10 @@ static void sandybridge_update_wm(struct drm_device *dev)
 	unsigned int enabled;
 
 	enabled = 0;
-	if (ironlake_compute_wm0(dev, 0,
-				 &sandybridge_display_wm_info, latency,
-				 &sandybridge_cursor_wm_info, latency,
-				 &plane_wm, &cursor_wm)) {
+	if (g4x_compute_wm0(dev, 0,
+			    &sandybridge_display_wm_info, latency,
+			    &sandybridge_cursor_wm_info, latency,
+			    &plane_wm, &cursor_wm)) {
 		I915_WRITE(WM0_PIPEA_ILK,
 			   (plane_wm << WM0_PIPE_PLANE_SHIFT) | cursor_wm);
 		DRM_DEBUG_KMS("FIFO watermarks For pipe A -"
@@ -4105,10 +4057,10 @@ static void sandybridge_update_wm(struct drm_device *dev)
 		enabled |= 1;
 	}
 
-	if (ironlake_compute_wm0(dev, 1,
-				 &sandybridge_display_wm_info, latency,
-				 &sandybridge_cursor_wm_info, latency,
-				 &plane_wm, &cursor_wm)) {
+	if (g4x_compute_wm0(dev, 1,
+			    &sandybridge_display_wm_info, latency,
+			    &sandybridge_cursor_wm_info, latency,
+			    &plane_wm, &cursor_wm)) {
 		I915_WRITE(WM0_PIPEB_ILK,
 			   (plane_wm << WM0_PIPE_PLANE_SHIFT) | cursor_wm);
 		DRM_DEBUG_KMS("FIFO watermarks For pipe B -"
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 13/21] drm/i915/crt: Explicitly return false if connected to a digital monitor
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (11 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 12/21] drm/i915: Replace ironlake_compute_wm0 with g4x_compute_wm0 Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16  9:17 ` [PATCH 14/21] drm/i915/i2c: Convert from using GMBUS1 + reg_offset idiom to reg + 0 Chris Wilson
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

Rather than proceeding and silently returning false by default, mention
why we rejected the presence of an EDID as implying the presence of a
VGA monitor. (The question remains whether there is a broken EDID out
there that falsely reports a digital connection when attached by VGA.)
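
Distilled, the DDC probe now reports both outcomes; a sketch of the
resulting logic (the diff below supplies the else branch):

    is_digital = edid->input & DRM_EDID_INPUT_DIGITAL;
    if (!is_digital) {
            DRM_DEBUG_KMS("CRT detected via DDC:0x50 [EDID]\n");
            return true;
    }
    /* Shared DVI-I DDC link: the EDID belongs to the digital half. */
    DRM_DEBUG_KMS("CRT not detected via DDC:0x50 [EDID reports a digital panel]\n");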

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
---
 drivers/gpu/drm/i915/intel_crt.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_crt.c b/drivers/gpu/drm/i915/intel_crt.c
index d03fc05..e17bc6b 100644
--- a/drivers/gpu/drm/i915/intel_crt.c
+++ b/drivers/gpu/drm/i915/intel_crt.c
@@ -288,6 +288,8 @@ static bool intel_crt_detect_ddc(struct drm_connector *connector)
 		 * This may be a DVI-I connector with a shared DDC
 		 * link between analog and digital outputs, so we
 		 * have to check the EDID input spec of the attached device.
+		 *
+		 * On the other hand, what should we do if it is a broken EDID?
 		 */
 		if (edid != NULL) {
 			is_digital = edid->input & DRM_EDID_INPUT_DIGITAL;
@@ -298,6 +300,8 @@ static bool intel_crt_detect_ddc(struct drm_connector *connector)
 		if (!is_digital) {
 			DRM_DEBUG_KMS("CRT detected via DDC:0x50 [EDID]\n");
 			return true;
+		} else {
+			DRM_DEBUG_KMS("CRT not detected via DDC:0x50 [EDID reports a digital panel]\n");
 		}
 	}
 
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 14/21] drm/i915/i2c: Convert from using GMBUS1 + reg_offset idiom to reg + 0
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (12 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 13/21] drm/i915/crt: Explicitly return false if connected to a digital monitor Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16  9:17 ` [PATCH 15/21] drm/i915/gmbus: Reset the controller on initialisation Chris Wilson
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

Keith complained that GMBUSx + reg_offset was ugly. An alternative
naming scheme, more consistent with the rest of the code base, is to
store the address of GMBUS0 and then reference each of the GMBUSx
registers as an offset from GMBUS0.
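
Concretely, the rename inverts the arithmetic; a before/after sketch of
the idiom (bus_cmd is a stand-in value, and the reg + 1 form mirrors
the patch's own addressing of GMBUS1 relative to GMBUS0):

    /* Before: an offset applied to each named register. */
    reg_offset = HAS_PCH_SPLIT(dev) ? PCH_GMBUS0 - GMBUS0 : 0;
    I915_WRITE(GMBUS1 + reg_offset, bus_cmd);

    /* After: keep the base register and index from it. */
    reg = HAS_PCH_SPLIT(dev) ? PCH_GMBUS0 : GMBUS0;
    I915_WRITE(reg + 1, bus_cmd);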

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_i2c.c |   51 +++++++++++++++++++++----------------
 1 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index d3b903b..ed11523 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -51,6 +51,11 @@ struct intel_gpio {
 	u32 reg;
 };
 
+static int intel_gmbus_reg0(struct drm_device *dev)
+{
+	return HAS_PCH_SPLIT(dev) ? PCH_GMBUS0 : GMBUS0;
+}
+
 void
 intel_i2c_reset(struct drm_device *dev)
 {
@@ -232,36 +237,36 @@ gmbus_xfer(struct i2c_adapter *adapter,
 					       struct intel_gmbus,
 					       adapter);
 	struct drm_i915_private *dev_priv = adapter->algo_data;
-	int i, reg_offset;
+	int i, reg;
 
 	if (bus->force_bit)
 		return intel_i2c_quirk_xfer(dev_priv,
 					    bus->force_bit, msgs, num);
 
-	reg_offset = HAS_PCH_SPLIT(dev_priv->dev) ? PCH_GMBUS0 - GMBUS0 : 0;
+	reg = intel_gmbus_reg0(dev_priv->dev);
 
-	I915_WRITE(GMBUS0 + reg_offset, bus->reg0);
+	I915_WRITE(reg + 0, bus->reg0);
 
 	for (i = 0; i < num; i++) {
 		u16 len = msgs[i].len;
 		u8 *buf = msgs[i].buf;
 
 		if (msgs[i].flags & I2C_M_RD) {
-			I915_WRITE(GMBUS1 + reg_offset,
+			I915_WRITE(reg + 1,
 				   GMBUS_CYCLE_WAIT | (i + 1 == num ? GMBUS_CYCLE_STOP : 0) |
 				   (len << GMBUS_BYTE_COUNT_SHIFT) |
 				   (msgs[i].addr << GMBUS_SLAVE_ADDR_SHIFT) |
 				   GMBUS_SLAVE_READ | GMBUS_SW_RDY);
-			POSTING_READ(GMBUS2+reg_offset);
+			POSTING_READ(reg + 2);
 			do {
 				u32 val, loop = 0;
 
-				if (wait_for(I915_READ(GMBUS2 + reg_offset) & (GMBUS_SATOER | GMBUS_HW_RDY), 50))
+				if (wait_for(I915_READ(reg + 2) & (GMBUS_SATOER | GMBUS_HW_RDY), 50))
 					goto timeout;
-				if (I915_READ(GMBUS2 + reg_offset) & GMBUS_SATOER)
+				if (I915_READ(reg + 2) & GMBUS_SATOER)
 					goto clear_err;
 
-				val = I915_READ(GMBUS3 + reg_offset);
+				val = I915_READ(reg + 3);
 				do {
 					*buf++ = val & 0xff;
 					val >>= 8;
@@ -275,18 +280,18 @@ gmbus_xfer(struct i2c_adapter *adapter,
 				val |= *buf++ << (8 * loop);
 			} while (--len && ++loop < 4);
 
-			I915_WRITE(GMBUS3 + reg_offset, val);
-			I915_WRITE(GMBUS1 + reg_offset,
+			I915_WRITE(reg + 3, val);
+			I915_WRITE(reg + 1,
 				   (i + 1 == num ? GMBUS_CYCLE_STOP : GMBUS_CYCLE_WAIT) |
 				   (msgs[i].len << GMBUS_BYTE_COUNT_SHIFT) |
 				   (msgs[i].addr << GMBUS_SLAVE_ADDR_SHIFT) |
 				   GMBUS_SLAVE_WRITE | GMBUS_SW_RDY);
-			POSTING_READ(GMBUS2+reg_offset);
+			POSTING_READ(reg + 2);
 
 			while (len) {
-				if (wait_for(I915_READ(GMBUS2 + reg_offset) & (GMBUS_SATOER | GMBUS_HW_RDY), 50))
+				if (wait_for(I915_READ(reg + 2) & (GMBUS_SATOER | GMBUS_HW_RDY), 50))
 					goto timeout;
-				if (I915_READ(GMBUS2 + reg_offset) & GMBUS_SATOER)
+				if (I915_READ(reg + 2) & GMBUS_SATOER)
 					goto clear_err;
 
 				val = loop = 0;
@@ -294,14 +299,15 @@ gmbus_xfer(struct i2c_adapter *adapter,
 					val |= *buf++ << (8 * loop);
 				} while (--len && ++loop < 4);
 
-				I915_WRITE(GMBUS3 + reg_offset, val);
-				POSTING_READ(GMBUS2+reg_offset);
+				I915_WRITE(reg + 3, val);
+				POSTING_READ(reg + 2);
 			}
 		}
 
-		if (i + 1 < num && wait_for(I915_READ(GMBUS2 + reg_offset) & (GMBUS_SATOER | GMBUS_HW_WAIT_PHASE), 50))
+		if (i + 1 < num &&
+		    wait_for(I915_READ(reg + 2) & (GMBUS_SATOER | GMBUS_HW_WAIT_PHASE), 50))
 			goto timeout;
-		if (I915_READ(GMBUS2 + reg_offset) & GMBUS_SATOER)
+		if (I915_READ(reg + 2) & GMBUS_SATOER)
 			goto clear_err;
 	}
 
@@ -312,22 +318,23 @@ clear_err:
 	 * of resetting the GMBUS controller and so clearing the
 	 * BUS_ERROR raised by the slave's NAK.
 	 */
-	I915_WRITE(GMBUS1 + reg_offset, GMBUS_SW_CLR_INT);
-	I915_WRITE(GMBUS1 + reg_offset, 0);
+	I915_WRITE(reg + 1, GMBUS_SW_CLR_INT);
+	I915_WRITE(reg + 1, 0);
 
 done:
 	/* Mark the GMBUS interface as disabled. We will re-enable it at the
 	 * start of the next xfer, till then let it sleep.
 	 */
-	I915_WRITE(GMBUS0 + reg_offset, 0);
+	I915_WRITE(reg + 0, 0);
 	return i;
 
 timeout:
 	DRM_INFO("GMBUS timed out, falling back to bit banging on pin %d [%s]\n",
 		 bus->reg0 & 0xff, bus->adapter.name);
-	I915_WRITE(GMBUS0 + reg_offset, 0);
+	intel_i2c_reset(dev_priv->dev);
 
-	/* Hardware may not support GMBUS over these pins? Try GPIO bitbanging instead. */
+	/* Hardware may not support GMBUS over these pins?
+	 * Try GPIO bitbanging instead. */
 	bus->force_bit = intel_gpio_create(dev_priv, bus->reg0 & 0xff);
 	if (!bus->force_bit)
 		return -ENOMEM;
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 15/21] drm/i915/gmbus: Reset the controller on initialisation
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (13 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 14/21] drm/i915/i2c: Convert from using GMBUS1 + reg_offset idiom to reg + 0 Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16  9:17 ` [PATCH 16/21] drm/i915: Retire requests before disabling pagefaults Chris Wilson
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

Toggle the Software Clear Interrupt bit, which resets the controller,
in order to clear any prior BUS_ERROR condition before we begin to use
the controller in earnest.

Suggested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_i2c.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index ed11523..bf1c72c 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -60,10 +60,14 @@ void
 intel_i2c_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	if (HAS_PCH_SPLIT(dev))
-		I915_WRITE(PCH_GMBUS0, 0);
-	else
-		I915_WRITE(GMBUS0, 0);
+	int reg = intel_gmbus_reg0(dev);
+
+	/* First reset the controller by toggling the Sw Clear Interrupt. */
+	I915_WRITE(reg + 1, GMBUS_SW_CLR_INT);
+	I915_WRITE(reg + 1, 0);
+
+	/* Then mark the controller as disabled. */
+	I915_WRITE(reg + 0, 0);
 }
 
 static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable)
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 16/21] drm/i915: Retire requests before disabling pagefaults
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (14 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 15/21] drm/i915/gmbus: Reset the controller on initialisation Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16 13:44   ` Daniel Vetter
  2011-04-16  9:17 ` [PATCH 17/21] drm/i915: Repeat retiring of requests until the seqno is stable Chris Wilson
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

As we cannot wait upon an object to be released by the GPU once we have
disabled pagefaults, process any pending retirements first in the hope
of moving any potential relocation targets off the active list.
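
A sketch of the resulting ordering in the relocation fast path
(pagefault_disable()/pagefault_enable() are the standard kernel
primitives; the loop body is abbreviated from the surrounding code):

    /* Flush completed work off the active list while we may still wait. */
    i915_gem_retire_requests(dev);

    pagefault_disable();	/* fast path: no waiting allowed from here */
    list_for_each_entry(obj, objects, exec_list) {
            ret = i915_gem_execbuffer_relocate_object(obj, eb);
            if (ret)
                    break;
    }
    pagefault_enable();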

References: https://bugs.freedesktop.org/show_bug.cgi?id=35733
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0010aee..b6f89f9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -445,6 +445,12 @@ i915_gem_execbuffer_relocate(struct drm_device *dev,
 	struct drm_i915_gem_object *obj;
 	int ret = 0;
 
+	/* Try to move as many of the relocation targets off the active list
+	 * to avoid unnecessary fallbacks to the slow path, as we cannot wait
+	 * for the retirement with pagefaults disabled.
+	 */
+	i915_gem_retire_requests(dev);
+
 	/* This is the fast path and we cannot handle a pagefault whilst
 	 * holding the struct mutex lest the user pass in the relocations
 	 * contained within a mmaped bo. For in such a case we, the page
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 17/21] drm/i915: Repeat retiring of requests until the seqno is stable
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (15 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 16/21] drm/i915: Retire requests before disabling pagefaults Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16 13:45   ` Daniel Vetter
  2011-04-16  9:17 ` [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer Chris Wilson
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

Walking the lists of objects and requests to retire may take a
significant amount of time, long enough for the GPU to have finished
more batches in the meantime. So repeat the list walk until the GPU
seqno is stable.
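
The control flow reduces to a retry loop; a sketch with the two list
walks abstracted into hypothetical helpers:

    do {
            seqno = ring->get_seqno(ring);

            /* hypothetical helpers standing in for the two walks below */
            retire_requests_up_to(ring, seqno);
            retire_active_objects_up_to(ring, seqno);
    } while (seqno != ring->get_seqno(ring));	/* GPU progressed? go again */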

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |   66 ++++++++++++++++++++-------------------
 1 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b9515ac..58e77d6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1952,47 +1952,45 @@ i915_gem_retire_requests_ring(struct intel_ring_buffer *ring)
 
 	WARN_ON(i915_verify_lists(ring->dev));
 
-	seqno = ring->get_seqno(ring);
+	do {
+		seqno = ring->get_seqno(ring);
 
-	for (i = 0; i < ARRAY_SIZE(ring->sync_seqno); i++)
-		if (seqno >= ring->sync_seqno[i])
-			ring->sync_seqno[i] = 0;
+		while (!list_empty(&ring->request_list)) {
+			struct drm_i915_gem_request *request;
 
-	while (!list_empty(&ring->request_list)) {
-		struct drm_i915_gem_request *request;
+			request = list_first_entry(&ring->request_list,
+						   struct drm_i915_gem_request,
+						   list);
 
-		request = list_first_entry(&ring->request_list,
-					   struct drm_i915_gem_request,
-					   list);
-
-		if (!i915_seqno_passed(seqno, request->seqno))
-			break;
+			if (!i915_seqno_passed(seqno, request->seqno))
+				break;
 
-		trace_i915_gem_request_retire(ring, request->seqno);
+			trace_i915_gem_request_retire(ring, request->seqno);
 
-		list_del(&request->list);
-		i915_gem_request_remove_from_client(request);
-		kfree(request);
-	}
+			list_del(&request->list);
+			i915_gem_request_remove_from_client(request);
+			kfree(request);
+		}
 
-	/* Move any buffers on the active list that are no longer referenced
-	 * by the ringbuffer to the flushing/inactive lists as appropriate.
-	 */
-	while (!list_empty(&ring->active_list)) {
-		struct drm_i915_gem_object *obj;
+		/* Move any buffers on the active list that are no longer referenced
+		 * by the ringbuffer to the flushing/inactive lists as appropriate.
+		 */
+		while (!list_empty(&ring->active_list)) {
+			struct drm_i915_gem_object *obj;
 
-		obj= list_first_entry(&ring->active_list,
-				      struct drm_i915_gem_object,
-				      ring_list);
+			obj = list_first_entry(&ring->active_list,
+					       struct drm_i915_gem_object,
+					       ring_list);
 
-		if (!i915_seqno_passed(seqno, obj->last_rendering_seqno))
-			break;
+			if (!i915_seqno_passed(seqno, obj->last_rendering_seqno))
+				break;
 
-		if (obj->base.write_domain != 0)
-			i915_gem_object_move_to_flushing(obj);
-		else
-			i915_gem_object_move_to_inactive(obj);
-	}
+			if (obj->base.write_domain != 0)
+				i915_gem_object_move_to_flushing(obj);
+			else
+				i915_gem_object_move_to_inactive(obj);
+		}
+	} while (seqno != ring->get_seqno(ring));
 
 	if (unlikely(ring->trace_irq_seqno &&
 		     i915_seqno_passed(seqno, ring->trace_irq_seqno))) {
@@ -2000,6 +1998,10 @@ i915_gem_retire_requests_ring(struct intel_ring_buffer *ring)
 		ring->trace_irq_seqno = 0;
 	}
 
+	for (i = 0; i < ARRAY_SIZE(ring->sync_seqno); i++)
+		if (seqno >= ring->sync_seqno[i])
+			ring->sync_seqno[i] = 0;
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (16 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 17/21] drm/i915: Repeat retiring of requests until the seqno is stable Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16 13:54   ` Daniel Vetter
  2011-04-16  9:17 ` [PATCH 19/21] drm/i915: Enable the use of GPU semaphores whilst page-flipping Chris Wilson
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

As we can make use of the ability to insert semaphores to serialise
access to buffers between rings elsewhere, separate the function out
from the execbuffer code and make it generic.
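
This lets other call sites serialise an object onto a ring without an
explicit CPU wait, e.g. (a hypothetical caller, mirroring what the
pageflip path in the following patch does):

    /* Make the target ring wait, via semaphore where supported and a
     * CPU stall otherwise, for any outstanding rendering on obj. */
    ret = i915_gem_object_move_to_ring(obj, ring);
    if (ret)
            return ret;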

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h            |    5 +++
 drivers/gpu/drm/i915/i915_gem.c            |   40 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   43 +---------------------------
 3 files changed, 46 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 76e111c..30fbf3b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -940,6 +940,7 @@ enum intel_chip_family {
 #define HAS_BSD(dev)            (INTEL_INFO(dev)->has_bsd_ring)
 #define HAS_BLT(dev)            (INTEL_INFO(dev)->has_blt_ring)
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
+#define HAS_GPU_SEMAPHORES(dev) (INTEL_INFO(dev)->gen >= 6 && i915_semaphores)
 
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
@@ -1198,6 +1199,10 @@ void i915_gem_detach_phys_object(struct drm_device *dev,
 void i915_gem_free_all_phys_object(struct drm_device *dev);
 void i915_gem_release(struct drm_device *dev, struct drm_file *file);
 
+int __must_check
+i915_gem_object_move_to_ring(struct drm_i915_gem_object *obj,
+			     struct intel_ring_buffer *to);
+
 uint32_t
 i915_gem_get_unfenced_gtt_alignment(struct drm_i915_gem_object *obj);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 58e77d6..801496a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1658,6 +1658,46 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 	obj->pages = NULL;
 }
 
+int
+i915_gem_object_move_to_ring(struct drm_i915_gem_object *obj,
+			     struct intel_ring_buffer *to)
+{
+	struct intel_ring_buffer *from = obj->ring;
+	u32 seqno;
+	int ret, idx;
+
+	if (from == NULL || to == from)
+		return 0;
+
+	if (to == NULL || !HAS_GPU_SEMAPHORES(obj->base.dev))
+		return i915_gem_object_wait_rendering(obj);
+
+	idx = intel_ring_sync_index(from, to);
+
+	seqno = obj->last_rendering_seqno;
+	if (seqno <= from->sync_seqno[idx])
+		return 0;
+
+	if (seqno == from->outstanding_lazy_request) {
+		struct drm_i915_gem_request *request;
+
+		request = kzalloc(sizeof(*request), GFP_KERNEL);
+		if (request == NULL)
+			return -ENOMEM;
+
+		ret = i915_add_request(from, NULL, request);
+		if (ret) {
+			kfree(request);
+			return ret;
+		}
+
+		seqno = request->seqno;
+	}
+
+	from->sync_seqno[idx] = seqno;
+	return intel_ring_sync(to, from, seqno - 1);
+}
+
 void
 i915_gem_object_move_to_active(struct drm_i915_gem_object *obj,
 			       struct intel_ring_buffer *ring,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b6f89f9..316603e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -756,47 +756,6 @@ i915_gem_execbuffer_flush(struct drm_device *dev,
 }
 
 static int
-i915_gem_execbuffer_sync_rings(struct drm_i915_gem_object *obj,
-			       struct intel_ring_buffer *to)
-{
-	struct intel_ring_buffer *from = obj->ring;
-	u32 seqno;
-	int ret, idx;
-
-	if (from == NULL || to == from)
-		return 0;
-
-	/* XXX gpu semaphores are implicated in various hard hangs on SNB */
-	if (INTEL_INFO(obj->base.dev)->gen < 6 || !i915_semaphores)
-		return i915_gem_object_wait_rendering(obj);
-
-	idx = intel_ring_sync_index(from, to);
-
-	seqno = obj->last_rendering_seqno;
-	if (seqno <= from->sync_seqno[idx])
-		return 0;
-
-	if (seqno == from->outstanding_lazy_request) {
-		struct drm_i915_gem_request *request;
-
-		request = kzalloc(sizeof(*request), GFP_KERNEL);
-		if (request == NULL)
-			return -ENOMEM;
-
-		ret = i915_add_request(from, NULL, request);
-		if (ret) {
-			kfree(request);
-			return ret;
-		}
-
-		seqno = request->seqno;
-	}
-
-	from->sync_seqno[idx] = seqno;
-	return intel_ring_sync(to, from, seqno - 1);
-}
-
-static int
 i915_gem_execbuffer_wait_for_flips(struct intel_ring_buffer *ring, u32 flips)
 {
 	u32 plane, flip_mask;
@@ -857,7 +816,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 	}
 
 	list_for_each_entry(obj, objects, exec_list) {
-		ret = i915_gem_execbuffer_sync_rings(obj, ring);
+		ret = i915_gem_object_move_to_ring(obj, ring);
 		if (ret)
 			return ret;
 	}
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 19/21] drm/i915: Enable the use of GPU semaphores whilst page-flipping
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (17 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16 13:58   ` Daniel Vetter
  2011-04-16  9:17 ` [PATCH 20/21] drm/i915: Use a slab for object allocation Chris Wilson
  2011-04-16  9:17 ` [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl Chris Wilson
  20 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

The 2D driver unfortunately uses the BLT ring to perform its blits, and
will often blit between buffers in the course of a pageflip. (Though
usually it is from the post-flipped scanout to the window frontbuffer,
presumably to keep the frontbuffer up to date for 2D access, for
reasons known only to itself.)

As such, being able to use a GPU semaphore and so avoid stalling the
CPU/GPU whilst switching between rings for a pageflip is useful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 801496a..fe45f4e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3217,11 +3217,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	if (pipelined != obj->ring) {
-		ret = i915_gem_object_wait_rendering(obj);
-		if (ret)
-			return ret;
-	}
+	ret = i915_gem_object_move_to_ring(obj, pipelined);
+	if (ret)
+		return ret;
 
 	/* The display engine is not coherent with the LLC cache on gen6.  As
 	 * a result, we make sure that the pinning that is about to occur is
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 20/21] drm/i915: Use a slab for object allocation
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (18 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 19/21] drm/i915: Enable the use of GPU semaphores whilst page-flipping Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-16 14:07   ` Daniel Vetter
  2011-04-16  9:17 ` [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl Chris Wilson
  20 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx

The primary purpose of this was to debug some use-after-free memory
corruption that was causing an OOPS inside drm/i915. As it turned out,
the corruption was being caused elsewhere, and i915.ko, as a major user
of many objects, was simply being hit hardest.

Indeed, as we make frequent use of the generic kmalloc caches,
dedicating a cache to ourselves (or at least having one named for us,
depending upon the slab core) aids the debugging of our own slab usage.
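
The object lifecycle then follows the usual kmem_cache trio; a sketch
(note that a hardened version would also check kmem_cache_create() for
failure, which the patch as posted does not):

    dev_priv->slab = kmem_cache_create("i915_gem_object",
                                       sizeof(struct drm_i915_gem_object),
                                       0, SLAB_HWCACHE_ALIGN, NULL);

    obj = kmem_cache_alloc(dev_priv->slab, GFP_KERNEL | __GFP_ZERO);
    /* ... use obj ... */
    kmem_cache_free(dev_priv->slab, obj);

    kmem_cache_destroy(dev_priv->slab);		/* on driver unload */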

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c |    3 +++
 drivers/gpu/drm/i915/i915_drv.h |    2 ++
 drivers/gpu/drm/i915/i915_gem.c |   12 +++++++++---
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 3b69f38..d8269f3 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -2167,6 +2167,9 @@ int i915_driver_unload(struct drm_device *dev)
 
 	destroy_workqueue(dev_priv->wq);
 
+	if (dev_priv->slab)
+		kmem_cache_destroy(dev_priv->slab);
+
 	pci_dev_put(dev_priv->bridge_dev);
 	kfree(dev->dev_private);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 30fbf3b..a8733ac 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -265,6 +265,8 @@ typedef struct drm_i915_private {
 
 	const struct intel_device_info *info;
 
+	struct kmem_cache *slab;
+
 	int has_gem;
 	int relative_constants_mode;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fe45f4e..f554273 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3785,7 +3785,7 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
 
-	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+	obj = kmem_cache_alloc(dev_priv->slab, GFP_KERNEL | __GFP_ZERO);
 	if (obj == NULL)
 		return NULL;
 
@@ -3860,7 +3860,7 @@ static void i915_gem_free_object_tail(struct drm_i915_gem_object *obj)
 
 	kfree(obj->page_cpu_valid);
 	kfree(obj->bit_17);
-	kfree(obj);
+	kmem_cache_free(dev_priv->slab, obj);
 }
 
 void i915_gem_free_object(struct drm_gem_object *gem_obj)
@@ -4051,8 +4051,14 @@ init_ring_lists(struct intel_ring_buffer *ring)
 void
 i915_gem_load(struct drm_device *dev)
 {
-	int i;
 	drm_i915_private_t *dev_priv = dev->dev_private;
+	int i;
+
+	dev_priv->slab =
+		kmem_cache_create("i915_gem_object",
+				  sizeof(struct drm_i915_gem_object), 0,
+				  SLAB_HWCACHE_ALIGN,
+				  NULL);
 
 	INIT_LIST_HEAD(&dev_priv->mm.active_list);
 	INIT_LIST_HEAD(&dev_priv->mm.flushing_list);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl
  2011-04-16  9:17 i915 next, post-llc Chris Wilson
                   ` (19 preceding siblings ...)
  2011-04-16  9:17 ` [PATCH 20/21] drm/i915: Use a slab for object allocation Chris Wilson
@ 2011-04-16  9:17 ` Chris Wilson
  2011-04-18 14:58   ` Daniel Vetter
  20 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2011-04-16  9:17 UTC (permalink / raw)
  To: intel-gfx; +Cc: Dave Airlie

By exporting the ability to map user addresses, inserting PTEs that
represent their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This
has a number of uses, with zero-copy downloads to the GPU and cheap
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. The ability has widespread implications, from faster
rendering of partial software fallbacks (xterm!) to faster pipelining
of texture data (such as pixel buffer objects in GL).
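
From userspace the new object type is created like any other GEM
resource; a hedged sketch of the intended usage (error handling
elided, libdrm's drmIoctl() wrapper assumed):

    struct drm_i915_gem_vmap vmap;

    memset(&vmap, 0, sizeof(vmap));
    vmap.user_ptr  = (uintptr_t)data;		/* ideally page-aligned */
    vmap.user_size = size;
    vmap.flags     = I915_VMAP_READ_ONLY;	/* e.g. a texture source */

    if (drmIoctl(fd, DRM_IOCTL_I915_GEM_VMAP, &vmap) == 0)
            handle = vmap.handle;	/* GEM handle backed by user pages */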

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@linux.ie>
---
 drivers/gpu/drm/drm_gem.c            |    3 +-
 drivers/gpu/drm/i915/Makefile        |    1 +
 drivers/gpu/drm/i915/i915_dma.c      |    4 +
 drivers/gpu/drm/i915/i915_drv.h      |   33 +++++++-
 drivers/gpu/drm/i915/i915_gem.c      |   77 +++++++++++------
 drivers/gpu/drm/i915/i915_gem_vmap.c |  149 ++++++++++++++++++++++++++++++++++
 include/drm/i915_drm.h               |   16 ++++
 7 files changed, 254 insertions(+), 29 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_vmap.c

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 74e4ff5..03ca40a 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -426,7 +426,8 @@ drm_gem_release(struct drm_device *dev, struct drm_file *file_private)
 void
 drm_gem_object_release(struct drm_gem_object *obj)
 {
-	fput(obj->filp);
+	if (obj->filp)
+		fput(obj->filp);
 }
 EXPORT_SYMBOL(drm_gem_object_release);
 
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 0ae6a7c..0bbc404 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -12,6 +12,7 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o i915_mem.o \
 	  i915_gem_execbuffer.o \
 	  i915_gem_gtt.o \
 	  i915_gem_tiling.o \
+	  i915_gem_vmap.o \
 	  i915_trace_points.o \
 	  intel_display.o \
 	  intel_crt.o \
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index d8269f3..3979ed8 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -782,6 +782,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_RELAXED_DELTA:
 		value = 1;
 		break;
+	case I915_PARAM_HAS_VMAP:
+		value = dev_priv->has_gem;
+		break;
 	default:
 		DRM_DEBUG_DRIVER("Unknown parameter %d\n",
 				 param->param);
@@ -2279,6 +2282,7 @@ struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_OVERLAY_PUT_IMAGE, intel_overlay_put_image, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_OVERLAY_ATTRS, intel_overlay_attrs, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VMAP, i915_gem_vmap_ioctl, DRM_UNLOCKED),
 };
 
 int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a8733ac..90eac1d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -721,6 +721,11 @@ enum i915_cache_level {
 	I915_CACHE_LLC_MLC, /* gen6+ */
 };
 
+struct drm_i915_gem_object_ops {
+	int (*get_pages)(struct drm_i915_gem_object *, gfp_t, u32 *offset);
+	void (*put_pages)(struct drm_i915_gem_object *);
+};
+
 struct drm_i915_gem_object {
 	struct drm_gem_object base;
 
@@ -867,6 +872,18 @@ struct drm_i915_gem_object {
 	atomic_t pending_flip;
 };
 
+struct i915_gem_vmap_object {
+	struct drm_i915_gem_object gem;
+	uintptr_t user_ptr;
+	size_t user_size;
+	int read_only;
+};
+
+union drm_i915_gem_objects {
+	struct drm_i915_gem_object base;
+	struct i915_gem_vmap_object vmap;
+};
+
 #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
 
 /**
@@ -1122,6 +1139,8 @@ int __must_check i915_gem_flush_ring(struct intel_ring_buffer *ring,
 				     uint32_t flush_domains);
 struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
+void i915_gem_object_init(struct drm_i915_gem_object *obj,
+			  const struct drm_i915_gem_object_ops *ops);
 void i915_gem_free_object(struct drm_gem_object *obj);
 int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
 				     uint32_t alignment,
@@ -1143,7 +1162,19 @@ int i915_gem_dumb_create(struct drm_file *file_priv,
 int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev,
 		      uint32_t handle, uint64_t *offset);
 int i915_gem_dumb_destroy(struct drm_file *file_priv, struct drm_device *dev,
-			  uint32_t handle);			  
+			  uint32_t handle);
+
+/* i915_gem_vmap.c */
+int i915_gem_vmap_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *file);
+
+int
+i915_gem_get_user_pages(struct drm_device *dev,
+			unsigned long addr,
+			bool write,
+			int *num_pages,
+			struct page ***pages_out);
+
 /**
  * Returns true if seq1 is later than seq2.
  */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f554273..6cb2331 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -274,7 +274,7 @@ static int i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_object *obj)
  * Returns: an error code *and* the number of user pages acquired. Even
  * on an error, you must iterate over the returned pages and release them.
  */
-static int
+int
 i915_gem_get_user_pages(struct drm_device *dev,
 			unsigned long addr,
 			bool write,
@@ -1585,12 +1585,13 @@ i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data,
 
 static int
 i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj,
-			      gfp_t gfpmask)
+			      gfp_t gfpmask,
+			      u32 *offset)
 {
-	int page_count, i;
 	struct address_space *mapping;
 	struct inode *inode;
 	struct page *page;
+	int i, page_count;
 
 	/* Get the list of pages out of our struct file.  They'll be pinned
 	 * at this point until we release them.
@@ -1618,6 +1619,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj,
 	if (obj->tiling_mode != I915_TILING_NONE)
 		i915_gem_object_do_bit_17_swizzle(obj);
 
+	*offset = 0;
 	return 0;
 
 err_pages:
@@ -1785,6 +1787,9 @@ i915_gem_object_truncate(struct drm_i915_gem_object *obj)
 {
 	struct inode *inode;
 
+	if (obj->base.filp == NULL)
+		return;
+
 	/* Our goal here is to return as much of the memory as
 	 * is possible back to the system as we are called from OOM.
 	 * To do this we must instruct the shmfs to drop all of its
@@ -2269,6 +2274,7 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
 int
 i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 {
+	const struct drm_i915_gem_object_ops *ops = obj->base.driver_private;
 	int ret = 0;
 
 	if (obj->gtt_space == NULL)
@@ -2313,7 +2319,7 @@ i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	trace_i915_gem_object_unbind(obj);
 
 	i915_gem_gtt_unbind_object(obj);
-	i915_gem_object_put_pages_gtt(obj);
+	ops->put_pages(obj);
 
 	list_del_init(&obj->gtt_list);
 	list_del_init(&obj->mm_list);
@@ -2859,11 +2865,14 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 			    unsigned alignment,
 			    bool map_and_fenceable)
 {
+	const struct drm_i915_gem_object_ops *ops = obj->base.driver_private;
 	struct drm_device *dev = obj->base.dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct drm_mm_node *free_space;
 	gfp_t gfpmask = __GFP_NORETRY | __GFP_NOWARN;
-	u32 size, fence_size, fence_alignment, unfenced_alignment;
+	u32 fence_size, fence_alignment;
+	u32 unfenced_alignment;
+	u32 size, offset;
 	bool mappable, fenceable;
 	int ret;
 
@@ -2929,7 +2938,7 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 		goto search_free;
 	}
 
-	ret = i915_gem_object_get_pages_gtt(obj, gfpmask);
+	ret = ops->get_pages(obj, gfpmask, &offset);
 	if (ret) {
 		drm_mm_put_block(obj->gtt_space);
 		obj->gtt_space = NULL;
@@ -2955,7 +2964,7 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 
 	ret = i915_gem_gtt_bind_object(obj);
 	if (ret) {
-		i915_gem_object_put_pages_gtt(obj);
+		ops->put_pages(obj);
 		drm_mm_put_block(obj->gtt_space);
 		obj->gtt_space = NULL;
 
@@ -2975,11 +2984,11 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 	BUG_ON(obj->base.read_domains & I915_GEM_GPU_DOMAINS);
 	BUG_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
 
-	obj->gtt_offset = obj->gtt_space->start;
+	obj->gtt_offset = obj->gtt_space->start + offset;
 
 	fenceable =
 		obj->gtt_space->size == fence_size &&
-		(obj->gtt_space->start & (fence_alignment -1)) == 0;
+		(obj->gtt_offset & (fence_alignment -1)) == 0;
 
 	mappable =
 		obj->gtt_offset + obj->base.size <= dev_priv->mm.gtt_mappable_end;
@@ -3779,27 +3788,16 @@ unlock:
 	return ret;
 }
 
-struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
-						  size_t size)
+void
+i915_gem_object_init(struct drm_i915_gem_object *obj,
+		     const struct drm_i915_gem_object_ops *ops)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct drm_i915_gem_object *obj;
-
-	obj = kmem_cache_alloc(dev_priv->slab, GFP_KERNEL | __GFP_ZERO);
-	if (obj == NULL)
-		return NULL;
-
-	if (drm_gem_object_init(dev, &obj->base, size) != 0) {
-		kfree(obj);
-		return NULL;
-	}
-
-	i915_gem_info_add_obj(dev_priv, size);
+	obj->base.driver_private = (void *)ops;
 
 	obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 	obj->base.read_domains = I915_GEM_DOMAIN_CPU;
 
-	if (IS_GEN6(dev)) {
+	if (IS_GEN6(obj->base.dev)) {
 		/* On Gen6, we can have the GPU use the LLC (the CPU
 		 * cache) for about a 10% performance improvement
 		 * compared to uncached.  Graphics requests other than
@@ -3816,7 +3814,6 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 	} else
 		obj->cache_level = I915_CACHE_NONE;
 
-	obj->base.driver_private = NULL;
 	obj->fence_reg = I915_FENCE_REG_NONE;
 	INIT_LIST_HEAD(&obj->mm_list);
 	INIT_LIST_HEAD(&obj->gtt_list);
@@ -3824,9 +3821,35 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 	INIT_LIST_HEAD(&obj->exec_list);
 	INIT_LIST_HEAD(&obj->gpu_write_list);
 	obj->madv = I915_MADV_WILLNEED;
+
 	/* Avoid an unnecessary call to unbind on the first bind. */
 	obj->map_and_fenceable = true;
 
+	i915_gem_info_add_obj(obj->base.dev->dev_private, obj->base.size);
+}
+
+static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
+	.get_pages = i915_gem_object_get_pages_gtt,
+	.put_pages = i915_gem_object_put_pages_gtt,
+};
+
+struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
+						  size_t size)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_object *obj;
+
+	obj = kmem_cache_alloc(dev_priv->slab, GFP_KERNEL | __GFP_ZERO);
+	if (obj == NULL)
+		return NULL;
+
+	if (drm_gem_object_init(dev, &obj->base, size) != 0) {
+		kfree(obj);
+		return NULL;
+	}
+
+	i915_gem_object_init(obj, &i915_gem_object_ops);
+
 	return obj;
 }
 
@@ -4056,7 +4079,7 @@ i915_gem_load(struct drm_device *dev)
 
 	dev_priv->slab =
 		kmem_cache_create("i915_gem_object",
-				  sizeof(struct drm_i915_gem_object), 0,
+				  sizeof(union drm_i915_gem_objects), 0,
 				  SLAB_HWCACHE_ALIGN,
 				  NULL);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_vmap.c b/drivers/gpu/drm/i915/i915_gem_vmap.c
new file mode 100644
index 0000000..89a4ac4
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_vmap.c
@@ -0,0 +1,149 @@
+/*
+ * Copyright © 2010 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "drmP.h"
+#include "drm.h"
+#include "i915_drm.h"
+#include "i915_drv.h"
+#include "i915_trace.h"
+#include "intel_drv.h"
+#include <linux/swap.h>
+
+static struct i915_gem_vmap_object *to_vmap_object(struct drm_i915_gem_object *obj)
+{
+	return container_of(obj, struct i915_gem_vmap_object, gem);
+}
+
+static int
+i915_gem_vmap_get_pages(struct drm_i915_gem_object *obj, gfp_t gfp, u32 *offset)
+{
+	struct i915_gem_vmap_object *vmap = to_vmap_object(obj);
+	int num_pages = vmap->gem.base.size >> PAGE_SHIFT;
+	struct page **pages;
+	int i;
+
+	if (!access_ok(vmap->read_only ? VERIFY_READ : VERIFY_WRITE,
+		       (char __user *)vmap->user_ptr, vmap->user_size))
+		return -EFAULT;
+
+	if (i915_gem_get_user_pages(obj->base.dev,
+				    vmap->user_ptr,
+				    !vmap->read_only,
+				    &num_pages,
+				    &pages))
+		goto err;
+
+	vmap->gem.pages = pages;
+	*offset = offset_in_page(vmap->user_ptr);
+	return 0;
+
+err:
+	for (i = 0; i < num_pages; i++)
+		page_cache_release(pages[i]);
+	drm_free_large(pages);
+
+	return vmap->gem.pages ? -EAGAIN : -EFAULT;
+}
+
+static void
+i915_gem_vmap_put_pages(struct drm_i915_gem_object *obj)
+{
+	int num_pages = obj->base.size >> PAGE_SHIFT;
+	int i;
+
+	for (i = 0; i < num_pages; i++) {
+		if (obj->dirty)
+			set_page_dirty(obj->pages[i]);
+
+		mark_page_accessed(obj->pages[i]);
+		page_cache_release(obj->pages[i]);
+	}
+
+	obj->dirty = 0;
+	drm_free_large(obj->pages);
+	obj->pages = NULL;
+}
+
+static const struct drm_i915_gem_object_ops i915_gem_vmap_ops = {
+	.get_pages = i915_gem_vmap_get_pages,
+	.put_pages = i915_gem_vmap_put_pages,
+};
+
+/**
+ * Creates a new mm object that wraps some user memory.
+ */
+int
+i915_gem_vmap_ioctl(struct drm_device *dev, void *data,
+		    struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_vmap *args = data;
+	struct i915_gem_vmap_object *obj;
+	loff_t first_data_page, last_data_page;
+	int num_pages;
+	int ret;
+	u32 handle;
+
+	first_data_page = args->user_ptr / PAGE_SIZE;
+	last_data_page = (args->user_ptr + args->user_size - 1) / PAGE_SIZE;
+	num_pages = last_data_page - first_data_page + 1;
+	if (num_pages * PAGE_SIZE > dev_priv->mm.gtt_total)
+		return -E2BIG;
+
+	ret = fault_in_pages_readable((char __user *)(uintptr_t)args->user_ptr,
+				      args->user_size);
+	if (ret)
+		return ret;
+
+	/* Allocate the new object */
+	obj = kmem_cache_alloc(dev_priv->slab, GFP_KERNEL | __GFP_ZERO);
+	if (obj == NULL)
+		return -ENOMEM;
+
+	obj->gem.base.dev = dev;
+	obj->gem.base.size = num_pages * PAGE_SIZE;
+
+	kref_init(&obj->gem.base.refcount);
+	atomic_set(&obj->gem.base.handle_count, 0);
+
+	i915_gem_object_init(&obj->gem, &i915_gem_vmap_ops);
+	obj->gem.cache_level = I915_CACHE_LLC_MLC;
+
+	obj->user_ptr = args->user_ptr;
+	obj->user_size = args->user_size;
+	obj->read_only = args->flags & I915_VMAP_READ_ONLY;
+
+	ret = drm_gem_handle_create(file, &obj->gem.base, &handle);
+	if (ret) {
+		drm_gem_object_release(&obj->gem.base);
+		kfree(obj);
+		return ret;
+	}
+
+	/* drop reference from allocate - handle holds it now */
+	drm_gem_object_unreference(&obj->gem.base);
+
+	args->handle = handle;
+	return 0;
+}
diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index c4d6dbf..f02f6d7 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -198,6 +198,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_OVERLAY_PUT_IMAGE	0x27
 #define DRM_I915_OVERLAY_ATTRS	0x28
 #define DRM_I915_GEM_EXECBUFFER2	0x29
+#define DRM_I915_GEM_VMAP	0x2a
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -239,6 +240,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_MADVISE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_MADVISE, struct drm_i915_gem_madvise)
 #define DRM_IOCTL_I915_OVERLAY_PUT_IMAGE	DRM_IOW(DRM_COMMAND_BASE + DRM_IOCTL_I915_OVERLAY_ATTRS, struct drm_intel_overlay_put_image)
 #define DRM_IOCTL_I915_OVERLAY_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_OVERLAY_ATTRS, struct drm_intel_overlay_attrs)
+#define DRM_IOCTL_I915_GEM_VMAP	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_VMAP, struct drm_i915_gem_vmap)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -291,6 +293,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_COHERENT_RINGS	 13
 #define I915_PARAM_HAS_EXEC_CONSTANTS	 14
 #define I915_PARAM_HAS_RELAXED_DELTA	 15
+#define I915_PARAM_HAS_VMAP		 16
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -388,6 +391,19 @@ struct drm_i915_gem_create {
 	__u32 pad;
 };
 
+struct drm_i915_gem_vmap {
+	__u64 user_ptr;
+	__u32 user_size;
+	__u32 flags;
+#define I915_VMAP_READ_ONLY 0x1
+	/**
+	 * Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+};
+
 struct drm_i915_gem_pread {
 	/** Handle for the object being read. */
 	__u32 handle;
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/21] drm/i915: Track fence setup separately from fenced object lifetime
  2011-04-16  9:17 ` [PATCH 07/21] drm/i915: Track fence setup separately from fenced object lifetime Chris Wilson
@ 2011-04-16 13:20   ` Daniel Vetter
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2011-04-16 13:20 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Andy Whitcroft, Daniel Vetter, intel-gfx

On Sat, Apr 16, 2011 at 10:17:31AM +0100, Chris Wilson wrote:
> This fixes a bookkeeping error causing an OOPS whilst waiting for an
> object to finish using a fence. Now we can simply wait for the fence to
> be written independent of the objects currently inhabiting it (past,
> present and future).
> 
> A large amount of the change is to delay updating the information about
> the fence on bo until after we successfully write, or queue the write to,
> the register. This avoids the complication of undoing a partial change
> should we fail in pipelining the change.

Reordering the accounting and consistently using setup_seqno/ring to track
fence changes looks correct.

I still see a few minor issues with the pipelined fencing code that should
be addressed in later patches:
- setup_seqno/ring is not being cleaned up anywhere and might go stale.
  The hack I've used in one of my stabs at this was to loop over all
  fences at the end of retire_requests (see the sketch after this list).
- The semantics of last_fenced_seqno/ring are a bit hairy. Ideally the
  following invariants would always hold:
  * an object may never change its ring while last_fenced_seqno != 0. A
    call to flush the fence is required to change the ring.
  * if last_fenced_seqno != 0, then always last_rendering_seqno != 0 (i.e.
    if the execbuffer fails we still move all objects with pipelined
    fencing setups to the active list).
  These two combined should give:
    last_fenced_ring != NULL implies last_fenced_ring == last_rendering_ring
  allowing us to simplify a few things.
- The obj->fenced_gpu_access optimization got killed in a previous patch.
  We could resurrect that by clearing it in process_flushing_list. With
  that change (and perhaps some movement of the assignement of
  fenced_gpu_access) we could consolidate the last_fenced_seqno/ring
  tracking.
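
(For concreteness, the "loop over all fences" hack might look roughly
like the sketch below. num_fence_regs, fence_regs and i915_seqno_passed
exist in the driver; where setup_seqno/setup_ring live and the exact
completion test are assumptions, not the actual patch:)

/* Hypothetical sketch: drop stale pipelined-fence setup state once the
 * seqno that was going to write the fence register has completed.
 * Would be called at the end of i915_gem_retire_requests(). */
static void
i915_gem_retire_stale_fence_setup(struct drm_i915_private *dev_priv)
{
	int i;

	for (i = 0; i < dev_priv->num_fence_regs; i++) {
		struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i];

		if (reg->setup_ring == NULL)
			continue;

		/* Has the write to the fence register retired yet? */
		if (i915_seqno_passed(reg->setup_ring->get_seqno(reg->setup_ring),
				      reg->setup_seqno)) {
			reg->setup_seqno = 0;
			reg->setup_ring = NULL;
		}
	}
}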

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 16/21] drm/i915: Retire requests before disabling pagefaults
  2011-04-16  9:17 ` [PATCH 16/21] drm/i915: Retire requests before disabling pagefaults Chris Wilson
@ 2011-04-16 13:44   ` Daniel Vetter
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2011-04-16 13:44 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Sat, Apr 16, 2011 at 10:17:40AM +0100, Chris Wilson wrote:
> As we cannot wait upon an object to be released by the GPU once we have
> disabled pagefaults, process any pending retirements first in the hope
> that we move any potential relocations off the active list.

I have a hard time seeing how one could emit relocs to active objects
... I suspect it is partially a cache thing: retiring requests while the
gem_objects are sitting in the L1 cache anyway can't hurt. So

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 17/21] drm/i915: Repeat retiring of requests until the seqno is stable
  2011-04-16  9:17 ` [PATCH 17/21] drm/i915: Repeat retiring of requests until the seqno is stable Chris Wilson
@ 2011-04-16 13:45   ` Daniel Vetter
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2011-04-16 13:45 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

I'm gonna trust you not to have fat-fingered the indent change ;-)

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer
  2011-04-16  9:17 ` [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer Chris Wilson
@ 2011-04-16 13:54   ` Daniel Vetter
  2011-04-16 14:18     ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Vetter @ 2011-04-16 13:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Sat, Apr 16, 2011 at 10:17:42AM +0100, Chris Wilson wrote:
> As we can make use of the ability to insert semaphores to serialise
> accessing buffers between rings elsewhere, separate out the function from
> the execbuffer code and make it generic.

Perhaps add a small note somewhere that move_to_ring now does the right
thing when to == NULL (falling back to wait_rendering). I've hunted around
a bit for that ...

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 19/21] drm/i915: Enable the use of GPU semaphores whilst page-flipping
  2011-04-16  9:17 ` [PATCH 19/21] drm/i915: Enable the use of GPU semaphores whilst page-flipping Chris Wilson
@ 2011-04-16 13:58   ` Daniel Vetter
  2011-04-16 14:20     ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Vetter @ 2011-04-16 13:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Sat, Apr 16, 2011 at 10:17:43AM +0100, Chris Wilson wrote:
> The 2D driver unfortunately uses the BLT ring in order to perform blits,
> and will often blit between buffers in the course of a pageflip. (Though
> usually it is from the post-flipped scanout to the window frontbuffer
> for reasons known only to itself, maintaining the front buffer in case
> of 2D access one presumes.)
> 
> As such, being able to use a GPU semaphore and not stall the CPU/GPU
> whilst switching between rings for a pageflip is useful.

Hey, we could issue flips on the blt ring .... <CARRIER LOST>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 20/21] drm/i915: Use a slab for object allocation
  2011-04-16  9:17 ` [PATCH 20/21] drm/i915: Use a slab for object allocation Chris Wilson
@ 2011-04-16 14:07   ` Daniel Vetter
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2011-04-16 14:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer
  2011-04-16 13:54   ` Daniel Vetter
@ 2011-04-16 14:18     ` Chris Wilson
  2011-04-16 14:24       ` Daniel Vetter
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2011-04-16 14:18 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sat, 16 Apr 2011 15:54:56 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Sat, Apr 16, 2011 at 10:17:42AM +0100, Chris Wilson wrote:
> > As we can make use of the ability to insert semaphores to serialise
> > accessing buffers between rings elsewhere, separate out the function from
> > the execbuffer code and make it generic.
> 
> Perhaps add a small note somewhere that move_to_ring now does the right
> thing when to == NULL (falling back to wait_rendering). I've hunted around
> a bit for that ...

/**
 * Serialise an object between rings: wait for it to complete on the first
 * ring, before it can be used on the next.
 *
 * If the object is staying on the same ring, this is a no-op.
 *
 * If the object is not currently on a ring, this is a no-op.
 *
 * If the object is moving off a ring (i.e. to == NULL), then we wait for
 * rendering to complete entirely.
 *
 * The interesting case is when we move between two different rings. On
 * pre-SandyBridge hw, we have no choice but to wait until rendering has
 * finished. SandyBridge, however introduces the GPU semaphore which we
 * can use to cause one ring to wait upon the signal of another - avoiding
 * the CPU stall.
 *
 * We assume that the caller has emitted all required flushes.
 */
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
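
(A sketch of a helper implementing the semantics documented above; the
internals - the ring field names, i915_gem_object_wait_rendering's
signature, and intel_ring_sync() with its argument order - are
assumptions for illustration, not quoted from the patch:)

/* Sketch only; assumes the caller has already emitted all flushes. */
int
i915_gem_object_move_to_ring(struct drm_i915_gem_object *obj,
			     struct intel_ring_buffer *to)
{
	struct intel_ring_buffer *from = obj->ring;

	/* Not on a ring, or staying on the same ring: nothing to do. */
	if (from == NULL || from == to)
		return 0;

	/* Moving off a ring entirely: wait for rendering to finish. */
	if (to == NULL)
		return i915_gem_object_wait_rendering(obj);

	/* Pre-SandyBridge there are no semaphores: stall the CPU. */
	if (INTEL_INFO(obj->base.dev)->gen < 6)
		return i915_gem_object_wait_rendering(obj);

	/* SNB+: make 'to' wait upon the signal of 'from' instead. */
	return intel_ring_sync(to, from, obj->last_rendering_seqno);
}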


* Re: [PATCH 19/21] drm/i915: Enable the use of GPU semaphores whilst page-flipping
  2011-04-16 13:58   ` Daniel Vetter
@ 2011-04-16 14:20     ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2011-04-16 14:20 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sat, 16 Apr 2011 15:58:52 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Sat, Apr 16, 2011 at 10:17:43AM +0100, Chris Wilson wrote:
> > As such, being able to use a GPU semaphore and not stall the CPU/GPU
> > whilst switching between rings for a pageflip is useful.
> 
> Hey, we could issue flips on the blt ring .... <CARRIER LOST>

I hear that one day that may even be supported in hardware! What dreams we
have!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer
  2011-04-16 14:18     ` Chris Wilson
@ 2011-04-16 14:24       ` Daniel Vetter
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2011-04-16 14:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Sat, Apr 16, 2011 at 03:18:46PM +0100, Chris Wilson wrote:
> On Sat, 16 Apr 2011 15:54:56 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Sat, Apr 16, 2011 at 10:17:42AM +0100, Chris Wilson wrote:
> > > As we can make use of the ability to insert semaphores to serialise
> > > accessing buffers between rings elsewhere, separate out the function from
> > > the execbuffer code and make it generic.
> > 
> > Perhaps add a small note somewhere that move_to_ring now does the right
> > thing when to == NULL (falling back to wait_rendering). I've hunted around
> > a bit for that ...
> 
> /**
>  * Serialise an object between rings: wait for it to complete on the first
>  * ring, before it can be used on the next.
>  *
>  * If the object is staying on the same ring, this is a no-op.
>  *
>  * If the object is not currently on a ring, this is a no-op.
>  *
>  * If the object is moving off a ring (i.e. to == NULL), then we wait for
>  * rendering to complete entirely.
>  *
>  * The interesting case is when we move between two different rings. On
>  * pre-SandyBridge hw, we have no choice but to wait until rendering has
>  * finished. SandyBridge, however introduces the GPU semaphore which we
>  * can use to cause one ring to wait upon the signal of another - avoiding
>  * the CPU stall.
>  *
>  * We assume that the caller has emitted all required flushes.
>  */
Perfect!
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl
  2011-04-16  9:17 ` [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl Chris Wilson
@ 2011-04-18 14:58   ` Daniel Vetter
  2011-04-19  6:20     ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Vetter @ 2011-04-18 14:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Dave Airlie, intel-gfx

Hi Chris,

First things first, vmap scares me. Its scariness is threefold:
- We haven't really used snooped memory seriously anywhere. And every time
  we newly use hw features, stuff tends to blow up in hilarious ways.
- Using snooped memory is rather different than using uc gtt mappings and
  has the potential for tons of funny api issues.
- With vmap gem is suddenly no longer in full control of its backing
  storage - imo a rather fundamental change.

With that admission out of the way follows the hopefully more rational
part ;-) First I look a bit at the use case for vmap and the problems (as
I see them) it tries to solve. Then comes fear rationalization in the form
of issues I see concerning the above three points. At the end, some
ideas on how to proceed.

I haven't yet looked in-depth into the code, so there's a decent chance
I've got some misconceptions. Please correct me.

Usage
-----

... to tightly integrate gpu rendering with rendering on the cpu, in
whatever form. Your current code uses this for really fast X fallbacks
(optimized with damage tracking to avoid unnecessary syncing), but I think
it might also prove useful for vbo upload, especially with swtnl. In
short: transfers.

There are two ways to further differentiate this: One is whether the
driver can allocate the storage for the cpu renderer or whether that is
handed down to it as fixed (X shm). The other is whether the transfer is a
oneshot or can reasonably be expected to be reused a lot of times (and
hence the setup time can be amortized).

vmap is squarely designed for the case of externally fixed memory which
will be reused. It obviously subsumes the driver provided memory case (for
not-oneshot stuff). In your X rework you also use it for small oneshot
stuff by amalgamating many smaller transfers into larger vmaps.

Other contenders in the transfer business are:
- uc gtt maps, which just suck (besides oneshot uploads of giant stuff)
- and pwrite/pread, which due to abi-design is unusable for 2d/tiled
  objects.

Issues/questions: hw
--------------------

- It newly uses a hw feature. If that is not broken in certain special
  ways itself, it will surely turn up other bugs by changing the timing of
  all kinds of things. It would be great to have a smoke (stress) test to
  have at least a vague idea of how well it works on different machines.

- While strolling through docs, I've found the following gem (Public ilk
  docs, Vol1Part2, Table 3-4 on page 53).

    "System Memory Address. Accesses are snooped in processor cache,
    allowing shared graphics/ processor access to (locked) cacheable
    memory data."

  The range is given as "[0, 4GB]". The device range is unspecified, but
  might be DevBW-DevILK. If that's true, it sounds like a showstopper for
  vmapping random userspace address ranges.

  (Aside: that comment about (locked) is interesting. Could that explain
  why we sometimes miss seqno interrupts because we read a stale seqno
  from the hw page due to lack of a asm "lock" prefix? That might be
  necessary to avoid reading stale (virtual address indexed) L1 contents
  which the gpu might not be able to snoop.)

- Another thing I've noticed: You're using MLC_LLC caching for vmaps. Is
  that required to get coherent behaviour? If so we might need to change
  the if (cache_level != NONE) return; to if (cache_level == MLC_LLC)
  return; in clflush_object (for snb), as sketched below.
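
  (For reference, a hypothetical version of that change; the surrounding
  function body is a reconstruction of the driver of that era plus the
  series' cache_level field, not quoted from the tree:)

static void i915_gem_clflush_object(struct drm_i915_gem_object *obj)
{
	if (obj->pages == NULL)
		return;

	/* was: if (obj->cache_level != I915_CACHE_NONE) return;
	 * On snb only fully snooped (LLC+MLC) objects may skip the flush. */
	if (obj->cache_level == I915_CACHE_LLC_MLC)
		return;

	trace_i915_gem_object_clflush(obj);
	drm_clflush_pages(obj->pages, obj->base.size / PAGE_SIZE);
}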

Issues/questions: api
---------------------

- One thing we already agreed on: the kernel won't magically redirect gtt
  mappings to cpu mappings. Userspace has to adapt.

- Your userspace already tracks the damage in both the cpu and the gpu
  portion of a pixmap. So I think it makes sense to fully offload
  coherency management to userspace, including gpu synchronization.
  Shooting down ptes would likely kill all benefits, anyway ;-)

- You're using set_domain_ioctl for synchronization (see the sketch below).
  It seems to be enough for uber-synchronous X. GL's a bit better and might
  benefit from asynchronous execution fences (hello ttm). Nothing pressing,
  but something to keep in mind.
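
  (The pattern in question, roughly; the set_domain uabi - struct, ioctl
  and domain flags - is long-standing, the helper itself is illustrative:)

#include <string.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Before touching the bo's pages with the CPU, wait for the GPU and
 * let the kernel handle any cache maintenance via set_domain. */
static int wait_for_cpu_access(int fd, uint32_t handle, int write)
{
	struct drm_i915_gem_set_domain sd;

	memset(&sd, 0, sizeof(sd));
	sd.handle = handle;
	sd.read_domains = I915_GEM_DOMAIN_CPU;
	sd.write_domain = write ? I915_GEM_DOMAIN_CPU : 0;

	/* Blocks until outstanding GPU rendering has completed. */
	return ioctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd);
}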

Issues/questions: vmap specifically
-----------------------------------

I can't do fear rationalization here; it essentially boils down to me not
being an -mm dev.

The other thing I'm wondering is what other use cases than X shm accel
this has. Single-purpose api extensions at the kernel level cause a
somewhat irky feeling for me.

I've also thought a bit about backwards compat of the new userspace code.
But given that vmap is just a special scheme for transfers I don't see a
problem in hiding it behind a bit of abstraction.

Ideas for moving forward
------------------------

In conclusion I think the idea of vmap - to use snoopable mem for tight
cpu/gpu integration - is awesome and your experimental branches serve as a
great landmark for where to head. But my gut feeling also says that
it's too big a leap to mainline in just one step. Hence this proposal for
a somewhat more piecemeal approach, the goal being not to tackle
everything at once and to have the opportunity for some adjustment
mid-way.

First start with a new ioctl to enable snooped bos. We might prepare for a
kernel object cache and introduce a new gem_create ioctl, or just a
set_cacheable_ioctl. In short, just implement the "driver provided memory"
use case.

I think a smoke test should exist at merge time. Not just to have an idea
of which chipsets actually work, but also to have a clear example of the
intended use of snoopable bos wrt synchronization, i.e. a simple
example to think through the api implications.

Then convert over userspace. Looking at your snb branch, there's quite
some work to do. But there are also about 4-5 months until the 2.640
release, so that should be plenty of time.

Using snoopable mem in less synchronous settings like mesa perhaps also
shows some api-hole that needs stuffing.

Then, when users adopt this en masse (probably somewhere in Q3) and we've
weeded out all the api kinks and worked around hw oddities, I think it's
time to re-evaluate the "externally provided memory" use-case and take a
look at vmap (the ioctl as proposed in your patch).

I have hopes that we might be able to subsume that use-case into the
single-shot use-case by beefing up pwrite/read with a blt variant that
does the right thing for 2d/tiled buffers (and also handles stride
mismatches). That feels a bit more generally useful, which is why I lean a
bit towards it. On the other hand we might turn vmap on its head and
create an shm id out of a bo for X to use (if X shm turns out to be the
only user for such a thing).

Comments, ideas, flames highly welcome.

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl
  2011-04-18 14:58   ` Daniel Vetter
@ 2011-04-19  6:20     ` Chris Wilson
  0 siblings, 0 replies; 33+ messages in thread
From: Chris Wilson @ 2011-04-19  6:20 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Dave Airlie, intel-gfx

Thanks for the excellent comments... Skipping to the end for a quick response:

On Mon, 18 Apr 2011 16:58:17 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> I have hopes that we might be able to subsume that use-case into the
> single-shot use-case by beefing up pwrite/read with a blt variant that
> does the right thing for 2d/tiled buffers (and also handles stride
> mismatches). That feels a bit more generally useful, which is why I lean a
> bit towards it. On the other hand we might turn vmap on it's head and
> create an shm id out of a bo for X to use (if X shm turns out to be the
> only user for such a thing).

This is where I started: a new 2D pread/pwrite ioctl that used the BLT if
it so desired.

This was flawed in my use cases by the synchronous nature of the API I
used. When I thought about introducing async versions, I realised that it
became so much simpler if I moved the entirety of the serialisation into
userspace and vmap was born. And then I realised that a vmap bo could be
used in places other than simple BLT uploads and downloads.

Meanwhile, I'm intrigued by the idea of integrating SHM and bo... The use
case is a bit narrow though; later protocol designs already use the bo as
the natural transport.

Anyway, it looks like my next task is to see if vmap is a workable interface
for Mesa as well. I'd much prefer spending another couple of months
addressing your comments and widening the use cases, so I can drop this
patch for this merge window.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


end of thread

Thread overview: 33+ messages
2011-04-16  9:17 i915 next, post-llc Chris Wilson
2011-04-16  9:17 ` [PATCH 01/21] drm/i915: Cache GT fifo count for SandyBridge Chris Wilson
2011-04-16  9:17 ` [PATCH 02/21] drm/i915: Refactor pwrite/pread to use single copy of get_user_pages Chris Wilson
2011-04-16  9:17 ` [PATCH 03/21] drm/i915: s/addr & ~PAGE_MASK/offset_in_page(addr)/ Chris Wilson
2011-04-16  9:17 ` [PATCH 04/21] drm/i915: Maintain fenced gpu access until we flush the fence Chris Wilson
2011-04-16  9:17 ` [PATCH 05/21] drm/i915: Invalidate fenced read domains upon flush Chris Wilson
2011-04-16  9:17 ` [PATCH 06/21] drm/i915: Pass the fence register number to be written Chris Wilson
2011-04-16  9:17 ` [PATCH 07/21] drm/i915: Track fence setup separately from fenced object lifetime Chris Wilson
2011-04-16 13:20   ` Daniel Vetter
2011-04-16  9:17 ` [PATCH 08/21] drm/i915: Only print out the actual number of fences for i915_error_state Chris Wilson
2011-04-16  9:17 ` [PATCH 09/21] drm/i915: Implement direct support for 24 bit LVDS pixel format Chris Wilson
2011-04-16  9:17 ` [PATCH 10/21] drm/i915: Implement manual override of LVDS single/dual channel mode Chris Wilson
2011-04-16  9:17 ` [PATCH 11/21] drm/i915/tv: Use a direct pointer for tv_mode Chris Wilson
2011-04-16  9:17 ` [PATCH 12/21] drm/i915: Replace ironlake_compute_wm0 with g4x_compute_wm0 Chris Wilson
2011-04-16  9:17 ` [PATCH 13/21] drm/i915/crt: Explicitly return false if connected to a digital monitor Chris Wilson
2011-04-16  9:17 ` [PATCH 14/21] drm/i915/i2c: Convert from using GMBUS1 + reg_offset idiom to reg + 0 Chris Wilson
2011-04-16  9:17 ` [PATCH 15/21] drm/i915/gmbus: Reset the controller on initialisation Chris Wilson
2011-04-16  9:17 ` [PATCH 16/21] drm/i915: Retire requests before disabling pagefaults Chris Wilson
2011-04-16 13:44   ` Daniel Vetter
2011-04-16  9:17 ` [PATCH 17/21] drm/i915: Repeat retiring of requests until the seqno is stable Chris Wilson
2011-04-16 13:45   ` Daniel Vetter
2011-04-16  9:17 ` [PATCH 18/21] drm/i915: Split out i915_gem_object_move_to_ring() from execbuffer Chris Wilson
2011-04-16 13:54   ` Daniel Vetter
2011-04-16 14:18     ` Chris Wilson
2011-04-16 14:24       ` Daniel Vetter
2011-04-16  9:17 ` [PATCH 19/21] drm/i915: Enable the use of GPU semaphores whilst page-flipping Chris Wilson
2011-04-16 13:58   ` Daniel Vetter
2011-04-16 14:20     ` Chris Wilson
2011-04-16  9:17 ` [PATCH 20/21] drm/i915: Use a slab for object allocation Chris Wilson
2011-04-16 14:07   ` Daniel Vetter
2011-04-16  9:17 ` [PATCH 21/21] drm/i915: Introduce vmap (mapping of user pages into video memory) ioctl Chris Wilson
2011-04-18 14:58   ` Daniel Vetter
2011-04-19  6:20     ` Chris Wilson
