* [PATCH 00/24] PPGTT dynamic page allocations
@ 2014-12-18 17:09 Michel Thierry
  2014-12-18 17:09 ` [PATCH 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
                   ` (29 more replies)
  0 siblings, 30 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:09 UTC (permalink / raw)
  To: intel-gfx

This new version tries to remove as many unnecessary changes as possible from
the previous RFC.
 
For GEN8, the series has also been extended to work in logical ring submission
(lrc) mode, as that will be the preferred mode of operation.
I also tried to update the lrc code alongside the ppgtt refactoring, leaving
only one patch that is exclusively for lrc.

The series can be seen in 3 parts:
[01-10] Code rework for PPGTT (all GENs).
[11-14] Page table allocation for GEN6/GEN7.
[15-24] Dynamic allocation in GEN8, enabled for both legacy and execlist
submission modes.
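
For orientation, this is roughly where the page table bookkeeping ends up
after patches 07-09 (collected from the diffs in this series; the exact
layout may still change during review):

	struct i915_pagetab {
		struct page *page;
		dma_addr_t daddr;
	};

	struct i915_pagedir {
		struct page *page; /* NULL for GEN6-GEN7 */
		union {
			uint32_t pd_offset;
			dma_addr_t daddr;
		};
		struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
	};

	struct i915_pagedirpo {
		struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
	};

i915_hw_ppgtt then embeds either a single pd (GEN6/7) or a pdp (GEN8) in a
union, and the later patches build the dynamic allocation on top of these
structures.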

Ben Widawsky (23):
  drm/i915: Add some extra guards in evict_vm
  drm/i915/trace: Fix offsets for 64b
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915/gen8: Un-hardcode number of page directories
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip logic
  drm/i915: Track page table reload need
  drm/i915: Initialize all contexts
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: pagedirs rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from pagedir alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations

Michel Thierry (1):
  drm/i915/bdw: Dynamic page table allocations in lrc mode

 drivers/gpu/drm/i915/i915_debugfs.c        |    7 +-
 drivers/gpu/drm/i915/i915_drv.h            |    7 +
 drivers/gpu/drm/i915/i915_gem_context.c    |   62 +-
 drivers/gpu/drm/i915/i915_gem_evict.c      |    3 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1224 ++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  252 +++++-
 drivers/gpu/drm/i915/i915_trace.h          |  124 ++-
 drivers/gpu/drm/i915/intel_lrc.c           |   80 +-
 9 files changed, 1378 insertions(+), 392 deletions(-)

-- 
2.1.1


* [PATCH 01/24] drm/i915: Add some extra guards in evict_vm
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
@ 2014-12-18 17:09 ` Michel Thierry
  2014-12-18 17:09 ` [PATCH 02/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:09 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

v2: Use WARN_ONs (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_evict.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 886ff2e..3dc7b37 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -214,6 +214,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
 	struct i915_vma *vma, *next;
 	int ret;
 
+	WARN_ON(!mutex_is_locked(&vm->dev->struct_mutex));
 	trace_i915_gem_evict_vm(vm);
 
 	if (do_idle) {
@@ -222,6 +223,8 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
 			return ret;
 
 		i915_gem_retire_requests(vm->dev);
+
+		WARN_ON(!list_empty(&vm->active_list));
 	}
 
 	list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
-- 
2.1.1


* [PATCH 02/24] drm/i915/trace: Fix offsets for 64b
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
  2014-12-18 17:09 ` [PATCH 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
@ 2014-12-18 17:09 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:09 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_trace.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6058a01..f004d3d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -115,7 +115,7 @@ TRACE_EVENT(i915_vma_bind,
 	    TP_STRUCT__entry(
 			     __field(struct drm_i915_gem_object *, obj)
 			     __field(struct i915_address_space *, vm)
-			     __field(u32, offset)
+			     __field(u64, offset)
 			     __field(u32, size)
 			     __field(unsigned, flags)
 			     ),
@@ -128,7 +128,7 @@ TRACE_EVENT(i915_vma_bind,
 			   __entry->flags = flags;
 			   ),
 
-	    TP_printk("obj=%p, offset=%08x size=%x%s vm=%p",
+	    TP_printk("obj=%p, offset=%016llx size=%x%s vm=%p",
 		      __entry->obj, __entry->offset, __entry->size,
 		      __entry->flags & PIN_MAPPABLE ? ", mappable" : "",
 		      __entry->vm)
@@ -141,7 +141,7 @@ TRACE_EVENT(i915_vma_unbind,
 	    TP_STRUCT__entry(
 			     __field(struct drm_i915_gem_object *, obj)
 			     __field(struct i915_address_space *, vm)
-			     __field(u32, offset)
+			     __field(u64, offset)
 			     __field(u32, size)
 			     ),
 
@@ -152,7 +152,7 @@ TRACE_EVENT(i915_vma_unbind,
 			   __entry->size = vma->node.size;
 			   ),
 
-	    TP_printk("obj=%p, offset=%08x size=%x vm=%p",
+	    TP_printk("obj=%p, offset=%016llx size=%x vm=%p",
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
-- 
2.1.1


* [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
  2014-12-18 17:09 ` [PATCH 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
  2014-12-18 17:09 ` [PATCH 02/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 20:40   ` Daniel Vetter
  2014-12-18 17:10 ` [PATCH 04/24] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
one, but it resembles having one). The #define was confusing as is, and
using "PDPE" is a much better description.

sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 75a29a3..9639310 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -375,7 +375,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
+		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
 		if (pt_vaddr == NULL)
@@ -486,7 +486,7 @@ bail:
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPS];
+	struct page **pt_pages[GEN8_LEGACY_PDPES];
 	int i, ret;
 
 	for (i = 0; i < max_pdp; i++) {
@@ -537,7 +537,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 		return -ENOMEM;
 
 	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e377c7d..9d998ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,7 +88,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
 #define GEN8_PTE_MASK			0x1ff
-#define GEN8_LEGACY_PDPS		4
+#define GEN8_LEGACY_PDPES		4
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
@@ -273,12 +273,12 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
 	};
 	struct page *pd_pages;
 	union {
 		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-- 
2.1.1


* [PATCH 04/24] drm/i915: Setup less PPGTT on failed pagedir
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (2 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 05/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The current code will both potentially print a WARN and set up part of
the PPGTT structure. Neither of these harms the current code; the change
is simply for clarity, and to perhaps prevent later bugs or weird
debug messages.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9639310..e14e4cc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1032,11 +1032,14 @@ alloc:
 		goto alloc;
 	}
 
+	if (ret)
+		return ret;
+
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-	return ret;
+	return 0;
 }
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-- 
2.1.1


* [PATCH 05/24] drm/i915/gen8: Un-hardcode number of page directories
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (3 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 04/24] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 06/24] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9d998ec..8f76990 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -282,7 +282,7 @@ struct i915_hw_ppgtt {
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[4];
+		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
 
 	struct drm_i915_file_private *file_priv;
-- 
2.1.1


* [PATCH 06/24] drm/i915: Range clearing is PPGTT agnostic
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (4 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 05/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 07/24] drm/i915: page table abstractions Michel Thierry
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Therefore we can do it from our general init function. Eventually, I
hope to have a lot more commonality like this. It won't arrive yet, but
this was a nice easy one.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e14e4cc..1341483 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -672,8 +672,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
 	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
@@ -1146,8 +1144,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
@@ -1183,6 +1179,8 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
 			    ppgtt->base.total);
+		ppgtt->base.clear_range(&ppgtt->base, 0,
+			    ppgtt->base.total, true);
 		i915_init_vm(dev_priv, &ppgtt->base);
 	}
 
-- 
2.1.1


* [PATCH 07/24] drm/i915: page table abstractions
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (5 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 06/24] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 08/24] drm/i915: Complete page table structures Michel Thierry
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we move to dynamic page allocation, keeping pagedir and pagetabs as
separate structures will help to break actions into simpler tasks.

To help transition the code nicely, there is some wasted space in gen6/7.
This will be ameliorated shortly.

v2: fixed mismatches after clean-up/rebase.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
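Note for reviewers (not part of the commit message): condensed from the
i915_gem_gtt.h hunk below, the gen8 bookkeeping in i915_hw_ppgtt changes
roughly as follows:

	/* before: flat arrays hanging off i915_hw_ppgtt */
	struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
	struct page *pd_pages;

	/* after: a small hierarchy mirroring the hardware layout */
	struct i915_pagedirpo pdp;	/* pdp.pagedir[GEN8_LEGACY_PDPES]       */
					/* each pagedir: page + page_tables[]  */
					/* each pagetab wraps one struct page  */
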
 drivers/gpu/drm/i915/i915_gem_gtt.c | 177 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
 2 files changed, 107 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1341483..49e87b0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -334,7 +334,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+		struct page *page_table = pd->page_tables[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -378,8 +379,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+			struct page *page_table = pd->page_tables[pde].page;
+
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -403,29 +408,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_tables == NULL)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_tables[i].page)
+			__free_page(pd->page_tables[i].page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_directories(struct i915_pagedir *pd)
+{
+	kfree(pd->page_tables);
+	__free_page(pd->page);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -460,86 +469,75 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPES];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+			struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagetab *pt;
 
-	return 0;
-}
+		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pdp.pagedir[i].page)
+			goto unwind_out;
+
+		ppgtt->pdp.pagedir[i].page_tables = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.pagedir[i].page_tables);
+		__free_page(ppgtt->pdp.pagedir[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -551,18 +549,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (ret)
-		gen8_ppgtt_free(ppgtt);
+	if (!ret)
+		return ret;
 
+	/* TODO: Check this for all cases */
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -573,7 +572,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       ppgtt->pdp.pagedir[pd].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -593,7 +592,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->pdp.pagedir[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -654,7 +653,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -715,7 +714,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -920,7 +919,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -949,7 +948,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -984,8 +983,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		__free_page(ppgtt->pd.page_tables[i].page);
+	kfree(ppgtt->pd.page_tables);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1042,22 +1041,22 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_pagetab *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL);
+		if (!pt->page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
 	}
 
+	ppgtt->pd.page_tables = pt;
 	return 0;
 }
 
@@ -1092,9 +1091,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_tables[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1138,7 +1139,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8f76990..1ff3c05 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -265,6 +265,20 @@ struct i915_gtt {
 			  unsigned long *mappable_end);
 };
 
+struct i915_pagetab {
+	struct page *page;
+};
+
+struct i915_pagedir {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_pagetab *page_tables;
+};
+
+struct i915_pagedirpo {
+	/* struct page *page; */
+	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
+};
+
 struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
@@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
@@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
+	union {
+		struct i915_pagedirpo pdp;
+		struct i915_pagedir pd;
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1


* [PATCH 08/24] drm/i915: Complete page table structures
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (6 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 07/24] drm/i915: page table abstractions Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 09/24] drm/i915: Create page table allocators Michel Thierry
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as for that patch; I simply felt it is easier to review if split.

v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.pagedir[i].daddr/

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 85 +++++++++++++------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++---
 drivers/gpu/drm/i915/intel_lrc.c    | 16 +++----
 4 files changed, 45 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e515aad..60f91bc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2153,7 +2153,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 49e87b0..5a9b362 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -307,7 +307,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pd_dma_addr[i];
+		dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -433,7 +433,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
 		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
 }
 
@@ -445,14 +444,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pd_dma_addr[i])
+		if (!ppgtt->pdp.pagedir[i].daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -469,32 +468,19 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
-
-	return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+			struct i915_pagetab *pt = &pd->page_tables[j];
 
 			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 			if (!pt->page)
 				goto unwind_out;
+
 		}
 	}
 
@@ -555,9 +541,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
-	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (!ret)
-		return ret;
+	return 0;
 
 	/* TODO: Check this for all cases */
 err_out:
@@ -579,7 +563,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pd] = pd_addr;
+	ppgtt->pdp.pagedir[pd].daddr = pd_addr;
 
 	return 0;
 }
@@ -589,17 +573,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct page *p;
+	struct i915_pagedir *pdir = &ppgtt->pdp.pagedir[pd];
+	struct i915_pagetab *ptab = &pdir->page_tables[pt];
+	struct page *p = ptab->page;
 	int ret;
 
-	p = ppgtt->pdp.pagedir[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+	ptab->daddr = pt_addr;
 
 	return 0;
 }
@@ -655,7 +640,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -696,14 +681,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
 	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+		   ppgtt->pd.pd_offset,
+		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -747,13 +733,13 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	uint32_t pd_entry;
 	int i;
 
-	WARN_ON(ppgtt->pd_offset & 0x3f);
+	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pt_dma_addr[i];
+		pt_addr = ppgtt->pd.page_tables[i].daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -764,9 +750,9 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
 
-	return (ppgtt->pd_offset / 64) << 16;
+	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -969,19 +955,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	if (ppgtt->pt_dma_addr) {
-		for (i = 0; i < ppgtt->num_pd_entries; i++)
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd.page_tables[i].daddr,
+			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		__free_page(ppgtt->pd.page_tables[i].page);
 	kfree(ppgtt->pd.page_tables);
@@ -1074,14 +1057,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
-				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr) {
-		drm_mm_remove_node(&ppgtt->node);
-		gen6_ppgtt_free(ppgtt);
-		return -ENOMEM;
-	}
-
 	return 0;
 }
 
@@ -1103,7 +1078,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pt_dma_addr[i] = pt_addr;
+		ppgtt->pd.page_tables[i].daddr = pt_addr;
 	}
 
 	return 0;
@@ -1142,7 +1117,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd_offset =
+	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1151,7 +1126,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
-		  ppgtt->pd_offset << 10);
+		  ppgtt->pd.pd_offset << 10);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1ff3c05..9bc973e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -267,10 +267,16 @@ struct i915_gtt {
 
 struct i915_pagetab {
 	struct page *page;
+	dma_addr_t daddr;
 };
 
 struct i915_pagedir {
 	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
 	struct i915_pagetab *page_tables;
 };
 
@@ -286,14 +292,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
 		struct i915_pagedirpo pdp;
 		struct i915_pagedir pd;
 	};
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 57b1ca0..075cf68 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3].daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3].daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2].daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2].daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1].daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1].daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0].daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0].daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1


* [PATCH 09/24] drm/i915: Create page table allocators
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (7 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 08/24] drm/i915: Complete page table structures Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 10/24] drm/i915: Track GEN6 page table usage Michel Thierry
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely, breaking
up all of our actions into individual tasks. This makes the code easier
to write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain non-trivial complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables are no different on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions makes it easier to review and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.

v2: Updated commit message to explain why this patch exists

v3: For lrc, s/pdp.pagedir[i].daddr/pdp.pagedir[i]->daddr/

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3)
---
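Note for reviewers (not part of the commit message): condensed, the gen8
setup path composes the new helpers roughly like this after the patch
(pieced together from the diff below, with error unwinding omitted):

	/* one page directory struct + backing page per used PDP entry */
	for (i = 0; i < max_pdp; i++)
		ppgtt->pdp.pagedir[i] = alloc_pd_single();

	/* one page table per PDE under each page directory */
	for (i = 0; i < ppgtt->num_pd_pages; i++)
		alloc_pt_range(ppgtt->pdp.pagedir[i], 0, GEN8_PDES_PER_PAGE);
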
 drivers/gpu/drm/i915/i915_gem_gtt.c | 228 +++++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
 3 files changed, 155 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5a9b362..564770f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -275,6 +275,102 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static void free_pt_single(struct i915_pagetab *pt)
+{
+	if (WARN_ON(!pt->page))
+		return;
+	__free_page(pt->page);
+	kfree(pt);
+}
+
+static struct i915_pagetab *alloc_pt_single(void)
+{
+	struct i915_pagetab *pt;
+
+	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+	if (!pt)
+		return ERR_PTR(-ENOMEM);
+
+	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pt->page) {
+		kfree(pt);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate a multiple page tables
+ * @pd:		The page directory which will have at least @count entries
+ *		available to point to the allocated page tables.
+ * @pde:	First page directory entry for which we are allocating.
+ * @count:	Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+{
+	int i, ret;
+
+	/* 512 is the max page tables per pagedir on any platform.
+	 * TODO: make WARN after patch series is done
+	 */
+	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
+
+	for (i = pde; i < pde + count; i++) {
+		struct i915_pagetab *pt = alloc_pt_single();
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto err_out;
+		}
+		WARN(pd->page_tables[i],
+		     "Leaking page directory entry %d (%pa)\n",
+		     i, pd->page_tables[i]);
+		pd->page_tables[i] = pt;
+	}
+
+	return 0;
+
+err_out:
+	while (i--)
+		free_pt_single(pd->page_tables[i]);
+	return ret;
+}
+
+static void __free_pd_single(struct i915_pagedir *pd)
+{
+	__free_page(pd->page);
+	kfree(pd);
+}
+
+#define free_pd_single(pd) do { \
+	if ((pd)->page) { \
+		__free_pd_single(pd); \
+	} \
+} while (0)
+
+static struct i915_pagedir *alloc_pd_single(void)
+{
+	struct i915_pagedir *pd;
+
+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pd->page) {
+		kfree(pd);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 			   uint64_t val)
@@ -307,7 +403,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
+		dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -334,8 +430,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-		struct page *page_table = pd->page_tables[pde].page;
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+		struct i915_pagetab *pt = pd->page_tables[pde];
+		struct page *page_table = pt->page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -380,8 +477,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-			struct page *page_table = pd->page_tables[pde].page;
+			struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+			struct i915_pagetab *pt = pd->page_tables[pde];
+			struct page *page_table = pt->page;
 
 			pt_vaddr = kmap_atomic(page_table);
 		}
@@ -412,18 +510,13 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
 	int i;
 
-	if (pd->page_tables == NULL)
+	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pd->page_tables[i].page)
-			__free_page(pd->page_tables[i].page);
-}
-
-static void gen8_free_page_directories(struct i915_pagedir *pd)
-{
-	kfree(pd->page_tables);
-	__free_page(pd->page);
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		free_pt_single(pd->page_tables[i]);
+		pd->page_tables[i] = NULL;
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -431,8 +524,8 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
-		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		free_pd_single(ppgtt->pdp.pagedir[i]);
 	}
 }
 
@@ -444,14 +537,16 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pdp.pagedir[i].daddr)
+		if (!ppgtt->pdp.pagedir[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+			struct i915_pagetab *pt =  pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -470,25 +565,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	int i, j;
+	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_pagetab *pt = &pd->page_tables[j];
-
-			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-			if (!pt->page)
-				goto unwind_out;
-
-		}
+		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
+				     0, GEN8_PDES_PER_PAGE);
+		if (ret)
+			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
 
 	return -ENOMEM;
 }
@@ -499,17 +589,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagetab *pt;
-
-		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
-		if (!pt)
+		ppgtt->pdp.pagedir[i] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.pagedir[i]))
 			goto unwind_out;
-
-		ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pdp.pagedir[i].page)
-			goto unwind_out;
-
-		ppgtt->pdp.pagedir[i].page_tables = pt;
 	}
 
 	ppgtt->num_pd_pages = max_pdp;
@@ -518,10 +600,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	while (i--) {
-		kfree(ppgtt->pdp.pagedir[i].page_tables);
-		__free_page(ppgtt->pdp.pagedir[i].page);
-	}
+	while (i--)
+		free_pd_single(ppgtt->pdp.pagedir[i]);
 
 	return -ENOMEM;
 }
@@ -556,14 +636,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.pagedir[pd].page, 0,
+			       ppgtt->pdp.pagedir[pd]->page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.pagedir[pd].daddr = pd_addr;
+	ppgtt->pdp.pagedir[pd]->daddr = pd_addr;
 
 	return 0;
 }
@@ -573,8 +653,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct i915_pagedir *pdir = &ppgtt->pdp.pagedir[pd];
-	struct i915_pagetab *ptab = &pdir->page_tables[pt];
+	struct i915_pagedir *pdir = ppgtt->pdp.pagedir[pd];
+	struct i915_pagetab *ptab = pdir->page_tables[pt];
 	struct page *p = ptab->page;
 	int ret;
 
@@ -637,10 +717,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+			struct i915_pagetab *pt = pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -689,7 +771,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -700,7 +782,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -739,7 +821,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pd.page_tables[i].daddr;
+		pt_addr = ppgtt->pd.page_tables[i]->daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -905,7 +987,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -934,7 +1016,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -957,7 +1039,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i].daddr,
+			       ppgtt->pd.page_tables[i]->daddr,
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
@@ -966,8 +1048,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pd.page_tables[i].page);
-	kfree(ppgtt->pd.page_tables);
+		free_pt_single(ppgtt->pd.page_tables[i]);
+
+	free_pd_single(&ppgtt->pd);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1022,27 +1105,6 @@ alloc:
 	return 0;
 }
 
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct i915_pagetab *pt;
-	int i;
-
-	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
-	if (!pt)
-		return -ENOMEM;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt[i].page = alloc_page(GFP_KERNEL);
-		if (!pt->page) {
-			gen6_ppgtt_free(ppgtt);
-			return -ENOMEM;
-		}
-	}
-
-	ppgtt->pd.page_tables = pt;
-	return 0;
-}
-
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 {
 	int ret;
@@ -1051,7 +1113,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1069,7 +1131,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 		struct page *page;
 		dma_addr_t pt_addr;
 
-		page = ppgtt->pd.page_tables[i].page;
+		page = ppgtt->pd.page_tables[i]->page;
 		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
@@ -1078,7 +1140,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pd.page_tables[i].daddr = pt_addr;
+		ppgtt->pd.page_tables[i]->daddr = pt_addr;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9bc973e..c08fe8b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -277,12 +277,12 @@ struct i915_pagedir {
 		dma_addr_t daddr;
 	};
 
-	struct i915_pagetab *page_tables;
+	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_pagedirpo {
 	/* struct page *page; */
-	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
+	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
 };
 
 struct i915_hw_ppgtt {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 075cf68..546884b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3].daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3].daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2].daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2].daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1].daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1].daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0].daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0].daddr);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 10/24] drm/i915: Track GEN6 page table usage
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (8 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 09/24] drm/i915: Create page table allocators Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 21:06   ` Daniel Vetter
  2014-12-18 17:10 ` [PATCH 11/24] drm/i915: Extract context switch skip logic Michel Thierry
                   ` (19 subsequent siblings)
  29 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
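
For illustration only (this sketch is not part of the patch), the
bookkeeping boils down to setting bits in the per-page-table bitmap for
the PTEs a VA range touches, using the i915_pte_index()/i915_pte_count()
helpers added below. The pt_for_va() lookup is a hypothetical stand-in
for walking the page directory; teardown mirrors this with bitmap_clear():

static void mark_va_range_used(struct i915_address_space *vm,
			       uint64_t start, uint64_t length)
{
	while (length) {
		/* hypothetical helper: page table covering 'start' */
		struct i915_pagetab *pt = pt_for_va(vm, start);
		/* PTEs in this table covered by [start, start+length) */
		size_t count = i915_pte_count(start, length, GEN6_PDE_SHIFT);

		/* record that these PTEs now back an object */
		bitmap_set(pt->used_ptes,
			   i915_pte_index(start, GEN6_PDE_SHIFT), count);

		start += count << PAGE_SHIFT;
		length -= count << PAGE_SHIFT;
	}
}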

With the current patch there isn't much standing in the way of a
gen-agnostic range allocation function. However, the next patch adds
more gen-specific behaviour, which makes keeping separate functions a
bit easier to manage.

One important change introduced here is that DMA mappings are
created/destroyed at the same time that page directories/tables are
allocated/deallocated.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.pagedir/pdp.pagedirs
Make a scratch page allocation helper

v3: Rebase and expand commit message.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 300 ++++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 151 +++++++++++++-----
 2 files changed, 333 insertions(+), 118 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 564770f..faa0603 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -138,10 +138,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 		return has_aliasing_ppgtt ? 1 : 0;
 }
 
-
-static void ppgtt_bind_vma(struct i915_vma *vma,
-			   enum i915_cache_level cache_level,
-			   u32 flags);
+static int ppgtt_bind_vma(struct i915_vma *vma,
+			  enum i915_cache_level cache_level,
+			  u32 flags);
 static void ppgtt_unbind_vma(struct i915_vma *vma);
 
 static inline gen8_gtt_pte_t gen8_pte_encode(dma_addr_t addr,
@@ -275,27 +274,99 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void free_pt_single(struct i915_pagetab *pt)
-{
+#define i915_dma_unmap_single(px, dev) do { \
+	pci_unmap_page((dev)->pdev, (px)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+} while (0);
+
+/**
+ * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
+ * @px:		Page table/dir/etc to get a DMA map for
+ * @dev:	drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
+ *
+ * Return: 0 if success.
+ */
+#define i915_dma_map_px_single(px, dev) \
+	pci_dma_mapping_error((dev)->pdev, \
+			      (px)->daddr = pci_map_page((dev)->pdev, \
+							 (px)->page, 0, 4096, \
+							 PCI_DMA_BIDIRECTIONAL))
+
+static void __free_pt_single(struct i915_pagetab *pt, struct drm_device *dev,
+			     int scratch)
+{
+	if (WARN(scratch ^ pt->scratch,
+		 "Tried to free scratch = %d. Is scratch = %d\n",
+		 scratch, pt->scratch))
+		return;
+
 	if (WARN_ON(!pt->page))
 		return;
+
+	if (!scratch) {
+		const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+			GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+		WARN(!bitmap_empty(pt->used_ptes, count),
+		     "Free page table with %d used pages\n",
+		     bitmap_weight(pt->used_ptes, count));
+	}
+
+	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
+	kfree(pt->used_ptes);
 	kfree(pt);
 }
 
-static struct i915_pagetab *alloc_pt_single(void)
+#define free_pt_single(pt, dev) \
+	__free_pt_single(pt, dev, false)
+#define free_pt_scratch(pt, dev) \
+	__free_pt_single(pt, dev, true)
+
+static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_pagetab *pt;
+	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+	int ret = -ENOMEM;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
+	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+				GFP_KERNEL);
+
+	if (!pt->used_ptes)
+		goto fail_bitmap;
+
 	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pt->page) {
-		kfree(pt);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pt->page)
+		goto fail_page;
+
+	ret = i915_dma_map_px_single(pt, dev);
+	if (ret)
+		goto fail_dma;
+
+	return pt;
+
+fail_dma:
+	__free_page(pt->page);
+fail_page:
+	kfree(pt->used_ptes);
+fail_bitmap:
+	kfree(pt);
+
+	return ERR_PTR(ret);
+}
+
+static inline struct i915_pagetab *alloc_pt_scratch(struct drm_device *dev)
+{
+	struct i915_pagetab *pt = alloc_pt_single(dev);
+	if (!IS_ERR(pt))
+		pt->scratch = 1;
 
 	return pt;
 }
@@ -313,7 +384,9 @@ static struct i915_pagetab *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
+		  struct drm_device *dev)
+
 {
 	int i, ret;
 
@@ -323,7 +396,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
 	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
 
 	for (i = pde; i < pde + count; i++) {
-		struct i915_pagetab *pt = alloc_pt_single();
+		struct i915_pagetab *pt = alloc_pt_single(dev);
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
 			goto err_out;
@@ -338,7 +411,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
 
 err_out:
 	while (i--)
-		free_pt_single(pd->page_tables[i]);
+		free_pt_single(pd->page_tables[i], dev);
 	return ret;
 }
 
@@ -506,7 +579,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd)
+static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
 {
 	int i;
 
@@ -514,7 +587,7 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		free_pt_single(pd->page_tables[i]);
+		free_pt_single(pd->page_tables[i], dev);
 		pd->page_tables[i] = NULL;
 	}
 }
@@ -524,7 +597,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 		free_pd_single(ppgtt->pdp.pagedir[i]);
 	}
 }
@@ -569,7 +642,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
-				     0, GEN8_PDES_PER_PAGE);
+				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
 	}
@@ -578,7 +651,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -808,26 +881,36 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+/* Write pde (index) from the page directory @pd to the page table @pt */
+static void gen6_write_pdes(struct i915_pagedir *pd,
+			    const int pde, struct i915_pagetab *pt)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	gen6_gtt_pte_t __iomem *pd_addr;
-	uint32_t pd_entry;
-	int i;
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pd, struct i915_hw_ppgtt, pd);
+	u32 pd_entry;
 
-	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		dma_addr_t pt_addr;
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
+	pd_entry |= GEN6_PDE_VALID;
 
-		pt_addr = ppgtt->pd.page_tables[i]->daddr;
-		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-		pd_entry |= GEN6_PDE_VALID;
+	writel(pd_entry, ppgtt->pd_addr + pde);
 
-		writel(pd_entry, pd_addr + i);
-	}
-	readl(pd_addr);
+	/* XXX: Caller needs to make sure the write completes if necessary */
+}
+
+/* Write all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_write_page_range(struct drm_i915_private *dev_priv,
+				struct i915_pagedir *pd, uint32_t start, uint32_t length)
+{
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, pd, start, length, temp, pde)
+		gen6_write_pdes(pd, pde, pt);
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1043,13 +1126,59 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		int j;
+
+		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_set(tmp_bitmap, gen6_pte_index(start),
+			   gen6_pte_count(start, length));
+
+		/* TODO: To be done in the next patch. Map the page/insert
+		 * entries here */
+		for_each_set_bit(j, tmp_bitmap, I915_PPGTT_PT_ENTRIES) {
+			if (test_bit(j, pt->used_ptes)) {
+				/* Check that we're changing cache levels */
+			}
+		}
+
+		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+				I915_PPGTT_PT_ENTRIES);
+	}
+
+	return 0;
+}
+
+static void gen6_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
+			     gen6_pte_count(start, length));
+	}
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		free_pt_single(ppgtt->pd.page_tables[i]);
+		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
+	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 	free_pd_single(&ppgtt->pd);
 }
 
@@ -1076,6 +1205,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+	ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pt))
+		return PTR_ERR(ppgtt->scratch_pt);
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
 						  &ppgtt->node, GEN6_PD_SIZE,
@@ -1089,20 +1221,25 @@ alloc:
 					       0, dev_priv->gtt.base.total,
 					       0);
 		if (ret)
-			return ret;
+			goto err_out;
 
 		retried = true;
 		goto alloc;
 	}
 
 	if (ret)
-		return ret;
+		goto err_out;
+
 
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
+
+err_out:
+	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
+	return ret;
 }
 
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1113,7 +1250,9 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			ppgtt->base.dev);
+
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1122,30 +1261,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct page *page;
-		dma_addr_t pt_addr;
-
-		page = ppgtt->pd.page_tables[i]->page;
-		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-				       PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_unmap_pages(ppgtt);
-			return -EIO;
-		}
-
-		ppgtt->pd.page_tables[i]->daddr = pt_addr;
-	}
-
-	return 0;
-}
-
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
@@ -1166,12 +1281,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_setup_page_tables(ppgtt);
-	if (ret) {
-		gen6_ppgtt_free(ppgtt);
-		return ret;
-	}
-
+	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1182,11 +1293,15 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
+	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
+
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
 
-	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
 		  ppgtt->pd.pd_offset << 10);
 
@@ -1301,17 +1416,28 @@ void  i915_ppgtt_release(struct kref *kref)
 	kfree(ppgtt);
 }
 
-static void
+static int
 ppgtt_bind_vma(struct i915_vma *vma,
 	       enum i915_cache_level cache_level,
 	       u32 flags)
 {
+	int ret;
+
 	/* Currently applicable only to VLV */
 	if (vma->obj->gt_ro)
 		flags |= PTE_READ_ONLY;
 
+	if (vma->vm->allocate_va_range) {
+		ret = vma->vm->allocate_va_range(vma->vm,
+						 vma->node.start,
+						 vma->node.size);
+		if (ret)
+			return ret;
+	}
+
 	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
 				cache_level, flags);
+	return 0;
 }
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
@@ -1320,6 +1446,9 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
+	if (vma->vm->teardown_va_range)
+		vma->vm->teardown_va_range(vma->vm,
+					   vma->node.start, vma->node.size);
 }
 
 extern int intel_iommu_gfx_mapped;
@@ -1463,13 +1592,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 
 	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
 		/* TODO: Perhaps it shouldn't be gen6 specific */
-		if (i915_is_ggtt(vm)) {
-			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
-			continue;
-		}
 
-		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+		struct i915_hw_ppgtt *ppgtt =
+			container_of(vm, struct i915_hw_ppgtt, base);
+
+		if (i915_is_ggtt(vm))
+			ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 	}
 
 	i915_ggtt_flush(dev_priv);
@@ -1634,9 +1764,9 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 }
 
 
-static void i915_ggtt_bind_vma(struct i915_vma *vma,
-			       enum i915_cache_level cache_level,
-			       u32 unused)
+static int i915_ggtt_bind_vma(struct i915_vma *vma,
+			      enum i915_cache_level cache_level,
+			      u32 unused)
 {
 	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
@@ -1645,6 +1775,8 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 	BUG_ON(!i915_is_ggtt(vma->vm));
 	intel_gtt_insert_sg_entries(vma->ggtt_view.pages, entry, flags);
 	vma->bound = GLOBAL_BIND;
+
+	return 0;
 }
 
 static void i915_ggtt_clear_range(struct i915_address_space *vm,
@@ -1667,9 +1799,9 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
 	intel_gtt_clear_range(first, size);
 }
 
-static void ggtt_bind_vma(struct i915_vma *vma,
-			  enum i915_cache_level cache_level,
-			  u32 flags)
+static int ggtt_bind_vma(struct i915_vma *vma,
+			 enum i915_cache_level cache_level,
+			 u32 flags)
 {
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1710,6 +1842,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 					    cache_level, flags);
 		vma->bound |= LOCAL_BIND;
 	}
+
+	return 0;
 }
 
 static void ggtt_unbind_vma(struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c08fe8b..2eb6011 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PPGTT_PD_ENTRIES		512
 #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
 #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
+#define GEN6_PDE_SHIFT          22
 #define GEN6_PDE_VALID			(1 << 0)
+#define GEN6_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
+#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
 
 #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
 
@@ -182,9 +185,33 @@ struct i915_vma {
 	 * setting the valid PTE entries to a reserved scratch page. */
 	void (*unbind_vma)(struct i915_vma *vma);
 	/* Map an object into an address space with the given cache flags. */
-	void (*bind_vma)(struct i915_vma *vma,
-			 enum i915_cache_level cache_level,
-			 u32 flags);
+	int (*bind_vma)(struct i915_vma *vma,
+			enum i915_cache_level cache_level,
+			u32 flags);
+};
+
+
+struct i915_pagetab {
+	struct page *page;
+	dma_addr_t daddr;
+
+	unsigned long *used_ptes;
+	unsigned int scratch:1;
+};
+
+struct i915_pagedir {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
+	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES];
+};
+
+struct i915_pagedirpo {
+	/* struct page *page; */
+	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
 };
 
 struct i915_address_space {
@@ -226,6 +253,12 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid, u32 flags); /* Create a valid PTE */
+	int (*allocate_va_range)(struct i915_address_space *vm,
+				 uint64_t start,
+				 uint64_t length);
+	void (*teardown_va_range)(struct i915_address_space *vm,
+				  uint64_t start,
+				  uint64_t length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -237,6 +270,29 @@ struct i915_address_space {
 	void (*cleanup)(struct i915_address_space *vm);
 };
 
+struct i915_hw_ppgtt {
+	struct i915_address_space base;
+	struct kref ref;
+	struct drm_mm_node node;
+	unsigned num_pd_entries;
+	unsigned num_pd_pages; /* gen8+ */
+	union {
+		struct i915_pagedirpo pdp;
+		struct i915_pagedir pd;
+	};
+
+	struct i915_pagetab *scratch_pt;
+
+	struct drm_i915_file_private *file_priv;
+
+	gen6_gtt_pte_t __iomem *pd_addr;
+
+	int (*enable)(struct i915_hw_ppgtt *ppgtt);
+	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
+			 struct intel_engine_cs *ring);
+	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
+};
+
 /* The Graphics Translation Table is the way in which GEN hardware translates a
  * Graphics Virtual Address into a Physical Address. In addition to the normal
  * collateral associated with any va->pa translations GEN hardware also has a
@@ -265,44 +321,69 @@ struct i915_gtt {
 			  unsigned long *mappable_end);
 };
 
-struct i915_pagetab {
-	struct page *page;
-	dma_addr_t daddr;
-};
+/* For each pde iterates over every pde between from start until start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
+	     pt = (pd)->page_tables[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+	     temp = min(temp, (unsigned)length), \
+	     start += temp, length -= temp)
+
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+	const uint32_t mask = NUM_PTE(pde_shift) - 1;
+	return (address >> PAGE_SHIFT) & mask;
+}
 
-struct i915_pagedir {
-	struct page *page; /* NULL for GEN6-GEN7 */
-	union {
-		uint32_t pd_offset;
-		dma_addr_t daddr;
-	};
+/* Helper to counts the number of PTEs within the given length. This count does
+* not cross a page table boundary, so the max value would be
+* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
+*/
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+					uint32_t pde_shift)
+{
+	const uint64_t mask = ~((1 << pde_shift) - 1);
+	uint64_t end;
 
-	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
-};
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
 
-struct i915_pagedirpo {
-	/* struct page *page; */
-	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
-};
+	end = addr + length;
 
-struct i915_hw_ppgtt {
-	struct i915_address_space base;
-	struct kref ref;
-	struct drm_mm_node node;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
-	union {
-		struct i915_pagedirpo pdp;
-		struct i915_pagedir pd;
-	};
+	if ((addr & mask) != (end & mask))
+		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
 
-	struct drm_i915_file_private *file_priv;
+	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
 
-	int (*enable)(struct i915_hw_ppgtt *ppgtt);
-	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
-			 struct intel_engine_cs *ring);
-	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
-};
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+	return (addr >> shift) & GEN6_PDE_MASK;
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+	return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+	return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
 
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 11/24] drm/i915: Extract context switch skip logic
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (9 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 10/24] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 20:54   ` Daniel Vetter
  2014-12-18 17:10 ` [PATCH 12/24] drm/i915: Track page table reload need Michel Thierry
                   ` (18 subsequent siblings)
  29 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

We have some fanciness coming up. This patch just breaks out the logic.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b67d269..a8ff03d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -726,6 +726,16 @@ unpin_out:
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+				      struct intel_context *from,
+				      struct intel_context *to)
+{
+	if (from == to && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
 /**
  * i915_switch_context() - perform a GPU context switch.
  * @ring: ring for which we'll execute the context switch
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 12/24] drm/i915: Track page table reload need
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (10 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 11/24] drm/i915: Extract context switch skip logic Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 21:08   ` Daniel Vetter
  2014-12-18 17:10 ` [PATCH 13/24] drm/i915: Initialize all contexts Michel Thierry
                   ` (17 subsequent siblings)
  29 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch was formerly known as "Force pd restore when PDEs change,
gen6-7". I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.
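
Whichever mechanism applies, the need for a reload is tracked with a
per-VM mask of rings whose page directories are stale. Roughly (a
simplified sketch of the logic in the diff below, with illustrative
helper names, not a drop-in implementation):

/* When a VA range is mapped or unmapped, every ring that uses this VM
 * must reload its page directories before the next batch. */
static void mark_pds_stale(struct i915_address_space *vm)
{
	vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask;
}

/* At context switch time, consume the flag for this ring only; a set
 * bit means we must force a reload (MI_FORCE_RESTORE / PP_DIR_BASE). */
static bool ring_needs_pd_reload(struct intel_engine_cs *ring,
				 struct i915_hw_ppgtt *ppgtt)
{
	return test_and_clear_bit(ring->id, &ppgtt->base.pd_reload_mask);
}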

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

squash! drm/i915: Force pd restore when PDEs change, gen6-7

It's not just for gen8. If the current context's mappings change, we
need a context reload when we switch.

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 67 +++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 15 ++++++-
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  2 +
 4 files changed, 75 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index a8ff03d..fa9d4a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -563,6 +563,42 @@ mi_set_context(struct intel_engine_cs *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+				      struct intel_context *from,
+				      struct intel_context *to)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	if (to->remap_slice)
+		return false;
+
+	if (to->ppgtt) {
+		if (from == to && !test_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+			return true;
+	} else {
+		if (from == to && !test_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask))
+			return true;
+	}
+
+	return false;
+}
+
+static bool
+needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	return ((INTEL_INFO(ring->dev)->gen < 8) ||
+			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
+}
+
+static bool
+needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	return IS_GEN8(ring->dev) &&
+			(to->ppgtt || &to->ppgtt->base.pd_reload_mask);
+}
+
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -571,9 +607,6 @@ static int do_switch(struct intel_engine_cs *ring,
 	u32 hw_flags = 0;
 	bool uninitialized = false;
 	struct i915_vma *vma;
-	bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
-			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
-	bool needs_pd_load_post = false;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -581,7 +614,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
 	}
 
-	if (from == to && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
@@ -599,7 +632,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	from = ring->last_context;
 
-	if (needs_pd_load_pre) {
+	if (needs_pd_load_pre(ring, to)) {
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
 		 * Register Immediate commands in Ring Buffer before submitting
@@ -608,6 +641,12 @@ static int do_switch(struct intel_engine_cs *ring,
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		if (ret)
 			goto unpin_out;
+
+		/* Doing a PD load always reloads the page dirs */
+		if (to->ppgtt)
+			clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask);
+		else
+			clear_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask);
 	}
 
 	if (ring != &dev_priv->ring[RCS]) {
@@ -644,16 +683,16 @@ static int do_switch(struct intel_engine_cs *ring,
 	 * XXX: If we implemented page directory eviction code, this
 	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
+	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
-		needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
-	}
+	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post) {
+	if (needs_pd_load_post(ring, to)) {
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -726,16 +765,6 @@ unpin_out:
 	return ret;
 }
 
-static inline bool should_skip_switch(struct intel_engine_cs *ring,
-				      struct intel_context *from,
-				      struct intel_context *to)
-{
-	if (from == to && !to->remap_slice)
-		return true;
-
-	return false;
-}
-
 /**
  * i915_switch_context() - perform a GPU context switch.
  * @ring: ring for which we'll execute the context switch
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8330660..09d864f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1199,6 +1199,13 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	if (ret)
 		goto error;
 
+	if (ctx->ppgtt)
+		WARN(ctx->ppgtt->base.pd_reload_mask & (1<<ring->id),
+			"%s didn't clear reload\n", ring->name);
+	else
+		WARN(dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask &
+			(1<<ring->id), "%s didn't clear reload\n", ring->name);
+
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
@@ -1446,6 +1453,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* XXX: Reserve has possibly change PDEs which means we must do a
+	 * context switch before we can coherently read some of the reserved
+	 * VMAs. */
+
 	/* The objects are in their final locations, apply the relocations. */
 	if (need_relocs)
 		ret = i915_gem_execbuffer_relocate(eb);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index faa0603..c917301 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1398,6 +1398,15 @@ i915_ppgtt_create(struct drm_device *dev, struct drm_i915_file_private *fpriv)
 	return ppgtt;
 }
 
+/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+#define ppgtt_invalidate_tlbs(vm) do {\
+	/* If current vm != vm, */ \
+	vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask; \
+} while (0)
+
 void  i915_ppgtt_release(struct kref *kref)
 {
 	struct i915_hw_ppgtt *ppgtt =
@@ -1433,6 +1442,8 @@ ppgtt_bind_vma(struct i915_vma *vma,
 						 vma->node.size);
 		if (ret)
 			return ret;
+
+		ppgtt_invalidate_tlbs(vma->vm);
 	}
 
 	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
@@ -1446,9 +1457,11 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
-	if (vma->vm->teardown_va_range)
+	if (vma->vm->teardown_va_range) {
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
+		ppgtt_invalidate_tlbs(vma->vm);
+	}
 }
 
 extern int intel_iommu_gfx_mapped;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 2eb6011..58a55bc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -226,6 +226,8 @@ struct i915_address_space {
 		struct page *page;
 	} scratch;
 
+	unsigned long pd_reload_mask;
+
 	/**
 	 * List of objects currently involved in rendering.
 	 *
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 13/24] drm/i915: Initialize all contexts
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (11 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 12/24] drm/i915: Track page table reload need Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 14/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The problem is we're going to switch to a new context, which could be
the default context. The plan was to use restore inhibit, which would be
fine, except if we are using dynamic page tables (which we will). If we
use dynamic page tables and we don't load new page tables, the previous
page tables might go away, and future operations will fault.

CTXA runs.
switch to default, restore inhibit
CTXA dies and has its address space taken away.
CTXB runs and tries to save using context A's address space - this
fails.

The general solution is to make sure every context has its own state,
and its own address space. For cases when we must restore inhibit, the
first thing we do is load a valid address space. I thought this would be
enough, but apparently there are references within the context itself
which will refer to the old address space - therefore, we also must
reinitialize.
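
Condensed from the diff below (illustration only; error handling, the
force-restore case and the pre-GEN8 paths are omitted), the resulting
ordering is:

	if (!to->legacy_hw_ctx.initialized)
		hw_flags |= MI_RESTORE_INHIBIT;	/* no valid state yet */

	ret = mi_set_context(ring, to, hw_flags);

	/* An inhibited restore did not load a page directory from the
	 * context image, so point the ring at a valid one now. */
	if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT))
		ret = to->ppgtt->switch_mm(to->ppgtt, ring);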

It was tricky to track this down as we don't have much insight into what
happens in a context save.

This is required for the next patch which enables dynamic page tables.

v2: to->ppgtt is only valid in full ppgtt.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fa9d4a1..b1f3d50 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -592,13 +592,6 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
 }
 
-static bool
-needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
-{
-	return IS_GEN8(ring->dev) &&
-			(to->ppgtt || &to->ppgtt->base.pd_reload_mask);
-}
-
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -679,20 +672,24 @@ static int do_switch(struct intel_engine_cs *ring,
 
 	/* GEN8 does *not* require an explicit reload if the PDPs have been
 	 * setup, and we do not wish to move them.
-	 *
-	 * XXX: If we implemented page directory eviction code, this
-	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
+	if (!to->legacy_hw_ctx.initialized) {
 		hw_flags |= MI_RESTORE_INHIBIT;
-	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+		/* NB: If we inhibit the restore, the context is not allowed to
+		 * die because future work may end up depending on valid address
+		 * space. This means we must enforce that a page table load
+		 * occur when this occurs. */
+	} else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
 		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post(ring, to)) {
+	if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT)) {
+		/* We have a valid page directory (scratch) to switch to. This
+		 * allows the old VM to be freed. Note that if anything occurs
+		 * between the set context, and here, we are f*cked */
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -742,7 +739,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
+	uninitialized = !to->legacy_hw_ctx.initialized;
 	to->legacy_hw_ctx.initialized = true;
 
 done:
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 14/24] drm/i915: Finish gen6/7 dynamic page table allocation
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (12 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 13/24] drm/i915: Initialize all contexts Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 21:12   ` Daniel Vetter
  2014-12-18 17:10 ` [PATCH 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back at the scratch page table.
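
For illustration (a condensed sketch of the behaviour described above,
simplified from gen6_alloc_va_range()/gen6_teardown_va_range() in the
diff below; locking, error handling and the bitmap of newly added
tables are omitted, and the function names are invented), the per-PDE
rule is:

static void sketch_alloc_pde(struct i915_hw_ppgtt *ppgtt,
			     struct drm_device *dev, unsigned int pde,
			     uint32_t first_pte, uint32_t npte)
{
	struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];

	if (pt == ppgtt->scratch_pt) {
		/* First object in this 4MB chunk: give the PDE a real table. */
		pt = alloc_pt_single(dev);
		ppgtt->pd.page_tables[pde] = pt;
		gen6_write_pdes(&ppgtt->pd, pde, pt);
	}
	bitmap_set(pt->used_ptes, first_pte, npte);
}

static void sketch_teardown_pde(struct i915_hw_ppgtt *ppgtt,
				struct drm_device *dev, unsigned int pde,
				uint32_t first_pte, uint32_t npte)
{
	struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];

	bitmap_clear(pt->used_ptes, first_pte, npte);
	if (bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES)) {
		/* Table is now empty: park the PDE on scratch and free it. */
		gen6_write_pdes(&ppgtt->pd, pde, ppgtt->scratch_pt);
		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
		free_pt_single(pt, dev);
	}
}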

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I have.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain, and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables (512 page tables of 4KiB each cover the full 2GB address
space).

v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
all implementations get the trace.

v3: Updated trace event to spit out a name

v4: Aliasing ppgtt is now initialized differently (in setup global gtt)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4)
---
 drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
 drivers/gpu/drm/i915/i915_drv.h     |   7 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 125 +++++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_trace.h   | 116 +++++++++++++++++++++++++++++++++
 4 files changed, 240 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 60f91bc..0f63076 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
 		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
 	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3047291f..d74db21 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2819,6 +2819,13 @@ static inline bool i915_is_ggtt(struct i915_address_space *vm)
 	return vm == ggtt;
 }
 
+static inline bool i915_is_aliasing_ppgtt(struct i915_address_space *vm)
+{
+	struct i915_address_space *appgtt =
+		&((struct drm_i915_private *)(vm)->dev->dev_private)->mm.aliasing_ppgtt->base;
+	return vm == appgtt;
+}
+
 static inline struct i915_hw_ppgtt *
 i915_vm_to_ppgtt(struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c917301..02ccb18 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1129,10 +1129,47 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
+	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagetab *pt;
+	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
+	int ret;
+
+	BUG_ON(upper_32_bits(start));
+
+	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+
+	/* The allocation is done in two stages so that we can bail out with
+	 * minimal amount of pain. The first stage finds new page tables that
+	 * need allocation. The second stage marks use ptes within the page
+	 * tables.
+	 */
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		if (pt != ppgtt->scratch_pt) {
+			WARN_ON(bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+			continue;
+		}
+
+		/* We've already allocated a page table */
+		WARN_ON(!bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+
+		pt = alloc_pt_single(dev);
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pd.page_tables[pde] = pt;
+		set_bit(pde, new_page_tables);
+		trace_i915_pagetable_alloc(vm, pde, start, GEN6_PDE_SHIFT);
+	}
+
+	start = start_save;
+	length = length_save;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 		int j;
@@ -1150,11 +1187,32 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 			}
 		}
 
-		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+		if (test_and_clear_bit(pde, new_page_tables))
+			gen6_write_pdes(&ppgtt->pd, pde, pt);
+
+		trace_i915_pagetable_map(vm, pde, pt,
+					 gen6_pte_index(start),
+					 gen6_pte_count(start, length),
+					 I915_PPGTT_PT_ENTRIES);
+		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	WARN_ON(!bitmap_empty(new_page_tables, GEN6_PPGTT_PD_ENTRIES));
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
+
 	return 0;
+
+unwind_out:
+	for_each_set_bit(pde, new_page_tables, GEN6_PPGTT_PD_ENTRIES) {
+		struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];
+		ppgtt->pd.page_tables[pde] = NULL;
+		free_pt_single(pt, vm->dev);
+	}
+	return ret;
 }
 
 static void gen6_teardown_va_range(struct i915_address_space *vm,
@@ -1166,8 +1224,27 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 	uint32_t pde, temp;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+
+		if (WARN(pt == ppgtt->scratch_pt,
+		    "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
+		    vm, pde, start, start + length))
+			continue;
+
+		trace_i915_pagetable_unmap(vm, pde, pt,
+					   gen6_pte_index(start),
+					   gen6_pte_count(start, length),
+					   I915_PPGTT_PT_ENTRIES);
+
 		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
 			     gen6_pte_count(start, length));
+
+		if (bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES)) {
+			trace_i915_pagetable_destroy(vm, pde,
+						     start & GENMASK_ULL(63, GEN6_PDE_SHIFT),
+						     GEN6_PDE_SHIFT);
+			gen6_write_pdes(&ppgtt->pd, pde, ppgtt->scratch_pt);
+			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+		}
 	}
 }
 
@@ -1175,9 +1252,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+		if (pt != ppgtt->scratch_pt)
+			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	}
 
+	/* Consider putting this as part of pd free. */
 	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 	free_pd_single(&ppgtt->pd);
 }
@@ -1242,7 +1323,7 @@ err_out:
 	return ret;
 }
 
-static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
 {
 	int ret;
 
@@ -1250,10 +1331,14 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	if (!preallocate_pt)
+		return 0;
+
 	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
 			ppgtt->base.dev);
 
 	if (ret) {
+		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
 	}
@@ -1261,7 +1346,17 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+				  uint64_t start, uint64_t length)
+{
+	struct i915_pagetab *unused;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
+		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+}
+
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1277,7 +1372,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	} else
 		BUG();
 
-	ret = gen6_ppgtt_alloc(ppgtt);
+	ret = gen6_ppgtt_alloc(ppgtt, aliasing);
 	if (ret)
 		return ret;
 
@@ -1296,6 +1391,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
+	if (!aliasing)
+		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
+
 	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1308,7 +1406,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
+		bool aliasing)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
@@ -1316,7 +1415,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
 
 	if (INTEL_INFO(dev)->gen < 8)
-		return gen6_ppgtt_init(ppgtt);
+		return gen6_ppgtt_init(ppgtt, aliasing);
 	else if (IS_GEN8(dev) || IS_GEN9(dev))
 		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 	else
@@ -1327,7 +1426,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = 0;
 
-	ret = __hw_ppgtt_init(dev, ppgtt);
+	ret = __hw_ppgtt_init(dev, ppgtt, false);
 	if (ret == 0) {
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
@@ -1437,6 +1536,8 @@ ppgtt_bind_vma(struct i915_vma *vma,
 		flags |= PTE_READ_ONLY;
 
 	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+				    VM_TO_TRACE_NAME(vma->vm));
 		ret = vma->vm->allocate_va_range(vma->vm,
 						 vma->node.start,
 						 vma->node.size);
@@ -1458,6 +1559,10 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->obj->base.size,
 			     true);
 	if (vma->vm->teardown_va_range) {
+		trace_i915_va_teardown(vma->vm,
+				       vma->node.start, vma->node.size,
+				       VM_TO_TRACE_NAME(vma->vm));
+
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
 		ppgtt_invalidate_tlbs(vma->vm);
@@ -1981,7 +2086,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 		if (!ppgtt)
 			return -ENOMEM;
 
-		ret = __hw_ppgtt_init(dev, ppgtt);
+		ret = __hw_ppgtt_init(dev, ppgtt, true);
 		if (ret != 0)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f004d3d..8ba11d8 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,6 +156,122 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
+#define VM_TO_TRACE_NAME(vm) \
+	(i915_is_ggtt(vm) ? "GGTT" : \
+	 i915_is_aliasing_ppgtt(vm) ? "Aliasing PPGTT" : \
+				      "Private VM")
+
+DECLARE_EVENT_CLASS(i915_va,
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	TP_ARGS(vm, start, length, name),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u64, start)
+		__field(u64, end)
+		__string(name, name)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->start = start;
+		__entry->end = start + length;
+		__assign_str(name, name);
+	),
+
+	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
+		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DEFINE_EVENT(i915_va, i915_va_teardown,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DECLARE_EVENT_CLASS(i915_pagetable,
+	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	TP_ARGS(vm, pde, start, pde_shift),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->start = start;
+		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
+	),
+
+	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+		  __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+	((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_pagetable_update,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u32, first)
+		__field(u32, last)
+		__dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->first = first;
+		__entry->last = first + len;
+
+		bitmap_scnprintf(__get_str(cur_ptes),
+				 TRACE_PT_SIZE(bits),
+				 pt->used_ptes,
+				 bits);
+	),
+
+	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+		  __entry->vm, __entry->pde, __entry->last, __entry->first,
+		  __get_str(cur_ptes))
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_map,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_unmap,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
 TRACE_EVENT(i915_gem_object_change_domain,
 	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
 	    TP_ARGS(obj, old_read, old_write),
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 15/24] drm/i915/bdw: Use dynamic allocation idioms on free
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (13 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 14/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 16/24] drm/i915/bdw: pagedirs rework allocation Michel Thierry
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.

v2: Match trace_i915_va_teardown params
v3: Multiple rebases.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
---
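For readers following the new range-walk idiom, here is a minimal, self-contained sketch (not part of the patch) of how a VA range gets split per page directory before each chunk is handed to the per-PDE loop. It assumes a 1GB span per page directory (GEN8_PDPE_SHIFT == 30), which matches the "each PDP represents 1GB" layout described later in the series; the addresses are made up.

#include <stdint.h>
#include <stdio.h>

#define PDPE_SHIFT 30                   /* assumed: 1GB per page directory */
#define PD_SPAN (1ULL << PDPE_SHIFT)

/* Same effect as gen8_clamp_pd(): length remaining until the next
 * page-directory boundary, or the whole length if it fits. */
static uint64_t clamp_pd(uint64_t start, uint64_t length)
{
	uint64_t next_pd = (start + PD_SPAN) & ~(PD_SPAN - 1);

	if (next_pd > start + length)
		return length;
	return next_pd - start;
}

int main(void)
{
	/* A 4MB range that straddles the first 1GB boundary. */
	uint64_t start = 0x3ff00000ULL, length = 0x400000ULL;

	while (length) {
		uint64_t chunk = clamp_pd(start, length);

		printf("pdpe %llu: 0x%llx-0x%llx\n",
		       (unsigned long long)(start >> PDPE_SHIFT),
		       (unsigned long long)start,
		       (unsigned long long)(start + chunk));
		start += chunk;
		length -= chunk;
	}
	return 0;
}
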
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 +++++++++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 46 +++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 02ccb18..07f0d24 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -579,27 +579,32 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
 {
-	int i;
-
-	if (!pd->page)
-		return;
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		free_pt_single(pd->page_tables[i], dev);
-		pd->page_tables[i] = NULL;
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagedir *pd;
+	struct i915_pagetab *pt;
+	uint64_t temp;
+	uint32_t pdpe, pde;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			free_pt_single(pt, vm->dev);
+		}
+		free_pd_single(pd);
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+/* This function will die soon */
+static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
 {
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
-		free_pd_single(ppgtt->pdp.pagedir[i]);
-	}
+	gen8_teardown_va_range(&ppgtt->base,
+			       i << GEN8_PDPE_SHIFT,
+			       (1 << GEN8_PDPE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -614,19 +619,28 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			continue;
 
 		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
+				PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-			struct i915_pagetab *pt =  pd->page_tables[j];
+			struct i915_pagetab *pt = pd->page_tables[j];
 			dma_addr_t addr = pt->daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
-					       PCI_DMA_BIDIRECTIONAL);
+						PCI_DMA_BIDIRECTIONAL);
 		}
 	}
 }
 
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	trace_i915_va_teardown(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total,
+			       VM_TO_TRACE_NAME(&ppgtt->base));
+	gen8_teardown_va_range(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total);
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
@@ -651,7 +665,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
+		gen8_free_full_pagedir(ppgtt, i);
 
 	return -ENOMEM;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 58a55bc..7797f0e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -387,6 +387,52 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	return i915_pde_index(addr, GEN6_PDE_SHIFT);
 }
 
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter)		\
+	for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN8_PDES_PER_PAGE;			\
+	     pt = (pd)->page_tables[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->pagedir[iter];	\
+	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+	     pd = (pdp)->pagedir[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+/* Clamp length to the next pagedir boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+	if (next_pd > (start + length))
+		return length;
+
+	return next_pd - start;
+}
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+	return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+	return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+	return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+	BUG(); /* For 64B */
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 16/24] drm/i915/bdw: pagedirs rework allocation
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (14 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using the gen8_for_each_pdpe macro to allocate the page directories.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 07f0d24..274b20f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -594,8 +594,10 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 		uint64_t pd_start = start;
 		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
 			free_pt_single(pt, vm->dev);
+			pd->page_tables[pde] = NULL;
 		}
 		free_pd_single(pd);
+		ppgtt->pdp.pagedir[pdpe] = NULL;
 	}
 }
 
@@ -670,25 +672,39 @@ unwind_out:
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
+static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+				     uint64_t start,
+				     uint64_t length)
 {
-	int i;
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pdp, struct i915_hw_ppgtt, pdp);
+	struct i915_pagedir *unused;
+	uint64_t temp;
+	uint32_t pdpe;
 
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.pagedir[i] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.pagedir[i]))
+	/* FIXME: PPGTT container_of won't work for 64b */
+	BUG_ON((start + length) > 0x800000000ULL);
+
+	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+		BUG_ON(unused);
+		pdp->pagedir[pdpe] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.pagedir[pdpe]))
 			goto unwind_out;
+
+		ppgtt->num_pd_pages++;
 	}
 
-	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		free_pd_single(ppgtt->pdp.pagedir[i]);
+	while (pdpe--) {
+		free_pd_single(ppgtt->pdp.pagedir[pdpe]);
+		ppgtt->num_pd_pages--;
+	}
+
+	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -698,7 +714,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 {
 	int ret;
 
-	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
+					ppgtt->base.total);
 	if (ret)
 		return ret;
 
@@ -775,6 +792,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
+	ppgtt->base.start = 0;
+	ppgtt->base.total = size;
+	BUG_ON(ppgtt->base.total == 0);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
 	if (ret)
@@ -822,8 +843,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 17/24] drm/i915/bdw: pagetable allocation rework
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (15 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 16/24] drm/i915/bdw: pagedirs rework allocation Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using the gen8_for_each_pde macro to allocate the page tables.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 10 +++++++
 2 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 274b20f..2fb0db7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -601,14 +601,6 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	}
 }
 
-/* This function will die soon */
-static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
-{
-	gen8_teardown_va_range(&ppgtt->base,
-			       i << GEN8_PDPE_SHIFT,
-			       (1 << GEN8_PDPE_SHIFT));
-}
-
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
@@ -652,22 +644,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
 {
-	int i, ret;
+	struct i915_pagetab *unused;
+	uint64_t temp;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
-				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
-		if (ret)
+	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+		BUG_ON(unused);
+		pd->page_tables[pde] = alloc_pt_single(dev);
+		if (IS_ERR(pd->page_tables[pde]))
 			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		gen8_free_full_pagedir(ppgtt, i);
+	while (pde--)
+		free_pt_single(pd->page_tables[pde], dev);
 
 	return -ENOMEM;
 }
@@ -710,20 +707,28 @@ unwind_out:
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    const int max_pdp)
+			    uint64_t start,
+			    uint64_t length)
 {
+	struct i915_pagedir *pd;
+	uint64_t temp;
+	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
-					ppgtt->base.total);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
-	if (ret)
-		goto err_out;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+						ppgtt->base.dev);
+		if (ret)
+			goto err_out;
+
+		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
+	}
 
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	BUG_ON(pdpe > ppgtt->num_pd_pages);
 
 	return 0;
 
@@ -794,10 +799,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	BUG_ON(ppgtt->base.total == 0);
 
 	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 7797f0e..d7b71ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -403,6 +403,16 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+/* Clamp length to the next pagetab boundary */
+static inline uint64_t gen8_clamp_pt(uint64_t start, uint64_t length)
+{
+	uint64_t next_pt = ALIGN(start + 1, 1 << GEN8_PDE_SHIFT);
+	if (next_pt > (start + length))
+		return length;
+
+	return next_pt - start;
+}
+
 /* Clamp length to the next pagedir boundary */
 static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (16 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

One important part of this patch is that we now write a scratch page
directory into any unused PDP descriptors. This matters for two reasons:
first, we're not allowed to just use 0 or an invalid pointer, and second,
we must wipe out any previous contents from the last context.

The latter point only matters with full PPGTT. The former point only
affects platforms with less than 4GB of memory.

v2: Updated commit message to point that we must set unused PDPs to the
scratch page.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
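As a plain-C illustration of the point above (not driver code; the addresses are made up), every PDP slot is written on a switch, and slots that have no page directory behind them get the scratch address so the GPU never chases a stale pointer left by the previous context:

#include <stdint.h>
#include <stdio.h>

#define NUM_PDPS 4

int main(void)
{
	/* Only the first PDP has a real page directory behind it. */
	uint64_t pd_daddr[NUM_PDPS] = { 0x100000, 0, 0, 0 };
	uint64_t scratch_daddr = 0xabc000;
	int i;

	for (i = NUM_PDPS - 1; i >= 0; i--) {
		uint64_t addr = pd_daddr[i] ? pd_daddr[i] : scratch_daddr;

		/* The real code emits LRI commands here; we just print. */
		printf("PDP%d <- 0x%llx\n", i, (unsigned long long)addr);
	}
	return 0;
}
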
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 ++++-
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2fb0db7..65c5aa8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -445,8 +445,9 @@ static struct i915_pagedir *alloc_pd_single(void)
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
-			   uint64_t val)
+static int gen8_write_pdp(struct intel_engine_cs *ring,
+			  unsigned entry,
+			  dma_addr_t addr)
 {
 	int ret;
 
@@ -458,10 +459,10 @@ static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val >> 32));
+	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val));
+	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
 	return 0;
@@ -472,12 +473,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
-
-	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
-		ret = gen8_write_pdp(ring, i, addr);
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
+		/* The page directory might be NULL, but we need to clear out
+		 * whatever the previous context might have used. */
+		ret = gen8_write_pdp(ring, i, pd_daddr);
 		if (ret)
 			return ret;
 	}
@@ -800,10 +801,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
 
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
-	if (ret)
+	if (ret) {
+		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
+	}
 
 	/*
 	 * 2. Create DMA mappings for the page directories and page tables.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d7b71ef..383d990 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -283,7 +283,10 @@ struct i915_hw_ppgtt {
 		struct i915_pagedir pd;
 	};
 
-	struct i915_pagetab *scratch_pt;
+	union {
+		struct i915_pagetab *scratch_pt;
+		struct i915_pagetab *scratch_pd; /* Just need the daddr */
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (17 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 20/24] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

These values are of little use once the page tables are allocated
dynamically. Getting rid of them will help prevent later confusion.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 --
 drivers/gpu/drm/i915/i915_gem_gtt.c | 68 ++++++++++++-------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 ++--
 3 files changed, 27 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0f63076..b00760b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2117,8 +2117,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	if (!ppgtt)
 		return;
 
-	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
 	for_each_ring(ring, dev_priv, unused) {
 		seq_printf(m, "%s\n", ring->name);
 		for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 65c5aa8..28cb503 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -607,7 +607,7 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
 		if (!ppgtt->pdp.pagedir[i]->daddr)
@@ -688,21 +688,13 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 		pdp->pagedir[pdpe] = alloc_pd_single();
 		if (IS_ERR(ppgtt->pdp.pagedir[pdpe]))
 			goto unwind_out;
-
-		ppgtt->num_pd_pages++;
 	}
 
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
 	return 0;
 
 unwind_out:
-	while (pdpe--) {
+	while (pdpe--)
 		free_pd_single(ppgtt->pdp.pagedir[pdpe]);
-		ppgtt->num_pd_pages--;
-	}
-
-	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -725,12 +717,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 						ppgtt->base.dev);
 		if (ret)
 			goto err_out;
-
-		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
 	}
 
-	BUG_ON(pdpe > ppgtt->num_pd_pages);
-
 	return 0;
 
 	/* TODO: Check this for all cases */
@@ -792,7 +780,6 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -855,11 +842,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 
-	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
-			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-			 ppgtt->num_pd_entries,
-			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
 
 bail:
@@ -870,26 +852,20 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
-	gen6_gtt_pte_t __iomem *pd_addr;
+	struct i915_pagetab *unused;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
-	int pte, pde;
+	uint32_t  pte, pde, temp;
+	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
-	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd.pd_offset,
-		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
-	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
-		pd_entry = readl(pd_addr + pde);
+		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
 		if (pd_entry != expected)
@@ -1162,12 +1138,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 
 static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_pagetab *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i]->daddr,
-			       4096, PCI_DMA_BIDIRECTIONAL);
+	gen6_for_all_pdes(pt, ppgtt, pde) {
+		if (pt != ppgtt->scratch_pt) /* MT check if needed this if */
+			pci_unmap_page(ppgtt->base.dev->pdev,
+				pt->daddr,
+				4096, PCI_DMA_BIDIRECTIONAL);
+	}
 }
 
 static int gen6_alloc_va_range(struct i915_address_space *vm,
@@ -1294,12 +1273,12 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_pagetab *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+			free_pt_single(pt, ppgtt->base.dev);
 	}
 
 	/* Consider putting this as part of pd free. */
@@ -1359,7 +1338,6 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
 
 err_out:
@@ -1378,9 +1356,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
 	if (!preallocate_pt)
 		return 0;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			ppgtt->base.dev);
-
+	ret = alloc_pt_range(&ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES, ppgtt->base.dev);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 		drm_mm_remove_node(&ppgtt->node);
@@ -1426,7 +1402,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd.pd_offset =
@@ -1761,7 +1737,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		if (i915_is_ggtt(vm))
 			ppgtt = dev_priv->mm.aliasing_ppgtt;
 
-		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES);
 	}
 
 	i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 383d990..e06f249 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -276,8 +276,6 @@ struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
 	struct drm_mm_node node;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct i915_pagedirpo pdp;
 		struct i915_pagedir pd;
@@ -343,6 +341,11 @@ struct i915_gtt {
 	     temp = min(temp, (unsigned)length), \
 	     start += temp, length -= temp)
 
+#define gen6_for_all_pdes(pt, ppgtt, iter)  \
+	for (iter = 0, pt = ppgtt->pd.page_tables[iter];			\
+	     iter < gen6_pde_index(ppgtt->base.total);			\
+	     pt =  ppgtt->pd.page_tables[++iter])
+
 static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
 {
 	const uint32_t mask = NUM_PTE(pde_shift) - 1;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 20/24] drm/i915: Extract PPGTT param from pagedir alloc
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (18 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 21/24] drm/i915/bdw: Split out mappings Michel Thierry
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Now that we don't need to trace num_pd_pages, we may as well drop any
need for the PPGTT structure in alloc_pagedirs. This will be very useful
once we move to 48b addressing, where the PDP isn't the root of the page
table structure.

The parameter is replaced with drm_device, which is an unavoidable wart
throughout the series (in other words, not an extra flagrant one).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 28cb503..0c32a91 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -674,8 +674,6 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(pdp, struct i915_hw_ppgtt, pdp);
 	struct i915_pagedir *unused;
 	uint64_t temp;
 	uint32_t pdpe;
@@ -686,7 +684,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->pagedir[pdpe] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.pagedir[pdpe]))
+		if (IS_ERR(pdp->pagedir[pdpe]))
 			goto unwind_out;
 	}
 
@@ -694,7 +692,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 
 unwind_out:
 	while (pdpe--)
-		free_pd_single(ppgtt->pdp.pagedir[pdpe]);
+		free_pd_single(pdp->pagedir[pdpe]);
 
 	return -ENOMEM;
 }
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 21/24] drm/i915/bdw: Split out mappings
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (19 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 20/24] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we do dynamic page table allocations for gen8, we'll need more
control over how and when we map page tables, similar to gen6. In
particular, DMA mappings for page directories/tables now occur at
allocation time.

This patch adds the functionality and calls it at init, so there should
be no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
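A rough, self-contained sketch of the allocation/mapping pairing this patch introduces (illustrative only; map_page()/unmap_page() are stand-ins for the DMA-mapping helpers, and the unwind order mirrors what alloc_pd_single() does below):

#include <stdlib.h>
#include <stdio.h>

struct pd {
	void *page;
	unsigned long daddr;    /* "DMA" address of the backing page */
};

/* Stand-ins for the real DMA-mapping calls. */
static int map_page(struct pd *pd)    { pd->daddr = (unsigned long)pd->page; return 0; }
static void unmap_page(struct pd *pd) { pd->daddr = 0; }

/* Allocate the descriptor, its backing page, and the mapping together,
 * unwinding in reverse order if any step fails. */
static struct pd *alloc_pd(void)
{
	struct pd *pd = calloc(1, sizeof(*pd));

	if (!pd)
		return NULL;

	pd->page = calloc(1, 4096);
	if (!pd->page) {
		free(pd);
		return NULL;
	}

	if (map_page(pd)) {
		free(pd->page);
		free(pd);
		return NULL;
	}

	return pd;
}

static void free_pd(struct pd *pd)
{
	unmap_page(pd);         /* the mapping goes away with the allocation */
	free(pd->page);
	free(pd);
}

int main(void)
{
	struct pd *pd = alloc_pd();

	printf("%s\n", pd ? "allocated and mapped" : "allocation failed");
	if (pd)
		free_pd(pd);
	return 0;
}
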
 drivers/gpu/drm/i915/i915_gem_gtt.c | 186 ++++++++++++++----------------------
 1 file changed, 72 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0c32a91..73e7c08 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -415,21 +415,23 @@ err_out:
 	return ret;
 }
 
-static void __free_pd_single(struct i915_pagedir *pd)
+static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+	i915_dma_unmap_single(pd, dev);
 	__free_page(pd->page);
 	kfree(pd);
 }
 
-#define free_pd_single(pd) do { \
+#define free_pd_single(pd,  dev) do { \
 	if ((pd)->page) { \
-		__free_pd_single(pd); \
+		__free_pd_single(pd, dev); \
 	} \
 } while (0)
 
-static struct i915_pagedir *alloc_pd_single(void)
+static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_pagedir *pd;
+	int ret;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
@@ -441,6 +443,13 @@ static struct i915_pagedir *alloc_pd_single(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = i915_dma_map_px_single(pd, dev);
+	if (ret) {
+		__free_page(pd->page);
+		kfree(pd);
+		return ERR_PTR(ret);
+	}
+
 	return pd;
 }
 
@@ -580,6 +589,36 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+			     struct i915_pagetab *pt,
+			     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t entry =
+		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+	*pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_pagedir *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+	struct i915_pagetab *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		__gen8_do_map_pt(pagedir + pde, pt, dev);
+
+	if (!HAS_LLC(dev))
+		drm_clflush_virt_range(pagedir, PAGE_SIZE);
+
+	kunmap_atomic(pagedir);
+}
+
 static void gen8_teardown_va_range(struct i915_address_space *vm,
 				   uint64_t start, uint64_t length)
 {
@@ -597,7 +636,7 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			free_pt_single(pt, vm->dev);
 			pd->page_tables[pde] = NULL;
 		}
-		free_pd_single(pd);
+		free_pd_single(pd, vm->dev);
 		ppgtt->pdp.pagedir[pdpe] = NULL;
 	}
 }
@@ -629,9 +668,6 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	trace_i915_va_teardown(&ppgtt->base,
-			       ppgtt->base.start, ppgtt->base.total,
-			       VM_TO_TRACE_NAME(&ppgtt->base));
 	gen8_teardown_va_range(&ppgtt->base,
 			       ppgtt->base.start, ppgtt->base.total);
 }
@@ -672,7 +708,8 @@ unwind_out:
 
 static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 				     uint64_t start,
-				     uint64_t length)
+				     uint64_t length,
+				     struct drm_device *dev)
 {
 	struct i915_pagedir *unused;
 	uint64_t temp;
@@ -683,7 +720,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
-		pdp->pagedir[pdpe] = alloc_pd_single();
+		pdp->pagedir[pdpe] = alloc_pd_single(dev);
 		if (IS_ERR(pdp->pagedir[pdpe]))
 			goto unwind_out;
 	}
@@ -692,21 +729,25 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 
 unwind_out:
 	while (pdpe--)
-		free_pd_single(pdp->pagedir[pdpe]);
+		free_pd_single(pdp->pagedir[pdpe], dev);
 
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    uint64_t start,
-			    uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start,
+			       uint64_t length)
 {
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagedir *pd;
+	const uint64_t orig_start = start;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
+					ppgtt->base.dev);
 	if (ret)
 		return ret;
 
@@ -719,133 +760,50 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	return 0;
 
-	/* TODO: Check this for all cases */
 err_out:
-	gen8_ppgtt_free(ppgtt);
+	gen8_teardown_va_range(vm, orig_start, start);
 	return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pd)
-{
-	dma_addr_t pd_addr;
-	int ret;
-
-	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.pagedir[pd]->page, 0,
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
-	if (ret)
-		return ret;
-
-	ppgtt->pdp.pagedir[pd]->daddr = pd_addr;
-
-	return 0;
-}
-
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pd,
-					const int pt)
-{
-	dma_addr_t pt_addr;
-	struct i915_pagedir *pdir = ppgtt->pdp.pagedir[pd];
-	struct i915_pagetab *ptab = pdir->page_tables[pt];
-	struct page *p = ptab->page;
-	int ret;
-
-	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	ptab->daddr = pt_addr;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
  * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
  * space.
  *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	int i, j, ret;
-
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+	struct i915_pagedir *pd;
+	uint64_t temp, start = 0;
+	const uint64_t orig_length = size;
+	uint32_t pdpe;
+	int ret;
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->switch_mm = gen8_mm_switch;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
-	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	/*
-	 * 2. Create DMA mappings for the page directories and page tables.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-		if (ret)
-			goto bail;
-
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
-			if (ret)
-				goto bail;
-		}
-	}
-
-	/*
-	 * 3. Map all the page directory entires to point to the page tables
-	 * we've allocated.
-	 *
-	 * For now, the PPGTT helper functions all require that the PDEs are
-	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_pagetab *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
-						      I915_CACHE_LLC);
-		}
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
-		kunmap_atomic(pd_vaddr);
-	}
+	start = 0;
+	size = orig_length;
 
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
 	return 0;
-
-bail:
-	gen8_ppgtt_unmap_pages(ppgtt);
-	gen8_ppgtt_free(ppgtt);
-	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -1281,7 +1239,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	/* Consider putting this as part of pd free. */
 	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
-	free_pd_single(&ppgtt->pd);
+	free_pd_single(&ppgtt->pd, ppgtt->base.dev);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 22/24] drm/i915/bdw: begin bitmap tracking
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (20 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 21/24] drm/i915/bdw: Split out mappings Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As with gen6/7, we can enable bitmap tracking alongside all the
preallocations to make sure things actually don't blow up.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
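A toy analogue of the tracking being added (illustrative only; the driver uses the kernel's bitmap_*/test_bit helpers rather than the hand-rolled bits below): each page directory keeps one bit per page table, set when the table is allocated, cleared when its last PTE range is torn down, and the directory itself is only freed once the bitmap is empty.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PDES_PER_PD 512

struct toy_pd {
	uint64_t used_pdes[PDES_PER_PD / 64];   /* one bit per page table */
};

static void pd_set(struct toy_pd *pd, unsigned int pde)
{
	pd->used_pdes[pde / 64] |= 1ULL << (pde % 64);
}

static void pd_clear(struct toy_pd *pd, unsigned int pde)
{
	pd->used_pdes[pde / 64] &= ~(1ULL << (pde % 64));
}

static bool pd_empty(const struct toy_pd *pd)
{
	unsigned int i;

	for (i = 0; i < PDES_PER_PD / 64; i++)
		if (pd->used_pdes[i])
			return false;
	return true;
}

int main(void)
{
	struct toy_pd pd = { {0} };

	pd_set(&pd, 3);      /* a page table was allocated behind PDE 3 */
	pd_clear(&pd, 3);    /* ...and later torn down */
	printf("page directory can be freed: %s\n", pd_empty(&pd) ? "yes" : "no");
	return 0;
}
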
 drivers/gpu/drm/i915/i915_gem_gtt.c | 121 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  24 +++++++
 2 files changed, 108 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 73e7c08..a834fa6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -417,8 +417,12 @@ err_out:
 
 static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+	WARN(!bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE),
+	     "Free page directory with %d used pages\n",
+	     bitmap_weight(pd->used_pdes, GEN8_PDES_PER_PAGE));
 	i915_dma_unmap_single(pd, dev);
 	__free_page(pd->page);
+	kfree(pd->used_pdes);
 	kfree(pd);
 }
 
@@ -431,26 +435,35 @@ static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_pagedir *pd;
-	int ret;
+	int ret = -ENOMEM;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
+	pd->used_pdes = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				sizeof(*pd->used_pdes), GFP_KERNEL);
+	if (!pd->used_pdes)
+		goto free_pd;
+
 	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pd->page) {
-		kfree(pd);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pd->page)
+		goto free_bitmap;
 
 	ret = i915_dma_map_px_single(pd, dev);
-	if (ret) {
-		__free_page(pd->page);
-		kfree(pd);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto free_page;
 
 	return pd;
+
+free_page:
+	__free_page(pd->page);
+free_bitmap:
+	kfree(pd->used_pdes);
+free_pd:
+	kfree(pd);
+
+	return ERR_PTR(ret);
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -632,36 +645,47 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
-		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
-			free_pt_single(pt, vm->dev);
-			pd->page_tables[pde] = NULL;
-		}
-		free_pd_single(pd, vm->dev);
-		ppgtt->pdp.pagedir[pdpe] = NULL;
-	}
-}
 
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
-	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
-	int i, j;
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		/* TODO: In the future we'll support sparse mappings, so this
-		 * will have to change. */
-		if (!ppgtt->pdp.pagedir[i]->daddr)
+		/* Page directories might not be present since the macro rounds
+		 * down, and up.
+		 */
+		if (!pd) {
+			WARN(test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			     "PDPE %d is not allocated, but is reserved (%p)\n",
+			     pdpe, vm);
 			continue;
+		} else {
+			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			     "PDPE %d not reserved, but is allocated (%p)",
+			     pdpe, vm);
+		}
 
-		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
-				PCI_DMA_BIDIRECTIONAL);
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			if (!pt) {
+				WARN(test_bit(pde, pd->used_pdes),
+				     "PDE %d is not allocated, but is reserved (%p)\n",
+				     pde, vm);
+				continue;
+			} else
+				WARN(!test_bit(pde, pd->used_pdes),
+				     "PDE %d not reserved, but is allocated (%p)",
+				     pde, vm);
+
+			bitmap_clear(pt->used_ptes,
+				     gen8_pte_index(pd_start),
+				     gen8_pte_count(pd_start, pd_len));
+
+			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PAGE)) {
+				free_pt_single(pt, vm->dev);
+				pd->page_tables[pde] = NULL;
+				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+			}
+		}
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-			struct i915_pagetab *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			if (addr)
-				pci_unmap_page(hwdev, addr, PAGE_SIZE,
-						PCI_DMA_BIDIRECTIONAL);
+		if (bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE)) {
+			free_pd_single(pd, vm->dev);
+			ppgtt->pdp.pagedir[pdpe] = NULL;
+			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
 	}
 }
@@ -677,7 +701,6 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	gen8_ppgtt_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 }
 
@@ -706,6 +729,7 @@ unwind_out:
 	return -ENOMEM;
 }
 
+/* bitmap of new pagedirs */
 static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length,
@@ -721,6 +745,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->pagedir[pdpe] = alloc_pd_single(dev);
+
 		if (IS_ERR(pdp->pagedir[pdpe]))
 			goto unwind_out;
 	}
@@ -742,10 +767,12 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagedir *pd;
 	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
+	/* Do the allocations first so we can easily bail out */
 	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
 					ppgtt->base.dev);
 	if (ret)
@@ -758,6 +785,26 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			goto err_out;
 	}
 
+	/* Now mark everything we've touched as used. This doesn't allow for
+	 * robust error checking, but it makes the code a hell of a lot simpler.
+	 */
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		struct i915_pagetab *pt;
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		uint32_t pde;
+		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
+			bitmap_set(pd->page_tables[pde]->used_ptes,
+				   gen8_pte_index(start),
+				   gen8_pte_count(start, length));
+			set_bit(pde, pd->used_pdes);
+		}
+		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+	}
+
 	return 0;
 
 err_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e06f249..9b6caac 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -206,11 +206,13 @@ struct i915_pagedir {
 		dma_addr_t daddr;
 	};
 
+	unsigned long *used_pdes;
 	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES];
 };
 
 struct i915_pagedirpo {
 	/* struct page *page; */
+	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
 };
 
@@ -449,6 +451,28 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
 	BUG(); /* For 64B */
 }
 
+static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
+{
+	return i915_pte_count(addr, length, GEN8_PDE_SHIFT);
+}
+
+static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
+{
+	const uint32_t pdp_shift = GEN8_PDE_SHIFT + 9;
+	const uint64_t mask = ~((1 << pdp_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return GEN8_PDES_PER_PAGE - i915_pde_index(addr, GEN8_PDE_SHIFT);
+
+	return i915_pde_index(end, GEN8_PDE_SHIFT) - i915_pde_index(addr, GEN8_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 23/24] drm/i915/bdw: Dynamic page table allocations
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (21 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 17:10 ` [PATCH 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This finishes off the dynamic page table allocations, in the legacy
3-level style that already exists. Almost everything has already been set
up by this point; the patch completes the enabling by setting the
appropriate function pointers.

Zombie tracking:
This could be a separate patch, but I found it helpful for debugging.
Since we write page tables asynchronously with respect to the GPU using
them, we can't actually free the page tables until we know the GPU won't
use them. With this patch, that is always when the context dies.  It
would be possible to write a reaper to go through zombies and clean them
up when under memory pressure. That exercise is left for the reader.

Scratch unused pages:
The object pages can get freed even if a page table still points to
them. As with the zombie fix, we need to make sure we don't let our GPU
access arbitrary memory once we've unmapped things.

v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
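A stripped-down sketch of the zombie idea (illustrative only; the names are invented and none of this is driver code): tearing down a range while the context may still be live only marks the structure, and the final teardown when the context dies actually frees it.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_pt {
	bool zombie;    /* unmapped by the CPU, but the GPU may still walk it */
};

/* Called on unbind. If the context might still be running on the GPU we
 * cannot free the page table yet, so it is only flagged for later. */
static void teardown(struct toy_pt **slot, bool context_is_dead)
{
	struct toy_pt *pt = *slot;

	if (!pt)
		return;

	if (!context_is_dead) {
		pt->zombie = true;   /* reaped when the context finally dies */
		return;
	}

	free(pt);
	*slot = NULL;
}

int main(void)
{
	struct toy_pt *pt = calloc(1, sizeof(*pt));

	teardown(&pt, false);   /* context still live: becomes a zombie */
	printf("zombie: %d\n", pt ? pt->zombie : -1);
	teardown(&pt, true);    /* context died: actually freed */
	printf("freed: %s\n", pt ? "no" : "yes");
	return 0;
}
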
 drivers/gpu/drm/i915/i915_gem_gtt.c | 378 +++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  16 +-
 2 files changed, 327 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a834fa6..7001869 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -602,7 +602,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 			     struct i915_pagetab *pt,
 			     struct drm_device *dev)
 {
@@ -619,7 +619,7 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
 				     uint64_t length,
 				     struct drm_device *dev)
 {
-	gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+	gen8_ppgtt_pde_t * const pagedir = kmap_atomic(pd->page);
 	struct i915_pagetab *pt;
 	uint64_t temp, pde;
 
@@ -632,8 +632,9 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
 	kunmap_atomic(pagedir);
 }
 
-static void gen8_teardown_va_range(struct i915_address_space *vm,
-				   uint64_t start, uint64_t length)
+static void __gen8_teardown_va_range(struct i915_address_space *vm,
+				     uint64_t start, uint64_t length,
+				     bool dead)
 {
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
@@ -655,6 +656,13 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			     pdpe, vm);
 			continue;
 		} else {
+			if (dead && pd->zombie) {
+				WARN_ON(test_bit(pdpe, ppgtt->pdp.used_pdpes));
+				free_pd_single(pd, vm->dev);
+				ppgtt->pdp.pagedir[pdpe] = NULL;
+				continue;
+			}
+
 			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
 			     "PDPE %d not reserved, but is allocated (%p)",
 			     pdpe, vm);
@@ -666,34 +674,64 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 				     "PDE %d is not allocated, but is reserved (%p)\n",
 				     pde, vm);
 				continue;
-			} else
+			} else {
+				if (dead && pt->zombie) {
+					WARN_ON(test_bit(pde, pd->used_pdes));
+					free_pt_single(pt, vm->dev);
+					pd->page_tables[pde] = NULL;
+					continue;
+				}
 				WARN(!test_bit(pde, pd->used_pdes),
 				     "PDE %d not reserved, but is allocated (%p)",
 				     pde, vm);
+			}
 
 			bitmap_clear(pt->used_ptes,
 				     gen8_pte_index(pd_start),
 				     gen8_pte_count(pd_start, pd_len));
 
 			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PAGE)) {
+				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+				if (!dead) {
+					pt->zombie = 1;
+					continue;
+				}
 				free_pt_single(pt, vm->dev);
 				pd->page_tables[pde] = NULL;
-				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+
 			}
 		}
 
+		gen8_ppgtt_clear_range(vm, pd_start, pd_len, true);
+
 		if (bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE)) {
+			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
+			if (!dead) {
+				/* We've unmapped a possibly live context. Make
+				 * note of it so we can clean it up later. */
+				pd->zombie = 1;
+				continue;
+			}
 			free_pd_single(pd, vm->dev);
 			ppgtt->pdp.pagedir[pdpe] = NULL;
-			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
 	}
 }
 
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	__gen8_teardown_va_range(vm, start, length, false);
+}
+
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	gen8_teardown_va_range(&ppgtt->base,
-			       ppgtt->base.start, ppgtt->base.total);
+	trace_i915_va_teardown(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total,
+			       VM_TO_TRACE_NAME(&ppgtt->base));
+	__gen8_teardown_va_range(&ppgtt->base,
+				 ppgtt->base.start, ppgtt->base.total,
+				 true);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -704,58 +742,167 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pd:		Page directory for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length:	Size of the allocations.
+ * @new_pts:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_pagedirs(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_pagedirs(), it is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_pagedir *pd,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pts)
 {
-	struct i915_pagetab *unused;
+	struct i915_pagetab *pt;
 	uint64_t temp;
 	uint32_t pde;
 
-	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
-		BUG_ON(unused);
-		pd->page_tables[pde] = alloc_pt_single(dev);
-		if (IS_ERR(pd->page_tables[pde]))
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		/* Don't reallocate page tables */
+		if (pt) {
+			/* Scratch is never allocated this way */
+			WARN_ON(pt->scratch);
+			/* If there is a zombie, we can reuse it and save time
+			 * on the allocation. If we clear the zombie status and
+			 * the caller somehow fails, we'll probably hit some
+			 * assertions, so it's up to them to fix up the bitmaps.
+			 */
+			continue;
+		}
+
+		pt = alloc_pt_single(ppgtt->base.dev);
+		if (IS_ERR(pt))
 			goto unwind_out;
+
+		pd->page_tables[pde] = pt;
+		set_bit(pde, new_pts);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pde--)
-		free_pt_single(pd->page_tables[pde], dev);
+	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
+		free_pt_single(pd->page_tables[pde], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
-/* bitmap of new pagedirs */
-static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+/**
+ * gen8_ppgtt_alloc_pagedirs() - Allocate page directories for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pdp:	Page directory pointer for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length:	Size of the allocations.
+ * @new_pds:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pde index of
+ * @start, and ending at the pde index @start + @length. This function will skip
+ * over already allocated page directories within the range, and only allocate
+ * new ones, setting the appropriate pointer within the pdp as well as the
+ * correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a given page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller. This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pds)
 {
-	struct i915_pagedir *unused;
+	struct i915_pagedir *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 
+	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
 
-	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
-		BUG_ON(unused);
-		pdp->pagedir[pdpe] = alloc_pd_single(dev);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		if (pd)
+			continue;
 
-		if (IS_ERR(pdp->pagedir[pdpe]))
+		pd = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(pd))
 			goto unwind_out;
+
+		pdp->pagedir[pdpe] = pd;
+		set_bit(pdpe, new_pds);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pdpe--)
-		free_pd_single(pdp->pagedir[pdpe], dev);
+	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+		free_pd_single(pdp->pagedir[pdpe], ppgtt->base.dev);
+
+	return -ENOMEM;
+}
+
+static inline void
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+{
+	int i;
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+		kfree(new_pts[i]);
+	kfree(new_pts);
+	kfree(new_pds);
+}
+
+/* Fills in the page directory bitmap, and the array of page table bitmaps. Both
+ * of these are based on the number of PDPEs in the system.
+ */
+int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
+					 unsigned long ***new_pts)
+{
+	int i;
+	unsigned long *pds;
+	unsigned long **pts;
 
+	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	if (!pds)
+		return -ENOMEM;
+
+	pts = kcalloc(GEN8_PDES_PER_PAGE, sizeof(unsigned long *), GFP_KERNEL);
+	if (!pts) {
+		kfree(pds);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				 sizeof(unsigned long), GFP_KERNEL);
+		if (!pts[i])
+			goto err_out;
+	}
+
+	*new_pds = pds;
+	*new_pts = (unsigned long **)pts;
+
+	return 0;
+
+err_out:
+	free_gen8_temp_bitmaps(pds, pts);
 	return -ENOMEM;
 }
 
@@ -765,6 +912,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	unsigned long *new_page_dirs, **new_page_tables;
 	struct i915_pagedir *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -772,43 +920,103 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	uint32_t pdpe;
 	int ret;
 
-	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
-					ppgtt->base.dev);
+#ifndef CONFIG_64BIT
+	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
+	 * this in hardware, but a lot of the drm code is not prepared to handle
+	 * 64b offset on 32b platforms. */
+	if (start + length > 0x100000000ULL)
+		return -E2BIG;
+#endif
+
+	/* Wrap is never okay since we can only represent 48b, and we don't
+	 * actually use the other side of the canonical address space.
+	 */
+	if (WARN_ON(start + length < start))
+		return -ERANGE;
+
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
 	if (ret)
 		return ret;
 
+	/* Do the allocations first so we can easily bail out */
+	ret = gen8_ppgtt_alloc_pagedirs(ppgtt, &ppgtt->pdp, start, length,
+					new_page_dirs);
+	if (ret) {
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		return ret;
+	}
+
+	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
-						ppgtt->base.dev);
+		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
+		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
 	}
 
-	/* Now mark everything we've touched as used. This doesn't allow for
-	 * robust error checking, but it makes the code a hell of a lot simpler.
-	 */
 	start = orig_start;
 	length = orig_length;
 
+	/* Allocations have completed successfully, so set the bitmaps, and do
+	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		gen8_ppgtt_pde_t *const pagedir = kmap_atomic(pd->page);
 		struct i915_pagetab *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
 		uint32_t pde;
-		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
-			bitmap_set(pd->page_tables[pde]->used_ptes,
-				   gen8_pte_index(start),
-				   gen8_pte_count(start, length));
+
+		/* Every pd should be allocated, we just did that above. */
+		BUG_ON(!pd);
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			/* Same reasoning as pd */
+			BUG_ON(!pt);
+			BUG_ON(!pd_len);
+			BUG_ON(!gen8_pte_count(pd_start, pd_len));
+
+			/* Set our used ptes within the page table */
+			bitmap_set(pt->used_ptes,
+				   gen8_pte_index(pd_start),
+				   gen8_pte_count(pd_start, pd_len));
+
+			/* Our pde is now pointing to the pagetable, pt */
 			set_bit(pde, pd->used_pdes);
+
+			/* Map the PDE to the page table */
+			__gen8_do_map_pt(pagedir + pde, pt, vm->dev);
+
+			/* NB: We haven't yet mapped ptes to pages. At this
+			 * point we're still relying on insert_entries() */
+
+			/* No longer possible this page table is a zombie */
+			pt->zombie = 0;
 		}
+
+		if (!HAS_LLC(vm->dev))
+			drm_clflush_virt_range(pagedir, PAGE_SIZE);
+
+		kunmap_atomic(pagedir);
+
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		/* This pd is officially not a zombie either */
+		ppgtt->pdp.pagedir[pdpe]->zombie = 0;
 	}
 
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return 0;
 
 err_out:
-	gen8_teardown_va_range(vm, orig_start, start);
+	while (pdpe--) {
+		for_each_set_bit(temp, new_page_tables[pdpe], GEN8_PDES_PER_PAGE)
+			free_pt_single(pd->page_tables[temp], vm->dev);
+	}
+
+	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+		free_pd_single(ppgtt->pdp.pagedir[pdpe], vm->dev);
+
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return ret;
 }
 
@@ -819,37 +1027,68 @@ err_out:
  * space.
  *
  */
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	struct i915_pagedir *pd;
-	uint64_t temp, start = 0;
-	const uint64_t orig_length = size;
-	uint32_t pdpe;
-	int ret;
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pd))
-		return PTR_ERR(ppgtt->scratch_pd);
+	return 0;
+}
 
+static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_pagedir *pd;
+	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
+	uint32_t pdpe;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	/* Aliasing PPGTT has to always work and be mapped because of the way we
+	 * use RESTORE_INHIBIT in the context switch. This will be fixed
+	 * eventually. */
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	start = 0;
-	size = orig_length;
-
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
+	ppgtt->base.allocate_va_range = NULL;
+	ppgtt->base.teardown_va_range = NULL;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+
+	return 0;
+}
+
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen8_teardown_va_range;
+	ppgtt->base.clear_range = NULL;
+
 	return 0;
 }
 
@@ -1399,9 +1638,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	if (ret)
 		return ret;
 
-	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
-	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
-	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
+	ppgtt->base.allocate_va_range = aliasing ? NULL : gen6_alloc_va_range;
+	ppgtt->base.teardown_va_range = aliasing ? NULL : gen6_teardown_va_range;
+	ppgtt->base.clear_range = aliasing ? gen6_ppgtt_clear_range : NULL;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
@@ -1439,8 +1678,10 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt, aliasing);
+	else if ((IS_GEN8(dev) || IS_GEN9(dev)) && aliasing)
+		return gen8_aliasing_ppgtt_init(ppgtt);
 	else if (IS_GEN8(dev) || IS_GEN9(dev))
-		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
+		return gen8_ppgtt_init(ppgtt);
 	else
 		BUG();
 }
@@ -1454,8 +1695,9 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
 			    ppgtt->base.total);
-		ppgtt->base.clear_range(&ppgtt->base, 0,
-			    ppgtt->base.total, true);
+		if (ppgtt->base.clear_range)
+			ppgtt->base.clear_range(&ppgtt->base, 0,
+				ppgtt->base.total, true);
 		i915_init_vm(dev_priv, &ppgtt->base);
 	}
 
@@ -1577,10 +1819,7 @@ ppgtt_bind_vma(struct i915_vma *vma,
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm,
-			     vma->node.start,
-			     vma->obj->base.size,
-			     true);
+	WARN_ON(vma->vm->teardown_va_range && vma->vm->clear_range);
 	if (vma->vm->teardown_va_range) {
 		trace_i915_va_teardown(vma->vm,
 				       vma->node.start, vma->node.size,
@@ -1589,7 +1828,14 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
 		ppgtt_invalidate_tlbs(vma->vm);
-	}
+	} else if (vma->vm->clear_range) {
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     vma->obj->base.size,
+				     true);
+	} else
+		BUG();
+
 }
 
 extern int intel_iommu_gfx_mapped;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9b6caac..5446592 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -190,13 +190,26 @@ struct i915_vma {
 			u32 flags);
 };
 
-
+/* Zombies. We write page tables with the CPU, while the GPU switches to them
+ * in hardware. As such, the only time we can safely remove a page table is
+ * when we know the context is idle. Since we have no good way to do this, we
+ * use the zombie.
+ *
+ * Under memory pressure, if the system is idle, zombies may be reaped.
+ *
+ * There are 3 valid states a page table can be in (not including scratch):
+ *  bitmap = 0, zombie = 0: unallocated
+ *  bitmap = 1, zombie = 0: allocated
+ *  bitmap = 0, zombie = 1: zombie
+ *  bitmap = 1, zombie = 1: invalid
+ */
 struct i915_pagetab {
 	struct page *page;
 	dma_addr_t daddr;
 
 	unsigned long *used_ptes;
 	unsigned int scratch:1;
+	unsigned zombie:1;
 };
 
 struct i915_pagedir {
@@ -208,6 +221,7 @@ struct i915_pagedir {
 
 	unsigned long *used_pdes;
 	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES];
+	unsigned zombie:1;
 };
 
 struct i915_pagedirpo {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (22 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
@ 2014-12-18 17:10 ` Michel Thierry
  2014-12-18 21:16 ` [PATCH 00/24] PPGTT dynamic page allocations Daniel Vetter
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-18 17:10 UTC (permalink / raw)
  To: intel-gfx

Logical ring contexts need to know the PDPs when they are populated. With
dynamic page table allocation, these PDPs may not exist yet.

Check if PDPs have been allocated and use the scratch page if they do
not exist yet.

Before submission, update the PDPs in the logical ring context, since by
then the PDPs will have been allocated.
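
The per-PDP register updates below are intentionally written out longhand.
As a purely illustrative sketch (not part of this patch), the "use the
pagedir if allocated, otherwise the scratch pd" selection could be folded
into a helper macro built on the existing CTX_PDPn_UDW/LDW defines:

	/* Sketch only; n must be a literal 0..3. */
	#define ASSIGN_CTX_PDP(ppgtt, reg_state, n) do { \
		u64 _addr = test_bit(n, (ppgtt)->pdp.used_pdpes) ? \
			(ppgtt)->pdp.pagedir[n]->daddr : \
			(ppgtt)->scratch_pd->daddr; \
		(reg_state)[CTX_PDP##n##_UDW + 1] = upper_32_bits(_addr); \
		(reg_state)[CTX_PDP##n##_LDW + 1] = lower_32_bits(_addr); \
	} while (0)

so that each call site reduces to ASSIGN_CTX_PDP(ppgtt, reg_state, 3) and
so on.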

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 80 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 546884b..6abe4bc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -358,6 +358,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 
 static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 				    struct drm_i915_gem_object *ring_obj,
+				    struct i915_hw_ppgtt *ppgtt,
 				    u32 tail)
 {
 	struct page *page;
@@ -369,6 +370,40 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
+	/* True PPGTT with dynamic page allocation: update PDP registers and
+	 * point the unallocated PDPs to the scratch page
+	 */
+	if (ppgtt) {
+		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+		} else {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+		} else {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+		} else {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+		} else {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+	}
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -387,7 +422,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, to0->ppgtt, tail0);
 
 	if (to1) {
 		ringbuf1 = to1->engine[ring->id].ringbuf;
@@ -396,7 +431,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
 		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
 
-		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
+		execlists_update_context(ctx_obj1, ringbuf1->obj, to1->ppgtt, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -1731,14 +1766,39 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+
+	/* With dynamic page allocation, PDPs may not be allocated at this point.
+	 * Point the unallocated PDPs to the scratch page.
+	 */
+	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+	} else {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+	} else {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+	} else {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+	} else {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* Re: [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES
  2014-12-18 17:10 ` [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
@ 2014-12-18 20:40   ` Daniel Vetter
  2014-12-18 20:44     ` Daniel Vetter
  0 siblings, 1 reply; 229+ messages in thread
From: Daniel Vetter @ 2014-12-18 20:40 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Thu, Dec 18, 2014 at 05:10:00PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
> one, but it resembles having one). The #define was confusing as is, and
> using "PDPE" is a much better description.
> 
> sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]

Hm generally I've thought the abbreviations are pdp (for the page itself)
and pde (for the entries within). I still have no idea what pdpe means ...

So either please explain that or pick one of the others.
-Daniel

> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
>  drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 75a29a3..9639310 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -375,7 +375,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	pt_vaddr = NULL;
>  
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> -		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
> +		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
>  			break;
>  
>  		if (pt_vaddr == NULL)
> @@ -486,7 +486,7 @@ bail:
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					   const int max_pdp)
>  {
> -	struct page **pt_pages[GEN8_LEGACY_PDPS];
> +	struct page **pt_pages[GEN8_LEGACY_PDPES];
>  	int i, ret;
>  
>  	for (i = 0; i < max_pdp; i++) {
> @@ -537,7 +537,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>  		return -ENOMEM;
>  
>  	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
> -	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
> +	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index e377c7d..9d998ec 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -88,7 +88,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN8_PDE_MASK			0x1ff
>  #define GEN8_PTE_SHIFT			12
>  #define GEN8_PTE_MASK			0x1ff
> -#define GEN8_LEGACY_PDPS		4
> +#define GEN8_LEGACY_PDPES		4
>  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
>  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
>  
> @@ -273,12 +273,12 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_pages; /* gen8+ */
>  	union {
>  		struct page **pt_pages;
> -		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
>  	};
>  	struct page *pd_pages;
>  	union {
>  		uint32_t pd_offset;
> -		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
>  	};
>  	union {
>  		dma_addr_t *pt_dma_addr;
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES
  2014-12-18 20:40   ` Daniel Vetter
@ 2014-12-18 20:44     ` Daniel Vetter
  2014-12-19 12:32       ` Dave Gordon
  0 siblings, 1 reply; 229+ messages in thread
From: Daniel Vetter @ 2014-12-18 20:44 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Thu, Dec 18, 2014 at 09:40:51PM +0100, Daniel Vetter wrote:
> On Thu, Dec 18, 2014 at 05:10:00PM +0000, Michel Thierry wrote:
> > From: Ben Widawsky <benjamin.widawsky@intel.com>
> > 
> > In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
> > one, but it resembles having one). The #define was confusing as is, and
> > using "PDPE" is a much better description.
> > 
> > sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]
> 
> Hm generally I've thought the abbreviations are pdp (for the page itself)
> and pde (for the entries within). I still have no idea what pdpe means ...
> 
> So either please explain that or pick one of the others.

In case you fear the rebase pain of renaming this:

1. Export entire series as patches with git format-patch.
2. sed -e 's/PDPE/PDE/g' on all the patch files
3. Import the changed patches into a new fresh branch.'

That's all. Feels really crazy the first time you do it, but after having
done this a lot with the internal branch when something random (function
name or so) changed in upstream it's a fairly simple trick to pull off ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 11/24] drm/i915: Extract context switch skip logic
  2014-12-18 17:10 ` [PATCH 11/24] drm/i915: Extract context switch skip logic Michel Thierry
@ 2014-12-18 20:54   ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2014-12-18 20:54 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Thu, Dec 18, 2014 at 05:10:08PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> We have some fanciness coming up. This patch just breaks out the logic.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>

The next patch moves this function around again, which makes the diff
actually bigger. Also, an extracted function should replace the logic it
extracts in the same patch. As-is this doesn't help review, though I agree
the split-up itself is a nice idea that would. Probably got lost in all the
rebases.
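
Concretely, I'd expect the same patch to also carry something along these
lines in do_switch() (just a sketch, not a tested diff), replacing the
open-coded early-return check there:

	if (should_skip_switch(ring, from, to))
		return 0;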
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b67d269..a8ff03d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -726,6 +726,16 @@ unpin_out:
>  	return ret;
>  }
>  
> +static inline bool should_skip_switch(struct intel_engine_cs *ring,
> +				      struct intel_context *from,
> +				      struct intel_context *to)
> +{
> +	if (from == to && !to->remap_slice)
> +		return true;
> +
> +	return false;
> +}
> +
>  /**
>   * i915_switch_context() - perform a GPU context switch.
>   * @ring: ring for which we'll execute the context switch
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 10/24] drm/i915: Track GEN6 page table usage
  2014-12-18 17:10 ` [PATCH 10/24] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2014-12-18 21:06   ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2014-12-18 21:06 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

Ok this is just a very coarse high level comment. The only thing I really
looked for is where the dynamic pagetable allocation happens and how
out-of-memory is handled in there.

But while trying to look for that I've noticed that the patches and
code have a lot of history, and often the code comments and commit messages
don't really agree with the code any more. I think that should be
carefully reviewed to make it less confusing.

On Thu, Dec 18, 2014 at 05:10:07PM +0000, Michel Thierry wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index c08fe8b..2eb6011 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN6_PPGTT_PD_ENTRIES		512
>  #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
>  #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
> +#define GEN6_PDE_SHIFT          22
>  #define GEN6_PDE_VALID			(1 << 0)
> +#define GEN6_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
> +#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
>  
>  #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
>  
> @@ -182,9 +185,33 @@ struct i915_vma {
>  	 * setting the valid PTE entries to a reserved scratch page. */
>  	void (*unbind_vma)(struct i915_vma *vma);
>  	/* Map an object into an address space with the given cache flags. */
> -	void (*bind_vma)(struct i915_vma *vma,
> -			 enum i915_cache_level cache_level,
> -			 u32 flags);
> +	int (*bind_vma)(struct i915_vma *vma,
> +			enum i915_cache_level cache_level,
> +			u32 flags);
> +};

So this patch changes the interface of vma->bind_vma to return errors when
the pagetable alloc fails. Afaics none of the callers get updated, and I
didn't see those adjustments in any other patch in this series. This
doesn't work (or I'm blind).

Also I'm not too happy with smashing this into ->bind_vma: This function
is also called in places where we know that the pagetables must be there
already, namely when changing pte bits in i915_gem_object_set_cache_level.

Imo we should add an explicit call to allocate required pagetables in
i915_gem_object_bind_to_vm, which is the only place which actually needs
this (I think).
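
Rough sketch of what I mean, untested and with a hypothetical error label:

	/* In i915_gem_object_bind_to_vm(), once the drm_mm node is reserved */
	if (vma->vm->allocate_va_range) {
		ret = vma->vm->allocate_va_range(vma->vm,
						 vma->node.start,
						 vma->node.size);
		if (ret)
			goto err_remove_node;	/* hypothetical label */
	}

That way ->bind_vma() can stay infallible and set_cache_level never has to
deal with allocation failures at all.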
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 12/24] drm/i915: Track page table reload need
  2014-12-18 17:10 ` [PATCH 12/24] drm/i915: Track page table reload need Michel Thierry
@ 2014-12-18 21:08   ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2014-12-18 21:08 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

Another high-level logic comment below.

On Thu, Dec 18, 2014 at 05:10:09PM +0000, Michel Thierry wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index faa0603..c917301 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1398,6 +1398,15 @@ i915_ppgtt_create(struct drm_device *dev, struct drm_i915_file_private *fpriv)
>  	return ppgtt;
>  }
>  
> +/* PDE TLBs are a pain to invalidate pre GEN8. It requires a context reload. If we
> + * are switching between contexts with the same LRCA, we also must do a force
> + * restore.
> + */
> +#define ppgtt_invalidate_tlbs(vm) do {\
> +	/* If current vm != vm, */ \
> +	vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask; \
> +} while (0)
> +
>  void  i915_ppgtt_release(struct kref *kref)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
> @@ -1433,6 +1442,8 @@ ppgtt_bind_vma(struct i915_vma *vma,
>  						 vma->node.size);
>  		if (ret)
>  			return ret;
> +
> +		ppgtt_invalidate_tlbs(vma->vm);

Imo we should only set this when we actually allocate new pagetables, and
not unconditionally every time we bind.
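
I.e. something along these lines (sketch only; the "did we actually allocate
new page tables" signal would still need to be plumbed out of
allocate_va_range somehow):

	ret = vma->vm->allocate_va_range(vma->vm, vma->node.start,
					 vma->node.size);
	if (ret)
		return ret;

	/* Only force the PD reload if new page tables actually appeared. */
	if (new_page_tables)	/* hypothetical flag */
		ppgtt_invalidate_tlbs(vma->vm);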
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 14/24] drm/i915: Finish gen6/7 dynamic page table allocation
  2014-12-18 17:10 ` [PATCH 14/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2014-12-18 21:12   ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2014-12-18 21:12 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Thu, Dec 18, 2014 at 05:10:11PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> This patch continues on the idea from the previous patch. From here on,
> in the steady state, PDEs are all pointing to the scratch page table (as
> recommended in the spec). When an object is allocated in the VA range,
> the code will determine if we need to allocate a page for the page
> table. Similarly when the object is destroyed, we will remove, and free
> the page table pointing the PDE back to the scratch page.
> 
> Following patches will work to unify the code a bit as we bring in GEN8
> support. GEN6 and GEN8 are different enough that I had a hard time to
> get to this point with as much common code as I do.
> 
> The aliasing PPGTT must pre-allocate all of the page tables. There are a
> few reasons for this. Two trivial ones: aliasing ppgtt goes through the
> ggtt paths, so it's hard to maintain, we currently do not restore the
> default context (assuming the previous force reload is indeed
> necessary). Most importantly though, the only way (it seems from
> empirical evidence) to invalidate the CS TLBs on non-render ring is to
> either use ring sync (which requires actually stopping the rings in
> order to synchronize when the sync completes vs. where you are in
> execution), or to reload DCLV.  Since without full PPGTT we do not ever
> reload the DCLV register, there is no good way to achieve this. The
> simplest solution is just to not support dynamic page table
> creation/destruction in the aliasing PPGTT.
> 
> We could always reload DCLV, but this seems like quite a bit of excess
> overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
> page tables.
> 
> v2: Make the page table bitmap declared inside the function (Chris)
> Simplify the way scratching address space works.
> Move the alloc/teardown tracepoints up a level in the call stack so that
> both all implementations get the trace.
> 
> v3: Updated trace event to spit out a name
> 
> v4: Aliasing ppgtt is now initialized differently (in setup global gtt)
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4)
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
>  drivers/gpu/drm/i915/i915_drv.h     |   7 ++
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 125 +++++++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_trace.h   | 116 +++++++++++++++++++++++++++++++++
>  4 files changed, 240 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 60f91bc..0f63076 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>  		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
>  		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
>  	}
> +	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
> +
>  	if (dev_priv->mm.aliasing_ppgtt) {
>  		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
>  
> @@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>  			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
>  		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
>  	}
> -	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
>  }
>  
>  static int i915_ppgtt_info(struct seq_file *m, void *data)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3047291f..d74db21 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2819,6 +2819,13 @@ static inline bool i915_is_ggtt(struct i915_address_space *vm)
>  	return vm == ggtt;
>  }
>  
> +static inline bool i915_is_aliasing_ppgtt(struct i915_address_space *vm)
> +{
> +	struct i915_address_space *appgtt =
> +		&((struct drm_i915_private *)(vm)->dev->dev_private)->mm.aliasing_ppgtt->base;
> +	return vm == appgtt;
> +}

We killed the aliasing ppgtt vm, this condition will never be true when
you pass any vma->vm pointer to it.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (23 preceding siblings ...)
  2014-12-18 17:10 ` [PATCH 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
@ 2014-12-18 21:16 ` Daniel Vetter
  2014-12-19  8:31   ` Chris Wilson
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                   ` (4 subsequent siblings)
  29 siblings, 1 reply; 229+ messages in thread
From: Daniel Vetter @ 2014-12-18 21:16 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
> This new version tries to remove as many unnecessary changes as possible from
> the previous RFC.
>  
> For GEN8, it has also been extended to work in logical ring submission (lrc)
> mode, as it will be the preferred mode of operation.
> I also tried to update the lrc code at the same time the ppgtt refactoring
> occurred, leaving only one patch that is exclusively for lrc.
> 
> This list can be seen in 3 parts:
> [01-10] Include code rework for PPGTT (all GENs).
> [11-14] Adds page table allocation for GEN6/GEN7
> [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> and execlist submission modes.
> 
> Ben Widawsky (23):
>   drm/i915: Add some extra guards in evict_vm
>   drm/i915/trace: Fix offsets for 64b
>   drm/i915: Rename to GEN8_LEGACY_PDPES
>   drm/i915: Setup less PPGTT on failed pagedir
>   drm/i915/gen8: Un-hardcode number of page directories
>   drm/i915: Range clearing is PPGTT agnostic
>   drm/i915: page table abstractions
>   drm/i915: Complete page table structures
>   drm/i915: Create page table allocators
>   drm/i915: Track GEN6 page table usage
>   drm/i915: Extract context switch skip logic
>   drm/i915: Track page table reload need
>   drm/i915: Initialize all contexts
>   drm/i915: Finish gen6/7 dynamic page table allocation
>   drm/i915/bdw: Use dynamic allocation idioms on free
>   drm/i915/bdw: pagedirs rework allocation
>   drm/i915/bdw: pagetable allocation rework
>   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
>   drm/i915: num_pd_pages/num_pd_entries isn't useful
>   drm/i915: Extract PPGTT param from pagedir alloc
>   drm/i915/bdw: Split out mappings
>   drm/i915/bdw: begin bitmap tracking
>   drm/i915/bdw: Dynamic page table allocations
> 
> Michel Thierry (1):
>   drm/i915/bdw: Dynamic page table allocations in lrc mode

Ok, I've tried to read through this series again and I definitely see a
bit clearer, but it's still fairly confusing to me. I think that's just
the long history of this patch series - often it seems to do something and
then undo it again in some later patch. Which doesn't help understanding
it.

I've replied with a few comments. Imo the way forward with this is to
read, understand and review it from the beginning and merge while that's
happening. It will probably take a few rounds until all the confusion is
cleared up and we've reached the last patch.
-Daniel

> 
>  drivers/gpu/drm/i915/i915_debugfs.c        |    7 +-
>  drivers/gpu/drm/i915/i915_drv.h            |    7 +
>  drivers/gpu/drm/i915/i915_gem_context.c    |   62 +-
>  drivers/gpu/drm/i915/i915_gem_evict.c      |    3 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c        | 1224 ++++++++++++++++++++--------
>  drivers/gpu/drm/i915/i915_gem_gtt.h        |  252 +++++-
>  drivers/gpu/drm/i915/i915_trace.h          |  124 ++-
>  drivers/gpu/drm/i915/intel_lrc.c           |   80 +-
>  9 files changed, 1378 insertions(+), 392 deletions(-)
> 
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-18 21:16 ` [PATCH 00/24] PPGTT dynamic page allocations Daniel Vetter
@ 2014-12-19  8:31   ` Chris Wilson
  2014-12-19  8:37     ` Daniel Vetter
  0 siblings, 1 reply; 229+ messages in thread
From: Chris Wilson @ 2014-12-19  8:31 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Thu, Dec 18, 2014 at 10:16:22PM +0100, Daniel Vetter wrote:
> On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
> > This new version tries to remove as many unnecessary changes as possible from
> > the previous RFC.
> >  
> > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > mode, as it will be the preferred mode of operation.
> > I also tried to update the lrc code at the same time the ppgtt refactoring
> > occurred, leaving only one patch that is exclusively for lrc.
> > 
> > This list can be seen in 3 parts:
> > [01-10] Include code rework for PPGTT (all GENs).
> > [11-14] Adds page table allocation for GEN6/GEN7
> > [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> > and execlist submission modes.
> > 
> > Ben Widawsky (23):
> >   drm/i915: Add some extra guards in evict_vm
> >   drm/i915/trace: Fix offsets for 64b
> >   drm/i915: Rename to GEN8_LEGACY_PDPES
> >   drm/i915: Setup less PPGTT on failed pagedir
> >   drm/i915/gen8: Un-hardcode number of page directories
> >   drm/i915: Range clearing is PPGTT agnostic
> >   drm/i915: page table abstractions
> >   drm/i915: Complete page table structures
> >   drm/i915: Create page table allocators
> >   drm/i915: Track GEN6 page table usage
> >   drm/i915: Extract context switch skip logic
> >   drm/i915: Track page table reload need
> >   drm/i915: Initialize all contexts
> >   drm/i915: Finish gen6/7 dynamic page table allocation
> >   drm/i915/bdw: Use dynamic allocation idioms on free
> >   drm/i915/bdw: pagedirs rework allocation
> >   drm/i915/bdw: pagetable allocation rework
> >   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
> >   drm/i915: num_pd_pages/num_pd_entries isn't useful
> >   drm/i915: Extract PPGTT param from pagedir alloc
> >   drm/i915/bdw: Split out mappings
> >   drm/i915/bdw: begin bitmap tracking
> >   drm/i915/bdw: Dynamic page table allocations
> > 
> > Michel Thierry (1):
> >   drm/i915/bdw: Dynamic page table allocations in lrc mode
> 
> Ok, I've tried to read through this series again and I definitely see a
> bit clearer, but it's still fairly confusing to me. I think that's just
> the long history of this patch series - often it seems to do something and
> then undo it again in some later patch. Which doesn't help understanding
> it.
> 
> I've replied with a few comments. Imo the way forward with this is to
> read, understand and review it from the beginning and merge while that's
> happening. It will probably take a few rounds until all the confusion is
> cleared up and we've reached the last patch.

I honestly think this is starting off in the wrong direction. The first
task, imo, is to make the current PD swappable. Then, we can introduce
infrastructure to do deferred page allocation, hopefully combining with
an approach to allow userspace to similarly defer their page allocation.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-19  8:31   ` Chris Wilson
@ 2014-12-19  8:37     ` Daniel Vetter
  2014-12-19  8:50       ` Chris Wilson
  0 siblings, 1 reply; 229+ messages in thread
From: Daniel Vetter @ 2014-12-19  8:37 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Michel Thierry, intel-gfx

On Fri, Dec 19, 2014 at 08:31:03AM +0000, Chris Wilson wrote:
> On Thu, Dec 18, 2014 at 10:16:22PM +0100, Daniel Vetter wrote:
> > On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
> > > This new version tries to remove as many unnecessary changes as possible from
> > > the previous RFC.
> > >  
> > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > mode, as it will be the preferred mode of operation.
> > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > occurred, leaving only one patch that is exclusively for lrc.
> > > 
> > > This list can be seen in 3 parts:
> > > [01-10] Include code rework for PPGTT (all GENs).
> > > [11-14] Adds page table allocation for GEN6/GEN7
> > > [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> > > and execlist submission modes.
> > > 
> > > Ben Widawsky (23):
> > >   drm/i915: Add some extra guards in evict_vm
> > >   drm/i915/trace: Fix offsets for 64b
> > >   drm/i915: Rename to GEN8_LEGACY_PDPES
> > >   drm/i915: Setup less PPGTT on failed pagedir
> > >   drm/i915/gen8: Un-hardcode number of page directories
> > >   drm/i915: Range clearing is PPGTT agnostic
> > >   drm/i915: page table abstractions
> > >   drm/i915: Complete page table structures
> > >   drm/i915: Create page table allocators
> > >   drm/i915: Track GEN6 page table usage
> > >   drm/i915: Extract context switch skip logic
> > >   drm/i915: Track page table reload need
> > >   drm/i915: Initialize all contexts
> > >   drm/i915: Finish gen6/7 dynamic page table allocation
> > >   drm/i915/bdw: Use dynamic allocation idioms on free
> > >   drm/i915/bdw: pagedirs rework allocation
> > >   drm/i915/bdw: pagetable allocation rework
> > >   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
> > >   drm/i915: num_pd_pages/num_pd_entries isn't useful
> > >   drm/i915: Extract PPGTT param from pagedir alloc
> > >   drm/i915/bdw: Split out mappings
> > >   drm/i915/bdw: begin bitmap tracking
> > >   drm/i915/bdw: Dynamic page table allocations
> > > 
> > > Michel Thierry (1):
> > >   drm/i915/bdw: Dynamic page table allocations in lrc mode
> > 
> > Ok, I've tried to read through this series again and I definitely see a
> > bit clearer, but it's still fairly confusing to me. I think that's just
> > the long history of this patch series - often it seems to do something and
> > then undo it again in some later patch. Which doesn't help understanding
> > it.
> > 
> > I've replied with a few comments. Imo the way forward with this is to
> > read, understand and review it from the beginning and merge while that's
> > happening. It will probably take a few rounds until all the confusion is
> > cleared up and we've reached the last patch.
> 
> I honestly think this is starting off in the wrong direction. The first
> task, imo, is to make the current PD swappable. Then, we can introduce
> infrastructure to do deferred page allocation, hopefully combining with
> an approach to allow userspace to similarly defer their page allocation.

Thus far things have started to blow up because the testcases need a bit too
much memory, so we need this. We'll probably need the pd swapping
eventually, too, but only on gen7, and atm ppgtt on gen7 still doesn't look
good. Given that I think the priorities are right for now.

But yeah we need to keep this in mind for gen7 full ppgtt.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-19  8:37     ` Daniel Vetter
@ 2014-12-19  8:50       ` Chris Wilson
  2014-12-19 10:13         ` Daniel Vetter
  0 siblings, 1 reply; 229+ messages in thread
From: Chris Wilson @ 2014-12-19  8:50 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Dec 19, 2014 at 09:37:52AM +0100, Daniel Vetter wrote:
> On Fri, Dec 19, 2014 at 08:31:03AM +0000, Chris Wilson wrote:
> > On Thu, Dec 18, 2014 at 10:16:22PM +0100, Daniel Vetter wrote:
> > > On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
> > > > This new version tries to remove as many unnecessary changes as possible from
> > > > the previous RFC.
> > > >  
> > > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > > mode, as it will be the preferred mode of operation.
> > > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > > occurred, leaving only one patch that is exclusively for lrc.
> > > > 
> > > > This list can be seen in 3 parts:
> > > > [01-10] Include code rework for PPGTT (all GENs).
> > > > [11-14] Adds page table allocation for GEN6/GEN7
> > > > [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> > > > and execlist submission modes.
> > > > 
> > > > Ben Widawsky (23):
> > > >   drm/i915: Add some extra guards in evict_vm
> > > >   drm/i915/trace: Fix offsets for 64b
> > > >   drm/i915: Rename to GEN8_LEGACY_PDPES
> > > >   drm/i915: Setup less PPGTT on failed pagedir
> > > >   drm/i915/gen8: Un-hardcode number of page directories
> > > >   drm/i915: Range clearing is PPGTT agnostic
> > > >   drm/i915: page table abstractions
> > > >   drm/i915: Complete page table structures
> > > >   drm/i915: Create page table allocators
> > > >   drm/i915: Track GEN6 page table usage
> > > >   drm/i915: Extract context switch skip logic
> > > >   drm/i915: Track page table reload need
> > > >   drm/i915: Initialize all contexts
> > > >   drm/i915: Finish gen6/7 dynamic page table allocation
> > > >   drm/i915/bdw: Use dynamic allocation idioms on free
> > > >   drm/i915/bdw: pagedirs rework allocation
> > > >   drm/i915/bdw: pagetable allocation rework
> > > >   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
> > > >   drm/i915: num_pd_pages/num_pd_entries isn't useful
> > > >   drm/i915: Extract PPGTT param from pagedir alloc
> > > >   drm/i915/bdw: Split out mappings
> > > >   drm/i915/bdw: begin bitmap tracking
> > > >   drm/i915/bdw: Dynamic page table allocations
> > > > 
> > > > Michel Thierry (1):
> > > >   drm/i915/bdw: Dynamic page table allocations in lrc mode
> > > 
> > > Ok, I've tried to read through this series again and I definitely see a
> > > bit clearer, but it's still fairly confusing to me. I think that's just
> > > the long history of this patch series - often it seems to do something and
> > > then undo it again in some later patch. Which doesn't help understanding
> > > it.
> > > 
> > > I've replied with a few comments. Imo the way forward with this is to
> > > read, understand and review it from the beginning and merge while that's
> > > happening. It will probably take a few rounds until all the confusion is
> > > cleared up and we've reached the last patch.
> > 
> > I honestly think this is starting off in the wrong direction. The first
> > task, imo, is to make the current PD swappable. Then, we can introduce
> > infrastructure to do deferred page allocation, hopefully combining with
> > an approach to allow userspace to similarly defer their page allocation.
> 
> Thus far things started to blow up because we need a bit too much memory
> in testcases so we need this. We'll probably need the pd swapping
> eventually, too. But only on gen7, and atm it doesn't look good for ppgtt
> on gen7 still. Given that I think the priorities are right for now.
> 
> But yeah we need to keep this in mind for gen7 full ppgtt.

Certainly making the toplevel PD swappable on BDW is less of a priority,
since that should also be already done as part of execlists (and it is so
much smaller). I would like to see the gen6/7 deferred allocation
removed from this series, as I still consider it to be the wrong initial
approach (and making the current gen7 full-ppgtt swappable is rather
trivial).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-19  8:50       ` Chris Wilson
@ 2014-12-19 10:13         ` Daniel Vetter
  2014-12-19 12:35           ` Michel Thierry
  2014-12-19 13:10           ` Chris Wilson
  0 siblings, 2 replies; 229+ messages in thread
From: Daniel Vetter @ 2014-12-19 10:13 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Michel Thierry, intel-gfx

On Fri, Dec 19, 2014 at 08:50:09AM +0000, Chris Wilson wrote:
> On Fri, Dec 19, 2014 at 09:37:52AM +0100, Daniel Vetter wrote:
> > On Fri, Dec 19, 2014 at 08:31:03AM +0000, Chris Wilson wrote:
> > > On Thu, Dec 18, 2014 at 10:16:22PM +0100, Daniel Vetter wrote:
> > > > On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
> > > > > This new version tries to remove as many unnecessary changes as possible from
> > > > > the previous RFC.
> > > > >  
> > > > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > > > mode, as it will be the preferred mode of operation.
> > > > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > > > occurred, leaving only one patch that is exclusively for lrc.
> > > > > 
> > > > > This list can be seen in 3 parts:
> > > > > [01-10] Include code rework for PPGTT (all GENs).
> > > > > [11-14] Adds page table allocation for GEN6/GEN7
> > > > > [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> > > > > and execlist submission modes.
> > > > > 
> > > > > Ben Widawsky (23):
> > > > >   drm/i915: Add some extra guards in evict_vm
> > > > >   drm/i915/trace: Fix offsets for 64b
> > > > >   drm/i915: Rename to GEN8_LEGACY_PDPES
> > > > >   drm/i915: Setup less PPGTT on failed pagedir
> > > > >   drm/i915/gen8: Un-hardcode number of page directories
> > > > >   drm/i915: Range clearing is PPGTT agnostic
> > > > >   drm/i915: page table abstractions
> > > > >   drm/i915: Complete page table structures
> > > > >   drm/i915: Create page table allocators
> > > > >   drm/i915: Track GEN6 page table usage
> > > > >   drm/i915: Extract context switch skip logic
> > > > >   drm/i915: Track page table reload need
> > > > >   drm/i915: Initialize all contexts
> > > > >   drm/i915: Finish gen6/7 dynamic page table allocation
> > > > >   drm/i915/bdw: Use dynamic allocation idioms on free
> > > > >   drm/i915/bdw: pagedirs rework allocation
> > > > >   drm/i915/bdw: pagetable allocation rework
> > > > >   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
> > > > >   drm/i915: num_pd_pages/num_pd_entries isn't useful
> > > > >   drm/i915: Extract PPGTT param from pagedir alloc
> > > > >   drm/i915/bdw: Split out mappings
> > > > >   drm/i915/bdw: begin bitmap tracking
> > > > >   drm/i915/bdw: Dynamic page table allocations
> > > > > 
> > > > > Michel Thierry (1):
> > > > >   drm/i915/bdw: Dynamic page table allocations in lrc mode
> > > > 
> > > > Ok, I've tried to read through this series again and I definitely see a
> > > > bit clearer, but it's still fairly confusing to me. I think that's just
> > > > the long history of this patch series - often it seems to do something and
> > > > then undo it again in some later patch. Which doesn't help understanding
> > > > it.
> > > > 
> > > > I've replied with a few comments. Imo the way forward with this is to
> > > > read, understand and review it from the beginning and merge while that's
> > > > happening. It will probably take a few rounds until all the confusion is
> > > > cleared up and we've reached the last patch.
> > > 
> > > I honestly think this is starting off in the wrong direction. The first
> > > task, imo, is to make the current PD swappable. Then, we can introduce
> > > infrastructure to do deferred page allocation, hopefully combining with
> > > an approach to allow userspace to similarly defer their page allocation.
> > 
> > Thus far things started to blow up because we need a bit too much memory
> > in testcases so we need this. We'll probably need the pd swapping
> > eventually, too. But only on gen7, and atm it doesn't look good for ppgtt
> > on gen7 still. Given that I think the priorities are right for now.
> > 
> > But yeah we need to keep this in mind for gen7 full ppgtt.
> 
> Certainly making the toplevel PD swappable on BDW is less of a priority,
> since that should also be already done as part of execlists (and it is so
> much smaller). I would like to see the gen6/7 deferred allocation
> removed from this series, as I still consider it to be the wrong initial
> approach (and making the current gen7 full-ppgtt swappable is rather
> trivial).

I don't understand why you'd want to hold up the delayed pagetable alloc
until the pd is evictable. Imo these are fairly orthogonal issues: pd
eviction tries to make ggtt space reclaimable, while deferred pagetable
alloc ensures that we don't alloc 2M of pagetables (which are system
memory) in the lower level when not needed. So imo we can go ahead with
both in parallel. There might be some minor conflicts between this series
and making pd ggtt blocks evictable around pd reloading, but nothing
fundamental.

Or are we talking past each other? Just wondering since you call it
"swappable pd" and I can't really think of a way we could swap out pds to
disk on gen7 (they just block ggtt space which can't be used for real ptes
any more).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES
  2014-12-18 20:44     ` Daniel Vetter
@ 2014-12-19 12:32       ` Dave Gordon
  2014-12-19 13:24         ` Daniel Vetter
  0 siblings, 1 reply; 229+ messages in thread
From: Dave Gordon @ 2014-12-19 12:32 UTC (permalink / raw)
  To: Daniel Vetter, Michel Thierry; +Cc: intel-gfx

On 18/12/14 20:44, Daniel Vetter wrote:
> On Thu, Dec 18, 2014 at 09:40:51PM +0100, Daniel Vetter wrote:
>> On Thu, Dec 18, 2014 at 05:10:00PM +0000, Michel Thierry wrote:
>>> From: Ben Widawsky <benjamin.widawsky@intel.com>
>>>
>>> In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
>>> one, but it resembles having one). The #define was confusing as is, and
>>> using "PDPE" is a much better description.
>>>
>>> sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]
>>
>> Hm generally I've thought the abbreviations are pdp (for the page itself)
>> and pde (for the entries within). I still have no idea what pdpe means ...
>>
>> So either please explain that or pick one of the others.
> 
> In case you fear the rebase pain of renaming this:
> 
> 1. Export entire series as patches with git format-patch.
> 2. sed -e 's/PDPE/PDE/g' on all the patch files
> 3. Import the changed patches into a new fresh branch.'
> 
> That's all. Feels really crazy the first time you do it, but after having
> done this a lot with the internal branch when something random (function
> name or so) changed in upstream it's a fairly simple trick to pull off ;-)
> 
> Cheers, Daniel

The specific #define is inconsistent with the naming used for other
#defines and in the associated comments. Here's the relevant chunk of
i915_gem_gtt.h:

/* GEN8 legacy style address is defined as a 3 level page table:
 * 31:30 | 29:21 | 20:12 |  11:0
 * PDPE  |  PDE  |  PTE  | offset
 * The difference as compared to normal x86 3 level page table is the PDPEs are
 * programmed via register.
 */
#define GEN8_PDPE_SHIFT                 30
#define GEN8_PDPE_MASK                  0x3
#define GEN8_PDE_SHIFT                  21
#define GEN8_PDE_MASK                   0x1ff
#define GEN8_PTE_SHIFT                  12
#define GEN8_PTE_MASK                   0x1ff
#define GEN8_LEGACY_PDPS                4
#define GEN8_PTES_PER_PAGE              (PAGE_SIZE / sizeof(gen8_gtt_pte_t))
#define GEN8_PDES_PER_PAGE              (PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))

So 'LEGACY_PDPS' is inconsistent with 'PDPE_SHIFT/MASK'.

PTE  = Page Table Entry
PDE  = Page Directory Entry
PDPE = Page Directory Pointer Entry

See http://www.pagetable.com/?p=308
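
Purely for illustration (this is not part of the patch set), here is how a
32b GEN8 legacy address splits into those indices, using the shifts/masks
from the chunk above:

#include <stdio.h>

/* Values copied from the i915_gem_gtt.h chunk quoted above. */
#define GEN8_PDPE_SHIFT		30
#define GEN8_PDPE_MASK		0x3
#define GEN8_PDE_SHIFT		21
#define GEN8_PDE_MASK		0x1ff
#define GEN8_PTE_SHIFT		12
#define GEN8_PTE_MASK		0x1ff

int main(void)
{
	unsigned int addr = 0x87654321;	/* any 32b GPU virtual address */

	printf("pdpe=%u pde=%u pte=%u offset=0x%x\n",
	       (addr >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK,  /* 1 of 4 page directories */
	       (addr >> GEN8_PDE_SHIFT) & GEN8_PDE_MASK,    /* 1 of 512 page tables */
	       (addr >> GEN8_PTE_SHIFT) & GEN8_PTE_MASK,    /* 1 of 512 PTEs */
	       addr & 0xfff);                               /* offset within the 4K page */
	return 0;
}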

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-19 10:13         ` Daniel Vetter
@ 2014-12-19 12:35           ` Michel Thierry
  2014-12-19 13:10           ` Chris Wilson
  1 sibling, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-19 12:35 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 5155 bytes --]

On 12/19/2014 10:13 AM, Daniel Vetter wrote:
> On Fri, Dec 19, 2014 at 08:50:09AM +0000, Chris Wilson wrote:
>> On Fri, Dec 19, 2014 at 09:37:52AM +0100, Daniel Vetter wrote:
>>> On Fri, Dec 19, 2014 at 08:31:03AM +0000, Chris Wilson wrote:
>>>> On Thu, Dec 18, 2014 at 10:16:22PM +0100, Daniel Vetter wrote:
>>>>> On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
>>>>>> This new version tries to remove as many unnecessary changes as possible from
>>>>>> the previous RFC.
>>>>>>   
>>>>>> For GEN8, it has also been extended to work in logical ring submission (lrc)
>>>>>> mode, as it will be the preferred mode of operation.
>>>>>> I also tried to update the lrc code at the same time the ppgtt refactoring
>>>>>> occurred, leaving only one patch that is exclusively for lrc.
>>>>>>
>>>>>> This list can be seen in 3 parts:
>>>>>> [01-10] Include code rework for PPGTT (all GENs).
>>>>>> [11-14] Adds page table allocation for GEN6/GEN7
>>>>>> [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
>>>>>> and execlist submission modes.
>>>>>>
>>>>>> Ben Widawsky (23):
>>>>>>    drm/i915: Add some extra guards in evict_vm
>>>>>>    drm/i915/trace: Fix offsets for 64b
>>>>>>    drm/i915: Rename to GEN8_LEGACY_PDPES
>>>>>>    drm/i915: Setup less PPGTT on failed pagedir
>>>>>>    drm/i915/gen8: Un-hardcode number of page directories
>>>>>>    drm/i915: Range clearing is PPGTT agnostic
>>>>>>    drm/i915: page table abstractions
>>>>>>    drm/i915: Complete page table structures
>>>>>>    drm/i915: Create page table allocators
>>>>>>    drm/i915: Track GEN6 page table usage
>>>>>>    drm/i915: Extract context switch skip logic
>>>>>>    drm/i915: Track page table reload need
>>>>>>    drm/i915: Initialize all contexts
>>>>>>    drm/i915: Finish gen6/7 dynamic page table allocation
>>>>>>    drm/i915/bdw: Use dynamic allocation idioms on free
>>>>>>    drm/i915/bdw: pagedirs rework allocation
>>>>>>    drm/i915/bdw: pagetable allocation rework
>>>>>>    drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
>>>>>>    drm/i915: num_pd_pages/num_pd_entries isn't useful
>>>>>>    drm/i915: Extract PPGTT param from pagedir alloc
>>>>>>    drm/i915/bdw: Split out mappings
>>>>>>    drm/i915/bdw: begin bitmap tracking
>>>>>>    drm/i915/bdw: Dynamic page table allocations
>>>>>>
>>>>>> Michel Thierry (1):
>>>>>>    drm/i915/bdw: Dynamic page table allocations in lrc mode
>>>>> Ok, I've tried to read through this series again and I definitely see a
>>>>> bit clearer, but it's still fairly confusing to me. I think that's just
>>>>> the long history of this patch series - often it seems to do something and
>>>>> then undo it again in some later patch. Which doesn't help understanding
>>>>> it.
>>>>>
>>>>> I've replied with a few comments. Imo the way forward with this is to
>>>>> read, understand and review it from the beginning and merge while that's
>>>>> happening. It will probably take a few rounds until all the confusion is
>>>>> cleared up and we've reached the last patch.
>>>> I honestly think this is starting off in the wrong direction. The first
>>>> task, imo, is to make the current PD swappable. Then, we can introduce
>>>> infrastructure to do deferred page allocation, hopefully combining with
>>>> an approach to allow userspace to similarly defer their page allocation.
>>> Thus far things started to blow up because we need a bit too much memory
>>> in testcases so we need this. We'll probably need the pd swapping
>>> eventually, too. But only on gen7, and atm it doesn't look good for ppgtt
>>> on gen7 still. Given that I think the priorities are right for now.
>>>
>>> But yeah we need to keep this in mind for gen7 full ppgtt.
>> Certainly making the toplevel PD swappable on BDW is less of a priority,
>> since that should also be already done as part of execlists (and it is so
>> much smaller). I would like to see the gen6/7 deferred allocation
>> removed from this series, as I still consider it to be the wrong initial
>> approach (and making the current gen7 full-ppgtt swappable is rather
>> trivial).
> I don't understand why you'd want to hold up the delayed pagetable alloc
> until the pd is evictable. Imo these are fairly orthogoanl issues: pd
> eviction tries to make ggtt space reclaimable, while deferred pagetable
> alloc ensure that we don't alloc 2M of pagetables (which are system
> memory) in the lower level when not needed. So imo we can go ahead with
> both in parallel. There might be some minor conflicts between this series
> and making pd ggtt blocks evictable around pd reloading, but noting
> fundamental.
>
> Or are we talking past each another? Just wondering since you call it
> "swappable pd" and I can't really think of a way we could swap out pds to
> disk on gen7 (they just block ggtt space which can't be used for real ptes
> any more).
> -Daniel

Thanks for the quick review.

I'll make the changes (and try to tidy up the commit messages).
In particular, I agree it's better to decouple alloc and bind_vma.

-Michel


[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5510 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-19 10:13         ` Daniel Vetter
  2014-12-19 12:35           ` Michel Thierry
@ 2014-12-19 13:10           ` Chris Wilson
  2014-12-19 13:29             ` Daniel Vetter
  1 sibling, 1 reply; 229+ messages in thread
From: Chris Wilson @ 2014-12-19 13:10 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Dec 19, 2014 at 11:13:51AM +0100, Daniel Vetter wrote:
> On Fri, Dec 19, 2014 at 08:50:09AM +0000, Chris Wilson wrote:
> > On Fri, Dec 19, 2014 at 09:37:52AM +0100, Daniel Vetter wrote:
> > > On Fri, Dec 19, 2014 at 08:31:03AM +0000, Chris Wilson wrote:
> > > > On Thu, Dec 18, 2014 at 10:16:22PM +0100, Daniel Vetter wrote:
> > > > > On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
> > > > > > This new version tries to remove as many unnecessary changes as possible from
> > > > > > the previous RFC.
> > > > > >  
> > > > > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > > > > mode, as it will be the preferred mode of operation.
> > > > > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > > > > occurred, leaving only one patch that is exclusively for lrc.
> > > > > > 
> > > > > > This list can be seen in 3 parts:
> > > > > > [01-10] Include code rework for PPGTT (all GENs).
> > > > > > [11-14] Adds page table allocation for GEN6/GEN7
> > > > > > [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> > > > > > and execlist submission modes.
> > > > > > 
> > > > > > Ben Widawsky (23):
> > > > > >   drm/i915: Add some extra guards in evict_vm
> > > > > >   drm/i915/trace: Fix offsets for 64b
> > > > > >   drm/i915: Rename to GEN8_LEGACY_PDPES
> > > > > >   drm/i915: Setup less PPGTT on failed pagedir
> > > > > >   drm/i915/gen8: Un-hardcode number of page directories
> > > > > >   drm/i915: Range clearing is PPGTT agnostic
> > > > > >   drm/i915: page table abstractions
> > > > > >   drm/i915: Complete page table structures
> > > > > >   drm/i915: Create page table allocators
> > > > > >   drm/i915: Track GEN6 page table usage
> > > > > >   drm/i915: Extract context switch skip logic
> > > > > >   drm/i915: Track page table reload need
> > > > > >   drm/i915: Initialize all contexts
> > > > > >   drm/i915: Finish gen6/7 dynamic page table allocation
> > > > > >   drm/i915/bdw: Use dynamic allocation idioms on free
> > > > > >   drm/i915/bdw: pagedirs rework allocation
> > > > > >   drm/i915/bdw: pagetable allocation rework
> > > > > >   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
> > > > > >   drm/i915: num_pd_pages/num_pd_entries isn't useful
> > > > > >   drm/i915: Extract PPGTT param from pagedir alloc
> > > > > >   drm/i915/bdw: Split out mappings
> > > > > >   drm/i915/bdw: begin bitmap tracking
> > > > > >   drm/i915/bdw: Dynamic page table allocations
> > > > > > 
> > > > > > Michel Thierry (1):
> > > > > >   drm/i915/bdw: Dynamic page table allocations in lrc mode
> > > > > 
> > > > > Ok, I've tried to read through this series again and I definitely see a
> > > > > bit clearer, but it's still fairly confusing to me. I think that's just
> > > > > the long history of this patch series - often it seems to do something and
> > > > > then undo it again in some later patch. Which doesn't help understanding
> > > > > it.
> > > > > 
> > > > > I've replied with a few comments. Imo the way forward with this is to
> > > > > read, understand and review it from the beginning and merge while that's
> > > > > happening. It will probably take a few rounds until all the confusion is
> > > > > cleared up and we've reached the last patch.
> > > > 
> > > > I honestly think this is starting off in the wrong direction. The first
> > > > task, imo, is to make the current PD swappable. Then, we can introduce
> > > > infrastructure to do deferred page allocation, hopefully combining with
> > > > an approach to allow userspace to similarly defer their page allocation.
> > > 
> > > Thus far things started to blow up because we need a bit too much memory
> > > in testcases so we need this. We'll probably need the pd swapping
> > > eventually, too. But only on gen7, and atm it doesn't look good for ppgtt
> > > on gen7 still. Given that I think the priorities are right for now.
> > > 
> > > But yeah we need to keep this in mind for gen7 full ppgtt.
> > 
> > Certainly making the toplevel PD swappable on BDW is less of a priority,
> > since that should also be already done as part of execlists (and it is so
> > much smaller). I would like to see the gen6/7 deferred allocation
> > removed from this series, as I still consider it to be the wrong initial
> > approach (and making the current gen7 full-ppgtt swappable is rather
> > trivial).
> 
> I don't understand why you'd want to hold up the delayed pagetable alloc
> until the pd is evictable. Imo these are fairly orthogoanl issues: pd
> eviction tries to make ggtt space reclaimable, while deferred pagetable
> alloc ensure that we don't alloc 2M of pagetables (which are system
> memory) in the lower level when not needed.

There is only one level of PD on ivb, and it is 2M of pinned memory.

> So imo we can go ahead with
> both in parallel. There might be some minor conflicts between this series
> and making pd ggtt blocks evictable around pd reloading, but noting
> fundamental.

The trivial way to make it work is to make the PD an obj. That then
ties into the existing vma/obj management and debug/error
infrastructure. Having deferred allocation for objects is also on the
wishlist.
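
A rough sketch of what I mean (the helper names are today's i915/GEM ones;
the size/alignment constants and the pd_obj field are made up purely for
illustration):

static int alloc_pd_as_obj(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
{
	struct drm_i915_gem_object *pd_obj;
	int ret;

	/* Back the page directory with an ordinary GEM object so it goes
	 * through the usual obj/vma, shrinker and error-capture paths. */
	pd_obj = i915_gem_alloc_object(dev, PD_OBJ_SIZE);	/* PD_OBJ_SIZE: illustrative */
	if (!pd_obj)
		return -ENOMEM;

	/* Pin into the GGTT only while the context is actually in use, so
	 * the backing pages can be evicted/swapped when idle. */
	ret = i915_gem_obj_ggtt_pin(pd_obj, PD_OBJ_ALIGN, PIN_GLOBAL);	/* PD_OBJ_ALIGN: illustrative */
	if (ret) {
		drm_gem_object_unreference(&pd_obj->base);
		return ret;
	}

	ppgtt->pd_obj = pd_obj;	/* hypothetical field */
	return 0;
}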
 
> Or are we talking past each another? Just wondering since you call it
> "swappable pd" and I can't really think of a way we could swap out pds to
> disk on gen7 (they just block ggtt space which can't be used for real ptes
> any more).

Of course we can swap the inactive page directory tables.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES
  2014-12-19 12:32       ` Dave Gordon
@ 2014-12-19 13:24         ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2014-12-19 13:24 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Dec 19, 2014 at 12:32:23PM +0000, Dave Gordon wrote:
> On 18/12/14 20:44, Daniel Vetter wrote:
> > On Thu, Dec 18, 2014 at 09:40:51PM +0100, Daniel Vetter wrote:
> >> On Thu, Dec 18, 2014 at 05:10:00PM +0000, Michel Thierry wrote:
> >>> From: Ben Widawsky <benjamin.widawsky@intel.com>
> >>>
> >>> In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
> >>> one, but it resembles having one). The #define was confusing as is, and
> >>> using "PDPE" is a much better description.
> >>>
> >>> sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]
> >>
> >> Hm generally I've thought the abbreviations are pdp (for the page itself)
> >> and pde (for the entries within). I still have no idea what pdpe means ...
> >>
> >> So either please explain that or pick one of the others.
> > 
> > In case you fear the rebase pain of renaming this:
> > 
> > 1. Export entire series as patches with git format-patch.
> > 2. sed -e 's/PDPE/PDE/g' on all the patch files
> > 3. Import the changed patches into a new fresh branch.'
> > 
> > That's all. Feels really crazy the first time you do it, but after having
> > done this a lot with the internal branch when something random (function
> > name or so) changed in upstream it's a fairly simple trick to pull off ;-)
> > 
> > Cheers, Daniel
> 
> The specific #define is inconsistent with the naming used for other
> #defines and in the associated comments. Here's the relevant chunk of
> i915_gem_gtt.h:
> 
> /* GEN8 legacy style address is defined as a 3 level page table:
>  * 31:30 | 29:21 | 20:12 |  11:0
>  * PDPE  |  PDE  |  PTE  | offset
>  * The difference as compared to normal x86 3 level page table is the PDPEs are
>  * programmed via register.
>  */
> #define GEN8_PDPE_SHIFT                 30
> #define GEN8_PDPE_MASK                  0x3
> #define GEN8_PDE_SHIFT                  21
> #define GEN8_PDE_MASK                   0x1ff
> #define GEN8_PTE_SHIFT                  12
> #define GEN8_PTE_MASK                   0x1ff
> #define GEN8_LEGACY_PDPS                4
> #define GEN8_PTES_PER_PAGE              (PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> #define GEN8_PDES_PER_PAGE              (PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> 
> So 'LEGACY_PDPS' is inconsistent with 'PDPE_SHIFT/MASK'.
> 
> PTE  = Page Table Entry
> PDE  = Page Directory Entry
> PDPE = Page Directory Pointer Entry

Ok, I just wasn't aware of the intel nomenclature for page directories. Imo
if we want to rename we should consider the names established by the linux
vm, mostly because of svm and also because they're actually sane:

pgd -> pud -> pmd -> pt if you have 4 levels
pgd -> pmd -> pt if you have 3 levels and just
pgd -> pt for just 2

g = global
u = upper
m = middle
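
For reference, a minimal sketch of that walk as the core kernel spells it
(nothing i915-specific, just to show where the names come from):

#include <linux/mm.h>

/* Illustrative only -- locking and huge-page cases omitted. */
static pte_t *example_walk(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd = pgd_offset(mm, addr);	/* g = global */
	pud_t *pud;
	pmd_t *pmd;

	if (pgd_none(*pgd))
		return NULL;
	pud = pud_offset(pgd, addr);		/* u = upper */
	if (pud_none(*pud))
		return NULL;
	pmd = pmd_offset(pud, addr);		/* m = middle */
	if (pmd_none(*pmd))
		return NULL;
	return pte_offset_kernel(pmd, addr);	/* the page table proper */
}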

But I guess we can bikeshed this later on. Just please add the above
expansion to the commit message and mention that this is what intel has
picked in the IA PRM, too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-19 13:10           ` Chris Wilson
@ 2014-12-19 13:29             ` Daniel Vetter
  2014-12-19 13:36               ` Chris Wilson
  0 siblings, 1 reply; 229+ messages in thread
From: Daniel Vetter @ 2014-12-19 13:29 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Michel Thierry, intel-gfx

On Fri, Dec 19, 2014 at 01:10:40PM +0000, Chris Wilson wrote:
> On Fri, Dec 19, 2014 at 11:13:51AM +0100, Daniel Vetter wrote:
> > On Fri, Dec 19, 2014 at 08:50:09AM +0000, Chris Wilson wrote:
> > > On Fri, Dec 19, 2014 at 09:37:52AM +0100, Daniel Vetter wrote:
> > > > On Fri, Dec 19, 2014 at 08:31:03AM +0000, Chris Wilson wrote:
> > > > > On Thu, Dec 18, 2014 at 10:16:22PM +0100, Daniel Vetter wrote:
> > > > > > On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
> > > > > > > This new version tries to remove as many unnecessary changes as possible from
> > > > > > > the previous RFC.
> > > > > > >  
> > > > > > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > > > > > mode, as it will be the preferred mode of operation.
> > > > > > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > > > > > occurred, leaving only one patch that is exclusively for lrc.
> > > > > > > 
> > > > > > > This list can be seen in 3 parts:
> > > > > > > [01-10] Include code rework for PPGTT (all GENs).
> > > > > > > [11-14] Adds page table allocation for GEN6/GEN7
> > > > > > > [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> > > > > > > and execlist submission modes.
> > > > > > > 
> > > > > > > Ben Widawsky (23):
> > > > > > >   drm/i915: Add some extra guards in evict_vm
> > > > > > >   drm/i915/trace: Fix offsets for 64b
> > > > > > >   drm/i915: Rename to GEN8_LEGACY_PDPES
> > > > > > >   drm/i915: Setup less PPGTT on failed pagedir
> > > > > > >   drm/i915/gen8: Un-hardcode number of page directories
> > > > > > >   drm/i915: Range clearing is PPGTT agnostic
> > > > > > >   drm/i915: page table abstractions
> > > > > > >   drm/i915: Complete page table structures
> > > > > > >   drm/i915: Create page table allocators
> > > > > > >   drm/i915: Track GEN6 page table usage
> > > > > > >   drm/i915: Extract context switch skip logic
> > > > > > >   drm/i915: Track page table reload need
> > > > > > >   drm/i915: Initialize all contexts
> > > > > > >   drm/i915: Finish gen6/7 dynamic page table allocation
> > > > > > >   drm/i915/bdw: Use dynamic allocation idioms on free
> > > > > > >   drm/i915/bdw: pagedirs rework allocation
> > > > > > >   drm/i915/bdw: pagetable allocation rework
> > > > > > >   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
> > > > > > >   drm/i915: num_pd_pages/num_pd_entries isn't useful
> > > > > > >   drm/i915: Extract PPGTT param from pagedir alloc
> > > > > > >   drm/i915/bdw: Split out mappings
> > > > > > >   drm/i915/bdw: begin bitmap tracking
> > > > > > >   drm/i915/bdw: Dynamic page table allocations
> > > > > > > 
> > > > > > > Michel Thierry (1):
> > > > > > >   drm/i915/bdw: Dynamic page table allocations in lrc mode
> > > > > > 
> > > > > > Ok, I've tried to read through this series again and I definitely see a
> > > > > > bit clearer, but it's still fairly confusing to me. I think that's just
> > > > > > the long history of this patch series - often it seems to do something and
> > > > > > then undo it again in some later patch. Which doesn't help understanding
> > > > > > it.
> > > > > > 
> > > > > > I've replied with a few comments. Imo the way forward with this is to
> > > > > > read, understand and review it from the beginning and merge while that's
> > > > > > happening. It will probably take a few rounds until all the confusion is
> > > > > > cleared up and we've reached the last patch.
> > > > > 
> > > > > I honestly think this is starting off in the wrong direction. The first
> > > > > task, imo, is to make the current PD swappable. Then, we can introduce
> > > > > infrastructure to do deferred page allocation, hopefully combining with
> > > > > an approach to allow userspace to similarly defer their page allocation.
> > > > 
> > > > Thus far things started to blow up because we need a bit too much memory
> > > > in testcases so we need this. We'll probably need the pd swapping
> > > > eventually, too. But only on gen7, and atm it doesn't look good for ppgtt
> > > > on gen7 still. Given that I think the priorities are right for now.
> > > > 
> > > > But yeah we need to keep this in mind for gen7 full ppgtt.
> > > 
> > > Certainly making the toplevel PD swappable on BDW is less of a priority,
> > > since that should also be already done as part of execlists (and it is so
> > > much smaller). I would like to see the gen6/7 deferred allocation
> > > removed from this series, as I still consider it to be the wrong initial
> > > approach (and making the current gen7 full-ppgtt swappable is rather
> > > trivial).
> > 
> > I don't understand why you'd want to hold up the delayed pagetable alloc
> > until the pd is evictable. Imo these are fairly orthogoanl issues: pd
> > eviction tries to make ggtt space reclaimable, while deferred pagetable
> > alloc ensure that we don't alloc 2M of pagetables (which are system
> > memory) in the lower level when not needed.
> 
> There is only one level of PD on ivb, and it is 2M of pinned memory.

I think we have a mixup of nomenclature here. The things that take 512x4k
on ivb are imo the pagetables, whereas the page directory entries are the
512 entries in the global gtt table. We can't swap those entries out, but
we could evict them from the ggtt.
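
(Numbers for reference, as I understand the gen6/7 layout: the PD is 512
entries living in the GGTT, each pointing to a 4KiB page table of 1024
32-bit PTEs, i.e. 512 * 4KiB = 2MiB of page-table pages mapping up to
512 * 1024 * 4KiB = 2GiB of PPGTT address space.)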

> > So imo we can go ahead with
> > both in parallel. There might be some minor conflicts between this series
> > and making pd ggtt blocks evictable around pd reloading, but noting
> > fundamental.
> 
> The trivial way to make it work, is to make the PD an obj. That then
> ties into the existing vma/obj management and debug/error
> infrastructure. Having deferred allocation for objects is also on the
> wishlist.
>  
> > Or are we talking past each another? Just wondering since you call it
> > "swappable pd" and I can't really think of a way we could swap out pds to
> > disk on gen7 (they just block ggtt space which can't be used for real ptes
> > any more).
> 
> Of course we can swap the inactive page directory tables.

See above, I don't think we can. Furthermore the vm doesn't bother with
making page tables or directories reclaimable afaik (we don't need to swap
them out, we can simply free them when everything in them is evicted). So
I don't think that's worth the bother.

If we really have a problem with pagetables we can add it to the oom
shrinker: Walking all vms and dropping all pagetables for completely empty
vms would be conceptually really simple. And there shouldn't be anything
left really when all objects are evicted for a ppgtt vm.
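
Conceptually something like this, where free_all_page_tables() is a made-up
helper standing in for whatever this series ends up providing:

	struct i915_address_space *vm;

	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
		if (list_empty(&vm->active_list) &&
		    list_empty(&vm->inactive_list))
			free_all_page_tables(vm);	/* hypothetical helper */
	}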
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-19 13:29             ` Daniel Vetter
@ 2014-12-19 13:36               ` Chris Wilson
  2014-12-19 19:08                 ` Chris Wilson
  0 siblings, 1 reply; 229+ messages in thread
From: Chris Wilson @ 2014-12-19 13:36 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Dec 19, 2014 at 02:29:57PM +0100, Daniel Vetter wrote:
> On Fri, Dec 19, 2014 at 01:10:40PM +0000, Chris Wilson wrote:
> > On Fri, Dec 19, 2014 at 11:13:51AM +0100, Daniel Vetter wrote:
> > > On Fri, Dec 19, 2014 at 08:50:09AM +0000, Chris Wilson wrote:
> > > > On Fri, Dec 19, 2014 at 09:37:52AM +0100, Daniel Vetter wrote:
> > > > > On Fri, Dec 19, 2014 at 08:31:03AM +0000, Chris Wilson wrote:
> > > > > > On Thu, Dec 18, 2014 at 10:16:22PM +0100, Daniel Vetter wrote:
> > > > > > > On Thu, Dec 18, 2014 at 05:09:57PM +0000, Michel Thierry wrote:
> > > > > > > > This new version tries to remove as many unnecessary changes as possible from
> > > > > > > > the previous RFC.
> > > > > > > >  
> > > > > > > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > > > > > > mode, as it will be the preferred mode of operation.
> > > > > > > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > > > > > > occurred, leaving only one patch that is exclusively for lrc.
> > > > > > > > 
> > > > > > > > This list can be seen in 3 parts:
> > > > > > > > [01-10] Include code rework for PPGTT (all GENs).
> > > > > > > > [11-14] Adds page table allocation for GEN6/GEN7
> > > > > > > > [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> > > > > > > > and execlist submission modes.
> > > > > > > > 
> > > > > > > > Ben Widawsky (23):
> > > > > > > >   drm/i915: Add some extra guards in evict_vm
> > > > > > > >   drm/i915/trace: Fix offsets for 64b
> > > > > > > >   drm/i915: Rename to GEN8_LEGACY_PDPES
> > > > > > > >   drm/i915: Setup less PPGTT on failed pagedir
> > > > > > > >   drm/i915/gen8: Un-hardcode number of page directories
> > > > > > > >   drm/i915: Range clearing is PPGTT agnostic
> > > > > > > >   drm/i915: page table abstractions
> > > > > > > >   drm/i915: Complete page table structures
> > > > > > > >   drm/i915: Create page table allocators
> > > > > > > >   drm/i915: Track GEN6 page table usage
> > > > > > > >   drm/i915: Extract context switch skip logic
> > > > > > > >   drm/i915: Track page table reload need
> > > > > > > >   drm/i915: Initialize all contexts
> > > > > > > >   drm/i915: Finish gen6/7 dynamic page table allocation
> > > > > > > >   drm/i915/bdw: Use dynamic allocation idioms on free
> > > > > > > >   drm/i915/bdw: pagedirs rework allocation
> > > > > > > >   drm/i915/bdw: pagetable allocation rework
> > > > > > > >   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
> > > > > > > >   drm/i915: num_pd_pages/num_pd_entries isn't useful
> > > > > > > >   drm/i915: Extract PPGTT param from pagedir alloc
> > > > > > > >   drm/i915/bdw: Split out mappings
> > > > > > > >   drm/i915/bdw: begin bitmap tracking
> > > > > > > >   drm/i915/bdw: Dynamic page table allocations
> > > > > > > > 
> > > > > > > > Michel Thierry (1):
> > > > > > > >   drm/i915/bdw: Dynamic page table allocations in lrc mode
> > > > > > > 
> > > > > > > Ok, I've tried to read through this series again and I definitely see a
> > > > > > > bit clearer, but it's still fairly confusing to me. I think that's just
> > > > > > > the long history of this patch series - often it seems to do something and
> > > > > > > then undo it again in some later patch. Which doesn't help understanding
> > > > > > > it.
> > > > > > > 
> > > > > > > I've replied with a few comments. Imo the way forward with this is to
> > > > > > > read, understand and review it from the beginning and merge while that's
> > > > > > > happening. It will probably take a few rounds until all the confusion is
> > > > > > > cleared up and we've reached the last patch.
> > > > > > 
> > > > > > I honestly think this is starting off in the wrong direction. The first
> > > > > > task, imo, is to make the current PD swappable. Then, we can introduce
> > > > > > infrastructure to do deferred page allocation, hopefully combining with
> > > > > > an approach to allow userspace to similarly defer their page allocation.
> > > > > 
> > > > > Thus far things started to blow up because we need a bit too much memory
> > > > > in testcases so we need this. We'll probably need the pd swapping
> > > > > eventually, too. But only on gen7, and atm it doesn't look good for ppgtt
> > > > > on gen7 still. Given that I think the priorities are right for now.
> > > > > 
> > > > > But yeah we need to keep this in mind for gen7 full ppgtt.
> > > > 
> > > > Certainly making the toplevel PD swappable on BDW is less of a priority,
> > > > since that should also be already done as part of execlists (and it is so
> > > > much smaller). I would like to see the gen6/7 deferred allocation
> > > > removed from this series, as I still consider it to be the wrong initial
> > > > approach (and making the current gen7 full-ppgtt swappable is rather
> > > > trivial).
> > > 
> > > I don't understand why you'd want to hold up the delayed pagetable alloc
> > > until the pd is evictable. Imo these are fairly orthogoanl issues: pd
> > > eviction tries to make ggtt space reclaimable, while deferred pagetable
> > > alloc ensure that we don't alloc 2M of pagetables (which are system
> > > memory) in the lower level when not needed.
> > 
> > There is only one level of PD on ivb, and it is 2M of pinned memory.
> 
> I think we have a mixup of nomeclatura here. The thing that takes 512x4k
> on ivb are imo the pagetables, whereas the page directory entries are the
> 512 entries in the global gtt table. We can't swap those entries out, but
> we could evict them from the ggtt.

Page Directory vs Page Directory Entries. Of course they can be written
to a swapfile when idle and repinned before use.

> > > So imo we can go ahead with
> > > both in parallel. There might be some minor conflicts between this series
> > > and making pd ggtt blocks evictable around pd reloading, but noting
> > > fundamental.
> > 
> > The trivial way to make it work, is to make the PD an obj. That then
> > ties into the existing vma/obj management and debug/error
> > infrastructure. Having deferred allocation for objects is also on the
> > wishlist.
> >  
> > > Or are we talking past each another? Just wondering since you call it
> > > "swappable pd" and I can't really think of a way we could swap out pds to
> > > disk on gen7 (they just block ggtt space which can't be used for real ptes
> > > any more).
> > 
> > Of course we can swap the inactive page directory tables.
> 
> See above, I don't think we can. Furthermore the vm doesn't bother with
> making page tables or directories reclaimable afaik (we don't need to swap
> them out, we can simply free them when everything in them is evicted). So
> I don't think that's worth the bother.
> 
> If we really have a problem with pagetables we can add it to the oom
> shrinker: Walking all vms and dropping all pagetables for completely empty
> vms would be conceptually really simple. And there shouldn't be anything
> left really when all objects are evicted for a ppgtt vm.

It is trivial to do so right now. If you look at the problem, you will
see that it is a requirement to do things like execlists correctly, rather
than the current hodgepodge. In addition, we get a host of debugging
insight for free.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH 00/24] PPGTT dynamic page allocations
  2014-12-19 13:36               ` Chris Wilson
@ 2014-12-19 19:08                 ` Chris Wilson
  0 siblings, 0 replies; 229+ messages in thread
From: Chris Wilson @ 2014-12-19 19:08 UTC (permalink / raw)
  To: Daniel Vetter, Michel Thierry, intel-gfx

To end this particular thread, Daniel made a good point on IRC that his
intent is to blow away the contents of the page tables if we need to
swap, and then recreate them upon next use.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v2 00/24] PPGTT dynamic page allocations
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (24 preceding siblings ...)
  2014-12-18 21:16 ` [PATCH 00/24] PPGTT dynamic page allocations Daniel Vetter
@ 2014-12-23 17:16 ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
                     ` (24 more replies)
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                   ` (3 subsequent siblings)
  29 siblings, 25 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

Addressing comments from v1.

For GEN8, it has also been extended to work in logical ring submission (lrc)
mode, as it will be the preferred mode of operation.
I also tried to update the lrc code at the same time the ppgtt refactoring
occurred, leaving only one patch that is exclusively for lrc.

This list can be seen in 3 parts:
[01-10] Include code rework for PPGTT (all GENs).
[11-14] Adds page table allocation for GEN6/GEN7
[15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
and execlist submission modes.

Ben Widawsky (23):
  drm/i915: Add some extra guards in evict_vm
  drm/i915/trace: Fix offsets for 64b
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915/gen8: Un-hardcode number of page directories
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip and pd load logic
  drm/i915: Track page table reload need
  drm/i915: Initialize all contexts
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: pagedirs rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from pagedir alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations

Michel Thierry (1):
  drm/i915/bdw: Dynamic page table allocations in lrc mode

 drivers/gpu/drm/i915/i915_debugfs.c        |    7 +-
 drivers/gpu/drm/i915/i915_gem.c            |   11 +
 drivers/gpu/drm/i915/i915_gem_context.c    |   62 +-
 drivers/gpu/drm/i915/i915_gem_evict.c      |    3 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1200 ++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  250 +++++-
 drivers/gpu/drm/i915/i915_trace.h          |  123 ++-
 drivers/gpu/drm/i915/intel_lrc.c           |   80 +-
 9 files changed, 1360 insertions(+), 387 deletions(-)

-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v2 01/24] drm/i915: Add some extra guards in evict_vm
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2015-01-05 13:39     ` Daniel Vetter
  2014-12-23 17:16   ` [PATCH v2 02/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
                     ` (23 subsequent siblings)
  24 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

v2: Use WARN_ONs (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_evict.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 886ff2e..3dc7b37 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -214,6 +214,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
 	struct i915_vma *vma, *next;
 	int ret;
 
+	WARN_ON(!mutex_is_locked(&vm->dev->struct_mutex));
 	trace_i915_gem_evict_vm(vm);
 
 	if (do_idle) {
@@ -222,6 +223,8 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
 			return ret;
 
 		i915_gem_retire_requests(vm->dev);
+
+		WARN_ON(!list_empty(&vm->active_list));
 	}
 
 	list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 02/24] drm/i915/trace: Fix offsets for 64b
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
                     ` (22 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_trace.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6058a01..f004d3d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -115,7 +115,7 @@ TRACE_EVENT(i915_vma_bind,
 	    TP_STRUCT__entry(
 			     __field(struct drm_i915_gem_object *, obj)
 			     __field(struct i915_address_space *, vm)
-			     __field(u32, offset)
+			     __field(u64, offset)
 			     __field(u32, size)
 			     __field(unsigned, flags)
 			     ),
@@ -128,7 +128,7 @@ TRACE_EVENT(i915_vma_bind,
 			   __entry->flags = flags;
 			   ),
 
-	    TP_printk("obj=%p, offset=%08x size=%x%s vm=%p",
+	    TP_printk("obj=%p, offset=%016llx size=%x%s vm=%p",
 		      __entry->obj, __entry->offset, __entry->size,
 		      __entry->flags & PIN_MAPPABLE ? ", mappable" : "",
 		      __entry->vm)
@@ -141,7 +141,7 @@ TRACE_EVENT(i915_vma_unbind,
 	    TP_STRUCT__entry(
 			     __field(struct drm_i915_gem_object *, obj)
 			     __field(struct i915_address_space *, vm)
-			     __field(u32, offset)
+			     __field(u64, offset)
 			     __field(u32, size)
 			     ),
 
@@ -152,7 +152,7 @@ TRACE_EVENT(i915_vma_unbind,
 			   __entry->size = vma->node.size;
 			   ),
 
-	    TP_printk("obj=%p, offset=%08x size=%x vm=%p",
+	    TP_printk("obj=%p, offset=%016llx size=%x vm=%p",
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 02/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 04/24] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
                     ` (21 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
one, but it resembles having one). The #define was confusing as is, and
using "PDPE" is a much better description.

sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]

It also matches the x86 pagetable terminology:
PTE  = Page Table Entry - pagetable level 1 page
PDE  = Page Directory Entry - pagetable level 2 page
PDPE = Page Directory Pointer Entry - pagetable level 3 page

And in the near future (for 48b addressing):
PML4E = Page Map Level 4 Entry

v2: Expanded information about Page Directory/Table nomenclature.

Cc: Daniel Vetter <daniel@ffwll.ch>
CC: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 746f77f..58d54bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -375,7 +375,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
+		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
 		if (pt_vaddr == NULL)
@@ -486,7 +486,7 @@ bail:
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPS];
+	struct page **pt_pages[GEN8_LEGACY_PDPES];
 	int i, ret;
 
 	for (i = 0; i < max_pdp; i++) {
@@ -537,7 +537,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 		return -ENOMEM;
 
 	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e377c7d..9d998ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,7 +88,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
 #define GEN8_PTE_MASK			0x1ff
-#define GEN8_LEGACY_PDPS		4
+#define GEN8_LEGACY_PDPES		4
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
@@ -273,12 +273,12 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
 	};
 	struct page *pd_pages;
 	union {
 		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 04/24] drm/i915: Setup less PPGTT on failed pagedir
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (2 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 05/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
                     ` (20 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The current code will both potentially print a WARN, and setup part of
the PPGTT structure. Neither of these harms the current code; the change is
simply for clarity, and to perhaps prevent later bugs or weird
debug messages.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 58d54bd..b48b586 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1032,11 +1032,14 @@ alloc:
 		goto alloc;
 	}
 
+	if (ret)
+		return ret;
+
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-	return ret;
+	return 0;
 }
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 05/24] drm/i915/gen8: Un-hardcode number of page directories
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (3 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 04/24] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 06/24] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
                     ` (19 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9d998ec..8f76990 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -282,7 +282,7 @@ struct i915_hw_ppgtt {
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[4];
+		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
 
 	struct drm_i915_file_private *file_priv;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 06/24] drm/i915: Range clearing is PPGTT agnostic
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (4 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 05/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 07/24] drm/i915: page table abstractions Michel Thierry
                     ` (18 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Therefore we can do it from our general init function. Eventually, I
hope to have a lot more commonality like this. It won't arrive yet, but
this was a nice easy one.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b48b586..0f6a196 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -672,8 +672,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
 	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
@@ -1146,8 +1144,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
@@ -1181,6 +1177,8 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
 			    ppgtt->base.total);
+		ppgtt->base.clear_range(&ppgtt->base, 0,
+			    ppgtt->base.total, true);
 		i915_init_vm(dev_priv, &ppgtt->base);
 	}
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 07/24] drm/i915: page table abstractions
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (5 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 06/24] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2015-01-05 13:47     ` Daniel Vetter
  2014-12-23 17:16   ` [PATCH v2 08/24] drm/i915: Complete page table structures Michel Thierry
                     ` (17 subsequent siblings)
  24 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we move to dynamic page allocation, keeping pagedir and pagetabs as
separate structures will help to break actions into simpler tasks.

To help transition the code nicely there is some wasted space in gen6/7.
This will be ameliorated shortly.
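
To make the new hierarchy concrete, here is a small self-contained sketch
(not part of this patch; the DEMO_* shift and mask values are illustrative
assumptions, not the driver's definitions) of how a GPU virtual address
would be decomposed into the indices that walk pdp -> pagedir -> pagetab:

#include <stdint.h>
#include <stdio.h>

/* Illustrative constants only: 4KB pages, 512 entries per page table and
 * per page directory, 4 page directory pointers for the legacy 32b layout. */
#define DEMO_PTE_SHIFT   12
#define DEMO_PDE_SHIFT   21
#define DEMO_PDPE_SHIFT  30
#define DEMO_INDEX_MASK  0x1ff
#define DEMO_PDPE_MASK   0x3

int main(void)
{
	uint64_t addr = 0x12345000ULL;	/* some GPU virtual address */
	unsigned int pdpe = (addr >> DEMO_PDPE_SHIFT) & DEMO_PDPE_MASK;
	unsigned int pde  = (addr >> DEMO_PDE_SHIFT)  & DEMO_INDEX_MASK;
	unsigned int pte  = (addr >> DEMO_PTE_SHIFT)  & DEMO_INDEX_MASK;

	/* pdp.pagedir[pdpe] picks a page directory, its page_tables[pde]
	 * picks a page table, and pte indexes into that table's page. */
	printf("pdpe=%u pde=%u pte=%u\n", pdpe, pde, pte);
	return 0;
}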

v2: fixed mismatches after clean-up/rebase.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 177 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
 2 files changed, 107 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0f6a196..6902462 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -334,7 +334,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+		struct page *page_table = pd->page_tables[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -378,8 +379,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+			struct page *page_table = pd->page_tables[pde].page;
+
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -403,29 +408,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_tables == NULL)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_tables[i].page)
+			__free_page(pd->page_tables[i].page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_directories(struct i915_pagedir *pd)
+{
+	kfree(pd->page_tables);
+	__free_page(pd->page);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -460,86 +469,75 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPES];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+			struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagetab *pt;
 
-	return 0;
-}
+		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pdp.pagedir[i].page)
+			goto unwind_out;
+
+		ppgtt->pdp.pagedir[i].page_tables = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.pagedir[i].page_tables);
+		__free_page(ppgtt->pdp.pagedir[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -551,18 +549,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (ret)
-		gen8_ppgtt_free(ppgtt);
+	if (!ret)
+		return ret;
 
+	/* TODO: Check this for all cases */
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -573,7 +572,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       ppgtt->pdp.pagedir[pd].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -593,7 +592,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->pdp.pagedir[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -654,7 +653,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -715,7 +714,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -920,7 +919,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -949,7 +948,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -984,8 +983,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		__free_page(ppgtt->pd.page_tables[i].page);
+	kfree(ppgtt->pd.page_tables);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1042,22 +1041,22 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_pagetab *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL);
+		if (!pt[i].page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
 	}
 
+	ppgtt->pd.page_tables = pt;
 	return 0;
 }
 
@@ -1092,9 +1091,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_tables[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1138,7 +1139,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8f76990..1ff3c05 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -265,6 +265,20 @@ struct i915_gtt {
 			  unsigned long *mappable_end);
 };
 
+struct i915_pagetab {
+	struct page *page;
+};
+
+struct i915_pagedir {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_pagetab *page_tables;
+};
+
+struct i915_pagedirpo {
+	/* struct page *page; */
+	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
+};
+
 struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
@@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
@@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
+	union {
+		struct i915_pagedirpo pdp;
+		struct i915_pagedir pd;
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 08/24] drm/i915: Complete page table structures
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (6 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 07/24] drm/i915: page table abstractions Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 09/24] drm/i915: Create page table allocators Michel Thierry
                     ` (16 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as that patch. I simply felt it was easier to review if split.

v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.pagedir[i].daddr/

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 85 +++++++++++++------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++---
 drivers/gpu/drm/i915/intel_lrc.c    | 16 +++----
 4 files changed, 45 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e515aad..60f91bc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2153,7 +2153,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6902462..a26c18c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -307,7 +307,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pd_dma_addr[i];
+		dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -433,7 +433,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
 		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
 }
 
@@ -445,14 +444,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pd_dma_addr[i])
+		if (!ppgtt->pdp.pagedir[i].daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -469,32 +468,19 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
-
-	return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+			struct i915_pagetab *pt = &pd->page_tables[j];
 
 			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 			if (!pt->page)
 				goto unwind_out;
+
 		}
 	}
 
@@ -555,9 +541,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
-	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (!ret)
-		return ret;
+	return 0;
 
 	/* TODO: Check this for all cases */
 err_out:
@@ -579,7 +563,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pd] = pd_addr;
+	ppgtt->pdp.pagedir[pd].daddr = pd_addr;
 
 	return 0;
 }
@@ -589,17 +573,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct page *p;
+	struct i915_pagedir *pdir = &ppgtt->pdp.pagedir[pd];
+	struct i915_pagetab *ptab = &pdir->page_tables[pt];
+	struct page *p = ptab->page;
 	int ret;
 
-	p = ppgtt->pdp.pagedir[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+	ptab->daddr = pt_addr;
 
 	return 0;
 }
@@ -655,7 +640,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -696,14 +681,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
 	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+		   ppgtt->pd.pd_offset,
+		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -747,13 +733,13 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	uint32_t pd_entry;
 	int i;
 
-	WARN_ON(ppgtt->pd_offset & 0x3f);
+	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pt_dma_addr[i];
+		pt_addr = ppgtt->pd.page_tables[i].daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -764,9 +750,9 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
 
-	return (ppgtt->pd_offset / 64) << 16;
+	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -969,19 +955,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	if (ppgtt->pt_dma_addr) {
-		for (i = 0; i < ppgtt->num_pd_entries; i++)
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd.page_tables[i].daddr,
+			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		__free_page(ppgtt->pd.page_tables[i].page);
 	kfree(ppgtt->pd.page_tables);
@@ -1074,14 +1057,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
-				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr) {
-		drm_mm_remove_node(&ppgtt->node);
-		gen6_ppgtt_free(ppgtt);
-		return -ENOMEM;
-	}
-
 	return 0;
 }
 
@@ -1103,7 +1078,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pt_dma_addr[i] = pt_addr;
+		ppgtt->pd.page_tables[i].daddr = pt_addr;
 	}
 
 	return 0;
@@ -1142,7 +1117,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd_offset =
+	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1151,7 +1126,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
-		  ppgtt->pd_offset << 10);
+		  ppgtt->pd.pd_offset << 10);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1ff3c05..9bc973e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -267,10 +267,16 @@ struct i915_gtt {
 
 struct i915_pagetab {
 	struct page *page;
+	dma_addr_t daddr;
 };
 
 struct i915_pagedir {
 	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
 	struct i915_pagetab *page_tables;
 };
 
@@ -286,14 +292,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
 		struct i915_pagedirpo pdp;
 		struct i915_pagedir pd;
 	};
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 57b1ca0..075cf68 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3].daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3].daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2].daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2].daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1].daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1].daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0].daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0].daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 09/24] drm/i915: Create page table allocators
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (7 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 08/24] drm/i915: Complete page table structures Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 10/24] drm/i915: Track GEN6 page table usage Michel Thierry
                     ` (15 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely, breaking up
all of our actions into individual tasks. This makes the code easier to
write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures within the page directory.
This remains the same for all platforms.

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but the gain in simplicity is non-trivial.
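
To illustrate what breaking the work into individual tasks buys us in the
error paths, here is a rough userspace sketch (the demo_* types and names
are stand-ins, not the driver's) of a page directory holding a fixed array
of page table pointers, with a range allocator that unwinds only its own
work on failure, mirroring alloc_pt_range():

#include <errno.h>
#include <stdlib.h>

#define DEMO_PD_ENTRIES 512	/* stand-in for GEN6_PPGTT_PD_ENTRIES */

struct demo_pagetab { void *page; };
struct demo_pagedir { struct demo_pagetab *page_tables[DEMO_PD_ENTRIES]; };

static struct demo_pagetab *demo_alloc_pt(void)
{
	struct demo_pagetab *pt = calloc(1, sizeof(*pt));

	if (!pt)
		return NULL;
	pt->page = calloc(1, 4096);
	if (!pt->page) {
		free(pt);
		return NULL;
	}
	return pt;
}

static void demo_free_pt(struct demo_pagetab *pt)
{
	if (!pt)
		return;
	free(pt->page);
	free(pt);
}

/* Fill page-directory slots [pde, pde + count); on failure, free only what
 * this call allocated, leaving earlier entries untouched. */
static int demo_alloc_pt_range(struct demo_pagedir *pd,
			       unsigned int pde, unsigned int count)
{
	unsigned int i;

	for (i = pde; i < pde + count; i++) {
		struct demo_pagetab *pt = demo_alloc_pt();

		if (!pt)
			goto unwind_out;
		pd->page_tables[i] = pt;
	}
	return 0;

unwind_out:
	while (i-- > pde) {
		demo_free_pt(pd->page_tables[i]);
		pd->page_tables[i] = NULL;
	}
	return -ENOMEM;
}

int main(void)
{
	static struct demo_pagedir pd;

	return demo_alloc_pt_range(&pd, 0, 4) ? 1 : 0;
}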

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no GEN-specific differences. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make this work, and error handling
should just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions makes the code easier to review and
understand. All allocations and frees now take place in just a couple of
locations. Reviewing and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.

v2: Updated commit message to explain why this patch exists

v3: For lrc, s/pdp.pagedir[i].daddr/pdp.pagedir[i]->daddr/

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 228 +++++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
 3 files changed, 155 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a26c18c..52bdde7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -275,6 +275,102 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static void free_pt_single(struct i915_pagetab *pt)
+{
+	if (WARN_ON(!pt->page))
+		return;
+	__free_page(pt->page);
+	kfree(pt);
+}
+
+static struct i915_pagetab *alloc_pt_single(void)
+{
+	struct i915_pagetab *pt;
+
+	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+	if (!pt)
+		return ERR_PTR(-ENOMEM);
+
+	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pt->page) {
+		kfree(pt);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate multiple page tables
+ * @pd:		The page directory which will have at least @count entries
+ *		available to point to the allocated page tables.
+ * @pde:	First page directory entry for which we are allocating.
+ * @count:	Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+{
+	int i, ret;
+
+	/* 512 is the max page tables per pagedir on any platform.
+	 * TODO: make WARN after patch series is done
+	 */
+	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
+
+	for (i = pde; i < pde + count; i++) {
+		struct i915_pagetab *pt = alloc_pt_single();
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto err_out;
+		}
+		WARN(pd->page_tables[i],
+		     "Leaking page directory entry %d (%pa)\n",
+		     i, pd->page_tables[i]);
+		pd->page_tables[i] = pt;
+	}
+
+	return 0;
+
+err_out:
+	while (i--)
+		free_pt_single(pd->page_tables[i]);
+	return ret;
+}
+
+static void __free_pd_single(struct i915_pagedir *pd)
+{
+	__free_page(pd->page);
+	kfree(pd);
+}
+
+#define free_pd_single(pd) do { \
+	if ((pd)->page) { \
+		__free_pd_single(pd); \
+	} \
+} while (0)
+
+static struct i915_pagedir *alloc_pd_single(void)
+{
+	struct i915_pagedir *pd;
+
+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pd->page) {
+		kfree(pd);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 			   uint64_t val)
@@ -307,7 +403,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
+		dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -334,8 +430,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-		struct page *page_table = pd->page_tables[pde].page;
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+		struct i915_pagetab *pt = pd->page_tables[pde];
+		struct page *page_table = pt->page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -380,8 +477,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-			struct page *page_table = pd->page_tables[pde].page;
+			struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+			struct i915_pagetab *pt = pd->page_tables[pde];
+			struct page *page_table = pt->page;
 
 			pt_vaddr = kmap_atomic(page_table);
 		}
@@ -412,18 +510,13 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
 	int i;
 
-	if (pd->page_tables == NULL)
+	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pd->page_tables[i].page)
-			__free_page(pd->page_tables[i].page);
-}
-
-static void gen8_free_page_directories(struct i915_pagedir *pd)
-{
-	kfree(pd->page_tables);
-	__free_page(pd->page);
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		free_pt_single(pd->page_tables[i]);
+		pd->page_tables[i] = NULL;
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -431,8 +524,8 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
-		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		free_pd_single(ppgtt->pdp.pagedir[i]);
 	}
 }
 
@@ -444,14 +537,16 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pdp.pagedir[i].daddr)
+		if (!ppgtt->pdp.pagedir[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+			struct i915_pagetab *pt =  pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -470,25 +565,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	int i, j;
+	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_pagetab *pt = &pd->page_tables[j];
-
-			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-			if (!pt->page)
-				goto unwind_out;
-
-		}
+		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
+				     0, GEN8_PDES_PER_PAGE);
+		if (ret)
+			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
 
 	return -ENOMEM;
 }
@@ -499,17 +589,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagetab *pt;
-
-		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
-		if (!pt)
+		ppgtt->pdp.pagedir[i] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.pagedir[i]))
 			goto unwind_out;
-
-		ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pdp.pagedir[i].page)
-			goto unwind_out;
-
-		ppgtt->pdp.pagedir[i].page_tables = pt;
 	}
 
 	ppgtt->num_pd_pages = max_pdp;
@@ -518,10 +600,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	while (i--) {
-		kfree(ppgtt->pdp.pagedir[i].page_tables);
-		__free_page(ppgtt->pdp.pagedir[i].page);
-	}
+	while (i--)
+		free_pd_single(ppgtt->pdp.pagedir[i]);
 
 	return -ENOMEM;
 }
@@ -556,14 +636,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.pagedir[pd].page, 0,
+			       ppgtt->pdp.pagedir[pd]->page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.pagedir[pd].daddr = pd_addr;
+	ppgtt->pdp.pagedir[pd]->daddr = pd_addr;
 
 	return 0;
 }
@@ -573,8 +653,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct i915_pagedir *pdir = &ppgtt->pdp.pagedir[pd];
-	struct i915_pagetab *ptab = &pdir->page_tables[pt];
+	struct i915_pagedir *pdir = ppgtt->pdp.pagedir[pd];
+	struct i915_pagetab *ptab = pdir->page_tables[pt];
 	struct page *p = ptab->page;
 	int ret;
 
@@ -637,10 +717,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+			struct i915_pagetab *pt = pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -689,7 +771,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -700,7 +782,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -739,7 +821,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pd.page_tables[i].daddr;
+		pt_addr = ppgtt->pd.page_tables[i]->daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -905,7 +987,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -934,7 +1016,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -957,7 +1039,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i].daddr,
+			       ppgtt->pd.page_tables[i]->daddr,
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
@@ -966,8 +1048,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pd.page_tables[i].page);
-	kfree(ppgtt->pd.page_tables);
+		free_pt_single(ppgtt->pd.page_tables[i]);
+
+	free_pd_single(&ppgtt->pd);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1022,27 +1105,6 @@ alloc:
 	return 0;
 }
 
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct i915_pagetab *pt;
-	int i;
-
-	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
-	if (!pt)
-		return -ENOMEM;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt[i].page = alloc_page(GFP_KERNEL);
-		if (!pt[i].page) {
-			gen6_ppgtt_free(ppgtt);
-			return -ENOMEM;
-		}
-	}
-
-	ppgtt->pd.page_tables = pt;
-	return 0;
-}
-
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 {
 	int ret;
@@ -1051,7 +1113,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1069,7 +1131,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 		struct page *page;
 		dma_addr_t pt_addr;
 
-		page = ppgtt->pd.page_tables[i].page;
+		page = ppgtt->pd.page_tables[i]->page;
 		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
@@ -1078,7 +1140,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pd.page_tables[i].daddr = pt_addr;
+		ppgtt->pd.page_tables[i]->daddr = pt_addr;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9bc973e..c08fe8b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -277,12 +277,12 @@ struct i915_pagedir {
 		dma_addr_t daddr;
 	};
 
-	struct i915_pagetab *page_tables;
+	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_pagedirpo {
 	/* struct page *page; */
-	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
+	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
 };
 
 struct i915_hw_ppgtt {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 075cf68..546884b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3].daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3].daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2].daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2].daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1].daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1].daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0].daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0].daddr);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 10/24] drm/i915: Track GEN6 page table usage
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (8 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 09/24] drm/i915: Create page table allocators Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2015-01-05 14:29     ` Daniel Vetter
  2014-12-23 17:16   ` [PATCH v2 11/24] drm/i915: Extract context switch skip and pd load logic Michel Thierry
                     ` (14 subsequent siblings)
  24 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
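
A minimal sketch of the bookkeeping this introduces (illustrative values
and demo_* names only; 1024 PTEs per GEN6 page table and a plain byte
array standing in for the used_ptes bitmap are assumptions here): compute
the first PTE index for a VA range, how many of its PTEs land in this page
table, then mark them used and warn if any were already set:

#include <stdint.h>
#include <stdio.h>

#define DEMO_PT_ENTRIES	1024	/* stand-in for I915_PPGTT_PT_ENTRIES */
#define DEMO_PAGE_SHIFT	12

static unsigned int demo_pte_index(uint64_t addr)
{
	return (addr >> DEMO_PAGE_SHIFT) & (DEMO_PT_ENTRIES - 1);
}

/* PTEs that [start, start + length) occupies in the page table holding
 * 'start', clamped at the table boundary (the rest belongs to later PDEs). */
static unsigned int demo_pte_count(uint64_t start, uint64_t length)
{
	unsigned int first = demo_pte_index(start);
	uint64_t want = length >> DEMO_PAGE_SHIFT;

	return want > DEMO_PT_ENTRIES - first ? DEMO_PT_ENTRIES - first : want;
}

int main(void)
{
	unsigned char used_ptes[DEMO_PT_ENTRIES] = { 0 };	/* stand-in bitmap */
	uint64_t start = 0x401000, length = 0x3000;
	unsigned int first = demo_pte_index(start);
	unsigned int count = demo_pte_count(start, length);
	unsigned int i;

	for (i = first; i < first + count; i++) {
		if (used_ptes[i])
			fprintf(stderr, "PTE %u already in use\n", i);
		used_ptes[i] = 1;
	}
	printf("marked PTEs %u..%u of this page table\n",
	       first, first + count - 1);
	return 0;
}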

With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity, which makes having separate functions a bit easier to
manage.

One important change introduced here is that DMA mappings are
created/destroyed at the same time that page directories/tables are
allocated/deallocated.
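
As a sketch of that pairing (userspace stand-ins only; demo_map_page() and
demo_unmap_page() are placeholders, not the real DMA API), the allocator
creates the mapping and the matching free tears it down, so a page table
never exists in a half-initialized state:

#include <stdlib.h>

struct demo_pagetab {
	void *page;
	unsigned long daddr;	/* stand-in for the mapping's dma_addr_t */
};

/* Placeholders for pci_map_page()/pci_unmap_page(). */
static unsigned long demo_map_page(void *page)	 { return (unsigned long)page; }
static void demo_unmap_page(unsigned long daddr) { (void)daddr; }

/* Allocation and mapping happen together... */
static struct demo_pagetab *demo_alloc_pt(void)
{
	struct demo_pagetab *pt = calloc(1, sizeof(*pt));

	if (!pt)
		return NULL;
	pt->page = calloc(1, 4096);
	if (!pt->page) {
		free(pt);
		return NULL;
	}
	pt->daddr = demo_map_page(pt->page);
	return pt;
}

/* ...and so do unmapping and freeing. */
static void demo_free_pt(struct demo_pagetab *pt)
{
	if (!pt)
		return;
	demo_unmap_page(pt->daddr);
	free(pt->page);
	free(pt);
}

int main(void)
{
	struct demo_pagetab *pt = demo_alloc_pt();

	demo_free_pt(pt);
	return 0;
}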

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.pagedir/pdp.pagedirs
Make a scratch page allocation helper

v3: Rebase and expand commit message.

v4: Allocate required pagetables only when it is needed, _bind_to_vm
instead of bind_vma (Daniel).

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem.c     |   9 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 277 ++++++++++++++++++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 149 ++++++++++++++-----
 3 files changed, 322 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2b6ecfd..5d52990 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3597,6 +3597,15 @@ search_free:
 	if (ret)
 		goto err_remove_node;
 
+	/*  allocate before insert / bind */
+	if (vma->vm->allocate_va_range) {
+		ret = vma->vm->allocate_va_range(vma->vm,
+						vma->node.start,
+						vma->node.size);
+		if (ret)
+			goto err_remove_node;
+	}
+
 	trace_i915_vma_bind(vma, flags);
 	ret = i915_vma_bind(vma, obj->cache_level,
 			    flags & PIN_GLOBAL ? GLOBAL_BIND : 0);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 52bdde7..313432e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -138,10 +138,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 		return has_aliasing_ppgtt ? 1 : 0;
 }
 
-
 static void ppgtt_bind_vma(struct i915_vma *vma,
-			   enum i915_cache_level cache_level,
-			   u32 flags);
+			  enum i915_cache_level cache_level,
+			  u32 flags);
 static void ppgtt_unbind_vma(struct i915_vma *vma);
 
 static inline gen8_gtt_pte_t gen8_pte_encode(dma_addr_t addr,
@@ -275,27 +274,99 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void free_pt_single(struct i915_pagetab *pt)
-{
+#define i915_dma_unmap_single(px, dev) do { \
+	pci_unmap_page((dev)->pdev, (px)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+} while (0)
+
+/**
+ * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
+ * @px:		Page table/dir/etc to get a DMA map for
+ * @dev:	drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
+ *
+ * Return: 0 if success.
+ */
+#define i915_dma_map_px_single(px, dev) \
+	pci_dma_mapping_error((dev)->pdev, \
+			      (px)->daddr = pci_map_page((dev)->pdev, \
+							 (px)->page, 0, 4096, \
+							 PCI_DMA_BIDIRECTIONAL))
+
+static void __free_pt_single(struct i915_pagetab *pt, struct drm_device *dev,
+			     int scratch)
+{
+	if (WARN(scratch ^ pt->scratch,
+		 "Tried to free scratch = %d. Is scratch = %d\n",
+		 scratch, pt->scratch))
+		return;
+
 	if (WARN_ON(!pt->page))
 		return;
+
+	if (!scratch) {
+		const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+			GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+		WARN(!bitmap_empty(pt->used_ptes, count),
+		     "Free page table with %d used pages\n",
+		     bitmap_weight(pt->used_ptes, count));
+	}
+
+	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
+	kfree(pt->used_ptes);
 	kfree(pt);
 }
 
-static struct i915_pagetab *alloc_pt_single(void)
+#define free_pt_single(pt, dev) \
+	__free_pt_single(pt, dev, false)
+#define free_pt_scratch(pt, dev) \
+	__free_pt_single(pt, dev, true)
+
+static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_pagetab *pt;
+	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+	int ret = -ENOMEM;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
+	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+				GFP_KERNEL);
+
+	if (!pt->used_ptes)
+		goto fail_bitmap;
+
 	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pt->page) {
-		kfree(pt);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pt->page)
+		goto fail_page;
+
+	ret = i915_dma_map_px_single(pt, dev);
+	if (ret)
+		goto fail_dma;
+
+	return pt;
+
+fail_dma:
+	__free_page(pt->page);
+fail_page:
+	kfree(pt->used_ptes);
+fail_bitmap:
+	kfree(pt);
+
+	return ERR_PTR(ret);
+}
+
+static inline struct i915_pagetab *alloc_pt_scratch(struct drm_device *dev)
+{
+	struct i915_pagetab *pt = alloc_pt_single(dev);
+	if (!IS_ERR(pt))
+		pt->scratch = 1;
 
 	return pt;
 }
@@ -313,7 +384,9 @@ static struct i915_pagetab *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
+		  struct drm_device *dev)
+
 {
 	int i, ret;
 
@@ -323,7 +396,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
 	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
 
 	for (i = pde; i < pde + count; i++) {
-		struct i915_pagetab *pt = alloc_pt_single();
+		struct i915_pagetab *pt = alloc_pt_single(dev);
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
 			goto err_out;
@@ -338,7 +411,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
 
 err_out:
 	while (i--)
-		free_pt_single(pd->page_tables[i]);
+		free_pt_single(pd->page_tables[i], dev);
 	return ret;
 }
 
@@ -506,7 +579,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd)
+static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
 {
 	int i;
 
@@ -514,7 +587,7 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		free_pt_single(pd->page_tables[i]);
+		free_pt_single(pd->page_tables[i], dev);
 		pd->page_tables[i] = NULL;
 	}
 }
@@ -524,7 +597,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 		free_pd_single(ppgtt->pdp.pagedir[i]);
 	}
 }
@@ -569,7 +642,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
-				     0, GEN8_PDES_PER_PAGE);
+				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
 	}
@@ -578,7 +651,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -808,26 +881,36 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+/* Write pde (index) from the page directory @pd to the page table @pt */
+static void gen6_write_pdes(struct i915_pagedir *pd,
+			    const int pde, struct i915_pagetab *pt)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	gen6_gtt_pte_t __iomem *pd_addr;
-	uint32_t pd_entry;
-	int i;
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pd, struct i915_hw_ppgtt, pd);
+	u32 pd_entry;
 
-	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		dma_addr_t pt_addr;
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
+	pd_entry |= GEN6_PDE_VALID;
 
-		pt_addr = ppgtt->pd.page_tables[i]->daddr;
-		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-		pd_entry |= GEN6_PDE_VALID;
+	writel(pd_entry, ppgtt->pd_addr + pde);
 
-		writel(pd_entry, pd_addr + i);
-	}
-	readl(pd_addr);
+	/* XXX: Caller needs to make sure the write completes if necessary */
+}
+
+/* Write all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_write_page_range(struct drm_i915_private *dev_priv,
+				struct i915_pagedir *pd, uint32_t start, uint32_t length)
+{
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, pd, start, length, temp, pde)
+		gen6_write_pdes(pd, pde, pt);
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1043,13 +1126,59 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		int j;
+
+		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_set(tmp_bitmap, gen6_pte_index(start),
+			   gen6_pte_count(start, length));
+
+		/* TODO: To be done in the next patch. Map the page/insert
+		 * entries here */
+		for_each_set_bit(j, tmp_bitmap, I915_PPGTT_PT_ENTRIES) {
+			if (test_bit(j, pt->used_ptes)) {
+				/* Check that we're changing cache levels */
+			}
+		}
+
+		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+				I915_PPGTT_PT_ENTRIES);
+	}
+
+	return 0;
+}
+
+static void gen6_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
+			     gen6_pte_count(start, length));
+	}
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		free_pt_single(ppgtt->pd.page_tables[i]);
+		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
+	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 	free_pd_single(&ppgtt->pd);
 }
 
@@ -1076,6 +1205,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+	ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pt))
+		return PTR_ERR(ppgtt->scratch_pt);
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
 						  &ppgtt->node, GEN6_PD_SIZE,
@@ -1089,20 +1221,25 @@ alloc:
 					       0, dev_priv->gtt.base.total,
 					       0);
 		if (ret)
-			return ret;
+			goto err_out;
 
 		retried = true;
 		goto alloc;
 	}
 
 	if (ret)
-		return ret;
+		goto err_out;
+
 
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
+
+err_out:
+	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
+	return ret;
 }
 
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1113,7 +1250,9 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			ppgtt->base.dev);
+
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1122,30 +1261,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct page *page;
-		dma_addr_t pt_addr;
-
-		page = ppgtt->pd.page_tables[i]->page;
-		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-				       PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_unmap_pages(ppgtt);
-			return -EIO;
-		}
-
-		ppgtt->pd.page_tables[i]->daddr = pt_addr;
-	}
-
-	return 0;
-}
-
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
@@ -1166,12 +1281,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_setup_page_tables(ppgtt);
-	if (ret) {
-		gen6_ppgtt_free(ppgtt);
-		return ret;
-	}
-
+	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1182,11 +1293,15 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
+	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
+
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
 
-	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
 		  ppgtt->pd.pd_offset << 10);
 
@@ -1318,6 +1433,9 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
+	if (vma->vm->teardown_va_range)
+		vma->vm->teardown_va_range(vma->vm,
+					   vma->node.start, vma->node.size);
 }
 
 extern int intel_iommu_gfx_mapped;
@@ -1461,13 +1579,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 
 	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
 		/* TODO: Perhaps it shouldn't be gen6 specific */
-		if (i915_is_ggtt(vm)) {
-			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
-			continue;
-		}
 
-		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+		struct i915_hw_ppgtt *ppgtt =
+			container_of(vm, struct i915_hw_ppgtt, base);
+
+		if (i915_is_ggtt(vm))
+			ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 	}
 
 	i915_ggtt_flush(dev_priv);
@@ -1633,8 +1752,8 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 
 
 static void i915_ggtt_bind_vma(struct i915_vma *vma,
-			       enum i915_cache_level cache_level,
-			       u32 unused)
+			      enum i915_cache_level cache_level,
+			      u32 unused)
 {
 	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
@@ -1666,8 +1785,8 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
 }
 
 static void ggtt_bind_vma(struct i915_vma *vma,
-			  enum i915_cache_level cache_level,
-			  u32 flags)
+			 enum i915_cache_level cache_level,
+			 u32 flags)
 {
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c08fe8b..d579f74 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PPGTT_PD_ENTRIES		512
 #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
 #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
+#define GEN6_PDE_SHIFT          22
 #define GEN6_PDE_VALID			(1 << 0)
+#define GEN6_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
+#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
 
 #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
 
@@ -183,8 +186,32 @@ struct i915_vma {
 	void (*unbind_vma)(struct i915_vma *vma);
 	/* Map an object into an address space with the given cache flags. */
 	void (*bind_vma)(struct i915_vma *vma,
-			 enum i915_cache_level cache_level,
-			 u32 flags);
+			enum i915_cache_level cache_level,
+			u32 flags);
+};
+
+
+struct i915_pagetab {
+	struct page *page;
+	dma_addr_t daddr;
+
+	unsigned long *used_ptes;
+	unsigned int scratch:1;
+};
+
+struct i915_pagedir {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
+	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES];
+};
+
+struct i915_pagedirpo {
+	/* struct page *page; */
+	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
 };
 
 struct i915_address_space {
@@ -226,6 +253,12 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid, u32 flags); /* Create a valid PTE */
+	int (*allocate_va_range)(struct i915_address_space *vm,
+				 uint64_t start,
+				 uint64_t length);
+	void (*teardown_va_range)(struct i915_address_space *vm,
+				  uint64_t start,
+				  uint64_t length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -237,6 +270,29 @@ struct i915_address_space {
 	void (*cleanup)(struct i915_address_space *vm);
 };
 
+struct i915_hw_ppgtt {
+	struct i915_address_space base;
+	struct kref ref;
+	struct drm_mm_node node;
+	unsigned num_pd_entries;
+	unsigned num_pd_pages; /* gen8+ */
+	union {
+		struct i915_pagedirpo pdp;
+		struct i915_pagedir pd;
+	};
+
+	struct i915_pagetab *scratch_pt;
+
+	struct drm_i915_file_private *file_priv;
+
+	gen6_gtt_pte_t __iomem *pd_addr;
+
+	int (*enable)(struct i915_hw_ppgtt *ppgtt);
+	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
+			 struct intel_engine_cs *ring);
+	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
+};
+
 /* The Graphics Translation Table is the way in which GEN hardware translates a
  * Graphics Virtual Address into a Physical Address. In addition to the normal
  * collateral associated with any va->pa translations GEN hardware also has a
@@ -265,44 +321,69 @@ struct i915_gtt {
 			  unsigned long *mappable_end);
 };
 
-struct i915_pagetab {
-	struct page *page;
-	dma_addr_t daddr;
-};
+/* For each pde iterates over every pde from start until start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
+	     pt = (pd)->page_tables[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+	     temp = min(temp, (unsigned)length), \
+	     start += temp, length -= temp)
+
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+	const uint32_t mask = NUM_PTE(pde_shift) - 1;
+	return (address >> PAGE_SHIFT) & mask;
+}
 
-struct i915_pagedir {
-	struct page *page; /* NULL for GEN6-GEN7 */
-	union {
-		uint32_t pd_offset;
-		dma_addr_t daddr;
-	};
+/* Helper to count the number of PTEs within the given length. This count does
+* not cross a page table boundary, so the max value would be
+* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
+*/
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+					uint32_t pde_shift)
+{
+	const uint64_t mask = ~((1 << pde_shift) - 1);
+	uint64_t end;
 
-	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
-};
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
 
-struct i915_pagedirpo {
-	/* struct page *page; */
-	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
-};
+	end = addr + length;
 
-struct i915_hw_ppgtt {
-	struct i915_address_space base;
-	struct kref ref;
-	struct drm_mm_node node;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
-	union {
-		struct i915_pagedirpo pdp;
-		struct i915_pagedir pd;
-	};
+	if ((addr & mask) != (end & mask))
+		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
 
-	struct drm_i915_file_private *file_priv;
+	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
 
-	int (*enable)(struct i915_hw_ppgtt *ppgtt);
-	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
-			 struct intel_engine_cs *ring);
-	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
-};
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+	return (addr >> shift) & GEN6_PDE_MASK;
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+	return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+	return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
 
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
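
The pte index/count helpers added above reduce to plain shift-and-mask
arithmetic. Below is a minimal userspace sketch, assuming the GEN6 values
used in this series (PAGE_SHIFT 12, GEN6_PDE_SHIFT 22, i.e. 1024 PTEs per
page table); it simply restates i915_pte_index() and i915_pte_count()
outside the kernel to show how a count is clamped at a page table boundary,
and is not driver code.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PDE_SHIFT	22			/* gen6: 1024 PTEs * 4KB per table */
#define NUM_PTE		(1u << (PDE_SHIFT - PAGE_SHIFT))

static uint32_t pte_index(uint64_t addr)
{
	return (addr >> PAGE_SHIFT) & (NUM_PTE - 1);
}

/* PTEs touched by [addr, addr + length), clamped so the count never
 * crosses into the next page table (mirrors i915_pte_count()). */
static size_t pte_count(uint64_t addr, size_t length)
{
	const uint64_t mask = ~((1ULL << PDE_SHIFT) - 1);
	uint64_t end = addr + length;

	if ((addr & mask) != (end & mask))
		return NUM_PTE - pte_index(addr);
	return pte_index(end) - pte_index(addr);
}

int main(void)
{
	/* 4 pages starting 2 pages below a page table boundary: only the
	 * 2 PTEs left in the current table are counted. */
	printf("%zu\n", pte_count(0x003FE000, 0x4000));	/* prints 2 */
	/* 3 pages wholly inside one table. */
	printf("%zu\n", pte_count(0x00400000, 0x3000));	/* prints 3 */
	return 0;
}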

* [PATCH v2 11/24] drm/i915: Extract context switch skip and pd load logic
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (9 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 10/24] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2015-01-05 14:31     ` Daniel Vetter
  2014-12-23 17:16   ` [PATCH v2 12/24] drm/i915: Track page table reload need Michel Thierry
                     ` (13 subsequent siblings)
  24 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

We have some fanciness coming up. This patch just breaks out the logic
of context switch skip, pd load pre, and pd load post.

v2: Use new functions to replace the logic right away (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 40 +++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b67d269..7b20bd4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -563,6 +563,33 @@ mi_set_context(struct intel_engine_cs *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+				      struct intel_context *from,
+				      struct intel_context *to)
+{
+	if (from == to && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
+static bool
+needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	return ((INTEL_INFO(ring->dev)->gen < 8) ||
+			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
+}
+
+static bool
+needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	return (!to->legacy_hw_ctx.initialized ||
+			i915_gem_context_is_default(to)) &&
+			to->ppgtt && IS_GEN8(ring->dev);
+}
+
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -571,9 +598,6 @@ static int do_switch(struct intel_engine_cs *ring,
 	u32 hw_flags = 0;
 	bool uninitialized = false;
 	struct i915_vma *vma;
-	bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
-			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
-	bool needs_pd_load_post = false;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -581,7 +605,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
 	}
 
-	if (from == to && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
@@ -599,7 +623,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	from = ring->last_context;
 
-	if (needs_pd_load_pre) {
+	if (needs_pd_load_pre(ring, to)) {
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
 		 * Register Immediate commands in Ring Buffer before submitting
@@ -644,16 +668,14 @@ static int do_switch(struct intel_engine_cs *ring,
 	 * XXX: If we implemented page directory eviction code, this
 	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
+	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
-		needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
-	}
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post) {
+	if (needs_pd_load_post(ring, to)) {
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
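
For reference, the needs_pd_load_pre() predicate extracted above boils down
to a small truth table. The sketch below is not driver code; it assumes only
the values the helper actually reads ("gen" for INTEL_INFO(ring->dev)->gen,
"is_rcs" for ring == &dev_priv->ring[RCS], and "has_ppgtt" for to->ppgtt)
and restates the condition so the gen7 vs. gen8 behaviour is visible at a
glance.

#include <stdio.h>
#include <stdbool.h>

/* ((gen < 8) || (ring != RCS)) && to->ppgtt, as in the patch above */
static bool needs_pd_load_pre(int gen, bool is_rcs, bool has_ppgtt)
{
	return (gen < 8 || !is_rcs) && has_ppgtt;
}

int main(void)
{
	int gen;

	for (gen = 7; gen <= 8; gen++) {
		printf("gen%d, RCS,     ppgtt: pre-load=%d\n",
		       gen, needs_pd_load_pre(gen, true, true));
		printf("gen%d, non-RCS, ppgtt: pre-load=%d\n",
		       gen, needs_pd_load_pre(gen, false, true));
	}
	return 0;
}

The output is 1/1/0/1: only the gen8 render ring skips the explicit
pre-load, because it relies on the context image to load its page directory.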

* [PATCH v2 12/24] drm/i915: Track page table reload need
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (10 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 11/24] drm/i915: Extract context switch skip and pd load logic Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2015-01-05 14:36     ` Daniel Vetter
  2014-12-23 17:16   ` [PATCH v2 13/24] drm/i915: Initialize all contexts Michel Thierry
                     ` (12 subsequent siblings)
  24 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch was formerly known as "Force pd restore when PDEs change,
gen6-7". I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa translate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

squash! drm/i915: Force pd restore when PDEs change, gen6-7

It's not just for gen8. If the current context has its mappings changed, we
need a context reload to switch them in.

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

v3: Invalidate PPGTT TLBs inside alloc_va_range and teardown_va_range.
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 27 ++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 12 ++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  2 ++
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 7b20bd4..fa9d4a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -567,8 +567,18 @@ static inline bool should_skip_switch(struct intel_engine_cs *ring,
 				      struct intel_context *from,
 				      struct intel_context *to)
 {
-	if (from == to && !to->remap_slice)
-		return true;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	if (to->remap_slice)
+		return false;
+
+	if (to->ppgtt) {
+		if (from == to && !test_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+			return true;
+	} else {
+		if (from == to && !test_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask))
+			return true;
+	}
 
 	return false;
 }
@@ -585,9 +595,8 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 static bool
 needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
 {
-	return (!to->legacy_hw_ctx.initialized ||
-			i915_gem_context_is_default(to)) &&
-			to->ppgtt && IS_GEN8(ring->dev);
+	return IS_GEN8(ring->dev) &&
+			(to->ppgtt || &to->ppgtt->base.pd_reload_mask);
 }
 
 static int do_switch(struct intel_engine_cs *ring,
@@ -632,6 +641,12 @@ static int do_switch(struct intel_engine_cs *ring,
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		if (ret)
 			goto unpin_out;
+
+		/* Doing a PD load always reloads the page dirs */
+		if (to->ppgtt)
+			clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask);
+		else
+			clear_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask);
 	}
 
 	if (ring != &dev_priv->ring[RCS]) {
@@ -670,6 +685,8 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
+	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8330660..09d864f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1199,6 +1199,13 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	if (ret)
 		goto error;
 
+	if (ctx->ppgtt)
+		WARN(ctx->ppgtt->base.pd_reload_mask & (1<<ring->id),
+			"%s didn't clear reload\n", ring->name);
+	else
+		WARN(dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask &
+			(1<<ring->id), "%s didn't clear reload\n", ring->name);
+
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
@@ -1446,6 +1453,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* XXX: Reserve has possibly change PDEs which means we must do a
+	 * context switch before we can coherently read some of the reserved
+	 * VMAs. */
+
 	/* The objects are in their final locations, apply the relocations. */
 	if (need_relocs)
 		ret = i915_gem_execbuffer_relocate(eb);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 313432e..54c7ca7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1126,6 +1126,15 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+/* PDE TLBs are a pain to invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+#define ppgtt_invalidate_tlbs(vm) do {\
+	/* If current vm != vm, */ \
+	vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask; \
+} while (0)
+
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
@@ -1154,6 +1163,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	ppgtt_invalidate_tlbs(vm);
 	return 0;
 }
 
@@ -1169,6 +1179,8 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
 			     gen6_pte_count(start, length));
 	}
+
+	ppgtt_invalidate_tlbs(vm);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d579f74..dc71cae 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -226,6 +226,8 @@ struct i915_address_space {
 		struct page *page;
 	} scratch;
 
+	unsigned long pd_reload_mask;
+
 	/**
 	 * List of objects currently involved in rendering.
 	 *
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
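
A minimal sketch of the pd_reload_mask lifecycle introduced above, outside
the driver: the ring ids and the three-ring mask are illustrative only, and
the plain bit operations stand in for the kernel's set_bit/clear_bit/test_bit
helpers on ppgtt->base.pd_reload_mask.

#include <stdio.h>

enum { RCS = 0, VCS = 1, BCS = 2, NUM_RINGS = 3 };

int main(void)
{
	unsigned long pd_reload_mask = 0;
	const unsigned long ring_mask = (1UL << NUM_RINGS) - 1;

	/* allocate_va_range()/teardown_va_range(): the PDEs changed, so
	 * every ring must reload its page directory before the next batch. */
	pd_reload_mask = ring_mask;

	/* do_switch() on the render ring: the PD load (switch_mm or
	 * MI_SET_CONTEXT with MI_FORCE_RESTORE) clears that ring's bit. */
	pd_reload_mask &= ~(1UL << RCS);

	/* should_skip_switch(): a switch may only be skipped when nothing
	 * changed for this ring since its last PD load. */
	printf("RCS may skip: %d\n", !(pd_reload_mask & (1UL << RCS)));
	printf("VCS may skip: %d\n", !(pd_reload_mask & (1UL << VCS)));
	return 0;
}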

* [PATCH v2 13/24] drm/i915: Initialize all contexts
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (11 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 12/24] drm/i915: Track page table reload need Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
                     ` (11 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The problem is we're going to switch to a new context, which could be
the default context. The plan was to use restore inhibit, which would be
fine, except if we are using dynamic page tables (which we will). If we
use dynamic page tables and we don't load new page tables, the previous
page tables might go away, and future operations will fault.

CTXA runs.
switch to default, restore inhibit
CTXA dies and has its address space taken away.
Run CTXB; it tries to save using context A's address space - this
fails.

The general solution is to make sure every context has its own state,
and its own address space. For cases when we must restore inhibit, the
first thing we do is load a valid address space.
enough, but apparently there are references within the context itself
which will refer to the old address space - therefore, we also must
reinitialize.

It was tricky to track this down as we don't have much insight into what
happens in a context save.

This is required for the next patch which enables dynamic page tables.

v2: to->ppgtt is only valid in full ppgtt.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fa9d4a1..b1f3d50 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -592,13 +592,6 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
 }
 
-static bool
-needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
-{
-	return IS_GEN8(ring->dev) &&
-			(to->ppgtt || &to->ppgtt->base.pd_reload_mask);
-}
-
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -679,20 +672,24 @@ static int do_switch(struct intel_engine_cs *ring,
 
 	/* GEN8 does *not* require an explicit reload if the PDPs have been
 	 * setup, and we do not wish to move them.
-	 *
-	 * XXX: If we implemented page directory eviction code, this
-	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
+	if (!to->legacy_hw_ctx.initialized) {
 		hw_flags |= MI_RESTORE_INHIBIT;
-	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+		/* NB: If we inhibit the restore, the context is not allowed to
+		 * die because future work may end up depending on valid address
+		 * space. This means we must enforce that a page table load
+		 * occur when this occurs. */
+	} else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
 		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post(ring, to)) {
+	if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT)) {
+		/* We have a valid page directory (scratch) to switch to. This
+		 * allows the old VM to be freed. Note that if anything occurs
+		 * between the set context, and here, we are f*cked */
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -742,7 +739,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
+	uninitialized = !to->legacy_hw_ctx.initialized;
 	to->legacy_hw_ctx.initialized = true;
 
 done:
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (12 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 13/24] drm/i915: Initialize all contexts Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2015-01-05 14:45     ` Daniel Vetter
  2014-12-23 17:16   ` [PATCH v2 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
                     ` (10 subsequent siblings)
  24 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page.

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I do.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.

v2: Declare the page table bitmap inside the function (Chris)
Simplify the way the scratch address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
all implementations get the trace.

v3: Updated trace event to spit out a name

v4: Aliasing ppgtt is now initialized differently (in setup global gtt)

v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check for
trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
 drivers/gpu/drm/i915/i915_gem.c     |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 128 ++++++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_trace.h   | 115 ++++++++++++++++++++++++++++++++
 4 files changed, 236 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 60f91bc..0f63076 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
 		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
 	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5d52990..1649fb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3599,6 +3599,8 @@ search_free:
 
 	/*  allocate before insert / bind */
 	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+				VM_TO_TRACE_NAME(vma->vm));
 		ret = vma->vm->allocate_va_range(vma->vm,
 						vma->node.start,
 						vma->node.size);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 54c7ca7..32a355a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1138,10 +1138,47 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
+	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagetab *pt;
+	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
+	int ret;
+
+	BUG_ON(upper_32_bits(start));
+
+	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+
+	/* The allocation is done in two stages so that we can bail out with
+	 * minimal amount of pain. The first stage finds new page tables that
+	 * need allocation. The second stage marks use ptes within the page
+	 * tables.
+	 */
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		if (pt != ppgtt->scratch_pt) {
+			WARN_ON(bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+			continue;
+		}
+
+		/* We've already allocated a page table */
+		WARN_ON(!bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+
+		pt = alloc_pt_single(dev);
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pd.page_tables[pde] = pt;
+		set_bit(pde, new_page_tables);
+		trace_i915_pagetable_alloc(vm, pde, start, GEN6_PDE_SHIFT);
+	}
+
+	start = start_save;
+	length = length_save;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 		int j;
@@ -1159,12 +1196,35 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 			}
 		}
 
-		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+		if (test_and_clear_bit(pde, new_page_tables))
+			gen6_write_pdes(&ppgtt->pd, pde, pt);
+
+		trace_i915_pagetable_map(vm, pde, pt,
+					 gen6_pte_index(start),
+					 gen6_pte_count(start, length),
+					 I915_PPGTT_PT_ENTRIES);
+		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	WARN_ON(!bitmap_empty(new_page_tables, GEN6_PPGTT_PD_ENTRIES));
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
+
 	ppgtt_invalidate_tlbs(vm);
 	return 0;
+
+unwind_out:
+	for_each_set_bit(pde, new_page_tables, GEN6_PPGTT_PD_ENTRIES) {
+		struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];
+		ppgtt->pd.page_tables[pde] = NULL;
+		free_pt_single(pt, vm->dev);
+	}
+
+	ppgtt_invalidate_tlbs(vm);
+	return ret;
 }
 
 static void gen6_teardown_va_range(struct i915_address_space *vm,
@@ -1176,8 +1236,27 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 	uint32_t pde, temp;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+
+		if (WARN(pt == ppgtt->scratch_pt,
+		    "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
+		    vm, pde, start, start + length))
+			continue;
+
+		trace_i915_pagetable_unmap(vm, pde, pt,
+					   gen6_pte_index(start),
+					   gen6_pte_count(start, length),
+					   I915_PPGTT_PT_ENTRIES);
+
 		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
 			     gen6_pte_count(start, length));
+
+		if (bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES)) {
+			trace_i915_pagetable_destroy(vm, pde,
+						     start & GENMASK_ULL(63, GEN6_PDE_SHIFT),
+						     GEN6_PDE_SHIFT);
+			gen6_write_pdes(&ppgtt->pd, pde, ppgtt->scratch_pt);
+			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+		}
 	}
 
 	ppgtt_invalidate_tlbs(vm);
@@ -1187,9 +1266,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+		if (pt != ppgtt->scratch_pt)
+			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	}
 
+	/* Consider putting this as part of pd free. */
 	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 	free_pd_single(&ppgtt->pd);
 }
@@ -1254,7 +1337,7 @@ err_out:
 	return ret;
 }
 
-static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
 {
 	int ret;
 
@@ -1262,10 +1345,14 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	if (!preallocate_pt)
+		return 0;
+
 	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
 			ppgtt->base.dev);
 
 	if (ret) {
+		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
 	}
@@ -1273,7 +1360,17 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+				  uint64_t start, uint64_t length)
+{
+	struct i915_pagetab *unused;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
+		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+}
+
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1289,7 +1386,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	} else
 		BUG();
 
-	ret = gen6_ppgtt_alloc(ppgtt);
+	ret = gen6_ppgtt_alloc(ppgtt, aliasing);
 	if (ret)
 		return ret;
 
@@ -1308,6 +1405,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
+	if (!aliasing)
+		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
+
 	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1320,7 +1420,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
+		bool aliasing)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
@@ -1328,7 +1429,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
 
 	if (INTEL_INFO(dev)->gen < 8)
-		return gen6_ppgtt_init(ppgtt);
+		return gen6_ppgtt_init(ppgtt, aliasing);
 	else
 		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 }
@@ -1337,7 +1438,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = 0;
 
-	ret = __hw_ppgtt_init(dev, ppgtt);
+	ret = __hw_ppgtt_init(dev, ppgtt, false);
 	if (ret == 0) {
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
@@ -1445,9 +1546,14 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
-	if (vma->vm->teardown_va_range)
+	if (vma->vm->teardown_va_range) {
+		trace_i915_va_teardown(vma->vm,
+				       vma->node.start, vma->node.size,
+				       VM_TO_TRACE_NAME(vma->vm));
+
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
+	}
 }
 
 extern int intel_iommu_gfx_mapped;
@@ -1963,7 +2069,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 		if (!ppgtt)
 			return -ENOMEM;
 
-		ret = __hw_ppgtt_init(dev, ppgtt);
+		ret = __hw_ppgtt_init(dev, ppgtt, true);
 		if (ret != 0)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f004d3d..0b617c9 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,6 +156,121 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
+#define VM_TO_TRACE_NAME(vm) \
+	(i915_is_ggtt(vm) ? "GGTT" : \
+				      "Private VM")
+
+DECLARE_EVENT_CLASS(i915_va,
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	TP_ARGS(vm, start, length, name),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u64, start)
+		__field(u64, end)
+		__string(name, name)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->start = start;
+		__entry->end = start + length;
+		__assign_str(name, name);
+	),
+
+	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
+		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DEFINE_EVENT(i915_va, i915_va_teardown,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DECLARE_EVENT_CLASS(i915_pagetable,
+	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	TP_ARGS(vm, pde, start, pde_shift),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->start = start;
+		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
+	),
+
+	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+		  __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+	((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_pagetable_update,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u32, first)
+		__field(u32, last)
+		__dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->first = first;
+		__entry->last = first + len;
+
+		bitmap_scnprintf(__get_str(cur_ptes),
+				 TRACE_PT_SIZE(bits),
+				 pt->used_ptes,
+				 bits);
+	),
+
+	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+		  __entry->vm, __entry->pde, __entry->last, __entry->first,
+		  __get_str(cur_ptes))
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_map,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_unmap,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
 TRACE_EVENT(i915_gem_object_change_domain,
 	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
 	    TP_ARGS(obj, old_read, old_write),
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
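
The two-stage structure of gen6_alloc_va_range() above is easier to see in
isolation. The toy program below is only a sketch of the pattern, with made
up sizes and a calloc'd table standing in for alloc_pt_single(): stage one
allocates tables for directory entries that still point at the scratch
table and remembers which ones are new, stage two marks the PTEs in use,
and a failure unwinds only the freshly allocated tables.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_PDES 8		/* tiny directory for the example */
#define NUM_PTES 16		/* ptes per table */

struct toy_pt {
	unsigned char used[NUM_PTES];
};

static struct toy_pt scratch;			/* shared scratch table */
static struct toy_pt *pd[NUM_PDES];		/* toy page directory */

static int alloc_range(int first_pde, int n_pdes)
{
	unsigned long new_tables = 0;
	int pde;

	/* Stage 1: allocate tables for PDEs that still point at scratch. */
	for (pde = first_pde; pde < first_pde + n_pdes; pde++) {
		if (pd[pde] != &scratch)
			continue;		/* already backed by a real table */
		pd[pde] = calloc(1, sizeof(struct toy_pt));
		if (!pd[pde]) {
			pd[pde] = &scratch;
			goto unwind;
		}
		new_tables |= 1UL << pde;
	}

	/* Stage 2: only now mark the PTEs used and make the range visible. */
	for (pde = first_pde; pde < first_pde + n_pdes; pde++)
		memset(pd[pde]->used, 1, NUM_PTES);
	return 0;

unwind:
	/* Free only what this call allocated; pre-existing tables survive. */
	for (pde = first_pde; pde < first_pde + n_pdes; pde++) {
		if (new_tables & (1UL << pde)) {
			free(pd[pde]);
			pd[pde] = &scratch;
		}
	}
	return -1;
}

int main(void)
{
	int i;

	for (i = 0; i < NUM_PDES; i++)
		pd[i] = &scratch;
	printf("alloc_range: %d\n", alloc_range(2, 3));
	printf("pde 2 backed by a real table: %d\n", pd[2] != &scratch);
	return 0;
}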

* [PATCH v2 15/24] drm/i915/bdw: Use dynamic allocation idioms on free
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (13 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 16/24] drm/i915/bdw: pagedirs rework allocation Michel Thierry
                     ` (9 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.

v2: Match trace_i915_va_teardown params
v3: Multiple rebases.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 +++++++++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 46 +++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 32a355a..971c05b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -579,27 +579,32 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
 {
-	int i;
-
-	if (!pd->page)
-		return;
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		free_pt_single(pd->page_tables[i], dev);
-		pd->page_tables[i] = NULL;
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagedir *pd;
+	struct i915_pagetab *pt;
+	uint64_t temp;
+	uint32_t pdpe, pde;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			free_pt_single(pt, vm->dev);
+		}
+		free_pd_single(pd);
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+/* This function will die soon */
+static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
 {
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
-		free_pd_single(ppgtt->pdp.pagedir[i]);
-	}
+	gen8_teardown_va_range(&ppgtt->base,
+			       i << GEN8_PDPE_SHIFT,
+			       (1 << GEN8_PDPE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -614,19 +619,28 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			continue;
 
 		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
+				PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-			struct i915_pagetab *pt =  pd->page_tables[j];
+			struct i915_pagetab *pt = pd->page_tables[j];
 			dma_addr_t addr = pt->daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
-					       PCI_DMA_BIDIRECTIONAL);
+						PCI_DMA_BIDIRECTIONAL);
 		}
 	}
 }
 
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	trace_i915_va_teardown(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total,
+			       VM_TO_TRACE_NAME(&ppgtt->base));
+	gen8_teardown_va_range(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total);
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
@@ -651,7 +665,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
+		gen8_free_full_pagedir(ppgtt, i);
 
 	return -ENOMEM;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index dc71cae..96209c2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -387,6 +387,52 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	return i915_pde_index(addr, GEN6_PDE_SHIFT);
 }
 
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter)		\
+	for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN8_PDES_PER_PAGE;			\
+	     pt = (pd)->page_tables[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->pagedir[iter];	\
+	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+	     pd = (pdp)->pagedir[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+/* Clamp length to the next pagedir boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+	if (next_pd > (start + length))
+		return length;
+
+	return next_pd - start;
+}
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+	return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+	return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+	return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+	BUG(); /* For 64B */
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
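
The gen8 index helpers added above are the same shift-and-mask game as on
gen6, just with three levels. The sketch below assumes the 512-entries-per-
level layout the series describes (4KB pages, so a PDE covers 2MB and a
PDPE covers 1GB); the 21/30 shifts and the 0x1FF/0x3 masks are stated here
for illustration rather than taken from the driver headers.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PDE_SHIFT	21	/* 512 PTEs * 4KB = 2MB per page table      */
#define PDPE_SHIFT	30	/* 512 PDEs * 2MB = 1GB per page directory  */

#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((uint64_t)(a) - 1))

int main(void)
{
	uint64_t addr = 0x84803000ULL;
	uint64_t start, length, next_pd, clamped;

	/* gen8_pdpe_index / gen8_pde_index / gen8_pte_index equivalents */
	printf("pdpe=%llu pde=%llu pte=%llu\n",
	       (unsigned long long)((addr >> PDPE_SHIFT) & 0x3),
	       (unsigned long long)((addr >> PDE_SHIFT) & 0x1FF),
	       (unsigned long long)((addr >> PAGE_SHIFT) & 0x1FF));

	/* gen8_clamp_pd(): a range starting just below a 1GB boundary is
	 * clamped so the caller never walks past its own page directory. */
	start = 0x3FF00000ULL;
	length = 0x00300000ULL;
	next_pd = ALIGN_UP(start + 1, 1ULL << PDPE_SHIFT);
	clamped = next_pd > start + length ? length : next_pd - start;
	printf("clamped length = 0x%llx\n", (unsigned long long)clamped);
	return 0;
}

For the address above this prints pdpe=2 pde=36 pte=3, and the 3MB range is
clamped to the 1MB left in its page directory.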

* [PATCH v2 16/24] drm/i915/bdw: pagedirs rework allocation
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (14 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
                     ` (8 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using gen8_for_each_pdpe macro to allocate the page directories.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 971c05b..e759a03 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -594,8 +594,10 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 		uint64_t pd_start = start;
 		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
 			free_pt_single(pt, vm->dev);
+			pd->page_tables[pde] = NULL;
 		}
 		free_pd_single(pd);
+		ppgtt->pdp.pagedir[pdpe] = NULL;
 	}
 }
 
@@ -670,25 +672,39 @@ unwind_out:
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
+static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+				     uint64_t start,
+				     uint64_t length)
 {
-	int i;
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pdp, struct i915_hw_ppgtt, pdp);
+	struct i915_pagedir *unused;
+	uint64_t temp;
+	uint32_t pdpe;
 
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.pagedir[i] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.pagedir[i]))
+	/* FIXME: PPGTT container_of won't work for 64b */
+	BUG_ON((start + length) > 0x800000000ULL);
+
+	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+		BUG_ON(unused);
+		pdp->pagedir[pdpe] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.pagedir[pdpe]))
 			goto unwind_out;
+
+		ppgtt->num_pd_pages++;
 	}
 
-	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		free_pd_single(ppgtt->pdp.pagedir[i]);
+	while (pdpe--) {
+		free_pd_single(ppgtt->pdp.pagedir[pdpe]);
+		ppgtt->num_pd_pages--;
+	}
+
+	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -698,7 +714,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 {
 	int ret;
 
-	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
+					ppgtt->base.total);
 	if (ret)
 		return ret;
 
@@ -775,6 +792,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
+	ppgtt->base.start = 0;
+	ppgtt->base.total = size;
+	BUG_ON(ppgtt->base.total == 0);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
 	if (ret)
@@ -822,8 +843,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
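
To make the gen8_for_each_pdpe() walk used above more concrete, here is a
standalone sketch of the same range split, again assuming a 30-bit PDPE
shift for illustration: for a given start/length the loop visits each page
directory the range touches and hands it only the slice that falls inside
that directory, which is what the per-iteration "temp" bookkeeping in the
macro computes.

#include <stdio.h>
#include <stdint.h>

#define PDPE_SHIFT	30
#define PD_SPAN		(1ULL << PDPE_SHIFT)

int main(void)
{
	uint64_t start = 0x3FE00000ULL;		/* 2MB below the 1GB boundary */
	uint64_t length = 0x00400000ULL;	/* 4MB, so two directories */

	while (length > 0) {
		unsigned int pdpe = (start >> PDPE_SHIFT) & 0x3;
		uint64_t temp = ((start + 1 + PD_SPAN - 1) & ~(PD_SPAN - 1)) - start;

		if (temp > length)
			temp = length;
		printf("pdpe %u: start 0x%llx, length 0x%llx\n", pdpe,
		       (unsigned long long)start, (unsigned long long)temp);
		start += temp;
		length -= temp;
	}
	return 0;
}

This prints one line for pdpe 0 (the 2MB left in its directory) and one for
pdpe 1 (the remaining 2MB), matching how gen8_ppgtt_alloc() dispatches the
per-directory page table allocation.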

* [PATCH v2 17/24] drm/i915/bdw: pagetable allocation rework
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (15 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 16/24] drm/i915/bdw: pagedirs rework allocation Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
                     ` (7 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using gen8_for_each_pde macro to allocate page tables.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 10 +++++++
 2 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e759a03..f928c10 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -601,14 +601,6 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	}
 }
 
-/* This function will die soon */
-static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
-{
-	gen8_teardown_va_range(&ppgtt->base,
-			       i << GEN8_PDPE_SHIFT,
-			       (1 << GEN8_PDPE_SHIFT));
-}
-
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
@@ -652,22 +644,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
 {
-	int i, ret;
+	struct i915_pagetab *unused;
+	uint64_t temp;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
-				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
-		if (ret)
+	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+		BUG_ON(unused);
+		pd->page_tables[pde] = alloc_pt_single(dev);
+		if (IS_ERR(pd->page_tables[pde]))
 			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		gen8_free_full_pagedir(ppgtt, i);
+	while (pde--)
+		free_pt_single(pd->page_tables[pde], dev);
 
 	return -ENOMEM;
 }
@@ -710,20 +707,28 @@ unwind_out:
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    const int max_pdp)
+			    uint64_t start,
+			    uint64_t length)
 {
+	struct i915_pagedir *pd;
+	uint64_t temp;
+	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
-					ppgtt->base.total);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
-	if (ret)
-		goto err_out;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+						ppgtt->base.dev);
+		if (ret)
+			goto err_out;
+
+		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
+	}
 
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	BUG_ON(pdpe > ppgtt->num_pd_pages);
 
 	return 0;
 
@@ -794,10 +799,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	BUG_ON(ppgtt->base.total == 0);
 
 	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 96209c2..74837a3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -403,6 +403,16 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+/* Clamp length to the next pagetab boundary */
+static inline uint64_t gen8_clamp_pt(uint64_t start, uint64_t length)
+{
+	uint64_t next_pt = ALIGN(start + 1, 1 << GEN8_PDE_SHIFT);
+	if (next_pt > (start + length))
+		return length;
+
+	return next_pt - start;
+}
+
 /* Clamp length to the next pagedir boundary */
 static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (16 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
                     ` (6 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

One important part of this patch is that we now write a scratch page
directory into any unused PDP descriptors. This matters for two reasons:
first, we're not allowed to just use 0 or an invalid pointer, and second,
we must wipe out any previous contents from the last context.

The latter point only matters with full PPGTT. The former point only
affects platforms with less than 4GB of memory.

v2: Updated commit message to point that we must set unused PDPs to the
scratch page.
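
As a rough illustration (not part of the patch; the ex_* names are
assumptions), the rule being enforced can be modelled in plain C: every one
of the four PDP slots is written on a switch, and a slot without a real
page directory falls back to the scratch directory's DMA address, mirroring
gen8_mm_switch()/gen8_write_pdp() in the diff below.

        #include <stdint.h>
        #include <stdio.h>

        #define EX_NUM_PDPES 4

        struct ex_pd { uint64_t daddr; };

        /* Stand-in for gen8_write_pdp(): just report the register write. */
        static void ex_write_pdp(int entry, uint64_t addr)
        {
                printf("PDP[%d] <- 0x%llx\n", entry, (unsigned long long)addr);
        }

        static void ex_mm_switch(struct ex_pd *pds[EX_NUM_PDPES],
                                 const struct ex_pd *scratch_pd)
        {
                int i;

                /* Highest entry first, as in the patch; every slot is written
                 * so stale pointers from a previous context cannot leak. */
                for (i = EX_NUM_PDPES - 1; i >= 0; i--) {
                        uint64_t daddr = pds[i] ? pds[i]->daddr
                                                : scratch_pd->daddr;
                        ex_write_pdp(i, daddr);
                }
        }

        int main(void)
        {
                struct ex_pd pd0 = { 0x100000 }, scratch = { 0xdead000 };
                struct ex_pd *pds[EX_NUM_PDPES] = { &pd0, NULL, NULL, NULL };

                ex_mm_switch(pds, &scratch);
                return 0;
        }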

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 ++++-
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f928c10..bd6cb2f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -445,8 +445,9 @@ static struct i915_pagedir *alloc_pd_single(void)
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
-			   uint64_t val)
+static int gen8_write_pdp(struct intel_engine_cs *ring,
+			  unsigned entry,
+			  dma_addr_t addr)
 {
 	int ret;
 
@@ -458,10 +459,10 @@ static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val >> 32));
+	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val));
+	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
 	return 0;
@@ -472,12 +473,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
-
-	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
-		ret = gen8_write_pdp(ring, i, addr);
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
+		/* The page directory might be NULL, but we need to clear out
+		 * whatever the previous context might have used. */
+		ret = gen8_write_pdp(ring, i, pd_daddr);
 		if (ret)
 			return ret;
 	}
@@ -800,10 +801,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
 
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
-	if (ret)
+	if (ret) {
+		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
+	}
 
 	/*
 	 * 2. Create DMA mappings for the page directories and page tables.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 74837a3..0cf4f6d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -283,7 +283,10 @@ struct i915_hw_ppgtt {
 		struct i915_pagedir pd;
 	};
 
-	struct i915_pagetab *scratch_pt;
+	union {
+		struct i915_pagetab *scratch_pt;
+		struct i915_pagetab *scratch_pd; /* Just need the daddr */
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (17 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 20/24] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
                     ` (5 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

These values are no longer useful once the page tables are allocated
dynamically. Getting rid of them will help prevent later confusion.
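
A tiny, hypothetical C sketch (not from the patch) of the replacement idea:
instead of caching num_pd_entries, the number of page-directory entries
covering an address space is derived from its total size, much as the new
gen6_for_all_pdes macro does with gen6_pde_index(ppgtt->base.total). The
shift value here is an assumption for the example.

        #include <stdint.h>
        #include <stdio.h>

        #define EX_GEN6_PDE_SHIFT 22 /* assumed: 1024 PTEs x 4K pages per PDE */

        static uint32_t ex_pde_index(uint64_t addr)
        {
                return (uint32_t)(addr >> EX_GEN6_PDE_SHIFT);
        }

        int main(void)
        {
                uint64_t total = 512ULL << 20; /* a 512MB example VM */

                /* The "for all PDEs" loop bound comes straight from the VM
                 * size, with no cached counter to keep in sync. */
                printf("address space of 0x%llx bytes spans %u PDEs\n",
                       (unsigned long long)total, ex_pde_index(total));
                return 0;
        }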

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 --
 drivers/gpu/drm/i915/i915_gem_gtt.c | 68 ++++++++++++-------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 ++--
 3 files changed, 27 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0f63076..b00760b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2117,8 +2117,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	if (!ppgtt)
 		return;
 
-	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
 	for_each_ring(ring, dev_priv, unused) {
 		seq_printf(m, "%s\n", ring->name);
 		for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd6cb2f..c40db0e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -607,7 +607,7 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
 		if (!ppgtt->pdp.pagedir[i]->daddr)
@@ -688,21 +688,13 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 		pdp->pagedir[pdpe] = alloc_pd_single();
 		if (IS_ERR(ppgtt->pdp.pagedir[pdpe]))
 			goto unwind_out;
-
-		ppgtt->num_pd_pages++;
 	}
 
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
 	return 0;
 
 unwind_out:
-	while (pdpe--) {
+	while (pdpe--)
 		free_pd_single(ppgtt->pdp.pagedir[pdpe]);
-		ppgtt->num_pd_pages--;
-	}
-
-	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -725,12 +717,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 						ppgtt->base.dev);
 		if (ret)
 			goto err_out;
-
-		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
 	}
 
-	BUG_ON(pdpe > ppgtt->num_pd_pages);
-
 	return 0;
 
 	/* TODO: Check this for all cases */
@@ -792,7 +780,6 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -855,11 +842,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 
-	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
-			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-			 ppgtt->num_pd_entries,
-			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
 
 bail:
@@ -870,26 +852,20 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
-	gen6_gtt_pte_t __iomem *pd_addr;
+	struct i915_pagetab *unused;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
-	int pte, pde;
+	uint32_t  pte, pde, temp;
+	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
-	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd.pd_offset,
-		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
-	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
-		pd_entry = readl(pd_addr + pde);
+		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
 		if (pd_entry != expected)
@@ -1162,12 +1138,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 
 static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_pagetab *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i]->daddr,
-			       4096, PCI_DMA_BIDIRECTIONAL);
+	gen6_for_all_pdes(pt, ppgtt, pde) {
+		if (pt != ppgtt->scratch_pt) /* MT check if needed this if */
+			pci_unmap_page(ppgtt->base.dev->pdev,
+				pt->daddr,
+				4096, PCI_DMA_BIDIRECTIONAL);
+	}
 }
 
 /* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
@@ -1308,12 +1287,12 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_pagetab *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+			free_pt_single(pt, ppgtt->base.dev);
 	}
 
 	/* Consider putting this as part of pd free. */
@@ -1373,7 +1352,6 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
 
 err_out:
@@ -1392,9 +1370,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
 	if (!preallocate_pt)
 		return 0;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			ppgtt->base.dev);
-
+	ret = alloc_pt_range(&ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES, ppgtt->base.dev);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 		drm_mm_remove_node(&ppgtt->node);
@@ -1440,7 +1416,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd.pd_offset =
@@ -1748,7 +1724,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		if (i915_is_ggtt(vm))
 			ppgtt = dev_priv->mm.aliasing_ppgtt;
 
-		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES);
 	}
 
 	i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 0cf4f6d..4c50d87 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -276,8 +276,6 @@ struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
 	struct drm_mm_node node;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct i915_pagedirpo pdp;
 		struct i915_pagedir pd;
@@ -343,6 +341,11 @@ struct i915_gtt {
 	     temp = min(temp, (unsigned)length), \
 	     start += temp, length -= temp)
 
+#define gen6_for_all_pdes(pt, ppgtt, iter)  \
+	for (iter = 0, pt = ppgtt->pd.page_tables[iter];			\
+	     iter < gen6_pde_index(ppgtt->base.total);			\
+	     pt =  ppgtt->pd.page_tables[++iter])
+
 static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
 {
 	const uint32_t mask = NUM_PTE(pde_shift) - 1;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 20/24] drm/i915: Extract PPGTT param from pagedir alloc
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (18 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 21/24] drm/i915/bdw: Split out mappings Michel Thierry
                     ` (4 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Now that we don't need to trace num_pd_pages, we may as well kill all
need for the PPGTT structure in alloc_pagedirs. This will be very useful
when we move to 48b addressing, where the PDP is no longer the root of
the page table structure.

The param is replaced with drm_device, which is an unavoidable wart
throughout the series (in other words, not extra flagrant).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c40db0e..6d67660 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -674,8 +674,6 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(pdp, struct i915_hw_ppgtt, pdp);
 	struct i915_pagedir *unused;
 	uint64_t temp;
 	uint32_t pdpe;
@@ -686,7 +684,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->pagedir[pdpe] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.pagedir[pdpe]))
+		if (IS_ERR(pdp->pagedir[pdpe]))
 			goto unwind_out;
 	}
 
@@ -694,7 +692,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 
 unwind_out:
 	while (pdpe--)
-		free_pd_single(ppgtt->pdp.pagedir[pdpe]);
+		free_pd_single(pdp->pagedir[pdpe]);
 
 	return -ENOMEM;
 }
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 21/24] drm/i915/bdw: Split out mappings
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (19 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 20/24] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
                     ` (3 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
In particular, DMA mappings for page directories/tables occur at allocation
time.

This patch adds the functionality and calls it at init, so there should
be no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
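
A loose, self-contained C sketch (not the driver code; the ex_* names and
512-entry directory size are assumptions) of the shape of
gen8_map_pagetable_range() introduced below: take the page-directory page
once, then write one encoded entry per page table in the range, instead of
mapping and unmapping per entry.

        #include <stdint.h>
        #include <stdio.h>

        #define EX_PDES_PER_PAGE 512

        struct ex_pagetab { uint64_t daddr; };

        /* Stand-in for gen8_pde_encode(): address plus a valid bit. */
        static uint64_t ex_pde_encode(uint64_t daddr)
        {
                return daddr | 1;
        }

        static void ex_map_pagetable_range(uint64_t *pagedir,
                                           struct ex_pagetab *pts[EX_PDES_PER_PAGE],
                                           unsigned first_pde, unsigned count)
        {
                unsigned pde;

                /* The real code kmaps pd->page once here; the sketch just
                 * writes into a plain array standing in for that page. */
                for (pde = first_pde; pde < first_pde + count; pde++)
                        pagedir[pde] = ex_pde_encode(pts[pde]->daddr);
        }

        int main(void)
        {
                static uint64_t pagedir[EX_PDES_PER_PAGE];
                static struct ex_pagetab pt0 = { 0x1000 }, pt1 = { 0x2000 };
                struct ex_pagetab *pts[EX_PDES_PER_PAGE] = { &pt0, &pt1 };

                ex_map_pagetable_range(pagedir, pts, 0, 2);
                printf("pde[0]=0x%llx pde[1]=0x%llx\n",
                       (unsigned long long)pagedir[0],
                       (unsigned long long)pagedir[1]);
                return 0;
        }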

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 186 ++++++++++++++----------------------
 1 file changed, 72 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6d67660..ff3aac5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -415,21 +415,23 @@ err_out:
 	return ret;
 }
 
-static void __free_pd_single(struct i915_pagedir *pd)
+static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+	i915_dma_unmap_single(pd, dev);
 	__free_page(pd->page);
 	kfree(pd);
 }
 
-#define free_pd_single(pd) do { \
+#define free_pd_single(pd,  dev) do { \
 	if ((pd)->page) { \
-		__free_pd_single(pd); \
+		__free_pd_single(pd, dev); \
 	} \
 } while (0)
 
-static struct i915_pagedir *alloc_pd_single(void)
+static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_pagedir *pd;
+	int ret;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
@@ -441,6 +443,13 @@ static struct i915_pagedir *alloc_pd_single(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = i915_dma_map_px_single(pd, dev);
+	if (ret) {
+		__free_page(pd->page);
+		kfree(pd);
+		return ERR_PTR(ret);
+	}
+
 	return pd;
 }
 
@@ -580,6 +589,36 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+			     struct i915_pagetab *pt,
+			     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t entry =
+		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+	*pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_pagedir *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+	struct i915_pagetab *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		__gen8_do_map_pt(pagedir + pde, pt, dev);
+
+	if (!HAS_LLC(dev))
+		drm_clflush_virt_range(pagedir, PAGE_SIZE);
+
+	kunmap_atomic(pagedir);
+}
+
 static void gen8_teardown_va_range(struct i915_address_space *vm,
 				   uint64_t start, uint64_t length)
 {
@@ -597,7 +636,7 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			free_pt_single(pt, vm->dev);
 			pd->page_tables[pde] = NULL;
 		}
-		free_pd_single(pd);
+		free_pd_single(pd, vm->dev);
 		ppgtt->pdp.pagedir[pdpe] = NULL;
 	}
 }
@@ -629,9 +668,6 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	trace_i915_va_teardown(&ppgtt->base,
-			       ppgtt->base.start, ppgtt->base.total,
-			       VM_TO_TRACE_NAME(&ppgtt->base));
 	gen8_teardown_va_range(&ppgtt->base,
 			       ppgtt->base.start, ppgtt->base.total);
 }
@@ -672,7 +708,8 @@ unwind_out:
 
 static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 				     uint64_t start,
-				     uint64_t length)
+				     uint64_t length,
+				     struct drm_device *dev)
 {
 	struct i915_pagedir *unused;
 	uint64_t temp;
@@ -683,7 +720,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
-		pdp->pagedir[pdpe] = alloc_pd_single();
+		pdp->pagedir[pdpe] = alloc_pd_single(dev);
 		if (IS_ERR(pdp->pagedir[pdpe]))
 			goto unwind_out;
 	}
@@ -692,21 +729,25 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 
 unwind_out:
 	while (pdpe--)
-		free_pd_single(pdp->pagedir[pdpe]);
+		free_pd_single(pdp->pagedir[pdpe], dev);
 
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    uint64_t start,
-			    uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start,
+			       uint64_t length)
 {
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagedir *pd;
+	const uint64_t orig_start = start;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
+	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
+					ppgtt->base.dev);
 	if (ret)
 		return ret;
 
@@ -719,133 +760,50 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	return 0;
 
-	/* TODO: Check this for all cases */
 err_out:
-	gen8_ppgtt_free(ppgtt);
+	gen8_teardown_va_range(vm, orig_start, start);
 	return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pd)
-{
-	dma_addr_t pd_addr;
-	int ret;
-
-	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.pagedir[pd]->page, 0,
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
-	if (ret)
-		return ret;
-
-	ppgtt->pdp.pagedir[pd]->daddr = pd_addr;
-
-	return 0;
-}
-
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pd,
-					const int pt)
-{
-	dma_addr_t pt_addr;
-	struct i915_pagedir *pdir = ppgtt->pdp.pagedir[pd];
-	struct i915_pagetab *ptab = pdir->page_tables[pt];
-	struct page *p = ptab->page;
-	int ret;
-
-	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	ptab->daddr = pt_addr;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
  * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
  * space.
  *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	int i, j, ret;
-
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+	struct i915_pagedir *pd;
+	uint64_t temp, start = 0;
+	const uint64_t orig_length = size;
+	uint32_t pdpe;
+	int ret;
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->switch_mm = gen8_mm_switch;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
-	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	/*
-	 * 2. Create DMA mappings for the page directories and page tables.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-		if (ret)
-			goto bail;
-
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
-			if (ret)
-				goto bail;
-		}
-	}
-
-	/*
-	 * 3. Map all the page directory entires to point to the page tables
-	 * we've allocated.
-	 *
-	 * For now, the PPGTT helper functions all require that the PDEs are
-	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_pagetab *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
-						      I915_CACHE_LLC);
-		}
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
-		kunmap_atomic(pd_vaddr);
-	}
+	start = 0;
+	size = orig_length;
 
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
 	return 0;
-
-bail:
-	gen8_ppgtt_unmap_pages(ppgtt);
-	gen8_ppgtt_free(ppgtt);
-	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -1295,7 +1253,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	/* Consider putting this as part of pd free. */
 	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
-	free_pd_single(&ppgtt->pd);
+	free_pd_single(&ppgtt->pd, ppgtt->base.dev);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 22/24] drm/i915/bdw: begin bitmap tracking
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (20 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 21/24] drm/i915/bdw: Split out mappings Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2014-12-23 17:16   ` [PATCH v2 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
                     ` (2 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Like with gen6/7, we can enable bitmap tracking with all the
preallocations to make sure things actually don't blow up.
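
A minimal, standalone C sketch of the bookkeeping idea (the names and sizes
are assumptions, not the driver's): keep a used_pdes bitmap per page
directory, set a bit when a page table is plugged into a slot, clear it on
teardown, and only treat the directory as free-able once the bitmap is
empty.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define EX_PDES_PER_PAGE 512
        #define EX_BITMAP_WORDS (EX_PDES_PER_PAGE / 64)

        static void ex_set_bit(uint64_t *bm, unsigned bit)
        {
                bm[bit / 64] |= 1ULL << (bit % 64);
        }

        static void ex_clear_bit(uint64_t *bm, unsigned bit)
        {
                bm[bit / 64] &= ~(1ULL << (bit % 64));
        }

        static bool ex_bitmap_empty(const uint64_t *bm)
        {
                unsigned i;

                for (i = 0; i < EX_BITMAP_WORDS; i++)
                        if (bm[i])
                                return false;
                return true;
        }

        int main(void)
        {
                uint64_t used_pdes[EX_BITMAP_WORDS] = { 0 };

                ex_set_bit(used_pdes, 3);   /* a page table went into slot 3 */
                printf("directory free-able: %d\n", ex_bitmap_empty(used_pdes));

                ex_clear_bit(used_pdes, 3); /* teardown emptied that table */
                printf("directory free-able: %d\n", ex_bitmap_empty(used_pdes));
                return 0;
        }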

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 121 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  24 +++++++
 2 files changed, 108 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ff3aac5..6254677 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -417,8 +417,12 @@ err_out:
 
 static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+	WARN(!bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE),
+	     "Free page directory with %d used pages\n",
+	     bitmap_weight(pd->used_pdes, GEN8_PDES_PER_PAGE));
 	i915_dma_unmap_single(pd, dev);
 	__free_page(pd->page);
+	kfree(pd->used_pdes);
 	kfree(pd);
 }
 
@@ -431,26 +435,35 @@ static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_pagedir *pd;
-	int ret;
+	int ret = -ENOMEM;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
+	pd->used_pdes = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				sizeof(*pd->used_pdes), GFP_KERNEL);
+	if (!pd->used_pdes)
+		goto free_pd;
+
 	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pd->page) {
-		kfree(pd);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pd->page)
+		goto free_bitmap;
 
 	ret = i915_dma_map_px_single(pd, dev);
-	if (ret) {
-		__free_page(pd->page);
-		kfree(pd);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto free_page;
 
 	return pd;
+
+free_page:
+	__free_page(pd->page);
+free_bitmap:
+	kfree(pd->used_pdes);
+free_pd:
+	kfree(pd);
+
+	return ERR_PTR(ret);
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -632,36 +645,47 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
-		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
-			free_pt_single(pt, vm->dev);
-			pd->page_tables[pde] = NULL;
-		}
-		free_pd_single(pd, vm->dev);
-		ppgtt->pdp.pagedir[pdpe] = NULL;
-	}
-}
 
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
-	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
-	int i, j;
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		/* TODO: In the future we'll support sparse mappings, so this
-		 * will have to change. */
-		if (!ppgtt->pdp.pagedir[i]->daddr)
+		/* Page directories might not be present since the macro rounds
+		 * down, and up.
+		 */
+		if (!pd) {
+			WARN(test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			     "PDPE %d is not allocated, but is reserved (%p)\n",
+			     pdpe, vm);
 			continue;
+		} else {
+			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			     "PDPE %d not reserved, but is allocated (%p)",
+			     pdpe, vm);
+		}
 
-		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
-				PCI_DMA_BIDIRECTIONAL);
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			if (!pt) {
+				WARN(test_bit(pde, pd->used_pdes),
+				     "PDE %d is not allocated, but is reserved (%p)\n",
+				     pde, vm);
+				continue;
+			} else
+				WARN(!test_bit(pde, pd->used_pdes),
+				     "PDE %d not reserved, but is allocated (%p)",
+				     pde, vm);
+
+			bitmap_clear(pt->used_ptes,
+				     gen8_pte_index(pd_start),
+				     gen8_pte_count(pd_start, pd_len));
+
+			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PAGE)) {
+				free_pt_single(pt, vm->dev);
+				pd->page_tables[pde] = NULL;
+				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+			}
+		}
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-			struct i915_pagetab *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			if (addr)
-				pci_unmap_page(hwdev, addr, PAGE_SIZE,
-						PCI_DMA_BIDIRECTIONAL);
+		if (bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE)) {
+			free_pd_single(pd, vm->dev);
+			ppgtt->pdp.pagedir[pdpe] = NULL;
+			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
 	}
 }
@@ -677,7 +701,6 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	gen8_ppgtt_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 }
 
@@ -706,6 +729,7 @@ unwind_out:
 	return -ENOMEM;
 }
 
+/* bitmap of new pagedirs */
 static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length,
@@ -721,6 +745,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->pagedir[pdpe] = alloc_pd_single(dev);
+
 		if (IS_ERR(pdp->pagedir[pdpe]))
 			goto unwind_out;
 	}
@@ -742,10 +767,12 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagedir *pd;
 	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
+	/* Do the allocations first so we can easily bail out */
 	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
 					ppgtt->base.dev);
 	if (ret)
@@ -758,6 +785,26 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			goto err_out;
 	}
 
+	/* Now mark everything we've touched as used. This doesn't allow for
+	 * robust error checking, but it makes the code a hell of a lot simpler.
+	 */
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		struct i915_pagetab *pt;
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		uint32_t pde;
+		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
+			bitmap_set(pd->page_tables[pde]->used_ptes,
+				   gen8_pte_index(start),
+				   gen8_pte_count(start, length));
+			set_bit(pde, pd->used_pdes);
+		}
+		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+	}
+
 	return 0;
 
 err_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 4c50d87..957f2d0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -206,11 +206,13 @@ struct i915_pagedir {
 		dma_addr_t daddr;
 	};
 
+	unsigned long *used_pdes;
 	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES];
 };
 
 struct i915_pagedirpo {
 	/* struct page *page; */
+	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
 };
 
@@ -449,6 +451,28 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
 	BUG(); /* For 64B */
 }
 
+static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
+{
+	return i915_pte_count(addr, length, GEN8_PDE_SHIFT);
+}
+
+static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
+{
+	const uint32_t pdp_shift = GEN8_PDE_SHIFT + 9;
+	const uint64_t mask = ~((1 << pdp_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return GEN8_PDES_PER_PAGE - i915_pde_index(addr, GEN8_PDE_SHIFT);
+
+	return i915_pde_index(end, GEN8_PDE_SHIFT) - i915_pde_index(addr, GEN8_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 23/24] drm/i915/bdw: Dynamic page table allocations
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (21 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2015-01-05 14:52     ` Daniel Vetter
  2014-12-23 17:16   ` [PATCH v2 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
  2015-01-05 14:57   ` [PATCH v2 00/24] PPGTT dynamic page allocations Daniel Vetter
  24 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This finishes off the dynamic page table allocations, in the legacy
3-level style that already exists. Almost everything has already been set
up to this point; the patch finishes the enabling by setting the
appropriate function pointers.

Zombie tracking:
This could be a separate patch, but I found it helpful for debugging.
Since we write page tables asynchronously with respect to the GPU using
them, we can't actually free the page tables until we know the GPU won't
use them. With this patch, that is always when the context dies.  It
would be possible to write a reaper to go through zombies and clean them
up when under memory pressure. That exercise is left for the reader.

Scratch unused pages:
The object pages can get freed even if a page table still points to
them.  Like the zombie fix, we need to make sure we don't let our GPU
access arbitrary memory when we've unmapped things.

v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.

v3: Rebase.
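
The (bitmap bit, zombie flag) pairing described above has three legal
states; below is a hypothetical C sketch (not driver code) that classifies
them, matching the state table added to i915_gem_gtt.h later in this patch.

        #include <stdbool.h>
        #include <stdio.h>

        enum ex_pt_state {
                EX_PT_UNALLOCATED,
                EX_PT_ALLOCATED,
                EX_PT_ZOMBIE,
                EX_PT_INVALID,
        };

        /* used_pdes/used_ptes bit + zombie flag -> page table state */
        static enum ex_pt_state ex_classify(bool bit_set, bool zombie)
        {
                if (!bit_set && !zombie)
                        return EX_PT_UNALLOCATED;
                if (bit_set && !zombie)
                        return EX_PT_ALLOCATED;
                if (!bit_set && zombie)
                        return EX_PT_ZOMBIE;  /* unmapped, GPU may still use it */
                return EX_PT_INVALID;         /* must never happen */
        }

        int main(void)
        {
                printf("%d %d %d %d\n",
                       ex_classify(false, false), ex_classify(true, false),
                       ex_classify(false, true), ex_classify(true, true));
                return 0;
        }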

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 377 +++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  16 +-
 2 files changed, 326 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6254677..571c307 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -602,7 +602,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 			     struct i915_pagetab *pt,
 			     struct drm_device *dev)
 {
@@ -619,7 +619,7 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
 				     uint64_t length,
 				     struct drm_device *dev)
 {
-	gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+	gen8_ppgtt_pde_t * const pagedir = kmap_atomic(pd->page);
 	struct i915_pagetab *pt;
 	uint64_t temp, pde;
 
@@ -632,8 +632,9 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
 	kunmap_atomic(pagedir);
 }
 
-static void gen8_teardown_va_range(struct i915_address_space *vm,
-				   uint64_t start, uint64_t length)
+static void __gen8_teardown_va_range(struct i915_address_space *vm,
+				     uint64_t start, uint64_t length,
+				     bool dead)
 {
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
@@ -655,6 +656,13 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			     pdpe, vm);
 			continue;
 		} else {
+			if (dead && pd->zombie) {
+				WARN_ON(test_bit(pdpe, ppgtt->pdp.used_pdpes));
+				free_pd_single(pd, vm->dev);
+				ppgtt->pdp.pagedir[pdpe] = NULL;
+				continue;
+			}
+
 			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
 			     "PDPE %d not reserved, but is allocated (%p)",
 			     pdpe, vm);
@@ -666,34 +674,64 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 				     "PDE %d is not allocated, but is reserved (%p)\n",
 				     pde, vm);
 				continue;
-			} else
+			} else {
+				if (dead && pt->zombie) {
+					WARN_ON(test_bit(pde, pd->used_pdes));
+					free_pt_single(pt, vm->dev);
+					pd->page_tables[pde] = NULL;
+					continue;
+				}
 				WARN(!test_bit(pde, pd->used_pdes),
 				     "PDE %d not reserved, but is allocated (%p)",
 				     pde, vm);
+			}
 
 			bitmap_clear(pt->used_ptes,
 				     gen8_pte_index(pd_start),
 				     gen8_pte_count(pd_start, pd_len));
 
 			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PAGE)) {
+				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+				if (!dead) {
+					pt->zombie = 1;
+					continue;
+				}
 				free_pt_single(pt, vm->dev);
 				pd->page_tables[pde] = NULL;
-				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+
 			}
 		}
 
+		gen8_ppgtt_clear_range(vm, pd_start, pd_len, true);
+
 		if (bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE)) {
+			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
+			if (!dead) {
+				/* We've unmapped a possibly live context. Make
+				 * note of it so we can clean it up later. */
+				pd->zombie = 1;
+				continue;
+			}
 			free_pd_single(pd, vm->dev);
 			ppgtt->pdp.pagedir[pdpe] = NULL;
-			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
 	}
 }
 
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	__gen8_teardown_va_range(vm, start, length, false);
+}
+
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	gen8_teardown_va_range(&ppgtt->base,
-			       ppgtt->base.start, ppgtt->base.total);
+	trace_i915_va_teardown(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total,
+			       VM_TO_TRACE_NAME(&ppgtt->base));
+	__gen8_teardown_va_range(&ppgtt->base,
+				 ppgtt->base.start, ppgtt->base.total,
+				 true);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -704,67 +742,177 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pd:		Page directory for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pts:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_pagedirs(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_pagedirs(), it is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_pagedir *pd,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pts)
 {
-	struct i915_pagetab *unused;
+	struct i915_pagetab *pt;
 	uint64_t temp;
 	uint32_t pde;
 
-	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
-		BUG_ON(unused);
-		pd->page_tables[pde] = alloc_pt_single(dev);
-		if (IS_ERR(pd->page_tables[pde]))
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		/* Don't reallocate page tables */
+		if (pt) {
+			/* Scratch is never allocated this way */
+			WARN_ON(pt->scratch);
+			/* If there is a zombie, we can reuse it and save time
+			 * on the allocation. If we clear the zombie status and
+			 * the caller somehow fails, we'll probably hit some
+			 * assertions, so it's up to them to fix up the bitmaps.
+			 */
+			continue;
+		}
+
+		pt = alloc_pt_single(ppgtt->base.dev);
+		if (IS_ERR(pt))
 			goto unwind_out;
+
+		pd->page_tables[pde] = pt;
+		set_bit(pde, new_pts);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pde--)
-		free_pt_single(pd->page_tables[pde], dev);
+	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
+		free_pt_single(pd->page_tables[pde], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
-/* bitmap of new pagedirs */
-static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+/**
+ * gen8_ppgtt_alloc_pagedirs() - Allocate page directories for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pdp:	Page directory pointer for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pds	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pde index of
+ * @start, and ending at the pde index @start + @length. This function will skip
+ * over already allocated page directories within the range, and only allocate
+ * new ones, setting the appropriate pointer within the pdp as well as the
+ * correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a given page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller, This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_pagedirpo *pdp,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pds)
 {
-	struct i915_pagedir *unused;
+	struct i915_pagedir *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 
+	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
 
-	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
-		BUG_ON(unused);
-		pdp->pagedir[pdpe] = alloc_pd_single(dev);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		if (pd)
+			continue;
 
-		if (IS_ERR(pdp->pagedir[pdpe]))
+		pd = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(pd))
 			goto unwind_out;
+
+		pdp->pagedir[pdpe] = pd;
+		set_bit(pdpe, new_pds);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pdpe--)
-		free_pd_single(pdp->pagedir[pdpe], dev);
+	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+		free_pd_single(pdp->pagedir[pdpe], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
+static inline void
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+{
+	int i;
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+		kfree(new_pts[i]);
+	kfree(new_pts);
+	kfree(new_pds);
+}
+
+/* Fills in the page directory bitmap, and the array of page tables bitmap. Both
+ * of these are based on the number of PDPEs in the system.
+ */
+int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
+					 unsigned long ***new_pts)
+{
+	int i;
+	unsigned long *pds;
+	unsigned long **pts;
+
+	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	if (!pds)
+		return -ENOMEM;
+
+	pts = kcalloc(GEN8_PDES_PER_PAGE, sizeof(unsigned long *), GFP_KERNEL);
+	if (!pts) {
+		kfree(pds);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				 sizeof(unsigned long), GFP_KERNEL);
+		if (!pts[i])
+			goto err_out;
+	}
+
+	*new_pds = pds;
+	*new_pts = (unsigned long **)pts;
+
+	return 0;
+
+err_out:
+	free_gen8_temp_bitmaps(pds, pts);
+	return -ENOMEM;
+}
+
 static int gen8_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start,
 			       uint64_t length)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	unsigned long *new_page_dirs, **new_page_tables;
 	struct i915_pagedir *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -772,43 +920,103 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	uint32_t pdpe;
 	int ret;
 
-	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
-					ppgtt->base.dev);
+#ifndef CONFIG_64BIT
+	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
+	 * this in hardware, but a lot of the drm code is not prepared to handle
+	 * 64b offset on 32b platforms. */
+	if (start + length > 0x100000000ULL)
+		return -E2BIG;
+#endif
+
+	/* Wrap is never okay since we can only represent 48b, and we don't
+	 * actually use the other side of the canonical address space.
+	 */
+	if (WARN_ON(start + length < start))
+		return -ERANGE;
+
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
 	if (ret)
 		return ret;
 
+	/* Do the allocations first so we can easily bail out */
+	ret = gen8_ppgtt_alloc_pagedirs(ppgtt, &ppgtt->pdp, start, length,
+					new_page_dirs);
+	if (ret) {
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		return ret;
+	}
+
+	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
-						ppgtt->base.dev);
+		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
+		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
 	}
 
-	/* Now mark everything we've touched as used. This doesn't allow for
-	 * robust error checking, but it makes the code a hell of a lot simpler.
-	 */
 	start = orig_start;
 	length = orig_length;
 
+	/* Allocations have completed successfully, so set the bitmaps, and do
+	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		gen8_ppgtt_pde_t *const pagedir = kmap_atomic(pd->page);
 		struct i915_pagetab *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
 		uint32_t pde;
-		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
-			bitmap_set(pd->page_tables[pde]->used_ptes,
-				   gen8_pte_index(start),
-				   gen8_pte_count(start, length));
+
+		/* Every pd should be allocated, we just did that above. */
+		BUG_ON(!pd);
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			/* Same reasoning as pd */
+			BUG_ON(!pt);
+			BUG_ON(!pd_len);
+			BUG_ON(!gen8_pte_count(pd_start, pd_len));
+
+			/* Set our used ptes within the page table */
+			bitmap_set(pt->used_ptes,
+				   gen8_pte_index(pd_start),
+				   gen8_pte_count(pd_start, pd_len));
+
+			/* Our pde is now pointing to the pagetable, pt */
 			set_bit(pde, pd->used_pdes);
+
+			/* Map the PDE to the page table */
+			__gen8_do_map_pt(pagedir + pde, pt, vm->dev);
+
+			/* NB: We haven't yet mapped ptes to pages. At this
+			 * point we're still relying on insert_entries() */
+
+			/* No longer possible this page table is a zombie */
+			pt->zombie = 0;
 		}
+
+		if (!HAS_LLC(vm->dev))
+			drm_clflush_virt_range(pagedir, PAGE_SIZE);
+
+		kunmap_atomic(pagedir);
+
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		/* This pd is officially not a zombie either */
+		ppgtt->pdp.pagedir[pdpe]->zombie = 0;
 	}
 
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return 0;
 
 err_out:
-	gen8_teardown_va_range(vm, orig_start, start);
+	while (pdpe--) {
+		for_each_set_bit(temp, new_page_tables[pdpe], GEN8_PDES_PER_PAGE)
+			free_pt_single(pd->page_tables[temp], vm->dev);
+	}
+
+	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+		free_pd_single(ppgtt->pdp.pagedir[pdpe], vm->dev);
+
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return ret;
 }
 
@@ -819,37 +1027,68 @@ err_out:
  * space.
  *
  */
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	struct i915_pagedir *pd;
-	uint64_t temp, start = 0;
-	const uint64_t orig_length = size;
-	uint32_t pdpe;
-	int ret;
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pd))
-		return PTR_ERR(ppgtt->scratch_pd);
+	return 0;
+}
+
+static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_pagedir *pd;
+	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
+	uint32_t pdpe;
+	int ret;
 
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	/* Aliasing PPGTT has to always work and be mapped because of the way we
+	 * use RESTORE_INHIBIT in the context switch. This will be fixed
+	 * eventually. */
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	start = 0;
-	size = orig_length;
-
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
+	ppgtt->base.allocate_va_range = NULL;
+	ppgtt->base.teardown_va_range = NULL;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+
+	return 0;
+}
+
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen8_teardown_va_range;
+	ppgtt->base.clear_range = NULL;
+
 	return 0;
 }
 
@@ -1413,9 +1652,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	if (ret)
 		return ret;
 
-	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
-	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
-	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
+	ppgtt->base.allocate_va_range = aliasing ? NULL : gen6_alloc_va_range;
+	ppgtt->base.teardown_va_range = aliasing ? NULL : gen6_teardown_va_range;
+	ppgtt->base.clear_range = aliasing ? gen6_ppgtt_clear_range : NULL;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
@@ -1453,8 +1692,10 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt, aliasing);
+	else if (aliasing)
+		return gen8_aliasing_ppgtt_init(ppgtt);
 	else
-		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
+		return gen8_ppgtt_init(ppgtt);
 }
 int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 {
@@ -1466,8 +1707,9 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
 			    ppgtt->base.total);
-		ppgtt->base.clear_range(&ppgtt->base, 0,
-			    ppgtt->base.total, true);
+		if (ppgtt->base.clear_range)
+			ppgtt->base.clear_range(&ppgtt->base, 0,
+				ppgtt->base.total, true);
 		i915_init_vm(dev_priv, &ppgtt->base);
 	}
 
@@ -1565,10 +1807,7 @@ ppgtt_bind_vma(struct i915_vma *vma,
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm,
-			     vma->node.start,
-			     vma->obj->base.size,
-			     true);
+	WARN_ON(vma->vm->teardown_va_range && vma->vm->clear_range);
 	if (vma->vm->teardown_va_range) {
 		trace_i915_va_teardown(vma->vm,
 				       vma->node.start, vma->node.size,
@@ -1576,7 +1815,13 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
-	}
+	} else if (vma->vm->clear_range) {
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     vma->obj->base.size,
+				     true);
+	} else
+		BUG();
 }
 
 extern int intel_iommu_gfx_mapped;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 957f2d0..534ed82 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -190,13 +190,26 @@ struct i915_vma {
 			u32 flags);
 };
 
-
+/* Zombies. We write page tables with the CPU, and hardware switches them with
+ * the GPU. As such, the only time we can safely remove a page table is when we
+ * know the context is idle. Since we have no good way to do this, we use the
+ * zombie.
+ *
+ * Under memory pressure, if the system is idle, zombies may be reaped.
+ *
+ * There are 3 states a page table can be in (not including scratch)
+ *  bitmap = 0, zombie = 0: unallocated
+ *  bitmap = 1, zombie = 0: allocated
+ *  bitmap = 0, zombie = 1: zombie
+ *  bitmap = 1, zombie = 1: invalid
+ */
 struct i915_pagetab {
 	struct page *page;
 	dma_addr_t daddr;
 
 	unsigned long *used_ptes;
 	unsigned int scratch:1;
+	unsigned zombie:1;
 };
 
 struct i915_pagedir {
@@ -208,6 +221,7 @@ struct i915_pagedir {
 
 	unsigned long *used_pdes;
 	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES];
+	unsigned zombie:1;
 };
 
 struct i915_pagedirpo {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v2 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (22 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
@ 2014-12-23 17:16   ` Michel Thierry
  2015-01-05 14:59     ` Daniel Vetter
  2015-01-05 14:57   ` [PATCH v2 00/24] PPGTT dynamic page allocations Daniel Vetter
  24 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2014-12-23 17:16 UTC (permalink / raw)
  To: intel-gfx

Logical ring contexts need to know the PDPs when they are populated. With
dynamic page table allocation, these PDPs may not exist yet.

Check whether the PDPs have been allocated and, if they have not, point
them at the scratch page.

Before submission, update the PDPs in the logical ring context, since the
PDPs may have been allocated by then.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 80 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 546884b..6abe4bc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -358,6 +358,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 
 static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 				    struct drm_i915_gem_object *ring_obj,
+				    struct i915_hw_ppgtt *ppgtt,
 				    u32 tail)
 {
 	struct page *page;
@@ -369,6 +370,40 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
+	/* True PPGTT with dynamic page allocation: update PDP registers and
+	 * point the unallocated PDPs to the scratch page
+	 */
+	if (ppgtt) {
+		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+		} else {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+		} else {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+		} else {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+		} else {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+	}
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -387,7 +422,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, to0->ppgtt, tail0);
 
 	if (to1) {
 		ringbuf1 = to1->engine[ring->id].ringbuf;
@@ -396,7 +431,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
 		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
 
-		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
+		execlists_update_context(ctx_obj1, ringbuf1->obj, to1->ppgtt, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -1731,14 +1766,39 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+
+	/* With dynamic page allocation, PDPs may not be allocated at this point,
+	 * Point the unallocated PDPs to the scratch page
+	 */
+	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+	} else {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+	} else {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+	} else {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+	} else {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 01/24] drm/i915: Add some extra guards in evict_vm
  2014-12-23 17:16   ` [PATCH v2 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
@ 2015-01-05 13:39     ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 13:39 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:04PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> v2: Use WARN_ONs (Daniel)
> 
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>

Nitpick: Please put in some indication of what you've changed, since
this isn't precisely Ben's patch. Usually we do that by putting a (v1) or
similar behind the relevant sob line. Just for next time around.
-Daniel
> ---
>  drivers/gpu/drm/i915/i915_gem_evict.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index 886ff2e..3dc7b37 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -214,6 +214,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
>  	struct i915_vma *vma, *next;
>  	int ret;
>  
> +	WARN_ON(!mutex_is_locked(&vm->dev->struct_mutex));
>  	trace_i915_gem_evict_vm(vm);
>  
>  	if (do_idle) {
> @@ -222,6 +223,8 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
>  			return ret;
>  
>  		i915_gem_retire_requests(vm->dev);
> +
> +		WARN_ON(!list_empty(&vm->active_list));
>  	}
>  
>  	list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 07/24] drm/i915: page table abstractions
  2014-12-23 17:16   ` [PATCH v2 07/24] drm/i915: page table abstractions Michel Thierry
@ 2015-01-05 13:47     ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 13:47 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:10PM +0000, Michel Thierry wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8f76990..1ff3c05 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -265,6 +265,20 @@ struct i915_gtt {
>  			  unsigned long *mappable_end);
>  };
>  
> +struct i915_pagetab {
> +	struct page *page;
> +};
> +
> +struct i915_pagedir {
> +	struct page *page; /* NULL for GEN6-GEN7 */
> +	struct i915_pagetab *page_tables;
> +};
> +
> +struct i915_pagedirpo {
> +	/* struct page *page; */
> +	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
> +};

Well, I still think the names assigned by intel in the IA prm are horrible,
and long-term we should just use the ones used by the core linux vm because
they're much saner. But shortening them inconsistently like here really
doesn't help. Can you please replace this with the relevant page_directory
or page_directory_pointer or whatever to make it clearer?

Since it's only used once, saving characters here doesn't seem like a good
tradeoff.
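
E.g. something like this, purely to illustrate the naming (untested, and
keeping the fields you already have):

struct i915_page_table {
	struct page *page;
};

struct i915_page_directory {
	struct page *page; /* NULL for GEN6-GEN7 */
	struct i915_page_table *page_tables;
};

struct i915_page_directory_pointer {
	struct i915_page_directory page_directory[GEN8_LEGACY_PDPES];
};
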
-Daniel

> +
>  struct i915_hw_ppgtt {
>  	struct i915_address_space base;
>  	struct kref ref;
> @@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_entries;
>  	unsigned num_pd_pages; /* gen8+ */
>  	union {
> -		struct page **pt_pages;
> -		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
> -	};
> -	struct page *pd_pages;
> -	union {
>  		uint32_t pd_offset;
>  		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
>  	};
> @@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
>  		dma_addr_t *pt_dma_addr;
>  		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
>  	};
> +	union {
> +		struct i915_pagedirpo pdp;
> +		struct i915_pagedir pd;
> +	};
>  
>  	struct drm_i915_file_private *file_priv;
>  
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 10/24] drm/i915: Track GEN6 page table usage
  2014-12-23 17:16   ` [PATCH v2 10/24] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2015-01-05 14:29     ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 14:29 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:13PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> Instead of implementing the full tracking + dynamic allocation, this
> patch does a bit less than half of the work, by tracking and warning on
> unexpected conditions. The tracking itself follows which PTEs within a
> page table are currently being used for objects. The next patch will
> modify this to actually allocate the page tables only when necessary.
> 
> With the current patch there isn't much in the way of making a gen
> agnostic range allocation function. However, in the next patch we'll add
> more specificity which makes having separate functions a bit easier to
> manage.
> 
> One important change introduced here is that DMA mappings are
> created/destroyed at the same time that page directories/tables are
> allocated/deallocated.
> 
> Notice that aliasing PPGTT is not managed here. The patch which actually
> begins dynamic allocation/teardown explains the reasoning for this.
> 
> v2: s/pdp.pagedir/pdp.pagedirs
> Make a scratch page allocation helper
> 
> v3: Rebase and expand commit message.
> 
> v4: Allocate required pagetables only when it is needed, _bind_to_vm
> instead of bind_vma (Daniel).
> 
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)

Imo still a bit too much rebase fluff in this patch. I think it would help
the patch clarity a lot if we'd split the changes that move the
dma_map/unmap calls around from the parts of the patch that dynamically
allocate pagetables.

Bunch more comments below.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c     |   9 ++
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 277 ++++++++++++++++++++++++++----------
>  drivers/gpu/drm/i915/i915_gem_gtt.h | 149 ++++++++++++++-----
>  3 files changed, 322 insertions(+), 113 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 2b6ecfd..5d52990 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3597,6 +3597,15 @@ search_free:
>  	if (ret)
>  		goto err_remove_node;
>  
> +	/*  allocate before insert / bind */
> +	if (vma->vm->allocate_va_range) {
> +		ret = vma->vm->allocate_va_range(vma->vm,
> +						vma->node.start,
> +						vma->node.size);
> +		if (ret)
> +			goto err_remove_node;
> +	}

Is this really the right patch for this hunk? The commit message sounds
like dynamic pagetable alloc is only partially implemented here ...

> +
>  	trace_i915_vma_bind(vma, flags);
>  	ret = i915_vma_bind(vma, obj->cache_level,
>  			    flags & PIN_GLOBAL ? GLOBAL_BIND : 0);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 52bdde7..313432e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -138,10 +138,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
>  		return has_aliasing_ppgtt ? 1 : 0;
>  }
>  
> -
>  static void ppgtt_bind_vma(struct i915_vma *vma,
> -			   enum i915_cache_level cache_level,
> -			   u32 flags);
> +			  enum i915_cache_level cache_level,
> +			  u32 flags);
>  static void ppgtt_unbind_vma(struct i915_vma *vma);
>  
>  static inline gen8_gtt_pte_t gen8_pte_encode(dma_addr_t addr,
> @@ -275,27 +274,99 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> -static void free_pt_single(struct i915_pagetab *pt)
> -{
> +#define i915_dma_unmap_single(px, dev) do { \
> +	pci_unmap_page((dev)->pdev, (px)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
> +} while (0);
> +
> +/**
> + * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
> + * @px:		Page table/dir/etc to get a DMA map for
> + * @dev:	drm device
> + *
> + * Page table allocations are unified across all gens. They always require a
> + * single 4k allocation, as well as a DMA mapping. If we keep the structs
> + * symmetric here, the simple macro covers us for every page table type.
> + *
> + * Return: 0 if success.
> + */
> +#define i915_dma_map_px_single(px, dev) \
> +	pci_dma_mapping_error((dev)->pdev, \
> +			      (px)->daddr = pci_map_page((dev)->pdev, \
> +							 (px)->page, 0, 4096, \
> +							 PCI_DMA_BIDIRECTIONAL))

Linux coding style discourages macro abuse like this; please make this a
static inline instead. Otoh I don't really see the value in hiding the
pci_map_page call, imo open-coding this is totally ok.

But while you touch the code please switch away from the pci_map wrappers
and use the dma_map functions directly.
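
I.e. something along these lines (untested sketch, names just for
illustration), which drops the pci_* wrappers at the same time:

static inline int i915_dma_map_page_single(struct i915_pagetab *pt,
					   struct drm_device *dev)
{
	struct device *device = &dev->pdev->dev;

	pt->daddr = dma_map_page(device, pt->page, 0, 4096,
				 DMA_BIDIRECTIONAL);

	return dma_mapping_error(device, pt->daddr);
}
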

> +
> +static void __free_pt_single(struct i915_pagetab *pt, struct drm_device *dev,
> +			     int scratch)
> +{
> +	if (WARN(scratch ^ pt->scratch,
> +		 "Tried to free scratch = %d. Is scratch = %d\n",
> +		 scratch, pt->scratch))
> +		return;
> +
>  	if (WARN_ON(!pt->page))
>  		return;
> +
> +	if (!scratch) {
> +		const size_t count = INTEL_INFO(dev)->gen >= 8 ?
> +			GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
> +		WARN(!bitmap_empty(pt->used_ptes, count),
> +		     "Free page table with %d used pages\n",
> +		     bitmap_weight(pt->used_ptes, count));
> +	}
> +
> +	i915_dma_unmap_single(pt, dev);
>  	__free_page(pt->page);
> +	kfree(pt->used_ptes);
>  	kfree(pt);
>  }
>  
> -static struct i915_pagetab *alloc_pt_single(void)
> +#define free_pt_single(pt, dev) \
> +	__free_pt_single(pt, dev, false)
> +#define free_pt_scratch(pt, dev) \
> +	__free_pt_single(pt, dev, true)

Imo the distinction between _single and _scratch is confusing. Instead,
adding a new function unmap_and_free_pt which calls dma_unmap_page and
free_pt_single would make more sense. Then there's also no need for the __
version of the function and the scratch parameter.

It means that we'll need to kill the selfchecks for scratch, but by
uncluttering the indirections here a bit as proposed any such bugs should
be obvious. All the others will be caught by the dma mapping debugging
code (which is super-paranoid afaik).
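
Roughly (untested; this assumes free_pt_single then goes back to only
freeing the page, the bitmap and the struct):

static void unmap_and_free_pt(struct i915_pagetab *pt, struct drm_device *dev)
{
	if (WARN_ON(!pt->page))
		return;

	dma_unmap_page(&dev->pdev->dev, pt->daddr, 4096, DMA_BIDIRECTIONAL);
	free_pt_single(pt);
}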

> +
> +static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
>  {
>  	struct i915_pagetab *pt;
> +	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
> +		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
> +	int ret = -ENOMEM;
>  
>  	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
>  	if (!pt)
>  		return ERR_PTR(-ENOMEM);
>  
> +	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
> +				GFP_KERNEL);
> +

I don't see the value in tracking used_ptes. For debugging we can just
look at the pte value itself (which should be in cached memory, so dirt
cheap). And since pagetables are the lowest level, we can't screw up the
allocations/freeing. What am I missing?
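
E.g. a (hypothetical, untested) helper along these lines would be enough
for any paranoia checks, instead of carrying the bitmap around:

static bool pte_is_scratch(struct i915_address_space *vm,
			   struct i915_pagetab *pt, uint32_t pte)
{
	gen6_gtt_pte_t scratch_pte, *vaddr;
	bool ret;

	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC,
				     true, 0);

	vaddr = kmap_atomic(pt->page);
	ret = (vaddr[pte] == scratch_pte);
	kunmap_atomic(vaddr);

	return ret;
}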

> +	if (!pt->used_ptes)
> +		goto fail_bitmap;
> +
>  	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> -	if (!pt->page) {
> -		kfree(pt);
> -		return ERR_PTR(-ENOMEM);
> -	}
> +	if (!pt->page)
> +		goto fail_page;
> +
> +	ret = i915_dma_map_px_single(pt, dev);
> +	if (ret)
> +		goto fail_dma;
> +
> +	return pt;
> +
> +fail_dma:
> +	__free_page(pt->page);
> +fail_page:
> +	kfree(pt->used_ptes);
> +fail_bitmap:
> +	kfree(pt);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static inline struct i915_pagetab *alloc_pt_scratch(struct drm_device *dev)
> +{
> +	struct i915_pagetab *pt = alloc_pt_single(dev);
> +	if (!IS_ERR(pt))
> +		pt->scratch = 1;

Shouldn't we fill the scratch pt with scratch pte entries? Or am I missing
the point of this? It's hard to tell without having users of scratch_pt in
the same patch as the one that adds it. Might be good to split this into
yet another patch.
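
I.e. something like this (untested sketch; assumes the entries should
point at the usual vm->scratch page):

static void gen6_initialize_scratch_pt(struct i915_address_space *vm,
				       struct i915_pagetab *pt)
{
	gen6_gtt_pte_t scratch_pte, *vaddr;
	int i;

	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC,
				     true, 0);

	vaddr = kmap_atomic(pt->page);
	for (i = 0; i < I915_PPGTT_PT_ENTRIES; i++)
		vaddr[i] = scratch_pte;
	kunmap_atomic(vaddr);
}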

>  
>  	return pt;
>  }
> @@ -313,7 +384,9 @@ static struct i915_pagetab *alloc_pt_single(void)
>   *
>   * Return: 0 if allocation succeeded.
>   */
> -static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
> +static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
> +		  struct drm_device *dev)

Aside: The reason I've suggested splitting the patches is so that we can
get all the added *dev parameters out of the diff. That should help patch
readability a lot.

> +
>  {
>  	int i, ret;
>  
> @@ -323,7 +396,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
>  	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
>  
>  	for (i = pde; i < pde + count; i++) {
> -		struct i915_pagetab *pt = alloc_pt_single();
> +		struct i915_pagetab *pt = alloc_pt_single(dev);
>  		if (IS_ERR(pt)) {
>  			ret = PTR_ERR(pt);
>  			goto err_out;
> @@ -338,7 +411,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
>  
>  err_out:
>  	while (i--)
> -		free_pt_single(pd->page_tables[i]);
> +		free_pt_single(pd->page_tables[i], dev);
>  	return ret;
>  }
>  
> @@ -506,7 +579,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	}
>  }
>  
> -static void gen8_free_page_tables(struct i915_pagedir *pd)
> +static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
>  {
>  	int i;
>  
> @@ -514,7 +587,7 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
>  		return;
>  
>  	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> -		free_pt_single(pd->page_tables[i]);
> +		free_pt_single(pd->page_tables[i], dev);
>  		pd->page_tables[i] = NULL;
>  	}
>  }
> @@ -524,7 +597,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
> +		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
>  		free_pd_single(ppgtt->pdp.pagedir[i]);
>  	}
>  }
> @@ -569,7 +642,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>  		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
> -				     0, GEN8_PDES_PER_PAGE);
> +				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
>  		if (ret)
>  			goto unwind_out;
>  	}
> @@ -578,7 +651,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  
>  unwind_out:
>  	while (i--)
> -		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
> +		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
>  
>  	return -ENOMEM;
>  }
> @@ -808,26 +881,36 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  	}
>  }
>  
> -static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
> +/* Write pde (index) from the page directory @pd to the page table @pt */
> +static void gen6_write_pdes(struct i915_pagedir *pd,
> +			    const int pde, struct i915_pagetab *pt)
>  {
> -	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
> -	gen6_gtt_pte_t __iomem *pd_addr;
> -	uint32_t pd_entry;
> -	int i;
> +	struct i915_hw_ppgtt *ppgtt =
> +		container_of(pd, struct i915_hw_ppgtt, pd);
> +	u32 pd_entry;
>  
> -	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
> -	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
> -		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		dma_addr_t pt_addr;
> +	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
> +	pd_entry |= GEN6_PDE_VALID;
>  
> -		pt_addr = ppgtt->pd.page_tables[i]->daddr;
> -		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
> -		pd_entry |= GEN6_PDE_VALID;
> +	writel(pd_entry, ppgtt->pd_addr + pde);
>  
> -		writel(pd_entry, pd_addr + i);
> -	}
> -	readl(pd_addr);
> +	/* XXX: Caller needs to make sure the write completes if necessary */

Please take care of such XXX comments. That's the stuff I meant when I
said that the overall patch series needs a full pass with a critical eye
to catch development/rebase leftovers.

> +}
> +
> +/* Write all the page tables found in the ppgtt structure to incrementing page
> + * directories. */
> +static void gen6_write_page_range(struct drm_i915_private *dev_priv,
> +				struct i915_pagedir *pd, uint32_t start, uint32_t length)
> +{
> +	struct i915_pagetab *pt;
> +	uint32_t pde, temp;
> +
> +	gen6_for_each_pde(pt, pd, start, length, temp, pde)
> +		gen6_write_pdes(pd, pde, pt);
> +
> +	/* Make sure write is complete before other code can use this page
> +	 * table. Also require for WC mapped PTEs */
> +	readl(dev_priv->gtt.gsm);
>  }
>  
>  static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
> @@ -1043,13 +1126,59 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  			       4096, PCI_DMA_BIDIRECTIONAL);
>  }
>  
> +static int gen6_alloc_va_range(struct i915_address_space *vm,
> +			       uint64_t start, uint64_t length)
> +{
> +	struct i915_hw_ppgtt *ppgtt =
> +				container_of(vm, struct i915_hw_ppgtt, base);
> +	struct i915_pagetab *pt;
> +	uint32_t pde, temp;
> +
> +	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
> +		int j;
> +
> +		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
> +		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
> +		bitmap_set(tmp_bitmap, gen6_pte_index(start),
> +			   gen6_pte_count(start, length));
> +
> +		/* TODO: To be done in the next patch. Map the page/insert
> +		 * entries here */
> +		for_each_set_bit(j, tmp_bitmap, I915_PPGTT_PT_ENTRIES) {
> +			if (test_bit(j, pt->used_ptes)) {
> +				/* Check that we're changing cache levels */

Again, this is only valid for older revisions, since we've taken care of
the cache_level changes by only extending pagetables where actually
needed (in the bind functions). Furthermore, with the used_ptes bitmask
gone this would all disappear anyway.

Given that the patch justifies itself by adding the dynamic allocation and
tracking first to debug it, but doesn't add any self-checks in the pte
writing funcs (afaics at least), I don't think this is all that useful any
more.

> +			}
> +		}
> +
> +		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
> +				I915_PPGTT_PT_ENTRIES);
> +	}
> +
> +	return 0;
> +}
> +
> +static void gen6_teardown_va_range(struct i915_address_space *vm,
> +				   uint64_t start, uint64_t length)
> +{
> +	struct i915_hw_ppgtt *ppgtt =
> +				container_of(vm, struct i915_hw_ppgtt, base);
> +	struct i915_pagetab *pt;
> +	uint32_t pde, temp;
> +
> +	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
> +		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
> +			     gen6_pte_count(start, length));
> +	}
> +}
> +
>  static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		free_pt_single(ppgtt->pd.page_tables[i]);
> +		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
>  
> +	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
>  	free_pd_single(&ppgtt->pd);
>  }
>  
> @@ -1076,6 +1205,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
>  	 * size. We allocate at the top of the GTT to avoid fragmentation.
>  	 */
>  	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
> +	ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
> +	if (IS_ERR(ppgtt->scratch_pt))
> +		return PTR_ERR(ppgtt->scratch_pt);
>  alloc:
>  	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
>  						  &ppgtt->node, GEN6_PD_SIZE,
> @@ -1089,20 +1221,25 @@ alloc:
>  					       0, dev_priv->gtt.base.total,
>  					       0);
>  		if (ret)
> -			return ret;
> +			goto err_out;
>  
>  		retried = true;
>  		goto alloc;
>  	}
>  
>  	if (ret)
> -		return ret;
> +		goto err_out;
> +
>  
>  	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
>  		DRM_DEBUG("Forced to use aperture for PDEs\n");
>  
>  	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
>  	return 0;
> +
> +err_out:
> +	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
> +	return ret;
>  }
>  
>  static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
> @@ -1113,7 +1250,9 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> -	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
> +	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
> +			ppgtt->base.dev);
> +
>  	if (ret) {
>  		drm_mm_remove_node(&ppgtt->node);
>  		return ret;
> @@ -1122,30 +1261,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	return 0;
>  }
>  
> -static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
> -{
> -	struct drm_device *dev = ppgtt->base.dev;
> -	int i;
> -
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		struct page *page;
> -		dma_addr_t pt_addr;
> -
> -		page = ppgtt->pd.page_tables[i]->page;
> -		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
> -				       PCI_DMA_BIDIRECTIONAL);
> -
> -		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
> -			gen6_ppgtt_unmap_pages(ppgtt);
> -			return -EIO;
> -		}
> -
> -		ppgtt->pd.page_tables[i]->daddr = pt_addr;
> -	}
> -
> -	return 0;
> -}
> -
>  static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  {
>  	struct drm_device *dev = ppgtt->base.dev;
> @@ -1166,12 +1281,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> -	ret = gen6_ppgtt_setup_page_tables(ppgtt);
> -	if (ret) {
> -		gen6_ppgtt_free(ppgtt);
> -		return ret;
> -	}
> -
> +	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
> +	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
>  	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
>  	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>  	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
> @@ -1182,11 +1293,15 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->pd.pd_offset =
>  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
>  
> +	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
> +		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
> +
> +	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
> +
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
>  			 ppgtt->node.size >> 20,
>  			 ppgtt->node.start / PAGE_SIZE);
>  
> -	gen6_write_pdes(ppgtt);
>  	DRM_DEBUG("Adding PPGTT at offset %x\n",
>  		  ppgtt->pd.pd_offset << 10);
>  
> @@ -1318,6 +1433,9 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
>  			     vma->node.start,
>  			     vma->obj->base.size,
>  			     true);
> +	if (vma->vm->teardown_va_range)
> +		vma->vm->teardown_va_range(vma->vm,
> +					   vma->node.start, vma->node.size);

If we ditch used_ptes we can ditch this here too. As per my irc
discussion with Chris Wilson, I think the best way to actually free
pagetables is in our shrinker, by simply freeing them all when a vm
contains no bound buffers at all. Much less fuss, and it avoids all the
complications that ripping out pagetables from underneath active users
might entail. On the cpu side the core vm doesn't even bother with that,
so I expect we can forgo that complexity on the gpu side, too.
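
Hand-wavy sketch only (free_all_page_tables() is a hypothetical helper,
and this ignores locking entirely):

static void i915_ppgtt_shrink_page_tables(struct drm_i915_private *dev_priv)
{
	struct i915_address_space *vm;

	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
		if (i915_is_ggtt(vm))
			continue;

		/* No bound buffers left: point every pde back at scratch
		 * and reap all page tables of this vm. */
		if (list_empty(&vm->active_list) &&
		    list_empty(&vm->inactive_list))
			free_all_page_tables(vm);
	}
}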

>  }
>  
>  extern int intel_iommu_gfx_mapped;
> @@ -1461,13 +1579,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>  
>  	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
>  		/* TODO: Perhaps it shouldn't be gen6 specific */
> -		if (i915_is_ggtt(vm)) {
> -			if (dev_priv->mm.aliasing_ppgtt)
> -				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
> -			continue;
> -		}
>  
> -		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
> +		struct i915_hw_ppgtt *ppgtt =
> +			container_of(vm, struct i915_hw_ppgtt, base);
> +
> +		if (i915_is_ggtt(vm))
> +			ppgtt = dev_priv->mm.aliasing_ppgtt;
> +
> +		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
>  	}
>  
>  	i915_ggtt_flush(dev_priv);
> @@ -1633,8 +1752,8 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
>  
>  
>  static void i915_ggtt_bind_vma(struct i915_vma *vma,
> -			       enum i915_cache_level cache_level,
> -			       u32 unused)
> +			      enum i915_cache_level cache_level,
> +			      u32 unused)
>  {
>  	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
>  	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
> @@ -1666,8 +1785,8 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
>  }
>  
>  static void ggtt_bind_vma(struct i915_vma *vma,
> -			  enum i915_cache_level cache_level,
> -			  u32 flags)
> +			 enum i915_cache_level cache_level,
> +			 u32 flags)
>  {
>  	struct drm_device *dev = vma->vm->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index c08fe8b..d579f74 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN6_PPGTT_PD_ENTRIES		512
>  #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
>  #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
> +#define GEN6_PDE_SHIFT          22
>  #define GEN6_PDE_VALID			(1 << 0)
> +#define GEN6_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
> +#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
>  
>  #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
>  
> @@ -183,8 +186,32 @@ struct i915_vma {
>  	void (*unbind_vma)(struct i915_vma *vma);
>  	/* Map an object into an address space with the given cache flags. */
>  	void (*bind_vma)(struct i915_vma *vma,
> -			 enum i915_cache_level cache_level,
> -			 u32 flags);
> +			enum i915_cache_level cache_level,
> +			u32 flags);
> +};
> +
> +
> +struct i915_pagetab {
> +	struct page *page;
> +	dma_addr_t daddr;
> +
> +	unsigned long *used_ptes;
> +	unsigned int scratch:1;
> +};

There's a bit of noise in the diff because you move structures around. Imo
just using forward decls is better, or reorder the definitions in the
patch that adds them.

> +
> +struct i915_pagedir {
> +	struct page *page; /* NULL for GEN6-GEN7 */
> +	union {
> +		uint32_t pd_offset;
> +		dma_addr_t daddr;
> +	};
> +
> +	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES];
> +};
> +
> +struct i915_pagedirpo {
> +	/* struct page *page; */
> +	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
>  };
>  
>  struct i915_address_space {
> @@ -226,6 +253,12 @@ struct i915_address_space {
>  	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
>  				     enum i915_cache_level level,
>  				     bool valid, u32 flags); /* Create a valid PTE */
> +	int (*allocate_va_range)(struct i915_address_space *vm,
> +				 uint64_t start,
> +				 uint64_t length);
> +	void (*teardown_va_range)(struct i915_address_space *vm,
> +				  uint64_t start,
> +				  uint64_t length);
>  	void (*clear_range)(struct i915_address_space *vm,
>  			    uint64_t start,
>  			    uint64_t length,
> @@ -237,6 +270,29 @@ struct i915_address_space {
>  	void (*cleanup)(struct i915_address_space *vm);
>  };
>  
> +struct i915_hw_ppgtt {
> +	struct i915_address_space base;
> +	struct kref ref;
> +	struct drm_mm_node node;
> +	unsigned num_pd_entries;
> +	unsigned num_pd_pages; /* gen8+ */
> +	union {
> +		struct i915_pagedirpo pdp;
> +		struct i915_pagedir pd;
> +	};
> +
> +	struct i915_pagetab *scratch_pt;
> +
> +	struct drm_i915_file_private *file_priv;
> +
> +	gen6_gtt_pte_t __iomem *pd_addr;
> +
> +	int (*enable)(struct i915_hw_ppgtt *ppgtt);
> +	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
> +			 struct intel_engine_cs *ring);
> +	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
> +};
> +
>  /* The Graphics Translation Table is the way in which GEN hardware translates a
>   * Graphics Virtual Address into a Physical Address. In addition to the normal
>   * collateral associated with any va->pa translations GEN hardware also has a
> @@ -265,44 +321,69 @@ struct i915_gtt {
>  			  unsigned long *mappable_end);
>  };
>  
> -struct i915_pagetab {
> -	struct page *page;
> -	dma_addr_t daddr;
> -};
> +/* For each pde iterates over every pde between from start until start + length.
> + * If start, and start+length are not perfectly divisible, the macro will round
> + * down, and up as needed. The macro modifies pde, start, and length. Dev is
> + * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
> + * and length = 2G effectively iterates over every PDE in the system. On gen8+
> + * it simply iterates over every page directory entry in a page directory.
> + *
> + * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
> + */
> +#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
> +	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
> +	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
> +	     pt = (pd)->page_tables[++iter], \
> +	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
> +	     temp = min(temp, (unsigned)length), \
> +	     start += temp, length -= temp)
> +
> +static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
> +{
> +	const uint32_t mask = NUM_PTE(pde_shift) - 1;
> +	return (address >> PAGE_SHIFT) & mask;
> +}
>  
> -struct i915_pagedir {
> -	struct page *page; /* NULL for GEN6-GEN7 */
> -	union {
> -		uint32_t pd_offset;
> -		dma_addr_t daddr;
> -	};
> +/* Helper to counts the number of PTEs within the given length. This count does
> +* not cross a page table boundary, so the max value would be
> +* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
> +*/
> +static inline size_t i915_pte_count(uint64_t addr, size_t length,
> +					uint32_t pde_shift)
> +{
> +	const uint64_t mask = ~((1 << pde_shift) - 1);
> +	uint64_t end;
>  
> -	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
> -};
> +	BUG_ON(length == 0);
> +	BUG_ON(offset_in_page(addr|length));
>  
> -struct i915_pagedirpo {
> -	/* struct page *page; */
> -	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
> -};
> +	end = addr + length;
>  
> -struct i915_hw_ppgtt {
> -	struct i915_address_space base;
> -	struct kref ref;
> -	struct drm_mm_node node;
> -	unsigned num_pd_entries;
> -	unsigned num_pd_pages; /* gen8+ */
> -	union {
> -		struct i915_pagedirpo pdp;
> -		struct i915_pagedir pd;
> -	};
> +	if ((addr & mask) != (end & mask))
> +		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
>  
> -	struct drm_i915_file_private *file_priv;
> +	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
> +}
>  
> -	int (*enable)(struct i915_hw_ppgtt *ppgtt);
> -	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
> -			 struct intel_engine_cs *ring);
> -	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
> -};
> +static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)

Shouldn't this have a gen6 prefix, too?

> +{
> +	return (addr >> shift) & GEN6_PDE_MASK;
> +}
> +
> +static inline uint32_t gen6_pte_index(uint32_t addr)
> +{
> +	return i915_pte_index(addr, GEN6_PDE_SHIFT);
> +}
> +
> +static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
> +{
> +	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
> +}
> +
> +static inline uint32_t gen6_pde_index(uint32_t addr)
> +{
> +	return i915_pde_index(addr, GEN6_PDE_SHIFT);
> +}
>  
>  int i915_gem_gtt_init(struct drm_device *dev);
>  void i915_gem_init_global_gtt(struct drm_device *dev);
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 11/24] drm/i915: Extract context switch skip and pd load logic
  2014-12-23 17:16   ` [PATCH v2 11/24] drm/i915: Extract context switch skip and pd load logic Michel Thierry
@ 2015-01-05 14:31     ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 14:31 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:14PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> We have some fanciness coming up. This patch just breaks out the logic
> of context switch skip, pd load pre, and pd load post.
> 
> v2: Use new functions to replace the logic right away (Daniel)
> 
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)

Since we won't ever ship gen8+ without execlists, full-ppgtt legacy
context switching special-cases for it are kinda moot. But that's
something we could simplify with a follow-up patch once the series has
landed, together with some checks to make sure we never try to reload the
ppgtt on gen8+.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 40 +++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b67d269..7b20bd4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -563,6 +563,33 @@ mi_set_context(struct intel_engine_cs *ring,
>  	return ret;
>  }
>  
> +static inline bool should_skip_switch(struct intel_engine_cs *ring,
> +				      struct intel_context *from,
> +				      struct intel_context *to)
> +{
> +	if (from == to && !to->remap_slice)
> +		return true;
> +
> +	return false;
> +}
> +
> +static bool
> +needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	return ((INTEL_INFO(ring->dev)->gen < 8) ||
> +			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
> +}
> +
> +static bool
> +needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
> +{
> +	return (!to->legacy_hw_ctx.initialized ||
> +			i915_gem_context_is_default(to)) &&
> +			to->ppgtt && IS_GEN8(ring->dev);
> +}
> +
>  static int do_switch(struct intel_engine_cs *ring,
>  		     struct intel_context *to)
>  {
> @@ -571,9 +598,6 @@ static int do_switch(struct intel_engine_cs *ring,
>  	u32 hw_flags = 0;
>  	bool uninitialized = false;
>  	struct i915_vma *vma;
> -	bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
> -			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
> -	bool needs_pd_load_post = false;
>  	int ret, i;
>  
>  	if (from != NULL && ring == &dev_priv->ring[RCS]) {
> @@ -581,7 +605,7 @@ static int do_switch(struct intel_engine_cs *ring,
>  		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
>  	}
>  
> -	if (from == to && !to->remap_slice)
> +	if (should_skip_switch(ring, from, to))
>  		return 0;
>  
>  	/* Trying to pin first makes error handling easier. */
> @@ -599,7 +623,7 @@ static int do_switch(struct intel_engine_cs *ring,
>  	 */
>  	from = ring->last_context;
>  
> -	if (needs_pd_load_pre) {
> +	if (needs_pd_load_pre(ring, to)) {
>  		/* Older GENs and non render rings still want the load first,
>  		 * "PP_DCLV followed by PP_DIR_BASE register through Load
>  		 * Register Immediate commands in Ring Buffer before submitting
> @@ -644,16 +668,14 @@ static int do_switch(struct intel_engine_cs *ring,
>  	 * XXX: If we implemented page directory eviction code, this
>  	 * optimization needs to be removed.
>  	 */
> -	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
> +	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
>  		hw_flags |= MI_RESTORE_INHIBIT;
> -		needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
> -	}
>  
>  	ret = mi_set_context(ring, to, hw_flags);
>  	if (ret)
>  		goto unpin_out;
>  
> -	if (needs_pd_load_post) {
> +	if (needs_pd_load_post(ring, to)) {
>  		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
>  		/* The hardware context switch is emitted, but we haven't
>  		 * actually changed the state - so it's probably safe to bail
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 12/24] drm/i915: Track page table reload need
  2014-12-23 17:16   ` [PATCH v2 12/24] drm/i915: Track page table reload need Michel Thierry
@ 2015-01-05 14:36     ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 14:36 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:15PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> This patch was formerly known as, "Force pd restore when PDEs change,
> gen6-7." I had to change the name because it is needed for GEN8 too.
> 
> The real issue this is trying to solve is when a new object is mapped
> into the current address space. The GPU does not snoop the new mapping
> so we must do the gen specific action to reload the page tables.
> 
> GEN8 and GEN7 do differ in the way they load page tables for the RCS.
> GEN8 does so with the context restore, while GEN7 requires the proper
> load commands in the command streamer. Non-render is similar for both.
> 
> Caveat for GEN7
> The docs say you cannot change the PDEs of a currently running context.
> We never map new PDEs of a running context, and expect them to be
> present - so I think this is okay. (We can unmap, but this should also
> be okay since we only unmap unreferenced objects that the GPU shouldn't
> be tryingto va->pa xlate.) The MI_SET_CONTEXT command does have a flag
> to signal that even if the context is the same, force a reload. It's
> unclear exactly what this does, but I have a hunch it's the right thing
> to do.
> 
> The logic assumes that we always emit a context switch after mapping new
> PDEs, and before we submit a batch. This is the case today, and has been
> the case since the inception of hardware contexts. A note in the comment
> lets the user know.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> squash! drm/i915: Force pd restore when PDEs change, gen6-7
> 
> It's not just for gen8. If the current context has mappings change, we
> need a context reload to switch
> 
> v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
> and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
> is always null.
> 
> v3: Invalidate PPGTT TLBs inside alloc_va_range and teardown_va_range.
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c    | 27 ++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.c        | 12 ++++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.h        |  2 ++
>  4 files changed, 47 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 7b20bd4..fa9d4a1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -567,8 +567,18 @@ static inline bool should_skip_switch(struct intel_engine_cs *ring,
>  				      struct intel_context *from,
>  				      struct intel_context *to)
>  {
> -	if (from == to && !to->remap_slice)
> -		return true;
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	if (to->remap_slice)
> +		return false;
> +
> +	if (to->ppgtt) {
> +		if (from == to && !test_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
> +			return true;
> +	} else {
> +		if (from == to && !test_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask))
> +			return true;
> +	}
>  
>  	return false;
>  }
> @@ -585,9 +595,8 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
>  static bool
>  needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
>  {
> -	return (!to->legacy_hw_ctx.initialized ||
> -			i915_gem_context_is_default(to)) &&
> -			to->ppgtt && IS_GEN8(ring->dev);
> +	return IS_GEN8(ring->dev) &&
> +			(to->ppgtt || &to->ppgtt->base.pd_reload_mask);
>  }
>  
>  static int do_switch(struct intel_engine_cs *ring,
> @@ -632,6 +641,12 @@ static int do_switch(struct intel_engine_cs *ring,
>  		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
>  		if (ret)
>  			goto unpin_out;
> +
> +		/* Doing a PD load always reloads the page dirs */
> +		if (to->ppgtt)
> +			clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask);
> +		else
> +			clear_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask);
>  	}
>  
>  	if (ring != &dev_priv->ring[RCS]) {
> @@ -670,6 +685,8 @@ static int do_switch(struct intel_engine_cs *ring,
>  	 */
>  	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
>  		hw_flags |= MI_RESTORE_INHIBIT;
> +	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
> +		hw_flags |= MI_FORCE_RESTORE;
>  
>  	ret = mi_set_context(ring, to, hw_flags);
>  	if (ret)
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 8330660..09d864f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1199,6 +1199,13 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
>  	if (ret)
>  		goto error;
>  
> +	if (ctx->ppgtt)
> +		WARN(ctx->ppgtt->base.pd_reload_mask & (1<<ring->id),
> +			"%s didn't clear reload\n", ring->name);
> +	else
> +		WARN(dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask &
> +			(1<<ring->id), "%s didn't clear reload\n", ring->name);
> +
>  	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
>  	instp_mask = I915_EXEC_CONSTANTS_MASK;
>  	switch (instp_mode) {
> @@ -1446,6 +1453,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	if (ret)
>  		goto err;
>  
> +	/* XXX: Reserve has possibly change PDEs which means we must do a
> +	 * context switch before we can coherently read some of the reserved
> +	 * VMAs. */
> +
>  	/* The objects are in their final locations, apply the relocations. */
>  	if (need_relocs)
>  		ret = i915_gem_execbuffer_relocate(eb);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 313432e..54c7ca7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1126,6 +1126,15 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  			       4096, PCI_DMA_BIDIRECTIONAL);
>  }
>  
> +/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
> + * are switching between contexts with the same LRCA, we also must do a force
> + * restore.
> + */
> +#define ppgtt_invalidate_tlbs(vm) do {\
> +	/* If current vm != vm, */ \
> +	vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask; \
> +} while (0)

Again, this should be a proper static inline. Also maybe call this
mark_tlbs_dirty, since that's what it does; the actual invalidation
happens later on in the ctx switch code. In the same spirit:
s/pd_reload_mask/pd_dirty_rings/.
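
I.e. something like (sketch, assuming the mask also moves into
i915_hw_ppgtt as mentioned below):

static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
{
	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
}
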
-Daniel

> +
>  static int gen6_alloc_va_range(struct i915_address_space *vm,
>  			       uint64_t start, uint64_t length)
>  {
> @@ -1154,6 +1163,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>  				I915_PPGTT_PT_ENTRIES);
>  	}
>  
> +	ppgtt_invalidate_tlbs(vm);
>  	return 0;
>  }
>  
> @@ -1169,6 +1179,8 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
>  		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
>  			     gen6_pte_count(start, length));
>  	}
> +
> +	ppgtt_invalidate_tlbs(vm);
>  }
>  
>  static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d579f74..dc71cae 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -226,6 +226,8 @@ struct i915_address_space {
>  		struct page *page;
>  	} scratch;
>  
> +	unsigned long pd_reload_mask;

Conceptually this should be in i915_hw_ppgtt, not in struct
i915_address_space. Is there anything holding up that move that I don't
see?

> +
>  	/**
>  	 * List of objects currently involved in rendering.
>  	 *
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation
  2014-12-23 17:16   ` [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2015-01-05 14:45     ` Daniel Vetter
  2015-01-13 11:53       ` Michel Thierry
  0 siblings, 1 reply; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 14:45 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:17PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> This patch continues on the idea from the previous patch. From here on,
> in the steady state, PDEs are all pointing to the scratch page table (as
> recommended in the spec). When an object is allocated in the VA range,
> the code will determine if we need to allocate a page for the page
> table. Similarly when the object is destroyed, we will remove, and free
> the page table pointing the PDE back to the scratch page.
> 
> Following patches will work to unify the code a bit as we bring in GEN8
> support. GEN6 and GEN8 are different enough that I had a hard time to
> get to this point with as much common code as I do.
> 
> The aliasing PPGTT must pre-allocate all of the page tables. There are a
> few reasons for this. Two trivial ones: aliasing ppgtt goes through the
> ggtt paths, so it's hard to maintain, we currently do not restore the
> default context (assuming the previous force reload is indeed
> necessary). Most importantly though, the only way (it seems from
> empirical evidence) to invalidate the CS TLBs on non-render ring is to
> either use ring sync (which requires actually stopping the rings in
> order to synchronize when the sync completes vs. where you are in
> execution), or to reload DCLV.  Since without full PPGTT we do not ever
> reload the DCLV register, there is no good way to achieve this. The
> simplest solution is just to not support dynamic page table
> creation/destruction in the aliasing PPGTT.
> 
> We could always reload DCLV, but this seems like quite a bit of excess
> overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
> page tables.
> 
> v2: Make the page table bitmap declared inside the function (Chris)
> Simplify the way scratching address space works.
> Move the alloc/teardown tracepoints up a level in the call stack so that
> both all implementations get the trace.
> 
> v3: Updated trace event to spit out a name
> 
> v4: Aliasing ppgtt is now initialized differently (in setup global gtt)
> 
> v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check for
> trace, as it is no longer possible after the PPGTT cleanup patch series
> of a couple of months ago (Daniel).
> 
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)

The tracepoints should be split into a separate patch. Although the
teardown stuff will likely disappear I guess ...

Two more comments below.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
>  drivers/gpu/drm/i915/i915_gem.c     |   2 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 128 ++++++++++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/i915_trace.h   | 115 ++++++++++++++++++++++++++++++++
>  4 files changed, 236 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 60f91bc..0f63076 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>  		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
>  		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
>  	}
> +	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
> +
>  	if (dev_priv->mm.aliasing_ppgtt) {
>  		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
>  
> @@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>  			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
>  		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
>  	}
> -	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
>  }
>  
>  static int i915_ppgtt_info(struct seq_file *m, void *data)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5d52990..1649fb2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3599,6 +3599,8 @@ search_free:
>  
>  	/*  allocate before insert / bind */
>  	if (vma->vm->allocate_va_range) {
> +		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
> +				VM_TO_TRACE_NAME(vma->vm));
>  		ret = vma->vm->allocate_va_range(vma->vm,
>  						vma->node.start,
>  						vma->node.size);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 54c7ca7..32a355a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1138,10 +1138,47 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  static int gen6_alloc_va_range(struct i915_address_space *vm,
>  			       uint64_t start, uint64_t length)
>  {
> +	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
> +	struct drm_device *dev = vm->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct i915_hw_ppgtt *ppgtt =
>  				container_of(vm, struct i915_hw_ppgtt, base);
>  	struct i915_pagetab *pt;
> +	const uint32_t start_save = start, length_save = length;
>  	uint32_t pde, temp;
> +	int ret;
> +
> +	BUG_ON(upper_32_bits(start));
> +
> +	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
> +
> +	/* The allocation is done in two stages so that we can bail out with
> +	 * minimal amount of pain. The first stage finds new page tables that
> +	 * need allocation. The second stage marks use ptes within the page
> +	 * tables.
> +	 */

If we drop the bitmask tracking we could massively simplify this -
checking just the various pt pointers should be enough?
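
Roughly what I have in mind (completely untested, error unwind elided, helper
names all taken from this patch):

	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
		if (pt == ppgtt->scratch_pt) {
			pt = alloc_pt_single(dev);
			if (IS_ERR(pt))
				goto unwind_out;

			ppgtt->pd.page_tables[pde] = pt;
			gen6_write_pdes(&ppgtt->pd, pde, pt);
		}

		bitmap_set(pt->used_ptes, gen6_pte_index(start),
			   gen6_pte_count(start, length));
	}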

> +	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
> +		if (pt != ppgtt->scratch_pt) {
> +			WARN_ON(bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
> +			continue;
> +		}
> +
> +		/* We've already allocated a page table */
> +		WARN_ON(!bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
> +
> +		pt = alloc_pt_single(dev);
> +		if (IS_ERR(pt)) {
> +			ret = PTR_ERR(pt);
> +			goto unwind_out;
> +		}
> +
> +		ppgtt->pd.page_tables[pde] = pt;
> +		set_bit(pde, new_page_tables);
> +		trace_i915_pagetable_alloc(vm, pde, start, GEN6_PDE_SHIFT);
> +	}
> +
> +	start = start_save;
> +	length = length_save;
>  
>  	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
>  		int j;
> @@ -1159,12 +1196,35 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>  			}
>  		}
>  
> -		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
> +		if (test_and_clear_bit(pde, new_page_tables))
> +			gen6_write_pdes(&ppgtt->pd, pde, pt);
> +
> +		trace_i915_pagetable_map(vm, pde, pt,
> +					 gen6_pte_index(start),
> +					 gen6_pte_count(start, length),
> +					 I915_PPGTT_PT_ENTRIES);
> +		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
>  				I915_PPGTT_PT_ENTRIES);
>  	}
>  
> +	WARN_ON(!bitmap_empty(new_page_tables, GEN6_PPGTT_PD_ENTRIES));
> +
> +	/* Make sure write is complete before other code can use this page
> +	 * table. Also require for WC mapped PTEs */
> +	readl(dev_priv->gtt.gsm);
> +
>  	ppgtt_invalidate_tlbs(vm);
>  	return 0;
> +
> +unwind_out:
> +	for_each_set_bit(pde, new_page_tables, GEN6_PPGTT_PD_ENTRIES) {
> +		struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];
> +		ppgtt->pd.page_tables[pde] = NULL;
> +		free_pt_single(pt, vm->dev);
> +	}
> +
> +	ppgtt_invalidate_tlbs(vm);
> +	return ret;
>  }
>  
>  static void gen6_teardown_va_range(struct i915_address_space *vm,
> @@ -1176,8 +1236,27 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
>  	uint32_t pde, temp;
>  
>  	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
> +
> +		if (WARN(pt == ppgtt->scratch_pt,
> +		    "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
> +		    vm, pde, start, start + length))
> +			continue;
> +
> +		trace_i915_pagetable_unmap(vm, pde, pt,
> +					   gen6_pte_index(start),
> +					   gen6_pte_count(start, length),
> +					   I915_PPGTT_PT_ENTRIES);
> +
>  		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
>  			     gen6_pte_count(start, length));
> +
> +		if (bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES)) {
> +			trace_i915_pagetable_destroy(vm, pde,
> +						     start & GENMASK_ULL(63, GEN6_PDE_SHIFT),
> +						     GEN6_PDE_SHIFT);
> +			gen6_write_pdes(&ppgtt->pd, pde, ppgtt->scratch_pt);
> +			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
> +		}
>  	}
>  
>  	ppgtt_invalidate_tlbs(vm);
> @@ -1187,9 +1266,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
>  
> -	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
> +	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> +		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
> +		if (pt != ppgtt->scratch_pt)
> +			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
> +	}
>  
> +	/* Consider putting this as part of pd free. */
>  	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
>  	free_pd_single(&ppgtt->pd);
>  }
> @@ -1254,7 +1337,7 @@ err_out:
>  	return ret;
>  }
>  
> -static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
> +static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)

Imo it would be clearer to move the pt preallocation for aliasing ppgtt
into the ppgtt_init function. Makes for a bit of a bigger diff, but will
result in less convoluted control flow since we should end up in a nice

if (aliasing)
	/* create all pts */
else
	/* allocate&use scratch_pt */

Aside: Should we only allocate the scratch_pt for !aliasing?
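
I.e. roughly (untested sketch, reusing the helpers from this patch):

	static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
	{
		...
		/* drm_mm node + scratch_pt only, no preallocate_pt parameter */
		ret = gen6_ppgtt_alloc(ppgtt);
		if (ret)
			return ret;

		if (aliasing)
			/* pre-populate every pde with a real page table */
			ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
					     ppgtt->base.dev);
		else
			/* point every pde at the scratch page table */
			gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
		...
	}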

>  {
>  	int ret;
>  
> @@ -1262,10 +1345,14 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> +	if (!preallocate_pt)
> +		return 0;
> +
>  	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
>  			ppgtt->base.dev);
>  
>  	if (ret) {
> +		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
>  		drm_mm_remove_node(&ppgtt->node);
>  		return ret;
>  	}
> @@ -1273,7 +1360,17 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	return 0;
>  }
>  
> -static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> +static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
> +				  uint64_t start, uint64_t length)
> +{
> +	struct i915_pagetab *unused;
> +	uint32_t pde, temp;
> +
> +	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
> +		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
> +}
> +
> +static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
>  {
>  	struct drm_device *dev = ppgtt->base.dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -1289,7 +1386,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	} else
>  		BUG();
>  
> -	ret = gen6_ppgtt_alloc(ppgtt);
> +	ret = gen6_ppgtt_alloc(ppgtt, aliasing);
>  	if (ret)
>  		return ret;
>  
> @@ -1308,6 +1405,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
>  		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
>  
> +	if (!aliasing)
> +		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
> +
>  	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
>  
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> @@ -1320,7 +1420,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	return 0;
>  }
>  
> -static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
> +static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
> +		bool aliasing)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  
> @@ -1328,7 +1429,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
>  
>  	if (INTEL_INFO(dev)->gen < 8)
> -		return gen6_ppgtt_init(ppgtt);
> +		return gen6_ppgtt_init(ppgtt, aliasing);
>  	else
>  		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
>  }
> @@ -1337,7 +1438,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int ret = 0;
>  
> -	ret = __hw_ppgtt_init(dev, ppgtt);
> +	ret = __hw_ppgtt_init(dev, ppgtt, false);
>  	if (ret == 0) {
>  		kref_init(&ppgtt->ref);
>  		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
> @@ -1445,9 +1546,14 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
>  			     vma->node.start,
>  			     vma->obj->base.size,
>  			     true);
> -	if (vma->vm->teardown_va_range)
> +	if (vma->vm->teardown_va_range) {
> +		trace_i915_va_teardown(vma->vm,
> +				       vma->node.start, vma->node.size,
> +				       VM_TO_TRACE_NAME(vma->vm));
> +
>  		vma->vm->teardown_va_range(vma->vm,
>  					   vma->node.start, vma->node.size);
> +	}
>  }
>  
>  extern int intel_iommu_gfx_mapped;
> @@ -1963,7 +2069,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
>  		if (!ppgtt)
>  			return -ENOMEM;
>  
> -		ret = __hw_ppgtt_init(dev, ppgtt);
> +		ret = __hw_ppgtt_init(dev, ppgtt, true);
>  		if (ret != 0)
>  			return ret;
>  
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index f004d3d..0b617c9 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -156,6 +156,121 @@ TRACE_EVENT(i915_vma_unbind,
>  		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
>  );
>  
> +#define VM_TO_TRACE_NAME(vm) \
> +	(i915_is_ggtt(vm) ? "GGTT" : \
> +				      "Private VM")
> +
> +DECLARE_EVENT_CLASS(i915_va,
> +	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
> +	TP_ARGS(vm, start, length, name),
> +
> +	TP_STRUCT__entry(
> +		__field(struct i915_address_space *, vm)
> +		__field(u64, start)
> +		__field(u64, end)
> +		__string(name, name)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vm = vm;
> +		__entry->start = start;
> +		__entry->end = start + length;
> +		__assign_str(name, name);
> +	),
> +
> +	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
> +		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
> +);
> +
> +DEFINE_EVENT(i915_va, i915_va_alloc,
> +	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
> +	     TP_ARGS(vm, start, length, name)
> +);
> +
> +DEFINE_EVENT(i915_va, i915_va_teardown,
> +	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
> +	     TP_ARGS(vm, start, length, name)
> +);
> +
> +DECLARE_EVENT_CLASS(i915_pagetable,
> +	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
> +	TP_ARGS(vm, pde, start, pde_shift),
> +
> +	TP_STRUCT__entry(
> +		__field(struct i915_address_space *, vm)
> +		__field(u32, pde)
> +		__field(u64, start)
> +		__field(u64, end)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vm = vm;
> +		__entry->pde = pde;
> +		__entry->start = start;
> +		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
> +	),
> +
> +	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
> +		  __entry->vm, __entry->pde, __entry->start, __entry->end)
> +);
> +
> +DEFINE_EVENT(i915_pagetable, i915_pagetable_alloc,
> +	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
> +	     TP_ARGS(vm, pde, start, pde_shift)
> +);
> +
> +DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy,
> +	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
> +	     TP_ARGS(vm, pde, start, pde_shift)
> +);
> +
> +/* Avoid extra math because we only support two sizes. The format is defined by
> + * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
> +#define TRACE_PT_SIZE(bits) \
> +	((((bits) == 1024) ? 288 : 144) + 1)
> +
> +DECLARE_EVENT_CLASS(i915_pagetable_update,
> +	TP_PROTO(struct i915_address_space *vm, u32 pde,
> +		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
> +	TP_ARGS(vm, pde, pt, first, len, bits),
> +
> +	TP_STRUCT__entry(
> +		__field(struct i915_address_space *, vm)
> +		__field(u32, pde)
> +		__field(u32, first)
> +		__field(u32, last)
> +		__dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vm = vm;
> +		__entry->pde = pde;
> +		__entry->first = first;
> +		__entry->last = first + len;
> +
> +		bitmap_scnprintf(__get_str(cur_ptes),
> +				 TRACE_PT_SIZE(bits),
> +				 pt->used_ptes,
> +				 bits);
> +	),
> +
> +	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
> +		  __entry->vm, __entry->pde, __entry->last, __entry->first,
> +		  __get_str(cur_ptes))
> +);
> +
> +DEFINE_EVENT(i915_pagetable_update, i915_pagetable_map,
> +	TP_PROTO(struct i915_address_space *vm, u32 pde,
> +		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
> +	TP_ARGS(vm, pde, pt, first, len, bits)
> +);
> +
> +DEFINE_EVENT(i915_pagetable_update, i915_pagetable_unmap,
> +	TP_PROTO(struct i915_address_space *vm, u32 pde,
> +		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
> +	TP_ARGS(vm, pde, pt, first, len, bits)
> +);
> +
>  TRACE_EVENT(i915_gem_object_change_domain,
>  	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
>  	    TP_ARGS(obj, old_read, old_write),
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 23/24] drm/i915/bdw: Dynamic page table allocations
  2014-12-23 17:16   ` [PATCH v2 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
@ 2015-01-05 14:52     ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 14:52 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:26PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> This finishes off the dynamic page tables allocations, in the legacy 3
> level style that already exists. Most everything has already been setup
> to this point, the patch finishes off the enabling by setting the
> appropriate function pointers.
> 
> Zombie tracking:
> This could be a separate patch, but I found it helpful for debugging.
> Since we write page tables asynchronously with respect to the GPU using
> them, we can't actually free the page tables until we know the GPU won't
> use them. With this patch, that is always when the context dies.  It
> would be possible to write a reaper to go through zombies and clean them
> up when under memory pressure. That exercise is left for the reader.

As mentioned in a previous reply, freeing pagetables is a separate issue
entirely. Imo we can just idle the gpu and then rip out all pagetables for
empty vms.
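
E.g. as a last-resort fallback under memory pressure, something along these
lines (very rough, untested):

	/* with struct_mutex held, when we're desperate for memory */
	ret = i915_gpu_idle(dev);
	if (ret)
		return ret;

	/* nothing references the page tables any more, so walk the vms and
	 * free every pt/pd that has no vma bound in its range */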

> Scratch unused pages:
> The object pages can get freed even if a page table still points to
> them.  Like the zombie fix, we need to make sure we don't let our GPU
> access arbitrary memory when we've unmapped things.

Hm, either I don't follow or this would mean that our active tracking is
broken and we release backing storage before the gpu has stopped using it.

In any case I vote to simplify things a lot for now and just never
tear down pagetables at all. Implementing a last-ditch attempt to free
memory can be done with a lot less complexity imo than trying to be super
careful without stalling the gpu in normal operations.

One more comment below.
-Daniel

> 
> v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
> gen 6 & 7.
> 
> v3: Rebase.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 377 +++++++++++++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  16 +-
>  2 files changed, 326 insertions(+), 67 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 6254677..571c307 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -602,7 +602,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	}
>  }
>  
> -static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
> +static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
>  			     struct i915_pagetab *pt,
>  			     struct drm_device *dev)
>  {
> @@ -619,7 +619,7 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
>  				     uint64_t length,
>  				     struct drm_device *dev)
>  {
> -	gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
> +	gen8_ppgtt_pde_t * const pagedir = kmap_atomic(pd->page);
>  	struct i915_pagetab *pt;
>  	uint64_t temp, pde;
>  
> @@ -632,8 +632,9 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
>  	kunmap_atomic(pagedir);
>  }
>  
> -static void gen8_teardown_va_range(struct i915_address_space *vm,
> -				   uint64_t start, uint64_t length)
> +static void __gen8_teardown_va_range(struct i915_address_space *vm,
> +				     uint64_t start, uint64_t length,
> +				     bool dead)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  				container_of(vm, struct i915_hw_ppgtt, base);
> @@ -655,6 +656,13 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
>  			     pdpe, vm);
>  			continue;
>  		} else {
> +			if (dead && pd->zombie) {
> +				WARN_ON(test_bit(pdpe, ppgtt->pdp.used_pdpes));
> +				free_pd_single(pd, vm->dev);
> +				ppgtt->pdp.pagedir[pdpe] = NULL;
> +				continue;
> +			}
> +
>  			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
>  			     "PDPE %d not reserved, but is allocated (%p)",
>  			     pdpe, vm);
> @@ -666,34 +674,64 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
>  				     "PDE %d is not allocated, but is reserved (%p)\n",
>  				     pde, vm);
>  				continue;
> -			} else
> +			} else {
> +				if (dead && pt->zombie) {
> +					WARN_ON(test_bit(pde, pd->used_pdes));
> +					free_pt_single(pt, vm->dev);
> +					pd->page_tables[pde] = NULL;
> +					continue;
> +				}
>  				WARN(!test_bit(pde, pd->used_pdes),
>  				     "PDE %d not reserved, but is allocated (%p)",
>  				     pde, vm);
> +			}
>  
>  			bitmap_clear(pt->used_ptes,
>  				     gen8_pte_index(pd_start),
>  				     gen8_pte_count(pd_start, pd_len));
>  
>  			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PAGE)) {
> +				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
> +				if (!dead) {
> +					pt->zombie = 1;
> +					continue;
> +				}
>  				free_pt_single(pt, vm->dev);
>  				pd->page_tables[pde] = NULL;
> -				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
> +
>  			}
>  		}
>  
> +		gen8_ppgtt_clear_range(vm, pd_start, pd_len, true);
> +
>  		if (bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE)) {
> +			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
> +			if (!dead) {
> +				/* We've unmapped a possibly live context. Make
> +				 * note of it so we can clean it up later. */
> +				pd->zombie = 1;
> +				continue;
> +			}
>  			free_pd_single(pd, vm->dev);
>  			ppgtt->pdp.pagedir[pdpe] = NULL;
> -			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
>  		}
>  	}
>  }
>  
> +static void gen8_teardown_va_range(struct i915_address_space *vm,
> +				   uint64_t start, uint64_t length)
> +{
> +	__gen8_teardown_va_range(vm, start, length, false);
> +}
> +
>  static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  {
> -	gen8_teardown_va_range(&ppgtt->base,
> -			       ppgtt->base.start, ppgtt->base.total);
> +	trace_i915_va_teardown(&ppgtt->base,
> +			       ppgtt->base.start, ppgtt->base.total,
> +			       VM_TO_TRACE_NAME(&ppgtt->base));
> +	__gen8_teardown_va_range(&ppgtt->base,
> +				 ppgtt->base.start, ppgtt->base.total,
> +				 true);
>  }
>  
>  static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> @@ -704,67 +742,177 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> -static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
> +/**
> + * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
> + * @ppgtt:	Master ppgtt structure.
> + * @pd:		Page directory for this address range.
> + * @start:	Starting virtual address to begin allocations.
> + * @length	Size of the allocations.
> + * @new_pts:	Bitmap set by function with new allocations. Likely used by the
> + *		caller to free on error.
> + *
> + * Allocate the required number of page tables. Extremely similar to
> + * gen8_ppgtt_alloc_pagedirs(). The main difference is here we are limited by
> + * the page directory boundary (instead of the page directory pointer). That
> + * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_pagedirs(), it is
> + * possible, and likely that the caller will need to use multiple calls of this
> + * function to achieve the appropriate allocation.
> + *
> + * Return: 0 if success; negative error code otherwise.
> + */
> +static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
> +				     struct i915_pagedir *pd,
>  				     uint64_t start,
>  				     uint64_t length,
> -				     struct drm_device *dev)
> +				     unsigned long *new_pts)
>  {
> -	struct i915_pagetab *unused;
> +	struct i915_pagetab *pt;
>  	uint64_t temp;
>  	uint32_t pde;
>  
> -	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
> -		BUG_ON(unused);
> -		pd->page_tables[pde] = alloc_pt_single(dev);
> -		if (IS_ERR(pd->page_tables[pde]))
> +	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
> +		/* Don't reallocate page tables */
> +		if (pt) {
> +			/* Scratch is never allocated this way */
> +			WARN_ON(pt->scratch);
> +			/* If there is a zombie, we can reuse it and save time
> +			 * on the allocation. If we clear the zombie status and
> +			 * the caller somehow fails, we'll probably hit some
> +			 * assertions, so it's up to them to fix up the bitmaps.
> +			 */
> +			continue;
> +		}
> +
> +		pt = alloc_pt_single(ppgtt->base.dev);
> +		if (IS_ERR(pt))
>  			goto unwind_out;
> +
> +		pd->page_tables[pde] = pt;
> +		set_bit(pde, new_pts);
>  	}
>  
>  	return 0;
>  
>  unwind_out:
> -	while (pde--)
> -		free_pt_single(pd->page_tables[pde], dev);
> +	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
> +		free_pt_single(pd->page_tables[pde], ppgtt->base.dev);
>  
>  	return -ENOMEM;
>  }
>  
> -/* bitmap of new pagedirs */
> -static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
> +/**
> + * gen8_ppgtt_alloc_pagedirs() - Allocate page directories for VA range.
> + * @ppgtt:	Master ppgtt structure.
> + * @pdp:	Page directory pointer for this address range.
> + * @start:	Starting virtual address to begin allocations.
> + * @length	Size of the allocations.
> + * @new_pds	Bitmap set by function with new allocations. Likely used by the
> + *		caller to free on error.
> + *
> + * Allocate the required number of page directories starting at the pde index of
> + * @start, and ending at the pde index @start + @length. This function will skip
> + * over already allocated page directories within the range, and only allocate
> + * new ones, setting the appropriate pointer within the pdp as well as the
> + * correct position in the bitmap @new_pds.
> + *
> + * The function will only allocate the pages within the range for a give page
> + * directory pointer. In other words, if @start + @length straddles a virtually
> + * addressed PDP boundary (512GB for 4k pages), there will be more allocations
> + * required by the caller, This is not currently possible, and the BUG in the
> + * code will prevent it.
> + *
> + * Return: 0 if success; negative error code otherwise.
> + */
> +static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
> +				     struct i915_pagedirpo *pdp,
>  				     uint64_t start,
>  				     uint64_t length,
> -				     struct drm_device *dev)
> +				     unsigned long *new_pds)
>  {
> -	struct i915_pagedir *unused;
> +	struct i915_pagedir *pd;
>  	uint64_t temp;
>  	uint32_t pdpe;
>  
> +	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
> +
>  	/* FIXME: PPGTT container_of won't work for 64b */
>  	BUG_ON((start + length) > 0x800000000ULL);
>  
> -	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
> -		BUG_ON(unused);
> -		pdp->pagedir[pdpe] = alloc_pd_single(dev);
> +	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> +		if (pd)
> +			continue;
>  
> -		if (IS_ERR(pdp->pagedir[pdpe]))
> +		pd = alloc_pd_single(ppgtt->base.dev);
> +		if (IS_ERR(pd))
>  			goto unwind_out;
> +
> +		pdp->pagedir[pdpe] = pd;
> +		set_bit(pdpe, new_pds);
>  	}
>  
>  	return 0;
>  
>  unwind_out:
> -	while (pdpe--)
> -		free_pd_single(pdp->pagedir[pdpe], dev);
> +	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
> +		free_pd_single(pdp->pagedir[pdpe], ppgtt->base.dev);
>  
>  	return -ENOMEM;
>  }
>  
> +static inline void
> +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
> +{
> +	int i;
> +	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
> +		kfree(new_pts[i]);
> +	kfree(new_pts);
> +	kfree(new_pds);
> +}
> +
> +/* Fills in the page directory bitmap, ant the array of page tables bitmap. Both
> + * of these are based on the number of PDPEs in the system.
> + */
> +int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
> +					 unsigned long ***new_pts)
> +{
> +	int i;
> +	unsigned long *pds;
> +	unsigned long **pts;
> +
> +	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
> +	if (!pds)
> +		return -ENOMEM;
> +
> +	pts = kcalloc(GEN8_PDES_PER_PAGE, sizeof(unsigned long *), GFP_KERNEL);
> +	if (!pts) {
> +		kfree(pds);
> +		return -ENOMEM;
> +	}
> +
> +	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
> +		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
> +				 sizeof(unsigned long), GFP_KERNEL);
> +		if (!pts[i])
> +			goto err_out;
> +	}
> +
> +	*new_pds = pds;
> +	*new_pts = (unsigned long **)pts;
> +
> +	return 0;
> +
> +err_out:
> +	free_gen8_temp_bitmaps(pds, pts);
> +	return -ENOMEM;
> +}
> +
>  static int gen8_alloc_va_range(struct i915_address_space *vm,
>  			       uint64_t start,
>  			       uint64_t length)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
> +	unsigned long *new_page_dirs, **new_page_tables;
>  	struct i915_pagedir *pd;
>  	const uint64_t orig_start = start;
>  	const uint64_t orig_length = length;
> @@ -772,43 +920,103 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>  	uint32_t pdpe;
>  	int ret;
>  
> -	/* Do the allocations first so we can easily bail out */
> -	ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
> -					ppgtt->base.dev);
> +#ifndef CONFIG_64BIT
> +	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
> +	 * this in hardware, but a lot of the drm code is not prepared to handle
> +	 * 64b offset on 32b platforms. */
> +	if (start + length > 0x100000000ULL)
> +		return -E2BIG;
> +#endif
> +
> +	/* Wrap is never okay since we can only represent 48b, and we don't
> +	 * actually use the other side of the canonical address space.
> +	 */
> +	if (WARN_ON(start + length < start))
> +		return -ERANGE;
> +
> +	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
>  	if (ret)
>  		return ret;
>  
> +	/* Do the allocations first so we can easily bail out */
> +	ret = gen8_ppgtt_alloc_pagedirs(ppgtt, &ppgtt->pdp, start, length,
> +					new_page_dirs);
> +	if (ret) {
> +		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> +		return ret;
> +	}
> +
> +	/* For every page directory referenced, allocate page tables */
>  	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> -		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
> -						ppgtt->base.dev);
> +		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
> +		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
> +						new_page_tables[pdpe]);
>  		if (ret)
>  			goto err_out;
>  	}
>  
> -	/* Now mark everything we've touched as used. This doesn't allow for
> -	 * robust error checking, but it makes the code a hell of a lot simpler.
> -	 */
>  	start = orig_start;
>  	length = orig_length;
>  
> +	/* Allocations have completed successfully, so set the bitmaps, and do
> +	 * the mappings. */
>  	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> +		gen8_ppgtt_pde_t *const pagedir = kmap_atomic(pd->page);
>  		struct i915_pagetab *pt;
>  		uint64_t pd_len = gen8_clamp_pd(start, length);
>  		uint64_t pd_start = start;
>  		uint32_t pde;
> -		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
> -			bitmap_set(pd->page_tables[pde]->used_ptes,
> -				   gen8_pte_index(start),
> -				   gen8_pte_count(start, length));
> +
> +		/* Every pd should be allocated, we just did that above. */
> +		BUG_ON(!pd);
> +
> +		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
> +			/* Same reasoning as pd */
> +			BUG_ON(!pt);
> +			BUG_ON(!pd_len);
> +			BUG_ON(!gen8_pte_count(pd_start, pd_len));
> +
> +			/* Set our used ptes within the page table */
> +			bitmap_set(pt->used_ptes,
> +				   gen8_pte_index(pd_start),
> +				   gen8_pte_count(pd_start, pd_len));
> +
> +			/* Our pde is now pointing to the pagetable, pt */
>  			set_bit(pde, pd->used_pdes);
> +
> +			/* Map the PDE to the page table */
> +			__gen8_do_map_pt(pagedir + pde, pt, vm->dev);
> +
> +			/* NB: We haven't yet mapped ptes to pages. At this
> +			 * point we're still relying on insert_entries() */
> +
> +			/* No longer possible this page table is a zombie */
> +			pt->zombie = 0;
>  		}
> +
> +		if (!HAS_LLC(vm->dev))
> +			drm_clflush_virt_range(pagedir, PAGE_SIZE);
> +
> +		kunmap_atomic(pagedir);
> +
>  		set_bit(pdpe, ppgtt->pdp.used_pdpes);
> +		/* This pd is officially not a zombie either */
> +		ppgtt->pdp.pagedir[pdpe]->zombie = 0;
>  	}
>  
> +	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
>  	return 0;
>  
>  err_out:
> -	gen8_teardown_va_range(vm, orig_start, start);
> +	while (pdpe--) {
> +		for_each_set_bit(temp, new_page_tables[pdpe], GEN8_PDES_PER_PAGE)
> +			free_pt_single(pd->page_tables[temp], vm->dev);
> +	}
> +
> +	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
> +		free_pd_single(ppgtt->pdp.pagedir[pdpe], vm->dev);
> +
> +	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
>  	return ret;
>  }
>  
> @@ -819,37 +1027,68 @@ err_out:
>   * space.
>   *
>   */
> -static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
> +static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  {
> -	struct i915_pagedir *pd;
> -	uint64_t temp, start = 0;
> -	const uint64_t orig_length = size;
> -	uint32_t pdpe;
> -	int ret;
> +	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
> +	if (IS_ERR(ppgtt->scratch_pd))
> +		return PTR_ERR(ppgtt->scratch_pd);
>  
>  	ppgtt->base.start = 0;
>  	ppgtt->base.total = size;
> -	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> -	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
>  	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> +	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> +
>  	ppgtt->switch_mm = gen8_mm_switch;
>  
> -	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
> -	if (IS_ERR(ppgtt->scratch_pd))
> -		return PTR_ERR(ppgtt->scratch_pd);
> +	return 0;
> +}
> +
> +static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> +{
> +	struct drm_device *dev = ppgtt->base.dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_pagedir *pd;
> +	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
> +	uint32_t pdpe;
> +	int ret;
>  
> +	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
> +	if (ret)
> +		return ret;
> +
> +	/* Aliasing PPGTT has to always work and be mapped because of the way we
> +	 * use RESTORE_INHIBIT in the context switch. This will be fixed
> +	 * eventually. */
>  	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
>  	if (ret) {
>  		free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
>  		return ret;
>  	}
>  
> -	start = 0;
> -	size = orig_length;
> -
>  	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
>  		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
>  
> +	ppgtt->base.allocate_va_range = NULL;
> +	ppgtt->base.teardown_va_range = NULL;
> +	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> +
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> +{
> +	struct drm_device *dev = ppgtt->base.dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
> +	if (ret)
> +		return ret;
> +
> +	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
> +	ppgtt->base.teardown_va_range = gen8_teardown_va_range;
> +	ppgtt->base.clear_range = NULL;
> +
>  	return 0;
>  }
>  
> @@ -1413,9 +1652,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
>  	if (ret)
>  		return ret;
>  
> -	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
> -	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
> -	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
> +	ppgtt->base.allocate_va_range = aliasing ? NULL : gen6_alloc_va_range;
> +	ppgtt->base.teardown_va_range = aliasing ? NULL : gen6_teardown_va_range;
> +	ppgtt->base.clear_range = aliasing ? gen6_ppgtt_clear_range : NULL;
>  	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>  	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
>  	ppgtt->base.start = 0;
> @@ -1453,8 +1692,10 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
>  
>  	if (INTEL_INFO(dev)->gen < 8)
>  		return gen6_ppgtt_init(ppgtt, aliasing);
> +	else if (aliasing)
> +		return gen8_aliasing_ppgtt_init(ppgtt);
>  	else
> -		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
> +		return gen8_ppgtt_init(ppgtt);
>  }
>  int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>  {
> @@ -1466,8 +1707,9 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>  		kref_init(&ppgtt->ref);
>  		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
>  			    ppgtt->base.total);
> -		ppgtt->base.clear_range(&ppgtt->base, 0,
> -			    ppgtt->base.total, true);
> +		if (ppgtt->base.clear_range)
> +			ppgtt->base.clear_range(&ppgtt->base, 0,
> +				ppgtt->base.total, true);
>  		i915_init_vm(dev_priv, &ppgtt->base);
>  	}
>  
> @@ -1565,10 +1807,7 @@ ppgtt_bind_vma(struct i915_vma *vma,
>  
>  static void ppgtt_unbind_vma(struct i915_vma *vma)
>  {
> -	vma->vm->clear_range(vma->vm,
> -			     vma->node.start,
> -			     vma->obj->base.size,
> -			     true);
> +	WARN_ON(vma->vm->teardown_va_range && vma->vm->clear_range);
>  	if (vma->vm->teardown_va_range) {
>  		trace_i915_va_teardown(vma->vm,
>  				       vma->node.start, vma->node.size,
> @@ -1576,7 +1815,13 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
>  
>  		vma->vm->teardown_va_range(vma->vm,
>  					   vma->node.start, vma->node.size);
> -	}
> +	} else if (vma->vm->clear_range) {
> +		vma->vm->clear_range(vma->vm,
> +				     vma->node.start,
> +				     vma->obj->base.size,
> +				     true);
> +	} else
> +		BUG();

No gratuitous additions of BUG please.
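
If you really want a check in that last branch, a plain

	} else
		WARN_ON(1);

is more than enough - no need to take down the whole machine for something
the driver can survive.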

>  }
>  
>  extern int intel_iommu_gfx_mapped;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 957f2d0..534ed82 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -190,13 +190,26 @@ struct i915_vma {
>  			u32 flags);
>  };
>  
> -
> +/* Zombies. We write page tables with the CPU, and hardware switches them with
> + * the GPU. As such, the only time we can safely remove a page table is when we
> + * know the context is idle. Since we have no good way to do this, we use the
> + * zombie.
> + *
> + * Under memory pressure, if the system is idle, zombies may be reaped.
> + *
> + * There are 3 states a page table can be in (not including scratch)
> + *  bitmap = 0, zombie = 0: unallocated
> + *  bitmap = 1, zombie = 0: allocated
> + *  bitmap = 0, zombie = 1: zombie
> + *  bitmap = 1, zombie = 1: invalid
> + */
>  struct i915_pagetab {
>  	struct page *page;
>  	dma_addr_t daddr;
>  
>  	unsigned long *used_ptes;
>  	unsigned int scratch:1;
> +	unsigned zombie:1;
>  };
>  
>  struct i915_pagedir {
> @@ -208,6 +221,7 @@ struct i915_pagedir {
>  
>  	unsigned long *used_pdes;
>  	struct i915_pagetab *page_tables[GEN6_PPGTT_PD_ENTRIES];
> +	unsigned zombie:1;
>  };
>  
>  struct i915_pagedirpo {
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 00/24] PPGTT dynamic page allocations
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
                     ` (23 preceding siblings ...)
  2014-12-23 17:16   ` [PATCH v2 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
@ 2015-01-05 14:57   ` Daniel Vetter
  24 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 14:57 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:03PM +0000, Michel Thierry wrote:
> Addressing comments from v1.
> 
> For GEN8, it has also been extended to work in logical ring submission (lrc)
> mode, as it will be the preferred mode of operation.
> I also tried to update the lrc code at the same time the ppgtt refactoring
> occurred, leaving only one patch that is exclusively for lrc.
> 
> This list can be seen in 3 parts:
> [01-10] Include code rework for PPGTT (all GENs).
> [11-14] Adds page table allocation for GEN6/GEN7
> [15-24] Enables dynamic allocation in GEN8. It is enabled for both legacy
> and execlist submission modes.

Ok, I think I'm starting to see the forest for the trees here.
Thanks for reworking the series.

More comments in replies to individual patches. I've mostly concentrated
on the gen6/7 code and largely ignored the details for bdw, but where
applicable please make similar adjustments.

Since there are a bunch of simpler prep patches, please start with the
detailed review even when some of the later patches are still under
discussion. That way I can start with merging. Otherwise I think we can go
to the detailed review phase for the entire series with my comments
addressed (and starting with the review while doing some of these reworks
probably helps the reviewer, too).

Thanks, Daniel

> 
> Ben Widawsky (23):
>   drm/i915: Add some extra guards in evict_vm
>   drm/i915/trace: Fix offsets for 64b
>   drm/i915: Rename to GEN8_LEGACY_PDPES
>   drm/i915: Setup less PPGTT on failed pagedir
>   drm/i915/gen8: Un-hardcode number of page directories
>   drm/i915: Range clearing is PPGTT agnostic
>   drm/i915: page table abstractions
>   drm/i915: Complete page table structures
>   drm/i915: Create page table allocators
>   drm/i915: Track GEN6 page table usage
>   drm/i915: Extract context switch skip and pd load logic
>   drm/i915: Track page table reload need
>   drm/i915: Initialize all contexts
>   drm/i915: Finish gen6/7 dynamic page table allocation
>   drm/i915/bdw: Use dynamic allocation idioms on free
>   drm/i915/bdw: pagedirs rework allocation
>   drm/i915/bdw: pagetable allocation rework
>   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
>   drm/i915: num_pd_pages/num_pd_entries isn't useful
>   drm/i915: Extract PPGTT param from pagedir alloc
>   drm/i915/bdw: Split out mappings
>   drm/i915/bdw: begin bitmap tracking
>   drm/i915/bdw: Dynamic page table allocations
> 
> Michel Thierry (1):
>   drm/i915/bdw: Dynamic page table allocations in lrc mode
> 
>  drivers/gpu/drm/i915/i915_debugfs.c        |    7 +-
>  drivers/gpu/drm/i915/i915_gem.c            |   11 +
>  drivers/gpu/drm/i915/i915_gem_context.c    |   62 +-
>  drivers/gpu/drm/i915/i915_gem_evict.c      |    3 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c        | 1200 ++++++++++++++++++++--------
>  drivers/gpu/drm/i915/i915_gem_gtt.h        |  250 +++++-
>  drivers/gpu/drm/i915/i915_trace.h          |  123 ++-
>  drivers/gpu/drm/i915/intel_lrc.c           |   80 +-
>  9 files changed, 1360 insertions(+), 387 deletions(-)
> 
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v2 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode
  2014-12-23 17:16   ` [PATCH v2 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
@ 2015-01-05 14:59     ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-05 14:59 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Dec 23, 2014 at 05:16:27PM +0000, Michel Thierry wrote:
> Logical ring contexts need to know the PDPs when they are populated. With
> dynamic page table allocations, these PDPs may not exist yet.
> 
> Check if PDPs have been allocated and use the scratch page if they do
> not exist yet.
> 
> Before submission, update the PDPs in the logic ring context as PDPs
> have been allocated.
> 
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>

Patch subject is imo a bit misleading. What about "support dynamic pdp
updates in lrc mode"?
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 80 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 70 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 546884b..6abe4bc 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -358,6 +358,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
>  
>  static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
>  				    struct drm_i915_gem_object *ring_obj,
> +				    struct i915_hw_ppgtt *ppgtt,
>  				    u32 tail)
>  {
>  	struct page *page;
> @@ -369,6 +370,40 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
>  	reg_state[CTX_RING_TAIL+1] = tail;
>  	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
>  
> +	/* True PPGTT with dynamic page allocation: update PDP registers and
> +	 * point the unallocated PDPs to the scratch page
> +	 */
> +	if (ppgtt) {
> +		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
> +			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
> +			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
> +		} else {
> +			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
> +			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
> +		}
> +		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
> +			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
> +			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
> +		} else {
> +			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
> +			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
> +		}
> +		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
> +			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
> +			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
> +		} else {
> +			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
> +			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
> +		}
> +		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
> +			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
> +			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
> +		} else {
> +			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
> +			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
> +		}
> +	}
> +
>  	kunmap_atomic(reg_state);
>  
>  	return 0;
> @@ -387,7 +422,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
>  	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
>  	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
>  
> -	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
> +	execlists_update_context(ctx_obj0, ringbuf0->obj, to0->ppgtt, tail0);
>  
>  	if (to1) {
>  		ringbuf1 = to1->engine[ring->id].ringbuf;
> @@ -396,7 +431,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
>  		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
>  		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
>  
> -		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
> +		execlists_update_context(ctx_obj1, ringbuf1->obj, to1->ppgtt, tail1);
>  	}
>  
>  	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
> @@ -1731,14 +1766,39 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>  	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
>  	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
>  	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> -	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
> -	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
> -	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
> -	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
> -	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
> -	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
> -	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
> -	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
> +
> +	/* With dynamic page allocation, PDPs may not be allocated at this point,
> +	 * Point the unallocated PDPs to the scratch page
> +	 */
> +	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
> +		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
> +		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
> +	} else {
> +		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
> +		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
> +	}
> +	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
> +		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
> +		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
> +	} else {
> +		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
> +		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
> +	}
> +	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
> +		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
> +		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
> +	} else {
> +		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
> +		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
> +	}
> +	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
> +		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
> +		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
> +	} else {
> +		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
> +		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
> +	}
> +
>  	if (ring->id == RCS) {
>  		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>  		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v3 00/25] PPGTT dynamic page allocations
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (25 preceding siblings ...)
  2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
@ 2015-01-13 11:52 ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 01/25] drm/i915/trace: Fix offsets for 64b Michel Thierry
                     ` (24 more replies)
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (2 subsequent siblings)
  29 siblings, 25 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

This new patchset addresses most of the comments from v2. It still tears down
pagetables (which I plan to amend shortly), but I think there were already
enough changes to justify posting it.

For GEN8, it has also been extended to work in logical ring submission (lrc)
mode, as it will be the preferred mode of operation.
I also tried to update the lrc code at the same time as the ppgtt refactoring,
leaving only one patch that is exclusively for lrc.

This list can be seen in 3 parts:
[01-09] Include code rework for PPGTT (all GENs).
[10-15] Adds page table allocation for GEN6/GEN7
[16-25] Enables dynamic allocation in GEN8,for both legacy and
execlist submission modes.

Ben Widawsky (22):
  drm/i915/trace: Fix offsets for 64b
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Setup less PPGTT on failed page_directory
  drm/i915/gen8: Un-hardcode number of page directories
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip and pd load logic
  drm/i915: Track page table reload need
  drm/i915: Initialize all contexts
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: page directories rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from page_directory alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations

Michel Thierry (3):
  drm/i915: Plumb drm_device through page tables operations
  drm/i915: Add dynamic page trace events
  drm/i915/bdw: Support dynamic pdp updates in lrc mode

 drivers/gpu/drm/i915/i915_debugfs.c        |    7 +-
 drivers/gpu/drm/i915/i915_gem.c            |   11 +
 drivers/gpu/drm/i915/i915_gem_context.c    |   62 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1184 ++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  217 ++++-
 drivers/gpu/drm/i915/i915_trace.h          |  123 ++-
 drivers/gpu/drm/i915/intel_lrc.c           |   80 +-
 8 files changed, 1329 insertions(+), 366 deletions(-)

-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v3 01/25] drm/i915/trace: Fix offsets for 64b
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 02/25] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
                     ` (23 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_trace.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6058a01..f004d3d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -115,7 +115,7 @@ TRACE_EVENT(i915_vma_bind,
 	    TP_STRUCT__entry(
 			     __field(struct drm_i915_gem_object *, obj)
 			     __field(struct i915_address_space *, vm)
-			     __field(u32, offset)
+			     __field(u64, offset)
 			     __field(u32, size)
 			     __field(unsigned, flags)
 			     ),
@@ -128,7 +128,7 @@ TRACE_EVENT(i915_vma_bind,
 			   __entry->flags = flags;
 			   ),
 
-	    TP_printk("obj=%p, offset=%08x size=%x%s vm=%p",
+	    TP_printk("obj=%p, offset=%016llx size=%x%s vm=%p",
 		      __entry->obj, __entry->offset, __entry->size,
 		      __entry->flags & PIN_MAPPABLE ? ", mappable" : "",
 		      __entry->vm)
@@ -141,7 +141,7 @@ TRACE_EVENT(i915_vma_unbind,
 	    TP_STRUCT__entry(
 			     __field(struct drm_i915_gem_object *, obj)
 			     __field(struct i915_address_space *, vm)
-			     __field(u32, offset)
+			     __field(u64, offset)
 			     __field(u32, size)
 			     ),
 
@@ -152,7 +152,7 @@ TRACE_EVENT(i915_vma_unbind,
 			   __entry->size = vma->node.size;
 			   ),
 
-	    TP_printk("obj=%p, offset=%08x size=%x vm=%p",
+	    TP_printk("obj=%p, offset=%016llx size=%x vm=%p",
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 02/25] drm/i915: Rename to GEN8_LEGACY_PDPES
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 01/25] drm/i915/trace: Fix offsets for 64b Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 03/25] drm/i915: Setup less PPGTT on failed page_directory Michel Thierry
                     ` (22 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
one, but it resembles having one). The #define was confusing as is, and
using "PDPE" is a much better description.

sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]

It also matches the x86 pagetable terminology:
PTE  = Page Table Entry - pagetable level 1 page
PDE  = Page Directory Entry - pagetable level 2 page
PDPE = Page Directory Pointer Entry - pagetable level 3 page

And in the near future (for 48b addressing):
PML4E = Page Map Level 4 Entry
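
For illustration, a minimal sketch (not part of this patch) of how a
GEN8 GPU address decomposes into the per-level indices named above.
The shift values are assumptions derived from 4KB pages and 512
entries per level; the example_* name is invented:

  /* Split a GEN8 virtual address into PDPE/PDE/PTE indices.
   * 4KB pages -> 12 bits of page offset, 512 entries per level ->
   * 9 index bits each, 4 legacy PDPEs -> 2 bits at the top.
   */
  static inline void example_gen8_split_addr(u64 addr, unsigned int *pdpe,
                                             unsigned int *pde,
                                             unsigned int *pte)
  {
          *pte  = (addr >> 12) & 0x1ff;  /* GEN8_PTE_SHIFT / GEN8_PTE_MASK */
          *pde  = (addr >> 21) & 0x1ff;  /* GEN8_PDE_MASK */
          *pdpe = (addr >> 30) & 0x3;    /* one of GEN8_LEGACY_PDPES */
  }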

v2: Expanded information about Page Directory/Table nomenclature.

Cc: Daniel Vetter <daniel@ffwll.ch>
CC: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 746f77f..58d54bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -375,7 +375,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
+		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
 		if (pt_vaddr == NULL)
@@ -486,7 +486,7 @@ bail:
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPS];
+	struct page **pt_pages[GEN8_LEGACY_PDPES];
 	int i, ret;
 
 	for (i = 0; i < max_pdp; i++) {
@@ -537,7 +537,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 		return -ENOMEM;
 
 	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e377c7d..9d998ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,7 +88,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
 #define GEN8_PTE_MASK			0x1ff
-#define GEN8_LEGACY_PDPS		4
+#define GEN8_LEGACY_PDPES		4
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
@@ -273,12 +273,12 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
 	};
 	struct page *pd_pages;
 	union {
 		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 03/25] drm/i915: Setup less PPGTT on failed page_directory
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 01/25] drm/i915/trace: Fix offsets for 64b Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 02/25] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 04/25] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
                     ` (21 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The current code will both potentially print a WARN and set up part of
the PPGTT structure. Neither of these harms the current code; the change
is simply for clarity, to perhaps prevent later bugs, and to avoid weird
debug messages.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 58d54bd..b48b586 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1032,11 +1032,14 @@ alloc:
 		goto alloc;
 	}
 
+	if (ret)
+		return ret;
+
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-	return ret;
+	return 0;
 }
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 04/25] drm/i915/gen8: Un-hardcode number of page directories
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (2 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 03/25] drm/i915: Setup less PPGTT on failed page_directory Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 05/25] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
                     ` (20 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9d998ec..8f76990 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -282,7 +282,7 @@ struct i915_hw_ppgtt {
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[4];
+		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
 
 	struct drm_i915_file_private *file_priv;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 05/25] drm/i915: Range clearing is PPGTT agnostic
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (3 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 04/25] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 06/25] drm/i915: page table abstractions Michel Thierry
                     ` (19 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Since range clearing is PPGTT agnostic, we can do it from our general
init function. Eventually, I hope to have a lot more commonality like
this. It won't arrive yet, but this was a nice easy one.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b48b586..0f6a196 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -672,8 +672,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
 	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
@@ -1146,8 +1144,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
@@ -1181,6 +1177,8 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
 			    ppgtt->base.total);
+		ppgtt->base.clear_range(&ppgtt->base, 0,
+			    ppgtt->base.total, true);
 		i915_init_vm(dev_priv, &ppgtt->base);
 	}
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 06/25] drm/i915: page table abstractions
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (4 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 05/25] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 07/25] drm/i915: Complete page table structures Michel Thierry
                     ` (18 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we move to dynamic page allocation, keeping page_directory and page_tables as
separate structures will help to break actions into simpler tasks.

To help the code transition nicely, there is some wasted space in gen6/7.
This will be ameliorated shortly.

Following the x86 pagetable terminology:
PDPE = struct i915_page_directory_pointer_entry.
PDE = struct i915_page_directory_entry [page_directory].
PTE = struct i915_page_table_entry [page_tables].
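
For reference, a minimal sketch (not part of the patch) of how a GEN8
lookup walks the new structures; the example_* name is invented and
bounds/NULL checks are omitted:

  /* Resolve the backing page for a given (pdpe, pde) pair. */
  static struct page *example_gen8_lookup(struct i915_hw_ppgtt *ppgtt,
                                          unsigned int pdpe, unsigned int pde)
  {
          struct i915_page_directory_entry *pd =
                  &ppgtt->pdp.page_directory[pdpe];
          struct i915_page_table_entry *pt = &pd->page_tables[pde];

          return pt->page; /* kmap_atomic() this page to touch its PTEs */
  }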

v2: fixed mismatches after clean-up/rebase.

v3: Clarify the names of the multiple levels of page tables (Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 177 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
 2 files changed, 107 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0f6a196..c9f9266 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -334,7 +334,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+		struct page *page_table = pd->page_tables[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -378,8 +379,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+			struct page *page_table = pd->page_tables[pde].page;
+
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -403,29 +408,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_tables == NULL)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_tables[i].page)
+			__free_page(pd->page_tables[i].page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_directories(struct i915_page_directory_entry *pd)
+{
+	kfree(pd->page_tables);
+	__free_page(pd->page);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
+		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -460,86 +469,75 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPES];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
+
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_table_entry *pt;
 
-	return 0;
-}
+		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pdp.page_directory[i].page)
+			goto unwind_out;
+
+		ppgtt->pdp.page_directory[i].page_tables = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.page_directory[i].page_tables);
+		__free_page(ppgtt->pdp.page_directory[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -551,18 +549,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (ret)
-		gen8_ppgtt_free(ppgtt);
+	if (!ret)
+		return ret;
 
+	/* TODO: Check this for all cases */
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -573,7 +572,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       ppgtt->pdp.page_directory[pd].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -593,7 +592,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -654,7 +653,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -715,7 +714,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -920,7 +919,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -949,7 +948,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -984,8 +983,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		__free_page(ppgtt->pd.page_tables[i].page);
+	kfree(ppgtt->pd.page_tables);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1042,22 +1041,22 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_page_table_entry *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL);
+		if (!pt[i].page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
 	}
 
+	ppgtt->pd.page_tables = pt;
 	return 0;
 }
 
@@ -1092,9 +1091,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_tables[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1138,7 +1139,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8f76990..d9bc375 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -187,6 +187,20 @@ struct i915_vma {
 			 u32 flags);
 };
 
+struct i915_page_table_entry {
+	struct page *page;
+};
+
+struct i915_page_directory_entry {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_page_table_entry *page_tables;
+};
+
+struct i915_page_directory_pointer_entry {
+	/* struct page *page; */
+	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+};
+
 struct i915_address_space {
 	struct drm_mm mm;
 	struct drm_device *dev;
@@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
@@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
+	union {
+		struct i915_page_directory_pointer_entry pdp;
+		struct i915_page_directory_entry pd;
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 07/25] drm/i915: Complete page table structures
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (5 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 06/25] drm/i915: page table abstractions Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 08/25] drm/i915: Create page table allocators Michel Thierry
                     ` (17 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as for that patch. I simply felt it is easier to review if split.

v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
v3: Rebase.
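
For illustration, a sketch (not part of the patch) of where the page
directory "address" lives after this change; the example_* name and
the is_gen8 flag are invented for the example:

  /* gen6/7 carry a GGTT offset, gen8 carries the PD page's DMA address. */
  static u64 example_pd_address(struct i915_hw_ppgtt *ppgtt, bool is_gen8)
  {
          if (is_gen8)
                  return ppgtt->pdp.page_directory[0].daddr;

          return ppgtt->pd.pd_offset;
  }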

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 85 +++++++++++++------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++---
 drivers/gpu/drm/i915/intel_lrc.c    | 16 +++----
 4 files changed, 45 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e515aad..60f91bc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2153,7 +2153,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c9f9266..9459195 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -307,7 +307,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pd_dma_addr[i];
+		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -433,7 +433,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
 		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
 }
 
@@ -445,14 +444,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pd_dma_addr[i])
+		if (!ppgtt->pdp.page_directory[i].daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -469,32 +468,19 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
-
-	return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
+			struct i915_page_table_entry *pt = &pd->page_tables[j];
 
 			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 			if (!pt->page)
 				goto unwind_out;
+
 		}
 	}
 
@@ -555,9 +541,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
-	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (!ret)
-		return ret;
+	return 0;
 
 	/* TODO: Check this for all cases */
 err_out:
@@ -579,7 +563,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pd] = pd_addr;
+	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
 
 	return 0;
 }
@@ -589,17 +573,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct page *p;
+	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
+	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
+	struct page *p = ptab->page;
 	int ret;
 
-	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+	ptab->daddr = pt_addr;
 
 	return 0;
 }
@@ -655,7 +640,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -696,14 +681,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
 	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+		   ppgtt->pd.pd_offset,
+		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -747,13 +733,13 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	uint32_t pd_entry;
 	int i;
 
-	WARN_ON(ppgtt->pd_offset & 0x3f);
+	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pt_dma_addr[i];
+		pt_addr = ppgtt->pd.page_tables[i].daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -764,9 +750,9 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
 
-	return (ppgtt->pd_offset / 64) << 16;
+	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -969,19 +955,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	if (ppgtt->pt_dma_addr) {
-		for (i = 0; i < ppgtt->num_pd_entries; i++)
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd.page_tables[i].daddr,
+			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		__free_page(ppgtt->pd.page_tables[i].page);
 	kfree(ppgtt->pd.page_tables);
@@ -1074,14 +1057,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
-				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr) {
-		drm_mm_remove_node(&ppgtt->node);
-		gen6_ppgtt_free(ppgtt);
-		return -ENOMEM;
-	}
-
 	return 0;
 }
 
@@ -1103,7 +1078,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pt_dma_addr[i] = pt_addr;
+		ppgtt->pd.page_tables[i].daddr = pt_addr;
 	}
 
 	return 0;
@@ -1142,7 +1117,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd_offset =
+	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1151,7 +1126,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
-		  ppgtt->pd_offset << 10);
+		  ppgtt->pd.pd_offset << 10);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d9bc375..6efeb18 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -189,10 +189,16 @@ struct i915_vma {
 
 struct i915_page_table_entry {
 	struct page *page;
+	dma_addr_t daddr;
 };
 
 struct i915_page_directory_entry {
 	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
 	struct i915_page_table_entry *page_tables;
 };
 
@@ -286,14 +292,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
 		struct i915_page_directory_pointer_entry pdp;
 		struct i915_page_directory_entry pd;
 	};
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a68f180..a784d1d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 08/25] drm/i915: Create page table allocators
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (6 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 07/25] drm/i915: Complete page table structures Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 09/25] drm/i915: Plumb drm_device through page tables operations Michel Thierry
                     ` (16 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely, breaking up
all of our actions into individual tasks.  This makes the code easier to
write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.

The patch itself should not make much of a functional difference. The
primary noticeable difference is that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain non-trivial complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions makes the code easier to review and
understand. All allocations and frees now take place in just a couple of
locations. Reviewing and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
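
For illustration, a sketch (not part of the patch) of how a caller is
expected to pair the new helpers; the example_* names are invented and
error handling is minimal:

  static int example_install_pt(struct i915_page_directory_entry *pd, int pde)
  {
          struct i915_page_table_entry *pt = alloc_pt_single();

          if (IS_ERR(pt))
                  return PTR_ERR(pt);

          pd->page_tables[pde] = pt;
          return 0;
  }

  static void example_remove_pt(struct i915_page_directory_entry *pd, int pde)
  {
          unmap_and_free_pt(pd->page_tables[pde]);
          pd->page_tables[pde] = NULL;
  }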

v2: Updated commit message to explain why this patch exists

v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/

v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3, v4)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 224 +++++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
 3 files changed, 151 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9459195..87beb40 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -275,6 +275,98 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static void unmap_and_free_pt(struct i915_page_table_entry *pt)
+{
+	if (WARN_ON(!pt->page))
+		return;
+	__free_page(pt->page);
+	kfree(pt);
+}
+
+static struct i915_page_table_entry *alloc_pt_single(void)
+{
+	struct i915_page_table_entry *pt;
+
+	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+	if (!pt)
+		return ERR_PTR(-ENOMEM);
+
+	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pt->page) {
+		kfree(pt);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate a multiple page tables
+ * @pd:		The page directory which will have at least @count entries
+ *		available to point to the allocated page tables.
+ * @pde:	First page directory entry for which we are allocating.
+ * @count:	Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
+{
+	int i, ret;
+
+	/* 512 is the max page tables per page_directory on any platform.
+	 * TODO: make WARN after patch series is done
+	 */
+	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
+
+	for (i = pde; i < pde + count; i++) {
+		struct i915_page_table_entry *pt = alloc_pt_single();
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto err_out;
+		}
+		WARN(pd->page_tables[i],
+		     "Leaking page directory entry %d (%pa)\n",
+		     i, pd->page_tables[i]);
+		pd->page_tables[i] = pt;
+	}
+
+	return 0;
+
+err_out:
+	while (i--)
+		unmap_and_free_pt(pd->page_tables[i]);
+	return ret;
+}
+
+static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
+{
+	if (pd->page) {
+		__free_page(pd->page);
+		kfree(pd);
+	}
+}
+
+static struct i915_page_directory_entry *alloc_pd_single(void)
+{
+	struct i915_page_directory_entry *pd;
+
+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pd->page) {
+		kfree(pd);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 			   uint64_t val)
@@ -307,7 +399,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
+		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -334,8 +426,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
-		struct page *page_table = pd->page_tables[pde].page;
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
+		struct i915_page_table_entry *pt = pd->page_tables[pde];
+		struct page *page_table = pt->page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -380,8 +473,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
-			struct page *page_table = pd->page_tables[pde].page;
+			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
+			struct i915_page_table_entry *pt = pd->page_tables[pde];
+			struct page *page_table = pt->page;
 
 			pt_vaddr = kmap_atomic(page_table);
 		}
@@ -412,18 +506,13 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pd->page_tables == NULL)
+	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pd->page_tables[i].page)
-			__free_page(pd->page_tables[i].page);
-}
-
-static void gen8_free_page_directories(struct i915_page_directory_entry *pd)
-{
-	kfree(pd->page_tables);
-	__free_page(pd->page);
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		unmap_and_free_pt(pd->page_tables[i]);
+		pd->page_tables[i] = NULL;
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -431,8 +520,8 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
-		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 	}
 }
 
@@ -444,14 +533,16 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pdp.page_directory[i].daddr)
+		if (!ppgtt->pdp.page_directory[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
+			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+			struct i915_page_table_entry *pt =  pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -470,25 +561,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	int i, j;
+	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = &pd->page_tables[j];
-
-			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-			if (!pt->page)
-				goto unwind_out;
-
-		}
+		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
+				     0, GEN8_PDES_PER_PAGE);
+		if (ret)
+			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
 
 	return -ENOMEM;
 }
@@ -499,17 +585,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_page_table_entry *pt;
-
-		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
-		if (!pt)
-			goto unwind_out;
-
-		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pdp.page_directory[i].page)
+		ppgtt->pdp.page_directory[i] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.page_directory[i]))
 			goto unwind_out;
-
-		ppgtt->pdp.page_directory[i].page_tables = pt;
 	}
 
 	ppgtt->num_pd_pages = max_pdp;
@@ -518,10 +596,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	while (i--) {
-		kfree(ppgtt->pdp.page_directory[i].page_tables);
-		__free_page(ppgtt->pdp.page_directory[i].page);
-	}
+	while (i--)
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 
 	return -ENOMEM;
 }
@@ -556,14 +632,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.page_directory[pd].page, 0,
+			       ppgtt->pdp.page_directory[pd]->page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
+	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
 
 	return 0;
 }
@@ -573,8 +649,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
-	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
+	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
+	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
 	struct page *p = ptab->page;
 	int ret;
 
@@ -637,10 +713,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
+			struct i915_page_table_entry *pt = pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -689,7 +767,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -700,7 +778,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -739,7 +817,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pd.page_tables[i].daddr;
+		pt_addr = ppgtt->pd.page_tables[i]->daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -905,7 +983,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -934,7 +1012,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -957,7 +1035,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i].daddr,
+			       ppgtt->pd.page_tables[i]->daddr,
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
@@ -966,8 +1044,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pd.page_tables[i].page);
-	kfree(ppgtt->pd.page_tables);
+		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
+
+	unmap_and_free_pd(&ppgtt->pd);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1022,27 +1101,6 @@ alloc:
 	return 0;
 }
 
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct i915_page_table_entry *pt;
-	int i;
-
-	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
-	if (!pt)
-		return -ENOMEM;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt[i].page = alloc_page(GFP_KERNEL);
-		if (!pt->page) {
-			gen6_ppgtt_free(ppgtt);
-			return -ENOMEM;
-		}
-	}
-
-	ppgtt->pd.page_tables = pt;
-	return 0;
-}
-
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 {
 	int ret;
@@ -1051,7 +1109,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1069,7 +1127,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 		struct page *page;
 		dma_addr_t pt_addr;
 
-		page = ppgtt->pd.page_tables[i].page;
+		page = ppgtt->pd.page_tables[i]->page;
 		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
@@ -1078,7 +1136,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pd.page_tables[i].daddr = pt_addr;
+		ppgtt->pd.page_tables[i]->daddr = pt_addr;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 6efeb18..e8cad72 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -199,12 +199,12 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
-	struct i915_page_table_entry *page_tables;
+	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
 	/* struct page *page; */
-	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
 };
 
 struct i915_address_space {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a784d1d..efaaebe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 09/25] drm/i915: Plumb drm_device through page tables operations
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (7 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 08/25] drm/i915: Create page table allocators Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 10/25] drm/i915: Track GEN6 page table usage Michel Thierry
                     ` (15 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

The next patch in the series will require it for alloc_pt_single.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 87beb40..c3c1828 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -138,7 +138,6 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 		return has_aliasing_ppgtt ? 1 : 0;
 }
 
-
 static void ppgtt_bind_vma(struct i915_vma *vma,
 			   enum i915_cache_level cache_level,
 			   u32 flags);
@@ -275,7 +274,7 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void unmap_and_free_pt(struct i915_page_table_entry *pt)
+static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
@@ -283,7 +282,7 @@ static void unmap_and_free_pt(struct i915_page_table_entry *pt)
 	kfree(pt);
 }
 
-static struct i915_page_table_entry *alloc_pt_single(void)
+static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_page_table_entry *pt;
 
@@ -313,7 +312,9 @@ static struct i915_page_table_entry *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count,
+		  struct drm_device *dev)
+
 {
 	int i, ret;
 
@@ -323,7 +324,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
 	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
 
 	for (i = pde; i < pde + count; i++) {
-		struct i915_page_table_entry *pt = alloc_pt_single();
+		struct i915_page_table_entry *pt = alloc_pt_single(dev);
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
 			goto err_out;
@@ -338,7 +339,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
 
 err_out:
 	while (i--)
-		unmap_and_free_pt(pd->page_tables[i]);
+		unmap_and_free_pt(pd->page_tables[i], dev);
 	return ret;
 }
 
@@ -502,7 +503,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
 
@@ -510,7 +511,7 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		unmap_and_free_pt(pd->page_tables[i]);
+		unmap_and_free_pt(pd->page_tables[i], dev);
 		pd->page_tables[i] = NULL;
 	}
 }
@@ -520,7 +521,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 	}
 }
@@ -565,7 +566,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
-				     0, GEN8_PDES_PER_PAGE);
+				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
 	}
@@ -574,7 +575,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -1044,7 +1045,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
+		unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
 	unmap_and_free_pd(&ppgtt->pd);
 }
@@ -1109,7 +1110,9 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			ppgtt->base.dev);
+
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 10/25] drm/i915: Track GEN6 page table usage
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (8 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 09/25] drm/i915: Plumb drm_device through page tables operations Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 11/25] drm/i915: Extract context switch skip and pd load logic Michel Thierry
                     ` (14 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.

With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.

One important change introduced here is that DMA mappings are
created/destroyed at the same time the page directories/tables are
allocated/deallocated.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.
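
As a rough illustration of the bookkeeping, here is a self-contained
userspace sketch (plain C, simplified constants and names; the byte array
only stands in for the real used_ptes bitmap, and pte_index()/pte_count()
parallel the i915_pte_index()/i915_pte_count() helpers added below):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define PAGE_SHIFT   12
  #define PDE_SHIFT    22                              /* one gen6 page table maps 4MB */
  #define PTES_PER_PT  (1 << (PDE_SHIFT - PAGE_SHIFT)) /* 1024 */

  static uint32_t pte_index(uint64_t addr)
  {
          return (addr >> PAGE_SHIFT) & (PTES_PER_PT - 1);
  }

  /* PTEs touched by [addr, addr + length) without crossing into the next
   * page table -- the same clamping i915_pte_count() does. */
  static uint32_t pte_count(uint64_t addr, uint64_t length)
  {
          const uint64_t mask = ~((1ULL << PDE_SHIFT) - 1);
          uint64_t end = addr + length;

          if ((addr & mask) != (end & mask))
                  return PTES_PER_PT - pte_index(addr);
          return pte_index(end) - pte_index(addr);
  }

  int main(void)
  {
          uint8_t used[PTES_PER_PT] = { 0 };      /* stand-in for pt->used_ptes */
          uint64_t start = 0x003ff000, length = 0x3000; /* straddles a 4MB boundary */

          /* only the part that fits in this page table is marked */
          memset(&used[pte_index(start)], 1, pte_count(start, length));
          printf("pde %u: marked %u pte(s) from index %u\n",
                 (unsigned)(start >> PDE_SHIFT),
                 (unsigned)pte_count(start, length),
                 (unsigned)pte_index(start));
          return 0;
  }

In the driver the same count feeds bitmap_set()/bitmap_clear() on
pt->used_ptes; a later patch uses an empty bitmap as the signal that the
page table can be torn down again.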

v2: s/pdp.page_directory/pdp.page_directorys
Make a scratch page allocation helper

v3: Rebase and expand commit message.

v4: Allocate required page tables only when they are needed, _bind_to_vm
instead of bind_vma (Daniel).

v5: Rebased to remove the unnecessary noise in the diff, also:
 - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
 - Removed unnecessary checks in gen6_alloc_va_range.
 - Changed map/unmap_px_single macros to use dma functions directly and
   be part of a static inline function instead.
 - Moved drm_device plumbing through page tables operation to its own
   patch.
 - Moved allocate/teardown_va_range calls until they are fully
   implemented (in subsequent patch).
 - Merged pt and scratch_pt unmap_and_free path.
 - Moved scratch page allocator helper to the patch that will use it.

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 206 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  77 ++++++++++++++
 2 files changed, 223 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c3c1828..95934a7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -274,29 +274,89 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
+#define i915_dma_unmap_single(px, dev) do { \
+	__i915_dma_unmap_single((px)->daddr, dev); \
+} while (0)
+
+static inline void __i915_dma_unmap_single(dma_addr_t daddr,
+					struct drm_device *dev)
+{
+	struct device *device = &dev->pdev->dev;
+
+	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
+}
+
+/**
+ * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
+ * @px:		Page table/dir/etc to get a DMA map for
+ * @dev:	drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
+ *
+ * Return: 0 if success.
+ */
+#define i915_dma_map_px_single(px, dev) \
+    i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
+
+static inline int i915_dma_map_page_single(struct page *page,
+					   struct drm_device *dev,
+					   dma_addr_t *daddr)
+{
+	struct device *device = &dev->pdev->dev;
+
+	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
+	return dma_mapping_error(device, *daddr);
+}
+
+static void unmap_and_free_pt(struct i915_page_table_entry *pt,
+			       struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
+
+	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
+	kfree(pt->used_ptes);
 	kfree(pt);
 }
 
 static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_page_table_entry *pt;
+	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+	int ret = -ENOMEM;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
+	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+				GFP_KERNEL);
+
+	if (!pt->used_ptes)
+		goto fail_bitmap;
+
 	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pt->page) {
-		kfree(pt);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pt->page)
+		goto fail_page;
+
+	ret = i915_dma_map_px_single(pt, dev);
+	if (ret)
+		goto fail_dma;
 
 	return pt;
+
+fail_dma:
+	__free_page(pt->page);
+fail_page:
+	kfree(pt->used_ptes);
+fail_bitmap:
+	kfree(pt);
+
+	return ERR_PTR(ret);
 }
 
 /**
@@ -805,26 +865,36 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+/* Write pde (index) from the page directory @pd to the page table @pt */
+static void gen6_write_pdes(struct i915_page_directory_entry *pd,
+			    const int pde, struct i915_page_table_entry *pt)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	gen6_gtt_pte_t __iomem *pd_addr;
-	uint32_t pd_entry;
-	int i;
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pd, struct i915_hw_ppgtt, pd);
+	u32 pd_entry;
 
-	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		dma_addr_t pt_addr;
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
+	pd_entry |= GEN6_PDE_VALID;
 
-		pt_addr = ppgtt->pd.page_tables[i]->daddr;
-		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-		pd_entry |= GEN6_PDE_VALID;
+	writel(pd_entry, ppgtt->pd_addr + pde);
 
-		writel(pd_entry, pd_addr + i);
-	}
-	readl(pd_addr);
+	/* XXX: Caller needs to make sure the write completes if necessary */
+}
+
+/* Write all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_write_page_range(struct drm_i915_private *dev_priv,
+				struct i915_page_directory_entry *pd, uint32_t start, uint32_t length)
+{
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, pd, start, length, temp, pde)
+		gen6_write_pdes(pd, pde, pt);
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1040,6 +1110,41 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_set(tmp_bitmap, gen6_pte_index(start),
+			   gen6_pte_count(start, length));
+
+		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+				I915_PPGTT_PT_ENTRIES);
+	}
+
+	return 0;
+}
+
+static void gen6_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
+			     gen6_pte_count(start, length));
+	}
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
@@ -1086,20 +1191,24 @@ alloc:
 					       0, dev_priv->gtt.base.total,
 					       0);
 		if (ret)
-			return ret;
+			goto err_out;
 
 		retried = true;
 		goto alloc;
 	}
 
 	if (ret)
-		return ret;
+		goto err_out;
+
 
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
+
+err_out:
+	return ret;
 }
 
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1121,30 +1230,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct page *page;
-		dma_addr_t pt_addr;
-
-		page = ppgtt->pd.page_tables[i]->page;
-		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-				       PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_unmap_pages(ppgtt);
-			return -EIO;
-		}
-
-		ppgtt->pd.page_tables[i]->daddr = pt_addr;
-	}
-
-	return 0;
-}
-
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
@@ -1165,12 +1250,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_setup_page_tables(ppgtt);
-	if (ret) {
-		gen6_ppgtt_free(ppgtt);
-		return ret;
-	}
-
+	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1181,11 +1262,15 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
+	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
+
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
 
-	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
 		  ppgtt->pd.pd_offset << 10);
 
@@ -1460,13 +1545,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 
 	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
 		/* TODO: Perhaps it shouldn't be gen6 specific */
-		if (i915_is_ggtt(vm)) {
-			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
-			continue;
-		}
 
-		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+		struct i915_hw_ppgtt *ppgtt =
+			container_of(vm, struct i915_hw_ppgtt, base);
+
+		if (i915_is_ggtt(vm))
+			ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 	}
 
 	i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e8cad72..caa1aa9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PPGTT_PD_ENTRIES		512
 #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
 #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
+#define GEN6_PDE_SHIFT			22
 #define GEN6_PDE_VALID			(1 << 0)
+#define I915_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
+#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
 
 #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
 
@@ -190,6 +193,8 @@ struct i915_vma {
 struct i915_page_table_entry {
 	struct page *page;
 	dma_addr_t daddr;
+
+	unsigned long *used_ptes;
 };
 
 struct i915_page_directory_entry {
@@ -246,6 +251,12 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid, u32 flags); /* Create a valid PTE */
+	int (*allocate_va_range)(struct i915_address_space *vm,
+				 uint64_t start,
+				 uint64_t length);
+	void (*teardown_va_range)(struct i915_address_space *vm,
+				  uint64_t start,
+				  uint64_t length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -298,12 +309,78 @@ struct i915_hw_ppgtt {
 
 	struct drm_i915_file_private *file_priv;
 
+	gen6_gtt_pte_t __iomem *pd_addr;
+
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
 	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
 			 struct intel_engine_cs *ring);
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
+/* For each pde iterates over every pde between from start until start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
+	     pt = (pd)->page_tables[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+	     temp = min(temp, (unsigned)length), \
+	     start += temp, length -= temp)
+
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+	const uint32_t mask = NUM_PTE(pde_shift) - 1;
+	return (address >> PAGE_SHIFT) & mask;
+}
+
+/* Helper to counts the number of PTEs within the given length. This count does
+* not cross a page table boundary, so the max value would be
+* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
+*/
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+					uint32_t pde_shift)
+{
+	const uint64_t mask = ~((1 << pde_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
+
+	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
+
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+	return (addr >> shift) & I915_PDE_MASK;
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+	return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+	return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 11/25] drm/i915: Extract context switch skip and pd load logic
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (9 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 10/25] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 12/25] drm/i915: Track page table reload need Michel Thierry
                     ` (13 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

We have some fanciness coming up. This patch just breaks out the logic
of context switch skip, pd load pre, and pd load post.
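
For reference, a toy model of how the three extracted predicates order the
switch (plain userspace C; the booleans are simplified stand-ins for the
INTEL_INFO()/ring checks, not the driver function itself):

  #include <stdbool.h>
  #include <stdio.h>

  struct ctx { bool has_ppgtt; bool initialized; bool is_default; int remap_slice; };

  static bool should_skip_switch(const struct ctx *from, const struct ctx *to)
  {
          return from == to && !to->remap_slice;
  }

  static bool needs_pd_load_pre(bool gen8, bool render_ring, const struct ctx *to)
  {
          return (!gen8 || !render_ring) && to->has_ppgtt;
  }

  static bool needs_pd_load_post(bool gen8, const struct ctx *to)
  {
          return (!to->initialized || to->is_default) && to->has_ppgtt && gen8;
  }

  int main(void)
  {
          struct ctx from = { .has_ppgtt = true, .initialized = true };
          struct ctx to = { .has_ppgtt = true };  /* fresh context */
          bool gen8 = true, render_ring = true;

          if (should_skip_switch(&from, &to))
                  return 0;
          if (needs_pd_load_pre(gen8, render_ring, &to))
                  printf("load PP_DCLV/PP_DIR_BASE before MI_SET_CONTEXT\n");
          printf("emit MI_SET_CONTEXT\n");
          if (needs_pd_load_post(gen8, &to))
                  printf("switch_mm after the context image is restored\n");
          return 0;
  }

The real helper bodies are in the hunk below; the point here is only the
ordering: skip check first, optional PD load before the context switch,
optional PD load after it.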

v2: Use new functions to replace the logic right away (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 40 +++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 755b415..6206d27 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -565,6 +565,33 @@ mi_set_context(struct intel_engine_cs *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+				      struct intel_context *from,
+				      struct intel_context *to)
+{
+	if (from == to && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
+static bool
+needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	return ((INTEL_INFO(ring->dev)->gen < 8) ||
+			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
+}
+
+static bool
+needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	return (!to->legacy_hw_ctx.initialized ||
+			i915_gem_context_is_default(to)) &&
+			to->ppgtt && IS_GEN8(ring->dev);
+}
+
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -573,9 +600,6 @@ static int do_switch(struct intel_engine_cs *ring,
 	u32 hw_flags = 0;
 	bool uninitialized = false;
 	struct i915_vma *vma;
-	bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
-			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
-	bool needs_pd_load_post = false;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -583,7 +607,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
 	}
 
-	if (from == to && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
@@ -601,7 +625,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	from = ring->last_context;
 
-	if (needs_pd_load_pre) {
+	if (needs_pd_load_pre(ring, to)) {
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
 		 * Register Immediate commands in Ring Buffer before submitting
@@ -646,16 +670,14 @@ static int do_switch(struct intel_engine_cs *ring,
 	 * XXX: If we implemented page directory eviction code, this
 	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
+	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
-		needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
-	}
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post) {
+	if (needs_pd_load_post(ring, to)) {
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 12/25] drm/i915: Track page table reload need
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (10 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 11/25] drm/i915: Extract context switch skip and pd load logic Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 13/25] drm/i915: Initialize all contexts Michel Thierry
                     ` (12 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch was formerly known as, "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.
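
To make the bookkeeping concrete, a toy model of the dirty-ring mask
(names follow the patch; the ring mask and bit handling are simplified
stand-ins for the real INTEL_INFO()->ring_mask and test_and_clear_bit()):

  #include <stdbool.h>
  #include <stdio.h>

  #define RING_MASK 0x7          /* pretend we have three rings */

  struct ppgtt { unsigned long pd_dirty_rings; };

  /* Called whenever PDEs may have changed (alloc or teardown of a VA range). */
  static void mark_tlbs_dirty(struct ppgtt *ppgtt)
  {
          ppgtt->pd_dirty_rings = RING_MASK;
  }

  /* Called from the context switch path for one ring. */
  static bool ring_needs_pd_reload(struct ppgtt *ppgtt, int ring_id)
  {
          bool dirty = ppgtt->pd_dirty_rings & (1UL << ring_id);

          ppgtt->pd_dirty_rings &= ~(1UL << ring_id);   /* reload clears it */
          return dirty;
  }

  int main(void)
  {
          struct ppgtt ppgtt = { 0 };

          mark_tlbs_dirty(&ppgtt);                 /* new PDEs were written */
          printf("RCS reload? %d\n", ring_needs_pd_reload(&ppgtt, 0)); /* 1 */
          printf("RCS reload? %d\n", ring_needs_pd_reload(&ppgtt, 0)); /* 0 */
          return 0;
  }

Roughly, a set bit for the render ring ends up as MI_FORCE_RESTORE on the
next MI_SET_CONTEXT, while the other rings simply re-emit the PD load.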

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

squash! drm/i915: Force pd restore when PDEs change, gen6-7

It's not just for gen8. If the current context's mappings change, we
need a context reload to switch to them.

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

v3: Invalidate PPGTT TLBs inside alloc_va_range and teardown_va_range.

v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
pd_dirty_rings from i915_address_space to i915_hw_ppgtt.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 27 ++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 13 +++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  1 +
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6206d27..92347a9 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -569,8 +569,18 @@ static inline bool should_skip_switch(struct intel_engine_cs *ring,
 				      struct intel_context *from,
 				      struct intel_context *to)
 {
-	if (from == to && !to->remap_slice)
-		return true;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	if (to->remap_slice)
+		return false;
+
+	if (to->ppgtt) {
+		if (from == to && !test_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+			return true;
+	} else {
+		if (from == to && !test_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->pd_dirty_rings))
+			return true;
+	}
 
 	return false;
 }
@@ -587,9 +597,8 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 static bool
 needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
 {
-	return (!to->legacy_hw_ctx.initialized ||
-			i915_gem_context_is_default(to)) &&
-			to->ppgtt && IS_GEN8(ring->dev);
+	return IS_GEN8(ring->dev) &&
+			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
 }
 
 static int do_switch(struct intel_engine_cs *ring,
@@ -634,6 +643,12 @@ static int do_switch(struct intel_engine_cs *ring,
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		if (ret)
 			goto unpin_out;
+
+		/* Doing a PD load always reloads the page dirs */
+		if (to->ppgtt)
+			clear_bit(ring->id, &to->ppgtt->pd_dirty_rings);
+		else
+			clear_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->pd_dirty_rings);
 	}
 
 	if (ring != &dev_priv->ring[RCS]) {
@@ -672,6 +687,8 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
+	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e3ef177..10e6a27 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1198,6 +1198,13 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	if (ret)
 		goto error;
 
+	if (ctx->ppgtt)
+		WARN(ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
+			"%s didn't clear reload\n", ring->name);
+	else
+		WARN(dev_priv->mm.aliasing_ppgtt->pd_dirty_rings &
+			(1<<ring->id), "%s didn't clear reload\n", ring->name);
+
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
@@ -1445,6 +1452,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* XXX: Reserve has possibly change PDEs which means we must do a
+	 * context switch before we can coherently read some of the reserved
+	 * VMAs. */
+
 	/* The objects are in their final locations, apply the relocations. */
 	if (need_relocs)
 		ret = i915_gem_execbuffer_relocate(eb);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 95934a7..62f492f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1110,6 +1110,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
+{
+	/* If current vm != vm, */ \
+	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
+}
+
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
@@ -1128,6 +1138,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	mark_tlbs_dirty(ppgtt);
 	return 0;
 }
 
@@ -1143,6 +1154,8 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
 			     gen6_pte_count(start, length));
 	}
+
+	mark_tlbs_dirty(ppgtt);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index caa1aa9..1ae70be 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -300,6 +300,7 @@ struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
 	struct drm_mm_node node;
+	unsigned long pd_dirty_rings;
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 13/25] drm/i915: Initialize all contexts
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (11 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 12/25] drm/i915: Track page table reload need Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 14/25] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
                     ` (11 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The problem is we're going to switch to a new context, which could be
the default context. The plan was to use restore inhibit, which would be
fine, except if we are using dynamic page tables (which we will). If we
use dynamic page tables and we don't load new page tables, the previous
page tables might go away, and future operations will fault.

CTXA runs.
switch to default, restore inhibit
CTXA dies and has its address space taken away.
Run CTXB, tries to save using the context A's address space - this
fails.

The general solution is to make sure every context has its own state,
and its own address space. For cases when we must restore inhibit, the
first thing we do is load a valid address space. I thought this would be
enough, but apparently there are references within the context itself
which will refer to the old address space - therefore, we also must
reinitialize.

It was tricky to track this down as we don't have much insight into what
happens in a context save.

This is required for the next patch which enables dynamic page tables.

v2: to->ppgtt is only valid in full ppgtt.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 92347a9..dd6324e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -594,13 +594,6 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
 }
 
-static bool
-needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
-{
-	return IS_GEN8(ring->dev) &&
-			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
-}
-
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -681,20 +674,24 @@ static int do_switch(struct intel_engine_cs *ring,
 
 	/* GEN8 does *not* require an explicit reload if the PDPs have been
 	 * setup, and we do not wish to move them.
-	 *
-	 * XXX: If we implemented page directory eviction code, this
-	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
+	if (!to->legacy_hw_ctx.initialized) {
 		hw_flags |= MI_RESTORE_INHIBIT;
-	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		/* NB: If we inhibit the restore, the context is not allowed to
+		 * die because future work may end up depending on valid address
+		 * space. This means we must enforce that a page table load
+		 * occur when this occurs. */
+	} else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
 		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post(ring, to)) {
+	if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT)) {
+		/* We have a valid page directory (scratch) to switch to. This
+		 * allows the old VM to be freed. Note that if anything occurs
+		 * between the set context, and here, we are f*cked */
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -744,7 +741,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
+	uninitialized = !to->legacy_hw_ctx.initialized;
 	to->legacy_hw_ctx.initialized = true;
 
 done:
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 14/25] drm/i915: Finish gen6/7 dynamic page table allocation
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (12 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 13/25] drm/i915: Initialize all contexts Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 15/25] drm/i915: Add dynamic page trace events Michel Thierry
                     ` (10 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page.
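
A toy userspace model of that steady state (illustrative only; the real
code also has to write each PDE through the GGTT and deal with the PTE
bitmaps introduced earlier):

  #include <stdio.h>
  #include <stdlib.h>

  #define NUM_PDES 512

  struct pt { int used; };                       /* stand-in for the PTE bitmap */

  static struct pt scratch_pt;                   /* shared, read-only filler */
  static struct pt *pd[NUM_PDES];                /* the page directory */

  static void init_pd(void)
  {
          for (int i = 0; i < NUM_PDES; i++)
                  pd[i] = &scratch_pt;           /* steady state: all scratch */
  }

  static int alloc_va(int pde, int nptes)
  {
          if (pd[pde] == &scratch_pt) {          /* first use: real page table */
                  pd[pde] = calloc(1, sizeof(struct pt));
                  if (!pd[pde]) {
                          pd[pde] = &scratch_pt;
                          return -1;
                  }
          }
          pd[pde]->used += nptes;
          return 0;
  }

  static void teardown_va(int pde, int nptes)
  {
          pd[pde]->used -= nptes;
          if (pd[pde]->used == 0) {              /* last user gone: back to scratch */
                  free(pd[pde]);
                  pd[pde] = &scratch_pt;
          }
  }

  int main(void)
  {
          init_pd();
          alloc_va(3, 16);
          teardown_va(3, 16);
          printf("pde 3 back on scratch: %d\n", pd[3] == &scratch_pt);
          return 0;
  }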

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I did.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain, and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render rings is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.

v2: Declare the page table bitmap inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
both all implementations get the trace.

v3: Updated trace event to spit out a name

v4: Aliasing ppgtt is now initialized differently (in setup global gtt)

v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check
for trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).

v6: Implement changes from code review (Daniel):
 - allocate/teardown_va_range calls added.
 - Add a scratch page allocation helper (only need the address).
 - Move trace events to a new patch.
 - Use updated mark_tlbs_dirty.
 - Moved pt preallocation for aliasing ppgtt into gen6_ppgtt_init.

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
 drivers/gpu/drm/i915/i915_gem.c     |   9 +++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 143 ++++++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h |   3 +
 4 files changed, 142 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 60f91bc..0f63076 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
 		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
 	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6c40365..65e055c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3585,6 +3585,15 @@ search_free:
 	if (ret)
 		goto err_remove_node;
 
+	/*  allocate before insert / bind */
+	if (vma->vm->allocate_va_range) {
+		ret = vma->vm->allocate_va_range(vma->vm,
+						vma->node.start,
+						vma->node.size);
+		if (ret)
+			goto err_remove_node;
+	}
+
 	trace_i915_vma_bind(vma, flags);
 	ret = i915_vma_bind(vma, obj->cache_level,
 			    flags & PIN_GLOBAL ? GLOBAL_BIND : 0);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 62f492f..d37bd83 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -316,6 +316,14 @@ static void unmap_and_free_pt(struct i915_page_table_entry *pt,
 	if (WARN_ON(!pt->page))
 		return;
 
+	if (!pt->scratch) {
+		const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+			GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+		WARN(!bitmap_empty(pt->used_ptes, count),
+		     "Free page table with %d used pages\n",
+		     bitmap_weight(pt->used_ptes, count));
+	}
+
 	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
 	kfree(pt->used_ptes);
@@ -359,6 +367,15 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
+static inline struct i915_page_table_entry *alloc_pt_scratch(struct drm_device *dev)
+{
+	struct i915_page_table_entry *pt = alloc_pt_single(dev);
+	if (!IS_ERR(pt))
+		pt->scratch = 1;
+
+	return pt;
+}
+
 /**
  * alloc_pt_range() - Allocate a multiple page tables
  * @pd:		The page directory which will have at least @count entries
@@ -1123,10 +1140,46 @@ static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
+	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_table_entry *pt;
+	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
+	int ret;
+
+	BUG_ON(upper_32_bits(start));
+
+	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+
+	/* The allocation is done in two stages so that we can bail out with
+	 * minimal amount of pain. The first stage finds new page tables that
+	 * need allocation. The second stage marks use ptes within the page
+	 * tables.
+	 */
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		if (pt != ppgtt->scratch_pt) {
+			WARN_ON(bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+			continue;
+		}
+
+		/* We've already allocated a page table */
+		WARN_ON(!bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+
+		pt = alloc_pt_single(dev);
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pd.page_tables[pde] = pt;
+		set_bit(pde, new_page_tables);
+	}
+
+	start = start_save;
+	length = length_save;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
@@ -1134,12 +1187,31 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		bitmap_set(tmp_bitmap, gen6_pte_index(start),
 			   gen6_pte_count(start, length));
 
-		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+		if (test_and_clear_bit(pde, new_page_tables))
+			gen6_write_pdes(&ppgtt->pd, pde, pt);
+
+		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	WARN_ON(!bitmap_empty(new_page_tables, GEN6_PPGTT_PD_ENTRIES));
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
+
 	mark_tlbs_dirty(ppgtt);
 	return 0;
+
+unwind_out:
+	for_each_set_bit(pde, new_page_tables, GEN6_PPGTT_PD_ENTRIES) {
+		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[pde];
+		ppgtt->pd.page_tables[pde] = NULL;
+		unmap_and_free_pt(pt, vm->dev);
+	}
+
+	mark_tlbs_dirty(ppgtt);
+	return ret;
 }
 
 static void gen6_teardown_va_range(struct i915_address_space *vm,
@@ -1151,8 +1223,19 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 	uint32_t pde, temp;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+
+		if (WARN(pt == ppgtt->scratch_pt,
+		    "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
+		    vm, pde, start, start + length))
+			continue;
+
 		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
 			     gen6_pte_count(start, length));
+
+		if (bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES)) {
+			gen6_write_pdes(&ppgtt->pd, pde, ppgtt->scratch_pt);
+			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+		}
 	}
 
 	mark_tlbs_dirty(ppgtt);
@@ -1162,9 +1245,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[i];
+		if (pt != ppgtt->scratch_pt)
+			unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	}
 
+	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 	unmap_and_free_pd(&ppgtt->pd);
 }
 
@@ -1191,6 +1278,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+	ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pt))
+		return PTR_ERR(ppgtt->scratch_pt);
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
 						  &ppgtt->node, GEN6_PD_SIZE,
@@ -1221,6 +1311,7 @@ alloc:
 	return 0;
 
 err_out:
+	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 	return ret;
 }
 
@@ -1232,18 +1323,20 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			ppgtt->base.dev);
+	return 0;
+}
 
-	if (ret) {
-		drm_mm_remove_node(&ppgtt->node);
-		return ret;
-	}
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+				  uint64_t start, uint64_t length)
+{
+	struct i915_page_table_entry *unused;
+	uint32_t pde, temp;
 
-	return 0;
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
+		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
 }
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1263,6 +1356,18 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	if (aliasing) {
+		/* preallocate all pts */
+		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+				ppgtt->base.dev);
+
+		if (ret) {
+			unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+			drm_mm_remove_node(&ppgtt->node);
+			return ret;
+		}
+	}
+
 	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
@@ -1278,6 +1383,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
+	if (!aliasing)
+		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
+
 	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1290,7 +1398,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
+		bool aliasing)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
@@ -1298,7 +1407,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
 
 	if (INTEL_INFO(dev)->gen < 8)
-		return gen6_ppgtt_init(ppgtt);
+		return gen6_ppgtt_init(ppgtt, aliasing);
 	else
 		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 }
@@ -1307,7 +1416,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = 0;
 
-	ret = __hw_ppgtt_init(dev, ppgtt);
+	ret = __hw_ppgtt_init(dev, ppgtt, false);
 	if (ret == 0) {
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
@@ -1415,6 +1524,10 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
+	if (vma->vm->teardown_va_range) {
+		vma->vm->teardown_va_range(vma->vm,
+					   vma->node.start, vma->node.size);
+	}
 }
 
 extern int intel_iommu_gfx_mapped;
@@ -1930,7 +2043,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 		if (!ppgtt)
 			return -ENOMEM;
 
-		ret = __hw_ppgtt_init(dev, ppgtt);
+		ret = __hw_ppgtt_init(dev, ppgtt, true);
 		if (ret != 0)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1ae70be..074b368 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -195,6 +195,7 @@ struct i915_page_table_entry {
 	dma_addr_t daddr;
 
 	unsigned long *used_ptes;
+	unsigned int scratch:1;
 };
 
 struct i915_page_directory_entry {
@@ -308,6 +309,8 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_entry pd;
 	};
 
+	struct i915_page_table_entry *scratch_pt;
+
 	struct drm_i915_file_private *file_priv;
 
 	gen6_gtt_pte_t __iomem *pd_addr;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 15/25] drm/i915: Add dynamic page trace events
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (13 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 14/25] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 16/25] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
                     ` (9 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

Trace events for page directory/table allocation, destruction, map and unmap.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.c |  17 ++++++
 drivers/gpu/drm/i915/i915_trace.h   | 115 ++++++++++++++++++++++++++++++++++++
 3 files changed, 134 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 65e055c..bab62d2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3587,6 +3587,8 @@ search_free:
 
 	/*  allocate before insert / bind */
 	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+				VM_TO_TRACE_NAME(vma->vm));
 		ret = vma->vm->allocate_va_range(vma->vm,
 						vma->node.start,
 						vma->node.size);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d37bd83..40996fe 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1176,6 +1176,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 		ppgtt->pd.page_tables[pde] = pt;
 		set_bit(pde, new_page_tables);
+		trace_i915_page_table_entry_alloc(vm, pde, start, GEN6_PDE_SHIFT);
 	}
 
 	start = start_save;
@@ -1190,6 +1191,10 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		if (test_and_clear_bit(pde, new_page_tables))
 			gen6_write_pdes(&ppgtt->pd, pde, pt);
 
+		trace_i915_page_table_entry_map(vm, pde, pt,
+					 gen6_pte_index(start),
+					 gen6_pte_count(start, length),
+					 I915_PPGTT_PT_ENTRIES);
 		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
@@ -1229,10 +1234,18 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 		    vm, pde, start, start + length))
 			continue;
 
+		trace_i915_page_table_entry_unmap(vm, pde, pt,
+					   gen6_pte_index(start),
+					   gen6_pte_count(start, length),
+					   I915_PPGTT_PT_ENTRIES);
+
 		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
 			     gen6_pte_count(start, length));
 
 		if (bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES)) {
+			trace_i915_page_table_entry_destroy(vm, pde,
+						     start & GENMASK_ULL(63, GEN6_PDE_SHIFT),
+						     GEN6_PDE_SHIFT);
 			gen6_write_pdes(&ppgtt->pd, pde, ppgtt->scratch_pt);
 			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
 		}
@@ -1525,6 +1538,10 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->obj->base.size,
 			     true);
 	if (vma->vm->teardown_va_range) {
+		trace_i915_va_teardown(vma->vm,
+				       vma->node.start, vma->node.size,
+				       VM_TO_TRACE_NAME(vma->vm));
+
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
 	}
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f004d3d..22fa11d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,6 +156,121 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
+#define VM_TO_TRACE_NAME(vm) \
+	(i915_is_ggtt(vm) ? "GGTT" : \
+		      "Private VM")
+
+DECLARE_EVENT_CLASS(i915_va,
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	TP_ARGS(vm, start, length, name),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u64, start)
+		__field(u64, end)
+		__string(name, name)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->start = start;
+		__entry->end = start + length;
+		__assign_str(name, name);
+	),
+
+	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
+		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DEFINE_EVENT(i915_va, i915_va_teardown,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DECLARE_EVENT_CLASS(i915_page_table_entry,
+	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	TP_ARGS(vm, pde, start, pde_shift),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->start = start;
+		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
+	),
+
+	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+		  __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_destroy,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+	((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_page_table_entry_update,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u32, first)
+		__field(u32, last)
+		__dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->first = first;
+		__entry->last = first + len;
+
+		bitmap_scnprintf(__get_str(cur_ptes),
+				 TRACE_PT_SIZE(bits),
+				 pt->used_ptes,
+				 bits);
+	),
+
+	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+		  __entry->vm, __entry->pde, __entry->last, __entry->first,
+		  __get_str(cur_ptes))
+);
+
+DEFINE_EVENT(i915_page_table_entry_update, i915_page_table_entry_map,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
+DEFINE_EVENT(i915_page_table_entry_update, i915_page_table_entry_unmap,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
 TRACE_EVENT(i915_gem_object_change_domain,
 	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
 	    TP_ARGS(obj, old_read, old_write),
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 16/25] drm/i915/bdw: Use dynamic allocation idioms on free
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (14 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 15/25] drm/i915: Add dynamic page trace events Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 17/25] drm/i915/bdw: page directories rework allocation Michel Thierry
                     ` (8 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.

v2: Match trace_i915_va_teardown params
v3: Multiple rebases.
v4: Updated to use unmap_and_free_pt.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 +++++++++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 46 +++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 20 deletions(-)
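
As a side note, the new teardown path leans on gen8_clamp_pd() to keep each
inner loop within a single page directory. Below is a minimal userspace
sketch of that clamping arithmetic; it assumes the 1GB-per-PDPE layout of
this series (GEN8_PDPE_SHIFT == 30) and is an illustration only, not driver
code:

#include <stdint.h>
#include <stdio.h>

#define GEN8_PDPE_SHIFT 30	/* each page directory covers 1GB (assumed) */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/* Mirror of gen8_clamp_pd(): cut [start, start+length) down so it never
 * crosses into the next page directory. */
static uint64_t clamp_pd(uint64_t start, uint64_t length)
{
	uint64_t next_pd = ALIGN(start + 1, 1ULL << GEN8_PDPE_SHIFT);

	if (next_pd > start + length)
		return length;
	return next_pd - start;
}

int main(void)
{
	/* A 3GB range starting 256MB in gets clamped to the 768MB that are
	 * left before the first 1GB boundary. */
	printf("%llu MB\n",
	       (unsigned long long)(clamp_pd(256ULL << 20, 3ULL << 30) >> 20));
	return 0;
}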

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 40996fe..756907f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -580,27 +580,32 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
 {
-	int i;
-
-	if (!pd->page)
-		return;
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		unmap_and_free_pt(pd->page_tables[i], dev);
-		pd->page_tables[i] = NULL;
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_entry *pd;
+	struct i915_page_table_entry *pt;
+	uint64_t temp;
+	uint32_t pdpe, pde;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			unmap_and_free_pt(pt, vm->dev);
+		}
+		unmap_and_free_pd(pd);
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+/* This function will die soon */
+static void gen8_free_full_page_directory(struct i915_hw_ppgtt *ppgtt, int i)
 {
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
-	}
+	gen8_teardown_va_range(&ppgtt->base,
+			       i << GEN8_PDPE_SHIFT,
+			       (1 << GEN8_PDPE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -615,19 +620,28 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			continue;
 
 		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
+				PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
-			struct i915_page_table_entry *pt =  pd->page_tables[j];
+			struct i915_page_table_entry *pt = pd->page_tables[j];
 			dma_addr_t addr = pt->daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
-					       PCI_DMA_BIDIRECTIONAL);
+						PCI_DMA_BIDIRECTIONAL);
 		}
 	}
 }
 
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	trace_i915_va_teardown(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total,
+			       VM_TO_TRACE_NAME(&ppgtt->base));
+	gen8_teardown_va_range(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total);
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
@@ -652,7 +666,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+		gen8_free_full_page_directory(ppgtt, i);
 
 	return -ENOMEM;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 074b368..c82a029 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -385,6 +385,52 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	return i915_pde_index(addr, GEN6_PDE_SHIFT);
 }
 
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter)		\
+	for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN8_PDES_PER_PAGE;			\
+	     pt = (pd)->page_tables[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->page_directory[iter];	\
+	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+	     pd = (pdp)->page_directory[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+/* Clamp length to the next page_directory boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+	if (next_pd > (start + length))
+		return length;
+
+	return next_pd - start;
+}
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+	return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+	return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+	return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+	BUG(); /* For 64B */
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 17/25] drm/i915/bdw: page directories rework allocation
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (15 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 16/25] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 18/25] drm/i915/bdw: pagetable allocation rework Michel Thierry
                     ` (7 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using the gen8_for_each_pdpe macro to allocate the page directories.

v2: Rebased after s/free_pt_*/unmap_and_free_pt/ change.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 12 deletions(-)
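
To show the shape of the new allocator (allocate each page directory the
range touches, unwind on failure), here is a minimal userspace model of the
idiom. The slot count stands in for GEN8_LEGACY_PDPES and the names are
illustrative only; this is a sketch, not the driver code:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_SLOTS 4	/* stand-in for GEN8_LEGACY_PDPES (assumed to be 4) */

/* Allocate the slots a range needs; on failure free only what we added. */
static int alloc_range(void *slots[NR_SLOTS], int first, int count)
{
	int i;

	for (i = first; i < first + count; i++) {
		slots[i] = malloc(4096);
		if (!slots[i])
			goto unwind_out;
	}
	return 0;

unwind_out:
	while (i-- > first) {
		free(slots[i]);
		slots[i] = NULL;
	}
	return -ENOMEM;
}

int main(void)
{
	void *pdp[NR_SLOTS] = { NULL };

	/* Only directories 1 and 2 are needed for this range. */
	printf("alloc_range: %d\n", alloc_range(pdp, 1, 2));
	return 0;
}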

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 756907f..3779653 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -595,8 +595,10 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 		uint64_t pd_start = start;
 		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
 			unmap_and_free_pt(pt, vm->dev);
+			pd->page_tables[pde] = NULL;
 		}
 		unmap_and_free_pd(pd);
+		ppgtt->pdp.page_directory[pdpe] = NULL;
 	}
 }
 
@@ -671,25 +673,39 @@ unwind_out:
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
+static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
+				     uint64_t start,
+				     uint64_t length)
 {
-	int i;
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pdp, struct i915_hw_ppgtt, pdp);
+	struct i915_page_directory_entry *unused;
+	uint64_t temp;
+	uint32_t pdpe;
 
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.page_directory[i] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.page_directory[i]))
+	/* FIXME: PPGTT container_of won't work for 64b */
+	BUG_ON((start + length) > 0x800000000ULL);
+
+	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+		BUG_ON(unused);
+		pdp->page_directory[pdpe] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
 			goto unwind_out;
+
+		ppgtt->num_pd_pages++;
 	}
 
-	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+	while (pdpe--) {
+		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
+		ppgtt->num_pd_pages--;
+	}
+
+	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -699,7 +715,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 {
 	int ret;
 
-	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, ppgtt->base.start,
+					ppgtt->base.total);
 	if (ret)
 		return ret;
 
@@ -776,6 +793,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
+	ppgtt->base.start = 0;
+	ppgtt->base.total = size;
+	BUG_ON(ppgtt->base.total == 0);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
 	if (ret)
@@ -823,8 +844,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 18/25] drm/i915/bdw: pagetable allocation rework
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (16 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 17/25] drm/i915/bdw: page directories rework allocation Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 19/25] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
                     ` (6 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using the gen8_for_each_pde macro to allocate page tables.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 10 +++++++
 2 files changed, 39 insertions(+), 25 deletions(-)
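
The pde walk below is driven by simple shift-and-mask index arithmetic. A
standalone sketch of it follows, assuming GEN8_PDE_SHIFT == 21 (512 PTEs of
4KB, i.e. 2MB per page table) and 512 PDEs per directory; it is only an
illustration of the indexing, not the driver helpers:

#include <stdint.h>
#include <stdio.h>

#define GEN8_PDE_SHIFT      21	/* 2MB per page table (assumed) */
#define GEN8_PDES_PER_PAGE 512

static uint32_t pde_index(uint64_t addr)
{
	return (addr >> GEN8_PDE_SHIFT) & (GEN8_PDES_PER_PAGE - 1);
}

int main(void)
{
	/* A 5MB range starting at 3MB touches page tables 1 through 3 of its
	 * page directory; gen8_for_each_pde() allocates exactly those. */
	uint64_t start = 3ULL << 20, length = 5ULL << 20;

	printf("pde %u through pde %u\n",
	       pde_index(start), pde_index(start + length - 1));
	return 0;
}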

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3779653..1a070b7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -602,14 +602,6 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	}
 }
 
-/* This function will die soon */
-static void gen8_free_full_page_directory(struct i915_hw_ppgtt *ppgtt, int i)
-{
-	gen8_teardown_va_range(&ppgtt->base,
-			       i << GEN8_PDPE_SHIFT,
-			       (1 << GEN8_PDPE_SHIFT));
-}
-
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
@@ -653,22 +645,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_page_directory_entry *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
 {
-	int i, ret;
+	struct i915_page_table_entry *unused;
+	uint64_t temp;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
-				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
-		if (ret)
+	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+		BUG_ON(unused);
+		pd->page_tables[pde] = alloc_pt_single(dev);
+		if (IS_ERR(pd->page_tables[pde]))
 			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		gen8_free_full_page_directory(ppgtt, i);
+	while (pde--)
+		unmap_and_free_pt(pd->page_tables[pde], dev);
 
 	return -ENOMEM;
 }
@@ -711,20 +708,28 @@ unwind_out:
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    const int max_pdp)
+			    uint64_t start,
+			    uint64_t length)
 {
+	struct i915_page_directory_entry *pd;
+	uint64_t temp;
+	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, ppgtt->base.start,
-					ppgtt->base.total);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length);
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
-	if (ret)
-		goto err_out;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+						ppgtt->base.dev);
+		if (ret)
+			goto err_out;
+
+		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
+	}
 
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	BUG_ON(pdpe > ppgtt->num_pd_pages);
 
 	return 0;
 
@@ -795,10 +800,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	BUG_ON(ppgtt->base.total == 0);
 
 	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c82a029..f416e01 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -401,6 +401,16 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+/* Clamp length to the next pagetab boundary */
+static inline uint64_t gen8_clamp_pt(uint64_t start, uint64_t length)
+{
+	uint64_t next_pt = ALIGN(start + 1, 1 << GEN8_PDE_SHIFT);
+	if (next_pt > (start + length))
+		return length;
+
+	return next_pt - start;
+}
+
 /* Clamp length to the next page_directory boundary */
 static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 19/25] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (17 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 18/25] drm/i915/bdw: pagetable allocation rework Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 20/25] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
                     ` (5 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

One important part of this patch is that we now write a scratch page
directory into any unused PDP descriptors. This matters for two reasons:
first, we're not allowed to just use 0 or an invalid pointer; and second,
we must wipe out any previous contents from the last context.

The latter point only matters with full PPGTT. The former point only
affects platforms with less than 4GB of memory.

v2: Updated commit message to point that we must set unused PDPs to the
scratch page.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 ++++-
 2 files changed, 22 insertions(+), 12 deletions(-)
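
A small userspace model of the PDP load described above may help: every one
of the four PDP slots is written on a switch, and any slot without a real
page directory gets the scratch directory's address instead of whatever the
previous context left there. Addresses and slot contents below are made up;
only the selection logic mirrors gen8_mm_switch():

#include <stdint.h>
#include <stdio.h>

#define NR_PDPS 4	/* stand-in for GEN8_LEGACY_PDPES (assumed to be 4) */

int main(void)
{
	uint64_t scratch_daddr = 0x1000;			 /* fake address */
	uint64_t pd_daddr[NR_PDPS] = { 0x20000, 0x30000, 0, 0 }; /* 2 live PDs */
	int i;

	for (i = NR_PDPS - 1; i >= 0; i--) {
		uint64_t addr = pd_daddr[i] ? pd_daddr[i] : scratch_daddr;

		/* Models the two LRI writes per PDP register. */
		printf("PDP%d <- upper 0x%08x lower 0x%08x\n", i,
		       (unsigned)(addr >> 32), (unsigned)addr);
	}
	return 0;
}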

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1a070b7..2c00b24 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -446,8 +446,9 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
-			   uint64_t val)
+static int gen8_write_pdp(struct intel_engine_cs *ring,
+			  unsigned entry,
+			  dma_addr_t addr)
 {
 	int ret;
 
@@ -459,10 +460,10 @@ static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val >> 32));
+	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val));
+	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
 	return 0;
@@ -473,12 +474,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
-
-	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
-		ret = gen8_write_pdp(ring, i, addr);
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
+		/* The page directory might be NULL, but we need to clear out
+		 * whatever the previous context might have used. */
+		ret = gen8_write_pdp(ring, i, pd_daddr);
 		if (ret)
 			return ret;
 	}
@@ -801,10 +802,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
 
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
-	if (ret)
+	if (ret) {
+		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
+	}
 
 	/*
 	 * 2. Create DMA mappings for the page directories and page tables.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f416e01..40ac457 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -309,7 +309,10 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_entry pd;
 	};
 
-	struct i915_page_table_entry *scratch_pt;
+	union {
+		struct i915_page_table_entry *scratch_pt;
+		struct i915_page_table_entry *scratch_pd; /* Just need the daddr */
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 20/25] drm/i915: num_pd_pages/num_pd_entries isn't useful
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (18 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 19/25] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 21/25] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
                     ` (4 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

These values are no longer useful once the page tables are allocated
dynamically. Getting rid of them will help prevent later confusion.

v2: Updated to use unmap_and_free_pd functions.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 --
 drivers/gpu/drm/i915/i915_gem_gtt.c | 66 +++++++++++++------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 ++--
 3 files changed, 27 insertions(+), 48 deletions(-)
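
For reference, the constant expression that replaces the num_pd_entries
based total below works out to the full 2GB gen6/7 ppgtt address space
(assuming GEN6_PPGTT_PD_ENTRIES == 512, I915_PPGTT_PT_ENTRIES == 1024 and
PAGE_SIZE == 4096):

  512 PDEs * 1024 PTEs * 4096 bytes = 2^31 bytes = 2GB

so dropping the counters changes how the size is derived, not the size
itself.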

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0f63076..b00760b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2117,8 +2117,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	if (!ppgtt)
 		return;
 
-	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
 	for_each_ring(ring, dev_priv, unused) {
 		seq_printf(m, "%s\n", ring->name);
 		for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2c00b24..3050648 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -608,7 +608,7 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
 		if (!ppgtt->pdp.page_directory[i]->daddr)
@@ -689,21 +689,13 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 		pdp->page_directory[pdpe] = alloc_pd_single();
 		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
 			goto unwind_out;
-
-		ppgtt->num_pd_pages++;
 	}
 
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
 	return 0;
 
 unwind_out:
-	while (pdpe--) {
+	while (pdpe--)
 		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
-		ppgtt->num_pd_pages--;
-	}
-
-	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -726,12 +718,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 						ppgtt->base.dev);
 		if (ret)
 			goto err_out;
-
-		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
 	}
 
-	BUG_ON(pdpe > ppgtt->num_pd_pages);
-
 	return 0;
 
 	/* TODO: Check this for all cases */
@@ -793,7 +781,6 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -856,11 +843,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 
-	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
-			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-			 ppgtt->num_pd_entries,
-			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
 
 bail:
@@ -871,26 +853,20 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
-	gen6_gtt_pte_t __iomem *pd_addr;
+	struct i915_page_table_entry *unused;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
-	int pte, pde;
+	uint32_t  pte, pde, temp;
+	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
-	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd.pd_offset,
-		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
-	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
-		pd_entry = readl(pd_addr + pde);
+		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
 		if (pd_entry != expected)
@@ -1163,12 +1139,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 
 static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_page_table_entry *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i]->daddr,
-			       4096, PCI_DMA_BIDIRECTIONAL);
+	gen6_for_all_pdes(pt, ppgtt, pde) {
+		if (pt != ppgtt->scratch_pt)
+			pci_unmap_page(ppgtt->base.dev->pdev,
+				pt->daddr,
+				4096, PCI_DMA_BIDIRECTIONAL);
+	}
 }
 
 /* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
@@ -1300,12 +1279,12 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_page_table_entry *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[i];
+	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+			unmap_and_free_pt(pt, ppgtt->base.dev);
 	}
 
 	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
@@ -1364,7 +1343,6 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
 
 err_out:
@@ -1415,7 +1393,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 
 	if (aliasing) {
 		/* preallocate all pts */
-		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+		ret = alloc_pt_range(&ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES,
 				ppgtt->base.dev);
 
 		if (ret) {
@@ -1431,7 +1409,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd.pd_offset =
@@ -1739,7 +1717,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		if (i915_is_ggtt(vm))
 			ppgtt = dev_priv->mm.aliasing_ppgtt;
 
-		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES);
 	}
 
 	i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 40ac457..4a3371a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -302,8 +302,6 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 	struct drm_mm_node node;
 	unsigned long pd_dirty_rings;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct i915_page_directory_pointer_entry pdp;
 		struct i915_page_directory_entry pd;
@@ -341,6 +339,11 @@ struct i915_hw_ppgtt {
 	     temp = min(temp, (unsigned)length), \
 	     start += temp, length -= temp)
 
+#define gen6_for_all_pdes(pt, ppgtt, iter)  \
+	for (iter = 0, pt = ppgtt->pd.page_tables[iter];			\
+	     iter < gen6_pde_index(ppgtt->base.total);			\
+	     pt =  ppgtt->pd.page_tables[++iter])
+
 static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
 {
 	const uint32_t mask = NUM_PTE(pde_shift) - 1;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 21/25] drm/i915: Extract PPGTT param from page_directory alloc
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (19 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 20/25] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 22/25] drm/i915/bdw: Split out mappings Michel Thierry
                     ` (3 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Now that we don't need to trace num_pd_pages, we may as well kill all
need for the PPGTT structure in the page directory allocators. This is
very useful for when we move to 48b addressing, where the PDP isn't the
root of the page table structure.

The param is replaced with drm_device, which is an unavoidable wart
throughout the series (in other words, not an especially flagrant one).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3050648..69c5e21 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -675,8 +675,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 				     uint64_t start,
 				     uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(pdp, struct i915_hw_ppgtt, pdp);
 	struct i915_page_directory_entry *unused;
 	uint64_t temp;
 	uint32_t pdpe;
@@ -687,7 +685,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->page_directory[pdpe] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
+		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
 
@@ -695,7 +693,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 unwind_out:
 	while (pdpe--)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
+		unmap_and_free_pd(pdp->page_directory[pdpe]);
 
 	return -ENOMEM;
 }
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 22/25] drm/i915/bdw: Split out mappings
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (20 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 21/25] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 23/25] drm/i915/bdw: begin bitmap tracking Michel Thierry
                     ` (2 subsequent siblings)
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
In particular, DMA mappings for page directories/tables occur at allocation
time.

This patch adds the functionality and calls it at init, which should
have no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.

v2: Handle renamed unmap_and_free_page functions.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 183 ++++++++++++++----------------------
 1 file changed, 71 insertions(+), 112 deletions(-)
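
The interesting new piece is gen8_map_pagetable_range(), which maps the
directory page once and then writes one PDE per page table in the range. A
userspace sketch of that loop follows; PDE_VALID and the addresses are
purely illustrative (the real encoding comes from gen8_pde_encode()), the
shift value is assumed (GEN8_PDE_SHIFT == 21), and the range is assumed to
stay within a single page directory:

#include <stdint.h>
#include <stdio.h>

#define GEN8_PDE_SHIFT      21	/* 2MB per page table (assumed) */
#define GEN8_PDES_PER_PAGE 512
#define PDE_VALID           0x1ULL	/* illustrative flag only */

/* Write one PDE per page table covered by [start, start + length). */
static void map_range(uint64_t *pd_page, const uint64_t *pt_daddr,
		      uint64_t start, uint64_t length)
{
	uint32_t pde;

	for (pde = start >> GEN8_PDE_SHIFT;
	     pde <= (start + length - 1) >> GEN8_PDE_SHIFT; pde++)
		pd_page[pde] = pt_daddr[pde] | PDE_VALID;
}

int main(void)
{
	uint64_t pd_page[GEN8_PDES_PER_PAGE] = { 0 };
	uint64_t pt_daddr[GEN8_PDES_PER_PAGE] = { [0] = 0x10000, [1] = 0x11000 };

	map_range(pd_page, pt_daddr, 0, 4ULL << 20);	/* 4MB -> PDEs 0 and 1 */
	printf("pde0=%#llx pde1=%#llx\n",
	       (unsigned long long)pd_page[0], (unsigned long long)pd_page[1]);
	return 0;
}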

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 69c5e21..3ce9f83 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -420,17 +420,20 @@ err_out:
 	return ret;
 }
 
-static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
+static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
+			       struct drm_device *dev)
 {
 	if (pd->page) {
+		i915_dma_unmap_single(pd, dev);
 		__free_page(pd->page);
 		kfree(pd);
 	}
 }
 
-static struct i915_page_directory_entry *alloc_pd_single(void)
+static struct i915_page_directory_entry *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_page_directory_entry *pd;
+	int ret;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
@@ -442,6 +445,13 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = i915_dma_map_px_single(pd, dev);
+	if (ret) {
+		__free_page(pd->page);
+		kfree(pd);
+		return ERR_PTR(ret);
+	}
+
 	return pd;
 }
 
@@ -581,6 +591,36 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+			     struct i915_page_table_entry *pt,
+			     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t entry =
+		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+	*pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t *page_directory = kmap_atomic(pd->page);
+	struct i915_page_table_entry *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		__gen8_do_map_pt(page_directory + pde, pt, dev);
+
+	if (!HAS_LLC(dev))
+		drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+	kunmap_atomic(page_directory);
+}
+
 static void gen8_teardown_va_range(struct i915_address_space *vm,
 				   uint64_t start, uint64_t length)
 {
@@ -598,7 +638,7 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			unmap_and_free_pt(pt, vm->dev);
 			pd->page_tables[pde] = NULL;
 		}
-		unmap_and_free_pd(pd);
+		unmap_and_free_pd(pd, vm->dev);
 		ppgtt->pdp.page_directory[pdpe] = NULL;
 	}
 }
@@ -630,9 +670,6 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	trace_i915_va_teardown(&ppgtt->base,
-			       ppgtt->base.start, ppgtt->base.total,
-			       VM_TO_TRACE_NAME(&ppgtt->base));
 	gen8_teardown_va_range(&ppgtt->base,
 			       ppgtt->base.start, ppgtt->base.total);
 }
@@ -673,7 +710,8 @@ unwind_out:
 
 static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
-				     uint64_t length)
+				     uint64_t length,
+				     struct drm_device *dev)
 {
 	struct i915_page_directory_entry *unused;
 	uint64_t temp;
@@ -684,7 +722,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
-		pdp->page_directory[pdpe] = alloc_pd_single();
+		pdp->page_directory[pdpe] = alloc_pd_single(dev);
 		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
@@ -693,21 +731,25 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 unwind_out:
 	while (pdpe--)
-		unmap_and_free_pd(pdp->page_directory[pdpe]);
+		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
 
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    uint64_t start,
-			    uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start,
+			       uint64_t length)
 {
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_entry *pd;
+	const uint64_t orig_start = start;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
+					ppgtt->base.dev);
 	if (ret)
 		return ret;
 
@@ -720,133 +762,50 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	return 0;
 
-	/* TODO: Check this for all cases */
 err_out:
-	gen8_ppgtt_free(ppgtt);
+	gen8_teardown_va_range(vm, orig_start, start);
 	return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pd)
-{
-	dma_addr_t pd_addr;
-	int ret;
-
-	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.page_directory[pd]->page, 0,
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
-	if (ret)
-		return ret;
-
-	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
-
-	return 0;
-}
-
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pd,
-					const int pt)
-{
-	dma_addr_t pt_addr;
-	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
-	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
-	struct page *p = ptab->page;
-	int ret;
-
-	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	ptab->daddr = pt_addr;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
  * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
  * space.
  *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	int i, j, ret;
-
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+	struct i915_page_directory_entry *pd;
+	uint64_t temp, start = 0;
+	const uint64_t orig_length = size;
+	uint32_t pdpe;
+	int ret;
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->switch_mm = gen8_mm_switch;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
-	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	/*
-	 * 2. Create DMA mappings for the page directories and page tables.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-		if (ret)
-			goto bail;
+	start = 0;
+	size = orig_length;
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
-			if (ret)
-				goto bail;
-		}
-	}
-
-	/*
-	 * 3. Map all the page directory entires to point to the page tables
-	 * we've allocated.
-	 *
-	 * For now, the PPGTT helper functions all require that the PDEs are
-	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
-		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
-						      I915_CACHE_LLC);
-		}
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
-		kunmap_atomic(pd_vaddr);
-	}
-
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
 	return 0;
-
-bail:
-	gen8_ppgtt_unmap_pages(ppgtt);
-	gen8_ppgtt_free(ppgtt);
-	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -1286,7 +1245,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	}
 
 	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
-	unmap_and_free_pd(&ppgtt->pd);
+	unmap_and_free_pd(&ppgtt->pd, ppgtt->base.dev);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 23/25] drm/i915/bdw: begin bitmap tracking
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (21 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 22/25] drm/i915/bdw: Split out mappings Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 24/25] drm/i915/bdw: Dynamic page table allocations Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 25/25] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Like with gen6/7, we can enable bitmap tracking with all the
preallocations to make sure things actually don't blow up.

v2: Rebased to match changes from previous patches.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 121 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  24 +++++++
 2 files changed, 108 insertions(+), 37 deletions(-)
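
The bookkeeping added here boils down to per-structure bitmaps plus a count
of how many PTE slots of a page table a given range dirties. A standalone
sketch of that count is below, assuming 512 PTEs of 4KB per page table
(GEN8_PDE_SHIFT == 21); the helpers only mirror gen8_pte_index() and
gen8_pte_count(), they are not the driver code:

#include <stdint.h>
#include <stdio.h>

#define GEN8_PDE_SHIFT      21	/* 2MB per page table (assumed) */
#define GEN8_PTES_PER_PAGE 512

static uint32_t pte_index(uint64_t addr)
{
	return (addr >> 12) & (GEN8_PTES_PER_PAGE - 1);
}

/* How many PTE slots of one page table does [start, start + length) use,
 * clamped so the count never spills into the next page table. */
static uint32_t pte_count(uint64_t start, uint64_t length)
{
	uint64_t next_pt = (start >> GEN8_PDE_SHIFT << GEN8_PDE_SHIFT) +
			   (1ULL << GEN8_PDE_SHIFT);
	uint64_t end = start + length;

	if (end > next_pt)
		end = next_pt;
	return pte_index(end - 1) - pte_index(start) + 1;
}

int main(void)
{
	/* 3MB starting at 1MB dirties PTEs 256..511 of the first page table,
	 * i.e. a bitmap_set(pt->used_ptes, 256, 256) style update. */
	printf("first pte %u, count %u\n",
	       pte_index(1ULL << 20), pte_count(1ULL << 20, 3ULL << 20));
	return 0;
}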

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3ce9f83..82b72a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -424,8 +424,12 @@ static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
 			       struct drm_device *dev)
 {
 	if (pd->page) {
+		WARN(!bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE),
+				"Free page directory with %d used pages\n",
+				bitmap_weight(pd->used_pdes, GEN8_PDES_PER_PAGE));
 		i915_dma_unmap_single(pd, dev);
 		__free_page(pd->page);
+		kfree(pd->used_pdes);
 		kfree(pd);
 	}
 }
@@ -433,26 +437,35 @@ static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
 static struct i915_page_directory_entry *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_page_directory_entry *pd;
-	int ret;
+	int ret = -ENOMEM;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
+	pd->used_pdes = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				sizeof(*pd->used_pdes), GFP_KERNEL);
+	if (!pd->used_pdes)
+		goto free_pd;
+
 	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pd->page) {
-		kfree(pd);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pd->page)
+		goto free_bitmap;
 
 	ret = i915_dma_map_px_single(pd, dev);
-	if (ret) {
-		__free_page(pd->page);
-		kfree(pd);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto free_page;
 
 	return pd;
+
+free_page:
+	__free_page(pd->page);
+free_bitmap:
+	kfree(pd->used_pdes);
+free_pd:
+	kfree(pd);
+
+	return ERR_PTR(ret);
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -634,36 +647,47 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
-		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
-			unmap_and_free_pt(pt, vm->dev);
-			pd->page_tables[pde] = NULL;
-		}
-		unmap_and_free_pd(pd, vm->dev);
-		ppgtt->pdp.page_directory[pdpe] = NULL;
-	}
-}
 
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
-	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
-	int i, j;
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		/* TODO: In the future we'll support sparse mappings, so this
-		 * will have to change. */
-		if (!ppgtt->pdp.page_directory[i]->daddr)
+		/* Page directories might not be present since the macro rounds
+		 * down, and up.
+		 */
+		if (!pd) {
+			WARN(test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			     "PDPE %d is not allocated, but is reserved (%p)\n",
+			     pdpe, vm);
 			continue;
+		} else {
+			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
+			     "PDPE %d not reserved, but is allocated (%p)",
+			     pdpe, vm);
+		}
 
-		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
-				PCI_DMA_BIDIRECTIONAL);
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			if (!pt) {
+				WARN(test_bit(pde, pd->used_pdes),
+				     "PDE %d is not allocated, but is reserved (%p)\n",
+				     pde, vm);
+				continue;
+			} else
+				WARN(!test_bit(pde, pd->used_pdes),
+				     "PDE %d not reserved, but is allocated (%p)",
+				     pde, vm);
+
+			bitmap_clear(pt->used_ptes,
+				     gen8_pte_index(pd_start),
+				     gen8_pte_count(pd_start, pd_len));
+
+			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PAGE)) {
+				unmap_and_free_pt(pt, vm->dev);
+				pd->page_tables[pde] = NULL;
+				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+			}
+		}
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
-			struct i915_page_table_entry *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			if (addr)
-				pci_unmap_page(hwdev, addr, PAGE_SIZE,
-						PCI_DMA_BIDIRECTIONAL);
+		if (bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE)) {
+			unmap_and_free_pd(pd, vm->dev);
+			ppgtt->pdp.page_directory[pdpe] = NULL;
+			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
 	}
 }
@@ -679,7 +703,6 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	gen8_ppgtt_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 }
 
@@ -708,6 +731,7 @@ unwind_out:
 	return -ENOMEM;
 }
 
+/* bitmap of new page_directories */
 static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
@@ -723,6 +747,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->page_directory[pdpe] = alloc_pd_single(dev);
+
 		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
@@ -744,10 +769,12 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_entry *pd;
 	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
+	/* Do the allocations first so we can easily bail out */
 	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
 					ppgtt->base.dev);
 	if (ret)
@@ -760,6 +787,26 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			goto err_out;
 	}
 
+	/* Now mark everything we've touched as used. This doesn't allow for
+	 * robust error checking, but it makes the code a hell of a lot simpler.
+	 */
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		struct i915_page_table_entry *pt;
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		uint32_t pde;
+		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
+			bitmap_set(pd->page_tables[pde]->used_ptes,
+				   gen8_pte_index(start),
+				   gen8_pte_count(start, length));
+			set_bit(pde, pd->used_pdes);
+		}
+		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+	}
+
 	return 0;
 
 err_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 4a3371a..c755617 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -205,11 +205,13 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
+	unsigned long *used_pdes;
 	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
 	/* struct page *page; */
+	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
 };
 
@@ -447,6 +449,28 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
 	BUG(); /* For 64B */
 }
 
+static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
+{
+	return i915_pte_count(addr, length, GEN8_PDE_SHIFT);
+}
+
+static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
+{
+	const uint32_t pdp_shift = GEN8_PDE_SHIFT + 9;
+	const uint64_t mask = ~((1 << pdp_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return GEN8_PDES_PER_PAGE - i915_pde_index(addr, GEN8_PDE_SHIFT);
+
+	return i915_pde_index(end, GEN8_PDE_SHIFT) - i915_pde_index(addr, GEN8_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v3 24/25] drm/i915/bdw: Dynamic page table allocations
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (22 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 23/25] drm/i915/bdw: begin bitmap tracking Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  2015-01-13 11:52   ` [PATCH v3 25/25] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This finishes off the dynamic page table allocations, in the legacy
3-level style that already exists. Almost everything has already been set
up to this point; the patch finishes off the enabling by setting the
appropriate function pointers.

Zombie tracking:
This could be a separate patch, but I found it helpful for debugging.
Since we write page tables asynchronously with respect to the GPU using
them, we can't actually free the page tables until we know the GPU won't
use them. With this patch, that is always when the context dies.

Scratch unused pages:
The object pages can get freed even if a page table still points to
them.  Like the zombie fix, we need to make sure we don't let our GPU
access arbitrary memory when we've unmapped things.

v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.

v3: Rebase.

v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 375 +++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  15 ++
 2 files changed, 325 insertions(+), 65 deletions(-)
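
The zombie idea described above can be summarised with a small userspace
model: unmapping a range of a live context only marks the page table, and
the actual free is deferred until the context, and therefore any GPU use of
the table, is known to be dead. The structure and names below are
illustrative only:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct pt {
	bool zombie;
};

static void teardown(struct pt **slot, bool dead)
{
	struct pt *pt = *slot;

	if (!pt)
		return;
	if (!dead) {
		pt->zombie = true;	/* the GPU may still walk this table */
		return;
	}
	free(pt);
	*slot = NULL;
}

int main(void)
{
	struct pt *pt = calloc(1, sizeof(*pt));

	if (!pt)
		return 1;
	teardown(&pt, false);	/* unbind while the context is still alive */
	printf("zombie=%d\n", pt->zombie);
	teardown(&pt, true);	/* context died: now it is safe to free */
	printf("freed=%d\n", pt == NULL);
	return 0;
}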

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 82b72a1..4f6e758 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -604,7 +604,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 			     struct i915_page_table_entry *pt,
 			     struct drm_device *dev)
 {
@@ -621,7 +621,7 @@ static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
 				     uint64_t length,
 				     struct drm_device *dev)
 {
-	gen8_ppgtt_pde_t *page_directory = kmap_atomic(pd->page);
+	gen8_ppgtt_pde_t * const page_directory = kmap_atomic(pd->page);
 	struct i915_page_table_entry *pt;
 	uint64_t temp, pde;
 
@@ -634,8 +634,9 @@ static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
 	kunmap_atomic(page_directory);
 }
 
-static void gen8_teardown_va_range(struct i915_address_space *vm,
-				   uint64_t start, uint64_t length)
+static void __gen8_teardown_va_range(struct i915_address_space *vm,
+				     uint64_t start, uint64_t length,
+				     bool dead)
 {
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
@@ -657,6 +658,13 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 			     pdpe, vm);
 			continue;
 		} else {
+			if (dead && pd->zombie) {
+				WARN_ON(test_bit(pdpe, ppgtt->pdp.used_pdpes));
+				unmap_and_free_pd(pd, vm->dev);
+				ppgtt->pdp.page_directory[pdpe] = NULL;
+				continue;
+			}
+
 			WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
 			     "PDPE %d not reserved, but is allocated (%p)",
 			     pdpe, vm);
@@ -668,34 +676,64 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
 				     "PDE %d is not allocated, but is reserved (%p)\n",
 				     pde, vm);
 				continue;
-			} else
+			} else {
+				if (dead && pt->zombie) {
+					WARN_ON(test_bit(pde, pd->used_pdes));
+					unmap_and_free_pt(pt, vm->dev);
+					pd->page_tables[pde] = NULL;
+					continue;
+				}
 				WARN(!test_bit(pde, pd->used_pdes),
 				     "PDE %d not reserved, but is allocated (%p)",
 				     pde, vm);
+			}
 
 			bitmap_clear(pt->used_ptes,
 				     gen8_pte_index(pd_start),
 				     gen8_pte_count(pd_start, pd_len));
 
 			if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PAGE)) {
+				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+				if (!dead) {
+					pt->zombie = 1;
+					continue;
+				}
 				unmap_and_free_pt(pt, vm->dev);
 				pd->page_tables[pde] = NULL;
-				WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+
 			}
 		}
 
+		gen8_ppgtt_clear_range(vm, pd_start, pd_len, true);
+
 		if (bitmap_empty(pd->used_pdes, GEN8_PDES_PER_PAGE)) {
+			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
+			if (!dead) {
+				/* We've unmapped a possibly live context. Make
+				 * note of it so we can clean it up later. */
+				pd->zombie = 1;
+				continue;
+			}
 			unmap_and_free_pd(pd, vm->dev);
 			ppgtt->pdp.page_directory[pdpe] = NULL;
-			WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
 		}
 	}
 }
 
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	__gen8_teardown_va_range(vm, start, length, false);
+}
+
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	gen8_teardown_va_range(&ppgtt->base,
-			       ppgtt->base.start, ppgtt->base.total);
+	trace_i915_va_teardown(&ppgtt->base,
+			       ppgtt->base.start, ppgtt->base.total,
+			       VM_TO_TRACE_NAME(&ppgtt->base));
+	__gen8_teardown_va_range(&ppgtt->base,
+				 ppgtt->base.start, ppgtt->base.total,
+				 true);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -706,67 +744,177 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_alloc_pagetabs(struct i915_page_directory_entry *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pd:		Page directory for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pts:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_page_directories(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_page_directories(), it is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory_entry *pd,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pts)
 {
-	struct i915_page_table_entry *unused;
+	struct i915_page_table_entry *pt;
 	uint64_t temp;
 	uint32_t pde;
 
-	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
-		BUG_ON(unused);
-		pd->page_tables[pde] = alloc_pt_single(dev);
-		if (IS_ERR(pd->page_tables[pde]))
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		/* Don't reallocate page tables */
+		if (pt) {
+			/* Scratch is never allocated this way */
+			WARN_ON(pt->scratch);
+			/* If there is a zombie, we can reuse it and save time
+			 * on the allocation. If we clear the zombie status and
+			 * the caller somehow fails, we'll probably hit some
+			 * assertions, so it's up to them to fix up the bitmaps.
+			 */
+			continue;
+		}
+
+		pt = alloc_pt_single(ppgtt->base.dev);
+		if (IS_ERR(pt))
 			goto unwind_out;
+
+		pd->page_tables[pde] = pt;
+		set_bit(pde, new_pts);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pde--)
-		unmap_and_free_pt(pd->page_tables[pde], dev);
+	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
+		unmap_and_free_pt(pd->page_tables[pde], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
-/* bitmap of new page_directories */
-static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
+/**
+ * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pdp:	Page directory pointer for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length:	Size of the allocations.
+ * @new_pds:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pde index of
+ * @start, and ending at the pde index @start + @length. This function will skip
+ * over already allocated page directories within the range, and only allocate
+ * new ones, setting the appropriate pointer within the pdp as well as the
+ * correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a given page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller. This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pds)
 {
-	struct i915_page_directory_entry *unused;
+	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 
+	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
 
-	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
-		BUG_ON(unused);
-		pdp->page_directory[pdpe] = alloc_pd_single(dev);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		if (pd)
+			continue;
 
-		if (IS_ERR(pdp->page_directory[pdpe]))
+		pd = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(pd))
 			goto unwind_out;
+
+		pdp->page_directory[pdpe] = pd;
+		set_bit(pdpe, new_pds);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pdpe--)
-		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
+	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+		unmap_and_free_pd(pdp->page_directory[pdpe], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
+static inline void
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+{
+	int i;
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+		kfree(new_pts[i]);
+	kfree(new_pts);
+	kfree(new_pds);
+}
+
+/* Fills in the page directory bitmap, and the array of page table bitmaps. Both
+ * of these are based on the number of PDPEs in the system.
+ */
+int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
+					 unsigned long ***new_pts)
+{
+	int i;
+	unsigned long *pds;
+	unsigned long **pts;
+
+	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	if (!pds)
+		return -ENOMEM;
+
+	pts = kcalloc(GEN8_PDES_PER_PAGE, sizeof(unsigned long *), GFP_KERNEL);
+	if (!pts) {
+		kfree(pds);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				 sizeof(unsigned long), GFP_KERNEL);
+		if (!pts[i])
+			goto err_out;
+	}
+
+	*new_pds = pds;
+	*new_pts = (unsigned long **)pts;
+
+	return 0;
+
+err_out:
+	free_gen8_temp_bitmaps(pds, pts);
+	return -ENOMEM;
+}
+
 static int gen8_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start,
 			       uint64_t length)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	unsigned long *new_page_dirs, **new_page_tables;
 	struct i915_page_directory_entry *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -774,43 +922,103 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	uint32_t pdpe;
 	int ret;
 
-	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
-					ppgtt->base.dev);
+#ifndef CONFIG_64BIT
+	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
+	 * this in hardware, but a lot of the drm code is not prepared to handle
+	 * 64b offset on 32b platforms. */
+	if (start + length > 0x100000000ULL)
+		return -E2BIG;
+#endif
+
+	/* Wrap is never okay since we can only represent 48b, and we don't
+	 * actually use the other side of the canonical address space.
+	 */
+	if (WARN_ON(start + length < start))
+		return -ERANGE;
+
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
 	if (ret)
 		return ret;
 
+	/* Do the allocations first so we can easily bail out */
+	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
+					new_page_dirs);
+	if (ret) {
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		return ret;
+	}
+
+	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
-						ppgtt->base.dev);
+		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
+		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
 	}
 
-	/* Now mark everything we've touched as used. This doesn't allow for
-	 * robust error checking, but it makes the code a hell of a lot simpler.
-	 */
 	start = orig_start;
 	length = orig_length;
 
+	/* Allocations have completed successfully, so set the bitmaps, and do
+	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		gen8_ppgtt_pde_t *const page_directory = kmap_atomic(pd->page);
 		struct i915_page_table_entry *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
 		uint32_t pde;
-		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
-			bitmap_set(pd->page_tables[pde]->used_ptes,
-				   gen8_pte_index(start),
-				   gen8_pte_count(start, length));
+
+		/* Every pd should be allocated, we just did that above. */
+		BUG_ON(!pd);
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			/* Same reasoning as pd */
+			BUG_ON(!pt);
+			BUG_ON(!pd_len);
+			BUG_ON(!gen8_pte_count(pd_start, pd_len));
+
+			/* Set our used ptes within the page table */
+			bitmap_set(pt->used_ptes,
+				   gen8_pte_index(pd_start),
+				   gen8_pte_count(pd_start, pd_len));
+
+			/* Our pde is now pointing to the pagetable, pt */
 			set_bit(pde, pd->used_pdes);
+
+			/* Map the PDE to the page table */
+			__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
+
+			/* NB: We haven't yet mapped ptes to pages. At this
+			 * point we're still relying on insert_entries() */
+
+			/* No longer possible this page table is a zombie */
+			pt->zombie = 0;
 		}
+
+		if (!HAS_LLC(vm->dev))
+			drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+		kunmap_atomic(page_directory);
+
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		/* This pd is officially not a zombie either */
+		ppgtt->pdp.page_directory[pdpe]->zombie = 0;
 	}
 
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return 0;
 
 err_out:
-	gen8_teardown_va_range(vm, orig_start, start);
+	while (pdpe--) {
+		for_each_set_bit(temp, new_page_tables[pdpe], GEN8_PDES_PER_PAGE)
+			unmap_and_free_pt(pd->page_tables[temp], vm->dev);
+	}
+
+	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return ret;
 }
 
@@ -821,37 +1029,68 @@ err_out:
  * space.
  *
  */
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	struct i915_page_directory_entry *pd;
-	uint64_t temp, start = 0;
-	const uint64_t orig_length = size;
-	uint32_t pdpe;
-	int ret;
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pd))
-		return PTR_ERR(ppgtt->scratch_pd);
+	return 0;
+}
+
+static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_page_directory_entry *pd;
+	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
+	uint32_t pdpe;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
 
+	/* Aliasing PPGTT has to always work and be mapped because of the way we
+	 * use RESTORE_INHIBIT in the context switch. This will be fixed
+	 * eventually. */
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	start = 0;
-	size = orig_length;
-
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
+	ppgtt->base.allocate_va_range = NULL;
+	ppgtt->base.teardown_va_range = NULL;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+
+	return 0;
+}
+
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen8_teardown_va_range;
+	ppgtt->base.clear_range = NULL;
+
 	return 0;
 }
 
@@ -1407,9 +1646,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 		}
 	}
 
-	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
-	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
-	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
+	ppgtt->base.allocate_va_range = aliasing ? NULL : gen6_alloc_va_range;
+	ppgtt->base.teardown_va_range = aliasing ? NULL : gen6_teardown_va_range;
+	ppgtt->base.clear_range = aliasing ? gen6_ppgtt_clear_range : NULL;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
@@ -1447,8 +1686,10 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt, aliasing);
+	else if (aliasing)
+		return gen8_aliasing_ppgtt_init(ppgtt);
 	else
-		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
+		return gen8_ppgtt_init(ppgtt);
 }
 int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 {
@@ -1460,8 +1701,9 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
 			    ppgtt->base.total);
-		ppgtt->base.clear_range(&ppgtt->base, 0,
-			    ppgtt->base.total, true);
+		if (ppgtt->base.clear_range)
+			ppgtt->base.clear_range(&ppgtt->base, 0,
+				ppgtt->base.total, true);
 		i915_init_vm(dev_priv, &ppgtt->base);
 	}
 
@@ -1559,10 +1801,8 @@ ppgtt_bind_vma(struct i915_vma *vma,
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm,
-			     vma->node.start,
-			     vma->obj->base.size,
-			     true);
+	WARN_ON(vma->vm->teardown_va_range && vma->vm->clear_range);
+	WARN_ON(!vma->vm->teardown_va_range && !vma->vm->clear_range);
 	if (vma->vm->teardown_va_range) {
 		trace_i915_va_teardown(vma->vm,
 				       vma->node.start, vma->node.size,
@@ -1570,6 +1810,11 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
+	} else if (vma->vm->clear_range) {
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     vma->obj->base.size,
+				     true);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c755617..3481871 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -190,12 +190,26 @@ struct i915_vma {
 			 u32 flags);
 };
 
+/* Zombies. We write page tables with the CPU, and hardware switches them with
+ * the GPU. As such, the only time we can safely remove a page table is when we
+ * know the context is idle. Since we have no good way to do this, we use the
+ * zombie.
+ *
+ * Under memory pressure, if the system is idle, zombies may be reaped.
+ *
+ * There are 3 states a page table can be in (not including scratch)
+ *  bitmap = 0, zombie = 0: unallocated
+ *  bitmap = 1, zombie = 0: allocated
+ *  bitmap = 0, zombie = 1: zombie
+ *  bitmap = 1, zombie = 1: invalid
+ */
 struct i915_page_table_entry {
 	struct page *page;
 	dma_addr_t daddr;
 
 	unsigned long *used_ptes;
 	unsigned int scratch:1;
+	unsigned zombie:1;
 };
 
 struct i915_page_directory_entry {
@@ -207,6 +221,7 @@ struct i915_page_directory_entry {
 
 	unsigned long *used_pdes;
 	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
+	unsigned zombie:1;
 };
 
 struct i915_page_directory_pointer_entry {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
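
As an aside, the bitmap/zombie combinations documented in the i915_gem_gtt.h
hunk above form a small state machine. The stand-alone sketch below only
restates that comment in code; the names are invented for illustration and are
not part of the patch:

enum pt_state {
	PT_UNALLOCATED,	/* bitmap = 0, zombie = 0 */
	PT_ALLOCATED,	/* bitmap = 1, zombie = 0 */
	PT_ZOMBIE,	/* bitmap = 0, zombie = 1: unmapped, hw may still walk it */
	PT_INVALID,	/* bitmap = 1, zombie = 1: must never happen */
};

static enum pt_state pt_state_of(int used_bit, int zombie)
{
	if (used_bit)
		return zombie ? PT_INVALID : PT_ALLOCATED;
	return zombie ? PT_ZOMBIE : PT_UNALLOCATED;
}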

* [PATCH v3 25/25] drm/i915/bdw: Support dynamic pdp updates in lrc mode
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
                     ` (23 preceding siblings ...)
  2015-01-13 11:52   ` [PATCH v3 24/25] drm/i915/bdw: Dynamic page table allocations Michel Thierry
@ 2015-01-13 11:52   ` Michel Thierry
  24 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:52 UTC (permalink / raw)
  To: intel-gfx

Logical ring contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet.

Check if PDPs have been allocated and use the scratch page if they do
not exist yet.

Before submission, update the PDPs in the logical ring context once the
PDPs have been allocated.

v2: Renamed commit title (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 80 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index efaaebe..109ec59 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -358,6 +358,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 
 static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 				    struct drm_i915_gem_object *ring_obj,
+				    struct i915_hw_ppgtt *ppgtt,
 				    u32 tail)
 {
 	struct page *page;
@@ -369,6 +370,40 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
+	/* True PPGTT with dynamic page allocation: update PDP registers and
+	 * point the unallocated PDPs to the scratch page
+	 */
+	if (ppgtt) {
+		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		} else {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		} else {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		} else {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		} else {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+	}
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -387,7 +422,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, to0->ppgtt, tail0);
 
 	if (to1) {
 		ringbuf1 = to1->engine[ring->id].ringbuf;
@@ -396,7 +431,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
 		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
 
-		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
+		execlists_update_context(ctx_obj1, ringbuf1->obj, to1->ppgtt, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -1731,14 +1766,39 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+
+	/* With dynamic page allocation, PDPs may not be allocated at this point.
+	 * Point the unallocated PDPs to the scratch page.
+	 */
+	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	} else {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	} else {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	} else {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	} else {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
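
The four per-PDP blocks in execlists_update_context() and
populate_lr_context() above repeat the same test_bit()/scratch fallback. A
possible way to express it once is sketched below; the helper name is made up
here, and the sketch only assumes the fields the patch itself touches
(pdp.used_pdpes, pdp.page_directory[n]->daddr, scratch_pd->daddr) plus the
existing CTX_PDPn_UDW/LDW offsets:

static void set_pdp_or_scratch(u32 *reg_state, int udw, int ldw,
			       struct i915_hw_ppgtt *ppgtt, int n)
{
	dma_addr_t daddr = test_bit(n, ppgtt->pdp.used_pdpes) ?
			   ppgtt->pdp.page_directory[n]->daddr :
			   ppgtt->scratch_pd->daddr;

	reg_state[udw + 1] = upper_32_bits(daddr);
	reg_state[ldw + 1] = lower_32_bits(daddr);
}

	/* e.g. in execlists_update_context(): */
	set_pdp_or_scratch(reg_state, CTX_PDP3_UDW, CTX_PDP3_LDW, ppgtt, 3);
	set_pdp_or_scratch(reg_state, CTX_PDP2_UDW, CTX_PDP2_LDW, ppgtt, 2);
	set_pdp_or_scratch(reg_state, CTX_PDP1_UDW, CTX_PDP1_LDW, ppgtt, 1);
	set_pdp_or_scratch(reg_state, CTX_PDP0_UDW, CTX_PDP0_LDW, ppgtt, 0);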

* Re: [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation
  2015-01-05 14:45     ` Daniel Vetter
@ 2015-01-13 11:53       ` Michel Thierry
  2015-01-13 22:09         ` Daniel Vetter
  0 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-01-13 11:53 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


On 1/5/2015 2:45 PM, Daniel Vetter wrote:
> On Tue, Dec 23, 2014 at 05:16:17PM +0000, Michel Thierry wrote:
>> From: Ben Widawsky <benjamin.widawsky@intel.com>
>>
>> This patch continues on the idea from the previous patch. From here on,
>> in the steady state, PDEs are all pointing to the scratch page table (as
>> recommended in the spec). When an object is allocated in the VA range,
>> the code will determine if we need to allocate a page for the page
>> table. Similarly when the object is destroyed, we will remove, and free
>> the page table pointing the PDE back to the scratch page.
>>
>> Following patches will work to unify the code a bit as we bring in GEN8
>> support. GEN6 and GEN8 are different enough that I had a hard time to
>> get to this point with as much common code as I do.
>>
>> The aliasing PPGTT must pre-allocate all of the page tables. There are a
>> few reasons for this. Two trivial ones: aliasing ppgtt goes through the
>> ggtt paths, so it's hard to maintain, we currently do not restore the
>> default context (assuming the previous force reload is indeed
>> necessary). Most importantly though, the only way (it seems from
>> empirical evidence) to invalidate the CS TLBs on non-render ring is to
>> either use ring sync (which requires actually stopping the rings in
>> order to synchronize when the sync completes vs. where you are in
>> execution), or to reload DCLV.  Since without full PPGTT we do not ever
>> reload the DCLV register, there is no good way to achieve this. The
>> simplest solution is just to not support dynamic page table
>> creation/destruction in the aliasing PPGTT.
>>
>> We could always reload DCLV, but this seems like quite a bit of excess
>> overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
>> page tables.
>>
>> v2: Make the page table bitmap declared inside the function (Chris)
>> Simplify the way scratching address space works.
>> Move the alloc/teardown tracepoints up a level in the call stack so that
>> both all implementations get the trace.
>>
>> v3: Updated trace event to spit out a name
>>
>> v4: Aliasing ppgtt is now initialized differently (in setup global gtt)
>>
>> v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check for
>> trace, as it is no longer possible after the PPGTT cleanup patch series
>> of a couple of months ago (Daniel).
>>
>> Cc: Daniel Vetter <daniel@ffwll.ch>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)
> The tracepoints should be split into a separate patch. Although the
> teardown stuff will likely disappear I guess ...
>
> Two more comments below.
> -Daniel
>
>> ---
>>   drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
>>   drivers/gpu/drm/i915/i915_gem.c     |   2 +
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 128 ++++++++++++++++++++++++++++++++----
>>   drivers/gpu/drm/i915/i915_trace.h   | 115 ++++++++++++++++++++++++++++++++
>>   4 files changed, 236 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 60f91bc..0f63076 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>>   		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
>>   		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
>>   	}
>> +	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
>> +
>>   	if (dev_priv->mm.aliasing_ppgtt) {
>>   		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
>>   
>> @@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>>   			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
>>   		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
>>   	}
>> -	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
>>   }
>>   
>>   static int i915_ppgtt_info(struct seq_file *m, void *data)
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 5d52990..1649fb2 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -3599,6 +3599,8 @@ search_free:
>>   
>>   	/*  allocate before insert / bind */
>>   	if (vma->vm->allocate_va_range) {
>> +		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
>> +				VM_TO_TRACE_NAME(vma->vm));
>>   		ret = vma->vm->allocate_va_range(vma->vm,
>>   						vma->node.start,
>>   						vma->node.size);
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 54c7ca7..32a355a 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -1138,10 +1138,47 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>>   static int gen6_alloc_va_range(struct i915_address_space *vm,
>>   			       uint64_t start, uint64_t length)
>>   {
>> +	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
>> +	struct drm_device *dev = vm->dev;
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>>   	struct i915_hw_ppgtt *ppgtt =
>>   				container_of(vm, struct i915_hw_ppgtt, base);
>>   	struct i915_pagetab *pt;
>> +	const uint32_t start_save = start, length_save = length;
>>   	uint32_t pde, temp;
>> +	int ret;
>> +
>> +	BUG_ON(upper_32_bits(start));
>> +
>> +	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
>> +
>> +	/* The allocation is done in two stages so that we can bail out with
>> +	 * minimal amount of pain. The first stage finds new page tables that
>> +	 * need allocation. The second stage marks used ptes within the page
>> +	 * tables.
>> +	 */
> If we drop the bitmask tracking we could massively simplify this -
> checking just the various pt pointers should be enough?
>
>> +	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
>> +		if (pt != ppgtt->scratch_pt) {
>> +			WARN_ON(bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
>> +			continue;
>> +		}
>> +
>> +		/* We've already allocated a page table */
>> +		WARN_ON(!bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
>> +
>> +		pt = alloc_pt_single(dev);
>> +		if (IS_ERR(pt)) {
>> +			ret = PTR_ERR(pt);
>> +			goto unwind_out;
>> +		}
>> +
>> +		ppgtt->pd.page_tables[pde] = pt;
>> +		set_bit(pde, new_page_tables);
>> +		trace_i915_pagetable_alloc(vm, pde, start, GEN6_PDE_SHIFT);
>> +	}
>> +
>> +	start = start_save;
>> +	length = length_save;
>>   
>>   	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
>>   		int j;
>> @@ -1159,12 +1196,35 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>>   			}
>>   		}
>>   
>> -		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
>> +		if (test_and_clear_bit(pde, new_page_tables))
>> +			gen6_write_pdes(&ppgtt->pd, pde, pt);
>> +
>> +		trace_i915_pagetable_map(vm, pde, pt,
>> +					 gen6_pte_index(start),
>> +					 gen6_pte_count(start, length),
>> +					 I915_PPGTT_PT_ENTRIES);
>> +		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
>>   				I915_PPGTT_PT_ENTRIES);
>>   	}
>>   
>> +	WARN_ON(!bitmap_empty(new_page_tables, GEN6_PPGTT_PD_ENTRIES));
>> +
>> +	/* Make sure write is complete before other code can use this page
>> +	 * table. Also required for WC mapped PTEs */
>> +	readl(dev_priv->gtt.gsm);
>> +
>>   	ppgtt_invalidate_tlbs(vm);
>>   	return 0;
>> +
>> +unwind_out:
>> +	for_each_set_bit(pde, new_page_tables, GEN6_PPGTT_PD_ENTRIES) {
>> +		struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];
>> +		ppgtt->pd.page_tables[pde] = NULL;
>> +		free_pt_single(pt, vm->dev);
>> +	}
>> +
>> +	ppgtt_invalidate_tlbs(vm);
>> +	return ret;
>>   }
>>   
>>   static void gen6_teardown_va_range(struct i915_address_space *vm,
>> @@ -1176,8 +1236,27 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
>>   	uint32_t pde, temp;
>>   
>>   	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
>> +
>> +		if (WARN(pt == ppgtt->scratch_pt,
>> +		    "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
>> +		    vm, pde, start, start + length))
>> +			continue;
>> +
>> +		trace_i915_pagetable_unmap(vm, pde, pt,
>> +					   gen6_pte_index(start),
>> +					   gen6_pte_count(start, length),
>> +					   I915_PPGTT_PT_ENTRIES);
>> +
>>   		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
>>   			     gen6_pte_count(start, length));
>> +
>> +		if (bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES)) {
>> +			trace_i915_pagetable_destroy(vm, pde,
>> +						     start & GENMASK_ULL(63, GEN6_PDE_SHIFT),
>> +						     GEN6_PDE_SHIFT);
>> +			gen6_write_pdes(&ppgtt->pd, pde, ppgtt->scratch_pt);
>> +			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
>> +		}
>>   	}
>>   
>>   	ppgtt_invalidate_tlbs(vm);
>> @@ -1187,9 +1266,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>   {
>>   	int i;
>>   
>> -	for (i = 0; i < ppgtt->num_pd_entries; i++)
>> -		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
>> +	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> +		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
>> +		if (pt != ppgtt->scratch_pt)
>> +			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
>> +	}
>>   
>> +	/* Consider putting this as part of pd free. */
>>   	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
>>   	free_pd_single(&ppgtt->pd);
>>   }
>> @@ -1254,7 +1337,7 @@ err_out:
>>   	return ret;
>>   }
>>   
>> -static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>> +static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
> Imo it would be clearer to move the pt preallocation for alising ppgtt
> into the ppgtt_init function. Makes for a bit a bigger diff, but will
> result in less convoluted control flow since we should end up in a nice
>
> if (aliasing)
> 	/* create all pts */
> else
> 	/* allocate&use scratch_pt */
>
> Aside: Should we only allocate the scratch_pt for !aliasing?
The next patch version will have the changes.
About the scratch_pt, I'm not sure if it's a requirement in gen6/7 to
point unused page table entries at the scratch page table (e.g. if less
than 2GB of the address space is in use). We know that's the case in gen8,
where systems with less than 4GB must have the remaining PDPs set to the
scratch page.

>
>>   {
>>   	int ret;
>>   
>> @@ -1262,10 +1345,14 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>>   	if (ret)
>>   		return ret;
>>   
>> +	if (!preallocate_pt)
>> +		return 0;
>> +
>>   	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
>>   			ppgtt->base.dev);
>>   
>>   	if (ret) {
>> +		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
>>   		drm_mm_remove_node(&ppgtt->node);
>>   		return ret;
>>   	}
>> @@ -1273,7 +1360,17 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>>   	return 0;
>>   }
>>   
>> -static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>> +static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
>> +				  uint64_t start, uint64_t length)
>> +{
>> +	struct i915_pagetab *unused;
>> +	uint32_t pde, temp;
>> +
>> +	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
>> +		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
>> +}
>> +
>> +static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
>>   {
>>   	struct drm_device *dev = ppgtt->base.dev;
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>> @@ -1289,7 +1386,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>   	} else
>>   		BUG();
>>   
>> -	ret = gen6_ppgtt_alloc(ppgtt);
>> +	ret = gen6_ppgtt_alloc(ppgtt, aliasing);
>>   	if (ret)
>>   		return ret;
>>   
>> @@ -1308,6 +1405,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>   	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
>>   		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
>>   
>> +	if (!aliasing)
>> +		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
>> +
>>   	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
>>   
>>   	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
>> @@ -1320,7 +1420,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>   	return 0;
>>   }
>>   
>> -static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>> +static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
>> +		bool aliasing)
>>   {
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>   
>> @@ -1328,7 +1429,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>>   	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
>>   
>>   	if (INTEL_INFO(dev)->gen < 8)
>> -		return gen6_ppgtt_init(ppgtt);
>> +		return gen6_ppgtt_init(ppgtt, aliasing);
>>   	else
>>   		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
>>   }
>> @@ -1337,7 +1438,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>   	int ret = 0;
>>   
>> -	ret = __hw_ppgtt_init(dev, ppgtt);
>> +	ret = __hw_ppgtt_init(dev, ppgtt, false);
>>   	if (ret == 0) {
>>   		kref_init(&ppgtt->ref);
>>   		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
>> @@ -1445,9 +1546,14 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
>>   			     vma->node.start,
>>   			     vma->obj->base.size,
>>   			     true);
>> -	if (vma->vm->teardown_va_range)
>> +	if (vma->vm->teardown_va_range) {
>> +		trace_i915_va_teardown(vma->vm,
>> +				       vma->node.start, vma->node.size,
>> +				       VM_TO_TRACE_NAME(vma->vm));
>> +
>>   		vma->vm->teardown_va_range(vma->vm,
>>   					   vma->node.start, vma->node.size);
>> +	}
>>   }
>>   
>>   extern int intel_iommu_gfx_mapped;
>> @@ -1963,7 +2069,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
>>   		if (!ppgtt)
>>   			return -ENOMEM;
>>   
>> -		ret = __hw_ppgtt_init(dev, ppgtt);
>> +		ret = __hw_ppgtt_init(dev, ppgtt, true);
>>   		if (ret != 0)
>>   			return ret;
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
>> index f004d3d..0b617c9 100644
>> --- a/drivers/gpu/drm/i915/i915_trace.h
>> +++ b/drivers/gpu/drm/i915/i915_trace.h
>> @@ -156,6 +156,121 @@ TRACE_EVENT(i915_vma_unbind,
>>   		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
>>   );
>>   
>> +#define VM_TO_TRACE_NAME(vm) \
>> +	(i915_is_ggtt(vm) ? "GGTT" : \
>> +				      "Private VM")
>> +
>> +DECLARE_EVENT_CLASS(i915_va,
>> +	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
>> +	TP_ARGS(vm, start, length, name),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(struct i915_address_space *, vm)
>> +		__field(u64, start)
>> +		__field(u64, end)
>> +		__string(name, name)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->vm = vm;
>> +		__entry->start = start;
>> +		__entry->end = start + length;
>> +		__assign_str(name, name);
>> +	),
>> +
>> +	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
>> +		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
>> +);
>> +
>> +DEFINE_EVENT(i915_va, i915_va_alloc,
>> +	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
>> +	     TP_ARGS(vm, start, length, name)
>> +);
>> +
>> +DEFINE_EVENT(i915_va, i915_va_teardown,
>> +	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
>> +	     TP_ARGS(vm, start, length, name)
>> +);
>> +
>> +DECLARE_EVENT_CLASS(i915_pagetable,
>> +	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
>> +	TP_ARGS(vm, pde, start, pde_shift),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(struct i915_address_space *, vm)
>> +		__field(u32, pde)
>> +		__field(u64, start)
>> +		__field(u64, end)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->vm = vm;
>> +		__entry->pde = pde;
>> +		__entry->start = start;
>> +		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
>> +	),
>> +
>> +	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
>> +		  __entry->vm, __entry->pde, __entry->start, __entry->end)
>> +);
>> +
>> +DEFINE_EVENT(i915_pagetable, i915_pagetable_alloc,
>> +	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
>> +	     TP_ARGS(vm, pde, start, pde_shift)
>> +);
>> +
>> +DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy,
>> +	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
>> +	     TP_ARGS(vm, pde, start, pde_shift)
>> +);
>> +
>> +/* Avoid extra math because we only support two sizes. The format is defined by
>> + * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
>> +#define TRACE_PT_SIZE(bits) \
>> +	((((bits) == 1024) ? 288 : 144) + 1)
>> +
>> +DECLARE_EVENT_CLASS(i915_pagetable_update,
>> +	TP_PROTO(struct i915_address_space *vm, u32 pde,
>> +		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
>> +	TP_ARGS(vm, pde, pt, first, len, bits),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(struct i915_address_space *, vm)
>> +		__field(u32, pde)
>> +		__field(u32, first)
>> +		__field(u32, last)
>> +		__dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->vm = vm;
>> +		__entry->pde = pde;
>> +		__entry->first = first;
>> +		__entry->last = first + len;
>> +
>> +		bitmap_scnprintf(__get_str(cur_ptes),
>> +				 TRACE_PT_SIZE(bits),
>> +				 pt->used_ptes,
>> +				 bits);
>> +	),
>> +
>> +	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
>> +		  __entry->vm, __entry->pde, __entry->last, __entry->first,
>> +		  __get_str(cur_ptes))
>> +);
>> +
>> +DEFINE_EVENT(i915_pagetable_update, i915_pagetable_map,
>> +	TP_PROTO(struct i915_address_space *vm, u32 pde,
>> +		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
>> +	TP_ARGS(vm, pde, pt, first, len, bits)
>> +);
>> +
>> +DEFINE_EVENT(i915_pagetable_update, i915_pagetable_unmap,
>> +	TP_PROTO(struct i915_address_space *vm, u32 pde,
>> +		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
>> +	TP_ARGS(vm, pde, pt, first, len, bits)
>> +);
>> +
>>   TRACE_EVENT(i915_gem_object_change_domain,
>>   	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
>>   	    TP_ARGS(obj, old_read, old_write),
>> -- 
>> 2.1.1
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread
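
The two-stage allocate-then-commit shape discussed above (allocate any missing
page tables first while remembering which ones are new, set the bitmaps and
write the PDEs only once nothing can fail, and on error unwind just the new
allocations) can be shown in isolation. The following is a minimal stand-alone
sketch in plain C; every name in it is invented for illustration and none of it
is driver code:

#include <stdbool.h>
#include <stdlib.h>

#define N_PDES 512

struct page_table { bool used[512]; };

static int alloc_range(struct page_table *pd[N_PDES], int first, int last)
{
	bool is_new[N_PDES] = { false };
	int pde;

	/* Stage 1: allocate anything missing, remembering what is new. */
	for (pde = first; pde <= last; pde++) {
		if (pd[pde])
			continue;	/* never reallocate an existing table */
		pd[pde] = calloc(1, sizeof(*pd[pde]));
		if (!pd[pde])
			goto unwind;
		is_new[pde] = true;
	}

	/* Stage 2: nothing can fail any more; commit the bookkeeping
	 * (stand-in for bitmap_set()/gen6_write_pdes() in the patch). */
	for (pde = first; pde <= last; pde++)
		pd[pde]->used[0] = true;

	return 0;

unwind:
	/* Only undo what this call created; pre-existing tables survive. */
	for (pde = first; pde <= last; pde++) {
		if (is_new[pde]) {
			free(pd[pde]);
			pd[pde] = NULL;
		}
	}
	return -1;
}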

* Re: [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation
  2015-01-13 11:53       ` Michel Thierry
@ 2015-01-13 22:09         ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-01-13 22:09 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Jan 13, 2015 at 11:53:22AM +0000, Michel Thierry wrote:
> On 1/5/2015 2:45 PM, Daniel Vetter wrote:
> >Aside: Should we only allocate the scratch_pt for !aliasing?
> The next patch version will have the changes.
> About the scratch_pt, I'm not sure if it's a requirement in gen6/7 (to point
> unused page tables to the scratch, e.g. if there's less than 2GB).
> We know in gen8 that's the case, and systems with less than 4GB must have
> the remaining PDPs set to scratch page.

Assuming I understand things correctly we need the scratch_pt for
replacing the intermediate levels to collapse the tree a bit. Which is
useful both on gen7 and gen8+.

My question was to move the allocation into the full-ppgtt code (i.e.
i915.enable_ppgtt=2, not =1) since we don't really need this for aliasing
ppgtt mode. That should help a bit with readability since it makes it
clearer which things are used in which modes of the driver.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread
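
The split requested above -- pre-allocate every page table for aliasing PPGTT,
allocate only a scratch page table (and point every PDE at it) for full PPGTT --
boils down to one branch in gen6 init. The snippet below is only a sketch of
that control flow reusing helper names from the quoted patches (alloc_pt_range(),
alloc_pt_scratch(), gen6_scratch_va_range()); their exact signatures here are
assumed, and this is not the eventual implementation:

	if (aliasing) {
		/* Aliasing PPGTT: every page table exists up front. */
		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
				     ppgtt->base.dev);
		if (ret)
			return ret;
	} else {
		/* Full PPGTT: allocate only the scratch page table and
		 * point every PDE at it; real page tables arrive later
		 * via allocate_va_range(). */
		ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
		if (IS_ERR(ppgtt->scratch_pt))
			return PTR_ERR(ppgtt->scratch_pt);
		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
	}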

* [PATCH v4 00/24] PPGTT dynamic page allocations
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (26 preceding siblings ...)
  2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
@ 2015-01-22 17:01 ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 01/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
                     ` (23 more replies)
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
  29 siblings, 24 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

This patchset continues addressing the comments from v2. In particular, it no
longer tears down page tables dynamically; teardown now happens only when the
vm is freed.

For GEN8, it has also been extended to work in logical ring submission (lrc)
mode, as it will be the preferred mode of operation.
I also tried to update the lrc code at the same time the ppgtt refactoring
occurred, leaving only one patch that is exclusively for lrc.

This list can be seen in 3 parts:
[01-08] Include code rework for PPGTT (all GENs).
[09-14] Adds page table allocation for GEN6/GEN7
[15-24] Enables dynamic allocation in GEN8, for both legacy and
execlist submission modes.

Ben Widawsky (21):
  drm/i915/trace: Fix offsets for 64b
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Setup less PPGTT on failed page_directory
  drm/i915/gen8: Un-hardcode number of page directories
  drm/i915: page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip and pd load logic
  drm/i915: Track page table reload need
  drm/i915: Initialize all contexts
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: page directories rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from page_directory alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations

Michel Thierry (3):
  drm/i915: Plumb drm_device through page tables operations
  drm/i915: Add dynamic page trace events
  drm/i915/bdw: Support dynamic pdp updates in lrc mode

 drivers/gpu/drm/i915/i915_debugfs.c        |    7 +-
 drivers/gpu/drm/i915/i915_gem.c            |   11 +
 drivers/gpu/drm/i915/i915_gem_context.c    |   64 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1064 ++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  191 ++++-
 drivers/gpu/drm/i915/i915_trace.h          |  107 ++-
 drivers/gpu/drm/i915/intel_lrc.c           |   80 ++-
 8 files changed, 1183 insertions(+), 352 deletions(-)

-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v4 01/24] drm/i915/trace: Fix offsets for 64b
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-27 12:16     ` Mika Kuoppala
  2015-01-22 17:01   ` [PATCH v4 02/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
                     ` (22 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_trace.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6058a01..f004d3d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -115,7 +115,7 @@ TRACE_EVENT(i915_vma_bind,
 	    TP_STRUCT__entry(
 			     __field(struct drm_i915_gem_object *, obj)
 			     __field(struct i915_address_space *, vm)
-			     __field(u32, offset)
+			     __field(u64, offset)
 			     __field(u32, size)
 			     __field(unsigned, flags)
 			     ),
@@ -128,7 +128,7 @@ TRACE_EVENT(i915_vma_bind,
 			   __entry->flags = flags;
 			   ),
 
-	    TP_printk("obj=%p, offset=%08x size=%x%s vm=%p",
+	    TP_printk("obj=%p, offset=%016llx size=%x%s vm=%p",
 		      __entry->obj, __entry->offset, __entry->size,
 		      __entry->flags & PIN_MAPPABLE ? ", mappable" : "",
 		      __entry->vm)
@@ -141,7 +141,7 @@ TRACE_EVENT(i915_vma_unbind,
 	    TP_STRUCT__entry(
 			     __field(struct drm_i915_gem_object *, obj)
 			     __field(struct i915_address_space *, vm)
-			     __field(u32, offset)
+			     __field(u64, offset)
 			     __field(u32, size)
 			     ),
 
@@ -152,7 +152,7 @@ TRACE_EVENT(i915_vma_unbind,
 			   __entry->size = vma->node.size;
 			   ),
 
-	    TP_printk("obj=%p, offset=%08x size=%x vm=%p",
+	    TP_printk("obj=%p, offset=%016llx size=%x vm=%p",
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 02/24] drm/i915: Rename to GEN8_LEGACY_PDPES
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 01/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-02-06 15:32     ` Mika Kuoppala
  2015-01-22 17:01   ` [PATCH v4 03/24] drm/i915: Setup less PPGTT on failed page_directory Michel Thierry
                     ` (21 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
one, but it resembles having one). The #define was confusing as is, and
using "PDPE" is a much better description.

sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]

It also matches the x86 pagetable terminology:
PTE  = Page Table Entry - pagetable level 1 page
PDE  = Page Directory Entry - pagetable level 2 page
PDPE = Page Directory Pointer Entry - pagetable level 3 page

And in the near future (for 48b addressing):
PML4E = Page Map Level 4 Entry

v2: Expanded information about Page Directory/Table nomenclature.

Cc: Daniel Vetter <daniel@ffwll.ch>
CC: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 746f77f..58d54bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -375,7 +375,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
+		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
 		if (pt_vaddr == NULL)
@@ -486,7 +486,7 @@ bail:
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPS];
+	struct page **pt_pages[GEN8_LEGACY_PDPES];
 	int i, ret;
 
 	for (i = 0; i < max_pdp; i++) {
@@ -537,7 +537,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 		return -ENOMEM;
 
 	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e377c7d..9d998ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,7 +88,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
 #define GEN8_PTE_MASK			0x1ff
-#define GEN8_LEGACY_PDPS		4
+#define GEN8_LEGACY_PDPES		4
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
@@ -273,12 +273,12 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
 	};
 	struct page *pd_pages;
 	union {
 		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 03/24] drm/i915: Setup less PPGTT on failed page_directory
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 01/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 02/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-02-09 15:21     ` Mika Kuoppala
  2015-01-22 17:01   ` [PATCH v4 04/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
                     ` (20 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The current code will both potentially print a WARN and set up part of
the PPGTT structure. Neither of these harms the current code; the change
is simply for clarity, and to perhaps prevent later bugs or weird
debug messages.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 58d54bd..b48b586 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1032,11 +1032,14 @@ alloc:
 		goto alloc;
 	}
 
+	if (ret)
+		return ret;
+
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-	return ret;
+	return 0;
 }
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 04/24] drm/i915/gen8: Un-hardcode number of page directories
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (2 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 03/24] drm/i915: Setup less PPGTT on failed page_directory Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-02-09 15:30     ` Mika Kuoppala
  2015-01-22 17:01   ` [PATCH v4 05/24] drm/i915: page table abstractions Michel Thierry
                     ` (19 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9d998ec..8f76990 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -282,7 +282,7 @@ struct i915_hw_ppgtt {
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[4];
+		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
 
 	struct drm_i915_file_private *file_priv;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 05/24] drm/i915: page table abstractions
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (3 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 04/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-02-18 11:27     ` Mika Kuoppala
  2015-01-22 17:01   ` [PATCH v4 06/24] drm/i915: Complete page table structures Michel Thierry
                     ` (18 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we move to dynamic page allocation, keeping page_directory and
page_tables as separate structures will help to break actions into simpler
tasks.

To help transition the code nicely there is some wasted space in gen6/7.
This will be ameliorated shortly.

Following the x86 pagetable terminology:
PDPE = struct i915_page_directory_pointer_entry.
PDE = struct i915_page_directory_entry [page_directory].
PTE = struct i915_page_table_entry [page_tables].
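
As an aside (not part of the patch), here is a minimal sketch of how a GPU
virtual address selects an entry at each of the three levels above, assuming
the GEN8 legacy layout of 4 PDPEs, 512 PDEs per page directory and 512 PTEs
per page table; the helper name is hypothetical:

static inline void sketch_gen8_va_decode(u64 addr, unsigned int *pdpe,
					 unsigned int *pde, unsigned int *pte)
{
	*pte  = (addr >> 12) & 0x1ff; /* slot within one page table (4K page) */
	*pde  = (addr >> 21) & 0x1ff; /* index into pd->page_tables[] */
	*pdpe = (addr >> 30) & 0x3;   /* index into pdp.page_directory[] */
}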

v2: fixed mismatches after clean-up/rebase.

v3: Clarify the names of the multiple levels of page tables (Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 177 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
 2 files changed, 107 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b48b586..98b4698 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -334,7 +334,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+		struct page *page_table = pd->page_tables[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -378,8 +379,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+			struct page *page_table = pd->page_tables[pde].page;
+
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -403,29 +408,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_tables == NULL)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_tables[i].page)
+			__free_page(pd->page_tables[i].page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_directories(struct i915_page_directory_entry *pd)
+{
+	kfree(pd->page_tables);
+	__free_page(pd->page);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
+		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -460,86 +469,75 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPES];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
+
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_table_entry *pt;
 
-	return 0;
-}
+		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pdp.page_directory[i].page)
+			goto unwind_out;
+
+		ppgtt->pdp.page_directory[i].page_tables = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.page_directory[i].page_tables);
+		__free_page(ppgtt->pdp.page_directory[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -551,18 +549,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (ret)
-		gen8_ppgtt_free(ppgtt);
+	if (!ret)
+		return ret;
 
+	/* TODO: Check this for all cases */
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -573,7 +572,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       ppgtt->pdp.page_directory[pd].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -593,7 +592,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -654,7 +653,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -717,7 +716,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -922,7 +921,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -951,7 +950,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -986,8 +985,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		__free_page(ppgtt->pd.page_tables[i].page);
+	kfree(ppgtt->pd.page_tables);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1044,22 +1043,22 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_page_table_entry *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL);
+		if (!pt->page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
 	}
 
+	ppgtt->pd.page_tables = pt;
 	return 0;
 }
 
@@ -1094,9 +1093,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_tables[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1140,7 +1141,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8f76990..d9bc375 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -187,6 +187,20 @@ struct i915_vma {
 			 u32 flags);
 };
 
+struct i915_page_table_entry {
+	struct page *page;
+};
+
+struct i915_page_directory_entry {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_page_table_entry *page_tables;
+};
+
+struct i915_page_directory_pointer_entry {
+	/* struct page *page; */
+	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+};
+
 struct i915_address_space {
 	struct drm_mm mm;
 	struct drm_device *dev;
@@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
@@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
+	union {
+		struct i915_page_directory_pointer_entry pdp;
+		struct i915_page_directory_entry pd;
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 06/24] drm/i915: Complete page table structures
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (4 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 05/24] drm/i915: page table abstractions Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 07/24] drm/i915: Create page table allocators Michel Thierry
                     ` (17 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as for that patch. I simply felt it is easier to review if split.

v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
v3: Rebase.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 85 +++++++++++++------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++---
 drivers/gpu/drm/i915/intel_lrc.c    | 16 +++----
 4 files changed, 45 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e515aad..60f91bc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2153,7 +2153,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 98b4698..0fe5c1e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -307,7 +307,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pd_dma_addr[i];
+		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -433,7 +433,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
 		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
 }
 
@@ -445,14 +444,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pd_dma_addr[i])
+		if (!ppgtt->pdp.page_directory[i].daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -469,32 +468,19 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
-
-	return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
+			struct i915_page_table_entry *pt = &pd->page_tables[j];
 
 			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 			if (!pt->page)
 				goto unwind_out;
+
 		}
 	}
 
@@ -555,9 +541,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
-	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (!ret)
-		return ret;
+	return 0;
 
 	/* TODO: Check this for all cases */
 err_out:
@@ -579,7 +563,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pd] = pd_addr;
+	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
 
 	return 0;
 }
@@ -589,17 +573,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct page *p;
+	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
+	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
+	struct page *p = ptab->page;
 	int ret;
 
-	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+	ptab->daddr = pt_addr;
 
 	return 0;
 }
@@ -655,7 +640,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -698,14 +683,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
 	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+		   ppgtt->pd.pd_offset,
+		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -749,13 +735,13 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	uint32_t pd_entry;
 	int i;
 
-	WARN_ON(ppgtt->pd_offset & 0x3f);
+	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pt_dma_addr[i];
+		pt_addr = ppgtt->pd.page_tables[i].daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -766,9 +752,9 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
 
-	return (ppgtt->pd_offset / 64) << 16;
+	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -971,19 +957,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	if (ppgtt->pt_dma_addr) {
-		for (i = 0; i < ppgtt->num_pd_entries; i++)
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd.page_tables[i].daddr,
+			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		__free_page(ppgtt->pd.page_tables[i].page);
 	kfree(ppgtt->pd.page_tables);
@@ -1076,14 +1059,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
-				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr) {
-		drm_mm_remove_node(&ppgtt->node);
-		gen6_ppgtt_free(ppgtt);
-		return -ENOMEM;
-	}
-
 	return 0;
 }
 
@@ -1105,7 +1080,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pt_dma_addr[i] = pt_addr;
+		ppgtt->pd.page_tables[i].daddr = pt_addr;
 	}
 
 	return 0;
@@ -1144,7 +1119,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd_offset =
+	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
@@ -1155,7 +1130,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
-		  ppgtt->pd_offset << 10);
+		  ppgtt->pd.pd_offset << 10);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d9bc375..6efeb18 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -189,10 +189,16 @@ struct i915_vma {
 
 struct i915_page_table_entry {
 	struct page *page;
+	dma_addr_t daddr;
 };
 
 struct i915_page_directory_entry {
 	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
 	struct i915_page_table_entry *page_tables;
 };
 
@@ -286,14 +292,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
 		struct i915_page_directory_pointer_entry pdp;
 		struct i915_page_directory_entry pd;
 	};
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a68f180..a784d1d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 07/24] drm/i915: Create page table allocators
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (5 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 06/24] drm/i915: Complete page table structures Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-02-20 16:50     ` Mika Kuoppala
  2015-01-22 17:01   ` [PATCH v4 08/24] drm/i915: Plumb drm_device through page tables operations Michel Thierry
                     ` (16 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely by
breaking up all of our actions into individual tasks.  This makes the
code easier to write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain non-trivial complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables are no different on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make it work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions makes the code easier to review and
understand. All allocations and frees now take place in just a couple of
locations. Reviewing and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
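
A hedged usage sketch of how the new allocators compose follows;
sketch_alloc_pd_with_pts() is hypothetical and assumes it lives in
i915_gem_gtt.c next to the static helpers added below. Since
alloc_pt_range() cleans up after itself on failure, only the page
directory needs unwinding here:

static struct i915_page_directory_entry *
sketch_alloc_pd_with_pts(size_t count)
{
	struct i915_page_directory_entry *pd;
	int ret;

	pd = alloc_pd_single();
	if (IS_ERR(pd))
		return pd;

	/* Populate the first 'count' PDEs; partial allocations are
	 * already unwound inside alloc_pt_range() on error. */
	ret = alloc_pt_range(pd, 0, count);
	if (ret) {
		unmap_and_free_pd(pd);
		return ERR_PTR(ret);
	}

	return pd;
}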

v2: Updated commit message to explain why this patch exists

v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/

v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)

v5: Added additional safety checks in gen8 clear/free/unmap.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3, v4, v5)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 251 ++++++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
 3 files changed, 179 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0fe5c1e..85ea535 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -275,6 +275,99 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static void unmap_and_free_pt(struct i915_page_table_entry *pt)
+{
+	if (WARN_ON(!pt->page))
+		return;
+	__free_page(pt->page);
+	kfree(pt);
+}
+
+static struct i915_page_table_entry *alloc_pt_single(void)
+{
+	struct i915_page_table_entry *pt;
+
+	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+	if (!pt)
+		return ERR_PTR(-ENOMEM);
+
+	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pt->page) {
+		kfree(pt);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate a multiple page tables
+ * @pd:		The page directory which will have at least @count entries
+ *		available to point to the allocated page tables.
+ * @pde:	First page directory entry for which we are allocating.
+ * @count:	Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
+{
+	int i, ret;
+
+	/* 512 is the max page tables per page_directory on any platform.
+	 * TODO: make WARN after patch series is done
+	 */
+	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
+
+	for (i = pde; i < pde + count; i++) {
+		struct i915_page_table_entry *pt = alloc_pt_single();
+
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto err_out;
+		}
+		WARN(pd->page_tables[i],
+		     "Leaking page directory entry %d (%pa)\n",
+		     i, pd->page_tables[i]);
+		pd->page_tables[i] = pt;
+	}
+
+	return 0;
+
+err_out:
+	while (i--)
+		unmap_and_free_pt(pd->page_tables[i]);
+	return ret;
+}
+
+static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
+{
+	if (pd->page) {
+		__free_page(pd->page);
+		kfree(pd);
+	}
+}
+
+static struct i915_page_directory_entry *alloc_pd_single(void)
+{
+	struct i915_page_directory_entry *pd;
+
+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pd->page) {
+		kfree(pd);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 			   uint64_t val)
@@ -307,7 +400,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
+		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -334,8 +427,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
-		struct page *page_table = pd->page_tables[pde].page;
+		struct i915_page_directory_entry *pd;
+		struct i915_page_table_entry *pt;
+		struct page *page_table;
+
+		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+			continue;
+
+		pd = ppgtt->pdp.page_directory[pdpe];
+
+		if (WARN_ON(!pd->page_tables[pde]))
+			continue;
+
+		pt = pd->page_tables[pde];
+
+		if (WARN_ON(!pt->page))
+			continue;
+
+		page_table = pt->page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -380,8 +489,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
-			struct page *page_table = pd->page_tables[pde].page;
+			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
+			struct i915_page_table_entry *pt = pd->page_tables[pde];
+			struct page *page_table = pt->page;
 
 			pt_vaddr = kmap_atomic(page_table);
 		}
@@ -412,18 +522,16 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pd->page_tables == NULL)
+	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pd->page_tables[i].page)
-			__free_page(pd->page_tables[i].page);
-}
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		if (WARN_ON(!pd->page_tables[i]))
+			continue;
 
-static void gen8_free_page_directories(struct i915_page_directory_entry *pd)
-{
-	kfree(pd->page_tables);
-	__free_page(pd->page);
+		unmap_and_free_pt(pd->page_tables[i]);
+		pd->page_tables[i] = NULL;
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -431,8 +539,11 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
-		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+			continue;
+
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 	}
 }
 
@@ -444,14 +555,23 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pdp.page_directory[i].daddr)
+		if (!ppgtt->pdp.page_directory[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
+			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+			struct i915_page_table_entry *pt;
+			dma_addr_t addr;
+
+			if (WARN_ON(!pd->page_tables[j]))
+				continue;
+
+			pt = pd->page_tables[j];
+			addr = pt->daddr;
+
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -470,25 +590,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	int i, j;
+	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = &pd->page_tables[j];
-
-			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-			if (!pt->page)
-				goto unwind_out;
-
-		}
+		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
+				     0, GEN8_PDES_PER_PAGE);
+		if (ret)
+			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
 
 	return -ENOMEM;
 }
@@ -499,17 +614,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_page_table_entry *pt;
-
-		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
-		if (!pt)
+		ppgtt->pdp.page_directory[i] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.page_directory[i]))
 			goto unwind_out;
-
-		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pdp.page_directory[i].page)
-			goto unwind_out;
-
-		ppgtt->pdp.page_directory[i].page_tables = pt;
 	}
 
 	ppgtt->num_pd_pages = max_pdp;
@@ -518,10 +625,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	while (i--) {
-		kfree(ppgtt->pdp.page_directory[i].page_tables);
-		__free_page(ppgtt->pdp.page_directory[i].page);
-	}
+	while (i--)
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 
 	return -ENOMEM;
 }
@@ -556,14 +661,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.page_directory[pd].page, 0,
+			       ppgtt->pdp.page_directory[pd]->page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
+	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
 
 	return 0;
 }
@@ -573,8 +678,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
-	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
+	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
+	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
 	struct page *p = ptab->page;
 	int ret;
 
@@ -637,10 +742,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
+			struct i915_page_table_entry *pt = pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -691,7 +798,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -702,7 +809,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -741,7 +848,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pd.page_tables[i].daddr;
+		pt_addr = ppgtt->pd.page_tables[i]->daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -907,7 +1014,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -936,7 +1043,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -959,7 +1066,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i].daddr,
+			       ppgtt->pd.page_tables[i]->daddr,
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
@@ -968,8 +1075,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pd.page_tables[i].page);
-	kfree(ppgtt->pd.page_tables);
+		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
+
+	unmap_and_free_pd(&ppgtt->pd);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1024,27 +1132,6 @@ alloc:
 	return 0;
 }
 
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct i915_page_table_entry *pt;
-	int i;
-
-	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
-	if (!pt)
-		return -ENOMEM;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt[i].page = alloc_page(GFP_KERNEL);
-		if (!pt->page) {
-			gen6_ppgtt_free(ppgtt);
-			return -ENOMEM;
-		}
-	}
-
-	ppgtt->pd.page_tables = pt;
-	return 0;
-}
-
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 {
 	int ret;
@@ -1053,7 +1140,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1071,7 +1158,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 		struct page *page;
 		dma_addr_t pt_addr;
 
-		page = ppgtt->pd.page_tables[i].page;
+		page = ppgtt->pd.page_tables[i]->page;
 		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
@@ -1080,7 +1167,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pd.page_tables[i].daddr = pt_addr;
+		ppgtt->pd.page_tables[i]->daddr = pt_addr;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 6efeb18..e8cad72 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -199,12 +199,12 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
-	struct i915_page_table_entry *page_tables;
+	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
 	/* struct page *page; */
-	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
 };
 
 struct i915_address_space {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a784d1d..efaaebe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 08/24] drm/i915: Plumb drm_device through page tables operations
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (6 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 07/24] drm/i915: Create page table allocators Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 09/24] drm/i915: Track GEN6 page table usage Michel Thierry
                     ` (15 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

The next patch in the series will require it for alloc_pt_single.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 85ea535..e2bcd10 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -138,7 +138,6 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 		return has_aliasing_ppgtt ? 1 : 0;
 }
 
-
 static void ppgtt_bind_vma(struct i915_vma *vma,
 			   enum i915_cache_level cache_level,
 			   u32 flags);
@@ -275,7 +274,7 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void unmap_and_free_pt(struct i915_page_table_entry *pt)
+static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
@@ -283,7 +282,7 @@ static void unmap_and_free_pt(struct i915_page_table_entry *pt)
 	kfree(pt);
 }
 
-static struct i915_page_table_entry *alloc_pt_single(void)
+static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_page_table_entry *pt;
 
@@ -313,7 +312,9 @@ static struct i915_page_table_entry *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count,
+		  struct drm_device *dev)
+
 {
 	int i, ret;
 
@@ -323,7 +324,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
 	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
 
 	for (i = pde; i < pde + count; i++) {
-		struct i915_page_table_entry *pt = alloc_pt_single();
+		struct i915_page_table_entry *pt = alloc_pt_single(dev);
 
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
@@ -339,7 +340,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
 
 err_out:
 	while (i--)
-		unmap_and_free_pt(pd->page_tables[i]);
+		unmap_and_free_pt(pd->page_tables[i], dev);
 	return ret;
 }
 
@@ -518,7 +519,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
 
@@ -529,7 +530,7 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 		if (WARN_ON(!pd->page_tables[i]))
 			continue;
 
-		unmap_and_free_pt(pd->page_tables[i]);
+		unmap_and_free_pt(pd->page_tables[i], dev);
 		pd->page_tables[i] = NULL;
 	}
 }
@@ -542,7 +543,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 	}
 }
@@ -594,7 +595,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
-				     0, GEN8_PDES_PER_PAGE);
+				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
 	}
@@ -603,7 +604,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -1075,7 +1076,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
+		unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
 	unmap_and_free_pd(&ppgtt->pd);
 }
@@ -1140,7 +1141,9 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			ppgtt->base.dev);
+
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 09/24] drm/i915: Track GEN6 page table usage
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (7 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 08/24] drm/i915: Plumb drm_device through page tables operations Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-02-20 16:41     ` Mika Kuoppala
  2015-01-22 17:01   ` [PATCH v4 10/24] drm/i915: Extract context switch skip and pd load logic Michel Thierry
                     ` (14 subsequent siblings)
  23 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
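
As a minimal sketch of what that tracking amounts to (not part of the patch;
sketch_mark_ptes() is a hypothetical helper and assumes a page-aligned range
confined to a single page table), the PTEs covered by a VA range are simply
set in that table's used_ptes bitmap:

static void sketch_mark_ptes(struct i915_page_table_entry *pt,
			     uint64_t start, uint64_t length)
{
	/* first PTE index touched within this page table */
	unsigned int first = (start >> PAGE_SHIFT) & (I915_PPGTT_PT_ENTRIES - 1);
	/* number of PTEs covered, clamped to the end of the table */
	unsigned int count = min_t(uint64_t, length >> PAGE_SHIFT,
				   I915_PPGTT_PT_ENTRIES - first);

	bitmap_set(pt->used_ptes, first, count);
}

gen6_alloc_va_range() in the diff below does the equivalent per PDE via
gen6_for_each_pde(), using gen6_pte_index() and gen6_pte_count().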

With the current patch there isn't much in the way of making a
gen-agnostic range allocation function. However, in the next patch we'll
add more specificity, which makes having separate functions a bit easier
to manage.

One important change introduced here is that DMA mappings are
created/destroyed at the same time the page directories/tables are
allocated/deallocated.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.page_directory/pdp.page_directorys
Make a scratch page allocation helper

v3: Rebase and expand commit message.

v4: Allocate required page tables only when they are needed, in
_bind_to_vm instead of bind_vma (Daniel).

v5: Rebased to remove the unnecessary noise in the diff, also:
 - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
 - Removed unnecessary checks in gen6_alloc_va_range.
 - Changed map/unmap_px_single macros to use dma functions directly and
   be part of a static inline function instead.
 - Moved drm_device plumbing through page tables operation to its own
   patch.
 - Moved allocate/teardown_va_range calls until they are fully
   implemented (in subsequent patch).
 - Merged pt and scratch_pt unmap_and_free path.
 - Moved scratch page allocator helper to the patch that will use it.

v6: Reduce complexity by not tearing down page tables dynamically; the
same can be achieved while freeing empty VMs. (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 191 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  75 ++++++++++++++
 2 files changed, 206 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e2bcd10..760585e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -274,29 +274,88 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
+#define i915_dma_unmap_single(px, dev) \
+	__i915_dma_unmap_single((px)->daddr, dev)
+
+static inline void __i915_dma_unmap_single(dma_addr_t daddr,
+					struct drm_device *dev)
+{
+	struct device *device = &dev->pdev->dev;
+
+	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
+}
+
+/**
+ * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
+ * @px:		Page table/dir/etc to get a DMA map for
+ * @dev:	drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
+ *
+ * Return: 0 if success.
+ */
+#define i915_dma_map_px_single(px, dev) \
+	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
+
+static inline int i915_dma_map_page_single(struct page *page,
+					   struct drm_device *dev,
+					   dma_addr_t *daddr)
+{
+	struct device *device = &dev->pdev->dev;
+
+	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
+	return dma_mapping_error(device, *daddr);
+}
+
+static void unmap_and_free_pt(struct i915_page_table_entry *pt,
+			       struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
+
+	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
+	kfree(pt->used_ptes);
 	kfree(pt);
 }
 
 static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_page_table_entry *pt;
+	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+	int ret = -ENOMEM;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
+	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+				GFP_KERNEL);
+
+	if (!pt->used_ptes)
+		goto fail_bitmap;
+
 	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pt->page) {
-		kfree(pt);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pt->page)
+		goto fail_page;
+
+	ret = i915_dma_map_px_single(pt, dev);
+	if (ret)
+		goto fail_dma;
 
 	return pt;
+
+fail_dma:
+	__free_page(pt->page);
+fail_page:
+	kfree(pt->used_ptes);
+fail_bitmap:
+	kfree(pt);
+
+	return ERR_PTR(ret);
 }
 
 /**
@@ -836,26 +895,36 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+/* Write pde (index) from the page directory @pd to the page table @pt */
+static void gen6_write_pdes(struct i915_page_directory_entry *pd,
+			    const int pde, struct i915_page_table_entry *pt)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	gen6_gtt_pte_t __iomem *pd_addr;
-	uint32_t pd_entry;
-	int i;
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pd, struct i915_hw_ppgtt, pd);
+	u32 pd_entry;
 
-	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		dma_addr_t pt_addr;
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
+	pd_entry |= GEN6_PDE_VALID;
 
-		pt_addr = ppgtt->pd.page_tables[i]->daddr;
-		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-		pd_entry |= GEN6_PDE_VALID;
+	writel(pd_entry, ppgtt->pd_addr + pde);
 
-		writel(pd_entry, pd_addr + i);
-	}
-	readl(pd_addr);
+	/* XXX: Caller needs to make sure the write completes if necessary */
+}
+
+/* Write all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_write_page_range(struct drm_i915_private *dev_priv,
+				struct i915_page_directory_entry *pd, uint32_t start, uint32_t length)
+{
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, pd, start, length, temp, pde)
+		gen6_write_pdes(pd, pde, pt);
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also required for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1071,6 +1140,28 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+
+		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_set(tmp_bitmap, gen6_pte_index(start),
+			   gen6_pte_count(start, length));
+
+		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+				I915_PPGTT_PT_ENTRIES);
+	}
+
+	return 0;
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
@@ -1117,20 +1208,24 @@ alloc:
 					       0, dev_priv->gtt.base.total,
 					       0);
 		if (ret)
-			return ret;
+			goto err_out;
 
 		retried = true;
 		goto alloc;
 	}
 
 	if (ret)
-		return ret;
+		goto err_out;
+
 
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
+
+err_out:
+	return ret;
 }
 
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1152,30 +1247,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct page *page;
-		dma_addr_t pt_addr;
-
-		page = ppgtt->pd.page_tables[i]->page;
-		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-				       PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_unmap_pages(ppgtt);
-			return -EIO;
-		}
-
-		ppgtt->pd.page_tables[i]->daddr = pt_addr;
-	}
-
-	return 0;
-}
-
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
@@ -1196,12 +1267,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_setup_page_tables(ppgtt);
-	if (ret) {
-		gen6_ppgtt_free(ppgtt);
-		return ret;
-	}
-
+	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1212,13 +1278,17 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
+	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
+
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
 
-	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
 		  ppgtt->pd.pd_offset << 10);
 
@@ -1491,13 +1561,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 
 	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
 		/* TODO: Perhaps it shouldn't be gen6 specific */
-		if (i915_is_ggtt(vm)) {
-			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
-			continue;
-		}
 
-		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+		struct i915_hw_ppgtt *ppgtt =
+			container_of(vm, struct i915_hw_ppgtt, base);
+
+		if (i915_is_ggtt(vm))
+			ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 	}
 
 	i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e8cad72..1b15fc9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PPGTT_PD_ENTRIES		512
 #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
 #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
+#define GEN6_PDE_SHIFT			22
 #define GEN6_PDE_VALID			(1 << 0)
+#define I915_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
+#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
 
 #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
 
@@ -190,6 +193,8 @@ struct i915_vma {
 struct i915_page_table_entry {
 	struct page *page;
 	dma_addr_t daddr;
+
+	unsigned long *used_ptes;
 };
 
 struct i915_page_directory_entry {
@@ -246,6 +251,9 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid, u32 flags); /* Create a valid PTE */
+	int (*allocate_va_range)(struct i915_address_space *vm,
+				 uint64_t start,
+				 uint64_t length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -298,12 +306,79 @@ struct i915_hw_ppgtt {
 
 	struct drm_i915_file_private *file_priv;
 
+	gen6_gtt_pte_t __iomem *pd_addr;
+
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
 	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
 			 struct intel_engine_cs *ring);
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
+/* For each pde: iterates over every pde between start and start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
+	     pt = (pd)->page_tables[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+	     temp = min_t(unsigned, temp, length), \
+	     start += temp, length -= temp)
+
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+	const uint32_t mask = NUM_PTE(pde_shift) - 1;
+
+	return (address >> PAGE_SHIFT) & mask;
+}
+
+/* Helper to count the number of PTEs within the given length. This count does
+* not cross a page table boundary, so the max value would be
+* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
+*/
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+					uint32_t pde_shift)
+{
+	const uint64_t mask = ~((1 << pde_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
+
+	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
+
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+	return (addr >> shift) & I915_PDE_MASK;
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+	return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+	return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
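
As a standalone illustration of the PTE math this patch introduces
(i915_pte_index/i915_pte_count and the gen6_for_each_pde style range walk),
here is a small user-space C sketch; the toy_* names and the example range
are made up, only the 4KB-page / 4MB-per-page-table GEN6 layout is taken
from the patch above.

#include <stdint.h>
#include <stdio.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PDE_SHIFT   22
#define TOY_NUM_PTE     (1u << (TOY_PDE_SHIFT - TOY_PAGE_SHIFT))   /* 1024 */

static uint32_t toy_pte_index(uint64_t addr)
{
    return (addr >> TOY_PAGE_SHIFT) & (TOY_NUM_PTE - 1);
}

/* PTEs covered by [addr, addr + length), clamped to a single page table */
static uint32_t toy_pte_count(uint64_t addr, uint64_t length)
{
    const uint64_t mask = ~((1ull << TOY_PDE_SHIFT) - 1);
    const uint64_t end = addr + length;

    if ((addr & mask) != (end & mask))  /* range crosses a PDE boundary */
        return TOY_NUM_PTE - toy_pte_index(addr);
    return toy_pte_index(end) - toy_pte_index(addr);
}

int main(void)
{
    /* walk a 6MB range starting 8KB into PDE 1, one page table at a time */
    uint64_t start = (1ull << TOY_PDE_SHIFT) + 2 * 4096;
    uint64_t length = 6ull << 20;

    while (length > 0) {
        uint32_t count = toy_pte_count(start, length);

        printf("pde=%u first_pte=%u count=%u\n",
               (unsigned)(start >> TOY_PDE_SHIFT),
               (unsigned)toy_pte_index(start), (unsigned)count);
        start += (uint64_t)count << TOY_PAGE_SHIFT;
        length -= (uint64_t)count << TOY_PAGE_SHIFT;
    }
    return 0;
}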

* [PATCH v4 10/24] drm/i915: Extract context switch skip and pd load logic
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (8 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 09/24] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 11/24] drm/i915: Track page table reload need Michel Thierry
                     ` (13 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

We have some fanciness coming up. This patch just breaks out the logic
of context switch skip, pd load pre, and pd load post.

v2: Use new functions to replace the logic right away (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 40 +++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 755b415..6206d27 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -565,6 +565,33 @@ mi_set_context(struct intel_engine_cs *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+				      struct intel_context *from,
+				      struct intel_context *to)
+{
+	if (from == to && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
+static bool
+needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	return ((INTEL_INFO(ring->dev)->gen < 8) ||
+			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
+}
+
+static bool
+needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	return (!to->legacy_hw_ctx.initialized ||
+			i915_gem_context_is_default(to)) &&
+			to->ppgtt && IS_GEN8(ring->dev);
+}
+
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -573,9 +600,6 @@ static int do_switch(struct intel_engine_cs *ring,
 	u32 hw_flags = 0;
 	bool uninitialized = false;
 	struct i915_vma *vma;
-	bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
-			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
-	bool needs_pd_load_post = false;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -583,7 +607,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
 	}
 
-	if (from == to && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
@@ -601,7 +625,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	from = ring->last_context;
 
-	if (needs_pd_load_pre) {
+	if (needs_pd_load_pre(ring, to)) {
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
 		 * Register Immediate commands in Ring Buffer before submitting
@@ -646,16 +670,14 @@ static int do_switch(struct intel_engine_cs *ring,
 	 * XXX: If we implemented page directory eviction code, this
 	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
+	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
-		needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
-	}
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post) {
+	if (needs_pd_load_post(ring, to)) {
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 11/24] drm/i915: Track page table reload need
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (9 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 10/24] drm/i915: Extract context switch skip and pd load logic Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 12/24] drm/i915: Initialize all contexts Michel Thierry
                     ` (12 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch was formerly known as, "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.

It's not just for gen8: if the mappings of the current context change, we
need a context reload before the next batch is submitted.
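
The flow this sets up is: allocating a VA range marks every ring's page
directory state dirty, and a ring that finds its bit set must reload (or
force-restore) before running the next batch. A minimal standalone sketch
of that bookkeeping follows (plain user-space C; the toy_* names and the
four-ring count are made-up illustrations, not the driver's API).

#include <stdbool.h>
#include <stdio.h>

#define TOY_NUM_RINGS 4

struct toy_vm {
    unsigned long pd_dirty_rings;
};

/* called whenever new PDEs are mapped into the VM */
static void toy_mark_tlbs_dirty(struct toy_vm *vm)
{
    vm->pd_dirty_rings = (1UL << TOY_NUM_RINGS) - 1;
}

/* called at context switch: returns true if this ring must force-restore */
static bool toy_ring_needs_reload(struct toy_vm *vm, int ring_id)
{
    unsigned long bit = 1UL << ring_id;
    bool dirty = vm->pd_dirty_rings & bit;

    vm->pd_dirty_rings &= ~bit;     /* test_and_clear */
    return dirty;
}

int main(void)
{
    struct toy_vm vm = { 0 };

    toy_mark_tlbs_dirty(&vm);                       /* a new object was mapped */
    printf("%d\n", toy_ring_needs_reload(&vm, 0));  /* 1: reload/force restore */
    printf("%d\n", toy_ring_needs_reload(&vm, 0));  /* 0: already reloaded */
    return 0;
}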

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

v3: Invalidate PPGTT TLBs inside alloc_va_range.

v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes when
neither ctx->ppgtt and aliasing_ppgtt exist.

v5: Removed references to teardown_va_range.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 29 ++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  1 +
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6206d27..437cdcc 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -569,8 +569,20 @@ static inline bool should_skip_switch(struct intel_engine_cs *ring,
 				      struct intel_context *from,
 				      struct intel_context *to)
 {
-	if (from == to && !to->remap_slice)
-		return true;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	if (to->remap_slice)
+		return false;
+
+	if (to->ppgtt) {
+		if (from == to && !test_bit(ring->id,
+				&to->ppgtt->pd_dirty_rings))
+			return true;
+	} else if (dev_priv->mm.aliasing_ppgtt) {
+		if (from == to && !test_bit(ring->id,
+				&dev_priv->mm.aliasing_ppgtt->pd_dirty_rings))
+			return true;
+	}
 
 	return false;
 }
@@ -587,9 +599,8 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 static bool
 needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
 {
-	return (!to->legacy_hw_ctx.initialized ||
-			i915_gem_context_is_default(to)) &&
-			to->ppgtt && IS_GEN8(ring->dev);
+	return IS_GEN8(ring->dev) &&
+			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
 }
 
 static int do_switch(struct intel_engine_cs *ring,
@@ -634,6 +645,12 @@ static int do_switch(struct intel_engine_cs *ring,
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		if (ret)
 			goto unpin_out;
+
+		/* Doing a PD load always reloads the page dirs */
+		if (to->ppgtt)
+			clear_bit(ring->id, &to->ppgtt->pd_dirty_rings);
+		else
+			clear_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->pd_dirty_rings);
 	}
 
 	if (ring != &dev_priv->ring[RCS]) {
@@ -672,6 +689,8 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
+	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e3ef177..971a149 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1198,6 +1198,13 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	if (ret)
 		goto error;
 
+	if (ctx->ppgtt)
+		WARN(ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
+			"%s didn't clear reload\n", ring->name);
+	else if (dev_priv->mm.aliasing_ppgtt)
+		WARN(dev_priv->mm.aliasing_ppgtt->pd_dirty_rings &
+			(1<<ring->id), "%s didn't clear reload\n", ring->name);
+
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
@@ -1445,6 +1452,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* XXX: Reserve has possibly changed PDEs which means we must do a
+	 * context switch before we can coherently read some of the reserved
+	 * VMAs. */
+
 	/* The objects are in their final locations, apply the relocations. */
 	if (need_relocs)
 		ret = i915_gem_execbuffer_relocate(eb);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 760585e..74c777d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1140,6 +1140,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+/* PDE TLBs are a pain to invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
+{
+	/* If current vm != vm, */
+	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
+}
+
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
@@ -1159,6 +1169,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	mark_tlbs_dirty(ppgtt);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1b15fc9..eaf530f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -297,6 +297,7 @@ struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
 	struct drm_mm_node node;
+	unsigned long pd_dirty_rings;
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 12/24] drm/i915: Initialize all contexts
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (10 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 11/24] drm/i915: Track page table reload need Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 13/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
                     ` (11 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The problem is we're going to switch to a new context, which could be
the default context. The plan was to use restore inhibit, which would be
fine, except if we are using dynamic page tables (which we will). If we
use dynamic page tables and we don't load new page tables, the previous
page tables might go away, and future operations will fault.

CTXA runs.
switch to default, restore inhibit
CTXA dies and has its address space taken away.
Run CTXB, tries to save using the context A's address space - this
fails.

The general solution is to make sure every context has its own state,
and its own address space. For cases when we must restore inhibit, the
first thing we do is load a valid address space. I thought this would be
enough, but apparently there are references within the context itself
which will refer to the old address space - therefore, we also must
reinitialize.

It was tricky to track this down as we don't have much insight into what
happens in a context save.

This is required for the next patch which enables dynamic page tables.

v2: to->ppgtt is only valid in full ppgtt.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 437cdcc..6a583c3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -596,13 +596,6 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
 }
 
-static bool
-needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
-{
-	return IS_GEN8(ring->dev) &&
-			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
-}
-
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -683,20 +676,24 @@ static int do_switch(struct intel_engine_cs *ring,
 
 	/* GEN8 does *not* require an explicit reload if the PDPs have been
 	 * setup, and we do not wish to move them.
-	 *
-	 * XXX: If we implemented page directory eviction code, this
-	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
+	if (!to->legacy_hw_ctx.initialized) {
 		hw_flags |= MI_RESTORE_INHIBIT;
-	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		/* NB: If we inhibit the restore, the context is not allowed to
+		 * die because future work may end up depending on valid address
+		 * space. This means we must enforce that a page table load
+		 * occur when this occurs. */
+	} else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
 		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post(ring, to)) {
+	if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT)) {
+		/* We have a valid page directory (scratch) to switch to. This
+		 * allows the old VM to be freed. Note that if anything occurs
+		 * between the set context, and here, we are f*cked */
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -746,7 +743,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
+	uninitialized = !to->legacy_hw_ctx.initialized;
 	to->legacy_hw_ctx.initialized = true;
 
 done:
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 13/24] drm/i915: Finish gen6/7 dynamic page table allocation
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (11 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 12/24] drm/i915: Initialize all contexts Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 14/24] drm/i915: Add dynamic page trace events Michel Thierry
                     ` (10 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page.
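
A minimal standalone model of that steady state may help (plain user-space
C; the toy_* names and sizes are made up, not the driver's API): unused
directory entries share one scratch table, and a real table is only
allocated the first time its range is used.

#include <stdio.h>
#include <stdlib.h>

#define TOY_PDES 4

struct toy_pt {
    int used_ptes;
};

static struct toy_pt toy_scratch_pt;

/* every PDE starts out pointing at the shared scratch page table */
static struct toy_pt *toy_pd[TOY_PDES] = {
    &toy_scratch_pt, &toy_scratch_pt, &toy_scratch_pt, &toy_scratch_pt,
};

/* allocate a real page table for @pde the first time it is needed */
static struct toy_pt *toy_get_pt(int pde)
{
    if (toy_pd[pde] == &toy_scratch_pt) {
        struct toy_pt *pt = calloc(1, sizeof(*pt));

        if (!pt)
            return NULL;    /* caller unwinds; scratch mapping stays */
        toy_pd[pde] = pt;
    }
    return toy_pd[pde];
}

int main(void)
{
    struct toy_pt *pt = toy_get_pt(2);

    if (pt)
        pt->used_ptes = 16;
    printf("pde2 real=%d pde3 real=%d\n",
           toy_pd[2] != &toy_scratch_pt, toy_pd[3] != &toy_scratch_pt);
    return 0;
}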

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time getting
to this point with as much common code as I did.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain, and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.

v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
both all implementations get the trace.

v3: Updated trace event to spit out a name

v4: Aliasing ppgtt is now initialized differently (in setup global gtt)

v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check
for trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).

v6: Implement changes from code review (Daniel):
 - allocate/teardown_va_range calls added.
 - Add a scratch page allocation helper (only need the address).
 - Move trace events to a new patch.
 - Use updated mark_tlbs_dirty.
 - Moved pt preallocation for aliasing ppgtt into gen6_ppgtt_init.

v7: teardown_va_range removed (Daniel).
    In init, gen6_ppgtt_clear_range call is only needed for aliasing ppgtt.

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
 drivers/gpu/drm/i915/i915_gem.c     |   9 +++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 125 +++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.h |   3 +
 4 files changed, 123 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 60f91bc..0f63076 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
 		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
 	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 98657b3..7944931 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3566,6 +3566,15 @@ search_free:
 	if (ret)
 		goto err_remove_node;
 
+	/*  allocate before insert / bind */
+	if (vma->vm->allocate_va_range) {
+		ret = vma->vm->allocate_va_range(vma->vm,
+						vma->node.start,
+						vma->node.size);
+		if (ret)
+			goto err_remove_node;
+	}
+
 	trace_i915_vma_bind(vma, flags);
 	ret = i915_vma_bind(vma, obj->cache_level,
 			    flags & PIN_GLOBAL ? GLOBAL_BIND : 0);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 74c777d..85c914f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -358,6 +358,16 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
+static inline struct i915_page_table_entry *alloc_pt_scratch(struct drm_device *dev)
+{
+	struct i915_page_table_entry *pt = alloc_pt_single(dev);
+
+	if (!IS_ERR(pt))
+		pt->scratch = 1;
+
+	return pt;
+}
+
 /**
  * alloc_pt_range() - Allocate a multiple page tables
  * @pd:		The page directory which will have at least @count entries
@@ -1153,10 +1163,46 @@ static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
+	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_table_entry *pt;
+	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
+	int ret;
+
+	BUG_ON(upper_32_bits(start));
+
+	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+
+	/* The allocation is done in two stages so that we can bail out with
+	 * minimal amount of pain. The first stage finds new page tables that
+	 * need allocation. The second stage marks used ptes within the page
+	 * tables.
+	 */
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		if (pt != ppgtt->scratch_pt) {
+			WARN_ON(bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+			continue;
+		}
+
+		/* We've already allocated a page table */
+		WARN_ON(!bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+
+		pt = alloc_pt_single(dev);
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pd.page_tables[pde] = pt;
+		set_bit(pde, new_page_tables);
+	}
+
+	start = start_save;
+	length = length_save;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
@@ -1165,21 +1211,46 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		bitmap_set(tmp_bitmap, gen6_pte_index(start),
 			   gen6_pte_count(start, length));
 
-		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+		if (test_and_clear_bit(pde, new_page_tables))
+			gen6_write_pdes(&ppgtt->pd, pde, pt);
+
+		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	WARN_ON(!bitmap_empty(new_page_tables, GEN6_PPGTT_PD_ENTRIES));
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also required for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
+
 	mark_tlbs_dirty(ppgtt);
 	return 0;
+
+unwind_out:
+	for_each_set_bit(pde, new_page_tables, GEN6_PPGTT_PD_ENTRIES) {
+		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[pde];
+
+		ppgtt->pd.page_tables[pde] = NULL;
+		unmap_and_free_pt(pt, vm->dev);
+	}
+
+	mark_tlbs_dirty(ppgtt);
+	return ret;
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[i];
 
+		if (pt != ppgtt->scratch_pt)
+			unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	}
+
+	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 	unmap_and_free_pd(&ppgtt->pd);
 }
 
@@ -1206,6 +1277,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+	ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pt))
+		return PTR_ERR(ppgtt->scratch_pt);
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
 						  &ppgtt->node, GEN6_PD_SIZE,
@@ -1236,6 +1310,7 @@ alloc:
 	return 0;
 
 err_out:
+	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 	return ret;
 }
 
@@ -1247,18 +1322,20 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			ppgtt->base.dev);
+	return 0;
+}
 
-	if (ret) {
-		drm_mm_remove_node(&ppgtt->node);
-		return ret;
-	}
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+				  uint64_t start, uint64_t length)
+{
+	struct i915_page_table_entry *unused;
+	uint32_t pde, temp;
 
-	return 0;
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
+		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
 }
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1278,6 +1355,18 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	if (aliasing) {
+		/* preallocate all pts */
+		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+				ppgtt->base.dev);
+
+		if (ret) {
+			unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+			drm_mm_remove_node(&ppgtt->node);
+			return ret;
+		}
+	}
+
 	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
@@ -1292,7 +1381,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	if (aliasing)
+		ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	else
+		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
 
 	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
@@ -1306,7 +1398,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
+		bool aliasing)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
@@ -1314,7 +1407,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
 
 	if (INTEL_INFO(dev)->gen < 8)
-		return gen6_ppgtt_init(ppgtt);
+		return gen6_ppgtt_init(ppgtt, aliasing);
 	else
 		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 }
@@ -1323,7 +1416,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = 0;
 
-	ret = __hw_ppgtt_init(dev, ppgtt);
+	ret = __hw_ppgtt_init(dev, ppgtt, false);
 	if (ret == 0) {
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
@@ -1944,7 +2037,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 		if (!ppgtt)
 			return -ENOMEM;
 
-		ret = __hw_ppgtt_init(dev, ppgtt);
+		ret = __hw_ppgtt_init(dev, ppgtt, true);
 		if (ret != 0)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index eaf530f..43b5adf 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -195,6 +195,7 @@ struct i915_page_table_entry {
 	dma_addr_t daddr;
 
 	unsigned long *used_ptes;
+	unsigned int scratch:1;
 };
 
 struct i915_page_directory_entry {
@@ -305,6 +306,8 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_entry pd;
 	};
 
+	struct i915_page_table_entry *scratch_pt;
+
 	struct drm_i915_file_private *file_priv;
 
 	gen6_gtt_pte_t __iomem *pd_addr;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 14/24] drm/i915: Add dynamic page trace events
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (12 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 13/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
                     ` (9 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

Traces for page directories and tables allocation and map.

v2: Removed references to teardown.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     |  2 +
 drivers/gpu/drm/i915/i915_gem_gtt.c |  5 ++
 drivers/gpu/drm/i915/i915_trace.h   | 99 +++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7944931..601f373 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3568,6 +3568,8 @@ search_free:
 
 	/*  allocate before insert / bind */
 	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+				VM_TO_TRACE_NAME(vma->vm));
 		ret = vma->vm->allocate_va_range(vma->vm,
 						vma->node.start,
 						vma->node.size);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 85c914f..36e2482 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1199,6 +1199,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 		ppgtt->pd.page_tables[pde] = pt;
 		set_bit(pde, new_page_tables);
+		trace_i915_page_table_entry_alloc(vm, pde, start, GEN6_PDE_SHIFT);
 	}
 
 	start = start_save;
@@ -1214,6 +1215,10 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		if (test_and_clear_bit(pde, new_page_tables))
 			gen6_write_pdes(&ppgtt->pd, pde, pt);
 
+		trace_i915_page_table_entry_map(vm, pde, pt,
+					 gen6_pte_index(start),
+					 gen6_pte_count(start, length),
+					 I915_PPGTT_PT_ENTRIES);
 		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f004d3d..3a657e4 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,6 +156,105 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
+#define VM_TO_TRACE_NAME(vm) \
+	(i915_is_ggtt(vm) ? "GGTT" : \
+		      "Private VM")
+
+DECLARE_EVENT_CLASS(i915_va,
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	TP_ARGS(vm, start, length, name),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u64, start)
+		__field(u64, end)
+		__string(name, name)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->start = start;
+		__entry->end = start + length;
+		__assign_str(name, name);
+	),
+
+	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
+		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DECLARE_EVENT_CLASS(i915_page_table_entry,
+	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	TP_ARGS(vm, pde, start, pde_shift),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->start = start;
+		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
+	),
+
+	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+		  __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+	((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_page_table_entry_update,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u32, first)
+		__field(u32, last)
+		__dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->first = first;
+		__entry->last = first + len;
+
+		bitmap_scnprintf(__get_str(cur_ptes),
+				 TRACE_PT_SIZE(bits),
+				 pt->used_ptes,
+				 bits);
+	),
+
+	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+		  __entry->vm, __entry->pde, __entry->last, __entry->first,
+		  __get_str(cur_ptes))
+);
+
+DEFINE_EVENT(i915_page_table_entry_update, i915_page_table_entry_map,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
 TRACE_EVENT(i915_gem_object_change_domain,
 	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
 	    TP_ARGS(obj, old_read, old_write),
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 15/24] drm/i915/bdw: Use dynamic allocation idioms on free
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (13 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 14/24] drm/i915: Add dynamic page trace events Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 16/24] drm/i915/bdw: page directories rework allocation Michel Thierry
                     ` (8 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.

v2: Match trace_i915_va_teardown params
v3: Multiple rebases.
v4: Updated to use unmap_and_free_pt.
v5: teardown_va_range logic no longer needed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 26 ++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 47 +++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 36e2482..56db132 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -604,19 +604,6 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-			continue;
-
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
-	}
-}
-
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
@@ -649,6 +636,19 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	}
 }
 
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+			continue;
+
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+	}
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 43b5adf..70ce50d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -383,6 +383,53 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	return i915_pde_index(addr, GEN6_PDE_SHIFT);
 }
 
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter)		\
+	for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN8_PDES_PER_PAGE;			\
+	     pt = (pd)->page_tables[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->page_directory[iter];	\
+	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+	     pd = (pdp)->page_directory[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+/* Clamp length to the next page_directory boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+
+	if (next_pd > (start + length))
+		return length;
+
+	return next_pd - start;
+}
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+	return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+	return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+	return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+	BUG(); /* For 64B */
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
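
As a standalone illustration of the gen8_*_index helpers added above, here
is a small user-space C sketch that splits a GPU virtual address the same
way; the 512-entries-per-level layout and the shift values are stated
assumptions about the legacy GEN8 32b PPGTT rather than something defined
in this patch, and the toy_* names are made up.

#include <stdint.h>
#include <stdio.h>

#define TOY_PAGE_SHIFT   12
#define TOY_PDE_SHIFT    21     /* 512 PTEs * 4KB = 2MB per page table */
#define TOY_PDPE_SHIFT   30     /* 512 PDEs * 2MB = 1GB per page directory */
#define TOY_LEVEL_MASK   0x1ff  /* 512 entries per level */

int main(void)
{
    uint64_t addr = 0x12345678ull;      /* arbitrary GPU virtual address */

    printf("pdpe=%u pde=%u pte=%u offset=0x%x\n",
           (unsigned)((addr >> TOY_PDPE_SHIFT) & TOY_LEVEL_MASK),
           (unsigned)((addr >> TOY_PDE_SHIFT) & TOY_LEVEL_MASK),
           (unsigned)((addr >> TOY_PAGE_SHIFT) & TOY_LEVEL_MASK),
           (unsigned)(addr & 0xfff));
    return 0;
}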

* [PATCH v4 16/24] drm/i915/bdw: page directories rework allocation
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (14 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
                     ` (7 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using gen8_for_each_pdpe macro to allocate the page directories.

v2: Rebased after s/free_pt_*/unmap_and_free_pt/ change.
v3: Rebased after teardown va range logic was removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 56db132..5b6b665 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -678,25 +678,39 @@ unwind_out:
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
+static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
+				     uint64_t start,
+				     uint64_t length)
 {
-	int i;
-
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.page_directory[i] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.page_directory[i]))
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pdp, struct i915_hw_ppgtt, pdp);
+	struct i915_page_directory_entry *unused;
+	uint64_t temp;
+	uint32_t pdpe;
+
+	/* FIXME: PPGTT container_of won't work for 64b */
+	BUG_ON((start + length) > 0x800000000ULL);
+
+	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+		BUG_ON(unused);
+		pdp->page_directory[pdpe] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
 			goto unwind_out;
+
+		ppgtt->num_pd_pages++;
 	}
 
-	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+	while (pdpe--) {
+		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
+		ppgtt->num_pd_pages--;
+	}
+
+	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -706,7 +720,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 {
 	int ret;
 
-	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, ppgtt->base.start,
+					ppgtt->base.total);
 	if (ret)
 		return ret;
 
@@ -783,6 +798,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
+	ppgtt->base.start = 0;
+	ppgtt->base.total = size;
+	BUG_ON(ppgtt->base.total == 0);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
 	if (ret)
@@ -830,8 +849,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 17/24] drm/i915/bdw: pagetable allocation rework
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (15 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 16/24] drm/i915/bdw: page directories rework allocation Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
                     ` (6 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using gen8_for_each_pde macro to allocate page tables.

v2: teardown_va_range references removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 46 +++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5b6b665..e85d0f9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -658,22 +658,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_page_directory_entry *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
 {
-	int i, ret;
+	struct i915_page_table_entry *unused;
+	uint64_t temp;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
-				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
-		if (ret)
+	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+		BUG_ON(unused);
+		pd->page_tables[pde] = alloc_pt_single(dev);
+		if (IS_ERR(pd->page_tables[pde]))
 			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+	while (pde--)
+		unmap_and_free_pt(pd->page_tables[pde], dev);
 
 	return -ENOMEM;
 }
@@ -716,20 +721,28 @@ unwind_out:
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    const int max_pdp)
+			    uint64_t start,
+			    uint64_t length)
 {
+	struct i915_page_directory_entry *pd;
+	uint64_t temp;
+	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, ppgtt->base.start,
-					ppgtt->base.total);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length);
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
-	if (ret)
-		goto err_out;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+						ppgtt->base.dev);
+		if (ret)
+			goto err_out;
+
+		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
+	}
 
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	BUG_ON(pdpe > ppgtt->num_pd_pages);
 
 	return 0;
 
@@ -800,10 +813,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	BUG_ON(ppgtt->base.total == 0);
 
 	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
 	if (ret)
 		return ret;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (16 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
                     ` (5 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

One important part of this patch is that we now write a scratch page
directory into any unused PDP descriptors. This matters for two reasons:
first, we are not allowed to just use 0 or an invalid pointer; and second,
we must wipe out any previous contents from the last context.

The latter point only matters with full PPGTT. The former point only
affects platforms with less than 4GB of memory.
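
For reference, a condensed sketch of what the mm switch ends up doing after
this patch (taken from the gen8_mm_switch hunk below, not additional code):

	/* Program all four PDP slots; slots without a page directory get
	 * the scratch page directory, so stale pointers from the previous
	 * context are wiped out. */
	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;

		ret = gen8_write_pdp(ring, i, pd_daddr);
		if (ret)
			return ret;
	}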

v2: Updated commit message to point out that we must set unused PDPs to the
scratch page.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 ++++-
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e85d0f9..92c97a9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -439,8 +439,9 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
-			   uint64_t val)
+static int gen8_write_pdp(struct intel_engine_cs *ring,
+			  unsigned entry,
+			  dma_addr_t addr)
 {
 	int ret;
 
@@ -452,10 +453,10 @@ static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val >> 32));
+	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val));
+	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
 	return 0;
@@ -466,12 +467,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
-
-	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
-		ret = gen8_write_pdp(ring, i, addr);
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
+		/* The page directory might be NULL, but we need to clear out
+		 * whatever the previous context might have used. */
+		ret = gen8_write_pdp(ring, i, pd_daddr);
 		if (ret)
 			return ret;
 	}
@@ -814,10 +815,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
 
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
-	if (ret)
+	if (ret) {
+		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
+	}
 
 	/*
 	 * 2. Create DMA mappings for the page directories and page tables.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 70ce50d..f7d2af5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -306,7 +306,10 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_entry pd;
 	};
 
-	struct i915_page_table_entry *scratch_pt;
+	union {
+		struct i915_page_table_entry *scratch_pt;
+		struct i915_page_table_entry *scratch_pd; /* Just need the daddr */
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (17 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 20/24] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
                     ` (4 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

These values are not really useful once the page tables are allocated
dynamically. Getting rid of them will help prevent later confusion.
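
As a rough sketch of the replacement pattern (based on the gen6_for_all_pdes
macro and the gen6_ppgtt_free hunk below, not additional code), iteration is
now bounded by the address space size rather than by a stored count:

	/* Walk every PDE slot; no num_pd_entries needed. */
	gen6_for_all_pdes(pt, ppgtt, pde) {
		if (pt != ppgtt->scratch_pt)
			unmap_and_free_pt(pt, ppgtt->base.dev);
	}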

v2: Updated to use unmap_and_free_pd functions.
v3: Updated gen8_ppgtt_free after teardown logic was removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 --
 drivers/gpu/drm/i915/i915_gem_gtt.c | 72 ++++++++++++-------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 ++--
 3 files changed, 28 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0f63076..b00760b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2117,8 +2117,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	if (!ppgtt)
 		return;
 
-	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
 	for_each_ring(ring, dev_priv, unused) {
 		seq_printf(m, "%s\n", ring->name);
 		for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 92c97a9..3b821cb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -610,9 +610,7 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		/* TODO: In the future we'll support sparse mappings, so this
-		 * will have to change. */
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
 		if (!ppgtt->pdp.page_directory[i]->daddr)
 			continue;
 
@@ -641,7 +639,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
@@ -702,21 +700,13 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 		pdp->page_directory[pdpe] = alloc_pd_single();
 		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
 			goto unwind_out;
-
-		ppgtt->num_pd_pages++;
 	}
 
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
 	return 0;
 
 unwind_out:
-	while (pdpe--) {
+	while (pdpe--)
 		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
-		ppgtt->num_pd_pages--;
-	}
-
-	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -739,12 +729,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 						ppgtt->base.dev);
 		if (ret)
 			goto err_out;
-
-		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
 	}
 
-	BUG_ON(pdpe > ppgtt->num_pd_pages);
-
 	return 0;
 
 	/* TODO: Check this for all cases */
@@ -806,7 +792,6 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -870,12 +855,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
-	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
-			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-			 ppgtt->num_pd_entries,
-			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
 
 bail:
@@ -886,26 +865,20 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
-	gen6_gtt_pte_t __iomem *pd_addr;
+	struct i915_page_table_entry *unused;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
-	int pte, pde;
+	uint32_t  pte, pde, temp;
+	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
-	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd.pd_offset,
-		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
-	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
-		pd_entry = readl(pd_addr + pde);
+		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
 		if (pd_entry != expected)
@@ -1178,12 +1151,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 
 static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_page_table_entry *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i]->daddr,
-			       4096, PCI_DMA_BIDIRECTIONAL);
+	gen6_for_all_pdes(pt, ppgtt, pde) {
+		if (pt != ppgtt->scratch_pt)
+			pci_unmap_page(ppgtt->base.dev->pdev,
+				pt->daddr,
+				4096, PCI_DMA_BIDIRECTIONAL);
+	}
 }
 
 /* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
@@ -1282,13 +1258,12 @@ unwind_out:
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[i];
+	struct i915_page_table_entry *pt;
+	uint32_t pde;
 
+	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+			unmap_and_free_pt(pt, ppgtt->base.dev);
 	}
 
 	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
@@ -1347,7 +1322,6 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
 
 err_out:
@@ -1398,7 +1372,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 
 	if (aliasing) {
 		/* preallocate all pts */
-		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+		ret = alloc_pt_range(&ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES,
 				ppgtt->base.dev);
 
 		if (ret) {
@@ -1413,7 +1387,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd.pd_offset =
@@ -1713,7 +1687,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		if (i915_is_ggtt(vm))
 			ppgtt = dev_priv->mm.aliasing_ppgtt;
 
-		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES);
 	}
 
 	i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f7d2af5..9d49de7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -299,8 +299,6 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 	struct drm_mm_node node;
 	unsigned long pd_dirty_rings;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct i915_page_directory_pointer_entry pdp;
 		struct i915_page_directory_entry pd;
@@ -338,6 +336,11 @@ struct i915_hw_ppgtt {
 	     temp = min_t(unsigned, temp, length), \
 	     start += temp, length -= temp)
 
+#define gen6_for_all_pdes(pt, ppgtt, iter)  \
+	for (iter = 0, pt = ppgtt->pd.page_tables[iter];			\
+	     iter < gen6_pde_index(ppgtt->base.total);			\
+	     pt =  ppgtt->pd.page_tables[++iter])
+
 static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
 {
 	const uint32_t mask = NUM_PTE(pde_shift) - 1;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 20/24] drm/i915: Extract PPGTT param from page_directory alloc
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (18 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 21/24] drm/i915/bdw: Split out mappings Michel Thierry
                     ` (3 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Now that we don't need to track num_pd_pages, we may as well kill all
need for the PPGTT structure in gen8_ppgtt_alloc_page_directories(). This
will be very useful when we move to 48b addressing, where the PDP is no
longer the root of the page table structure.

The param is replaced with drm_device, which is an unavoidable wart
throughout the series (in other words, nothing extra flagrant).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3b821cb..66c2a9d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -686,8 +686,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 				     uint64_t start,
 				     uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(pdp, struct i915_hw_ppgtt, pdp);
 	struct i915_page_directory_entry *unused;
 	uint64_t temp;
 	uint32_t pdpe;
@@ -698,7 +696,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->page_directory[pdpe] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
+		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
 
@@ -706,7 +704,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 unwind_out:
 	while (pdpe--)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
+		unmap_and_free_pd(pdp->page_directory[pdpe]);
 
 	return -ENOMEM;
 }
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 21/24] drm/i915/bdw: Split out mappings
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (19 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 20/24] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
                     ` (2 subsequent siblings)
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
In particular, DMA mappings for page directories/tables occur at allocation
time.

This patch adds the functionality and calls it at init, so there should
be no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
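
In rough terms, init becomes allocate-then-map; a condensed sketch of the
flow from the gen8_ppgtt_init hunk below (error handling trimmed, not
additional code):

	/* 1. Allocate page directories and page tables for the whole range;
	 *    DMA mappings are now created at allocation time. */
	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
	if (ret)
		return ret;

	/* 2. Write the PDEs to point at the freshly mapped page tables. */
	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);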

v2: Handle renamed unmap_and_free_page functions.
v3: Updated after teardown_va logic was removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 177 ++++++++++++++----------------------
 1 file changed, 69 insertions(+), 108 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 66c2a9d..e662039 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -413,17 +413,20 @@ err_out:
 	return ret;
 }
 
-static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
+static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
+			       struct drm_device *dev)
 {
 	if (pd->page) {
+		i915_dma_unmap_single(pd, dev);
 		__free_page(pd->page);
 		kfree(pd);
 	}
 }
 
-static struct i915_page_directory_entry *alloc_pd_single(void)
+static struct i915_page_directory_entry *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_page_directory_entry *pd;
+	int ret;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
@@ -435,6 +438,13 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = i915_dma_map_px_single(pd, dev);
+	if (ret) {
+		__free_page(pd->page);
+		kfree(pd);
+		return ERR_PTR(ret);
+	}
+
 	return pd;
 }
 
@@ -589,6 +599,36 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+			     struct i915_page_table_entry *pt,
+			     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t entry =
+		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+	*pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t *page_directory = kmap_atomic(pd->page);
+	struct i915_page_table_entry *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		__gen8_do_map_pt(page_directory + pde, pt, dev);
+
+	if (!HAS_LLC(dev))
+		drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+	kunmap_atomic(page_directory);
+}
+
 static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
@@ -644,7 +684,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 			continue;
 
 		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 	}
 }
 
@@ -684,7 +724,8 @@ unwind_out:
 
 static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
-				     uint64_t length)
+				     uint64_t length,
+				     struct drm_device *dev)
 {
 	struct i915_page_directory_entry *unused;
 	uint64_t temp;
@@ -695,7 +736,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
-		pdp->page_directory[pdpe] = alloc_pd_single();
+		pdp->page_directory[pdpe] = alloc_pd_single(dev);
 		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
@@ -704,21 +745,24 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 unwind_out:
 	while (pdpe--)
-		unmap_and_free_pd(pdp->page_directory[pdpe]);
+		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
 
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    uint64_t start,
-			    uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start,
+			       uint64_t length)
 {
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
+					ppgtt->base.dev);
 	if (ret)
 		return ret;
 
@@ -731,134 +775,51 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	return 0;
 
-	/* TODO: Check this for all cases */
 err_out:
 	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pd)
-{
-	dma_addr_t pd_addr;
-	int ret;
-
-	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.page_directory[pd]->page, 0,
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
-	if (ret)
-		return ret;
-
-	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
-
-	return 0;
-}
-
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pd,
-					const int pt)
-{
-	dma_addr_t pt_addr;
-	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
-	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
-	struct page *p = ptab->page;
-	int ret;
-
-	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	ptab->daddr = pt_addr;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
  * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
  * space.
  *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	int i, j, ret;
-
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+	struct i915_page_directory_entry *pd;
+	uint64_t temp, start = 0;
+	const uint64_t orig_length = size;
+	uint32_t pdpe;
+	int ret;
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->switch_mm = gen8_mm_switch;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
-	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	/*
-	 * 2. Create DMA mappings for the page directories and page tables.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-		if (ret)
-			goto bail;
+	start = 0;
+	size = orig_length;
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
-			if (ret)
-				goto bail;
-		}
-	}
-
-	/*
-	 * 3. Map all the page directory entires to point to the page tables
-	 * we've allocated.
-	 *
-	 * For now, the PPGTT helper functions all require that the PDEs are
-	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
-		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
-						      I915_CACHE_LLC);
-		}
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
-		kunmap_atomic(pd_vaddr);
-	}
-
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	return 0;
-
-bail:
-	gen8_ppgtt_unmap_pages(ppgtt);
-	gen8_ppgtt_free(ppgtt);
-	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -1265,7 +1226,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	}
 
 	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
-	unmap_and_free_pd(&ppgtt->pd);
+	unmap_and_free_pd(&ppgtt->pd, ppgtt->base.dev);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 22/24] drm/i915/bdw: begin bitmap tracking
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (20 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 21/24] drm/i915/bdw: Split out mappings Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 24/24] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As with gen6/7, we can enable bitmap tracking alongside all the
preallocations to make sure things actually don't blow up.
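
The tracking itself is just per-level bitmaps, set once an allocation has
succeeded; a simplified sketch of the idea (see the gen8_alloc_va_range hunk
below for the exact form in this patch):

	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
			/* Mark the PTEs covered by this range as used... */
			bitmap_set(pt->used_ptes, gen8_pte_index(pd_start),
				   gen8_pte_count(pd_start, pd_len));
			/* ...and the PDE that points at this page table. */
			set_bit(pde, pd->used_pdes);
		}
		/* Finally, mark the PDPE as backed by a page directory. */
		set_bit(pdpe, ppgtt->pdp.used_pdpes);
	}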

v2: Rebased to match changes from previous patches.
v3: Without teardown logic, rely on used_pdpes and used_pdes when
freeing page tables.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 75 ++++++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 24 ++++++++++++
 2 files changed, 81 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e662039..662b9d8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -419,6 +419,7 @@ static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
 	if (pd->page) {
 		i915_dma_unmap_single(pd, dev);
 		__free_page(pd->page);
+		kfree(pd->used_pdes);
 		kfree(pd);
 	}
 }
@@ -426,26 +427,35 @@ static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
 static struct i915_page_directory_entry *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_page_directory_entry *pd;
-	int ret;
+	int ret = -ENOMEM;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
+	pd->used_pdes = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				sizeof(*pd->used_pdes), GFP_KERNEL);
+	if (!pd->used_pdes)
+		goto free_pd;
+
 	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pd->page) {
-		kfree(pd);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pd->page)
+		goto free_bitmap;
 
 	ret = i915_dma_map_px_single(pd, dev);
-	if (ret) {
-		__free_page(pd->page);
-		kfree(pd);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto free_page;
 
 	return pd;
+
+free_page:
+	__free_page(pd->page);
+free_bitmap:
+	kfree(pd->used_pdes);
+free_pd:
+	kfree(pd);
+
+	return ERR_PTR(ret);
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -636,7 +646,7 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+	for_each_set_bit(i, pd->used_pdes, GEN8_PDES_PER_PAGE) {
 		if (WARN_ON(!pd->page_tables[i]))
 			continue;
 
@@ -650,15 +660,18 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
-		if (!ppgtt->pdp.page_directory[i]->daddr)
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+		struct i915_page_directory_entry *pd;
+
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
+		pd = ppgtt->pdp.page_directory[i];
+		if (!pd->daddr)
+			pci_unmap_page(hwdev, pd->daddr, PAGE_SIZE,
+					PCI_DMA_BIDIRECTIONAL);
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+		for_each_set_bit(j, pd->used_pdes, GEN8_PDES_PER_PAGE) {
 			struct i915_page_table_entry *pt;
 			dma_addr_t addr;
 
@@ -679,7 +692,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
@@ -722,6 +735,7 @@ unwind_out:
 	return -ENOMEM;
 }
 
+/* bitmap of new page_directories */
 static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
@@ -737,6 +751,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->page_directory[pdpe] = alloc_pd_single(dev);
+
 		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
@@ -757,10 +772,13 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_entry *pd;
+	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
+	/* Do the allocations first so we can easily bail out */
 	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
 					ppgtt->base.dev);
 	if (ret)
@@ -773,6 +791,27 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			goto err_out;
 	}
 
+	/* Now mark everything we've touched as used. This doesn't allow for
+	 * robust error checking, but it makes the code a hell of a lot simpler.
+	 */
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		struct i915_page_table_entry *pt;
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		uint32_t pde;
+
+		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
+			bitmap_set(pd->page_tables[pde]->used_ptes,
+				   gen8_pte_index(start),
+				   gen8_pte_count(start, length));
+			set_bit(pde, pd->used_pdes);
+		}
+		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+	}
+
 	return 0;
 
 err_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9d49de7..c68ec3a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -205,11 +205,13 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
+	unsigned long *used_pdes;
 	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
 	/* struct page *page; */
+	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
 };
 
@@ -436,6 +438,28 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
 	BUG(); /* For 64B */
 }
 
+static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
+{
+	return i915_pte_count(addr, length, GEN8_PDE_SHIFT);
+}
+
+static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
+{
+	const uint32_t pdp_shift = GEN8_PDE_SHIFT + 9;
+	const uint64_t mask = ~((1 << pdp_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return GEN8_PDES_PER_PAGE - i915_pde_index(addr, GEN8_PDE_SHIFT);
+
+	return i915_pde_index(end, GEN8_PDE_SHIFT) - i915_pde_index(addr, GEN8_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 23/24] drm/i915/bdw: Dynamic page table allocations
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (21 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  2015-01-22 17:01   ` [PATCH v4 24/24] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This finishes off the dynamic page table allocations, in the legacy
3-level style that already exists. Most everything has already been set up
by this point; the patch completes the enabling by setting the
appropriate function pointers.
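
Concretely, the enabling amounts to wiring up the address-space hooks per
mode; a condensed sketch from the init hunks below (not additional code):

	/* Full PPGTT: page tables are allocated on demand per VA range. */
	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
	ppgtt->base.clear_range = gen8_ppgtt_clear_range;

	/* Aliasing PPGTT: everything is preallocated and mapped at init,
	 * so no on-demand allocation hook is installed. */
	ppgtt->base.allocate_va_range = NULL;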

v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.

v3: Rebase.

v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).

v5: Similar to gen6, in init, gen8_ppgtt_clear_range call is only needed
for aliasing ppgtt. Zombie tracking was originally added for teardown
function and is no longer required.

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 295 +++++++++++++++++++++++++++++-------
 1 file changed, 242 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 662b9d8..f3d7e26 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -609,7 +609,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 			     struct i915_page_table_entry *pt,
 			     struct drm_device *dev)
 {
@@ -626,7 +626,7 @@ static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
 				     uint64_t length,
 				     struct drm_device *dev)
 {
-	gen8_ppgtt_pde_t *page_directory = kmap_atomic(pd->page);
+	gen8_ppgtt_pde_t * const page_directory = kmap_atomic(pd->page);
 	struct i915_page_table_entry *pt;
 	uint64_t temp, pde;
 
@@ -710,67 +710,173 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_alloc_pagetabs(struct i915_page_directory_entry *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pd:		Page directory for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pts:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_page_directories(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_page_directories(), it is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory_entry *pd,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pts)
 {
-	struct i915_page_table_entry *unused;
+	struct i915_page_table_entry *pt;
 	uint64_t temp;
 	uint32_t pde;
 
-	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
-		BUG_ON(unused);
-		pd->page_tables[pde] = alloc_pt_single(dev);
-		if (IS_ERR(pd->page_tables[pde]))
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		/* Don't reallocate page tables */
+		if (pt) {
+			/* Scratch is never allocated this way */
+			WARN_ON(pt->scratch);
+			continue;
+		}
+
+		pt = alloc_pt_single(ppgtt->base.dev);
+		if (IS_ERR(pt))
 			goto unwind_out;
+
+		pd->page_tables[pde] = pt;
+		set_bit(pde, new_pts);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pde--)
-		unmap_and_free_pt(pd->page_tables[pde], dev);
+	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
+		unmap_and_free_pt(pd->page_tables[pde], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
-/* bitmap of new page_directories */
-static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
+/**
+ * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pdp:	Page directory pointer for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pds	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pde index of
+ * @start, and ending at the pde index @start + @length. This function will skip
+ * over already allocated page directories within the range, and only allocate
+ * new ones, setting the appropriate pointer within the pdp as well as the
+ * correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a give page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller, This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pds)
 {
-	struct i915_page_directory_entry *unused;
+	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 
+	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
 
-	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
-		BUG_ON(unused);
-		pdp->page_directory[pdpe] = alloc_pd_single(dev);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		if (pd)
+			continue;
 
-		if (IS_ERR(pdp->page_directory[pdpe]))
+		pd = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(pd))
 			goto unwind_out;
+
+		pdp->page_directory[pdpe] = pd;
+		set_bit(pdpe, new_pds);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pdpe--)
-		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
+	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+		unmap_and_free_pd(pdp->page_directory[pdpe], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
+static inline void
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+{
+	int i;
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+		kfree(new_pts[i]);
+	kfree(new_pts);
+	kfree(new_pds);
+}
+
+/* Fills in the page directory bitmap, and the array of page tables bitmap. Both
+ * of these are based on the number of PDPEs in the system.
+ */
+int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
+					 unsigned long ***new_pts)
+{
+	int i;
+	unsigned long *pds;
+	unsigned long **pts;
+
+	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	if (!pds)
+		return -ENOMEM;
+
+	pts = kcalloc(GEN8_PDES_PER_PAGE, sizeof(unsigned long *), GFP_KERNEL);
+	if (!pts) {
+		kfree(pds);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				 sizeof(unsigned long), GFP_KERNEL);
+		if (!pts[i])
+			goto err_out;
+	}
+
+	*new_pds = pds;
+	*new_pts = (unsigned long **)pts;
+
+	return 0;
+
+err_out:
+	free_gen8_temp_bitmaps(pds, pts);
+	return -ENOMEM;
+}
+
 static int gen8_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start,
 			       uint64_t length)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	unsigned long *new_page_dirs, **new_page_tables;
 	struct i915_page_directory_entry *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -778,44 +884,96 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	uint32_t pdpe;
 	int ret;
 
-	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
-					ppgtt->base.dev);
+#ifndef CONFIG_64BIT
+	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
+	 * this in hardware, but a lot of the drm code is not prepared to handle
+	 * 64b offset on 32b platforms.
+	 * This will be addressed when 48b PPGTT is added */
+	if (start + length > 0x100000000ULL)
+		return -E2BIG;
+#endif
+
+	/* Wrap is never okay since we can only represent 48b, and we don't
+	 * actually use the other side of the canonical address space.
+	 */
+	if (WARN_ON(start + length < start))
+		return -ERANGE;
+
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
 	if (ret)
 		return ret;
 
+	/* Do the allocations first so we can easily bail out */
+	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
+					new_page_dirs);
+	if (ret) {
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		return ret;
+	}
+
+	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
-						ppgtt->base.dev);
+		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
+		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
 	}
 
-	/* Now mark everything we've touched as used. This doesn't allow for
-	 * robust error checking, but it makes the code a hell of a lot simpler.
-	 */
 	start = orig_start;
 	length = orig_length;
 
+	/* Allocations have completed successfully, so set the bitmaps, and do
+	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		gen8_ppgtt_pde_t *const page_directory = kmap_atomic(pd->page);
 		struct i915_page_table_entry *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
 		uint32_t pde;
 
-		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
-			bitmap_set(pd->page_tables[pde]->used_ptes,
-				   gen8_pte_index(start),
-				   gen8_pte_count(start, length));
+		/* Every pd should be allocated, we just did that above. */
+		BUG_ON(!pd);
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			/* Same reasoning as pd */
+			BUG_ON(!pt);
+			BUG_ON(!pd_len);
+			BUG_ON(!gen8_pte_count(pd_start, pd_len));
+
+			/* Set our used ptes within the page table */
+			bitmap_set(pt->used_ptes,
+				   gen8_pte_index(pd_start),
+				   gen8_pte_count(pd_start, pd_len));
+
+			/* Our pde is now pointing to the pagetable, pt */
 			set_bit(pde, pd->used_pdes);
+
+			/* Map the PDE to the page table */
+			__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
+
+			/* NB: We haven't yet mapped ptes to pages. At this
+			 * point we're still relying on insert_entries() */
 		}
+
+		if (!HAS_LLC(vm->dev))
+			drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+		kunmap_atomic(page_directory);
+
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
 
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return 0;
 
 err_out:
 	gen8_ppgtt_free(ppgtt);
+
+	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return ret;
 }
 
@@ -826,38 +984,67 @@ err_out:
  * space.
  *
  */
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	struct i915_page_directory_entry *pd;
-	uint64_t temp, start = 0;
-	const uint64_t orig_length = size;
-	uint32_t pdpe;
-	int ret;
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pd))
-		return PTR_ERR(ppgtt->scratch_pd);
+	return 0;
+}
 
+static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_page_directory_entry *pd;
+	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
+	uint32_t pdpe;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	/* Aliasing PPGTT has to always work and be mapped because of the way we
+	 * use RESTORE_INHIBIT in the context switch. This will be fixed
+	 * eventually. */
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	start = 0;
-	size = orig_length;
-
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
+	ppgtt->base.allocate_va_range = NULL;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+
+	return 0;
+}
+
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+
 	return 0;
 }
 
@@ -1380,7 +1567,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 		}
 	}
 
-	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+	ppgtt->base.allocate_va_range = aliasing ? NULL : gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1421,8 +1608,10 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt, aliasing);
+	else if (aliasing)
+		return gen8_aliasing_ppgtt_init(ppgtt);
 	else
-		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
+		return gen8_ppgtt_init(ppgtt);
 }
 int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 {
@@ -1531,10 +1720,10 @@ ppgtt_bind_vma(struct i915_vma *vma,
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm,
-			     vma->node.start,
-			     vma->obj->base.size,
-			     true);
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     vma->obj->base.size,
+				     true);
 }
 
 extern int intel_iommu_gfx_mapped;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v4 24/24] drm/i915/bdw: Support dynamic pdp updates in lrc mode
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
                     ` (22 preceding siblings ...)
  2015-01-22 17:01   ` [PATCH v4 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
@ 2015-01-22 17:01   ` Michel Thierry
  23 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-01-22 17:01 UTC (permalink / raw)
  To: intel-gfx

Logical ring contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet.

Check if the PDPs have been allocated and use the scratch page if they do
not exist yet.

Before submission, update the PDPs in the logical ring context, as PDPs
may have been allocated since the context was populated.
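
The per-slot selection repeated four times in the diff boils down to the
following; lrc_pdp_daddr is a hypothetical helper written only to illustrate
the pattern (the patch open-codes it for each PDP):

	static dma_addr_t lrc_pdp_daddr(struct i915_hw_ppgtt *ppgtt, int n)
	{
		/* Use the real page directory if it has been allocated,
		 * otherwise fall back to the scratch page directory. */
		if (test_bit(n, ppgtt->pdp.used_pdpes))
			return ppgtt->pdp.page_directory[n]->daddr;

		return ppgtt->scratch_pd->daddr;
	}

	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(lrc_pdp_daddr(ppgtt, 0));
	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(lrc_pdp_daddr(ppgtt, 0));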

v2: Renamed commit title (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 80 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index efaaebe..109ec59 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -358,6 +358,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 
 static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 				    struct drm_i915_gem_object *ring_obj,
+				    struct i915_hw_ppgtt *ppgtt,
 				    u32 tail)
 {
 	struct page *page;
@@ -369,6 +370,40 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
+	/* True PPGTT with dynamic page allocation: update PDP registers and
+	 * point the unallocated PDPs to the scratch page
+	 */
+	if (ppgtt) {
+		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		} else {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		} else {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		} else {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		} else {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+	}
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -387,7 +422,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, to0->ppgtt, tail0);
 
 	if (to1) {
 		ringbuf1 = to1->engine[ring->id].ringbuf;
@@ -396,7 +431,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
 		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
 
-		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
+		execlists_update_context(ctx_obj1, ringbuf1->obj, to1->ppgtt, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -1731,14 +1766,39 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+
+	/* With dynamic page allocation, PDPs may not be allocated at this point,
+	 * Point the unallocated PDPs to the scratch page
+	 */
+	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	} else {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	} else {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	} else {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	} else {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread
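
The four nearly identical if/else blocks in the populate_lr_context() hunk above all apply the same rule: program the real page directory address into CTX_PDPn when that PDPE is marked used, otherwise fall back to the scratch page directory. A minimal, self-contained sketch of that selection logic (the struct and helper names here are simplified stand-ins for illustration, not the driver's):

	#include <stdint.h>
	#include <stdio.h>

	#define GEN8_LEGACY_PDPES 4

	/* Simplified stand-ins for the driver structures (illustration only). */
	struct fake_pdp {
		unsigned long used_pdpes;              /* bitmask of allocated PDPEs */
		uint64_t pd_daddr[GEN8_LEGACY_PDPES];  /* DMA address per page dir   */
	};

	/* Pick the address to program into CTX_PDPn_UDW/LDW: the real page
	 * directory if it has been allocated, the scratch PD otherwise. */
	static uint64_t pdp_reg_addr(const struct fake_pdp *pdp, int i,
				     uint64_t scratch_pd_daddr)
	{
		if (pdp->used_pdpes & (1UL << i))
			return pdp->pd_daddr[i];
		return scratch_pd_daddr;
	}

	int main(void)
	{
		struct fake_pdp pdp = { .used_pdpes = 0x1, .pd_daddr = { 0x100000 } };
		uint64_t scratch = 0xdead000;
		int i;

		for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
			uint64_t a = pdp_reg_addr(&pdp, i, scratch);
			printf("PDP%d: UDW=%08x LDW=%08x\n", i,
			       (uint32_t)(a >> 32), (uint32_t)a);
		}
		return 0;
	}

With used_pdpes = 0x1, PDP3..PDP1 get the scratch address and only PDP0 gets the real page directory, matching the fallback in the hunk above.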

* Re: [PATCH v4 01/24] drm/i915/trace: Fix offsets for 64b
  2015-01-22 17:01   ` [PATCH v4 01/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
@ 2015-01-27 12:16     ` Mika Kuoppala
  0 siblings, 0 replies; 229+ messages in thread
From: Mika Kuoppala @ 2015-01-27 12:16 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_trace.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 6058a01..f004d3d 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -115,7 +115,7 @@ TRACE_EVENT(i915_vma_bind,
>  	    TP_STRUCT__entry(
>  			     __field(struct drm_i915_gem_object *, obj)
>  			     __field(struct i915_address_space *, vm)
> -			     __field(u32, offset)
> +			     __field(u64, offset)
>  			     __field(u32, size)
>  			     __field(unsigned, flags)

We seem to use uint64_t for flags in the caller. Not that we are using
more than a few flag bits currently, so we are good for now.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

>  			     ),
> @@ -128,7 +128,7 @@ TRACE_EVENT(i915_vma_bind,
>  			   __entry->flags = flags;
>  			   ),
>  
> -	    TP_printk("obj=%p, offset=%08x size=%x%s vm=%p",
> +	    TP_printk("obj=%p, offset=%016llx size=%x%s vm=%p",
>  		      __entry->obj, __entry->offset, __entry->size,
>  		      __entry->flags & PIN_MAPPABLE ? ", mappable" : "",
>  		      __entry->vm)
> @@ -141,7 +141,7 @@ TRACE_EVENT(i915_vma_unbind,
>  	    TP_STRUCT__entry(
>  			     __field(struct drm_i915_gem_object *, obj)
>  			     __field(struct i915_address_space *, vm)
> -			     __field(u32, offset)
> +			     __field(u64, offset)
>  			     __field(u32, size)
>  			     ),
>  
> @@ -152,7 +152,7 @@ TRACE_EVENT(i915_vma_unbind,
>  			   __entry->size = vma->node.size;
>  			   ),
>  
> -	    TP_printk("obj=%p, offset=%08x size=%x vm=%p",
> +	    TP_printk("obj=%p, offset=%016llx size=%x vm=%p",
>  		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
>  );
>  
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread
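
For context on why the field width matters in the trace patch above: with the larger PPGTT address spaces a VMA offset no longer fits in 32 bits, so a u32 tracepoint field silently drops the high bits. A tiny self-contained illustration of that truncation (plain userspace C, not the tracepoint code itself):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t offset = 0x1ffff0000ull;      /* an offset above 4 GiB */
		uint32_t truncated = (uint32_t)offset;

		printf("u64 field: %016llx\n", (unsigned long long)offset);
		printf("u32 field: %08x (high bits lost)\n", truncated);
		return 0;
	}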

* Re: [PATCH v4 02/24] drm/i915: Rename to GEN8_LEGACY_PDPES
  2015-01-22 17:01   ` [PATCH v4 02/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
@ 2015-02-06 15:32     ` Mika Kuoppala
  0 siblings, 0 replies; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-06 15:32 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
> one, but it resembles having one). The #define was confusing as is, and
> using "PDPE" is a much better description.
>
> sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]
>
> It also matches the x86 pagetable terminology:
> PTE  = Page Table Entry - pagetable level 1 page
> PDE  = Page Directory Entry - pagetable level 2 page
> PDPE = Page Directory Pointer Entry - pagetable level 3 page
>
> And in the near future (for 48b addressing):
> PML4E = Page Map Level 4 Entry
>
> v2: Expanded information about Page Directory/Table nomenclature.
>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> CC: Dave Gordon <david.s.gordon@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
>  drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 746f77f..58d54bd 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -375,7 +375,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	pt_vaddr = NULL;
>  
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> -		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
> +		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
>  			break;
>  
>  		if (pt_vaddr == NULL)
> @@ -486,7 +486,7 @@ bail:
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					   const int max_pdp)
>  {
> -	struct page **pt_pages[GEN8_LEGACY_PDPS];
> +	struct page **pt_pages[GEN8_LEGACY_PDPES];
>  	int i, ret;
>  
>  	for (i = 0; i < max_pdp; i++) {
> @@ -537,7 +537,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>  		return -ENOMEM;
>  
>  	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
> -	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
> +	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index e377c7d..9d998ec 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -88,7 +88,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN8_PDE_MASK			0x1ff
>  #define GEN8_PTE_SHIFT			12
>  #define GEN8_PTE_MASK			0x1ff
> -#define GEN8_LEGACY_PDPS		4
> +#define GEN8_LEGACY_PDPES		4
>  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
>  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
>  
> @@ -273,12 +273,12 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_pages; /* gen8+ */
>  	union {
>  		struct page **pt_pages;
> -		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
>  	};
>  	struct page *pd_pages;
>  	union {
>  		uint32_t pd_offset;
> -		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
>  	};
>  	union {
>  		dma_addr_t *pt_dma_addr;
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread
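
The PTE/PDE/PDPE/PML4E terminology in the commit message above maps directly onto how an x86-style virtual address is split into per-level table indices: 9 bits per level on top of the 12-bit page offset. A self-contained sketch of that decomposition (the constants follow the generic 4-level/48-bit layout, not the GEN8 legacy 32b case with its 4 PDPEs):

	#include <stdint.h>
	#include <stdio.h>

	#define PAGE_SHIFT 12
	#define LEVEL_BITS 9                        /* 512 entries per level */
	#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)

	int main(void)
	{
		uint64_t addr = 0x0000123456789000ull;

		uint32_t pte   = (addr >> PAGE_SHIFT) & LEVEL_MASK;
		uint32_t pde   = (addr >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK;
		uint32_t pdpe  = (addr >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK;
		uint32_t pml4e = (addr >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK;

		printf("addr %016llx -> pml4e=%u pdpe=%u pde=%u pte=%u\n",
		       (unsigned long long)addr, pml4e, pdpe, pde, pte);
		return 0;
	}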

* Re: [PATCH v4 03/24] drm/i915: Setup less PPGTT on failed page_directory
  2015-01-22 17:01   ` [PATCH v4 03/24] drm/i915: Setup less PPGTT on failed page_directory Michel Thierry
@ 2015-02-09 15:21     ` Mika Kuoppala
  0 siblings, 0 replies; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-09 15:21 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> The current code will both potentially print a WARN, and setup part of
> the PPGTT structure. Neither of these harm the current code, it is
> simply for clarity, and to perhaps prevent later bugs, or weird
> debug messages.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

>  drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 58d54bd..b48b586 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1032,11 +1032,14 @@ alloc:
>  		goto alloc;
>  	}
>  
> +	if (ret)
> +		return ret;
> +
>  	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
>  		DRM_DEBUG("Forced to use aperture for PDEs\n");
>  
>  	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
> -	return ret;
> +	return 0;
>  }
>  
>  static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 04/24] drm/i915/gen8: Un-hardcode number of page directories
  2015-01-22 17:01   ` [PATCH v4 04/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
@ 2015-02-09 15:30     ` Mika Kuoppala
  2015-02-09 16:33       ` Daniel Vetter
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-09 15:30 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 9d998ec..8f76990 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -282,7 +282,7 @@ struct i915_hw_ppgtt {
>  	};
>  	union {
>  		dma_addr_t *pt_dma_addr;
> -		dma_addr_t *gen8_pt_dma_addr[4];
> +		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
>  	};
>  
>  	struct drm_i915_file_private *file_priv;
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 04/24] drm/i915/gen8: Un-hardcode number of page directories
  2015-02-09 15:30     ` Mika Kuoppala
@ 2015-02-09 16:33       ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-02-09 16:33 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Mon, Feb 09, 2015 at 05:30:45PM +0200, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
> 
> > From: Ben Widawsky <benjamin.widawsky@intel.com>
> >
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

Merged up to this one, thanks for patches&review.
-Daniel

> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > index 9d998ec..8f76990 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > @@ -282,7 +282,7 @@ struct i915_hw_ppgtt {
> >  	};
> >  	union {
> >  		dma_addr_t *pt_dma_addr;
> > -		dma_addr_t *gen8_pt_dma_addr[4];
> > +		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
> >  	};
> >  
> >  	struct drm_i915_file_private *file_priv;
> > -- 
> > 2.1.1
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 05/24] drm/i915: page table abstractions
  2015-01-22 17:01   ` [PATCH v4 05/24] drm/i915: page table abstractions Michel Thierry
@ 2015-02-18 11:27     ` Mika Kuoppala
  2015-02-23 15:39       ` Michel Thierry
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-18 11:27 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> When we move to dynamic page allocation, keeping page_directory and page tables as
> separate structures will help to break actions into simpler tasks.
>
> To help transition the code nicely there is some wasted space in gen6/7.
> This will be ameliorated shortly.
>
> Following the x86 pagetable terminology:
> PDPE = struct i915_page_directory_pointer_entry.
> PDE = struct i915_page_directory_entry [page_directory].
> PTE = struct i915_page_table_entry [page_tables].
>
> v2: fixed mismatches after clean-up/rebase.
>
> v3: Clarify the names of the multiple levels of page tables (Daniel)
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 177 ++++++++++++++++++------------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
>  2 files changed, 107 insertions(+), 93 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index b48b586..98b4698 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -334,7 +334,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  				      I915_CACHE_LLC, use_scratch);
>  
>  	while (num_entries) {
> -		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
> +		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> +		struct page *page_table = pd->page_tables[pde].page;
>  
>  		last_pte = pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
> @@ -378,8 +379,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
>  			break;
>  
> -		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
> +		if (pt_vaddr == NULL) {
> +			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> +			struct page *page_table = pd->page_tables[pde].page;
> +
> +			pt_vaddr = kmap_atomic(page_table);
> +		}
>  
>  		pt_vaddr[pte] =
>  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -403,29 +408,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	}
>  }
>  
> -static void gen8_free_page_tables(struct page **pt_pages)
> +static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>  {
>  	int i;
>  
> -	if (pt_pages == NULL)
> +	if (pd->page_tables == NULL)
>  		return;
>  
>  	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> -		if (pt_pages[i])
> -			__free_pages(pt_pages[i], 0);
> +		if (pd->page_tables[i].page)
> +			__free_page(pd->page_tables[i].page);
>  }
>  
> -static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
> +static void gen8_free_page_directories(struct i915_page_directory_entry *pd)
                                        ^
You only free one directory so why plural here?

> +{

If you free the page tables for the directory here..

> +	kfree(pd->page_tables);
> +	__free_page(pd->page);
> +}
> +
> +static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> -		kfree(ppgtt->gen8_pt_pages[i]);
> +		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);

...this loop will be cleaner.

Also consider renaming 'num_pd_pages' to 'num_pd'. But if it causes
a lot of rebase burden, don't worry about it.

> +		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
>  		kfree(ppgtt->gen8_pt_dma_addr[i]);
>  	}
> -
> -	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
>  }
>  
>  static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
> @@ -460,86 +469,75 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> -static struct page **__gen8_alloc_page_tables(void)
> +static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
>  {
> -	struct page **pt_pages;
>  	int i;
>  
> -	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> -	if (!pt_pages)
> -		return ERR_PTR(-ENOMEM);
> -
> -	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> -		pt_pages[i] = alloc_page(GFP_KERNEL);
> -		if (!pt_pages[i])
> -			goto bail;
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> +						     sizeof(dma_addr_t),
> +						     GFP_KERNEL);
> +		if (!ppgtt->gen8_pt_dma_addr[i])
> +			return -ENOMEM;
>  	}
>  
> -	return pt_pages;
> -
> -bail:
> -	gen8_free_page_tables(pt_pages);
> -	kfree(pt_pages);
> -	return ERR_PTR(-ENOMEM);
> +	return 0;
>  }
>  
> -static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
> -					   const int max_pdp)
> +static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  {
> -	struct page **pt_pages[GEN8_LEGACY_PDPES];
> -	int i, ret;
> +	int i, j;
>  
> -	for (i = 0; i < max_pdp; i++) {
> -		pt_pages[i] = __gen8_alloc_page_tables();
> -		if (IS_ERR(pt_pages[i])) {
> -			ret = PTR_ERR(pt_pages[i]);
> -			goto unwind_out;
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> +			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
> +
> +			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +			if (!pt->page)
> +				goto unwind_out;
>  		}
>  	}
>  
> -	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
> -	 * "atomic" - for cleanup purposes.
> -	 */
> -	for (i = 0; i < max_pdp; i++)
> -		ppgtt->gen8_pt_pages[i] = pt_pages[i];
> -
>  	return 0;
>  
>  unwind_out:
> -	while (i--) {
> -		gen8_free_page_tables(pt_pages[i]);
> -		kfree(pt_pages[i]);
> -	}
> +	while (i--)
> +		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
>  
> -	return ret;
> +	return -ENOMEM;
>  }
>  
> -static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> +static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
> +						const int max_pdp)
>  {
>  	int i;
>  
> -	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> -						     sizeof(dma_addr_t),
> -						     GFP_KERNEL);
> -		if (!ppgtt->gen8_pt_dma_addr[i])
> -			return -ENOMEM;
> -	}
> +	for (i = 0; i < max_pdp; i++) {
> +		struct i915_page_table_entry *pt;
>  
> -	return 0;
> -}
> +		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
> +		if (!pt)
> +			goto unwind_out;
>  
> -static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
> -						const int max_pdp)
> -{
> -	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
> -	if (!ppgtt->pd_pages)
> -		return -ENOMEM;
> +		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
> +		if (!ppgtt->pdp.page_directory[i].page)
> +			goto unwind_out;

If you hit an allocation error here, you will leak the pt allocated
just above.

Also consider that if you do gen8_ppgtt_allocate_page_directory() and
add a NULL check for pd->page in gen8_free_page_directory, you should
be able to avoid the unwinding below completely.

> +
> +		ppgtt->pdp.page_directory[i].page_tables = pt;
> +	}
>  
> -	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
> +	ppgtt->num_pd_pages = max_pdp;
>  	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
>  
>  	return 0;
> +
> +unwind_out:
> +	while (i--) {
> +		kfree(ppgtt->pdp.page_directory[i].page_tables);
> +		__free_page(ppgtt->pdp.page_directory[i].page);
> +	}
> +
> +	return -ENOMEM;
>  }
>  
>  static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
> @@ -551,18 +549,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
>  	if (ret)
>  		return ret;
>  
> -	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
> -	if (ret) {
> -		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
> -		return ret;
> -	}
> +	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
> +	if (ret)
> +		goto err_out;
>  
>  	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
>  
>  	ret = gen8_ppgtt_allocate_dma(ppgtt);
> -	if (ret)
> -		gen8_ppgtt_free(ppgtt);
> +	if (!ret)
> +		return ret;
>  
> +	/* TODO: Check this for all cases */

Checking for a zero return and then returning it, together with the
comment, is confusing. Why not just use the same pattern as above?

if (ret)
   goto err_out;

return 0;

-Mika

> +err_out:
> +	gen8_ppgtt_free(ppgtt);
>  	return ret;
>  }
>  
> @@ -573,7 +572,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	int ret;
>  
>  	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> -			       &ppgtt->pd_pages[pd], 0,
> +			       ppgtt->pdp.page_directory[pd].page, 0,
>  			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> @@ -593,7 +592,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  	struct page *p;
>  	int ret;
>  
> -	p = ppgtt->gen8_pt_pages[pd][pt];
> +	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
>  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
>  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> @@ -654,7 +653,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
>  		gen8_ppgtt_pde_t *pd_vaddr;
> -		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
> +		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>  			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
>  			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
> @@ -717,7 +716,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  				   expected);
>  		seq_printf(m, "\tPDE: %x\n", pd_entry);
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
>  		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
>  			unsigned long va =
>  				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
> @@ -922,7 +921,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>  		if (last_pte > I915_PPGTT_PT_ENTRIES)
>  			last_pte = I915_PPGTT_PT_ENTRIES;
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
>  
>  		for (i = first_pte; i < last_pte; i++)
>  			pt_vaddr[i] = scratch_pte;
> @@ -951,7 +950,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  	pt_vaddr = NULL;
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
> +			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
>  
>  		pt_vaddr[act_pte] =
>  			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -986,8 +985,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  
>  	kfree(ppgtt->pt_dma_addr);
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		__free_page(ppgtt->pt_pages[i]);
> -	kfree(ppgtt->pt_pages);
> +		__free_page(ppgtt->pd.page_tables[i].page);
> +	kfree(ppgtt->pd.page_tables);
>  }
>  
>  static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
> @@ -1044,22 +1043,22 @@ alloc:
>  
>  static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  {
> +	struct i915_page_table_entry *pt;
>  	int i;
>  
> -	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
> -				  GFP_KERNEL);
> -
> -	if (!ppgtt->pt_pages)
> +	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
> +	if (!pt)
>  		return -ENOMEM;
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
> -		if (!ppgtt->pt_pages[i]) {
> +		pt[i].page = alloc_page(GFP_KERNEL);
> +		if (!pt->page) {
>  			gen6_ppgtt_free(ppgtt);
>  			return -ENOMEM;
>  		}
>  	}
>  
> +	ppgtt->pd.page_tables = pt;
>  	return 0;
>  }
>  
> @@ -1094,9 +1093,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> +		struct page *page;
>  		dma_addr_t pt_addr;
>  
> -		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
> +		page = ppgtt->pd.page_tables[i].page;
> +		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>  				       PCI_DMA_BIDIRECTIONAL);
>  
>  		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
> @@ -1140,7 +1141,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>  	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
>  	ppgtt->base.start = 0;
> -	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
> +	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
>  	ppgtt->debug_dump = gen6_dump_ppgtt;
>  
>  	ppgtt->pd_offset =
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8f76990..d9bc375 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -187,6 +187,20 @@ struct i915_vma {
>  			 u32 flags);
>  };
>  
> +struct i915_page_table_entry {
> +	struct page *page;
> +};
> +
> +struct i915_page_directory_entry {
> +	struct page *page; /* NULL for GEN6-GEN7 */
> +	struct i915_page_table_entry *page_tables;
> +};
> +
> +struct i915_page_directory_pointer_entry {
> +	/* struct page *page; */
> +	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
> +};
> +
>  struct i915_address_space {
>  	struct drm_mm mm;
>  	struct drm_device *dev;
> @@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_entries;
>  	unsigned num_pd_pages; /* gen8+ */
>  	union {
> -		struct page **pt_pages;
> -		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
> -	};
> -	struct page *pd_pages;
> -	union {
>  		uint32_t pd_offset;
>  		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
>  	};
> @@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
>  		dma_addr_t *pt_dma_addr;
>  		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
>  	};
> +	union {
> +		struct i915_page_directory_pointer_entry pdp;
> +		struct i915_page_directory_entry pd;
> +	};
>  
>  	struct drm_i915_file_private *file_priv;
>  
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread
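
Mika's two structural suggestions above (free a directory's page tables from within the directory free helper, and make the free helpers tolerate unallocated entries so error paths need no open-coded unwinding) boil down to the following pattern. This is a userspace sketch with simplified, made-up struct and function names, not the driver code:

	#include <stdio.h>
	#include <stdlib.h>

	#define PDES_PER_PD 512

	struct pt { void *page; };
	struct pd { void *page; struct pt *page_tables; };

	/* Free helper that is safe on partially allocated directories: every
	 * pointer is checked, so an allocation error path can simply call it
	 * on whatever was set up so far instead of open-coding an unwind. */
	static void free_pd(struct pd *pd)
	{
		int i;

		if (pd->page_tables) {
			for (i = 0; i < PDES_PER_PD; i++)
				free(pd->page_tables[i].page); /* free(NULL) is a no-op */
			free(pd->page_tables);
		}
		free(pd->page); /* also a no-op when NULL */
		pd->page = NULL;
		pd->page_tables = NULL;
	}

	int main(void)
	{
		struct pd pd = { 0 };

		pd.page = malloc(4096);
		pd.page_tables = calloc(PDES_PER_PD, sizeof(*pd.page_tables));
		if (!pd.page || !pd.page_tables) {
			free_pd(&pd);
			return 1;
		}
		pd.page_tables[0].page = malloc(4096); /* only PDE 0 got a page table */

		free_pd(&pd); /* handles the partially populated directory */
		printf("done\n");
		return 0;
	}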

* Re: [PATCH v4 09/24] drm/i915: Track GEN6 page table usage
  2015-01-22 17:01   ` [PATCH v4 09/24] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2015-02-20 16:41     ` Mika Kuoppala
  2015-02-23 15:39       ` Michel Thierry
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-20 16:41 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> Instead of implementing the full tracking + dynamic allocation, this
> patch does a bit less than half of the work, by tracking and warning on
> unexpected conditions. The tracking itself follows which PTEs within a
> page table are currently being used for objects. The next patch will
> modify this to actually allocate the page tables only when necessary.
>
> With the current patch there isn't much in the way of making a gen
> agnostic range allocation function. However, in the next patch we'll add
> more specificity which makes having separate functions a bit easier to
> manage.
>
> One important change introduced here is that DMA mappings are
> created/destroyed at the same time page directories/tables are
> allocated/deallocated.
>
> Notice that aliasing PPGTT is not managed here. The patch which actually
> begins dynamic allocation/teardown explains the reasoning for this.
>
> v2: s/pdp.page_directory/pdp.page_directorys
> Make a scratch page allocation helper
>
> v3: Rebase and expand commit message.
>
> v4: Allocate required pagetables only when it is needed, _bind_to_vm
> instead of bind_vma (Daniel).
>
> v5: Rebased to remove the unnecessary noise in the diff, also:
>  - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
>  - Removed unnecessary checks in gen6_alloc_va_range.
>  - Changed map/unmap_px_single macros to use dma functions directly and
>    be part of a static inline function instead.
>  - Moved drm_device plumbing through page tables operation to its own
>    patch.
>  - Moved allocate/teardown_va_range calls until they are fully
>    implemented (in subsequent patch).
>  - Merged pt and scratch_pt unmap_and_free path.
>  - Moved scratch page allocator helper to the patch that will use it.
>
> v6: Reduce complexity by not tearing down pagetables dynamically, the
> same can be achieved while freeing empty vms. (Daniel)
>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 191 +++++++++++++++++++++++++-----------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  75 ++++++++++++++
>  2 files changed, 206 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e2bcd10..760585e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -274,29 +274,88 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> -static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
> +#define i915_dma_unmap_single(px, dev) \
> +	__i915_dma_unmap_single((px)->daddr, dev)
> +
> +static inline void __i915_dma_unmap_single(dma_addr_t daddr,
> +					struct drm_device *dev)
> +{
> +	struct device *device = &dev->pdev->dev;
> +
> +	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
> +}
> +
> +/**
> + * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
> + * @px:		Page table/dir/etc to get a DMA map for
> + * @dev:	drm device
> + *
> + * Page table allocations are unified across all gens. They always require a
> + * single 4k allocation, as well as a DMA mapping. If we keep the structs
> + * symmetric here, the simple macro covers us for every page table type.
> + *
> + * Return: 0 if success.
> + */
> +#define i915_dma_map_px_single(px, dev) \
> +	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
> +

If this is symmetrical to i915_dma_unmap_single(), is the _px_ needed?

> +static inline int i915_dma_map_page_single(struct page *page,
> +					   struct drm_device *dev,
> +					   dma_addr_t *daddr)
> +{
> +	struct device *device = &dev->pdev->dev;
> +
> +	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
> +	return dma_mapping_error(device, *daddr);
> +}
> +
> +static void unmap_and_free_pt(struct i915_page_table_entry *pt,
> +			       struct drm_device *dev)
>  {
>  	if (WARN_ON(!pt->page))
>  		return;
> +
> +	i915_dma_unmap_single(pt, dev);
>  	__free_page(pt->page);
> +	kfree(pt->used_ptes);
>  	kfree(pt);
>  }
>  
>  static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
>  {
>  	struct i915_page_table_entry *pt;
> +	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
> +		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
> +	int ret = -ENOMEM;
>  
>  	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
>  	if (!pt)
>  		return ERR_PTR(-ENOMEM);
>  
> +	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
> +				GFP_KERNEL);
> +
> +	if (!pt->used_ptes)
> +		goto fail_bitmap;
> +
>  	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> -	if (!pt->page) {
> -		kfree(pt);
> -		return ERR_PTR(-ENOMEM);
> -	}
> +	if (!pt->page)
> +		goto fail_page;
> +
> +	ret = i915_dma_map_px_single(pt, dev);
> +	if (ret)
> +		goto fail_dma;
>  
>  	return pt;
> +
> +fail_dma:
> +	__free_page(pt->page);
> +fail_page:
> +	kfree(pt->used_ptes);
> +fail_bitmap:
> +	kfree(pt);
> +
> +	return ERR_PTR(ret);
>  }
>  
>  /**
> @@ -836,26 +895,36 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  	}
>  }
>  
> -static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
> +/* Write pde (index) from the page directory @pd to the page table @pt */
> +static void gen6_write_pdes(struct i915_page_directory_entry *pd,

It seems to me that you write only one PDE entry here, so s/pdes/pde?

> +			    const int pde, struct i915_page_table_entry *pt)
>  {
> -	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
> -	gen6_gtt_pte_t __iomem *pd_addr;
> -	uint32_t pd_entry;
> -	int i;
> +	struct i915_hw_ppgtt *ppgtt =
> +		container_of(pd, struct i915_hw_ppgtt, pd);
> +	u32 pd_entry;
>  
> -	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
> -	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
> -		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		dma_addr_t pt_addr;
> +	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
> +	pd_entry |= GEN6_PDE_VALID;
>  
> -		pt_addr = ppgtt->pd.page_tables[i]->daddr;
> -		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
> -		pd_entry |= GEN6_PDE_VALID;
> +	writel(pd_entry, ppgtt->pd_addr + pde);
>  
> -		writel(pd_entry, pd_addr + i);
> -	}
> -	readl(pd_addr);
> +	/* XXX: Caller needs to make sure the write completes if necessary */
> +}

Move this comment on top of the function and lift the XXX:

> +
> +/* Write all the page tables found in the ppgtt structure to incrementing page
> + * directories. */
> +static void gen6_write_page_range(struct drm_i915_private *dev_priv,
> +				struct i915_page_directory_entry *pd, uint32_t start, uint32_t length)
> +{
> +	struct i915_page_table_entry *pt;
> +	uint32_t pde, temp;
> +
> +	gen6_for_each_pde(pt, pd, start, length, temp, pde)
> +		gen6_write_pdes(pd, pde, pt);
> +
> +	/* Make sure write is complete before other code can use this page
> +	 * table. Also require for WC mapped PTEs */
> +	readl(dev_priv->gtt.gsm);
>  }
>  
>  static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
> @@ -1071,6 +1140,28 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  			       4096, PCI_DMA_BIDIRECTIONAL);
>  }
>  
> +static int gen6_alloc_va_range(struct i915_address_space *vm,
> +			       uint64_t start, uint64_t length)
> +{
> +	struct i915_hw_ppgtt *ppgtt =
> +				container_of(vm, struct i915_hw_ppgtt, base);
> +	struct i915_page_table_entry *pt;
> +	uint32_t pde, temp;
> +
> +	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
> +		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
> +
> +		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
> +		bitmap_set(tmp_bitmap, gen6_pte_index(start),
> +			   gen6_pte_count(start, length));
> +
> +		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
> +				I915_PPGTT_PT_ENTRIES);
> +	}
> +
> +	return 0;
> +}
> +
>  static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
> @@ -1117,20 +1208,24 @@ alloc:
>  					       0, dev_priv->gtt.base.total,
>  					       0);
>  		if (ret)
> -			return ret;
> +			goto err_out;
>  
>  		retried = true;
>  		goto alloc;
>  	}
>  
>  	if (ret)
> -		return ret;
> +		goto err_out;
> +
>  
>  	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
>  		DRM_DEBUG("Forced to use aperture for PDEs\n");
>  
>  	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
>  	return 0;
> +
> +err_out:
> +	return ret;
>  }
>  
>  static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
> @@ -1152,30 +1247,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	return 0;
>  }
>  
> -static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
> -{
> -	struct drm_device *dev = ppgtt->base.dev;
> -	int i;
> -
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		struct page *page;
> -		dma_addr_t pt_addr;
> -
> -		page = ppgtt->pd.page_tables[i]->page;
> -		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
> -				       PCI_DMA_BIDIRECTIONAL);
> -
> -		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
> -			gen6_ppgtt_unmap_pages(ppgtt);
> -			return -EIO;
> -		}
> -
> -		ppgtt->pd.page_tables[i]->daddr = pt_addr;
> -	}
> -
> -	return 0;
> -}
> -
>  static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  {
>  	struct drm_device *dev = ppgtt->base.dev;
> @@ -1196,12 +1267,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> -	ret = gen6_ppgtt_setup_page_tables(ppgtt);
> -	if (ret) {
> -		gen6_ppgtt_free(ppgtt);
> -		return ret;
> -	}
> -
> +	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
>  	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
>  	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>  	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
> @@ -1212,13 +1278,17 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->pd.pd_offset =
>  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
>  
> +	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
> +		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
> +
>  	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>  
> +	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
> +
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
>  			 ppgtt->node.size >> 20,
>  			 ppgtt->node.start / PAGE_SIZE);
>  
> -	gen6_write_pdes(ppgtt);
>  	DRM_DEBUG("Adding PPGTT at offset %x\n",
>  		  ppgtt->pd.pd_offset << 10);
>  
> @@ -1491,13 +1561,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>  
>  	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
>  		/* TODO: Perhaps it shouldn't be gen6 specific */
> -		if (i915_is_ggtt(vm)) {
> -			if (dev_priv->mm.aliasing_ppgtt)
> -				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
> -			continue;
> -		}
>  
> -		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
> +		struct i915_hw_ppgtt *ppgtt =
> +			container_of(vm, struct i915_hw_ppgtt, base);
> +
> +		if (i915_is_ggtt(vm))
> +			ppgtt = dev_priv->mm.aliasing_ppgtt;

If we have GGTT but aliasing PPGTT is not enabled, we get NULL here
and oops in the next function?

-Mika

> +
> +		gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
>  	}
>  
>  	i915_ggtt_flush(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index e8cad72..1b15fc9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN6_PPGTT_PD_ENTRIES		512
>  #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
>  #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
> +#define GEN6_PDE_SHIFT			22
>  #define GEN6_PDE_VALID			(1 << 0)
> +#define I915_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
> +#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
>  
>  #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
>  
> @@ -190,6 +193,8 @@ struct i915_vma {
>  struct i915_page_table_entry {
>  	struct page *page;
>  	dma_addr_t daddr;
> +
> +	unsigned long *used_ptes;
>  };
>  
>  struct i915_page_directory_entry {
> @@ -246,6 +251,9 @@ struct i915_address_space {
>  	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
>  				     enum i915_cache_level level,
>  				     bool valid, u32 flags); /* Create a valid PTE */
> +	int (*allocate_va_range)(struct i915_address_space *vm,
> +				 uint64_t start,
> +				 uint64_t length);
>  	void (*clear_range)(struct i915_address_space *vm,
>  			    uint64_t start,
>  			    uint64_t length,
> @@ -298,12 +306,79 @@ struct i915_hw_ppgtt {
>  
>  	struct drm_i915_file_private *file_priv;
>  
> +	gen6_gtt_pte_t __iomem *pd_addr;
> +
>  	int (*enable)(struct i915_hw_ppgtt *ppgtt);
>  	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
>  			 struct intel_engine_cs *ring);
>  	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
>  };
>  
> +/* For each pde iterates over every pde between from start until start + length.
> + * If start, and start+length are not perfectly divisible, the macro will round
> + * down, and up as needed. The macro modifies pde, start, and length. Dev is
> + * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
> + * and length = 2G effectively iterates over every PDE in the system. On gen8+
> + * it simply iterates over every page directory entry in a page directory.
> + *
> + * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
> + */
> +#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
> +	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
> +	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
> +	     pt = (pd)->page_tables[++iter], \
> +	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
> +	     temp = min_t(unsigned, temp, length), \
> +	     start += temp, length -= temp)
> +
> +static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
> +{
> +	const uint32_t mask = NUM_PTE(pde_shift) - 1;
> +
> +	return (address >> PAGE_SHIFT) & mask;
> +}
> +
> +/* Helper to counts the number of PTEs within the given length. This count does
> +* not cross a page table boundary, so the max value would be
> +* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
> +*/
> +static inline size_t i915_pte_count(uint64_t addr, size_t length,
> +					uint32_t pde_shift)
> +{
> +	const uint64_t mask = ~((1 << pde_shift) - 1);
> +	uint64_t end;
> +
> +	BUG_ON(length == 0);
> +	BUG_ON(offset_in_page(addr|length));
> +
> +	end = addr + length;
> +
> +	if ((addr & mask) != (end & mask))
> +		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
> +
> +	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
> +}
> +
> +static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
> +{
> +	return (addr >> shift) & I915_PDE_MASK;
> +}
> +
> +static inline uint32_t gen6_pte_index(uint32_t addr)
> +{
> +	return i915_pte_index(addr, GEN6_PDE_SHIFT);
> +}
> +
> +static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
> +{
> +	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
> +}
> +
> +static inline uint32_t gen6_pde_index(uint32_t addr)
> +{
> +	return i915_pde_index(addr, GEN6_PDE_SHIFT);
> +}
> +
>  int i915_gem_gtt_init(struct drm_device *dev);
>  void i915_gem_init_global_gtt(struct drm_device *dev);
>  void i915_global_gtt_cleanup(struct drm_device *dev);
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread
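
The core of gen6_alloc_va_range() above is walking a [start, start + length) range one page table at a time and marking the covered PTE slots in that table's used_ptes bitmap, which is what the gen6_for_each_pde/i915_pte_count pair implements. A self-contained model of that walk (userspace C using the same GEN6 geometry of a 4 MiB span and 1024 PTEs per page table; the helper names are illustrative):

	#include <stdint.h>
	#include <stdio.h>

	#define PAGE_SHIFT 12
	#define PDE_SHIFT  22                                  /* one PT spans 4 MiB */
	#define PTES_PER_PT (1u << (PDE_SHIFT - PAGE_SHIFT))   /* 1024 on GEN6 */

	static uint32_t pte_index(uint64_t addr)
	{
		return (addr >> PAGE_SHIFT) & (PTES_PER_PT - 1);
	}

	static uint32_t pde_index(uint64_t addr)
	{
		return addr >> PDE_SHIFT;
	}

	/* Number of PTEs the range covers inside the page table containing addr;
	 * mirrors i915_pte_count(): the count never crosses a PT boundary. */
	static uint32_t pte_count(uint64_t addr, uint64_t length)
	{
		uint64_t end = addr + length;
		uint64_t mask = ~((1ull << PDE_SHIFT) - 1);

		if ((addr & mask) != (end & mask))
			return PTES_PER_PT - pte_index(addr);
		return pte_index(end) - pte_index(addr);
	}

	int main(void)
	{
		/* Page-aligned range that straddles a page-table boundary. */
		uint64_t start = 0x003ff000, length = 0x3000;

		while (length) {
			uint32_t count = pte_count(start, length);

			printf("PDE %u: mark PTEs [%u, %u) as used\n",
			       pde_index(start), pte_index(start),
			       pte_index(start) + count);
			start  += (uint64_t)count << PAGE_SHIFT;
			length -= (uint64_t)count << PAGE_SHIFT;
		}
		return 0;
	}

With these inputs it reports one PTE in the last slot of PDE 0 and two PTEs at the start of PDE 1, i.e. the range is split at the page-table boundary exactly as the per-PDE bitmap updates in the patch are.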

* Re: [PATCH v4 07/24] drm/i915: Create page table allocators
  2015-01-22 17:01   ` [PATCH v4 07/24] drm/i915: Create page table allocators Michel Thierry
@ 2015-02-20 16:50     ` Mika Kuoppala
  2015-02-23 15:39       ` Michel Thierry
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-20 16:50 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> As we move toward dynamic page table allocation, it becomes much easier
> to manage our data structures if we do things less coarsely, by
> breaking up all of our actions into individual tasks.  This makes the
> code easier to write, read, and verify.
>
> Aside from the dissection of the allocation functions, the patch
> statically allocates the page table structures without a page directory.
> This remains the same for all platforms,
>
> The patch itself should not have much functional difference. The primary
> noticeable difference is the fact that page tables are no longer
> allocated, but rather statically declared as part of the page directory.
> This has non-zero overhead, but things gain non-trivial complexity as a
> result.
>
> This patch exists for a few reasons:
> 1. Splitting out the functions allows easily combining GEN6 and GEN8
> code. Page tables have no difference based on GEN8. As we'll see in a
> future patch when we add the DMA mappings to the allocations, it
> requires only one small change to make work, and error handling should
> just fall into place.
>
> 2. Unless we always want to allocate all page tables under a given PDE,
> we'll have to eventually break this up into an array of pointers (or
> pointer to pointer).
>
> 3. Having the discrete functions is easier to review, and understand.
> All allocations and frees now take place in just a couple of locations.
> Reviewing, and catching leaks should be easy.
>
> 4. Less important: the GFP flags are confined to one location, which
> makes playing around with such things trivial.
>
> v2: Updated commit message to explain why this patch exists
>
> v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
>
> v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
>
> v5: Added additional safety checks in gen8 clear/free/unmap.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3, v4, v5)
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 251 ++++++++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
>  drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
>  3 files changed, 179 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0fe5c1e..85ea535 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -275,6 +275,99 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> +static void unmap_and_free_pt(struct i915_page_table_entry *pt)
> +{
> +	if (WARN_ON(!pt->page))
> +		return;
> +	__free_page(pt->page);
> +	kfree(pt);
> +}
> +
> +static struct i915_page_table_entry *alloc_pt_single(void)
> +{
> +	struct i915_page_table_entry *pt;
> +
> +	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
> +	if (!pt)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!pt->page) {
> +		kfree(pt);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	return pt;
> +}
> +
> +/**
> + * alloc_pt_range() - Allocate a multiple page tables
> + * @pd:		The page directory which will have at least @count entries
> + *		available to point to the allocated page tables.
> + * @pde:	First page directory entry for which we are allocating.
> + * @count:	Number of pages to allocate.
> + *
> + * Allocates multiple page table pages and sets the appropriate entries in the
> + * page table structure within the page directory. Function cleans up after
> + * itself on any failures.
> + *
> + * Return: 0 if allocation succeeded.
> + */
> +static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
> +{
> +	int i, ret;
> +
> +	/* 512 is the max page tables per page_directory on any platform.
> +	 * TODO: make WARN after patch series is done
> +	 */
> +	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
> +

WARN_ON in here and return -EINVAL.

-Mika

> +	for (i = pde; i < pde + count; i++) {
> +		struct i915_page_table_entry *pt = alloc_pt_single();
> +
> +		if (IS_ERR(pt)) {
> +			ret = PTR_ERR(pt);
> +			goto err_out;
> +		}
> +		WARN(pd->page_tables[i],
> +		     "Leaking page directory entry %d (%pa)\n",
> +		     i, pd->page_tables[i]);
> +		pd->page_tables[i] = pt;
> +	}
> +
> +	return 0;
> +
> +err_out:
> +	while (i--)
> +		unmap_and_free_pt(pd->page_tables[i]);
> +	return ret;
> +}
> +
> +static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
> +{
> +	if (pd->page) {
> +		__free_page(pd->page);
> +		kfree(pd);
> +	}
> +}
> +
> +static struct i915_page_directory_entry *alloc_pd_single(void)
> +{
> +	struct i915_page_directory_entry *pd;
> +
> +	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
> +	if (!pd)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!pd->page) {
> +		kfree(pd);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	return pd;
> +}
> +
>  /* Broadwell Page Directory Pointer Descriptors */
>  static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
>  			   uint64_t val)
> @@ -307,7 +400,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>  	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
>  
>  	for (i = used_pd - 1; i >= 0; i--) {
> -		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
> +		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
>  		ret = gen8_write_pdp(ring, i, addr);
>  		if (ret)
>  			return ret;
> @@ -334,8 +427,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  				      I915_CACHE_LLC, use_scratch);
>  
>  	while (num_entries) {
> -		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> -		struct page *page_table = pd->page_tables[pde].page;
> +		struct i915_page_directory_entry *pd;
> +		struct i915_page_table_entry *pt;
> +		struct page *page_table;
> +
> +		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
> +			continue;
> +
> +		pd = ppgtt->pdp.page_directory[pdpe];
> +
> +		if (WARN_ON(!pd->page_tables[pde]))
> +			continue;
> +
> +		pt = pd->page_tables[pde];
> +
> +		if (WARN_ON(!pt->page))
> +			continue;
> +
> +		page_table = pt->page;
>  
>  		last_pte = pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
> @@ -380,8 +489,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  			break;
>  
>  		if (pt_vaddr == NULL) {
> -			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> -			struct page *page_table = pd->page_tables[pde].page;
> +			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
> +			struct i915_page_table_entry *pt = pd->page_tables[pde];
> +			struct page *page_table = pt->page;
>  
>  			pt_vaddr = kmap_atomic(page_table);
>  		}
> @@ -412,18 +522,16 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>  {
>  	int i;
>  
> -	if (pd->page_tables == NULL)
> +	if (!pd->page)
>  		return;
>  
> -	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> -		if (pd->page_tables[i].page)
> -			__free_page(pd->page_tables[i].page);
> -}
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> +		if (WARN_ON(!pd->page_tables[i]))
> +			continue;
>  
> -static void gen8_free_page_directories(struct i915_page_directory_entry *pd)
> -{
> -	kfree(pd->page_tables);
> -	__free_page(pd->page);
> +		unmap_and_free_pt(pd->page_tables[i]);
> +		pd->page_tables[i] = NULL;
> +	}
>  }
>  
>  static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> @@ -431,8 +539,11 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
> -		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
> +		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> +			continue;
> +
> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
> +		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>  	}
>  }
>  
> @@ -444,14 +555,23 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>  		/* TODO: In the future we'll support sparse mappings, so this
>  		 * will have to change. */
> -		if (!ppgtt->pdp.page_directory[i].daddr)
> +		if (!ppgtt->pdp.page_directory[i]->daddr)
>  			continue;
>  
> -		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
> +		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
>  			       PCI_DMA_BIDIRECTIONAL);
>  
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
> +			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
> +			struct i915_page_table_entry *pt;
> +			dma_addr_t addr;
> +
> +			if (WARN_ON(!pd->page_tables[j]))
> +				continue;
> +
> +			pt = pd->page_tables[j];
> +			addr = pt->daddr;
> +
>  			if (addr)
>  				pci_unmap_page(hwdev, addr, PAGE_SIZE,
>  					       PCI_DMA_BIDIRECTIONAL);
> @@ -470,25 +590,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  {
> -	int i, j;
> +	int i, ret;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
> -		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			struct i915_page_table_entry *pt = &pd->page_tables[j];
> -
> -			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> -			if (!pt->page)
> -				goto unwind_out;
> -
> -		}
> +		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
> +				     0, GEN8_PDES_PER_PAGE);
> +		if (ret)
> +			goto unwind_out;
>  	}
>  
>  	return 0;
>  
>  unwind_out:
>  	while (i--)
> -		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
>  
>  	return -ENOMEM;
>  }
> @@ -499,17 +614,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	int i;
>  
>  	for (i = 0; i < max_pdp; i++) {
> -		struct i915_page_table_entry *pt;
> -
> -		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
> -		if (!pt)
> +		ppgtt->pdp.page_directory[i] = alloc_pd_single();
> +		if (IS_ERR(ppgtt->pdp.page_directory[i]))
>  			goto unwind_out;
> -
> -		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
> -		if (!ppgtt->pdp.page_directory[i].page)
> -			goto unwind_out;
> -
> -		ppgtt->pdp.page_directory[i].page_tables = pt;
>  	}
>  
>  	ppgtt->num_pd_pages = max_pdp;
> @@ -518,10 +625,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	return 0;
>  
>  unwind_out:
> -	while (i--) {
> -		kfree(ppgtt->pdp.page_directory[i].page_tables);
> -		__free_page(ppgtt->pdp.page_directory[i].page);
> -	}
> +	while (i--)
> +		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>  
>  	return -ENOMEM;
>  }
> @@ -556,14 +661,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	int ret;
>  
>  	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> -			       ppgtt->pdp.page_directory[pd].page, 0,
> +			       ppgtt->pdp.page_directory[pd]->page, 0,
>  			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
>  	if (ret)
>  		return ret;
>  
> -	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
> +	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
>  
>  	return 0;
>  }
> @@ -573,8 +678,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					const int pt)
>  {
>  	dma_addr_t pt_addr;
> -	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
> -	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
> +	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
> +	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
>  	struct page *p = ptab->page;
>  	int ret;
>  
> @@ -637,10 +742,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	 * will never need to touch the PDEs again.
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
> +		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
>  		gen8_ppgtt_pde_t *pd_vaddr;
> -		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
> +		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
> +			struct i915_page_table_entry *pt = pd->page_tables[j];
> +			dma_addr_t addr = pt->daddr;
>  			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
>  						      I915_CACHE_LLC);
>  		}
> @@ -691,7 +798,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
>  		u32 expected;
>  		gen6_gtt_pte_t *pt_vaddr;
> -		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
> +		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
>  		pd_entry = readl(pd_addr + pde);
>  		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>  
> @@ -702,7 +809,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  				   expected);
>  		seq_printf(m, "\tPDE: %x\n", pd_entry);
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
>  		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
>  			unsigned long va =
>  				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
> @@ -741,7 +848,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>  	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>  		dma_addr_t pt_addr;
>  
> -		pt_addr = ppgtt->pd.page_tables[i].daddr;
> +		pt_addr = ppgtt->pd.page_tables[i]->daddr;
>  		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
>  		pd_entry |= GEN6_PDE_VALID;
>  
> @@ -907,7 +1014,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>  		if (last_pte > I915_PPGTT_PT_ENTRIES)
>  			last_pte = I915_PPGTT_PT_ENTRIES;
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
>  
>  		for (i = first_pte; i < last_pte; i++)
>  			pt_vaddr[i] = scratch_pte;
> @@ -936,7 +1043,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  	pt_vaddr = NULL;
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
> +			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
>  
>  		pt_vaddr[act_pte] =
>  			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -959,7 +1066,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
>  		pci_unmap_page(ppgtt->base.dev->pdev,
> -			       ppgtt->pd.page_tables[i].daddr,
> +			       ppgtt->pd.page_tables[i]->daddr,
>  			       4096, PCI_DMA_BIDIRECTIONAL);
>  }
>  
> @@ -968,8 +1075,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		__free_page(ppgtt->pd.page_tables[i].page);
> -	kfree(ppgtt->pd.page_tables);
> +		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
> +
> +	unmap_and_free_pd(&ppgtt->pd);
>  }
>  
>  static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
> @@ -1024,27 +1132,6 @@ alloc:
>  	return 0;
>  }
>  
> -static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
> -{
> -	struct i915_page_table_entry *pt;
> -	int i;
> -
> -	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
> -	if (!pt)
> -		return -ENOMEM;
> -
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		pt[i].page = alloc_page(GFP_KERNEL);
> -		if (!pt->page) {
> -			gen6_ppgtt_free(ppgtt);
> -			return -ENOMEM;
> -		}
> -	}
> -
> -	ppgtt->pd.page_tables = pt;
> -	return 0;
> -}
> -
>  static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int ret;
> @@ -1053,7 +1140,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> -	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
> +	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
>  	if (ret) {
>  		drm_mm_remove_node(&ppgtt->node);
>  		return ret;
> @@ -1071,7 +1158,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  		struct page *page;
>  		dma_addr_t pt_addr;
>  
> -		page = ppgtt->pd.page_tables[i].page;
> +		page = ppgtt->pd.page_tables[i]->page;
>  		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>  				       PCI_DMA_BIDIRECTIONAL);
>  
> @@ -1080,7 +1167,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  			return -EIO;
>  		}
>  
> -		ppgtt->pd.page_tables[i].daddr = pt_addr;
> +		ppgtt->pd.page_tables[i]->daddr = pt_addr;
>  	}
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 6efeb18..e8cad72 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -199,12 +199,12 @@ struct i915_page_directory_entry {
>  		dma_addr_t daddr;
>  	};
>  
> -	struct i915_page_table_entry *page_tables;
> +	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
>  };
>  
>  struct i915_page_directory_pointer_entry {
>  	/* struct page *page; */
> -	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
> +	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
>  };
>  
>  struct i915_address_space {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a784d1d..efaaebe 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1731,14 +1731,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>  	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
>  	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
>  	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> -	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
> -	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
> -	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
> -	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
> -	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
> -	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
> -	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
> -	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
> +	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
> +	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
> +	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
> +	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
> +	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
> +	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
> +	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
> +	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
>  	if (ring->id == RCS) {
>  		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>  		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> -- 
> 2.1.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 05/24] drm/i915: page table abstractions
  2015-02-18 11:27     ` Mika Kuoppala
@ 2015-02-23 15:39       ` Michel Thierry
  0 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:39 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 15768 bytes --]

On 2/18/2015 11:27 AM, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
>
>> From: Ben Widawsky <benjamin.widawsky@intel.com>
>>
>> When we move to dynamic page allocation, keeping page_directory and page tables as
>> separate structures will help to break actions into simpler tasks.
>>
>> To help transition the code nicely there is some wasted space in gen6/7.
>> This will be ameliorated shortly.
>>
>> Following the x86 pagetable terminology:
>> PDPE = struct i915_page_directory_pointer_entry.
>> PDE = struct i915_page_directory_entry [page_directory].
>> PTE = struct i915_page_table_entry [page_tables].
>>
>> v2: fixed mismatches after clean-up/rebase.
>>
>> v3: Clarify the names of the multiple levels of page tables (Daniel)
>>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 177 ++++++++++++++++++------------------
>>   drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
>>   2 files changed, 107 insertions(+), 93 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index b48b586..98b4698 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -334,7 +334,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>>   				      I915_CACHE_LLC, use_scratch);
>>   
>>   	while (num_entries) {
>> -		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
>> +		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
>> +		struct page *page_table = pd->page_tables[pde].page;
>>   
>>   		last_pte = pte + num_entries;
>>   		if (last_pte > GEN8_PTES_PER_PAGE)
>> @@ -378,8 +379,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>>   		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
>>   			break;
>>   
>> -		if (pt_vaddr == NULL)
>> -			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
>> +		if (pt_vaddr == NULL) {
>> +			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
>> +			struct page *page_table = pd->page_tables[pde].page;
>> +
>> +			pt_vaddr = kmap_atomic(page_table);
>> +		}
>>   
>>   		pt_vaddr[pte] =
>>   			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
>> @@ -403,29 +408,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>>   	}
>>   }
>>   
>> -static void gen8_free_page_tables(struct page **pt_pages)
>> +static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>>   {
>>   	int i;
>>   
>> -	if (pt_pages == NULL)
>> +	if (pd->page_tables == NULL)
>>   		return;
>>   
>>   	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
>> -		if (pt_pages[i])
>> -			__free_pages(pt_pages[i], 0);
>> +		if (pd->page_tables[i].page)
>> +			__free_page(pd->page_tables[i].page);
>>   }
>>   
>> -static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
>> +static void gen8_free_page_directories(struct i915_page_directory_entry *pd)
>                                          ^
> You only free one directory so why plural here?
>
>> +{
> If you free the page tables for the directory here..
>
>> +	kfree(pd->page_tables);
>> +	__free_page(pd->page);
>> +}
>> +
>> +static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>   {
>>   	int i;
>>   
>>   	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>> -		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
>> -		kfree(ppgtt->gen8_pt_pages[i]);
>> +		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
> ...this loop will be cleaner.
>
> Also consider renaming 'num_pd_pages' to 'num_pd'. But if it does
> cause a lot of rebase burden, don't worry about it.

num_pd_pages will go away in patch #19, so I'd rather not change it.
All other comments addressed in v4.

Thanks,

-Michel
>> +		gen8_free_page_directories(&ppgtt->pdp.page_directory[i]);
>>   		kfree(ppgtt->gen8_pt_dma_addr[i]);
>>   	}
>> -
>> -	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
>>   }
>>   
>>   static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>> @@ -460,86 +469,75 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>>   	gen8_ppgtt_free(ppgtt);
>>   }
>>   
>> -static struct page **__gen8_alloc_page_tables(void)
>> +static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
>>   {
>> -	struct page **pt_pages;
>>   	int i;
>>   
>> -	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
>> -	if (!pt_pages)
>> -		return ERR_PTR(-ENOMEM);
>> -
>> -	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
>> -		pt_pages[i] = alloc_page(GFP_KERNEL);
>> -		if (!pt_pages[i])
>> -			goto bail;
>> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>> +		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
>> +						     sizeof(dma_addr_t),
>> +						     GFP_KERNEL);
>> +		if (!ppgtt->gen8_pt_dma_addr[i])
>> +			return -ENOMEM;
>>   	}
>>   
>> -	return pt_pages;
>> -
>> -bail:
>> -	gen8_free_page_tables(pt_pages);
>> -	kfree(pt_pages);
>> -	return ERR_PTR(-ENOMEM);
>> +	return 0;
>>   }
>>   
>> -static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>> -					   const int max_pdp)
>> +static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>>   {
>> -	struct page **pt_pages[GEN8_LEGACY_PDPES];
>> -	int i, ret;
>> +	int i, j;
>>   
>> -	for (i = 0; i < max_pdp; i++) {
>> -		pt_pages[i] = __gen8_alloc_page_tables();
>> -		if (IS_ERR(pt_pages[i])) {
>> -			ret = PTR_ERR(pt_pages[i]);
>> -			goto unwind_out;
>> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>> +		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>> +			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
>> +
>> +			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +			if (!pt->page)
>> +				goto unwind_out;
>>   		}
>>   	}
>>   
>> -	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
>> -	 * "atomic" - for cleanup purposes.
>> -	 */
>> -	for (i = 0; i < max_pdp; i++)
>> -		ppgtt->gen8_pt_pages[i] = pt_pages[i];
>> -
>>   	return 0;
>>   
>>   unwind_out:
>> -	while (i--) {
>> -		gen8_free_page_tables(pt_pages[i]);
>> -		kfree(pt_pages[i]);
>> -	}
>> +	while (i--)
>> +		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
>>   
>> -	return ret;
>> +	return -ENOMEM;
>>   }
>>   
>> -static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
>> +static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>> +						const int max_pdp)
>>   {
>>   	int i;
>>   
>> -	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>> -		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
>> -						     sizeof(dma_addr_t),
>> -						     GFP_KERNEL);
>> -		if (!ppgtt->gen8_pt_dma_addr[i])
>> -			return -ENOMEM;
>> -	}
>> +	for (i = 0; i < max_pdp; i++) {
>> +		struct i915_page_table_entry *pt;
>>   
>> -	return 0;
>> -}
>> +		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
>> +		if (!pt)
>> +			goto unwind_out;
>>   
>> -static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>> -						const int max_pdp)
>> -{
>> -	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
>> -	if (!ppgtt->pd_pages)
>> -		return -ENOMEM;
>> +		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
>> +		if (!ppgtt->pdp.page_directory[i].page)
>> +			goto unwind_out;
> If you end up having alloc error here you will leak the previously
> allocated pt above.
>
> Also consider that if you do gen8_ppgtt_allocate_page_directory() and
> add null check for pd->page in gen8_free_page_directory you should be able to avoid
> the unwinding below completely.
>
>> +
>> +		ppgtt->pdp.page_directory[i].page_tables = pt;
>> +	}
>>   
>> -	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
>> +	ppgtt->num_pd_pages = max_pdp;
>>   	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
>>   
>>   	return 0;
>> +
>> +unwind_out:
>> +	while (i--) {
>> +		kfree(ppgtt->pdp.page_directory[i].page_tables);
>> +		__free_page(ppgtt->pdp.page_directory[i].page);
>> +	}
>> +
>> +	return -ENOMEM;
>>   }
>>   
>>   static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
>> @@ -551,18 +549,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
>>   	if (ret)
>>   		return ret;
>>   
>> -	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
>> -	if (ret) {
>> -		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
>> -		return ret;
>> -	}
>> +	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
>> +	if (ret)
>> +		goto err_out;
>>   
>>   	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
>>   
>>   	ret = gen8_ppgtt_allocate_dma(ppgtt);
>> -	if (ret)
>> -		gen8_ppgtt_free(ppgtt);
>> +	if (!ret)
>> +		return ret;
>>   
>> +	/* TODO: Check this for all cases */
> The check for zero return and then returning it with the comment is
> confusing. Why not just do the same pattern as in above?
>
> if (ret)
>     goto err_out;
>
> return 0;
>
> -Mika
>
>> +err_out:
>> +	gen8_ppgtt_free(ppgtt);
>>   	return ret;
>>   }
>>   
>> @@ -573,7 +572,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
>>   	int ret;
>>   
>>   	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
>> -			       &ppgtt->pd_pages[pd], 0,
>> +			       ppgtt->pdp.page_directory[pd].page, 0,
>>   			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>>   
>>   	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
>> @@ -593,7 +592,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>>   	struct page *p;
>>   	int ret;
>>   
>> -	p = ppgtt->gen8_pt_pages[pd][pt];
>> +	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
>>   	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
>>   			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>>   	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
>> @@ -654,7 +653,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>>   	 */
>>   	for (i = 0; i < max_pdp; i++) {
>>   		gen8_ppgtt_pde_t *pd_vaddr;
>> -		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
>> +		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
>>   		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>>   			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
>>   			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
>> @@ -717,7 +716,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>   				   expected);
>>   		seq_printf(m, "\tPDE: %x\n", pd_entry);
>>   
>> -		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
>> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
>>   		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
>>   			unsigned long va =
>>   				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
>> @@ -922,7 +921,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>>   		if (last_pte > I915_PPGTT_PT_ENTRIES)
>>   			last_pte = I915_PPGTT_PT_ENTRIES;
>>   
>> -		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
>> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
>>   
>>   		for (i = first_pte; i < last_pte; i++)
>>   			pt_vaddr[i] = scratch_pte;
>> @@ -951,7 +950,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>>   	pt_vaddr = NULL;
>>   	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>>   		if (pt_vaddr == NULL)
>> -			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
>> +			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
>>   
>>   		pt_vaddr[act_pte] =
>>   			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
>> @@ -986,8 +985,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>   
>>   	kfree(ppgtt->pt_dma_addr);
>>   	for (i = 0; i < ppgtt->num_pd_entries; i++)
>> -		__free_page(ppgtt->pt_pages[i]);
>> -	kfree(ppgtt->pt_pages);
>> +		__free_page(ppgtt->pd.page_tables[i].page);
>> +	kfree(ppgtt->pd.page_tables);
>>   }
>>   
>>   static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>> @@ -1044,22 +1043,22 @@ alloc:
>>   
>>   static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>>   {
>> +	struct i915_page_table_entry *pt;
>>   	int i;
>>   
>> -	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
>> -				  GFP_KERNEL);
>> -
>> -	if (!ppgtt->pt_pages)
>> +	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
>> +	if (!pt)
>>   		return -ENOMEM;
>>   
>>   	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> -		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
>> -		if (!ppgtt->pt_pages[i]) {
>> +		pt[i].page = alloc_page(GFP_KERNEL);
>> +		if (!pt->page) {
>>   			gen6_ppgtt_free(ppgtt);
>>   			return -ENOMEM;
>>   		}
>>   	}
>>   
>> +	ppgtt->pd.page_tables = pt;
>>   	return 0;
>>   }
>>   
>> @@ -1094,9 +1093,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>>   	int i;
>>   
>>   	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> +		struct page *page;
>>   		dma_addr_t pt_addr;
>>   
>> -		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
>> +		page = ppgtt->pd.page_tables[i].page;
>> +		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>>   				       PCI_DMA_BIDIRECTIONAL);
>>   
>>   		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
>> @@ -1140,7 +1141,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>   	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>>   	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
>>   	ppgtt->base.start = 0;
>> -	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
>> +	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
>>   	ppgtt->debug_dump = gen6_dump_ppgtt;
>>   
>>   	ppgtt->pd_offset =
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index 8f76990..d9bc375 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -187,6 +187,20 @@ struct i915_vma {
>>   			 u32 flags);
>>   };
>>   
>> +struct i915_page_table_entry {
>> +	struct page *page;
>> +};
>> +
>> +struct i915_page_directory_entry {
>> +	struct page *page; /* NULL for GEN6-GEN7 */
>> +	struct i915_page_table_entry *page_tables;
>> +};
>> +
>> +struct i915_page_directory_pointer_entry {
>> +	/* struct page *page; */
>> +	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
>> +};
>> +
>>   struct i915_address_space {
>>   	struct drm_mm mm;
>>   	struct drm_device *dev;
>> @@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
>>   	unsigned num_pd_entries;
>>   	unsigned num_pd_pages; /* gen8+ */
>>   	union {
>> -		struct page **pt_pages;
>> -		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
>> -	};
>> -	struct page *pd_pages;
>> -	union {
>>   		uint32_t pd_offset;
>>   		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
>>   	};
>> @@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
>>   		dma_addr_t *pt_dma_addr;
>>   		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
>>   	};
>> +	union {
>> +		struct i915_page_directory_pointer_entry pdp;
>> +		struct i915_page_directory_entry pd;
>> +	};
>>   
>>   	struct drm_i915_file_private *file_priv;
>>   
>> -- 
>> 2.1.1
>>



[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 09/24] drm/i915: Track GEN6 page table usage
  2015-02-20 16:41     ` Mika Kuoppala
@ 2015-02-23 15:39       ` Michel Thierry
  0 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:39 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 12197 bytes --]

On 2/20/2015 4:41 PM, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
>
>> From: Ben Widawsky <benjamin.widawsky@intel.com>
>>
>> Instead of implementing the full tracking + dynamic allocation, this
>> patch does a bit less than half of the work, by tracking and warning on
>> unexpected conditions. The tracking itself follows which PTEs within a
>> page table are currently being used for objects. The next patch will
>> modify this to actually allocate the page tables only when necessary.
>>
>> With the current patch there isn't much in the way of making a gen
>> agnostic range allocation function. However, in the next patch we'll add
>> more specificity which makes having separate functions a bit easier to
>> manage.
>>
>> One important change introduced here is that DMA mappings are
>> created/destroyed at the same page directories/tables are
>> allocated/deallocated.
>>
>> Notice that aliasing PPGTT is not managed here. The patch which actually
>> begins dynamic allocation/teardown explains the reasoning for this.
>>
>> v2: s/pdp.page_directory/pdp.page_directorys
>> Make a scratch page allocation helper
>>
>> v3: Rebase and expand commit message.
>>
>> v4: Allocate required pagetables only when it is needed, _bind_to_vm
>> instead of bind_vma (Daniel).
>>
>> v5: Rebased to remove the unnecessary noise in the diff, also:
>>   - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
>>   - Removed unnecessary checks in gen6_alloc_va_range.
>>   - Changed map/unmap_px_single macros to use dma functions directly and
>>     be part of a static inline function instead.
>>   - Moved drm_device plumbing through page tables operation to its own
>>     patch.
>>   - Moved allocate/teardown_va_range calls until they are fully
>>     implemented (in subsequent patch).
>>   - Merged pt and scratch_pt unmap_and_free path.
>>   - Moved scratch page allocator helper to the patch that will use it.
>>
>> v6: Reduce complexity by not tearing down pagetables dynamically, the
>> same can be achieved while freeing empty vms. (Daniel)
>>
>> Cc: Daniel Vetter <daniel@ffwll.ch>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 191 +++++++++++++++++++++++++-----------
>>   drivers/gpu/drm/i915/i915_gem_gtt.h |  75 ++++++++++++++
>>   2 files changed, 206 insertions(+), 60 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index e2bcd10..760585e 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -274,29 +274,88 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>>   	return pte;
>>   }
>>   
>> -static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
>> +#define i915_dma_unmap_single(px, dev) \
>> +	__i915_dma_unmap_single((px)->daddr, dev)
>> +
>> +static inline void __i915_dma_unmap_single(dma_addr_t daddr,
>> +					struct drm_device *dev)
>> +{
>> +	struct device *device = &dev->pdev->dev;
>> +
>> +	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
>> +}
>> +
>> +/**
>> + * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
>> + * @px:		Page table/dir/etc to get a DMA map for
>> + * @dev:	drm device
>> + *
>> + * Page table allocations are unified across all gens. They always require a
>> + * single 4k allocation, as well as a DMA mapping. If we keep the structs
>> + * symmetric here, the simple macro covers us for every page table type.
>> + *
>> + * Return: 0 if success.
>> + */
>> +#define i915_dma_map_px_single(px, dev) \
>> +	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
>> +
> If this is symmetrical to i915_dma_unmap_single() is the _px_ needed?
>
>> +static inline int i915_dma_map_page_single(struct page *page,
>> +					   struct drm_device *dev,
>> +					   dma_addr_t *daddr)
>> +{
>> +	struct device *device = &dev->pdev->dev;
>> +
>> +	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
>> +	return dma_mapping_error(device, *daddr);
>> +}
>> +
>> +static void unmap_and_free_pt(struct i915_page_table_entry *pt,
>> +			       struct drm_device *dev)
>>   {
>>   	if (WARN_ON(!pt->page))
>>   		return;
>> +
>> +	i915_dma_unmap_single(pt, dev);
>>   	__free_page(pt->page);
>> +	kfree(pt->used_ptes);
>>   	kfree(pt);
>>   }
>>   
>>   static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
>>   {
>>   	struct i915_page_table_entry *pt;
>> +	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
>> +		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
>> +	int ret = -ENOMEM;
>>   
>>   	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
>>   	if (!pt)
>>   		return ERR_PTR(-ENOMEM);
>>   
>> +	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
>> +				GFP_KERNEL);
>> +
>> +	if (!pt->used_ptes)
>> +		goto fail_bitmap;
>> +
>>   	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> -	if (!pt->page) {
>> -		kfree(pt);
>> -		return ERR_PTR(-ENOMEM);
>> -	}
>> +	if (!pt->page)
>> +		goto fail_page;
>> +
>> +	ret = i915_dma_map_px_single(pt, dev);
>> +	if (ret)
>> +		goto fail_dma;
>>   
>>   	return pt;
>> +
>> +fail_dma:
>> +	__free_page(pt->page);
>> +fail_page:
>> +	kfree(pt->used_ptes);
>> +fail_bitmap:
>> +	kfree(pt);
>> +
>> +	return ERR_PTR(ret);
>>   }
>>   
>>   /**
>> @@ -836,26 +895,36 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>   	}
>>   }
>>   
>> -static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>> +/* Write pde (index) from the page directory @pd to the page table @pt */
>> +static void gen6_write_pdes(struct i915_page_directory_entry *pd,
> For me it seems that you will write only one pde entry so s/pdes/pde ?
>
>> +			    const int pde, struct i915_page_table_entry *pt)
>>   {
>> -	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
>> -	gen6_gtt_pte_t __iomem *pd_addr;
>> -	uint32_t pd_entry;
>> -	int i;
>> +	struct i915_hw_ppgtt *ppgtt =
>> +		container_of(pd, struct i915_hw_ppgtt, pd);
>> +	u32 pd_entry;
>>   
>> -	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
>> -	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
>> -		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
>> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> -		dma_addr_t pt_addr;
>> +	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
>> +	pd_entry |= GEN6_PDE_VALID;
>>   
>> -		pt_addr = ppgtt->pd.page_tables[i]->daddr;
>> -		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
>> -		pd_entry |= GEN6_PDE_VALID;
>> +	writel(pd_entry, ppgtt->pd_addr + pde);
>>   
>> -		writel(pd_entry, pd_addr + i);
>> -	}
>> -	readl(pd_addr);
>> +	/* XXX: Caller needs to make sure the write completes if necessary */
>> +}
> Move this comment on top of the function and lift the XXX:
>
>> +
>> +/* Write all the page tables found in the ppgtt structure to incrementing page
>> + * directories. */
>> +static void gen6_write_page_range(struct drm_i915_private *dev_priv,
>> +				struct i915_page_directory_entry *pd, uint32_t start, uint32_t length)
>> +{
>> +	struct i915_page_table_entry *pt;
>> +	uint32_t pde, temp;
>> +
>> +	gen6_for_each_pde(pt, pd, start, length, temp, pde)
>> +		gen6_write_pdes(pd, pde, pt);
>> +
>> +	/* Make sure write is complete before other code can use this page
>> +	 * table. Also require for WC mapped PTEs */
>> +	readl(dev_priv->gtt.gsm);
>>   }
>>   
>>   static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
>> @@ -1071,6 +1140,28 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>>   			       4096, PCI_DMA_BIDIRECTIONAL);
>>   }
>>   
>> +static int gen6_alloc_va_range(struct i915_address_space *vm,
>> +			       uint64_t start, uint64_t length)
>> +{
>> +	struct i915_hw_ppgtt *ppgtt =
>> +				container_of(vm, struct i915_hw_ppgtt, base);
>> +	struct i915_page_table_entry *pt;
>> +	uint32_t pde, temp;
>> +
>> +	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
>> +		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
>> +
>> +		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
>> +		bitmap_set(tmp_bitmap, gen6_pte_index(start),
>> +			   gen6_pte_count(start, length));
>> +
>> +		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
>> +				I915_PPGTT_PT_ENTRIES);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>   {
>>   	int i;
>> @@ -1117,20 +1208,24 @@ alloc:
>>   					       0, dev_priv->gtt.base.total,
>>   					       0);
>>   		if (ret)
>> -			return ret;
>> +			goto err_out;
>>   
>>   		retried = true;
>>   		goto alloc;
>>   	}
>>   
>>   	if (ret)
>> -		return ret;
>> +		goto err_out;
>> +
>>   
>>   	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
>>   		DRM_DEBUG("Forced to use aperture for PDEs\n");
>>   
>>   	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
>>   	return 0;
>> +
>> +err_out:
>> +	return ret;
>>   }
>>   
>>   static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>> @@ -1152,30 +1247,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>>   	return 0;
>>   }
>>   
>> -static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>> -{
>> -	struct drm_device *dev = ppgtt->base.dev;
>> -	int i;
>> -
>> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> -		struct page *page;
>> -		dma_addr_t pt_addr;
>> -
>> -		page = ppgtt->pd.page_tables[i]->page;
>> -		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>> -				       PCI_DMA_BIDIRECTIONAL);
>> -
>> -		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
>> -			gen6_ppgtt_unmap_pages(ppgtt);
>> -			return -EIO;
>> -		}
>> -
>> -		ppgtt->pd.page_tables[i]->daddr = pt_addr;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>>   static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>   {
>>   	struct drm_device *dev = ppgtt->base.dev;
>> @@ -1196,12 +1267,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>   	if (ret)
>>   		return ret;
>>   
>> -	ret = gen6_ppgtt_setup_page_tables(ppgtt);
>> -	if (ret) {
>> -		gen6_ppgtt_free(ppgtt);
>> -		return ret;
>> -	}
>> -
>> +	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
>>   	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
>>   	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>>   	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
>> @@ -1212,13 +1278,17 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>   	ppgtt->pd.pd_offset =
>>   		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
>>   
>> +	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
>> +		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
>> +
>>   	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>>   
>> +	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
>> +
>>   	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
>>   			 ppgtt->node.size >> 20,
>>   			 ppgtt->node.start / PAGE_SIZE);
>>   
>> -	gen6_write_pdes(ppgtt);
>>   	DRM_DEBUG("Adding PPGTT at offset %x\n",
>>   		  ppgtt->pd.pd_offset << 10);
>>   
>> @@ -1491,13 +1561,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>>   
>>   	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
>>   		/* TODO: Perhaps it shouldn't be gen6 specific */
>> -		if (i915_is_ggtt(vm)) {
>> -			if (dev_priv->mm.aliasing_ppgtt)
>> -				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
>> -			continue;
>> -		}
>>   
>> -		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
>> +		struct i915_hw_ppgtt *ppgtt =
>> +			container_of(vm, struct i915_hw_ppgtt, base);
>> +
>> +		if (i915_is_ggtt(vm))
>> +			ppgtt = dev_priv->mm.aliasing_ppgtt;
> If we have ggtt but aliasing is not enabled we get NULL here and oops
> over in next function?
>
> -Mika
>
I applied the changes in v7.
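
For context, the kind of guard being discussed could look roughly like the
sketch below. This is only an illustration of the NULL case Mika points out,
assuming the loop ends by writing out the PD range (which the hunk above does
not show); it is not the exact change that went into v7.

list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
	struct i915_hw_ppgtt *ppgtt =
		container_of(vm, struct i915_hw_ppgtt, base);

	/* Sketch: skip the GGTT entirely when there is no aliasing
	 * PPGTT, instead of dereferencing a NULL pointer below. */
	if (i915_is_ggtt(vm)) {
		if (!dev_priv->mm.aliasing_ppgtt)
			continue;
		ppgtt = dev_priv->mm.aliasing_ppgtt;
	}

	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
}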

Thanks for the review.

-Michel



[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v4 07/24] drm/i915: Create page table allocators
  2015-02-20 16:50     ` Mika Kuoppala
@ 2015-02-23 15:39       ` Michel Thierry
  0 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:39 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 4406 bytes --]

On 2/20/2015 4:50 PM, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
>
>> From: Ben Widawsky <benjamin.widawsky@intel.com>
>>
>> As we move toward dynamic page table allocation, it becomes much easier
>> to manage our data structures if break do things less coarsely by
>> breaking up all of our actions into individual tasks.  This makes the
>> code easier to write, read, and verify.
>>
>> Aside from the dissection of the allocation functions, the patch
>> statically allocates the page table structures without a page directory.
>> This remains the same for all platforms.
>>
>> The patch itself should not have much functional difference. The primary
>> noticeable difference is the fact that page tables are no longer
>> allocated, but rather statically declared as part of the page directory.
>> This has non-zero overhead, but things gain non-trivial complexity as a
>> result.
>>
>> This patch exists for a few reasons:
>> 1. Splitting out the functions allows easily combining GEN6 and GEN8
>> code. Page tables are no different on GEN8. As we'll see in a
>> future patch when we add the DMA mappings to the allocations, it
>> requires only one small change to make work, and error handling should
>> just fall into place.
>>
>> 2. Unless we always want to allocate all page tables under a given PDE,
>> we'll have to eventually break this up into an array of pointers (or
>> pointer to pointer).
>>
>> 3. Having the discrete functions is easier to review, and understand.
>> All allocations and frees now take place in just a couple of locations.
>> Reviewing, and catching leaks should be easy.
>>
>> 4. Less important: the GFP flags are confined to one location, which
>> makes playing around with such things trivial.
>>
>> v2: Updated commit message to explain why this patch exists
>>
>> v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
>>
>> v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
>>
>> v5: Added additional safety checks in gen8 clear/free/unmap.
>>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3, v4, v5)
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 251 ++++++++++++++++++++++++------------
>>   drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
>>   drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
>>   3 files changed, 179 insertions(+), 92 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index 0fe5c1e..85ea535 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -275,6 +275,99 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>>   	return pte;
>>   }
>>   
>> +static void unmap_and_free_pt(struct i915_page_table_entry *pt)
>> +{
>> +	if (WARN_ON(!pt->page))
>> +		return;
>> +	__free_page(pt->page);
>> +	kfree(pt);
>> +}
>> +
>> +static struct i915_page_table_entry *alloc_pt_single(void)
>> +{
>> +	struct i915_page_table_entry *pt;
>> +
>> +	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
>> +	if (!pt)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +	if (!pt->page) {
>> +		kfree(pt);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	return pt;
>> +}
>> +
>> +/**
>> + * alloc_pt_range() - Allocate a multiple page tables
>> + * @pd:		The page directory which will have at least @count entries
>> + *		available to point to the allocated page tables.
>> + * @pde:	First page directory entry for which we are allocating.
>> + * @count:	Number of pages to allocate.
>> + *
>> + * Allocates multiple page table pages and sets the appropriate entries in the
>> + * page table structure within the page directory. Function cleans up after
>> + * itself on any failures.
>> + *
>> + * Return: 0 if allocation succeeded.
>> + */
>> +static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
>> +{
>> +	int i, ret;
>> +
>> +	/* 512 is the max page tables per page_directory on any platform.
>> +	 * TODO: make WARN after patch series is done
>> +	 */
>> +	BUG_ON(pde + count > GEN6_PPGTT_PD_ENTRIES);
>> +
> WARN_ON in here and return -EINVAL.
>
> -Mika

I applied the changes in v6.
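
For reference, the shape of the check Mika suggests is roughly the sketch
below; this is only an illustration, not the exact hunk that went into v6.

/* Sketch: warn and bail out instead of the BUG_ON() quoted above, so a
 * bad range cannot bring the whole machine down. */
if (WARN_ON(pde + count > GEN6_PPGTT_PD_ENTRIES))
	return -EINVAL;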

Thanks for the review.

-Michel



[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (27 preceding siblings ...)
  2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
@ 2015-02-23 15:44 ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 01/32] drm/i915: page table abstractions Michel Thierry
                     ` (31 more replies)
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
  29 siblings, 32 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

This patchset starts addressing comments from v4 by Mika, and also has been
rebased on top of nightly.

For GEN8, it has also been extended to work in logical ring submission (lrc)
mode, as it will be the preferred mode of operation.
I also tried to update the lrc code at the same time the ppgtt refactoring
occurred, leaving only one patch that is exclusively for lrc.

I'm also now including the required patches for PPGTT with 48b addressing.
In order to expand the GPU address space, a 4th level translation is added, the
Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
each pointing to a PDP.
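
To make the extra level concrete, below is a rough sketch of how the four
levels nest. This is illustrative only: the names and entry counts are
placeholders following the description above, and are not the structures
added by this series.

/* Sketch only -- not the structs introduced by these patches. */
struct sketch_page_table {		/* PTE level: one 4KB page of PTEs */
	struct page *page;
};

struct sketch_page_directory {		/* PDE level: points to page tables */
	struct sketch_page_table *page_tables[512];
};

struct sketch_pdp {			/* PDPE level: points to directories */
	struct sketch_page_directory *page_directories[512];
};

struct sketch_pml4 {			/* new top level: PML4[0-255] */
	struct sketch_pdp *pdps[256];
};

A 48b address would then be resolved PML4E -> PDPE -> PDE -> PTE, and the
expanded space is only exercised when i915.enable_ppgtt=3 is set, as noted
below.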

For now, this feature will only be available in BDW, in LRC submission mode
(execlists) and when i915.enable_ppgtt=3 is set.
Also note that this expanded address space is only available for full PPGTT,
aliasing PPGTT remains 32b.

This list can be seen in 3 parts:
[01-10] Add page table allocation for GEN6/GEN7
[11-20] Enable dynamic allocation in GEN8, for both legacy and
execlist submission modes.
[21-32] PML4 support in BDW.

Ben Widawsky (26):
  drm/i915: page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip and pd load logic
  drm/i915: Track page table reload need
  drm/i915: Initialize all contexts
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: page directories rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from page_directory alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations
  drm/i915/bdw: Make pdp allocation more dynamic
  drm/i915/bdw: Abstract PDP usage
  drm/i915/bdw: Add dynamic page trace events
  drm/i915/bdw: Add ppgtt info for dynamic pages
  drm/i915/bdw: implement alloc/free for 4lvl
  drm/i915/bdw: Add 4 level switching infrastructure
  drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
  drm/i915: Plumb sg_iter through va allocation ->maps
  drm/i915: Expand error state's address width to 64b

Michel Thierry (6):
  drm/i915: Plumb drm_device through page tables operations
  drm/i915: Add dynamic page trace events
  drm/i915/bdw: Support dynamic pdp updates in lrc mode
  drm/i915/bdw: Support 64 bit PPGTT in lrc mode
  drm/i915/bdw: Add 4 level support in insert_entries and clear_range
  drm/i915/bdw: Flip the 48b switch

 drivers/gpu/drm/i915/i915_debugfs.c        |   26 +-
 drivers/gpu/drm/i915/i915_drv.h            |   11 +-
 drivers/gpu/drm/i915/i915_gem.c            |   11 +
 drivers/gpu/drm/i915/i915_gem_context.c    |   64 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1534 ++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  248 ++++-
 drivers/gpu/drm/i915/i915_gpu_error.c      |   17 +-
 drivers/gpu/drm/i915/i915_params.c         |    2 +-
 drivers/gpu/drm/i915/i915_reg.h            |    1 +
 drivers/gpu/drm/i915/i915_trace.h          |  111 ++
 drivers/gpu/drm/i915/intel_lrc.c           |  149 ++-
 12 files changed, 1786 insertions(+), 399 deletions(-)

-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v5 01/32] drm/i915: page table abstractions
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-24 11:14     ` [PATCH] " Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 02/32] drm/i915: Complete page table structures Michel Thierry
                     ` (30 subsequent siblings)
  31 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we move to dynamic page allocation, keeping page_directory and page tables as
separate structures will help to break actions into simpler tasks.

To help transition the code nicely there is some wasted space in gen6/7.
This will be ameliorated shortly.

Following the x86 pagetable terminology:
PDPE = struct i915_page_directory_pointer_entry.
PDE = struct i915_page_directory_entry [page_directory].
PTE = struct i915_page_table_entry [page_tables].

v2: fixed mismatches after clean-up/rebase.

v3: Clarify the names of the multiple levels of page tables (Daniel)

v4: Addressing Mika's review comments.
s/gen8_free_page_directories/gen8_free_page_directory and free the
page tables for the directory there.
In gen8_ppgtt_allocate_page_directories, do not leak previously allocated
pt in case the page_directory alloc fails.
Update error return handling in gen8_ppgtt_alloc.

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 178 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
 2 files changed, 109 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e54b2a0..10026d3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -338,7 +338,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+		struct page *page_table = pd->page_tables[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -382,8 +383,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+			struct page *page_table = pd->page_tables[pde].page;
+
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -407,29 +412,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_tables == NULL)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_tables[i].page)
+			__free_page(pd->page_tables[i].page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
+{
+	gen8_free_page_tables(pd);
+	kfree(pd->page_tables);
+	__free_page(pd->page);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -464,86 +473,77 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPES];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
+
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_table_entry *pt;
 
-	return 0;
-}
+		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pdp.page_directory[i].page) {
+			kfree(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pdp.page_directory[i].page_tables = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.page_directory[i].page_tables);
+		__free_page(ppgtt->pdp.page_directory[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -555,18 +555,20 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
 	if (ret)
-		gen8_ppgtt_free(ppgtt);
+		goto err_out;
 
+	return 0;
+
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -577,7 +579,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       ppgtt->pdp.page_directory[pd].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -597,7 +599,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -658,7 +660,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -721,7 +723,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -936,7 +938,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -965,7 +967,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -1000,8 +1002,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		__free_page(ppgtt->pd.page_tables[i].page);
+	kfree(ppgtt->pd.page_tables);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1058,22 +1060,22 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_page_table_entry *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL);
+		if (!pt[i].page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
 	}
 
+	ppgtt->pd.page_tables = pt;
 	return 0;
 }
 
@@ -1108,9 +1110,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_tables[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1157,7 +1161,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8f76990..d9bc375 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -187,6 +187,20 @@ struct i915_vma {
 			 u32 flags);
 };
 
+struct i915_page_table_entry {
+	struct page *page;
+};
+
+struct i915_page_directory_entry {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_page_table_entry *page_tables;
+};
+
+struct i915_page_directory_pointer_entry {
+	/* struct page *page; */
+	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+};
+
 struct i915_address_space {
 	struct drm_mm mm;
 	struct drm_device *dev;
@@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
@@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
+	union {
+		struct i915_page_directory_pointer_entry pdp;
+		struct i915_page_directory_entry pd;
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 02/32] drm/i915: Complete page table structures
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 01/32] drm/i915: page table abstractions Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-24 13:10     ` Mika Kuoppala
  2015-02-23 15:44   ` [PATCH v5 03/32] drm/i915: Create page table allocators Michel Thierry
                     ` (29 subsequent siblings)
  31 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as that patch. I simply felt it is easier to review if split.
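
For reference, the structures end up roughly as below once this patch is
applied (a condensed sketch of the i915_gem_gtt.h hunk further down; the
GEN6/GEN8 comments are editorial):

    struct i915_page_table_entry {
            struct page *page;
            dma_addr_t daddr;               /* DMA address of the PT page */
    };

    struct i915_page_directory_entry {
            struct page *page;              /* NULL for GEN6-GEN7 */
            union {
                    uint32_t pd_offset;     /* GEN6-GEN7 */
                    dma_addr_t daddr;       /* GEN8+ */
            };
            struct i915_page_table_entry *page_tables;
    };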

v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
v3: Rebase.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 85 +++++++++++++------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++---
 drivers/gpu/drm/i915/intel_lrc.c    | 16 +++----
 4 files changed, 44 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 63be374..4d07030 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2185,7 +2185,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 10026d3..eb0714c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -311,7 +311,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pd_dma_addr[i];
+		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -437,7 +437,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
 }
 
@@ -449,14 +448,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pd_dma_addr[i])
+		if (!ppgtt->pdp.page_directory[i].daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -473,32 +472,19 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
-
-	return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
+			struct i915_page_table_entry *pt = &pd->page_tables[j];
 
 			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 			if (!pt->page)
 				goto unwind_out;
+
 		}
 	}
 
@@ -561,10 +547,6 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
-	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (ret)
-		goto err_out;
-
 	return 0;
 
 err_out:
@@ -586,7 +568,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pd] = pd_addr;
+	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
 
 	return 0;
 }
@@ -596,17 +578,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct page *p;
+	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
+	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
+	struct page *p = ptab->page;
 	int ret;
 
-	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+	ptab->daddr = pt_addr;
 
 	return 0;
 }
@@ -662,7 +645,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -705,14 +688,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
 	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+		   ppgtt->pd.pd_offset,
+		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -756,13 +740,13 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	uint32_t pd_entry;
 	int i;
 
-	WARN_ON(ppgtt->pd_offset & 0x3f);
+	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pt_dma_addr[i];
+		pt_addr = ppgtt->pd.page_tables[i].daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -773,9 +757,9 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
 
-	return (ppgtt->pd_offset / 64) << 16;
+	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -988,19 +972,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	if (ppgtt->pt_dma_addr) {
-		for (i = 0; i < ppgtt->num_pd_entries; i++)
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd.page_tables[i].daddr,
+			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		__free_page(ppgtt->pd.page_tables[i].page);
 	kfree(ppgtt->pd.page_tables);
@@ -1093,14 +1074,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
-				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr) {
-		drm_mm_remove_node(&ppgtt->node);
-		gen6_ppgtt_free(ppgtt);
-		return -ENOMEM;
-	}
-
 	return 0;
 }
 
@@ -1122,7 +1095,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pt_dma_addr[i] = pt_addr;
+		ppgtt->pd.page_tables[i].daddr = pt_addr;
 	}
 
 	return 0;
@@ -1164,7 +1137,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd_offset =
+	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
@@ -1175,7 +1148,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
-		  ppgtt->pd_offset << 10);
+		  ppgtt->pd.pd_offset << 10);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d9bc375..6efeb18 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -189,10 +189,16 @@ struct i915_vma {
 
 struct i915_page_table_entry {
 	struct page *page;
+	dma_addr_t daddr;
 };
 
 struct i915_page_directory_entry {
 	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
 	struct i915_page_table_entry *page_tables;
 };
 
@@ -286,14 +292,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
 		struct i915_page_directory_pointer_entry pdp;
 		struct i915_page_directory_entry pd;
 	};
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1c65949..9e71992 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 03/32] drm/i915: Create page table allocators
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 01/32] drm/i915: page table abstractions Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 02/32] drm/i915: Complete page table structures Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-24 13:56     ` Mika Kuoppala
  2015-02-23 15:44   ` [PATCH v5 04/32] drm/i915: Plumb drm_device through page tables operations Michel Thierry
                     ` (28 subsequent siblings)
  31 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely by
breaking up all of our actions into individual tasks.  This makes the
code easier to write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, and things gain non-trivial complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables are no different on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
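
To give a feel for the resulting interface, this is roughly how the GEN6
paths below end up using the new allocators (a sketch only; error
unwinding is abbreviated):

    /* allocation: one ranged call populates the page directory */
    ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
    if (ret)
            return ret;

    /* teardown: walk the directory and free each page table */
    for (i = 0; i < ppgtt->num_pd_entries; i++)
            unmap_and_free_pt(ppgtt->pd.page_tables[i]);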

v2: Updated commit message to explain why this patch exists

v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/

v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)

v5: Added additional safety checks in gen8 clear/free/unmap.

v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 252 ++++++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
 3 files changed, 178 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index eb0714c..65c77e5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -279,6 +279,98 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static void unmap_and_free_pt(struct i915_page_table_entry *pt)
+{
+	if (WARN_ON(!pt->page))
+		return;
+	__free_page(pt->page);
+	kfree(pt);
+}
+
+static struct i915_page_table_entry *alloc_pt_single(void)
+{
+	struct i915_page_table_entry *pt;
+
+	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+	if (!pt)
+		return ERR_PTR(-ENOMEM);
+
+	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pt->page) {
+		kfree(pt);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate a multiple page tables
+ * @pd:		The page directory which will have at least @count entries
+ *		available to point to the allocated page tables.
+ * @pde:	First page directory entry for which we are allocating.
+ * @count:	Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
+{
+	int i, ret;
+
+	/* 512 is the max page tables per page_directory on any platform. */
+	if (WARN_ON(pde + count > GEN6_PPGTT_PD_ENTRIES))
+		return -EINVAL;
+
+	for (i = pde; i < pde + count; i++) {
+		struct i915_page_table_entry *pt = alloc_pt_single();
+
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto err_out;
+		}
+		WARN(pd->page_tables[i],
+		     "Leaking page directory entry %d (%pa)\n",
+		     i, pd->page_tables[i]);
+		pd->page_tables[i] = pt;
+	}
+
+	return 0;
+
+err_out:
+	while (i--)
+		unmap_and_free_pt(pd->page_tables[i]);
+	return ret;
+}
+
+static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
+{
+	if (pd->page) {
+		__free_page(pd->page);
+		kfree(pd);
+	}
+}
+
+static struct i915_page_directory_entry *alloc_pd_single(void)
+{
+	struct i915_page_directory_entry *pd;
+
+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pd->page) {
+		kfree(pd);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 			   uint64_t val)
@@ -311,7 +403,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
+		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -338,8 +430,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
-		struct page *page_table = pd->page_tables[pde].page;
+		struct i915_page_directory_entry *pd;
+		struct i915_page_table_entry *pt;
+		struct page *page_table;
+
+		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+			continue;
+
+		pd = ppgtt->pdp.page_directory[pdpe];
+
+		if (WARN_ON(!pd->page_tables[pde]))
+			continue;
+
+		pt = pd->page_tables[pde];
+
+		if (WARN_ON(!pt->page))
+			continue;
+
+		page_table = pt->page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -384,8 +492,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
-			struct page *page_table = pd->page_tables[pde].page;
+			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
+			struct i915_page_table_entry *pt = pd->page_tables[pde];
+			struct page *page_table = pt->page;
 
 			pt_vaddr = kmap_atomic(page_table);
 		}
@@ -416,19 +525,16 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pd->page_tables == NULL)
+	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pd->page_tables[i].page)
-			__free_page(pd->page_tables[i].page);
-}
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		if (WARN_ON(!pd->page_tables[i]))
+			continue;
 
-static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
-{
-	gen8_free_page_tables(pd);
-	kfree(pd->page_tables);
-	__free_page(pd->page);
+		unmap_and_free_pt(pd->page_tables[i]);
+		pd->page_tables[i] = NULL;
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -436,7 +542,11 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+			continue;
+
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 	}
 }
 
@@ -448,14 +558,23 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pdp.page_directory[i].daddr)
+		if (!ppgtt->pdp.page_directory[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
+			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+			struct i915_page_table_entry *pt;
+			dma_addr_t addr;
+
+			if (WARN_ON(!pd->page_tables[j]))
+				continue;
+
+			pt = pd->page_tables[j];
+			addr = pt->daddr;
+
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -474,25 +593,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	int i, j;
+	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = &pd->page_tables[j];
-
-			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-			if (!pt->page)
-				goto unwind_out;
-
-		}
+		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
+				     0, GEN8_PDES_PER_PAGE);
+		if (ret)
+			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
 
 	return -ENOMEM;
 }
@@ -503,19 +617,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_page_table_entry *pt;
-
-		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
-		if (!pt)
+		ppgtt->pdp.page_directory[i] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.page_directory[i]))
 			goto unwind_out;
-
-		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pdp.page_directory[i].page) {
-			kfree(pt);
-			goto unwind_out;
-		}
-
-		ppgtt->pdp.page_directory[i].page_tables = pt;
 	}
 
 	ppgtt->num_pd_pages = max_pdp;
@@ -524,10 +628,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	while (i--) {
-		kfree(ppgtt->pdp.page_directory[i].page_tables);
-		__free_page(ppgtt->pdp.page_directory[i].page);
-	}
+	while (i--)
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 
 	return -ENOMEM;
 }
@@ -561,14 +663,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.page_directory[pd].page, 0,
+			       ppgtt->pdp.page_directory[pd]->page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
+	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
 
 	return 0;
 }
@@ -578,8 +680,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
-	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
+	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
+	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
 	struct page *p = ptab->page;
 	int ret;
 
@@ -642,10 +744,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
+			struct i915_page_table_entry *pt = pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -696,7 +800,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -707,7 +811,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -746,7 +850,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pd.page_tables[i].daddr;
+		pt_addr = ppgtt->pd.page_tables[i]->daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -922,7 +1026,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -951,7 +1055,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -974,7 +1078,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i].daddr,
+			       ppgtt->pd.page_tables[i]->daddr,
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
@@ -983,8 +1087,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pd.page_tables[i].page);
-	kfree(ppgtt->pd.page_tables);
+		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
+
+	unmap_and_free_pd(&ppgtt->pd);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1039,27 +1144,6 @@ alloc:
 	return 0;
 }
 
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct i915_page_table_entry *pt;
-	int i;
-
-	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
-	if (!pt)
-		return -ENOMEM;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt[i].page = alloc_page(GFP_KERNEL);
-		if (!pt->page) {
-			gen6_ppgtt_free(ppgtt);
-			return -ENOMEM;
-		}
-	}
-
-	ppgtt->pd.page_tables = pt;
-	return 0;
-}
-
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 {
 	int ret;
@@ -1068,7 +1152,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1086,7 +1170,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 		struct page *page;
 		dma_addr_t pt_addr;
 
-		page = ppgtt->pd.page_tables[i].page;
+		page = ppgtt->pd.page_tables[i]->page;
 		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
@@ -1095,7 +1179,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pd.page_tables[i].daddr = pt_addr;
+		ppgtt->pd.page_tables[i]->daddr = pt_addr;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 6efeb18..e8cad72 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -199,12 +199,12 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
-	struct i915_page_table_entry *page_tables;
+	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
 	/* struct page *page; */
-	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
 };
 
 struct i915_address_space {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9e71992..bc9c7c3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 04/32] drm/i915: Plumb drm_device through page tables operations
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (2 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 03/32] drm/i915: Create page table allocators Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 05/32] drm/i915: Track GEN6 page table usage Michel Thierry
                     ` (27 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

The next patch in the series will require it for alloc_pt_single.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 65c77e5..65a506c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -142,7 +142,6 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 		return has_aliasing_ppgtt ? 1 : 0;
 }
 
-
 static void ppgtt_bind_vma(struct i915_vma *vma,
 			   enum i915_cache_level cache_level,
 			   u32 flags);
@@ -279,7 +278,7 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void unmap_and_free_pt(struct i915_page_table_entry *pt)
+static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
@@ -287,7 +286,7 @@ static void unmap_and_free_pt(struct i915_page_table_entry *pt)
 	kfree(pt);
 }
 
-static struct i915_page_table_entry *alloc_pt_single(void)
+static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_page_table_entry *pt;
 
@@ -317,7 +316,9 @@ static struct i915_page_table_entry *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count,
+		  struct drm_device *dev)
+
 {
 	int i, ret;
 
@@ -326,7 +327,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
 		return -EINVAL;
 
 	for (i = pde; i < pde + count; i++) {
-		struct i915_page_table_entry *pt = alloc_pt_single();
+		struct i915_page_table_entry *pt = alloc_pt_single(dev);
 
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
@@ -342,7 +343,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
 
 err_out:
 	while (i--)
-		unmap_and_free_pt(pd->page_tables[i]);
+		unmap_and_free_pt(pd->page_tables[i], dev);
 	return ret;
 }
 
@@ -521,7 +522,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
 
@@ -532,7 +533,7 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 		if (WARN_ON(!pd->page_tables[i]))
 			continue;
 
-		unmap_and_free_pt(pd->page_tables[i]);
+		unmap_and_free_pt(pd->page_tables[i], dev);
 		pd->page_tables[i] = NULL;
 	}
 }
@@ -545,7 +546,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 	}
 }
@@ -597,7 +598,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
-				     0, GEN8_PDES_PER_PAGE);
+				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
 	}
@@ -606,7 +607,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -1087,7 +1088,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
+		unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
 	unmap_and_free_pd(&ppgtt->pd);
 }
@@ -1152,7 +1153,9 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			ppgtt->base.dev);
+
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 05/32] drm/i915: Track GEN6 page table usage
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (3 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 04/32] drm/i915: Plumb drm_device through page tables operations Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 06/32] drm/i915: Extract context switch skip and pd load logic Michel Thierry
                     ` (26 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
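
To make the arithmetic concrete, here is a worked sketch using the helpers
this patch introduces (GEN6_PDE_SHIFT is 22 and pages are 4 KiB, so each
PDE covers 4 MiB, i.e. 1024 PTEs):

    /* Binding 8 KiB at GTT offset 0x00403000 touches a single page table: */
    gen6_pde_index(0x00403000);             /* 0x00403000 >> 22          == 1 */
    gen6_pte_index(0x00403000);             /* (0x00403000 >> 12) & 1023 == 3 */
    gen6_pte_count(0x00403000, 0x2000);     /* == 2, i.e. PTEs 3 and 4   */

gen6_alloc_va_range() then sets bits 3 and 4 in the used_ptes bitmap of the
page table behind PDE 1.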

With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.

One important change introduced here is that DMA mappings are
created/destroyed at the same time page directories/tables are
allocated/deallocated.
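
In other words, the DMA mapping now lives inside the allocator itself; a
sketch of the flow this patch adds to alloc_pt_single():

    pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
    ret = i915_dma_map_single(pt, dev);     /* fills pt->daddr */
    if (ret)
            goto fail_dma;
    /* ... and unmap_and_free_pt() undoes the mapping before __free_page() */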

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.page_directory/pdp.page_directorys
Make a scratch page allocation helper

v3: Rebase and expand commit message.

v4: Allocate required pagetables only when it is needed, _bind_to_vm
instead of bind_vma (Daniel).

v5: Rebased to remove the unnecessary noise in the diff, also:
 - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
 - Removed unnecessary checks in gen6_alloc_va_range.
 - Changed map/unmap_px_single macros to use dma functions directly and
   be part of a static inline function instead.
 - Moved drm_device plumbing through page tables operation to its own
   patch.
 - Moved allocate/teardown_va_range calls until they are fully
   implemented (in subsequent patch).
 - Merged pt and scratch_pt unmap_and_free path.
 - Moved scratch page allocator helper to the patch that will use it.

v6: Reduce complexity by not tearing down pagetables dynamically, the
same can be achieved while freeing empty vms. (Daniel)

v7: s/i915_dma_map_px_single/i915_dma_map_single
s/gen6_write_pdes/gen6_write_pde
Prevent a NULL case when only GGTT is available. (Mika)

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 198 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  75 ++++++++++++++
 2 files changed, 211 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 65a506c..5ee92ce 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -278,29 +278,88 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
+#define i915_dma_unmap_single(px, dev) \
+	__i915_dma_unmap_single((px)->daddr, dev)
+
+static inline void __i915_dma_unmap_single(dma_addr_t daddr,
+					struct drm_device *dev)
+{
+	struct device *device = &dev->pdev->dev;
+
+	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
+}
+
+/**
+ * i915_dma_map_single() - Create a dma mapping for a page table/dir/etc.
+ * @px:	Page table/dir/etc to get a DMA map for
+ * @dev:	drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
+ *
+ * Return: 0 if success.
+ */
+#define i915_dma_map_single(px, dev) \
+	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
+
+static inline int i915_dma_map_page_single(struct page *page,
+					   struct drm_device *dev,
+					   dma_addr_t *daddr)
+{
+	struct device *device = &dev->pdev->dev;
+
+	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
+	return dma_mapping_error(device, *daddr);
+}
+
+static void unmap_and_free_pt(struct i915_page_table_entry *pt,
+			       struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
+
+	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
+	kfree(pt->used_ptes);
 	kfree(pt);
 }
 
 static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_page_table_entry *pt;
+	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+	int ret = -ENOMEM;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
+	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+				GFP_KERNEL);
+
+	if (!pt->used_ptes)
+		goto fail_bitmap;
+
 	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pt->page) {
-		kfree(pt);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pt->page)
+		goto fail_page;
+
+	ret = i915_dma_map_single(pt, dev);
+	if (ret)
+		goto fail_dma;
 
 	return pt;
+
+fail_dma:
+	__free_page(pt->page);
+fail_page:
+	kfree(pt->used_ptes);
+fail_bitmap:
+	kfree(pt);
+
+	return ERR_PTR(ret);
 }
 
 /**
@@ -838,26 +897,35 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+/* Write pde (index) from the page directory @pd to the page table @pt */
+static void gen6_write_pde(struct i915_page_directory_entry *pd,
+			    const int pde, struct i915_page_table_entry *pt)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	gen6_gtt_pte_t __iomem *pd_addr;
-	uint32_t pd_entry;
-	int i;
+	/* Caller needs to make sure the write completes if necessary */
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pd, struct i915_hw_ppgtt, pd);
+	u32 pd_entry;
 
-	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		dma_addr_t pt_addr;
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
+	pd_entry |= GEN6_PDE_VALID;
 
-		pt_addr = ppgtt->pd.page_tables[i]->daddr;
-		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-		pd_entry |= GEN6_PDE_VALID;
+	writel(pd_entry, ppgtt->pd_addr + pde);
+}
 
-		writel(pd_entry, pd_addr + i);
-	}
-	readl(pd_addr);
+/* Write all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_write_page_range(struct drm_i915_private *dev_priv,
+				struct i915_page_directory_entry *pd, uint32_t start, uint32_t length)
+{
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, pd, start, length, temp, pde)
+		gen6_write_pde(pd, pde, pt);
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1083,6 +1151,28 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+
+		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_set(tmp_bitmap, gen6_pte_index(start),
+			   gen6_pte_count(start, length));
+
+		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+				I915_PPGTT_PT_ENTRIES);
+	}
+
+	return 0;
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
@@ -1129,20 +1219,24 @@ alloc:
 					       0, dev_priv->gtt.base.total,
 					       0);
 		if (ret)
-			return ret;
+			goto err_out;
 
 		retried = true;
 		goto alloc;
 	}
 
 	if (ret)
-		return ret;
+		goto err_out;
+
 
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
+
+err_out:
+	return ret;
 }
 
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1164,30 +1258,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct page *page;
-		dma_addr_t pt_addr;
-
-		page = ppgtt->pd.page_tables[i]->page;
-		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-				       PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_unmap_pages(ppgtt);
-			return -EIO;
-		}
-
-		ppgtt->pd.page_tables[i]->daddr = pt_addr;
-	}
-
-	return 0;
-}
-
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
@@ -1211,12 +1281,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_setup_page_tables(ppgtt);
-	if (ret) {
-		gen6_ppgtt_free(ppgtt);
-		return ret;
-	}
-
+	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1227,13 +1292,17 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
+	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
+
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
 
-	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
 		  ppgtt->pd.pd_offset << 10);
 
@@ -1504,15 +1573,20 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		return;
 	}
 
-	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
-		/* TODO: Perhaps it shouldn't be gen6 specific */
-		if (i915_is_ggtt(vm)) {
-			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
-			continue;
-		}
+	if (USES_PPGTT(dev)) {
+		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
+			/* TODO: Perhaps it shouldn't be gen6 specific */
+
+			struct i915_hw_ppgtt *ppgtt =
+					container_of(vm, struct i915_hw_ppgtt,
+						     base);
 
-		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+			if (i915_is_ggtt(vm))
+				ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+			gen6_write_page_range(dev_priv, &ppgtt->pd, 0,
+					      ppgtt->num_pd_entries);
+		}
 	}
 
 	i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e8cad72..1b15fc9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PPGTT_PD_ENTRIES		512
 #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
 #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
+#define GEN6_PDE_SHIFT			22
 #define GEN6_PDE_VALID			(1 << 0)
+#define I915_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
+#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
 
 #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
 
@@ -190,6 +193,8 @@ struct i915_vma {
 struct i915_page_table_entry {
 	struct page *page;
 	dma_addr_t daddr;
+
+	unsigned long *used_ptes;
 };
 
 struct i915_page_directory_entry {
@@ -246,6 +251,9 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid, u32 flags); /* Create a valid PTE */
+	int (*allocate_va_range)(struct i915_address_space *vm,
+				 uint64_t start,
+				 uint64_t length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -298,12 +306,79 @@ struct i915_hw_ppgtt {
 
 	struct drm_i915_file_private *file_priv;
 
+	gen6_gtt_pte_t __iomem *pd_addr;
+
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
 	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
 			 struct intel_engine_cs *ring);
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
+/* For each pde iterates over every pde between from start until start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
+	     pt = (pd)->page_tables[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+	     temp = min_t(unsigned, temp, length), \
+	     start += temp, length -= temp)
+
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+	const uint32_t mask = NUM_PTE(pde_shift) - 1;
+
+	return (address >> PAGE_SHIFT) & mask;
+}
+
+/* Helper to counts the number of PTEs within the given length. This count does
+* not cross a page table boundary, so the max value would be
+* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
+*/
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+					uint32_t pde_shift)
+{
+	const uint64_t mask = ~((1 << pde_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
+
+	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
+
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+	return (addr >> shift) & I915_PDE_MASK;
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+	return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+	return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 06/32] drm/i915: Extract context switch skip and pd load logic
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (4 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 05/32] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 07/32] drm/i915: Track page table reload need Michel Thierry
                     ` (25 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

We have some fanciness coming up. This patch just breaks out the logic
of context switch skip, pd load pre, and pd load post.

v2: Use new functions to replace the logic right away (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 40 +++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 755b415..6206d27 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -565,6 +565,33 @@ mi_set_context(struct intel_engine_cs *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+				      struct intel_context *from,
+				      struct intel_context *to)
+{
+	if (from == to && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
+static bool
+needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	return ((INTEL_INFO(ring->dev)->gen < 8) ||
+			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
+}
+
+static bool
+needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	return (!to->legacy_hw_ctx.initialized ||
+			i915_gem_context_is_default(to)) &&
+			to->ppgtt && IS_GEN8(ring->dev);
+}
+
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -573,9 +600,6 @@ static int do_switch(struct intel_engine_cs *ring,
 	u32 hw_flags = 0;
 	bool uninitialized = false;
 	struct i915_vma *vma;
-	bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
-			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
-	bool needs_pd_load_post = false;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -583,7 +607,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
 	}
 
-	if (from == to && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
@@ -601,7 +625,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	from = ring->last_context;
 
-	if (needs_pd_load_pre) {
+	if (needs_pd_load_pre(ring, to)) {
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
 		 * Register Immediate commands in Ring Buffer before submitting
@@ -646,16 +670,14 @@ static int do_switch(struct intel_engine_cs *ring,
 	 * XXX: If we implemented page directory eviction code, this
 	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
+	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
-		needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
-	}
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post) {
+	if (needs_pd_load_post(ring, to)) {
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 07/32] drm/i915: Track page table reload need
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (5 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 06/32] drm/i915: Extract context switch skip and pd load logic Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 08/32] drm/i915: Initialize all contexts Michel Thierry
                     ` (24 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch was formerly known as, "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.

It's not just for gen8: if the current context's mappings have changed, we
need a context reload to pick up the new page directory entries.
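
To make the bookkeeping concrete, here is a minimal, illustrative userspace
sketch of the per-ring dirty tracking; the names and the three-ring mask are
invented for the example, whereas the driver uses pd_dirty_rings together
with test_and_clear_bit() on the real ring id:

#include <stdbool.h>
#include <stdio.h>

#define RING_MASK 0x7			/* pretend mask: three rings, example only */

static unsigned long pd_dirty_rings;

static void mark_tlbs_dirty(void)
{
	/* New PDEs were written: every ring must reload its page directory. */
	pd_dirty_rings = RING_MASK;
}

static bool needs_pd_reload(int ring)
{
	/* Consume this ring's dirty bit; a set bit means force a reload. */
	if (pd_dirty_rings & (1UL << ring)) {
		pd_dirty_rings &= ~(1UL << ring);
		return true;
	}
	return false;
}

int main(void)
{
	mark_tlbs_dirty();			/* e.g. after allocate_va_range() */
	printf("first switch reloads:  %d\n", needs_pd_reload(0));	/* 1 */
	printf("second switch reloads: %d\n", needs_pd_reload(0));	/* 0 */
	return 0;
}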

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

v3: Invalidate PPGTT TLBs inside alloc_va_range.

v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes the case
when neither ctx->ppgtt nor aliasing_ppgtt exists.

v5: Removed references to teardown_va_range.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 29 ++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  1 +
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6206d27..437cdcc 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -569,8 +569,20 @@ static inline bool should_skip_switch(struct intel_engine_cs *ring,
 				      struct intel_context *from,
 				      struct intel_context *to)
 {
-	if (from == to && !to->remap_slice)
-		return true;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	if (to->remap_slice)
+		return false;
+
+	if (to->ppgtt) {
+		if (from == to && !test_bit(ring->id,
+				&to->ppgtt->pd_dirty_rings))
+			return true;
+	} else if (dev_priv->mm.aliasing_ppgtt) {
+		if (from == to && !test_bit(ring->id,
+				&dev_priv->mm.aliasing_ppgtt->pd_dirty_rings))
+			return true;
+	}
 
 	return false;
 }
@@ -587,9 +599,8 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 static bool
 needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
 {
-	return (!to->legacy_hw_ctx.initialized ||
-			i915_gem_context_is_default(to)) &&
-			to->ppgtt && IS_GEN8(ring->dev);
+	return IS_GEN8(ring->dev) &&
+			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
 }
 
 static int do_switch(struct intel_engine_cs *ring,
@@ -634,6 +645,12 @@ static int do_switch(struct intel_engine_cs *ring,
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		if (ret)
 			goto unpin_out;
+
+		/* Doing a PD load always reloads the page dirs */
+		if (to->ppgtt)
+			clear_bit(ring->id, &to->ppgtt->pd_dirty_rings);
+		else
+			clear_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->pd_dirty_rings);
 	}
 
 	if (ring != &dev_priv->ring[RCS]) {
@@ -672,6 +689,8 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
+	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b773368..1961107 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1198,6 +1198,13 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	if (ret)
 		goto error;
 
+	if (ctx->ppgtt)
+		WARN(ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
+			"%s didn't clear reload\n", ring->name);
+	else if (dev_priv->mm.aliasing_ppgtt)
+		WARN(dev_priv->mm.aliasing_ppgtt->pd_dirty_rings &
+			(1<<ring->id), "%s didn't clear reload\n", ring->name);
+
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
@@ -1467,6 +1474,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* XXX: Reserve has possibly change PDEs which means we must do a
+	 * context switch before we can coherently read some of the reserved
+	 * VMAs. */
+
 	/* The objects are in their final locations, apply the relocations. */
 	if (need_relocs)
 		ret = i915_gem_execbuffer_relocate(eb);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5ee92ce..18d7b28 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1151,6 +1151,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+/* PDE TLBs are a pain to invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
+{
+	/* If current vm != vm, */
+	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
+}
+
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
@@ -1170,6 +1180,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	mark_tlbs_dirty(ppgtt);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1b15fc9..eaf530f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -297,6 +297,7 @@ struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
 	struct drm_mm_node node;
+	unsigned long pd_dirty_rings;
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 08/32] drm/i915: Initialize all contexts
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (6 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 07/32] drm/i915: Track page table reload need Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 09/32] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
                     ` (23 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The problem is we're going to switch to a new context, which could be
the default context. The plan was to use restore inhibit, which would be
fine, except if we are using dynamic page tables (which we will). If we
use dynamic page tables and we don't load new page tables, the previous
page tables might go away, and future operations will fault.

CTXA runs.
switch to default, restore inhibit
CTXA dies and has its address space taken away.
Run CTXB, which tries to save using context A's address space - this
fails.

The general solution is to make sure every context has its own state,
and its own address space. For cases when we must restore inhibit, the
first thing we do is load a valid address space. I thought this would be
enough, but apparently there are references within the context itself
which will refer to the old address space - therefore, we also must
reinitialize.
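
A compact, illustrative-only sketch of the rule this ends up enforcing; the
MI_* values below are placeholders rather than the real hardware bits, and
the aliasing ppgtt and per-ring details are omitted:

#include <stdbool.h>
#include <stdio.h>

#define MI_RESTORE_INHIBIT	(1u << 0)	/* placeholder value, example only */
#define MI_FORCE_RESTORE	(1u << 1)	/* placeholder value, example only */

static unsigned int pick_hw_flags(bool ctx_initialized, bool pd_dirty)
{
	unsigned int hw_flags = 0;

	if (!ctx_initialized)
		hw_flags |= MI_RESTORE_INHIBIT;	/* must be followed by a PD load */
	else if (pd_dirty)
		hw_flags |= MI_FORCE_RESTORE;	/* same context, but PDEs changed */

	return hw_flags;
}

int main(void)
{
	printf("uninitialized ctx: %#x\n", pick_hw_flags(false, false));
	printf("dirty PDs:         %#x\n", pick_hw_flags(true, true));
	return 0;
}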

It was tricky to track this down as we don't have much insight into what
happens in a context save.

This is required for the next patch which enables dynamic page tables.

v2: to->ppgtt is only valid in full ppgtt.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 437cdcc..6a583c3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -596,13 +596,6 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
 }
 
-static bool
-needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
-{
-	return IS_GEN8(ring->dev) &&
-			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
-}
-
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -683,20 +676,24 @@ static int do_switch(struct intel_engine_cs *ring,
 
 	/* GEN8 does *not* require an explicit reload if the PDPs have been
 	 * setup, and we do not wish to move them.
-	 *
-	 * XXX: If we implemented page directory eviction code, this
-	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
+	if (!to->legacy_hw_ctx.initialized) {
 		hw_flags |= MI_RESTORE_INHIBIT;
-	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		/* NB: If we inhibit the restore, the context is not allowed to
+		 * die because future work may end up depending on valid address
+		 * space. This means we must enforce that a page table load
+		 * occur when this occurs. */
+	} else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
 		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post(ring, to)) {
+	if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT)) {
+		/* We have a valid page directory (scratch) to switch to. This
+		 * allows the old VM to be freed. Note that if anything occurs
+		 * between the set context, and here, we are f*cked */
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -746,7 +743,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
+	uninitialized = !to->legacy_hw_ctx.initialized;
 	to->legacy_hw_ctx.initialized = true;
 
 done:
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 09/32] drm/i915: Finish gen6/7 dynamic page table allocation
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (7 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 08/32] drm/i915: Initialize all contexts Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 10/32] drm/i915: Add dynamic page trace events Michel Thierry
                     ` (22 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page.
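
To make the steady state concrete, a minimal, illustrative userspace sketch
of the pattern follows; the names are invented for the example (the driver's
counterparts are scratch_pt, gen6_alloc_va_range() and the clear/free paths):

#include <stdio.h>
#include <stdlib.h>

#define NR_PDES 4			/* tiny page directory, example only */

static int scratch_pt;			/* stand-in for the shared scratch page table */
static int *pd[NR_PDES];		/* PDE slot -> page table (or scratch) */

static void pd_init(void)
{
	for (int i = 0; i < NR_PDES; i++)
		pd[i] = &scratch_pt;	/* steady state: every PDE points at scratch */
}

static int alloc_va_range(int first_pde, int npdes)
{
	for (int i = first_pde; i < first_pde + npdes; i++) {
		if (pd[i] != &scratch_pt)
			continue;	/* a real page table already exists */
		pd[i] = calloc(1, sizeof(int));
		if (!pd[i])
			return -1;	/* a real driver would unwind here */
	}
	return 0;
}

static void clear_va_range(int first_pde, int npdes)
{
	for (int i = first_pde; i < first_pde + npdes; i++) {
		if (pd[i] == &scratch_pt)
			continue;
		free(pd[i]);
		pd[i] = &scratch_pt;	/* point the PDE back at scratch */
	}
}

int main(void)
{
	pd_init();
	alloc_va_range(1, 2);
	clear_va_range(1, 2);
	printf("pde 1 back to scratch: %d\n", pd[1] == &scratch_pt);
	return 0;
}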

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time getting
to this point with as much common code as I did.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.

v2: Declare the page table bitmap inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
both all implementations get the trace.

v3: Updated trace event to spit out a name

v4: Aliasing ppgtt is now initialized differently (in setup global gtt)

v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check
for trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).

v6: Implement changes from code review (Daniel):
 - allocate/teardown_va_range calls added.
 - Add a scratch page allocation helper (only need the address).
 - Move trace events to a new patch.
 - Use updated mark_tlbs_dirty.
 - Moved pt preallocation for aliasing ppgtt into gen6_ppgtt_init.

v7: teardown_va_range removed (Daniel).
    In init, gen6_ppgtt_clear_range call is only needed for aliasing ppgtt.

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
 drivers/gpu/drm/i915/i915_gem.c     |   9 +++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 125 +++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.h |   3 +
 4 files changed, 123 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4d07030..e8ad450 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2181,6 +2181,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -2197,7 +2199,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
 		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
 	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 61134ab..312b7d2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3599,6 +3599,15 @@ search_free:
 	if (ret)
 		goto err_remove_node;
 
+	/*  allocate before insert / bind */
+	if (vma->vm->allocate_va_range) {
+		ret = vma->vm->allocate_va_range(vma->vm,
+						vma->node.start,
+						vma->node.size);
+		if (ret)
+			goto err_remove_node;
+	}
+
 	trace_i915_vma_bind(vma, flags);
 	ret = i915_vma_bind(vma, obj->cache_level,
 			    flags & PIN_GLOBAL ? GLOBAL_BIND : 0);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 18d7b28..85c8a51 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -362,6 +362,16 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
+static inline struct i915_page_table_entry *alloc_pt_scratch(struct drm_device *dev)
+{
+	struct i915_page_table_entry *pt = alloc_pt_single(dev);
+
+	if (!IS_ERR(pt))
+		pt->scratch = 1;
+
+	return pt;
+}
+
 /**
  * alloc_pt_range() - Allocate a multiple page tables
  * @pd:		The page directory which will have at least @count entries
@@ -1164,10 +1174,46 @@ static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
+	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_table_entry *pt;
+	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
+	int ret;
+
+	BUG_ON(upper_32_bits(start));
+
+	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+
+	/* The allocation is done in two stages so that we can bail out with
+	 * minimal amount of pain. The first stage finds new page tables that
+	 * need allocation. The second stage marks use ptes within the page
+	 * tables.
+	 */
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		if (pt != ppgtt->scratch_pt) {
+			WARN_ON(bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+			continue;
+		}
+
+		/* We've already allocated a page table */
+		WARN_ON(!bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+
+		pt = alloc_pt_single(dev);
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pd.page_tables[pde] = pt;
+		set_bit(pde, new_page_tables);
+	}
+
+	start = start_save;
+	length = length_save;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
@@ -1176,21 +1222,46 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		bitmap_set(tmp_bitmap, gen6_pte_index(start),
 			   gen6_pte_count(start, length));
 
-		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+		if (test_and_clear_bit(pde, new_page_tables))
+			gen6_write_pde(&ppgtt->pd, pde, pt);
+
+		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	WARN_ON(!bitmap_empty(new_page_tables, GEN6_PPGTT_PD_ENTRIES));
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
+
 	mark_tlbs_dirty(ppgtt);
 	return 0;
+
+unwind_out:
+	for_each_set_bit(pde, new_page_tables, GEN6_PPGTT_PD_ENTRIES) {
+		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[pde];
+
+		ppgtt->pd.page_tables[pde] = NULL;
+		unmap_and_free_pt(pt, vm->dev);
+	}
+
+	mark_tlbs_dirty(ppgtt);
+	return ret;
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[i];
 
+		if (pt != ppgtt->scratch_pt)
+			unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	}
+
+	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 	unmap_and_free_pd(&ppgtt->pd);
 }
 
@@ -1217,6 +1288,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+	ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pt))
+		return PTR_ERR(ppgtt->scratch_pt);
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
 						  &ppgtt->node, GEN6_PD_SIZE,
@@ -1247,6 +1321,7 @@ alloc:
 	return 0;
 
 err_out:
+	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 	return ret;
 }
 
@@ -1258,18 +1333,20 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			ppgtt->base.dev);
+	return 0;
+}
 
-	if (ret) {
-		drm_mm_remove_node(&ppgtt->node);
-		return ret;
-	}
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+				  uint64_t start, uint64_t length)
+{
+	struct i915_page_table_entry *unused;
+	uint32_t pde, temp;
 
-	return 0;
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
+		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
 }
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1292,6 +1369,18 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	if (aliasing) {
+		/* preallocate all pts */
+		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+				ppgtt->base.dev);
+
+		if (ret) {
+			unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+			drm_mm_remove_node(&ppgtt->node);
+			return ret;
+		}
+	}
+
 	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
@@ -1306,7 +1395,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	if (aliasing)
+		ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	else
+		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
 
 	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
@@ -1320,7 +1412,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
+		bool aliasing)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
@@ -1328,7 +1421,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
 
 	if (INTEL_INFO(dev)->gen < 8)
-		return gen6_ppgtt_init(ppgtt);
+		return gen6_ppgtt_init(ppgtt, aliasing);
 	else
 		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 }
@@ -1337,7 +1430,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = 0;
 
-	ret = __hw_ppgtt_init(dev, ppgtt);
+	ret = __hw_ppgtt_init(dev, ppgtt, false);
 	if (ret == 0) {
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
@@ -1969,7 +2062,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 		if (!ppgtt)
 			return -ENOMEM;
 
-		ret = __hw_ppgtt_init(dev, ppgtt);
+		ret = __hw_ppgtt_init(dev, ppgtt, true);
 		if (ret != 0)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index eaf530f..43b5adf 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -195,6 +195,7 @@ struct i915_page_table_entry {
 	dma_addr_t daddr;
 
 	unsigned long *used_ptes;
+	unsigned int scratch:1;
 };
 
 struct i915_page_directory_entry {
@@ -305,6 +306,8 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_entry pd;
 	};
 
+	struct i915_page_table_entry *scratch_pt;
+
 	struct drm_i915_file_private *file_priv;
 
 	gen6_gtt_pte_t __iomem *pd_addr;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 10/32] drm/i915: Add dynamic page trace events
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (8 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 09/32] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 11/32] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
                     ` (21 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

Traces for page directories and tables allocation and map.

v2: Removed references to teardown.
v3: bitmap_scnprintf has been deprecated.
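
One detail worth spelling out: TRACE_PT_SIZE() in the diff below only sizes
the buffer for the bitmap output, where each 32-bit word costs 8 hex digits
plus a comma, plus one byte for the terminator:

/*
 * TRACE_PT_SIZE(1024): (1024 / 32) * 9 + 1 = 289 bytes	(gen6, 1024 PTEs)
 * TRACE_PT_SIZE(512):  ( 512 / 32) * 9 + 1 = 145 bytes	(gen8,  512 PTEs)
 */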

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     |  2 +
 drivers/gpu/drm/i915/i915_gem_gtt.c |  5 ++
 drivers/gpu/drm/i915/i915_trace.h   | 95 +++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 312b7d2..4e51275 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3601,6 +3601,8 @@ search_free:
 
 	/*  allocate before insert / bind */
 	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+				VM_TO_TRACE_NAME(vma->vm));
 		ret = vma->vm->allocate_va_range(vma->vm,
 						vma->node.start,
 						vma->node.size);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 85c8a51..93b7bce 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1210,6 +1210,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 		ppgtt->pd.page_tables[pde] = pt;
 		set_bit(pde, new_page_tables);
+		trace_i915_page_table_entry_alloc(vm, pde, start, GEN6_PDE_SHIFT);
 	}
 
 	start = start_save;
@@ -1225,6 +1226,10 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		if (test_and_clear_bit(pde, new_page_tables))
 			gen6_write_pde(&ppgtt->pd, pde, pt);
 
+		trace_i915_page_table_entry_map(vm, pde, pt,
+					 gen6_pte_index(start),
+					 gen6_pte_count(start, length),
+					 I915_PPGTT_PT_ENTRIES);
 		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f004d3d..0038dc2 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,6 +156,101 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
+#define VM_TO_TRACE_NAME(vm) \
+	(i915_is_ggtt(vm) ? "GGTT" : \
+		      "Private VM")
+
+DECLARE_EVENT_CLASS(i915_va,
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	TP_ARGS(vm, start, length, name),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u64, start)
+		__field(u64, end)
+		__string(name, name)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->start = start;
+		__entry->end = start + length;
+		__assign_str(name, name);
+	),
+
+	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
+		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DECLARE_EVENT_CLASS(i915_page_table_entry,
+	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	TP_ARGS(vm, pde, start, pde_shift),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->start = start;
+		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
+	),
+
+	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+		  __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+	((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_page_table_entry_update,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u32, first)
+		__field(u32, last)
+		__bitmask(cur_ptes, TRACE_PT_SIZE(bits))
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->first = first;
+		__entry->last = first + len;
+		__assign_bitmask(cur_ptes, pt->used_ptes, bits);
+	),
+
+	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+		  __entry->vm, __entry->pde, __entry->last, __entry->first,
+		  __get_bitmask(cur_ptes))
+);
+
+DEFINE_EVENT(i915_page_table_entry_update, i915_page_table_entry_map,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
 TRACE_EVENT(i915_gem_object_change_domain,
 	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
 	    TP_ARGS(obj, old_read, old_write),
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 11/32] drm/i915/bdw: Use dynamic allocation idioms on free
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (9 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 10/32] drm/i915: Add dynamic page trace events Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 12/32] drm/i915/bdw: page directories rework allocation Michel Thierry
                     ` (20 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.
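
For the gen8 index helpers added below, a worked example may help; it assumes
the usual gen8 legacy layout (4KiB pages, 512 PTEs per page table so
GEN8_PDE_SHIFT is 21, 512 PDEs per page directory so GEN8_PDPE_SHIFT is 30,
and a GEN8_PDPE_MASK of 0x3):

/*
 * addr = 0x40602000  ==  1GiB + 3 * 2MiB + 2 * 4KiB
 *	gen8_pdpe_index(addr) == 1	(which 1GiB page directory)
 *	gen8_pde_index(addr)  == 3	(which 2MiB page table within it)
 *	gen8_pte_index(addr)  == 2	(which 4KiB page within that table)
 */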

v2: Match trace_i915_va_teardown params
v3: Multiple rebases.
v4: Updated to use unmap_and_free_pt.
v5: teardown_va_range logic no longer needed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 26 ++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 47 +++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 93b7bce..0289176 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -607,19 +607,6 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-			continue;
-
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
-	}
-}
-
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
@@ -652,6 +639,19 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	}
 }
 
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+			continue;
+
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+	}
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 43b5adf..70ce50d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -383,6 +383,53 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	return i915_pde_index(addr, GEN6_PDE_SHIFT);
 }
 
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter)		\
+	for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < GEN8_PDES_PER_PAGE;			\
+	     pt = (pd)->page_tables[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->page_directory[iter];	\
+	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+	     pd = (pdp)->page_directory[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+/* Clamp length to the next page_directory boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+
+	if (next_pd > (start + length))
+		return length;
+
+	return next_pd - start;
+}
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+	return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+	return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+	return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+	BUG(); /* For 64B */
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 12/32] drm/i915/bdw: page directories rework allocation
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (10 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 11/32] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 13/32] drm/i915/bdw: pagetable allocation rework Michel Thierry
                     ` (19 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using gen8_for_each_pdpe macro to allocate the page directories.
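
As an illustration (assuming the usual 1GiB per PDPE), the walk visits one
page directory slot per pass and only advances start/length to the next
boundary between passes:

/*
 * gen8_for_each_pdpe() over start = 0, length = 3GiB:
 *	pass 1: pdpe 0, start = 0,    length = 3GiB
 *	pass 2: pdpe 1, start = 1GiB, length = 2GiB
 *	pass 3: pdpe 2, start = 2GiB, length = 1GiB
 * -> three page directories are allocated; pdpe 3 stays untouched.
 */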

v2: Rebased after s/free_pt_*/unmap_and_free_pt/ change.
v3: Rebased after teardown va range logic was removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0289176..2d7359e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -681,25 +681,39 @@ unwind_out:
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
+static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
+				     uint64_t start,
+				     uint64_t length)
 {
-	int i;
-
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.page_directory[i] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.page_directory[i]))
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pdp, struct i915_hw_ppgtt, pdp);
+	struct i915_page_directory_entry *unused;
+	uint64_t temp;
+	uint32_t pdpe;
+
+	/* FIXME: PPGTT container_of won't work for 64b */
+	BUG_ON((start + length) > 0x800000000ULL);
+
+	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+		BUG_ON(unused);
+		pdp->page_directory[pdpe] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
 			goto unwind_out;
+
+		ppgtt->num_pd_pages++;
 	}
 
-	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+	while (pdpe--) {
+		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
+		ppgtt->num_pd_pages--;
+	}
+
+	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -709,7 +723,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 {
 	int ret;
 
-	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, ppgtt->base.start,
+					ppgtt->base.total);
 	if (ret)
 		return ret;
 
@@ -785,6 +800,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
+	ppgtt->base.start = 0;
+	ppgtt->base.total = size;
+	BUG_ON(ppgtt->base.total == 0);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
 	if (ret)
@@ -832,8 +851,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 13/32] drm/i915/bdw: pagetable allocation rework
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (11 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 12/32] drm/i915/bdw: page directories rework allocation Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 14/32] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
                     ` (18 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using gen8_for_each_pde macro to allocate page tables.

v2: teardown_va_range references removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 46 +++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2d7359e..a359f62 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -661,22 +661,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_page_directory_entry *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
 {
-	int i, ret;
+	struct i915_page_table_entry *unused;
+	uint64_t temp;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
-				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
-		if (ret)
+	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+		BUG_ON(unused);
+		pd->page_tables[pde] = alloc_pt_single(dev);
+		if (IS_ERR(pd->page_tables[pde]))
 			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+	while (pde--)
+		unmap_and_free_pt(pd->page_tables[pde], dev);
 
 	return -ENOMEM;
 }
@@ -719,20 +724,28 @@ unwind_out:
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    const int max_pdp)
+			    uint64_t start,
+			    uint64_t length)
 {
+	struct i915_page_directory_entry *pd;
+	uint64_t temp;
+	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, ppgtt->base.start,
-					ppgtt->base.total);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length);
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
-	if (ret)
-		goto err_out;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+						ppgtt->base.dev);
+		if (ret)
+			goto err_out;
+
+		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
+	}
 
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	BUG_ON(pdpe > ppgtt->num_pd_pages);
 
 	return 0;
 
@@ -802,10 +815,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	BUG_ON(ppgtt->base.total == 0);
 
 	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
 	if (ret)
 		return ret;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 14/32] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (12 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 13/32] drm/i915/bdw: pagetable allocation rework Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 15/32] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
                     ` (17 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

One important part of this patch is that we now write a scratch page
directory into any unused PDP descriptors. This matters for two reasons:
first, we're not allowed to just use 0 or an invalid pointer; and second,
we must wipe out any previous contents from the last context.

The latter point only matters with full PPGTT. The former point only
affects platforms with less than 4GB of memory.

v2: Updated commit message to point out that we must set unused PDPs to the
scratch page.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 ++++-
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a359f62..079a742 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -442,8 +442,9 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
-			   uint64_t val)
+static int gen8_write_pdp(struct intel_engine_cs *ring,
+			  unsigned entry,
+			  dma_addr_t addr)
 {
 	int ret;
 
@@ -455,10 +456,10 @@ static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val >> 32));
+	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val));
+	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
 	return 0;
@@ -469,12 +470,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
-
-	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
-		ret = gen8_write_pdp(ring, i, addr);
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
+		/* The page directory might be NULL, but we need to clear out
+		 * whatever the previous context might have used. */
+		ret = gen8_write_pdp(ring, i, pd_daddr);
 		if (ret)
 			return ret;
 	}
@@ -816,10 +817,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
 
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
-	if (ret)
+	if (ret) {
+		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
+	}
 
 	/*
 	 * 2. Create DMA mappings for the page directories and page tables.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 70ce50d..f7d2af5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -306,7 +306,10 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_entry pd;
 	};
 
-	struct i915_page_table_entry *scratch_pt;
+	union {
+		struct i915_page_table_entry *scratch_pt;
+		struct i915_page_table_entry *scratch_pd; /* Just need the daddr */
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 15/32] drm/i915: num_pd_pages/num_pd_entries isn't useful
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (13 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 14/32] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 16/32] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
                     ` (16 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

These values are never quite useful for dynamic allocations of the page
tables. Getting rid of them will help prevent later confusion.

v2: Updated to use unmap_and_free_pd functions.
v3: Updated gen8_ppgtt_free after teardown logic was removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 --
 drivers/gpu/drm/i915/i915_gem_gtt.c | 72 ++++++++++++-------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 ++--
 3 files changed, 28 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e8ad450..e85da9d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2149,8 +2149,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	if (!ppgtt)
 		return;
 
-	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
 	for_each_ring(ring, dev_priv, unused) {
 		seq_printf(m, "%s\n", ring->name);
 		for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 079a742..781b751 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -613,9 +613,7 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		/* TODO: In the future we'll support sparse mappings, so this
-		 * will have to change. */
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
 		if (!ppgtt->pdp.page_directory[i]->daddr)
 			continue;
 
@@ -644,7 +642,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
@@ -705,21 +703,13 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 		pdp->page_directory[pdpe] = alloc_pd_single();
 		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
 			goto unwind_out;
-
-		ppgtt->num_pd_pages++;
 	}
 
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
 	return 0;
 
 unwind_out:
-	while (pdpe--) {
+	while (pdpe--)
 		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
-		ppgtt->num_pd_pages--;
-	}
-
-	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -742,12 +732,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 						ppgtt->base.dev);
 		if (ret)
 			goto err_out;
-
-		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
 	}
 
-	BUG_ON(pdpe > ppgtt->num_pd_pages);
-
 	return 0;
 
 err_out:
@@ -808,7 +794,6 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -872,12 +857,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
-	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
-			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-			 ppgtt->num_pd_entries,
-			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
 
 bail:
@@ -888,26 +867,20 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
-	gen6_gtt_pte_t __iomem *pd_addr;
+	struct i915_page_table_entry *unused;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
-	int pte, pde;
+	uint32_t  pte, pde, temp;
+	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
-	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd.pd_offset,
-		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
-	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
-		pd_entry = readl(pd_addr + pde);
+		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
 		if (pd_entry != expected)
@@ -1189,12 +1162,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 
 static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_page_table_entry *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i]->daddr,
-			       4096, PCI_DMA_BIDIRECTIONAL);
+	gen6_for_all_pdes(pt, ppgtt, pde) {
+		if (pt != ppgtt->scratch_pt)
+			pci_unmap_page(ppgtt->base.dev->pdev,
+				pt->daddr,
+				4096, PCI_DMA_BIDIRECTIONAL);
+	}
 }
 
 /* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
@@ -1293,13 +1269,12 @@ unwind_out:
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct i915_page_table_entry *pt = ppgtt->pd.page_tables[i];
+	struct i915_page_table_entry *pt;
+	uint32_t pde;
 
+	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			unmap_and_free_pt(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+			unmap_and_free_pt(pt, ppgtt->base.dev);
 	}
 
 	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
@@ -1358,7 +1333,6 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
 
 err_out:
@@ -1412,7 +1386,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 
 	if (aliasing) {
 		/* preallocate all pts */
-		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+		ret = alloc_pt_range(&ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES,
 				ppgtt->base.dev);
 
 		if (ret) {
@@ -1427,7 +1401,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd.pd_offset =
@@ -1730,7 +1704,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 				ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 			gen6_write_page_range(dev_priv, &ppgtt->pd, 0,
-					      ppgtt->num_pd_entries);
+					      GEN6_PPGTT_PD_ENTRIES);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f7d2af5..9d49de7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -299,8 +299,6 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 	struct drm_mm_node node;
 	unsigned long pd_dirty_rings;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct i915_page_directory_pointer_entry pdp;
 		struct i915_page_directory_entry pd;
@@ -338,6 +336,11 @@ struct i915_hw_ppgtt {
 	     temp = min_t(unsigned, temp, length), \
 	     start += temp, length -= temp)
 
+#define gen6_for_all_pdes(pt, ppgtt, iter)  \
+	for (iter = 0, pt = ppgtt->pd.page_tables[iter];			\
+	     iter < gen6_pde_index(ppgtt->base.total);			\
+	     pt =  ppgtt->pd.page_tables[++iter])
+
 static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
 {
 	const uint32_t mask = NUM_PTE(pde_shift) - 1;
-- 
2.1.1


* [PATCH v5 16/32] drm/i915: Extract PPGTT param from page_directory alloc
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (14 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 15/32] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 17/32] drm/i915/bdw: Split out mappings Michel Thierry
                     ` (15 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Now that we don't need to trace num_pd_pages, we may as well kill all
need for the PPGTT structure in alloc_page_directories(). This is very useful
for when we move to 48b addressing, where the PDP isn't the root of the
page table structure.

The param is replaced with drm_device, which is an unavoidable wart
throughout the series (in other words, not extra flagrant).
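
For illustration, the reason the PPGTT param was needed at all is
container_of(): it only works while the PDP is embedded in the PPGTT, which
stops being true once PDPs are allocated separately for 48b mode. A minimal,
self-contained C sketch (stand-in structs, not the driver's types):

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_pdp { int dummy; };

struct sketch_ppgtt {
        int dev_id;             /* stands in for ppgtt->base.dev */
        struct sketch_pdp pdp;  /* embedded, so container_of() is valid */
};

int main(void)
{
        struct sketch_ppgtt ppgtt = { .dev_id = 42 };
        struct sketch_pdp *pdp = &ppgtt.pdp;

        /* Only legal because pdp is embedded in ppgtt; with separately
         * allocated PDPs (48b mode) there is no containing PPGTT to
         * recover, hence passing the device handle explicitly. */
        struct sketch_ppgtt *owner = container_of(pdp, struct sketch_ppgtt, pdp);

        printf("dev_id = %d\n", owner->dev_id);
        return 0;
}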

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 781b751..7849769 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -689,8 +689,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 				     uint64_t start,
 				     uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(pdp, struct i915_hw_ppgtt, pdp);
 	struct i915_page_directory_entry *unused;
 	uint64_t temp;
 	uint32_t pdpe;
@@ -701,7 +699,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->page_directory[pdpe] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
+		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
 
@@ -709,7 +707,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 unwind_out:
 	while (pdpe--)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
+		unmap_and_free_pd(pdp->page_directory[pdpe]);
 
 	return -ENOMEM;
 }
-- 
2.1.1


* [PATCH v5 17/32] drm/i915/bdw: Split out mappings
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (15 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 16/32] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 18/32] drm/i915/bdw: begin bitmap tracking Michel Thierry
                     ` (14 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similarly to gen6.
In particular, DMA mappings for page directories/tables occur at allocation
time.

This patch adds the functionality and calls it at init, so there should
be no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
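
To show the shape of the change, here is a minimal, self-contained C sketch
of the allocate-then-DMA-map-with-unwind ordering. The types and the sk_dma_*
helpers are stand-ins for the kernel page/DMA primitives, not the driver code:

#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the kernel page and DMA primitives; only the ordering
 * and the unwind-on-error structure mirror alloc_pd_single(). */
struct sk_page { unsigned char data[4096]; };
typedef uint64_t sk_dma_addr_t;

static sk_dma_addr_t sk_dma_map(struct sk_page *p) { return (uintptr_t)p; }
static int sk_dma_mapping_error(sk_dma_addr_t a) { return a == 0; }

struct sk_pd {
        struct sk_page *page;
        sk_dma_addr_t daddr;
};

/* Allocate the backing page and create its DMA mapping in one step, so
 * a directory returned from here is always ready for the GPU; every
 * failure path unwinds exactly what was set up before it. */
static struct sk_pd *sk_alloc_pd_single(void)
{
        struct sk_pd *pd = calloc(1, sizeof(*pd));

        if (!pd)
                return NULL;

        pd->page = calloc(1, sizeof(*pd->page));
        if (!pd->page)
                goto free_pd;

        pd->daddr = sk_dma_map(pd->page);
        if (sk_dma_mapping_error(pd->daddr))
                goto free_page;

        return pd;

free_page:
        free(pd->page);
free_pd:
        free(pd);
        return NULL;
}

int main(void)
{
        struct sk_pd *pd = sk_alloc_pd_single();

        if (pd) {
                free(pd->page);
                free(pd);
        }
        return 0;
}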

v2: Handle renamed unmap_and_free_page functions.
v3: Updated after teardown_va logic was removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 176 ++++++++++++++----------------------
 1 file changed, 69 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7849769..3a75408 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -416,17 +416,20 @@ err_out:
 	return ret;
 }
 
-static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
+static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
+			       struct drm_device *dev)
 {
 	if (pd->page) {
+		i915_dma_unmap_single(pd, dev);
 		__free_page(pd->page);
 		kfree(pd);
 	}
 }
 
-static struct i915_page_directory_entry *alloc_pd_single(void)
+static struct i915_page_directory_entry *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_page_directory_entry *pd;
+	int ret;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
@@ -438,6 +441,13 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = i915_dma_map_single(pd, dev);
+	if (ret) {
+		__free_page(pd->page);
+		kfree(pd);
+		return ERR_PTR(ret);
+	}
+
 	return pd;
 }
 
@@ -592,6 +602,36 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+			     struct i915_page_table_entry *pt,
+			     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t entry =
+		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+	*pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t *page_directory = kmap_atomic(pd->page);
+	struct i915_page_table_entry *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		__gen8_do_map_pt(page_directory + pde, pt, dev);
+
+	if (!HAS_LLC(dev))
+		drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+	kunmap_atomic(page_directory);
+}
+
 static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
@@ -647,7 +687,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 			continue;
 
 		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 	}
 }
 
@@ -687,7 +727,8 @@ unwind_out:
 
 static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
-				     uint64_t length)
+				     uint64_t length,
+				     struct drm_device *dev)
 {
 	struct i915_page_directory_entry *unused;
 	uint64_t temp;
@@ -698,7 +739,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
-		pdp->page_directory[pdpe] = alloc_pd_single();
+		pdp->page_directory[pdpe] = alloc_pd_single(dev);
 		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
@@ -707,21 +748,24 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 unwind_out:
 	while (pdpe--)
-		unmap_and_free_pd(pdp->page_directory[pdpe]);
+		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
 
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    uint64_t start,
-			    uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start,
+			       uint64_t length)
 {
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
+					ppgtt->base.dev);
 	if (ret)
 		return ret;
 
@@ -739,128 +783,46 @@ err_out:
 	return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pd)
-{
-	dma_addr_t pd_addr;
-	int ret;
-
-	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.page_directory[pd]->page, 0,
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
-	if (ret)
-		return ret;
-
-	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
-
-	return 0;
-}
-
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pd,
-					const int pt)
-{
-	dma_addr_t pt_addr;
-	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
-	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
-	struct page *p = ptab->page;
-	int ret;
-
-	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	ptab->daddr = pt_addr;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
  * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
  * space.
  *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	int i, j, ret;
-
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+	struct i915_page_directory_entry *pd;
+	uint64_t temp, start = 0;
+	const uint64_t orig_length = size;
+	uint32_t pdpe;
+	int ret;
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->switch_mm = gen8_mm_switch;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
-	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	/*
-	 * 2. Create DMA mappings for the page directories and page tables.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-		if (ret)
-			goto bail;
+	start = 0;
+	size = orig_length;
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
-			if (ret)
-				goto bail;
-		}
-	}
-
-	/*
-	 * 3. Map all the page directory entires to point to the page tables
-	 * we've allocated.
-	 *
-	 * For now, the PPGTT helper functions all require that the PDEs are
-	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
-		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
-						      I915_CACHE_LLC);
-		}
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
-		kunmap_atomic(pd_vaddr);
-	}
-
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	return 0;
-
-bail:
-	gen8_ppgtt_unmap_pages(ppgtt);
-	gen8_ppgtt_free(ppgtt);
-	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -1276,7 +1238,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	}
 
 	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
-	unmap_and_free_pd(&ppgtt->pd);
+	unmap_and_free_pd(&ppgtt->pd, ppgtt->base.dev);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
-- 
2.1.1


* [PATCH v5 18/32] drm/i915/bdw: begin bitmap tracking
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (16 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 17/32] drm/i915/bdw: Split out mappings Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 19/32] drm/i915/bdw: Dynamic page table allocations Michel Thierry
                     ` (13 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As with gen6/7, we can enable bitmap tracking with all the
preallocations to make sure things don't actually blow up.
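
The bookkeeping itself is straightforward. A self-contained C sketch of the
idea, using plain bit operations instead of the kernel bitmap helpers (512
entries per directory, as on gen8): mark the PDEs an allocation touched, then
only visit those entries when freeing:

#include <stdint.h>
#include <stdio.h>

#define SK_PDES 512                     /* one bit per page directory entry */

/* Mark [first, first + count) as allocated. */
static void sk_bitmap_set(uint64_t *bm, unsigned first, unsigned count)
{
        for (unsigned i = first; i < first + count; i++)
                bm[i / 64] |= 1ULL << (i % 64);
}

static int sk_test_bit(const uint64_t *bm, unsigned i)
{
        return (bm[i / 64] >> (i % 64)) & 1;
}

int main(void)
{
        uint64_t used_pdes[SK_PDES / 64] = { 0 };

        /* Allocation path: a VA range touched PDEs 3..6, record that. */
        sk_bitmap_set(used_pdes, 3, 4);

        /* Free path: only visit entries that were actually allocated,
         * the equivalent of for_each_set_bit(). */
        for (unsigned i = 0; i < SK_PDES; i++)
                if (sk_test_bit(used_pdes, i))
                        printf("freeing page table behind PDE %u\n", i);

        return 0;
}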

v2: Rebased to match changes from previous patches.
v3: Without teardown logic, rely on used_pdpes and used_pdes when
freeing page tables.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 75 ++++++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 24 ++++++++++++
 2 files changed, 81 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3a75408..d9b488a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -422,6 +422,7 @@ static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
 	if (pd->page) {
 		i915_dma_unmap_single(pd, dev);
 		__free_page(pd->page);
+		kfree(pd->used_pdes);
 		kfree(pd);
 	}
 }
@@ -429,26 +430,35 @@ static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
 static struct i915_page_directory_entry *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_page_directory_entry *pd;
-	int ret;
+	int ret = -ENOMEM;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
+	pd->used_pdes = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				sizeof(*pd->used_pdes), GFP_KERNEL);
+	if (!pd->used_pdes)
+		goto free_pd;
+
 	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pd->page) {
-		kfree(pd);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pd->page)
+		goto free_bitmap;
 
 	ret = i915_dma_map_single(pd, dev);
-	if (ret) {
-		__free_page(pd->page);
-		kfree(pd);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto free_page;
 
 	return pd;
+
+free_page:
+	__free_page(pd->page);
+free_bitmap:
+	kfree(pd->used_pdes);
+free_pd:
+	kfree(pd);
+
+	return ERR_PTR(ret);
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -639,7 +649,7 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+	for_each_set_bit(i, pd->used_pdes, GEN8_PDES_PER_PAGE) {
 		if (WARN_ON(!pd->page_tables[i]))
 			continue;
 
@@ -653,15 +663,18 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
-		if (!ppgtt->pdp.page_directory[i]->daddr)
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+		struct i915_page_directory_entry *pd;
+
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
+		pd = ppgtt->pdp.page_directory[i];
+		if (!pd->daddr)
+			pci_unmap_page(hwdev, pd->daddr, PAGE_SIZE,
+					PCI_DMA_BIDIRECTIONAL);
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+		for_each_set_bit(j, pd->used_pdes, GEN8_PDES_PER_PAGE) {
 			struct i915_page_table_entry *pt;
 			dma_addr_t addr;
 
@@ -682,7 +695,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
@@ -725,6 +738,7 @@ unwind_out:
 	return -ENOMEM;
 }
 
+/* bitmap of new page_directories */
 static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
@@ -740,6 +754,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->page_directory[pdpe] = alloc_pd_single(dev);
+
 		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
@@ -760,10 +775,13 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_entry *pd;
+	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
+	/* Do the allocations first so we can easily bail out */
 	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
 					ppgtt->base.dev);
 	if (ret)
@@ -776,6 +794,27 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			goto err_out;
 	}
 
+	/* Now mark everything we've touched as used. This doesn't allow for
+	 * robust error checking, but it makes the code a hell of a lot simpler.
+	 */
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		struct i915_page_table_entry *pt;
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		uint32_t pde;
+
+		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
+			bitmap_set(pd->page_tables[pde]->used_ptes,
+				   gen8_pte_index(start),
+				   gen8_pte_count(start, length));
+			set_bit(pde, pd->used_pdes);
+		}
+		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+	}
+
 	return 0;
 
 err_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9d49de7..c68ec3a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -205,11 +205,13 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
+	unsigned long *used_pdes;
 	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
 	/* struct page *page; */
+	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
 };
 
@@ -436,6 +438,28 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
 	BUG(); /* For 64B */
 }
 
+static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
+{
+	return i915_pte_count(addr, length, GEN8_PDE_SHIFT);
+}
+
+static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
+{
+	const uint32_t pdp_shift = GEN8_PDE_SHIFT + 9;
+	const uint64_t mask = ~((1 << pdp_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return GEN8_PDES_PER_PAGE - i915_pde_index(addr, GEN8_PDE_SHIFT);
+
+	return i915_pde_index(end, GEN8_PDE_SHIFT) - i915_pde_index(addr, GEN8_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1


* [PATCH v5 19/32] drm/i915/bdw: Dynamic page table allocations
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (17 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 18/32] drm/i915/bdw: begin bitmap tracking Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 20/32] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
                     ` (12 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This finishes off the dynamic page table allocations, in the legacy 3
level style that already exists. Most everything has already been set up
to this point; the patch finishes off the enabling by setting the
appropriate function pointers.
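
The enabling amounts to which vfuncs a PPGTT exposes: aliasing PPGTT
preallocates everything, so allocate_va_range stays NULL, while full PPGTT
points it at the dynamic allocator. A minimal, self-contained C sketch of that
dispatch (stand-in names, not the driver's vfunc table):

#include <stdint.h>
#include <stdio.h>

struct sk_vm {
        /* NULL means "everything was preallocated" (aliasing PPGTT). */
        int (*allocate_va_range)(struct sk_vm *vm, uint64_t start, uint64_t len);
};

static int sk_dynamic_alloc(struct sk_vm *vm, uint64_t start, uint64_t len)
{
        (void)vm;
        printf("allocating page tables for 0x%llx + 0x%llx\n",
               (unsigned long long)start, (unsigned long long)len);
        return 0;
}

/* Bind path: only full PPGTT has to allocate page tables on demand. */
static int sk_bind(struct sk_vm *vm, uint64_t start, uint64_t len)
{
        if (vm->allocate_va_range)
                return vm->allocate_va_range(vm, start, len);
        return 0;
}

int main(void)
{
        struct sk_vm aliasing = { .allocate_va_range = NULL };
        struct sk_vm full = { .allocate_va_range = sk_dynamic_alloc };

        sk_bind(&aliasing, 0, 1 << 21); /* no-op: tables already exist */
        sk_bind(&full, 0, 1 << 21);     /* triggers the dynamic allocator */
        return 0;
}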

v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.

v3: Rebase.

v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).

v5: As with gen6, the gen8_ppgtt_clear_range call in init is only needed
for aliasing ppgtt. Zombie tracking was originally added for the teardown
function and is no longer required.

v6: Update err_out case in gen8_alloc_va_range (missed from latest
rebase).

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 300 +++++++++++++++++++++++++++++-------
 1 file changed, 246 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d9b488a..63caaed 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -612,7 +612,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 			     struct i915_page_table_entry *pt,
 			     struct drm_device *dev)
 {
@@ -629,7 +629,7 @@ static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
 				     uint64_t length,
 				     struct drm_device *dev)
 {
-	gen8_ppgtt_pde_t *page_directory = kmap_atomic(pd->page);
+	gen8_ppgtt_pde_t * const page_directory = kmap_atomic(pd->page);
 	struct i915_page_table_entry *pt;
 	uint64_t temp, pde;
 
@@ -713,58 +713,163 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_alloc_pagetabs(struct i915_page_directory_entry *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pd:		Page directory for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pts:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_page_directories(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_page_directories(), it is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory_entry *pd,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pts)
 {
-	struct i915_page_table_entry *unused;
+	struct i915_page_table_entry *pt;
 	uint64_t temp;
 	uint32_t pde;
 
-	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
-		BUG_ON(unused);
-		pd->page_tables[pde] = alloc_pt_single(dev);
-		if (IS_ERR(pd->page_tables[pde]))
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		/* Don't reallocate page tables */
+		if (pt) {
+			/* Scratch is never allocated this way */
+			WARN_ON(pt->scratch);
+			continue;
+		}
+
+		pt = alloc_pt_single(ppgtt->base.dev);
+		if (IS_ERR(pt))
 			goto unwind_out;
+
+		pd->page_tables[pde] = pt;
+		set_bit(pde, new_pts);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pde--)
-		unmap_and_free_pt(pd->page_tables[pde], dev);
+	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
+		unmap_and_free_pt(pd->page_tables[pde], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
-/* bitmap of new page_directories */
-static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
+/**
+ * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pdp:	Page directory pointer for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pds	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pde index of
+ * @start, and ending at the pde index @start + @length. This function will skip
+ * over already allocated page directories within the range, and only allocate
+ * new ones, setting the appropriate pointer within the pdp as well as the
+ * correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a given page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller. This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pds)
 {
-	struct i915_page_directory_entry *unused;
+	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 
+	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
 
-	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
-		BUG_ON(unused);
-		pdp->page_directory[pdpe] = alloc_pd_single(dev);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		if (pd)
+			continue;
 
-		if (IS_ERR(pdp->page_directory[pdpe]))
+		pd = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(pd))
 			goto unwind_out;
+
+		pdp->page_directory[pdpe] = pd;
+		set_bit(pdpe, new_pds);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pdpe--)
-		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
+	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+		unmap_and_free_pd(pdp->page_directory[pdpe], ppgtt->base.dev);
+
+	return -ENOMEM;
+}
+
+static inline void
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+{
+	int i;
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+		kfree(new_pts[i]);
+	kfree(new_pts);
+	kfree(new_pds);
+}
+
+/* Fills in the page directory bitmap, and the array of page tables bitmap. Both
+ * of these are based on the number of PDPEs in the system.
+ */
+int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
+					 unsigned long ***new_pts)
+{
+	int i;
+	unsigned long *pds;
+	unsigned long **pts;
+
+	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	if (!pds)
+		return -ENOMEM;
+
+	pts = kcalloc(GEN8_PDES_PER_PAGE, sizeof(unsigned long *), GFP_KERNEL);
+	if (!pts) {
+		kfree(pds);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				 sizeof(unsigned long), GFP_KERNEL);
+		if (!pts[i])
+			goto err_out;
+	}
+
+	*new_pds = pds;
+	*new_pts = (unsigned long **)pts;
 
+	return 0;
+
+err_out:
+	free_gen8_temp_bitmaps(pds, pts);
 	return -ENOMEM;
 }
 
@@ -774,6 +879,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	unsigned long *new_page_dirs, **new_page_tables;
 	struct i915_page_directory_entry *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -781,44 +887,99 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	uint32_t pdpe;
 	int ret;
 
-	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
-					ppgtt->base.dev);
+#ifndef CONFIG_64BIT
+	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
+	 * this in hardware, but a lot of the drm code is not prepared to handle
+	 * 64b offset on 32b platforms.
+	 * This will be addressed when 48b PPGTT is added */
+	if (start + length > 0x100000000ULL)
+		return -E2BIG;
+#endif
+
+	/* Wrap is never okay since we can only represent 48b, and we don't
+	 * actually use the other side of the canonical address space.
+	 */
+	if (WARN_ON(start + length < start))
+		return -ERANGE;
+
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
 	if (ret)
 		return ret;
 
+	/* Do the allocations first so we can easily bail out */
+	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
+					new_page_dirs);
+	if (ret) {
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		return ret;
+	}
+
+	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
-						ppgtt->base.dev);
+		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
+		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
 	}
 
-	/* Now mark everything we've touched as used. This doesn't allow for
-	 * robust error checking, but it makes the code a hell of a lot simpler.
-	 */
 	start = orig_start;
 	length = orig_length;
 
+	/* Allocations have completed successfully, so set the bitmaps, and do
+	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		gen8_ppgtt_pde_t *const page_directory = kmap_atomic(pd->page);
 		struct i915_page_table_entry *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
 		uint32_t pde;
 
-		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
-			bitmap_set(pd->page_tables[pde]->used_ptes,
-				   gen8_pte_index(start),
-				   gen8_pte_count(start, length));
+		/* Every pd should be allocated, we just did that above. */
+		BUG_ON(!pd);
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			/* Same reasoning as pd */
+			BUG_ON(!pt);
+			BUG_ON(!pd_len);
+			BUG_ON(!gen8_pte_count(pd_start, pd_len));
+
+			/* Set our used ptes within the page table */
+			bitmap_set(pt->used_ptes,
+				   gen8_pte_index(pd_start),
+				   gen8_pte_count(pd_start, pd_len));
+
+			/* Our pde is now pointing to the pagetable, pt */
 			set_bit(pde, pd->used_pdes);
+
+			/* Map the PDE to the page table */
+			__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
+
+			/* NB: We haven't yet mapped ptes to pages. At this
+			 * point we're still relying on insert_entries() */
 		}
+
+		if (!HAS_LLC(vm->dev))
+			drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+		kunmap_atomic(page_directory);
+
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
 
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return 0;
 
 err_out:
-	gen8_ppgtt_free(ppgtt);
+	while (pdpe--) {
+		for_each_set_bit(temp, new_page_tables[pdpe], GEN8_PDES_PER_PAGE)
+			unmap_and_free_pt(pd->page_tables[temp], vm->dev);
+	}
+
+	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return ret;
 }
 
@@ -829,38 +990,67 @@ err_out:
  * space.
  *
  */
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	struct i915_page_directory_entry *pd;
-	uint64_t temp, start = 0;
-	const uint64_t orig_length = size;
-	uint32_t pdpe;
-	int ret;
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pd))
-		return PTR_ERR(ppgtt->scratch_pd);
+	return 0;
+}
+
+static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_page_directory_entry *pd;
+	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
+	uint32_t pdpe;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
 
+	/* Aliasing PPGTT has to always work and be mapped because of the way we
+	 * use RESTORE_INHIBIT in the context switch. This will be fixed
+	 * eventually. */
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	start = 0;
-	size = orig_length;
-
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
+	ppgtt->base.allocate_va_range = NULL;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+
+	return 0;
+}
+
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+
 	return 0;
 }
 
@@ -1395,7 +1585,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 		}
 	}
 
-	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+	ppgtt->base.allocate_va_range = aliasing ? NULL : gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1436,8 +1626,10 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt, aliasing);
+	else if (aliasing)
+		return gen8_aliasing_ppgtt_init(ppgtt);
 	else
-		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
+		return gen8_ppgtt_init(ppgtt);
 }
 int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 {
@@ -1546,10 +1738,10 @@ ppgtt_bind_vma(struct i915_vma *vma,
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm,
-			     vma->node.start,
-			     vma->obj->base.size,
-			     true);
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     vma->obj->base.size,
+				     true);
 }
 
 extern int intel_iommu_gfx_mapped;
-- 
2.1.1


* [PATCH v5 20/32] drm/i915/bdw: Support dynamic pdp updates in lrc mode
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (18 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 19/32] drm/i915/bdw: Dynamic page table allocations Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 21/32] drm/i915/bdw: Make pdp allocation more dynamic Michel Thierry
                     ` (11 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

Logical ring contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet.

Check if PDPs have been allocated and use the scratch page if they do
not exist yet.

Before submission, update the PDPs in the logical ring context, as the
PDPs will have been allocated by that point.
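
The per-PDP decision repeated four times in the diff below boils down to: use
the directory's DMA address if that slot is marked used, otherwise fall back
to the scratch page. A self-contained C sketch of the selection (stand-in
names; the real code writes the CTX_PDPn_UDW/LDW slots of the context image):

#include <stdint.h>
#include <stdio.h>

#define SK_NUM_PDPS 4   /* legacy 32b gen8 layout: 4 PDP registers */

struct sk_ppgtt {
        uint32_t used_pdpes;                    /* bit n: PDP n allocated */
        uint64_t pdp_daddr[SK_NUM_PDPS];        /* DMA address per PDP */
        uint64_t scratch_daddr;                 /* scratch page directory */
};

/* Pick the address a context should see for PDP n: a real directory if
 * one has been allocated, the scratch page directory otherwise. */
static uint64_t sk_pdp_addr(const struct sk_ppgtt *p, int n)
{
        if (p->used_pdpes & (1u << n))
                return p->pdp_daddr[n];
        return p->scratch_daddr;
}

int main(void)
{
        struct sk_ppgtt p = {
                .used_pdpes = 0x1,              /* only PDP 0 exists so far */
                .pdp_daddr = { 0x100000 },
                .scratch_daddr = 0xdead000,
        };

        for (int n = SK_NUM_PDPS - 1; n >= 0; n--)
                printf("PDP%d UDW/LDW <- %#llx\n", n,
                       (unsigned long long)sk_pdp_addr(&p, n));
        return 0;
}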

v2: Renamed commit title (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 80 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bc9c7c3..f461631 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -320,6 +320,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 
 static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 				    struct drm_i915_gem_object *ring_obj,
+				    struct i915_hw_ppgtt *ppgtt,
 				    u32 tail)
 {
 	struct page *page;
@@ -331,6 +332,40 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
+	/* True PPGTT with dynamic page allocation: update PDP registers and
+	 * point the unallocated PDPs to the scratch page
+	 */
+	if (ppgtt) {
+		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		} else {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		} else {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		} else {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		} else {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+	}
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -349,7 +384,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, to0->ppgtt, tail0);
 
 	if (to1) {
 		ringbuf1 = to1->engine[ring->id].ringbuf;
@@ -358,7 +393,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
 		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
 
-		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
+		execlists_update_context(ctx_obj1, ringbuf1->obj, to1->ppgtt, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -1735,14 +1770,39 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+
+	/* With dynamic page allocation, PDPs may not be allocated at this point,
+	 * Point the unallocated PDPs to the scratch page
+	 */
+	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	} else {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	} else {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	} else {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	} else {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1


* [PATCH v5 21/32] drm/i915/bdw: Make pdp allocation more dynamic
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (19 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 20/32] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 22/32] drm/i915/bdw: Abstract PDP usage Michel Thierry
                     ` (10 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This transitional patch doesn't do much for the existing code. However,
it should make the upcoming patches that use the full 48b address space a
bit easier to swallow. The patch also introduces the PML4, i.e. the new
top level structure of the page tables.
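
For orientation, with a PML4 on top a 48b GPU virtual address splits into four
9-bit table indices plus a 12-bit page offset, matching the GEN8_PML4E_SHIFT,
GEN8_PDPE_SHIFT, GEN8_PDE_SHIFT and GEN8_PTE_SHIFT values in the header change
below. A self-contained C sketch of that decomposition:

#include <stdint.h>
#include <stdio.h>

#define SK_PML4E_SHIFT  39
#define SK_PDPE_SHIFT   30
#define SK_PDE_SHIFT    21
#define SK_PTE_SHIFT    12
#define SK_INDEX_MASK   0x1ff   /* 9 bits: 512 entries per table level */

int main(void)
{
        uint64_t addr = 0x0000123456789000ULL;  /* example 48b address */

        printf("pml4e=%llu pdpe=%llu pde=%llu pte=%llu offset=%llu\n",
               (unsigned long long)((addr >> SK_PML4E_SHIFT) & SK_INDEX_MASK),
               (unsigned long long)((addr >> SK_PDPE_SHIFT) & SK_INDEX_MASK),
               (unsigned long long)((addr >> SK_PDE_SHIFT) & SK_INDEX_MASK),
               (unsigned long long)((addr >> SK_PTE_SHIFT) & SK_INDEX_MASK),
               (unsigned long long)(addr & 0xfff));
        return 0;
}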

v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
To facilitate testing, 48b mode will be available on Broadwell when
i915.enable_ppgtt = 3.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_drv.h     |   7 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 108 +++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  41 +++++++++++---
 3 files changed, 126 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3cc0196..662d6c1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2433,7 +2433,12 @@ struct drm_i915_cmd_table {
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
 #define USES_PPGTT(dev)		(i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt >= 2)
+#ifdef CONFIG_64BIT
+# define USES_FULL_48BIT_PPGTT(dev)	(i915.enable_ppgtt == 3)
+#else
+# define USES_FULL_48BIT_PPGTT(dev)	false
+#endif
 
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 63caaed..1cd5f65 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -100,10 +100,17 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 {
 	bool has_aliasing_ppgtt;
 	bool has_full_ppgtt;
+	bool has_full_64bit_ppgtt;
 
 	has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
 	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
 
+#ifdef CONFIG_64BIT
+	has_full_64bit_ppgtt = IS_BROADWELL(dev) && false; /* FIXME: 64b */
+#else
+	has_full_64bit_ppgtt = false;
+#endif
+
 	if (intel_vgpu_active(dev))
 		has_full_ppgtt = false; /* emulation is too hard */
 
@@ -121,6 +128,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	if (enable_ppgtt == 2 && has_full_ppgtt)
 		return 2;
 
+	if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+		return 3;
+
 #ifdef CONFIG_INTEL_IOMMU
 	/* Disable ppgtt on SNB if VT-d is on. */
 	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -461,6 +471,45 @@ free_pd:
 	return ERR_PTR(ret);
 }
 
+static void __pdp_fini(struct i915_page_directory_pointer_entry *pdp)
+{
+	kfree(pdp->used_pdpes);
+	kfree(pdp->page_directory);
+	/* HACK */
+	pdp->page_directory = NULL;
+}
+
+static void unmap_and_free_pdp(struct i915_page_directory_pointer_entry *pdp,
+			    struct drm_device *dev)
+{
+	__pdp_fini(pdp);
+	if (USES_FULL_48BIT_PPGTT(dev))
+		kfree(pdp);
+}
+
+static int __pdp_init(struct i915_page_directory_pointer_entry *pdp,
+		      struct drm_device *dev)
+{
+	size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+	pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+				  sizeof(unsigned long),
+				  GFP_KERNEL);
+	if (!pdp->used_pdpes)
+		return -ENOMEM;
+
+	pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory), GFP_KERNEL);
+	if (!pdp->page_directory) {
+		kfree(pdp->used_pdpes);
+		/* the PDP might be the statically allocated top level. Keep it
+		 * as clean as possible */
+		pdp->used_pdpes = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring,
 			  unsigned entry,
@@ -490,7 +539,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+	for (i = 3; i >= 0; i--) {
 		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
 		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
 		/* The page directory might be NULL, but we need to clear out
@@ -579,9 +628,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
-			break;
-
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
 			struct i915_page_table_entry *pt = pd->page_tables[pde];
@@ -663,7 +709,8 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+			I915_PDPES_PER_PDP(ppgtt->base.dev)) {
 		struct i915_page_directory_entry *pd;
 
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
@@ -695,13 +742,15 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
 		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 	}
+	unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -798,8 +847,9 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
+	size_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
 
-	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
@@ -819,18 +869,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_pds, pdpes)
 		unmap_and_free_pd(pdp->page_directory[pdpe], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
 static inline void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
+		       size_t pdpes)
 {
 	int i;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+	for (i = 0; i < pdpes; i++)
 		kfree(new_pts[i]);
 	kfree(new_pts);
 	kfree(new_pds);
@@ -840,13 +891,14 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
  * of these are based on the number of PDPEs in the system.
  */
 int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
-					 unsigned long ***new_pts)
+					 unsigned long ***new_pts,
+					 size_t pdpes)
 {
 	int i;
 	unsigned long *pds;
 	unsigned long **pts;
 
-	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
 	if (!pds)
 		return -ENOMEM;
 
@@ -856,7 +908,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+	for (i = 0; i < pdpes; i++) {
 		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
 				 sizeof(unsigned long), GFP_KERNEL);
 		if (!pts[i])
@@ -869,7 +921,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts);
+	free_gen8_temp_bitmaps(pds, pts, pdpes);
 	return -ENOMEM;
 }
 
@@ -885,6 +937,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
+	size_t pdpes = I915_PDPES_PER_PDP(dev);
 	int ret;
 
 #ifndef CONFIG_64BIT
@@ -902,7 +955,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	if (WARN_ON(start + length < start))
 		return -ERANGE;
 
-	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
 	if (ret)
 		return ret;
 
@@ -910,7 +963,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
 					new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
@@ -967,7 +1020,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return 0;
 
 err_out:
@@ -976,13 +1029,19 @@ err_out:
 			unmap_and_free_pt(pd->page_tables[temp], vm->dev);
 	}
 
-	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return ret;
 }
 
+static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
+{
+	unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+	unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
+}
+
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -1003,6 +1062,15 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->switch_mm = gen8_mm_switch;
 
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		int ret = __pdp_init(&ppgtt->pdp, false);
+		if (ret) {
+			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+			return ret;
+		}
+	} else
+		return -EPERM; /* Not yet implemented */
+
 	return 0;
 }
 
@@ -1024,7 +1092,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	 * eventually. */
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
-		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+		gen8_ppgtt_fini_common(ppgtt);
 		return ret;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c68ec3a..a33c6e9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -85,8 +85,12 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
  * The difference as compared to normal x86 3 level page table is the PDPEs are
  * programmed via register.
  */
+#define GEN8_PML4ES_PER_PML4		512
+#define GEN8_PML4E_SHIFT		39
 #define GEN8_PDPE_SHIFT			30
-#define GEN8_PDPE_MASK			0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK			0x1ff
 #define GEN8_PDE_SHIFT			21
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
@@ -95,6 +99,13 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
+#ifdef CONFIG_64BIT
+# define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+		GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
+#else
+# define I915_PDPES_PER_PDP		GEN8_LEGACY_PDPES
+#endif
+
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
 #define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
@@ -210,9 +221,17 @@ struct i915_page_directory_entry {
 };
 
 struct i915_page_directory_pointer_entry {
-	/* struct page *page; */
-	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
-	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
+	struct page *page;
+	dma_addr_t daddr;
+	unsigned long *used_pdpes;
+	struct i915_page_directory_entry **page_directory;
+};
+
+struct i915_pml4 {
+	struct page *page;
+	dma_addr_t daddr;
+	DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+	struct i915_page_directory_pointer_entry *pdps[GEN8_PML4ES_PER_PML4];
 };
 
 struct i915_address_space {
@@ -302,8 +321,9 @@ struct i915_hw_ppgtt {
 	struct drm_mm_node node;
 	unsigned long pd_dirty_rings;
 	union {
-		struct i915_page_directory_pointer_entry pdp;
-		struct i915_page_directory_entry pd;
+		struct i915_pml4 pml4;		/* GEN8+ & 64b PPGTT */
+		struct i915_page_directory_pointer_entry pdp;	/* GEN8+ */
+		struct i915_page_directory_entry pd;		/* GEN6-7 */
 	};
 
 	union {
@@ -399,14 +419,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
-#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
-	for (iter = gen8_pdpe_index(start), pd = (pdp)->page_directory[iter];	\
-	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+#define gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, b)	\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->page_directory[iter]; \
+	     length > 0 && (iter < b);					\
 	     pd = (pdp)->page_directory[++iter],				\
 	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
+
 /* Clamp length to the next page_directory boundary */
 static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 22/32] drm/i915/bdw: Abstract PDP usage
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (20 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 21/32] drm/i915/bdw: Make pdp allocation more dynamic Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 23/32] drm/i915/bdw: Add dynamic page trace events Michel Thierry
                     ` (9 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.

In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e.

This patch addresses some carelessness introduced throughout development wrt
assumptions made about the root page tables.
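
As an aside (illustration only, not part of the patch; the helper name is
made up), once pml4 is the root a 48b address decomposes into four indices;
the shift/mask values below match the ones this series adds to
i915_gem_gtt.h:

	/* Illustrative sketch only. */
	static inline void gen8_va_decompose_48b(uint64_t addr,
						 uint32_t *pml4e, uint32_t *pdpe,
						 uint32_t *pde, uint32_t *pte)
	{
		*pml4e = (addr >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;	/* >> 39 */
		*pdpe  = (addr >> GEN8_PDPE_SHIFT)  & GEN8_PDPE_MASK;	/* >> 30 */
		*pde   = (addr >> GEN8_PDE_SHIFT)   & GEN8_PDE_MASK;	/* >> 21 */
		*pte   = (addr >> GEN8_PTE_SHIFT)   & (GEN8_PTES_PER_PAGE - 1); /* >> 12 */
	}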

v2: Updated after dynamic page allocation changes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 123 ++++++++++++++++++++----------------
 1 file changed, 70 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1cd5f65..92ca430 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -559,6 +559,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -574,10 +575,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		struct i915_page_table_entry *pt;
 		struct page *page_table;
 
-		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+		if (WARN_ON(!pdp->page_directory[pdpe]))
 			continue;
 
-		pd = ppgtt->pdp.page_directory[pdpe];
+		pd = pdp->page_directory[pdpe];
 
 		if (WARN_ON(!pd->page_tables[pde]))
 			continue;
@@ -619,6 +620,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_gtt_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -629,7 +631,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL) {
-			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
+			struct i915_page_directory_entry *pd = pdp->page_directory[pdpe];
 			struct i915_page_table_entry *pt = pd->page_tables[pde];
 			struct page *page_table = pt->page;
 
@@ -707,16 +709,17 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	int i, j;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+	for_each_set_bit(i, pdp->used_pdpes,
 			I915_PDPES_PER_PDP(ppgtt->base.dev)) {
 		struct i915_page_directory_entry *pd;
 
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+		if (WARN_ON(!pdp->page_directory[i]))
 			continue;
 
-		pd = ppgtt->pdp.page_directory[i];
+		pd = pdp->page_directory[i];
 		if (!pd->daddr)
 			pci_unmap_page(hwdev, pd->daddr, PAGE_SIZE,
 					PCI_DMA_BIDIRECTIONAL);
@@ -742,15 +745,21 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-			continue;
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
+			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+				continue;
 
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+			gen8_free_page_tables(ppgtt->pdp.page_directory[i],
+					      ppgtt->base.dev);
+			unmap_and_free_pd(ppgtt->pdp.page_directory[i],
+					  ppgtt->base.dev);
+		}
+		unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
+	} else {
+		BUG(); /* to be implemented later */
 	}
-	unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -764,7 +773,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 /**
  * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pd:		Page directory for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -780,12 +789,13 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 				     struct i915_page_directory_entry *pd,
 				     uint64_t start,
 				     uint64_t length,
 				     unsigned long *new_pts)
 {
+	struct drm_device *dev = vm->dev;
 	struct i915_page_table_entry *pt;
 	uint64_t temp;
 	uint32_t pde;
@@ -798,7 +808,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 			continue;
 		}
 
-		pt = alloc_pt_single(ppgtt->base.dev);
+		pt = alloc_pt_single(dev);
 		if (IS_ERR(pt))
 			goto unwind_out;
 
@@ -810,14 +820,14 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
-		unmap_and_free_pt(pd->page_tables[pde], ppgtt->base.dev);
+		unmap_and_free_pt(pd->page_tables[pde], dev);
 
 	return -ENOMEM;
 }
 
 /**
  * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pdp:	Page directory pointer for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -838,16 +848,17 @@ unwind_out:
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
 				     struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
 				     unsigned long *new_pds)
 {
+	struct drm_device *dev = vm->dev;
 	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
-	size_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
+	size_t pdpes =  I915_PDPES_PER_PDP(vm->dev);
 
 	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
@@ -858,7 +869,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 		if (pd)
 			continue;
 
-		pd = alloc_pd_single(ppgtt->base.dev);
+		pd = alloc_pd_single(dev);
 		if (IS_ERR(pd))
 			goto unwind_out;
 
@@ -870,7 +881,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pdpe, new_pds, pdpes)
-		unmap_and_free_pd(pdp->page_directory[pdpe], ppgtt->base.dev);
+		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
 
 	return -ENOMEM;
 }
@@ -925,13 +936,13 @@ err_out:
 	return -ENOMEM;
 }
 
-static int gen8_alloc_va_range(struct i915_address_space *vm,
-			       uint64_t start,
-			       uint64_t length)
+static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
+				    struct i915_page_directory_pointer_entry *pdp,
+				    uint64_t start,
+				    uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	unsigned long *new_page_dirs, **new_page_tables;
+	struct drm_device *dev = vm->dev;
 	struct i915_page_directory_entry *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -960,17 +971,15 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		return ret;
 
 	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
-					new_page_dirs);
+	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, new_page_dirs);
 	if (ret) {
 		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
-	/* For every page directory referenced, allocate page tables */
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
-		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
 						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
@@ -979,10 +988,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	start = orig_start;
 	length = orig_length;
 
-	/* Allocations have completed successfully, so set the bitmaps, and do
-	 * the mappings. */
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		gen8_ppgtt_pde_t *const page_directory = kmap_atomic(pd->page);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		struct i915_page_table_entry *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
@@ -1004,20 +1010,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 
 			/* Our pde is now pointing to the pagetable, pt */
 			set_bit(pde, pd->used_pdes);
-
-			/* Map the PDE to the page table */
-			__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
-
-			/* NB: We haven't yet mapped ptes to pages. At this
-			 * point we're still relying on insert_entries() */
 		}
 
-		if (!HAS_LLC(vm->dev))
-			drm_clflush_virt_range(page_directory, PAGE_SIZE);
-
-		kunmap_atomic(page_directory);
-
-		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		set_bit(pdpe, pdp->used_pdpes);
+		gen8_map_pagetable_range(pd, start, length, dev);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1026,16 +1022,36 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 err_out:
 	while (pdpe--) {
 		for_each_set_bit(temp, new_page_tables[pdpe], GEN8_PDES_PER_PAGE)
-			unmap_and_free_pt(pd->page_tables[temp], vm->dev);
+			unmap_and_free_pt(pd->page_tables[temp], dev);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return ret;
 }
 
+static int __noreturn gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+					       struct i915_pml4 *pml4,
+					       uint64_t start,
+					       uint64_t length)
+{
+	BUG(); /* to be implemented later */
+}
+
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev))
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+	else
+		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+}
+
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
 {
 	unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
@@ -1078,12 +1094,13 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	struct i915_page_directory_entry *pd;
 	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	ret = gen8_ppgtt_init_common(ppgtt, size);
 	if (ret)
 		return ret;
 
@@ -1096,8 +1113,8 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
-		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
+	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, dev);
 
 	ppgtt->base.allocate_va_range = NULL;
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 23/32] drm/i915/bdw: Add dynamic page trace events
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (21 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 22/32] drm/i915/bdw: Abstract PDP usage Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 24/32] drm/i915/bdw: Add ppgtt info for dynamic pages Michel Thierry
                     ` (8 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The dynamic page allocation patch series added these trace events for GEN6;
this patch adds them for GEN8.
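
For reference, the new events print in the same form as the existing GEN6
ones; roughly (the vm pointer value below is made up):

	i915_page_directory_entry_alloc: vm=ffff88021a2b4000, pdpe=3 (0xc0000000-0xffffffff)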

v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 23 +++++++++++++++--------
 drivers/gpu/drm/i915/i915_trace.h   | 16 ++++++++++++++++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 92ca430..a6dad95 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -672,19 +672,24 @@ static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 /* It's likely we'll map more than one pagetable at a time. This function will
  * save us unnecessary kmap calls, but do no more functionally than multiple
  * calls to map_pt. */
-static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
+static void gen8_map_pagetable_range(struct i915_address_space *vm,
+				     struct i915_page_directory_entry *pd,
 				     uint64_t start,
-				     uint64_t length,
-				     struct drm_device *dev)
+				     uint64_t length)
 {
 	gen8_ppgtt_pde_t * const page_directory = kmap_atomic(pd->page);
 	struct i915_page_table_entry *pt;
 	uint64_t temp, pde;
 
-	gen8_for_each_pde(pt, pd, start, length, temp, pde)
-		__gen8_do_map_pt(page_directory + pde, pt, dev);
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
+		trace_i915_page_table_entry_map(vm, pde, pt,
+					 gen8_pte_index(start),
+					 gen8_pte_count(start, length),
+					 GEN8_PTES_PER_PAGE);
+	}
 
-	if (!HAS_LLC(dev))
+	if (!HAS_LLC(vm->dev))
 		drm_clflush_virt_range(page_directory, PAGE_SIZE);
 
 	kunmap_atomic(page_directory);
@@ -814,6 +819,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 
 		pd->page_tables[pde] = pt;
 		set_bit(pde, new_pts);
+		trace_i915_page_table_entry_alloc(vm, pde, start, GEN8_PDE_SHIFT);
 	}
 
 	return 0;
@@ -875,6 +881,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
 
 		pdp->page_directory[pdpe] = pd;
 		set_bit(pdpe, new_pds);
+		trace_i915_page_directory_entry_alloc(vm, pdpe, start, GEN8_PDPE_SHIFT);
 	}
 
 	return 0;
@@ -1013,7 +1020,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 		}
 
 		set_bit(pdpe, pdp->used_pdpes);
-		gen8_map_pagetable_range(pd, start, length, dev);
+		gen8_map_pagetable_range(vm, pd, start, length);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1114,7 +1121,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	}
 
 	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
-		gen8_map_pagetable_range(pd, start, size, dev);
+		gen8_map_pagetable_range(&ppgtt->base, pd,start, size);
 
 	ppgtt->base.allocate_va_range = NULL;
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 0038dc2..10cd830 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -214,6 +214,22 @@ DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
 	     TP_ARGS(vm, pde, start, pde_shift)
 );
 
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_entry_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+		   TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+		   TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_pointer_entry_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+		   TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+		   TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
 /* Avoid extra math because we only support two sizes. The format is defined by
  * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
 #define TRACE_PT_SIZE(bits) \
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 24/32] drm/i915/bdw: Add ppgtt info for dynamic pages
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (22 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 23/32] drm/i915/bdw: Add dynamic page trace events Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 25/32] drm/i915/bdw: implement alloc/free for 4lvl Michel Thierry
                     ` (7 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Note that there is no gen8 ppgtt debug_dump function yet.
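
A hypothetical user of the new iterator below (count_pts/count are made-up
names; a gen8 debug_dump could be built along these lines):

	static void count_pts(struct i915_page_directory_pointer_entry *pdp,
			      struct i915_page_directory_entry *pd,
			      struct i915_page_table_entry *pt,
			      unsigned pdpe, unsigned pde, void *data)
	{
		if (pt)		/* pd/pt are NULL for unallocated entries */
			(*(unsigned *)data)++;
	}

	unsigned count = 0;
	gen8_for_every_pdpe_pde(ppgtt, count_pts, &count);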

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 19 ++++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.c | 32 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h |  9 +++++++++
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e85da9d..c877957 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2165,7 +2165,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	struct drm_file *file;
 	int i;
 
 	if (INTEL_INFO(dev)->gen == 6)
@@ -2189,14 +2188,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
-
-	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
-		struct drm_i915_file_private *file_priv = file->driver_priv;
-
-		seq_printf(m, "proc: %s\n",
-			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
-	}
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
@@ -2204,6 +2195,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_file *file;
 
 	int ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
@@ -2215,6 +2207,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	else if (INTEL_INFO(dev)->gen >= 6)
 		gen6_ppgtt_info(m, dev);
 
+	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+
+		seq_printf(m, "\nproc: %s\n",
+			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
+		idr_for_each(&file_priv->context_idr, per_file_ctx,
+			     (void *)(unsigned long)m);
+	}
+
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a6dad95..1bf457a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2127,6 +2127,38 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 	readl(gtt_base);
 }
 
+void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
+			     void (*callback)(struct i915_page_directory_pointer_entry *pdp,
+					      struct i915_page_directory_entry *pd,
+					      struct i915_page_table_entry *pt,
+					      unsigned pdpe,
+					      unsigned pde,
+					      void *data),
+			     void *data)
+{
+	uint64_t start = ppgtt->base.start;
+	uint64_t length = ppgtt->base.total;
+	uint64_t pdpe, pde, temp;
+
+	struct i915_page_directory_entry *pd;
+	struct i915_page_table_entry *pt;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_start = start, pd_length = length;
+		int i;
+
+		if (pd == NULL) {
+			for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
+				callback(&ppgtt->pdp, NULL, NULL, pdpe, i, data);
+			continue;
+		}
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_length, temp, pde) {
+			callback(&ppgtt->pdp, pd, pt, pdpe, pde, data);
+		}
+	}
+}
+
 static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 				  uint64_t start,
 				  uint64_t length,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index a33c6e9..144858e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -483,6 +483,15 @@ static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
 	return i915_pde_index(end, GEN8_PDE_SHIFT) - i915_pde_index(addr, GEN8_PDE_SHIFT);
 }
 
+void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
+			     void (*callback)(struct i915_page_directory_pointer_entry *pdp,
+					      struct i915_page_directory_entry *pd,
+					      struct i915_page_table_entry *pt,
+					      unsigned pdpe,
+					      unsigned pde,
+					      void *data),
+			     void *data);
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 25/32] drm/i915/bdw: implement alloc/free for 4lvl
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (23 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 24/32] drm/i915/bdw: Add ppgtt info for dynamic pages Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 26/32] drm/i915/bdw: Add 4 level switching infrastructure Michel Thierry
                     ` (6 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The code for 4lvl works just as one would expect, and it neatly calls into
the existing 3lvl page table code to handle all of the lower levels.

PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.
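
Condensed, the allocation path added below boils down to (error handling
and most of the bitmap bookkeeping trimmed):

	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
		if (!pdp) {
			pdp = alloc_pdp_single(ppgtt, pml4);	/* new PDP */
			pml4->pdps[pml4e] = pdp;
			set_bit(pml4e, new_pdps);
		}
	}

	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e)
		gen8_alloc_va_range_3lvl(vm, pdp, start, length); /* reuse 3lvl */

	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
		  GEN8_PML4ES_PER_PML4);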

v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 240 +++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  11 +-
 2 files changed, 217 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1bf457a..44f8fa5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -482,9 +482,12 @@ static void __pdp_fini(struct i915_page_directory_pointer_entry *pdp)
 static void unmap_and_free_pdp(struct i915_page_directory_pointer_entry *pdp,
 			    struct drm_device *dev)
 {
-	__pdp_fini(pdp);
-	if (USES_FULL_48BIT_PPGTT(dev))
+	if (USES_FULL_48BIT_PPGTT(dev)) {
+		__pdp_fini(pdp);
+		i915_dma_unmap_single(pdp, dev);
+		__free_page(pdp->page);
 		kfree(pdp);
+	}
 }
 
 static int __pdp_init(struct i915_page_directory_pointer_entry *pdp,
@@ -510,6 +513,60 @@ static int __pdp_init(struct i915_page_directory_pointer_entry *pdp,
 	return 0;
 }
 
+static struct i915_page_directory_pointer_entry *alloc_pdp_single(struct i915_hw_ppgtt *ppgtt,
+					       struct i915_pml4 *pml4)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct i915_page_directory_pointer_entry *pdp;
+	int ret;
+
+	BUG_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+	pdp = kmalloc(sizeof(*pdp), GFP_KERNEL);
+	if (!pdp)
+		return ERR_PTR(-ENOMEM);
+
+	pdp->page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
+	if (!pdp->page) {
+		kfree(pdp);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret = __pdp_init(pdp, dev);
+	if (ret) {
+		__free_page(pdp->page);
+		kfree(pdp);
+		return ERR_PTR(ret);
+	}
+
+	i915_dma_map_single(pdp, dev);
+
+	return pdp;
+}
+
+static void pml4_fini(struct i915_pml4 *pml4)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pml4, struct i915_hw_ppgtt, pml4);
+	i915_dma_unmap_single(pml4, ppgtt->base.dev);
+	__free_page(pml4->page);
+	/* HACK */
+	pml4->page = NULL;
+}
+
+static int pml4_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_pml4 *pml4 = &ppgtt->pml4;
+
+	pml4->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pml4->page)
+		return -ENOMEM;
+
+	i915_dma_map_single(pml4, ppgtt->base.dev);
+
+	return 0;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring,
 			  unsigned entry,
@@ -711,14 +768,13 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 	}
 }
 
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_unmap_pages_3lvl(struct i915_page_directory_pointer_entry *pdp,
+					struct drm_device *dev)
 {
-	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct pci_dev *hwdev = dev->pdev;
 	int i, j;
 
-	for_each_set_bit(i, pdp->used_pdpes,
-			I915_PDPES_PER_PDP(ppgtt->base.dev)) {
+	for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
 		struct i915_page_directory_entry *pd;
 
 		if (WARN_ON(!pdp->page_directory[i]))
@@ -746,27 +802,73 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_unmap_pages_4lvl(struct i915_hw_ppgtt *ppgtt)
 {
+	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i;
 
-	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-				continue;
+	for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+		struct i915_page_directory_pointer_entry *pdp;
 
-			gen8_free_page_tables(ppgtt->pdp.page_directory[i],
-					      ppgtt->base.dev);
-			unmap_and_free_pd(ppgtt->pdp.page_directory[i],
-					  ppgtt->base.dev);
-		}
-		unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
-	} else {
-		BUG(); /* to be implemented later */
+		if (WARN_ON(!ppgtt->pml4.pdps[i]))
+			continue;
+
+		pdp = ppgtt->pml4.pdps[i];
+		if (!pdp->daddr)
+			pci_unmap_page(hwdev, pdp->daddr, PAGE_SIZE,
+				       PCI_DMA_BIDIRECTIONAL);
+
+		gen8_ppgtt_unmap_pages_3lvl(ppgtt->pml4.pdps[i],
+					    ppgtt->base.dev);
 	}
 }
 
+static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+{
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		gen8_ppgtt_unmap_pages_3lvl(&ppgtt->pdp, ppgtt->base.dev);
+	else
+		gen8_ppgtt_unmap_pages_4lvl(ppgtt);
+}
+
+static void gen8_ppgtt_free_3lvl(struct i915_page_directory_pointer_entry *pdp,
+				 struct drm_device *dev)
+{
+	int i;
+
+	for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+		if (WARN_ON(!pdp->page_directory[i]))
+			continue;
+
+		gen8_free_page_tables(pdp->page_directory[i], dev);
+		unmap_and_free_pd(pdp->page_directory[i], dev);
+	}
+
+	unmap_and_free_pdp(pdp, dev);
+}
+
+static void gen8_ppgtt_free_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+		if (WARN_ON(!ppgtt->pml4.pdps[i]))
+			continue;
+
+		gen8_ppgtt_free_3lvl(ppgtt->pml4.pdps[i], ppgtt->base.dev);
+	}
+
+	pml4_fini(&ppgtt->pml4);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		gen8_ppgtt_free_3lvl(&ppgtt->pdp, ppgtt->base.dev);
+	else
+		gen8_ppgtt_free_4lvl(ppgtt);
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
@@ -1039,12 +1141,74 @@ err_out:
 	return ret;
 }
 
-static int __noreturn gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
-					       struct i915_pml4 *pml4,
-					       uint64_t start,
-					       uint64_t length)
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+				    struct i915_pml4 *pml4,
+				    uint64_t start,
+				    uint64_t length)
 {
-	BUG(); /* to be implemented later */
+	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer_entry *pdp;
+	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
+	uint64_t temp, pml4e;
+	int ret = 0;
+
+	/* Do the pml4 allocations first, so we don't need to track the newly
+	 * allocated tables below the pdp */
+	bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+	/* The page_directoryectory and pagetable allocations are done in the shared 3
+	 * and 4 level code. Just allocate the pdps.
+	 */
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		if (!pdp) {
+			WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+			pdp = alloc_pdp_single(ppgtt, pml4);
+			if (IS_ERR(pdp))
+				goto err_alloc;
+
+			pml4->pdps[pml4e] = pdp;
+			set_bit(pml4e, new_pdps);
+			trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
+						   pml4e << GEN8_PML4E_SHIFT,
+						   GEN8_PML4E_SHIFT);
+
+		}
+	}
+
+	WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
+	     "The allocation has spanned more than 512GB. "
+	     "It is highly likely this is incorrect.");
+
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		BUG_ON(!pdp);
+
+		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+		if (ret)
+			goto err_out;
+	}
+
+	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
+		  GEN8_PML4ES_PER_PML4);
+
+	return 0;
+
+err_out:
+	start = orig_start;
+	length = orig_length;
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e)
+		gen8_ppgtt_free_3lvl(pdp, vm->dev);
+
+err_alloc:
+	for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+		unmap_and_free_pdp(pdp, vm->dev);
+
+	return ret;
 }
 
 static int gen8_alloc_va_range(struct i915_address_space *vm,
@@ -1053,16 +1217,19 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	if (!USES_FULL_48BIT_PPGTT(vm->dev))
-		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
-	else
+	if (USES_FULL_48BIT_PPGTT(vm->dev))
 		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+	else
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
 {
 	unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
-	unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		pml4_fini(&ppgtt->pml4);
+	else
+		unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
 }
 
 /**
@@ -1085,14 +1252,21 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		int ret = pml4_init(ppgtt);
+		if (ret) {
+			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+			return ret;
+		}
+	} else {
 		int ret = __pdp_init(&ppgtt->pdp, false);
 		if (ret) {
 			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 			return ret;
 		}
-	} else
-		return -EPERM; /* Not yet implemented */
+
+		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
+	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 144858e..1477f54 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -87,6 +87,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
  */
 #define GEN8_PML4ES_PER_PML4		512
 #define GEN8_PML4E_SHIFT		39
+#define GEN8_PML4E_MASK			(GEN8_PML4ES_PER_PML4 - 1)
 #define GEN8_PDPE_SHIFT			30
 /* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
  * tables */
@@ -427,6 +428,14 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter)	\
+	for (iter = gen8_pml4e_index(start), pdp = (pml4)->pdps[iter];	\
+	     length > 0 && iter < GEN8_PML4ES_PER_PML4;			\
+	     pdp = (pml4)->pdps[++iter],				\
+	     temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
 #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
 	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
 
@@ -458,7 +467,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
 
 static inline uint32_t gen8_pml4e_index(uint64_t address)
 {
-	BUG(); /* For 64B */
+	return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
 }
 
 static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 26/32] drm/i915/bdw: Add 4 level switching infrastructure
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (24 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 25/32] drm/i915/bdw: implement alloc/free for 4lvl Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 27/32] drm/i915/bdw: Support 64 bit PPGTT in lrc mode Michel Thierry
                     ` (5 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Mapping is easy: it uses the same register as PDP descriptor 0, but only a
single entry is written.
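
In other words, the 48b switch collapses to a single PDP0 write, as in the
gen8_48b_mm_switch() hunk below:

	static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
				      struct intel_engine_cs *ring)
	{
		/* PDP0 carries the PML4 base; the other PDPs are ignored */
		return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr);
	}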

v2: PML4 update in legacy context switch is left for historic reasons;
the preferred mode of operation is with lrc context based submission.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 56 +++++++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  4 ++-
 drivers/gpu/drm/i915/i915_reg.h     |  1 +
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 44f8fa5..2c3f2db 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -192,6 +192,9 @@ static inline gen8_ppgtt_pde_t gen8_pde_encode(struct drm_device *dev,
 	return pde;
 }
 
+#define gen8_pdpe_encode gen8_pde_encode
+#define gen8_pml4e_encode gen8_pde_encode
+
 static gen6_gtt_pte_t snb_pte_encode(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid, u32 unused)
@@ -591,8 +594,8 @@ static int gen8_write_pdp(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct intel_engine_cs *ring)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+				 struct intel_engine_cs *ring)
 {
 	int i, ret;
 
@@ -609,6 +612,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+			      struct intel_engine_cs *ring)
+{
+	return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr);
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				   uint64_t start,
 				   uint64_t length,
@@ -752,6 +761,37 @@ static void gen8_map_pagetable_range(struct i915_address_space *vm,
 	kunmap_atomic(page_directory);
 }
 
+static void gen8_map_page_directory(struct i915_page_directory_pointer_entry *pdp,
+				    struct i915_page_directory_entry *pd,
+				    int index,
+				    struct drm_device *dev)
+{
+	gen8_ppgtt_pdpe_t *page_directorypo;
+	gen8_ppgtt_pdpe_t pdpe;
+
+	/* We do not need to clflush because no platform requiring flush
+	 * supports 64b pagetables. */
+	if (!USES_FULL_48BIT_PPGTT(dev))
+		return;
+
+	page_directorypo = kmap_atomic(pdp->page);
+	pdpe = gen8_pdpe_encode(dev, pd->daddr, I915_CACHE_LLC);
+	page_directorypo[index] = pdpe;
+	kunmap_atomic(page_directorypo);
+}
+
+static void gen8_map_page_directory_pointer(struct i915_pml4 *pml4,
+					    struct i915_page_directory_pointer_entry *pdp,
+					    int index,
+					    struct drm_device *dev)
+{
+	gen8_ppgtt_pml4e_t *pagemap = kmap_atomic(pml4->page);
+	gen8_ppgtt_pml4e_t pml4e = gen8_pml4e_encode(dev, pdp->daddr, I915_CACHE_LLC);
+	BUG_ON(!USES_FULL_48BIT_PPGTT(dev));
+	pagemap[index] = pml4e;
+	kunmap_atomic(pagemap);
+}
+
 static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
@@ -1123,6 +1163,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 
 		set_bit(pdpe, pdp->used_pdpes);
 		gen8_map_pagetable_range(vm, pd, start, length);
+		gen8_map_page_directory(pdp, pd, pdpe, dev);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1191,6 +1232,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
 		if (ret)
 			goto err_out;
+
+		gen8_map_page_directory_pointer(pml4, pdp, pml4e, vm->dev);
 	}
 
 	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
@@ -1250,14 +1293,14 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
-	ppgtt->switch_mm = gen8_mm_switch;
-
 	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		int ret = pml4_init(ppgtt);
 		if (ret) {
 			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 			return ret;
 		}
+
+		ppgtt->switch_mm = gen8_48b_mm_switch;
 	} else {
 		int ret = __pdp_init(&ppgtt->pdp, false);
 		if (ret) {
@@ -1265,6 +1308,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			return ret;
 		}
 
+		ppgtt->switch_mm = gen8_legacy_mm_switch;
 		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
 	}
 
@@ -1294,6 +1338,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
+	/* FIXME: PML4 */
 	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(&ppgtt->base, pd,start, size);
 
@@ -1498,8 +1543,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
 	int j;
 
 	for_each_ring(ring, dev_priv, j) {
+		u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_64B : 0;
 		I915_WRITE(RING_MODE_GEN7(ring),
-			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
+			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1477f54..1f4cdb1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -38,7 +38,9 @@ struct drm_i915_file_private;
 
 typedef uint32_t gen6_gtt_pte_t;
 typedef uint64_t gen8_gtt_pte_t;
-typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
+typedef gen8_gtt_pte_t		gen8_ppgtt_pde_t;
+typedef gen8_ppgtt_pde_t	gen8_ppgtt_pdpe_t;
+typedef gen8_ppgtt_pdpe_t	gen8_ppgtt_pml4e_t;
 
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 1dc91de..305e5b7 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1338,6 +1338,7 @@ enum skl_disp_power_wells {
 #define   GFX_REPLAY_MODE		(1<<11)
 #define   GFX_PSMI_GRANULARITY		(1<<10)
 #define   GFX_PPGTT_ENABLE		(1<<9)
+#define   GEN8_GFX_PPGTT_64B		(1<<7)
 
 #define VLV_DISPLAY_BASE 0x180000
 #define VLV_MIPI_BASE VLV_DISPLAY_BASE
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 27/32] drm/i915/bdw: Support 64 bit PPGTT in lrc mode
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (25 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 26/32] drm/i915/bdw: Add 4 level switching infrastructure Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 28/32] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT Michel Thierry
                     ` (4 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address of the PML4, while the other PDP registers are ignored.

Also, the addressing mode must be specified in every context descriptor.
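
In short, the descriptor mode selection below reduces to (as done in
execlists_ctx_descriptor()):

	desc = GEN8_CTX_VALID;
	desc |= (legacy_64bit_ctx ? LEGACY_64B_CONTEXT : LEGACY_CONTEXT)
			<< GEN8_CTX_MODE_SHIFT;
	desc |= GEN8_CTX_L3LLC_COHERENT | GEN8_CTX_PRIVILEGE | lrca;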

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 167 ++++++++++++++++++++++++++-------------
 1 file changed, 114 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f461631..2b6d262 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -255,7 +255,8 @@ u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 }
 
 static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
-					 struct drm_i915_gem_object *ctx_obj)
+					 struct drm_i915_gem_object *ctx_obj,
+					 bool legacy_64bit_ctx)
 {
 	struct drm_device *dev = ring->dev;
 	uint64_t desc;
@@ -264,7 +265,10 @@ static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
 	desc = GEN8_CTX_VALID;
-	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	if (legacy_64bit_ctx)
+		desc |= LEGACY_64B_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	else
+		desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
 	desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
 	desc |= lrca;
@@ -292,16 +296,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	uint64_t temp = 0;
 	uint32_t desc[4];
+	bool legacy_64bit_ctx = USES_FULL_48BIT_PPGTT(dev);
 
 	/* XXX: You must always write both descriptors in the order below. */
 	if (ctx_obj1)
-		temp = execlists_ctx_descriptor(ring, ctx_obj1);
+		temp = execlists_ctx_descriptor(ring, ctx_obj1, legacy_64bit_ctx);
 	else
 		temp = 0;
 	desc[1] = (u32)(temp >> 32);
 	desc[0] = (u32)temp;
 
-	temp = execlists_ctx_descriptor(ring, ctx_obj0);
+	temp = execlists_ctx_descriptor(ring, ctx_obj0, legacy_64bit_ctx);
 	desc[3] = (u32)(temp >> 32);
 	desc[2] = (u32)temp;
 
@@ -332,37 +337,60 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
-	/* True PPGTT with dynamic page allocation: update PDP registers and
-	 * point the unallocated PDPs to the scratch page
-	 */
-	if (ppgtt) {
+	if (ppgtt && USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		/* True 64b PPGTT (48bit canonical)
+		 * PDP0_DESCRIPTOR contains the base address to PML4 and
+		 * other PDP Descriptors are ignored
+		 */
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pml4.daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pml4.daddr);
+	} else if (ppgtt) {
+		/* True 32b PPGTT with dynamic page allocation: update PDP
+		 * registers and point the unallocated PDPs to the scratch page
+		 */
 		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
-			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
 		} else {
-			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
 		}
 		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
-			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
 		} else {
-			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
 		}
 		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
-			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
 		} else {
-			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
 		}
 		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
-			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
-			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
 		} else {
-			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
 		}
 	}
 
@@ -1771,36 +1799,69 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
 
-	/* With dynamic page allocation, PDPs may not be allocated at this point,
-	 * Point the unallocated PDPs to the scratch page
-	 */
-	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
-		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	} else {
-		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
-	}
-	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
-		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	} else {
-		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
-	}
-	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
-		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	} else {
-		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
-	}
-	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
-		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
-		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		/* 64b PPGTT (48bit canonical)
+		 * PDP0_DESCRIPTOR contains the base address to PML4 and
+		 * other PDP Descriptors are ignored
+		 */
+		reg_state[CTX_PDP3_UDW+1] = 0;
+		reg_state[CTX_PDP3_LDW+1] = 0;
+		reg_state[CTX_PDP2_UDW+1] = 0;
+		reg_state[CTX_PDP2_LDW+1] = 0;
+		reg_state[CTX_PDP1_UDW+1] = 0;
+		reg_state[CTX_PDP1_LDW+1] = 0;
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pml4.daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pml4.daddr);
 	} else {
-		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		/* 32b PPGTT
+		 * PDP*_DESCRIPTOR contains the base address of space supported.
+		 * With dynamic page allocation, PDPs may not be allocated at
+		 * this point. Point the unallocated PDPs to the scratch page
+		 */
+		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP3_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		} else {
+			reg_state[CTX_PDP3_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP2_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		} else {
+			reg_state[CTX_PDP2_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP1_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		} else {
+			reg_state[CTX_PDP1_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP0_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		} else {
+			reg_state[CTX_PDP0_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
 	}
 
 	if (ring->id == RCS) {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 28/32] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (26 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 27/32] drm/i915/bdw: Support 64 bit PPGTT in lrc mode Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 29/32] drm/i915: Plumb sg_iter through va allocation ->maps Michel Thierry
                     ` (3 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The insert_entries function was the function used to write PTEs. For the
PPGTT it was "hardcoded" to only understand two-level page tables, which
was the case for GEN7. We can reuse this for 4-level page tables, and
remove the concept of insert_entries, which was never viable past
two-level page tables anyway, but it requires a bit of rework to make
the function more generic.

This patch begins the generalization work, and it will be heavily built
upon once the 48b code is complete. The series tries to make each
function specific to the page table level it actually touches, and this
one is no exception. Carrying extra context (such as the whole PPGTT)
into these helpers is distracting and leaves room for bugs, since the
function shouldn't be touching anything in the higher-order page tables.
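
As a rough illustration of the idea (standalone sketch, not driver code):
device policy such as HAS_LLC() is evaluated once at the address-space
boundary and passed down as a plain flag, so the leaf helper never has to
reach back up the hierarchy.

#include <stdbool.h>
#include <stdio.h>

/* Leaf helper: knows only its own level plus an explicit flush policy. */
static void write_entries_leaf(unsigned long *entries, int count, bool flush)
{
	int i;

	for (i = 0; i < count; i++)
		entries[i] = 0xdead;	/* stand-in for a PTE write */

	if (flush)
		printf("clflush %d entries\n", count);
}

/* Boundary function: evaluates the device policy once, then hands off. */
static void write_entries(unsigned long *entries, int count, bool has_llc)
{
	write_entries_leaf(entries, count, !has_llc);
}

int main(void)
{
	unsigned long pt[4];

	write_entries(pt, 4, false);	/* non-LLC part: flush after writing */
	write_entries(pt, 4, true);	/* LLC part: no flush needed */
	return 0;
}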

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 55 +++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2c3f2db..ad7e274 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -618,23 +618,19 @@ static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr);
 }
 
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   uint64_t start,
-				   uint64_t length,
-				   bool use_scratch)
+static void gen8_ppgtt_clear_pte_range(struct i915_page_directory_pointer_entry *pdp,
+				       uint64_t start,
+				       uint64_t length,
+				       gen8_gtt_pte_t scratch_pte,
+				       const bool flush)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
-	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
+	gen8_gtt_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
 	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
-	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
-				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
 		struct i915_page_directory_entry *pd;
@@ -667,7 +663,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 			num_entries--;
 		}
 
-		if (!HAS_LLC(ppgtt->base.dev))
+		if (flush)
 			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 		kunmap_atomic(pt_vaddr);
 
@@ -679,14 +675,27 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
-				      struct sg_table *pages,
-				      uint64_t start,
-				      enum i915_cache_level cache_level, u32 unused)
+static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
+				   uint64_t start,
+				   uint64_t length,
+				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+	gen8_gtt_pte_t scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
+						     I915_CACHE_LLC, use_scratch);
+
+	gen8_ppgtt_clear_pte_range(pdp, start, length, scratch_pte, !HAS_LLC(vm->dev));
+}
+
+static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_entry *pdp,
+					  struct sg_table *pages,
+					  uint64_t start,
+					  enum i915_cache_level cache_level,
+					  const bool flush)
+{
 	gen8_gtt_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -708,7 +717,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES_PER_PAGE) {
-			if (!HAS_LLC(ppgtt->base.dev))
+			if (flush)
 				drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
@@ -720,12 +729,24 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		}
 	}
 	if (pt_vaddr) {
-		if (!HAS_LLC(ppgtt->base.dev))
+		if (flush)
 			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 		kunmap_atomic(pt_vaddr);
 	}
 }
 
+static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
+				      struct sg_table *pages,
+				      uint64_t start,
+				      enum i915_cache_level cache_level,
+				      u32 unused)
+{
+	struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+	gen8_ppgtt_insert_pte_entries(pdp, pages, start, cache_level, !HAS_LLC(vm->dev));
+}
+
 static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 			     struct i915_page_table_entry *pt,
 			     struct drm_device *dev)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 29/32] drm/i915: Plumb sg_iter through va allocation ->maps
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (27 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 28/32] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 30/32] drm/i915/bdw: Add 4 level support in insert_entries and clear_range Michel Thierry
                     ` (2 subsequent siblings)
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As a step towards implementing 4 levels, and without discarding the
existing PTE map functions, we need to pass the sg_iter through. The
current function only operates at page directory granularity, but an
object's pages may span more than one page directory. Using the iterator
directly as we write the PTEs lets it stay coherent through a VMA
mapping operation that spans multiple page table structures.
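
The key property is that the iterator's position survives across calls,
so each per-directory invocation resumes where the previous one stopped.
A standalone analogy of that calling pattern (plain C; the names are made
up for the sketch and this is not the real scatterlist API):

#include <stdio.h>

struct page_iter {
	const unsigned long *pages;
	int nents;
	int pos;
};

static int iter_next(struct page_iter *it)
{
	return it->pos < it->nents ? ++it->pos : 0;
}

/* Each call consumes at most 'limit' pages, like one directory's worth. */
static void map_chunk(struct page_iter *it, int limit)
{
	while (limit-- && iter_next(it))
		printf("map page %lu\n", it->pages[it->pos - 1]);
}

int main(void)
{
	const unsigned long pages[] = { 10, 11, 12, 13, 14 };
	struct page_iter it = { pages, 5, 0 };

	map_chunk(&it, 3);	/* first chunk: pages 10..12 */
	map_chunk(&it, 3);	/* resumes at 13, nothing is re-walked */
	return 0;
}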

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 46 +++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ad7e274..483dd73 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -691,7 +691,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 }
 
 static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_entry *pdp,
-					  struct sg_table *pages,
+					  struct sg_page_iter *sg_iter,
 					  uint64_t start,
 					  enum i915_cache_level cache_level,
 					  const bool flush)
@@ -700,11 +700,10 @@ static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_ent
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
 	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
-	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
 
-	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+	while (__sg_page_iter_next(sg_iter)) {
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory_entry *pd = pdp->page_directory[pdpe];
 			struct i915_page_table_entry *pt = pd->page_tables[pde];
@@ -714,7 +713,7 @@ static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_ent
 		}
 
 		pt_vaddr[pte] =
-			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+			gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES_PER_PAGE) {
 			if (flush)
@@ -743,8 +742,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct sg_page_iter sg_iter;
 
-	gen8_ppgtt_insert_pte_entries(pdp, pages, start, cache_level, !HAS_LLC(vm->dev));
+	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
+	gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start, cache_level, !HAS_LLC(vm->dev));
 }
 
 static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
@@ -1106,10 +1107,12 @@ err_out:
 	return -ENOMEM;
 }
 
-static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
-				    struct i915_page_directory_pointer_entry *pdp,
-				    uint64_t start,
-				    uint64_t length)
+static int __gen8_alloc_vma_range_3lvl(struct i915_address_space *vm,
+				       struct i915_page_directory_pointer_entry *pdp,
+				       struct sg_page_iter *sg_iter,
+				       uint64_t start,
+				       uint64_t length,
+				       u32 flags)
 {
 	unsigned long *new_page_dirs, **new_page_tables;
 	struct drm_device *dev = vm->dev;
@@ -1178,7 +1181,11 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 				   gen8_pte_index(pd_start),
 				   gen8_pte_count(pd_start, pd_len));
 
-			/* Our pde is now pointing to the pagetable, pt */
+			if (sg_iter) {
+				BUG_ON(!sg_iter->__nents);
+				gen8_ppgtt_insert_pte_entries(pdp, sg_iter, pd_start,
+							      flags, !HAS_LLC(vm->dev));
+			}
 			set_bit(pde, pd->used_pdes);
 		}
 
@@ -1203,10 +1210,12 @@ err_out:
 	return ret;
 }
 
-static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
-				    struct i915_pml4 *pml4,
-				    uint64_t start,
-				    uint64_t length)
+static int __gen8_alloc_vma_range_4lvl(struct i915_address_space *vm,
+				       struct i915_pml4 *pml4,
+				       struct sg_page_iter *sg_iter,
+				       uint64_t start,
+				       uint64_t length,
+				       u32 flags)
 {
 	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
 	struct i915_hw_ppgtt *ppgtt =
@@ -1250,7 +1259,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
 		BUG_ON(!pdp);
 
-		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+		ret = __gen8_alloc_vma_range_3lvl(vm, pdp, sg_iter,
+						  start, length, flags);
 		if (ret)
 			goto err_out;
 
@@ -1282,9 +1292,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		container_of(vm, struct i915_hw_ppgtt, base);
 
 	if (USES_FULL_48BIT_PPGTT(vm->dev))
-		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+		return __gen8_alloc_vma_range_4lvl(vm, &ppgtt->pml4, NULL,
+						   start, length, 0);
 	else
-		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+		return __gen8_alloc_vma_range_3lvl(vm, &ppgtt->pdp, NULL,
+						   start, length, 0);
 }
 
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 30/32] drm/i915/bdw: Add 4 level support in insert_entries and clear_range
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (28 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 29/32] drm/i915: Plumb sg_iter through va allocation ->maps Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 31/32] drm/i915: Expand error state's address width to 64b Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 32/32] drm/i915/bdw: Flip the 48b switch Michel Thierry
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4) before it selects which Page Directory Pointer (PDP)
it will write to.

Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.

Also add a scratch page for PML4.

This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".
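
For reference, with 4KB pages each level indexes 9 bits of the virtual
address. The standalone sketch below decodes an address the same way the
gen8_*_index() helpers do; the shift values mirror the usual
PTE/PDE/PDPE/PML4E layout (12/21/30/39) and are hard-coded here purely
for illustration.

#include <stdio.h>
#include <stdint.h>

#define PTE_SHIFT	12	/* 4KB pages */
#define PDE_SHIFT	21
#define PDPE_SHIFT	30
#define PML4E_SHIFT	39
#define LEVEL_MASK	0x1ff	/* 512 entries per level */

int main(void)
{
	uint64_t addr = 0x0000123456789000ULL;	/* example 48b GPU address */

	printf("pml4e=%llu pdpe=%llu pde=%llu pte=%llu\n",
	       (unsigned long long)(addr >> PML4E_SHIFT & LEVEL_MASK),
	       (unsigned long long)(addr >> PDPE_SHIFT & LEVEL_MASK),
	       (unsigned long long)(addr >> PDE_SHIFT & LEVEL_MASK),
	       (unsigned long long)(addr >> PTE_SHIFT & LEVEL_MASK));
	return 0;
}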

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 66 ++++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 12 +++++++
 2 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 483dd73..cd57c22 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -675,24 +675,52 @@ static void gen8_ppgtt_clear_pte_range(struct i915_page_directory_pointer_entry
 	}
 }
 
+static void gen8_ppgtt_clear_range_4lvl(struct i915_hw_ppgtt *ppgtt,
+					gen8_gtt_pte_t scratch_pte,
+					uint64_t start,
+					uint64_t length)
+{
+	struct i915_page_directory_pointer_entry *pdp;
+	uint64_t templ4, templ3, pml4e, pdpe;
+
+	gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+		struct i915_page_directory_entry *pd;
+		uint64_t pdp_len = gen8_clamp_pdp(start, length);
+		uint64_t pdp_start = start;
+
+		gen8_for_each_pdpe(pd, pdp, pdp_start, pdp_len, templ3, pdpe) {
+			uint64_t pd_len = gen8_clamp_pd(pdp_start, pdp_len);
+			uint64_t pd_start = pdp_start;
+
+			gen8_ppgtt_clear_pte_range(pdp, pd_start, pd_len,
+						   scratch_pte, !HAS_LLC(ppgtt->base.dev));
+		}
+	}
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   uint64_t start,
-				   uint64_t length,
+				   uint64_t start, uint64_t length,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
-
+			container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
 						     I915_CACHE_LLC, use_scratch);
 
-	gen8_ppgtt_clear_pte_range(pdp, start, length, scratch_pte, !HAS_LLC(vm->dev));
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp;
+
+		gen8_ppgtt_clear_pte_range(pdp, start, length, scratch_pte,
+					   !HAS_LLC(ppgtt->base.dev));
+	} else {
+		gen8_ppgtt_clear_range_4lvl(ppgtt, scratch_pte, start, length);
+	}
 }
 
 static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_entry *pdp,
 					  struct sg_page_iter *sg_iter,
 					  uint64_t start,
+					  size_t pages,
 					  enum i915_cache_level cache_level,
 					  const bool flush)
 {
@@ -703,7 +731,7 @@ static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_ent
 
 	pt_vaddr = NULL;
 
-	while (__sg_page_iter_next(sg_iter)) {
+	while (pages-- && __sg_page_iter_next(sg_iter)) {
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory_entry *pd = pdp->page_directory[pdpe];
 			struct i915_page_table_entry *pt = pd->page_tables[pde];
@@ -741,11 +769,26 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 				      u32 unused)
 {
 	struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct i915_page_directory_pointer_entry *pdp;
 	struct sg_page_iter sg_iter;
 
 	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
-	gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start, cache_level, !HAS_LLC(vm->dev));
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		pdp = &ppgtt->pdp;
+		gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start,
+				sg_nents(pages->sgl),
+				cache_level, !HAS_LLC(vm->dev));
+	} else {
+		struct i915_pml4 *pml4;
+		unsigned pml4e = gen8_pml4e_index(start);
+
+		pml4 = &ppgtt->pml4;
+		pdp = pml4->pdps[pml4e];
+		gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start,
+				sg_nents(pages->sgl),
+				cache_level, !HAS_LLC(vm->dev));
+	}
 }
 
 static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
@@ -1184,7 +1227,8 @@ static int __gen8_alloc_vma_range_3lvl(struct i915_address_space *vm,
 			if (sg_iter) {
 				BUG_ON(!sg_iter->__nents);
 				gen8_ppgtt_insert_pte_entries(pdp, sg_iter, pd_start,
-							      flags, !HAS_LLC(vm->dev));
+						gen8_pte_count(pd_start, pd_len),
+						flags, !HAS_LLC(vm->dev));
 			}
 			set_bit(pde, pd->used_pdes);
 		}
@@ -1329,7 +1373,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		int ret = pml4_init(ppgtt);
 		if (ret) {
-			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+			unmap_and_free_pt(ppgtt->scratch_pml4, ppgtt->base.dev);
 			return ret;
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1f4cdb1..602d446c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -332,6 +332,7 @@ struct i915_hw_ppgtt {
 	union {
 		struct i915_page_table_entry *scratch_pt;
 		struct i915_page_table_entry *scratch_pd; /* Just need the daddr */
+		struct i915_page_table_entry *scratch_pml4;
 	};
 
 	struct drm_i915_file_private *file_priv;
@@ -452,6 +453,17 @@ static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 	return next_pd - start;
 }
 
+/* Clamp length to the next page_directory pointer boundary */
+static inline uint64_t gen8_clamp_pdp(uint64_t start, uint64_t length)
+{
+	uint64_t next_pdp = ALIGN(start + 1, 1ULL << GEN8_PML4E_SHIFT);
+
+	if (next_pdp > (start + length))
+		return length;
+
+	return next_pdp - start;
+}
+
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 31/32] drm/i915: Expand error state's address width to 64b
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (29 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 30/32] drm/i915/bdw: Add 4 level support in insert_entries and clear_range Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  2015-02-23 15:44   ` [PATCH v5 32/32] drm/i915/bdw: Flip the 48b switch Michel Thierry
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

v2: Zero-pad the new 8-byte fields, or else intel_error_decode has a hard
time parsing them. Note that we need an igt update regardless.

v3: Make reloc_offset 64b also.
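
A quick standalone illustration of the zero-padded form and the
upper/lower 32-bit split used in the diff (the helper macros here are
local stand-ins for the kernel's upper_32_bits()/lower_32_bits()):

#include <stdio.h>
#include <stdint.h>

/* local stand-ins for the kernel's upper_32_bits()/lower_32_bits() */
#define sketch_upper_32_bits(n)	((uint32_t)((n) >> 32))
#define sketch_lower_32_bits(n)	((uint32_t)(n))

int main(void)
{
	uint64_t gtt_offset = 0x0000000800001000ULL;	/* offset above 4GB */

	printf("full  = 0x%016llx\n", (unsigned long long)gtt_offset);
	printf("upper = 0x%08x\n", sketch_upper_32_bits(gtt_offset));
	printf("lower = 0x%08x\n", sketch_lower_32_bits(gtt_offset));
	return 0;
}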

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h       |  4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c | 17 +++++++++--------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 662d6c1..d28abd1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -459,7 +459,7 @@ struct drm_i915_error_state {
 
 		struct drm_i915_error_object {
 			int page_count;
-			u32 gtt_offset;
+			u64 gtt_offset;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
@@ -485,7 +485,7 @@ struct drm_i915_error_state {
 		u32 size;
 		u32 name;
 		u32 rseqno, wseqno;
-		u32 gtt_offset;
+		u64 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
 		s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a982849..bbf25d0 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -195,7 +195,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  %s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "    %08x %8u %02x %02x %x %x",
+		err_printf(m, "    %016llx %8u %02x %02x %x %x",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
@@ -415,7 +415,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				err_printf(m, " (submitted by %s [%d])",
 					   error->ring[i].comm,
 					   error->ring[i].pid);
-			err_printf(m, " --- gtt_offset = 0x%08x\n",
+			err_printf(m, " --- gtt_offset = 0x%016llx\n",
 				   obj->gtt_offset);
 			print_error_obj(m, obj);
 		}
@@ -423,7 +423,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		obj = error->ring[i].wa_batchbuffer;
 		if (obj) {
 			err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
-				   dev_priv->ring[i].name, obj->gtt_offset);
+				   dev_priv->ring[i].name,
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
@@ -442,14 +443,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ringbuffer)) {
 			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
 		if ((obj = error->ring[i].hws_page)) {
 			err_printf(m, "%s --- HW Status = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			offset = 0;
 			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -465,13 +466,13 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ctx)) {
 			err_printf(m, "%s --- HW Context = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 	}
 
 	if ((obj = error->semaphore_obj)) {
-		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+		err_printf(m, "Semaphore page = 0x%016llx\n", obj->gtt_offset);
 		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
 				   elt * 4,
@@ -571,7 +572,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 	int num_pages;
 	bool use_ggtt;
 	int i = 0;
-	u32 reloc_offset;
+	u64 reloc_offset;
 
 	if (src == NULL || src->pages == NULL)
 		return NULL;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v5 32/32] drm/i915/bdw: Flip the 48b switch
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (30 preceding siblings ...)
  2015-02-23 15:44   ` [PATCH v5 31/32] drm/i915: Expand error state's address width to 64b Michel Thierry
@ 2015-02-23 15:44   ` Michel Thierry
  31 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-23 15:44 UTC (permalink / raw)
  To: intel-gfx

Use 48b addresses if hw supports it and i915.enable_ppgtt=3.

Aliasing PPGTT remains 32b only.
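
For example, booting with i915.enable_ppgtt=3 on the kernel command line
requests the 48b mode. A rough sketch of how the parameter values map to
modes, per the MODULE_PARM_DESC change below (illustrative only; the real
decision in sanitize_enable_ppgtt() also depends on hardware support):

/* Illustrative mapping of i915.enable_ppgtt values to modes. */
const char *ppgtt_mode(int enable_ppgtt)
{
	switch (enable_ppgtt) {
	case 0:  return "disabled";
	case 1:  return "aliasing";
	case 2:  return "full (32b)";
	case 3:  return "full 64b (48b addressing)";
	default: return "auto (driver decides)";	/* -1 and others */
	}
}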

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 7 ++-----
 drivers/gpu/drm/i915/i915_params.c  | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cd57c22..cebb868 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -106,7 +106,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
 
 #ifdef CONFIG_64BIT
-	has_full_64bit_ppgtt = IS_BROADWELL(dev) && false; /* FIXME: 64b */
+	has_full_64bit_ppgtt = IS_BROADWELL(dev);
 #else
 	has_full_64bit_ppgtt = false;
 #endif
@@ -1075,9 +1075,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
 
 	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
-	/* FIXME: PPGTT container_of won't work for 64b */
-	BUG_ON((start + length) > 0x800000000ULL);
-
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		if (pd)
 			continue;
@@ -1396,7 +1393,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b? */
 	struct i915_page_directory_entry *pd;
 	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
 	uint32_t pdpe;
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 44f2262..1cd43b0 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -119,7 +119,7 @@ MODULE_PARM_DESC(enable_hangcheck,
 module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
 MODULE_PARM_DESC(enable_ppgtt,
 	"Override PPGTT usage. "
-	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_64b)");
 
 module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
 MODULE_PARM_DESC(enable_execlists,
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH] drm/i915: page table abstractions
  2015-02-23 15:44   ` [PATCH v5 01/32] drm/i915: page table abstractions Michel Thierry
@ 2015-02-24 11:14     ` Michel Thierry
  2015-02-24 12:03       ` Mika Kuoppala
  0 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 11:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we move to dynamic page allocation, keeping the page directory and page
tables as separate structures will help to break actions into simpler tasks.

To help the code transition nicely, there is some wasted space in gen6/7.
This will be ameliorated shortly.

Following the x86 pagetable terminology:
PDPE = struct i915_page_directory_pointer_entry.
PDE = struct i915_page_directory_entry [page_directory].
PTE = struct i915_page_table_entry [page_tables].
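
The net effect on GEN8 is that a walk becomes explicit pointer-chasing
through the new structures. A simplified standalone model (array sizes and
the page type are stand-ins; the real definitions live in i915_gem_gtt.h):

/* Simplified model of the new hierarchy; the real structs also carry DMA
 * addresses and, later in the series, usage bitmaps. */
struct sketch_page_table {
	void *page;				/* backing struct page */
};

struct sketch_page_directory {
	void *page;				/* NULL for GEN6/GEN7 */
	struct sketch_page_table *page_tables;	/* 512 entries on GEN8 */
};

struct sketch_pdp {
	struct sketch_page_directory page_directory[4];	/* GEN8_LEGACY_PDPES */
};

/* A PTE write on GEN8 then resolves its backing page as: */
void *lookup_page(struct sketch_pdp *pdp, unsigned pdpe, unsigned pde)
{
	return pdp->page_directory[pdpe].page_tables[pde].page;
}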

v2: fixed mismatches after clean-up/rebase.

v3: Clarify the names of the multiple levels of page tables (Daniel)

v4: Addressing Mika's review comments.
s/gen8_free_page_directories/gen8_free_page_directory and free the
page tables for the directory there.
In gen8_ppgtt_allocate_page_directories, do not leak previously allocated
pt in case the page_directory alloc fails.
Update error return handling in gen8_ppgtt_alloc.

v5: Do not leak pt on error in gen6_ppgtt_allocate_page_tables. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 180 +++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
 2 files changed, 111 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e54b2a0..b4dee34 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -338,7 +338,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+		struct page *page_table = pd->page_tables[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -382,8 +383,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+			struct page *page_table = pd->page_tables[pde].page;
+
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -407,29 +412,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_tables == NULL)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_tables[i].page)
+			__free_page(pd->page_tables[i].page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
+{
+	gen8_free_page_tables(pd);
+	kfree(pd->page_tables);
+	__free_page(pd->page);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -464,86 +473,77 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPES];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
+
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_table_entry *pt;
 
-	return 0;
-}
+		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pdp.page_directory[i].page) {
+			kfree(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pdp.page_directory[i].page_tables = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.page_directory[i].page_tables);
+		__free_page(ppgtt->pdp.page_directory[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -555,18 +555,20 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
 	if (ret)
-		gen8_ppgtt_free(ppgtt);
+		goto err_out;
+
+	return 0;
 
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -577,7 +579,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       ppgtt->pdp.page_directory[pd].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -597,7 +599,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -658,7 +660,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -721,7 +723,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -936,7 +938,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -965,7 +967,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -1000,8 +1002,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		if (ppgtt->pd.page_tables[i].page)
+			__free_page(ppgtt->pd.page_tables[i].page);
+	kfree(ppgtt->pd.page_tables);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1058,17 +1061,18 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_page_table_entry *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
+	ppgtt->pd.page_tables = pt;
+
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL);
+		if (!pt->page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
@@ -1108,9 +1112,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_tables[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1157,7 +1163,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8f76990..d9bc375 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -187,6 +187,20 @@ struct i915_vma {
 			 u32 flags);
 };
 
+struct i915_page_table_entry {
+	struct page *page;
+};
+
+struct i915_page_directory_entry {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_page_table_entry *page_tables;
+};
+
+struct i915_page_directory_pointer_entry {
+	/* struct page *page; */
+	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+};
+
 struct i915_address_space {
 	struct drm_mm mm;
 	struct drm_device *dev;
@@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
@@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
+	union {
+		struct i915_page_directory_pointer_entry pdp;
+		struct i915_page_directory_entry pd;
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* Re: [PATCH] drm/i915: page table abstractions
  2015-02-24 11:14     ` [PATCH] " Michel Thierry
@ 2015-02-24 12:03       ` Mika Kuoppala
  0 siblings, 0 replies; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-24 12:03 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx; +Cc: Ben Widawsky

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> When we move to dynamic page allocation, keeping page_directory and pagetabs as
> separate structures will help to break actions into simpler tasks.
>
> To help transition the code nicely there is some wasted space in gen6/7.
> This will be ameliorated shortly.
>
> Following the x86 pagetable terminology:
> PDPE = struct i915_page_directory_pointer_entry.
> PDE = struct i915_page_directory_entry [page_directory].
> PTE = struct i915_page_table_entry [page_tables].
>
> v2: fixed mismatches after clean-up/rebase.
>
> v3: Clarify the names of the multiple levels of page tables (Daniel)
>
> v4: Addressing Mika's review comments.
> s/gen8_free_page_directories/gen8_free_page_directory and free the
> page tables for the directory there.
> In gen8_ppgtt_allocate_page_directories, do not leak previously allocated
> pt in case the page_directory alloc fails.
> Update error return handling in gen8_ppgtt_alloc.
>
> v5: Do not leak pt on error in gen6_ppgtt_allocate_page_tables. (Mika)
>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 180 +++++++++++++++++++-----------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
>  2 files changed, 111 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e54b2a0..b4dee34 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -338,7 +338,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  				      I915_CACHE_LLC, use_scratch);
>  
>  	while (num_entries) {
> -		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
> +		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> +		struct page *page_table = pd->page_tables[pde].page;
>  
>  		last_pte = pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
> @@ -382,8 +383,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
>  			break;
>  
> -		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
> +		if (pt_vaddr == NULL) {
> +			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> +			struct page *page_table = pd->page_tables[pde].page;
> +
> +			pt_vaddr = kmap_atomic(page_table);
> +		}
>  
>  		pt_vaddr[pte] =
>  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -407,29 +412,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	}
>  }
>  
> -static void gen8_free_page_tables(struct page **pt_pages)
> +static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>  {
>  	int i;
>  
> -	if (pt_pages == NULL)
> +	if (pd->page_tables == NULL)
>  		return;
>  
>  	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> -		if (pt_pages[i])
> -			__free_pages(pt_pages[i], 0);
> +		if (pd->page_tables[i].page)
> +			__free_page(pd->page_tables[i].page);
>  }
>  
> -static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
> +static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
> +{
> +	gen8_free_page_tables(pd);
> +	kfree(pd->page_tables);
> +	__free_page(pd->page);
> +}
> +
> +static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> -		kfree(ppgtt->gen8_pt_pages[i]);
> +		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
>  		kfree(ppgtt->gen8_pt_dma_addr[i]);
>  	}
> -
> -	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
>  }
>  
>  static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
> @@ -464,86 +473,77 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> -static struct page **__gen8_alloc_page_tables(void)
> +static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
>  {
> -	struct page **pt_pages;
>  	int i;
>  
> -	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> -	if (!pt_pages)
> -		return ERR_PTR(-ENOMEM);
> -
> -	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> -		pt_pages[i] = alloc_page(GFP_KERNEL);
> -		if (!pt_pages[i])
> -			goto bail;
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> +						     sizeof(dma_addr_t),
> +						     GFP_KERNEL);
> +		if (!ppgtt->gen8_pt_dma_addr[i])
> +			return -ENOMEM;
>  	}
>  
> -	return pt_pages;
> -
> -bail:
> -	gen8_free_page_tables(pt_pages);
> -	kfree(pt_pages);
> -	return ERR_PTR(-ENOMEM);
> +	return 0;
>  }
>  
> -static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
> -					   const int max_pdp)
> +static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  {
> -	struct page **pt_pages[GEN8_LEGACY_PDPES];
> -	int i, ret;
> +	int i, j;
>  
> -	for (i = 0; i < max_pdp; i++) {
> -		pt_pages[i] = __gen8_alloc_page_tables();
> -		if (IS_ERR(pt_pages[i])) {
> -			ret = PTR_ERR(pt_pages[i]);
> -			goto unwind_out;
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> +			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
> +
> +			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +			if (!pt->page)
> +				goto unwind_out;
>  		}
>  	}
>  
> -	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
> -	 * "atomic" - for cleanup purposes.
> -	 */
> -	for (i = 0; i < max_pdp; i++)
> -		ppgtt->gen8_pt_pages[i] = pt_pages[i];
> -
>  	return 0;
>  
>  unwind_out:
> -	while (i--) {
> -		gen8_free_page_tables(pt_pages[i]);
> -		kfree(pt_pages[i]);
> -	}
> +	while (i--)
> +		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
>  
> -	return ret;
> +	return -ENOMEM;
>  }
>  
> -static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> +static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
> +						const int max_pdp)
>  {
>  	int i;
>  
> -	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> -						     sizeof(dma_addr_t),
> -						     GFP_KERNEL);
> -		if (!ppgtt->gen8_pt_dma_addr[i])
> -			return -ENOMEM;
> -	}
> +	for (i = 0; i < max_pdp; i++) {
> +		struct i915_page_table_entry *pt;
>  
> -	return 0;
> -}
> +		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
> +		if (!pt)
> +			goto unwind_out;
>  
> -static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
> -						const int max_pdp)
> -{
> -	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
> -	if (!ppgtt->pd_pages)
> -		return -ENOMEM;
> +		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
> +		if (!ppgtt->pdp.page_directory[i].page) {
> +			kfree(pt);
> +			goto unwind_out;
> +		}
> +
> +		ppgtt->pdp.page_directory[i].page_tables = pt;
> +	}
>  
> -	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
> +	ppgtt->num_pd_pages = max_pdp;
>  	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
>  
>  	return 0;
> +
> +unwind_out:
> +	while (i--) {
> +		kfree(ppgtt->pdp.page_directory[i].page_tables);
> +		__free_page(ppgtt->pdp.page_directory[i].page);
> +	}
> +
> +	return -ENOMEM;
>  }
>  
>  static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
> @@ -555,18 +555,20 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
>  	if (ret)
>  		return ret;
>  
> -	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
> -	if (ret) {
> -		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
> -		return ret;
> -	}
> +	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
> +	if (ret)
> +		goto err_out;
>  
>  	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
>  
>  	ret = gen8_ppgtt_allocate_dma(ppgtt);
>  	if (ret)
> -		gen8_ppgtt_free(ppgtt);
> +		goto err_out;
> +
> +	return 0;
>  
> +err_out:
> +	gen8_ppgtt_free(ppgtt);
>  	return ret;
>  }
>  
> @@ -577,7 +579,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	int ret;
>  
>  	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> -			       &ppgtt->pd_pages[pd], 0,
> +			       ppgtt->pdp.page_directory[pd].page, 0,
>  			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> @@ -597,7 +599,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  	struct page *p;
>  	int ret;
>  
> -	p = ppgtt->gen8_pt_pages[pd][pt];
> +	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
>  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
>  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> @@ -658,7 +660,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
>  		gen8_ppgtt_pde_t *pd_vaddr;
> -		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
> +		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>  			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
>  			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
> @@ -721,7 +723,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  				   expected);
>  		seq_printf(m, "\tPDE: %x\n", pd_entry);
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
>  		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
>  			unsigned long va =
>  				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
> @@ -936,7 +938,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>  		if (last_pte > I915_PPGTT_PT_ENTRIES)
>  			last_pte = I915_PPGTT_PT_ENTRIES;
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
>  
>  		for (i = first_pte; i < last_pte; i++)
>  			pt_vaddr[i] = scratch_pte;
> @@ -965,7 +967,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  	pt_vaddr = NULL;
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
> +			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
>  
>  		pt_vaddr[act_pte] =
>  			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -1000,8 +1002,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  
>  	kfree(ppgtt->pt_dma_addr);
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		__free_page(ppgtt->pt_pages[i]);
> -	kfree(ppgtt->pt_pages);
> +		if (ppgtt->pd.page_tables[i].page)
> +			__free_page(ppgtt->pd.page_tables[i].page);
> +	kfree(ppgtt->pd.page_tables);
>  }
>  
>  static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
> @@ -1058,17 +1061,18 @@ alloc:
>  
>  static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  {
> +	struct i915_page_table_entry *pt;
>  	int i;
>  
> -	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
> -				  GFP_KERNEL);
> -
> -	if (!ppgtt->pt_pages)
> +	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
> +	if (!pt)
>  		return -ENOMEM;
>  
> +	ppgtt->pd.page_tables = pt;
> +
>  	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
> -		if (!ppgtt->pt_pages[i]) {
> +		pt[i].page = alloc_page(GFP_KERNEL);
> +		if (!pt->page) {
>  			gen6_ppgtt_free(ppgtt);
>  			return -ENOMEM;
>  		}
> @@ -1108,9 +1112,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> +		struct page *page;
>  		dma_addr_t pt_addr;
>  
> -		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
> +		page = ppgtt->pd.page_tables[i].page;
> +		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>  				       PCI_DMA_BIDIRECTIONAL);
>  
>  		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
> @@ -1157,7 +1163,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>  	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
>  	ppgtt->base.start = 0;
> -	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
> +	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
>  	ppgtt->debug_dump = gen6_dump_ppgtt;
>  
>  	ppgtt->pd_offset =
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8f76990..d9bc375 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -187,6 +187,20 @@ struct i915_vma {
>  			 u32 flags);
>  };
>  
> +struct i915_page_table_entry {
> +	struct page *page;
> +};
> +
> +struct i915_page_directory_entry {
> +	struct page *page; /* NULL for GEN6-GEN7 */
> +	struct i915_page_table_entry *page_tables;
> +};
> +
> +struct i915_page_directory_pointer_entry {
> +	/* struct page *page; */
> +	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
> +};
> +
>  struct i915_address_space {
>  	struct drm_mm mm;
>  	struct drm_device *dev;
> @@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_entries;
>  	unsigned num_pd_pages; /* gen8+ */
>  	union {
> -		struct page **pt_pages;
> -		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
> -	};
> -	struct page *pd_pages;
> -	union {
>  		uint32_t pd_offset;
>  		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
>  	};
> @@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
>  		dma_addr_t *pt_dma_addr;
>  		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
>  	};
> +	union {
> +		struct i915_page_directory_pointer_entry pdp;
> +		struct i915_page_directory_entry pd;
> +	};
>  
>  	struct drm_i915_file_private *file_priv;
>  
> -- 
> 2.1.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v5 02/32] drm/i915: Complete page table structures
  2015-02-23 15:44   ` [PATCH v5 02/32] drm/i915: Complete page table structures Michel Thierry
@ 2015-02-24 13:10     ` Mika Kuoppala
  0 siblings, 0 replies; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-24 13:10 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> Move the remaining members over to the new page table structures.
>
> This can be squashed with the previous commit if desire. The reasoning
> is the same as that patch. I simply felt it is easier to review if split.
>
> v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
> v3: Rebase.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2, v3)
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 85 +++++++++++++------------------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++---
>  drivers/gpu/drm/i915/intel_lrc.c    | 16 +++----
>  4 files changed, 44 insertions(+), 73 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 63be374..4d07030 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2185,7 +2185,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>  		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
>  
>  		seq_puts(m, "aliasing PPGTT:\n");
> -		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
> +		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
>  
>  		ppgtt->debug_dump(ppgtt, m);
>  	}
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 10026d3..eb0714c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -311,7 +311,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>  	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
>  
>  	for (i = used_pd - 1; i >= 0; i--) {
> -		dma_addr_t addr = ppgtt->pd_dma_addr[i];
> +		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
>  		ret = gen8_write_pdp(ring, i, addr);
>  		if (ret)
>  			return ret;
> @@ -437,7 +437,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>  		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
> -		kfree(ppgtt->gen8_pt_dma_addr[i]);
>  	}
>  }
>  
> @@ -449,14 +448,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>  		/* TODO: In the future we'll support sparse mappings, so this
>  		 * will have to change. */
> -		if (!ppgtt->pd_dma_addr[i])
> +		if (!ppgtt->pdp.page_directory[i].daddr)
>  			continue;
>  
> -		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
> +		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
>  			       PCI_DMA_BIDIRECTIONAL);
>  
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
> +			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
>  			if (addr)
>  				pci_unmap_page(hwdev, addr, PAGE_SIZE,
>  					       PCI_DMA_BIDIRECTIONAL);
> @@ -473,32 +472,19 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> -static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> -{
> -	int i;
> -
> -	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> -						     sizeof(dma_addr_t),
> -						     GFP_KERNEL);
> -		if (!ppgtt->gen8_pt_dma_addr[i])
> -			return -ENOMEM;
> -	}
> -
> -	return 0;
> -}
> -
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i, j;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_tables[j];
> +			struct i915_page_table_entry *pt = &pd->page_tables[j];
>  
>  			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>  			if (!pt->page)
>  				goto unwind_out;
> +

This hunk should have been in the previous patch, oh well..

>  		}
>  	}
>  
> @@ -561,10 +547,6 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
>  
>  	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
>  
> -	ret = gen8_ppgtt_allocate_dma(ppgtt);
> -	if (ret)
> -		goto err_out;
> -
>  	return 0;
>  
>  err_out:
> @@ -586,7 +568,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,

Here we are again setting only one page directory. But as it is not a
problem with this patch:

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

>  	if (ret)
>  		return ret;
>  
> -	ppgtt->pd_dma_addr[pd] = pd_addr;
> +	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
>  
>  	return 0;
>  }
> @@ -596,17 +578,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					const int pt)
>  {
>  	dma_addr_t pt_addr;
> -	struct page *p;
> +	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
> +	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
> +	struct page *p = ptab->page;
>  	int ret;
>  
> -	p = ppgtt->pdp.page_directory[pd].page_tables[pt].page;
>  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
>  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
>  	if (ret)
>  		return ret;
>  
> -	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
> +	ptab->daddr = pt_addr;
>  
>  	return 0;
>  }
> @@ -662,7 +645,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  		gen8_ppgtt_pde_t *pd_vaddr;
>  		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
> +			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
>  			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
>  						      I915_CACHE_LLC);
>  		}
> @@ -705,14 +688,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
>  
>  	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
> -		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
> +		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
>  
>  	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
> -		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
> +		   ppgtt->pd.pd_offset,
> +		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
>  	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
>  		u32 expected;
>  		gen6_gtt_pte_t *pt_vaddr;
> -		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
> +		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
>  		pd_entry = readl(pd_addr + pde);
>  		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>  
> @@ -756,13 +740,13 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>  	uint32_t pd_entry;
>  	int i;
>  
> -	WARN_ON(ppgtt->pd_offset & 0x3f);
> +	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
>  	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
> -		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
> +		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
>  	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>  		dma_addr_t pt_addr;
>  
> -		pt_addr = ppgtt->pt_dma_addr[i];
> +		pt_addr = ppgtt->pd.page_tables[i].daddr;
>  		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
>  		pd_entry |= GEN6_PDE_VALID;
>  
> @@ -773,9 +757,9 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>  
>  static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
>  {
> -	BUG_ON(ppgtt->pd_offset & 0x3f);
> +	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
>  
> -	return (ppgtt->pd_offset / 64) << 16;
> +	return (ppgtt->pd.pd_offset / 64) << 16;
>  }
>  
>  static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
> @@ -988,19 +972,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
>  
> -	if (ppgtt->pt_dma_addr) {
> -		for (i = 0; i < ppgtt->num_pd_entries; i++)
> -			pci_unmap_page(ppgtt->base.dev->pdev,
> -				       ppgtt->pt_dma_addr[i],
> -				       4096, PCI_DMA_BIDIRECTIONAL);
> -	}
> +	for (i = 0; i < ppgtt->num_pd_entries; i++)
> +		pci_unmap_page(ppgtt->base.dev->pdev,
> +			       ppgtt->pd.page_tables[i].daddr,
> +			       4096, PCI_DMA_BIDIRECTIONAL);
>  }
>  
>  static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
>  
> -	kfree(ppgtt->pt_dma_addr);
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
>  		__free_page(ppgtt->pd.page_tables[i].page);
>  	kfree(ppgtt->pd.page_tables);
> @@ -1093,14 +1074,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  		return ret;
>  	}
>  
> -	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
> -				     GFP_KERNEL);
> -	if (!ppgtt->pt_dma_addr) {
> -		drm_mm_remove_node(&ppgtt->node);
> -		gen6_ppgtt_free(ppgtt);
> -		return -ENOMEM;
> -	}
> -
>  	return 0;
>  }
>  
> @@ -1122,7 +1095,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  			return -EIO;
>  		}
>  
> -		ppgtt->pt_dma_addr[i] = pt_addr;
> +		ppgtt->pd.page_tables[i].daddr = pt_addr;
>  	}
>  
>  	return 0;
> @@ -1164,7 +1137,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
>  	ppgtt->debug_dump = gen6_dump_ppgtt;
>  
> -	ppgtt->pd_offset =
> +	ppgtt->pd.pd_offset =
>  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
>  
>  	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
> @@ -1175,7 +1148,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  
>  	gen6_write_pdes(ppgtt);
>  	DRM_DEBUG("Adding PPGTT at offset %x\n",
> -		  ppgtt->pd_offset << 10);
> +		  ppgtt->pd.pd_offset << 10);
>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d9bc375..6efeb18 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -189,10 +189,16 @@ struct i915_vma {
>  
>  struct i915_page_table_entry {
>  	struct page *page;
> +	dma_addr_t daddr;
>  };
>  
>  struct i915_page_directory_entry {
>  	struct page *page; /* NULL for GEN6-GEN7 */
> +	union {
> +		uint32_t pd_offset;
> +		dma_addr_t daddr;
> +	};
> +
>  	struct i915_page_table_entry *page_tables;
>  };
>  
> @@ -286,14 +292,6 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_entries;
>  	unsigned num_pd_pages; /* gen8+ */
>  	union {
> -		uint32_t pd_offset;
> -		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
> -	};
> -	union {
> -		dma_addr_t *pt_dma_addr;
> -		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
> -	};
> -	union {
>  		struct i915_page_directory_pointer_entry pdp;
>  		struct i915_page_directory_entry pd;
>  	};
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 1c65949..9e71992 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>  	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
>  	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
>  	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> -	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
> -	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
> -	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
> -	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
> -	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
> -	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
> -	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
> -	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
> +	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
> +	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
> +	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
> +	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
> +	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
> +	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
> +	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
> +	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
>  	if (ring->id == RCS) {
>  		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>  		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v5 03/32] drm/i915: Create page table allocators
  2015-02-23 15:44   ` [PATCH v5 03/32] drm/i915: Create page table allocators Michel Thierry
@ 2015-02-24 13:56     ` Mika Kuoppala
  2015-02-24 15:18       ` Michel Thierry
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-24 13:56 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> As we move toward dynamic page table allocation, it becomes much easier
> to manage our data structures if we do things less coarsely by
> breaking up all of our actions into individual tasks.  This makes the
> code easier to write, read, and verify.
>
> Aside from the dissection of the allocation functions, the patch
> statically allocates the page table structures without a page directory.
> This remains the same for all platforms.
>
> The patch itself should not have much functional difference. The primary
> noticeable difference is the fact that page tables are no longer
> allocated, but rather statically declared as part of the page directory.
> This has non-zero overhead, but things gain non-trivial complexity as a
> result.
>

I don't quite understand the last sentence here. We gain overhead and
complexity.

s/non-trivial/trivial?

> This patch exists for a few reasons:
> 1. Splitting out the functions allows easily combining GEN6 and GEN8
> code. Page tables have no difference based on GEN8. As we'll see in a
> future patch when we add the DMA mappings to the allocations, it
> requires only one small change to make work, and error handling should
> just fall into place.
>
> 2. Unless we always want to allocate all page tables under a given PDE,
> we'll have to eventually break this up into an array of pointers (or
> pointer to pointer).
>
> 3. Having the discrete functions is easier to review, and understand.
> All allocations and frees now take place in just a couple of locations.
> Reviewing, and catching leaks should be easy.
>
> 4. Less important: the GFP flags are confined to one location, which
> makes playing around with such things trivial.
>
> v2: Updated commit message to explain why this patch exists
>
> v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
>
> v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
>
> v5: Added additional safety checks in gen8 clear/free/unmap.
>
> v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 252 ++++++++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
>  drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
>  3 files changed, 178 insertions(+), 94 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index eb0714c..65c77e5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -279,6 +279,98 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> +static void unmap_and_free_pt(struct i915_page_table_entry *pt)
> +{
> +	if (WARN_ON(!pt->page))
> +		return;
> +	__free_page(pt->page);
> +	kfree(pt);
> +}
> +
> +static struct i915_page_table_entry *alloc_pt_single(void)
> +{
> +	struct i915_page_table_entry *pt;
> +
> +	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
> +	if (!pt)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!pt->page) {
> +		kfree(pt);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	return pt;
> +}
> +
> +/**
> + * alloc_pt_range() - Allocate multiple page tables
> + * @pd:		The page directory which will have at least @count entries
> + *		available to point to the allocated page tables.
> + * @pde:	First page directory entry for which we are allocating.
> + * @count:	Number of pages to allocate.
> + *
> + * Allocates multiple page table pages and sets the appropriate entries in the
> + * page table structure within the page directory. Function cleans up after
> + * itself on any failures.
> + *
> + * Return: 0 if allocation succeeded.
> + */
> +static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
> +{
> +	int i, ret;
> +
> +	/* 512 is the max page tables per page_directory on any platform. */
> +	if (WARN_ON(pde + count > GEN6_PPGTT_PD_ENTRIES))
> +		return -EINVAL;
> +
> +	for (i = pde; i < pde + count; i++) {
> +		struct i915_page_table_entry *pt = alloc_pt_single();
> +
> +		if (IS_ERR(pt)) {
> +			ret = PTR_ERR(pt);
> +			goto err_out;
> +		}
> +		WARN(pd->page_tables[i],
> +		     "Leaking page directory entry %d (%pa)\n",
> +		     i, pd->page_tables[i]);
> +		pd->page_tables[i] = pt;
> +	}
> +
> +	return 0;
> +
> +err_out:
> +	while (i--)
> +		unmap_and_free_pt(pd->page_tables[i]);

This is suspicious as it is not symmetrical with how we allocate. If the
plan is to free everything below pde, that should be mentioned in the
comment above.

On this patch we call this with pde == 0, but I suspect later in the
series there will be other usecases for this.

> +	return ret;
> +}
> +
> +static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
> +{
> +	if (pd->page) {
> +		__free_page(pd->page);
> +		kfree(pd);
> +	}
> +}
> +
> +static struct i915_page_directory_entry *alloc_pd_single(void)
> +{
> +	struct i915_page_directory_entry *pd;
> +
> +	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
> +	if (!pd)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!pd->page) {
> +		kfree(pd);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	return pd;
> +}
> +
>  /* Broadwell Page Directory Pointer Descriptors */
>  static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
>  			   uint64_t val)
> @@ -311,7 +403,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>  	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
>  
>  	for (i = used_pd - 1; i >= 0; i--) {
> -		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
> +		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
>  		ret = gen8_write_pdp(ring, i, addr);
>  		if (ret)
>  			return ret;
> @@ -338,8 +430,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  				      I915_CACHE_LLC, use_scratch);
>  
>  	while (num_entries) {
> -		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> -		struct page *page_table = pd->page_tables[pde].page;
> +		struct i915_page_directory_entry *pd;
> +		struct i915_page_table_entry *pt;
> +		struct page *page_table;
> +
> +		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
> +			continue;
> +
> +		pd = ppgtt->pdp.page_directory[pdpe];
> +
> +		if (WARN_ON(!pd->page_tables[pde]))
> +			continue;
> +
> +		pt = pd->page_tables[pde];
> +
> +		if (WARN_ON(!pt->page))
> +			continue;
> +
> +		page_table = pt->page;
>  
>  		last_pte = pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
> @@ -384,8 +492,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  			break;
>  
>  		if (pt_vaddr == NULL) {
> -			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> -			struct page *page_table = pd->page_tables[pde].page;
> +			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
> +			struct i915_page_table_entry *pt = pd->page_tables[pde];
> +			struct page *page_table = pt->page;
>  
>  			pt_vaddr = kmap_atomic(page_table);
>  		}
> @@ -416,19 +525,16 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>  {
>  	int i;
>  
> -	if (pd->page_tables == NULL)
> +	if (!pd->page)
>  		return;
>  
> -	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> -		if (pd->page_tables[i].page)
> -			__free_page(pd->page_tables[i].page);
> -}
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> +		if (WARN_ON(!pd->page_tables[i]))
> +			continue;
>  
> -static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
> -{
> -	gen8_free_page_tables(pd);
> -	kfree(pd->page_tables);
> -	__free_page(pd->page);
> +		unmap_and_free_pt(pd->page_tables[i]);
> +		pd->page_tables[i] = NULL;
> +	}
>  }
>  
>  static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> @@ -436,7 +542,11 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
> +		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> +			continue;
> +
> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
> +		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>  	}
>  }
>  
> @@ -448,14 +558,23 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>  		/* TODO: In the future we'll support sparse mappings, so this
>  		 * will have to change. */
> -		if (!ppgtt->pdp.page_directory[i].daddr)
> +		if (!ppgtt->pdp.page_directory[i]->daddr)
>  			continue;
>  
> -		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
> +		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
>  			       PCI_DMA_BIDIRECTIONAL);
>  
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
> +			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
> +			struct i915_page_table_entry *pt;
> +			dma_addr_t addr;
> +
> +			if (WARN_ON(!pd->page_tables[j]))
> +				continue;
> +
> +			pt = pd->page_tables[j];
> +			addr = pt->daddr;
> +
>  			if (addr)
>  				pci_unmap_page(hwdev, addr, PAGE_SIZE,
>  					       PCI_DMA_BIDIRECTIONAL);
> @@ -474,25 +593,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  {
> -	int i, j;
> +	int i, ret;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
> -		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			struct i915_page_table_entry *pt = &pd->page_tables[j];
> -
> -			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> -			if (!pt->page)
> -				goto unwind_out;
> -
> -		}
> +		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
> +				     0, GEN8_PDES_PER_PAGE);
> +		if (ret)
> +			goto unwind_out;
>  	}
>  
>  	return 0;
>  
>  unwind_out:
>  	while (i--)
> -		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
>  
>  	return -ENOMEM;
>  }
> @@ -503,19 +617,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	int i;
>  
>  	for (i = 0; i < max_pdp; i++) {
> -		struct i915_page_table_entry *pt;
> -
> -		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
> -		if (!pt)
> +		ppgtt->pdp.page_directory[i] = alloc_pd_single();
> +		if (IS_ERR(ppgtt->pdp.page_directory[i]))
>  			goto unwind_out;
> -
> -		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
> -		if (!ppgtt->pdp.page_directory[i].page) {
> -			kfree(pt);
> -			goto unwind_out;
> -		}
> -
> -		ppgtt->pdp.page_directory[i].page_tables = pt;
>  	}
>  
>  	ppgtt->num_pd_pages = max_pdp;
> @@ -524,10 +628,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	return 0;
>  
>  unwind_out:
> -	while (i--) {
> -		kfree(ppgtt->pdp.page_directory[i].page_tables);
> -		__free_page(ppgtt->pdp.page_directory[i].page);
> -	}
> +	while (i--)
> +		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>  
>  	return -ENOMEM;
>  }
> @@ -561,14 +663,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	int ret;
>  
>  	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> -			       ppgtt->pdp.page_directory[pd].page, 0,
> +			       ppgtt->pdp.page_directory[pd]->page, 0,
>  			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
>  	if (ret)
>  		return ret;
>  
> -	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
> +	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
>  
>  	return 0;
>  }
> @@ -578,8 +680,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					const int pt)
>  {
>  	dma_addr_t pt_addr;
> -	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
> -	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
> +	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
> +	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
>  	struct page *p = ptab->page;
>  	int ret;
>  
> @@ -642,10 +744,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	 * will never need to touch the PDEs again.
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
> +		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
>  		gen8_ppgtt_pde_t *pd_vaddr;
> -		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
> +		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
> +			struct i915_page_table_entry *pt = pd->page_tables[j];
> +			dma_addr_t addr = pt->daddr;
>  			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
>  						      I915_CACHE_LLC);
>  		}
> @@ -696,7 +800,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
>  		u32 expected;
>  		gen6_gtt_pte_t *pt_vaddr;
> -		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
> +		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
>  		pd_entry = readl(pd_addr + pde);
>  		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>  
> @@ -707,7 +811,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  				   expected);
>  		seq_printf(m, "\tPDE: %x\n", pd_entry);
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
>  		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
>  			unsigned long va =
>  				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
> @@ -746,7 +850,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>  	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>  		dma_addr_t pt_addr;
>  
> -		pt_addr = ppgtt->pd.page_tables[i].daddr;
> +		pt_addr = ppgtt->pd.page_tables[i]->daddr;
>  		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
>  		pd_entry |= GEN6_PDE_VALID;
>  
> @@ -922,7 +1026,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>  		if (last_pte > I915_PPGTT_PT_ENTRIES)
>  			last_pte = I915_PPGTT_PT_ENTRIES;
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
>  
>  		for (i = first_pte; i < last_pte; i++)
>  			pt_vaddr[i] = scratch_pte;
> @@ -951,7 +1055,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  	pt_vaddr = NULL;
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
> +			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
>  
>  		pt_vaddr[act_pte] =
>  			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -974,7 +1078,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
>  		pci_unmap_page(ppgtt->base.dev->pdev,
> -			       ppgtt->pd.page_tables[i].daddr,
> +			       ppgtt->pd.page_tables[i]->daddr,
>  			       4096, PCI_DMA_BIDIRECTIONAL);
>  }
>  
> @@ -983,8 +1087,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		__free_page(ppgtt->pd.page_tables[i].page);
> -	kfree(ppgtt->pd.page_tables);
> +		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
> +
> +	unmap_and_free_pd(&ppgtt->pd);
>  }
>  
>  static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
> @@ -1039,27 +1144,6 @@ alloc:
>  	return 0;
>  }
>  
> -static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
> -{
> -	struct i915_page_table_entry *pt;
> -	int i;
> -
> -	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
> -	if (!pt)
> -		return -ENOMEM;
> -
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		pt[i].page = alloc_page(GFP_KERNEL);
> -		if (!pt->page) {
> -			gen6_ppgtt_free(ppgtt);
> -			return -ENOMEM;
> -		}
> -	}
> -
> -	ppgtt->pd.page_tables = pt;
> -	return 0;
> -}
> -
>  static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int ret;
> @@ -1068,7 +1152,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> -	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
> +	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
>  	if (ret) {
>  		drm_mm_remove_node(&ppgtt->node);
>  		return ret;
> @@ -1086,7 +1170,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  		struct page *page;
>  		dma_addr_t pt_addr;
>  
> -		page = ppgtt->pd.page_tables[i].page;
> +		page = ppgtt->pd.page_tables[i]->page;
>  		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>  				       PCI_DMA_BIDIRECTIONAL);
>  
> @@ -1095,7 +1179,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  			return -EIO;
>  		}
>  
> -		ppgtt->pd.page_tables[i].daddr = pt_addr;
> +		ppgtt->pd.page_tables[i]->daddr = pt_addr;
>  	}
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 6efeb18..e8cad72 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -199,12 +199,12 @@ struct i915_page_directory_entry {
>  		dma_addr_t daddr;
>  	};
>  
> -	struct i915_page_table_entry *page_tables;
> +	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */

Would you consider changing the plural here in 'tables' so that we would
lose the discrepancy against the page_directory below?

-Mika

>  };
>  
>  struct i915_page_directory_pointer_entry {
>  	/* struct page *page; */
> -	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
> +	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
>  };
>  
>  struct i915_address_space {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9e71992..bc9c7c3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>  	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
>  	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
>  	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> -	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
> -	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
> -	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
> -	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
> -	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
> -	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
> -	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
> -	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
> +	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
> +	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
> +	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
> +	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
> +	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
> +	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
> +	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
> +	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
>  	if (ring->id == RCS) {
>  		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>  		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> -- 
> 2.1.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v5 03/32] drm/i915: Create page table allocators
  2015-02-24 13:56     ` Mika Kuoppala
@ 2015-02-24 15:18       ` Michel Thierry
  0 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 15:18 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx



On 2/24/2015 1:56 PM, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
>
>> From: Ben Widawsky <benjamin.widawsky@intel.com>
>>
>> As we move toward dynamic page table allocation, it becomes much easier
>> to manage our data structures if we do things less coarsely by
>> breaking up all of our actions into individual tasks.  This makes the
>> code easier to write, read, and verify.
>>
>> Aside from the dissection of the allocation functions, the patch
>> statically allocates the page table structures without a page directory.
>> This remains the same for all platforms.
>>
>> The patch itself should not have much functional difference. The primary
>> noticeable difference is the fact that page tables are no longer
>> allocated, but rather statically declared as part of the page directory.
>> This has non-zero overhead, but things gain non-trivial complexity as a
>> result.
>>
> I don't quite understand the last sentence here. We gain overhead and
> complexity.
>
> s/non-trivial/trivial?
I'll rephrase this.
>> This patch exists for a few reasons:
>> 1. Splitting out the functions allows easily combining GEN6 and GEN8
>> code. Page tables have no difference based on GEN8. As we'll see in a
>> future patch when we add the DMA mappings to the allocations, it
>> requires only one small change to make work, and error handling should
>> just fall into place.
>>
>> 2. Unless we always want to allocate all page tables under a given PDE,
>> we'll have to eventually break this up into an array of pointers (or
>> pointer to pointer).
>>
>> 3. Having the discrete functions is easier to review, and understand.
>> All allocations and frees now take place in just a couple of locations.
>> Reviewing, and catching leaks should be easy.
>>
>> 4. Less important: the GFP flags are confined to one location, which
>> makes playing around with such things trivial.
>>
>> v2: Updated commit message to explain why this patch exists
>>
>> v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
>>
>> v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
>>
>> v5: Added additional safety checks in gen8 clear/free/unmap.
>>
>> v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
>>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
>> ---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c | 252 ++++++++++++++++++++++++------------
>>   drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
>>   drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
>>   3 files changed, 178 insertions(+), 94 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index eb0714c..65c77e5 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -279,6 +279,98 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>>   	return pte;
>>   }
>>   
>> +static void unmap_and_free_pt(struct i915_page_table_entry *pt)
>> +{
>> +	if (WARN_ON(!pt->page))
>> +		return;
>> +	__free_page(pt->page);
>> +	kfree(pt);
>> +}
>> +
>> +static struct i915_page_table_entry *alloc_pt_single(void)
>> +{
>> +	struct i915_page_table_entry *pt;
>> +
>> +	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
>> +	if (!pt)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +	if (!pt->page) {
>> +		kfree(pt);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	return pt;
>> +}
>> +
>> +/**
>> + * alloc_pt_range() - Allocate multiple page tables
>> + * @pd:		The page directory which will have at least @count entries
>> + *		available to point to the allocated page tables.
>> + * @pde:	First page directory entry for which we are allocating.
>> + * @count:	Number of pages to allocate.
>> + *
>> + * Allocates multiple page table pages and sets the appropriate entries in the
>> + * page table structure within the page directory. Function cleans up after
>> + * itself on any failures.
>> + *
>> + * Return: 0 if allocation succeeded.
>> + */
>> +static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
>> +{
>> +	int i, ret;
>> +
>> +	/* 512 is the max page tables per page_directory on any platform. */
>> +	if (WARN_ON(pde + count > GEN6_PPGTT_PD_ENTRIES))
>> +		return -EINVAL;
>> +
>> +	for (i = pde; i < pde + count; i++) {
>> +		struct i915_page_table_entry *pt = alloc_pt_single();
>> +
>> +		if (IS_ERR(pt)) {
>> +			ret = PTR_ERR(pt);
>> +			goto err_out;
>> +		}
>> +		WARN(pd->page_tables[i],
>> +		     "Leaking page directory entry %d (%pa)\n",
>> +		     i, pd->page_tables[i]);
>> +		pd->page_tables[i] = pt;
>> +	}
>> +
>> +	return 0;
>> +
>> +err_out:
>> +	while (i--)
>> +		unmap_and_free_pt(pd->page_tables[i]);
> This is suspicious as it is non symmetrical of how we allocate. If the
> plan is to free everything below pde, that should be mentioned in the
> comment above.
>
> On this patch we call this with pde == 0, but I suspect later in the
> series there will be other usecases for this.
Actually it is the other way around; later on this will be used only by
aliasing ppgtt; others will iterate through a macro and call
alloc_pt_single ("page table allocation rework" patch).

Anyway, it makes it clearer to have something like:
  err_out:
-    while (i--)
+    while (i-- > pde)
          unmap_and_free_pt(pd->page_tables[i]);
      return ret;

>> +	return ret;
>> +}
>> +
>> +static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
>> +{
>> +	if (pd->page) {
>> +		__free_page(pd->page);
>> +		kfree(pd);
>> +	}
>> +}
>> +
>> +static struct i915_page_directory_entry *alloc_pd_single(void)
>> +{
>> +	struct i915_page_directory_entry *pd;
>> +
>> +	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
>> +	if (!pd)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +	if (!pd->page) {
>> +		kfree(pd);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	return pd;
>> +}
>> +
>>   /* Broadwell Page Directory Pointer Descriptors */
>>   static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
>>   			   uint64_t val)
>> @@ -311,7 +403,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>>   	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
>>   
>>   	for (i = used_pd - 1; i >= 0; i--) {
>> -		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
>> +		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
>>   		ret = gen8_write_pdp(ring, i, addr);
>>   		if (ret)
>>   			return ret;
>> @@ -338,8 +430,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>>   				      I915_CACHE_LLC, use_scratch);
>>   
>>   	while (num_entries) {
>> -		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
>> -		struct page *page_table = pd->page_tables[pde].page;
>> +		struct i915_page_directory_entry *pd;
>> +		struct i915_page_table_entry *pt;
>> +		struct page *page_table;
>> +
>> +		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
>> +			continue;
>> +
>> +		pd = ppgtt->pdp.page_directory[pdpe];
>> +
>> +		if (WARN_ON(!pd->page_tables[pde]))
>> +			continue;
>> +
>> +		pt = pd->page_tables[pde];
>> +
>> +		if (WARN_ON(!pt->page))
>> +			continue;
>> +
>> +		page_table = pt->page;
>>   
>>   		last_pte = pte + num_entries;
>>   		if (last_pte > GEN8_PTES_PER_PAGE)
>> @@ -384,8 +492,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>>   			break;
>>   
>>   		if (pt_vaddr == NULL) {
>> -			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
>> -			struct page *page_table = pd->page_tables[pde].page;
>> +			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
>> +			struct i915_page_table_entry *pt = pd->page_tables[pde];
>> +			struct page *page_table = pt->page;
>>   
>>   			pt_vaddr = kmap_atomic(page_table);
>>   		}
>> @@ -416,19 +525,16 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>>   {
>>   	int i;
>>   
>> -	if (pd->page_tables == NULL)
>> +	if (!pd->page)
>>   		return;
>>   
>> -	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
>> -		if (pd->page_tables[i].page)
>> -			__free_page(pd->page_tables[i].page);
>> -}
>> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
>> +		if (WARN_ON(!pd->page_tables[i]))
>> +			continue;
>>   
>> -static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
>> -{
>> -	gen8_free_page_tables(pd);
>> -	kfree(pd->page_tables);
>> -	__free_page(pd->page);
>> +		unmap_and_free_pt(pd->page_tables[i]);
>> +		pd->page_tables[i] = NULL;
>> +	}
>>   }
>>   
>>   static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>> @@ -436,7 +542,11 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>   	int i;
>>   
>>   	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>> -		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
>> +		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
>> +			continue;
>> +
>> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
>> +		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>>   	}
>>   }
>>   
>> @@ -448,14 +558,23 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>>   	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>>   		/* TODO: In the future we'll support sparse mappings, so this
>>   		 * will have to change. */
>> -		if (!ppgtt->pdp.page_directory[i].daddr)
>> +		if (!ppgtt->pdp.page_directory[i]->daddr)
>>   			continue;
>>   
>> -		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
>> +		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
>>   			       PCI_DMA_BIDIRECTIONAL);
>>   
>>   		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>> -			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
>> +			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
>> +			struct i915_page_table_entry *pt;
>> +			dma_addr_t addr;
>> +
>> +			if (WARN_ON(!pd->page_tables[j]))
>> +				continue;
>> +
>> +			pt = pd->page_tables[j];
>> +			addr = pt->daddr;
>> +
>>   			if (addr)
>>   				pci_unmap_page(hwdev, addr, PAGE_SIZE,
>>   					       PCI_DMA_BIDIRECTIONAL);
>> @@ -474,25 +593,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>>   
>>   static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>>   {
>> -	int i, j;
>> +	int i, ret;
>>   
>>   	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>> -		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
>> -		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>> -			struct i915_page_table_entry *pt = &pd->page_tables[j];
>> -
>> -			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> -			if (!pt->page)
>> -				goto unwind_out;
>> -
>> -		}
>> +		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
>> +				     0, GEN8_PDES_PER_PAGE);
>> +		if (ret)
>> +			goto unwind_out;
>>   	}
>>   
>>   	return 0;
>>   
>>   unwind_out:
>>   	while (i--)
>> -		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
>> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
>>   
>>   	return -ENOMEM;
>>   }
>> @@ -503,19 +617,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>>   	int i;
>>   
>>   	for (i = 0; i < max_pdp; i++) {
>> -		struct i915_page_table_entry *pt;
>> -
>> -		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
>> -		if (!pt)
>> +		ppgtt->pdp.page_directory[i] = alloc_pd_single();
>> +		if (IS_ERR(ppgtt->pdp.page_directory[i]))
>>   			goto unwind_out;
>> -
>> -		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
>> -		if (!ppgtt->pdp.page_directory[i].page) {
>> -			kfree(pt);
>> -			goto unwind_out;
>> -		}
>> -
>> -		ppgtt->pdp.page_directory[i].page_tables = pt;
>>   	}
>>   
>>   	ppgtt->num_pd_pages = max_pdp;
>> @@ -524,10 +628,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>>   	return 0;
>>   
>>   unwind_out:
>> -	while (i--) {
>> -		kfree(ppgtt->pdp.page_directory[i].page_tables);
>> -		__free_page(ppgtt->pdp.page_directory[i].page);
>> -	}
>> +	while (i--)
>> +		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>>   
>>   	return -ENOMEM;
>>   }
>> @@ -561,14 +663,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
>>   	int ret;
>>   
>>   	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
>> -			       ppgtt->pdp.page_directory[pd].page, 0,
>> +			       ppgtt->pdp.page_directory[pd]->page, 0,
>>   			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>>   
>>   	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
>>   	if (ret)
>>   		return ret;
>>   
>> -	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
>> +	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
>>   
>>   	return 0;
>>   }
>> @@ -578,8 +680,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>>   					const int pt)
>>   {
>>   	dma_addr_t pt_addr;
>> -	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
>> -	struct i915_page_table_entry *ptab = &pdir->page_tables[pt];
>> +	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
>> +	struct i915_page_table_entry *ptab = pdir->page_tables[pt];
>>   	struct page *p = ptab->page;
>>   	int ret;
>>   
>> @@ -642,10 +744,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>>   	 * will never need to touch the PDEs again.
>>   	 */
>>   	for (i = 0; i < max_pdp; i++) {
>> +		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
>>   		gen8_ppgtt_pde_t *pd_vaddr;
>> -		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
>> +		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
>>   		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>> -			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_tables[j].daddr;
>> +			struct i915_page_table_entry *pt = pd->page_tables[j];
>> +			dma_addr_t addr = pt->daddr;
>>   			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
>>   						      I915_CACHE_LLC);
>>   		}
>> @@ -696,7 +800,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>   	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
>>   		u32 expected;
>>   		gen6_gtt_pte_t *pt_vaddr;
>> -		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
>> +		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
>>   		pd_entry = readl(pd_addr + pde);
>>   		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>>   
>> @@ -707,7 +811,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>   				   expected);
>>   		seq_printf(m, "\tPDE: %x\n", pd_entry);
>>   
>> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
>> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
>>   		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
>>   			unsigned long va =
>>   				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
>> @@ -746,7 +850,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>>   	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>>   		dma_addr_t pt_addr;
>>   
>> -		pt_addr = ppgtt->pd.page_tables[i].daddr;
>> +		pt_addr = ppgtt->pd.page_tables[i]->daddr;
>>   		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
>>   		pd_entry |= GEN6_PDE_VALID;
>>   
>> @@ -922,7 +1026,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>>   		if (last_pte > I915_PPGTT_PT_ENTRIES)
>>   			last_pte = I915_PPGTT_PT_ENTRIES;
>>   
>> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
>> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
>>   
>>   		for (i = first_pte; i < last_pte; i++)
>>   			pt_vaddr[i] = scratch_pte;
>> @@ -951,7 +1055,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>>   	pt_vaddr = NULL;
>>   	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>>   		if (pt_vaddr == NULL)
>> -			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt].page);
>> +			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[act_pt]->page);
>>   
>>   		pt_vaddr[act_pte] =
>>   			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
>> @@ -974,7 +1078,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>>   
>>   	for (i = 0; i < ppgtt->num_pd_entries; i++)
>>   		pci_unmap_page(ppgtt->base.dev->pdev,
>> -			       ppgtt->pd.page_tables[i].daddr,
>> +			       ppgtt->pd.page_tables[i]->daddr,
>>   			       4096, PCI_DMA_BIDIRECTIONAL);
>>   }
>>   
>> @@ -983,8 +1087,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>   	int i;
>>   
>>   	for (i = 0; i < ppgtt->num_pd_entries; i++)
>> -		__free_page(ppgtt->pd.page_tables[i].page);
>> -	kfree(ppgtt->pd.page_tables);
>> +		unmap_and_free_pt(ppgtt->pd.page_tables[i]);
>> +
>> +	unmap_and_free_pd(&ppgtt->pd);
>>   }
>>   
>>   static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>> @@ -1039,27 +1144,6 @@ alloc:
>>   	return 0;
>>   }
>>   
>> -static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>> -{
>> -	struct i915_page_table_entry *pt;
>> -	int i;
>> -
>> -	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
>> -	if (!pt)
>> -		return -ENOMEM;
>> -
>> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> -		pt[i].page = alloc_page(GFP_KERNEL);
>> -		if (!pt->page) {
>> -			gen6_ppgtt_free(ppgtt);
>> -			return -ENOMEM;
>> -		}
>> -	}
>> -
>> -	ppgtt->pd.page_tables = pt;
>> -	return 0;
>> -}
>> -
>>   static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>>   {
>>   	int ret;
>> @@ -1068,7 +1152,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>>   	if (ret)
>>   		return ret;
>>   
>> -	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
>> +	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
>>   	if (ret) {
>>   		drm_mm_remove_node(&ppgtt->node);
>>   		return ret;
>> @@ -1086,7 +1170,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>>   		struct page *page;
>>   		dma_addr_t pt_addr;
>>   
>> -		page = ppgtt->pd.page_tables[i].page;
>> +		page = ppgtt->pd.page_tables[i]->page;
>>   		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>>   				       PCI_DMA_BIDIRECTIONAL);
>>   
>> @@ -1095,7 +1179,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>>   			return -EIO;
>>   		}
>>   
>> -		ppgtt->pd.page_tables[i].daddr = pt_addr;
>> +		ppgtt->pd.page_tables[i]->daddr = pt_addr;
>>   	}
>>   
>>   	return 0;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index 6efeb18..e8cad72 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -199,12 +199,12 @@ struct i915_page_directory_entry {
>>   		dma_addr_t daddr;
>>   	};
>>   
>> -	struct i915_page_table_entry *page_tables;
>> +	struct i915_page_table_entry *page_tables[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
> Would you consider changing the plural here in 'tables' so that we would
> lose the discrepancy against the page_directory below?
Ok, I'll rename them, but it's cleaner to make this change in the patch 
that added it ("drm/i915: page table abstractions").

-Michel
> -Mika
>
>>   };
>>   
>>   struct i915_page_directory_pointer_entry {
>>   	/* struct page *page; */
>> -	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
>> +	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
>>   };
>>   
>>   struct i915_address_space {
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 9e71992..bc9c7c3 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>>   	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
>>   	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
>>   	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
>> -	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
>> -	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
>> -	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
>> -	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
>> -	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
>> -	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
>> -	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
>> -	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
>> +	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
>> +	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
>> +	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
>> +	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
>> +	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
>> +	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
>> +	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
>> +	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
>>   	if (ring->id == RCS) {
>>   		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>>   		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
>> -- 
>> 2.1.1
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx




_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing
  2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
                   ` (28 preceding siblings ...)
  2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
@ 2015-02-24 16:22 ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 01/32] drm/i915: page table abstractions Michel Thierry
                     ` (32 more replies)
  29 siblings, 33 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

This patchset addresses comments from v5 by Mika; especially some rename changes
that touched several patches.

For GEN8, it has also been extended to work in logical ring submission (lrc)
mode, as it will be the preferred mode of operation.
I also tried to update the lrc code at the same time the ppgtt refactoring
occurred, leaving only one patch that is exclusively for lrc.

I'm also now including the required patches for PPGTT with 48b addressing.
In order to expand the GPU address space, a 4th level translation is added, the
Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
each pointing to a PDP.
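
For reference, a rough sketch of the added level (struct and field names here
are only illustrative, mirroring the existing pdp structures; the exact
definitions are in the 4lvl patches of this series):

	struct i915_pml4 {
		struct page *page;
		dma_addr_t daddr;
		/* 256 PML4Es, each pointing to one PDP */
		struct i915_page_directory_pointer_entry *pdps[256];
	};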

For now, this feature will only be available in BDW and GEN9, in LRC submission
mode (execlists) and when i915.enable_ppgtt=3 is set.
Also note that this expanded address space is only available for full PPGTT;
aliasing PPGTT remains 32b.
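
(For completeness: enable_ppgtt is set like any other i915 module parameter,
e.g. "i915.enable_ppgtt=3" on the kernel command line when the driver is
built-in, or "options i915 enable_ppgtt=3" in /etc/modprobe.d/ when it is
built as a module.)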

This list can be seen in 3 parts:
[01-10] Add page table allocation for GEN6/GEN7
[11-20] Enable dynamic allocation in GEN8, for both legacy and
execlist submission modes.
[21-32] PML4 support in BDW and GEN9+.

Ben Widawsky (26):
  drm/i915: page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip and pd load logic
  drm/i915: Track page table reload need
  drm/i915: Initialize all contexts
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915/bdw: Use dynamic allocation idioms on free
  drm/i915/bdw: page directories rework allocation
  drm/i915/bdw: pagetable allocation rework
  drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  drm/i915: num_pd_pages/num_pd_entries isn't useful
  drm/i915: Extract PPGTT param from page_directory alloc
  drm/i915/bdw: Split out mappings
  drm/i915/bdw: begin bitmap tracking
  drm/i915/bdw: Dynamic page table allocations
  drm/i915/bdw: Make pdp allocation more dynamic
  drm/i915/bdw: Abstract PDP usage
  drm/i915/bdw: Add dynamic page trace events
  drm/i915/bdw: Add ppgtt info for dynamic pages
  drm/i915/bdw: implement alloc/free for 4lvl
  drm/i915/bdw: Add 4 level switching infrastructure
  drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
  drm/i915: Plumb sg_iter through va allocation ->maps
  drm/i915: Expand error state's address width to 64b

Michel Thierry (6):
  drm/i915: Plumb drm_device through page tables operations
  drm/i915: Add dynamic page trace events
  drm/i915/bdw: Support dynamic pdp updates in lrc mode
  drm/i915/bdw: Support 64 bit PPGTT in lrc mode
  drm/i915/bdw: Add 4 level support in insert_entries and clear_range
  drm/i915/bdw: Flip the 48b switch

 drivers/gpu/drm/i915/i915_debugfs.c        |   26 +-
 drivers/gpu/drm/i915/i915_drv.h            |   11 +-
 drivers/gpu/drm/i915/i915_gem.c            |   11 +
 drivers/gpu/drm/i915/i915_gem_context.c    |   64 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 +
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 1535 ++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  248 ++++-
 drivers/gpu/drm/i915/i915_gpu_error.c      |   17 +-
 drivers/gpu/drm/i915/i915_params.c         |    2 +-
 drivers/gpu/drm/i915/i915_reg.h            |    1 +
 drivers/gpu/drm/i915/i915_trace.h          |  111 ++
 drivers/gpu/drm/i915/intel_lrc.c           |  149 ++-
 12 files changed, 1787 insertions(+), 399 deletions(-)

-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH v6 01/32] drm/i915: page table abstractions
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 02/32] drm/i915: Complete page table structures Michel Thierry
                     ` (31 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we move to dynamic page allocation, keeping page_directory and page tables as
separate structures will help to break actions into simpler tasks.

To help transition the code nicely there is some wasted space in gen6/7.
This will be ameliorated shortly.

Following the x86 pagetable terminology:
PDPE = struct i915_page_directory_pointer_entry.
PDE = struct i915_page_directory_entry [page_directory].
PTE = struct i915_page_table_entry [page_tables].
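
As a quick illustration of the walk this enables (these two lines are lifted
from the clear_range hunk below), a GEN8 page table page is now reached
through the structures instead of the old gen8_pt_pages[pdpe][pde] array:

	struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
	struct page *page_table = pd->page_table[pde].page;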

v2: fixed mismatches after clean-up/rebase.

v3: Clarify the names of the multiple levels of page tables (Daniel)

v4: Addressing Mika's review comments.
s/gen8_free_page_directories/gen8_free_page_directory and free the
page tables for the directory there.
In gen8_ppgtt_allocate_page_directories, do not leak previously allocated
pt in case the page_directory alloc fails.
Update error return handling in gen8_ppgtt_alloc.

v5: Do not leak pt on error in gen6_ppgtt_allocate_page_tables. (Mika)

v6: s/page_tables/page_table/. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 180 +++++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  23 ++++-
 2 files changed, 111 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e54b2a0..874d9cc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -338,7 +338,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+		struct page *page_table = pd->page_table[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -382,8 +383,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
+			struct page *page_table = pd->page_table[pde].page;
+
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -407,29 +412,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_table == NULL)
 		return;
 
 	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_table[i].page)
+			__free_page(pd->page_table[i].page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
+{
+	gen8_free_page_tables(pd);
+	kfree(pd->page_table);
+	__free_page(pd->page);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -464,86 +473,77 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPES];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_table[j];
+
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_table_entry *pt;
 
-	return 0;
-}
+		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pdp.page_directory[i].page) {
+			kfree(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pdp.page_directory[i].page_table = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.page_directory[i].page_table);
+		__free_page(ppgtt->pdp.page_directory[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -555,18 +555,20 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
 	if (ret)
-		gen8_ppgtt_free(ppgtt);
+		goto err_out;
+
+	return 0;
 
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -577,7 +579,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       ppgtt->pdp.page_directory[pd].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -597,7 +599,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->pdp.page_directory[pd].page_table[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -658,7 +660,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -721,7 +723,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde].page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -936,7 +938,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt].page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -965,7 +967,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt].page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -1000,8 +1002,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		if (ppgtt->pd.page_table[i].page)
+			__free_page(ppgtt->pd.page_table[i].page);
+	kfree(ppgtt->pd.page_table);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1058,17 +1061,18 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_page_table_entry *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
+	ppgtt->pd.page_table = pt;
+
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL);
+		if (!pt[i].page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
@@ -1108,9 +1112,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_table[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1157,7 +1163,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8f76990..b759c41 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -187,6 +187,20 @@ struct i915_vma {
 			 u32 flags);
 };
 
+struct i915_page_table_entry {
+	struct page *page;
+};
+
+struct i915_page_directory_entry {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_page_table_entry *page_table;
+};
+
+struct i915_page_directory_pointer_entry {
+	/* struct page *page; */
+	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+};
+
 struct i915_address_space {
 	struct drm_mm mm;
 	struct drm_device *dev;
@@ -272,11 +286,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
 	};
@@ -284,6 +293,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
 	};
+	union {
+		struct i915_page_directory_pointer_entry pdp;
+		struct i915_page_directory_entry pd;
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 02/32] drm/i915: Complete page table structures
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 01/32] drm/i915: page table abstractions Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 03/32] drm/i915: Create page table allocators Michel Thierry
                     ` (30 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as that patch. I simply felt it is easier to review if split.

v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
v3: Rebase.
v4: Rebased after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 85 +++++++++++++------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++---
 drivers/gpu/drm/i915/intel_lrc.c    | 16 +++----
 4 files changed, 44 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 63be374..4d07030 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2185,7 +2185,7 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 874d9cc..ab6f1d4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -311,7 +311,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pd_dma_addr[i];
+		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -437,7 +437,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
 }
 
@@ -449,14 +448,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pd_dma_addr[i])
+		if (!ppgtt->pdp.page_directory[i].daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_table[j].daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -473,32 +472,19 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
-
-	return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = &ppgtt->pdp.page_directory[i].page_table[j];
+			struct i915_page_table_entry *pt = &pd->page_table[j];
 
 			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 			if (!pt->page)
 				goto unwind_out;
+
 		}
 	}
 
@@ -561,10 +547,6 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
 
-	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (ret)
-		goto err_out;
-
 	return 0;
 
 err_out:
@@ -586,7 +568,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pd] = pd_addr;
+	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
 
 	return 0;
 }
@@ -596,17 +578,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct page *p;
+	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
+	struct i915_page_table_entry *ptab = &pdir->page_table[pt];
+	struct page *p = ptab->page;
 	int ret;
 
-	p = ppgtt->pdp.page_directory[pd].page_table[pt].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+	ptab->daddr = pt_addr;
 
 	return 0;
 }
@@ -662,7 +645,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_table[j].daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -705,14 +688,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
 	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+		   ppgtt->pd.pd_offset,
+		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+		dma_addr_t pt_addr = ppgtt->pd.page_table[pde].daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -756,13 +740,13 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	uint32_t pd_entry;
 	int i;
 
-	WARN_ON(ppgtt->pd_offset & 0x3f);
+	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pt_dma_addr[i];
+		pt_addr = ppgtt->pd.page_table[i].daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -773,9 +757,9 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
 
-	return (ppgtt->pd_offset / 64) << 16;
+	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -988,19 +972,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	if (ppgtt->pt_dma_addr) {
-		for (i = 0; i < ppgtt->num_pd_entries; i++)
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd.page_table[i].daddr,
+			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		if (ppgtt->pd.page_table[i].page)
 			__free_page(ppgtt->pd.page_table[i].page);
@@ -1095,14 +1076,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
-				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr) {
-		drm_mm_remove_node(&ppgtt->node);
-		gen6_ppgtt_free(ppgtt);
-		return -ENOMEM;
-	}
-
 	return 0;
 }
 
@@ -1124,7 +1097,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pt_dma_addr[i] = pt_addr;
+		ppgtt->pd.page_table[i].daddr = pt_addr;
 	}
 
 	return 0;
@@ -1166,7 +1139,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd_offset =
+	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
@@ -1177,7 +1150,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
-		  ppgtt->pd_offset << 10);
+		  ppgtt->pd.pd_offset << 10);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b759c41..1144b709 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -189,10 +189,16 @@ struct i915_vma {
 
 struct i915_page_table_entry {
 	struct page *page;
+	dma_addr_t daddr;
 };
 
 struct i915_page_directory_entry {
 	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
 	struct i915_page_table_entry *page_table;
 };
 
@@ -286,14 +292,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
-	};
-	union {
 		struct i915_page_directory_pointer_entry pdp;
 		struct i915_page_directory_entry pd;
 	};
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1c65949..9e71992 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 03/32] drm/i915: Create page table allocators
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 01/32] drm/i915: page table abstractions Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 02/32] drm/i915: Complete page table structures Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-25 13:34     ` Mika Kuoppala
  2015-02-24 16:22   ` [PATCH v6 04/32] drm/i915: Plumb drm_device through page tables operations Michel Thierry
                     ` (29 subsequent siblings)
  32 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely by
breaking up all of our actions into individual tasks.  This makes the
code easier to write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables are no different across GENs. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
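
As a minimal sketch of what the split buys (the helper name here is invented
for illustration): with page tables referenced by pointer, a single PDE can
be populated on its own, which the later dynamic allocation patches build on.

static int sketch_alloc_one_pde(struct i915_page_directory_entry *pd, int pde)
{
	struct i915_page_table_entry *pt = alloc_pt_single();

	if (IS_ERR(pt))
		return PTR_ERR(pt);

	WARN_ON(pd->page_table[pde]); /* would otherwise leak the old table */
	pd->page_table[pde] = pt;
	return 0;
}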

v2: Updated commit message to explain why this patch exists

v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/

v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)

v5: Added additional safety checks in gen8 clear/free/unmap.

v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).

v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 254 ++++++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
 3 files changed, 178 insertions(+), 96 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ab6f1d4..81c1dba 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -279,6 +279,98 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static void unmap_and_free_pt(struct i915_page_table_entry *pt)
+{
+	if (WARN_ON(!pt->page))
+		return;
+	__free_page(pt->page);
+	kfree(pt);
+}
+
+static struct i915_page_table_entry *alloc_pt_single(void)
+{
+	struct i915_page_table_entry *pt;
+
+	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+	if (!pt)
+		return ERR_PTR(-ENOMEM);
+
+	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pt->page) {
+		kfree(pt);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate multiple page tables
+ * @pd:		The page directory which will have at least @count entries
+ *		available to point to the allocated page tables.
+ * @pde:	First page directory entry for which we are allocating.
+ * @count:	Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
+{
+	int i, ret;
+
+	/* 512 is the max page tables per page_directory on any platform. */
+	if (WARN_ON(pde + count > GEN6_PPGTT_PD_ENTRIES))
+		return -EINVAL;
+
+	for (i = pde; i < pde + count; i++) {
+		struct i915_page_table_entry *pt = alloc_pt_single();
+
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto err_out;
+		}
+		WARN(pd->page_table[i],
+		     "Leaking page directory entry %d (%pa)\n",
+		     i, pd->page_table[i]);
+		pd->page_table[i] = pt;
+	}
+
+	return 0;
+
+err_out:
+	while (i-- > pde)
+		unmap_and_free_pt(pd->page_table[i]);
+	return ret;
+}
+
+static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
+{
+	if (pd->page) {
+		__free_page(pd->page);
+		kfree(pd);
+	}
+}
+
+static struct i915_page_directory_entry *alloc_pd_single(void)
+{
+	struct i915_page_directory_entry *pd;
+
+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pd->page) {
+		kfree(pd);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 			   uint64_t val)
@@ -311,7 +403,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
+		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr);
 		if (ret)
 			return ret;
@@ -338,8 +430,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
-		struct page *page_table = pd->page_table[pde].page;
+		struct i915_page_directory_entry *pd;
+		struct i915_page_table_entry *pt;
+		struct page *page_table;
+
+		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+			continue;
+
+		pd = ppgtt->pdp.page_directory[pdpe];
+
+		if (WARN_ON(!pd->page_table[pde]))
+			continue;
+
+		pt = pd->page_table[pde];
+
+		if (WARN_ON(!pt->page))
+			continue;
+
+		page_table = pt->page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
@@ -384,8 +492,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
-			struct page *page_table = pd->page_table[pde].page;
+			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
+			struct i915_page_table_entry *pt = pd->page_table[pde];
+			struct page *page_table = pt->page;
 
 			pt_vaddr = kmap_atomic(page_table);
 		}
@@ -416,19 +525,16 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 {
 	int i;
 
-	if (pd->page_table == NULL)
+	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
-		if (pd->page_table[i].page)
-			__free_page(pd->page_table[i].page);
-}
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		if (WARN_ON(!pd->page_table[i]))
+			continue;
 
-static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
-{
-	gen8_free_page_tables(pd);
-	kfree(pd->page_table);
-	__free_page(pd->page);
+		unmap_and_free_pt(pd->page_table[i]);
+		pd->page_table[i] = NULL;
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -436,7 +542,11 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+			continue;
+
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 	}
 }
 
@@ -448,14 +558,23 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pdp.page_directory[i].daddr)
+		if (!ppgtt->pdp.page_directory[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_table[j].daddr;
+			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+			struct i915_page_table_entry *pt;
+			dma_addr_t addr;
+
+			if (WARN_ON(!pd->page_table[j]))
+				continue;
+
+			pt = pd->page_table[j];
+			addr = pt->daddr;
+
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -474,25 +593,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	int i, j;
+	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = &pd->page_table[j];
-
-			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-			if (!pt->page)
-				goto unwind_out;
-
-		}
+		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
+				     0, GEN8_PDES_PER_PAGE);
+		if (ret)
+			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
 
 	return -ENOMEM;
 }
@@ -503,19 +617,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_page_table_entry *pt;
-
-		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
-		if (!pt)
-			goto unwind_out;
-
-		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pdp.page_directory[i].page) {
-			kfree(pt);
+		ppgtt->pdp.page_directory[i] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.page_directory[i]))
 			goto unwind_out;
-		}
-
-		ppgtt->pdp.page_directory[i].page_table = pt;
 	}
 
 	ppgtt->num_pd_pages = max_pdp;
@@ -524,10 +628,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	while (i--) {
-		kfree(ppgtt->pdp.page_directory[i].page_table);
-		__free_page(ppgtt->pdp.page_directory[i].page);
-	}
+	while (i--)
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 
 	return -ENOMEM;
 }
@@ -561,14 +663,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.page_directory[pd].page, 0,
+			       ppgtt->pdp.page_directory[pd]->page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
+	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
 
 	return 0;
 }
@@ -578,8 +680,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pt)
 {
 	dma_addr_t pt_addr;
-	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
-	struct i915_page_table_entry *ptab = &pdir->page_table[pt];
+	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
+	struct i915_page_table_entry *ptab = pdir->page_table[pt];
 	struct page *p = ptab->page;
 	int ret;
 
@@ -642,10 +744,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_table[j].daddr;
+			struct i915_page_table_entry *pt = pd->page_table[j];
+			dma_addr_t addr = pt->daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -696,7 +800,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_table[pde].daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -707,7 +811,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->page);
 		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
@@ -746,7 +850,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
 
-		pt_addr = ppgtt->pd.page_table[i].daddr;
+		pt_addr = ppgtt->pd.page_table[i]->daddr;
 		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
 		pd_entry |= GEN6_PDE_VALID;
 
@@ -922,7 +1026,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > I915_PPGTT_PT_ENTRIES)
 			last_pte = I915_PPGTT_PT_ENTRIES;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
 
 		for (i = first_pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -951,7 +1055,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt].page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
 
 		pt_vaddr[act_pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -974,7 +1078,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_table[i].daddr,
+			       ppgtt->pd.page_table[i]->daddr,
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
@@ -983,9 +1087,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		if (ppgtt->pd.page_table[i].page)
-			__free_page(ppgtt->pd.page_table[i].page);
-	kfree(ppgtt->pd.page_table);
+		unmap_and_free_pt(ppgtt->pd.page_table[i]);
+
+	unmap_and_free_pd(&ppgtt->pd);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1040,28 +1144,6 @@ alloc:
 	return 0;
 }
 
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct i915_page_table_entry *pt;
-	int i;
-
-	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
-	if (!pt)
-		return -ENOMEM;
-
-	ppgtt->pd.page_table = pt;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt[i].page = alloc_page(GFP_KERNEL);
-		if (!pt->page) {
-			gen6_ppgtt_free(ppgtt);
-			return -ENOMEM;
-		}
-	}
-
-	return 0;
-}
-
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 {
 	int ret;
@@ -1070,7 +1152,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1088,7 +1170,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 		struct page *page;
 		dma_addr_t pt_addr;
 
-		page = ppgtt->pd.page_table[i].page;
+		page = ppgtt->pd.page_table[i]->page;
 		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
@@ -1097,7 +1179,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pd.page_table[i].daddr = pt_addr;
+		ppgtt->pd.page_table[i]->daddr = pt_addr;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1144b709..c9e93f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -199,12 +199,12 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
-	struct i915_page_table_entry *page_table;
+	struct i915_page_table_entry *page_table[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
 	/* struct page *page; */
-	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
+	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
 };
 
 struct i915_address_space {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9e71992..bc9c7c3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
+	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 04/32] drm/i915: Plumb drm_device through page tables operations
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (2 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 03/32] drm/i915: Create page table allocators Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-25 14:52     ` Mika Kuoppala
  2015-02-24 16:22   ` [PATCH v6 05/32] drm/i915: Track GEN6 page table usage Michel Thierry
                     ` (28 subsequent siblings)
  32 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

The next patch in the series will require it for alloc_pt_single.

v2: Rebased after s/page_tables/page_table/.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 81c1dba..e05488e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -142,7 +142,6 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 		return has_aliasing_ppgtt ? 1 : 0;
 }
 
-
 static void ppgtt_bind_vma(struct i915_vma *vma,
 			   enum i915_cache_level cache_level,
 			   u32 flags);
@@ -279,7 +278,7 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void unmap_and_free_pt(struct i915_page_table_entry *pt)
+static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
@@ -287,7 +286,7 @@ static void unmap_and_free_pt(struct i915_page_table_entry *pt)
 	kfree(pt);
 }
 
-static struct i915_page_table_entry *alloc_pt_single(void)
+static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_page_table_entry *pt;
 
@@ -317,7 +316,9 @@ static struct i915_page_table_entry *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count,
+		  struct drm_device *dev)
+
 {
 	int i, ret;
 
@@ -326,7 +327,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
 		return -EINVAL;
 
 	for (i = pde; i < pde + count; i++) {
-		struct i915_page_table_entry *pt = alloc_pt_single();
+		struct i915_page_table_entry *pt = alloc_pt_single(dev);
 
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
@@ -342,7 +343,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
 
 err_out:
 	while (i-- > pde)
-		unmap_and_free_pt(pd->page_table[i]);
+		unmap_and_free_pt(pd->page_table[i], dev);
 	return ret;
 }
 
@@ -521,7 +522,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
+static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
 
@@ -532,7 +533,7 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
 		if (WARN_ON(!pd->page_table[i]))
 			continue;
 
-		unmap_and_free_pt(pd->page_table[i]);
+		unmap_and_free_pt(pd->page_table[i], dev);
 		pd->page_table[i] = NULL;
 	}
 }
@@ -545,7 +546,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
 	}
 }
@@ -597,7 +598,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
-				     0, GEN8_PDES_PER_PAGE);
+				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
 	}
@@ -606,7 +607,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -1087,7 +1088,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		unmap_and_free_pt(ppgtt->pd.page_table[i]);
+		unmap_and_free_pt(ppgtt->pd.page_table[i], ppgtt->base.dev);
 
 	unmap_and_free_pd(&ppgtt->pd);
 }
@@ -1152,7 +1153,9 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			ppgtt->base.dev);
+
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 05/32] drm/i915: Track GEN6 page table usage
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (3 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 04/32] drm/i915: Plumb drm_device through page tables operations Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-26 15:58     ` Mika Kuoppala
  2015-02-24 16:22   ` [PATCH v6 06/32] drm/i915: Extract context switch skip and pd load logic Michel Thierry
                     ` (27 subsequent siblings)
  32 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
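
As a rough sketch (the helper name is invented for illustration), the per
page table bitmap added below is what lets the va range paths record exactly
which PTEs an object occupies:

static void sketch_mark_ptes(struct i915_page_table_entry *pt,
			     unsigned int first_pte, unsigned int count)
{
	/* used_ptes is the bitmap allocated in alloc_pt_single() */
	bitmap_set(pt->used_ptes, first_pte, count);
}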

With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.

One important change introduced here is that DMA mappings are
created/destroyed at the same time page directories/tables are
allocated/deallocated.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.page_directory/pdp.page_directorys
Make a scratch page allocation helper

v3: Rebase and expand commit message.

v4: Allocate required pagetables only when it is needed, _bind_to_vm
instead of bind_vma (Daniel).

v5: Rebased to remove the unnecessary noise in the diff, also:
 - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
 - Removed unnecessary checks in gen6_alloc_va_range.
 - Changed map/unmap_px_single macros to use dma functions directly and
   be part of a static inline function instead.
 - Moved drm_device plumbing through page tables operation to its own
   patch.
 - Moved allocate/teardown_va_range calls until they are fully
   implemented (in subsequent patch).
 - Merged pt and scratch_pt unmap_and_free path.
 - Moved scratch page allocator helper to the patch that will use it.

v6: Reduce complexity by not tearing down pagetables dynamically, the
same can be achieved while freeing empty vms. (Daniel)

v7: s/i915_dma_map_px_single/i915_dma_map_single
s/gen6_write_pdes/gen6_write_pde
Prevent a NULL case when only GGTT is available. (Mika)

v8: Rebased after s/page_tables/page_table/.

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 198 +++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  75 ++++++++++++++
 2 files changed, 211 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e05488e..f9354c7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -278,29 +278,88 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
+#define i915_dma_unmap_single(px, dev) \
+	__i915_dma_unmap_single((px)->daddr, dev)
+
+static inline void __i915_dma_unmap_single(dma_addr_t daddr,
+					struct drm_device *dev)
+{
+	struct device *device = &dev->pdev->dev;
+
+	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
+}
+
+/**
+ * i915_dma_map_single() - Create a dma mapping for a page table/dir/etc.
+ * @px:	Page table/dir/etc to get a DMA map for
+ * @dev:	drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
+ *
+ * Return: 0 if success.
+ */
+#define i915_dma_map_single(px, dev) \
+	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
+
+static inline int i915_dma_map_page_single(struct page *page,
+					   struct drm_device *dev,
+					   dma_addr_t *daddr)
+{
+	struct device *device = &dev->pdev->dev;
+
+	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
+	return dma_mapping_error(device, *daddr);
+}
+
+static void unmap_and_free_pt(struct i915_page_table_entry *pt,
+			       struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
+
+	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
+	kfree(pt->used_ptes);
 	kfree(pt);
 }
 
 static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_page_table_entry *pt;
+	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
+	int ret = -ENOMEM;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
+	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+				GFP_KERNEL);
+
+	if (!pt->used_ptes)
+		goto fail_bitmap;
+
 	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pt->page) {
-		kfree(pt);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pt->page)
+		goto fail_page;
+
+	ret = i915_dma_map_single(pt, dev);
+	if (ret)
+		goto fail_dma;
 
 	return pt;
+
+fail_dma:
+	__free_page(pt->page);
+fail_page:
+	kfree(pt->used_ptes);
+fail_bitmap:
+	kfree(pt);
+
+	return ERR_PTR(ret);
 }
 
 /**
@@ -838,26 +897,35 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+/* Write pde (index) from the page directory @pd to the page table @pt */
+static void gen6_write_pde(struct i915_page_directory_entry *pd,
+			    const int pde, struct i915_page_table_entry *pt)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	gen6_gtt_pte_t __iomem *pd_addr;
-	uint32_t pd_entry;
-	int i;
+	/* Caller needs to make sure the write completes if necessary */
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pd, struct i915_hw_ppgtt, pd);
+	u32 pd_entry;
 
-	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		dma_addr_t pt_addr;
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
+	pd_entry |= GEN6_PDE_VALID;
 
-		pt_addr = ppgtt->pd.page_table[i]->daddr;
-		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-		pd_entry |= GEN6_PDE_VALID;
+	writel(pd_entry, ppgtt->pd_addr + pde);
+}
 
-		writel(pd_entry, pd_addr + i);
-	}
-	readl(pd_addr);
+/* Write all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_write_page_range(struct drm_i915_private *dev_priv,
+				struct i915_page_directory_entry *pd, uint32_t start, uint32_t length)
+{
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, pd, start, length, temp, pde)
+		gen6_write_pde(pd, pde, pt);
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also required for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1083,6 +1151,28 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+				container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_table_entry *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+
+		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
+		bitmap_set(tmp_bitmap, gen6_pte_index(start),
+			   gen6_pte_count(start, length));
+
+		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+				I915_PPGTT_PT_ENTRIES);
+	}
+
+	return 0;
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
@@ -1129,20 +1219,24 @@ alloc:
 					       0, dev_priv->gtt.base.total,
 					       0);
 		if (ret)
-			return ret;
+			goto err_out;
 
 		retried = true;
 		goto alloc;
 	}
 
 	if (ret)
-		return ret;
+		goto err_out;
+
 
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
+
+err_out:
+	return ret;
 }
 
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1164,30 +1258,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct page *page;
-		dma_addr_t pt_addr;
-
-		page = ppgtt->pd.page_table[i]->page;
-		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-				       PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_unmap_pages(ppgtt);
-			return -EIO;
-		}
-
-		ppgtt->pd.page_table[i]->daddr = pt_addr;
-	}
-
-	return 0;
-}
-
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
@@ -1211,12 +1281,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_setup_page_tables(ppgtt);
-	if (ret) {
-		gen6_ppgtt_free(ppgtt);
-		return ret;
-	}
-
+	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1227,13 +1292,17 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
+	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
+
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
 
-	gen6_write_pdes(ppgtt);
 	DRM_DEBUG("Adding PPGTT at offset %x\n",
 		  ppgtt->pd.pd_offset << 10);
 
@@ -1504,15 +1573,20 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		return;
 	}
 
-	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
-		/* TODO: Perhaps it shouldn't be gen6 specific */
-		if (i915_is_ggtt(vm)) {
-			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
-			continue;
-		}
+	if (USES_PPGTT(dev)) {
+		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
+			/* TODO: Perhaps it shouldn't be gen6 specific */
+
+			struct i915_hw_ppgtt *ppgtt =
+					container_of(vm, struct i915_hw_ppgtt,
+						     base);
 
-		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+			if (i915_is_ggtt(vm))
+				ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+			gen6_write_page_range(dev_priv, &ppgtt->pd, 0,
+					      ppgtt->num_pd_entries);
+		}
 	}
 
 	i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c9e93f5..bf0e380 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PPGTT_PD_ENTRIES		512
 #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
 #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
+#define GEN6_PDE_SHIFT			22
 #define GEN6_PDE_VALID			(1 << 0)
+#define I915_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
+#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
 
 #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
 
@@ -190,6 +193,8 @@ struct i915_vma {
 struct i915_page_table_entry {
 	struct page *page;
 	dma_addr_t daddr;
+
+	unsigned long *used_ptes;
 };
 
 struct i915_page_directory_entry {
@@ -246,6 +251,9 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid, u32 flags); /* Create a valid PTE */
+	int (*allocate_va_range)(struct i915_address_space *vm,
+				 uint64_t start,
+				 uint64_t length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -298,12 +306,79 @@ struct i915_hw_ppgtt {
 
 	struct drm_i915_file_private *file_priv;
 
+	gen6_gtt_pte_t __iomem *pd_addr;
+
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
 	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
 			 struct intel_engine_cs *ring);
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
+/* For each pde iterates over every pde between from start until start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen6_pde_index(start), pt = (pd)->page_table[iter]; \
+	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
+	     pt = (pd)->page_table[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+	     temp = min_t(unsigned, temp, length), \
+	     start += temp, length -= temp)
+
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+	const uint32_t mask = NUM_PTE(pde_shift) - 1;
+
+	return (address >> PAGE_SHIFT) & mask;
+}
+
+/* Helper to count the number of PTEs within the given length. This count does
+* not cross a page table boundary, so the max value would be
+* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
+*/
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+					uint32_t pde_shift)
+{
+	const uint64_t mask = ~((1 << pde_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
+
+	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
+
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+	return (addr >> shift) & I915_PDE_MASK;
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+	return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+	return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 06/32] drm/i915: Extract context switch skip and pd load logic
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (4 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 05/32] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-27 11:46     ` Mika Kuoppala
  2015-02-24 16:22   ` [PATCH v6 07/32] drm/i915: Track page table reload need Michel Thierry
                     ` (26 subsequent siblings)
  32 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

We have some fanciness coming up. This patch just breaks out the logic
of context switch skip, pd load pre, and pd load post.

v2: Use new functions to replace the logic right away (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 40 +++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 755b415..6206d27 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -565,6 +565,33 @@ mi_set_context(struct intel_engine_cs *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+				      struct intel_context *from,
+				      struct intel_context *to)
+{
+	if (from == to && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
+static bool
+needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	return ((INTEL_INFO(ring->dev)->gen < 8) ||
+			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
+}
+
+static bool
+needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	return (!to->legacy_hw_ctx.initialized ||
+			i915_gem_context_is_default(to)) &&
+			to->ppgtt && IS_GEN8(ring->dev);
+}
+
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -573,9 +600,6 @@ static int do_switch(struct intel_engine_cs *ring,
 	u32 hw_flags = 0;
 	bool uninitialized = false;
 	struct i915_vma *vma;
-	bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
-			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
-	bool needs_pd_load_post = false;
 	int ret, i;
 
 	if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -583,7 +607,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
 	}
 
-	if (from == to && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
@@ -601,7 +625,7 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	from = ring->last_context;
 
-	if (needs_pd_load_pre) {
+	if (needs_pd_load_pre(ring, to)) {
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
 		 * Register Immediate commands in Ring Buffer before submitting
@@ -646,16 +670,14 @@ static int do_switch(struct intel_engine_cs *ring,
 	 * XXX: If we implemented page directory eviction code, this
 	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
+	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
-		needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
-	}
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post) {
+	if (needs_pd_load_post(ring, to)) {
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 07/32] drm/i915: Track page table reload need
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (5 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 06/32] drm/i915: Extract context switch skip and pd load logic Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 08/32] drm/i915: Initialize all contexts Michel Thierry
                     ` (25 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch was formerly known as "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.
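
The shape of that tracking, as a minimal self-contained sketch (stand-in
names and an assumed ring count, not the driver code):

  #include <stdbool.h>

  #define EX_NUM_RINGS 5   /* assumed ring count, for illustration only */

  struct ex_ppgtt {
          unsigned long pd_dirty_rings;  /* bit N: ring N must reload PDs */
  };

  /* Called after new page tables are mapped into this address space. */
  static void ex_mark_tlbs_dirty(struct ex_ppgtt *ppgtt)
  {
          ppgtt->pd_dirty_rings = (1ul << EX_NUM_RINGS) - 1;
  }

  /* Called at context switch: true if this ring still needs a forced
   * reload; the bit is cleared so the reload is only emitted once. */
  static bool ex_needs_reload(struct ex_ppgtt *ppgtt, int ring_id)
  {
          bool dirty = ppgtt->pd_dirty_rings & (1ul << ring_id);

          ppgtt->pd_dirty_rings &= ~(1ul << ring_id);
          return dirty;
  }

In the patch itself, mark_tlbs_dirty() sets the mask from the device's
ring_mask whenever gen6_alloc_va_range() runs, and do_switch()
test-and-clears the current ring's bit to decide whether MI_FORCE_RESTORE
is needed.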

It's not just for gen8. If the current context's mappings change, we
need a context reload to switch to the new page tables.

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

v3: Invalidate PPGTT TLBs inside alloc_va_range.

v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes the case
where neither ctx->ppgtt nor aliasing_ppgtt exists.

v5: Removed references to teardown_va_range.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 29 ++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 11 +++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  1 +
 4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6206d27..437cdcc 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -569,8 +569,20 @@ static inline bool should_skip_switch(struct intel_engine_cs *ring,
 				      struct intel_context *from,
 				      struct intel_context *to)
 {
-	if (from == to && !to->remap_slice)
-		return true;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	if (to->remap_slice)
+		return false;
+
+	if (to->ppgtt) {
+		if (from == to && !test_bit(ring->id,
+				&to->ppgtt->pd_dirty_rings))
+			return true;
+	} else if (dev_priv->mm.aliasing_ppgtt) {
+		if (from == to && !test_bit(ring->id,
+				&dev_priv->mm.aliasing_ppgtt->pd_dirty_rings))
+			return true;
+	}
 
 	return false;
 }
@@ -587,9 +599,8 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 static bool
 needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
 {
-	return (!to->legacy_hw_ctx.initialized ||
-			i915_gem_context_is_default(to)) &&
-			to->ppgtt && IS_GEN8(ring->dev);
+	return IS_GEN8(ring->dev) &&
+			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
 }
 
 static int do_switch(struct intel_engine_cs *ring,
@@ -634,6 +645,12 @@ static int do_switch(struct intel_engine_cs *ring,
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		if (ret)
 			goto unpin_out;
+
+		/* Doing a PD load always reloads the page dirs */
+		if (to->ppgtt)
+			clear_bit(ring->id, &to->ppgtt->pd_dirty_rings);
+		else
+			clear_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->pd_dirty_rings);
 	}
 
 	if (ring != &dev_priv->ring[RCS]) {
@@ -672,6 +689,8 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
+	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 82636aa..24757ee 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1187,6 +1187,13 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
 	if (ret)
 		goto error;
 
+	if (ctx->ppgtt)
+		WARN(ctx->ppgtt->pd_dirty_rings & (1<<ring->id),
+			"%s didn't clear reload\n", ring->name);
+	else if (dev_priv->mm.aliasing_ppgtt)
+		WARN(dev_priv->mm.aliasing_ppgtt->pd_dirty_rings &
+			(1<<ring->id), "%s didn't clear reload\n", ring->name);
+
 	instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
 	instp_mask = I915_EXEC_CONSTANTS_MASK;
 	switch (instp_mode) {
@@ -1456,6 +1463,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* XXX: Reserve has possibly change PDEs which means we must do a
+	 * context switch before we can coherently read some of the reserved
+	 * VMAs. */
+
 	/* The objects are in their final locations, apply the relocations. */
 	if (need_relocs)
 		ret = i915_gem_execbuffer_relocate(eb);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f9354c7..bd8e876 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1151,6 +1151,16 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
+/* PDE TLBs are a pain to invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
+{
+	/* If current vm != vm, */
+	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
+}
+
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
@@ -1170,6 +1180,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	mark_tlbs_dirty(ppgtt);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index bf0e380..867ede5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -297,6 +297,7 @@ struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
 	struct drm_mm_node node;
+	unsigned long pd_dirty_rings;
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 08/32] drm/i915: Initialize all contexts
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (6 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 07/32] drm/i915: Track page table reload need Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-27 13:40     ` [PATCH] " Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 09/32] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
                     ` (24 subsequent siblings)
  32 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The problem is we're going to switch to a new context, which could be
the default context. The plan was to use restore inhibit, which would be
fine, except if we are using dynamic page tables (which we will). If we
use dynamic page tables and we don't load new page tables, the previous
page tables might go away, and future operations will fault.

CTXA runs.
switch to default, restore inhibit
CTXA dies and has its address space taken away.
Run CTXB, which tries to save using context A's address space - this
fails.

The general solution is to make sure every context has its own state,
and its own address space. For cases when we must use restore inhibit, the
first thing we do is load a valid address space. I thought this would be
enough, but apparently there are references within the context itself
which will refer to the old address space - therefore, we also must
reinitialize.
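
Boiled down, the rule being enforced is roughly the following (an
illustrative sketch with stand-in names, not the driver code):

  #include <stdbool.h>

  #define EX_RESTORE_INHIBIT (1u << 0)

  struct ex_ctx {
          bool initialized;  /* has this context image ever been restored? */
          bool pd_loaded;    /* page directories point at a valid VM */
  };

  /* Sketch: an uninitialized context gets the restore inhibited, and
   * whenever the restore is inhibited a page directory load must still
   * follow, so the context never depends on a dead address space. */
  static void ex_do_switch(struct ex_ctx *to)
  {
          unsigned hw_flags = 0;

          if (!to->initialized)
                  hw_flags |= EX_RESTORE_INHIBIT;

          /* ... MI_SET_CONTEXT(to, hw_flags) would be emitted here ... */

          if (hw_flags & EX_RESTORE_INHIBIT)
                  to->pd_loaded = true;  /* i.e. switch_mm() to a valid VM */

          to->initialized = true;
  }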

It was tricky to track this down as we don't have much insight into what
happens in a context save.

This is required for the next patch which enables dynamic page tables.

v2: to->ppgtt is only valid in full ppgtt.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 437cdcc..6a583c3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -596,13 +596,6 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
 }
 
-static bool
-needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
-{
-	return IS_GEN8(ring->dev) &&
-			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
-}
-
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -683,20 +676,24 @@ static int do_switch(struct intel_engine_cs *ring,
 
 	/* GEN8 does *not* require an explicit reload if the PDPs have been
 	 * setup, and we do not wish to move them.
-	 *
-	 * XXX: If we implemented page directory eviction code, this
-	 * optimization needs to be removed.
 	 */
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
+	if (!to->legacy_hw_ctx.initialized) {
 		hw_flags |= MI_RESTORE_INHIBIT;
-	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		/* NB: If we inhibit the restore, the context is not allowed to
+		 * die because future work may end up depending on valid address
+		 * space. This means we must enforce that a page table load
+		 * occur when this occurs. */
+	} else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
 		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post(ring, to)) {
+	if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT)) {
+		/* We have a valid page directory (scratch) to switch to. This
+		 * allows the old VM to be freed. Note that if anything occurs
+		 * between the set context, and here, we are f*cked */
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -746,7 +743,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
+	uninitialized = !to->legacy_hw_ctx.initialized;
 	to->legacy_hw_ctx.initialized = true;
 
 done:
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 09/32] drm/i915: Finish gen6/7 dynamic page table allocation
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (7 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 08/32] drm/i915: Initialize all contexts Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 10/32] drm/i915: Add dynamic page trace events Michel Thierry
                     ` (23 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page.
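
In rough pseudo-C, the steady state looks like this (a self-contained
sketch with stand-in names; the real code additionally writes the hardware
PDEs, tracks the used_ptes bitmaps and unwinds partial allocations on
failure):

  #include <errno.h>
  #include <stdlib.h>

  #define EX_NUM_PDES 512

  struct ex_pt { int dummy; };

  struct ex_ppgtt {
          struct ex_pt *pd[EX_NUM_PDES];  /* software copy of the PDEs */
          struct ex_pt scratch;           /* every unused PDE points here */
  };

  static int ex_alloc_va_range(struct ex_ppgtt *ppgtt,
                               unsigned first_pde, unsigned num_pdes)
  {
          unsigned pde;

          for (pde = first_pde;
               pde < first_pde + num_pdes && pde < EX_NUM_PDES; pde++) {
                  if (ppgtt->pd[pde] != &ppgtt->scratch)
                          continue;       /* already backed by a real PT */

                  ppgtt->pd[pde] = malloc(sizeof(struct ex_pt));
                  if (!ppgtt->pd[pde]) {
                          ppgtt->pd[pde] = &ppgtt->scratch;
                          return -ENOMEM; /* real code unwinds earlier PDEs */
                  }
                  /* real code would also write the HW PDE and flush here */
          }
          return 0;
  }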

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I did.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on a non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.

v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
all implementations get the trace.

v3: Updated trace event to spit out a name

v4: Aliasing ppgtt is now initialized differently (in setup global gtt)

v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check
for trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).

v6: Implement changes from code review (Daniel):
 - allocate/teardown_va_range calls added.
 - Add a scratch page allocation helper (only need the address).
 - Move trace events to a new patch.
 - Use updated mark_tlbs_dirty.
 - Moved pt preallocation for aliasing ppgtt into gen6_ppgtt_init.

v7: teardown_va_range removed (Daniel).
    In init, gen6_ppgtt_clear_range call is only needed for aliasing ppgtt.

v8: Rebase after s/page_tables/page_table/.

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
 drivers/gpu/drm/i915/i915_gem.c     |   9 +++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 125 +++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.h |   3 +
 4 files changed, 123 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4d07030..e8ad450 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2181,6 +2181,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -2197,7 +2199,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
 		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
 	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 61134ab..312b7d2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3599,6 +3599,15 @@ search_free:
 	if (ret)
 		goto err_remove_node;
 
+	/*  allocate before insert / bind */
+	if (vma->vm->allocate_va_range) {
+		ret = vma->vm->allocate_va_range(vma->vm,
+						vma->node.start,
+						vma->node.size);
+		if (ret)
+			goto err_remove_node;
+	}
+
 	trace_i915_vma_bind(vma, flags);
 	ret = i915_vma_bind(vma, obj->cache_level,
 			    flags & PIN_GLOBAL ? GLOBAL_BIND : 0);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd8e876..29cda58 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -362,6 +362,16 @@ fail_bitmap:
 	return ERR_PTR(ret);
 }
 
+static inline struct i915_page_table_entry *alloc_pt_scratch(struct drm_device *dev)
+{
+	struct i915_page_table_entry *pt = alloc_pt_single(dev);
+
+	if (!IS_ERR(pt))
+		pt->scratch = 1;
+
+	return pt;
+}
+
 /**
  * alloc_pt_range() - Allocate a multiple page tables
  * @pd:		The page directory which will have at least @count entries
@@ -1164,10 +1174,46 @@ static inline void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
+	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 				container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_table_entry *pt;
+	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
+	int ret;
+
+	BUG_ON(upper_32_bits(start));
+
+	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+
+	/* The allocation is done in two stages so that we can bail out with
+	 * minimal amount of pain. The first stage finds new page tables that
+	 * need allocation. The second stage marks use ptes within the page
+	 * tables.
+	 */
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		if (pt != ppgtt->scratch_pt) {
+			WARN_ON(bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+			continue;
+		}
+
+		/* We've already allocated a page table */
+		WARN_ON(!bitmap_empty(pt->used_ptes, I915_PPGTT_PT_ENTRIES));
+
+		pt = alloc_pt_single(dev);
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pd.page_table[pde] = pt;
+		set_bit(pde, new_page_tables);
+	}
+
+	start = start_save;
+	length = length_save;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
@@ -1176,21 +1222,46 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		bitmap_set(tmp_bitmap, gen6_pte_index(start),
 			   gen6_pte_count(start, length));
 
-		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+		if (test_and_clear_bit(pde, new_page_tables))
+			gen6_write_pde(&ppgtt->pd, pde, pt);
+
+		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
 
+	WARN_ON(!bitmap_empty(new_page_tables, GEN6_PPGTT_PD_ENTRIES));
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also required for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
+
 	mark_tlbs_dirty(ppgtt);
 	return 0;
+
+unwind_out:
+	for_each_set_bit(pde, new_page_tables, GEN6_PPGTT_PD_ENTRIES) {
+		struct i915_page_table_entry *pt = ppgtt->pd.page_table[pde];
+
+		ppgtt->pd.page_table[pde] = NULL;
+		unmap_and_free_pt(pt, vm->dev);
+	}
+
+	mark_tlbs_dirty(ppgtt);
+	return ret;
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		unmap_and_free_pt(ppgtt->pd.page_table[i], ppgtt->base.dev);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct i915_page_table_entry *pt = ppgtt->pd.page_table[i];
 
+		if (pt != ppgtt->scratch_pt)
+			unmap_and_free_pt(ppgtt->pd.page_table[i], ppgtt->base.dev);
+	}
+
+	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 	unmap_and_free_pd(&ppgtt->pd);
 }
 
@@ -1217,6 +1288,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+	ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pt))
+		return PTR_ERR(ppgtt->scratch_pt);
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
 						  &ppgtt->node, GEN6_PD_SIZE,
@@ -1247,6 +1321,7 @@ alloc:
 	return 0;
 
 err_out:
+	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 	return ret;
 }
 
@@ -1258,18 +1333,20 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			ppgtt->base.dev);
+	return 0;
+}
 
-	if (ret) {
-		drm_mm_remove_node(&ppgtt->node);
-		return ret;
-	}
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+				  uint64_t start, uint64_t length)
+{
+	struct i915_page_table_entry *unused;
+	uint32_t pde, temp;
 
-	return 0;
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
+		ppgtt->pd.page_table[pde] = ppgtt->scratch_pt;
 }
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1292,6 +1369,18 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	if (aliasing) {
+		/* preallocate all pts */
+		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+				ppgtt->base.dev);
+
+		if (ret) {
+			unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+			drm_mm_remove_node(&ppgtt->node);
+			return ret;
+		}
+	}
+
 	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
@@ -1306,7 +1395,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	if (aliasing)
+		ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	else
+		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
 
 	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
@@ -1320,7 +1412,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
+		bool aliasing)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
@@ -1328,7 +1421,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
 
 	if (INTEL_INFO(dev)->gen < 8)
-		return gen6_ppgtt_init(ppgtt);
+		return gen6_ppgtt_init(ppgtt, aliasing);
 	else
 		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 }
@@ -1337,7 +1430,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = 0;
 
-	ret = __hw_ppgtt_init(dev, ppgtt);
+	ret = __hw_ppgtt_init(dev, ppgtt, false);
 	if (ret == 0) {
 		kref_init(&ppgtt->ref);
 		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
@@ -1969,7 +2062,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 		if (!ppgtt)
 			return -ENOMEM;
 
-		ret = __hw_ppgtt_init(dev, ppgtt);
+		ret = __hw_ppgtt_init(dev, ppgtt, true);
 		if (ret != 0)
 			return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 867ede5..5918131 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -195,6 +195,7 @@ struct i915_page_table_entry {
 	dma_addr_t daddr;
 
 	unsigned long *used_ptes;
+	unsigned int scratch:1;
 };
 
 struct i915_page_directory_entry {
@@ -305,6 +306,8 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_entry pd;
 	};
 
+	struct i915_page_table_entry *scratch_pt;
+
 	struct drm_i915_file_private *file_priv;
 
 	gen6_gtt_pte_t __iomem *pd_addr;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 10/32] drm/i915: Add dynamic page trace events
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (8 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 09/32] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-03-20 13:29     ` Mika Kuoppala
  2015-02-24 16:22   ` [PATCH v6 11/32] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
                     ` (22 subsequent siblings)
  32 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

Traces for page directories and tables allocation and map.

v2: Removed references to teardown.
v3: bitmap_scnprintf has been deprecated.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     |  2 +
 drivers/gpu/drm/i915/i915_gem_gtt.c |  5 ++
 drivers/gpu/drm/i915/i915_trace.h   | 95 +++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 312b7d2..4e51275 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3601,6 +3601,8 @@ search_free:
 
 	/*  allocate before insert / bind */
 	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+				VM_TO_TRACE_NAME(vma->vm));
 		ret = vma->vm->allocate_va_range(vma->vm,
 						vma->node.start,
 						vma->node.size);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 29cda58..94cdd99 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1210,6 +1210,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 		ppgtt->pd.page_table[pde] = pt;
 		set_bit(pde, new_page_tables);
+		trace_i915_page_table_entry_alloc(vm, pde, start, GEN6_PDE_SHIFT);
 	}
 
 	start = start_save;
@@ -1225,6 +1226,10 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 		if (test_and_clear_bit(pde, new_page_tables))
 			gen6_write_pde(&ppgtt->pd, pde, pt);
 
+		trace_i915_page_table_entry_map(vm, pde, pt,
+					 gen6_pte_index(start),
+					 gen6_pte_count(start, length),
+					 I915_PPGTT_PT_ENTRIES);
 		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 				I915_PPGTT_PT_ENTRIES);
 	}
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f004d3d..0038dc2 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,6 +156,101 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
+#define VM_TO_TRACE_NAME(vm) \
+	(i915_is_ggtt(vm) ? "GGTT" : \
+		      "Private VM")
+
+DECLARE_EVENT_CLASS(i915_va,
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	TP_ARGS(vm, start, length, name),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u64, start)
+		__field(u64, end)
+		__string(name, name)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->start = start;
+		__entry->end = start + length;
+		__assign_str(name, name);
+	),
+
+	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
+		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+	     TP_ARGS(vm, start, length, name)
+);
+
+DECLARE_EVENT_CLASS(i915_page_table_entry,
+	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	TP_ARGS(vm, pde, start, pde_shift),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->start = start;
+		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
+	),
+
+	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+		  __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+	     TP_ARGS(vm, pde, start, pde_shift)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+	((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_page_table_entry_update,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u32, first)
+		__field(u32, last)
+		__bitmask(cur_ptes, TRACE_PT_SIZE(bits))
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->first = first;
+		__entry->last = first + len;
+		__assign_bitmask(cur_ptes, pt->used_ptes, bits);
+	),
+
+	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+		  __entry->vm, __entry->pde, __entry->last, __entry->first,
+		  __get_bitmask(cur_ptes))
+);
+
+DEFINE_EVENT(i915_page_table_entry_update, i915_page_table_entry_map,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
 TRACE_EVENT(i915_gem_object_change_domain,
 	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
 	    TP_ARGS(obj, old_read, old_write),
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 11/32] drm/i915/bdw: Use dynamic allocation idioms on free
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (9 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 10/32] drm/i915: Add dynamic page trace events Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 12/32] drm/i915/bdw: page directories rework allocation Michel Thierry
                     ` (21 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.
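
One helper this adds in the header below, the clamp of a length to the
current page-directory boundary, is easiest to see with numbers (a
stand-alone illustration that only mirrors the gen8_clamp_pd() arithmetic,
assuming the 1GB-per-PDPE layout):

  #include <assert.h>
  #include <stdint.h>

  #define EX_PDPE_SHIFT 30   /* 1GB per page directory */

  /* Clamp length so [start, start + length) does not cross into the next
   * page directory. */
  static uint64_t ex_clamp_pd(uint64_t start, uint64_t length)
  {
          uint64_t next_pd = (start + (1ull << EX_PDPE_SHIFT)) &
                             ~((1ull << EX_PDPE_SHIFT) - 1);

          if (next_pd > start + length)
                  return length;
          return next_pd - start;
  }

  int main(void)
  {
          /* 256MB starting 512MB in: fits within the first 1GB directory. */
          assert(ex_clamp_pd(0x20000000ull, 0x10000000ull) == 0x10000000ull);
          /* 2GB starting 512MB in: clamped at the 1GB boundary (512MB). */
          assert(ex_clamp_pd(0x20000000ull, 0x80000000ull) == 0x20000000ull);
          return 0;
  }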

v2: Match trace_i915_va_teardown params
v3: Multiple rebases.
v4: Updated to use unmap_and_free_pt.
v5: teardown_va_range logic no longer needed.
v6: Rebase after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 26 ++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 47 +++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 94cdd99..e03b2c8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -607,19 +607,6 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-			continue;
-
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
-	}
-}
-
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
@@ -652,6 +639,19 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	}
 }
 
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+			continue;
+
+		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+	}
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 5918131..1f5c136 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -383,6 +383,53 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	return i915_pde_index(addr, GEN6_PDE_SHIFT);
 }
 
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter)		\
+	for (iter = gen8_pde_index(start), pt = (pd)->page_table[iter]; \
+	     length > 0 && iter < GEN8_PDES_PER_PAGE;			\
+	     pt = (pd)->page_table[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->page_directory[iter];	\
+	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+	     pd = (pdp)->page_directory[++iter],				\
+	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
+/* Clamp length to the next page_directory boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+	uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+
+	if (next_pd > (start + length))
+		return length;
+
+	return next_pd - start;
+}
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+	return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+	return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+	return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+	BUG(); /* For 64B */
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 12/32] drm/i915/bdw: page directories rework allocation
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (10 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 11/32] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 13/32] drm/i915/bdw: pagetable allocation rework Michel Thierry
                     ` (20 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using gen8_for_each_pdpe macro to allocate the page directories.
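
For reference, the walk that macro performs is roughly equivalent to the
following open-coded loop (an illustrative sketch with stand-in names; the
real macro also hands back the page directory pointer and does its stepping
in the for-loop expressions):

  #include <stdint.h>

  #define EX_PDPE_SHIFT 30   /* 1GB per page directory (assumed) */
  #define EX_NUM_PDPES  4    /* legacy 32b mode: 4 PDPEs */

  static void ex_walk_pdpes(uint64_t start, uint64_t length,
                            void (*cb)(unsigned pdpe, uint64_t start,
                                       uint64_t len))
  {
          while (length > 0) {
                  unsigned pdpe = start >> EX_PDPE_SHIFT;
                  uint64_t next = (start + (1ull << EX_PDPE_SHIFT)) &
                                  ~((1ull << EX_PDPE_SHIFT) - 1);
                  uint64_t step = next - start;

                  if (pdpe >= EX_NUM_PDPES)
                          break;          /* past the legacy 4-PDPE limit */
                  if (step > length)
                          step = length;

                  cb(pdpe, start, step);  /* e.g. allocate page_directory[pdpe] */

                  start += step;
                  length -= step;
          }
  }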

v2: Rebased after s/free_pt_*/unmap_and_free_pt/ change.
v3: Rebased after teardown va range logic was removed.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e03b2c8..ade8edd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -681,25 +681,39 @@ unwind_out:
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
+static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
+				     uint64_t start,
+				     uint64_t length)
 {
-	int i;
-
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.page_directory[i] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.page_directory[i]))
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pdp, struct i915_hw_ppgtt, pdp);
+	struct i915_page_directory_entry *unused;
+	uint64_t temp;
+	uint32_t pdpe;
+
+	/* FIXME: PPGTT container_of won't work for 64b */
+	BUG_ON((start + length) > 0x800000000ULL);
+
+	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+		BUG_ON(unused);
+		pdp->page_directory[pdpe] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
 			goto unwind_out;
+
+		ppgtt->num_pd_pages++;
 	}
 
-	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+	while (pdpe--) {
+		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
+		ppgtt->num_pd_pages--;
+	}
+
+	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -709,7 +723,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 {
 	int ret;
 
-	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, ppgtt->base.start,
+					ppgtt->base.total);
 	if (ret)
 		return ret;
 
@@ -785,6 +800,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
+	ppgtt->base.start = 0;
+	ppgtt->base.total = size;
+	BUG_ON(ppgtt->base.total == 0);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
 	if (ret)
@@ -832,8 +851,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 13/32] drm/i915/bdw: pagetable allocation rework
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (11 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 12/32] drm/i915/bdw: page directories rework allocation Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 14/32] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
                     ` (19 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Start using gen8_for_each_pde macro to allocate page tables.
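
The resulting two-level walk has roughly this shape (sketch only; names
as in the diff below):

	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
		/* allocate the page tables backing this page directory */
		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
						ppgtt->base.dev);
		if (ret)
			goto err_out;
	}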

v2: teardown_va_range references removed.
v3: Rebase after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 46 +++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ade8edd..762c535 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -661,22 +661,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_page_directory_entry *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
 {
-	int i, ret;
+	struct i915_page_table_entry *unused;
+	uint64_t temp;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
-				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
-		if (ret)
+	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+		BUG_ON(unused);
+		pd->page_table[pde] = alloc_pt_single(dev);
+		if (IS_ERR(pd->page_table[pde]))
 			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
-	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+	while (pde--)
+		unmap_and_free_pt(pd->page_table[pde], dev);
 
 	return -ENOMEM;
 }
@@ -719,20 +724,28 @@ unwind_out:
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    const int max_pdp)
+			    uint64_t start,
+			    uint64_t length)
 {
+	struct i915_page_directory_entry *pd;
+	uint64_t temp;
+	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, ppgtt->base.start,
-					ppgtt->base.total);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length);
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
-	if (ret)
-		goto err_out;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+						ppgtt->base.dev);
+		if (ret)
+			goto err_out;
+
+		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
+	}
 
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	BUG_ON(pdpe > ppgtt->num_pd_pages);
 
 	return 0;
 
@@ -802,10 +815,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	BUG_ON(ppgtt->base.total == 0);
 
 	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
 	if (ret)
 		return ret;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 14/32] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (12 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 13/32] drm/i915/bdw: pagetable allocation rework Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 15/32] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
                     ` (18 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

One important part of this patch is that we now write a scratch page
directory into any unused PDP descriptors. This matters for two reasons:
first, we're not allowed to just use 0 or an invalid pointer; and second,
we must wipe out any previous contents from the last context.

The latter point only matters with full PPGTT. The former point only
affects platforms with less than 4GB of memory.
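
A minimal sketch of the fallback for each of the four PDP slots (each
slot maps 1GB of the address space; scratch_pd is the field this patch
depends on):

	struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
	/* never program 0 or garbage: fall back to the scratch page directory */
	dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
	ret = gen8_write_pdp(ring, i, pd_daddr);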

v2: Updated commit message to point that we must set unused PDPs to the
scratch page.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 ++++-
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 762c535..2e4db77 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -442,8 +442,9 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
-			   uint64_t val)
+static int gen8_write_pdp(struct intel_engine_cs *ring,
+			  unsigned entry,
+			  dma_addr_t addr)
 {
 	int ret;
 
@@ -455,10 +456,10 @@ static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
 
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val >> 32));
+	intel_ring_emit(ring, upper_32_bits(addr));
 	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
 	intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
-	intel_ring_emit(ring, (u32)(val));
+	intel_ring_emit(ring, lower_32_bits(addr));
 	intel_ring_advance(ring);
 
 	return 0;
@@ -469,12 +470,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
-
-	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
-		ret = gen8_write_pdp(ring, i, addr);
+	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
+		/* The page directory might be NULL, but we need to clear out
+		 * whatever the previous context might have used. */
+		ret = gen8_write_pdp(ring, i, pd_daddr);
 		if (ret)
 			return ret;
 	}
@@ -816,10 +817,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
 
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
+
 	/* 1. Do all our allocations for page directories and page tables. */
 	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
-	if (ret)
+	if (ret) {
+		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
+	}
 
 	/*
 	 * 2. Create DMA mappings for the page directories and page tables.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1f5c136..7a16627 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -306,7 +306,10 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_entry pd;
 	};
 
-	struct i915_page_table_entry *scratch_pt;
+	union {
+		struct i915_page_table_entry *scratch_pt;
+		struct i915_page_table_entry *scratch_pd; /* Just need the daddr */
+	};
 
 	struct drm_i915_file_private *file_priv;
 
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 15/32] drm/i915: num_pd_pages/num_pd_entries isn't useful
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (13 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 14/32] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 16/32] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
                     ` (17 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

These values are not really useful once the page tables are allocated
dynamically. Getting rid of them will help prevent later confusion.

v2: Updated to use unmap_and_free_pd functions.
v3: Updated gen8_ppgtt_free after teardown logic was removed.
v4: Rebase after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 --
 drivers/gpu/drm/i915/i915_gem_gtt.c | 72 ++++++++++++-------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 ++--
 3 files changed, 28 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e8ad450..e85da9d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2149,8 +2149,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	if (!ppgtt)
 		return;
 
-	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
 	for_each_ring(ring, dev_priv, unused) {
 		seq_printf(m, "%s\n", ring->name);
 		for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2e4db77..bddfcc2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -613,9 +613,7 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		/* TODO: In the future we'll support sparse mappings, so this
-		 * will have to change. */
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
 		if (!ppgtt->pdp.page_directory[i]->daddr)
 			continue;
 
@@ -644,7 +642,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
@@ -705,21 +703,13 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 		pdp->page_directory[pdpe] = alloc_pd_single();
 		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
 			goto unwind_out;
-
-		ppgtt->num_pd_pages++;
 	}
 
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
 	return 0;
 
 unwind_out:
-	while (pdpe--) {
+	while (pdpe--)
 		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
-		ppgtt->num_pd_pages--;
-	}
-
-	WARN_ON(ppgtt->num_pd_pages);
 
 	return -ENOMEM;
 }
@@ -742,12 +732,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 						ppgtt->base.dev);
 		if (ret)
 			goto err_out;
-
-		ppgtt->num_pd_entries += GEN8_PDES_PER_PAGE;
 	}
 
-	BUG_ON(pdpe > ppgtt->num_pd_pages);
-
 	return 0;
 
 err_out:
@@ -808,7 +794,6 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -872,12 +857,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
-	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
-			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-			 ppgtt->num_pd_entries,
-			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
 
 bail:
@@ -888,26 +867,20 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
-	gen6_gtt_pte_t __iomem *pd_addr;
+	struct i915_page_table_entry *unused;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
-	int pte, pde;
+	uint32_t  pte, pde, temp;
+	uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
 
-	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
-	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd.pd_offset,
-		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
-	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->daddr;
-		pd_entry = readl(pd_addr + pde);
+		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
 		if (pd_entry != expected)
@@ -1189,12 +1162,15 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 
 static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
+	struct i915_page_table_entry *pt;
+	uint32_t pde;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_table[i]->daddr,
-			       4096, PCI_DMA_BIDIRECTIONAL);
+	gen6_for_all_pdes(pt, ppgtt, pde) {
+		if (pt != ppgtt->scratch_pt)
+			pci_unmap_page(ppgtt->base.dev->pdev,
+				pt->daddr,
+				4096, PCI_DMA_BIDIRECTIONAL);
+	}
 }
 
 /* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
@@ -1293,13 +1269,12 @@ unwind_out:
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct i915_page_table_entry *pt = ppgtt->pd.page_table[i];
+	struct i915_page_table_entry *pt;
+	uint32_t pde;
 
+	gen6_for_all_pdes(pt, ppgtt, pde) {
 		if (pt != ppgtt->scratch_pt)
-			unmap_and_free_pt(ppgtt->pd.page_table[i], ppgtt->base.dev);
+			unmap_and_free_pt(pt, ppgtt->base.dev);
 	}
 
 	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
@@ -1358,7 +1333,6 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
 	return 0;
 
 err_out:
@@ -1412,7 +1386,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 
 	if (aliasing) {
 		/* preallocate all pts */
-		ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+		ret = alloc_pt_range(&ppgtt->pd, 0, GEN6_PPGTT_PD_ENTRIES,
 				ppgtt->base.dev);
 
 		if (ret) {
@@ -1427,7 +1401,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd.pd_offset =
@@ -1730,7 +1704,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 				ppgtt = dev_priv->mm.aliasing_ppgtt;
 
 			gen6_write_page_range(dev_priv, &ppgtt->pd, 0,
-					      ppgtt->num_pd_entries);
+					      GEN6_PPGTT_PD_ENTRIES);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 7a16627..b53d40ca 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -299,8 +299,6 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 	struct drm_mm_node node;
 	unsigned long pd_dirty_rings;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct i915_page_directory_pointer_entry pdp;
 		struct i915_page_directory_entry pd;
@@ -338,6 +336,11 @@ struct i915_hw_ppgtt {
 	     temp = min_t(unsigned, temp, length), \
 	     start += temp, length -= temp)
 
+#define gen6_for_all_pdes(pt, ppgtt, iter)  \
+	for (iter = 0, pt = ppgtt->pd.page_table[iter];			\
+	     iter < gen6_pde_index(ppgtt->base.total);			\
+	     pt =  ppgtt->pd.page_table[++iter])
+
 static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
 {
 	const uint32_t mask = NUM_PTE(pde_shift) - 1;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 16/32] drm/i915: Extract PPGTT param from page_directory alloc
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (14 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 15/32] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 17/32] drm/i915/bdw: Split out mappings Michel Thierry
                     ` (16 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Now that we don't need to track num_pd_pages, we may as well kill all
need for the PPGTT structure in gen8_ppgtt_alloc_page_directories(). This
will be very useful when we move to 48b addressing, where the PDP is no
longer the root of the page table structure.

The param is replaced with drm_device, which is an unavoidable wart
throughout the series (in other words, this instance is not especially
egregious).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bddfcc2..678ab62 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -689,8 +689,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 				     uint64_t start,
 				     uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(pdp, struct i915_hw_ppgtt, pdp);
 	struct i915_page_directory_entry *unused;
 	uint64_t temp;
 	uint32_t pdpe;
@@ -701,7 +699,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->page_directory[pdpe] = alloc_pd_single();
-		if (IS_ERR(ppgtt->pdp.page_directory[pdpe]))
+		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
 
@@ -709,7 +707,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 unwind_out:
 	while (pdpe--)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe]);
+		unmap_and_free_pd(pdp->page_directory[pdpe]);
 
 	return -ENOMEM;
 }
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 17/32] drm/i915/bdw: Split out mappings
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (15 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 16/32] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 18/32] drm/i915/bdw: begin bitmap tracking Michel Thierry
                     ` (15 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
In particular, DMA mappings for page directories/tables occur at allocation
time.

This patch adds the functionality and calls it at init, so there should
be no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
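
In rough terms, the flow after this patch is (sketch only, using the
helpers added below):

	pd = alloc_pd_single(dev);	/* also DMA-maps pd->page now */
	if (IS_ERR(pd))
		return PTR_ERR(pd);

	/* once the page tables exist, write their PDEs with a single kmap */
	gen8_map_pagetable_range(pd, start, length, dev);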

v2: Handle renamed unmap_and_free_page functions.
v3: Updated after teardown_va logic was removed.
v4: Rebase after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 176 ++++++++++++++----------------------
 1 file changed, 69 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 678ab62..76bf2c9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -416,17 +416,20 @@ err_out:
 	return ret;
 }
 
-static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
+static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
+			       struct drm_device *dev)
 {
 	if (pd->page) {
+		i915_dma_unmap_single(pd, dev);
 		__free_page(pd->page);
 		kfree(pd);
 	}
 }
 
-static struct i915_page_directory_entry *alloc_pd_single(void)
+static struct i915_page_directory_entry *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_page_directory_entry *pd;
+	int ret;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
@@ -438,6 +441,13 @@ static struct i915_page_directory_entry *alloc_pd_single(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = i915_dma_map_single(pd, dev);
+	if (ret) {
+		__free_page(pd->page);
+		kfree(pd);
+		return ERR_PTR(ret);
+	}
+
 	return pd;
 }
 
@@ -592,6 +602,36 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+			     struct i915_page_table_entry *pt,
+			     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t entry =
+		gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+	*pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
+				     uint64_t start,
+				     uint64_t length,
+				     struct drm_device *dev)
+{
+	gen8_ppgtt_pde_t *page_directory = kmap_atomic(pd->page);
+	struct i915_page_table_entry *pt;
+	uint64_t temp, pde;
+
+	gen8_for_each_pde(pt, pd, start, length, temp, pde)
+		__gen8_do_map_pt(page_directory + pde, pt, dev);
+
+	if (!HAS_LLC(dev))
+		drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+	kunmap_atomic(page_directory);
+}
+
 static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
@@ -647,7 +687,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 			continue;
 
 		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
+		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 	}
 }
 
@@ -687,7 +727,8 @@ unwind_out:
 
 static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
-				     uint64_t length)
+				     uint64_t length,
+				     struct drm_device *dev)
 {
 	struct i915_page_directory_entry *unused;
 	uint64_t temp;
@@ -698,7 +739,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
-		pdp->page_directory[pdpe] = alloc_pd_single();
+		pdp->page_directory[pdpe] = alloc_pd_single(dev);
 		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
@@ -707,21 +748,24 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 
 unwind_out:
 	while (pdpe--)
-		unmap_and_free_pd(pdp->page_directory[pdpe]);
+		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
 
 	return -ENOMEM;
 }
 
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    uint64_t start,
-			    uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start,
+			       uint64_t length)
 {
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length);
+	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
+					ppgtt->base.dev);
 	if (ret)
 		return ret;
 
@@ -739,128 +783,46 @@ err_out:
 	return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pd)
-{
-	dma_addr_t pd_addr;
-	int ret;
-
-	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.page_directory[pd]->page, 0,
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
-	if (ret)
-		return ret;
-
-	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
-
-	return 0;
-}
-
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pd,
-					const int pt)
-{
-	dma_addr_t pt_addr;
-	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
-	struct i915_page_table_entry *ptab = pdir->page_table[pt];
-	struct page *p = ptab->page;
-	int ret;
-
-	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	ptab->daddr = pt_addr;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
  * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
  * space.
  *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	int i, j, ret;
-
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+	struct i915_page_directory_entry *pd;
+	uint64_t temp, start = 0;
+	const uint64_t orig_length = size;
+	uint32_t pdpe;
+	int ret;
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->switch_mm = gen8_mm_switch;
 
 	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
 	if (IS_ERR(ppgtt->scratch_pd))
 		return PTR_ERR(ppgtt->scratch_pd);
 
-	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	/*
-	 * 2. Create DMA mappings for the page directories and page tables.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-		if (ret)
-			goto bail;
+	start = 0;
+	size = orig_length;
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
-			if (ret)
-				goto bail;
-		}
-	}
-
-	/*
-	 * 3. Map all the page directory entires to point to the page tables
-	 * we've allocated.
-	 *
-	 * For now, the PPGTT helper functions all require that the PDEs are
-	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
-		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_table_entry *pt = pd->page_table[j];
-			dma_addr_t addr = pt->daddr;
-			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
-						      I915_CACHE_LLC);
-		}
-		if (!HAS_LLC(ppgtt->base.dev))
-			drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
-		kunmap_atomic(pd_vaddr);
-	}
-
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	return 0;
-
-bail:
-	gen8_ppgtt_unmap_pages(ppgtt);
-	gen8_ppgtt_free(ppgtt);
-	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -1276,7 +1238,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	}
 
 	unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
-	unmap_and_free_pd(&ppgtt->pd);
+	unmap_and_free_pd(&ppgtt->pd, ppgtt->base.dev);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 18/32] drm/i915/bdw: begin bitmap tracking
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (16 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 17/32] drm/i915/bdw: Split out mappings Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 19/32] drm/i915/bdw: Dynamic page table allocations Michel Thierry
                     ` (14 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As with gen6/7, we can enable bitmap tracking alongside all the
preallocations to make sure things actually don't blow up.
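
The tracking boils down to two bitmaps plus walks over the set bits
(shapes as in the structures below):

	/* one bit per page directory in the PDP (4 for legacy 32b) */
	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);

	/* one bit per PDE in each page directory (512 entries per page) */
	unsigned long *used_pdes;

	/* teardown then only touches what was actually allocated */
	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES)
		gen8_free_page_tables(ppgtt->pdp.page_directory[i],
				      ppgtt->base.dev);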

v2: Rebased to match changes from previous patches.
v3: Without teardown logic, rely on used_pdpes and used_pdes when
freeing page tables.
v4: Rebase after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 75 ++++++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 24 ++++++++++++
 2 files changed, 81 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 76bf2c9..adf55e2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -422,6 +422,7 @@ static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
 	if (pd->page) {
 		i915_dma_unmap_single(pd, dev);
 		__free_page(pd->page);
+		kfree(pd->used_pdes);
 		kfree(pd);
 	}
 }
@@ -429,26 +430,35 @@ static void unmap_and_free_pd(struct i915_page_directory_entry *pd,
 static struct i915_page_directory_entry *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_page_directory_entry *pd;
-	int ret;
+	int ret = -ENOMEM;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
 		return ERR_PTR(-ENOMEM);
 
+	pd->used_pdes = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				sizeof(*pd->used_pdes), GFP_KERNEL);
+	if (!pd->used_pdes)
+		goto free_pd;
+
 	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pd->page) {
-		kfree(pd);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pd->page)
+		goto free_bitmap;
 
 	ret = i915_dma_map_single(pd, dev);
-	if (ret) {
-		__free_page(pd->page);
-		kfree(pd);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto free_page;
 
 	return pd;
+
+free_page:
+	__free_page(pd->page);
+free_bitmap:
+	kfree(pd->used_pdes);
+free_pd:
+	kfree(pd);
+
+	return ERR_PTR(ret);
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -639,7 +649,7 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 	if (!pd->page)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+	for_each_set_bit(i, pd->used_pdes, GEN8_PDES_PER_PAGE) {
 		if (WARN_ON(!pd->page_table[i]))
 			continue;
 
@@ -653,15 +663,18 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
-		if (!ppgtt->pdp.page_directory[i]->daddr)
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+		struct i915_page_directory_entry *pd;
+
+		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
+		pd = ppgtt->pdp.page_directory[i];
+		if (!pd->daddr)
+			pci_unmap_page(hwdev, pd->daddr, PAGE_SIZE,
+					PCI_DMA_BIDIRECTIONAL);
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
+		for_each_set_bit(j, pd->used_pdes, GEN8_PDES_PER_PAGE) {
 			struct i915_page_table_entry *pt;
 			dma_addr_t addr;
 
@@ -682,7 +695,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
@@ -725,6 +738,7 @@ unwind_out:
 	return -ENOMEM;
 }
 
+/* bitmap of new page_directories */
 static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
@@ -740,6 +754,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_
 	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
 		BUG_ON(unused);
 		pdp->page_directory[pdpe] = alloc_pd_single(dev);
+
 		if (IS_ERR(pdp->page_directory[pdpe]))
 			goto unwind_out;
 	}
@@ -760,10 +775,13 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_entry *pd;
+	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
 	int ret;
 
+	/* Do the allocations first so we can easily bail out */
 	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
 					ppgtt->base.dev);
 	if (ret)
@@ -776,6 +794,27 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 			goto err_out;
 	}
 
+	/* Now mark everything we've touched as used. This doesn't allow for
+	 * robust error checking, but it makes the code a hell of a lot simpler.
+	 */
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		struct i915_page_table_entry *pt;
+		uint64_t pd_len = gen8_clamp_pd(start, length);
+		uint64_t pd_start = start;
+		uint32_t pde;
+
+		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
+			bitmap_set(pd->page_table[pde]->used_ptes,
+				   gen8_pte_index(start),
+				   gen8_pte_count(start, length));
+			set_bit(pde, pd->used_pdes);
+		}
+		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+	}
+
 	return 0;
 
 err_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b53d40ca..fd84bbc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -205,11 +205,13 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
+	unsigned long *used_pdes;
 	struct i915_page_table_entry *page_table[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
 	/* struct page *page; */
+	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
 	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
 };
 
@@ -436,6 +438,28 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
 	BUG(); /* For 64B */
 }
 
+static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
+{
+	return i915_pte_count(addr, length, GEN8_PDE_SHIFT);
+}
+
+static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
+{
+	const uint32_t pdp_shift = GEN8_PDE_SHIFT + 9;
+	const uint64_t mask = ~((1 << pdp_shift) - 1);
+	uint64_t end;
+
+	BUG_ON(length == 0);
+	BUG_ON(offset_in_page(addr|length));
+
+	end = addr + length;
+
+	if ((addr & mask) != (end & mask))
+		return GEN8_PDES_PER_PAGE - i915_pde_index(addr, GEN8_PDE_SHIFT);
+
+	return i915_pde_index(end, GEN8_PDE_SHIFT) - i915_pde_index(addr, GEN8_PDE_SHIFT);
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 19/32] drm/i915/bdw: Dynamic page table allocations
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (17 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 18/32] drm/i915/bdw: begin bitmap tracking Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 20/32] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
                     ` (13 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This finishes off the dynamic page table allocations, in the legacy
3-level style that already exists. Almost everything has already been
set up at this point; this patch finishes off the enabling by setting
the appropriate function pointers.
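
The new hook is meant to be consumed by the binding path; a sketch of
what such a caller looks like (assumed shape, added earlier in the
series rather than in this patch):

	if (vma->vm->allocate_va_range) {
		/* full PPGTT: back the VA range with page tables on demand */
		ret = vma->vm->allocate_va_range(vma->vm, vma->node.start,
						 vma->node.size);
		if (ret)
			return ret;
	}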

v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.

v3: Rebase.

v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).

v5: Similar to gen6, in init, gen8_ppgtt_clear_range call is only needed
for aliasing ppgtt. Zombie tracking was originally added for teardown
function and is no longer required.

v6: Update err_out case in gen8_alloc_va_range (missed from lastest
rebase).

v7: Rebase after s/page_tables/page_table/.

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 300 +++++++++++++++++++++++++++++-------
 1 file changed, 246 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index adf55e2..b9dfc56 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -612,7 +612,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	}
 }
 
-static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 			     struct i915_page_table_entry *pt,
 			     struct drm_device *dev)
 {
@@ -629,7 +629,7 @@ static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
 				     uint64_t length,
 				     struct drm_device *dev)
 {
-	gen8_ppgtt_pde_t *page_directory = kmap_atomic(pd->page);
+	gen8_ppgtt_pde_t * const page_directory = kmap_atomic(pd->page);
 	struct i915_page_table_entry *pt;
 	uint64_t temp, pde;
 
@@ -713,58 +713,163 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_alloc_pagetabs(struct i915_page_directory_entry *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pd:		Page directory for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pts:	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_page_directories(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_page_directories(), it is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory_entry *pd,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pts)
 {
-	struct i915_page_table_entry *unused;
+	struct i915_page_table_entry *pt;
 	uint64_t temp;
 	uint32_t pde;
 
-	gen8_for_each_pde(unused, pd, start, length, temp, pde) {
-		BUG_ON(unused);
-		pd->page_table[pde] = alloc_pt_single(dev);
-		if (IS_ERR(pd->page_table[pde]))
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		/* Don't reallocate page tables */
+		if (pt) {
+			/* Scratch is never allocated this way */
+			WARN_ON(pt->scratch);
+			continue;
+		}
+
+		pt = alloc_pt_single(ppgtt->base.dev);
+		if (IS_ERR(pt))
 			goto unwind_out;
+
+		pd->page_table[pde] = pt;
+		set_bit(pde, new_pts);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pde--)
-		unmap_and_free_pt(pd->page_table[pde], dev);
+	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
+		unmap_and_free_pt(pd->page_table[pde], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
-/* bitmap of new page_directories */
-static int gen8_ppgtt_alloc_page_directories(struct i915_page_directory_pointer_entry *pdp,
+/**
+ * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
+ * @ppgtt:	Master ppgtt structure.
+ * @pdp:	Page directory pointer for this address range.
+ * @start:	Starting virtual address to begin allocations.
+ * @length	Size of the allocations.
+ * @new_pds	Bitmap set by function with new allocations. Likely used by the
+ *		caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pde index of
+ * @start, and ending at the pde index @start + @length. This function will skip
+ * over already allocated page directories within the range, and only allocate
+ * new ones, setting the appropriate pointer within the pdp as well as the
+ * correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a given page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller. This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
+				     struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
-				     struct drm_device *dev)
+				     unsigned long *new_pds)
 {
-	struct i915_page_directory_entry *unused;
+	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
 
+	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
 
-	gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
-		BUG_ON(unused);
-		pdp->page_directory[pdpe] = alloc_pd_single(dev);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+		if (pd)
+			continue;
 
-		if (IS_ERR(pdp->page_directory[pdpe]))
+		pd = alloc_pd_single(ppgtt->base.dev);
+		if (IS_ERR(pd))
 			goto unwind_out;
+
+		pdp->page_directory[pdpe] = pd;
+		set_bit(pdpe, new_pds);
 	}
 
 	return 0;
 
 unwind_out:
-	while (pdpe--)
-		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
+	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+		unmap_and_free_pd(pdp->page_directory[pdpe], ppgtt->base.dev);
+
+	return -ENOMEM;
+}
+
+static inline void
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+{
+	int i;
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+		kfree(new_pts[i]);
+	kfree(new_pts);
+	kfree(new_pds);
+}
+
+/* Fills in the page directory bitmap, and the array of page table bitmaps. Both
+ * of these are based on the number of PDPEs in the system.
+ */
+int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
+					 unsigned long ***new_pts)
+{
+	int i;
+	unsigned long *pds;
+	unsigned long **pts;
+
+	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	if (!pds)
+		return -ENOMEM;
+
+	pts = kcalloc(GEN8_PDES_PER_PAGE, sizeof(unsigned long *), GFP_KERNEL);
+	if (!pts) {
+		kfree(pds);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
+				 sizeof(unsigned long), GFP_KERNEL);
+		if (!pts[i])
+			goto err_out;
+	}
+
+	*new_pds = pds;
+	*new_pts = (unsigned long **)pts;
 
+	return 0;
+
+err_out:
+	free_gen8_temp_bitmaps(pds, pts);
 	return -ENOMEM;
 }
 
@@ -774,6 +879,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	unsigned long *new_page_dirs, **new_page_tables;
 	struct i915_page_directory_entry *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -781,44 +887,99 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	uint32_t pdpe;
 	int ret;
 
-	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_page_directories(&ppgtt->pdp, start, length,
-					ppgtt->base.dev);
+#ifndef CONFIG_64BIT
+	/* Disallow 64b address on 32b platforms. Nothing is wrong with doing
+	 * this in hardware, but a lot of the drm code is not prepared to handle
+	 * 64b offset on 32b platforms.
+	 * This will be addressed when 48b PPGTT is added */
+	if (start + length > 0x100000000ULL)
+		return -E2BIG;
+#endif
+
+	/* Wrap is never okay since we can only represent 48b, and we don't
+	 * actually use the other side of the canonical address space.
+	 */
+	if (WARN_ON(start + length < start))
+		return -ERANGE;
+
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
 	if (ret)
 		return ret;
 
+	/* Do the allocations first so we can easily bail out */
+	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
+					new_page_dirs);
+	if (ret) {
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		return ret;
+	}
+
+	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
-						ppgtt->base.dev);
+		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
+		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
 	}
 
-	/* Now mark everything we've touched as used. This doesn't allow for
-	 * robust error checking, but it makes the code a hell of a lot simpler.
-	 */
 	start = orig_start;
 	length = orig_length;
 
+	/* Allocations have completed successfully, so set the bitmaps, and do
+	 * the mappings. */
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		gen8_ppgtt_pde_t *const page_directory = kmap_atomic(pd->page);
 		struct i915_page_table_entry *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
 		uint32_t pde;
 
-		gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
-			bitmap_set(pd->page_table[pde]->used_ptes,
-				   gen8_pte_index(start),
-				   gen8_pte_count(start, length));
+		/* Every pd should be allocated, we just did that above. */
+		BUG_ON(!pd);
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+			/* Same reasoning as pd */
+			BUG_ON(!pt);
+			BUG_ON(!pd_len);
+			BUG_ON(!gen8_pte_count(pd_start, pd_len));
+
+			/* Set our used ptes within the page table */
+			bitmap_set(pt->used_ptes,
+				   gen8_pte_index(pd_start),
+				   gen8_pte_count(pd_start, pd_len));
+
+			/* Our pde is now pointing to the pagetable, pt */
 			set_bit(pde, pd->used_pdes);
+
+			/* Map the PDE to the page table */
+			__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
+
+			/* NB: We haven't yet mapped ptes to pages. At this
+			 * point we're still relying on insert_entries() */
 		}
+
+		if (!HAS_LLC(vm->dev))
+			drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+		kunmap_atomic(page_directory);
+
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
 
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return 0;
 
 err_out:
-	gen8_ppgtt_free(ppgtt);
+	while (pdpe--) {
+		for_each_set_bit(temp, new_page_tables[pdpe], GEN8_PDES_PER_PAGE)
+			unmap_and_free_pt(pd->page_table[temp], vm->dev);
+	}
+
+	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	return ret;
 }
 
@@ -829,38 +990,67 @@ err_out:
  * space.
  *
  */
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	struct i915_page_directory_entry *pd;
-	uint64_t temp, start = 0;
-	const uint64_t orig_length = size;
-	uint32_t pdpe;
-	int ret;
+	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pd))
+		return PTR_ERR(ppgtt->scratch_pd);
 
 	ppgtt->base.start = 0;
 	ppgtt->base.total = size;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
-	if (IS_ERR(ppgtt->scratch_pd))
-		return PTR_ERR(ppgtt->scratch_pd);
+	return 0;
+}
+
+static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_page_directory_entry *pd;
+	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
+	uint32_t pdpe;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
 
+	/* Aliasing PPGTT has to always work and be mapped because of the way we
+	 * use RESTORE_INHIBIT in the context switch. This will be fixed
+	 * eventually. */
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
 		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 		return ret;
 	}
 
-	start = 0;
-	size = orig_length;
-
 	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
 
+	ppgtt->base.allocate_va_range = NULL;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+
+	return 0;
+}
+
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	if (ret)
+		return ret;
+
+	ppgtt->base.allocate_va_range = gen8_alloc_va_range;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+
 	return 0;
 }
 
@@ -1395,7 +1585,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 		}
 	}
 
-	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+	ppgtt->base.allocate_va_range = aliasing ? NULL : gen6_alloc_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1436,8 +1626,10 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
 
 	if (INTEL_INFO(dev)->gen < 8)
 		return gen6_ppgtt_init(ppgtt, aliasing);
+	else if (aliasing)
+		return gen8_aliasing_ppgtt_init(ppgtt);
 	else
-		return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
+		return gen8_ppgtt_init(ppgtt);
 }
 int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 {
@@ -1546,10 +1738,10 @@ ppgtt_bind_vma(struct i915_vma *vma,
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	vma->vm->clear_range(vma->vm,
-			     vma->node.start,
-			     vma->obj->base.size,
-			     true);
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     vma->obj->base.size,
+				     true);
 }
 
 extern int intel_iommu_gfx_mapped;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 20/32] drm/i915/bdw: Support dynamic pdp updates in lrc mode
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (18 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 19/32] drm/i915/bdw: Dynamic page table allocations Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 21/32] drm/i915/bdw: Make pdp allocation more dynamic Michel Thierry
                     ` (12 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

Logical ring contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet.

Check if PDPs have been allocated and use the scratch page if they do
not exist yet.

Before submission, update the PDPs in the logical ring context once the
PDPs have been allocated.
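
The per-PDP choice can be summarized by a small helper like this
(illustrative only, not part of the patch):

	static dma_addr_t ctx_pdp_daddr(struct i915_hw_ppgtt *ppgtt, int n)
	{
		/* unallocated PDPs must still point somewhere valid */
		if (test_bit(n, ppgtt->pdp.used_pdpes))
			return ppgtt->pdp.page_directory[n]->daddr;
		return ppgtt->scratch_pd->daddr;
	}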

v2: Renamed commit title (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 80 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bc9c7c3..f461631 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -320,6 +320,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 
 static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 				    struct drm_i915_gem_object *ring_obj,
+				    struct i915_hw_ppgtt *ppgtt,
 				    u32 tail)
 {
 	struct page *page;
@@ -331,6 +332,40 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
+	/* True PPGTT with dynamic page allocation: update PDP registers and
+	 * point the unallocated PDPs to the scratch page
+	 */
+	if (ppgtt) {
+		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		} else {
+			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		} else {
+			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		} else {
+			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		} else {
+			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+	}
+
 	kunmap_atomic(reg_state);
 
 	return 0;
@@ -349,7 +384,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 	WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
 
-	execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+	execlists_update_context(ctx_obj0, ringbuf0->obj, to0->ppgtt, tail0);
 
 	if (to1) {
 		ringbuf1 = to1->engine[ring->id].ringbuf;
@@ -358,7 +393,7 @@ static void execlists_submit_contexts(struct intel_engine_cs *ring,
 		WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
 		WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
 
-		execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
+		execlists_update_context(ctx_obj1, ringbuf1->obj, to1->ppgtt, tail1);
 	}
 
 	execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -1735,14 +1770,39 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
-	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+
+	/* With dynamic page allocation, PDPs may not be allocated at this point,
+	 * Point the unallocated PDPs to the scratch page
+	 */
+	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+	} else {
+		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+	} else {
+		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+	} else {
+		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	} else {
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+	}
+
 	if (ring->id == RCS) {
 		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 21/32] drm/i915/bdw: Make pdp allocation more dynamic
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (19 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 20/32] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 22/32] drm/i915/bdw: Abstract PDP usage Michel Thierry
                     ` (11 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This transitional patch doesn't do much for the existing code. However,
it should make the upcoming patches that use the full 48b address space
a bit easier to swallow. The patch also introduces the PML4, i.e. the
new top level structure of the page tables.
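
For orientation, this is how a 48b virtual address decomposes across the
four levels, using the shifts this patch adds to i915_gem_gtt.h (a
summary, not new code):

/*   47..39  PML4E index (512 entries, 512GB per entry)
 *   38..30  PDPE index  (512 entries, 1GB per entry; legacy 32b mode
 *                        only ever uses 4 of them)
 *   29..21  PDE index   (512 entries, 2MB per entry)
 *   20..12  PTE index   (512 entries, 4KB pages)
 *   11..0   offset within the page
 */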

v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).

v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.

v4: Rebase after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_drv.h     |   7 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 109 +++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  41 +++++++++++---
 3 files changed, 127 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3cc0196..662d6c1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2433,7 +2433,12 @@ struct drm_i915_cmd_table {
 #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
 #define HAS_LOGICAL_RING_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 8)
 #define USES_PPGTT(dev)		(i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev)	(i915.enable_ppgtt >= 2)
+#ifdef CONFIG_64BIT
+# define USES_FULL_48BIT_PPGTT(dev)	(i915.enable_ppgtt == 3)
+#else
+# define USES_FULL_48BIT_PPGTT(dev)	false
+#endif
 
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b9dfc56..2a453fd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -100,10 +100,18 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 {
 	bool has_aliasing_ppgtt;
 	bool has_full_ppgtt;
+	bool has_full_64bit_ppgtt;
 
 	has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
 	has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
 
+#ifdef CONFIG_64BIT
+	has_full_64bit_ppgtt = IS_BROADWELL(dev) ||
+				INTEL_INFO(dev)->gen >= 9 && false; /* FIXME: 64b */
+#else
+	has_full_64bit_ppgtt = false;
+#endif
+
 	if (intel_vgpu_active(dev))
 		has_full_ppgtt = false; /* emulation is too hard */
 
@@ -121,6 +129,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 	if (enable_ppgtt == 2 && has_full_ppgtt)
 		return 2;
 
+	if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+		return 3;
+
 #ifdef CONFIG_INTEL_IOMMU
 	/* Disable ppgtt on SNB if VT-d is on. */
 	if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -461,6 +472,45 @@ free_pd:
 	return ERR_PTR(ret);
 }
 
+static void __pdp_fini(struct i915_page_directory_pointer_entry *pdp)
+{
+	kfree(pdp->used_pdpes);
+	kfree(pdp->page_directory);
+	/* HACK */
+	pdp->page_directory = NULL;
+}
+
+static void unmap_and_free_pdp(struct i915_page_directory_pointer_entry *pdp,
+			    struct drm_device *dev)
+{
+	__pdp_fini(pdp);
+	if (USES_FULL_48BIT_PPGTT(dev))
+		kfree(pdp);
+}
+
+static int __pdp_init(struct i915_page_directory_pointer_entry *pdp,
+		      struct drm_device *dev)
+{
+	size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+	pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+				  sizeof(unsigned long),
+				  GFP_KERNEL);
+	if (!pdp->used_pdpes)
+		return -ENOMEM;
+
+	pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory), GFP_KERNEL);
+	if (!pdp->page_directory) {
+		kfree(pdp->used_pdpes);
+		/* the PDP might be the statically allocated top level. Keep it
+		 * as clean as possible */
+		pdp->used_pdpes = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring,
 			  unsigned entry,
@@ -490,7 +540,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
 	int i, ret;
 
-	for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+	for (i = 3; i >= 0; i--) {
 		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
 		dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
 		/* The page directory might be NULL, but we need to clear out
@@ -579,9 +629,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
-			break;
-
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
 			struct i915_page_table_entry *pt = pd->page_table[pde];
@@ -663,7 +710,8 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+			I915_PDPES_PER_PDP(ppgtt->base.dev)) {
 		struct i915_page_directory_entry *pd;
 
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
@@ -695,13 +743,15 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
 		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
 			continue;
 
 		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
 	}
+	unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -798,8 +848,9 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
+	size_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
 
-	BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
 	/* FIXME: PPGTT container_of won't work for 64b */
 	BUG_ON((start + length) > 0x800000000ULL);
@@ -819,18 +870,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_pds, pdpes)
 		unmap_and_free_pd(pdp->page_directory[pdpe], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
 static inline void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
+		       size_t pdpes)
 {
 	int i;
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+	for (i = 0; i < pdpes; i++)
 		kfree(new_pts[i]);
 	kfree(new_pts);
 	kfree(new_pds);
@@ -840,13 +892,14 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
  * of these are based on the number of PDPEs in the system.
  */
 int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
-					 unsigned long ***new_pts)
+					 unsigned long ***new_pts,
+					 size_t pdpes)
 {
 	int i;
 	unsigned long *pds;
 	unsigned long **pts;
 
-	pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
 	if (!pds)
 		return -ENOMEM;
 
@@ -856,7 +909,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+	for (i = 0; i < pdpes; i++) {
 		pts[i] = kcalloc(BITS_TO_LONGS(GEN8_PDES_PER_PAGE),
 				 sizeof(unsigned long), GFP_KERNEL);
 		if (!pts[i])
@@ -869,7 +922,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts);
+	free_gen8_temp_bitmaps(pds, pts, pdpes);
 	return -ENOMEM;
 }
 
@@ -885,6 +938,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	const uint64_t orig_length = length;
 	uint64_t temp;
 	uint32_t pdpe;
+	size_t pdpes = I915_PDPES_PER_PDP(dev);
 	int ret;
 
 #ifndef CONFIG_64BIT
@@ -902,7 +956,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	if (WARN_ON(start + length < start))
 		return -ERANGE;
 
-	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+	ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
 	if (ret)
 		return ret;
 
@@ -910,7 +964,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
 					new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
@@ -967,7 +1021,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		set_bit(pdpe, ppgtt->pdp.used_pdpes);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return 0;
 
 err_out:
@@ -976,13 +1030,19 @@ err_out:
 			unmap_and_free_pt(pd->page_table[temp], vm->dev);
 	}
 
-	for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return ret;
 }
 
+static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
+{
+	unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+	unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
+}
+
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -1003,6 +1063,15 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->switch_mm = gen8_mm_switch;
 
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		int ret = __pdp_init(&ppgtt->pdp, false);
+		if (ret) {
+			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+			return ret;
+		}
+	} else
+		return -EPERM; /* Not yet implemented */
+
 	return 0;
 }
 
@@ -1024,7 +1093,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	 * eventually. */
 	ret = gen8_alloc_va_range(&ppgtt->base, start, size);
 	if (ret) {
-		unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+		gen8_ppgtt_fini_common(ppgtt);
 		return ret;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fd84bbc..1004e0f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -85,8 +85,12 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
  * The difference as compared to normal x86 3 level page table is the PDPEs are
  * programmed via register.
  */
+#define GEN8_PML4ES_PER_PML4		512
+#define GEN8_PML4E_SHIFT		39
 #define GEN8_PDPE_SHIFT			30
-#define GEN8_PDPE_MASK			0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK			0x1ff
 #define GEN8_PDE_SHIFT			21
 #define GEN8_PDE_MASK			0x1ff
 #define GEN8_PTE_SHIFT			12
@@ -95,6 +99,13 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
+#ifdef CONFIG_64BIT
+# define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+		GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
+#else
+# define I915_PDPES_PER_PDP		GEN8_LEGACY_PDPES
+#endif
+
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
 #define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
@@ -210,9 +221,17 @@ struct i915_page_directory_entry {
 };
 
 struct i915_page_directory_pointer_entry {
-	/* struct page *page; */
-	DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
-	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
+	struct page *page;
+	dma_addr_t daddr;
+	unsigned long *used_pdpes;
+	struct i915_page_directory_entry **page_directory;
+};
+
+struct i915_pml4 {
+	struct page *page;
+	dma_addr_t daddr;
+	DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+	struct i915_page_directory_pointer_entry *pdps[GEN8_PML4ES_PER_PML4];
 };
 
 struct i915_address_space {
@@ -302,8 +321,9 @@ struct i915_hw_ppgtt {
 	struct drm_mm_node node;
 	unsigned long pd_dirty_rings;
 	union {
-		struct i915_page_directory_pointer_entry pdp;
-		struct i915_page_directory_entry pd;
+		struct i915_pml4 pml4;		/* GEN8+ & 64b PPGTT */
+		struct i915_page_directory_pointer_entry pdp;	/* GEN8+ */
+		struct i915_page_directory_entry pd;		/* GEN6-7 */
 	};
 
 	union {
@@ -399,14 +419,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
-#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
-	for (iter = gen8_pdpe_index(start), pd = (pdp)->page_directory[iter];	\
-	     length > 0 && iter < GEN8_LEGACY_PDPES;			\
+#define gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, b)	\
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->page_directory[iter]; \
+	     length > 0 && (iter < b);					\
 	     pd = (pdp)->page_directory[++iter],				\
 	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,	\
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
+	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
+
 /* Clamp length to the next page_directory boundary */
 static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 22/32] drm/i915/bdw: Abstract PDP usage
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (20 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 21/32] drm/i915/bdw: Make pdp allocation more dynamic Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 23/32] drm/i915/bdw: Add dynamic page trace events Michel Thierry
                     ` (10 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addressing acted as if there were a single PDP with 4 PDPEs.

In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e.
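
Roughly, the two root layouts end up as (4 or 512 PDPEs per PDP,
depending on legacy vs. 48b mode):

	legacy 32b: ppgtt->pdp                  -> pd -> pt -> page
	full 48b:   ppgtt->pml4 -> pdps[0..511] -> pd -> pt -> page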

This patch addresses some carelessness that crept in during development
wrt the assumptions made about the root page tables.

v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 123 ++++++++++++++++++++----------------
 1 file changed, 70 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2a453fd..50583a4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -560,6 +560,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -575,10 +576,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		struct i915_page_table_entry *pt;
 		struct page *page_table;
 
-		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+		if (WARN_ON(!pdp->page_directory[pdpe]))
 			continue;
 
-		pd = ppgtt->pdp.page_directory[pdpe];
+		pd = pdp->page_directory[pdpe];
 
 		if (WARN_ON(!pd->page_table[pde]))
 			continue;
@@ -620,6 +621,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	gen8_gtt_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -630,7 +632,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL) {
-			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
+			struct i915_page_directory_entry *pd = pdp->page_directory[pdpe];
 			struct i915_page_table_entry *pt = pd->page_table[pde];
 			struct page *page_table = pt->page;
 
@@ -708,16 +710,17 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	int i, j;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+	for_each_set_bit(i, pdp->used_pdpes,
 			I915_PDPES_PER_PDP(ppgtt->base.dev)) {
 		struct i915_page_directory_entry *pd;
 
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+		if (WARN_ON(!pdp->page_directory[i]))
 			continue;
 
-		pd = ppgtt->pdp.page_directory[i];
+		pd = pdp->page_directory[i];
 		if (!pd->daddr)
 			pci_unmap_page(hwdev, pd->daddr, PAGE_SIZE,
 					PCI_DMA_BIDIRECTIONAL);
@@ -743,15 +746,21 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-				I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-			continue;
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
+			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+				continue;
 
-		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
-		unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+			gen8_free_page_tables(ppgtt->pdp.page_directory[i],
+					      ppgtt->base.dev);
+			unmap_and_free_pd(ppgtt->pdp.page_directory[i],
+					  ppgtt->base.dev);
+		}
+		unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
+	} else {
+		BUG(); /* to be implemented later */
 	}
-	unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -765,7 +774,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 /**
  * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pd:		Page directory for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -781,12 +790,13 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 				     struct i915_page_directory_entry *pd,
 				     uint64_t start,
 				     uint64_t length,
 				     unsigned long *new_pts)
 {
+	struct drm_device *dev = vm->dev;
 	struct i915_page_table_entry *pt;
 	uint64_t temp;
 	uint32_t pde;
@@ -799,7 +809,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 			continue;
 		}
 
-		pt = alloc_pt_single(ppgtt->base.dev);
+		pt = alloc_pt_single(dev);
 		if (IS_ERR(pt))
 			goto unwind_out;
 
@@ -811,14 +821,14 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pde, new_pts, GEN8_PDES_PER_PAGE)
-		unmap_and_free_pt(pd->page_table[pde], ppgtt->base.dev);
+		unmap_and_free_pt(pd->page_table[pde], dev);
 
 	return -ENOMEM;
 }
 
 /**
  * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
- * @ppgtt:	Master ppgtt structure.
+ * @vm:		Master vm structure.
  * @pdp:	Page directory pointer for this address range.
  * @start:	Starting virtual address to begin allocations.
  * @length	Size of the allocations.
@@ -839,16 +849,17 @@ unwind_out:
  *
  * Return: 0 if success; negative error code otherwise.
  */
-static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
 				     struct i915_page_directory_pointer_entry *pdp,
 				     uint64_t start,
 				     uint64_t length,
 				     unsigned long *new_pds)
 {
+	struct drm_device *dev = vm->dev;
 	struct i915_page_directory_entry *pd;
 	uint64_t temp;
 	uint32_t pdpe;
-	size_t pdpes =  I915_PDPES_PER_PDP(ppgtt->base.dev);
+	size_t pdpes =  I915_PDPES_PER_PDP(vm->dev);
 
 	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
@@ -859,7 +870,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 		if (pd)
 			continue;
 
-		pd = alloc_pd_single(ppgtt->base.dev);
+		pd = alloc_pd_single(dev);
 		if (IS_ERR(pd))
 			goto unwind_out;
 
@@ -871,7 +882,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	for_each_set_bit(pdpe, new_pds, pdpes)
-		unmap_and_free_pd(pdp->page_directory[pdpe], ppgtt->base.dev);
+		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
 
 	return -ENOMEM;
 }
@@ -926,13 +937,13 @@ err_out:
 	return -ENOMEM;
 }
 
-static int gen8_alloc_va_range(struct i915_address_space *vm,
-			       uint64_t start,
-			       uint64_t length)
+static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
+				    struct i915_page_directory_pointer_entry *pdp,
+				    uint64_t start,
+				    uint64_t length)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	unsigned long *new_page_dirs, **new_page_tables;
+	struct drm_device *dev = vm->dev;
 	struct i915_page_directory_entry *pd;
 	const uint64_t orig_start = start;
 	const uint64_t orig_length = length;
@@ -961,17 +972,15 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		return ret;
 
 	/* Do the allocations first so we can easily bail out */
-	ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
-					new_page_dirs);
+	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, new_page_dirs);
 	if (ret) {
 		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 		return ret;
 	}
 
-	/* For every page directory referenced, allocate page tables */
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		bitmap_zero(new_page_tables[pdpe], GEN8_PDES_PER_PAGE);
-		ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
 						new_page_tables[pdpe]);
 		if (ret)
 			goto err_out;
@@ -980,10 +989,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	start = orig_start;
 	length = orig_length;
 
-	/* Allocations have completed successfully, so set the bitmaps, and do
-	 * the mappings. */
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
-		gen8_ppgtt_pde_t *const page_directory = kmap_atomic(pd->page);
+	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		struct i915_page_table_entry *pt;
 		uint64_t pd_len = gen8_clamp_pd(start, length);
 		uint64_t pd_start = start;
@@ -1005,20 +1011,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 
 			/* Our pde is now pointing to the pagetable, pt */
 			set_bit(pde, pd->used_pdes);
-
-			/* Map the PDE to the page table */
-			__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
-
-			/* NB: We haven't yet mapped ptes to pages. At this
-			 * point we're still relying on insert_entries() */
 		}
 
-		if (!HAS_LLC(vm->dev))
-			drm_clflush_virt_range(page_directory, PAGE_SIZE);
-
-		kunmap_atomic(page_directory);
-
-		set_bit(pdpe, ppgtt->pdp.used_pdpes);
+		set_bit(pdpe, pdp->used_pdpes);
+		gen8_map_pagetable_range(pd, start, length, dev);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1027,16 +1023,36 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 err_out:
 	while (pdpe--) {
 		for_each_set_bit(temp, new_page_tables[pdpe], GEN8_PDES_PER_PAGE)
-			unmap_and_free_pt(pd->page_table[temp], vm->dev);
+			unmap_and_free_pt(pd->page_table[temp], dev);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
-		unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+		unmap_and_free_pd(pdp->page_directory[pdpe], dev);
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
 	return ret;
 }
 
+static int __noreturn gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+					       struct i915_pml4 *pml4,
+					       uint64_t start,
+					       uint64_t length)
+{
+	BUG(); /* to be implemented later */
+}
+
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev))
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+	else
+		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+}
+
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
 {
 	unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
@@ -1079,12 +1095,13 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
 	struct i915_page_directory_entry *pd;
 	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
 	uint32_t pdpe;
 	int ret;
 
-	ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+	ret = gen8_ppgtt_init_common(ppgtt, size);
 	if (ret)
 		return ret;
 
@@ -1097,8 +1114,8 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
-		gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
+	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
+		gen8_map_pagetable_range(pd, start, size, dev);
 
 	ppgtt->base.allocate_va_range = NULL;
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 23/32] drm/i915/bdw: Add dynamic page trace events
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (21 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 22/32] drm/i915/bdw: Abstract PDP usage Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 24/32] drm/i915/bdw: Add ppgtt info for dynamic pages Michel Thierry
                     ` (9 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The dynamic page allocation patch series added these trace events for
GEN6; this patch adds them for GEN8.
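
These are ordinary tracepoints, so (assuming the usual tracefs/debugfs
layout) they can be toggled at runtime like the existing GEN6 ones, e.g.
via events/i915/i915_page_directory_entry_alloc/enable under the tracing
mount point.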

v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.
v4: Rebase after s/page_tables/page_table/.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 23 +++++++++++++++--------
 drivers/gpu/drm/i915/i915_trace.h   | 16 ++++++++++++++++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 50583a4..f613377 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -673,19 +673,24 @@ static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 /* It's likely we'll map more than one pagetable at a time. This function will
  * save us unnecessary kmap calls, but do no more functionally than multiple
  * calls to map_pt. */
-static void gen8_map_pagetable_range(struct i915_page_directory_entry *pd,
+static void gen8_map_pagetable_range(struct i915_address_space *vm,
+				     struct i915_page_directory_entry *pd,
 				     uint64_t start,
-				     uint64_t length,
-				     struct drm_device *dev)
+				     uint64_t length)
 {
 	gen8_ppgtt_pde_t * const page_directory = kmap_atomic(pd->page);
 	struct i915_page_table_entry *pt;
 	uint64_t temp, pde;
 
-	gen8_for_each_pde(pt, pd, start, length, temp, pde)
-		__gen8_do_map_pt(page_directory + pde, pt, dev);
+	gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+		__gen8_do_map_pt(page_directory + pde, pt, vm->dev);
+		trace_i915_page_table_entry_map(vm, pde, pt,
+					 gen8_pte_index(start),
+					 gen8_pte_count(start, length),
+					 GEN8_PTES_PER_PAGE);
+	}
 
-	if (!HAS_LLC(dev))
+	if (!HAS_LLC(vm->dev))
 		drm_clflush_virt_range(page_directory, PAGE_SIZE);
 
 	kunmap_atomic(page_directory);
@@ -815,6 +820,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
 
 		pd->page_table[pde] = pt;
 		set_bit(pde, new_pts);
+		trace_i915_page_table_entry_alloc(vm, pde, start, GEN8_PDE_SHIFT);
 	}
 
 	return 0;
@@ -876,6 +882,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
 
 		pdp->page_directory[pdpe] = pd;
 		set_bit(pdpe, new_pds);
+		trace_i915_page_directory_entry_alloc(vm, pdpe, start, GEN8_PDPE_SHIFT);
 	}
 
 	return 0;
@@ -1014,7 +1021,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 		}
 
 		set_bit(pdpe, pdp->used_pdpes);
-		gen8_map_pagetable_range(pd, start, length, dev);
+		gen8_map_pagetable_range(vm, pd, start, length);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1115,7 +1122,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	}
 
 	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
-		gen8_map_pagetable_range(pd, start, size, dev);
+		gen8_map_pagetable_range(&ppgtt->base, pd,start, size);
 
 	ppgtt->base.allocate_va_range = NULL;
 	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 0038dc2..10cd830 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -214,6 +214,22 @@ DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
 	     TP_ARGS(vm, pde, start, pde_shift)
 );
 
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_entry_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+		   TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+		   TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_pointer_entry_alloc,
+		   TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+		   TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+		   TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+			     __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
 /* Avoid extra math because we only support two sizes. The format is defined by
  * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
 #define TRACE_PT_SIZE(bits) \
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 24/32] drm/i915/bdw: Add ppgtt info for dynamic pages
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (22 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 23/32] drm/i915/bdw: Add dynamic page trace events Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 25/32] drm/i915/bdw: implement alloc/free for 4lvl Michel Thierry
                     ` (8 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Note that there is no gen8 ppgtt debug_dump function yet.
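
The walker added below invokes a callback per (pdpe, pde) slot; a
minimal sketch of a consumer, just to show the expected shape (the
function itself is hypothetical, the types are the ones from the
series):

/* Count how many page tables are actually backed, as opposed to the
 * NULL holes left by dynamic allocation. */
static void count_pts(struct i915_page_directory_pointer_entry *pdp,
		      struct i915_page_directory_entry *pd,
		      struct i915_page_table_entry *pt,
		      unsigned pdpe, unsigned pde, void *data)
{
	int *count = data;

	if (pt)
		(*count)++;
}

/* usage: gen8_for_every_pdpe_pde(ppgtt, count_pts, &count); */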

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 19 ++++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.c | 32 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h |  9 +++++++++
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e85da9d..c877957 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2165,7 +2165,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	struct drm_file *file;
 	int i;
 
 	if (INTEL_INFO(dev)->gen == 6)
@@ -2189,14 +2188,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 
 		ppgtt->debug_dump(ppgtt, m);
 	}
-
-	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
-		struct drm_i915_file_private *file_priv = file->driver_priv;
-
-		seq_printf(m, "proc: %s\n",
-			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
-	}
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
@@ -2204,6 +2195,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_file *file;
 
 	int ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
@@ -2215,6 +2207,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	else if (INTEL_INFO(dev)->gen >= 6)
 		gen6_ppgtt_info(m, dev);
 
+	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+
+		seq_printf(m, "\nproc: %s\n",
+			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
+		idr_for_each(&file_priv->context_idr, per_file_ctx,
+			     (void *)(unsigned long)m);
+	}
+
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f613377..9f16db7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2128,6 +2128,38 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 	readl(gtt_base);
 }
 
+void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
+			     void (*callback)(struct i915_page_directory_pointer_entry *pdp,
+					      struct i915_page_directory_entry *pd,
+					      struct i915_page_table_entry *pt,
+					      unsigned pdpe,
+					      unsigned pde,
+					      void *data),
+			     void *data)
+{
+	uint64_t start = ppgtt->base.start;
+	uint64_t length = ppgtt->base.total;
+	uint64_t pdpe, pde, temp;
+
+	struct i915_page_directory_entry *pd;
+	struct i915_page_table_entry *pt;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_start = start, pd_length = length;
+		int i;
+
+		if (pd == NULL) {
+			for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
+				callback(&ppgtt->pdp, NULL, NULL, pdpe, i, data);
+			continue;
+		}
+
+		gen8_for_each_pde(pt, pd, pd_start, pd_length, temp, pde) {
+			callback(&ppgtt->pdp, pd, pt, pdpe, pde, data);
+		}
+	}
+}
+
 static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 				  uint64_t start,
 				  uint64_t length,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 1004e0f..f74afa6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -483,6 +483,15 @@ static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
 	return i915_pde_index(end, GEN8_PDE_SHIFT) - i915_pde_index(addr, GEN8_PDE_SHIFT);
 }
 
+void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
+			     void (*callback)(struct i915_page_directory_pointer_entry *pdp,
+					      struct i915_page_directory_entry *pd,
+					      struct i915_page_table_entry *pt,
+					      unsigned pdpe,
+					      unsigned pde,
+					      void *data),
+			     void *data);
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 25/32] drm/i915/bdw: implement alloc/free for 4lvl
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (23 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 24/32] drm/i915/bdw: Add ppgtt info for dynamic pages Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:22   ` [PATCH v6 26/32] drm/i915/bdw: Add 4 level switching infrastructure Michel Thierry
                     ` (7 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The code for 4lvl works just as one would expect, and conveniently it is
able to call into the existing 3lvl page table code to handle all of the
lower levels.

PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.
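
A quick sanity check of the index math this relies on, using the shifts
and masks from the series (illustrative values, not code from the
patch):

/* With GEN8_PML4E_SHIFT = 39 and GEN8_PDPE_SHIFT = 30, an address just
 * past the first 512GB lands in the second PML4 entry:
 *
 *   addr  = 513ULL << 30;           // 513GB
 *   pml4e = (addr >> 39) & 0x1ff;   // 1
 *   pdpe  = (addr >> 30) & 0x1ff;   // 1
 *
 * Each pml4e spans 512GB and points at an independent pdp, which is why
 * the 4lvl path can simply iterate and reuse the 3lvl code per pdp.
 */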

v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 240 +++++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  11 +-
 2 files changed, 217 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9f16db7..4c921d0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -483,9 +483,12 @@ static void __pdp_fini(struct i915_page_directory_pointer_entry *pdp)
 static void unmap_and_free_pdp(struct i915_page_directory_pointer_entry *pdp,
 			    struct drm_device *dev)
 {
-	__pdp_fini(pdp);
-	if (USES_FULL_48BIT_PPGTT(dev))
+	if (USES_FULL_48BIT_PPGTT(dev)) {
+		__pdp_fini(pdp);
+		i915_dma_unmap_single(pdp, dev);
+		__free_page(pdp->page);
 		kfree(pdp);
+	}
 }
 
 static int __pdp_init(struct i915_page_directory_pointer_entry *pdp,
@@ -511,6 +514,60 @@ static int __pdp_init(struct i915_page_directory_pointer_entry *pdp,
 	return 0;
 }
 
+static struct i915_page_directory_pointer_entry *alloc_pdp_single(struct i915_hw_ppgtt *ppgtt,
+					       struct i915_pml4 *pml4)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct i915_page_directory_pointer_entry *pdp;
+	int ret;
+
+	BUG_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+	pdp = kmalloc(sizeof(*pdp), GFP_KERNEL);
+	if (!pdp)
+		return ERR_PTR(-ENOMEM);
+
+	pdp->page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
+	if (!pdp->page) {
+		kfree(pdp);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret = __pdp_init(pdp, dev);
+	if (ret) {
+		__free_page(pdp->page);
+		kfree(pdp);
+		return ERR_PTR(ret);
+	}
+
+	i915_dma_map_single(pdp, dev);
+
+	return pdp;
+}
+
+static void pml4_fini(struct i915_pml4 *pml4)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pml4, struct i915_hw_ppgtt, pml4);
+	i915_dma_unmap_single(pml4, ppgtt->base.dev);
+	__free_page(pml4->page);
+	/* HACK */
+	pml4->page = NULL;
+}
+
+static int pml4_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_pml4 *pml4 = &ppgtt->pml4;
+
+	pml4->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pml4->page)
+		return -ENOMEM;
+
+	i915_dma_map_single(pml4, ppgtt->base.dev);
+
+	return 0;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring,
 			  unsigned entry,
@@ -712,14 +769,13 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct d
 	}
 }
 
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_unmap_pages_3lvl(struct i915_page_directory_pointer_entry *pdp,
+					struct drm_device *dev)
 {
-	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct pci_dev *hwdev = dev->pdev;
 	int i, j;
 
-	for_each_set_bit(i, pdp->used_pdpes,
-			I915_PDPES_PER_PDP(ppgtt->base.dev)) {
+	for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
 		struct i915_page_directory_entry *pd;
 
 		if (WARN_ON(!pdp->page_directory[i]))
@@ -747,27 +803,73 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_unmap_pages_4lvl(struct i915_hw_ppgtt *ppgtt)
 {
+	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i;
 
-	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-		for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-				 I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-			if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-				continue;
+	for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+		struct i915_page_directory_pointer_entry *pdp;
 
-			gen8_free_page_tables(ppgtt->pdp.page_directory[i],
-					      ppgtt->base.dev);
-			unmap_and_free_pd(ppgtt->pdp.page_directory[i],
-					  ppgtt->base.dev);
-		}
-		unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
-	} else {
-		BUG(); /* to be implemented later */
+		if (WARN_ON(!ppgtt->pml4.pdps[i]))
+			continue;
+
+		pdp = ppgtt->pml4.pdps[i];
+		if (!pdp->daddr)
+			pci_unmap_page(hwdev, pdp->daddr, PAGE_SIZE,
+				       PCI_DMA_BIDIRECTIONAL);
+
+		gen8_ppgtt_unmap_pages_3lvl(ppgtt->pml4.pdps[i],
+					    ppgtt->base.dev);
 	}
 }
 
+static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+{
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		gen8_ppgtt_unmap_pages_3lvl(&ppgtt->pdp, ppgtt->base.dev);
+	else
+		gen8_ppgtt_unmap_pages_4lvl(ppgtt);
+}
+
+static void gen8_ppgtt_free_3lvl(struct i915_page_directory_pointer_entry *pdp,
+				 struct drm_device *dev)
+{
+	int i;
+
+	for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+		if (WARN_ON(!pdp->page_directory[i]))
+			continue;
+
+		gen8_free_page_tables(pdp->page_directory[i], dev);
+		unmap_and_free_pd(pdp->page_directory[i], dev);
+	}
+
+	unmap_and_free_pdp(pdp, dev);
+}
+
+static void gen8_ppgtt_free_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+		if (WARN_ON(!ppgtt->pml4.pdps[i]))
+			continue;
+
+		gen8_ppgtt_free_3lvl(ppgtt->pml4.pdps[i], ppgtt->base.dev);
+	}
+
+	pml4_fini(&ppgtt->pml4);
+}
+
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		gen8_ppgtt_free_3lvl(&ppgtt->pdp, ppgtt->base.dev);
+	else
+		gen8_ppgtt_free_4lvl(ppgtt);
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
@@ -1040,12 +1142,74 @@ err_out:
 	return ret;
 }
 
-static int __noreturn gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
-					       struct i915_pml4 *pml4,
-					       uint64_t start,
-					       uint64_t length)
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+				    struct i915_pml4 *pml4,
+				    uint64_t start,
+				    uint64_t length)
 {
-	BUG(); /* to be implemented later */
+	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer_entry *pdp;
+	const uint64_t orig_start = start;
+	const uint64_t orig_length = length;
+	uint64_t temp, pml4e;
+	int ret = 0;
+
+	/* Do the pml4 allocations first, so we don't need to track the newly
+	 * allocated tables below the pdp */
+	bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+	/* The page directory and pagetable allocations are done in the shared 3
+	 * and 4 level code. Just allocate the pdps.
+	 */
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		if (!pdp) {
+			WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+			pdp = alloc_pdp_single(ppgtt, pml4);
+			if (IS_ERR(pdp))
+				goto err_alloc;
+
+			pml4->pdps[pml4e] = pdp;
+			set_bit(pml4e, new_pdps);
+			trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
+						   pml4e << GEN8_PML4E_SHIFT,
+						   GEN8_PML4E_SHIFT);
+
+		}
+	}
+
+	WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
+	     "The allocation has spanned more than 512GB. "
+	     "It is highly likely this is incorrect.");
+
+	start = orig_start;
+	length = orig_length;
+
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+		BUG_ON(!pdp);
+
+		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+		if (ret)
+			goto err_out;
+	}
+
+	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
+		  GEN8_PML4ES_PER_PML4);
+
+	return 0;
+
+err_out:
+	start = orig_start;
+	length = orig_length;
+	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e)
+		gen8_ppgtt_free_3lvl(pdp, vm->dev);
+
+err_alloc:
+	for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+		unmap_and_free_pdp(pdp, vm->dev);
+
+	return ret;
 }
 
 static int gen8_alloc_va_range(struct i915_address_space *vm,
@@ -1054,16 +1218,19 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 
-	if (!USES_FULL_48BIT_PPGTT(vm->dev))
-		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
-	else
+	if (USES_FULL_48BIT_PPGTT(vm->dev))
 		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+	else
+		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
 {
 	unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
-	unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+		pml4_fini(&ppgtt->pml4);
+	else
+		unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
 }
 
 /**
@@ -1086,14 +1253,21 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 
 	ppgtt->switch_mm = gen8_mm_switch;
 
-	if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		int ret = pml4_init(ppgtt);
+		if (ret) {
+			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+			return ret;
+		}
+	} else {
 		int ret = __pdp_init(&ppgtt->pdp, false);
 		if (ret) {
 			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 			return ret;
 		}
-	} else
-		return -EPERM; /* Not yet implemented */
+
+		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
+	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f74afa6..3700f46 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -87,6 +87,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
  */
 #define GEN8_PML4ES_PER_PML4		512
 #define GEN8_PML4E_SHIFT		39
+#define GEN8_PML4E_MASK			(GEN8_PML4ES_PER_PML4 - 1)
 #define GEN8_PDPE_SHIFT			30
 /* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
  * tables */
@@ -427,6 +428,14 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 	     temp = min(temp, length),					\
 	     start += temp, length -= temp)
 
+#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter)	\
+	for (iter = gen8_pml4e_index(start), pdp = (pml4)->pdps[iter];	\
+	     length > 0 && iter < GEN8_PML4ES_PER_PML4;			\
+	     pdp = (pml4)->pdps[++iter],				\
+	     temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start,	\
+	     temp = min(temp, length),					\
+	     start += temp, length -= temp)
+
 #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter)		\
 	gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
 
@@ -458,7 +467,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
 
 static inline uint32_t gen8_pml4e_index(uint64_t address)
 {
-	BUG(); /* For 64B */
+	return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
 }
 
 static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 26/32] drm/i915/bdw: Add 4 level switching infrastructure
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (24 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 25/32] drm/i915/bdw: implement alloc/free for 4lvl Michel Thierry
@ 2015-02-24 16:22   ` Michel Thierry
  2015-02-24 16:23   ` [PATCH v6 27/32] drm/i915/bdw: Support 64 bit PPGTT in lrc mode Michel Thierry
                     ` (6 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:22 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

Mapping is easy: it uses the same register as PDP descriptor 0, but
there is only one entry.

v2: PML4 update in legacy context switch is left in for historic
reasons; the preferred mode of operation is lrc context-based
submission.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 56 +++++++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  4 ++-
 drivers/gpu/drm/i915/i915_reg.h     |  1 +
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4c921d0..88a3c49 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -193,6 +193,9 @@ static inline gen8_ppgtt_pde_t gen8_pde_encode(struct drm_device *dev,
 	return pde;
 }
 
+#define gen8_pdpe_encode gen8_pde_encode
+#define gen8_pml4e_encode gen8_pde_encode
+
 static gen6_gtt_pte_t snb_pte_encode(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid, u32 unused)
@@ -592,8 +595,8 @@ static int gen8_write_pdp(struct intel_engine_cs *ring,
 	return 0;
 }
 
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct intel_engine_cs *ring)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+				 struct intel_engine_cs *ring)
 {
 	int i, ret;
 
@@ -610,6 +613,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+			      struct intel_engine_cs *ring)
+{
+	return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr);
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				   uint64_t start,
 				   uint64_t length,
@@ -753,6 +762,37 @@ static void gen8_map_pagetable_range(struct i915_address_space *vm,
 	kunmap_atomic(page_directory);
 }
 
+static void gen8_map_page_directory(struct i915_page_directory_pointer_entry *pdp,
+				    struct i915_page_directory_entry *pd,
+				    int index,
+				    struct drm_device *dev)
+{
+	gen8_ppgtt_pdpe_t *page_directorypo;
+	gen8_ppgtt_pdpe_t pdpe;
+
+	/* We do not need to clflush because no platform requiring flush
+	 * supports 64b pagetables. */
+	if (!USES_FULL_48BIT_PPGTT(dev))
+		return;
+
+	page_directorypo = kmap_atomic(pdp->page);
+	pdpe = gen8_pdpe_encode(dev, pd->daddr, I915_CACHE_LLC);
+	page_directorypo[index] = pdpe;
+	kunmap_atomic(page_directorypo);
+}
+
+static void gen8_map_page_directory_pointer(struct i915_pml4 *pml4,
+					    struct i915_page_directory_pointer_entry *pdp,
+					    int index,
+					    struct drm_device *dev)
+{
+	gen8_ppgtt_pml4e_t *pagemap = kmap_atomic(pml4->page);
+	gen8_ppgtt_pml4e_t pml4e = gen8_pml4e_encode(dev, pdp->daddr, I915_CACHE_LLC);
+	BUG_ON(!USES_FULL_48BIT_PPGTT(dev));
+	pagemap[index] = pml4e;
+	kunmap_atomic(pagemap);
+}
+
 static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
 {
 	int i;
@@ -1124,6 +1164,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 
 		set_bit(pdpe, pdp->used_pdpes);
 		gen8_map_pagetable_range(vm, pd, start, length);
+		gen8_map_page_directory(pdp, pd, pdpe, dev);
 	}
 
 	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1192,6 +1233,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
 		if (ret)
 			goto err_out;
+
+		gen8_map_page_directory_pointer(pml4, pdp, pml4e, vm->dev);
 	}
 
 	bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
@@ -1251,14 +1294,14 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
-	ppgtt->switch_mm = gen8_mm_switch;
-
 	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		int ret = pml4_init(ppgtt);
 		if (ret) {
 			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
 			return ret;
 		}
+
+		ppgtt->switch_mm = gen8_48b_mm_switch;
 	} else {
 		int ret = __pdp_init(&ppgtt->pdp, false);
 		if (ret) {
@@ -1266,6 +1309,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			return ret;
 		}
 
+		ppgtt->switch_mm = gen8_legacy_mm_switch;
 		trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
 	}
 
@@ -1295,6 +1339,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
+	/* FIXME: PML4 */
 	gen8_for_each_pdpe(pd, pdp, start, size, temp, pdpe)
 		gen8_map_pagetable_range(&ppgtt->base, pd,start, size);
 
@@ -1499,8 +1544,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
 	int j;
 
 	for_each_ring(ring, dev_priv, j) {
+		u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_64B : 0;
 		I915_WRITE(RING_MODE_GEN7(ring),
-			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
+			   _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 3700f46..8866360 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -38,7 +38,9 @@ struct drm_i915_file_private;
 
 typedef uint32_t gen6_gtt_pte_t;
 typedef uint64_t gen8_gtt_pte_t;
-typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
+typedef gen8_gtt_pte_t		gen8_ppgtt_pde_t;
+typedef gen8_ppgtt_pde_t	gen8_ppgtt_pdpe_t;
+typedef gen8_ppgtt_pdpe_t	gen8_ppgtt_pml4e_t;
 
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index f67e290..754fa24 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1348,6 +1348,7 @@ enum skl_disp_power_wells {
 #define   GFX_REPLAY_MODE		(1<<11)
 #define   GFX_PSMI_GRANULARITY		(1<<10)
 #define   GFX_PPGTT_ENABLE		(1<<9)
+#define   GEN8_GFX_PPGTT_64B		(1<<7)
 
 #define VLV_DISPLAY_BASE 0x180000
 #define VLV_MIPI_BASE VLV_DISPLAY_BASE
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 27/32] drm/i915/bdw: Support 64 bit PPGTT in lrc mode
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (25 preceding siblings ...)
  2015-02-24 16:22   ` [PATCH v6 26/32] drm/i915/bdw: Add 4 level switching infrastructure Michel Thierry
@ 2015-02-24 16:23   ` Michel Thierry
  2015-02-24 16:23   ` [PATCH v6 28/32] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT Michel Thierry
                     ` (5 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:23 UTC (permalink / raw)
  To: intel-gfx

In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address to PML4, while the other PDP registers are ignored.

Also, the addressing mode must be specified in every context descriptor.
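
For reference, "48bit canonical" here means bits 63:48 of an address are
copies of bit 47 (sign-extension), which is the form the hardware expects
once the 64b addressing mode is selected. A small standalone sketch of
that rule (the helper name and sample address are illustrative only):

#include <stdint.h>
#include <stdio.h>

static uint64_t canonicalize_48b(uint64_t addr)
{
	/* Sign-extend from bit 47 without relying on signed-shift tricks. */
	if (addr & (1ull << 47))
		return addr | 0xffff000000000000ull;
	return addr & 0x0000ffffffffffffull;
}

int main(void)
{
	uint64_t a = 0x0000800000000000ull;	/* bit 47 set */

	printf("0x%016llx -> 0x%016llx\n",
	       (unsigned long long)a,
	       (unsigned long long)canonicalize_48b(a));
	return 0;
}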

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 167 ++++++++++++++++++++++++++-------------
 1 file changed, 114 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f461631..2b6d262 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -255,7 +255,8 @@ u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 }
 
 static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
-					 struct drm_i915_gem_object *ctx_obj)
+					 struct drm_i915_gem_object *ctx_obj,
+					 bool legacy_64bit_ctx)
 {
 	struct drm_device *dev = ring->dev;
 	uint64_t desc;
@@ -264,7 +265,10 @@ static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
 	desc = GEN8_CTX_VALID;
-	desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	if (legacy_64bit_ctx)
+		desc |= LEGACY_64B_CONTEXT << GEN8_CTX_MODE_SHIFT;
+	else
+		desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
 	desc |= GEN8_CTX_L3LLC_COHERENT;
 	desc |= GEN8_CTX_PRIVILEGE;
 	desc |= lrca;
@@ -292,16 +296,17 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	uint64_t temp = 0;
 	uint32_t desc[4];
+	bool legacy_64bit_ctx = USES_FULL_48BIT_PPGTT(dev);
 
 	/* XXX: You must always write both descriptors in the order below. */
 	if (ctx_obj1)
-		temp = execlists_ctx_descriptor(ring, ctx_obj1);
+		temp = execlists_ctx_descriptor(ring, ctx_obj1, legacy_64bit_ctx);
 	else
 		temp = 0;
 	desc[1] = (u32)(temp >> 32);
 	desc[0] = (u32)temp;
 
-	temp = execlists_ctx_descriptor(ring, ctx_obj0);
+	temp = execlists_ctx_descriptor(ring, ctx_obj0, legacy_64bit_ctx);
 	desc[3] = (u32)(temp >> 32);
 	desc[2] = (u32)temp;
 
@@ -332,37 +337,60 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	reg_state[CTX_RING_TAIL+1] = tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
 
-	/* True PPGTT with dynamic page allocation: update PDP registers and
-	 * point the unallocated PDPs to the scratch page
-	 */
-	if (ppgtt) {
+	if (ppgtt && USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		/* True 64b PPGTT (48bit canonical)
+		 * PDP0_DESCRIPTOR contains the base address to PML4 and
+		 * other PDP Descriptors are ignored
+		 */
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pml4.daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pml4.daddr);
+	} else if (ppgtt) {
+		/* True 32b PPGTT with dynamic page allocation: update PDP
+		 * registers and point the unallocated PDPs to the scratch page
+		 */
 		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
-			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
 		} else {
-			reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-			reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
 		}
 		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
-			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
 		} else {
-			reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-			reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
 		}
 		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
-			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
 		} else {
-			reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-			reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
 		}
 		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
-			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
-			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
 		} else {
-			reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-			reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
 		}
 	}
 
@@ -1771,36 +1799,69 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
 	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
 
-	/* With dynamic page allocation, PDPs may not be allocated at this point,
-	 * Point the unallocated PDPs to the scratch page
-	 */
-	if (test_bit(3, ppgtt->pdp.used_pdpes)) {
-		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
-	} else {
-		reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-		reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
-	}
-	if (test_bit(2, ppgtt->pdp.used_pdpes)) {
-		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
-	} else {
-		reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-		reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
-	}
-	if (test_bit(1, ppgtt->pdp.used_pdpes)) {
-		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
-	} else {
-		reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-		reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
-	}
-	if (test_bit(0, ppgtt->pdp.used_pdpes)) {
-		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
-		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+		/* 64b PPGTT (48bit canonical)
+		 * PDP0_DESCRIPTOR contains the base address to PML4 and
+		 * other PDP Descriptors are ignored
+		 */
+		reg_state[CTX_PDP3_UDW+1] = 0;
+		reg_state[CTX_PDP3_LDW+1] = 0;
+		reg_state[CTX_PDP2_UDW+1] = 0;
+		reg_state[CTX_PDP2_LDW+1] = 0;
+		reg_state[CTX_PDP1_UDW+1] = 0;
+		reg_state[CTX_PDP1_LDW+1] = 0;
+		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pml4.daddr);
+		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pml4.daddr);
 	} else {
-		reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
-		reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+		/* 32b PPGTT
+		 * PDP*_DESCRIPTOR contains the base address of space supported.
+		 * With dynamic page allocation, PDPs may not be allocated at
+		 * this point. Point the unallocated PDPs to the scratch page
+		 */
+		if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP3_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+			reg_state[CTX_PDP3_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
+		} else {
+			reg_state[CTX_PDP3_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP3_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP2_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+			reg_state[CTX_PDP2_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
+		} else {
+			reg_state[CTX_PDP2_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP2_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP1_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+			reg_state[CTX_PDP1_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
+		} else {
+			reg_state[CTX_PDP1_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP1_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
+		if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+			reg_state[CTX_PDP0_UDW+1] =
+					upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+			reg_state[CTX_PDP0_LDW+1] =
+					lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
+		} else {
+			reg_state[CTX_PDP0_UDW+1] =
+					upper_32_bits(ppgtt->scratch_pd->daddr);
+			reg_state[CTX_PDP0_LDW+1] =
+					lower_32_bits(ppgtt->scratch_pd->daddr);
+		}
 	}
 
 	if (ring->id == RCS) {
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 28/32] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (26 preceding siblings ...)
  2015-02-24 16:23   ` [PATCH v6 27/32] drm/i915/bdw: Support 64 bit PPGTT in lrc mode Michel Thierry
@ 2015-02-24 16:23   ` Michel Thierry
  2015-02-24 16:23   ` [PATCH v6 29/32] drm/i915: Plumb sg_iter through va allocation ->maps Michel Thierry
                     ` (4 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:23 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The insert_entries function was the function used to write PTEs. For the
PPGTT it was "hardcoded" to only understand two level page tables, which
was the case for GEN7. We can reuse this for 4 level page tables, and
remove the concept of insert_entries, which was never viable past 2
level page tables anyway, but it requires a bit of rework to make the
function more generic.

This patch begins the generalization work, and it will be built upon
heavily once the 48b code is complete. The patch series attempts to make
each function touch only the code specific to its page table level, and
this patch is no exception. Having extra variables around (such as the
PPGTT) distracts and provides room to add bugs, since the function
shouldn't be touching anything in the higher order page tables.
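
To make the intended split concrete, here is a minimal userspace sketch
(all names and values are hypothetical, not the driver code): the
address-space level wrapper decides the scratch value and flush policy
once, and the per-table worker only touches the entries it is handed.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ENTRIES_PER_TABLE 512

/* Worker: only knows about the one table it is given. */
static void clear_pte_range(uint64_t *table, unsigned int first,
			    unsigned int count, uint64_t scratch_pte,
			    bool flush)
{
	unsigned int i;

	for (i = first; i < first + count && i < ENTRIES_PER_TABLE; i++)
		table[i] = scratch_pte;
	if (flush)
		printf("would clflush %u entries\n", count);
}

/* Wrapper: makes the address-space wide decisions, then delegates. */
static void clear_range(uint64_t *table, unsigned int first,
			unsigned int count, bool use_scratch, bool has_llc)
{
	uint64_t scratch_pte = use_scratch ? (0xdead0000ull | 1) : 0;

	clear_pte_range(table, first, count, scratch_pte, !has_llc);
}

int main(void)
{
	static uint64_t pt[ENTRIES_PER_TABLE];

	clear_range(pt, 16, 8, true, false);
	printf("pt[16]=0x%llx\n", (unsigned long long)pt[16]);
	return 0;
}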

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 55 +++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 88a3c49..d5271e3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -619,23 +619,19 @@ static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr);
 }
 
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   uint64_t start,
-				   uint64_t length,
-				   bool use_scratch)
+static void gen8_ppgtt_clear_pte_range(struct i915_page_directory_pointer_entry *pdp,
+				       uint64_t start,
+				       uint64_t length,
+				       gen8_gtt_pte_t scratch_pte,
+				       const bool flush)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
-	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
+	gen8_gtt_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
 	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
-	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
-				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
 		struct i915_page_directory_entry *pd;
@@ -668,7 +664,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 			num_entries--;
 		}
 
-		if (!HAS_LLC(ppgtt->base.dev))
+		if (flush)
 			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 		kunmap_atomic(pt_vaddr);
 
@@ -680,14 +676,27 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	}
 }
 
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
-				      struct sg_table *pages,
-				      uint64_t start,
-				      enum i915_cache_level cache_level, u32 unused)
+static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
+				   uint64_t start,
+				   uint64_t length,
+				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+	gen8_gtt_pte_t scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
+						     I915_CACHE_LLC, use_scratch);
+
+	gen8_ppgtt_clear_pte_range(pdp, start, length, scratch_pte, !HAS_LLC(vm->dev));
+}
+
+static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_entry *pdp,
+					  struct sg_table *pages,
+					  uint64_t start,
+					  enum i915_cache_level cache_level,
+					  const bool flush)
+{
 	gen8_gtt_pte_t *pt_vaddr;
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -709,7 +718,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES_PER_PAGE) {
-			if (!HAS_LLC(ppgtt->base.dev))
+			if (flush)
 				drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
@@ -721,12 +730,24 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		}
 	}
 	if (pt_vaddr) {
-		if (!HAS_LLC(ppgtt->base.dev))
+		if (flush)
 			drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
 		kunmap_atomic(pt_vaddr);
 	}
 }
 
+static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
+				      struct sg_table *pages,
+				      uint64_t start,
+				      enum i915_cache_level cache_level,
+				      u32 unused)
+{
+	struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+	gen8_ppgtt_insert_pte_entries(pdp, pages, start, cache_level, !HAS_LLC(vm->dev));
+}
+
 static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
 			     struct i915_page_table_entry *pt,
 			     struct drm_device *dev)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 29/32] drm/i915: Plumb sg_iter through va allocation ->maps
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (27 preceding siblings ...)
  2015-02-24 16:23   ` [PATCH v6 28/32] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT Michel Thierry
@ 2015-02-24 16:23   ` Michel Thierry
  2015-02-24 16:23   ` [PATCH v6 30/32] drm/i915/bdw: Add 4 level support in insert_entries and clear_range Michel Thierry
                     ` (3 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:23 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

As a step towards implementing 4 levels, while not discarding the
existing pte map functions, we need to pass the sg_iter through. The
current function only understands things down to the page directory
granularity. An object's pages may span a page directory boundary, and
so using the iter directly as we write the PTEs allows the iterator to
stay coherent through a VMA mapping operation spanning multiple page
table levels.

v2: Rebase after s/page_tables/page_table/.
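
The essential point -- the iterator's position lives in the caller and
survives across per-directory mapping calls -- can be seen in this small
standalone analogy (the struct and helper names are illustrative, not
the scatterlist API):

#include <stdio.h>

/* Stand-in for sg_page_iter: the position lives in the caller, so each
 * "map a chunk" call continues where the previous one stopped. */
struct page_iter {
	const unsigned long *pages;
	int nents;
	int pos;
};

static int iter_next(struct page_iter *it)
{
	return ++it->pos < it->nents;
}

static void map_chunk(struct page_iter *it, int count)
{
	while (count-- && iter_next(it))
		printf("map page 0x%lx\n", it->pages[it->pos]);
}

int main(void)
{
	const unsigned long pages[] = { 0x1000, 0x2000, 0x3000, 0x4000, 0x5000 };
	struct page_iter it = { pages, 5, -1 };

	map_chunk(&it, 3);	/* one page directory's worth */
	map_chunk(&it, 3);	/* continues at the 4th page, not back at the 1st */
	return 0;
}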

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 46 +++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d5271e3..166daf4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -692,7 +692,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 }
 
 static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_entry *pdp,
-					  struct sg_table *pages,
+					  struct sg_page_iter *sg_iter,
 					  uint64_t start,
 					  enum i915_cache_level cache_level,
 					  const bool flush)
@@ -701,11 +701,10 @@ static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_ent
 	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
 	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
 	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
-	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
 
-	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+	while (__sg_page_iter_next(sg_iter)) {
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory_entry *pd = pdp->page_directory[pdpe];
 			struct i915_page_table_entry *pt = pd->page_table[pde];
@@ -715,7 +714,7 @@ static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_ent
 		}
 
 		pt_vaddr[pte] =
-			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+			gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
 					cache_level, true);
 		if (++pte == GEN8_PTES_PER_PAGE) {
 			if (flush)
@@ -744,8 +743,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct sg_page_iter sg_iter;
 
-	gen8_ppgtt_insert_pte_entries(pdp, pages, start, cache_level, !HAS_LLC(vm->dev));
+	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
+	gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start, cache_level, !HAS_LLC(vm->dev));
 }
 
 static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
@@ -1107,10 +1108,12 @@ err_out:
 	return -ENOMEM;
 }
 
-static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
-				    struct i915_page_directory_pointer_entry *pdp,
-				    uint64_t start,
-				    uint64_t length)
+static int __gen8_alloc_vma_range_3lvl(struct i915_address_space *vm,
+				       struct i915_page_directory_pointer_entry *pdp,
+				       struct sg_page_iter *sg_iter,
+				       uint64_t start,
+				       uint64_t length,
+				       u32 flags)
 {
 	unsigned long *new_page_dirs, **new_page_tables;
 	struct drm_device *dev = vm->dev;
@@ -1179,7 +1182,11 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 				   gen8_pte_index(pd_start),
 				   gen8_pte_count(pd_start, pd_len));
 
-			/* Our pde is now pointing to the pagetable, pt */
+			if (sg_iter) {
+				BUG_ON(!sg_iter->__nents);
+				gen8_ppgtt_insert_pte_entries(pdp, sg_iter, pd_start,
+							      flags, !HAS_LLC(vm->dev));
+			}
 			set_bit(pde, pd->used_pdes);
 		}
 
@@ -1204,10 +1211,12 @@ err_out:
 	return ret;
 }
 
-static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
-				    struct i915_pml4 *pml4,
-				    uint64_t start,
-				    uint64_t length)
+static int __gen8_alloc_vma_range_4lvl(struct i915_address_space *vm,
+				       struct i915_pml4 *pml4,
+				       struct sg_page_iter *sg_iter,
+				       uint64_t start,
+				       uint64_t length,
+				       u32 flags)
 {
 	DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
 	struct i915_hw_ppgtt *ppgtt =
@@ -1251,7 +1260,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 	gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
 		BUG_ON(!pdp);
 
-		ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+		ret = __gen8_alloc_vma_range_3lvl(vm, pdp, sg_iter,
+						  start, length, flags);
 		if (ret)
 			goto err_out;
 
@@ -1283,9 +1293,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
 		container_of(vm, struct i915_hw_ppgtt, base);
 
 	if (USES_FULL_48BIT_PPGTT(vm->dev))
-		return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+		return __gen8_alloc_vma_range_4lvl(vm, &ppgtt->pml4, NULL,
+						   start, length, 0);
 	else
-		return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+		return __gen8_alloc_vma_range_3lvl(vm, &ppgtt->pdp, NULL,
+						   start, length, 0);
 }
 
 static void gen8_ppgtt_fini_common(struct i915_hw_ppgtt *ppgtt)
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 30/32] drm/i915/bdw: Add 4 level support in insert_entries and clear_range
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (28 preceding siblings ...)
  2015-02-24 16:23   ` [PATCH v6 29/32] drm/i915: Plumb sg_iter through va allocation ->maps Michel Thierry
@ 2015-02-24 16:23   ` Michel Thierry
  2015-02-24 16:23   ` [PATCH v6 31/32] drm/i915: Expand error state's address width to 64b Michel Thierry
                     ` (2 subsequent siblings)
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:23 UTC (permalink / raw)
  To: intel-gfx

When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
it will write to.

Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.

Also add a scratch page for PML4.

This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".

v2: Rebase after s/page_tables/page_table/.
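
For reference, the PML4E/PDPE/PDE/PTE selection is just a shift-and-mask
decomposition of the 48b offset. A standalone sketch, assuming the shift
values from i915_gem_gtt.h (the 21/12 PDE/PTE shifts are taken from the
existing gen8 code; every level has 512 entries):

#include <stdint.h>
#include <stdio.h>

#define GEN8_PML4E_SHIFT 39
#define GEN8_PDPE_SHIFT  30
#define GEN8_PDE_SHIFT   21
#define GEN8_PTE_SHIFT   12
#define GEN8_INDEX_MASK  0x1ff

int main(void)
{
	uint64_t addr = 0x0000123456789000ull;	/* made-up 48b offset */
	unsigned int pml4e = (addr >> GEN8_PML4E_SHIFT) & GEN8_INDEX_MASK;
	unsigned int pdpe = (addr >> GEN8_PDPE_SHIFT) & GEN8_INDEX_MASK;
	unsigned int pde = (addr >> GEN8_PDE_SHIFT) & GEN8_INDEX_MASK;
	unsigned int pte = (addr >> GEN8_PTE_SHIFT) & GEN8_INDEX_MASK;

	printf("pml4e=%u pdpe=%u pde=%u pte=%u\n", pml4e, pdpe, pde, pte);
	return 0;
}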

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 66 ++++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 12 +++++++
 2 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 166daf4..842ce93 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -676,24 +676,52 @@ static void gen8_ppgtt_clear_pte_range(struct i915_page_directory_pointer_entry
 	}
 }
 
+static void gen8_ppgtt_clear_range_4lvl(struct i915_hw_ppgtt *ppgtt,
+					gen8_gtt_pte_t scratch_pte,
+					uint64_t start,
+					uint64_t length)
+{
+	struct i915_page_directory_pointer_entry *pdp;
+	uint64_t templ4, templ3, pml4e, pdpe;
+
+	gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+		struct i915_page_directory_entry *pd;
+		uint64_t pdp_len = gen8_clamp_pdp(start, length);
+		uint64_t pdp_start = start;
+
+		gen8_for_each_pdpe(pd, pdp, pdp_start, pdp_len, templ3, pdpe) {
+			uint64_t pd_len = gen8_clamp_pd(pdp_start, pdp_len);
+			uint64_t pd_start = pdp_start;
+
+			gen8_ppgtt_clear_pte_range(pdp, pd_start, pd_len,
+						   scratch_pte, !HAS_LLC(ppgtt->base.dev));
+		}
+	}
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   uint64_t start,
-				   uint64_t length,
+				   uint64_t start, uint64_t length,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
-
+			container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
 						     I915_CACHE_LLC, use_scratch);
 
-	gen8_ppgtt_clear_pte_range(pdp, start, length, scratch_pte, !HAS_LLC(vm->dev));
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp;
+
+		gen8_ppgtt_clear_pte_range(pdp, start, length, scratch_pte,
+					   !HAS_LLC(ppgtt->base.dev));
+	} else {
+		gen8_ppgtt_clear_range_4lvl(ppgtt, scratch_pte, start, length);
+	}
 }
 
 static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_entry *pdp,
 					  struct sg_page_iter *sg_iter,
 					  uint64_t start,
+					  size_t pages,
 					  enum i915_cache_level cache_level,
 					  const bool flush)
 {
@@ -704,7 +732,7 @@ static void gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer_ent
 
 	pt_vaddr = NULL;
 
-	while (__sg_page_iter_next(sg_iter)) {
+	while (pages-- && __sg_page_iter_next(sg_iter)) {
 		if (pt_vaddr == NULL) {
 			struct i915_page_directory_entry *pd = pdp->page_directory[pdpe];
 			struct i915_page_table_entry *pt = pd->page_table[pde];
@@ -742,11 +770,26 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 				      u32 unused)
 {
 	struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct i915_page_directory_pointer_entry *pdp;
 	struct sg_page_iter sg_iter;
 
 	__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
-	gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start, cache_level, !HAS_LLC(vm->dev));
+
+	if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+		pdp = &ppgtt->pdp;
+		gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start,
+				sg_nents(pages->sgl),
+				cache_level, !HAS_LLC(vm->dev));
+	} else {
+		struct i915_pml4 *pml4;
+		unsigned pml4e = gen8_pml4e_index(start);
+
+		pml4 = &ppgtt->pml4;
+		pdp = pml4->pdps[pml4e];
+		gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start,
+				sg_nents(pages->sgl),
+				cache_level, !HAS_LLC(vm->dev));
+	}
 }
 
 static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
@@ -1185,7 +1228,8 @@ static int __gen8_alloc_vma_range_3lvl(struct i915_address_space *vm,
 			if (sg_iter) {
 				BUG_ON(!sg_iter->__nents);
 				gen8_ppgtt_insert_pte_entries(pdp, sg_iter, pd_start,
-							      flags, !HAS_LLC(vm->dev));
+						gen8_pte_count(pd_start, pd_len),
+						flags, !HAS_LLC(vm->dev));
 			}
 			set_bit(pde, pd->used_pdes);
 		}
@@ -1330,7 +1374,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		int ret = pml4_init(ppgtt);
 		if (ret) {
-			unmap_and_free_pt(ppgtt->scratch_pd, ppgtt->base.dev);
+			unmap_and_free_pt(ppgtt->scratch_pml4, ppgtt->base.dev);
 			return ret;
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8866360..e34bc92 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -332,6 +332,7 @@ struct i915_hw_ppgtt {
 	union {
 		struct i915_page_table_entry *scratch_pt;
 		struct i915_page_table_entry *scratch_pd; /* Just need the daddr */
+		struct i915_page_table_entry *scratch_pml4;
 	};
 
 	struct drm_i915_file_private *file_priv;
@@ -452,6 +453,17 @@ static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 	return next_pd - start;
 }
 
+/* Clamp length to the next page_directory pointer boundary */
+static inline uint64_t gen8_clamp_pdp(uint64_t start, uint64_t length)
+{
+	uint64_t next_pdp = ALIGN(start + 1, 1ULL << GEN8_PML4E_SHIFT);
+
+	if (next_pdp > (start + length))
+		return length;
+
+	return next_pdp - start;
+}
+
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
 	return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 31/32] drm/i915: Expand error state's address width to 64b
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (29 preceding siblings ...)
  2015-02-24 16:23   ` [PATCH v6 30/32] drm/i915/bdw: Add 4 level support in insert_entries and clear_range Michel Thierry
@ 2015-02-24 16:23   ` Michel Thierry
  2015-02-24 16:23   ` [PATCH v6 32/32] drm/i915/bdw: Flip the 48b switch Michel Thierry
  2015-02-24 20:31   ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Daniel Vetter
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:23 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

v2: Zero-pad the new 8B fields or else intel_error_decode has a hard time.
Note that, regardless, we need an igt update.

v3: Make reloc_offset 64b also.
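
A tiny standalone demo of why the wider, zero-padded format matters once
offsets can sit above 4GiB (the sample offset is made up):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t gtt_offset = 0x100002000ull;	/* an offset above 4GiB */

	printf("32b field: 0x%08x\n", (uint32_t)gtt_offset);	/* truncates */
	printf("64b field: 0x%016" PRIx64 "\n", gtt_offset);	/* full value */
	return 0;
}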

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h       |  4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c | 17 +++++++++--------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 662d6c1..d28abd1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -459,7 +459,7 @@ struct drm_i915_error_state {
 
 		struct drm_i915_error_object {
 			int page_count;
-			u32 gtt_offset;
+			u64 gtt_offset;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
@@ -485,7 +485,7 @@ struct drm_i915_error_state {
 		u32 size;
 		u32 name;
 		u32 rseqno, wseqno;
-		u32 gtt_offset;
+		u64 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
 		s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a982849..bbf25d0 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -195,7 +195,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  %s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "    %08x %8u %02x %02x %x %x",
+		err_printf(m, "    %016llx %8u %02x %02x %x %x",
 			   err->gtt_offset,
 			   err->size,
 			   err->read_domains,
@@ -415,7 +415,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				err_printf(m, " (submitted by %s [%d])",
 					   error->ring[i].comm,
 					   error->ring[i].pid);
-			err_printf(m, " --- gtt_offset = 0x%08x\n",
+			err_printf(m, " --- gtt_offset = 0x%016llx\n",
 				   obj->gtt_offset);
 			print_error_obj(m, obj);
 		}
@@ -423,7 +423,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		obj = error->ring[i].wa_batchbuffer;
 		if (obj) {
 			err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
-				   dev_priv->ring[i].name, obj->gtt_offset);
+				   dev_priv->ring[i].name,
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
@@ -442,14 +443,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ringbuffer)) {
 			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 
 		if ((obj = error->ring[i].hws_page)) {
 			err_printf(m, "%s --- HW Status = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			offset = 0;
 			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -465,13 +466,13 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		if ((obj = error->ring[i].ctx)) {
 			err_printf(m, "%s --- HW Context = 0x%08x\n",
 				   dev_priv->ring[i].name,
-				   obj->gtt_offset);
+				   lower_32_bits(obj->gtt_offset));
 			print_error_obj(m, obj);
 		}
 	}
 
 	if ((obj = error->semaphore_obj)) {
-		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+		err_printf(m, "Semaphore page = 0x%016llx\n", obj->gtt_offset);
 		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
 			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
 				   elt * 4,
@@ -571,7 +572,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 	int num_pages;
 	bool use_ggtt;
 	int i = 0;
-	u32 reloc_offset;
+	u64 reloc_offset;
 
 	if (src == NULL || src->pages == NULL)
 		return NULL;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH v6 32/32] drm/i915/bdw: Flip the 48b switch
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (30 preceding siblings ...)
  2015-02-24 16:23   ` [PATCH v6 31/32] drm/i915: Expand error state's address width to 64b Michel Thierry
@ 2015-02-24 16:23   ` Michel Thierry
  2015-02-24 20:31   ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Daniel Vetter
  32 siblings, 0 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-24 16:23 UTC (permalink / raw)
  To: intel-gfx

Use 48b addresses if hw supports it and i915.enable_ppgtt=3.

Aliasing PPGTT remains 32b only.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 7 ++-----
 drivers/gpu/drm/i915/i915_params.c  | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 842ce93..fda5907 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -107,7 +107,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
 
 #ifdef CONFIG_64BIT
 	has_full_64bit_ppgtt = IS_BROADWELL(dev) ||
-				INTEL_INFO(dev)->gen >= 9 && false; /* FIXME: 64b */
+				INTEL_INFO(dev)->gen >= 9;
 #else
 	has_full_64bit_ppgtt = false;
 #endif
@@ -1076,9 +1076,6 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
 
 	BUG_ON(!bitmap_empty(new_pds, pdpes));
 
-	/* FIXME: PPGTT container_of won't work for 64b */
-	BUG_ON((start + length) > 0x800000000ULL);
-
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		if (pd)
 			continue;
@@ -1397,7 +1394,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b */
+	struct i915_page_directory_pointer_entry *pdp = &ppgtt->pdp; /* FIXME: 48b? */
 	struct i915_page_directory_entry *pd;
 	uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
 	uint32_t pdpe;
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 44f2262..1cd43b0 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -119,7 +119,7 @@ MODULE_PARM_DESC(enable_hangcheck,
 module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
 MODULE_PARM_DESC(enable_ppgtt,
 	"Override PPGTT usage. "
-	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+	"(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_64b)");
 
 module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
 MODULE_PARM_DESC(enable_execlists,
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing
  2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
                     ` (31 preceding siblings ...)
  2015-02-24 16:23   ` [PATCH v6 32/32] drm/i915/bdw: Flip the 48b switch Michel Thierry
@ 2015-02-24 20:31   ` Daniel Vetter
  2015-02-25 10:55     ` Mika Kuoppala
  32 siblings, 1 reply; 229+ messages in thread
From: Daniel Vetter @ 2015-02-24 20:31 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Tue, Feb 24, 2015 at 04:22:33PM +0000, Michel Thierry wrote:
> This patchset addresses comments from v5 by Mika, specially some rename changes
> touched several patches.
> 
> For GEN8, it has also been extended to work in logical ring submission (lrc)
> mode, as it will be the preferred mode of operation.
> I also tried to update the lrc code at the same time the ppgtt refactoring
> occurred, leaving only one patch that is exclusively for lrc.
> 
> I'm also now including the required patches for PPGTT with 48b addressing.
> In order expand the GPU address space, a 4th level translation is added, the
> Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
> each pointing to a PDP.
> 
> For now, this feature will only be available in BDW and GEN9, in LRC submission
> mode (execlists) and when i915.enable_ppgtt=3 is set.
> Also note that this expanded address space is only available for full PPGTT,
> aliasing PPGTT remains 32b.
> 
> This list can be seen in 3 parts:
> [01-10] Add page table allocation for GEN6/GEN7
> [11-20] Enable dynamic allocation in GEN8,for both legacy and
> execlist submission modes.
> [21-32] PML4 support in BDW and GEN9+.
> 
> Ben Widawsky (26):
>   drm/i915: page table abstractions
>   drm/i915: Complete page table structures
>   drm/i915: Create page table allocators
>   drm/i915: Track GEN6 page table usage
>   drm/i915: Extract context switch skip and pd load logic
>   drm/i915: Track page table reload need
>   drm/i915: Initialize all contexts
>   drm/i915: Finish gen6/7 dynamic page table allocation
>   drm/i915/bdw: Use dynamic allocation idioms on free
>   drm/i915/bdw: page directories rework allocation
>   drm/i915/bdw: pagetable allocation rework
>   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
>   drm/i915: num_pd_pages/num_pd_entries isn't useful
>   drm/i915: Extract PPGTT param from page_directory alloc
>   drm/i915/bdw: Split out mappings
>   drm/i915/bdw: begin bitmap tracking
>   drm/i915/bdw: Dynamic page table allocations
>   drm/i915/bdw: Make pdp allocation more dynamic
>   drm/i915/bdw: Abstract PDP usage
>   drm/i915/bdw: Add dynamic page trace events
>   drm/i915/bdw: Add ppgtt info for dynamic pages
>   drm/i915/bdw: implement alloc/free for 4lvl
>   drm/i915/bdw: Add 4 level switching infrastructure
>   drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
>   drm/i915: Plumb sg_iter through va allocation ->maps
>   drm/i915: Expand error state's address width to 64b
> 
> Michel Thierry (6):
>   drm/i915: Plumb drm_device through page tables operations
>   drm/i915: Add dynamic page trace events
>   drm/i915/bdw: Support dynamic pdp updates in lrc mode
>   drm/i915/bdw: Support 64 bit PPGTT in lrc mode
>   drm/i915/bdw: Add 4 level support in insert_entries and clear_range
>   drm/i915/bdw: Flip the 48b switch

When just a few patches changed (which I suspect is the case here) please
don't resend the entire series, but only resend the individual patches
in-reply-to their earlier versions.

Resending the entire series too often tends to split up the discussions
between multiple threads, so should be used cautiously. My approach is
that I don't resend the entire series except when all the patches have
changed. And I only resend when the review round has reached a conclusion.
While the review is ongoing doing incremental updates of the series is imo
much better.

But when resending the entire series, please start a new thread. Otherwise
it again starts to become unclear which versions of which patches go
together.

And a quick aside if you fear that a patch causes subsequent patches to no
longer apply without a rebase: I can deal with a lot of small conflicts
quickly when merging. And if that doesn't cut it I'll just ask for a
resend when needed.

Just a quick reminder of patch resending bkms; intel-gfx is a really busy
place, so everyone needs to strive for the best signal/noise ratio.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing
  2015-02-24 20:31   ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Daniel Vetter
@ 2015-02-25 10:55     ` Mika Kuoppala
  2015-02-25 12:29       ` Michel Thierry
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-25 10:55 UTC (permalink / raw)
  To: Daniel Vetter, Michel Thierry; +Cc: intel-gfx

Daniel Vetter <daniel@ffwll.ch> writes:

> On Tue, Feb 24, 2015 at 04:22:33PM +0000, Michel Thierry wrote:
>> This patchset addresses comments from v5 by Mika, specially some rename changes
>> touched several patches.
>> 
>> For GEN8, it has also been extended to work in logical ring submission (lrc)
>> mode, as it will be the preferred mode of operation.
>> I also tried to update the lrc code at the same time the ppgtt refactoring
>> occurred, leaving only one patch that is exclusively for lrc.
>> 
>> I'm also now including the required patches for PPGTT with 48b addressing.
>> In order expand the GPU address space, a 4th level translation is added, the
>> Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
>> each pointing to a PDP.
>> 
>> For now, this feature will only be available in BDW and GEN9, in LRC submission
>> mode (execlists) and when i915.enable_ppgtt=3 is set.
>> Also note that this expanded address space is only available for full PPGTT,
>> aliasing PPGTT remains 32b.
>> 
>> This list can be seen in 3 parts:
>> [01-10] Add page table allocation for GEN6/GEN7
>> [11-20] Enable dynamic allocation in GEN8,for both legacy and
>> execlist submission modes.
>> [21-32] PML4 support in BDW and GEN9+.
>> 
>> Ben Widawsky (26):
>>   drm/i915: page table abstractions
>>   drm/i915: Complete page table structures
>>   drm/i915: Create page table allocators
>>   drm/i915: Track GEN6 page table usage
>>   drm/i915: Extract context switch skip and pd load logic
>>   drm/i915: Track page table reload need
>>   drm/i915: Initialize all contexts
>>   drm/i915: Finish gen6/7 dynamic page table allocation
>>   drm/i915/bdw: Use dynamic allocation idioms on free
>>   drm/i915/bdw: page directories rework allocation
>>   drm/i915/bdw: pagetable allocation rework
>>   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
>>   drm/i915: num_pd_pages/num_pd_entries isn't useful
>>   drm/i915: Extract PPGTT param from page_directory alloc
>>   drm/i915/bdw: Split out mappings
>>   drm/i915/bdw: begin bitmap tracking
>>   drm/i915/bdw: Dynamic page table allocations
>>   drm/i915/bdw: Make pdp allocation more dynamic
>>   drm/i915/bdw: Abstract PDP usage
>>   drm/i915/bdw: Add dynamic page trace events
>>   drm/i915/bdw: Add ppgtt info for dynamic pages
>>   drm/i915/bdw: implement alloc/free for 4lvl
>>   drm/i915/bdw: Add 4 level switching infrastructure
>>   drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
>>   drm/i915: Plumb sg_iter through va allocation ->maps
>>   drm/i915: Expand error state's address width to 64b
>> 
>> Michel Thierry (6):
>>   drm/i915: Plumb drm_device through page tables operations
>>   drm/i915: Add dynamic page trace events
>>   drm/i915/bdw: Support dynamic pdp updates in lrc mode
>>   drm/i915/bdw: Support 64 bit PPGTT in lrc mode
>>   drm/i915/bdw: Add 4 level support in insert_entries and clear_range
>>   drm/i915/bdw: Flip the 48b switch
>
> When just a few patches changed (which I suspect is the case here) please
> don't resend the entire series, but only resend the individual patches
> in-reply-to their earlier versions.
>
> Resending the entire series too often tends to split up the discussions
> between multiple threads, so should be used cautiously. My approach is
> that I don't resend the entire series except when all the patches have
> changed. And I only resend when the review round has reached a conclusion.
> While the review is ongoing doing incremental updates of the series is imo
> much better.
>
> But when resending the entire series, please start a new thread. Otherwise
> it again starts to become unclear which versions of which patches go
> together.
>
> And a quick aside if you fear that a patch causes subsequent patches to no
> longer apply without a rebase: I can deal with a lot of small conflicts
> quickly when merging. And if that doesn't cut it I'll just ask for a
> resend when needed.
>

I have been asking for a lot of stuff that triggers rebasing. I suggest
we keep the 48b items in a separate series until we have most of the
dynamic page table series sorted out.

- Mika

> Just a quick reminder of patch resending bkms, intel-gfx is a really busy
> place so everyone needs to strive for best signal/noise ratio.
>
> Thanks, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing
  2015-02-25 10:55     ` Mika Kuoppala
@ 2015-02-25 12:29       ` Michel Thierry
  2015-02-25 14:20         ` Daniel Vetter
  0 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-25 12:29 UTC (permalink / raw)
  To: Mika Kuoppala, Daniel Vetter; +Cc: intel-gfx


On 2/25/2015 10:55 AM, Mika Kuoppala wrote:
> Daniel Vetter <daniel@ffwll.ch> writes:
>
>> On Tue, Feb 24, 2015 at 04:22:33PM +0000, Michel Thierry wrote:
>>> This patchset addresses comments from v5 by Mika, specially some rename changes
>>> touched several patches.
>>>
>>> For GEN8, it has also been extended to work in logical ring submission (lrc)
>>> mode, as it will be the preferred mode of operation.
>>> I also tried to update the lrc code at the same time the ppgtt refactoring
>>> occurred, leaving only one patch that is exclusively for lrc.
>>>
>>> I'm also now including the required patches for PPGTT with 48b addressing.
>>> In order expand the GPU address space, a 4th level translation is added, the
>>> Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
>>> each pointing to a PDP.
>>>
>>> For now, this feature will only be available in BDW and GEN9, in LRC submission
>>> mode (execlists) and when i915.enable_ppgtt=3 is set.
>>> Also note that this expanded address space is only available for full PPGTT,
>>> aliasing PPGTT remains 32b.
>>>
>>> This list can be seen in 3 parts:
>>> [01-10] Add page table allocation for GEN6/GEN7
>>> [11-20] Enable dynamic allocation in GEN8,for both legacy and
>>> execlist submission modes.
>>> [21-32] PML4 support in BDW and GEN9+.
>>>
>>> Ben Widawsky (26):
>>>    drm/i915: page table abstractions
>>>    drm/i915: Complete page table structures
>>>    drm/i915: Create page table allocators
>>>    drm/i915: Track GEN6 page table usage
>>>    drm/i915: Extract context switch skip and pd load logic
>>>    drm/i915: Track page table reload need
>>>    drm/i915: Initialize all contexts
>>>    drm/i915: Finish gen6/7 dynamic page table allocation
>>>    drm/i915/bdw: Use dynamic allocation idioms on free
>>>    drm/i915/bdw: page directories rework allocation
>>>    drm/i915/bdw: pagetable allocation rework
>>>    drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
>>>    drm/i915: num_pd_pages/num_pd_entries isn't useful
>>>    drm/i915: Extract PPGTT param from page_directory alloc
>>>    drm/i915/bdw: Split out mappings
>>>    drm/i915/bdw: begin bitmap tracking
>>>    drm/i915/bdw: Dynamic page table allocations
>>>    drm/i915/bdw: Make pdp allocation more dynamic
>>>    drm/i915/bdw: Abstract PDP usage
>>>    drm/i915/bdw: Add dynamic page trace events
>>>    drm/i915/bdw: Add ppgtt info for dynamic pages
>>>    drm/i915/bdw: implement alloc/free for 4lvl
>>>    drm/i915/bdw: Add 4 level switching infrastructure
>>>    drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
>>>    drm/i915: Plumb sg_iter through va allocation ->maps
>>>    drm/i915: Expand error state's address width to 64b
>>>
>>> Michel Thierry (6):
>>>    drm/i915: Plumb drm_device through page tables operations
>>>    drm/i915: Add dynamic page trace events
>>>    drm/i915/bdw: Support dynamic pdp updates in lrc mode
>>>    drm/i915/bdw: Support 64 bit PPGTT in lrc mode
>>>    drm/i915/bdw: Add 4 level support in insert_entries and clear_range
>>>    drm/i915/bdw: Flip the 48b switch
>> When just a few patches changed (which I suspect is the case here) please
>> don't resend the entire series, but only resend the individual patches
>> in-reply-to their earlier versions.
>>
>> Resending the entire series too often tends to split up the discussions
>> between multiple threads, so should be used cautiously. My approach is
>> that I don't resend the entire series except when all the patches have
>> changed. And I only resend when the review round has reached a conclusion.
>> While the review is ongoing doing incremental updates of the series is imo
>> much better.
>>
>> But when resending the entire series, please start a new thread. Otherwise
>> it again starts to become unclear which versions of which patches go
>> together.
>>
>> And a quick aside if you fear that a patch causes subsequent patches to no
>> longer apply without a rebase: I can deal with a lot of small conflicts
>> quickly when merging. And if that doesn't cut it I'll just ask for a
>> resend when needed.
>>
> I have been asking a lot of stuff that triggers rebasing. I suggest we
> should keep the 48b items in a separate series until we have most of the
> dynamic page table series sorted out.
>
> - Mika
Ok, I'll keep dynamic page allocation and 48b separated.

-Michel


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 03/32] drm/i915: Create page table allocators
  2015-02-24 16:22   ` [PATCH v6 03/32] drm/i915: Create page table allocators Michel Thierry
@ 2015-02-25 13:34     ` Mika Kuoppala
  2015-03-02 18:57       ` Paulo Zanoni
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-25 13:34 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> As we move toward dynamic page table allocation, it becomes much easier
> to manage our data structures if we do things less coarsely, breaking up
> all of our actions into individual tasks.  This makes the code easier to
> write, read, and verify.
>
> Aside from the dissection of the allocation functions, the patch
> statically allocates the page table structures without a page directory.
> This remains the same for all platforms.
>
> The patch itself should not have much functional difference. The primary
> noticeable difference is the fact that page tables are no longer
> allocated, but rather statically declared as part of the page directory.
> This has non-zero overhead, but things gain additional complexity as a
> result.
>
> This patch exists for a few reasons:
> 1. Splitting out the functions allows easily combining GEN6 and GEN8
> code. Page tables are no different on GEN8. As we'll see in a
> future patch when we add the DMA mappings to the allocations, it
> requires only one small change to make work, and error handling should
> just fall into place.
>
> 2. Unless we always want to allocate all page tables under a given PDE,
> we'll have to eventually break this up into an array of pointers (or
> pointer to pointer).
>
> 3. Having the discrete functions is easier to review, and understand.
> All allocations and frees now take place in just a couple of locations.
> Reviewing, and catching leaks should be easy.
>
> 4. Less important: the GFP flags are confined to one location, which
> makes playing around with such things trivial.
>
> v2: Updated commit message to explain why this patch exists
>
> v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
>
> v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
>
> v5: Added additional safety checks in gen8 clear/free/unmap.
>
> v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
>
> v7: Make err_out loop symmetrical to the way we allocate in
> alloc_pt_range. Also s/page_tables/page_table and correct commit
> message (Mika)
>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 254 ++++++++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
>  drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
>  3 files changed, 178 insertions(+), 96 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index ab6f1d4..81c1dba 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -279,6 +279,98 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> +static void unmap_and_free_pt(struct i915_page_table_entry *pt)
> +{
> +	if (WARN_ON(!pt->page))
> +		return;
> +	__free_page(pt->page);
> +	kfree(pt);
> +}
> +
> +static struct i915_page_table_entry *alloc_pt_single(void)
> +{
> +	struct i915_page_table_entry *pt;
> +
> +	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
> +	if (!pt)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!pt->page) {
> +		kfree(pt);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	return pt;
> +}
> +
> +/**
> + * alloc_pt_range() - Allocate multiple page tables
> + * @pd:		The page directory which will have at least @count entries
> + *		available to point to the allocated page tables.
> + * @pde:	First page directory entry for which we are allocating.
> + * @count:	Number of pages to allocate.
> + *
> + * Allocates multiple page table pages and sets the appropriate entries in the
> + * page table structure within the page directory. Function cleans up after
> + * itself on any failures.
> + *
> + * Return: 0 if allocation succeeded.
> + */
> +static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
> +{
> +	int i, ret;
> +
> +	/* 512 is the max page tables per page_directory on any platform. */
> +	if (WARN_ON(pde + count > GEN6_PPGTT_PD_ENTRIES))
> +		return -EINVAL;
> +
> +	for (i = pde; i < pde + count; i++) {
> +		struct i915_page_table_entry *pt = alloc_pt_single();
> +
> +		if (IS_ERR(pt)) {
> +			ret = PTR_ERR(pt);
> +			goto err_out;
> +		}
> +		WARN(pd->page_table[i],
> +		     "Leaking page directory entry %d (%pa)\n",
> +		     i, pd->page_table[i]);
> +		pd->page_table[i] = pt;
> +	}
> +
> +	return 0;
> +
> +err_out:
> +	while (i-- > pde)
> +		unmap_and_free_pt(pd->page_table[i]);
> +	return ret;
> +}
> +
> +static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
> +{
> +	if (pd->page) {
> +		__free_page(pd->page);
> +		kfree(pd);
> +	}
> +}
> +
> +static struct i915_page_directory_entry *alloc_pd_single(void)
> +{
> +	struct i915_page_directory_entry *pd;
> +
> +	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
> +	if (!pd)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!pd->page) {
> +		kfree(pd);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	return pd;
> +}
> +
>  /* Broadwell Page Directory Pointer Descriptors */
>  static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
>  			   uint64_t val)
> @@ -311,7 +403,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>  	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
>  
>  	for (i = used_pd - 1; i >= 0; i--) {
> -		dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
> +		dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
>  		ret = gen8_write_pdp(ring, i, addr);
>  		if (ret)
>  			return ret;
> @@ -338,8 +430,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  				      I915_CACHE_LLC, use_scratch);
>  
>  	while (num_entries) {
> -		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> -		struct page *page_table = pd->page_table[pde].page;
> +		struct i915_page_directory_entry *pd;
> +		struct i915_page_table_entry *pt;
> +		struct page *page_table;
> +
> +		if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
> +			continue;
> +
> +		pd = ppgtt->pdp.page_directory[pdpe];
> +
> +		if (WARN_ON(!pd->page_table[pde]))
> +			continue;
> +
> +		pt = pd->page_table[pde];
> +
> +		if (WARN_ON(!pt->page))
> +			continue;
> +
> +		page_table = pt->page;
>  
>  		last_pte = pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
> @@ -384,8 +492,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  			break;
>  
>  		if (pt_vaddr == NULL) {
> -			struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
> -			struct page *page_table = pd->page_table[pde].page;
> +			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
> +			struct i915_page_table_entry *pt = pd->page_table[pde];
> +			struct page *page_table = pt->page;
>  
>  			pt_vaddr = kmap_atomic(page_table);
>  		}
> @@ -416,19 +525,16 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>  {
>  	int i;
>  
> -	if (pd->page_table == NULL)
> +	if (!pd->page)
>  		return;
>  
> -	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> -		if (pd->page_table[i].page)
> -			__free_page(pd->page_table[i].page);
> -}
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> +		if (WARN_ON(!pd->page_table[i]))
> +			continue;
>  
> -static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
> -{
> -	gen8_free_page_tables(pd);
> -	kfree(pd->page_table);
> -	__free_page(pd->page);
> +		unmap_and_free_pt(pd->page_table[i]);
> +		pd->page_table[i] = NULL;
> +	}
>  }
>  
>  static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> @@ -436,7 +542,11 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
> +		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> +			continue;
> +
> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
> +		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>  	}
>  }
>  
> @@ -448,14 +558,23 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>  		/* TODO: In the future we'll support sparse mappings, so this
>  		 * will have to change. */
> -		if (!ppgtt->pdp.page_directory[i].daddr)
> +		if (!ppgtt->pdp.page_directory[i]->daddr)
>  			continue;
>  
> -		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
> +		pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
>  			       PCI_DMA_BIDIRECTIONAL);
>  
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_table[j].daddr;
> +			struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
> +			struct i915_page_table_entry *pt;
> +			dma_addr_t addr;
> +
> +			if (WARN_ON(!pd->page_table[j]))
> +				continue;
> +
> +			pt = pd->page_table[j];
> +			addr = pt->daddr;
> +
>  			if (addr)
>  				pci_unmap_page(hwdev, addr, PAGE_SIZE,
>  					       PCI_DMA_BIDIRECTIONAL);
> @@ -474,25 +593,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  {
> -	int i, j;
> +	int i, ret;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> -		struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
> -		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			struct i915_page_table_entry *pt = &pd->page_table[j];
> -
> -			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> -			if (!pt->page)
> -				goto unwind_out;
> -
> -		}
> +		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
> +				     0, GEN8_PDES_PER_PAGE);
> +		if (ret)
> +			goto unwind_out;
>  	}
>  
>  	return 0;
>  
>  unwind_out:
>  	while (i--)
> -		gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
>  
>  	return -ENOMEM;
>  }
> @@ -503,19 +617,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	int i;
>  
>  	for (i = 0; i < max_pdp; i++) {
> -		struct i915_page_table_entry *pt;
> -
> -		pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
> -		if (!pt)
> -			goto unwind_out;
> -
> -		ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
> -		if (!ppgtt->pdp.page_directory[i].page) {
> -			kfree(pt);
> +		ppgtt->pdp.page_directory[i] = alloc_pd_single();
> +		if (IS_ERR(ppgtt->pdp.page_directory[i]))
>  			goto unwind_out;
> -		}
> -
> -		ppgtt->pdp.page_directory[i].page_table = pt;
>  	}
>  
>  	ppgtt->num_pd_pages = max_pdp;
> @@ -524,10 +628,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	return 0;
>  
>  unwind_out:
> -	while (i--) {
> -		kfree(ppgtt->pdp.page_directory[i].page_table);
> -		__free_page(ppgtt->pdp.page_directory[i].page);
> -	}
> +	while (i--)
> +		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>  
>  	return -ENOMEM;
>  }
> @@ -561,14 +663,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
>  	int ret;
>  
>  	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> -			       ppgtt->pdp.page_directory[pd].page, 0,
> +			       ppgtt->pdp.page_directory[pd]->page, 0,
>  			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
>  	if (ret)
>  		return ret;
>  
> -	ppgtt->pdp.page_directory[pd].daddr = pd_addr;
> +	ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
>  
>  	return 0;
>  }
> @@ -578,8 +680,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					const int pt)
>  {
>  	dma_addr_t pt_addr;
> -	struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
> -	struct i915_page_table_entry *ptab = &pdir->page_table[pt];
> +	struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
> +	struct i915_page_table_entry *ptab = pdir->page_table[pt];
>  	struct page *p = ptab->page;
>  	int ret;
>  
> @@ -642,10 +744,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	 * will never need to touch the PDEs again.
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
> +		struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
>  		gen8_ppgtt_pde_t *pd_vaddr;
> -		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
> +		pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			dma_addr_t addr = ppgtt->pdp.page_directory[i].page_table[j].daddr;
> +			struct i915_page_table_entry *pt = pd->page_table[j];
> +			dma_addr_t addr = pt->daddr;
>  			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
>  						      I915_CACHE_LLC);
>  		}
> @@ -696,7 +800,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
>  		u32 expected;
>  		gen6_gtt_pte_t *pt_vaddr;
> -		dma_addr_t pt_addr = ppgtt->pd.page_table[pde].daddr;
> +		dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->daddr;
>  		pd_entry = readl(pd_addr + pde);
>  		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>  
> @@ -707,7 +811,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  				   expected);
>  		seq_printf(m, "\tPDE: %x\n", pd_entry);
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde].page);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->page);
>  		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
>  			unsigned long va =
>  				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
> @@ -746,7 +850,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>  	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>  		dma_addr_t pt_addr;
>  
> -		pt_addr = ppgtt->pd.page_table[i].daddr;
> +		pt_addr = ppgtt->pd.page_table[i]->daddr;
>  		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
>  		pd_entry |= GEN6_PDE_VALID;
>  
> @@ -922,7 +1026,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>  		if (last_pte > I915_PPGTT_PT_ENTRIES)
>  			last_pte = I915_PPGTT_PT_ENTRIES;
>  
> -		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt].page);
> +		pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
>  
>  		for (i = first_pte; i < last_pte; i++)
>  			pt_vaddr[i] = scratch_pte;
> @@ -951,7 +1055,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  	pt_vaddr = NULL;
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt].page);
> +			pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
>  
>  		pt_vaddr[act_pte] =
>  			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> @@ -974,7 +1078,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
>  		pci_unmap_page(ppgtt->base.dev->pdev,
> -			       ppgtt->pd.page_table[i].daddr,
> +			       ppgtt->pd.page_table[i]->daddr,
>  			       4096, PCI_DMA_BIDIRECTIONAL);
>  }
>  
> @@ -983,9 +1087,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		if (ppgtt->pd.page_table[i].page)
> -			__free_page(ppgtt->pd.page_table[i].page);
> -	kfree(ppgtt->pd.page_table);
> +		unmap_and_free_pt(ppgtt->pd.page_table[i]);
> +
> +	unmap_and_free_pd(&ppgtt->pd);
>  }
>  
>  static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
> @@ -1040,28 +1144,6 @@ alloc:
>  	return 0;
>  }
>  
> -static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
> -{
> -	struct i915_page_table_entry *pt;
> -	int i;
> -
> -	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
> -	if (!pt)
> -		return -ENOMEM;
> -
> -	ppgtt->pd.page_table = pt;
> -
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		pt[i].page = alloc_page(GFP_KERNEL);
> -		if (!pt->page) {
> -			gen6_ppgtt_free(ppgtt);
> -			return -ENOMEM;
> -		}
> -	}
> -
> -	return 0;
> -}
> -
>  static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int ret;
> @@ -1070,7 +1152,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> -	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
> +	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
>  	if (ret) {
>  		drm_mm_remove_node(&ppgtt->node);
>  		return ret;
> @@ -1088,7 +1170,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  		struct page *page;
>  		dma_addr_t pt_addr;
>  
> -		page = ppgtt->pd.page_table[i].page;
> +		page = ppgtt->pd.page_table[i]->page;
>  		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>  				       PCI_DMA_BIDIRECTIONAL);
>  
> @@ -1097,7 +1179,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>  			return -EIO;
>  		}
>  
> -		ppgtt->pd.page_table[i].daddr = pt_addr;
> +		ppgtt->pd.page_table[i]->daddr = pt_addr;
>  	}
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 1144b709..c9e93f5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -199,12 +199,12 @@ struct i915_page_directory_entry {
>  		dma_addr_t daddr;
>  	};
>  
> -	struct i915_page_table_entry *page_table;
> +	struct i915_page_table_entry *page_table[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
>  };
>  
>  struct i915_page_directory_pointer_entry {
>  	/* struct page *page; */
> -	struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
> +	struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
>  };
>  
>  struct i915_address_space {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9e71992..bc9c7c3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>  	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
>  	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
>  	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> -	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
> -	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
> -	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
> -	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
> -	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
> -	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
> -	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
> -	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
> +	reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
> +	reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
> +	reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
> +	reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
> +	reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
> +	reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
> +	reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
> +	reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
>  	if (ring->id == RCS) {
>  		reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>  		reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> -- 
> 2.1.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing
  2015-02-25 12:29       ` Michel Thierry
@ 2015-02-25 14:20         ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-02-25 14:20 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Wed, Feb 25, 2015 at 12:29:50PM +0000, Michel Thierry wrote:
> On 2/25/2015 10:55 AM, Mika Kuoppala wrote:
> >Daniel Vetter <daniel@ffwll.ch> writes:
> >
> >>On Tue, Feb 24, 2015 at 04:22:33PM +0000, Michel Thierry wrote:
> >>>This patchset addresses comments from v5 by Mika, especially some rename changes
> >>>that touched several patches.
> >>>
> >>>For GEN8, it has also been extended to work in logical ring submission (lrc)
> >>>mode, as it will be the preferred mode of operation.
> >>>I also tried to update the lrc code at the same time the ppgtt refactoring
> >>>occurred, leaving only one patch that is exclusively for lrc.
> >>>
> >>>I'm also now including the required patches for PPGTT with 48b addressing.
> >>>In order to expand the GPU address space, a 4th level translation is added, the
> >>>Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
> >>>each pointing to a PDP.
> >>>
> >>>For now, this feature will only be available in BDW and GEN9, in LRC submission
> >>>mode (execlists) and when i915.enable_ppgtt=3 is set.
> >>>Also note that this expanded address space is only available for full PPGTT,
> >>>aliasing PPGTT remains 32b.
> >>>
> >>>This list can be seen in 3 parts:
> >>>[01-10] Add page table allocation for GEN6/GEN7
> >>>[11-20] Enable dynamic allocation in GEN8, for both legacy and
> >>>execlist submission modes.
> >>>[21-32] PML4 support in BDW and GEN9+.
> >>>
> >>>Ben Widawsky (26):
> >>>   drm/i915: page table abstractions
> >>>   drm/i915: Complete page table structures
> >>>   drm/i915: Create page table allocators
> >>>   drm/i915: Track GEN6 page table usage
> >>>   drm/i915: Extract context switch skip and pd load logic
> >>>   drm/i915: Track page table reload need
> >>>   drm/i915: Initialize all contexts
> >>>   drm/i915: Finish gen6/7 dynamic page table allocation
> >>>   drm/i915/bdw: Use dynamic allocation idioms on free
> >>>   drm/i915/bdw: page directories rework allocation
> >>>   drm/i915/bdw: pagetable allocation rework
> >>>   drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page
> >>>   drm/i915: num_pd_pages/num_pd_entries isn't useful
> >>>   drm/i915: Extract PPGTT param from page_directory alloc
> >>>   drm/i915/bdw: Split out mappings
> >>>   drm/i915/bdw: begin bitmap tracking
> >>>   drm/i915/bdw: Dynamic page table allocations
> >>>   drm/i915/bdw: Make pdp allocation more dynamic
> >>>   drm/i915/bdw: Abstract PDP usage
> >>>   drm/i915/bdw: Add dynamic page trace events
> >>>   drm/i915/bdw: Add ppgtt info for dynamic pages
> >>>   drm/i915/bdw: implement alloc/free for 4lvl
> >>>   drm/i915/bdw: Add 4 level switching infrastructure
> >>>   drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT
> >>>   drm/i915: Plumb sg_iter through va allocation ->maps
> >>>   drm/i915: Expand error state's address width to 64b
> >>>
> >>>Michel Thierry (6):
> >>>   drm/i915: Plumb drm_device through page tables operations
> >>>   drm/i915: Add dynamic page trace events
> >>>   drm/i915/bdw: Support dynamic pdp updates in lrc mode
> >>>   drm/i915/bdw: Support 64 bit PPGTT in lrc mode
> >>>   drm/i915/bdw: Add 4 level support in insert_entries and clear_range
> >>>   drm/i915/bdw: Flip the 48b switch
> >>When just a few patches changed (which I suspect is the case here) please
> >>don't resend the entire series, but only resend the individual patches
> >>in-reply-to their earlier versions.
> >>
> >>Resending the entire series too often tends to split up the discussions
> >>between multiple threads, so should be used cautiously. My approach is
> >>that I don't resend the entire series except when all the patches have
> >>changed. And I only resend when the review round has reached a conclusion.
> >>While the review is ongoing doing incremental updates of the series is imo
> >>much better.
> >>
> >>But when resending the entire series, please start a new thread. Otherwise
> >>it again starts to become unclear which versions of which patches go
> >>together.
> >>
> >>And a quick aside if you fear that a patch causes subsequent patches to no
> >>longer apply without a rebase: I can deal with a lot of small conflicts
> >>quickly when merging. And if that doesn't cut it I'll just ask for a
> >>resend when needed.
> >>
> >I have been asking a lot of stuff that triggers rebasing. I suggest we
> >should keep the 48b items in a separate series until we have most of the
> >dynamic page table series sorted out.
> >
> >- Mika
> Ok, I'll keep dynamic page allocation and 48b separated.

Yeah I didn't realize that these two have been merged. Another bkm is to
keep patch series for detailed code review at 10-20 patches tops. More
tends to just result in a lot more churn for every minimal rebase on the
patch author's side. And it also tends to burn out reviewers ime.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 04/32] drm/i915: Plumb drm_device through page tables operations
  2015-02-24 16:22   ` [PATCH v6 04/32] drm/i915: Plumb drm_device through page tables operations Michel Thierry
@ 2015-02-25 14:52     ` Mika Kuoppala
  2015-02-25 15:57       ` Daniel Vetter
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-25 14:52 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> The next patch in the series will require it for alloc_pt_single.
>
> v2: Rebased after s/page_tables/page_table/.
>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 81c1dba..e05488e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -142,7 +142,6 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
>  		return has_aliasing_ppgtt ? 1 : 0;
>  }
>  
> -
>  static void ppgtt_bind_vma(struct i915_vma *vma,
>  			   enum i915_cache_level cache_level,
>  			   u32 flags);
> @@ -279,7 +278,7 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> -static void unmap_and_free_pt(struct i915_page_table_entry *pt)
> +static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
>  {
>  	if (WARN_ON(!pt->page))
>  		return;
> @@ -287,7 +286,7 @@ static void unmap_and_free_pt(struct i915_page_table_entry *pt)
>  	kfree(pt);
>  }
>  
> -static struct i915_page_table_entry *alloc_pt_single(void)
> +static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
>  {
>  	struct i915_page_table_entry *pt;
>  
> @@ -317,7 +316,9 @@ static struct i915_page_table_entry *alloc_pt_single(void)
>   *
>   * Return: 0 if allocation succeeded.
>   */
> -static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
> +static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count,
> +		  struct drm_device *dev)
> +
>  {
>  	int i, ret;
>  
> @@ -326,7 +327,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
>  		return -EINVAL;
>  
>  	for (i = pde; i < pde + count; i++) {
> -		struct i915_page_table_entry *pt = alloc_pt_single();
> +		struct i915_page_table_entry *pt = alloc_pt_single(dev);
>  
>  		if (IS_ERR(pt)) {
>  			ret = PTR_ERR(pt);
> @@ -342,7 +343,7 @@ static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, si
>  
>  err_out:
>  	while (i-- > pde)
> -		unmap_and_free_pt(pd->page_table[i]);
> +		unmap_and_free_pt(pd->page_table[i], dev);
>  	return ret;
>  }
>  
> @@ -521,7 +522,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	}
>  }
>  
> -static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
> +static void gen8_free_page_tables(struct i915_page_directory_entry *pd, struct drm_device *dev)
>  {
>  	int i;
>  
> @@ -532,7 +533,7 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>  		if (WARN_ON(!pd->page_table[i]))
>  			continue;
>  
> -		unmap_and_free_pt(pd->page_table[i]);
> +		unmap_and_free_pt(pd->page_table[i], dev);
>  		pd->page_table[i] = NULL;
>  	}
>  }
> @@ -545,7 +546,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  		if (WARN_ON(!ppgtt->pdp.page_directory[i]))
>  			continue;
>  
> -		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
>  		unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>  	}
>  }
> @@ -597,7 +598,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
>  		ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
> -				     0, GEN8_PDES_PER_PAGE);
> +				     0, GEN8_PDES_PER_PAGE, ppgtt->base.dev);
>  		if (ret)
>  			goto unwind_out;
>  	}
> @@ -606,7 +607,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>  
>  unwind_out:
>  	while (i--)
> -		gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
> +		gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
>  
>  	return -ENOMEM;
>  }
> @@ -1087,7 +1088,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	int i;
>  
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		unmap_and_free_pt(ppgtt->pd.page_table[i]);
> +		unmap_and_free_pt(ppgtt->pd.page_table[i], ppgtt->base.dev);
>  
>  	unmap_and_free_pd(&ppgtt->pd);
>  }
> @@ -1152,7 +1153,9 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> -	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
> +	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
> +			ppgtt->base.dev);
> +
>  	if (ret) {
>  		drm_mm_remove_node(&ppgtt->node);
>  		return ret;
> -- 
> 2.1.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 04/32] drm/i915: Plumb drm_device through page tables operations
  2015-02-25 14:52     ` Mika Kuoppala
@ 2015-02-25 15:57       ` Daniel Vetter
  0 siblings, 0 replies; 229+ messages in thread
From: Daniel Vetter @ 2015-02-25 15:57 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Wed, Feb 25, 2015 at 04:52:39PM +0200, Mika Kuoppala wrote:
> Michel Thierry <michel.thierry@intel.com> writes:
> 
> > The next patch in the series will require it for alloc_pt_single.
> >
> > v2: Rebased after s/page_tables/page_table/.
> >
> > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

Merged up to this one, thanks for patches&review.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 05/32] drm/i915: Track GEN6 page table usage
  2015-02-24 16:22   ` [PATCH v6 05/32] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2015-02-26 15:58     ` Mika Kuoppala
  2015-03-10 11:19       ` Mika Kuoppala
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-26 15:58 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> Instead of implementing the full tracking + dynamic allocation, this
> patch does a bit less than half of the work, by tracking and warning on
> unexpected conditions. The tracking itself follows which PTEs within a
> page table are currently being used for objects. The next patch will
> modify this to actually allocate the page tables only when necessary.
>
> With the current patch there isn't much in the way of making a gen
> agnostic range allocation function. However, in the next patch we'll add
> more specificity which makes having separate functions a bit easier to
> manage.
>
> One important change introduced here is that DMA mappings are
> created/destroyed at the same time page directories/tables are
> allocated/deallocated.
>
> Notice that aliasing PPGTT is not managed here. The patch which actually
> begins dynamic allocation/teardown explains the reasoning for this.
>
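A minimal sketch of the tracking idea (the helper and constant names here
are invented for illustration; the real code below uses gen6_for_each_pde
and the used_ptes bitmap this patch adds to i915_page_table_entry, with
alloc_pt_single() doing the actual bitmap allocation):

/* Sketch only: one bit per PTE; marking a range in use is a bitmap_set(),
 * and a later patch can key dynamic page table allocation off whether a
 * table is about to receive its first live PTE.
 */
#include <linux/bitmap.h>
#include <linux/printk.h>

#define SKETCH_GEN6_PTES 1024		/* 4K page / 4-byte gen6 PTE */

struct sketch_pt {
	unsigned long *used_ptes;	/* bitmap, one bit per PTE */
};

static void sketch_mark_range_used(struct sketch_pt *pt,
				   unsigned int first_pte, unsigned int npte)
{
	/* An all-clear bitmap means this table has no live mappings yet. */
	if (bitmap_empty(pt->used_ptes, SKETCH_GEN6_PTES))
		pr_debug("table receives its first mapping\n");

	bitmap_set(pt->used_ptes, first_pte, npte);
}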
> v2: s/pdp.page_directory/pdp.page_directorys
> Make a scratch page allocation helper
>
> v3: Rebase and expand commit message.
>
> v4: Allocate required pagetables only when it is needed, _bind_to_vm
> instead of bind_vma (Daniel).
>
> v5: Rebased to remove the unnecessary noise in the diff, also:
>  - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
>  - Removed unnecessary checks in gen6_alloc_va_range.
>  - Changed map/unmap_px_single macros to use dma functions directly and
>    be part of a static inline function instead.
>  - Moved drm_device plumbing through page tables operation to its own
>    patch.
>  - Moved allocate/teardown_va_range calls until they are fully
>    implemented (in subsequent patch).
>  - Merged pt and scratch_pt unmap_and_free path.
>  - Moved scratch page allocator helper to the patch that will use it.
>
> v6: Reduce complexity by not tearing down pagetables dynamically, the
> same can be achieved while freeing empty vms. (Daniel)
>
> v7: s/i915_dma_map_px_single/i915_dma_map_single
> s/gen6_write_pdes/gen6_write_pde
> Prevent a NULL case when only GGTT is available. (Mika)
>
> v8: Rebased after s/page_tables/page_table/.
>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 198 +++++++++++++++++++++++++-----------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  75 ++++++++++++++
>  2 files changed, 211 insertions(+), 62 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e05488e..f9354c7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -278,29 +278,88 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> -static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
> +#define i915_dma_unmap_single(px, dev) \
> +	__i915_dma_unmap_single((px)->daddr, dev)
> +
> +static inline void __i915_dma_unmap_single(dma_addr_t daddr,
> +					struct drm_device *dev)
> +{
> +	struct device *device = &dev->pdev->dev;
> +
> +	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
> +}
> +
> +/**
> + * i915_dma_map_single() - Create a dma mapping for a page table/dir/etc.
> + * @px:	Page table/dir/etc to get a DMA map for
> + * @dev:	drm device
> + *
> + * Page table allocations are unified across all gens. They always require a
> + * single 4k allocation, as well as a DMA mapping. If we keep the structs
> + * symmetric here, the simple macro covers us for every page table type.
> + *
> + * Return: 0 if success.
> + */
> +#define i915_dma_map_single(px, dev) \
> +	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
> +
> +static inline int i915_dma_map_page_single(struct page *page,
> +					   struct drm_device *dev,
> +					   dma_addr_t *daddr)
> +{
> +	struct device *device = &dev->pdev->dev;
> +
> +	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
> +	return dma_mapping_error(device, *daddr);
> +}
> +
> +static void unmap_and_free_pt(struct i915_page_table_entry *pt,
> +			       struct drm_device *dev)
>  {
>  	if (WARN_ON(!pt->page))
>  		return;
> +
> +	i915_dma_unmap_single(pt, dev);
>  	__free_page(pt->page);
> +	kfree(pt->used_ptes);
>  	kfree(pt);
>  }
>  
>  static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
>  {
>  	struct i915_page_table_entry *pt;
> +	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
> +		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
> +	int ret = -ENOMEM;
>  
>  	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
>  	if (!pt)
>  		return ERR_PTR(-ENOMEM);
>  
> +	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
> +				GFP_KERNEL);
> +
> +	if (!pt->used_ptes)
> +		goto fail_bitmap;
> +
>  	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> -	if (!pt->page) {
> -		kfree(pt);
> -		return ERR_PTR(-ENOMEM);
> -	}
> +	if (!pt->page)
> +		goto fail_page;
> +
> +	ret = i915_dma_map_single(pt, dev);
> +	if (ret)
> +		goto fail_dma;
>  
>  	return pt;
> +
> +fail_dma:
> +	__free_page(pt->page);
> +fail_page:
> +	kfree(pt->used_ptes);
> +fail_bitmap:
> +	kfree(pt);
> +
> +	return ERR_PTR(ret);
>  }
>  
>  /**
> @@ -838,26 +897,35 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  	}
>  }
>  
> -static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
> +/* Write pde (index) from the page directory @pd to the page table @pt */
> +static void gen6_write_pde(struct i915_page_directory_entry *pd,
> +			    const int pde, struct i915_page_table_entry *pt)
>  {
> -	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
> -	gen6_gtt_pte_t __iomem *pd_addr;
> -	uint32_t pd_entry;
> -	int i;
> +	/* Caller needs to make sure the write completes if necessary */
> +	struct i915_hw_ppgtt *ppgtt =
> +		container_of(pd, struct i915_hw_ppgtt, pd);
> +	u32 pd_entry;
>  
> -	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
> -	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
> -		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		dma_addr_t pt_addr;
> +	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
> +	pd_entry |= GEN6_PDE_VALID;
>  
> -		pt_addr = ppgtt->pd.page_table[i]->daddr;
> -		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
> -		pd_entry |= GEN6_PDE_VALID;
> +	writel(pd_entry, ppgtt->pd_addr + pde);
> +}
>  
> -		writel(pd_entry, pd_addr + i);
> -	}
> -	readl(pd_addr);
> +/* Write all the page tables found in the ppgtt structure to incrementing page
> + * directories. */
> +static void gen6_write_page_range(struct drm_i915_private *dev_priv,
> +				struct i915_page_directory_entry *pd, uint32_t start, uint32_t length)
> +{
> +	struct i915_page_table_entry *pt;
> +	uint32_t pde, temp;
> +
> +	gen6_for_each_pde(pt, pd, start, length, temp, pde)
> +		gen6_write_pde(pd, pde, pt);
> +
> +	/* Make sure write is complete before other code can use this page
> +	 * table. Also require for WC mapped PTEs */
> +	readl(dev_priv->gtt.gsm);
>  }
>  
>  static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
> @@ -1083,6 +1151,28 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  			       4096, PCI_DMA_BIDIRECTIONAL);
>  }
>  
> +static int gen6_alloc_va_range(struct i915_address_space *vm,
> +			       uint64_t start, uint64_t length)
> +{
> +	struct i915_hw_ppgtt *ppgtt =
> +				container_of(vm, struct i915_hw_ppgtt, base);
> +	struct i915_page_table_entry *pt;
> +	uint32_t pde, temp;
> +
> +	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
> +		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
> +
> +		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
> +		bitmap_set(tmp_bitmap, gen6_pte_index(start),
> +			   gen6_pte_count(start, length));
> +
> +		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
> +				I915_PPGTT_PT_ENTRIES);
> +	}
> +
> +	return 0;
> +}
> +
>  static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
> @@ -1129,20 +1219,24 @@ alloc:
>  					       0, dev_priv->gtt.base.total,
>  					       0);
>  		if (ret)
> -			return ret;
> +			goto err_out;
>  
>  		retried = true;
>  		goto alloc;
>  	}
>  
>  	if (ret)
> -		return ret;
> +		goto err_out;
> +
>  
>  	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
>  		DRM_DEBUG("Forced to use aperture for PDEs\n");
>  
>  	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
>  	return 0;
> +
> +err_out:
> +	return ret;
>  }
>  
>  static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
> @@ -1164,30 +1258,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>  	return 0;
>  }
>  
> -static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
> -{
> -	struct drm_device *dev = ppgtt->base.dev;
> -	int i;
> -
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		struct page *page;
> -		dma_addr_t pt_addr;
> -
> -		page = ppgtt->pd.page_table[i]->page;
> -		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
> -				       PCI_DMA_BIDIRECTIONAL);
> -
> -		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
> -			gen6_ppgtt_unmap_pages(ppgtt);
> -			return -EIO;
> -		}
> -
> -		ppgtt->pd.page_table[i]->daddr = pt_addr;
> -	}
> -
> -	return 0;
> -}
> -
>  static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  {
>  	struct drm_device *dev = ppgtt->base.dev;
> @@ -1211,12 +1281,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	if (ret)
>  		return ret;
>  
> -	ret = gen6_ppgtt_setup_page_tables(ppgtt);
> -	if (ret) {
> -		gen6_ppgtt_free(ppgtt);
> -		return ret;
> -	}
> -
> +	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
>  	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
>  	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>  	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
> @@ -1227,13 +1292,17 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->pd.pd_offset =
>  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
>  
> +	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
> +		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
> +
>  	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>  
> +	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
> +
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
>  			 ppgtt->node.size >> 20,
>  			 ppgtt->node.start / PAGE_SIZE);
>  
> -	gen6_write_pdes(ppgtt);
>  	DRM_DEBUG("Adding PPGTT at offset %x\n",
>  		  ppgtt->pd.pd_offset << 10);
>  
> @@ -1504,15 +1573,20 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>  		return;
>  	}
>  
> -	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
> -		/* TODO: Perhaps it shouldn't be gen6 specific */
> -		if (i915_is_ggtt(vm)) {
> -			if (dev_priv->mm.aliasing_ppgtt)
> -				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
> -			continue;
> -		}
> +	if (USES_PPGTT(dev)) {
> +		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
> +			/* TODO: Perhaps it shouldn't be gen6 specific */
> +
> +			struct i915_hw_ppgtt *ppgtt =
> +					container_of(vm, struct i915_hw_ppgtt,
> +						     base);
>  
> -		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
> +			if (i915_is_ggtt(vm))
> +				ppgtt = dev_priv->mm.aliasing_ppgtt;
> +
> +			gen6_write_page_range(dev_priv, &ppgtt->pd, 0,
> +					      ppgtt->num_pd_entries);
> +		}
>  	}
>  
>  	i915_ggtt_flush(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index c9e93f5..bf0e380 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN6_PPGTT_PD_ENTRIES		512
>  #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
>  #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
> +#define GEN6_PDE_SHIFT			22
>  #define GEN6_PDE_VALID			(1 << 0)
> +#define I915_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
> +#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
>  
>  #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
>  
> @@ -190,6 +193,8 @@ struct i915_vma {
>  struct i915_page_table_entry {
>  	struct page *page;
>  	dma_addr_t daddr;
> +
> +	unsigned long *used_ptes;
>  };
>  
>  struct i915_page_directory_entry {
> @@ -246,6 +251,9 @@ struct i915_address_space {
>  	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
>  				     enum i915_cache_level level,
>  				     bool valid, u32 flags); /* Create a valid PTE */
> +	int (*allocate_va_range)(struct i915_address_space *vm,
> +				 uint64_t start,
> +				 uint64_t length);
>  	void (*clear_range)(struct i915_address_space *vm,
>  			    uint64_t start,
>  			    uint64_t length,
> @@ -298,12 +306,79 @@ struct i915_hw_ppgtt {
>  
>  	struct drm_i915_file_private *file_priv;
>  
> +	gen6_gtt_pte_t __iomem *pd_addr;
> +
>  	int (*enable)(struct i915_hw_ppgtt *ppgtt);
>  	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
>  			 struct intel_engine_cs *ring);
>  	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
>  };
>  
> +/* For each pde iterates over every pde from start until start + length.
> + * If start, and start+length are not perfectly divisible, the macro will round
> + * down, and up as needed. The macro modifies pde, start, and length. Dev is
> + * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
> + * and length = 2G effectively iterates over every PDE in the system. On gen8+
> + * it simply iterates over every page directory entry in a page directory.
> + *

There is nothing for gen8 in the macro yet, so the comment is a bit misleading.

> + * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
> + */
> +#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
> +	for (iter = gen6_pde_index(start), pt = (pd)->page_table[iter]; \
> +	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
> +	     pt = (pd)->page_table[++iter], \
> +	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
> +	     temp = min_t(unsigned, temp, length), \
> +	     start += temp, length -= temp)
> +
> +static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
> +{
> +	const uint32_t mask = NUM_PTE(pde_shift) - 1;
> +
> +	return (address >> PAGE_SHIFT) & mask;
> +}
> +
> +/* Helper to count the number of PTEs within the given length. This count does
> +* not cross a page table boundary, so the max value would be
> +* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
> +*/
> +static inline size_t i915_pte_count(uint64_t addr, size_t length,
> +					uint32_t pde_shift)
> +{
> +	const uint64_t mask = ~((1 << pde_shift) - 1);
> +	uint64_t end;
> +
> +	BUG_ON(length == 0);
> +	BUG_ON(offset_in_page(addr|length));
> +
> +	end = addr + length;
> +
> +	if ((addr & mask) != (end & mask))
> +		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
> +
> +	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
> +}

After trying to figure out the reasoning for this i915_pte_count
and its role in the used bitmap setup, I started to wonder why all
the complexity is needed here.

BUG_ON(offset_in_page(addr|length)) reveals that we can't be called
with anything but a page boundary, so the address parameter is
irrelevant here?

Then there is trickery with the pde_shifting. I tried to find some
generalization down the series to take advantage of this. Perhaps
I missed it?

For me it seems that we can replace this with simple:

static inline uint32_t i915_pte_count(uint64_t length, uint32_t
pte_len)
{
       return min_t(uint32_t, length / PAGE_SIZE, PAGE_SIZE / pte_len);
}

...and not lose anything.


On top of that, when trying to wrap my brain around the differences
between GEN6/7 and GEN8+, the following patch applied before this one
would make things much easier to understand, at least for me:

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c9e93f5..c13f32f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -36,13 +36,13 @@
 
 struct drm_i915_file_private;
 
-typedef uint32_t gen6_gtt_pte_t;
-typedef uint64_t gen8_gtt_pte_t;
-typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
+typedef uint32_t gen6_pte_t;
+typedef uint64_t gen8_pte_t;
+typedef uint64_t gen8_pde_t;
 
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
-#define I915_PPGTT_PT_ENTRIES		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
+
 /* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
 #define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
 #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
@@ -51,8 +51,13 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PTE_UNCACHED		(1 << 1)
 #define GEN6_PTE_VALID			(1 << 0)
 
-#define GEN6_PPGTT_PD_ENTRIES		512
-#define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
+#define I915_PTES(pte_len)		(PAGE_SIZE / (pte_len))
+#define I915_PTE_MASK(pte_len)		(I915_PTES(pte_len) - 1)
+#define I915_PDES			512
+#define I915_PDE_MASK			(I915_PDES - 1)
+
+#define GEN6_PTES			I915_PTES(sizeof(gen6_pte_t))
+#define GEN6_PD_SIZE		        (I915_PDES * PAGE_SIZE)
 #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
 #define GEN6_PDE_VALID			(1 << 0)
 
@@ -89,8 +94,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PTE_SHIFT			12
 #define GEN8_PTE_MASK			0x1ff
 #define GEN8_LEGACY_PDPES		4
-#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
-#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
+#define GEN8_PTES			I915_PTES(sizeof(gen8_pte_t))
 
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
@@ -199,7 +203,7 @@ struct i915_page_directory_entry {
 		dma_addr_t daddr;
 	};
 
-	struct i915_page_table_entry *page_table[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
+	struct i915_page_table_entry *page_table[I915_PDES]; /* PDEs */
 };
 
 struct i915_page_directory_pointer_entry {
@@ -243,9 +247,9 @@ struct i915_address_space {
 	struct list_head inactive_list;
 
 	/* FIXME: Need a more generic return type */
-	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
-				     enum i915_cache_level level,
-				     bool valid, u32 flags); /* Create a valid PTE */
+	gen6_pte_t (*pte_encode)(dma_addr_t addr,
+				 enum i915_cache_level level,
+				 bool valid, u32 flags); /* Create a valid PTE */
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -304,6 +308,21 @@ struct i915_hw_ppgtt {
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pte_len)
+{
+	return (address >> PAGE_SHIFT) & I915_PTE_MASK(pte_len);
+}
+
+static inline uint32_t i915_pte_count(uint64_t length, uint32_t pte_len)
+{
+	return min_t(uint32_t, length / PAGE_SIZE, I915_PTES(pte_len));
+}
+
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t pde_shift)
+{
+	return (addr >> pde_shift) & I915_PDE_MASK;
+}
+
 int i915_gem_gtt_init(struct drm_device *dev);
 void i915_gem_init_global_gtt(struct drm_device *dev);
 void i915_global_gtt_cleanup(struct drm_device *dev);

With that, I think you could write generic gen_for_each_pde
more easily and just parametrize the gen6 variant and gen8 one
further in the series.
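For illustration only, a parametrized iterator along those lines might
look like the sketch below; it assumes the I915_PDES and i915_pde_index()
helpers proposed above and is not the macro that eventually landed:

/* Sketch only: one iterator parametrized on the PDE shift, from which the
 * gen6 and gen8 variants become thin wrappers.
 */
#define i915_for_each_pde(pt, pd, start, length, temp, iter, shift)	\
	for (iter = i915_pde_index(start, shift),			\
	     pt = (pd)->page_table[iter];				\
	     length > 0 && iter < I915_PDES;				\
	     pt = (pd)->page_table[++iter],				\
	     temp = ALIGN(start + 1, 1 << (shift)) - start,		\
	     temp = min_t(unsigned, temp, length),			\
	     start += temp, length -= temp)

#define gen6_for_each_pde(pt, pd, start, length, temp, iter)		\
	i915_for_each_pde(pt, pd, start, length, temp, iter, GEN6_PDE_SHIFT)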

--Mika

> +static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
> +{
> +	return (addr >> shift) & I915_PDE_MASK;
> +}
> +
> +static inline uint32_t gen6_pte_index(uint32_t addr)
> +{
> +	return i915_pte_index(addr, GEN6_PDE_SHIFT);
> +}
> +
> +static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
> +{
> +	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
> +}
> +
> +static inline uint32_t gen6_pde_index(uint32_t addr)
> +{
> +	return i915_pde_index(addr, GEN6_PDE_SHIFT);
> +}
> +
>  int i915_gem_gtt_init(struct drm_device *dev);
>  void i915_gem_init_global_gtt(struct drm_device *dev);
>  void i915_global_gtt_cleanup(struct drm_device *dev);
> -- 
> 2.1.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 06/32] drm/i915: Extract context switch skip and pd load logic
  2015-02-24 16:22   ` [PATCH v6 06/32] drm/i915: Extract context switch skip and pd load logic Michel Thierry
@ 2015-02-27 11:46     ` Mika Kuoppala
  2015-02-27 13:38       ` [PATCH] drm/i915: Extract context switch skip and add " Michel Thierry
  0 siblings, 1 reply; 229+ messages in thread
From: Mika Kuoppala @ 2015-02-27 11:46 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> We have some fanciness coming up. This patch just breaks out the logic
> of context switch skip, pd load pre, and pd load post.
>
> v2: Use new functions to replace the logic right away (Daniel)
>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 40 +++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 755b415..6206d27 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -565,6 +565,33 @@ mi_set_context(struct intel_engine_cs *ring,
>  	return ret;
>  }
>  
> +static inline bool should_skip_switch(struct intel_engine_cs *ring,
> +				      struct intel_context *from,
> +				      struct intel_context *to)
> +{
> +	if (from == to && !to->remap_slice)
> +		return true;
> +
> +	return false;
> +}
> +
> +static bool
> +needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	return ((INTEL_INFO(ring->dev)->gen < 8) ||
> +			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
> +}
> +
> +static bool
> +needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
> +{
> +	return (!to->legacy_hw_ctx.initialized ||
> +			i915_gem_context_is_default(to)) &&
> +			to->ppgtt && IS_GEN8(ring->dev);
> +}
> +
>  static int do_switch(struct intel_engine_cs *ring,
>  		     struct intel_context *to)
>  {
> @@ -573,9 +600,6 @@ static int do_switch(struct intel_engine_cs *ring,
>  	u32 hw_flags = 0;
>  	bool uninitialized = false;
>  	struct i915_vma *vma;
> -	bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
> -			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
> -	bool needs_pd_load_post = false;

There is no such code in drm-intel-nightly. On top of which tree is this?

-Mika

>  	int ret, i;
>  
>  	if (from != NULL && ring == &dev_priv->ring[RCS]) {
> @@ -583,7 +607,7 @@ static int do_switch(struct intel_engine_cs *ring,
>  		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
>  	}
>  
> -	if (from == to && !to->remap_slice)
> +	if (should_skip_switch(ring, from, to))
>  		return 0;
>  
>  	/* Trying to pin first makes error handling easier. */
> @@ -601,7 +625,7 @@ static int do_switch(struct intel_engine_cs *ring,
>  	 */
>  	from = ring->last_context;
>  
> -	if (needs_pd_load_pre) {
> +	if (needs_pd_load_pre(ring, to)) {
>  		/* Older GENs and non render rings still want the load first,
>  		 * "PP_DCLV followed by PP_DIR_BASE register through Load
>  		 * Register Immediate commands in Ring Buffer before submitting
> @@ -646,16 +670,14 @@ static int do_switch(struct intel_engine_cs *ring,
>  	 * XXX: If we implemented page directory eviction code, this
>  	 * optimization needs to be removed.
>  	 */
> -	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
> +	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
>  		hw_flags |= MI_RESTORE_INHIBIT;
> -		needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
> -	}
>  
>  	ret = mi_set_context(ring, to, hw_flags);
>  	if (ret)
>  		goto unpin_out;
>  
> -	if (needs_pd_load_post) {
> +	if (needs_pd_load_post(ring, to)) {
>  		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
>  		/* The hardware context switch is emitted, but we haven't
>  		 * actually changed the state - so it's probably safe to bail
> -- 
> 2.1.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* [PATCH] drm/i915: Extract context switch skip and add pd load logic
  2015-02-27 11:46     ` Mika Kuoppala
@ 2015-02-27 13:38       ` Michel Thierry
  2015-03-03  3:54         ` shuang.he
  2015-03-05 14:37         ` Mika Kuoppala
  0 siblings, 2 replies; 229+ messages in thread
From: Michel Thierry @ 2015-02-27 13:38 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

This patch just breaks out the logic of context switch skip.

It also adds pd load pre, and pd load post logic (for GEN8).

v2: Use new functions to replace the logic right away (Daniel)
v3: Add missing pd load logic.

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 48 +++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 70346b0..8474e2c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -569,6 +569,33 @@ mi_set_context(struct intel_engine_cs *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+				      struct intel_context *from,
+				      struct intel_context *to)
+{
+	if (from == to && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
+static bool
+needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	return ((INTEL_INFO(ring->dev)->gen < 8) ||
+			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
+}
+
+static bool
+needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
+{
+	return (!to->legacy_hw_ctx.initialized ||
+			i915_gem_context_is_default(to)) &&
+			to->ppgtt && IS_GEN8(ring->dev);
+}
+
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -584,7 +611,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
 	}
 
-	if (from == to && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
@@ -602,7 +629,11 @@ static int do_switch(struct intel_engine_cs *ring,
 	 */
 	from = ring->last_context;
 
-	if (to->ppgtt) {
+	if (needs_pd_load_pre(ring, to)) {
+		/* Older GENs and non render rings still want the load first,
+		 * "PP_DCLV followed by PP_DIR_BASE register through Load
+		 * Register Immediate commands in Ring Buffer before submitting
+		 * a context."*/
 		trace_switch_mm(ring, to);
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		if (ret)
@@ -644,6 +675,19 @@ static int do_switch(struct intel_engine_cs *ring,
 	if (ret)
 		goto unpin_out;
 
+	if (needs_pd_load_post(ring, to)) {
+		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
+		/* The hardware context switch is emitted, but we haven't
+		 * actually changed the state - so it's probably safe to bail
+		 * here. Still, let the user know something dangerous has
+		 * happened.
+		 */
+		if (ret) {
+			DRM_ERROR("Failed to change address space on context switch\n");
+			goto unpin_out;
+		}
+	}
+
 	for (i = 0; i < MAX_L3_SLICES; i++) {
 		if (!(to->remap_slice & (1<<i)))
 			continue;
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* [PATCH] drm/i915: Initialize all contexts
  2015-02-24 16:22   ` [PATCH v6 08/32] drm/i915: Initialize all contexts Michel Thierry
@ 2015-02-27 13:40     ` Michel Thierry
  2015-03-20 10:38       ` Chris Wilson
  0 siblings, 1 reply; 229+ messages in thread
From: Michel Thierry @ 2015-02-27 13:40 UTC (permalink / raw)
  To: intel-gfx

From: Ben Widawsky <benjamin.widawsky@intel.com>

The problem is we're going to switch to a new context, which could be
the default context. The plan was to use restore inhibit, which would be
fine, except if we are using dynamic page tables (which we will). If we
use dynamic page tables and we don't load new page tables, the previous
page tables might go away, and future operations will fault.

CTXA runs.
switch to default, restore inhibit
CTXA dies and has its address space taken away.
Run CTXB, tries to save using the context A's address space - this
fails.

The general solution is to make sure every context has its own state,
and its own address space. For cases when we must restore inhibit, the
first thing we do is load a valid address space. I thought this would be
enough, but apparently there are references within the context itself
which will refer to the old address space - therefore, we also must
reinitialize.

It was tricky to track this down as we don't have much insight into what
happens in a context save.

This is required for the next patch which enables dynamic page tables.

v2: to->ppgtt is only valid in full ppgtt.
v3: Rebased.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
 drivers/gpu/drm/i915/i915_gem_context.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8b288a8..1ff86f1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -600,13 +600,6 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
 			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
 }
 
-static bool
-needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
-{
-	return IS_GEN8(ring->dev) &&
-			(to->ppgtt || &to->ppgtt->pd_dirty_rings);
-}
-
 static int do_switch(struct intel_engine_cs *ring,
 		     struct intel_context *to)
 {
@@ -685,16 +678,26 @@ static int do_switch(struct intel_engine_cs *ring,
 			goto unpin_out;
 	}
 
-	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
+	/* GEN8 does *not* require an explicit reload if the PDPs have been
+	 * setup, and we do not wish to move them.
+	 */
+	if (!to->legacy_hw_ctx.initialized) {
 		hw_flags |= MI_RESTORE_INHIBIT;
-	else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
+		/* NB: If we inhibit the restore, the context is not allowed to
+		 * die because future work may end up depending on valid address
+		 * space. This means we must enforce that a page table load
+		 * occur when this occurs. */
+	} else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->pd_dirty_rings))
 		hw_flags |= MI_FORCE_RESTORE;
 
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
 
-	if (needs_pd_load_post(ring, to)) {
+	if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT)) {
+		/* We have a valid page directory (scratch) to switch to. This
+		 * allows the old VM to be freed. Note that if anything occurs
+		 * between the set context, and here, we are f*cked */
 		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
 		/* The hardware context switch is emitted, but we haven't
 		 * actually changed the state - so it's probably safe to bail
@@ -744,7 +747,7 @@ static int do_switch(struct intel_engine_cs *ring,
 		i915_gem_context_unreference(from);
 	}
 
-	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
+	uninitialized = !to->legacy_hw_ctx.initialized;
 	to->legacy_hw_ctx.initialized = true;
 
 done:
-- 
2.1.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 03/32] drm/i915: Create page table allocators
  2015-02-25 13:34     ` Mika Kuoppala
@ 2015-03-02 18:57       ` Paulo Zanoni
  0 siblings, 0 replies; 229+ messages in thread
From: Paulo Zanoni @ 2015-03-02 18:57 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Intel Graphics Development

2015-02-25 10:34 GMT-03:00 Mika Kuoppala <mika.kuoppala@linux.intel.com>:
> Michel Thierry <michel.thierry@intel.com> writes:
>
>> From: Ben Widawsky <benjamin.widawsky@intel.com>
>>
>> As we move toward dynamic page table allocation, it becomes much easier
>> to manage our data structures if we do things less coarsely by
>> breaking up all of our actions into individual tasks.  This makes the
>> code easier to write, read, and verify.

QA reported a regression caused by this patch. BSW doesn't boot anymore.

https://bugs.freedesktop.org/show_bug.cgi?id=89350

>>
>> Aside from the dissection of the allocation functions, the patch
>> statically allocates the page table structures without a page directory.
>> This remains the same for all platforms.
>>
>> The patch itself should not have much functional difference. The primary
>> noticeable difference is the fact that page tables are no longer
>> allocated, but rather statically declared as part of the page directory.
>> This has non-zero overhead, but things gain additional complexity as a
>> result.
>>
>> This patch exists for a few reasons:
>> 1. Splitting out the functions allows easily combining GEN6 and GEN8
>> code. Page tables have no difference based on GEN8. As we'll see in a
>> future patch when we add the DMA mappings to the allocations, it
>> requires only one small change to make work, and error handling should
>> just fall into place.
>>
>> 2. Unless we always want to allocate all page tables under a given PDE,
>> we'll have to eventually break this up into an array of pointers (or
>> pointer to pointer).
>>
>> 3. Having the discrete functions is easier to review, and understand.
>> All allocations and frees now take place in just a couple of locations.
>> Reviewing, and catching leaks should be easy.
>>
>> 4. Less important: the GFP flags are confined to one location, which
>> makes playing around with such things trivial.
>>
>> v2: Updated commit message to explain why this patch exists
>>
>> v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
>>
>> v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
>>
>> v5: Added additional safety checks in gen8 clear/free/unmap.
>>
>> v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
>>
>> v7: Make err_out loop symmetrical to the way we allocate in
>> alloc_pt_range. Also s/page_tables/page_table and correct commit
>> message (Mika)
>>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
>
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
>
>> ---
>>  drivers/gpu/drm/i915/i915_gem_gtt.c | 254 ++++++++++++++++++++++++------------
>>  drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
>>  drivers/gpu/drm/i915/intel_lrc.c    |  16 +--
>>  3 files changed, 178 insertions(+), 96 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index ab6f1d4..81c1dba 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -279,6 +279,98 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>>       return pte;
>>  }
>>
>> +static void unmap_and_free_pt(struct i915_page_table_entry *pt)
>> +{
>> +     if (WARN_ON(!pt->page))
>> +             return;
>> +     __free_page(pt->page);
>> +     kfree(pt);
>> +}
>> +
>> +static struct i915_page_table_entry *alloc_pt_single(void)
>> +{
>> +     struct i915_page_table_entry *pt;
>> +
>> +     pt = kzalloc(sizeof(*pt), GFP_KERNEL);
>> +     if (!pt)
>> +             return ERR_PTR(-ENOMEM);
>> +
>> +     pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +     if (!pt->page) {
>> +             kfree(pt);
>> +             return ERR_PTR(-ENOMEM);
>> +     }
>> +
>> +     return pt;
>> +}
>> +
>> +/**
>> + * alloc_pt_range() - Allocate a multiple page tables
>> + * @pd:              The page directory which will have at least @count entries
>> + *           available to point to the allocated page tables.
>> + * @pde:     First page directory entry for which we are allocating.
>> + * @count:   Number of pages to allocate.
>> + *
>> + * Allocates multiple page table pages and sets the appropriate entries in the
>> + * page table structure within the page directory. Function cleans up after
>> + * itself on any failures.
>> + *
>> + * Return: 0 if allocation succeeded.
>> + */
>> +static int alloc_pt_range(struct i915_page_directory_entry *pd, uint16_t pde, size_t count)
>> +{
>> +     int i, ret;
>> +
>> +     /* 512 is the max page tables per page_directory on any platform. */
>> +     if (WARN_ON(pde + count > GEN6_PPGTT_PD_ENTRIES))
>> +             return -EINVAL;
>> +
>> +     for (i = pde; i < pde + count; i++) {
>> +             struct i915_page_table_entry *pt = alloc_pt_single();
>> +
>> +             if (IS_ERR(pt)) {
>> +                     ret = PTR_ERR(pt);
>> +                     goto err_out;
>> +             }
>> +             WARN(pd->page_table[i],
>> +                  "Leaking page directory entry %d (%pa)\n",
>> +                  i, pd->page_table[i]);
>> +             pd->page_table[i] = pt;
>> +     }
>> +
>> +     return 0;
>> +
>> +err_out:
>> +     while (i-- > pde)
>> +             unmap_and_free_pt(pd->page_table[i]);
>> +     return ret;
>> +}
>> +
>> +static void unmap_and_free_pd(struct i915_page_directory_entry *pd)
>> +{
>> +     if (pd->page) {
>> +             __free_page(pd->page);
>> +             kfree(pd);
>> +     }
>> +}
>> +
>> +static struct i915_page_directory_entry *alloc_pd_single(void)
>> +{
>> +     struct i915_page_directory_entry *pd;
>> +
>> +     pd = kzalloc(sizeof(*pd), GFP_KERNEL);
>> +     if (!pd)
>> +             return ERR_PTR(-ENOMEM);
>> +
>> +     pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +     if (!pd->page) {
>> +             kfree(pd);
>> +             return ERR_PTR(-ENOMEM);
>> +     }
>> +
>> +     return pd;
>> +}
>> +
>>  /* Broadwell Page Directory Pointer Descriptors */
>>  static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
>>                          uint64_t val)
>> @@ -311,7 +403,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>>       int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
>>
>>       for (i = used_pd - 1; i >= 0; i--) {
>> -             dma_addr_t addr = ppgtt->pdp.page_directory[i].daddr;
>> +             dma_addr_t addr = ppgtt->pdp.page_directory[i]->daddr;
>>               ret = gen8_write_pdp(ring, i, addr);
>>               if (ret)
>>                       return ret;
>> @@ -338,8 +430,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>>                                     I915_CACHE_LLC, use_scratch);
>>
>>       while (num_entries) {
>> -             struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
>> -             struct page *page_table = pd->page_table[pde].page;
>> +             struct i915_page_directory_entry *pd;
>> +             struct i915_page_table_entry *pt;
>> +             struct page *page_table;
>> +
>> +             if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
>> +                     continue;
>> +
>> +             pd = ppgtt->pdp.page_directory[pdpe];
>> +
>> +             if (WARN_ON(!pd->page_table[pde]))
>> +                     continue;
>> +
>> +             pt = pd->page_table[pde];
>> +
>> +             if (WARN_ON(!pt->page))
>> +                     continue;
>> +
>> +             page_table = pt->page;
>>
>>               last_pte = pte + num_entries;
>>               if (last_pte > GEN8_PTES_PER_PAGE)
>> @@ -384,8 +492,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>>                       break;
>>
>>               if (pt_vaddr == NULL) {
>> -                     struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[pdpe];
>> -                     struct page *page_table = pd->page_table[pde].page;
>> +                     struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[pdpe];
>> +                     struct i915_page_table_entry *pt = pd->page_table[pde];
>> +                     struct page *page_table = pt->page;
>>
>>                       pt_vaddr = kmap_atomic(page_table);
>>               }
>> @@ -416,19 +525,16 @@ static void gen8_free_page_tables(struct i915_page_directory_entry *pd)
>>  {
>>       int i;
>>
>> -     if (pd->page_table == NULL)
>> +     if (!pd->page)
>>               return;
>>
>> -     for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
>> -             if (pd->page_table[i].page)
>> -                     __free_page(pd->page_table[i].page);
>> -}
>> +     for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
>> +             if (WARN_ON(!pd->page_table[i]))
>> +                     continue;
>>
>> -static void gen8_free_page_directory(struct i915_page_directory_entry *pd)
>> -{
>> -     gen8_free_page_tables(pd);
>> -     kfree(pd->page_table);
>> -     __free_page(pd->page);
>> +             unmap_and_free_pt(pd->page_table[i]);
>> +             pd->page_table[i] = NULL;
>> +     }
>>  }
>>
>>  static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>> @@ -436,7 +542,11 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>       int i;
>>
>>       for (i = 0; i < ppgtt->num_pd_pages; i++) {
>> -             gen8_free_page_directory(&ppgtt->pdp.page_directory[i]);
>> +             if (WARN_ON(!ppgtt->pdp.page_directory[i]))
>> +                     continue;
>> +
>> +             gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
>> +             unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>>       }
>>  }
>>
>> @@ -448,14 +558,23 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>>       for (i = 0; i < ppgtt->num_pd_pages; i++) {
>>               /* TODO: In the future we'll support sparse mappings, so this
>>                * will have to change. */
>> -             if (!ppgtt->pdp.page_directory[i].daddr)
>> +             if (!ppgtt->pdp.page_directory[i]->daddr)
>>                       continue;
>>
>> -             pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i].daddr, PAGE_SIZE,
>> +             pci_unmap_page(hwdev, ppgtt->pdp.page_directory[i]->daddr, PAGE_SIZE,
>>                              PCI_DMA_BIDIRECTIONAL);
>>
>>               for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>> -                     dma_addr_t addr = ppgtt->pdp.page_directory[i].page_table[j].daddr;
>> +                     struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
>> +                     struct i915_page_table_entry *pt;
>> +                     dma_addr_t addr;
>> +
>> +                     if (WARN_ON(!pd->page_table[j]))
>> +                             continue;
>> +
>> +                     pt = pd->page_table[j];
>> +                     addr = pt->daddr;
>> +
>>                       if (addr)
>>                               pci_unmap_page(hwdev, addr, PAGE_SIZE,
>>                                              PCI_DMA_BIDIRECTIONAL);
>> @@ -474,25 +593,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>>
>>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>>  {
>> -     int i, j;
>> +     int i, ret;
>>
>>       for (i = 0; i < ppgtt->num_pd_pages; i++) {
>> -             struct i915_page_directory_entry *pd = &ppgtt->pdp.page_directory[i];
>> -             for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>> -                     struct i915_page_table_entry *pt = &pd->page_table[j];
>> -
>> -                     pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> -                     if (!pt->page)
>> -                             goto unwind_out;
>> -
>> -             }
>> +             ret = alloc_pt_range(ppgtt->pdp.page_directory[i],
>> +                                  0, GEN8_PDES_PER_PAGE);
>> +             if (ret)
>> +                     goto unwind_out;
>>       }
>>
>>       return 0;
>>
>>  unwind_out:
>>       while (i--)
>> -             gen8_free_page_tables(&ppgtt->pdp.page_directory[i]);
>> +             gen8_free_page_tables(ppgtt->pdp.page_directory[i]);
>>
>>       return -ENOMEM;
>>  }
>> @@ -503,19 +617,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>>       int i;
>>
>>       for (i = 0; i < max_pdp; i++) {
>> -             struct i915_page_table_entry *pt;
>> -
>> -             pt = kcalloc(GEN8_PDES_PER_PAGE, sizeof(*pt), GFP_KERNEL);
>> -             if (!pt)
>> -                     goto unwind_out;
>> -
>> -             ppgtt->pdp.page_directory[i].page = alloc_page(GFP_KERNEL);
>> -             if (!ppgtt->pdp.page_directory[i].page) {
>> -                     kfree(pt);
>> +             ppgtt->pdp.page_directory[i] = alloc_pd_single();
>> +             if (IS_ERR(ppgtt->pdp.page_directory[i]))
>>                       goto unwind_out;
>> -             }
>> -
>> -             ppgtt->pdp.page_directory[i].page_table = pt;
>>       }
>>
>>       ppgtt->num_pd_pages = max_pdp;
>> @@ -524,10 +628,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
>>       return 0;
>>
>>  unwind_out:
>> -     while (i--) {
>> -             kfree(ppgtt->pdp.page_directory[i].page_table);
>> -             __free_page(ppgtt->pdp.page_directory[i].page);
>> -     }
>> +     while (i--)
>> +             unmap_and_free_pd(ppgtt->pdp.page_directory[i]);
>>
>>       return -ENOMEM;
>>  }
>> @@ -561,14 +663,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
>>       int ret;
>>
>>       pd_addr = pci_map_page(ppgtt->base.dev->pdev,
>> -                            ppgtt->pdp.page_directory[pd].page, 0,
>> +                            ppgtt->pdp.page_directory[pd]->page, 0,
>>                              PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>>
>>       ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
>>       if (ret)
>>               return ret;
>>
>> -     ppgtt->pdp.page_directory[pd].daddr = pd_addr;
>> +     ppgtt->pdp.page_directory[pd]->daddr = pd_addr;
>>
>>       return 0;
>>  }
>> @@ -578,8 +680,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>>                                       const int pt)
>>  {
>>       dma_addr_t pt_addr;
>> -     struct i915_page_directory_entry *pdir = &ppgtt->pdp.page_directory[pd];
>> -     struct i915_page_table_entry *ptab = &pdir->page_table[pt];
>> +     struct i915_page_directory_entry *pdir = ppgtt->pdp.page_directory[pd];
>> +     struct i915_page_table_entry *ptab = pdir->page_table[pt];
>>       struct page *p = ptab->page;
>>       int ret;
>>
>> @@ -642,10 +744,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>>        * will never need to touch the PDEs again.
>>        */
>>       for (i = 0; i < max_pdp; i++) {
>> +             struct i915_page_directory_entry *pd = ppgtt->pdp.page_directory[i];
>>               gen8_ppgtt_pde_t *pd_vaddr;
>> -             pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i].page);
>> +             pd_vaddr = kmap_atomic(ppgtt->pdp.page_directory[i]->page);
>>               for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>> -                     dma_addr_t addr = ppgtt->pdp.page_directory[i].page_table[j].daddr;
>> +                     struct i915_page_table_entry *pt = pd->page_table[j];
>> +                     dma_addr_t addr = pt->daddr;
>>                       pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
>>                                                     I915_CACHE_LLC);
>>               }
>> @@ -696,7 +800,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>       for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
>>               u32 expected;
>>               gen6_gtt_pte_t *pt_vaddr;
>> -             dma_addr_t pt_addr = ppgtt->pd.page_table[pde].daddr;
>> +             dma_addr_t pt_addr = ppgtt->pd.page_table[pde]->daddr;
>>               pd_entry = readl(pd_addr + pde);
>>               expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>>
>> @@ -707,7 +811,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>                                  expected);
>>               seq_printf(m, "\tPDE: %x\n", pd_entry);
>>
>> -             pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde].page);
>> +             pt_vaddr = kmap_atomic(ppgtt->pd.page_table[pde]->page);
>>               for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
>>                       unsigned long va =
>>                               (pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
>> @@ -746,7 +850,7 @@ static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>>       for (i = 0; i < ppgtt->num_pd_entries; i++) {
>>               dma_addr_t pt_addr;
>>
>> -             pt_addr = ppgtt->pd.page_table[i].daddr;
>> +             pt_addr = ppgtt->pd.page_table[i]->daddr;
>>               pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
>>               pd_entry |= GEN6_PDE_VALID;
>>
>> @@ -922,7 +1026,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>>               if (last_pte > I915_PPGTT_PT_ENTRIES)
>>                       last_pte = I915_PPGTT_PT_ENTRIES;
>>
>> -             pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt].page);
>> +             pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
>>
>>               for (i = first_pte; i < last_pte; i++)
>>                       pt_vaddr[i] = scratch_pte;
>> @@ -951,7 +1055,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>>       pt_vaddr = NULL;
>>       for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>>               if (pt_vaddr == NULL)
>> -                     pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt].page);
>> +                     pt_vaddr = kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
>>
>>               pt_vaddr[act_pte] =
>>                       vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
>> @@ -974,7 +1078,7 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>>
>>       for (i = 0; i < ppgtt->num_pd_entries; i++)
>>               pci_unmap_page(ppgtt->base.dev->pdev,
>> -                            ppgtt->pd.page_table[i].daddr,
>> +                            ppgtt->pd.page_table[i]->daddr,
>>                              4096, PCI_DMA_BIDIRECTIONAL);
>>  }
>>
>> @@ -983,9 +1087,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>       int i;
>>
>>       for (i = 0; i < ppgtt->num_pd_entries; i++)
>> -             if (ppgtt->pd.page_table[i].page)
>> -                     __free_page(ppgtt->pd.page_table[i].page);
>> -     kfree(ppgtt->pd.page_table);
>> +             unmap_and_free_pt(ppgtt->pd.page_table[i]);
>> +
>> +     unmap_and_free_pd(&ppgtt->pd);
>>  }
>>
>>  static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>> @@ -1040,28 +1144,6 @@ alloc:
>>       return 0;
>>  }
>>
>> -static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
>> -{
>> -     struct i915_page_table_entry *pt;
>> -     int i;
>> -
>> -     pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
>> -     if (!pt)
>> -             return -ENOMEM;
>> -
>> -     ppgtt->pd.page_table = pt;
>> -
>> -     for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> -             pt[i].page = alloc_page(GFP_KERNEL);
>> -             if (!pt->page) {
>> -                     gen6_ppgtt_free(ppgtt);
>> -                     return -ENOMEM;
>> -             }
>> -     }
>> -
>> -     return 0;
>> -}
>> -
>>  static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>>  {
>>       int ret;
>> @@ -1070,7 +1152,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>>       if (ret)
>>               return ret;
>>
>> -     ret = gen6_ppgtt_allocate_page_tables(ppgtt);
>> +     ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
>>       if (ret) {
>>               drm_mm_remove_node(&ppgtt->node);
>>               return ret;
>> @@ -1088,7 +1170,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>>               struct page *page;
>>               dma_addr_t pt_addr;
>>
>> -             page = ppgtt->pd.page_table[i].page;
>> +             page = ppgtt->pd.page_table[i]->page;
>>               pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>>                                      PCI_DMA_BIDIRECTIONAL);
>>
>> @@ -1097,7 +1179,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>>                       return -EIO;
>>               }
>>
>> -             ppgtt->pd.page_table[i].daddr = pt_addr;
>> +             ppgtt->pd.page_table[i]->daddr = pt_addr;
>>       }
>>
>>       return 0;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index 1144b709..c9e93f5 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -199,12 +199,12 @@ struct i915_page_directory_entry {
>>               dma_addr_t daddr;
>>       };
>>
>> -     struct i915_page_table_entry *page_table;
>> +     struct i915_page_table_entry *page_table[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
>>  };
>>
>>  struct i915_page_directory_pointer_entry {
>>       /* struct page *page; */
>> -     struct i915_page_directory_entry page_directory[GEN8_LEGACY_PDPES];
>> +     struct i915_page_directory_entry *page_directory[GEN8_LEGACY_PDPES];
>>  };
>>
>>  struct i915_address_space {
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 9e71992..bc9c7c3 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1735,14 +1735,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>>       reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
>>       reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
>>       reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
>> -     reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3].daddr);
>> -     reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3].daddr);
>> -     reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2].daddr);
>> -     reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2].daddr);
>> -     reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1].daddr);
>> -     reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1].daddr);
>> -     reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0].daddr);
>> -     reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0].daddr);
>> +     reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[3]->daddr);
>> +     reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[3]->daddr);
>> +     reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[2]->daddr);
>> +     reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[2]->daddr);
>> +     reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[1]->daddr);
>> +     reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[1]->daddr);
>> +     reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.page_directory[0]->daddr);
>> +     reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.page_directory[0]->daddr);
>>       if (ring->id == RCS) {
>>               reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>>               reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
>> --
>> 2.1.1
>>



-- 
Paulo Zanoni
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH] drm/i915: Extract context switch skip and add pd load logic
  2015-02-27 13:38       ` [PATCH] drm/i915: Extract context switch skip and add " Michel Thierry
@ 2015-03-03  3:54         ` shuang.he
  2015-03-05 14:37         ` Mika Kuoppala
  1 sibling, 0 replies; 229+ messages in thread
From: shuang.he @ 2015-03-03  3:54 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, michel.thierry

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5853
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                                  278/278              278/278
ILK                                  308/308              308/308
SNB                                  284/284              284/284
IVB                                  380/380              380/380
BYT                                  294/294              294/294
HSW                 -1              387/387              386/387
BDW                 -1              316/316              315/316
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*HSW  igt_gem_storedw_loop_vebox      PASS(1)      DMESG_WARN(1)PASS(1)
*BDW  igt_gem_gtt_hog      PASS(8)      DMESG_WARN(1)PASS(1)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH] drm/i915: Extract context switch skip and add pd load logic
  2015-02-27 13:38       ` [PATCH] drm/i915: Extract context switch skip and add " Michel Thierry
  2015-03-03  3:54         ` shuang.he
@ 2015-03-05 14:37         ` Mika Kuoppala
  1 sibling, 0 replies; 229+ messages in thread
From: Mika Kuoppala @ 2015-03-05 14:37 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> This patch just breaks out the logic of context switch skip.
>
> It also adds pd load pre, and pd load post logic (for GEN8).
>

I don't think this patch just breaks out the logic; it also changes
it. And the reason remains a mystery.

Could you please add justification why the logic changes are
required?

> v2: Use new functions to replace the logic right away (Daniel)
> v3: Add missing pd load logic.
>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 48 +++++++++++++++++++++++++++++++--
>  1 file changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 70346b0..8474e2c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -569,6 +569,33 @@ mi_set_context(struct intel_engine_cs *ring,
>  	return ret;
>  }
>  
> +static inline bool should_skip_switch(struct intel_engine_cs *ring,
> +				      struct intel_context *from,
> +				      struct intel_context *to)
> +{
> +	if (from == to && !to->remap_slice)
> +		return true;
> +
> +	return false;
> +}
> +
> +static bool
> +needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	return ((INTEL_INFO(ring->dev)->gen < 8) ||
> +			(ring != &dev_priv->ring[RCS])) && to->ppgtt;
> +}
> +
> +static bool
> +needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
> +{
> +	return (!to->legacy_hw_ctx.initialized ||
> +			i915_gem_context_is_default(to)) &&
> +			to->ppgtt && IS_GEN8(ring->dev);
> +}
> +
>  static int do_switch(struct intel_engine_cs *ring,
>  		     struct intel_context *to)
>  {
> @@ -584,7 +611,7 @@ static int do_switch(struct intel_engine_cs *ring,
>  		BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
>  	}
>  
> -	if (from == to && !to->remap_slice)
> +	if (should_skip_switch(ring, from, to))
>  		return 0;
>  
>  	/* Trying to pin first makes error handling easier. */
> @@ -602,7 +629,11 @@ static int do_switch(struct intel_engine_cs *ring,
>  	 */
>  	from = ring->last_context;
>  
> -	if (to->ppgtt) {
> +	if (needs_pd_load_pre(ring, to)) {
> +		/* Older GENs and non render rings still want the load first,
> +		 * "PP_DCLV followed by PP_DIR_BASE register through Load
> +		 * Register Immediate commands in Ring Buffer before submitting
> +		 * a context."*/
>  		trace_switch_mm(ring, to);
>  		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
>  		if (ret)
> @@ -644,6 +675,19 @@ static int do_switch(struct intel_engine_cs *ring,
>  	if (ret)
>  		goto unpin_out;
>  
> +	if (needs_pd_load_post(ring, to)) {
> +		ret = to->ppgtt->switch_mm(to->ppgtt, ring);
> +		/* The hardware context switch is emitted, but we haven't
> +		 * actually changed the state - so it's probably safe to bail
> +		 * here. Still, let the user know something dangerous has
> +		 * happened.
> +		 */
> +		if (ret) {
> +			DRM_ERROR("Failed to change address space on context switch\n");
> +			goto unpin_out;
> +		}
> +	}
> +

Can there be a context where both pd load pre and pd load post are
needed? If not, please consider adding a WARN_ON which triggers when
we would emit switch_mm twice due to both being true.
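
Something along these lines would do (untested sketch on top of this
patch, reusing the helpers it adds), e.g. right before the
mi_set_context() call in do_switch():

	/* Sanity check: the pre and post PD load paths should be
	 * mutually exclusive for a given ring/context. */
	WARN_ON(needs_pd_load_pre(ring, to) && needs_pd_load_post(ring, to));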

Thanks,
-Mika

>  	for (i = 0; i < MAX_L3_SLICES; i++) {
>  		if (!(to->remap_slice & (1<<i)))
>  			continue;
> -- 
> 2.1.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 05/32] drm/i915: Track GEN6 page table usage
  2015-02-26 15:58     ` Mika Kuoppala
@ 2015-03-10 11:19       ` Mika Kuoppala
  0 siblings, 0 replies; 229+ messages in thread
From: Mika Kuoppala @ 2015-03-10 11:19 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:

> Michel Thierry <michel.thierry@intel.com> writes:
>
>> From: Ben Widawsky <benjamin.widawsky@intel.com>
>>
>> Instead of implementing the full tracking + dynamic allocation, this
>> patch does a bit less than half of the work, by tracking and warning on
>> unexpected conditions. The tracking itself follows which PTEs within a
>> page table are currently being used for objects. The next patch will
>> modify this to actually allocate the page tables only when necessary.
>>
>> With the current patch there isn't much in the way of making a gen
>> agnostic range allocation function. However, in the next patch we'll add
>> more specificity which makes having separate functions a bit easier to
>> manage.
>>
>> One important change introduced here is that DMA mappings are
>> created/destroyed at the same page directories/tables are
>> allocated/deallocated.
>>
>> Notice that aliasing PPGTT is not managed here. The patch which actually
>> begins dynamic allocation/teardown explains the reasoning for this.
>>
>> v2: s/pdp.page_directory/pdp.page_directorys
>> Make a scratch page allocation helper
>>
>> v3: Rebase and expand commit message.
>>
>> v4: Allocate required pagetables only when it is needed, _bind_to_vm
>> instead of bind_vma (Daniel).
>>
>> v5: Rebased to remove the unnecessary noise in the diff, also:
>>  - PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
>>  - Removed unnecessary checks in gen6_alloc_va_range.
>>  - Changed map/unmap_px_single macros to use dma functions directly and
>>    be part of a static inline function instead.
>>  - Moved drm_device plumbing through page tables operation to its own
>>    patch.
>>  - Moved allocate/teardown_va_range calls until they are fully
>>    implemented (in subsequent patch).
>>  - Merged pt and scratch_pt unmap_and_free path.
>>  - Moved scratch page allocator helper to the patch that will use it.
>>
>> v6: Reduce complexity by not tearing down pagetables dynamically, the
>> same can be achieved while freeing empty vms. (Daniel)
>>
>> v7: s/i915_dma_map_px_single/i915_dma_map_single
>> s/gen6_write_pdes/gen6_write_pde
>> Prevent a NULL case when only GGTT is available. (Mika)
>>
>> v8: Rebased after s/page_tables/page_table/.
>>
>> Cc: Daniel Vetter <daniel@ffwll.ch>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
>> ---
>>  drivers/gpu/drm/i915/i915_gem_gtt.c | 198 +++++++++++++++++++++++++-----------
>>  drivers/gpu/drm/i915/i915_gem_gtt.h |  75 ++++++++++++++
>>  2 files changed, 211 insertions(+), 62 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index e05488e..f9354c7 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -278,29 +278,88 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
>>  	return pte;
>>  }
>>  
>> -static void unmap_and_free_pt(struct i915_page_table_entry *pt, struct drm_device *dev)
>> +#define i915_dma_unmap_single(px, dev) \
>> +	__i915_dma_unmap_single((px)->daddr, dev)
>> +
>> +static inline void __i915_dma_unmap_single(dma_addr_t daddr,
>> +					struct drm_device *dev)
>> +{
>> +	struct device *device = &dev->pdev->dev;
>> +
>> +	dma_unmap_page(device, daddr, 4096, PCI_DMA_BIDIRECTIONAL);
>> +}
>> +
>> +/**
>> + * i915_dma_map_single() - Create a dma mapping for a page table/dir/etc.
>> + * @px:	Page table/dir/etc to get a DMA map for
>> + * @dev:	drm device
>> + *
>> + * Page table allocations are unified across all gens. They always require a
>> + * single 4k allocation, as well as a DMA mapping. If we keep the structs
>> + * symmetric here, the simple macro covers us for every page table type.
>> + *
>> + * Return: 0 if success.
>> + */
>> +#define i915_dma_map_single(px, dev) \
>> +	i915_dma_map_page_single((px)->page, (dev), &(px)->daddr)
>> +
>> +static inline int i915_dma_map_page_single(struct page *page,
>> +					   struct drm_device *dev,
>> +					   dma_addr_t *daddr)
>> +{
>> +	struct device *device = &dev->pdev->dev;
>> +
>> +	*daddr = dma_map_page(device, page, 0, 4096, PCI_DMA_BIDIRECTIONAL);
>> +	return dma_mapping_error(device, *daddr);
>> +}
>> +
>> +static void unmap_and_free_pt(struct i915_page_table_entry *pt,
>> +			       struct drm_device *dev)
>>  {
>>  	if (WARN_ON(!pt->page))
>>  		return;
>> +
>> +	i915_dma_unmap_single(pt, dev);
>>  	__free_page(pt->page);
>> +	kfree(pt->used_ptes);
>>  	kfree(pt);
>>  }
>>  
>>  static struct i915_page_table_entry *alloc_pt_single(struct drm_device *dev)
>>  {
>>  	struct i915_page_table_entry *pt;
>> +	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
>> +		GEN8_PTES_PER_PAGE : I915_PPGTT_PT_ENTRIES;
>> +	int ret = -ENOMEM;
>>  
>>  	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
>>  	if (!pt)
>>  		return ERR_PTR(-ENOMEM);
>>  
>> +	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
>> +				GFP_KERNEL);
>> +
>> +	if (!pt->used_ptes)
>> +		goto fail_bitmap;
>> +
>>  	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> -	if (!pt->page) {
>> -		kfree(pt);
>> -		return ERR_PTR(-ENOMEM);
>> -	}
>> +	if (!pt->page)
>> +		goto fail_page;
>> +
>> +	ret = i915_dma_map_single(pt, dev);
>> +	if (ret)
>> +		goto fail_dma;
>>  
>>  	return pt;
>> +
>> +fail_dma:
>> +	__free_page(pt->page);
>> +fail_page:
>> +	kfree(pt->used_ptes);
>> +fail_bitmap:
>> +	kfree(pt);
>> +
>> +	return ERR_PTR(ret);
>>  }
>>  
>>  /**
>> @@ -838,26 +897,35 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>>  	}
>>  }
>>  
>> -static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
>> +/* Write pde (index) from the page directory @pd to the page table @pt */
>> +static void gen6_write_pde(struct i915_page_directory_entry *pd,
>> +			    const int pde, struct i915_page_table_entry *pt)
>>  {
>> -	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
>> -	gen6_gtt_pte_t __iomem *pd_addr;
>> -	uint32_t pd_entry;
>> -	int i;
>> +	/* Caller needs to make sure the write completes if necessary */
>> +	struct i915_hw_ppgtt *ppgtt =
>> +		container_of(pd, struct i915_hw_ppgtt, pd);
>> +	u32 pd_entry;
>>  
>> -	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
>> -	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
>> -		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
>> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> -		dma_addr_t pt_addr;
>> +	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
>> +	pd_entry |= GEN6_PDE_VALID;
>>  
>> -		pt_addr = ppgtt->pd.page_table[i]->daddr;
>> -		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
>> -		pd_entry |= GEN6_PDE_VALID;
>> +	writel(pd_entry, ppgtt->pd_addr + pde);
>> +}
>>  
>> -		writel(pd_entry, pd_addr + i);
>> -	}
>> -	readl(pd_addr);
>> +/* Write all the page tables found in the ppgtt structure to incrementing page
>> + * directories. */
>> +static void gen6_write_page_range(struct drm_i915_private *dev_priv,
>> +				struct i915_page_directory_entry *pd, uint32_t start, uint32_t length)
>> +{
>> +	struct i915_page_table_entry *pt;
>> +	uint32_t pde, temp;
>> +
>> +	gen6_for_each_pde(pt, pd, start, length, temp, pde)
>> +		gen6_write_pde(pd, pde, pt);
>> +
>> +	/* Make sure write is complete before other code can use this page
>> +	 * table. Also require for WC mapped PTEs */
>> +	readl(dev_priv->gtt.gsm);
>>  }
>>  
>>  static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
>> @@ -1083,6 +1151,28 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>>  			       4096, PCI_DMA_BIDIRECTIONAL);
>>  }
>>  
>> +static int gen6_alloc_va_range(struct i915_address_space *vm,
>> +			       uint64_t start, uint64_t length)
>> +{
>> +	struct i915_hw_ppgtt *ppgtt =
>> +				container_of(vm, struct i915_hw_ppgtt, base);
>> +	struct i915_page_table_entry *pt;
>> +	uint32_t pde, temp;
>> +
>> +	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
>> +		DECLARE_BITMAP(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
>> +
>> +		bitmap_zero(tmp_bitmap, I915_PPGTT_PT_ENTRIES);
>> +		bitmap_set(tmp_bitmap, gen6_pte_index(start),
>> +			   gen6_pte_count(start, length));
>> +
>> +		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
>> +				I915_PPGTT_PT_ENTRIES);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>>  {
>>  	int i;
>> @@ -1129,20 +1219,24 @@ alloc:
>>  					       0, dev_priv->gtt.base.total,
>>  					       0);
>>  		if (ret)
>> -			return ret;
>> +			goto err_out;
>>  
>>  		retried = true;
>>  		goto alloc;
>>  	}
>>  
>>  	if (ret)
>> -		return ret;
>> +		goto err_out;
>> +
>>  
>>  	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
>>  		DRM_DEBUG("Forced to use aperture for PDEs\n");
>>  
>>  	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
>>  	return 0;
>> +
>> +err_out:
>> +	return ret;
>>  }
>>  
>>  static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>> @@ -1164,30 +1258,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
>>  	return 0;
>>  }
>>  
>> -static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
>> -{
>> -	struct drm_device *dev = ppgtt->base.dev;
>> -	int i;
>> -
>> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>> -		struct page *page;
>> -		dma_addr_t pt_addr;
>> -
>> -		page = ppgtt->pd.page_table[i]->page;
>> -		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
>> -				       PCI_DMA_BIDIRECTIONAL);
>> -
>> -		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
>> -			gen6_ppgtt_unmap_pages(ppgtt);
>> -			return -EIO;
>> -		}
>> -
>> -		ppgtt->pd.page_table[i]->daddr = pt_addr;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>>  static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>  {
>>  	struct drm_device *dev = ppgtt->base.dev;
>> @@ -1211,12 +1281,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>  	if (ret)
>>  		return ret;
>>  
>> -	ret = gen6_ppgtt_setup_page_tables(ppgtt);
>> -	if (ret) {
>> -		gen6_ppgtt_free(ppgtt);
>> -		return ret;
>> -	}
>> -
>> +	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
>>  	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
>>  	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
>>  	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
>> @@ -1227,13 +1292,17 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>>  	ppgtt->pd.pd_offset =
>>  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
>>  
>> +	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
>> +		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
>> +
>>  	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>>  
>> +	gen6_write_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
>> +
>>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
>>  			 ppgtt->node.size >> 20,
>>  			 ppgtt->node.start / PAGE_SIZE);
>>  
>> -	gen6_write_pdes(ppgtt);
>>  	DRM_DEBUG("Adding PPGTT at offset %x\n",
>>  		  ppgtt->pd.pd_offset << 10);
>>  
>> @@ -1504,15 +1573,20 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>>  		return;
>>  	}
>>  
>> -	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
>> -		/* TODO: Perhaps it shouldn't be gen6 specific */
>> -		if (i915_is_ggtt(vm)) {
>> -			if (dev_priv->mm.aliasing_ppgtt)
>> -				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
>> -			continue;
>> -		}
>> +	if (USES_PPGTT(dev)) {
>> +		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
>> +			/* TODO: Perhaps it shouldn't be gen6 specific */
>> +
>> +			struct i915_hw_ppgtt *ppgtt =
>> +					container_of(vm, struct i915_hw_ppgtt,
>> +						     base);
>>  
>> -		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
>> +			if (i915_is_ggtt(vm))
>> +				ppgtt = dev_priv->mm.aliasing_ppgtt;
>> +
>> +			gen6_write_page_range(dev_priv, &ppgtt->pd, 0,
>> +					      ppgtt->num_pd_entries);
>> +		}
>>  	}
>>  
>>  	i915_ggtt_flush(dev_priv);
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> index c9e93f5..bf0e380 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
>> @@ -54,7 +54,10 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>>  #define GEN6_PPGTT_PD_ENTRIES		512
>>  #define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
>>  #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
>> +#define GEN6_PDE_SHIFT			22
>>  #define GEN6_PDE_VALID			(1 << 0)
>> +#define I915_PDE_MASK			(GEN6_PPGTT_PD_ENTRIES-1)
>> +#define NUM_PTE(pde_shift)		(1 << (pde_shift - PAGE_SHIFT))
>>  
>>  #define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
>>  
>> @@ -190,6 +193,8 @@ struct i915_vma {
>>  struct i915_page_table_entry {
>>  	struct page *page;
>>  	dma_addr_t daddr;
>> +
>> +	unsigned long *used_ptes;
>>  };
>>  
>>  struct i915_page_directory_entry {
>> @@ -246,6 +251,9 @@ struct i915_address_space {
>>  	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
>>  				     enum i915_cache_level level,
>>  				     bool valid, u32 flags); /* Create a valid PTE */
>> +	int (*allocate_va_range)(struct i915_address_space *vm,
>> +				 uint64_t start,
>> +				 uint64_t length);
>>  	void (*clear_range)(struct i915_address_space *vm,
>>  			    uint64_t start,
>>  			    uint64_t length,
>> @@ -298,12 +306,79 @@ struct i915_hw_ppgtt {
>>  
>>  	struct drm_i915_file_private *file_priv;
>>  
>> +	gen6_gtt_pte_t __iomem *pd_addr;
>> +
>>  	int (*enable)(struct i915_hw_ppgtt *ppgtt);
>>  	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
>>  			 struct intel_engine_cs *ring);
>>  	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
>>  };
>>  
>> +/* For each pde iterates over every pde between from start until start + length.
>> + * If start, and start+length are not perfectly divisible, the macro will round
>> + * down, and up as needed. The macro modifies pde, start, and length. Dev is
>> + * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
>> + * and length = 2G effectively iterates over every PDE in the system. On gen8+
>> + * it simply iterates over every page directory entry in a page directory.
>> + *
>
> There is nothing for gen8 in the macro yet, so comment is a bit misleading.
>
>> + * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
>> + */
>> +#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
>> +	for (iter = gen6_pde_index(start), pt = (pd)->page_table[iter]; \
>> +	     length > 0 && iter < GEN6_PPGTT_PD_ENTRIES; \
>> +	     pt = (pd)->page_table[++iter], \
>> +	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
>> +	     temp = min_t(unsigned, temp, length), \
>> +	     start += temp, length -= temp)
>> +
>> +static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
>> +{
>> +	const uint32_t mask = NUM_PTE(pde_shift) - 1;
>> +
>> +	return (address >> PAGE_SHIFT) & mask;
>> +}
>> +
>> +/* Helper to counts the number of PTEs within the given length. This count does
>> +* not cross a page table boundary, so the max value would be
>> +* I915_PPGTT_PT_ENTRIES for GEN6, and GEN8_PTES_PER_PAGE for GEN8.
>> +*/
>> +static inline size_t i915_pte_count(uint64_t addr, size_t length,
>> +					uint32_t pde_shift)
>> +{
>> +	const uint64_t mask = ~((1 << pde_shift) - 1);
>> +	uint64_t end;
>> +
>> +	BUG_ON(length == 0);
>> +	BUG_ON(offset_in_page(addr|length));
>> +
>> +	end = addr + length;
>> +
>> +	if ((addr & mask) != (end & mask))
>> +		return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
>> +
>> +	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
>> +}
>
> After trying to figure out the reasoning for this i915_pte_count
> and its role in the used bitmap setup, I started to wonder why all
> the complexity here.
>
> BUG_ON(offset_in_page(addr|length)) reveals that we can't be called
> with anything but a page boundary so the address parameter is
> irrelevant in here?
>
> Then there is trickery with the pde_shifting. I tried to find some
> generalization down the series to take advantage of this. Perhaps
> I missed it?
>
> For me it seems that we can replace this with simple:
>
> static inline uint32_t i915_pte_count(uint64_t length, uint32_t
> pte_len)
> {
>        return min_t(uint32_t, length / PAGE_SIZE, PAGE_SIZE / pte_len);
> }
>
> ...and not lose anything.

Oh, we lose a lot. The simplified version might work in the context of
this patch, but the macro would be broken. Michel explained it all to
me on IRC.

I apologize for this. Please ignore and keep the original i915_pte_count.
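
For the record, a minimal user-space sketch (not from the series; all
names below are made up) of why the addr parameter matters, using
GEN6-style numbers (4 KiB pages, 22-bit PDE shift, 1024 PTEs per table):

#include <stdio.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12
#define DEMO_PDE_SHIFT	22
#define DEMO_PTES	(1u << (DEMO_PDE_SHIFT - DEMO_PAGE_SHIFT))	/* 1024 */

static uint32_t demo_pte_index(uint64_t addr)
{
	return (addr >> DEMO_PAGE_SHIFT) & (DEMO_PTES - 1);
}

/* mirrors i915_pte_count(): never count past a page table boundary */
static uint32_t demo_pte_count(uint64_t addr, uint64_t length)
{
	const uint64_t mask = ~((1ULL << DEMO_PDE_SHIFT) - 1);
	uint64_t end = addr + length;

	if ((addr & mask) != (end & mask))
		return DEMO_PTES - demo_pte_index(addr);

	return demo_pte_index(end) - demo_pte_index(addr);
}

int main(void)
{
	uint64_t addr = 3ULL << 20;	/* 3 MiB into a PDE -> pte index 768 */
	uint64_t len  = 2ULL << 20;	/* 2 MiB -> 512 pages */
	uint32_t naive = len >> DEMO_PAGE_SHIFT;

	if (naive > DEMO_PTES)
		naive = DEMO_PTES;

	/* clamped: 256 (stops at the table boundary); naive: 512, which
	 * would walk past the end of the per-table used_ptes bitmap */
	printf("clamped=%u naive=%u\n", demo_pte_count(addr, len), naive);
	return 0;
}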

-Mika

> On top of that, when trying to wrap my brain around the differences
> between GEN6/7 and GEN8+, the following patch applied before this one would
> make things much easier to understand, at least for me:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index c9e93f5..c13f32f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -36,13 +36,13 @@
>  
>  struct drm_i915_file_private;
>  
> -typedef uint32_t gen6_gtt_pte_t;
> -typedef uint64_t gen8_gtt_pte_t;
> -typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> +typedef uint32_t gen6_pte_t;
> +typedef uint64_t gen8_pte_t;
> +typedef uint64_t gen8_pde_t;
>  
>  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
>  
> -#define I915_PPGTT_PT_ENTRIES		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> +
>  /* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
>  #define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
>  #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> @@ -51,8 +51,13 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN6_PTE_UNCACHED		(1 << 1)
>  #define GEN6_PTE_VALID			(1 << 0)
>  
> -#define GEN6_PPGTT_PD_ENTRIES		512
> -#define GEN6_PD_SIZE			(GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
> +#define I915_PTES(pte_len)		(PAGE_SIZE / (pte_len))
> +#define I915_PTE_MASK(pte_len)		(I915_PTES(pte_len) - 1)
> +#define I915_PDES			512
> +#define I915_PDE_MASK			(I915_PDES - 1)
> +
> +#define GEN6_PTES			I915_PTES(sizeof(gen6_pte_t))
> +#define GEN6_PD_SIZE		        (I915_PDES * PAGE_SIZE)
>  #define GEN6_PD_ALIGN			(PAGE_SIZE * 16)
>  #define GEN6_PDE_VALID			(1 << 0)
>  
> @@ -89,8 +94,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN8_PTE_SHIFT			12
>  #define GEN8_PTE_MASK			0x1ff
>  #define GEN8_LEGACY_PDPES		4
> -#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> -#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> +#define GEN8_PTES			I915_PTES(sizeof(gen8_pte_t))
>  
>  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
>  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> @@ -199,7 +203,7 @@ struct i915_page_directory_entry {
>  		dma_addr_t daddr;
>  	};
>  
> -	struct i915_page_table_entry *page_table[GEN6_PPGTT_PD_ENTRIES]; /* PDEs */
> +	struct i915_page_table_entry *page_table[I915_PDES]; /* PDEs */
>  };
>  
>  struct i915_page_directory_pointer_entry {
> @@ -243,9 +247,9 @@ struct i915_address_space {
>  	struct list_head inactive_list;
>  
>  	/* FIXME: Need a more generic return type */
> -	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
> -				     enum i915_cache_level level,
> -				     bool valid, u32 flags); /* Create a valid PTE */
> +	gen6_pte_t (*pte_encode)(dma_addr_t addr,
> +				 enum i915_cache_level level,
> +				 bool valid, u32 flags); /* Create a valid PTE */
>  	void (*clear_range)(struct i915_address_space *vm,
>  			    uint64_t start,
>  			    uint64_t length,
> @@ -304,6 +308,21 @@ struct i915_hw_ppgtt {
>  	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
>  };
>  
> +static inline uint32_t i915_pte_index(uint64_t address, uint32_t pte_len)
> +{
> +	return (address >> PAGE_SHIFT) & I915_PTE_MASK(pte_len);
> +}
> +
> +static inline uint32_t i915_pte_count(uint64_t length, uint32_t pte_len)
> +{
> +	return min_t(uint32_t, length / PAGE_SIZE, I915_PTES(pte_len));
> +}
> +
> +static inline uint32_t i915_pde_index(uint64_t addr, uint32_t pde_shift)
> +{
> +	return (addr >> pde_shift) & I915_PDE_MASK;
> +}
> +
>  int i915_gem_gtt_init(struct drm_device *dev);
>  void i915_gem_init_global_gtt(struct drm_device *dev);
>  void i915_global_gtt_cleanup(struct drm_device *dev);
>
> With that, I think you could write a generic gen_for_each_pde
> more easily and just parametrize the gen6 and gen8 variants
> further in the series.
>
> --Mika
>
>> +static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
>> +{
>> +	return (addr >> shift) & I915_PDE_MASK;
>> +}
>> +
>> +static inline uint32_t gen6_pte_index(uint32_t addr)
>> +{
>> +	return i915_pte_index(addr, GEN6_PDE_SHIFT);
>> +}
>> +
>> +static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
>> +{
>> +	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
>> +}
>> +
>> +static inline uint32_t gen6_pde_index(uint32_t addr)
>> +{
>> +	return i915_pde_index(addr, GEN6_PDE_SHIFT);
>> +}
>> +
>>  int i915_gem_gtt_init(struct drm_device *dev);
>>  void i915_gem_init_global_gtt(struct drm_device *dev);
>>  void i915_global_gtt_cleanup(struct drm_device *dev);
>> -- 
>> 2.1.1
>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH] drm/i915: Initialize all contexts
  2015-02-27 13:40     ` [PATCH] " Michel Thierry
@ 2015-03-20 10:38       ` Chris Wilson
  0 siblings, 0 replies; 229+ messages in thread
From: Chris Wilson @ 2015-03-20 10:38 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Fri, Feb 27, 2015 at 01:40:18PM +0000, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
> 
> The problem is we're going to switch to a new context, which could be
> the default context. The plan was to use restore inhibit, which would be
> fine, except if we are using dynamic page tables (which we will). If we
> use dynamic page tables and we don't load new page tables, the previous
> page tables might go away, and future operations will fault.
> 
> CTXA runs.
> switch to default, restore inhibit
> CTXA dies and has its address space taken away.
> Run CTXB, tries to save using the context A's address space - this
> fails.
> 
> The general solution is to make sure every context has its own state,
> and its own address space. For cases when we must restore inhibit, first
> thing we do is load a valid address space. I thought this would be
> enough, but apparently there are references within the context itself
> which will refer to the old address space - therefore, we also must
> reinitialize.
> 
> It was tricky to track this down as we don't have much insight into what
> happens in a context save.
> 
> This is required for the next patch which enables dynamic page tables.

This sneaks in a major change that is not mentioned at all above.
Namely:

> @@ -744,7 +747,7 @@ static int do_switch(struct intel_engine_cs *ring,
>  		i915_gem_context_unreference(from);
>  	}
>  
> -	uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
> +	uninitialized = !to->legacy_hw_ctx.initialized;
>  	to->legacy_hw_ctx.initialized = true;

That has nothing to do with VM spaces, but with w/a that need to be
applied to the context image.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

* Re: [PATCH v6 10/32] drm/i915: Add dynamic page trace events
  2015-02-24 16:22   ` [PATCH v6 10/32] drm/i915: Add dynamic page trace events Michel Thierry
@ 2015-03-20 13:29     ` Mika Kuoppala
  0 siblings, 0 replies; 229+ messages in thread
From: Mika Kuoppala @ 2015-03-20 13:29 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Michel Thierry <michel.thierry@intel.com> writes:

> Traces for page directory and page table allocation and mapping.
>
> v2: Removed references to teardown.
> v3: bitmap_scnprintf has been deprecated.
>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c     |  2 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c |  5 ++
>  drivers/gpu/drm/i915/i915_trace.h   | 95 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 102 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 312b7d2..4e51275 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3601,6 +3601,8 @@ search_free:
>  
>  	/*  allocate before insert / bind */
>  	if (vma->vm->allocate_va_range) {
> +		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
> +				VM_TO_TRACE_NAME(vma->vm));
>  		ret = vma->vm->allocate_va_range(vma->vm,
>  						vma->node.start,
>  						vma->node.size);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 29cda58..94cdd99 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1210,6 +1210,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>  
>  		ppgtt->pd.page_table[pde] = pt;
>  		set_bit(pde, new_page_tables);
> +		trace_i915_page_table_entry_alloc(vm, pde, start, GEN6_PDE_SHIFT);
>  	}
>  
>  	start = start_save;
> @@ -1225,6 +1226,10 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>  		if (test_and_clear_bit(pde, new_page_tables))
>  			gen6_write_pde(&ppgtt->pd, pde, pt);
>  
> +		trace_i915_page_table_entry_map(vm, pde, pt,
> +					 gen6_pte_index(start),
> +					 gen6_pte_count(start, length),
> +					 I915_PPGTT_PT_ENTRIES);
>  		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
>  				I915_PPGTT_PT_ENTRIES);
>  	}
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index f004d3d..0038dc2 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -156,6 +156,101 @@ TRACE_EVENT(i915_vma_unbind,
>  		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
>  );
>  
> +#define VM_TO_TRACE_NAME(vm) \
> +	(i915_is_ggtt(vm) ? "GGTT" : \
> +		      "Private VM")

"G" and "P" would suffice but it is matter of taste.

> +DECLARE_EVENT_CLASS(i915_va,
> +	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
> +	TP_ARGS(vm, start, length, name),
> +
> +	TP_STRUCT__entry(
> +		__field(struct i915_address_space *, vm)
> +		__field(u64, start)
> +		__field(u64, end)
> +		__string(name, name)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vm = vm;
> +		__entry->start = start;
> +		__entry->end = start + length;

__entry->end = start + length - 1;
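
(e.g. a 4 KiB range starting at 0x1000 would then be reported as
0x1000-0x1fff instead of 0x1000-0x2000, i.e. an inclusive end.)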

> +		__assign_str(name, name);
> +	),
> +
> +	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
> +		  __entry->vm, __get_str(name),  __entry->start, __entry->end)
> +);
> +
> +DEFINE_EVENT(i915_va, i915_va_alloc,
> +	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
> +	     TP_ARGS(vm, start, length, name)
> +);
> +
> +DECLARE_EVENT_CLASS(i915_page_table_entry,
> +	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
> +	TP_ARGS(vm, pde, start, pde_shift),
> +
> +	TP_STRUCT__entry(
> +		__field(struct i915_address_space *, vm)
> +		__field(u32, pde)
> +		__field(u64, start)
> +		__field(u64, end)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vm = vm;
> +		__entry->pde = pde;
> +		__entry->start = start;
> +		__entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);

-1 for here also?
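
i.e. something like (sketch):

	__entry->end = ((start + (1ULL << pde_shift)) &
			~((1ULL << pde_shift) - 1)) - 1;

so the printed end stays inside the PDE.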

> +	),
> +
> +	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
> +		  __entry->vm, __entry->pde, __entry->start, __entry->end)
> +);
> +
> +DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
> +	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
> +	     TP_ARGS(vm, pde, start, pde_shift)
> +);
> +
> +/* Avoid extra math because we only support two sizes. The format is defined by
> + * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
> +#define TRACE_PT_SIZE(bits) \
> +	((((bits) == 1024) ? 288 : 144) + 1)
> +

This seems to be calculating the length of the output string, not the
bitmap size.
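
(1024 bits -> 32 words, each printed as "%08x," = 9 chars, so 288 plus the
terminator; __bitmask() instead expects the number of bits and sizes its
storage from that.)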

> +DECLARE_EVENT_CLASS(i915_page_table_entry_update,
> +	TP_PROTO(struct i915_address_space *vm, u32 pde,
> +		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
> +	TP_ARGS(vm, pde, pt, first, len, bits),
> +
> +	TP_STRUCT__entry(
> +		__field(struct i915_address_space *, vm)
> +		__field(u32, pde)
> +		__field(u32, first)
> +		__field(u32, last)
> +		__bitmask(cur_ptes, TRACE_PT_SIZE(bits))

...and this seems wrong.

__bitmask(cur_ptes, bits);

And I would replace size_t with u32 throughout the patch.

> +	),
> +
> +	TP_fast_assign(
> +		__entry->vm = vm;
> +		__entry->pde = pde;
> +		__entry->first = first;
> +		__entry->last = first + len;

It is not a len that we are getting but a count, so please rename
the argument and use:

__entry->last = first + count - 1;

-Mika

> +		__assign_bitmask(cur_ptes, pt->used_ptes, bits);
> +	),
> +
> +	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
> +		  __entry->vm, __entry->pde, __entry->last, __entry->first,
> +		  __get_bitmask(cur_ptes))
> +);
> +
> +DEFINE_EVENT(i915_page_table_entry_update, i915_page_table_entry_map,
> +	TP_PROTO(struct i915_address_space *vm, u32 pde,
> +		 struct i915_page_table_entry *pt, u32 first, u32 len, size_t bits),
> +	TP_ARGS(vm, pde, pt, first, len, bits)
> +);
> +
>  TRACE_EVENT(i915_gem_object_change_domain,
>  	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
>  	    TP_ARGS(obj, old_read, old_write),
> -- 
> 2.1.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 229+ messages in thread

end of thread, other threads:[~2015-03-20 13:29 UTC | newest]

Thread overview: 229+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-18 17:09 [PATCH 00/24] PPGTT dynamic page allocations Michel Thierry
2014-12-18 17:09 ` [PATCH 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
2014-12-18 17:09 ` [PATCH 02/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
2014-12-18 17:10 ` [PATCH 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
2014-12-18 20:40   ` Daniel Vetter
2014-12-18 20:44     ` Daniel Vetter
2014-12-19 12:32       ` Dave Gordon
2014-12-19 13:24         ` Daniel Vetter
2014-12-18 17:10 ` [PATCH 04/24] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
2014-12-18 17:10 ` [PATCH 05/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
2014-12-18 17:10 ` [PATCH 06/24] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
2014-12-18 17:10 ` [PATCH 07/24] drm/i915: page table abstractions Michel Thierry
2014-12-18 17:10 ` [PATCH 08/24] drm/i915: Complete page table structures Michel Thierry
2014-12-18 17:10 ` [PATCH 09/24] drm/i915: Create page table allocators Michel Thierry
2014-12-18 17:10 ` [PATCH 10/24] drm/i915: Track GEN6 page table usage Michel Thierry
2014-12-18 21:06   ` Daniel Vetter
2014-12-18 17:10 ` [PATCH 11/24] drm/i915: Extract context switch skip logic Michel Thierry
2014-12-18 20:54   ` Daniel Vetter
2014-12-18 17:10 ` [PATCH 12/24] drm/i915: Track page table reload need Michel Thierry
2014-12-18 21:08   ` Daniel Vetter
2014-12-18 17:10 ` [PATCH 13/24] drm/i915: Initialize all contexts Michel Thierry
2014-12-18 17:10 ` [PATCH 14/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
2014-12-18 21:12   ` Daniel Vetter
2014-12-18 17:10 ` [PATCH 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
2014-12-18 17:10 ` [PATCH 16/24] drm/i915/bdw: pagedirs rework allocation Michel Thierry
2014-12-18 17:10 ` [PATCH 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
2014-12-18 17:10 ` [PATCH 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
2014-12-18 17:10 ` [PATCH 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
2014-12-18 17:10 ` [PATCH 20/24] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
2014-12-18 17:10 ` [PATCH 21/24] drm/i915/bdw: Split out mappings Michel Thierry
2014-12-18 17:10 ` [PATCH 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
2014-12-18 17:10 ` [PATCH 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
2014-12-18 17:10 ` [PATCH 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
2014-12-18 21:16 ` [PATCH 00/24] PPGTT dynamic page allocations Daniel Vetter
2014-12-19  8:31   ` Chris Wilson
2014-12-19  8:37     ` Daniel Vetter
2014-12-19  8:50       ` Chris Wilson
2014-12-19 10:13         ` Daniel Vetter
2014-12-19 12:35           ` Michel Thierry
2014-12-19 13:10           ` Chris Wilson
2014-12-19 13:29             ` Daniel Vetter
2014-12-19 13:36               ` Chris Wilson
2014-12-19 19:08                 ` Chris Wilson
2014-12-23 17:16 ` [PATCH v2 " Michel Thierry
2014-12-23 17:16   ` [PATCH v2 01/24] drm/i915: Add some extra guards in evict_vm Michel Thierry
2015-01-05 13:39     ` Daniel Vetter
2014-12-23 17:16   ` [PATCH v2 02/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
2014-12-23 17:16   ` [PATCH v2 03/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
2014-12-23 17:16   ` [PATCH v2 04/24] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
2014-12-23 17:16   ` [PATCH v2 05/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
2014-12-23 17:16   ` [PATCH v2 06/24] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
2014-12-23 17:16   ` [PATCH v2 07/24] drm/i915: page table abstractions Michel Thierry
2015-01-05 13:47     ` Daniel Vetter
2014-12-23 17:16   ` [PATCH v2 08/24] drm/i915: Complete page table structures Michel Thierry
2014-12-23 17:16   ` [PATCH v2 09/24] drm/i915: Create page table allocators Michel Thierry
2014-12-23 17:16   ` [PATCH v2 10/24] drm/i915: Track GEN6 page table usage Michel Thierry
2015-01-05 14:29     ` Daniel Vetter
2014-12-23 17:16   ` [PATCH v2 11/24] drm/i915: Extract context switch skip and pd load logic Michel Thierry
2015-01-05 14:31     ` Daniel Vetter
2014-12-23 17:16   ` [PATCH v2 12/24] drm/i915: Track page table reload need Michel Thierry
2015-01-05 14:36     ` Daniel Vetter
2014-12-23 17:16   ` [PATCH v2 13/24] drm/i915: Initialize all contexts Michel Thierry
2014-12-23 17:16   ` [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
2015-01-05 14:45     ` Daniel Vetter
2015-01-13 11:53       ` Michel Thierry
2015-01-13 22:09         ` Daniel Vetter
2014-12-23 17:16   ` [PATCH v2 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
2014-12-23 17:16   ` [PATCH v2 16/24] drm/i915/bdw: pagedirs rework allocation Michel Thierry
2014-12-23 17:16   ` [PATCH v2 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
2014-12-23 17:16   ` [PATCH v2 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
2014-12-23 17:16   ` [PATCH v2 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
2014-12-23 17:16   ` [PATCH v2 20/24] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
2014-12-23 17:16   ` [PATCH v2 21/24] drm/i915/bdw: Split out mappings Michel Thierry
2014-12-23 17:16   ` [PATCH v2 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
2014-12-23 17:16   ` [PATCH v2 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
2015-01-05 14:52     ` Daniel Vetter
2014-12-23 17:16   ` [PATCH v2 24/24] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
2015-01-05 14:59     ` Daniel Vetter
2015-01-05 14:57   ` [PATCH v2 00/24] PPGTT dynamic page allocations Daniel Vetter
2015-01-13 11:52 ` [PATCH v3 00/25] " Michel Thierry
2015-01-13 11:52   ` [PATCH v3 01/25] drm/i915/trace: Fix offsets for 64b Michel Thierry
2015-01-13 11:52   ` [PATCH v3 02/25] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
2015-01-13 11:52   ` [PATCH v3 03/25] drm/i915: Setup less PPGTT on failed page_directory Michel Thierry
2015-01-13 11:52   ` [PATCH v3 04/25] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
2015-01-13 11:52   ` [PATCH v3 05/25] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
2015-01-13 11:52   ` [PATCH v3 06/25] drm/i915: page table abstractions Michel Thierry
2015-01-13 11:52   ` [PATCH v3 07/25] drm/i915: Complete page table structures Michel Thierry
2015-01-13 11:52   ` [PATCH v3 08/25] drm/i915: Create page table allocators Michel Thierry
2015-01-13 11:52   ` [PATCH v3 09/25] drm/i915: Plumb drm_device through page tables operations Michel Thierry
2015-01-13 11:52   ` [PATCH v3 10/25] drm/i915: Track GEN6 page table usage Michel Thierry
2015-01-13 11:52   ` [PATCH v3 11/25] drm/i915: Extract context switch skip and pd load logic Michel Thierry
2015-01-13 11:52   ` [PATCH v3 12/25] drm/i915: Track page table reload need Michel Thierry
2015-01-13 11:52   ` [PATCH v3 13/25] drm/i915: Initialize all contexts Michel Thierry
2015-01-13 11:52   ` [PATCH v3 14/25] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
2015-01-13 11:52   ` [PATCH v3 15/25] drm/i915: Add dynamic page trace events Michel Thierry
2015-01-13 11:52   ` [PATCH v3 16/25] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
2015-01-13 11:52   ` [PATCH v3 17/25] drm/i915/bdw: page directories rework allocation Michel Thierry
2015-01-13 11:52   ` [PATCH v3 18/25] drm/i915/bdw: pagetable allocation rework Michel Thierry
2015-01-13 11:52   ` [PATCH v3 19/25] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
2015-01-13 11:52   ` [PATCH v3 20/25] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
2015-01-13 11:52   ` [PATCH v3 21/25] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
2015-01-13 11:52   ` [PATCH v3 22/25] drm/i915/bdw: Split out mappings Michel Thierry
2015-01-13 11:52   ` [PATCH v3 23/25] drm/i915/bdw: begin bitmap tracking Michel Thierry
2015-01-13 11:52   ` [PATCH v3 24/25] drm/i915/bdw: Dynamic page table allocations Michel Thierry
2015-01-13 11:52   ` [PATCH v3 25/25] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
2015-01-22 17:01 ` [PATCH v4 00/24] PPGTT dynamic page allocations Michel Thierry
2015-01-22 17:01   ` [PATCH v4 01/24] drm/i915/trace: Fix offsets for 64b Michel Thierry
2015-01-27 12:16     ` Mika Kuoppala
2015-01-22 17:01   ` [PATCH v4 02/24] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
2015-02-06 15:32     ` Mika Kuoppala
2015-01-22 17:01   ` [PATCH v4 03/24] drm/i915: Setup less PPGTT on failed page_directory Michel Thierry
2015-02-09 15:21     ` Mika Kuoppala
2015-01-22 17:01   ` [PATCH v4 04/24] drm/i915/gen8: Un-hardcode number of page directories Michel Thierry
2015-02-09 15:30     ` Mika Kuoppala
2015-02-09 16:33       ` Daniel Vetter
2015-01-22 17:01   ` [PATCH v4 05/24] drm/i915: page table abstractions Michel Thierry
2015-02-18 11:27     ` Mika Kuoppala
2015-02-23 15:39       ` Michel Thierry
2015-01-22 17:01   ` [PATCH v4 06/24] drm/i915: Complete page table structures Michel Thierry
2015-01-22 17:01   ` [PATCH v4 07/24] drm/i915: Create page table allocators Michel Thierry
2015-02-20 16:50     ` Mika Kuoppala
2015-02-23 15:39       ` Michel Thierry
2015-01-22 17:01   ` [PATCH v4 08/24] drm/i915: Plumb drm_device through page tables operations Michel Thierry
2015-01-22 17:01   ` [PATCH v4 09/24] drm/i915: Track GEN6 page table usage Michel Thierry
2015-02-20 16:41     ` Mika Kuoppala
2015-02-23 15:39       ` Michel Thierry
2015-01-22 17:01   ` [PATCH v4 10/24] drm/i915: Extract context switch skip and pd load logic Michel Thierry
2015-01-22 17:01   ` [PATCH v4 11/24] drm/i915: Track page table reload need Michel Thierry
2015-01-22 17:01   ` [PATCH v4 12/24] drm/i915: Initialize all contexts Michel Thierry
2015-01-22 17:01   ` [PATCH v4 13/24] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
2015-01-22 17:01   ` [PATCH v4 14/24] drm/i915: Add dynamic page trace events Michel Thierry
2015-01-22 17:01   ` [PATCH v4 15/24] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
2015-01-22 17:01   ` [PATCH v4 16/24] drm/i915/bdw: page directories rework allocation Michel Thierry
2015-01-22 17:01   ` [PATCH v4 17/24] drm/i915/bdw: pagetable allocation rework Michel Thierry
2015-01-22 17:01   ` [PATCH v4 18/24] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
2015-01-22 17:01   ` [PATCH v4 19/24] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
2015-01-22 17:01   ` [PATCH v4 20/24] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
2015-01-22 17:01   ` [PATCH v4 21/24] drm/i915/bdw: Split out mappings Michel Thierry
2015-01-22 17:01   ` [PATCH v4 22/24] drm/i915/bdw: begin bitmap tracking Michel Thierry
2015-01-22 17:01   ` [PATCH v4 23/24] drm/i915/bdw: Dynamic page table allocations Michel Thierry
2015-01-22 17:01   ` [PATCH v4 24/24] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
2015-02-23 15:44 ` [PATCH v5 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
2015-02-23 15:44   ` [PATCH v5 01/32] drm/i915: page table abstractions Michel Thierry
2015-02-24 11:14     ` [PATCH] " Michel Thierry
2015-02-24 12:03       ` Mika Kuoppala
2015-02-23 15:44   ` [PATCH v5 02/32] drm/i915: Complete page table structures Michel Thierry
2015-02-24 13:10     ` Mika Kuoppala
2015-02-23 15:44   ` [PATCH v5 03/32] drm/i915: Create page table allocators Michel Thierry
2015-02-24 13:56     ` Mika Kuoppala
2015-02-24 15:18       ` Michel Thierry
2015-02-23 15:44   ` [PATCH v5 04/32] drm/i915: Plumb drm_device through page tables operations Michel Thierry
2015-02-23 15:44   ` [PATCH v5 05/32] drm/i915: Track GEN6 page table usage Michel Thierry
2015-02-23 15:44   ` [PATCH v5 06/32] drm/i915: Extract context switch skip and pd load logic Michel Thierry
2015-02-23 15:44   ` [PATCH v5 07/32] drm/i915: Track page table reload need Michel Thierry
2015-02-23 15:44   ` [PATCH v5 08/32] drm/i915: Initialize all contexts Michel Thierry
2015-02-23 15:44   ` [PATCH v5 09/32] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
2015-02-23 15:44   ` [PATCH v5 10/32] drm/i915: Add dynamic page trace events Michel Thierry
2015-02-23 15:44   ` [PATCH v5 11/32] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
2015-02-23 15:44   ` [PATCH v5 12/32] drm/i915/bdw: page directories rework allocation Michel Thierry
2015-02-23 15:44   ` [PATCH v5 13/32] drm/i915/bdw: pagetable allocation rework Michel Thierry
2015-02-23 15:44   ` [PATCH v5 14/32] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
2015-02-23 15:44   ` [PATCH v5 15/32] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
2015-02-23 15:44   ` [PATCH v5 16/32] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
2015-02-23 15:44   ` [PATCH v5 17/32] drm/i915/bdw: Split out mappings Michel Thierry
2015-02-23 15:44   ` [PATCH v5 18/32] drm/i915/bdw: begin bitmap tracking Michel Thierry
2015-02-23 15:44   ` [PATCH v5 19/32] drm/i915/bdw: Dynamic page table allocations Michel Thierry
2015-02-23 15:44   ` [PATCH v5 20/32] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
2015-02-23 15:44   ` [PATCH v5 21/32] drm/i915/bdw: Make pdp allocation more dynamic Michel Thierry
2015-02-23 15:44   ` [PATCH v5 22/32] drm/i915/bdw: Abstract PDP usage Michel Thierry
2015-02-23 15:44   ` [PATCH v5 23/32] drm/i915/bdw: Add dynamic page trace events Michel Thierry
2015-02-23 15:44   ` [PATCH v5 24/32] drm/i915/bdw: Add ppgtt info for dynamic pages Michel Thierry
2015-02-23 15:44   ` [PATCH v5 25/32] drm/i915/bdw: implement alloc/free for 4lvl Michel Thierry
2015-02-23 15:44   ` [PATCH v5 26/32] drm/i915/bdw: Add 4 level switching infrastructure Michel Thierry
2015-02-23 15:44   ` [PATCH v5 27/32] drm/i915/bdw: Support 64 bit PPGTT in lrc mode Michel Thierry
2015-02-23 15:44   ` [PATCH v5 28/32] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT Michel Thierry
2015-02-23 15:44   ` [PATCH v5 29/32] drm/i915: Plumb sg_iter through va allocation ->maps Michel Thierry
2015-02-23 15:44   ` [PATCH v5 30/32] drm/i915/bdw: Add 4 level support in insert_entries and clear_range Michel Thierry
2015-02-23 15:44   ` [PATCH v5 31/32] drm/i915: Expand error state's address width to 64b Michel Thierry
2015-02-23 15:44   ` [PATCH v5 32/32] drm/i915/bdw: Flip the 48b switch Michel Thierry
2015-02-24 16:22 ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Michel Thierry
2015-02-24 16:22   ` [PATCH v6 01/32] drm/i915: page table abstractions Michel Thierry
2015-02-24 16:22   ` [PATCH v6 02/32] drm/i915: Complete page table structures Michel Thierry
2015-02-24 16:22   ` [PATCH v6 03/32] drm/i915: Create page table allocators Michel Thierry
2015-02-25 13:34     ` Mika Kuoppala
2015-03-02 18:57       ` Paulo Zanoni
2015-02-24 16:22   ` [PATCH v6 04/32] drm/i915: Plumb drm_device through page tables operations Michel Thierry
2015-02-25 14:52     ` Mika Kuoppala
2015-02-25 15:57       ` Daniel Vetter
2015-02-24 16:22   ` [PATCH v6 05/32] drm/i915: Track GEN6 page table usage Michel Thierry
2015-02-26 15:58     ` Mika Kuoppala
2015-03-10 11:19       ` Mika Kuoppala
2015-02-24 16:22   ` [PATCH v6 06/32] drm/i915: Extract context switch skip and pd load logic Michel Thierry
2015-02-27 11:46     ` Mika Kuoppala
2015-02-27 13:38       ` [PATCH] drm/i915: Extract context switch skip and add " Michel Thierry
2015-03-03  3:54         ` shuang.he
2015-03-05 14:37         ` Mika Kuoppala
2015-02-24 16:22   ` [PATCH v6 07/32] drm/i915: Track page table reload need Michel Thierry
2015-02-24 16:22   ` [PATCH v6 08/32] drm/i915: Initialize all contexts Michel Thierry
2015-02-27 13:40     ` [PATCH] " Michel Thierry
2015-03-20 10:38       ` Chris Wilson
2015-02-24 16:22   ` [PATCH v6 09/32] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
2015-02-24 16:22   ` [PATCH v6 10/32] drm/i915: Add dynamic page trace events Michel Thierry
2015-03-20 13:29     ` Mika Kuoppala
2015-02-24 16:22   ` [PATCH v6 11/32] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
2015-02-24 16:22   ` [PATCH v6 12/32] drm/i915/bdw: page directories rework allocation Michel Thierry
2015-02-24 16:22   ` [PATCH v6 13/32] drm/i915/bdw: pagetable allocation rework Michel Thierry
2015-02-24 16:22   ` [PATCH v6 14/32] drm/i915/bdw: Update pdp switch and point unused PDPs to scratch page Michel Thierry
2015-02-24 16:22   ` [PATCH v6 15/32] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
2015-02-24 16:22   ` [PATCH v6 16/32] drm/i915: Extract PPGTT param from page_directory alloc Michel Thierry
2015-02-24 16:22   ` [PATCH v6 17/32] drm/i915/bdw: Split out mappings Michel Thierry
2015-02-24 16:22   ` [PATCH v6 18/32] drm/i915/bdw: begin bitmap tracking Michel Thierry
2015-02-24 16:22   ` [PATCH v6 19/32] drm/i915/bdw: Dynamic page table allocations Michel Thierry
2015-02-24 16:22   ` [PATCH v6 20/32] drm/i915/bdw: Support dynamic pdp updates in lrc mode Michel Thierry
2015-02-24 16:22   ` [PATCH v6 21/32] drm/i915/bdw: Make pdp allocation more dynamic Michel Thierry
2015-02-24 16:22   ` [PATCH v6 22/32] drm/i915/bdw: Abstract PDP usage Michel Thierry
2015-02-24 16:22   ` [PATCH v6 23/32] drm/i915/bdw: Add dynamic page trace events Michel Thierry
2015-02-24 16:22   ` [PATCH v6 24/32] drm/i915/bdw: Add ppgtt info for dynamic pages Michel Thierry
2015-02-24 16:22   ` [PATCH v6 25/32] drm/i915/bdw: implement alloc/free for 4lvl Michel Thierry
2015-02-24 16:22   ` [PATCH v6 26/32] drm/i915/bdw: Add 4 level switching infrastructure Michel Thierry
2015-02-24 16:23   ` [PATCH v6 27/32] drm/i915/bdw: Support 64 bit PPGTT in lrc mode Michel Thierry
2015-02-24 16:23   ` [PATCH v6 28/32] drm/i915/bdw: Generalize PTE writing for GEN8 PPGTT Michel Thierry
2015-02-24 16:23   ` [PATCH v6 29/32] drm/i915: Plumb sg_iter through va allocation ->maps Michel Thierry
2015-02-24 16:23   ` [PATCH v6 30/32] drm/i915/bdw: Add 4 level support in insert_entries and clear_range Michel Thierry
2015-02-24 16:23   ` [PATCH v6 31/32] drm/i915: Expand error state's address width to 64b Michel Thierry
2015-02-24 16:23   ` [PATCH v6 32/32] drm/i915/bdw: Flip the 48b switch Michel Thierry
2015-02-24 20:31   ` [PATCH v6 00/32] PPGTT dynamic page allocations and 48b addressing Daniel Vetter
2015-02-25 10:55     ` Mika Kuoppala
2015-02-25 12:29       ` Michel Thierry
2015-02-25 14:20         ` Daniel Vetter
