All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/26] [RFCish] GEN7 dynamic page tables
@ 2014-03-18  5:48 Ben Widawsky
  2014-03-18  5:48 ` [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
                   ` (26 more replies)
  0 siblings, 27 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

These patches live here, based on my temporary Broadwell branch:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=dynamic_pt_alloc

First, and most importantly, this work should have no impact on current
drm-intel code because PPGTT is currently shut off there. To actually
test this patch series, one must re-enable PPGTT. On a single run of IGT
on IVB, it seem this doesn't introduce any regressions, but y'know, it's
PPGTT, so there's some instability, and it's hard to claim for certain
this doesn't break anything on top. Also, as stated below, the gen8 work
is only partially done.

Before I go too much further with this, I wanted to get eyes on it. I am
really open to any feedback. Before you do request a change though,
please realize that I've gone through several iterations of the
functions/interfaces. So please, spare me some pain and try to think
through what your request is before rattling it off. Daniel has
expressed to me already that he is unwilling to merge certain things
until PPGTT problems are fixed, and that can be enabled by default.
That's okay. In my opinion, many of the patches don't really have any
major behavioral changes, and only make the code so much more readable
and easy to deal with, that I believe merging it would only improve
PPGTT debugging in the future. There are several cleanups in the series
which could also go in relatively harmlessly.

Okay, so what does this do?
The patch series /dynamicizes/ page table allocation and teardown for
GEN7. It also starts to introduce GEN8, but the tricky stuff is still
not done. Up until now, all our page tables are pre-allocated when the
address space is created. That's actually okay for current GENs since we
don't use many address spaces, and the page tables occupy only 2MB each.
However, on GEN8 we can use a deeper page table, and to preallocate such
an address space would be very costly. This work was done for GEN7 first
because this is the most well tested with full PPGTT, and stable
platforms are readily available.

In this patch series, I've demonstrated how we will manage tracking used
page tables (bitmaps), and broken things out into much more discrete
functions. I'm hoping I'll get feedback on the way I've implemented
things (primarily if it seems fundamentally flawed in any way). The real
goal was to prove out the dynamic allocation so we can begin to enable
GEN8 in the same way. I'll emphasize now that I put in a lot of effort
limit risk with each patch, and this does result in some excess churn.

My next step is bring GEN8 up to par with GEN7. Once GEN8 is working,
and clean we can find where GEN7, and GEN8 overlap, and then recombine
where I haven't done so already. It's possible this plan will not work
out, and the above 2 steps will end up as one. After that, I plan to
merge the VA range allocation, and teardown into the insert/clear
entries (currently it's two steps). I think both of those steps should
be distinct.

On x86 code overlap:
I spent more time that I would have liked trying to conjoin our
pagetable management with x86 code. In the end I decided not to depend
on any of the x86 definitions (other than PAGE_SIZE) because I found the
maze of conditional compiles and defines a bit too cumbersome.  I also
didn't feel the abstract pagetable topology used in x86 code was
worthwhile given that with about 6 #defines, we achieve the same thing.
We just don't support nearly as many configurations, and our page table
format differs in too many places. One thing I had really considered,
and toyed around with was not having data structures to track the page
tables we've allocated and simply use the one that's in memory (which is
what x86 does). I was not able to make this work because of IOMMU. The
address we write into our page tables is an IOMMU address.  This means
we need to know, or be able to easily derive both the physical address
(or pfn, or struct page), and the DMA address. I failed to accomplish
this. I think using the bitmaps should be a fast way than having to kmap
the pagetables to determine their status anyway. And, one thing to keep
in mind is currently we don't have any GPU faulting capability. This
will greatly limit the ability to map things sparsely, which also will
greatly limit the effective virtual address space we can use.

Ben Widawsky (26):
  drm/i915: Split out verbose PPGTT dumping
  drm/i915: Extract switch to default context
  drm/i915: s/pd/pdpe, s/pt/pde
  drm/i915: rename map/unmap to dma_map/unmap
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915: Wrap VMA binding
  drm/i915: clean up PPGTT init error path
  drm/i915: Un-hardcode number of page directories
  drm/i915: Split out gtt specific header file
  drm/i915: Make gen6_write_pdes gen6_map_page_tables
  drm/i915: Range clearing is PPGTT agnostic
  drm/i915: Page table helpers, and define renames
  drm/i915: construct page table abstractions
  drm/i915: Complete page table structures
  drm/i915: Create page table allocators
  drm/i915: Generalize GEN6 mapping
  drm/i915: Clean up pagetable DMA map & unmap
  drm/i915: Always dma map page table allocations
  drm/i915: Consolidate dma mappings
  drm/i915: Always dma map page directory allocations
  drm/i915: Track GEN6 page table usage
  drm/i915: Extract context switch skip logic
  drm/i915: Force pd restore when PDEs change, gen6-7
  drm/i915: Finish gen6/7 dynamic page table allocation
  drm/i915: Print used ppgtt pages for gen6 in debugfs
  FOR REFERENCE ONLY

 drivers/gpu/drm/i915/i915_debugfs.c        |  47 +-
 drivers/gpu/drm/i915/i915_drv.h            | 169 +----
 drivers/gpu/drm/i915/i915_gem.c            |  10 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |  25 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 995 +++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h        | 417 ++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c      |   1 -
 drivers/gpu/drm/i915/i915_trace.h          | 108 ++++
 9 files changed, 1198 insertions(+), 584 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_gtt.h

-- 
1.9.0

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-20 11:57   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 02/26] drm/i915: Extract switch to default context Ben Widawsky
                   ` (25 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

There often is not enough memory to dump the full contents of the PPGTT.
As a temporary bandage, to continue getting valuable basic PPGTT info,
wrap the dangerous, memory hungry part inside of a new verbose version
of the debugfs file.

Also while here we can split out the ppgtt print function so it's more
reusable.

I'd really like to get ppgtt info into our error state, but I found it too
difficult to make work in the limited time I have. Maybe Mika can find a way.

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1031c43..b226788 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1760,7 +1760,7 @@ static int per_file_ctx(int id, void *ptr, void *data)
 	return 0;
 }
 
-static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
+static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
@@ -1785,7 +1785,13 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	}
 }
 
-static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
+static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
+{
+	seq_printf(m, "%s:\n", name);
+	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+}
+
+static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
@@ -1806,10 +1812,9 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
-		seq_puts(m, "aliasing PPGTT:\n");
-		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
-
-		ppgtt->debug_dump(ppgtt, m);
+		print_ppgtt(m, ppgtt, "Aliasing PPGTT");
+		if (verbose)
+			ppgtt->debug_dump(ppgtt, m);
 	} else
 		return;
 
@@ -1820,8 +1825,9 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
 		seq_printf(m, "proc: %s\n",
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-		seq_puts(m, "  default context:\n");
-		idr_for_each(&file_priv->context_idr, per_file_ctx, m);
+		print_ppgtt(m, pvt_ppgtt, "Default context");
+		if (verbose)
+			idr_for_each(&file_priv->context_idr, per_file_ctx, m);
 	}
 	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
@@ -1831,6 +1837,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	bool verbose = node->info_ent->data ? true : false;
 
 	int ret = mutex_lock_interruptible(&dev->struct_mutex);
 	if (ret)
@@ -1838,9 +1845,9 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
 	intel_runtime_pm_get(dev_priv);
 
 	if (INTEL_INFO(dev)->gen >= 8)
-		gen8_ppgtt_info(m, dev);
+		gen8_ppgtt_info(m, dev, verbose);
 	else if (INTEL_INFO(dev)->gen >= 6)
-		gen6_ppgtt_info(m, dev);
+		gen6_ppgtt_info(m, dev, verbose);
 
 	intel_runtime_pm_put(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
@@ -3826,6 +3833,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
 	{"i915_ppgtt_info", i915_ppgtt_info, 0},
+	{"i915_ppgtt_verbose_info", i915_ppgtt_info, 0, (void *)1},
 	{"i915_dpio", i915_dpio_info, 0},
 	{"i915_llc", i915_llc, 0},
 	{"i915_edp_psr_status", i915_edp_psr_status, 0},
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 02/26] drm/i915: Extract switch to default context
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
  2014-03-18  5:48 ` [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  8:38   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 03/26] drm/i915: s/pd/pdpe, s/pt/pde Ben Widawsky
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

This patch existed for another reason which no longer exists. I liked
it, so I kept it in the series. It can skipped if undesirable.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h | 2 ++
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 35f9a37..c59b707 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2476,6 +2476,8 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv);
 void i915_gem_context_close(struct drm_device *dev, struct drm_file *file);
 int i915_switch_context(struct intel_ring_buffer *ring,
 			struct drm_file *file, struct i915_hw_context *to);
+#define i915_switch_to_default(ring) \
+	i915_switch_context(ring, NULL, ring->default_context)
 struct i915_hw_context *
 i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id);
 void i915_gem_context_free(struct kref *ctx_ref);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b2565d2..ed09dda 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2799,7 +2799,7 @@ int i915_gpu_idle(struct drm_device *dev)
 
 	/* Flush everything onto the inactive list. */
 	for_each_ring(ring, dev_priv, i) {
-		ret = i915_switch_context(ring, NULL, ring->default_context);
+		ret = i915_switch_to_default(ring);
 		if (ret)
 			return ret;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 03/26] drm/i915: s/pd/pdpe, s/pt/pde
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
  2014-03-18  5:48 ` [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
  2014-03-18  5:48 ` [PATCH 02/26] drm/i915: Extract switch to default context Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  5:48 ` [PATCH 04/26] drm/i915: rename map/unmap to dma_map/unmap Ben Widawsky
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

The actual correct way to think about this with the new style of page
table data structures is as the actual entry that is being indexed into
the array. "pd", and "pt" aren't representative of what the operation is
doing.

The clarity here will improve the readability of future patches.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd016e2..b26b186 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -537,40 +537,40 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 }
 
 static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pd)
+					     const int pdpe)
 {
 	dma_addr_t pd_addr;
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pd], 0,
+			       &ppgtt->pd_pages[pdpe], 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pd] = pd_addr;
+	ppgtt->pd_dma_addr[pdpe] = pd_addr;
 
 	return 0;
 }
 
 static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pd,
-					const int pt)
+					const int pdpe,
+					const int pde)
 {
 	dma_addr_t pt_addr;
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pd][pt];
+	p = ppgtt->gen8_pt_pages[pdpe][pde];
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+	ppgtt->gen8_pt_dma_addr[pdpe][pde] = pt_addr;
 
 	return 0;
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 04/26] drm/i915: rename map/unmap to dma_map/unmap
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (2 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 03/26] drm/i915: s/pd/pdpe, s/pt/pde Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  8:40   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 05/26] drm/i915: Setup less PPGTT on failed pagedir Ben Widawsky
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Upcoming patches will use the terms map and unmap in references to the
page table entries. Having this distinction will really help with code
clarity at that point.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b26b186..08a1e1c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -394,7 +394,7 @@ static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
 	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
@@ -425,7 +425,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	list_del(&vm->global_link);
 	drm_mm_takedown(&vm->mm);
 
-	gen8_ppgtt_unmap_pages(ppgtt);
+	gen8_ppgtt_dma_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 }
 
@@ -651,7 +651,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	return 0;
 
 bail:
-	gen8_ppgtt_unmap_pages(ppgtt);
+	gen8_ppgtt_dma_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
@@ -1019,7 +1019,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
@@ -1050,7 +1050,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	drm_mm_takedown(&ppgtt->base.mm);
 	drm_mm_remove_node(&ppgtt->node);
 
-	gen6_ppgtt_unmap_pages(ppgtt);
+	gen6_ppgtt_dma_unmap_pages(ppgtt);
 	gen6_ppgtt_free(ppgtt);
 }
 
@@ -1150,7 +1150,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_unmap_pages(ppgtt);
+			gen6_ppgtt_dma_unmap_pages(ppgtt);
 			return -EIO;
 		}
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 05/26] drm/i915: Setup less PPGTT on failed pagedir
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (3 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 04/26] drm/i915: rename map/unmap to dma_map/unmap Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  5:48 ` [PATCH 06/26] drm/i915: Wrap VMA binding Ben Widawsky
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

The current code will both potentially print a WARN, and setup part of
the PPGTT structure. Neither of these harm the current code, it is
simply for clarity, and to perhaps prevent later bugs, or weird
debug messages.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 08a1e1c..09556d1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1085,11 +1085,14 @@ alloc:
 		goto alloc;
 	}
 
+	if (ret)
+		return ret;
+
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-	return ret;
+	return 0;
 }
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 06/26] drm/i915: Wrap VMA binding
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (4 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 05/26] drm/i915: Setup less PPGTT on failed pagedir Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  8:42   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 07/26] drm/i915: clean up PPGTT init error path Ben Widawsky
                   ` (20 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

This will be useful for some upcoming patches which do more platform
specific work. Having it in one central place just makes things a bit
cleaner and easier.

There is a small functional change here. There are more calls to the
tracepoints.

NOTE: I didn't actually end up using this patch for the intended purpose, but I
thought it was a nice patch to keep around.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h            |  3 +++
 drivers/gpu/drm/i915/i915_gem.c            |  8 ++++----
 drivers/gpu/drm/i915/i915_gem_context.c    |  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 16 ++++++++++++++--
 5 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c59b707..b3e31fd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2408,6 +2408,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
 			struct i915_address_space *vm);
 unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 				struct i915_address_space *vm);
+void i915_gem_bind_vma(struct i915_vma *vma, enum i915_cache_level,
+		       unsigned flags);
+void i915_gem_unbind_vma(struct i915_vma *vma);
 struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm);
 struct i915_vma *
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ed09dda..0a3f4ac 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2765,7 +2765,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 
 	trace_i915_vma_unbind(vma);
 
-	vma->unbind_vma(vma);
+	i915_gem_unbind_vma(vma);
 
 	i915_gem_gtt_finish_object(obj);
 
@@ -3514,8 +3514,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 
 		list_for_each_entry(vma, &obj->vma_list, vma_link)
 			if (drm_mm_node_allocated(&vma->node))
-				vma->bind_vma(vma, cache_level,
-					      obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
+				i915_gem_bind_vma(vma, cache_level,
+						  obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
 	}
 
 	list_for_each_entry(vma, &obj->vma_list, vma_link)
@@ -3878,7 +3878,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
 	}
 
 	if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
-		vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
+		i915_gem_bind_vma(vma, obj->cache_level, GLOBAL_BIND);
 
 	vma->pin_count++;
 	if (flags & PIN_MAPPABLE)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 7dfdc02..f918f2c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -693,7 +693,7 @@ static int do_switch(struct intel_ring_buffer *ring,
 	if (!to->obj->has_global_gtt_mapping) {
 		struct i915_vma *vma = i915_gem_obj_to_vma(to->obj,
 							   &dev_priv->gtt.base);
-		vma->bind_vma(vma, to->obj->cache_level, GLOBAL_BIND);
+		i915_gem_bind_vma(vma, to->obj->cache_level, GLOBAL_BIND);
 	}
 
 	if (!to->is_initialized || i915_gem_context_is_default(to))
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3851a1b..856fa9d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -369,7 +369,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 		struct i915_vma *vma =
 			list_first_entry(&target_i915_obj->vma_list,
 					 typeof(*vma), vma_link);
-		vma->bind_vma(vma, target_i915_obj->cache_level, GLOBAL_BIND);
+		i915_gem_bind_vma(vma, target_i915_obj->cache_level,
+				  GLOBAL_BIND);
 	}
 
 	/* Validate that the target is in a valid r/w GPU domain */
@@ -1209,7 +1210,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		 * allocate space first */
 		struct i915_vma *vma = i915_gem_obj_to_ggtt(batch_obj);
 		BUG_ON(!vma);
-		vma->bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
+		i915_gem_bind_vma(vma, batch_obj->cache_level, GLOBAL_BIND);
 	}
 
 	if (flags & I915_DISPATCH_SECURE)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 09556d1..1620211 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1370,10 +1370,9 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		 * without telling our object about it. So we need to fake it.
 		 */
 		obj->has_global_gtt_mapping = 0;
-		vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
+		i915_gem_bind_vma(vma, obj->cache_level, GLOBAL_BIND);
 	}
 
-
 	if (INTEL_INFO(dev)->gen >= 8)
 		return;
 
@@ -2034,6 +2033,19 @@ int i915_gem_gtt_init(struct drm_device *dev)
 	return 0;
 }
 
+void i915_gem_bind_vma(struct i915_vma *vma, enum i915_cache_level cache_level,
+		       unsigned flags)
+{
+	trace_i915_vma_bind(vma, flags);
+	vma->bind_vma(vma, cache_level, flags);
+}
+
+void i915_gem_unbind_vma(struct i915_vma *vma)
+{
+	trace_i915_vma_unbind(vma);
+	vma->unbind_vma(vma);
+}
+
 static struct i915_vma *__i915_gem_vma_create(struct drm_i915_gem_object *obj,
 					      struct i915_address_space *vm)
 {
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 07/26] drm/i915: clean up PPGTT init error path
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (5 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 06/26] drm/i915: Wrap VMA binding Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  8:44   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 08/26] drm/i915: Un-hardcode number of page directories Ben Widawsky
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

The old code (I'm having trouble finding the commit) had a reason for
doing things when there was an error, and would continue on, thus the
!ret. For the newer code however, this looks completely silly.

Follow the normal idiom of if (ret) return ret.

Also, put the pde wiring in the gen specific init, now that GEN8 exists.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1620211..5f73284 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1202,6 +1202,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+	gen6_write_pdes(ppgtt);
+
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1226,20 +1228,14 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	else
 		BUG();
 
-	if (!ret) {
-		struct drm_i915_private *dev_priv = dev->dev_private;
-		kref_init(&ppgtt->ref);
-		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
-			    ppgtt->base.total);
-		i915_init_vm(dev_priv, &ppgtt->base);
-		if (INTEL_INFO(dev)->gen < 8) {
-			gen6_write_pdes(ppgtt);
-			DRM_DEBUG("Adding PPGTT at offset %x\n",
-				  ppgtt->pd_offset << 10);
-		}
-	}
+	if (ret)
+		return ret;
 
-	return ret;
+	kref_init(&ppgtt->ref);
+	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
+	i915_init_vm(dev_priv, &ppgtt->base);
+
+	return 0;
 }
 
 static void
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 08/26] drm/i915: Un-hardcode number of page directories
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (6 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 07/26] drm/i915: clean up PPGTT init error path Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  5:48 ` [PATCH 09/26] drm/i915: Split out gtt specific header file Ben Widawsky
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

trivial.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b3e31fd..084e82f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -722,7 +722,7 @@ struct i915_hw_ppgtt {
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[4];
+		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
 	};
 
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 09/26] drm/i915: Split out gtt specific header file
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (7 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 08/26] drm/i915: Un-hardcode number of page directories Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  8:46   ` Chris Wilson
  2014-03-18  9:15   ` Daniel Vetter
  2014-03-18  5:48 ` [PATCH 10/26] drm/i915: Make gen6_write_pdes gen6_map_page_tables Ben Widawsky
                   ` (17 subsequent siblings)
  26 siblings, 2 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

TODO: Do header files need a copyright?

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     | 162 +-------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.c |  57 ---------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 225 ++++++++++++++++++++++++++++++++++++
 3 files changed, 227 insertions(+), 217 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_gtt.h

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 084e82f..b19442c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -44,6 +44,8 @@
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
 
+#include "i915_gem_gtt.h"
+
 /* General customization:
  */
 
@@ -572,166 +574,6 @@ enum i915_cache_level {
 	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
 };
 
-typedef uint32_t gen6_gtt_pte_t;
-
-/**
- * A VMA represents a GEM BO that is bound into an address space. Therefore, a
- * VMA's presence cannot be guaranteed before binding, or after unbinding the
- * object into/from the address space.
- *
- * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
- * will always be <= an objects lifetime. So object refcounting should cover us.
- */
-struct i915_vma {
-	struct drm_mm_node node;
-	struct drm_i915_gem_object *obj;
-	struct i915_address_space *vm;
-
-	/** This object's place on the active/inactive lists */
-	struct list_head mm_list;
-
-	struct list_head vma_link; /* Link in the object's VMA list */
-
-	/** This vma's place in the batchbuffer or on the eviction list */
-	struct list_head exec_list;
-
-	/**
-	 * Used for performing relocations during execbuffer insertion.
-	 */
-	struct hlist_node exec_node;
-	unsigned long exec_handle;
-	struct drm_i915_gem_exec_object2 *exec_entry;
-
-	/**
-	 * How many users have pinned this object in GTT space. The following
-	 * users can each hold at most one reference: pwrite/pread, pin_ioctl
-	 * (via user_pin_count), execbuffer (objects are not allowed multiple
-	 * times for the same batchbuffer), and the framebuffer code. When
-	 * switching/pageflipping, the framebuffer code has at most two buffers
-	 * pinned per crtc.
-	 *
-	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
-	 * bits with absolutely no headroom. So use 4 bits. */
-	unsigned int pin_count:4;
-#define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
-
-	/** Unmap an object from an address space. This usually consists of
-	 * setting the valid PTE entries to a reserved scratch page. */
-	void (*unbind_vma)(struct i915_vma *vma);
-	/* Map an object into an address space with the given cache flags. */
-#define GLOBAL_BIND (1<<0)
-	void (*bind_vma)(struct i915_vma *vma,
-			 enum i915_cache_level cache_level,
-			 u32 flags);
-};
-
-struct i915_address_space {
-	struct drm_mm mm;
-	struct drm_device *dev;
-	struct list_head global_link;
-	unsigned long start;		/* Start offset always 0 for dri2 */
-	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
-
-	struct {
-		dma_addr_t addr;
-		struct page *page;
-	} scratch;
-
-	/**
-	 * List of objects currently involved in rendering.
-	 *
-	 * Includes buffers having the contents of their GPU caches
-	 * flushed, not necessarily primitives.  last_rendering_seqno
-	 * represents when the rendering involved will be completed.
-	 *
-	 * A reference is held on the buffer while on this list.
-	 */
-	struct list_head active_list;
-
-	/**
-	 * LRU list of objects which are not in the ringbuffer and
-	 * are ready to unbind, but are still in the GTT.
-	 *
-	 * last_rendering_seqno is 0 while an object is in this list.
-	 *
-	 * A reference is not held on the buffer while on this list,
-	 * as merely being GTT-bound shouldn't prevent its being
-	 * freed, and we'll pull it off the list in the free path.
-	 */
-	struct list_head inactive_list;
-
-	/* FIXME: Need a more generic return type */
-	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
-				     enum i915_cache_level level,
-				     bool valid); /* Create a valid PTE */
-	void (*clear_range)(struct i915_address_space *vm,
-			    uint64_t start,
-			    uint64_t length,
-			    bool use_scratch);
-	void (*insert_entries)(struct i915_address_space *vm,
-			       struct sg_table *st,
-			       uint64_t start,
-			       enum i915_cache_level cache_level);
-	void (*cleanup)(struct i915_address_space *vm);
-};
-
-/* The Graphics Translation Table is the way in which GEN hardware translates a
- * Graphics Virtual Address into a Physical Address. In addition to the normal
- * collateral associated with any va->pa translations GEN hardware also has a
- * portion of the GTT which can be mapped by the CPU and remain both coherent
- * and correct (in cases like swizzling). That region is referred to as GMADR in
- * the spec.
- */
-struct i915_gtt {
-	struct i915_address_space base;
-	size_t stolen_size;		/* Total size of stolen memory */
-
-	unsigned long mappable_end;	/* End offset that we can CPU map */
-	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
-	phys_addr_t mappable_base;	/* PA of our GMADR */
-
-	/** "Graphics Stolen Memory" holds the global PTEs */
-	void __iomem *gsm;
-
-	bool do_idle_maps;
-
-	int mtrr;
-
-	/* global gtt ops */
-	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
-			  size_t *stolen, phys_addr_t *mappable_base,
-			  unsigned long *mappable_end);
-};
-#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
-
-#define GEN8_LEGACY_PDPS 4
-struct i915_hw_ppgtt {
-	struct i915_address_space base;
-	struct kref ref;
-	struct drm_mm_node node;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
-	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
-	};
-	struct page *pd_pages;
-	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
-	};
-
-	int (*enable)(struct i915_hw_ppgtt *ppgtt);
-	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
-			 struct intel_ring_buffer *ring,
-			 bool synchronous);
-	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
-};
-
 struct i915_ctx_hang_stats {
 	/* This context had batch pending when hang was declared */
 	unsigned batch_pending;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5f73284..a239196 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -53,60 +53,6 @@ bool intel_enable_ppgtt(struct drm_device *dev, bool full)
 		return HAS_ALIASING_PPGTT(dev);
 }
 
-#define GEN6_PPGTT_PD_ENTRIES 512
-#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
-typedef uint64_t gen8_gtt_pte_t;
-typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
-
-/* PPGTT stuff */
-#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
-#define HSW_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0x7f0))
-
-#define GEN6_PDE_VALID			(1 << 0)
-/* gen6+ has bit 11-4 for physical addr bit 39-32 */
-#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
-
-#define GEN6_PTE_VALID			(1 << 0)
-#define GEN6_PTE_UNCACHED		(1 << 1)
-#define HSW_PTE_UNCACHED		(0)
-#define GEN6_PTE_CACHE_LLC		(2 << 1)
-#define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
-#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
-#define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
-
-/* Cacheability Control is a 4-bit value. The low three bits are stored in *
- * bits 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
- */
-#define HSW_CACHEABILITY_CONTROL(bits)	((((bits) & 0x7) << 1) | \
-					 (((bits) & 0x8) << (11 - 3)))
-#define HSW_WB_LLC_AGE3			HSW_CACHEABILITY_CONTROL(0x2)
-#define HSW_WB_LLC_AGE0			HSW_CACHEABILITY_CONTROL(0x3)
-#define HSW_WB_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0xb)
-#define HSW_WB_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x8)
-#define HSW_WT_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0x6)
-#define HSW_WT_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x7)
-
-#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
-#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
-
-/* GEN8 legacy style addressis defined as a 3 level page table:
- * 31:30 | 29:21 | 20:12 |  11:0
- * PDPE  |  PDE  |  PTE  | offset
- * The difference as compared to normal x86 3 level page table is the PDPEs are
- * programmed via register.
- */
-#define GEN8_PDPE_SHIFT			30
-#define GEN8_PDPE_MASK			0x3
-#define GEN8_PDE_SHIFT			21
-#define GEN8_PDE_MASK			0x1ff
-#define GEN8_PTE_SHIFT			12
-#define GEN8_PTE_MASK			0x1ff
-
-#define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
-#define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
-#define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
-#define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
-
 static void ppgtt_bind_vma(struct i915_vma *vma,
 			   enum i915_cache_level cache_level,
 			   u32 flags);
@@ -185,9 +131,6 @@ static gen6_gtt_pte_t ivb_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-#define BYT_PTE_WRITEABLE		(1 << 1)
-#define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
-
 static gen6_gtt_pte_t byt_pte_encode(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
new file mode 100644
index 0000000..c8d5c77
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -0,0 +1,225 @@
+#ifndef _I915_GEM_GTT_H
+#define _I915_GEM_GTT_H
+
+#define GEN6_PPGTT_PD_ENTRIES 512
+#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
+typedef uint32_t gen6_gtt_pte_t;
+typedef uint64_t gen8_gtt_pte_t;
+typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
+
+/* PPGTT stuff */
+#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
+#define HSW_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0x7f0))
+
+#define GEN6_PDE_VALID			(1 << 0)
+/* gen6+ has bit 11-4 for physical addr bit 39-32 */
+#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTE_VALID			(1 << 0)
+#define GEN6_PTE_UNCACHED		(1 << 1)
+#define HSW_PTE_UNCACHED		(0)
+#define GEN6_PTE_CACHE_LLC		(2 << 1)
+#define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
+#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+#define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
+
+#define BYT_PTE_WRITEABLE		(1 << 1)
+#define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
+
+/* Cacheability Control is a 4-bit value. The low three bits are stored in *
+ * bits 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
+ */
+#define HSW_CACHEABILITY_CONTROL(bits)	((((bits) & 0x7) << 1) | \
+					 (((bits) & 0x8) << (11 - 3)))
+#define HSW_WB_LLC_AGE3			HSW_CACHEABILITY_CONTROL(0x2)
+#define HSW_WB_LLC_AGE0			HSW_CACHEABILITY_CONTROL(0x3)
+#define HSW_WB_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0xb)
+#define HSW_WB_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x8)
+#define HSW_WT_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0x6)
+#define HSW_WT_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x7)
+
+#define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
+#define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
+#define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
+#define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
+
+#define GEN8_LEGACY_PDPS		4
+#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
+#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
+
+/* GEN8 legacy style addressis defined as a 3 level page table:
+ * 31:30 | 29:21 | 20:12 |  11:0
+ * PDPE  |  PDE  |  PTE  | offset
+ * The difference as compared to normal x86 3 level page table is the PDPEs are
+ * programmed via register.
+ *
+ * The x86 pagetable code is flexible in its ability to handle varying page
+ * table depths via abstracted PGDIR/PUD/PMD/PTE. I've opted to not do this and
+ * instead replicate the interesting functionality.
+ */
+#define GEN8_PDPE_SHIFT			30
+#define GEN8_PDPE_MASK			0x3
+#define GEN8_PDE_SHIFT			21
+#define GEN8_PDE_MASK			0x1ff
+#define GEN8_PTE_SHIFT			12
+#define GEN8_PTE_MASK			0x1ff
+
+enum i915_cache_level;
+/**
+ * A VMA represents a GEM BO that is bound into an address space. Therefore, a
+ * VMA's presence cannot be guaranteed before binding, or after unbinding the
+ * object into/from the address space.
+ *
+ * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
+ * will always be <= an objects lifetime. So object refcounting should cover us.
+ */
+struct i915_vma {
+	struct drm_mm_node node;
+	struct drm_i915_gem_object *obj;
+	struct i915_address_space *vm;
+
+	/** This object's place on the active/inactive lists */
+	struct list_head mm_list;
+
+	struct list_head vma_link; /* Link in the object's VMA list */
+
+	/** This vma's place in the batchbuffer or on the eviction list */
+	struct list_head exec_list;
+
+	/**
+	 * Used for performing relocations during execbuffer insertion.
+	 */
+	struct hlist_node exec_node;
+	unsigned long exec_handle;
+	struct drm_i915_gem_exec_object2 *exec_entry;
+
+	/**
+	 * How many users have pinned this object in GTT space. The following
+	 * users can each hold at most one reference: pwrite/pread, pin_ioctl
+	 * (via user_pin_count), execbuffer (objects are not allowed multiple
+	 * times for the same batchbuffer), and the framebuffer code. When
+	 * switching/pageflipping, the framebuffer code has at most two buffers
+	 * pinned per crtc.
+	 *
+	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
+	 * bits with absolutely no headroom. So use 4 bits. */
+	unsigned int pin_count:4;
+#define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
+
+	/** Unmap an object from an address space. This usually consists of
+	 * setting the valid PTE entries to a reserved scratch page. */
+	void (*unbind_vma)(struct i915_vma *vma);
+	/* Map an object into an address space with the given cache flags. */
+#define GLOBAL_BIND (1<<0)
+	void (*bind_vma)(struct i915_vma *vma,
+			 enum i915_cache_level cache_level,
+			 u32 flags);
+};
+
+struct i915_address_space {
+	struct drm_mm mm;
+	struct drm_device *dev;
+	struct list_head global_link;
+	unsigned long start;		/* Start offset always 0 for dri2 */
+	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
+
+	struct {
+		dma_addr_t addr;
+		struct page *page;
+	} scratch;
+
+	/**
+	 * List of objects currently involved in rendering.
+	 *
+	 * Includes buffers having the contents of their GPU caches
+	 * flushed, not necessarily primitives.  last_rendering_seqno
+	 * represents when the rendering involved will be completed.
+	 *
+	 * A reference is held on the buffer while on this list.
+	 */
+	struct list_head active_list;
+
+	/**
+	 * LRU list of objects which are not in the ringbuffer and
+	 * are ready to unbind, but are still in the GTT.
+	 *
+	 * last_rendering_seqno is 0 while an object is in this list.
+	 *
+	 * A reference is not held on the buffer while on this list,
+	 * as merely being GTT-bound shouldn't prevent its being
+	 * freed, and we'll pull it off the list in the free path.
+	 */
+	struct list_head inactive_list;
+
+	/* FIXME: Need a more generic return type */
+	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
+				     enum i915_cache_level level,
+				     bool valid); /* Create a valid PTE */
+	void (*clear_range)(struct i915_address_space *vm,
+			    uint64_t start,
+			    uint64_t length,
+			    bool use_scratch);
+	void (*insert_entries)(struct i915_address_space *vm,
+			       struct sg_table *st,
+			       uint64_t start,
+			       enum i915_cache_level cache_level);
+	void (*cleanup)(struct i915_address_space *vm);
+};
+
+/* The Graphics Translation Table is the way in which GEN hardware translates a
+ * Graphics Virtual Address into a Physical Address. In addition to the normal
+ * collateral associated with any va->pa translations GEN hardware also has a
+ * portion of the GTT which can be mapped by the CPU and remain both coherent
+ * and correct (in cases like swizzling). That region is referred to as GMADR in
+ * the spec.
+ */
+struct i915_gtt {
+	struct i915_address_space base;
+	size_t stolen_size;		/* Total size of stolen memory */
+
+	unsigned long mappable_end;	/* End offset that we can CPU map */
+	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
+	phys_addr_t mappable_base;	/* PA of our GMADR */
+
+	/** "Graphics Stolen Memory" holds the global PTEs */
+	void __iomem *gsm;
+
+	bool do_idle_maps;
+
+	int mtrr;
+
+	/* global gtt ops */
+	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
+			  size_t *stolen, phys_addr_t *mappable_base,
+			  unsigned long *mappable_end);
+};
+#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
+
+struct i915_hw_ppgtt {
+	struct i915_address_space base;
+	struct kref ref;
+	struct drm_mm_node node;
+	unsigned num_pd_entries;
+	unsigned num_pd_pages; /* gen8+ */
+	union {
+		struct page **pt_pages;
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
+	};
+	struct page *pd_pages;
+	union {
+		uint32_t pd_offset;
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
+	};
+	union {
+		dma_addr_t *pt_dma_addr;
+		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
+	};
+
+	int (*enable)(struct i915_hw_ppgtt *ppgtt);
+	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
+			 struct intel_ring_buffer *ring,
+			 bool synchronous);
+	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
+};
+
+#endif
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 10/26] drm/i915: Make gen6_write_pdes gen6_map_page_tables
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (8 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 09/26] drm/i915: Split out gtt specific header file Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  8:48   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 11/26] drm/i915: Range clearing is PPGTT agnostic Ben Widawsky
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Split out single mappings which will help with upcoming work. Also while
here, rename the function because it is a better description - but this
function is going away soon.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 39 ++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a239196..d89054d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -655,26 +655,33 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
+			    const unsigned pde_index,
+			    dma_addr_t daddr)
 {
 	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	gen6_gtt_pte_t __iomem *pd_addr;
 	uint32_t pd_entry;
+	gen6_gtt_pte_t __iomem *pd_addr =
+		(gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+
+	pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
+	pd_entry |= GEN6_PDE_VALID;
+
+	writel(pd_entry, pd_addr + pde_index);
+}
+
+/* Map all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	int i;
 
 	WARN_ON(ppgtt->pd_offset & 0x3f);
-	pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		dma_addr_t pt_addr;
-
-		pt_addr = ppgtt->pt_dma_addr[i];
-		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-		pd_entry |= GEN6_PDE_VALID;
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		gen6_map_single(ppgtt, i, ppgtt->pt_dma_addr[i]);
 
-		writel(pd_entry, pd_addr + i);
-	}
-	readl(pd_addr);
+	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1145,7 +1152,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	gen6_write_pdes(ppgtt);
+	gen6_map_page_tables(ppgtt);
 
 	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
@@ -1319,11 +1326,11 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 		/* TODO: Perhaps it shouldn't be gen6 specific */
 		if (i915_is_ggtt(vm)) {
 			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
+				gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
 			continue;
 		}
 
-		gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+		gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
 	}
 
 	i915_gem_chipset_flush(dev);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 11/26] drm/i915: Range clearing is PPGTT agnostic
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (9 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 10/26] drm/i915: Make gen6_write_pdes gen6_map_page_tables Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  8:50   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 12/26] drm/i915: Page table helpers, and define renames Ben Widawsky
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Therefore we can do it from our general init function. Eventually, I
hope to have a lot more commonality like this. It won't arrive yet, but
this was a nice easy one.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d89054d..77556ac 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -584,8 +584,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.start = 0;
 	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
 	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
@@ -1154,8 +1152,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
 	gen6_map_page_tables(ppgtt);
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
@@ -1183,6 +1179,7 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 
 	kref_init(&ppgtt->ref);
 	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
+	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	i915_init_vm(dev_priv, &ppgtt->base);
 
 	return 0;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 12/26] drm/i915: Page table helpers, and define renames
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (10 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 11/26] drm/i915: Range clearing is PPGTT agnostic Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  9:05   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 13/26] drm/i915: construct page table abstractions Ben Widawsky
                   ` (14 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

These page table helpers make the code much cleaner. There is some
room to use the arch/x86 header files. The reason I've opted not to is
in several cases, the definitions are dictated by the CONFIG_ options
which do not always indicate the restrictions in the GPU. While here,
clean up the defines to have more concise names, and consolidate between
gen6 and gen8 where appropriate.

I've made a lot of tiny errors in these helpers. Often I'd correct an
error only to introduce another one. While IGT was capable of catching
them, the tests often took a while to catch, and where hard/slow to
debug in the kernel. As a result, to test this, I compiled
i915_gem_gtt.h in userspace, and ran tests from userspace. What follows
isn't by any means complete, but it was able to catch lot of bugs. Gen8
is also untested, but since the current code is almost identical, I feel
pretty comfortable with that.

void test_pte(uint32_t base) {
        uint32_t ret;
        assert_pte_index((base + 0), 0);
        assert_pte_index((base + 1), 0);
        assert_pte_index((base + 0x1000), 1);
        assert_pte_index((base + (1<<22)), 0);
        assert_pte_index((base + ((1<<22) - 1)), 1023);
        assert_pte_index((base + (1<<21)), 512);

        assert_pte_count(base + 0, 0, 0);
        assert_pte_count(base + 0, 1, 1);
        assert_pte_count(base + 0, 0x1000, 1);
        assert_pte_count(base + 0, 0x1001, 2);
        assert_pte_count(base + 0, 1<<21, 512);

        assert_pte_count(base + 0, 1<<22, 1024);
        assert_pte_count(base + 0, (1<<22) - 1, 1024);
        assert_pte_count(base + (1<<21), 1<<22, 512);
        assert_pte_count(base + (1<<21), (1<<22)+1, 512);
        assert_pte_count(base + (1<<21), 10<<22, 512);
}

void test_pde(uint32_t base) {
        assert(gen6_pde_index(base + 0) == 0);
        assert(gen6_pde_index(base + 1) == 0);
        assert(gen6_pde_index(base + (1<<21)) == 0);
        assert(gen6_pde_index(base + (1<<22)) == 1);
        assert(gen6_pde_index(base + ((256<<22)))== 256);
        assert(gen6_pde_index(base + ((512<<22))) == 0);
        assert(gen6_pde_index(base + ((513<<22))) == 1); /* This is
actually not possible on gen6 */

        assert(gen6_pde_count(base + 0, 0) == 0);
        assert(gen6_pde_count(base + 0, 1) == 1);
        assert(gen6_pde_count(base + 0, 1<<21) == 1);
        assert(gen6_pde_count(base + 0, 1<<22) == 1);
        assert(gen6_pde_count(base + 0, (1<<22) + 0x1000) == 2);
        assert(gen6_pde_count(base + 0x1000, 1<<22) == 2);
        assert(gen6_pde_count(base + 0, 511<<22) == 511);
        assert(gen6_pde_count(base + 0, 512<<22) == 512);
        assert(gen6_pde_count(base + 0x1000, 512<<22) == 512);
        assert(gen6_pde_count(base + (1<<22), 512<<22) == 511);
}

int main()
{
        test_pde(0);
        while (1)
                test_pte(rand() & ~((1<<22) - 1));

        return 0;
}

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  90 +++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 125 ++++++++++++++++++++++++++++++++++--
 2 files changed, 162 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 77556ac..7afa5f4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -220,7 +220,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int i, ret;
 
 	/* bit of a hack to find the actual last used pd */
-	int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
+	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
 	for (i = used_pd - 1; i >= 0; i--) {
 		dma_addr_t addr = ppgtt->pd_dma_addr[i];
@@ -240,9 +240,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
-	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+	unsigned pdpe = gen8_pdpe_index(start);
+	unsigned pde = gen8_pde_index(start);
+	unsigned pte = gen8_pte_index(start);
 	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned last_pte, i;
 
@@ -253,8 +253,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
 
 		last_pte = pte + num_entries;
-		if (last_pte > GEN8_PTES_PER_PAGE)
-			last_pte = GEN8_PTES_PER_PAGE;
+		if (last_pte > GEN8_PTES_PER_PT)
+			last_pte = GEN8_PTES_PER_PT;
 
 		pt_vaddr = kmap_atomic(page_table);
 
@@ -266,7 +266,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 
 		pte = 0;
-		if (++pde == GEN8_PDES_PER_PAGE) {
+		if (++pde == I915_PDES_PER_PD) {
 			pdpe++;
 			pde = 0;
 		}
@@ -281,9 +281,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr;
-	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+	unsigned pdpe = gen8_pdpe_index(start);
+	unsigned pde = gen8_pde_index(start);
+	unsigned pte = gen8_pte_index(start);
 	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
@@ -298,10 +298,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
-		if (++pte == GEN8_PTES_PER_PAGE) {
+		if (++pte == GEN8_PTES_PER_PT) {
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
-			if (++pde == GEN8_PDES_PER_PAGE) {
+			if (++pde == I915_PDES_PER_PD) {
 				pdpe++;
 				pde = 0;
 			}
@@ -319,7 +319,7 @@ static void gen8_free_page_tables(struct page **pt_pages)
 	if (pt_pages == NULL)
 		return;
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
+	for (i = 0; i < I915_PDES_PER_PD; i++)
 		if (pt_pages[i])
 			__free_pages(pt_pages[i], 0);
 }
@@ -351,7 +351,7 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
@@ -377,11 +377,11 @@ static struct page **__gen8_alloc_page_tables(void)
 	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
+	pt_pages = kcalloc(I915_PDES_PER_PD, sizeof(struct page *), GFP_KERNEL);
 	if (!pt_pages)
 		return ERR_PTR(-ENOMEM);
 
-	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+	for (i = 0; i < I915_PDES_PER_PD; i++) {
 		pt_pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
 		if (!pt_pages[i])
 			goto bail;
@@ -431,7 +431,7 @@ static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
 						     sizeof(dma_addr_t),
 						     GFP_KERNEL);
 		if (!ppgtt->gen8_pt_dma_addr[i])
@@ -470,7 +470,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 		return ret;
 	}
 
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
 	if (ret)
@@ -531,7 +531,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	const int min_pt_pages = I915_PDES_PER_PD * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
@@ -550,7 +550,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		if (ret)
 			goto bail;
 
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
 			if (ret)
 				goto bail;
@@ -568,7 +568,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
-		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
@@ -582,7 +582,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PT * PAGE_SIZE;
 
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
@@ -628,9 +628,9 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
 		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
-		for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
+		for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
 			unsigned long va =
-				(pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
+				(pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
 				(pte * PAGE_SIZE);
 			int i;
 			bool found = false;
@@ -909,29 +909,28 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
-	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned pde = gen6_pde_index(start);
 	unsigned num_entries = length >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
-	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+	unsigned pte = gen6_pte_index(start);
 	unsigned last_pte, i;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
 	while (num_entries) {
-		last_pte = first_pte + num_entries;
-		if (last_pte > I915_PPGTT_PT_ENTRIES)
-			last_pte = I915_PPGTT_PT_ENTRIES;
+		last_pte = pte + num_entries;
+		if (last_pte > GEN6_PTES_PER_PT)
+			last_pte = GEN6_PTES_PER_PT;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
 
-		for (i = first_pte; i < last_pte; i++)
+		for (i = pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
 
 		kunmap_atomic(pt_vaddr);
 
-		num_entries -= last_pte - first_pte;
-		first_pte = 0;
-		act_pt++;
+		num_entries -= last_pte - pte;
+		pte = 0;
+		pde++;
 	}
 }
 
@@ -943,24 +942,23 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr;
-	unsigned first_entry = start >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
-	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+	unsigned pde = gen6_pde_index(start);
+	unsigned pte = gen6_pte_index(start);
 	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
 
-		pt_vaddr[act_pte] =
+		pt_vaddr[pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
 				       cache_level, true);
-		if (++act_pte == I915_PPGTT_PT_ENTRIES) {
+		if (++pte == GEN6_PTES_PER_PT) {
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
-			act_pt++;
-			act_pte = 0;
+			pde++;
+			pte = 0;
 		}
 	}
 	if (pt_vaddr)
@@ -1005,7 +1003,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 {
 #define GEN6_PD_ALIGN (PAGE_SIZE * 16)
-#define GEN6_PD_SIZE (GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
+#define GEN6_PD_SIZE (I915_PDES_PER_PD * PAGE_SIZE)
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	bool retried = false;
@@ -1039,7 +1037,7 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
+	ppgtt->num_pd_entries = I915_PDES_PER_PD;
 	return 0;
 }
 
@@ -1144,7 +1142,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	ppgtt->base.total =  ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c8d5c77..f813769 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -1,8 +1,11 @@
 #ifndef _I915_GEM_GTT_H
 #define _I915_GEM_GTT_H
 
-#define GEN6_PPGTT_PD_ENTRIES 512
-#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
+/* GEN Agnostic defines */
+#define I915_PDES_PER_PD		512
+#define I915_PTE_MASK			(PAGE_SHIFT-1)
+#define I915_PDE_MASK			(I915_PDES_PER_PD-1)
+
 typedef uint32_t gen6_gtt_pte_t;
 typedef uint64_t gen8_gtt_pte_t;
 typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
@@ -23,6 +26,98 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
 #define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
 
+
+/* GEN6 PPGTT resembles a 2 level page table:
+ * 31:22 | 21:12 |  11:0
+ *  PDE  |  PTE  | offset
+ */
+#define GEN6_PDE_SHIFT			22
+#define GEN6_PTES_PER_PT		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
+
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+	const uint32_t mask = (1 << (pde_shift - PAGE_SHIFT)) - 1;
+	return (address >> PAGE_SHIFT) & mask;
+}
+
+/* Helper to counts the number of PTEs within the given length. This count does
+ * not cross a page table boundary, so the max value would be
+ * GEN6_PTES_PER_PT for GEN6, and GEN8_PTES_PER_PT for GEN8.
+ */
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+				    uint32_t pde_shift)
+{
+	const uint64_t pd_mask = ~((1 << pde_shift) - 1);
+	uint64_t end;
+
+	if (WARN_ON(!length))
+		return 0;
+
+	if (WARN_ON(addr % PAGE_SIZE))
+		addr = round_down(addr, PAGE_SIZE);
+
+	if (WARN_ON(length % PAGE_SIZE))
+		length = round_up(length, PAGE_SIZE);
+
+	end = addr + length;
+
+	if ((addr & pd_mask) != (end & pd_mask))
+		return (1 << (pde_shift - PAGE_SHIFT)) -
+			i915_pte_index(addr, pde_shift);
+
+	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
+
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+	return (addr >> shift) & I915_PDE_MASK;
+}
+
+static inline size_t i915_pde_count(uint64_t addr, uint64_t length,
+				    uint32_t pde_shift)
+{
+	const uint32_t pdp_shift = pde_shift + 9;
+	uint32_t start, end;
+
+	if (WARN_ON(!length))
+		return 0;
+
+	if (WARN_ON(addr % PAGE_SIZE))
+		addr = round_down(addr, PAGE_SIZE);
+
+	if (WARN_ON(length % PAGE_SIZE))
+		length = round_up(length, PAGE_SIZE);
+
+	start = i915_pde_index(addr, pde_shift);
+	end = round_up(addr + length, (1 << pde_shift)) >> pde_shift;
+
+	if (addr >> pdp_shift != (addr + length) >> pdp_shift)
+		end = round_down(end, I915_PDES_PER_PD);
+
+	BUG_ON(start > end);
+	return end - start;
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+	return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+	return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+	return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
+{
+	return i915_pde_count(addr, length, GEN6_PDE_SHIFT);
+}
+
 #define BYT_PTE_WRITEABLE		(1 << 1)
 #define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
 
@@ -44,8 +139,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
 
 #define GEN8_LEGACY_PDPS		4
-#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
-#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
+#define GEN8_PTES_PER_PT		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 
 /* GEN8 legacy style addressis defined as a 3 level page table:
  * 31:30 | 29:21 | 20:12 |  11:0
@@ -60,9 +154,26 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PDPE_SHIFT			30
 #define GEN8_PDPE_MASK			0x3
 #define GEN8_PDE_SHIFT			21
-#define GEN8_PDE_MASK			0x1ff
-#define GEN8_PTE_SHIFT			12
-#define GEN8_PTE_MASK			0x1ff
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+	return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+	return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+	return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+	BUG();
+}
 
 enum i915_cache_level;
 /**
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 13/26] drm/i915: construct page table abstractions
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (11 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 12/26] drm/i915: Page table helpers, and define renames Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  5:48 ` [PATCH 14/26] drm/i915: Complete page table structures Ben Widawsky
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Thus far we've opted to make complex code requiring difficult review. In
the future, the code is only going to become more complex, and as such
we'll take the hit now and start to encapsulate things.

To help transition the code nicely there is some wasted space in gen6/7.
This will be ameliorated shortly.

NOTE: The pun in the subject was intentional.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 175 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  24 +++--
 2 files changed, 104 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7afa5f4..5b283f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -250,7 +250,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+		struct page *page_table = pd->page_tables[pde].page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PT)
@@ -292,8 +293,11 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
 			break;
 
-		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+		if (pt_vaddr == NULL) {
+			struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+			struct page *page_table = pd->page_tables[pde].page;
+			pt_vaddr = kmap_atomic(page_table);
+		}
 
 		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -312,29 +316,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
 	int i;
 
-	if (pt_pages == NULL)
+	if (pd->page_tables == NULL)
 		return;
 
 	for (i = 0; i < I915_PDES_PER_PD; i++)
-		if (pt_pages[i])
-			__free_pages(pt_pages[i], 0);
+		if (pd->page_tables[i].page)
+			__free_page(pd->page_tables[i].page);
+}
+
+static void gen8_free_page_directories(struct i915_pagedir *pd)
+{
+	kfree(pd->page_tables);
+	__free_page(pd->page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-		kfree(ppgtt->gen8_pt_pages[i]);
+		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
-
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -372,87 +380,73 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages;
 	int i;
 
-	pt_pages = kcalloc(I915_PDES_PER_PD, sizeof(struct page *), GFP_KERNEL);
-	if (!pt_pages)
-		return ERR_PTR(-ENOMEM);
-
-	for (i = 0; i < I915_PDES_PER_PD; i++) {
-		pt_pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!pt_pages[i])
-			goto bail;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
 	}
 
-	return pt_pages;
-
-bail:
-	gen8_free_page_tables(pt_pages);
-	kfree(pt_pages);
-	return ERR_PTR(-ENOMEM);
+	return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-					   const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	struct page **pt_pages[GEN8_LEGACY_PDPS];
-	int i, ret;
+	int i, j;
 
-	for (i = 0; i < max_pdp; i++) {
-		pt_pages[i] = __gen8_alloc_page_tables();
-		if (IS_ERR(pt_pages[i])) {
-			ret = PTR_ERR(pt_pages[i]);
-			goto unwind_out;
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		for (j = 0; j < I915_PDES_PER_PD; j++) {
+			struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!pt->page)
+				goto unwind_out;
 		}
 	}
 
-	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
-	 * "atomic" - for cleanup purposes.
-	 */
-	for (i = 0; i < max_pdp; i++)
-		ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
 	return 0;
 
 unwind_out:
-	while (i--) {
-		gen8_free_page_tables(pt_pages[i]);
-		kfree(pt_pages[i]);
-	}
+	while (i--)
+		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
 
-	return ret;
+	return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
+	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagetab *pt;
+		pt = kcalloc(I915_PDES_PER_PD, sizeof(*pt), GFP_KERNEL);
+		if (!pt)
+			goto unwind_out;
 
-	return 0;
-}
+		ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!ppgtt->pdp.pagedir[i].page)
+			goto unwind_out;
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
-{
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL | __GFP_ZERO,
-				      get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
+		ppgtt->pdp.pagedir[i].page_tables = pt;
+	}
 
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		kfree(ppgtt->pdp.pagedir[i].page_tables);
+		__free_page(ppgtt->pdp.pagedir[i].page);
+	}
+
+	return -ENOMEM;
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -464,18 +458,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
-	if (ret) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return ret;
-	}
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+	if (ret)
+		goto err_out;
 
 	ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
 
 	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (ret)
-		gen8_ppgtt_free(ppgtt);
+	if (!ret)
+		return ret;
 
+	/* TODO: Check this for all cases */
+err_out:
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
@@ -486,7 +481,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       &ppgtt->pd_pages[pdpe], 0,
+			       ppgtt->pdp.pagedir[pdpe].page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -506,7 +501,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = ppgtt->gen8_pt_pages[pdpe][pde];
+	p = ppgtt->pdp.pagedir[pdpe].page_tables[pde].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -567,7 +562,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -627,7 +622,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 		for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
@@ -921,7 +916,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > GEN6_PTES_PER_PT)
 			last_pte = GEN6_PTES_PER_PT;
 
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 
 		for (i = pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -949,7 +944,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
 
 		pt_vaddr[pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -983,8 +978,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pt_pages[i]);
-	kfree(ppgtt->pt_pages);
+		__free_page(ppgtt->pd.page_tables[i].page);
+	kfree(ppgtt->pd.page_tables);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1043,22 +1038,22 @@ alloc:
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
+	struct i915_pagetab *pt;
 	int i;
 
-	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
-				  GFP_KERNEL);
-
-	if (!ppgtt->pt_pages)
+	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+	if (!pt)
 		return -ENOMEM;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!ppgtt->pt_pages[i]) {
+		pt[i].page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!pt->page) {
 			gen6_ppgtt_free(ppgtt);
 			return -ENOMEM;
 		}
 	}
 
+	ppgtt->pd.page_tables = pt;
 	return 0;
 }
 
@@ -1093,9 +1088,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct page *page;
 		dma_addr_t pt_addr;
 
-		pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+		page = ppgtt->pd.page_tables[i].page;
+		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1142,7 +1139,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total =  ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f813769..2c7b378 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -306,6 +306,20 @@ struct i915_gtt {
 };
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
+struct i915_pagetab {
+	struct page *page;
+};
+
+struct i915_pagedir {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	struct i915_pagetab *page_tables;
+};
+
+struct i915_pagedirpo {
+	/* struct page *page; */
+	struct i915_pagedir pagedir[GEN8_LEGACY_PDPS];
+};
+
 struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
@@ -313,11 +327,6 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		struct page **pt_pages;
-		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
-	};
-	struct page *pd_pages;
-	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
 	};
@@ -325,7 +334,10 @@ struct i915_hw_ppgtt {
 		dma_addr_t *pt_dma_addr;
 		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
 	};
-
+	union {
+		struct i915_pagedirpo pdp;
+		struct i915_pagedir pd;
+	};
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
 	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
 			 struct intel_ring_buffer *ring,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 14/26] drm/i915: Complete page table structures
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (12 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 13/26] drm/i915: construct page table abstractions Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  9:09   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 15/26] drm/i915: Create page table allocators Ben Widawsky
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desire. The reasoning
is the same as that patch. I simply felt it is easier to review if split.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Conflicts:
	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem_gtt.c
---
 drivers/gpu/drm/i915/i915_debugfs.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c   | 85 +++++++++++++----------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h   | 15 +++----
 drivers/gpu/drm/i915/i915_gpu_error.c |  1 -
 4 files changed, 38 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b226788..5f3666a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1788,7 +1788,7 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verb
 static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
 {
 	seq_printf(m, "%s:\n", name);
-	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 }
 
 static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5b283f2..d91a545 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -223,7 +223,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pd_dma_addr[i];
+		dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
 		ret = gen8_write_pdp(ring, i, addr, synchronous);
 		if (ret)
 			return ret;
@@ -341,7 +341,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
 		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
 }
 
@@ -353,14 +352,14 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pd_dma_addr[i])
+		if (!ppgtt->pdp.pagedir[i].daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -380,31 +379,18 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			return -ENOMEM;
-	}
-
-	return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+			struct i915_pagetab *pt = &pd->page_tables[j];
 			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 			if (!pt->page)
 				goto unwind_out;
+
 		}
 	}
 
@@ -464,9 +450,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
 	ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
 
-	ret = gen8_ppgtt_allocate_dma(ppgtt);
-	if (!ret)
-		return ret;
+	return 0;
 
 	/* TODO: Check this for all cases */
 err_out:
@@ -488,7 +472,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		return ret;
 
-	ppgtt->pd_dma_addr[pdpe] = pd_addr;
+	ppgtt->pdp.pagedir[pdpe].daddr = pd_addr;
 
 	return 0;
 }
@@ -498,17 +482,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pde)
 {
 	dma_addr_t pt_addr;
-	struct page *p;
+	struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+	struct i915_pagetab *pt = &pd->page_tables[pde];
+	struct page *p = pt->page;
 	int ret;
 
-	p = ppgtt->pdp.pagedir[pdpe].page_tables[pde].page;
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->gen8_pt_dma_addr[pdpe][pde] = pt_addr;
+	pt->daddr = pt_addr;
 
 	return 0;
 }
@@ -564,7 +549,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -604,14 +589,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
 	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-		   ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+		   ppgtt->pd.pd_offset,
+		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -654,8 +640,8 @@ static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
 {
 	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	uint32_t pd_entry;
-	gen6_gtt_pte_t __iomem *pd_addr =
-		(gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+	gen6_gtt_pte_t __iomem *pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm;
+	pd_addr	+= ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
 	pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
 	pd_entry |= GEN6_PDE_VALID;
@@ -670,18 +656,18 @@ static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
 	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	int i;
 
-	WARN_ON(ppgtt->pd_offset & 0x3f);
+	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		gen6_map_single(ppgtt, i, ppgtt->pt_dma_addr[i]);
+		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i].daddr);
 
 	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
-	BUG_ON(ppgtt->pd_offset & 0x3f);
+	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
 
-	return (ppgtt->pd_offset / 64) << 16;
+	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
 static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -964,19 +950,16 @@ static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	if (ppgtt->pt_dma_addr) {
-		for (i = 0; i < ppgtt->num_pd_entries; i++)
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd.page_tables[i].daddr,
+			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		__free_page(ppgtt->pd.page_tables[i].page);
 	kfree(ppgtt->pd.page_tables);
@@ -1071,14 +1054,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 		return ret;
 	}
 
-	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
-				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr) {
-		drm_mm_remove_node(&ppgtt->node);
-		gen6_ppgtt_free(ppgtt);
-		return -ENOMEM;
-	}
-
 	return 0;
 }
 
@@ -1100,7 +1075,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pt_dma_addr[i] = pt_addr;
+		ppgtt->pd.page_tables[i].daddr = pt_addr;
 	}
 
 	return 0;
@@ -1142,7 +1117,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->pd_offset =
+	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
 	gen6_map_page_tables(ppgtt);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 2c7b378..d30f6de 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -308,10 +308,16 @@ struct i915_gtt {
 
 struct i915_pagetab {
 	struct page *page;
+	dma_addr_t daddr;
 };
 
 struct i915_pagedir {
 	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
 	struct i915_pagetab *page_tables;
 };
 
@@ -327,17 +333,10 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	unsigned num_pd_pages; /* gen8+ */
 	union {
-		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
-	};
-	union {
-		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
-	};
-	union {
 		struct i915_pagedirpo pdp;
 		struct i915_pagedir pd;
 	};
+
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
 	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
 			 struct intel_ring_buffer *ring,
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 63266ae..d7ac688 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -764,7 +764,6 @@ static void i915_gem_record_fences(struct drm_device *dev,
 	}
 }
 
-
 static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
 					struct drm_i915_error_state *error,
 					struct intel_ring_buffer *ring,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 15/26] drm/i915: Create page table allocators
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (13 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 14/26] drm/i915: Complete page table structures Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  9:14   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 16/26] drm/i915: Generalize GEN6 mapping Ben Widawsky
                   ` (11 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if break do things less coarsely by
breaking up all of our actions into individual tasks.  This makes the
code easier to write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms,

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain non-trivial complexity as a
result.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 226 +++++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 2 files changed, 147 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d91a545..5c08cf9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -183,6 +183,102 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+static void free_pt_single(struct i915_pagetab *pt)
+{
+	if (WARN_ON(!pt->page))
+		return;
+	__free_page(pt->page);
+	kfree(pt);
+}
+
+static struct i915_pagetab *alloc_pt_single(void)
+{
+	struct i915_pagetab *pt;
+
+	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+	if (!pt)
+		return ERR_PTR(-ENOMEM);
+
+	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pt->page) {
+		kfree(pt);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate a multiple page tables
+ * @pd:		The page directory which will have at least @count entries
+ *		available to point to the allocated page tables.
+ * @pde:	First page directory entry for which we are allocating.
+ * @count:	Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+{
+	int i, ret;
+
+	/* 512 is the max page tables per pagedir on any platform.
+	 * TODO: make WARN after patch series is done
+	 */
+	BUG_ON(pde + count > I915_PDES_PER_PD);
+
+	for (i = pde; i < pde + count; i++) {
+		struct i915_pagetab *pt = alloc_pt_single();
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto err_out;
+		}
+		WARN(pd->page_tables[i],
+		     "Leaking page directory entry %d (%pa)\n",
+		     i, pd->page_tables[i]);
+		pd->page_tables[i] = pt;
+	}
+
+	return 0;
+
+err_out:
+	while (i--)
+		free_pt_single(pd->page_tables[i]);
+	return ret;
+}
+
+static void __free_pd_single(struct i915_pagedir *pd)
+{
+	__free_page(pd->page);
+	kfree(pd);
+}
+
+#define free_pd_single(pd) do { \
+	if ((pd)->page) { \
+		__free_pd_single(pd); \
+	} \
+} while (0)
+
+static struct i915_pagedir *alloc_pd_single(void)
+{
+	struct i915_pagedir *pd;
+
+	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!pd->page) {
+		kfree(pd);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_ring_buffer *ring, unsigned entry,
 			   uint64_t val, bool synchronous)
@@ -223,7 +319,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
 	for (i = used_pd - 1; i >= 0; i--) {
-		dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
+		dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
 		ret = gen8_write_pdp(ring, i, addr, synchronous);
 		if (ret)
 			return ret;
@@ -250,8 +346,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-		struct page *page_table = pd->page_tables[pde].page;
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+		struct i915_pagetab *pt = pd->page_tables[pde];
+		struct page *page_table = pt->page;
 
 		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PT)
@@ -294,8 +391,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 			break;
 
 		if (pt_vaddr == NULL) {
-			struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-			struct page *page_table = pd->page_tables[pde].page;
+			struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+			struct i915_pagetab *pt = pd->page_tables[pde];
+			struct page *page_table = pt->page;
 			pt_vaddr = kmap_atomic(page_table);
 		}
 
@@ -320,18 +418,13 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
 	int i;
 
-	if (pd->page_tables == NULL)
+	if (!pd->page)
 		return;
 
-	for (i = 0; i < I915_PDES_PER_PD; i++)
-		if (pd->page_tables[i].page)
-			__free_page(pd->page_tables[i].page);
-}
-
-static void gen8_free_page_directories(struct i915_pagedir *pd)
-{
-	kfree(pd->page_tables);
-	__free_page(pd->page);
+	for (i = 0; i < I915_PDES_PER_PD; i++) {
+		free_pt_single(pd->page_tables[i]);
+		pd->page_tables[i] = NULL;
+	}
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -339,8 +432,8 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
-		gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		free_pd_single(ppgtt->pdp.pagedir[i]);
 	}
 }
 
@@ -352,14 +445,16 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		/* TODO: In the future we'll support sparse mappings, so this
 		 * will have to change. */
-		if (!ppgtt->pdp.pagedir[i].daddr)
+		if (!ppgtt->pdp.pagedir[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
+		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+			struct i915_pagetab *pt =  pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			if (addr)
 				pci_unmap_page(hwdev, addr, PAGE_SIZE,
 					       PCI_DMA_BIDIRECTIONAL);
@@ -381,24 +476,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-	int i, j;
+	int i, ret;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
-		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			struct i915_pagetab *pt = &pd->page_tables[j];
-			pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-			if (!pt->page)
-				goto unwind_out;
-
-		}
+		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
+				     0, I915_PDES_PER_PD);
+		if (ret)
+			goto unwind_out;
 	}
 
 	return 0;
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
 
 	return -ENOMEM;
 }
@@ -409,16 +500,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagetab *pt;
-		pt = kcalloc(I915_PDES_PER_PD, sizeof(*pt), GFP_KERNEL);
-		if (!pt)
+		ppgtt->pdp.pagedir[i] = alloc_pd_single();
+		if (IS_ERR(ppgtt->pdp.pagedir[i]))
 			goto unwind_out;
-
-		ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!ppgtt->pdp.pagedir[i].page)
-			goto unwind_out;
-
-		ppgtt->pdp.pagedir[i].page_tables = pt;
 	}
 
 	ppgtt->num_pd_pages = max_pdp;
@@ -427,10 +511,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 
 unwind_out:
-	while (i--) {
-		kfree(ppgtt->pdp.pagedir[i].page_tables);
-		__free_page(ppgtt->pdp.pagedir[i].page);
-	}
+	while (i--)
+		free_pd_single(ppgtt->pdp.pagedir[i]);
 
 	return -ENOMEM;
 }
@@ -465,14 +547,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int ret;
 
 	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.pagedir[pdpe].page, 0,
+			       ppgtt->pdp.pagedir[pdpe]->page, 0,
 			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.pagedir[pdpe].daddr = pd_addr;
+	ppgtt->pdp.pagedir[pdpe]->daddr = pd_addr;
 
 	return 0;
 }
@@ -482,8 +564,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 					const int pde)
 {
 	dma_addr_t pt_addr;
-	struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
-	struct i915_pagetab *pt = &pd->page_tables[pde];
+	struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+	struct i915_pagetab *pt = pd->page_tables[pde];
 	struct page *p = pt->page;
 	int ret;
 
@@ -546,10 +628,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * will never need to touch the PDEs again.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
 		gen8_ppgtt_pde_t *pd_vaddr;
-		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
+		pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+			struct i915_pagetab *pt = pd->page_tables[j];
+			dma_addr_t addr = pt->daddr;
 			pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
 						      I915_CACHE_LLC);
 		}
@@ -597,7 +681,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
-		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
+		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
 		pd_entry = readl(pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -608,7 +692,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 		for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
 			unsigned long va =
 				(pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
@@ -658,7 +742,7 @@ static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i].daddr);
+		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i]->daddr);
 
 	readl(dev_priv->gtt.gsm);
 }
@@ -902,7 +986,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 		if (last_pte > GEN6_PTES_PER_PT)
 			last_pte = GEN6_PTES_PER_PT;
 
-		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+		pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 
 		for (i = pte; i < last_pte; i++)
 			pt_vaddr[i] = scratch_pte;
@@ -930,7 +1014,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	pt_vaddr = NULL;
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+			pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
 
 		pt_vaddr[pte] =
 			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -952,7 +1036,7 @@ static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i].daddr,
+			       ppgtt->pd.page_tables[i]->daddr,
 			       4096, PCI_DMA_BIDIRECTIONAL);
 }
 
@@ -961,8 +1045,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		__free_page(ppgtt->pd.page_tables[i].page);
-	kfree(ppgtt->pd.page_tables);
+		free_pt_single(ppgtt->pd.page_tables[i]);
+
+	free_pd_single(&ppgtt->pd);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1019,27 +1104,6 @@ alloc:
 	return 0;
 }
 
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct i915_pagetab *pt;
-	int i;
-
-	pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
-	if (!pt)
-		return -ENOMEM;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt[i].page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!pt->page) {
-			gen6_ppgtt_free(ppgtt);
-			return -ENOMEM;
-		}
-	}
-
-	ppgtt->pd.page_tables = pt;
-	return 0;
-}
-
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 {
 	int ret;
@@ -1048,7 +1112,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1066,7 +1130,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 		struct page *page;
 		dma_addr_t pt_addr;
 
-		page = ppgtt->pd.page_tables[i].page;
+		page = ppgtt->pd.page_tables[i]->page;
 		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
 				       PCI_DMA_BIDIRECTIONAL);
 
@@ -1075,7 +1139,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
 			return -EIO;
 		}
 
-		ppgtt->pd.page_tables[i].daddr = pt_addr;
+		ppgtt->pd.page_tables[i]->daddr = pt_addr;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d30f6de..9cec163 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -318,12 +318,12 @@ struct i915_pagedir {
 		dma_addr_t daddr;
 	};
 
-	struct i915_pagetab *page_tables;
+	struct i915_pagetab *page_tables[I915_PDES_PER_PD]; /* PDEs */
 };
 
 struct i915_pagedirpo {
 	/* struct page *page; */
-	struct i915_pagedir pagedir[GEN8_LEGACY_PDPS];
+	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPS];
 };
 
 struct i915_hw_ppgtt {
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 16/26] drm/i915: Generalize GEN6 mapping
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (14 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 15/26] drm/i915: Create page table allocators Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  9:22   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 17/26] drm/i915: Clean up pagetable DMA map & unmap Ben Widawsky
                   ` (10 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Having a more general way of doing mappings will allow the ability to
easy map and unmap a specific page table. Specifically in this case, we
pass down the page directory + entry, and the page table to map. This
works similarly to the x86 code.

The same work will need to happen for GEN8. At that point I will try to
combine functionality.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 61 +++++++++++++++++++------------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  2 ++
 2 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5c08cf9..35acccb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -663,18 +663,13 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
-	gen6_gtt_pte_t __iomem *pd_addr;
 	gen6_gtt_pte_t scratch_pte;
 	uint32_t pd_entry;
 	int pte, pde;
 
 	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
-	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
 	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
 		   ppgtt->pd.pd_offset,
 		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
@@ -682,7 +677,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 		u32 expected;
 		gen6_gtt_pte_t *pt_vaddr;
 		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
-		pd_entry = readl(pd_addr + pde);
+		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
 		if (pd_entry != expected)
@@ -718,39 +713,43 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 	}
 }
 
-static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
-			    const unsigned pde_index,
-			    dma_addr_t daddr)
+/* Map pde (index) from the page directory @pd to the page table @pt */
+static void gen6_map_single(struct i915_pagedir *pd,
+			    const int pde, struct i915_pagetab *pt)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	uint32_t pd_entry;
-	gen6_gtt_pte_t __iomem *pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm;
-	pd_addr	+= ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(pd, struct i915_hw_ppgtt, pd);
+	u32 pd_entry;
 
-	pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
+	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
 	pd_entry |= GEN6_PDE_VALID;
 
-	writel(pd_entry, pd_addr + pde_index);
+	writel(pd_entry, ppgtt->pd_addr + pde);
+
+	/* XXX: Caller needs to make sure the write completes if necessary */
 }
 
 /* Map all the page tables found in the ppgtt structure to incrementing page
  * directories. */
-static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
+static void gen6_map_page_range(struct drm_i915_private *dev_priv,
+				struct i915_pagedir *pd, unsigned pde, size_t n)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-	int i;
+	if (WARN_ON(pde + n > I915_PDES_PER_PD))
+		n = I915_PDES_PER_PD - pde;
 
-	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i]->daddr);
+	n += pde;
+
+	for (; pde < n; pde++)
+		gen6_map_single(pd, pde, pd->page_tables[pde]);
 
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
 	readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
 	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
-
 	return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
@@ -1184,7 +1183,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd.pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	gen6_map_page_tables(ppgtt);
+	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
+		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
+	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
@@ -1355,13 +1357,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 
 	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
 		/* TODO: Perhaps it shouldn't be gen6 specific */
-		if (i915_is_ggtt(vm)) {
-			if (dev_priv->mm.aliasing_ppgtt)
-				gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
-			continue;
-		}
 
-		gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
+		struct i915_hw_ppgtt *ppgtt =
+			container_of(vm, struct i915_hw_ppgtt, base);
+
+		if (i915_is_ggtt(vm))
+			ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+		gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 	}
 
 	i915_gem_chipset_flush(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9cec163..fa9249f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -337,6 +337,8 @@ struct i915_hw_ppgtt {
 		struct i915_pagedir pd;
 	};
 
+	gen6_gtt_pte_t __iomem *pd_addr;
+
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
 	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
 			 struct intel_ring_buffer *ring,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 17/26] drm/i915: Clean up pagetable DMA map & unmap
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (15 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 16/26] drm/i915: Generalize GEN6 mapping Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  9:24   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 18/26] drm/i915: Always dma map page table allocations Ben Widawsky
                   ` (9 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Map and unmap are common operations across all generations for
pagetables. With a simple helper, we can get a nice net code reduction
as well as simplified complexity.

There is some room for optimization here, for instance with the multiple
page mapping, that can be done in one pci_map operation. In that case
however, the max value we'll ever see there is 512, and so I believe the
simpler code makes this a worthwhile trade-off. Also, the range mapping
functions are place holders to help transition the code. Eventually,
mapping will only occur during a page allocation which will always be a
discrete operation.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 147 +++++++++++++++++++++---------------
 1 file changed, 85 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 35acccb..92e03dd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -183,6 +183,76 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
+#define dma_unmap_pt_single(pt, dev) do { \
+	pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+} while (0);
+
+
+static void dma_unmap_pt_range(struct i915_pagedir *pd,
+			       unsigned pde, size_t n,
+			       struct drm_device *dev)
+{
+	if (WARN_ON(pde + n > I915_PDES_PER_PD))
+		n = I915_PDES_PER_PD - pde;
+
+	n += pde;
+
+	for (; pde < n; pde++)
+		dma_unmap_pt_single(pd->page_tables[pde], dev);
+}
+
+/**
+ * dma_map_pt_single() - Create a dma mapping for a page table
+ * @pt:		Page table to get a DMA map for
+ * @dev:	drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping.
+ *
+ * Return: 0 if success.
+ */
+static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
+{
+	struct page *page;
+	dma_addr_t pt_addr;
+	int ret;
+
+	page = pt->page;
+	pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
+			       PCI_DMA_BIDIRECTIONAL);
+
+	ret = pci_dma_mapping_error(dev->pdev, pt_addr);
+	if (ret)
+		return ret;
+
+	pt->daddr = pt_addr;
+
+	return 0;
+}
+
+static int dma_map_pt_range(struct i915_pagedir *pd,
+			    unsigned pde, size_t n,
+			    struct drm_device *dev)
+{
+	const int first = pde;
+
+	if (WARN_ON(pde + n > I915_PDES_PER_PD))
+		n = I915_PDES_PER_PD - pde;
+
+	n += pde;
+
+	for (; pde < n; pde++) {
+		int ret;
+		ret = dma_map_pt_single(pd->page_tables[pde], dev);
+		if (ret) {
+			dma_unmap_pt_range(pd, first, pde, dev);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 static void free_pt_single(struct i915_pagetab *pt)
 {
 	if (WARN_ON(!pt->page))
@@ -191,7 +261,7 @@ static void free_pt_single(struct i915_pagetab *pt)
 	kfree(pt);
 }
 
-static struct i915_pagetab *alloc_pt_single(void)
+static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_pagetab *pt;
 
@@ -214,6 +284,7 @@ static struct i915_pagetab *alloc_pt_single(void)
  *		available to point to the allocated page tables.
  * @pde:	First page directory entry for which we are allocating.
  * @count:	Number of pages to allocate.
+ * @dev		DRM device used for DMA mapping.
  *
  * Allocates multiple page table pages and sets the appropriate entries in the
  * page table structure within the page directory. Function cleans up after
@@ -221,7 +292,8 @@ static struct i915_pagetab *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
+			  struct drm_device *dev)
 {
 	int i, ret;
 
@@ -231,7 +303,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
 	BUG_ON(pde + count > I915_PDES_PER_PD);
 
 	for (i = pde; i < pde + count; i++) {
-		struct i915_pagetab *pt = alloc_pt_single();
+		struct i915_pagetab *pt = alloc_pt_single(dev);
 		if (IS_ERR(pt)) {
 			ret = PTR_ERR(pt);
 			goto err_out;
@@ -480,7 +552,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
-				     0, I915_PDES_PER_PD);
+				     0, I915_PDES_PER_PD, ppgtt->base.dev);
 		if (ret)
 			goto unwind_out;
 	}
@@ -559,27 +631,6 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-					const int pdpe,
-					const int pde)
-{
-	dma_addr_t pt_addr;
-	struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
-	struct i915_pagetab *pt = pd->page_tables[pde];
-	struct page *p = pt->page;
-	int ret;
-
-	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	pt->daddr = pt_addr;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -608,12 +659,15 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * 2. Create DMA mappings for the page directories and page tables.
 	 */
 	for (i = 0; i < max_pdp; i++) {
+		struct i915_pagedir *pd;
 		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
 		if (ret)
 			goto bail;
 
+		pd = ppgtt->pdp.pagedir[i];
+
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
+			ret = dma_map_pt_single(pd->page_tables[j], ppgtt->base.dev);
 			if (ret)
 				goto bail;
 		}
@@ -1029,16 +1083,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd.page_tables[i]->daddr,
-			       4096, PCI_DMA_BIDIRECTIONAL);
-}
-
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
@@ -1058,7 +1102,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	drm_mm_takedown(&ppgtt->base.mm);
 	drm_mm_remove_node(&ppgtt->node);
 
-	gen6_ppgtt_dma_unmap_pages(ppgtt);
+	dma_unmap_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries, vm->dev);
 	gen6_ppgtt_free(ppgtt);
 }
 
@@ -1111,7 +1155,8 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			     ppgtt->base.dev);
 	if (ret) {
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
@@ -1120,29 +1165,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i;
-
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		struct page *page;
-		dma_addr_t pt_addr;
-
-		page = ppgtt->pd.page_tables[i]->page;
-		pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-				       PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			gen6_ppgtt_dma_unmap_pages(ppgtt);
-			return -EIO;
-		}
-
-		ppgtt->pd.page_tables[i]->daddr = pt_addr;
-	}
-
-	return 0;
-}
 
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
@@ -1167,7 +1189,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = gen6_ppgtt_setup_page_tables(ppgtt);
+	ret = dma_map_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+			       ppgtt->base.dev);
 	if (ret) {
 		gen6_ppgtt_free(ppgtt);
 		return ret;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 18/26] drm/i915: Always dma map page table allocations
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (16 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 17/26] drm/i915: Clean up pagetable DMA map & unmap Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  9:25   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 19/26] drm/i915: Consolidate dma mappings Ben Widawsky
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

There is never a case where we don't want to do it. Since we've broken
up the allocations into nice clean helper functions, it's both easy and
obvious to do the dma mapping at the same time.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 78 ++++++++-----------------------------
 1 file changed, 17 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 92e03dd..9630109 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -187,20 +187,6 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
 } while (0);
 
-
-static void dma_unmap_pt_range(struct i915_pagedir *pd,
-			       unsigned pde, size_t n,
-			       struct drm_device *dev)
-{
-	if (WARN_ON(pde + n > I915_PDES_PER_PD))
-		n = I915_PDES_PER_PD - pde;
-
-	n += pde;
-
-	for (; pde < n; pde++)
-		dma_unmap_pt_single(pd->page_tables[pde], dev);
-}
-
 /**
  * dma_map_pt_single() - Create a dma mapping for a page table
  * @pt:		Page table to get a DMA map for
@@ -230,33 +216,12 @@ static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
 	return 0;
 }
 
-static int dma_map_pt_range(struct i915_pagedir *pd,
-			    unsigned pde, size_t n,
-			    struct drm_device *dev)
-{
-	const int first = pde;
-
-	if (WARN_ON(pde + n > I915_PDES_PER_PD))
-		n = I915_PDES_PER_PD - pde;
-
-	n += pde;
-
-	for (; pde < n; pde++) {
-		int ret;
-		ret = dma_map_pt_single(pd->page_tables[pde], dev);
-		if (ret) {
-			dma_unmap_pt_range(pd, first, pde, dev);
-			return ret;
-		}
-	}
-
-	return 0;
-}
-
-static void free_pt_single(struct i915_pagetab *pt)
+static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
+
+	dma_unmap_pt_single(pt, dev);
 	__free_page(pt->page);
 	kfree(pt);
 }
@@ -264,6 +229,7 @@ static void free_pt_single(struct i915_pagetab *pt)
 static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_pagetab *pt;
+	int ret;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
@@ -275,6 +241,13 @@ static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = dma_map_pt_single(pt, dev);
+	if (ret) {
+		__free_page(pt->page);
+		kfree(pt);
+		return ERR_PTR(ret);
+	}
+
 	return pt;
 }
 
@@ -318,7 +291,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
 
 err_out:
 	while (i--)
-		free_pt_single(pd->page_tables[i]);
+		free_pt_single(pd->page_tables[i], dev);
 	return ret;
 }
 
@@ -486,7 +459,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd)
+static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
 {
 	int i;
 
@@ -494,7 +467,7 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
 		return;
 
 	for (i = 0; i < I915_PDES_PER_PD; i++) {
-		free_pt_single(pd->page_tables[i]);
+		free_pt_single(pd->page_tables[i], dev);
 		pd->page_tables[i] = NULL;
 	}
 }
@@ -504,7 +477,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 		free_pd_single(ppgtt->pdp.pagedir[i]);
 	}
 }
@@ -561,7 +534,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -659,18 +632,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	 * 2. Create DMA mappings for the page directories and page tables.
 	 */
 	for (i = 0; i < max_pdp; i++) {
-		struct i915_pagedir *pd;
 		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
 		if (ret)
 			goto bail;
-
-		pd = ppgtt->pdp.pagedir[i];
-
-		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			ret = dma_map_pt_single(pd->page_tables[j], ppgtt->base.dev);
-			if (ret)
-				goto bail;
-		}
 	}
 
 	/*
@@ -1088,7 +1052,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		free_pt_single(ppgtt->pd.page_tables[i]);
+		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
 	free_pd_single(&ppgtt->pd);
 }
@@ -1102,7 +1066,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	drm_mm_takedown(&ppgtt->base.mm);
 	drm_mm_remove_node(&ppgtt->node);
 
-	dma_unmap_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries, vm->dev);
 	gen6_ppgtt_free(ppgtt);
 }
 
@@ -1189,13 +1152,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
-	ret = dma_map_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
-			       ppgtt->base.dev);
-	if (ret) {
-		gen6_ppgtt_free(ppgtt);
-		return ret;
-	}
-
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 19/26] drm/i915: Consolidate dma mappings
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (17 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 18/26] drm/i915: Always dma map page table allocations Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  9:28   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 20/26] drm/i915: Always dma map page directory allocations Ben Widawsky
                   ` (7 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

With a little bit of macro magic, and the fact that every page
table/dir/etc. we wish to map will have a page, and daddr member, we can
greatly simplify and reduce code.

The patch introduces an i915_dma_map/unmap which has the same semantics
as pci_map_page, but is 1 line, and doesn't require newlines, or local
variables to make it fit cleanly.

Notice that even the page allocation shares this same attribute. For
now, I am leaving that code untouched because the macro version would be
a bit on the big side - but it's a nice cleanup as well (IMO)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 56 ++++++++++++-------------------------
 1 file changed, 18 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9630109..abef33dd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -183,45 +183,33 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 	return pte;
 }
 
-#define dma_unmap_pt_single(pt, dev) do { \
-	pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+#define i915_dma_unmap_single(px, dev) do { \
+	pci_unmap_page((dev)->pdev, (px)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
 } while (0);
 
 /**
- * dma_map_pt_single() - Create a dma mapping for a page table
- * @pt:		Page table to get a DMA map for
+ * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
+ * @px:		Page table/dir/etc to get a DMA map for
  * @dev:	drm device
  *
  * Page table allocations are unified across all gens. They always require a
- * single 4k allocation, as well as a DMA mapping.
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
  *
  * Return: 0 if success.
  */
-static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
-{
-	struct page *page;
-	dma_addr_t pt_addr;
-	int ret;
-
-	page = pt->page;
-	pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-			       PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(dev->pdev, pt_addr);
-	if (ret)
-		return ret;
-
-	pt->daddr = pt_addr;
-
-	return 0;
-}
+#define i915_dma_map_px_single(px, dev) \
+	pci_dma_mapping_error((dev)->pdev, \
+			      (px)->daddr = pci_map_page((dev)->pdev, \
+							 (px)->page, 0, 4096, \
+							 PCI_DMA_BIDIRECTIONAL))
 
 static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
 {
 	if (WARN_ON(!pt->page))
 		return;
 
-	dma_unmap_pt_single(pt, dev);
+	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
 	kfree(pt);
 }
@@ -241,7 +229,7 @@ static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	ret = dma_map_pt_single(pt, dev);
+	ret = i915_dma_map_px_single(pt, dev);
 	if (ret) {
 		__free_page(pt->page);
 		kfree(pt);
@@ -484,7 +472,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+	struct drm_device *dev = ppgtt->base.dev;
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
@@ -493,16 +481,14 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 		if (!ppgtt->pdp.pagedir[i]->daddr)
 			continue;
 
-		pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
-			       PCI_DMA_BIDIRECTIONAL);
+		i915_dma_unmap_single(ppgtt->pdp.pagedir[i], dev);
 
 		for (j = 0; j < I915_PDES_PER_PD; j++) {
 			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
 			struct i915_pagetab *pt =  pd->page_tables[j];
 			dma_addr_t addr = pt->daddr;
 			if (addr)
-				pci_unmap_page(hwdev, addr, PAGE_SIZE,
-					       PCI_DMA_BIDIRECTIONAL);
+				i915_dma_unmap_single(pt, dev);
 		}
 	}
 }
@@ -588,19 +574,13 @@ err_out:
 static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 					     const int pdpe)
 {
-	dma_addr_t pd_addr;
 	int ret;
 
-	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-			       ppgtt->pdp.pagedir[pdpe]->page, 0,
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+	ret = i915_dma_map_px_single(ppgtt->pdp.pagedir[pdpe],
+				     ppgtt->base.dev);
 	if (ret)
 		return ret;
 
-	ppgtt->pdp.pagedir[pdpe]->daddr = pd_addr;
-
 	return 0;
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 20/26] drm/i915: Always dma map page directory allocations
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (18 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 19/26] drm/i915: Consolidate dma mappings Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  9:29   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 21/26] drm/i915: Track GEN6 page table usage Ben Widawsky
                   ` (6 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Similar to the patch a few back in the series, we can always map and
unmap page directories when we do their allocation and teardown. Page
directory pages only exist on gen8+, so this should only effect behavior
on those platforms.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 79 +++++++++----------------------------
 1 file changed, 19 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index abef33dd..ad2f2c5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -283,21 +283,23 @@ err_out:
 	return ret;
 }
 
-static void __free_pd_single(struct i915_pagedir *pd)
+static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+	i915_dma_unmap_single(pd, dev);
 	__free_page(pd->page);
 	kfree(pd);
 }
 
-#define free_pd_single(pd) do { \
+#define free_pd_single(pd, dev) do { \
 	if ((pd)->page) { \
-		__free_pd_single(pd); \
+		__free_pd_single(pd, dev); \
 	} \
 } while (0)
 
-static struct i915_pagedir *alloc_pd_single(void)
+static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
 	struct i915_pagedir *pd;
+	int ret;
 
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd)
@@ -309,6 +311,13 @@ static struct i915_pagedir *alloc_pd_single(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	ret = i915_dma_map_px_single(pd, dev);
+	if (ret) {
+		__free_page(pd->page);
+		kfree(pd);
+		return ERR_PTR(ret);
+	}
+
 	return pd;
 }
 
@@ -466,30 +475,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
 		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
-		free_pd_single(ppgtt->pdp.pagedir[i]);
-	}
-}
-
-static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
-	struct drm_device *dev = ppgtt->base.dev;
-	int i, j;
-
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		/* TODO: In the future we'll support sparse mappings, so this
-		 * will have to change. */
-		if (!ppgtt->pdp.pagedir[i]->daddr)
-			continue;
-
-		i915_dma_unmap_single(ppgtt->pdp.pagedir[i], dev);
-
-		for (j = 0; j < I915_PDES_PER_PD; j++) {
-			struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-			struct i915_pagetab *pt =  pd->page_tables[j];
-			dma_addr_t addr = pt->daddr;
-			if (addr)
-				i915_dma_unmap_single(pt, dev);
-		}
+		free_pd_single(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 	}
 }
 
@@ -501,7 +487,6 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	list_del(&vm->global_link);
 	drm_mm_takedown(&vm->mm);
 
-	gen8_ppgtt_dma_unmap_pages(ppgtt);
 	gen8_ppgtt_free(ppgtt);
 }
 
@@ -531,7 +516,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 	int i;
 
 	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.pagedir[i] = alloc_pd_single();
+		ppgtt->pdp.pagedir[i] = alloc_pd_single(ppgtt->base.dev);
 		if (IS_ERR(ppgtt->pdp.pagedir[i]))
 			goto unwind_out;
 	}
@@ -543,7 +528,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
 
 unwind_out:
 	while (i--)
-		free_pd_single(ppgtt->pdp.pagedir[i]);
+		free_pd_single(ppgtt->pdp.pagedir[i],
+			       ppgtt->base.dev);
 
 	return -ENOMEM;
 }
@@ -571,19 +557,6 @@ err_out:
 	return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-					     const int pdpe)
-{
-	int ret;
-
-	ret = i915_dma_map_px_single(ppgtt->pdp.pagedir[pdpe],
-				     ppgtt->base.dev);
-	if (ret)
-		return ret;
-
-	return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -609,16 +582,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		return ret;
 
 	/*
-	 * 2. Create DMA mappings for the page directories and page tables.
-	 */
-	for (i = 0; i < max_pdp; i++) {
-		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-		if (ret)
-			goto bail;
-	}
-
-	/*
-	 * 3. Map all the page directory entires to point to the page tables
+	 * 2. Map all the page directory entires to point to the page tables
 	 * we've allocated.
 	 *
 	 * For now, the PPGTT helper functions all require that the PDEs are
@@ -652,11 +616,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			 ppgtt->num_pd_entries,
 			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
-
-bail:
-	gen8_ppgtt_dma_unmap_pages(ppgtt);
-	gen8_ppgtt_free(ppgtt);
-	return ret;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -1034,7 +993,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
-	free_pd_single(&ppgtt->pd);
+	free_pd_single(&ppgtt->pd, ppgtt->base.dev);
 }
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 21/26] drm/i915: Track GEN6 page table usage
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (19 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 20/26] drm/i915: Always dma map page directory allocations Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  5:48 ` [PATCH 22/26] drm/i915: Extract context switch skip logic Ben Widawsky
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.

With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning forthis.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 170 +++++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.h | 117 +++++++++++++++----------
 2 files changed, 212 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ad2f2c5..d3c77d1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -53,9 +53,9 @@ bool intel_enable_ppgtt(struct drm_device *dev, bool full)
 		return HAS_ALIASING_PPGTT(dev);
 }
 
-static void ppgtt_bind_vma(struct i915_vma *vma,
-			   enum i915_cache_level cache_level,
-			   u32 flags);
+static int ppgtt_bind_vma(struct i915_vma *vma,
+			  enum i915_cache_level cache_level,
+			  u32 flags);
 static void ppgtt_unbind_vma(struct i915_vma *vma);
 static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt);
 
@@ -204,39 +204,71 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 							 (px)->page, 0, 4096, \
 							 PCI_DMA_BIDIRECTIONAL))
 
-static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
+static void __free_pt_single(struct i915_pagetab *pt, struct drm_device *dev,
+			     int scratch)
 {
+	if (WARN(scratch ^ pt->scratch,
+		 "Tried to free scratch = %d. Is scratch = %d\n",
+		 scratch, pt->scratch))
+		return;
+
 	if (WARN_ON(!pt->page))
 		return;
 
+	if (!scratch) {
+		const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+			GEN8_PTES_PER_PT : GEN6_PTES_PER_PT;
+		WARN(!bitmap_empty(pt->used_ptes, count),
+		     "Free page table with %d used pages\n",
+		     bitmap_weight(pt->used_ptes, count));
+	}
+
 	i915_dma_unmap_single(pt, dev);
 	__free_page(pt->page);
+	kfree(pt->used_ptes);
 	kfree(pt);
 }
 
+#define free_pt_single(pt, dev) \
+	__free_pt_single(pt, dev, false)
+#define free_pt_scratch(pt, dev) \
+	__free_pt_single(pt, dev, true)
+
 static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
 	struct i915_pagetab *pt;
-	int ret;
+	const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+		GEN8_PTES_PER_PT : GEN6_PTES_PER_PT;
+	int ret = -ENOMEM;
 
 	pt = kzalloc(sizeof(*pt), GFP_KERNEL);
 	if (!pt)
 		return ERR_PTR(-ENOMEM);
 
+	pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+				GFP_KERNEL);
+
+	if (!pt->used_ptes)
+		goto fail_bitmap;
+
 	pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!pt->page) {
-		kfree(pt);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!pt->page)
+		goto fail_page;
 
 	ret = i915_dma_map_px_single(pt, dev);
-	if (ret) {
-		__free_page(pt->page);
-		kfree(pt);
-		return ERR_PTR(ret);
-	}
+	if (ret)
+		goto fail_dma;
 
 	return pt;
+
+fail_dma:
+	__free_page(pt->page);
+fail_page:
+	kfree(pt->used_ptes);
+fail_bitmap:
+	kfree(pt);
+
+	return ERR_PTR(ret);
 }
 
 /**
@@ -689,15 +721,13 @@ static void gen6_map_single(struct i915_pagedir *pd,
 /* Map all the page tables found in the ppgtt structure to incrementing page
  * directories. */
 static void gen6_map_page_range(struct drm_i915_private *dev_priv,
-				struct i915_pagedir *pd, unsigned pde, size_t n)
+				struct i915_pagedir *pd, uint32_t start, uint32_t length)
 {
-	if (WARN_ON(pde + n > I915_PDES_PER_PD))
-		n = I915_PDES_PER_PD - pde;
-
-	n += pde;
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
 
-	for (; pde < n; pde++)
-		gen6_map_single(pd, pde, pd->page_tables[pde]);
+	gen6_for_each_pde(pt, pd, start, length, temp, pde)
+		gen6_map_single(pd, pde, pt);
 
 	/* Make sure write is complete before other code can use this page
 	 * table. Also require for WC mapped PTEs */
@@ -986,6 +1016,51 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+			       uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		        container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		int j;
+
+		DECLARE_BITMAP(tmp_bitmap, GEN6_PTES_PER_PT);
+		bitmap_zero(tmp_bitmap, GEN6_PTES_PER_PT);
+		bitmap_set(tmp_bitmap, gen6_pte_index(start),
+			   gen6_pte_count(start, length));
+
+		/* TODO: To be done in the next patch. Map the page/insert
+		 * entries here */
+		for_each_set_bit(j, tmp_bitmap, GEN6_PTES_PER_PT) {
+			if (test_bit(j, pt->used_ptes)) {
+				/* Check that we're changing cache levels */
+			}
+		}
+
+		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+			  GEN6_PTES_PER_PT);
+	}
+
+	return 0;
+}
+
+static void gen6_teardown_va_range(struct i915_address_space *vm,
+				   uint64_t start, uint64_t length)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		        container_of(vm, struct i915_hw_ppgtt, base);
+	struct i915_pagetab *pt;
+	uint32_t pde, temp;
+
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
+			     gen6_pte_count(start, length));
+	}
+}
+
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
@@ -993,6 +1068,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
 
+	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 	free_pd_single(&ppgtt->pd, ppgtt->base.dev);
 }
 
@@ -1022,6 +1098,13 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+
+	ppgtt->scratch_pt = alloc_pt_single(ppgtt->base.dev);
+	if (IS_ERR(ppgtt->scratch_pt))
+		return PTR_ERR(ppgtt->scratch_pt);
+
+	ppgtt->scratch_pt->scratch = 1;
+
 alloc:
 	ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
 						  &ppgtt->node, GEN6_PD_SIZE,
@@ -1033,20 +1116,24 @@ alloc:
 					       GEN6_PD_SIZE, GEN6_PD_ALIGN,
 					       I915_CACHE_NONE, 0);
 		if (ret)
-			return ret;
+			goto err_out;
 
 		retried = true;
 		goto alloc;
 	}
 
 	if (ret)
-		return ret;
+		goto err_out;
 
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
 	ppgtt->num_pd_entries = I915_PDES_PER_PD;
 	return 0;
+
+err_out:
+	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
+	return ret;
 }
 
 static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1091,6 +1178,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+	ppgtt->base.teardown_va_range = gen6_teardown_va_range;
 	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1104,7 +1193,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
-	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
@@ -1139,13 +1228,25 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-static void
+static int
 ppgtt_bind_vma(struct i915_vma *vma,
 	       enum i915_cache_level cache_level,
 	       u32 flags)
 {
+	int ret;
+
+	WARN_ON(flags);
+	if (vma->vm->allocate_va_range) {
+		ret = vma->vm->allocate_va_range(vma->vm,
+						 vma->node.start,
+						 vma->node.size);
+		if (ret)
+			return ret;
+	}
+
 	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
 				cache_level);
+	return 0;
 }
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
@@ -1154,6 +1255,9 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
+	if (vma->vm->teardown_va_range)
+		vma->vm->teardown_va_range(vma->vm,
+					   vma->node.start, vma->node.size);
 }
 
 extern int intel_iommu_gfx_mapped;
@@ -1446,9 +1550,9 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 }
 
 
-static void i915_ggtt_bind_vma(struct i915_vma *vma,
-			       enum i915_cache_level cache_level,
-			       u32 unused)
+static int i915_ggtt_bind_vma(struct i915_vma *vma,
+			      enum i915_cache_level cache_level,
+			      u32 unused)
 {
 	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
@@ -1457,6 +1561,8 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 	BUG_ON(!i915_is_ggtt(vma->vm));
 	intel_gtt_insert_sg_entries(vma->obj->pages, entry, flags);
 	vma->obj->has_global_gtt_mapping = 1;
+
+	return 0;
 }
 
 static void i915_ggtt_clear_range(struct i915_address_space *vm,
@@ -1479,9 +1585,9 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
 	intel_gtt_clear_range(first, size);
 }
 
-static void ggtt_bind_vma(struct i915_vma *vma,
-			  enum i915_cache_level cache_level,
-			  u32 flags)
+static int ggtt_bind_vma(struct i915_vma *vma,
+			 enum i915_cache_level cache_level,
+			 u32 flags)
 {
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1518,6 +1624,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 					    cache_level);
 		vma->obj->has_aliasing_ppgtt_mapping = 1;
 	}
+
+	return 0;
 }
 
 static void ggtt_unbind_vma(struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fa9249f..3925fde 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -118,6 +118,23 @@ static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
 	return i915_pde_count(addr, length, GEN6_PDE_SHIFT);
 }
 
+/* For each pde iterates over every pde between from start until start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp.  On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < I915_PDES_PER_PD; \
+	     pt = (pd)->page_tables[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+	     temp = min(temp, (unsigned)length), \
+	     start += temp, length -= temp)
+
 #define BYT_PTE_WRITEABLE		(1 << 1)
 #define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
 
@@ -222,9 +239,33 @@ struct i915_vma {
 	void (*unbind_vma)(struct i915_vma *vma);
 	/* Map an object into an address space with the given cache flags. */
 #define GLOBAL_BIND (1<<0)
-	void (*bind_vma)(struct i915_vma *vma,
-			 enum i915_cache_level cache_level,
-			 u32 flags);
+	int (*bind_vma)(struct i915_vma *vma,
+			enum i915_cache_level cache_level,
+			u32 flags);
+};
+
+
+struct i915_pagetab {
+	struct page *page;
+	dma_addr_t daddr;
+
+	unsigned long *used_ptes;
+	unsigned int scratch:1;
+};
+
+struct i915_pagedir {
+	struct page *page; /* NULL for GEN6-GEN7 */
+	union {
+		uint32_t pd_offset;
+		dma_addr_t daddr;
+	};
+
+	struct i915_pagetab *page_tables[I915_PDES_PER_PD];
+};
+
+struct i915_pagedirpo {
+	/* struct page *page; */
+	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPS];
 };
 
 struct i915_address_space {
@@ -266,6 +307,12 @@ struct i915_address_space {
 	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
 				     enum i915_cache_level level,
 				     bool valid); /* Create a valid PTE */
+	int (*allocate_va_range)(struct i915_address_space *vm,
+				 uint64_t start,
+				 uint64_t length);
+	void (*teardown_va_range)(struct i915_address_space *vm,
+				  uint64_t start,
+				  uint64_t length);
 	void (*clear_range)(struct i915_address_space *vm,
 			    uint64_t start,
 			    uint64_t length,
@@ -276,6 +323,29 @@ struct i915_address_space {
 			       enum i915_cache_level cache_level);
 	void (*cleanup)(struct i915_address_space *vm);
 };
+#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
+
+struct i915_hw_ppgtt {
+	struct i915_address_space base;
+	struct kref ref;
+	struct drm_mm_node node;
+	unsigned num_pd_entries;
+	unsigned num_pd_pages; /* gen8+ */
+	union {
+		struct i915_pagedirpo pdp;
+		struct i915_pagedir pd;
+	};
+
+	struct i915_pagetab *scratch_pt;
+
+	gen6_gtt_pte_t __iomem *pd_addr;
+
+	int (*enable)(struct i915_hw_ppgtt *ppgtt);
+	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
+			 struct intel_ring_buffer *ring,
+			 bool synchronous);
+	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
+};
 
 /* The Graphics Translation Table is the way in which GEN hardware translates a
  * Graphics Virtual Address into a Physical Address. In addition to the normal
@@ -304,46 +374,5 @@ struct i915_gtt {
 			  size_t *stolen, phys_addr_t *mappable_base,
 			  unsigned long *mappable_end);
 };
-#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
-
-struct i915_pagetab {
-	struct page *page;
-	dma_addr_t daddr;
-};
-
-struct i915_pagedir {
-	struct page *page; /* NULL for GEN6-GEN7 */
-	union {
-		uint32_t pd_offset;
-		dma_addr_t daddr;
-	};
-
-	struct i915_pagetab *page_tables[I915_PDES_PER_PD]; /* PDEs */
-};
-
-struct i915_pagedirpo {
-	/* struct page *page; */
-	struct i915_pagedir *pagedir[GEN8_LEGACY_PDPS];
-};
-
-struct i915_hw_ppgtt {
-	struct i915_address_space base;
-	struct kref ref;
-	struct drm_mm_node node;
-	unsigned num_pd_entries;
-	unsigned num_pd_pages; /* gen8+ */
-	union {
-		struct i915_pagedirpo pdp;
-		struct i915_pagedir pd;
-	};
-
-	gen6_gtt_pte_t __iomem *pd_addr;
-
-	int (*enable)(struct i915_hw_ppgtt *ppgtt);
-	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
-			 struct intel_ring_buffer *ring,
-			 bool synchronous);
-	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
-};
 
 #endif
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 22/26] drm/i915: Extract context switch skip logic
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (20 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 21/26] drm/i915: Track GEN6 page table usage Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  5:48 ` [PATCH 23/26] drm/i915: Force pd restore when PDEs change, gen6-7 Ben Widawsky
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

We have some fanciness coming up. This patch just breaks out the logic.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f918f2c..a899e11 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -634,6 +634,16 @@ mi_set_context(struct intel_ring_buffer *ring,
 	return ret;
 }
 
+static inline bool should_skip_switch(struct intel_ring_buffer *ring,
+				      struct i915_hw_context *from,
+				      struct i915_hw_context *to)
+{
+	if (from == to && from->last_ring == ring && !to->remap_slice)
+		return true;
+
+	return false;
+}
+
 static int do_switch(struct intel_ring_buffer *ring,
 		     struct i915_hw_context *to)
 {
@@ -648,7 +658,7 @@ static int do_switch(struct intel_ring_buffer *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->obj));
 	}
 
-	if (from == to && from->last_ring == ring && !to->remap_slice)
+	if (should_skip_switch(ring, from, to))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 23/26] drm/i915: Force pd restore when PDEs change, gen6-7
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (21 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 22/26] drm/i915: Extract context switch skip logic Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-18  5:48 ` [PATCH 24/26] drm/i915: Finish gen6/7 dynamic page table allocation Ben Widawsky
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

The docs say you cannot change the PDEs of a currently running context. If you
are changing the PDEs of the currently running context then. We never
map new PDEs of a running context, and expect them to be present - so I
think this is okay. (We can unmap, but this should also be okay since we
only unmap unreferenced objects that the GPU shouldn't be tryingto
va->pa xlate.) The MI_SET_CONTEXT command does have a flag to signal
that even if the context is the same, force a reload. It's unclear
exactly what this does, but I have a hunch it's the right thing to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
let's the user know.

NOTE: I have no evidence to suggest this is actually needed other than a
few tidbits which lead me to believe there are some corner cases that
will require it. I'm mostly depending on the reload of DCLV to
invalidate the old TLBs. We can try to remove this patch and see what
happens.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 15 ++++++++++++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++++
 drivers/gpu/drm/i915/i915_gem_gtt.c        | 17 ++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  2 ++
 4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index a899e11..6ad5380 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -636,9 +636,18 @@ mi_set_context(struct intel_ring_buffer *ring,
 
 static inline bool should_skip_switch(struct intel_ring_buffer *ring,
 				      struct i915_hw_context *from,
-				      struct i915_hw_context *to)
+				      struct i915_hw_context *to,
+				      u32 *flags)
 {
-	if (from == to && from->last_ring == ring && !to->remap_slice)
+	if (test_and_clear_bit(ring->id, &to->vm->pd_reload_mask)) {
+		*flags |= MI_FORCE_RESTORE;
+		return false;
+	}
+
+	if (to->remap_slice)
+		return false;
+
+	if (from == to && from->last_ring == ring)
 		return true;
 
 	return false;
@@ -658,7 +667,7 @@ static int do_switch(struct intel_ring_buffer *ring,
 		BUG_ON(!i915_gem_obj_is_pinned(from->obj));
 	}
 
-	if (should_skip_switch(ring, from, to))
+	if (should_skip_switch(ring, from, to, &hw_flags))
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 856fa9d..bb901e8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1162,6 +1162,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	/* XXX: Reserve has possibly change PDEs which means we must do a
+	 * context switch before we can coherently read some of the reserved
+	 * VMAs. */
+
 	/* The objects are in their final locations, apply the relocations. */
 	if (need_relocs)
 		ret = i915_gem_execbuffer_relocate(eb);
@@ -1263,6 +1267,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 				goto err;
 		}
 	} else {
+		WARN_ON(vm->pd_reload_mask & (1<<ring->id));
 		ret = ring->dispatch_execbuffer(ring,
 						exec_start, exec_len,
 						flags);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d3c77d1..6d904c9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1228,6 +1228,16 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
+/* PDE TLBs are a pain invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+#define ppgtt_invalidate_tlbs(vm) do {\
+	if (INTEL_INFO(vm->dev)->gen < 8) { \
+		vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask; \
+	} \
+} while(0)
+
 static int
 ppgtt_bind_vma(struct i915_vma *vma,
 	       enum i915_cache_level cache_level,
@@ -1242,10 +1252,13 @@ ppgtt_bind_vma(struct i915_vma *vma,
 						 vma->node.size);
 		if (ret)
 			return ret;
+
+		ppgtt_invalidate_tlbs(vma->vm);
 	}
 
 	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
 				cache_level);
+
 	return 0;
 }
 
@@ -1255,9 +1268,11 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
 			     vma->node.start,
 			     vma->obj->base.size,
 			     true);
-	if (vma->vm->teardown_va_range)
+	if (vma->vm->teardown_va_range) {
 		vma->vm->teardown_va_range(vma->vm,
 					   vma->node.start, vma->node.size);
+		ppgtt_invalidate_tlbs(vma->vm);
+	}
 }
 
 extern int intel_iommu_gfx_mapped;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 3925fde..6130f3d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -280,6 +280,8 @@ struct i915_address_space {
 		struct page *page;
 	} scratch;
 
+	unsigned long pd_reload_mask;
+
 	/**
 	 * List of objects currently involved in rendering.
 	 *
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 24/26] drm/i915: Finish gen6/7 dynamic page table allocation
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (22 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 23/26] drm/i915: Force pd restore when PDEs change, gen6-7 Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-20 12:15   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 25/26] drm/i915: Print used ppgtt pages for gen6 in debugfs Ben Widawsky
                   ` (2 subsequent siblings)
  26 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly when the object is destroyed, we will remove, and free
the page table pointing the PDE back to the scratch page.

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time to
get to this point with as much common code as I do.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain, we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h         |   2 +-
 drivers/gpu/drm/i915/i915_gem_context.c |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 123 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_trace.h       | 108 ++++++++++++++++++++++++++++
 4 files changed, 224 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b19442c..eeef032 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2373,7 +2373,7 @@ static inline void i915_gem_chipset_flush(struct drm_device *dev)
 	if (INTEL_INFO(dev)->gen < 6)
 		intel_gtt_chipset_flush();
 }
-int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt);
+int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, bool aliasing);
 bool intel_enable_ppgtt(struct drm_device *dev, bool full);
 
 /* i915_gem_stolen.c */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6ad5380..185c926 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -209,7 +209,7 @@ create_vm_for_ctx(struct drm_device *dev, struct i915_hw_context *ctx)
 	if (!ppgtt)
 		return ERR_PTR(-ENOMEM);
 
-	ret = i915_gem_init_ppgtt(dev, ppgtt);
+	ret = i915_gem_init_ppgtt(dev, ppgtt, ctx->file_priv == NULL);
 	if (ret) {
 		kfree(ppgtt);
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6d904c9..846a5b5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1016,13 +1016,54 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
+static DECLARE_BITMAP(new_page_tables, I915_PDES_PER_PD);
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       uint64_t start, uint64_t length)
 {
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_ppgtt *ppgtt =
 		        container_of(vm, struct i915_hw_ppgtt, base);
 	struct i915_pagetab *pt;
+	const uint32_t start_save = start, length_save = length;
 	uint32_t pde, temp;
+	int ret;
+
+	BUG_ON(upper_32_bits(start));
+
+	bitmap_zero(new_page_tables, I915_PDES_PER_PD);
+
+	trace_i915_va_alloc(vm, start, length);
+
+	/* The allocation is done in two stages so that we can bail out with
+	 * minimal amount of pain. The first stage finds new page tables that
+	 * need allocation. The second stage marks use ptes within the page
+	 * tables.
+	 */
+	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+		if (pt != ppgtt->scratch_pt) {
+			WARN_ON(bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT));
+			continue;
+		}
+
+		/* We've already allocated a page table */
+		WARN_ON(!bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT));
+
+		pt = alloc_pt_single(dev);
+		if (IS_ERR(pt)) {
+			ret = PTR_ERR(pt);
+			goto unwind_out;
+		}
+
+		ppgtt->pd.page_tables[pde] = pt;
+		set_bit(pde, new_page_tables);
+		trace_i915_pagetable_alloc(vm, pde,
+					   start,
+					   start + (1 << GEN6_PDE_SHIFT));
+	}
+
+	start = start_save;
+	length = length_save;
 
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
 		int j;
@@ -1040,11 +1081,32 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 			}
 		}
 
-		bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+		if (test_and_clear_bit(pde, new_page_tables))
+			gen6_map_single(&ppgtt->pd, pde, pt);
+
+		trace_i915_pagetable_map(vm, pde, pt,
+					 gen6_pte_index(start),
+					 gen6_pte_count(start, length),
+					 GEN6_PTES_PER_PT);
+		bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
 			  GEN6_PTES_PER_PT);
 	}
 
+	WARN_ON(!bitmap_empty(new_page_tables, I915_PDES_PER_PD));
+
+	/* Make sure write is complete before other code can use this page
+	 * table. Also require for WC mapped PTEs */
+	readl(dev_priv->gtt.gsm);
+
 	return 0;
+
+unwind_out:
+	for_each_set_bit(pde, new_page_tables, I915_PDES_PER_PD) {
+		struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];
+		ppgtt->pd.page_tables[pde] = NULL;
+		free_pt_single(pt, vm->dev);
+	}
+	return ret;
 }
 
 static void gen6_teardown_va_range(struct i915_address_space *vm,
@@ -1055,9 +1117,30 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
 	struct i915_pagetab *pt;
 	uint32_t pde, temp;
 
+	trace_i915_va_teardown(vm, start, length);
+
 	gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+
+		if (WARN(pt == ppgtt->scratch_pt,
+		    "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
+		    vm, pde, start, start + length))
+			continue;
+
+		trace_i915_pagetable_unmap(vm, pde, pt,
+					   gen6_pte_index(start),
+					   gen6_pte_count(start, length),
+					   GEN6_PTES_PER_PT);
+
 		bitmap_clear(pt->used_ptes, gen6_pte_index(start),
 			     gen6_pte_count(start, length));
+
+		if (bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT)) {
+			trace_i915_pagetable_destroy(vm, pde,
+						     start & GENMASK_ULL(64, GEN6_PDE_SHIFT),
+						     start + (1 << GEN6_PDE_SHIFT));
+			gen6_map_single(&ppgtt->pd, pde, ppgtt->scratch_pt);
+			ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+		}
 	}
 }
 
@@ -1065,9 +1148,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++)
-		free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+		if (pt != ppgtt->scratch_pt)
+			free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+	}
 
+	/* Consider putting this as part of pd free. */
 	free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 	free_pd_single(&ppgtt->pd, ppgtt->base.dev);
 }
@@ -1136,7 +1223,7 @@ err_out:
 	return ret;
 }
 
-static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
 {
 	int ret;
 
@@ -1144,9 +1231,13 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	if (!preallocate_pt)
+		return 0;
+
 	ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
 			     ppgtt->base.dev);
 	if (ret) {
+		free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
 		drm_mm_remove_node(&ppgtt->node);
 		return ret;
 	}
@@ -1154,8 +1245,19 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+				  uint64_t start, uint64_t length)
+{
+	struct i915_pagetab *unused;
+	uint32_t pde, temp;
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+	gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
+		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+		gen6_map_single(&ppgtt->pd, pde, ppgtt->pd.page_tables[pde]);
+	}
+}
+
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1174,7 +1276,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	} else
 		BUG();
 
-	ret = gen6_ppgtt_alloc(ppgtt);
+	ret = gen6_ppgtt_alloc(ppgtt, aliasing);
 	if (ret)
 		return ret;
 
@@ -1193,7 +1295,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
 		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
 
-	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
+	if (aliasing)
+		gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
+	else
+		gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 			 ppgtt->node.size >> 20,
@@ -1202,7 +1307,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 	return 0;
 }
 
-int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, bool aliasing)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = 0;
@@ -1211,7 +1316,7 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
 	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
 
 	if (INTEL_INFO(dev)->gen < 8)
-		ret = gen6_ppgtt_init(ppgtt);
+		ret = gen6_ppgtt_init(ppgtt, aliasing);
 	else if (IS_GEN8(dev))
 		ret = gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
 	else
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index b95a380..86e85de 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -81,6 +81,114 @@ TRACE_EVENT(i915_vma_unbind,
 		      __entry->obj, __entry->offset, __entry->size, __entry->vm)
 );
 
+DECLARE_EVENT_CLASS(i915_va,
+	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	TP_ARGS(vm, start, length),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->start = start;
+		__entry->end = start + length;
+	),
+
+	TP_printk("vm=%p, 0x%llx-0x%llx", __entry->vm, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	     TP_ARGS(vm, start, length)
+);
+
+DEFINE_EVENT(i915_va, i915_va_teardown,
+	     TP_PROTO(struct i915_address_space *vm, u64 start, u64 length),
+	     TP_ARGS(vm, start, length)
+);
+
+DECLARE_EVENT_CLASS(i915_pagetable,
+	TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 end),
+	TP_ARGS(vm, pde, start, end),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u64, start)
+		__field(u64, end)
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->start = start;
+		__entry->end = end;
+	),
+
+	TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+		  __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_alloc,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 end),
+	     TP_ARGS(vm, pde, start, end)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy,
+	     TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 end),
+	     TP_ARGS(vm, pde, start, end)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+	((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_pagetable_update,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits),
+
+	TP_STRUCT__entry(
+		__field(struct i915_address_space *, vm)
+		__field(u32, pde)
+		__field(u32, first)
+		__field(u32, last)
+		__dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
+	),
+
+	TP_fast_assign(
+		__entry->vm = vm;
+		__entry->pde = pde;
+		__entry->first = first;
+		__entry->last = first + len;
+
+		bitmap_scnprintf(__get_str(cur_ptes),
+				 TRACE_PT_SIZE(bits),
+				 pt->used_ptes,
+				 bits);
+	),
+
+	TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+		  __entry->vm, __entry->pde, __entry->last, __entry->first,
+		  __get_str(cur_ptes))
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_map,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_unmap,
+	TP_PROTO(struct i915_address_space *vm, u32 pde,
+		 struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+	TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
 TRACE_EVENT(i915_gem_object_change_domain,
 	    TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
 	    TP_ARGS(obj, old_read, old_write),
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 25/26] drm/i915: Print used ppgtt pages for gen6 in debugfs
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (23 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 24/26] drm/i915: Finish gen6/7 dynamic page table allocation Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-20 10:09   ` Chris Wilson
  2014-03-20 10:17   ` Chris Wilson
  2014-03-18  5:48 ` [PATCH 26/26] FOR REFERENCE ONLY Ben Widawsky
  2014-03-20 12:17 ` [PATCH 00/26] [RFCish] GEN7 dynamic page tables Chris Wilson
  26 siblings, 2 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5f3666a..04d40fa 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1785,10 +1785,26 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verb
 	}
 }
 
+static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_pagedir *pd = &ppgtt->pd;
+	struct i915_pagetab **pt = &pd->page_tables[0];
+	size_t cnt = 0;
+	int i;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		if (pt[i] != ppgtt->scratch_pt)
+			cnt++;
+	}
+
+	return cnt;
+}
+
 static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
 {
 	seq_printf(m, "%s:\n", name);
 	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
+	seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
 }
 
 static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
@@ -1809,6 +1825,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -1829,7 +1847,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
 		if (verbose)
 			idr_for_each(&file_priv->context_idr, per_file_ctx, m);
 	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 26/26] FOR REFERENCE ONLY
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (24 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 25/26] drm/i915: Print used ppgtt pages for gen6 in debugfs Ben Widawsky
@ 2014-03-18  5:48 ` Ben Widawsky
  2014-03-20 12:17 ` [PATCH 00/26] [RFCish] GEN7 dynamic page tables Chris Wilson
  26 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-18  5:48 UTC (permalink / raw)
  To: Intel GFX

Start using size/length through the GEN8 code. The same approach was
taken for gen7. The difference with gen8 to this point is we need to
take care to the do the page directory allocations, as well as the page
tables.

This patch is meant to show how things will look (more or less) if I
keep up in the same direction.
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 104 +++++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  37 +++++++++++++
 2 files changed, 115 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 846a5b5..1348d48 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -488,29 +488,50 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
+static void gen8_free_page_tables(struct i915_pagedir *pd,
+				  uint64_t start, uint64_t length,
+				  struct drm_device *dev)
 {
 	int i;
 
 	if (!pd->page)
 		return;
 
-	for (i = 0; i < I915_PDES_PER_PD; i++) {
+	for (i = gen8_pte_index(start);
+	     length && i < GEN8_PTES_PER_PT; i++, length -= PAGE_SIZE) {
+		if (!pd->page_tables[i])
+			continue;
+
 		free_pt_single(pd->page_tables[i], dev);
 		pd->page_tables[i] = NULL;
 	}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+static void gen8_teardown_va_range(struct i915_hw_ppgtt *ppgtt,
+				   uint64_t start, uint64_t length)
 {
-	int i;
+	struct drm_device *dev = ppgtt->base.dev;
+	struct i915_pagedir *pd;
+	struct i915_pagetab *pt;
+	uint64_t temp, temp2;
+	uint32_t pdpe, pde;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_start = start;
+		uint64_t pd_len = gen8_bound_pt(start, length);
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp2, pde) {
+			gen8_free_page_tables(pd, pd_start, pd_len, dev);
+		}
 
-	for (i = 0; i < ppgtt->num_pd_pages; i++) {
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
-		free_pd_single(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
+		free_pd_single(pd, dev);
 	}
 }
 
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	gen8_teardown_va_range(ppgtt, ppgtt->base.start, ppgtt->base.total);
+}
+
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct i915_hw_ppgtt *ppgtt =
@@ -537,41 +558,75 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 
 unwind_out:
 	while (i--)
-		gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
+		gen8_free_page_tables(ppgtt->pdp.pagedir[i],
+				      i * I915_PDES_PER_PD * GEN8_PTES_PER_PT,
+				      (i + 1)* I915_PDES_PER_PD * GEN8_PTES_PER_PT,
+				      ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
 static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-						const int max_pdp)
+						uint64_t start, uint64_t length)
 {
-	int i;
+	struct i915_pagedir *unused;
+	uint64_t temp;
+	uint32_t pdpe;
 
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->pdp.pagedir[i] = alloc_pd_single(ppgtt->base.dev);
-		if (IS_ERR(ppgtt->pdp.pagedir[i]))
-			goto unwind_out;
+	gen8_for_each_pdpe(unused, &ppgtt->pdp, start, length, temp, pdpe) {
+		struct i915_pagedir *pd;
+
+		BUG_ON(unused);
+		pd = alloc_pd_single(ppgtt->base.dev);
+		if (!pd)
+			goto pd_fail;
+
+		ppgtt->pdp.pagedir[pdpe] = pd;
+		ppgtt->num_pd_pages++;
 	}
 
-	ppgtt->num_pd_pages = max_pdp;
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
 
 	return 0;
 
-unwind_out:
-	while (i--)
-		free_pd_single(ppgtt->pdp.pagedir[i],
-			       ppgtt->base.dev);
+pd_fail:
+	while (pdpe--)
+		free_pd_single(ppgtt->pdp.pagedir[pdpe], ppgtt->base.dev);
 
 	return -ENOMEM;
 }
 
+static void gen8_alloc_va_range(struct i915_hw_ppgtt *ppgtt,
+				uint64_t start, uint64_t length)
+{
+	struct i915_pagedir *pd;
+	struct i915_pagetab *pt;
+	uint64_t temp, temp2;
+	uint32_t pdpe, pde;
+
+	gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+		uint64_t pd_start = start;
+		uint64_t pd_len = gen8_bound_pt(start, length);
+		gen8_for_each_pde(pt, pd, pd_start, pd_len, temp2, pde) {
+			uint64_t bound = gen8_bound_pt(pd_start, pd_len);
+			int ret = alloc_pt_range(pd,
+						 gen8_pde_index(pd_start),
+						 gen8_pde_index(bound),
+						 ppgtt->base.dev);
+			if (ret) {
+				//gen8_free_page_tables(pd, pd_start, pd_len, dev);
+			}
+
+		}
+	}
+}
+
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-			    const int max_pdp)
+			    uint64_t start, uint64_t length)
 {
 	int ret;
 
-	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	ret = gen8_ppgtt_allocate_page_directories(ppgtt, start, length);
 	if (ret)
 		return ret;
 
@@ -579,7 +634,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 	if (ret)
 		goto err_out;
 
-	ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
+	ppgtt->num_pd_entries = length >> GEN8_PDE_SHIFT;
 
 	return 0;
 
@@ -605,11 +660,8 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	const int min_pt_pages = I915_PDES_PER_PD * max_pdp;
 	int i, j, ret;
 
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
-
 	/* 1. Do all our allocations for page directories and page tables. */
-	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	ret = gen8_ppgtt_alloc(ppgtt, 0, size);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 6130f3d..91f8a36 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -192,6 +192,43 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
 	BUG();
 }
 
+/* Either rounds down length to the nearest page table VA boundary, or returns
+ * length
+ */
+static inline uint64_t gen8_bound_pt(uint64_t start, uint64_t length)
+{
+	uint64_t next_pt = ALIGN(start + 1, 1 << GEN8_PDE_SHIFT);
+	if (next_pt > (start + length))
+		return length;
+
+	return next_pt - start;
+}
+
+static inline uint64_t gen8_bound_pd(uint64_t start, uint64_t length)
+{
+	uint64_t next_pt = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+	if (next_pt > (start + length))
+		return length;
+
+	return next_pt - start;
+}
+
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter) \
+	for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+	     length > 0 && iter < I915_PDES_PER_PD; \
+	     pt = (pd)->page_tables[++iter], \
+	     temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start, \
+	     temp = min(temp, length), \
+	     start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
+	for (iter = gen8_pdpe_index(start), pd = (pdp)->pagedir[iter]; \
+	     length > 0 && iter < GEN8_LEGACY_PDPS; \
+	     pd = (pdp)->pagedir[iter++], \
+	     temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start, \
+	     temp = min(temp, length), \
+	     start += temp, length -= temp)
+
 enum i915_cache_level;
 /**
  * A VMA represents a GEM BO that is bound into an address space. Therefore, a
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/26] drm/i915: Extract switch to default context
  2014-03-18  5:48 ` [PATCH 02/26] drm/i915: Extract switch to default context Ben Widawsky
@ 2014-03-18  8:38   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  8:38 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:34PM -0700, Ben Widawsky wrote:
> This patch existed for another reason which no longer exists. I liked
> it, so I kept it in the series. It can skipped if undesirable.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h | 2 ++
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 35f9a37..c59b707 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2476,6 +2476,8 @@ int i915_gem_context_enable(struct drm_i915_private *dev_priv);
>  void i915_gem_context_close(struct drm_device *dev, struct drm_file *file);
>  int i915_switch_context(struct intel_ring_buffer *ring,
>  			struct drm_file *file, struct i915_hw_context *to);
> +#define i915_switch_to_default(ring) \
> +	i915_switch_context(ring, NULL, ring->default_context)
>  struct i915_hw_context *
>  i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id);
>  void i915_gem_context_free(struct kref *ctx_ref);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b2565d2..ed09dda 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2799,7 +2799,7 @@ int i915_gpu_idle(struct drm_device *dev)
>  
>  	/* Flush everything onto the inactive list. */
>  	for_each_ring(ring, dev_priv, i) {
> -		ret = i915_switch_context(ring, NULL, ring->default_context);
> +		ret = i915_switch_to_default(ring);

Switch what to default? What it does is not very clear, sorry.
Skip unless we change it to i915_switch_to_default_context()?
intel_ring_switch_to_default_context()? Doesn't seem like much of a win
by that point. :|
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/26] drm/i915: rename map/unmap to dma_map/unmap
  2014-03-18  5:48 ` [PATCH 04/26] drm/i915: rename map/unmap to dma_map/unmap Ben Widawsky
@ 2014-03-18  8:40   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  8:40 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:36PM -0700, Ben Widawsky wrote:
> Upcoming patches will use the terms map and unmap in references to the
> page table entries. Having this distinction will really help with code
> clarity at that point.

Nice.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 06/26] drm/i915: Wrap VMA binding
  2014-03-18  5:48 ` [PATCH 06/26] drm/i915: Wrap VMA binding Ben Widawsky
@ 2014-03-18  8:42   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  8:42 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:38PM -0700, Ben Widawsky wrote:
> This will be useful for some upcoming patches which do more platform
> specific work. Having it in one central place just makes things a bit
> cleaner and easier.
> 
> There is a small functional change here. There are more calls to the
> tracepoints.
> 
> NOTE: I didn't actually end up using this patch for the intended purpose, but I
> thought it was a nice patch to keep around.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  3 +++
>  drivers/gpu/drm/i915/i915_gem.c            |  8 ++++----
>  drivers/gpu/drm/i915/i915_gem_context.c    |  2 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++--
>  drivers/gpu/drm/i915/i915_gem_gtt.c        | 16 ++++++++++++++--
>  5 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c59b707..b3e31fd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2408,6 +2408,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
>  			struct i915_address_space *vm);
>  unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
>  				struct i915_address_space *vm);
> +void i915_gem_bind_vma(struct i915_vma *vma, enum i915_cache_level,
> +		       unsigned flags);
> +void i915_gem_unbind_vma(struct i915_vma *vma);

Being pedantic, this should be i915_vma_bind, i915_vma_unbind.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/26] drm/i915: clean up PPGTT init error path
  2014-03-18  5:48 ` [PATCH 07/26] drm/i915: clean up PPGTT init error path Ben Widawsky
@ 2014-03-18  8:44   ` Chris Wilson
  2014-03-22 19:43     ` Ben Widawsky
  0 siblings, 1 reply; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  8:44 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:39PM -0700, Ben Widawsky wrote:
> The old code (I'm having trouble finding the commit) had a reason for
> doing things when there was an error, and would continue on, thus the
> !ret. For the newer code however, this looks completely silly.
> 
> Follow the normal idiom of if (ret) return ret.
> 
> Also, put the pde wiring in the gen specific init, now that GEN8 exists.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++-------------
>  1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 1620211..5f73284 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1202,6 +1202,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->pd_offset =
>  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
>  
> +	gen6_write_pdes(ppgtt);
> +
>  	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>  
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> @@ -1226,20 +1228,14 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
>  	else
>  		BUG();
>  
> -	if (!ret) {
> -		struct drm_i915_private *dev_priv = dev->dev_private;
> -		kref_init(&ppgtt->ref);
> -		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
> -			    ppgtt->base.total);
> -		i915_init_vm(dev_priv, &ppgtt->base);
> -		if (INTEL_INFO(dev)->gen < 8) {
> -			gen6_write_pdes(ppgtt);
> -			DRM_DEBUG("Adding PPGTT at offset %x\n",
> -				  ppgtt->pd_offset << 10);
> -		}
> -	}
> +	if (ret)
> +		return ret;
>  
> -	return ret;
> +	kref_init(&ppgtt->ref);
> +	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
> +	i915_init_vm(dev_priv, &ppgtt->base);

Didn't we just delete the dev_priv local variable?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/26] drm/i915: Split out gtt specific header file
  2014-03-18  5:48 ` [PATCH 09/26] drm/i915: Split out gtt specific header file Ben Widawsky
@ 2014-03-18  8:46   ` Chris Wilson
  2014-03-18  9:15   ` Daniel Vetter
  1 sibling, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  8:46 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:41PM -0700, Ben Widawsky wrote:
> TODO: Do header files need a copyright?

Short answer: yes.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/26] drm/i915: Make gen6_write_pdes gen6_map_page_tables
  2014-03-18  5:48 ` [PATCH 10/26] drm/i915: Make gen6_write_pdes gen6_map_page_tables Ben Widawsky
@ 2014-03-18  8:48   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  8:48 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:42PM -0700, Ben Widawsky wrote:
> Split out single mappings which will help with upcoming work. Also while
> here, rename the function because it is a better description - but this
> function is going away soon.

At this moment, I'm not sure about the name genX_map_single(). Maybe it
will make sense in a bit...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 11/26] drm/i915: Range clearing is PPGTT agnostic
  2014-03-18  5:48 ` [PATCH 11/26] drm/i915: Range clearing is PPGTT agnostic Ben Widawsky
@ 2014-03-18  8:50   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  8:50 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:43PM -0700, Ben Widawsky wrote:
> Therefore we can do it from our general init function. Eventually, I
> hope to have a lot more commonality like this. It won't arrive yet, but
> this was a nice easy one.

Lgtm.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/26] drm/i915: Page table helpers, and define renames
  2014-03-18  5:48 ` [PATCH 12/26] drm/i915: Page table helpers, and define renames Ben Widawsky
@ 2014-03-18  9:05   ` Chris Wilson
  2014-03-18 18:29     ` Jesse Barnes
  0 siblings, 1 reply; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  9:05 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:44PM -0700, Ben Widawsky wrote:
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -1,8 +1,11 @@
>  #ifndef _I915_GEM_GTT_H
>  #define _I915_GEM_GTT_H
>  
> -#define GEN6_PPGTT_PD_ENTRIES 512
> -#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> +/* GEN Agnostic defines */
> +#define I915_PDES_PER_PD		512
> +#define I915_PTE_MASK			(PAGE_SHIFT-1)

That looks decidely fishy.

PAGE_SHIFT is 12 -> PTE_MASK = 0xb

> +#define I915_PDE_MASK			(I915_PDES_PER_PD-1)
> +
>  typedef uint32_t gen6_gtt_pte_t;
>  typedef uint64_t gen8_gtt_pte_t;
>  typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> @@ -23,6 +26,98 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
>  #define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
>  
> +
> +/* GEN6 PPGTT resembles a 2 level page table:
> + * 31:22 | 21:12 |  11:0
> + *  PDE  |  PTE  | offset
> + */
> +#define GEN6_PDE_SHIFT			22
> +#define GEN6_PTES_PER_PT		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> +
> +static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
> +{
> +	const uint32_t mask = (1 << (pde_shift - PAGE_SHIFT)) - 1;
> +	return (address >> PAGE_SHIFT) & mask;
> +}
> +
> +/* Helper to counts the number of PTEs within the given length. This count does
> + * not cross a page table boundary, so the max value would be
> + * GEN6_PTES_PER_PT for GEN6, and GEN8_PTES_PER_PT for GEN8.
> + */
> +static inline size_t i915_pte_count(uint64_t addr, size_t length,
> +				    uint32_t pde_shift)
> +{
> +	const uint64_t pd_mask = ~((1 << pde_shift) - 1);
> +	uint64_t end;
> +
> +	if (WARN_ON(!length))
> +		return 0;
> +
> +	if (WARN_ON(addr % PAGE_SIZE))
> +		addr = round_down(addr, PAGE_SIZE);
> +
> +	if (WARN_ON(length % PAGE_SIZE))
> +		length = round_up(length, PAGE_SIZE);

Oh oh. I think these fixups are very suspect, so just
BUG_ON(length == 0);
BUG_ON(offset_in_page(addr|length));

> +
> +	end = addr + length;
> +
> +	if ((addr & pd_mask) != (end & pd_mask))
> +		return (1 << (pde_shift - PAGE_SHIFT)) -

#define NUM_PTE(pde_shift) (1 << (pde_shift - PAGE_SHIFT))
here and for computing the pd_mask.

> +			i915_pte_index(addr, pde_shift);
> +
> +	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
> +}

Otherwise the helpers look a useful improvement in readability.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 14/26] drm/i915: Complete page table structures
  2014-03-18  5:48 ` [PATCH 14/26] drm/i915: Complete page table structures Ben Widawsky
@ 2014-03-18  9:09   ` Chris Wilson
  2014-03-22 20:10     ` Ben Widawsky
  0 siblings, 1 reply; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  9:09 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:46PM -0700, Ben Widawsky wrote:
> Move the remaining members over to the new page table structures.
> 
> This can be squashed with the previous commit if desire. The reasoning
> is the same as that patch. I simply felt it is easier to review if split.

I'm not liking the shorter names much. Is there precedence elsewhere
(e.g. daddr)?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] drm/i915: Create page table allocators
  2014-03-18  5:48 ` [PATCH 15/26] drm/i915: Create page table allocators Ben Widawsky
@ 2014-03-18  9:14   ` Chris Wilson
  2014-03-22 20:21     ` Ben Widawsky
  0 siblings, 1 reply; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  9:14 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:47PM -0700, Ben Widawsky wrote:
> As we move toward dynamic page table allocation, it becomes much easier
> to manage our data structures if break do things less coarsely by
> breaking up all of our actions into individual tasks.  This makes the
> code easier to write, read, and verify.
> 
> Aside from the dissection of the allocation functions, the patch
> statically allocates the page table structures without a page directory.
> This remains the same for all platforms,
> 
> The patch itself should not have much functional difference. The primary
> noticeable difference is the fact that page tables are no longer
> allocated, but rather statically declared as part of the page directory.
> This has non-zero overhead, but things gain non-trivial complexity as a
> result.

We increase overhead for increased complexity. What's the selling point
of this patch then?

Otherwise, patch does as you say.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/26] drm/i915: Split out gtt specific header file
  2014-03-18  5:48 ` [PATCH 09/26] drm/i915: Split out gtt specific header file Ben Widawsky
  2014-03-18  8:46   ` Chris Wilson
@ 2014-03-18  9:15   ` Daniel Vetter
  2014-03-22 19:44     ` Ben Widawsky
  1 sibling, 1 reply; 62+ messages in thread
From: Daniel Vetter @ 2014-03-18  9:15 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:41PM -0700, Ben Widawsky wrote:
> TODO: Do header files need a copyright?

Yup ;-)

I like this though, especially since finer-grained files will make
kerneldoc inclusion (well, grouped into sensible chapters at least) much
simpler.
-Daniel

> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h     | 162 +-------------------------
>  drivers/gpu/drm/i915/i915_gem_gtt.c |  57 ---------
>  drivers/gpu/drm/i915/i915_gem_gtt.h | 225 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 227 insertions(+), 217 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_gtt.h
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 084e82f..b19442c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -44,6 +44,8 @@
>  #include <linux/kref.h>
>  #include <linux/pm_qos.h>
>  
> +#include "i915_gem_gtt.h"
> +
>  /* General customization:
>   */
>  
> @@ -572,166 +574,6 @@ enum i915_cache_level {
>  	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
>  };
>  
> -typedef uint32_t gen6_gtt_pte_t;
> -
> -/**
> - * A VMA represents a GEM BO that is bound into an address space. Therefore, a
> - * VMA's presence cannot be guaranteed before binding, or after unbinding the
> - * object into/from the address space.
> - *
> - * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
> - * will always be <= an objects lifetime. So object refcounting should cover us.
> - */
> -struct i915_vma {
> -	struct drm_mm_node node;
> -	struct drm_i915_gem_object *obj;
> -	struct i915_address_space *vm;
> -
> -	/** This object's place on the active/inactive lists */
> -	struct list_head mm_list;
> -
> -	struct list_head vma_link; /* Link in the object's VMA list */
> -
> -	/** This vma's place in the batchbuffer or on the eviction list */
> -	struct list_head exec_list;
> -
> -	/**
> -	 * Used for performing relocations during execbuffer insertion.
> -	 */
> -	struct hlist_node exec_node;
> -	unsigned long exec_handle;
> -	struct drm_i915_gem_exec_object2 *exec_entry;
> -
> -	/**
> -	 * How many users have pinned this object in GTT space. The following
> -	 * users can each hold at most one reference: pwrite/pread, pin_ioctl
> -	 * (via user_pin_count), execbuffer (objects are not allowed multiple
> -	 * times for the same batchbuffer), and the framebuffer code. When
> -	 * switching/pageflipping, the framebuffer code has at most two buffers
> -	 * pinned per crtc.
> -	 *
> -	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
> -	 * bits with absolutely no headroom. So use 4 bits. */
> -	unsigned int pin_count:4;
> -#define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
> -
> -	/** Unmap an object from an address space. This usually consists of
> -	 * setting the valid PTE entries to a reserved scratch page. */
> -	void (*unbind_vma)(struct i915_vma *vma);
> -	/* Map an object into an address space with the given cache flags. */
> -#define GLOBAL_BIND (1<<0)
> -	void (*bind_vma)(struct i915_vma *vma,
> -			 enum i915_cache_level cache_level,
> -			 u32 flags);
> -};
> -
> -struct i915_address_space {
> -	struct drm_mm mm;
> -	struct drm_device *dev;
> -	struct list_head global_link;
> -	unsigned long start;		/* Start offset always 0 for dri2 */
> -	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
> -
> -	struct {
> -		dma_addr_t addr;
> -		struct page *page;
> -	} scratch;
> -
> -	/**
> -	 * List of objects currently involved in rendering.
> -	 *
> -	 * Includes buffers having the contents of their GPU caches
> -	 * flushed, not necessarily primitives.  last_rendering_seqno
> -	 * represents when the rendering involved will be completed.
> -	 *
> -	 * A reference is held on the buffer while on this list.
> -	 */
> -	struct list_head active_list;
> -
> -	/**
> -	 * LRU list of objects which are not in the ringbuffer and
> -	 * are ready to unbind, but are still in the GTT.
> -	 *
> -	 * last_rendering_seqno is 0 while an object is in this list.
> -	 *
> -	 * A reference is not held on the buffer while on this list,
> -	 * as merely being GTT-bound shouldn't prevent its being
> -	 * freed, and we'll pull it off the list in the free path.
> -	 */
> -	struct list_head inactive_list;
> -
> -	/* FIXME: Need a more generic return type */
> -	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
> -				     enum i915_cache_level level,
> -				     bool valid); /* Create a valid PTE */
> -	void (*clear_range)(struct i915_address_space *vm,
> -			    uint64_t start,
> -			    uint64_t length,
> -			    bool use_scratch);
> -	void (*insert_entries)(struct i915_address_space *vm,
> -			       struct sg_table *st,
> -			       uint64_t start,
> -			       enum i915_cache_level cache_level);
> -	void (*cleanup)(struct i915_address_space *vm);
> -};
> -
> -/* The Graphics Translation Table is the way in which GEN hardware translates a
> - * Graphics Virtual Address into a Physical Address. In addition to the normal
> - * collateral associated with any va->pa translations GEN hardware also has a
> - * portion of the GTT which can be mapped by the CPU and remain both coherent
> - * and correct (in cases like swizzling). That region is referred to as GMADR in
> - * the spec.
> - */
> -struct i915_gtt {
> -	struct i915_address_space base;
> -	size_t stolen_size;		/* Total size of stolen memory */
> -
> -	unsigned long mappable_end;	/* End offset that we can CPU map */
> -	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
> -	phys_addr_t mappable_base;	/* PA of our GMADR */
> -
> -	/** "Graphics Stolen Memory" holds the global PTEs */
> -	void __iomem *gsm;
> -
> -	bool do_idle_maps;
> -
> -	int mtrr;
> -
> -	/* global gtt ops */
> -	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
> -			  size_t *stolen, phys_addr_t *mappable_base,
> -			  unsigned long *mappable_end);
> -};
> -#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> -
> -#define GEN8_LEGACY_PDPS 4
> -struct i915_hw_ppgtt {
> -	struct i915_address_space base;
> -	struct kref ref;
> -	struct drm_mm_node node;
> -	unsigned num_pd_entries;
> -	unsigned num_pd_pages; /* gen8+ */
> -	union {
> -		struct page **pt_pages;
> -		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> -	};
> -	struct page *pd_pages;
> -	union {
> -		uint32_t pd_offset;
> -		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> -	};
> -	union {
> -		dma_addr_t *pt_dma_addr;
> -		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
> -	};
> -
> -	int (*enable)(struct i915_hw_ppgtt *ppgtt);
> -	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
> -			 struct intel_ring_buffer *ring,
> -			 bool synchronous);
> -	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
> -};
> -
>  struct i915_ctx_hang_stats {
>  	/* This context had batch pending when hang was declared */
>  	unsigned batch_pending;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 5f73284..a239196 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -53,60 +53,6 @@ bool intel_enable_ppgtt(struct drm_device *dev, bool full)
>  		return HAS_ALIASING_PPGTT(dev);
>  }
>  
> -#define GEN6_PPGTT_PD_ENTRIES 512
> -#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> -typedef uint64_t gen8_gtt_pte_t;
> -typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> -
> -/* PPGTT stuff */
> -#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
> -#define HSW_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0x7f0))
> -
> -#define GEN6_PDE_VALID			(1 << 0)
> -/* gen6+ has bit 11-4 for physical addr bit 39-32 */
> -#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> -
> -#define GEN6_PTE_VALID			(1 << 0)
> -#define GEN6_PTE_UNCACHED		(1 << 1)
> -#define HSW_PTE_UNCACHED		(0)
> -#define GEN6_PTE_CACHE_LLC		(2 << 1)
> -#define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
> -#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> -#define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
> -
> -/* Cacheability Control is a 4-bit value. The low three bits are stored in *
> - * bits 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
> - */
> -#define HSW_CACHEABILITY_CONTROL(bits)	((((bits) & 0x7) << 1) | \
> -					 (((bits) & 0x8) << (11 - 3)))
> -#define HSW_WB_LLC_AGE3			HSW_CACHEABILITY_CONTROL(0x2)
> -#define HSW_WB_LLC_AGE0			HSW_CACHEABILITY_CONTROL(0x3)
> -#define HSW_WB_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0xb)
> -#define HSW_WB_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x8)
> -#define HSW_WT_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0x6)
> -#define HSW_WT_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x7)
> -
> -#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> -#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> -
> -/* GEN8 legacy style addressis defined as a 3 level page table:
> - * 31:30 | 29:21 | 20:12 |  11:0
> - * PDPE  |  PDE  |  PTE  | offset
> - * The difference as compared to normal x86 3 level page table is the PDPEs are
> - * programmed via register.
> - */
> -#define GEN8_PDPE_SHIFT			30
> -#define GEN8_PDPE_MASK			0x3
> -#define GEN8_PDE_SHIFT			21
> -#define GEN8_PDE_MASK			0x1ff
> -#define GEN8_PTE_SHIFT			12
> -#define GEN8_PTE_MASK			0x1ff
> -
> -#define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
> -#define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> -#define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
> -#define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
> -
>  static void ppgtt_bind_vma(struct i915_vma *vma,
>  			   enum i915_cache_level cache_level,
>  			   u32 flags);
> @@ -185,9 +131,6 @@ static gen6_gtt_pte_t ivb_pte_encode(dma_addr_t addr,
>  	return pte;
>  }
>  
> -#define BYT_PTE_WRITEABLE		(1 << 1)
> -#define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
> -
>  static gen6_gtt_pte_t byt_pte_encode(dma_addr_t addr,
>  				     enum i915_cache_level level,
>  				     bool valid)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> new file mode 100644
> index 0000000..c8d5c77
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -0,0 +1,225 @@
> +#ifndef _I915_GEM_GTT_H
> +#define _I915_GEM_GTT_H
> +
> +#define GEN6_PPGTT_PD_ENTRIES 512
> +#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> +typedef uint32_t gen6_gtt_pte_t;
> +typedef uint64_t gen8_gtt_pte_t;
> +typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> +
> +/* PPGTT stuff */
> +#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
> +#define HSW_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0x7f0))
> +
> +#define GEN6_PDE_VALID			(1 << 0)
> +/* gen6+ has bit 11-4 for physical addr bit 39-32 */
> +#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> +
> +#define GEN6_PTE_VALID			(1 << 0)
> +#define GEN6_PTE_UNCACHED		(1 << 1)
> +#define HSW_PTE_UNCACHED		(0)
> +#define GEN6_PTE_CACHE_LLC		(2 << 1)
> +#define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
> +#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> +#define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
> +
> +#define BYT_PTE_WRITEABLE		(1 << 1)
> +#define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
> +
> +/* Cacheability Control is a 4-bit value. The low three bits are stored in *
> + * bits 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
> + */
> +#define HSW_CACHEABILITY_CONTROL(bits)	((((bits) & 0x7) << 1) | \
> +					 (((bits) & 0x8) << (11 - 3)))
> +#define HSW_WB_LLC_AGE3			HSW_CACHEABILITY_CONTROL(0x2)
> +#define HSW_WB_LLC_AGE0			HSW_CACHEABILITY_CONTROL(0x3)
> +#define HSW_WB_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0xb)
> +#define HSW_WB_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x8)
> +#define HSW_WT_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0x6)
> +#define HSW_WT_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x7)
> +
> +#define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
> +#define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> +#define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
> +#define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
> +
> +#define GEN8_LEGACY_PDPS		4
> +#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> +#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> +
> +/* GEN8 legacy style addressis defined as a 3 level page table:
> + * 31:30 | 29:21 | 20:12 |  11:0
> + * PDPE  |  PDE  |  PTE  | offset
> + * The difference as compared to normal x86 3 level page table is the PDPEs are
> + * programmed via register.
> + *
> + * The x86 pagetable code is flexible in its ability to handle varying page
> + * table depths via abstracted PGDIR/PUD/PMD/PTE. I've opted to not do this and
> + * instead replicate the interesting functionality.
> + */
> +#define GEN8_PDPE_SHIFT			30
> +#define GEN8_PDPE_MASK			0x3
> +#define GEN8_PDE_SHIFT			21
> +#define GEN8_PDE_MASK			0x1ff
> +#define GEN8_PTE_SHIFT			12
> +#define GEN8_PTE_MASK			0x1ff
> +
> +enum i915_cache_level;
> +/**
> + * A VMA represents a GEM BO that is bound into an address space. Therefore, a
> + * VMA's presence cannot be guaranteed before binding, or after unbinding the
> + * object into/from the address space.
> + *
> + * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
> + * will always be <= an objects lifetime. So object refcounting should cover us.
> + */
> +struct i915_vma {
> +	struct drm_mm_node node;
> +	struct drm_i915_gem_object *obj;
> +	struct i915_address_space *vm;
> +
> +	/** This object's place on the active/inactive lists */
> +	struct list_head mm_list;
> +
> +	struct list_head vma_link; /* Link in the object's VMA list */
> +
> +	/** This vma's place in the batchbuffer or on the eviction list */
> +	struct list_head exec_list;
> +
> +	/**
> +	 * Used for performing relocations during execbuffer insertion.
> +	 */
> +	struct hlist_node exec_node;
> +	unsigned long exec_handle;
> +	struct drm_i915_gem_exec_object2 *exec_entry;
> +
> +	/**
> +	 * How many users have pinned this object in GTT space. The following
> +	 * users can each hold at most one reference: pwrite/pread, pin_ioctl
> +	 * (via user_pin_count), execbuffer (objects are not allowed multiple
> +	 * times for the same batchbuffer), and the framebuffer code. When
> +	 * switching/pageflipping, the framebuffer code has at most two buffers
> +	 * pinned per crtc.
> +	 *
> +	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
> +	 * bits with absolutely no headroom. So use 4 bits. */
> +	unsigned int pin_count:4;
> +#define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
> +
> +	/** Unmap an object from an address space. This usually consists of
> +	 * setting the valid PTE entries to a reserved scratch page. */
> +	void (*unbind_vma)(struct i915_vma *vma);
> +	/* Map an object into an address space with the given cache flags. */
> +#define GLOBAL_BIND (1<<0)
> +	void (*bind_vma)(struct i915_vma *vma,
> +			 enum i915_cache_level cache_level,
> +			 u32 flags);
> +};
> +
> +struct i915_address_space {
> +	struct drm_mm mm;
> +	struct drm_device *dev;
> +	struct list_head global_link;
> +	unsigned long start;		/* Start offset always 0 for dri2 */
> +	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
> +
> +	struct {
> +		dma_addr_t addr;
> +		struct page *page;
> +	} scratch;
> +
> +	/**
> +	 * List of objects currently involved in rendering.
> +	 *
> +	 * Includes buffers having the contents of their GPU caches
> +	 * flushed, not necessarily primitives.  last_rendering_seqno
> +	 * represents when the rendering involved will be completed.
> +	 *
> +	 * A reference is held on the buffer while on this list.
> +	 */
> +	struct list_head active_list;
> +
> +	/**
> +	 * LRU list of objects which are not in the ringbuffer and
> +	 * are ready to unbind, but are still in the GTT.
> +	 *
> +	 * last_rendering_seqno is 0 while an object is in this list.
> +	 *
> +	 * A reference is not held on the buffer while on this list,
> +	 * as merely being GTT-bound shouldn't prevent its being
> +	 * freed, and we'll pull it off the list in the free path.
> +	 */
> +	struct list_head inactive_list;
> +
> +	/* FIXME: Need a more generic return type */
> +	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
> +				     enum i915_cache_level level,
> +				     bool valid); /* Create a valid PTE */
> +	void (*clear_range)(struct i915_address_space *vm,
> +			    uint64_t start,
> +			    uint64_t length,
> +			    bool use_scratch);
> +	void (*insert_entries)(struct i915_address_space *vm,
> +			       struct sg_table *st,
> +			       uint64_t start,
> +			       enum i915_cache_level cache_level);
> +	void (*cleanup)(struct i915_address_space *vm);
> +};
> +
> +/* The Graphics Translation Table is the way in which GEN hardware translates a
> + * Graphics Virtual Address into a Physical Address. In addition to the normal
> + * collateral associated with any va->pa translations GEN hardware also has a
> + * portion of the GTT which can be mapped by the CPU and remain both coherent
> + * and correct (in cases like swizzling). That region is referred to as GMADR in
> + * the spec.
> + */
> +struct i915_gtt {
> +	struct i915_address_space base;
> +	size_t stolen_size;		/* Total size of stolen memory */
> +
> +	unsigned long mappable_end;	/* End offset that we can CPU map */
> +	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
> +	phys_addr_t mappable_base;	/* PA of our GMADR */
> +
> +	/** "Graphics Stolen Memory" holds the global PTEs */
> +	void __iomem *gsm;
> +
> +	bool do_idle_maps;
> +
> +	int mtrr;
> +
> +	/* global gtt ops */
> +	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
> +			  size_t *stolen, phys_addr_t *mappable_base,
> +			  unsigned long *mappable_end);
> +};
> +#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> +
> +struct i915_hw_ppgtt {
> +	struct i915_address_space base;
> +	struct kref ref;
> +	struct drm_mm_node node;
> +	unsigned num_pd_entries;
> +	unsigned num_pd_pages; /* gen8+ */
> +	union {
> +		struct page **pt_pages;
> +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> +	};
> +	struct page *pd_pages;
> +	union {
> +		uint32_t pd_offset;
> +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> +	};
> +	union {
> +		dma_addr_t *pt_dma_addr;
> +		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
> +	};
> +
> +	int (*enable)(struct i915_hw_ppgtt *ppgtt);
> +	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
> +			 struct intel_ring_buffer *ring,
> +			 bool synchronous);
> +	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
> +};
> +
> +#endif
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 16/26] drm/i915: Generalize GEN6 mapping
  2014-03-18  5:48 ` [PATCH 16/26] drm/i915: Generalize GEN6 mapping Ben Widawsky
@ 2014-03-18  9:22   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  9:22 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:48PM -0700, Ben Widawsky wrote:
> Having a more general way of doing mappings will allow the ability to
> easy map and unmap a specific page table. Specifically in this case, we
> pass down the page directory + entry, and the page table to map. This
> works similarly to the x86 code.
> 
> The same work will need to happen for GEN8. At that point I will try to
> combine functionality.

pt->daddr is quite close to pgtt->pd_addr (just arguing that I'm not
convinced by the choice of daddr naming)

> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 61 +++++++++++++++++++------------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  2 ++
>  2 files changed, 34 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 5c08cf9..35acccb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -663,18 +663,13 @@ bail:
>  
>  static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  {
> -	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
>  	struct i915_address_space *vm = &ppgtt->base;
> -	gen6_gtt_pte_t __iomem *pd_addr;
>  	gen6_gtt_pte_t scratch_pte;
>  	uint32_t pd_entry;
>  	int pte, pde;
>  
>  	scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
>  
> -	pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
> -		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
> -
>  	seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
>  		   ppgtt->pd.pd_offset,
>  		   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
> @@ -682,7 +677,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  		u32 expected;
>  		gen6_gtt_pte_t *pt_vaddr;
>  		dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
> -		pd_entry = readl(pd_addr + pde);
> +		pd_entry = readl(ppgtt->pd_addr + pde);
>  		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
>  
>  		if (pd_entry != expected)
> @@ -718,39 +713,43 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
>  	}
>  }
>  
> -static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
> -			    const unsigned pde_index,
> -			    dma_addr_t daddr)
> +/* Map pde (index) from the page directory @pd to the page table @pt */
> +static void gen6_map_single(struct i915_pagedir *pd,
> +			    const int pde, struct i915_pagetab *pt)
>  {
> -	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
> -	uint32_t pd_entry;
> -	gen6_gtt_pte_t __iomem *pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm;
> -	pd_addr	+= ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
> +	struct i915_hw_ppgtt *ppgtt =
> +		container_of(pd, struct i915_hw_ppgtt, pd);
> +	u32 pd_entry;
>  
> -	pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
> +	pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
>  	pd_entry |= GEN6_PDE_VALID;
>  
> -	writel(pd_entry, pd_addr + pde_index);
> +	writel(pd_entry, ppgtt->pd_addr + pde);
> +
> +	/* XXX: Caller needs to make sure the write completes if necessary */
>  }
>  
>  /* Map all the page tables found in the ppgtt structure to incrementing page
>   * directories. */
> -static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
> +static void gen6_map_page_range(struct drm_i915_private *dev_priv,
> +				struct i915_pagedir *pd, unsigned pde, size_t n)
>  {
> -	struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
> -	int i;
> +	if (WARN_ON(pde + n > I915_PDES_PER_PD))
> +		n = I915_PDES_PER_PD - pde;

I don't think the rest of the code is prepared for such errors.

> -	WARN_ON(ppgtt->pd.pd_offset & 0x3f);
> -	for (i = 0; i < ppgtt->num_pd_entries; i++)
> -		gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i]->daddr);
> +	n += pde;
> +
> +	for (; pde < n; pde++)
> +		gen6_map_single(pd, pde, pd->page_tables[pde]);
>  
> +	/* Make sure write is complete before other code can use this page
> +	 * table. Also require for WC mapped PTEs */
>  	readl(dev_priv->gtt.gsm);
>  }
>  
>  static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
>  {
>  	BUG_ON(ppgtt->pd.pd_offset & 0x3f);
> -
>  	return (ppgtt->pd.pd_offset / 64) << 16;
>  }
>  
> @@ -1184,7 +1183,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>  	ppgtt->pd.pd_offset =
>  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
>  
> -	gen6_map_page_tables(ppgtt);
> +	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
> +		ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);

Would this look simpler as
	ppgtt->pd_addr = (gen6_gtt_pte_t __iomem *)
		(dev_priv->gtt.gsm + ppgtt->pd.pd_offset);

Although the use of (gen6_gtt_pte_t __iomem*) looks wrong as
ppgtt->pd_addr can not be declared as that type.

> +
> +	gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
>  
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
>  			 ppgtt->node.size >> 20,
> @@ -1355,13 +1357,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>  
>  	list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
>  		/* TODO: Perhaps it shouldn't be gen6 specific */
> -		if (i915_is_ggtt(vm)) {
> -			if (dev_priv->mm.aliasing_ppgtt)
> -				gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
> -			continue;
> -		}
>  
> -		gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
> +		struct i915_hw_ppgtt *ppgtt =
> +			container_of(vm, struct i915_hw_ppgtt, base);
> +
> +		if (i915_is_ggtt(vm))
> +			ppgtt = dev_priv->mm.aliasing_ppgtt;
> +
> +		gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);

That's worth the hassle! :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 17/26] drm/i915: Clean up pagetable DMA map & unmap
  2014-03-18  5:48 ` [PATCH 17/26] drm/i915: Clean up pagetable DMA map & unmap Ben Widawsky
@ 2014-03-18  9:24   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  9:24 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:49PM -0700, Ben Widawsky wrote:
> Map and unmap are common operations across all generations for
> pagetables. With a simple helper, we can get a nice net code reduction
> as well as simplified complexity.
> 
> There is some room for optimization here, for instance with the multiple
> page mapping, that can be done in one pci_map operation. In that case
> however, the max value we'll ever see there is 512, and so I believe the
> simpler code makes this a worthwhile trade-off. Also, the range mapping
> functions are place holders to help transition the code. Eventually,
> mapping will only occur during a page allocation which will always be a
> discrete operation.

Nice. (Except still uneasy about that WARN_ON ;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 18/26] drm/i915: Always dma map page table allocations
  2014-03-18  5:48 ` [PATCH 18/26] drm/i915: Always dma map page table allocations Ben Widawsky
@ 2014-03-18  9:25   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  9:25 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:50PM -0700, Ben Widawsky wrote:
> There is never a case where we don't want to do it. Since we've broken
> up the allocations into nice clean helper functions, it's both easy and
> obvious to do the dma mapping at the same time.

Lvgtm.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 19/26] drm/i915: Consolidate dma mappings
  2014-03-18  5:48 ` [PATCH 19/26] drm/i915: Consolidate dma mappings Ben Widawsky
@ 2014-03-18  9:28   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  9:28 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:51PM -0700, Ben Widawsky wrote:
> With a little bit of macro magic, and the fact that every page
> table/dir/etc. we wish to map will have a page, and daddr member, we can
> greatly simplify and reduce code.
> 
> The patch introduces an i915_dma_map/unmap which has the same semantics
> as pci_map_page, but is 1 line, and doesn't require newlines, or local
> variables to make it fit cleanly.
> 
> Notice that even the page allocation shares this same attribute. For
> now, I am leaving that code untouched because the macro version would be
> a bit on the big side - but it's a nice cleanup as well (IMO)

Doesn't this make the error unwinding very fragile and likely to unmap a
pci_dma_mapping_error() cookie rather than the dma_addr_t?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 20/26] drm/i915: Always dma map page directory allocations
  2014-03-18  5:48 ` [PATCH 20/26] drm/i915: Always dma map page directory allocations Ben Widawsky
@ 2014-03-18  9:29   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-18  9:29 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:52PM -0700, Ben Widawsky wrote:
> Similar to the patch a few back in the series, we can always map and
> unmap page directories when we do their allocation and teardown. Page
> directory pages only exist on gen8+, so this should only effect behavior
> on those platforms.

Lgtm.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/26] drm/i915: Page table helpers, and define renames
  2014-03-18  9:05   ` Chris Wilson
@ 2014-03-18 18:29     ` Jesse Barnes
  2014-03-19  0:58       ` Ben Widawsky
  0 siblings, 1 reply; 62+ messages in thread
From: Jesse Barnes @ 2014-03-18 18:29 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel GFX, Ben Widawsky

On Tue, 18 Mar 2014 09:05:58 +0000
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Mon, Mar 17, 2014 at 10:48:44PM -0700, Ben Widawsky wrote:
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > @@ -1,8 +1,11 @@
> >  #ifndef _I915_GEM_GTT_H
> >  #define _I915_GEM_GTT_H
> >  
> > -#define GEN6_PPGTT_PD_ENTRIES 512
> > -#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> > +/* GEN Agnostic defines */
> > +#define I915_PDES_PER_PD		512
> > +#define I915_PTE_MASK			(PAGE_SHIFT-1)
> 
> That looks decidely fishy.
> 
> PAGE_SHIFT is 12 -> PTE_MASK = 0xb
> 
> > +#define I915_PDE_MASK			(I915_PDES_PER_PD-1)
> > +
> >  typedef uint32_t gen6_gtt_pte_t;
> >  typedef uint64_t gen8_gtt_pte_t;
> >  typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> > @@ -23,6 +26,98 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> >  #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> >  #define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
> >  
> > +
> > +/* GEN6 PPGTT resembles a 2 level page table:
> > + * 31:22 | 21:12 |  11:0
> > + *  PDE  |  PTE  | offset
> > + */
> > +#define GEN6_PDE_SHIFT			22
> > +#define GEN6_PTES_PER_PT		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> > +
> > +static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
> > +{
> > +	const uint32_t mask = (1 << (pde_shift - PAGE_SHIFT)) - 1;
> > +	return (address >> PAGE_SHIFT) & mask;
> > +}
> > +
> > +/* Helper to counts the number of PTEs within the given length. This count does
> > + * not cross a page table boundary, so the max value would be
> > + * GEN6_PTES_PER_PT for GEN6, and GEN8_PTES_PER_PT for GEN8.
> > + */
> > +static inline size_t i915_pte_count(uint64_t addr, size_t length,
> > +				    uint32_t pde_shift)
> > +{
> > +	const uint64_t pd_mask = ~((1 << pde_shift) - 1);
> > +	uint64_t end;
> > +
> > +	if (WARN_ON(!length))
> > +		return 0;
> > +
> > +	if (WARN_ON(addr % PAGE_SIZE))
> > +		addr = round_down(addr, PAGE_SIZE);
> > +
> > +	if (WARN_ON(length % PAGE_SIZE))
> > +		length = round_up(length, PAGE_SIZE);
> 
> Oh oh. I think these fixups are very suspect, so just
> BUG_ON(length == 0);
> BUG_ON(offset_in_page(addr|length));
> 
> > +
> > +	end = addr + length;
> > +
> > +	if ((addr & pd_mask) != (end & pd_mask))
> > +		return (1 << (pde_shift - PAGE_SHIFT)) -
> 
> #define NUM_PTE(pde_shift) (1 << (pde_shift - PAGE_SHIFT))
> here and for computing the pd_mask.
> 
> > +			i915_pte_index(addr, pde_shift);
> > +
> > +	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
> > +}
> 
> Otherwise the helpers look a useful improvement in readability.
> -Chris
> 

Can we use GTT_PAGE_SIZE here too?  I'm worried the kernel PAGE_SIZE
will change at some point and blow us up.  At least in places where
we're doing our own thing rather than using the x86 bits...

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 12/26] drm/i915: Page table helpers, and define renames
  2014-03-18 18:29     ` Jesse Barnes
@ 2014-03-19  0:58       ` Ben Widawsky
  0 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-19  0:58 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel GFX, Ben Widawsky

On Tue, Mar 18, 2014 at 11:29:58AM -0700, Jesse Barnes wrote:
> On Tue, 18 Mar 2014 09:05:58 +0000
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > On Mon, Mar 17, 2014 at 10:48:44PM -0700, Ben Widawsky wrote:
> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > > @@ -1,8 +1,11 @@
> > >  #ifndef _I915_GEM_GTT_H
> > >  #define _I915_GEM_GTT_H
> > >  
> > > -#define GEN6_PPGTT_PD_ENTRIES 512
> > > -#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> > > +/* GEN Agnostic defines */
> > > +#define I915_PDES_PER_PD		512
> > > +#define I915_PTE_MASK			(PAGE_SHIFT-1)
> > 
> > That looks decidely fishy.
> > 
> > PAGE_SHIFT is 12 -> PTE_MASK = 0xb
> > 

Thanks for catching this. I'll presume the define isn't even used.

> > > +#define I915_PDE_MASK			(I915_PDES_PER_PD-1)
> > > +
> > >  typedef uint32_t gen6_gtt_pte_t;
> > >  typedef uint64_t gen8_gtt_pte_t;
> > >  typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> > > @@ -23,6 +26,98 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> > >  #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> > >  #define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
> > >  
> > > +
> > > +/* GEN6 PPGTT resembles a 2 level page table:
> > > + * 31:22 | 21:12 |  11:0
> > > + *  PDE  |  PTE  | offset
> > > + */
> > > +#define GEN6_PDE_SHIFT			22
> > > +#define GEN6_PTES_PER_PT		(PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> > > +
> > > +static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
> > > +{
> > > +	const uint32_t mask = (1 << (pde_shift - PAGE_SHIFT)) - 1;
> > > +	return (address >> PAGE_SHIFT) & mask;
> > > +}
> > > +
> > > +/* Helper to counts the number of PTEs within the given length. This count does
> > > + * not cross a page table boundary, so the max value would be
> > > + * GEN6_PTES_PER_PT for GEN6, and GEN8_PTES_PER_PT for GEN8.
> > > + */
> > > +static inline size_t i915_pte_count(uint64_t addr, size_t length,
> > > +				    uint32_t pde_shift)
> > > +{
> > > +	const uint64_t pd_mask = ~((1 << pde_shift) - 1);
> > > +	uint64_t end;
> > > +
> > > +	if (WARN_ON(!length))
> > > +		return 0;
> > > +
> > > +	if (WARN_ON(addr % PAGE_SIZE))
> > > +		addr = round_down(addr, PAGE_SIZE);
> > > +
> > > +	if (WARN_ON(length % PAGE_SIZE))
> > > +		length = round_up(length, PAGE_SIZE);
> > 
> > Oh oh. I think these fixups are very suspect, so just
> > BUG_ON(length == 0);
> > BUG_ON(offset_in_page(addr|length));
> > 

I thought someone might have an issue with the BUG_ON. But I prefer it
as well.

> > > +
> > > +	end = addr + length;
> > > +
> > > +	if ((addr & pd_mask) != (end & pd_mask))
> > > +		return (1 << (pde_shift - PAGE_SHIFT)) -
> > 
> > #define NUM_PTE(pde_shift) (1 << (pde_shift - PAGE_SHIFT))
> > here and for computing the pd_mask.
> > 
> > > +			i915_pte_index(addr, pde_shift);
> > > +
> > > +	return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
> > > +}
> > 
> > Otherwise the helpers look a useful improvement in readability.
> > -Chris
> > 
> 
> Can we use GTT_PAGE_SIZE here too?  I'm worried the kernel PAGE_SIZE
> will change at some point and blow us up.  At least in places where
> we're doing our own thing rather than using the x86 bits...

That's fine with me. We have quite a few other places in our code which
depend on PAGE_SIZE being 4k though.

It's likely I'll be maintaining this branch myself for a while, but I'll
modify these both locally.

> 
> -- 
> Jesse Barnes, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 25/26] drm/i915: Print used ppgtt pages for gen6 in debugfs
  2014-03-18  5:48 ` [PATCH 25/26] drm/i915: Print used ppgtt pages for gen6 in debugfs Ben Widawsky
@ 2014-03-20 10:09   ` Chris Wilson
  2014-03-20 10:17   ` Chris Wilson
  1 sibling, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-20 10:09 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:57PM -0700, Ben Widawsky wrote:
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5f3666a..04d40fa 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1785,10 +1785,26 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verb
>  	}
>  }
>  
> +static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
> +{
> +	struct i915_pagedir *pd = &ppgtt->pd;
> +	struct i915_pagetab **pt = &pd->page_tables[0];
> +	size_t cnt = 0;
> +	int i;

How can the count be a size_t when cnt <= i  and i is only an int?

What was the reason for picking size_t here? Does that have far reaching
implications?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 25/26] drm/i915: Print used ppgtt pages for gen6 in debugfs
  2014-03-18  5:48 ` [PATCH 25/26] drm/i915: Print used ppgtt pages for gen6 in debugfs Ben Widawsky
  2014-03-20 10:09   ` Chris Wilson
@ 2014-03-20 10:17   ` Chris Wilson
  1 sibling, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-20 10:17 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:57PM -0700, Ben Widawsky wrote:
>  static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
>  {
>  	seq_printf(m, "%s:\n", name);
>  	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
> +	seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
Also

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 04d40fa..68fa91e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1804,7 +1804,7 @@ static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const c
 {
        seq_printf(m, "%s:\n", name);
        seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
-       seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
+       seq_printf(m, "\tpd pages: %zu / %d\n", gen6_ppgtt_count_pt_pages(ppgtt), ppgtt->num_pd_entries);
 }
 
 static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping
  2014-03-18  5:48 ` [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
@ 2014-03-20 11:57   ` Chris Wilson
  2014-03-20 12:08     ` Chris Wilson
  0 siblings, 1 reply; 62+ messages in thread
From: Chris Wilson @ 2014-03-20 11:57 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:33PM -0700, Ben Widawsky wrote:
> There often is not enough memory to dump the full contents of the PPGTT.
> As a temporary bandage, to continue getting valuable basic PPGTT info,
> wrap the dangerous, memory hungry part inside of a new verbose version
> of the debugfs file.
> 
> Also while here we can split out the ppgtt print function so it's more
> reusable.
> 
> I'd really like to get ppgtt info into our error state, but I found it too
> difficult to make work in the limited time I have. Maybe Mika can find a way.

ppgtt_info is not printing out the user contexts - those are the
most interesting ones that should get dynamically allocated.

=0 ickle:/usr/src/linux$ git diff | cat
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 04d40fa..442a075 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1749,17 +1749,6 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	return 0;
 }
 
-static int per_file_ctx(int id, void *ptr, void *data)
-{
-	struct i915_hw_context *ctx = ptr;
-	struct seq_file *m = data;
-	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
-
-	ppgtt->debug_dump(ppgtt, m);
-
-	return 0;
-}
-
 static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1804,7 +1793,21 @@ static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const c
 {
 	seq_printf(m, "%s:\n", name);
 	seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
-	seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
+	seq_printf(m, "\tpd pages: %zu / %d\n", gen6_ppgtt_count_pt_pages(ppgtt), ppgtt->num_pd_entries);
+}
+
+static int per_file_ctx(int id, void *ptr, void *data)
+{
+	struct i915_hw_context *ctx = ptr;
+	struct seq_file *m = (void *)((unsigned long)data & ~1);
+	bool verbose = (unsigned long)data & 1;
+	struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
+
+	print_ppgtt(m, ppgtt, ctx->id == DEFAULT_CONTEXT_ID ? "Default context" : "User context");
+	if (verbose)
+		ppgtt->debug_dump(ppgtt, m);
+
+	return 0;
 }
 
 static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
@@ -1838,14 +1841,11 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
 
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct drm_i915_file_private *file_priv = file->driver_priv;
-		struct i915_hw_ppgtt *pvt_ppgtt;
 
-		pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
 		seq_printf(m, "proc: %s\n",
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-		print_ppgtt(m, pvt_ppgtt, "Default context");
-		if (verbose)
-			idr_for_each(&file_priv->context_idr, per_file_ctx, m);
+		idr_for_each(&file_priv->context_idr, per_file_ctx,
+			     (void *)((unsigned long)m | verbose));
 	}
 }
 

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping
  2014-03-20 11:57   ` Chris Wilson
@ 2014-03-20 12:08     ` Chris Wilson
  2014-03-22 18:13       ` Ben Widawsky
  0 siblings, 1 reply; 62+ messages in thread
From: Chris Wilson @ 2014-03-20 12:08 UTC (permalink / raw)
  To: Ben Widawsky, Intel GFX

On Thu, Mar 20, 2014 at 11:57:42AM +0000, Chris Wilson wrote:
>  static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
> @@ -1838,14 +1841,11 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
>  
>  	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
>  		struct drm_i915_file_private *file_priv = file->driver_priv;
> -		struct i915_hw_ppgtt *pvt_ppgtt;
>  
> -		pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
>  		seq_printf(m, "proc: %s\n",
>  			   get_pid_task(file->pid, PIDTYPE_PID)->comm);

And 
	seq_printf(m, "\nproc: %s\n",
for good measure

> -		print_ppgtt(m, pvt_ppgtt, "Default context");
> -		if (verbose)
> -			idr_for_each(&file_priv->context_idr, per_file_ctx, m);
> +		idr_for_each(&file_priv->context_idr, per_file_ctx,
> +			     (void *)((unsigned long)m | verbose));
>  	}
>  }
>  
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 24/26] drm/i915: Finish gen6/7 dynamic page table allocation
  2014-03-18  5:48 ` [PATCH 24/26] drm/i915: Finish gen6/7 dynamic page table allocation Ben Widawsky
@ 2014-03-20 12:15   ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-20 12:15 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:56PM -0700, Ben Widawsky wrote:
> +static DECLARE_BITMAP(new_page_tables, I915_PDES_PER_PD);

It is only 64 bytes, I think we can accommodate that on stack.

Otherwise, I could barely find anything to quibble about.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 00/26] [RFCish] GEN7 dynamic page tables
  2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
                   ` (25 preceding siblings ...)
  2014-03-18  5:48 ` [PATCH 26/26] FOR REFERENCE ONLY Ben Widawsky
@ 2014-03-20 12:17 ` Chris Wilson
  26 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-20 12:17 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Mon, Mar 17, 2014 at 10:48:32PM -0700, Ben Widawsky wrote:
> Okay, so what does this do?
> The patch series /dynamicizes/ page table allocation and teardown for
> GEN7. It also starts to introduce GEN8, but the tricky stuff is still
> not done. Up until now, all our page tables are pre-allocated when the
> address space is created. That's actually okay for current GENs since we
> don't use many address spaces, and the page tables occupy only 2MB each.
> However, on GEN8 we can use a deeper page table, and to preallocate such
> an address space would be very costly. This work was done for GEN7 first
> because this is the most well tested with full PPGTT, and stable
> platforms are readily available.
> 
> In this patch series, I've demonstrated how we will manage tracking used
> page tables (bitmaps), and broken things out into much more discrete
> functions. I'm hoping I'll get feedback on the way I've implemented
> things (primarily if it seems fundamentally flawed in any way). The real
> goal was to prove out the dynamic allocation so we can begin to enable
> GEN8 in the same way. I'll emphasize now that I put in a lot of effort
> limit risk with each patch, and this does result in some excess churn.

I like it.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping
  2014-03-20 12:08     ` Chris Wilson
@ 2014-03-22 18:13       ` Ben Widawsky
  2014-03-22 20:59         ` Chris Wilson
  0 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-22 18:13 UTC (permalink / raw)
  To: Chris Wilson, Ben Widawsky, Intel GFX

On Thu, Mar 20, 2014 at 12:08:00PM +0000, Chris Wilson wrote:
> On Thu, Mar 20, 2014 at 11:57:42AM +0000, Chris Wilson wrote:
> >  static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
> > @@ -1838,14 +1841,11 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
> >  
> >  	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
> >  		struct drm_i915_file_private *file_priv = file->driver_priv;
> > -		struct i915_hw_ppgtt *pvt_ppgtt;
> >  
> > -		pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
> >  		seq_printf(m, "proc: %s\n",
> >  			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
> 
> And 
> 	seq_printf(m, "\nproc: %s\n",
> for good measure
> 
> > -		print_ppgtt(m, pvt_ppgtt, "Default context");
> > -		if (verbose)
> > -			idr_for_each(&file_priv->context_idr, per_file_ctx, m);
> > +		idr_for_each(&file_priv->context_idr, per_file_ctx,
> > +			     (void *)((unsigned long)m | verbose));
> >  	}
> >  }
> >  
> > 
> > -- 

Thanks, I like it. I'm assuming you didn't want the count_pt_pages stuck
in at this point (your diff was based on the end result)? I can do that
if you prefer it. It seems pointless to me though.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/26] drm/i915: clean up PPGTT init error path
  2014-03-18  8:44   ` Chris Wilson
@ 2014-03-22 19:43     ` Ben Widawsky
  2014-03-22 20:58       ` Chris Wilson
  0 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-22 19:43 UTC (permalink / raw)
  To: Chris Wilson, Ben Widawsky, Intel GFX

On Tue, Mar 18, 2014 at 08:44:28AM +0000, Chris Wilson wrote:
> On Mon, Mar 17, 2014 at 10:48:39PM -0700, Ben Widawsky wrote:
> > The old code (I'm having trouble finding the commit) had a reason for
> > doing things when there was an error, and would continue on, thus the
> > !ret. For the newer code however, this looks completely silly.
> > 
> > Follow the normal idiom of if (ret) return ret.
> > 
> > Also, put the pde wiring in the gen specific init, now that GEN8 exists.
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++-------------
> >  1 file changed, 9 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 1620211..5f73284 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -1202,6 +1202,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> >  	ppgtt->pd_offset =
> >  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
> >  
> > +	gen6_write_pdes(ppgtt);
> > +
> >  	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
> >  
> >  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> > @@ -1226,20 +1228,14 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
> >  	else
> >  		BUG();
> >  
> > -	if (!ret) {
> > -		struct drm_i915_private *dev_priv = dev->dev_private;
> > -		kref_init(&ppgtt->ref);
> > -		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
> > -			    ppgtt->base.total);
> > -		i915_init_vm(dev_priv, &ppgtt->base);
> > -		if (INTEL_INFO(dev)->gen < 8) {
> > -			gen6_write_pdes(ppgtt);
> > -			DRM_DEBUG("Adding PPGTT at offset %x\n",
> > -				  ppgtt->pd_offset << 10);
> > -		}
> > -	}
> > +	if (ret)
> > +		return ret;
> >  
> > -	return ret;
> > +	kref_init(&ppgtt->ref);
> > +	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
> > +	i915_init_vm(dev_priv, &ppgtt->base);
> 
> Didn't we just delete the dev_priv local variable?
> -Chris

The important part is that the pde writes moved. (The DRM debug is also
dropped). As for this code, I just wanted to get rid of the if (!ret)
block. It looks weird.

Maybe I didn't get what you're asking though.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/26] drm/i915: Split out gtt specific header file
  2014-03-18  9:15   ` Daniel Vetter
@ 2014-03-22 19:44     ` Ben Widawsky
  2014-03-23  0:46       ` Daniel Vetter
  0 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-22 19:44 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel GFX, Ben Widawsky

On Tue, Mar 18, 2014 at 10:15:56AM +0100, Daniel Vetter wrote:
> On Mon, Mar 17, 2014 at 10:48:41PM -0700, Ben Widawsky wrote:
> > TODO: Do header files need a copyright?
> 
> Yup ;-)
> 
> I like this though, especially since finer-grained files will make
> kerneldoc inclusion (well, grouped into sensible chapters at least) much
> simpler.
> -Daniel
> 

If I re-submit just this patch (with the copyright), will you merge it?
It will make my life so much easier.

> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h     | 162 +-------------------------
> >  drivers/gpu/drm/i915/i915_gem_gtt.c |  57 ---------
> >  drivers/gpu/drm/i915/i915_gem_gtt.h | 225 ++++++++++++++++++++++++++++++++++++
> >  3 files changed, 227 insertions(+), 217 deletions(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_gem_gtt.h
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 084e82f..b19442c 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -44,6 +44,8 @@
> >  #include <linux/kref.h>
> >  #include <linux/pm_qos.h>
> >  
> > +#include "i915_gem_gtt.h"
> > +
> >  /* General customization:
> >   */
> >  
> > @@ -572,166 +574,6 @@ enum i915_cache_level {
> >  	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
> >  };
> >  
> > -typedef uint32_t gen6_gtt_pte_t;
> > -
> > -/**
> > - * A VMA represents a GEM BO that is bound into an address space. Therefore, a
> > - * VMA's presence cannot be guaranteed before binding, or after unbinding the
> > - * object into/from the address space.
> > - *
> > - * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
> > - * will always be <= an objects lifetime. So object refcounting should cover us.
> > - */
> > -struct i915_vma {
> > -	struct drm_mm_node node;
> > -	struct drm_i915_gem_object *obj;
> > -	struct i915_address_space *vm;
> > -
> > -	/** This object's place on the active/inactive lists */
> > -	struct list_head mm_list;
> > -
> > -	struct list_head vma_link; /* Link in the object's VMA list */
> > -
> > -	/** This vma's place in the batchbuffer or on the eviction list */
> > -	struct list_head exec_list;
> > -
> > -	/**
> > -	 * Used for performing relocations during execbuffer insertion.
> > -	 */
> > -	struct hlist_node exec_node;
> > -	unsigned long exec_handle;
> > -	struct drm_i915_gem_exec_object2 *exec_entry;
> > -
> > -	/**
> > -	 * How many users have pinned this object in GTT space. The following
> > -	 * users can each hold at most one reference: pwrite/pread, pin_ioctl
> > -	 * (via user_pin_count), execbuffer (objects are not allowed multiple
> > -	 * times for the same batchbuffer), and the framebuffer code. When
> > -	 * switching/pageflipping, the framebuffer code has at most two buffers
> > -	 * pinned per crtc.
> > -	 *
> > -	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
> > -	 * bits with absolutely no headroom. So use 4 bits. */
> > -	unsigned int pin_count:4;
> > -#define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
> > -
> > -	/** Unmap an object from an address space. This usually consists of
> > -	 * setting the valid PTE entries to a reserved scratch page. */
> > -	void (*unbind_vma)(struct i915_vma *vma);
> > -	/* Map an object into an address space with the given cache flags. */
> > -#define GLOBAL_BIND (1<<0)
> > -	void (*bind_vma)(struct i915_vma *vma,
> > -			 enum i915_cache_level cache_level,
> > -			 u32 flags);
> > -};
> > -
> > -struct i915_address_space {
> > -	struct drm_mm mm;
> > -	struct drm_device *dev;
> > -	struct list_head global_link;
> > -	unsigned long start;		/* Start offset always 0 for dri2 */
> > -	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
> > -
> > -	struct {
> > -		dma_addr_t addr;
> > -		struct page *page;
> > -	} scratch;
> > -
> > -	/**
> > -	 * List of objects currently involved in rendering.
> > -	 *
> > -	 * Includes buffers having the contents of their GPU caches
> > -	 * flushed, not necessarily primitives.  last_rendering_seqno
> > -	 * represents when the rendering involved will be completed.
> > -	 *
> > -	 * A reference is held on the buffer while on this list.
> > -	 */
> > -	struct list_head active_list;
> > -
> > -	/**
> > -	 * LRU list of objects which are not in the ringbuffer and
> > -	 * are ready to unbind, but are still in the GTT.
> > -	 *
> > -	 * last_rendering_seqno is 0 while an object is in this list.
> > -	 *
> > -	 * A reference is not held on the buffer while on this list,
> > -	 * as merely being GTT-bound shouldn't prevent its being
> > -	 * freed, and we'll pull it off the list in the free path.
> > -	 */
> > -	struct list_head inactive_list;
> > -
> > -	/* FIXME: Need a more generic return type */
> > -	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
> > -				     enum i915_cache_level level,
> > -				     bool valid); /* Create a valid PTE */
> > -	void (*clear_range)(struct i915_address_space *vm,
> > -			    uint64_t start,
> > -			    uint64_t length,
> > -			    bool use_scratch);
> > -	void (*insert_entries)(struct i915_address_space *vm,
> > -			       struct sg_table *st,
> > -			       uint64_t start,
> > -			       enum i915_cache_level cache_level);
> > -	void (*cleanup)(struct i915_address_space *vm);
> > -};
> > -
> > -/* The Graphics Translation Table is the way in which GEN hardware translates a
> > - * Graphics Virtual Address into a Physical Address. In addition to the normal
> > - * collateral associated with any va->pa translations GEN hardware also has a
> > - * portion of the GTT which can be mapped by the CPU and remain both coherent
> > - * and correct (in cases like swizzling). That region is referred to as GMADR in
> > - * the spec.
> > - */
> > -struct i915_gtt {
> > -	struct i915_address_space base;
> > -	size_t stolen_size;		/* Total size of stolen memory */
> > -
> > -	unsigned long mappable_end;	/* End offset that we can CPU map */
> > -	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
> > -	phys_addr_t mappable_base;	/* PA of our GMADR */
> > -
> > -	/** "Graphics Stolen Memory" holds the global PTEs */
> > -	void __iomem *gsm;
> > -
> > -	bool do_idle_maps;
> > -
> > -	int mtrr;
> > -
> > -	/* global gtt ops */
> > -	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
> > -			  size_t *stolen, phys_addr_t *mappable_base,
> > -			  unsigned long *mappable_end);
> > -};
> > -#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> > -
> > -#define GEN8_LEGACY_PDPS 4
> > -struct i915_hw_ppgtt {
> > -	struct i915_address_space base;
> > -	struct kref ref;
> > -	struct drm_mm_node node;
> > -	unsigned num_pd_entries;
> > -	unsigned num_pd_pages; /* gen8+ */
> > -	union {
> > -		struct page **pt_pages;
> > -		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> > -	};
> > -	struct page *pd_pages;
> > -	union {
> > -		uint32_t pd_offset;
> > -		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> > -	};
> > -	union {
> > -		dma_addr_t *pt_dma_addr;
> > -		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
> > -	};
> > -
> > -	int (*enable)(struct i915_hw_ppgtt *ppgtt);
> > -	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
> > -			 struct intel_ring_buffer *ring,
> > -			 bool synchronous);
> > -	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
> > -};
> > -
> >  struct i915_ctx_hang_stats {
> >  	/* This context had batch pending when hang was declared */
> >  	unsigned batch_pending;
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 5f73284..a239196 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -53,60 +53,6 @@ bool intel_enable_ppgtt(struct drm_device *dev, bool full)
> >  		return HAS_ALIASING_PPGTT(dev);
> >  }
> >  
> > -#define GEN6_PPGTT_PD_ENTRIES 512
> > -#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> > -typedef uint64_t gen8_gtt_pte_t;
> > -typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> > -
> > -/* PPGTT stuff */
> > -#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
> > -#define HSW_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0x7f0))
> > -
> > -#define GEN6_PDE_VALID			(1 << 0)
> > -/* gen6+ has bit 11-4 for physical addr bit 39-32 */
> > -#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> > -
> > -#define GEN6_PTE_VALID			(1 << 0)
> > -#define GEN6_PTE_UNCACHED		(1 << 1)
> > -#define HSW_PTE_UNCACHED		(0)
> > -#define GEN6_PTE_CACHE_LLC		(2 << 1)
> > -#define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
> > -#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> > -#define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
> > -
> > -/* Cacheability Control is a 4-bit value. The low three bits are stored in *
> > - * bits 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
> > - */
> > -#define HSW_CACHEABILITY_CONTROL(bits)	((((bits) & 0x7) << 1) | \
> > -					 (((bits) & 0x8) << (11 - 3)))
> > -#define HSW_WB_LLC_AGE3			HSW_CACHEABILITY_CONTROL(0x2)
> > -#define HSW_WB_LLC_AGE0			HSW_CACHEABILITY_CONTROL(0x3)
> > -#define HSW_WB_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0xb)
> > -#define HSW_WB_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x8)
> > -#define HSW_WT_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0x6)
> > -#define HSW_WT_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x7)
> > -
> > -#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> > -#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> > -
> > -/* GEN8 legacy style addressis defined as a 3 level page table:
> > - * 31:30 | 29:21 | 20:12 |  11:0
> > - * PDPE  |  PDE  |  PTE  | offset
> > - * The difference as compared to normal x86 3 level page table is the PDPEs are
> > - * programmed via register.
> > - */
> > -#define GEN8_PDPE_SHIFT			30
> > -#define GEN8_PDPE_MASK			0x3
> > -#define GEN8_PDE_SHIFT			21
> > -#define GEN8_PDE_MASK			0x1ff
> > -#define GEN8_PTE_SHIFT			12
> > -#define GEN8_PTE_MASK			0x1ff
> > -
> > -#define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
> > -#define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> > -#define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
> > -#define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
> > -
> >  static void ppgtt_bind_vma(struct i915_vma *vma,
> >  			   enum i915_cache_level cache_level,
> >  			   u32 flags);
> > @@ -185,9 +131,6 @@ static gen6_gtt_pte_t ivb_pte_encode(dma_addr_t addr,
> >  	return pte;
> >  }
> >  
> > -#define BYT_PTE_WRITEABLE		(1 << 1)
> > -#define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
> > -
> >  static gen6_gtt_pte_t byt_pte_encode(dma_addr_t addr,
> >  				     enum i915_cache_level level,
> >  				     bool valid)
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > new file mode 100644
> > index 0000000..c8d5c77
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> > @@ -0,0 +1,225 @@
> > +#ifndef _I915_GEM_GTT_H
> > +#define _I915_GEM_GTT_H
> > +
> > +#define GEN6_PPGTT_PD_ENTRIES 512
> > +#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> > +typedef uint32_t gen6_gtt_pte_t;
> > +typedef uint64_t gen8_gtt_pte_t;
> > +typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> > +
> > +/* PPGTT stuff */
> > +#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
> > +#define HSW_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0x7f0))
> > +
> > +#define GEN6_PDE_VALID			(1 << 0)
> > +/* gen6+ has bit 11-4 for physical addr bit 39-32 */
> > +#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> > +
> > +#define GEN6_PTE_VALID			(1 << 0)
> > +#define GEN6_PTE_UNCACHED		(1 << 1)
> > +#define HSW_PTE_UNCACHED		(0)
> > +#define GEN6_PTE_CACHE_LLC		(2 << 1)
> > +#define GEN7_PTE_CACHE_L3_LLC		(3 << 1)
> > +#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
> > +#define HSW_PTE_ADDR_ENCODE(addr)	HSW_GTT_ADDR_ENCODE(addr)
> > +
> > +#define BYT_PTE_WRITEABLE		(1 << 1)
> > +#define BYT_PTE_SNOOPED_BY_CPU_CACHES	(1 << 2)
> > +
> > +/* Cacheability Control is a 4-bit value. The low three bits are stored in *
> > + * bits 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
> > + */
> > +#define HSW_CACHEABILITY_CONTROL(bits)	((((bits) & 0x7) << 1) | \
> > +					 (((bits) & 0x8) << (11 - 3)))
> > +#define HSW_WB_LLC_AGE3			HSW_CACHEABILITY_CONTROL(0x2)
> > +#define HSW_WB_LLC_AGE0			HSW_CACHEABILITY_CONTROL(0x3)
> > +#define HSW_WB_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0xb)
> > +#define HSW_WB_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x8)
> > +#define HSW_WT_ELLC_LLC_AGE0		HSW_CACHEABILITY_CONTROL(0x6)
> > +#define HSW_WT_ELLC_LLC_AGE3		HSW_CACHEABILITY_CONTROL(0x7)
> > +
> > +#define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
> > +#define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> > +#define PPAT_CACHED_INDEX		_PAGE_PAT /* WB LLCeLLC */
> > +#define PPAT_DISPLAY_ELLC_INDEX		_PAGE_PCD /* WT eLLC */
> > +
> > +#define GEN8_LEGACY_PDPS		4
> > +#define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> > +#define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> > +
> > +/* GEN8 legacy style addressis defined as a 3 level page table:
> > + * 31:30 | 29:21 | 20:12 |  11:0
> > + * PDPE  |  PDE  |  PTE  | offset
> > + * The difference as compared to normal x86 3 level page table is the PDPEs are
> > + * programmed via register.
> > + *
> > + * The x86 pagetable code is flexible in its ability to handle varying page
> > + * table depths via abstracted PGDIR/PUD/PMD/PTE. I've opted to not do this and
> > + * instead replicate the interesting functionality.
> > + */
> > +#define GEN8_PDPE_SHIFT			30
> > +#define GEN8_PDPE_MASK			0x3
> > +#define GEN8_PDE_SHIFT			21
> > +#define GEN8_PDE_MASK			0x1ff
> > +#define GEN8_PTE_SHIFT			12
> > +#define GEN8_PTE_MASK			0x1ff
> > +
> > +enum i915_cache_level;
> > +/**
> > + * A VMA represents a GEM BO that is bound into an address space. Therefore, a
> > + * VMA's presence cannot be guaranteed before binding, or after unbinding the
> > + * object into/from the address space.
> > + *
> > + * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
> > + * will always be <= an objects lifetime. So object refcounting should cover us.
> > + */
> > +struct i915_vma {
> > +	struct drm_mm_node node;
> > +	struct drm_i915_gem_object *obj;
> > +	struct i915_address_space *vm;
> > +
> > +	/** This object's place on the active/inactive lists */
> > +	struct list_head mm_list;
> > +
> > +	struct list_head vma_link; /* Link in the object's VMA list */
> > +
> > +	/** This vma's place in the batchbuffer or on the eviction list */
> > +	struct list_head exec_list;
> > +
> > +	/**
> > +	 * Used for performing relocations during execbuffer insertion.
> > +	 */
> > +	struct hlist_node exec_node;
> > +	unsigned long exec_handle;
> > +	struct drm_i915_gem_exec_object2 *exec_entry;
> > +
> > +	/**
> > +	 * How many users have pinned this object in GTT space. The following
> > +	 * users can each hold at most one reference: pwrite/pread, pin_ioctl
> > +	 * (via user_pin_count), execbuffer (objects are not allowed multiple
> > +	 * times for the same batchbuffer), and the framebuffer code. When
> > +	 * switching/pageflipping, the framebuffer code has at most two buffers
> > +	 * pinned per crtc.
> > +	 *
> > +	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
> > +	 * bits with absolutely no headroom. So use 4 bits. */
> > +	unsigned int pin_count:4;
> > +#define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
> > +
> > +	/** Unmap an object from an address space. This usually consists of
> > +	 * setting the valid PTE entries to a reserved scratch page. */
> > +	void (*unbind_vma)(struct i915_vma *vma);
> > +	/* Map an object into an address space with the given cache flags. */
> > +#define GLOBAL_BIND (1<<0)
> > +	void (*bind_vma)(struct i915_vma *vma,
> > +			 enum i915_cache_level cache_level,
> > +			 u32 flags);
> > +};
> > +
> > +struct i915_address_space {
> > +	struct drm_mm mm;
> > +	struct drm_device *dev;
> > +	struct list_head global_link;
> > +	unsigned long start;		/* Start offset always 0 for dri2 */
> > +	size_t total;		/* size addr space maps (ex. 2GB for ggtt) */
> > +
> > +	struct {
> > +		dma_addr_t addr;
> > +		struct page *page;
> > +	} scratch;
> > +
> > +	/**
> > +	 * List of objects currently involved in rendering.
> > +	 *
> > +	 * Includes buffers having the contents of their GPU caches
> > +	 * flushed, not necessarily primitives.  last_rendering_seqno
> > +	 * represents when the rendering involved will be completed.
> > +	 *
> > +	 * A reference is held on the buffer while on this list.
> > +	 */
> > +	struct list_head active_list;
> > +
> > +	/**
> > +	 * LRU list of objects which are not in the ringbuffer and
> > +	 * are ready to unbind, but are still in the GTT.
> > +	 *
> > +	 * last_rendering_seqno is 0 while an object is in this list.
> > +	 *
> > +	 * A reference is not held on the buffer while on this list,
> > +	 * as merely being GTT-bound shouldn't prevent its being
> > +	 * freed, and we'll pull it off the list in the free path.
> > +	 */
> > +	struct list_head inactive_list;
> > +
> > +	/* FIXME: Need a more generic return type */
> > +	gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
> > +				     enum i915_cache_level level,
> > +				     bool valid); /* Create a valid PTE */
> > +	void (*clear_range)(struct i915_address_space *vm,
> > +			    uint64_t start,
> > +			    uint64_t length,
> > +			    bool use_scratch);
> > +	void (*insert_entries)(struct i915_address_space *vm,
> > +			       struct sg_table *st,
> > +			       uint64_t start,
> > +			       enum i915_cache_level cache_level);
> > +	void (*cleanup)(struct i915_address_space *vm);
> > +};
> > +
> > +/* The Graphics Translation Table is the way in which GEN hardware translates a
> > + * Graphics Virtual Address into a Physical Address. In addition to the normal
> > + * collateral associated with any va->pa translations GEN hardware also has a
> > + * portion of the GTT which can be mapped by the CPU and remain both coherent
> > + * and correct (in cases like swizzling). That region is referred to as GMADR in
> > + * the spec.
> > + */
> > +struct i915_gtt {
> > +	struct i915_address_space base;
> > +	size_t stolen_size;		/* Total size of stolen memory */
> > +
> > +	unsigned long mappable_end;	/* End offset that we can CPU map */
> > +	struct io_mapping *mappable;	/* Mapping to our CPU mappable region */
> > +	phys_addr_t mappable_base;	/* PA of our GMADR */
> > +
> > +	/** "Graphics Stolen Memory" holds the global PTEs */
> > +	void __iomem *gsm;
> > +
> > +	bool do_idle_maps;
> > +
> > +	int mtrr;
> > +
> > +	/* global gtt ops */
> > +	int (*gtt_probe)(struct drm_device *dev, size_t *gtt_total,
> > +			  size_t *stolen, phys_addr_t *mappable_base,
> > +			  unsigned long *mappable_end);
> > +};
> > +#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> > +
> > +struct i915_hw_ppgtt {
> > +	struct i915_address_space base;
> > +	struct kref ref;
> > +	struct drm_mm_node node;
> > +	unsigned num_pd_entries;
> > +	unsigned num_pd_pages; /* gen8+ */
> > +	union {
> > +		struct page **pt_pages;
> > +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> > +	};
> > +	struct page *pd_pages;
> > +	union {
> > +		uint32_t pd_offset;
> > +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> > +	};
> > +	union {
> > +		dma_addr_t *pt_dma_addr;
> > +		dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPS];
> > +	};
> > +
> > +	int (*enable)(struct i915_hw_ppgtt *ppgtt);
> > +	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
> > +			 struct intel_ring_buffer *ring,
> > +			 bool synchronous);
> > +	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
> > +};
> > +
> > +#endif
> > -- 
> > 1.9.0
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 14/26] drm/i915: Complete page table structures
  2014-03-18  9:09   ` Chris Wilson
@ 2014-03-22 20:10     ` Ben Widawsky
  2014-03-22 21:14       ` Chris Wilson
  0 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-22 20:10 UTC (permalink / raw)
  To: Chris Wilson, Ben Widawsky, Intel GFX

On Tue, Mar 18, 2014 at 09:09:45AM +0000, Chris Wilson wrote:
> On Mon, Mar 17, 2014 at 10:48:46PM -0700, Ben Widawsky wrote:
> > Move the remaining members over to the new page table structures.
> > 
> > This can be squashed with the previous commit if desire. The reasoning
> > is the same as that patch. I simply felt it is easier to review if split.
> 
> I'm not liking the shorter names much. Is there precedence elsewhere
> (e.g. daddr)?
> -Chris
> 

I'm not particularly attached to "daddr." It was fun to say in my head.
A lot of code does use "daddr" but it seems to vary between "dma",
"device", data", "destination" Not exactly precedence.

Initially I had the prefix p[td]_daddr, but I thought you might complain
about it because it's implicit. dma_addr seemed kinda redundant to me.

Recommendation?

> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] drm/i915: Create page table allocators
  2014-03-18  9:14   ` Chris Wilson
@ 2014-03-22 20:21     ` Ben Widawsky
  2014-03-22 21:10       ` Chris Wilson
  0 siblings, 1 reply; 62+ messages in thread
From: Ben Widawsky @ 2014-03-22 20:21 UTC (permalink / raw)
  To: Chris Wilson, Ben Widawsky, Intel GFX

On Tue, Mar 18, 2014 at 09:14:09AM +0000, Chris Wilson wrote:
> On Mon, Mar 17, 2014 at 10:48:47PM -0700, Ben Widawsky wrote:
> > As we move toward dynamic page table allocation, it becomes much easier
> > to manage our data structures if break do things less coarsely by
> > breaking up all of our actions into individual tasks.  This makes the
> > code easier to write, read, and verify.
> > 
> > Aside from the dissection of the allocation functions, the patch
> > statically allocates the page table structures without a page directory.
> > This remains the same for all platforms,
> > 
> > The patch itself should not have much functional difference. The primary
> > noticeable difference is the fact that page tables are no longer
> > allocated, but rather statically declared as part of the page directory.
> > This has non-zero overhead, but things gain non-trivial complexity as a
> > result.
> 
> We increase overhead for increased complexity. What's the selling point
> of this patch then?

I'd argue about the complexity. Personally, I think the result is easier
to read.

I'll add this all to the commit message, but hopefully you agree:

1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the dma mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and cathcing leaks should be easy.

4. Less important: the gfp flags are confined to one location, which
makes playing around with such things trivial.o

Hopefully you're more convinced after you went through more of the patch
series.

If you want to try to optimize the overhead of managing the page tables,
I think this is a worthy thing to do (for instance, not statically
declaring the array). It takes a little more work though, and I'd prefer
to do it after the code is doing what it's supposed to do.

> 
> Otherwise, patch does as you say.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/26] drm/i915: clean up PPGTT init error path
  2014-03-22 19:43     ` Ben Widawsky
@ 2014-03-22 20:58       ` Chris Wilson
  2014-03-23 17:27         ` Ben Widawsky
  0 siblings, 1 reply; 62+ messages in thread
From: Chris Wilson @ 2014-03-22 20:58 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Sat, Mar 22, 2014 at 12:43:28PM -0700, Ben Widawsky wrote:
> On Tue, Mar 18, 2014 at 08:44:28AM +0000, Chris Wilson wrote:
> > On Mon, Mar 17, 2014 at 10:48:39PM -0700, Ben Widawsky wrote:
> > > The old code (I'm having trouble finding the commit) had a reason for
> > > doing things when there was an error, and would continue on, thus the
> > > !ret. For the newer code however, this looks completely silly.
> > > 
> > > Follow the normal idiom of if (ret) return ret.
> > > 
> > > Also, put the pde wiring in the gen specific init, now that GEN8 exists.
> > > 
> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++-------------
> > >  1 file changed, 9 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > index 1620211..5f73284 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > @@ -1202,6 +1202,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> > >  	ppgtt->pd_offset =
> > >  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
> > >  
> > > +	gen6_write_pdes(ppgtt);
> > > +
> > >  	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
> > >  
> > >  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> > > @@ -1226,20 +1228,14 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
> > >  	else
> > >  		BUG();
> > >  
> > > -	if (!ret) {
> > > -		struct drm_i915_private *dev_priv = dev->dev_private;
> > > -		kref_init(&ppgtt->ref);
> > > -		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
> > > -			    ppgtt->base.total);
> > > -		i915_init_vm(dev_priv, &ppgtt->base);
> > > -		if (INTEL_INFO(dev)->gen < 8) {
> > > -			gen6_write_pdes(ppgtt);
> > > -			DRM_DEBUG("Adding PPGTT at offset %x\n",
> > > -				  ppgtt->pd_offset << 10);
> > > -		}
> > > -	}
> > > +	if (ret)
> > > +		return ret;
> > >  
> > > -	return ret;
> > > +	kref_init(&ppgtt->ref);
> > > +	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
> > > +	i915_init_vm(dev_priv, &ppgtt->base);
> > 
> > Didn't we just delete the dev_priv local variable?
> > -Chris
> 
> The important part is that the pde writes moved. (The DRM debug is also
> dropped). As for this code, I just wanted to get rid of the if (!ret)
> block. It looks weird.
> 
> Maybe I didn't get what you're asking though.

I was wondering if this patch compiles because of the removal of the
dev_priv local variable. (Or if the original was a shadow.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping
  2014-03-22 18:13       ` Ben Widawsky
@ 2014-03-22 20:59         ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-22 20:59 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Sat, Mar 22, 2014 at 11:13:17AM -0700, Ben Widawsky wrote:
> On Thu, Mar 20, 2014 at 12:08:00PM +0000, Chris Wilson wrote:
> > On Thu, Mar 20, 2014 at 11:57:42AM +0000, Chris Wilson wrote:
> > >  static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
> > > @@ -1838,14 +1841,11 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
> > >  
> > >  	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
> > >  		struct drm_i915_file_private *file_priv = file->driver_priv;
> > > -		struct i915_hw_ppgtt *pvt_ppgtt;
> > >  
> > > -		pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
> > >  		seq_printf(m, "proc: %s\n",
> > >  			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
> > 
> > And 
> > 	seq_printf(m, "\nproc: %s\n",
> > for good measure
> > 
> > > -		print_ppgtt(m, pvt_ppgtt, "Default context");
> > > -		if (verbose)
> > > -			idr_for_each(&file_priv->context_idr, per_file_ctx, m);
> > > +		idr_for_each(&file_priv->context_idr, per_file_ctx,
> > > +			     (void *)((unsigned long)m | verbose));
> > >  	}
> > >  }
> > >  
> > > 
> > > -- 
> 
> Thanks, I like it. I'm assuming you didn't want the count_pt_pages stuck
> in at this point (your diff was based on the end result)? I can do that
> if you prefer it. It seems pointless to me though.

Maybe pointless to you, but it helped me to know when they were fully
populated - without having to look back at the constants.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 15/26] drm/i915: Create page table allocators
  2014-03-22 20:21     ` Ben Widawsky
@ 2014-03-22 21:10       ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-22 21:10 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Sat, Mar 22, 2014 at 01:21:39PM -0700, Ben Widawsky wrote:
> On Tue, Mar 18, 2014 at 09:14:09AM +0000, Chris Wilson wrote:
> > On Mon, Mar 17, 2014 at 10:48:47PM -0700, Ben Widawsky wrote:
> > > As we move toward dynamic page table allocation, it becomes much easier
> > > to manage our data structures if break do things less coarsely by
> > > breaking up all of our actions into individual tasks.  This makes the
> > > code easier to write, read, and verify.
> > > 
> > > Aside from the dissection of the allocation functions, the patch
> > > statically allocates the page table structures without a page directory.
> > > This remains the same for all platforms,
> > > 
> > > The patch itself should not have much functional difference. The primary
> > > noticeable difference is the fact that page tables are no longer
> > > allocated, but rather statically declared as part of the page directory.
> > > This has non-zero overhead, but things gain non-trivial complexity as a
> > > result.
> > 
> > We increase overhead for increased complexity. What's the selling point
> > of this patch then?
> 
> I'd argue about the complexity. Personally, I think the result is easier
> to read.
> 
> I'll add this all to the commit message, but hopefully you agree:
> 
> 1. Splitting out the functions allows easily combining GEN6 and GEN8
> code. Page tables have no difference based on GEN8. As we'll see in a
> future patch when we add the dma mappings to the allocations, it
> requires only one small change to make work, and error handling should
> just fall into place.
> 
> 2. Unless we always want to allocate all page tables under a given PDE,
> we'll have to eventually break this up into an array of pointers (or
> pointer to pointer).
> 
> 3. Having the discrete functions is easier to review, and understand.
> All allocations and frees now take place in just a couple of locations.
> Reviewing, and cathcing leaks should be easy.
> 
> 4. Less important: the gfp flags are confined to one location, which
> makes playing around with such things trivial.o
> 
> Hopefully you're more convinced after you went through more of the patch
> series.

Right, the patches and the resulting code look good. I just felt the
changelog here was self-contradictory. And now we have a great spiel to
include ;-)
 
> If you want to try to optimize the overhead of managing the page tables,
> I think this is a worthy thing to do (for instance, not statically
> declaring the array). It takes a little more work though, and I'd prefer
> to do it after the code is doing what it's supposed to do.

Agreed. I'm still watching PT pages come and go with great joy, and not
yet worrying about the impact. (Though I think I did notices some
side-effect when a reap of the userspace cache lead to a sudden release
of lots of pages, and a dip in throughput.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 14/26] drm/i915: Complete page table structures
  2014-03-22 20:10     ` Ben Widawsky
@ 2014-03-22 21:14       ` Chris Wilson
  0 siblings, 0 replies; 62+ messages in thread
From: Chris Wilson @ 2014-03-22 21:14 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Sat, Mar 22, 2014 at 01:10:47PM -0700, Ben Widawsky wrote:
> On Tue, Mar 18, 2014 at 09:09:45AM +0000, Chris Wilson wrote:
> > On Mon, Mar 17, 2014 at 10:48:46PM -0700, Ben Widawsky wrote:
> > > Move the remaining members over to the new page table structures.
> > > 
> > > This can be squashed with the previous commit if desire. The reasoning
> > > is the same as that patch. I simply felt it is easier to review if split.
> > 
> > I'm not liking the shorter names much. Is there precedence elsewhere
> > (e.g. daddr)?
> > -Chris
> > 
> 
> I'm not particularly attached to "daddr." It was fun to say in my head.
> A lot of code does use "daddr" but it seems to vary between "dma",
> "device", data", "destination" Not exactly precedence.
> 
> Initially I had the prefix p[td]_daddr, but I thought you might complain
> about it because it's implicit. dma_addr seemed kinda redundant to me.
> 
> Recommendation?

I am still attached to dma_addr, as it seems to be common (or
dma_address) for dma-mapped addresses. But by the end, I had stopped
caring, so

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>

I'll come back to this if nobody has a better idea and see if I can make
some sensisble suggestions, or just give in completely.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 09/26] drm/i915: Split out gtt specific header file
  2014-03-22 19:44     ` Ben Widawsky
@ 2014-03-23  0:46       ` Daniel Vetter
  0 siblings, 0 replies; 62+ messages in thread
From: Daniel Vetter @ 2014-03-23  0:46 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Sat, Mar 22, 2014 at 8:44 PM, Ben Widawsky <ben@bwidawsk.net> wrote:
> On Tue, Mar 18, 2014 at 10:15:56AM +0100, Daniel Vetter wrote:
>> On Mon, Mar 17, 2014 at 10:48:41PM -0700, Ben Widawsky wrote:
>> > TODO: Do header files need a copyright?
>>
>> Yup ;-)
>>
>> I like this though, especially since finer-grained files will make
>> kerneldoc inclusion (well, grouped into sensible chapters at least) much
>> simpler.
>> -Daniel
>>
>
> If I re-submit just this patch (with the copyright), will you merge it?
> It will make my life so much easier.

Yeah.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/26] drm/i915: clean up PPGTT init error path
  2014-03-22 20:58       ` Chris Wilson
@ 2014-03-23 17:27         ` Ben Widawsky
  0 siblings, 0 replies; 62+ messages in thread
From: Ben Widawsky @ 2014-03-23 17:27 UTC (permalink / raw)
  To: Chris Wilson, Ben Widawsky, Intel GFX

On Sat, Mar 22, 2014 at 08:58:29PM +0000, Chris Wilson wrote:
> On Sat, Mar 22, 2014 at 12:43:28PM -0700, Ben Widawsky wrote:
> > On Tue, Mar 18, 2014 at 08:44:28AM +0000, Chris Wilson wrote:
> > > On Mon, Mar 17, 2014 at 10:48:39PM -0700, Ben Widawsky wrote:
> > > > The old code (I'm having trouble finding the commit) had a reason for
> > > > doing things when there was an error, and would continue on, thus the
> > > > !ret. For the newer code however, this looks completely silly.
> > > > 
> > > > Follow the normal idiom of if (ret) return ret.
> > > > 
> > > > Also, put the pde wiring in the gen specific init, now that GEN8 exists.
> > > > 
> > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +++++++++-------------
> > > >  1 file changed, 9 insertions(+), 13 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > > index 1620211..5f73284 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > > @@ -1202,6 +1202,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> > > >  	ppgtt->pd_offset =
> > > >  		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
> > > >  
> > > > +	gen6_write_pdes(ppgtt);
> > > > +
> > > >  	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
> > > >  
> > > >  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> > > > @@ -1226,20 +1228,14 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
> > > >  	else
> > > >  		BUG();
> > > >  
> > > > -	if (!ret) {
> > > > -		struct drm_i915_private *dev_priv = dev->dev_private;
> > > > -		kref_init(&ppgtt->ref);
> > > > -		drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
> > > > -			    ppgtt->base.total);
> > > > -		i915_init_vm(dev_priv, &ppgtt->base);
> > > > -		if (INTEL_INFO(dev)->gen < 8) {
> > > > -			gen6_write_pdes(ppgtt);
> > > > -			DRM_DEBUG("Adding PPGTT at offset %x\n",
> > > > -				  ppgtt->pd_offset << 10);
> > > > -		}
> > > > -	}
> > > > +	if (ret)
> > > > +		return ret;
> > > >  
> > > > -	return ret;
> > > > +	kref_init(&ppgtt->ref);
> > > > +	drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
> > > > +	i915_init_vm(dev_priv, &ppgtt->base);
> > > 
> > > Didn't we just delete the dev_priv local variable?
> > > -Chris
> > 
> > The important part is that the pde writes moved. (The DRM debug is also
> > dropped). As for this code, I just wanted to get rid of the if (!ret)
> > block. It looks weird.
> > 
> > Maybe I didn't get what you're asking though.
> 
> I was wondering if this patch compiles because of the removal of the
> dev_priv local variable. (Or if the original was a shadow.)
> -Chris

Ah, of course. Yes, there was a shadowed dev_priv. I think it was
merge/rebase fail either by myself or Daniel when the original patches
were merged.

> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2014-03-23 17:27 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-18  5:48 [PATCH 00/26] [RFCish] GEN7 dynamic page tables Ben Widawsky
2014-03-18  5:48 ` [PATCH 01/26] drm/i915: Split out verbose PPGTT dumping Ben Widawsky
2014-03-20 11:57   ` Chris Wilson
2014-03-20 12:08     ` Chris Wilson
2014-03-22 18:13       ` Ben Widawsky
2014-03-22 20:59         ` Chris Wilson
2014-03-18  5:48 ` [PATCH 02/26] drm/i915: Extract switch to default context Ben Widawsky
2014-03-18  8:38   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 03/26] drm/i915: s/pd/pdpe, s/pt/pde Ben Widawsky
2014-03-18  5:48 ` [PATCH 04/26] drm/i915: rename map/unmap to dma_map/unmap Ben Widawsky
2014-03-18  8:40   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 05/26] drm/i915: Setup less PPGTT on failed pagedir Ben Widawsky
2014-03-18  5:48 ` [PATCH 06/26] drm/i915: Wrap VMA binding Ben Widawsky
2014-03-18  8:42   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 07/26] drm/i915: clean up PPGTT init error path Ben Widawsky
2014-03-18  8:44   ` Chris Wilson
2014-03-22 19:43     ` Ben Widawsky
2014-03-22 20:58       ` Chris Wilson
2014-03-23 17:27         ` Ben Widawsky
2014-03-18  5:48 ` [PATCH 08/26] drm/i915: Un-hardcode number of page directories Ben Widawsky
2014-03-18  5:48 ` [PATCH 09/26] drm/i915: Split out gtt specific header file Ben Widawsky
2014-03-18  8:46   ` Chris Wilson
2014-03-18  9:15   ` Daniel Vetter
2014-03-22 19:44     ` Ben Widawsky
2014-03-23  0:46       ` Daniel Vetter
2014-03-18  5:48 ` [PATCH 10/26] drm/i915: Make gen6_write_pdes gen6_map_page_tables Ben Widawsky
2014-03-18  8:48   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 11/26] drm/i915: Range clearing is PPGTT agnostic Ben Widawsky
2014-03-18  8:50   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 12/26] drm/i915: Page table helpers, and define renames Ben Widawsky
2014-03-18  9:05   ` Chris Wilson
2014-03-18 18:29     ` Jesse Barnes
2014-03-19  0:58       ` Ben Widawsky
2014-03-18  5:48 ` [PATCH 13/26] drm/i915: construct page table abstractions Ben Widawsky
2014-03-18  5:48 ` [PATCH 14/26] drm/i915: Complete page table structures Ben Widawsky
2014-03-18  9:09   ` Chris Wilson
2014-03-22 20:10     ` Ben Widawsky
2014-03-22 21:14       ` Chris Wilson
2014-03-18  5:48 ` [PATCH 15/26] drm/i915: Create page table allocators Ben Widawsky
2014-03-18  9:14   ` Chris Wilson
2014-03-22 20:21     ` Ben Widawsky
2014-03-22 21:10       ` Chris Wilson
2014-03-18  5:48 ` [PATCH 16/26] drm/i915: Generalize GEN6 mapping Ben Widawsky
2014-03-18  9:22   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 17/26] drm/i915: Clean up pagetable DMA map & unmap Ben Widawsky
2014-03-18  9:24   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 18/26] drm/i915: Always dma map page table allocations Ben Widawsky
2014-03-18  9:25   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 19/26] drm/i915: Consolidate dma mappings Ben Widawsky
2014-03-18  9:28   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 20/26] drm/i915: Always dma map page directory allocations Ben Widawsky
2014-03-18  9:29   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 21/26] drm/i915: Track GEN6 page table usage Ben Widawsky
2014-03-18  5:48 ` [PATCH 22/26] drm/i915: Extract context switch skip logic Ben Widawsky
2014-03-18  5:48 ` [PATCH 23/26] drm/i915: Force pd restore when PDEs change, gen6-7 Ben Widawsky
2014-03-18  5:48 ` [PATCH 24/26] drm/i915: Finish gen6/7 dynamic page table allocation Ben Widawsky
2014-03-20 12:15   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 25/26] drm/i915: Print used ppgtt pages for gen6 in debugfs Ben Widawsky
2014-03-20 10:09   ` Chris Wilson
2014-03-20 10:17   ` Chris Wilson
2014-03-18  5:48 ` [PATCH 26/26] FOR REFERENCE ONLY Ben Widawsky
2014-03-20 12:17 ` [PATCH 00/26] [RFCish] GEN7 dynamic page tables Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.