* [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups
       [not found] <to=1387921357-22942-1-git-send-email-benjamin.widawsky@intel.com>
@ 2014-02-12 22:28 ` Ben Widawsky
  2014-02-12 22:28   ` [PATCH 1/9] drm/i915/bdw: Split up PPGTT cleanup Ben Widawsky
                     ` (19 more replies)
  0 siblings, 20 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

Rebased series from what I submitted a while ago:
http://lists.freedesktop.org/archives/intel-gfx/2013-December/037815.html

It was mostly a clean rebase; there were a couple of major conflicts which
I think I cleaned up properly, but extra eyes would be good.

As before, the last two are optional.

Ben Widawsky (9):
  drm/i915/bdw: Split up PPGTT cleanup
  drm/i915/bdw: Reorganize PPGTT init
  drm/i915/bdw: Split ppgtt initialization up
  drm/i915: Make clear/insert vfuncs args absolute
  drm/i915/bdw: Reorganize PT allocations
  Revert "drm/i915/bdw: Limit GTT to 2GB"
  drm/i915: Update i915_gem_gtt.c copyright
  drm/i915: Split GEN6 PPGTT cleanup
  drm/i915: Split GEN6 PPGTT initialization up

 drivers/gpu/drm/i915/i915_drv.h     |  13 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 621 +++++++++++++++++++++++++-----------
 2 files changed, 438 insertions(+), 196 deletions(-)

-- 
1.8.5.4

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 1/9] drm/i915/bdw: Split up PPGTT cleanup
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-13 10:40     ` Chris Wilson
  2014-02-12 22:28   ` [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init Ben Widawsky
                     ` (18 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This will make the code more readable and extensible, which is needed
for upcoming feature work. Eventually, we'll do the same for init.
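
For reference, the call structure after this patch (names match the diff
below):

    gen8_ppgtt_cleanup()              /* the vm->cleanup() vfunc */
        list_del()/drm_mm_takedown()  /* VM bookkeeping */
        gen8_ppgtt_unmap_pages()      /* undo the pci_map_page() calls */
        gen8_ppgtt_free()             /* kfree()/__free_pages() the backing pages */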

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 59 ++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6e858e1..ee38faf 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -319,36 +319,53 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for (i = 0; i < ppgtt->num_pd_pages ; i++)
+		kfree(ppgtt->gen8_pt_dma_addr[i]);
+
+	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
+	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
+}
+
+static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	int i, j;
 
-	list_del(&vm->global_link);
-	drm_mm_takedown(&vm->mm);
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		/* TODO: In the future we'll support sparse mappings, so this
+		 * will have to change. */
+		if (!ppgtt->pd_dma_addr[i])
+			continue;
 
-	for (i = 0; i < ppgtt->num_pd_pages ; i++) {
-		if (ppgtt->pd_dma_addr[i]) {
-			pci_unmap_page(ppgtt->base.dev->pdev,
-				       ppgtt->pd_dma_addr[i],
-				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+		pci_unmap_page(ppgtt->base.dev->pdev,
+			       ppgtt->pd_dma_addr[i],
+			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
-			for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-				dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
-				if (addr)
-					pci_unmap_page(ppgtt->base.dev->pdev,
-						       addr,
-						       PAGE_SIZE,
-						       PCI_DMA_BIDIRECTIONAL);
+		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+			if (addr)
+				pci_unmap_page(ppgtt->base.dev->pdev,
+				       addr,
+				       PAGE_SIZE,
+				       PCI_DMA_BIDIRECTIONAL);
 
-			}
 		}
-		kfree(ppgtt->gen8_pt_dma_addr[i]);
 	}
+}
 
-	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
-	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
+static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	list_del(&vm->global_link);
+	drm_mm_takedown(&vm->mm);
+
+	gen8_ppgtt_unmap_pages(ppgtt);
+	gen8_ppgtt_free(ppgtt);
 }
 
 /**
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
  2014-02-12 22:28   ` [PATCH 1/9] drm/i915/bdw: Split up PPGTT cleanup Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-19 14:59     ` Imre Deak
  2014-02-12 22:28   ` [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up Ben Widawsky
                     ` (17 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Create 3 clear stages in PPGTT init. This will help make upcoming changes
more readable. The 3 stages are: allocation, DMA mapping, and writing the
P[DT]Es.

One nice benefit of the patch is that it creates 2 very clear error
points, allocation and mapping, and avoids having to do any handling after
writing the PTEs (something which was likely buggy before). I suspect this
simplified error handling will be helpful when we move to
deferred/dynamic page table allocation and mapping.

The patch also attempts to break up some of the steps into more logical,
reviewable chunks, particularly when we free.
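
A minimal sketch of the flow this patch moves toward; the helper names
here are illustrative (the diff below uses gen8_ppgtt_unmap_pages() and
gen8_ppgtt_free() on the bail path):

    static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
    {
            int ret;

            /* 1. Allocate the page directories and page tables */
            ret = alloc_pd_and_pt_pages(ppgtt);
            if (ret)
                    return ret;

            /* 2. Create the DMA mappings; the only other failure point */
            ret = map_pd_and_pt_pages(ppgtt);
            if (ret)
                    goto bail;

            /* 3. Write the PDEs; cannot fail, so no unwinding is needed */
            write_pdes(ppgtt);
            return 0;

    bail:
            unmap_pd_and_pt_pages(ppgtt);   /* skips entries never mapped */
            free_pd_and_pt_pages(ppgtt);
            return ret;
    }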

v2: Don't call cleanup on the error path since that takes down the
drm_mm and list entry, which aren't setup at this point.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 124 +++++++++++++++++++++---------------
 2 files changed, 73 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2572a95..cecbb9a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -709,7 +709,7 @@ struct i915_hw_ppgtt {
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
-		dma_addr_t *gen8_pt_dma_addr[4];
+		dma_addr_t **gen8_pt_dma_addr;
 	};
 
 	int (*enable)(struct i915_hw_ppgtt *ppgtt);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ee38faf..c6c221c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -326,12 +326,14 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 	for (i = 0; i < ppgtt->num_pd_pages ; i++)
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
 
+	kfree(ppgtt->gen8_pt_dma_addr);
 	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
 	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
+	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
@@ -340,18 +342,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 		if (!ppgtt->pd_dma_addr[i])
 			continue;
 
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd_dma_addr[i],
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			if (addr)
-				pci_unmap_page(ppgtt->base.dev->pdev,
-				       addr,
-				       PAGE_SIZE,
-				       PCI_DMA_BIDIRECTIONAL);
-
+				pci_unmap_page(hwdev, addr, PAGE_SIZE,
+					       PCI_DMA_BIDIRECTIONAL);
 		}
 	}
 }
@@ -369,27 +367,26 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 }
 
 /**
- * GEN8 legacy ppgtt programming is accomplished through 4 PDP registers with a
- * net effect resembling a 2-level page table in normal x86 terms. Each PDP
- * represents 1GB of memory
- * 4 * 512 * 512 * 4096 = 4GB legacy 32b address space.
+ * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
+ * with a net effect resembling a 2-level page table in normal x86 terms. Each
+ * PDP represents 1GB of memory; 4 * 512 * 512 * 4096 = 4GB legacy 32b address
+ * space.
  *
+ * FIXME: split allocation into smaller pieces. For now we only ever do this
+ * once, but with full PPGTT, the multiple contiguous allocations will be bad.
  * TODO: Do something with the size parameter
- **/
+ */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	struct page *pt_pages;
-	int i, j, ret = -ENOMEM;
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
 	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	int i, j, ret;
 
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
-	/* FIXME: split allocation into smaller pieces. For now we only ever do
-	 * this once, but with full PPGTT, the multiple contiguous allocations
-	 * will be bad.
-	 */
+	/* 1. Do all our allocations for page directories and page tables */
 	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
 	if (!ppgtt->pd_pages)
 		return -ENOMEM;
@@ -404,52 +401,66 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
 	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
-	ppgtt->enable = gen8_ppgtt_enable;
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
-
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
 
+	ppgtt->gen8_pt_dma_addr = kcalloc(max_pdp,
+					  sizeof(*ppgtt->gen8_pt_dma_addr),
+					  GFP_KERNEL);
+	if (!ppgtt->gen8_pt_dma_addr) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+
+	for (i = 0; i < max_pdp; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i]) {
+			ret = -ENOMEM;
+			goto bail;
+		}
+	}
+
 	/*
-	 * - Create a mapping for the page directories.
-	 * - For each page directory:
-	 *      allocate space for page table mappings.
-	 *      map each page table
+	 * 2. Create all the DMA mappings for the page directories and page
+	 * tables
 	 */
 	for (i = 0; i < max_pdp; i++) {
-		dma_addr_t temp;
-		temp = pci_map_page(ppgtt->base.dev->pdev,
-				    &ppgtt->pd_pages[i], 0,
-				    PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-		if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
-			goto err_out;
-
-		ppgtt->pd_dma_addr[i] = temp;
-
-		ppgtt->gen8_pt_dma_addr[i] = kmalloc(sizeof(dma_addr_t) * GEN8_PDES_PER_PAGE, GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			goto err_out;
+		dma_addr_t pd_addr, pt_addr;
 
+		/* Get the page table mappings per page directory */
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
-			temp = pci_map_page(ppgtt->base.dev->pdev,
-					    p, 0, PAGE_SIZE,
-					    PCI_DMA_BIDIRECTIONAL);
 
-			if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
-				goto err_out;
+			pt_addr = pci_map_page(ppgtt->base.dev->pdev,
+					       p, 0, PAGE_SIZE,
+					       PCI_DMA_BIDIRECTIONAL);
+			ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
+			if (ret)
+				goto bail;
 
-			ppgtt->gen8_pt_dma_addr[i][j] = temp;
+			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
 		}
+
+		/* And the page directory mappings */
+		pd_addr = pci_map_page(ppgtt->base.dev->pdev,
+				       &ppgtt->pd_pages[i], 0,
+				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+		if (ret)
+			goto bail;
+
+		ppgtt->pd_dma_addr[i] = pd_addr;
 	}
 
-	/* For now, the PPGTT helper functions all require that the PDEs are
+	/*
+	 * 3. Map all the page directory entries to point to the page tables
+	 * we've allocated.
+	 *
+	 * For now, the PPGTT helper functions all require that the PDEs are
 	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again */
+	 * will never need to touch the PDEs again.
+	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
@@ -461,6 +472,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		kunmap_atomic(pd_vaddr);
 	}
 
+	ppgtt->enable = gen8_ppgtt_enable;
+	ppgtt->switch_mm = gen8_mm_switch;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.start = 0;
+	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
+
 	ppgtt->base.clear_range(&ppgtt->base, 0,
 				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
 				true);
@@ -473,8 +492,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			 size % (1<<30));
 	return 0;
 
-err_out:
-	ppgtt->base.cleanup(&ppgtt->base);
+bail:
+	gen8_ppgtt_unmap_pages(ppgtt);
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
  2014-02-12 22:28   ` [PATCH 1/9] drm/i915/bdw: Split up PPGTT cleanup Ben Widawsky
  2014-02-12 22:28   ` [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-19 17:03     ` Imre Deak
  2014-02-12 22:28   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
                     ` (16 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Like the cleanup in an earlier patch, the code becomes much more readable,
and easier to extend, if we extract helper functions for the various
stages of init.

Note that with this patch it becomes really simple, and tempting, to begin
using the 'goto out' idiom with explicit free/fini semantics. I've
kept the error path as similar as possible to the cleanup() function to
make sure cleanup is as robust as possible.
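
The allocation helpers introduced below compose like this (names match
the diff):

    gen8_ppgtt_alloc()
        gen8_ppgtt_allocate_page_directories()  /* alloc_pages() for the PDs */
        gen8_ppgtt_allocate_page_tables()       /* alloc_pages() for the PTs */
        gen8_ppgtt_allocate_dma()               /* kcalloc() the dma_addr_t arrays */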

v2: Remove comment "NB:From here on, ppgtt->base.cleanup() should
function properly"
Update commit message to reflect above

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 182 +++++++++++++++++++++++++-----------
 1 file changed, 126 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c6c221c..8a5cad9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -366,91 +366,161 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
-/**
- * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
- * with a net effect resembling a 2-level page table in normal x86 terms. Each
- * PDP represents 1GB of memory; 4 * 512 * 512 * 4096 = 4GB legacy 32b address
- * space.
- *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
- */
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
+					   const int max_pdp)
 {
 	struct page *pt_pages;
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
 	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
-	int i, j, ret;
-
-	if (size % (1<<30))
-		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
-
-	/* 1. Do all our allocations for page directories and page tables */
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
 
 	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
-	if (!pt_pages) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
+	if (!pt_pages)
 		return -ENOMEM;
-	}
 
 	ppgtt->gen8_pt_pages = pt_pages;
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
 	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
 
-	ppgtt->gen8_pt_dma_addr = kcalloc(max_pdp,
+	return 0;
+}
+
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	ppgtt->gen8_pt_dma_addr = kcalloc(ppgtt->num_pd_entries,
 					  sizeof(*ppgtt->gen8_pt_dma_addr),
 					  GFP_KERNEL);
-	if (!ppgtt->gen8_pt_dma_addr) {
-		ret = -ENOMEM;
-		goto bail;
-	}
+	if (!ppgtt->gen8_pt_dma_addr)
+		return -ENOMEM;
 
-	for (i = 0; i < max_pdp; i++) {
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
 						     sizeof(dma_addr_t),
 						     GFP_KERNEL);
 		if (!ppgtt->gen8_pt_dma_addr[i]) {
-			ret = -ENOMEM;
-			goto bail;
+			while (i--)
+				kfree(ppgtt->gen8_pt_dma_addr[i]);
+			kfree(ppgtt->gen8_pt_dma_addr);
+
+			return -ENOMEM;
 		}
 	}
 
+	return 0;
+}
+
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
+{
+	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
+	if (!ppgtt->pd_pages)
+		return -ENOMEM;
+
+	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+
+	return 0;
+}
+
+static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
+			    const int max_pdp)
+{
+	int ret;
+
+	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	if (ret)
+		return ret;
+
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
+	if (ret) {
+		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
+		return ret;
+	}
+
+	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+
+	ret = gen8_ppgtt_allocate_dma(ppgtt);
+	if (ret)
+		gen8_ppgtt_free(ppgtt);
+
+	return ret;
+}
+
+static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
+					     const int pd)
+{
+	dma_addr_t pd_addr;
+	int ret;
+
+	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
+			       &ppgtt->pd_pages[pd], 0,
+			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+
+	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+	if (ret)
+		return ret;
+
+	ppgtt->pd_dma_addr[pd] = pd_addr;
+
+	return 0;
+}
+
+static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
+					const int pd,
+					const int pt)
+{
+	dma_addr_t pt_addr;
+	struct page *p;
+	int ret;
+
+	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
+	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
+			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
+	if (ret)
+		return ret;
+
+	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+
+	return 0;
+}
+
+/**
+ * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
+ * with a net effect resembling a 2-level page table in normal x86 terms. Each
+ * PDP represents 1GB of memory; 4 * 512 * 512 * 4096 = 4GB legacy 32b address
+ * space.
+ *
+ * FIXME: split allocation into smaller pieces. For now we only ever do this
+ * once, but with full PPGTT, the multiple contiguous allocations will be bad.
+ * TODO: Do something with the size parameter
+ */
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+{
+	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
+	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	int i, j, ret;
+
+	if (size % (1<<30))
+		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+
+	/* 1. Do all our allocations for page directories and page tables. */
+	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	if (ret)
+		return ret;
+
 	/*
-	 * 2. Create all the DMA mappings for the page directories and page
-	 * tables
+	 * 2. Create DMA mappings for the page directories and page tables.
 	 */
 	for (i = 0; i < max_pdp; i++) {
-		dma_addr_t pd_addr, pt_addr;
-
-		/* Get the page table mappings per page directory */
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
-
-			pt_addr = pci_map_page(ppgtt->base.dev->pdev,
-					       p, 0, PAGE_SIZE,
-					       PCI_DMA_BIDIRECTIONAL);
-			ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
+			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
 			if (ret)
 				goto bail;
-
-			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
 		}
 
-		/* And the page directory mappings */
-		pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-				       &ppgtt->pd_pages[i], 0,
-				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
 		if (ret)
 			goto bail;
-
-		ppgtt->pd_dma_addr[i] = pd_addr;
 	}
 
 	/*
@@ -488,7 +558,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
 	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
 			 ppgtt->num_pt_pages,
-			 (ppgtt->num_pt_pages - num_pt_pages) +
+			 (ppgtt->num_pt_pages - min_pt_pages) +
 			 size % (1<<30));
 	return 0;
 
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (2 preceding siblings ...)
  2014-02-12 22:28   ` [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-13  0:14     ` Chris Wilson
  2014-02-19 17:26     ` Imre Deak
  2014-02-12 22:28   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
                     ` (15 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This patch converts insert_entries and clear_range, both functions which
are specific to the VM. These functions tend to encapsulate the gen
specific PTE writes. Passing absolute addresses to insert_entries and
clear_range will help make the logic within the functions clearer as to
what's going on. Currently, all callers simply do the appropriate page
shift, which, IMO, ends up looking weird with an upcoming change for the
gen8 page table allocations.

Up until now, the PPGTT was a funky 2 level page table. GEN8 changes this
to look more like a 3 level page table, and to that extent we need
significantly more memory simply for the page tables. To address this,
the allocations will be split up into finer chunks.
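
As a before/after illustration, condensed from the ppgtt_unbind_vma() hunk
below:

    /* before: callers convert to PTE indices themselves */
    vma->vm->clear_range(vma->vm, vma->node.start >> PAGE_SHIFT,
                         vma->obj->base.size >> PAGE_SHIFT, true);

    /* after: callers pass absolute byte addresses; the vfunc shifts */
    vma->vm->clear_range(vma->vm, vma->node.start,
                         vma->obj->base.size, true);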

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |  6 +--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++----------------
 2 files changed, 49 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cecbb9a..2ebad96 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -652,12 +652,12 @@ struct i915_address_space {
 				     enum i915_cache_level level,
 				     bool valid); /* Create a valid PTE */
 	void (*clear_range)(struct i915_address_space *vm,
-			    unsigned int first_entry,
-			    unsigned int num_entries,
+			    uint64_t start,
+			    size_t length,
 			    bool use_scratch);
 	void (*insert_entries)(struct i915_address_space *vm,
 			       struct sg_table *st,
-			       unsigned int first_entry,
+			       uint64_t start,
 			       enum i915_cache_level cache_level);
 	void (*cleanup)(struct i915_address_space *vm);
 };
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8a5cad9..5bfc6ff 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -254,13 +254,15 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 }
 
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   unsigned first_entry,
-				   unsigned num_entries,
+				   uint64_t start,
+				   size_t length,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
 	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
 	unsigned last_pte, i;
@@ -290,12 +292,13 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 
 static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 				      struct sg_table *pages,
-				      unsigned first_entry,
+				      uint64_t start,
 				      enum i915_cache_level cache_level)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
 	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
 	struct sg_page_iter sg_iter;
@@ -866,13 +869,15 @@ static int gen6_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 
 /* PPGTT support for Sandybridge/Gen6 and later */
 static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
-				   unsigned first_entry,
-				   unsigned num_entries,
+				   uint64_t start,
+				   size_t length,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
 	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
 	unsigned last_pte, i;
@@ -899,12 +904,13 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 				      struct sg_table *pages,
-				      unsigned first_entry,
+				      uint64_t start,
 				      enum i915_cache_level cache_level)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
 	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
 	struct sg_page_iter sg_iter;
@@ -1037,8 +1043,7 @@ alloc:
 		ppgtt->pt_dma_addr[i] = pt_addr;
 	}
 
-	ppgtt->base.clear_range(&ppgtt->base, 0,
-				ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES, true);
+	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1102,20 +1107,17 @@ ppgtt_bind_vma(struct i915_vma *vma,
 	       enum i915_cache_level cache_level,
 	       u32 flags)
 {
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
-
 	WARN_ON(flags);
 
-	vma->vm->insert_entries(vma->vm, vma->obj->pages, entry, cache_level);
+	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
+				cache_level);
 }
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
-
 	vma->vm->clear_range(vma->vm,
-			     entry,
-			     vma->obj->base.size >> PAGE_SHIFT,
+			     vma->node.start,
+			     vma->obj->base.size,
 			     true);
 }
 
@@ -1276,10 +1278,11 @@ static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte)
 
 static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct sg_table *st,
-				     unsigned int first_entry,
+				     uint64_t start,
 				     enum i915_cache_level level)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	gen8_gtt_pte_t __iomem *gtt_entries =
 		(gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
 	int i = 0;
@@ -1321,10 +1324,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
  */
 static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct sg_table *st,
-				     unsigned int first_entry,
+				     uint64_t start,
 				     enum i915_cache_level level)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	gen6_gtt_pte_t __iomem *gtt_entries =
 		(gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
 	int i = 0;
@@ -1356,11 +1360,13 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 }
 
 static void gen8_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  size_t length,
 				  bool use_scratch)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
 		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
 	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
@@ -1380,11 +1386,13 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 }
 
 static void gen6_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  size_t length,
 				  bool use_scratch)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
 		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
 	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
@@ -1417,10 +1425,12 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 }
 
 static void i915_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  size_t length,
 				  bool unused)
 {
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	intel_gtt_clear_range(first_entry, num_entries);
 }
 
@@ -1441,7 +1451,6 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = vma->obj;
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 
 	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
 	 * or we have a global mapping already but the cacheability flags have
@@ -1457,7 +1466,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
 		if (!obj->has_global_gtt_mapping ||
 		    (cache_level != obj->cache_level)) {
-			vma->vm->insert_entries(vma->vm, obj->pages, entry,
+			vma->vm->insert_entries(vma->vm, obj->pages,
+						vma->node.start,
 						cache_level);
 			obj->has_global_gtt_mapping = 1;
 		}
@@ -1468,7 +1478,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	     (cache_level != obj->cache_level))) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 		appgtt->base.insert_entries(&appgtt->base,
-					    vma->obj->pages, entry, cache_level);
+					    vma->obj->pages,
+					    vma->node.start,
+					    cache_level);
 		vma->obj->has_aliasing_ppgtt_mapping = 1;
 	}
 }
@@ -1478,11 +1490,11 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = vma->obj;
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 
 	if (obj->has_global_gtt_mapping) {
-		vma->vm->clear_range(vma->vm, entry,
-				     vma->obj->base.size >> PAGE_SHIFT,
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     obj->base.size,
 				     true);
 		obj->has_global_gtt_mapping = 0;
 	}
@@ -1490,8 +1502,8 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
 	if (obj->has_aliasing_ppgtt_mapping) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 		appgtt->base.clear_range(&appgtt->base,
-					 entry,
-					 obj->base.size >> PAGE_SHIFT,
+					 vma->node.start,
+					 obj->base.size,
 					 true);
 		obj->has_aliasing_ppgtt_mapping = 0;
 	}
@@ -1576,14 +1588,14 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
 
 	/* Clear any non-preallocated blocks */
 	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
-		const unsigned long count = (hole_end - hole_start) / PAGE_SIZE;
 		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
 			      hole_start, hole_end);
-		ggtt_vm->clear_range(ggtt_vm, hole_start / PAGE_SIZE, count, true);
+		ggtt_vm->clear_range(ggtt_vm, hole_start,
+				     hole_end - hole_start, true);
 	}
 
 	/* And finally clear the reserved guard page */
-	ggtt_vm->clear_range(ggtt_vm, end / PAGE_SIZE - 1, 1, true);
+	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
 }
 
 void i915_gem_init_global_gtt(struct drm_device *dev)
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (3 preceding siblings ...)
  2014-02-12 22:28   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-12 23:45     ` Chris Wilson
  2014-02-19 19:11     ` Imre Deak
  2014-02-12 22:28   ` [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB" Ben Widawsky
                     ` (14 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

The previous allocation mechanism would get 2 contiguous allocations, one
for the page directories, and one for the page tables. As each page table
is 1 page, and there are 512 of these per page directory, the page table
allocation comes to 2MB per page directory. An unfriendly request at best.
Worse still, our HW now supports 4 page directories, and the resulting 8MB
contiguous allocation is not allowed.

In order to fix this, this patch splits the monolithic page table
allocation into a discrete, single-page allocation per page table. There
is nothing really fancy about the patch itself; it just has to manage an
extra pointer indirection, and have a fancier bit of logic to free up the
pages.

To accommodate some of the added complexity, two new helpers are
introduced to allocate and free the page table pages.
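
The shape of the new allocation, per page directory, is an array of 512
individually allocated pages rather than one contiguous block (see
__gen8_alloc_page_tables() in the diff):

    struct page **pt_pages;
    int i;

    pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
    for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
            pt_pages[i] = alloc_page(GFP_KERNEL); /* order-0: allocator friendly */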

NOTE: I really wanted to split the way we do allocations, and the way in
which we identify the page table/page directory being used. I found
splitting this functionality up to be too unwieldy. I apologize in
advance to the reviewer. I'd recommend looking at the result, rather
than the diff.

v2/NOTE2: This patch predated commit:
6f1cc993518462ccf039e195fabd47e7aa5bfd13
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 31 15:50:31 2013 +0000

    drm/i915: Avoid dereference past end of page arr

It fixed the same issue as that patch, but because of the limbo state of
PPGTT, Chris' patch was merged instead. The excess churn is a result of my
using my original patch, which has my preferred naming. Primarily act_* is
changed to which_*, but it's mostly the same otherwise. I've kept the
convention Chris used for the pte wrap (I had something slightly
different, and broken - but fixable).
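
The diff below also adds PDPE/PDE/PTE shift/mask macros; a worked decode
of an arbitrarily chosen address:

    start = 0xC0345000
    which_pdpe = (start >> 30) & 0x3   = 3     /* which page directory */
    which_pde  = (start >> 21) & 0x1ff = 1     /* which page table     */
    which_pte  = (start >> 12) & 0x1ff = 0x145 /* entry within the PT  */
    offset     =  start        & 0xfff = 0     /* offset within page   */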

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |   5 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 127 ++++++++++++++++++++++++++++--------
 2 files changed, 103 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2ebad96..d9a6327 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -691,6 +691,7 @@ struct i915_gtt {
 };
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
+#define GEN8_LEGACY_PDPS 4
 struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
@@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	union {
 		struct page **pt_pages;
-		struct page *gen8_pt_pages;
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
 	};
 	struct page *pd_pages;
 	int num_pd_pages;
 	int num_pt_pages;
 	union {
 		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[4];
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5bfc6ff..5299acc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
-#define GEN8_LEGACY_PDPS		4
+
+/* GEN8 legacy style address is defined as a 3 level page table:
+ * 31:30 | 29:21 | 20:12 |  11:0
+ * PDPE  |  PDE  |  PTE  | offset
+ * The difference as compared to a normal x86 3 level page table is that the
+ * PDPEs are programmed via register.
+ */
+#define GEN8_PDPE_SHIFT			30
+#define GEN8_PDPE_MASK			0x3
+#define GEN8_PDE_SHIFT			21
+#define GEN8_PDE_MASK			0x1ff
+#define GEN8_PTE_SHIFT			12
+#define GEN8_PTE_MASK			0x1ff
 
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
@@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
-	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
+	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
+	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	unsigned num_entries = length >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
-	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
 	unsigned last_pte, i;
 
 	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
+		struct page *page_table = ppgtt->gen8_pt_pages[which_pdpe][which_pde];
 
-		last_pte = first_pte + num_entries;
+		last_pte = which_pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
 			last_pte = GEN8_PTES_PER_PAGE;
 
 		pt_vaddr = kmap_atomic(page_table);
 
-		for (i = first_pte; i < last_pte; i++)
+		for (i = which_pte; i < last_pte; i++) {
 			pt_vaddr[i] = scratch_pte;
+			num_entries--;
+			BUG_ON(num_entries < 0);
+		}
 
 		kunmap_atomic(pt_vaddr);
 
-		num_entries -= last_pte - first_pte;
-		first_pte = 0;
-		act_pt++;
+		which_pte = 0;
+		if (which_pde + 1 == GEN8_PDES_PER_PAGE)
+			which_pdpe++;
+		which_pde = (which_pde + 1) & GEN8_PDE_MASK;
 	}
 }
 
@@ -298,39 +314,57 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr;
-	unsigned first_entry = start >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
-	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
+	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
+	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
+	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
+
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+		if (WARN_ON(which_pdpe >= GEN8_LEGACY_PDPS))
+			break;
+
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[which_pdpe][which_pde]);
 
-		pt_vaddr[act_pte] =
+		pt_vaddr[which_pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
-		if (++act_pte == GEN8_PTES_PER_PAGE) {
+		if (++which_pte == GEN8_PTES_PER_PAGE) {
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
-			act_pt++;
-			act_pte = 0;
+			if (which_pde + 1 == GEN8_PDES_PER_PAGE)
+				which_pdpe++;
+			which_pte = 0;
 		}
 	}
 	if (pt_vaddr)
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_tables(struct page **pt_pages)
+{
+	int i;
+
+	if (pt_pages == NULL)
+		return;
+
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
+		if (pt_pages[i])
+			__free_pages(pt_pages[i], 0);
+}
+
+static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages ; i++)
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
+		kfree(ppgtt->gen8_pt_pages[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
+	}
 
 	kfree(ppgtt->gen8_pt_dma_addr);
-	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
 	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
@@ -369,20 +403,59 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
+static struct page **__gen8_alloc_page_tables(void)
+{
+	struct page **pt_pages;
+	int i;
+
+	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
+	if (!pt_pages)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		pt_pages[i] = alloc_page(GFP_KERNEL);
+		if (!pt_pages[i])
+			goto bail;
+	}
+
+	return pt_pages;
+
+bail:
+	gen8_free_page_tables(pt_pages);
+	kfree(pt_pages);
+	return ERR_PTR(-ENOMEM);
+}
+
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
-	struct page *pt_pages;
+	struct page **pt_pages[GEN8_LEGACY_PDPS];
 	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	int i, ret;
 
-	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
-	if (!pt_pages)
-		return -ENOMEM;
+	for (i = 0; i < max_pdp; i++) {
+		pt_pages[i] = __gen8_alloc_page_tables();
+		if (IS_ERR(pt_pages[i])) {
+			ret = PTR_ERR(pt_pages[i]);
+			goto unwind_out;
+		}
+	}
+
+	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
+	 * "atomic" - for cleanup purposes.
+	 */
+	for (i = 0; i < max_pdp; i++)
+		ppgtt->gen8_pt_pages[i] = pt_pages[i];
 
-	ppgtt->gen8_pt_pages = pt_pages;
 	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		gen8_free_page_tables(pt_pages[i]);
+		kfree(pt_pages[i]);
+	}
+
+	return ret;
 }
 
 static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
@@ -475,7 +548,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
+	p = ppgtt->gen8_pt_pages[pd][pt];
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB"
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (4 preceding siblings ...)
  2014-02-12 22:28   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-19 19:14     ` Imre Deak
  2014-02-12 22:28   ` [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright Ben Widawsky
                     ` (13 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This reverts commit 3a2ffb65eec6dbda2fd8151894f51c18b42c8d41.

Now that the code is fixed to use smaller allocations, it should be safe
to let the full GGTT be used on BDW.

The testcase for this is anything which uses more than half of the GTT,
thus eclipsing the old limit.
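
For reference, the size math behind the revert (assuming 8-byte gen8 PTEs
and 4KiB pages): a full 4GB GGTT needs 4GB / 4KiB = 1M PTEs, i.e. 8MB of
PTE space, while the reverted 4<<20 (4MB) cap only left room for enough
PTEs to map 2GB.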

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5299acc..2c2121d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1733,11 +1733,6 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
 	bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
 	if (bdw_gmch_ctl)
 		bdw_gmch_ctl = 1 << bdw_gmch_ctl;
-	if (bdw_gmch_ctl > 4) {
-		WARN_ON(!i915.preliminary_hw_support);
-		return 4<<20;
-	}
-
 	return bdw_gmch_ctl << 20;
 }
 
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (5 preceding siblings ...)
  2014-02-12 22:28   ` [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB" Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-12 23:19     ` Damien Lespiau
  2014-02-19 19:20     ` Imre Deak
  2014-02-12 22:28   ` [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup Ben Widawsky
                     ` (12 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

I keep meaning to do this... by now almost the entire file has been
written by an Intel employee (including Daniel post-2010).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2c2121d..e1bc0b9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1,5 +1,6 @@
 /*
  * Copyright © 2010 Daniel Vetter
+ * Copyright © 2011-2013 Intel Corporation
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
-- 
1.8.5.4


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (6 preceding siblings ...)
  2014-02-12 22:28   ` [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-13 10:29     ` Chris Wilson
  2014-02-12 22:28   ` [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up Ben Widawsky
                     ` (11 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This cleanup is similar to the GEN8 cleanup (though less necessary).
Having everything split up will make the error paths in the
initialization code easier to understand.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e1bc0b9..d3ee916 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1008,22 +1008,21 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	int i;
 
-	list_del(&vm->global_link);
-	drm_mm_takedown(&ppgtt->base.mm);
-	drm_mm_remove_node(&ppgtt->node);
-
 	if (ppgtt->pt_dma_addr) {
 		for (i = 0; i < ppgtt->num_pd_entries; i++)
 			pci_unmap_page(ppgtt->base.dev->pdev,
 				       ppgtt->pt_dma_addr[i],
 				       4096, PCI_DMA_BIDIRECTIONAL);
 	}
+}
+
+static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
@@ -1032,6 +1031,19 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	kfree(ppgtt);
 }
 
+static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	list_del(&vm->global_link);
+	drm_mm_takedown(&ppgtt->base.mm);
+	drm_mm_remove_node(&ppgtt->node);
+
+	gen6_ppgtt_unmap_pages(ppgtt);
+	gen6_ppgtt_free(ppgtt);
+}
+
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 #define GEN6_PD_ALIGN (PAGE_SIZE * 16)
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (7 preceding siblings ...)
  2014-02-12 22:28   ` [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup Ben Widawsky
@ 2014-02-12 22:28   ` Ben Widawsky
  2014-02-13 10:33     ` Chris Wilson
  2014-02-13 11:47   ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ville Syrjälä
                     ` (10 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 22:28 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Simply to match the GEN8 style of PPGTT initialization, split up the
allocations and mappings. Unlike GEN8, we skip a separate dma_addr_t
allocation function, as it is much simpler pre-gen8.

With this code it would be easy to make a more general PPGTT
initialization function with per-GEN alloc/map/etc., or use a common
helper, similar to the ringbuffer code. I don't see a benefit to doing
this just yet, but who knows...
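
The resulting flow mirrors the GEN8 version (names match the diff below):

    gen6_ppgtt_init()
        gen6_ppgtt_alloc()
            gen6_ppgtt_allocate_page_directories() /* drm_mm node for the PDs */
            gen6_ppgtt_allocate_page_tables()      /* alloc_page() per PT */
        gen6_ppgtt_setup_page_tables()             /* pci_map_page() per PT */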

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 141 +++++++++++++++++++++++-------------
 1 file changed, 91 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d3ee916..396c862 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1044,14 +1044,14 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	gen6_ppgtt_free(ppgtt);
 }
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 {
 #define GEN6_PD_ALIGN (PAGE_SIZE * 16)
 #define GEN6_PD_SIZE (GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	bool retried = false;
-	int i, ret;
+	int ret;
 
 	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
 	 * allocator works in address space sizes, so it's multiplied by page
@@ -1078,42 +1078,60 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->base.pte_encode = dev_priv->gtt.base.pte_encode;
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-	if (IS_GEN6(dev)) {
-		ppgtt->enable = gen6_ppgtt_enable;
-		ppgtt->switch_mm = gen6_mm_switch;
-	} else if (IS_HASWELL(dev)) {
-		ppgtt->enable = gen7_ppgtt_enable;
-		ppgtt->switch_mm = hsw_mm_switch;
-	} else if (IS_GEN7(dev)) {
-		ppgtt->enable = gen7_ppgtt_enable;
-		ppgtt->switch_mm = gen7_mm_switch;
-	} else
-		BUG();
-	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
-	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	return ret;
+}
+
+static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
 	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
 				  GFP_KERNEL);
-	if (!ppgtt->pt_pages) {
-		drm_mm_remove_node(&ppgtt->node);
+
+	if (!ppgtt->pt_pages)
 		return -ENOMEM;
-	}
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i])
-			goto err_pt_alloc;
+		if (!ppgtt->pt_pages[i]) {
+			gen6_ppgtt_free(ppgtt);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+{
+	int ret;
+
+	ret = gen6_ppgtt_allocate_page_directories(ppgtt);
+	if (ret)
+		return ret;
+
+	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	if (ret) {
+		drm_mm_remove_node(&ppgtt->node);
+		return ret;
 	}
 
 	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
 				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr)
-		goto err_pt_alloc;
+	if (!ppgtt->pt_dma_addr) {
+		drm_mm_remove_node(&ppgtt->node);
+		gen6_ppgtt_free(ppgtt);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
@@ -1122,40 +1140,63 @@ alloc:
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			ret = -EIO;
-			goto err_pd_pin;
-
+			gen6_ppgtt_unmap_pages(ppgtt);
+			return -EIO;
 		}
+
 		ppgtt->pt_dma_addr[i] = pt_addr;
 	}
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	return 0;
+}
+
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ppgtt->base.pte_encode = dev_priv->gtt.base.pte_encode;
+	if (IS_GEN6(dev)) {
+		ppgtt->enable = gen6_ppgtt_enable;
+		ppgtt->switch_mm = gen6_mm_switch;
+	} else if (IS_HASWELL(dev)) {
+		ppgtt->enable = gen7_ppgtt_enable;
+		ppgtt->switch_mm = hsw_mm_switch;
+	} else if (IS_GEN7(dev)) {
+		ppgtt->enable = gen7_ppgtt_enable;
+		ppgtt->switch_mm = gen7_mm_switch;
+	} else
+		BUG();
+
+	ret = gen6_ppgtt_alloc(ppgtt);
+	if (ret)
+		return ret;
+
+	ret = gen6_ppgtt_setup_page_tables(ppgtt);
+	if (ret) {
+		gen6_ppgtt_free(ppgtt);
+		return ret;
+	}
+
+	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
+	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
+	ppgtt->base.start = 0;
+	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
-			 ppgtt->node.size >> 20,
-			 ppgtt->node.start / PAGE_SIZE);
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	return 0;
+	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
-err_pd_pin:
-	if (ppgtt->pt_dma_addr) {
-		for (i--; i >= 0; i--)
-			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
-err_pt_alloc:
-	kfree(ppgtt->pt_dma_addr);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		if (ppgtt->pt_pages[i])
-			__free_page(ppgtt->pt_pages[i]);
-	}
-	kfree(ppgtt->pt_pages);
-	drm_mm_remove_node(&ppgtt->node);
+	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
+			 ppgtt->node.size >> 20,
+			 ppgtt->node.start / PAGE_SIZE);
 
-	return ret;
+	return 0;
 }
 
 int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright
  2014-02-12 22:28   ` [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright Ben Widawsky
@ 2014-02-12 23:19     ` Damien Lespiau
  2014-02-12 23:22       ` Ben Widawsky
  2014-02-19 19:20     ` Imre Deak
  1 sibling, 1 reply; 63+ messages in thread
From: Damien Lespiau @ 2014-02-12 23:19 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 12, 2014 at 02:28:50PM -0800, Ben Widawsky wrote:
> I keep meaning to do this... by now almost the entire file has been
> written by an Intel employee (including Daniel post-2010).
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 2c2121d..e1bc0b9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1,5 +1,6 @@
>  /*
>   * Copyright © 2010 Daniel Vetter
> + * Copyright © 2011-2013 Intel Corporation

2014?

-- 
Damien

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright
  2014-02-12 23:19     ` Damien Lespiau
@ 2014-02-12 23:22       ` Ben Widawsky
  0 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 23:22 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 12, 2014 at 11:19:24PM +0000, Damien Lespiau wrote:
> On Wed, Feb 12, 2014 at 02:28:50PM -0800, Ben Widawsky wrote:
> > I keep meaning to do this... by now almost the entire file has been
> > written by an Intel employee (including Daniel post-2010).
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 2c2121d..e1bc0b9 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -1,5 +1,6 @@
> >  /*
> >   * Copyright © 2010 Daniel Vetter
> > + * Copyright © 2011-2013 Intel Corporation
> 
> 2014?
> 
> -- 
> Damien

This patch was written in 2013 still, but yes.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-12 22:28   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
@ 2014-02-12 23:45     ` Chris Wilson
  2014-02-12 23:52       ` Ben Widawsky
  2014-02-19 19:11     ` Imre Deak
  1 sibling, 1 reply; 63+ messages in thread
From: Chris Wilson @ 2014-02-12 23:45 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 12, 2014 at 02:28:48PM -0800, Ben Widawsky wrote:
> -		for (i = first_pte; i < last_pte; i++)
> +		for (i = which_pte; i < last_pte; i++) {
>  			pt_vaddr[i] = scratch_pte;
> +			num_entries--;
> +			BUG_ON(num_entries < 0);
> +		}
>  
>  		kunmap_atomic(pt_vaddr);
>  
> -		num_entries -= last_pte - first_pte;

I'm going to moan about this being replaced by a BUG_ON inside the inner
loop.

> -		first_pte = 0;
> -		act_pt++;
> +		which_pte = 0;

> +		if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> +			which_pdpe++;
> +		which_pde = (which_pde + 1) & GEN8_PDE_MASK;

I think this would be clearer written as
  if (++which_pde == GEN8_PDES_PER_PAGE) {
     which_pdpe++;
     which_pde = 0;
   }
as then the relationship between pdpe and pde is much more apparent to
me. Do we feel that which_pte, which_pde, which_pdpe are really any
better than pte, pde, pdpe? Or is it important to question ourselves
every step of the way?

And I may as well moan about having to preallocate everything. ;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-12 23:45     ` Chris Wilson
@ 2014-02-12 23:52       ` Ben Widawsky
  0 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-12 23:52 UTC (permalink / raw)
  To: Chris Wilson, Ben Widawsky, Intel GFX

On Wed, Feb 12, 2014 at 11:45:59PM +0000, Chris Wilson wrote:
> On Wed, Feb 12, 2014 at 02:28:48PM -0800, Ben Widawsky wrote:
> > -		for (i = first_pte; i < last_pte; i++)
> > +		for (i = which_pte; i < last_pte; i++) {
> >  			pt_vaddr[i] = scratch_pte;
> > +			num_entries--;
> > +			BUG_ON(num_entries < 0);
> > +		}
> >  
> >  		kunmap_atomic(pt_vaddr);
> >  
> > -		num_entries -= last_pte - first_pte;
> 
> I'm going to moan about this being replaced by a BUG_ON inside the inner
> loop.
> 

I'm fine with removing it. I guess doing any sort of perf with BUG_ON is
ill advised, but running without BUG_ON is perhaps equally ill advised.

> > -		first_pte = 0;
> > -		act_pt++;
> > +		which_pte = 0;
> 
> > +		if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > +			which_pdpe++;
> > +		which_pde = (which_pde + 1) & GEN8_PDE_MASK;
> 
> I think this would be clearer written as
>   if (++which_pde == GEN8_PDES_PER_PAGE) {
>      which_pdpe++;
>      which_pde = 0;
>    }

I'm fine with that change.

> as then the relationship between pdpe and pde is much more apparent to
> me. Do we feel that which_pte, which_pde, which_pdpe are really any
> better than pte, pde, pdpe? Or is it important to question ourselves
> every step of the way?

I actually just don't like act_ and first_; dropping the "which_" is
perfectly acceptable to me.

> 
> And I may as well moan about having to preallocate everything. ;-)
> -Chris
> 

Deferring allocation is an important but separate step.

> -- 
> Chris Wilson, Intel Open Source Technology Centre

I'll give Imre a bit to leave comments and then respin.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-12 22:28   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
@ 2014-02-13  0:14     ` Chris Wilson
  2014-02-13  0:34       ` Ben Widawsky
  2014-02-19 17:26     ` Imre Deak
  1 sibling, 1 reply; 63+ messages in thread
From: Chris Wilson @ 2014-02-13  0:14 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 12, 2014 at 02:28:47PM -0800, Ben Widawsky wrote:
> This patch converts insert_entries and clear_range, both functions which
> are specific to the VM. These functions tend to encapsulate the gen
> specific PTE writes. Passing absolute addresses to the insert_entries,
> and clear_range will help make the logic clearer within the functions as
> to what's going on. Currently, all callers simply do the appropriate
> page shift, which IMO, ends up looking weird with an upcoming change for
> the gen8 page table allocations.
> 
> Up until now, the PPGTT was a funky 2 level page table. GEN8 changes
> this to look more like a 3 level page table, and to that extent we need
> a significant amount more memory simply for the page tables. To address
> this, the allocations will be split up in finer amounts.

Why size_t? Having a type that changes size irrespective of the GPU
seems like asking for trouble.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-13  0:14     ` Chris Wilson
@ 2014-02-13  0:34       ` Ben Widawsky
  0 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-13  0:34 UTC (permalink / raw)
  To: Chris Wilson, Ben Widawsky, Intel GFX

On Thu, Feb 13, 2014 at 12:14:15AM +0000, Chris Wilson wrote:
> On Wed, Feb 12, 2014 at 02:28:47PM -0800, Ben Widawsky wrote:
> > This patch converts insert_entries and clear_range, both functions which
> > are specific to the VM. These functions tend to encapsulate the gen
> > specific PTE writes. Passing absolute addresses to the insert_entries,
> > and clear_range will help make the logic clearer within the functions as
> > to what's going on. Currently, all callers simply do the appropriate
> > page shift, which IMO, ends up looking weird with an upcoming change for
> > the gen8 page table allocations.
> > 
> > Up until now, the PPGTT was a funky 2 level page table. GEN8 changes
> > this to look more like a 3 level page table, and to that extent we need
> > a significant amount more memory simply for the page tables. To address
> > this, the allocations will be split up in finer amounts.
> 
> Why size_t? Having a type that changes size irrespective of the GPU
> seems like asking for trouble.
> -Chris
> 

For GGTT, I agree with you. The only trouble I could think of is if you
used 48b PPGTT addressing on i386. I suspect there is some reason this
won't work, but I've not read the spec.

In the case of a field named "length", I don't really think we need the
size_t type to help self-document. So that's not useful... I can live
with it if you want the change, though I do think size_t still makes sense.
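
Just to illustrate the trap for the archives (hypothetical values, a
sketch rather than anything in the series):

	/* size_t is 32 bits wide on i386 */
	uint64_t length = 1ULL << 32;	/* e.g. a 4GB range with 48b PPGTT */
	size_t truncated = length;	/* silently wraps to 0 on 32-bit */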

> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup
  2014-02-12 22:28   ` [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup Ben Widawsky
@ 2014-02-13 10:29     ` Chris Wilson
  0 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2014-02-13 10:29 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 12, 2014 at 02:28:51PM -0800, Ben Widawsky wrote:
> This cleanup is similar to the GEN8 cleanup (though less necessary).
> Having everything split will make cleaning the initialization path error
> paths easier to understand.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up
  2014-02-12 22:28   ` [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up Ben Widawsky
@ 2014-02-13 10:33     ` Chris Wilson
  0 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2014-02-13 10:33 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 12, 2014 at 02:28:52PM -0800, Ben Widawsky wrote:
> Simply to match the GEN8 style of PPGTT initialization, split up the
> allocations and mappings. Unlike GEN8, we skip a separate dma_addr_t
> allocation function, as it is much simpler pre-gen8.
> 
> With this code it would be easy to make a more general PPGTT
> initialization function with per GEN alloc/map/etc. or use a common
> helper, similar to the ringbuffer code. I don't see a benefit to doing
> this just yet, but who knows...
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Had to double-check whether free_pages() safely accepted NULL, but that
was the only logic change I spotted, so
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 1/9] drm/i915/bdw: Split up PPGTT cleanup
  2014-02-12 22:28   ` [PATCH 1/9] drm/i915/bdw: Split up PPGTT cleanup Ben Widawsky
@ 2014-02-13 10:40     ` Chris Wilson
  0 siblings, 0 replies; 63+ messages in thread
From: Chris Wilson @ 2014-02-13 10:40 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 12, 2014 at 02:28:44PM -0800, Ben Widawsky wrote:
> This will make the code more readable, and extensible which is needed
> for upcoming feature work. Eventually, we'll do the same for init.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

I don't think I spotted any logic changes,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (8 preceding siblings ...)
  2014-02-12 22:28   ` [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up Ben Widawsky
@ 2014-02-13 11:47   ` Ville Syrjälä
  2014-02-19 17:17     ` Ben Widawsky
  2014-02-20  6:05   ` [PATCH 0/9] [v2] " Ben Widawsky
                     ` (9 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Ville Syrjälä @ 2014-02-13 11:47 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Wed, Feb 12, 2014 at 02:28:43PM -0800, Ben Widawsky wrote:
> Rebased series from what I submitted a while ago:
> http://lists.freedesktop.org/archives/intel-gfx/2013-December/037815.html
> 
> It was mostly a clean rebase, but there were a couple of major conflicts which
> I think I cleaned up properly, but extra eyes would be good.

One thing I noticed while staring at the ppgtt code recently is that
gen6 ppgtt cleanup kfrees the ppgtt struct, but gen8 code doesn't.
At that time it looked like the correct fix was moving the kfree()
out from the gen6 code into some common place. The reason being
that the gen8 code called the cleanup function during error handling
in the init paths. But I'm not sure if you've changed that with this
series. A quick scan of these patches tells me the leak is still
there at least.
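
Roughly what I had in mind (untested, and assuming every caller goes
through the cleanup vfunc, so the kfree() can live in one common place):

	/* common code, instead of kfree()ing inside gen6_ppgtt_cleanup() */
	ppgtt->base.cleanup(&ppgtt->base);	/* gen specific teardown */
	kfree(ppgtt);				/* now common to gen6/gen8 */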

> 
> As before, the last two are optional.
> 
> Ben Widawsky (9):
>   drm/i915/bdw: Split up PPGTT cleanup
>   drm/i915/bdw: Reorganize PPGTT init
>   drm/i915/bdw: Split ppgtt initialization up
>   drm/i915: Make clear/insert vfuncs args absolute
>   drm/i915/bdw: Reorganize PT allocations
>   Revert "drm/i915/bdw: Limit GTT to 2GB"
>   drm/i915: Update i915_gem_gtt.c copyright
>   drm/i915: Split GEN6 PPGTT cleanup
>   drm/i915: Split GEN6 PPGTT initialization up
> 
>  drivers/gpu/drm/i915/i915_drv.h     |  13 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 621 +++++++++++++++++++++++++-----------
>  2 files changed, 438 insertions(+), 196 deletions(-)
> 
> -- 
> 1.8.5.4
> 

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init
  2014-02-12 22:28   ` [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init Ben Widawsky
@ 2014-02-19 14:59     ` Imre Deak
  2014-02-19 20:06       ` [PATCH] [v3] " Ben Widawsky
  0 siblings, 1 reply; 63+ messages in thread
From: Imre Deak @ 2014-02-19 14:59 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky



On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> Create 3 clear stages in PPGTT init. This will help make upcoming
> changes more readable. The 3 stages are: allocation, dma mapping, and
> writing the P[DT]Es.
> 
> One nice benefit of the patch is that it creates 2 very clear error
> points, allocation and mapping, and avoids having to do any handling
> after writing PTEs (something which was likely buggy before). I suspect
> this simplified error handling will be helpful when we move to
> deferred/dynamic page table allocation and mapping.
> 
> The patch also attempts to break up some of the steps into more
> logical, reviewable chunks, particularly when we free.
> 
> v2: Don't call cleanup on the error path since that takes down the
> drm_mm and list entry, which aren't set up at this point.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h     |   2 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 124 +++++++++++++++++++++---------------
>  2 files changed, 73 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2572a95..cecbb9a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -709,7 +709,7 @@ struct i915_hw_ppgtt {
>  	};
>  	union {
>  		dma_addr_t *pt_dma_addr;
> -		dma_addr_t *gen8_pt_dma_addr[4];
> +		dma_addr_t **gen8_pt_dma_addr;

If there isn't any reason to allocate this dynamically I'd just leave
the static array. This would make the error path a bit simpler and be
more symmetric wrt. pd_dma_addr which is also a static array.

>  	};
>  
>  	int (*enable)(struct i915_hw_ppgtt *ppgtt);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index ee38faf..c6c221c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -326,12 +326,14 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  	for (i = 0; i < ppgtt->num_pd_pages ; i++)
>  		kfree(ppgtt->gen8_pt_dma_addr[i]);
>  
> +	kfree(ppgtt->gen8_pt_dma_addr);
>  	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
>  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
>  }
>  
>  static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  {
> +	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
>  	int i, j;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> @@ -340,18 +342,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  		if (!ppgtt->pd_dma_addr[i])
>  			continue;
>  
> -		pci_unmap_page(ppgtt->base.dev->pdev,
> -			       ppgtt->pd_dma_addr[i],
> -			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> +		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
> +			       PCI_DMA_BIDIRECTIONAL);
>  
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>  			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
>  			if (addr)
> -				pci_unmap_page(ppgtt->base.dev->pdev,
> -				       addr,
> -				       PAGE_SIZE,
> -				       PCI_DMA_BIDIRECTIONAL);
> -
> +				pci_unmap_page(hwdev, addr, PAGE_SIZE,
> +					       PCI_DMA_BIDIRECTIONAL);
>  		}
>  	}
>  }
> @@ -369,27 +367,26 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  }
>  
>  /**
> - * GEN8 legacy ppgtt programming is accomplished through 4 PDP registers with a
> - * net effect resembling a 2-level page table in normal x86 terms. Each PDP
> - * represents 1GB of memory
> - * 4 * 512 * 512 * 4096 = 4GB legacy 32b address space.
> + * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
> + * with a net effect resembling a 2-level page table in normal x86 terms. Each
> + * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
> + * space.
>   *
> + * FIXME: split allocation into smaller pieces. For now we only ever do this
> + * once, but with full PPGTT, the multiple contiguous allocations will be bad.
>   * TODO: Do something with the size parameter
> - **/
> + */
>  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  {
>  	struct page *pt_pages;
> -	int i, j, ret = -ENOMEM;
>  	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
>  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> +	int i, j, ret;
>  
>  	if (size % (1<<30))
>  		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
>  
> -	/* FIXME: split allocation into smaller pieces. For now we only ever do
> -	 * this once, but with full PPGTT, the multiple contiguous allocations
> -	 * will be bad.
> -	 */
> +	/* 1. Do all our allocations for page directories and page tables */
>  	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
>  	if (!ppgtt->pd_pages)
>  		return -ENOMEM;
> @@ -404,52 +401,66 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
>  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
>  	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
> -	ppgtt->enable = gen8_ppgtt_enable;
> -	ppgtt->switch_mm = gen8_mm_switch;
> -	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> -	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> -	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> -	ppgtt->base.start = 0;
> -	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
> -
>  	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
>  
> +	ppgtt->gen8_pt_dma_addr = kcalloc(max_pdp,
> +					  sizeof(*ppgtt->gen8_pt_dma_addr),
> +					  GFP_KERNEL);
> +	if (!ppgtt->gen8_pt_dma_addr) {
> +		ret = -ENOMEM;
> +		goto bail;

On the error path, in gen8_ppgtt_free() we'd dereference the above NULL
ptr.
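
A NULL check around the loop would be the minimal fix, something like
(untested):

	if (ppgtt->gen8_pt_dma_addr) {
		for (i = 0; i < ppgtt->num_pd_pages; i++)
			kfree(ppgtt->gen8_pt_dma_addr[i]);
		kfree(ppgtt->gen8_pt_dma_addr);
	}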

> +	}
> +
> +	for (i = 0; i < max_pdp; i++) {
> +		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> +						     sizeof(dma_addr_t),
> +						     GFP_KERNEL);
> +		if (!ppgtt->gen8_pt_dma_addr[i]) {
> +			ret = -ENOMEM;
> +			goto bail;
> +		}
> +	}
> +
>  	/*
> -	 * - Create a mapping for the page directories.
> -	 * - For each page directory:
> -	 *      allocate space for page table mappings.
> -	 *      map each page table
> +	 * 2. Create all the DMA mappings for the page directories and page
> +	 * tables
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
> -		dma_addr_t temp;
> -		temp = pci_map_page(ppgtt->base.dev->pdev,
> -				    &ppgtt->pd_pages[i], 0,
> -				    PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -		if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
> -			goto err_out;
> -
> -		ppgtt->pd_dma_addr[i] = temp;
> -
> -		ppgtt->gen8_pt_dma_addr[i] = kmalloc(sizeof(dma_addr_t) * GEN8_PDES_PER_PAGE, GFP_KERNEL);
> -		if (!ppgtt->gen8_pt_dma_addr[i])
> -			goto err_out;
> +		dma_addr_t pd_addr, pt_addr;
>  
> +		/* Get the page table mappings per page directory */
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>  			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
> -			temp = pci_map_page(ppgtt->base.dev->pdev,
> -					    p, 0, PAGE_SIZE,
> -					    PCI_DMA_BIDIRECTIONAL);
>  
> -			if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
> -				goto err_out;
> +			pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> +					       p, 0, PAGE_SIZE,
> +					       PCI_DMA_BIDIRECTIONAL);
> +			ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> +			if (ret)
> +				goto bail;
>  
> -			ppgtt->gen8_pt_dma_addr[i][j] = temp;
> +			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
>  		}
> +
> +		/* And the page directory mappings */
> +		pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> +				       &ppgtt->pd_pages[i], 0,
> +				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> +		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> +		if (ret)
> +			goto bail;

The error path here would leak the above page table mappings, since
ppgtt->pd_dma_addr[i] is still zero, but in gen8_ppgtt_unmap_pages() we
do a if (!ppgtt->pd_dma_addr[i]) continue; skipping the page table unmap
part. This is reworked in your later patches, but the issue is still
there in the final version. 
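
One way to fix that would be to drop the early continue in
gen8_ppgtt_unmap_pages() and check each mapping individually, something
like (untested sketch):

	for (i = 0; i < ppgtt->num_pd_pages; i++) {
		if (ppgtt->pd_dma_addr[i])
			pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i],
				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);

		if (!ppgtt->gen8_pt_dma_addr[i])
			continue;

		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];

			if (addr)
				pci_unmap_page(hwdev, addr, PAGE_SIZE,
					       PCI_DMA_BIDIRECTIONAL);
		}
	}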

> +
> +		ppgtt->pd_dma_addr[i] = pd_addr;
>  	}
>  
> -	/* For now, the PPGTT helper functions all require that the PDEs are
> +	/*
> +	 * 3. Map all the page directory entires to point to the page tables
> +	 * we've allocated.
> +	 *
> +	 * For now, the PPGTT helper functions all require that the PDEs are
>  	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
> -	 * will never need to touch the PDEs again */
> +	 * will never need to touch the PDEs again.
> +	 */
>  	for (i = 0; i < max_pdp; i++) {
>  		gen8_ppgtt_pde_t *pd_vaddr;
>  		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
> @@ -461,6 +472,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  		kunmap_atomic(pd_vaddr);
>  	}
>  
> +	ppgtt->enable = gen8_ppgtt_enable;
> +	ppgtt->switch_mm = gen8_mm_switch;
> +	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> +	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> +	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> +	ppgtt->base.start = 0;
> +	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
> +
>  	ppgtt->base.clear_range(&ppgtt->base, 0,
>  				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
>  				true);
> @@ -473,8 +492,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  			 size % (1<<30));
>  	return 0;
>  
> -err_out:
> -	ppgtt->base.cleanup(&ppgtt->base);
> +bail:
> +	gen8_ppgtt_unmap_pages(ppgtt);
> +	gen8_ppgtt_free(ppgtt);
>  	return ret;
>  }
>  



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up
  2014-02-12 22:28   ` [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up Ben Widawsky
@ 2014-02-19 17:03     ` Imre Deak
  0 siblings, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-19 17:03 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky



On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> Like cleanup in an earlier patch, the code becomes much more readable,
> and easier to extend if we extract out helper functions for the various
> stages of init.
> 
> Note that with this patch it becomes really simple, and tempting to begin
> using the 'goto out' idiom with explicit free/fini semantics. I've
> kept the error path as similar as possible to the cleanup() function to
> make sure cleanup is as robust as possible.
> 
> v2: Remove comment "NB:From here on, ppgtt->base.cleanup() should
> function properly"
> Update commit message to reflect above
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 182 +++++++++++++++++++++++++-----------
>  1 file changed, 126 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index c6c221c..8a5cad9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -366,91 +366,161 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> -/**
> - * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
> - * with a net effect resembling a 2-level page table in normal x86 terms. Each
> - * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
> - * space.
> - *
> - * FIXME: split allocation into smaller pieces. For now we only ever do this
> - * once, but with full PPGTT, the multiple contiguous allocations will be bad.
> - * TODO: Do something with the size parameter
> - */
> -static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
> +static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
> +					   const int max_pdp)
>  {
>  	struct page *pt_pages;
> -	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
>  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> -	int i, j, ret;
> -
> -	if (size % (1<<30))
> -		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
> -
> -	/* 1. Do all our allocations for page directories and page tables */
> -	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
> -	if (!ppgtt->pd_pages)
> -		return -ENOMEM;
>  
>  	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> -	if (!pt_pages) {
> -		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
> +	if (!pt_pages)
>  		return -ENOMEM;
> -	}
>  
>  	ppgtt->gen8_pt_pages = pt_pages;
> -	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
>  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> -	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
> -	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
>  
> -	ppgtt->gen8_pt_dma_addr = kcalloc(max_pdp,
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> +{
> +	int i;
> +
> +	ppgtt->gen8_pt_dma_addr = kcalloc(ppgtt->num_pd_entries,
>  					  sizeof(*ppgtt->gen8_pt_dma_addr),
>  					  GFP_KERNEL);
> -	if (!ppgtt->gen8_pt_dma_addr) {
> -		ret = -ENOMEM;
> -		goto bail;
> -	}
> +	if (!ppgtt->gen8_pt_dma_addr)
> +		return -ENOMEM;
>  
> -	for (i = 0; i < max_pdp; i++) {
> +	for (i = 0; i < ppgtt->num_pd_entries; i++) {
>  		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
>  						     sizeof(dma_addr_t),
>  						     GFP_KERNEL);
>  		if (!ppgtt->gen8_pt_dma_addr[i]) {
> -			ret = -ENOMEM;
> -			goto bail;
> +			kfree(ppgtt->gen8_pt_dma_addr);
> +			while(i--)
> +				kfree(ppgtt->gen8_pt_dma_addr[i]);
> +
> +			return -ENOMEM;
>  		}
>  	}
>  
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
> +						const int max_pdp)
> +{
> +	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
> +	if (!ppgtt->pd_pages)
> +		return -ENOMEM;
> +
> +	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
> +
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
> +			    const int max_pdp)
> +{
> +	int ret;
> +
> +	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
> +	if (ret)
> +		return ret;
> +
> +	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
> +	if (ret) {
> +		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
> +		return ret;
> +	}
> +
> +	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
> +	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);

This check belongs to gen8_ppgtt_allocate_page_directories().

> +
> +	ret = gen8_ppgtt_allocate_dma(ppgtt);
> +	if (ret)
> +		gen8_ppgtt_free(ppgtt);

Just for reference, this is the same ppgtt->gen8_pt_dma_addr NULL deref
issue on the error path as in 2/9.

> +
> +	return ret;
> +}
> +
> +static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
> +					     const int pd)
> +{
> +	dma_addr_t pd_addr;
> +	int ret;
> +
> +	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> +			       &ppgtt->pd_pages[pd], 0,
> +			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> +
> +	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> +	if (ret)
> +		return ret;
> +
> +	ppgtt->pd_dma_addr[pd] = pd_addr;
> +
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
> +					const int pd,
> +					const int pt)
> +{
> +	dma_addr_t pt_addr;
> +	struct page *p;
> +	int ret;
> +
> +	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> +	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> +			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> +	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> +	if (ret)
> +		return ret;
> +
> +	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
> +
> +	return 0;
> +}
> +
> +/**
> + * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
> + * with a net effect resembling a 2-level page table in normal x86 terms. Each
> + * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
> + * space.
> + *
> + * FIXME: split allocation into smaller pieces. For now we only ever do this
> + * once, but with full PPGTT, the multiple contiguous allocations will be bad.
> + * TODO: Do something with the size parameter
> + */
> +static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
> +{
> +	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
> +	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> +	int i, j, ret;
> +
> +	if (size % (1<<30))
> +		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
> +
> +	/* 1. Do all our allocations for page directories and page tables. */
> +	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
> +	if (ret)
> +		return ret;
> +
>  	/*
> -	 * 2. Create all the DMA mappings for the page directories and page
> -	 * tables
> +	 * 2. Create DMA mappings for the page directories and page tables.
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
> -		dma_addr_t pd_addr, pt_addr;
> -
> -		/* Get the page table mappings per page directory */
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
> -
> -			pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> -					       p, 0, PAGE_SIZE,
> -					       PCI_DMA_BIDIRECTIONAL);
> -			ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> +			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
>  			if (ret)
>  				goto bail;
> -
> -			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
>  		}
>  
> -		/* And the page directory mappings */
> -		pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> -				       &ppgtt->pd_pages[i], 0,
> -				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> +		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
>  		if (ret)
>  			goto bail;

Again, just for reference: the same leaked page table mappings on the
error path as in 2/9.

> -
> -		ppgtt->pd_dma_addr[i] = pd_addr;
>  	}
>  
>  	/*
> @@ -488,7 +558,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
>  	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
>  			 ppgtt->num_pt_pages,
> -			 (ppgtt->num_pt_pages - num_pt_pages) +
> +			 (ppgtt->num_pt_pages - min_pt_pages) +
>  			 size % (1<<30));
>  	return 0;
>  



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups
  2014-02-13 11:47   ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ville Syrjälä
@ 2014-02-19 17:17     ` Ben Widawsky
  0 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-19 17:17 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX

On Thu, Feb 13, 2014 at 01:47:57PM +0200, Ville Syrjälä wrote:
> On Wed, Feb 12, 2014 at 02:28:43PM -0800, Ben Widawsky wrote:
> > Rebased series from what I submitted a while ago:
> > http://lists.freedesktop.org/archives/intel-gfx/2013-December/037815.html
> > 
> > It was mostly a clean rebase, but there were a couple of major conflicts which
> > I think I cleaned up properly, but extra eyes would be good.
> 
> One thing I noticed while staring at the ppgtt code recently is that
> gen6 ppgtt cleanup kfrees the ppgtt struct, but gen8 code doesn't.
> At that time it looked like the correct fix was moving the kfree()
> out from the gen6 code into some common place. The reason being
> that the gen8 code called the cleanup function during error handling
> in the init paths. But I'm not sure if you've changed that with this
> series. A quick scan of these patches tells me the leak is still
> there at least.
> 

Yeah, you're right, thanks for spotting it. I put the fix at the
beginning of the series. Fortunately BDW full PPGTT isn't turned on yet,
so we only leak 1 per module reload.

> > 
> > As before, the last two are optional.
> > 
> > Ben Widawsky (9):
> >   drm/i915/bdw: Split up PPGTT cleanup
> >   drm/i915/bdw: Reorganize PPGTT init
> >   drm/i915/bdw: Split ppgtt initialization up
> >   drm/i915: Make clear/insert vfuncs args absolute
> >   drm/i915/bdw: Reorganize PT allocations
> >   Revert "drm/i915/bdw: Limit GTT to 2GB"
> >   drm/i915: Update i915_gem_gtt.c copyright
> >   drm/i915: Split GEN6 PPGTT cleanup
> >   drm/i915: Split GEN6 PPGTT initialization up
> > 
> >  drivers/gpu/drm/i915/i915_drv.h     |  13 +-
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 621 +++++++++++++++++++++++++-----------
> >  2 files changed, 438 insertions(+), 196 deletions(-)
> > 
> > -- 
> > 1.8.5.4
> > 
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-12 22:28   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
  2014-02-13  0:14     ` Chris Wilson
@ 2014-02-19 17:26     ` Imre Deak
  1 sibling, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-19 17:26 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky



On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> This patch converts insert_entries and clear_range, both functions which
> are specific to the VM. These functions tend to encapsulate the gen
> specific PTE writes. Passing absolute addresses to the insert_entries,
> and clear_range will help make the logic clearer within the functions as
> to what's going on. Currently, all callers simply do the appropriate
> page shift, which IMO, ends up looking weird with an upcoming change for
> the gen8 page table allocations.
> 
> Up until now, the PPGTT was a funky 2 level page table. GEN8 changes
> this to look more like a 3 level page table, and to that extent we need
> a significant amount more memory simply for the page tables. To address
> this, the allocations will be split up in finer amounts.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

I haven't found any issues with this patch, but Chris' comment on size_t
makes sense. So with that changed:

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h     |  6 +--
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++----------------
>  2 files changed, 49 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index cecbb9a..2ebad96 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -652,12 +652,12 @@ struct i915_address_space {
>  				     enum i915_cache_level level,
>  				     bool valid); /* Create a valid PTE */
>  	void (*clear_range)(struct i915_address_space *vm,
> -			    unsigned int first_entry,
> -			    unsigned int num_entries,
> +			    uint64_t start,
> +			    size_t length,
>  			    bool use_scratch);
>  	void (*insert_entries)(struct i915_address_space *vm,
>  			       struct sg_table *st,
> -			       unsigned int first_entry,
> +			       uint64_t start,
>  			       enum i915_cache_level cache_level);
>  	void (*cleanup)(struct i915_address_space *vm);
>  };
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 8a5cad9..5bfc6ff 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -254,13 +254,15 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>  }
>  
>  static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> -				   unsigned first_entry,
> -				   unsigned num_entries,
> +				   uint64_t start,
> +				   size_t length,
>  				   bool use_scratch)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
>  	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	unsigned last_pte, i;
> @@ -290,12 +292,13 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  
>  static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  				      struct sg_table *pages,
> -				      unsigned first_entry,
> +				      uint64_t start,
>  				      enum i915_cache_level cache_level)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
>  	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	struct sg_page_iter sg_iter;
> @@ -866,13 +869,15 @@ static int gen6_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
>  
>  /* PPGTT support for Sandybdrige/Gen6 and later */
>  static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
> -				   unsigned first_entry,
> -				   unsigned num_entries,
> +				   uint64_t start,
> +				   size_t length,
>  				   bool use_scratch)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
>  	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
>  	unsigned last_pte, i;
> @@ -899,12 +904,13 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>  
>  static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  				      struct sg_table *pages,
> -				      unsigned first_entry,
> +				      uint64_t start,
>  				      enum i915_cache_level cache_level)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen6_gtt_pte_t *pt_vaddr;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
>  	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
>  	struct sg_page_iter sg_iter;
> @@ -1037,8 +1043,7 @@ alloc:
>  		ppgtt->pt_dma_addr[i] = pt_addr;
>  	}
>  
> -	ppgtt->base.clear_range(&ppgtt->base, 0,
> -				ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES, true);
> +	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>  	ppgtt->debug_dump = gen6_dump_ppgtt;
>  
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> @@ -1102,20 +1107,17 @@ ppgtt_bind_vma(struct i915_vma *vma,
>  	       enum i915_cache_level cache_level,
>  	       u32 flags)
>  {
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> -
>  	WARN_ON(flags);
>  
> -	vma->vm->insert_entries(vma->vm, vma->obj->pages, entry, cache_level);
> +	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
> +				cache_level);
>  }
>  
>  static void ppgtt_unbind_vma(struct i915_vma *vma)
>  {
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> -
>  	vma->vm->clear_range(vma->vm,
> -			     entry,
> -			     vma->obj->base.size >> PAGE_SHIFT,
> +			     vma->node.start,
> +			     vma->obj->base.size,
>  			     true);
>  }
>  
> @@ -1276,10 +1278,11 @@ static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte)
>  
>  static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>  				     struct sg_table *st,
> -				     unsigned int first_entry,
> +				     uint64_t start,
>  				     enum i915_cache_level level)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	gen8_gtt_pte_t __iomem *gtt_entries =
>  		(gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
>  	int i = 0;
> @@ -1321,10 +1324,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   */
>  static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>  				     struct sg_table *st,
> -				     unsigned int first_entry,
> +				     uint64_t start,
>  				     enum i915_cache_level level)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	gen6_gtt_pte_t __iomem *gtt_entries =
>  		(gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
>  	int i = 0;
> @@ -1356,11 +1360,13 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>  }
>  
>  static void gen8_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  size_t length,
>  				  bool use_scratch)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
>  		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
>  	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> @@ -1380,11 +1386,13 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>  }
>  
>  static void gen6_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  size_t length,
>  				  bool use_scratch)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
>  		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
>  	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> @@ -1417,10 +1425,12 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
>  }
>  
>  static void i915_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  size_t length,
>  				  bool unused)
>  {
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	intel_gtt_clear_range(first_entry, num_entries);
>  }
>  
> @@ -1441,7 +1451,6 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	struct drm_device *dev = vma->vm->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj = vma->obj;
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
>  
>  	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
>  	 * or we have a global mapping already but the cacheability flags have
> @@ -1457,7 +1466,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
>  		if (!obj->has_global_gtt_mapping ||
>  		    (cache_level != obj->cache_level)) {
> -			vma->vm->insert_entries(vma->vm, obj->pages, entry,
> +			vma->vm->insert_entries(vma->vm, obj->pages,
> +						vma->node.start,
>  						cache_level);
>  			obj->has_global_gtt_mapping = 1;
>  		}
> @@ -1468,7 +1478,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	     (cache_level != obj->cache_level))) {
>  		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
>  		appgtt->base.insert_entries(&appgtt->base,
> -					    vma->obj->pages, entry, cache_level);
> +					    vma->obj->pages,
> +					    vma->node.start,
> +					    cache_level);
>  		vma->obj->has_aliasing_ppgtt_mapping = 1;
>  	}
>  }
> @@ -1478,11 +1490,11 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
>  	struct drm_device *dev = vma->vm->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj = vma->obj;
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
>  
>  	if (obj->has_global_gtt_mapping) {
> -		vma->vm->clear_range(vma->vm, entry,
> -				     vma->obj->base.size >> PAGE_SHIFT,
> +		vma->vm->clear_range(vma->vm,
> +				     vma->node.start,
> +				     obj->base.size,
>  				     true);
>  		obj->has_global_gtt_mapping = 0;
>  	}
> @@ -1490,8 +1502,8 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
>  	if (obj->has_aliasing_ppgtt_mapping) {
>  		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
>  		appgtt->base.clear_range(&appgtt->base,
> -					 entry,
> -					 obj->base.size >> PAGE_SHIFT,
> +					 vma->node.start,
> +					 obj->base.size,
>  					 true);
>  		obj->has_aliasing_ppgtt_mapping = 0;
>  	}
> @@ -1576,14 +1588,14 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
>  
>  	/* Clear any non-preallocated blocks */
>  	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
> -		const unsigned long count = (hole_end - hole_start) / PAGE_SIZE;
>  		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
>  			      hole_start, hole_end);
> -		ggtt_vm->clear_range(ggtt_vm, hole_start / PAGE_SIZE, count, true);
> +		ggtt_vm->clear_range(ggtt_vm, hole_start,
> +				     hole_end - hole_start, true);
>  	}
>  
>  	/* And finally clear the reserved guard page */
> -	ggtt_vm->clear_range(ggtt_vm, end / PAGE_SIZE - 1, 1, true);
> +	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
>  }
>  
>  void i915_gem_init_global_gtt(struct drm_device *dev)



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-12 22:28   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
  2014-02-12 23:45     ` Chris Wilson
@ 2014-02-19 19:11     ` Imre Deak
  2014-02-19 19:25       ` Imre Deak
  2014-02-19 21:06       ` Ben Widawsky
  1 sibling, 2 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-19 19:11 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky



On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> The previous allocation mechanism would get 2 contiguous allocations,
> one for the page directories, and one for the page tables. As each page
> table is 1 page, and there are 512 of these per page directory, this
> goes to 1MB. An unfriendly request at best. Worse still, our HW now
       ---^
Fwiw, 2MB.

> supports 4 page directories, and a 2MB allocation is not allowed.
> 
> In order to fix this, this patch attempts to split up each page table
> allocation into a single, discrete allocation. There is nothing really
> fancy about the patch itself, it just has to manage an extra pointer
> indirection, and have a fancier bit of logic to free up the pages.
> 
> To accommodate some of the added complexity, two new helpers are
> introduced to allocate, and free the page table pages.
> 
> NOTE: I really wanted to split the way we do allocations, and the way in
> which we identify the page table/page directory being used. I found
> splitting this functionality up to be too unwieldy. I apologize in
> advance to the reviewer. I'd recommend looking at the result, rather
> than the diff.
> 
> v2/NOTE2: This patch predated commit:
> 6f1cc993518462ccf039e195fabd47e7aa5bfd13
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Dec 31 15:50:31 2013 +0000
> 
>     drm/i915: Avoid dereference past end of page arr
> 
> It fixed the same issue as that patch, but because of the limbo state of
> PPGTT, Chris' patch was merged instead. The excess churn is a result of
> my using my original patch, which has my preferred naming. Primarily
> act_* is changed to which_*, but it's mostly the same otherwise. I've
> kept the convention Chris used for the pte wrap (I had something
> slightly different, and broken - but fixable)
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h     |   5 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 127 ++++++++++++++++++++++++++++--------
>  2 files changed, 103 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2ebad96..d9a6327 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -691,6 +691,7 @@ struct i915_gtt {
>  };
>  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
>  
> +#define GEN8_LEGACY_PDPS 4
>  struct i915_hw_ppgtt {
>  	struct i915_address_space base;
>  	struct kref ref;
> @@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_entries;
>  	union {
>  		struct page **pt_pages;
> -		struct page *gen8_pt_pages;
> +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
>  	};
>  	struct page *pd_pages;
>  	int num_pd_pages;
>  	int num_pt_pages;
>  	union {
>  		uint32_t pd_offset;
> -		dma_addr_t pd_dma_addr[4];
> +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
>  	};
>  	union {
>  		dma_addr_t *pt_dma_addr;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 5bfc6ff..5299acc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  
>  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
>  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> -#define GEN8_LEGACY_PDPS		4
> +
> +/* GEN8 legacy style address is defined as a 3 level page table:
> + * 31:30 | 29:21 | 20:12 |  11:0
> + * PDPE  |  PDE  |  PTE  | offset
> + * The difference as compared to normal x86 3 level page table is the PDPEs are
> + * programmed via register.
> + */
> +#define GEN8_PDPE_SHIFT			30
> +#define GEN8_PDPE_MASK			0x3
> +#define GEN8_PDE_SHIFT			21
> +#define GEN8_PDE_MASK			0x1ff
> +#define GEN8_PTE_SHIFT			12
> +#define GEN8_PTE_MASK			0x1ff
>  
>  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
>  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> @@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> -	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>  	unsigned num_entries = length >> PAGE_SHIFT;
> -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> -	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	unsigned last_pte, i;
>  
>  	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
>  				      I915_CACHE_LLC, use_scratch);
>  
>  	while (num_entries) {
> -		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
> +		struct page *page_table = ppgtt->gen8_pt_pages[which_pdpe][which_pde];
>  
> -		last_pte = first_pte + num_entries;
> +		last_pte = which_pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
>  			last_pte = GEN8_PTES_PER_PAGE;
>  
>  		pt_vaddr = kmap_atomic(page_table);
>  
> -		for (i = first_pte; i < last_pte; i++)
> +		for (i = which_pte; i < last_pte; i++) {
>  			pt_vaddr[i] = scratch_pte;
> +			num_entries--;
> +			BUG_ON(num_entries < 0);

num_entries is unsigned.
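
So the check above can never trigger as written. If the assert is worth
keeping, an unsigned-safe variant checking the whole batch up front
would work, e.g. (just a sketch):

	BUG_ON(num_entries < last_pte - which_pte);
	for (i = which_pte; i < last_pte; i++)
		pt_vaddr[i] = scratch_pte;
	num_entries -= last_pte - which_pte;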

> +		}
>  
>  		kunmap_atomic(pt_vaddr);
>  
> -		num_entries -= last_pte - first_pte;
> -		first_pte = 0;
> -		act_pt++;
> +		which_pte = 0;
> +		if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> +			which_pdpe++;
> +		which_pde = (which_pde + 1) & GEN8_PDE_MASK;
>  	}
>  }
>  
> @@ -298,39 +314,57 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr;
> -	unsigned first_entry = start >> PAGE_SHIFT;
> -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> -	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>  	struct sg_page_iter sg_iter;
>  
>  	pt_vaddr = NULL;
> +
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> +		if (WARN_ON(which_pdpe >= GEN8_LEGACY_PDPS))
> +			break;
> +
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
> +			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[which_pdpe][which_pde]);
>  
> -		pt_vaddr[act_pte] =
> +		pt_vaddr[which_pte] =
>  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
>  					cache_level, true);
> -		if (++act_pte == GEN8_PTES_PER_PAGE) {
> +		if (++which_pte == GEN8_PTES_PER_PAGE) {
>  			kunmap_atomic(pt_vaddr);
>  			pt_vaddr = NULL;
> -			act_pt++;
> -			act_pte = 0;
> +			if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> +				which_pdpe++;

Afaics which_pde = (which_pde + 1) & GEN8_PDE_MASK; is missing here.
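
Chris' suggested form from his review would handle the wrap correctly
here as well (sketch):

	if (++which_pde == GEN8_PDES_PER_PAGE) {
		which_pdpe++;
		which_pde = 0;
	}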

> +			which_pte = 0;

>  		}
>  	}
>  	if (pt_vaddr)
>  		kunmap_atomic(pt_vaddr);
>  }
>  
> -static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> +static void gen8_free_page_tables(struct page **pt_pages)
> +{
> +	int i;
> +
> +	if (pt_pages == NULL)
> +		return;
> +
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> +		if (pt_pages[i])
> +			__free_pages(pt_pages[i], 0);
> +}
> +
> +static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
>  
> -	for (i = 0; i < ppgtt->num_pd_pages ; i++)
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
>  		kfree(ppgtt->gen8_pt_dma_addr[i]);
> +	}
>  
>  	kfree(ppgtt->gen8_pt_dma_addr);
> -	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
>  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
>  }
>  
> @@ -369,20 +403,59 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> +static struct page **__gen8_alloc_page_tables(void)
> +{
> +	struct page **pt_pages;
> +	int i;
> +
> +	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> +	if (!pt_pages)
> +		return ERR_PTR(-ENOMEM);
> +
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> +		pt_pages[i] = alloc_page(GFP_KERNEL);
> +		if (!pt_pages[i])
> +			goto bail;
> +	}
> +
> +	return pt_pages;
> +
> +bail:
> +	gen8_free_page_tables(pt_pages);
> +	kfree(pt_pages);
> +	return ERR_PTR(-ENOMEM);
> +}
> +
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					   const int max_pdp)
>  {
> -	struct page *pt_pages;
> +	struct page **pt_pages[GEN8_LEGACY_PDPS];
>  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> +	int i, ret;
>  
> -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> -	if (!pt_pages)
> -		return -ENOMEM;
> +	for (i = 0; i < max_pdp; i++) {
> +		pt_pages[i] = __gen8_alloc_page_tables();
> +		if (IS_ERR(pt_pages[i])) {
> +			ret = PTR_ERR(pt_pages[i]);
> +			goto unwind_out;
> +		}
> +	}
> +
> +	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation
> +	 * "atomic" - for cleanup purposes.
> +	 */
> +	for (i = 0; i < max_pdp; i++)
> +		ppgtt->gen8_pt_pages[i] = pt_pages[i];
>  
> -	ppgtt->gen8_pt_pages = pt_pages;
>  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
>  
>  	return 0;
> +
> +unwind_out:
> +	while (i--)
> +		gen8_free_page_tables(pt_pages[i]);

I guess Ville commented on this issue, but pt_pages would be leaked
here.

> +
> +	return ret;
>  }
>  
>  static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> @@ -475,7 +548,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  	struct page *p;
>  	int ret;
>  
> -	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> +	p = ppgtt->gen8_pt_pages[pd][pt];
>  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
>  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB"
  2014-02-12 22:28   ` [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB" Ben Widawsky
@ 2014-02-19 19:14     ` Imre Deak
  0 siblings, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-19 19:14 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky



On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> This reverts commit 3a2ffb65eec6dbda2fd8151894f51c18b42c8d41.
> 
> Now that the code is fixed to use smaller allocations, it should be safe
to let the full GGTT be used on BDW.
> 
> The testcase for this is anything which uses more than half of the GTT,
> thus eclipsing the old limit.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

After the remaining issues fixed, on this one:
Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 5299acc..2c2121d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1733,11 +1733,6 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
>  	bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
>  	if (bdw_gmch_ctl)
>  		bdw_gmch_ctl = 1 << bdw_gmch_ctl;
> -	if (bdw_gmch_ctl > 4) {
> -		WARN_ON(!i915.preliminary_hw_support);
> -		return 4<<20;
> -	}
> -
>  	return bdw_gmch_ctl << 20;
>  }
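
For reference, with the clamp gone the size decode is direct. A worked
example (GGMS field value assumed; gen8 PTEs are 8 bytes):

	/* GGMS = 3:  (1 << 3) << 20 = 8MB of PTEs.
	 * 8MB / 8 bytes per PTE = 1M entries -> 4GB of GGTT space,
	 * where the old clamp would have returned 4MB (i.e. 2GB). */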
>  



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright
  2014-02-12 22:28   ` [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright Ben Widawsky
  2014-02-12 23:19     ` Damien Lespiau
@ 2014-02-19 19:20     ` Imre Deak
  1 sibling, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-19 19:20 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky



On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> I keep meaning to do this... by now almost the entire file has been
> written by an Intel employee (including Daniel post-2010).
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 2c2121d..e1bc0b9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1,5 +1,6 @@
>  /*
>   * Copyright © 2010 Daniel Vetter
> + * Copyright © 2011-2013 Intel Corporation
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a
>   * copy of this software and associated documentation files (the "Software"),

I would also add an authors line for you, Chris and Daniel, but either way:
Reviewed-by: Imre Deak <imre.deak@intel.com>



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-19 19:11     ` Imre Deak
@ 2014-02-19 19:25       ` Imre Deak
  2014-02-19 21:06       ` Ben Widawsky
  1 sibling, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-19 19:25 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky



On Wed, 2014-02-19 at 21:11 +0200, Imre Deak wrote:
> On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> > The previous allocation mechanism would get 2 contiguous allocations,
> > one for the page directories, and one for the page tables. As each page
> > table is 1 page, and there are 512 of these per page directory, this
> > goes to 1MB. An unfriendly request at best. Worse still, our HW now
>        ---^
> Fwiw, 2MB.
> 
> > supports 4 page directories, and a 2MB allocation is not allowed.
> > 
> > In order to fix this, this patch attempts to split up each page table
> > allocation into a single, discrete allocation. There is nothing really
> > fancy about the patch itself, it just has to manage an extra pointer
> > indirection, and have a fancier bit of logic to free up the pages.
> > 
> > To accommodate some of the added complexity, two new helpers are
> > introduced to allocate, and free the page table pages.
> > 
> > NOTE: I really wanted to split the way we do allocations, and the way in
> > which we identify the page table/page directory being used. I found
> > splitting this functionality up to be too unwieldy. I apologize in
> > advance to the reviewer. I'd recommend looking at the result, rather
> > than the diff.
> > 
> > v2/NOTE2: This patch predated commit:
> > 6f1cc993518462ccf039e195fabd47e7aa5bfd13
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Dec 31 15:50:31 2013 +0000
> > 
> >     drm/i915: Avoid dereference past end of page arr
> > 
> > It fixed the same issue as that patch, but because of the limbo state of
> > PPGTT, Chris patch was merged instead. The excess churn is a result of
> > my using my original patch, which has my preferred naming. Primarily
> > act_* is changed to which_*, but it's mostly the same otherwise. I've
> > kept the convention Chris used for the pte wrap (I had something
> > slightly different, and broken - but fixable)
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h     |   5 +-
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 127 ++++++++++++++++++++++++++++--------
> >  2 files changed, 103 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 2ebad96..d9a6327 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -691,6 +691,7 @@ struct i915_gtt {
> >  };
> >  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> >  
> > +#define GEN8_LEGACY_PDPS 4
> >  struct i915_hw_ppgtt {
> >  	struct i915_address_space base;
> >  	struct kref ref;
> > @@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
> >  	unsigned num_pd_entries;
> >  	union {
> >  		struct page **pt_pages;
> > -		struct page *gen8_pt_pages;
> > +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> >  	};
> >  	struct page *pd_pages;
> >  	int num_pd_pages;
> >  	int num_pt_pages;
> >  	union {
> >  		uint32_t pd_offset;
> > -		dma_addr_t pd_dma_addr[4];
> > +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> >  	};
> >  	union {
> >  		dma_addr_t *pt_dma_addr;
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 5bfc6ff..5299acc 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> >  
> >  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> >  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> > -#define GEN8_LEGACY_PDPS		4
> > +
> > +/* GEN8 legacy style address is defined as a 3 level page table:
> > + * 31:30 | 29:21 | 20:12 |  11:0
> > + * PDPE  |  PDE  |  PTE  | offset
> > + * The difference compared to a normal x86 3 level page table is that the
> > + * PDPEs are programmed via register.
> > + */
> > +#define GEN8_PDPE_SHIFT			30
> > +#define GEN8_PDPE_MASK			0x3
> > +#define GEN8_PDE_SHIFT			21
> > +#define GEN8_PDE_MASK			0x1ff
> > +#define GEN8_PTE_SHIFT			12
> > +#define GEN8_PTE_MASK			0x1ff
> >  
> >  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
> >  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> > @@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(vm, struct i915_hw_ppgtt, base);
> >  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> > -	unsigned first_entry = start >> PAGE_SHIFT;
> > +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> > +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> >  	unsigned num_entries = length >> PAGE_SHIFT;
> > -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> > -	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
> >  	unsigned last_pte, i;
> >  
> >  	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
> >  				      I915_CACHE_LLC, use_scratch);
> >  
> >  	while (num_entries) {
> > -		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
> > +		struct page *page_table = ppgtt->gen8_pt_pages[which_pdpe][which_pde];
> >  
> > -		last_pte = first_pte + num_entries;
> > +		last_pte = which_pte + num_entries;
> >  		if (last_pte > GEN8_PTES_PER_PAGE)
> >  			last_pte = GEN8_PTES_PER_PAGE;
> >  
> >  		pt_vaddr = kmap_atomic(page_table);
> >  
> > -		for (i = first_pte; i < last_pte; i++)
> > +		for (i = which_pte; i < last_pte; i++) {
> >  			pt_vaddr[i] = scratch_pte;
> > +			num_entries--;
> > +			BUG_ON(num_entries < 0);
> 
> num_entries is unsigned.
> 
> > +		}
> >  
> >  		kunmap_atomic(pt_vaddr);
> >  
> > -		num_entries -= last_pte - first_pte;
> > -		first_pte = 0;
> > -		act_pt++;
> > +		which_pte = 0;
> > +		if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > +			which_pdpe++;
> > +		which_pde = (which_pde + 1) & GEN8_PDE_MASK;
> >  	}
> >  }
> >  
> > @@ -298,39 +314,57 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(vm, struct i915_hw_ppgtt, base);
> >  	gen8_gtt_pte_t *pt_vaddr;
> > -	unsigned first_entry = start >> PAGE_SHIFT;
> > -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> > -	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> > +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> > +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> >  	struct sg_page_iter sg_iter;
> >  
> >  	pt_vaddr = NULL;
> > +
> >  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> > +		if (WARN_ON(which_pdpe >= GEN8_LEGACY_PDPS))
> > +			break;
> > +
> >  		if (pt_vaddr == NULL)
> > -			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
> > +			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[which_pdpe][which_pde]);
> >  
> > -		pt_vaddr[act_pte] =
> > +		pt_vaddr[which_pte] =
> >  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
> >  					cache_level, true);
> > -		if (++act_pte == GEN8_PTES_PER_PAGE) {
> > +		if (++which_pte == GEN8_PTES_PER_PAGE) {
> >  			kunmap_atomic(pt_vaddr);
> >  			pt_vaddr = NULL;
> > -			act_pt++;
> > -			act_pte = 0;
> > +			if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > +				which_pdpe++;
> 
> Afaics which_pde = (which_pde + 1) & GEN8_PDE_MASK; is missing here.
> 
> > +			which_pte = 0;
> 
> 
> 
> >  		}
> >  	}
> >  	if (pt_vaddr)
> >  		kunmap_atomic(pt_vaddr);
> >  }
> >  
> > -static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> > +static void gen8_free_page_tables(struct page **pt_pages)
> > +{
> > +	int i;
> > +
> > +	if (pt_pages == NULL)
> > +		return;
> > +
> > +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> > +		if (pt_pages[i])
> > +			__free_pages(pt_pages[i], 0);
> > +}
> > +
> > +static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
> >  {
> >  	int i;
> >  
> > -	for (i = 0; i < ppgtt->num_pd_pages ; i++)
> > +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> > +		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> >  		kfree(ppgtt->gen8_pt_dma_addr[i]);
> > +	}
> >  
> >  	kfree(ppgtt->gen8_pt_dma_addr);
> > -	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
> >  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
> >  }
> >  
> > @@ -369,20 +403,59 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> >  	gen8_ppgtt_free(ppgtt);
> >  }
> >  
> > +static struct page **__gen8_alloc_page_tables(void)
> > +{
> > +	struct page **pt_pages;
> > +	int i;
> > +
> > +	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> > +	if (!pt_pages)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> > +		pt_pages[i] = alloc_page(GFP_KERNEL);
> > +		if (!pt_pages[i])
> > +			goto bail;
> > +	}
> > +
> > +	return pt_pages;
> > +
> > +bail:
> > +	gen8_free_page_tables(pt_pages);
> > +	kfree(pt_pages);
> > +	return ERR_PTR(-ENOMEM);
> > +}
> > +
> >  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
> >  					   const int max_pdp)
> >  {
> > -	struct page *pt_pages;
> > +	struct page **pt_pages[GEN8_LEGACY_PDPS];
> >  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> > +	int i, ret;
> >  
> > -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> > -	if (!pt_pages)
> > -		return -ENOMEM;
> > +	for (i = 0; i < max_pdp; i++) {
> > +		pt_pages[i] = __gen8_alloc_page_tables();
> > +		if (IS_ERR(pt_pages[i])) {
> > +			ret = PTR_ERR(pt_pages[i]);
> > +			goto unwind_out;
> > +		}
> > +	}
> > +
> > +	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation
> > +	 * "atomic" - for cleanup purposes.
> > +	 */
> > +	for (i = 0; i < max_pdp; i++)
> > +		ppgtt->gen8_pt_pages[i] = pt_pages[i];
> >  
> > -	ppgtt->gen8_pt_pages = pt_pages;
> >  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> >  
> >  	return 0;
> > +
> > +unwind_out:
> > +	while (i--)
> > +		gen8_free_page_tables(pt_pages[i]);
> 
> I guess Ville commented on this issue, but pt_pages would be leaked
> here.

Sorry, I meant pt_pages[i] here.
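I.e. gen8_free_page_tables() frees the pages themselves, but the
kcalloc'ed array holding them never gets kfree'd on this path; a sketch
of the fix:

	unwind_out:
		while (i--) {
			gen8_free_page_tables(pt_pages[i]);
			kfree(pt_pages[i]);
		}

		return ret;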

> 
> > +
> > +	return ret;
> >  }
> >  
> >  static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> > @@ -475,7 +548,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
> >  	struct page *p;
> >  	int ret;
> >  
> > -	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> > +	p = ppgtt->gen8_pt_pages[pd][pt];
> >  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> >  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> >  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH] [v3] drm/i915/bdw: Reorganize PPGTT init
  2014-02-19 14:59     ` Imre Deak
@ 2014-02-19 20:06       ` Ben Widawsky
  2014-02-19 21:00         ` Imre Deak
  0 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-19 20:06 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Create 3 clear stages in PPGTT init. This will help make upcoming
changes more readable. The 3 stages are: allocation, DMA mapping, and
writing the P[DT]Es.

One nice benefit of the patch is that it creates 2 very clear error
points, allocation and mapping, and avoids having to do any handling
after writing PTEs (something which was likely buggy before). This
simplified error handling I suspect will be helpful when we move to
deferred/dynamic page table allocation and mapping.

The patch also attempts to break up some of the steps into more
logical, reviewable chunks, particularly when we free.

v2: Don't call cleanup on the error path since that takes down the
drm_mm and list entry, which aren't setup at this point.

v3: Fixes addressing Imre's comments from:
<1392821989.19792.13.camel@intelbox>

Don't do dynamic allocation for the page table DMA addresses. I can't
remember why I did it in the first place. This addresses one of Imre's
other issues.

Fix error path leak of page tables.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 114 ++++++++++++++++++++----------------
 1 file changed, 64 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e414d7e..03f586aa 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -333,6 +333,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
+	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
@@ -341,18 +342,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 		if (!ppgtt->pd_dma_addr[i])
 			continue;
 
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd_dma_addr[i],
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			if (addr)
-				pci_unmap_page(ppgtt->base.dev->pdev,
-				       addr,
-				       PAGE_SIZE,
-				       PCI_DMA_BIDIRECTIONAL);
-
+				pci_unmap_page(hwdev, addr, PAGE_SIZE,
+					       PCI_DMA_BIDIRECTIONAL);
 		}
 	}
 }
@@ -370,27 +367,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 }
 
 /**
- * GEN8 legacy ppgtt programming is accomplished through 4 PDP registers with a
- * net effect resembling a 2-level page table in normal x86 terms. Each PDP
- * represents 1GB of memory
- * 4 * 512 * 512 * 4096 = 4GB legacy 32b address space.
+ * GEN8 legacy ppgtt programming is accomplished through a max of 4 PDP
+ * registers, with a net effect resembling a 2-level page table in normal x86
+ * terms. Each PDP represents 1GB of memory; 4 * 512 * 512 * 4096 = 4GB legacy
+ * 32b address space.
  *
+ * FIXME: split allocation into smaller pieces. For now we only ever do this
+ * once, but with full PPGTT, the multiple contiguous allocations will be bad.
  * TODO: Do something with the size parameter
- **/
+ */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	struct page *pt_pages;
-	int i, j, ret = -ENOMEM;
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
 	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+	int i, j, ret;
 
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
-	/* FIXME: split allocation into smaller pieces. For now we only ever do
-	 * this once, but with full PPGTT, the multiple contiguous allocations
-	 * will be bad.
-	 */
+	/* 1. Do all our allocations for page directories and page tables */
 	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
 	if (!ppgtt->pd_pages)
 		return -ENOMEM;
@@ -405,52 +402,60 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
 	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
-	ppgtt->enable = gen8_ppgtt_enable;
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
-
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
 
+	for (i = 0; i < max_pdp; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i]) {
+			ret = -ENOMEM;
+			goto bail;
+		}
+	}
+
 	/*
-	 * - Create a mapping for the page directories.
-	 * - For each page directory:
-	 *      allocate space for page table mappings.
-	 *      map each page table
+	 * 2. Create all the DMA mappings for the page directories and page
+	 * tables
 	 */
 	for (i = 0; i < max_pdp; i++) {
-		dma_addr_t temp;
-		temp = pci_map_page(ppgtt->base.dev->pdev,
-				    &ppgtt->pd_pages[i], 0,
-				    PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-		if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
-			goto err_out;
+		dma_addr_t pd_addr, pt_addr;
 
-		ppgtt->pd_dma_addr[i] = temp;
+		/* And the page directory mappings */
+		pd_addr = pci_map_page(hwdev, &ppgtt->pd_pages[i], 0,
+				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+		if (ret)
+			goto bail;
 
-		ppgtt->gen8_pt_dma_addr[i] = kmalloc(sizeof(dma_addr_t) * GEN8_PDES_PER_PAGE, GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			goto err_out;
+		ppgtt->pd_dma_addr[i] = pd_addr;
 
+		/* Get the page table mappings per page directory */
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
-			temp = pci_map_page(ppgtt->base.dev->pdev,
-					    p, 0, PAGE_SIZE,
-					    PCI_DMA_BIDIRECTIONAL);
 
-			if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
-				goto err_out;
+			pt_addr = pci_map_page(hwdev, p, 0, PAGE_SIZE,
+					       PCI_DMA_BIDIRECTIONAL);
+			ret = pci_dma_mapping_error(hwdev, pt_addr);
+			if (ret) {
+				ppgtt->pd_dma_addr[i] = 0;
+				pci_unmap_page(hwdev, pd_addr, PAGE_SIZE,
+					       PCI_DMA_BIDIRECTIONAL);
+				goto bail;
+			}
 
-			ppgtt->gen8_pt_dma_addr[i][j] = temp;
+			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
 		}
 	}
 
-	/* For now, the PPGTT helper functions all require that the PDEs are
+	/*
+	 * 3. Map all the page directory entries to point to the page tables
+	 * we've allocated.
+	 *
+	 * For now, the PPGTT helper functions all require that the PDEs are
 	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again */
+	 * will never need to touch the PDEs again.
+	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
@@ -462,6 +467,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		kunmap_atomic(pd_vaddr);
 	}
 
+	ppgtt->enable = gen8_ppgtt_enable;
+	ppgtt->switch_mm = gen8_mm_switch;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.start = 0;
+	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
+
 	ppgtt->base.clear_range(&ppgtt->base, 0,
 				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
 				true);
@@ -474,8 +487,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			 size % (1<<30));
 	return 0;
 
-err_out:
-	ppgtt->base.cleanup(&ppgtt->base);
+bail:
+	gen8_ppgtt_unmap_pages(ppgtt);
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH] [v3] drm/i915/bdw: Reorganize PPGTT init
  2014-02-19 20:06       ` [PATCH] [v3] " Ben Widawsky
@ 2014-02-19 21:00         ` Imre Deak
  2014-02-19 21:18           ` Ben Widawsky
  0 siblings, 1 reply; 63+ messages in thread
From: Imre Deak @ 2014-02-19 21:00 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, 2014-02-19 at 12:06 -0800, Ben Widawsky wrote:
> Create 3 clear stages in PPGTT init. This will help make upcoming
> changes more readable. The 3 stages are: allocation, DMA mapping, and
> writing the P[DT]Es.
> 
> One nice benefit of the patch is that it creates 2 very clear error
> points, allocation and mapping, and avoids having to do any handling
> after writing PTEs (something which was likely buggy before). This
> simplified error handling I suspect will be helpful when we move to
> deferred/dynamic page table allocation and mapping.
> 
> The patch also attempts to break up some of the steps into more
> logical, reviewable chunks, particularly when we free.
> 
> v2: Don't call cleanup on the error path since that takes down the
> drm_mm and list entry, which aren't setup at this point.
> 
> v3: Fixes addressing Imre's comments from:
> <1392821989.19792.13.camel@intelbox>
> 
> Don't do dynamic allocation for the page table DMA addresses. I can't
> remember why I did it in the first place. This addresses one of Imre's
> other issues.
> 
> Fix error path leak of page tables.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 114 ++++++++++++++++++++----------------
>  1 file changed, 64 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e414d7e..03f586aa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -333,6 +333,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  
>  static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  {
> +	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
>  	int i, j;
>  
>  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> @@ -341,18 +342,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
>  		if (!ppgtt->pd_dma_addr[i])
>  			continue;
>  
> -		pci_unmap_page(ppgtt->base.dev->pdev,
> -			       ppgtt->pd_dma_addr[i],
> -			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> +		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
> +			       PCI_DMA_BIDIRECTIONAL);
>  
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>  			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
>  			if (addr)
> -				pci_unmap_page(ppgtt->base.dev->pdev,
> -				       addr,
> -				       PAGE_SIZE,
> -				       PCI_DMA_BIDIRECTIONAL);
> -
> +				pci_unmap_page(hwdev, addr, PAGE_SIZE,
> +					       PCI_DMA_BIDIRECTIONAL);
>  		}
>  	}
>  }
> @@ -370,27 +367,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  }
>  
>  /**
> - * GEN8 legacy ppgtt programming is accomplished through 4 PDP registers with a
> - * net effect resembling a 2-level page table in normal x86 terms. Each PDP
> - * represents 1GB of memory
> - * 4 * 512 * 512 * 4096 = 4GB legacy 32b address space.
> + * GEN8 legacy ppgtt programming is accomplished through a max of 4 PDP
> + * registers, with a net effect resembling a 2-level page table in normal x86
> + * terms. Each PDP represents 1GB of memory; 4 * 512 * 512 * 4096 = 4GB legacy
> + * 32b address space.
>   *
> + * FIXME: split allocation into smaller pieces. For now we only ever do this
> + * once, but with full PPGTT, the multiple contiguous allocations will be bad.
>   * TODO: Do something with the size parameter
> - **/
> + */
>  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  {
>  	struct page *pt_pages;
> -	int i, j, ret = -ENOMEM;
>  	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
>  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> +	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
> +	int i, j, ret;
>  
>  	if (size % (1<<30))
>  		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
>  
> -	/* FIXME: split allocation into smaller pieces. For now we only ever do
> -	 * this once, but with full PPGTT, the multiple contiguous allocations
> -	 * will be bad.
> -	 */
> +	/* 1. Do all our allocations for page directories and page tables */
>  	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
>  	if (!ppgtt->pd_pages)
>  		return -ENOMEM;
> @@ -405,52 +402,60 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
>  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
>  	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
> -	ppgtt->enable = gen8_ppgtt_enable;
> -	ppgtt->switch_mm = gen8_mm_switch;
> -	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> -	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> -	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> -	ppgtt->base.start = 0;
> -	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
> -
>  	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
>  
> +	for (i = 0; i < max_pdp; i++) {
> +		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> +						     sizeof(dma_addr_t),
> +						     GFP_KERNEL);
> +		if (!ppgtt->gen8_pt_dma_addr[i]) {
> +			ret = -ENOMEM;
> +			goto bail;
> +		}
> +	}
> +
>  	/*
> -	 * - Create a mapping for the page directories.
> -	 * - For each page directory:
> -	 *      allocate space for page table mappings.
> -	 *      map each page table
> +	 * 2. Create all the DMA mappings for the page directories and page
> +	 * tables
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
> -		dma_addr_t temp;
> -		temp = pci_map_page(ppgtt->base.dev->pdev,
> -				    &ppgtt->pd_pages[i], 0,
> -				    PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -		if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
> -			goto err_out;
> +		dma_addr_t pd_addr, pt_addr;
>  
> -		ppgtt->pd_dma_addr[i] = temp;
> +		/* And the page directory mappings */
> +		pd_addr = pci_map_page(hwdev, &ppgtt->pd_pages[i], 0,
> +				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> +		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> +		if (ret)
> +			goto bail;
>  
> -		ppgtt->gen8_pt_dma_addr[i] = kmalloc(sizeof(dma_addr_t) * GEN8_PDES_PER_PAGE, GFP_KERNEL);
> -		if (!ppgtt->gen8_pt_dma_addr[i])
> -			goto err_out;
> +		ppgtt->pd_dma_addr[i] = pd_addr;
>  
> +		/* Get the page table mappings per page directory */
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
>  			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
> -			temp = pci_map_page(ppgtt->base.dev->pdev,
> -					    p, 0, PAGE_SIZE,
> -					    PCI_DMA_BIDIRECTIONAL);
>  
> -			if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
> -				goto err_out;
> +			pt_addr = pci_map_page(hwdev, p, 0, PAGE_SIZE,
> +					       PCI_DMA_BIDIRECTIONAL);
> +			ret = pci_dma_mapping_error(hwdev, pt_addr);
> +			if (ret) {
> +				ppgtt->pd_dma_addr[i] = 0;
> +				pci_unmap_page(hwdev, pd_addr, PAGE_SIZE,
> +					       PCI_DMA_BIDIRECTIONAL);
> +				goto bail;

I think this would still leave the ppgtt->gen8_pt_dma_addr[i][0 .. j-1]
mapped on error. Simply doing if (ret) goto bail; would be ok imo. With
that fixed this patch is:

Reviewed-by: Imre Deak <imre.deak@intel.com> 
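
I.e. the inner error path would collapse to something like:

	pt_addr = pci_map_page(hwdev, p, 0, PAGE_SIZE,
			       PCI_DMA_BIDIRECTIONAL);
	ret = pci_dma_mapping_error(hwdev, pt_addr);
	if (ret)
		goto bail;	/* bail unmaps whatever got mapped so far */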

> +			}
>  
> -			ppgtt->gen8_pt_dma_addr[i][j] = temp;
> +			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
>  		}
>  	}
>  
> -	/* For now, the PPGTT helper functions all require that the PDEs are
> +	/*
> +	 * 3. Map all the page directory entries to point to the page tables
> +	 * we've allocated.
> +	 *
> +	 * For now, the PPGTT helper functions all require that the PDEs are
>  	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
> -	 * will never need to touch the PDEs again */
> +	 * will never need to touch the PDEs again.
> +	 */
>  	for (i = 0; i < max_pdp; i++) {
>  		gen8_ppgtt_pde_t *pd_vaddr;
>  		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
> @@ -462,6 +467,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  		kunmap_atomic(pd_vaddr);
>  	}
>  
> +	ppgtt->enable = gen8_ppgtt_enable;
> +	ppgtt->switch_mm = gen8_mm_switch;
> +	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> +	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> +	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> +	ppgtt->base.start = 0;
> +	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
> +
>  	ppgtt->base.clear_range(&ppgtt->base, 0,
>  				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
>  				true);
> @@ -474,8 +487,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  			 size % (1<<30));
>  	return 0;
>  
> -err_out:
> -	ppgtt->base.cleanup(&ppgtt->base);
> +bail:
> +	gen8_ppgtt_unmap_pages(ppgtt);
> +	gen8_ppgtt_free(ppgtt);
>  	return ret;
>  }
>  

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-19 19:11     ` Imre Deak
  2014-02-19 19:25       ` Imre Deak
@ 2014-02-19 21:06       ` Ben Widawsky
  2014-02-19 21:20         ` Imre Deak
  1 sibling, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-19 21:06 UTC (permalink / raw)
  To: Imre Deak; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 19, 2014 at 09:11:46PM +0200, Imre Deak wrote:
> On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> > The previous allocation mechanism would get 2 contiguous allocations,
> > one for the page directories, and one for the page tables. As each page
> > table is 1 page, and there are 512 of these per page directory, this
> > goes to 1MB. An unfriendly request at best. Worse still, our HW now
>        ---^
> Fwiw, 2MB.

Thanks.

> 
> > supports 4 page directories, and a 2MB allocation is not allowed.
> > 
> > In order to fix this, this patch attempts to split up each page table
> > allocation into a single, discrete allocation. There is nothing really
> > fancy about the patch itself, it just has to manage an extra pointer
> > indirection, and have a fancier bit of logic to free up the pages.
> > 
> > To accommodate some of the added complexity, two new helpers are
> > introduced to allocate, and free the page table pages.
> > 
> > NOTE: I really wanted to split the way we do allocations, and the way in
> > which we identify the page table/page directory being used. I found
> > splitting this functionality up to be too unwieldy. I apologize in
> > advance to the reviewer. I'd recommend looking at the result, rather
> > than the diff.
> > 
> > v2/NOTE2: This patch predated commit:
> > 6f1cc993518462ccf039e195fabd47e7aa5bfd13
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Dec 31 15:50:31 2013 +0000
> > 
> >     drm/i915: Avoid dereference past end of page arr
> > 
> > It fixed the same issue as that patch, but because of the limbo state of
> > PPGTT, Chris patch was merged instead. The excess churn is a result of
> > my using my original patch, which has my preferred naming. Primarily
> > act_* is changed to which_*, but it's mostly the same otherwise. I've
> > kept the convention Chris used for the pte wrap (I had something
> > slightly different, and broken - but fixable)
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h     |   5 +-
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 127 ++++++++++++++++++++++++++++--------
> >  2 files changed, 103 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 2ebad96..d9a6327 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -691,6 +691,7 @@ struct i915_gtt {
> >  };
> >  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> >  
> > +#define GEN8_LEGACY_PDPS 4
> >  struct i915_hw_ppgtt {
> >  	struct i915_address_space base;
> >  	struct kref ref;
> > @@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
> >  	unsigned num_pd_entries;
> >  	union {
> >  		struct page **pt_pages;
> > -		struct page *gen8_pt_pages;
> > +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> >  	};
> >  	struct page *pd_pages;
> >  	int num_pd_pages;
> >  	int num_pt_pages;
> >  	union {
> >  		uint32_t pd_offset;
> > -		dma_addr_t pd_dma_addr[4];
> > +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> >  	};
> >  	union {
> >  		dma_addr_t *pt_dma_addr;
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 5bfc6ff..5299acc 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> >  
> >  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> >  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> > -#define GEN8_LEGACY_PDPS		4
> > +
> > +/* GEN8 legacy style address is defined as a 3 level page table:
> > + * 31:30 | 29:21 | 20:12 |  11:0
> > + * PDPE  |  PDE  |  PTE  | offset
> > + * The difference compared to a normal x86 3 level page table is that the
> > + * PDPEs are programmed via register.
> > + */
> > +#define GEN8_PDPE_SHIFT			30
> > +#define GEN8_PDPE_MASK			0x3
> > +#define GEN8_PDE_SHIFT			21
> > +#define GEN8_PDE_MASK			0x1ff
> > +#define GEN8_PTE_SHIFT			12
> > +#define GEN8_PTE_MASK			0x1ff
> >  
> >  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
> >  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> > @@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(vm, struct i915_hw_ppgtt, base);
> >  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> > -	unsigned first_entry = start >> PAGE_SHIFT;
> > +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> > +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> >  	unsigned num_entries = length >> PAGE_SHIFT;
> > -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> > -	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
> >  	unsigned last_pte, i;
> >  
> >  	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
> >  				      I915_CACHE_LLC, use_scratch);
> >  
> >  	while (num_entries) {
> > -		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
> > +		struct page *page_table = ppgtt->gen8_pt_pages[which_pdpe][which_pde];
> >  
> > -		last_pte = first_pte + num_entries;
> > +		last_pte = which_pte + num_entries;
> >  		if (last_pte > GEN8_PTES_PER_PAGE)
> >  			last_pte = GEN8_PTES_PER_PAGE;
> >  
> >  		pt_vaddr = kmap_atomic(page_table);
> >  
> > -		for (i = first_pte; i < last_pte; i++)
> > +		for (i = which_pte; i < last_pte; i++) {
> >  			pt_vaddr[i] = scratch_pte;
> > +			num_entries--;
> > +			BUG_ON(num_entries < 0);
> 
> num_entries is unsigned.

This was already changed per Chris' request.

> 
> > +		}
> >  
> >  		kunmap_atomic(pt_vaddr);
> >  
> > -		num_entries -= last_pte - first_pte;
> > -		first_pte = 0;
> > -		act_pt++;
> > +		which_pte = 0;
> > +		if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > +			which_pdpe++;
> > +		which_pde = (which_pde + 1) & GEN8_PDE_MASK;
> >  	}
> >  }
> >  
> > @@ -298,39 +314,57 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(vm, struct i915_hw_ppgtt, base);
> >  	gen8_gtt_pte_t *pt_vaddr;
> > -	unsigned first_entry = start >> PAGE_SHIFT;
> > -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> > -	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> > +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> > +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> >  	struct sg_page_iter sg_iter;
> >  
> >  	pt_vaddr = NULL;
> > +
> >  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> > +		if (WARN_ON(which_pdpe >= GEN8_LEGACY_PDPS))
> > +			break;
> > +
> >  		if (pt_vaddr == NULL)
> > -			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
> > +			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[which_pdpe][which_pde]);
> >  
> > -		pt_vaddr[act_pte] =
> > +		pt_vaddr[which_pte] =
> >  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
> >  					cache_level, true);
> > -		if (++act_pte == GEN8_PTES_PER_PAGE) {
> > +		if (++which_pte == GEN8_PTES_PER_PAGE) {
> >  			kunmap_atomic(pt_vaddr);
> >  			pt_vaddr = NULL;
> > -			act_pt++;
> > -			act_pte = 0;
> > +			if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > +				which_pdpe++;
> 
> Afaics which_pde = (which_pde + 1) & GEN8_PDE_MASK; is missing here.
> 

This was already changed per Chris' request.

> > +			which_pte = 0;
> 
> 
> 
> >  		}
> >  	}
> >  	if (pt_vaddr)
> >  		kunmap_atomic(pt_vaddr);
> >  }
> >  
> > -static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> > +static void gen8_free_page_tables(struct page **pt_pages)
> > +{
> > +	int i;
> > +
> > +	if (pt_pages == NULL)
> > +		return;
> > +
> > +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> > +		if (pt_pages[i])
> > +			__free_pages(pt_pages[i], 0);
> > +}
> > +
> > +static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
> >  {
> >  	int i;
> >  
> > -	for (i = 0; i < ppgtt->num_pd_pages ; i++)
> > +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> > +		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> >  		kfree(ppgtt->gen8_pt_dma_addr[i]);
> > +	}
> >  
> >  	kfree(ppgtt->gen8_pt_dma_addr);
> > -	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
> >  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
> >  }
> >  
> > @@ -369,20 +403,59 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> >  	gen8_ppgtt_free(ppgtt);
> >  }
> >  
> > +static struct page **__gen8_alloc_page_tables(void)
> > +{
> > +	struct page **pt_pages;
> > +	int i;
> > +
> > +	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> > +	if (!pt_pages)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> > +		pt_pages[i] = alloc_page(GFP_KERNEL);
> > +		if (!pt_pages[i])
> > +			goto bail;
> > +	}
> > +
> > +	return pt_pages;
> > +
> > +bail:
> > +	gen8_free_page_tables(pt_pages);
> > +	kfree(pt_pages);
> > +	return ERR_PTR(-ENOMEM);
> > +}
> > +
> >  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
> >  					   const int max_pdp)
> >  {
> > -	struct page *pt_pages;
> > +	struct page **pt_pages[GEN8_LEGACY_PDPS];
> >  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> > +	int i, ret;
> >  
> > -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> > -	if (!pt_pages)
> > -		return -ENOMEM;
> > +	for (i = 0; i < max_pdp; i++) {
> > +		pt_pages[i] = __gen8_alloc_page_tables();
> > +		if (IS_ERR(pt_pages[i])) {
> > +			ret = PTR_ERR(pt_pages[i]);
> > +			goto unwind_out;
> > +		}
> > +	}
> > +
> > +	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation
> > +	 * "atomic" - for cleanup purposes.
> > +	 */
> > +	for (i = 0; i < max_pdp; i++)
> > +		ppgtt->gen8_pt_pages[i] = pt_pages[i];
> >  
> > -	ppgtt->gen8_pt_pages = pt_pages;
> >  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> >  
> >  	return 0;
> > +
> > +unwind_out:
> > +	while (i--)
> > +		gen8_free_page_tables(pt_pages[i]);
> 
> I guess Ville commented on this issue, but pt_pages would be leaked
> here.

I think Ville was referring to a different issue. The PPGTT struct
itself isn't freed (if I understood his point correctly, which was
indeed an issue). Forgive my ignorance here but I don't see where the
leak is. __gen8_alloc_page_tables() appears to always free if it failed,
and unwind out should work backwards. Can you show me what I've missed?

> 
> > +
> > +	return ret;
> >  }
> >  
> >  static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> > @@ -475,7 +548,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
> >  	struct page *p;
> >  	int ret;
> >  
> > -	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> > +	p = ppgtt->gen8_pt_pages[pd][pt];
> >  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> >  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> >  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> 



-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] [v3] drm/i915/bdw: Reorganize PPGTT init
  2014-02-19 21:00         ` Imre Deak
@ 2014-02-19 21:18           ` Ben Widawsky
  0 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-19 21:18 UTC (permalink / raw)
  To: Imre Deak; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 19, 2014 at 11:00:17PM +0200, Imre Deak wrote:
> On Wed, 2014-02-19 at 12:06 -0800, Ben Widawsky wrote:
> > Create 3 clear stages in PPGTT init. This will help make upcoming
> > changes more readable. The 3 stages are: allocation, DMA mapping, and
> > writing the P[DT]Es.
> > 
> > One nice benefit of the patch is that it creates 2 very clear error
> > points, allocation and mapping, and avoids having to do any handling
> > after writing PTEs (something which was likely buggy before). This
> > simplified error handling I suspect will be helpful when we move to
> > deferred/dynamic page table allocation and mapping.
> > 
> > The patch also attempts to break up some of the steps into more
> > logical, reviewable chunks, particularly when we free.
> > 
> > v2: Don't call cleanup on the error path since that takes down the
> > drm_mm and list entry, which aren't setup at this point.
> > 
> > v3: Fixes addressing Imre's comments from:
> > <1392821989.19792.13.camel@intelbox>
> > 
> > Don't do dynamic allocation for the page table DMA addresses. I can't
> > remember why I did it in the first place. This addresses one of Imre's
> > other issues.
> > 
> > Fix error path leak of page tables.
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 114 ++++++++++++++++++++----------------
> >  1 file changed, 64 insertions(+), 50 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index e414d7e..03f586aa 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -333,6 +333,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> >  
> >  static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
> >  {
> > +	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
> >  	int i, j;
> >  
> >  	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> > @@ -341,18 +342,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
> >  		if (!ppgtt->pd_dma_addr[i])
> >  			continue;
> >  
> > -		pci_unmap_page(ppgtt->base.dev->pdev,
> > -			       ppgtt->pd_dma_addr[i],
> > -			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> > +		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
> > +			       PCI_DMA_BIDIRECTIONAL);
> >  
> >  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> >  			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
> >  			if (addr)
> > -				pci_unmap_page(ppgtt->base.dev->pdev,
> > -				       addr,
> > -				       PAGE_SIZE,
> > -				       PCI_DMA_BIDIRECTIONAL);
> > -
> > +				pci_unmap_page(hwdev, addr, PAGE_SIZE,
> > +					       PCI_DMA_BIDIRECTIONAL);
> >  		}
> >  	}
> >  }
> > @@ -370,27 +367,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> >  }
> >  
> >  /**
> > - * GEN8 legacy ppgtt programming is accomplished through 4 PDP registers with a
> > - * net effect resembling a 2-level page table in normal x86 terms. Each PDP
> > - * represents 1GB of memory
> > - * 4 * 512 * 512 * 4096 = 4GB legacy 32b address space.
> > + * GEN8 legacy ppgtt programming is accomplished through a max of 4 PDP
> > + * registers, with a net effect resembling a 2-level page table in normal x86
> > + * terms. Each PDP represents 1GB of memory; 4 * 512 * 512 * 4096 = 4GB legacy
> > + * 32b address space.
> >   *
> > + * FIXME: split allocation into smaller pieces. For now we only ever do this
> > + * once, but with full PPGTT, the multiple contiguous allocations will be bad.
> >   * TODO: Do something with the size parameter
> > - **/
> > + */
> >  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
> >  {
> >  	struct page *pt_pages;
> > -	int i, j, ret = -ENOMEM;
> >  	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
> >  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> > +	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
> > +	int i, j, ret;
> >  
> >  	if (size % (1<<30))
> >  		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
> >  
> > -	/* FIXME: split allocation into smaller pieces. For now we only ever do
> > -	 * this once, but with full PPGTT, the multiple contiguous allocations
> > -	 * will be bad.
> > -	 */
> > +	/* 1. Do all our allocations for page directories and page tables */
> >  	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
> >  	if (!ppgtt->pd_pages)
> >  		return -ENOMEM;
> > @@ -405,52 +402,60 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
> >  	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
> >  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> >  	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
> > -	ppgtt->enable = gen8_ppgtt_enable;
> > -	ppgtt->switch_mm = gen8_mm_switch;
> > -	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> > -	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> > -	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> > -	ppgtt->base.start = 0;
> > -	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
> > -
> >  	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
> >  
> > +	for (i = 0; i < max_pdp; i++) {
> > +		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> > +						     sizeof(dma_addr_t),
> > +						     GFP_KERNEL);
> > +		if (!ppgtt->gen8_pt_dma_addr[i]) {
> > +			ret = -ENOMEM;
> > +			goto bail;
> > +		}
> > +	}
> > +
> >  	/*
> > -	 * - Create a mapping for the page directories.
> > -	 * - For each page directory:
> > -	 *      allocate space for page table mappings.
> > -	 *      map each page table
> > +	 * 2. Create all the DMA mappings for the page directories and page
> > +	 * tables
> >  	 */
> >  	for (i = 0; i < max_pdp; i++) {
> > -		dma_addr_t temp;
> > -		temp = pci_map_page(ppgtt->base.dev->pdev,
> > -				    &ppgtt->pd_pages[i], 0,
> > -				    PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> > -		if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
> > -			goto err_out;
> > +		dma_addr_t pd_addr, pt_addr;
> >  
> > -		ppgtt->pd_dma_addr[i] = temp;
> > +		/* And the page directory mappings */
> > +		pd_addr = pci_map_page(hwdev, &ppgtt->pd_pages[i], 0,
> > +				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> > +		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> > +		if (ret)
> > +			goto bail;
> >  
> > -		ppgtt->gen8_pt_dma_addr[i] = kmalloc(sizeof(dma_addr_t) * GEN8_PDES_PER_PAGE, GFP_KERNEL);
> > -		if (!ppgtt->gen8_pt_dma_addr[i])
> > -			goto err_out;
> > +		ppgtt->pd_dma_addr[i] = pd_addr;
> >  
> > +		/* Get the page table mappings per page directory */
> >  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> >  			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
> > -			temp = pci_map_page(ppgtt->base.dev->pdev,
> > -					    p, 0, PAGE_SIZE,
> > -					    PCI_DMA_BIDIRECTIONAL);
> >  
> > -			if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
> > -				goto err_out;
> > +			pt_addr = pci_map_page(hwdev, p, 0, PAGE_SIZE,
> > +					       PCI_DMA_BIDIRECTIONAL);
> > +			ret = pci_dma_mapping_error(hwdev, pt_addr);
> > +			if (ret) {
> > +				ppgtt->pd_dma_addr[i] = 0;
> > +				pci_unmap_page(hwdev, pd_addr, PAGE_SIZE,
> > +					       PCI_DMA_BIDIRECTIONAL);
> > +				goto bail;
> 
> I think this would still leave the ppgtt->gen8_pt_dma_addr[i][0 .. j-1]
> mapped on error. Simply doing if (ret) goto bail; would be ok imo. With
> that fixed this patch is:
> 
> Reviewed-by: Imre Deak <imre.deak@intel.com> 
> 

Yep, I was just testing you :D. My fix traded one leak for another (it'd
skip unmapping j page tables).

> > +			}
> >  
> > -			ppgtt->gen8_pt_dma_addr[i][j] = temp;
> > +			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
> >  		}
> >  	}
> >  
> > -	/* For now, the PPGTT helper functions all require that the PDEs are
> > +	/*
> > +	 * 3. Map all the page directory entries to point to the page tables
> > +	 * we've allocated.
> > +	 *
> > +	 * For now, the PPGTT helper functions all require that the PDEs are
> >  	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
> > -	 * will never need to touch the PDEs again */
> > +	 * will never need to touch the PDEs again.
> > +	 */
> >  	for (i = 0; i < max_pdp; i++) {
> >  		gen8_ppgtt_pde_t *pd_vaddr;
> >  		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
> > @@ -462,6 +467,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
> >  		kunmap_atomic(pd_vaddr);
> >  	}
> >  
> > +	ppgtt->enable = gen8_ppgtt_enable;
> > +	ppgtt->switch_mm = gen8_mm_switch;
> > +	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> > +	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> > +	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> > +	ppgtt->base.start = 0;
> > +	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
> > +
> >  	ppgtt->base.clear_range(&ppgtt->base, 0,
> >  				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
> >  				true);
> > @@ -474,8 +487,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
> >  			 size % (1<<30));
> >  	return 0;
> >  
> > -err_out:
> > -	ppgtt->base.cleanup(&ppgtt->base);
> > +bail:
> > +	gen8_ppgtt_unmap_pages(ppgtt);
> > +	gen8_ppgtt_free(ppgtt);
> >  	return ret;
> >  }
> >  
> 
> 

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-19 21:06       ` Ben Widawsky
@ 2014-02-19 21:20         ` Imre Deak
  2014-02-19 21:31           ` Ben Widawsky
  0 siblings, 1 reply; 63+ messages in thread
From: Imre Deak @ 2014-02-19 21:20 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, 2014-02-19 at 13:06 -0800, Ben Widawsky wrote:
> On Wed, Feb 19, 2014 at 09:11:46PM +0200, Imre Deak wrote:
> > On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> > > The previous allocation mechanism would get 2 contiguous allocations,
> > > one for the page directories, and one for the page tables. As each page
> > > table is 1 page, and there are 512 of these per page directory, this
> > > goes to 1MB. An unfriendly request at best. Worse still, our HW now
> >        ---^
> > Fwiw, 2MB.
> 
> Thanks.
> 
> > 
> > > supports 4 page directories, and a 2MB allocation is not allowed.
> > > 
> > > In order to fix this, this patch attempts to split up each page table
> > > allocation into a single, discrete allocation. There is nothing really
> > > fancy about the patch itself, it just has to manage an extra pointer
> > > indirection, and have a fancier bit of logic to free up the pages.
> > > 
> > > To accommodate some of the added complexity, two new helpers are
> > > introduced to allocate, and free the page table pages.
> > > 
> > > NOTE: I really wanted to split the way we do allocations, and the way in
> > > which we identify the page table/page directory being used. I found
> > > splitting this functionality up to be too unwieldy. I apologize in
> > > advance to the reviewer. I'd recommend looking at the result, rather
> > > than the diff.
> > > 
> > > v2/NOTE2: This patch predated commit:
> > > 6f1cc993518462ccf039e195fabd47e7aa5bfd13
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Tue Dec 31 15:50:31 2013 +0000
> > > 
> > >     drm/i915: Avoid dereference past end of page arr
> > > 
> > > It fixed the same issue as that patch, but because of the limbo state of
> > > PPGTT, Chris patch was merged instead. The excess churn is a result of
> > > my using my original patch, which has my preferred naming. Primarily
> > > act_* is changed to which_*, but it's mostly the same otherwise. I've
> > > kept the convention Chris used for the pte wrap (I had something
> > > slightly different, and broken - but fixable)
> > > 
> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.h     |   5 +-
> > >  drivers/gpu/drm/i915/i915_gem_gtt.c | 127 ++++++++++++++++++++++++++++--------
> > >  2 files changed, 103 insertions(+), 29 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > index 2ebad96..d9a6327 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -691,6 +691,7 @@ struct i915_gtt {
> > >  };
> > >  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> > >  
> > > +#define GEN8_LEGACY_PDPS 4
> > >  struct i915_hw_ppgtt {
> > >  	struct i915_address_space base;
> > >  	struct kref ref;
> > > @@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
> > >  	unsigned num_pd_entries;
> > >  	union {
> > >  		struct page **pt_pages;
> > > -		struct page *gen8_pt_pages;
> > > +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> > >  	};
> > >  	struct page *pd_pages;
> > >  	int num_pd_pages;
> > >  	int num_pt_pages;
> > >  	union {
> > >  		uint32_t pd_offset;
> > > -		dma_addr_t pd_dma_addr[4];
> > > +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> > >  	};
> > >  	union {
> > >  		dma_addr_t *pt_dma_addr;
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > index 5bfc6ff..5299acc 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > @@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> > >  
> > >  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> > >  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> > > -#define GEN8_LEGACY_PDPS		4
> > > +
> > > +/* GEN8 legacy style address is defined as a 3 level page table:
> > > + * 31:30 | 29:21 | 20:12 |  11:0
> > > + * PDPE  |  PDE  |  PTE  | offset
> > > + * The difference compared to a normal x86 3-level page table is that the
> > > + * PDPEs are programmed via register.
> > > + */
> > > +#define GEN8_PDPE_SHIFT			30
> > > +#define GEN8_PDPE_MASK			0x3
> > > +#define GEN8_PDE_SHIFT			21
> > > +#define GEN8_PDE_MASK			0x1ff
> > > +#define GEN8_PTE_SHIFT			12
> > > +#define GEN8_PTE_MASK			0x1ff
> > >  
> > >  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
> > >  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> > > @@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> > >  	struct i915_hw_ppgtt *ppgtt =
> > >  		container_of(vm, struct i915_hw_ppgtt, base);
> > >  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> > > -	unsigned first_entry = start >> PAGE_SHIFT;
> > > +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > > +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> > > +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> > >  	unsigned num_entries = length >> PAGE_SHIFT;
> > > -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> > > -	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
> > >  	unsigned last_pte, i;
> > >  
> > >  	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
> > >  				      I915_CACHE_LLC, use_scratch);
> > >  
> > >  	while (num_entries) {
> > > -		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
> > > +		struct page *page_table = ppgtt->gen8_pt_pages[which_pdpe][which_pde];
> > >  
> > > -		last_pte = first_pte + num_entries;
> > > +		last_pte = which_pte + num_entries;
> > >  		if (last_pte > GEN8_PTES_PER_PAGE)
> > >  			last_pte = GEN8_PTES_PER_PAGE;
> > >  
> > >  		pt_vaddr = kmap_atomic(page_table);
> > >  
> > > -		for (i = first_pte; i < last_pte; i++)
> > > +		for (i = which_pte; i < last_pte; i++) {
> > >  			pt_vaddr[i] = scratch_pte;
> > > +			num_entries--;
> > > +			BUG_ON(num_entries < 0);
> > 
> > num_entries is unsigned.
> 
> This was already changed per Chris' request.
> 
> > 
> > > +		}
> > >  
> > >  		kunmap_atomic(pt_vaddr);
> > >  
> > > -		num_entries -= last_pte - first_pte;
> > > -		first_pte = 0;
> > > -		act_pt++;
> > > +		which_pte = 0;
> > > +		if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > > +			which_pdpe++;
> > > +		which_pde = (which_pde + 1) & GEN8_PDE_MASK;
> > >  	}
> > >  }
> > >  
> > > @@ -298,39 +314,57 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> > >  	struct i915_hw_ppgtt *ppgtt =
> > >  		container_of(vm, struct i915_hw_ppgtt, base);
> > >  	gen8_gtt_pte_t *pt_vaddr;
> > > -	unsigned first_entry = start >> PAGE_SHIFT;
> > > -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> > > -	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> > > +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > > +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> > > +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> > >  	struct sg_page_iter sg_iter;
> > >  
> > >  	pt_vaddr = NULL;
> > > +
> > >  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> > > +		if (WARN_ON(which_pdpe >= GEN8_LEGACY_PDPS))
> > > +			break;
> > > +
> > >  		if (pt_vaddr == NULL)
> > > -			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
> > > +			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[which_pdpe][which_pde]);
> > >  
> > > -		pt_vaddr[act_pte] =
> > > +		pt_vaddr[which_pte] =
> > >  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
> > >  					cache_level, true);
> > > -		if (++act_pte == GEN8_PTES_PER_PAGE) {
> > > +		if (++which_pte == GEN8_PTES_PER_PAGE) {
> > >  			kunmap_atomic(pt_vaddr);
> > >  			pt_vaddr = NULL;
> > > -			act_pt++;
> > > -			act_pte = 0;
> > > +			if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > > +				which_pdpe++;
> > 
> > Afaics which_pde = (which_pde + 1) & GEN8_PDE_MASK; is missing here.
> > 
> 
> This was already changed per Chris' request.
> 
> > > +			which_pte = 0;
> > 
> > 
> > 
> > >  		}
> > >  	}
> > >  	if (pt_vaddr)
> > >  		kunmap_atomic(pt_vaddr);
> > >  }
> > >  
> > > -static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> > > +static void gen8_free_page_tables(struct page **pt_pages)
> > > +{
> > > +	int i;
> > > +
> > > +	if (pt_pages == NULL)
> > > +		return;
> > > +
> > > +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> > > +		if (pt_pages[i])
> > > +			__free_pages(pt_pages[i], 0);
> > > +}
> > > +
> > > +static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
> > >  {
> > >  	int i;
> > >  
> > > -	for (i = 0; i < ppgtt->num_pd_pages ; i++)
> > > +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> > > +		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> > >  		kfree(ppgtt->gen8_pt_dma_addr[i]);
> > > +	}
> > >  
> > >  	kfree(ppgtt->gen8_pt_dma_addr);
> > > -	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
> > >  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
> > >  }
> > >  
> > > @@ -369,20 +403,59 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> > >  	gen8_ppgtt_free(ppgtt);
> > >  }
> > >  
> > > +static struct page **__gen8_alloc_page_tables(void)
> > > +{
> > > +	struct page **pt_pages;
> > > +	int i;
> > > +
> > > +	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> > > +	if (!pt_pages)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> > > +		pt_pages[i] = alloc_page(GFP_KERNEL);
> > > +		if (!pt_pages[i])
> > > +			goto bail;
> > > +	}
> > > +
> > > +	return pt_pages;
> > > +
> > > +bail:
> > > +	gen8_free_page_tables(pt_pages);
> > > +	kfree(pt_pages);
> > > +	return ERR_PTR(-ENOMEM);
> > > +}
> > > +
> > >  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
> > >  					   const int max_pdp)
> > >  {
> > > -	struct page *pt_pages;
> > > +	struct page **pt_pages[GEN8_LEGACY_PDPS];
> > >  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> > > +	int i, ret;
> > >  
> > > -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> > > -	if (!pt_pages)
> > > -		return -ENOMEM;
> > > +	for (i = 0; i < max_pdp; i++) {
> > > +		pt_pages[i] = __gen8_alloc_page_tables();
> > > +		if (IS_ERR(pt_pages[i])) {
> > > +			ret = PTR_ERR(pt_pages[i]);
> > > +			goto unwind_out;
> > > +		}
> > > +	}
> > > +
> > > +	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
> > > +	 * "atomic" - for cleanup purposes.
> > > +	 */
> > > +	for (i = 0; i < max_pdp; i++)
> > > +		ppgtt->gen8_pt_pages[i] = pt_pages[i];
> > >  
> > > -	ppgtt->gen8_pt_pages = pt_pages;
> > >  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> > >  
> > >  	return 0;
> > > +
> > > +unwind_out:
> > > +	while (i--)
> > > +		gen8_free_page_tables(pt_pages[i]);
> > 
> > I guess Ville commented on this issue, but pt_pages would be leaked
> > here.
> 
> I think Ville was referring to a different issue. The PPGTT struct
> itself isn't freed (if I understood his point correctly, which was
> indeed an issue). Forgive my ignorance here but I don't see where the
> leak is. __gen8_alloc_page_tables() appears to always free if it failed,

Yes, that cleanup inside __gen8_alloc_page_tables is ok.

> and unwind_out should work backwards. Can you show me what I've missed?

As I understand, after

pt_pages[i] = __gen8_alloc_page_tables();

we have a kcalloc()'d buffer in pt_pages[i]. Each entry of this buffer
points to an alloc_page()'d page.

Then on error, at unwind_out, for 0..i-1 we call gen8_free_page_tables(),
which will only do a __free_pages() on each of the above alloc_page()'d
entries. But then we still have the kcalloc()'d buffer, which I can't see
being freed anywhere.
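
In code, the unwind would need an extra kfree() per directory, e.g. (a
minimal sketch of the fix):

unwind_out:
	while (i--) {
		gen8_free_page_tables(pt_pages[i]);
		kfree(pt_pages[i]);	/* release the kcalloc()'d pointer array too */
	}

	return ret;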

--Imre

> 
> > 
> > > +
> > > +	return ret;
> > >  }
> > >  
> > >  static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> > > @@ -475,7 +548,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
> > >  	struct page *p;
> > >  	int ret;
> > >  
> > > -	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> > > +	p = ppgtt->gen8_pt_pages[pd][pt];
> > >  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> > >  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> > >  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> > 
> 
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-19 21:20         ` Imre Deak
@ 2014-02-19 21:31           ` Ben Widawsky
  0 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-19 21:31 UTC (permalink / raw)
  To: Imre Deak; +Cc: Intel GFX, Ben Widawsky

On Wed, Feb 19, 2014 at 11:20:51PM +0200, Imre Deak wrote:
> On Wed, 2014-02-19 at 13:06 -0800, Ben Widawsky wrote:
> > On Wed, Feb 19, 2014 at 09:11:46PM +0200, Imre Deak wrote:
> > > On Wed, 2014-02-12 at 14:28 -0800, Ben Widawsky wrote:
> > > > The previous allocation mechanism would get 2 contiguous allocations,
> > > > one for the page directories, and one for the page tables. As each page
> > > > table is 1 page, and there are 512 of these per page directory, this
> > > > goes to 1MB. An unfriendly request at best. Worse still, our HW now
> > >        ---^
> > > Fwiw, 2MB.
> > 
> > Thanks.
> > 
> > > 
> > > > supports 4 page directories, and a 2MB allocation is not allowed.
> > > > 
> > > > In order to fix this, this patch attempts to split up each page table
> > > > allocation into a single, discrete allocation. There is nothing really
> > > > fancy about the patch itself, it just has to manage an extra pointer
> > > > indirection, and have a fancier bit of logic to free up the pages.
> > > > 
> > > > To accommodate some of the added complexity, two new helpers are
> > > > introduced to allocate, and free the page table pages.
> > > > 
> > > > NOTE: I really wanted to split the way we do allocations, and the way in
> > > > which we identify the page table/page directory being used. I found
> > > > splitting this functionality up to be too unwieldy. I apologize in
> > > > advance to the reviewer. I'd recommend looking at the result, rather
> > > > than the diff.
> > > > 
> > > > v2/NOTE2: This patch predated commit:
> > > > 6f1cc993518462ccf039e195fabd47e7aa5bfd13
> > > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Date:   Tue Dec 31 15:50:31 2013 +0000
> > > > 
> > > >     drm/i915: Avoid dereference past end of page arr
> > > > 
> > > > It fixed the same issue as that patch, but because of the limbo state of
> > > > PPGTT, Chris patch was merged instead. The excess churn is a result of
> > > > my using my original patch, which has my preferred naming. Primarily
> > > > act_* is changed to which_*, but it's mostly the same otherwise. I've
> > > > kept the convention Chris used for the pte wrap (I had something
> > > > slightly different, and broken - but fixable)
> > > > 
> > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_drv.h     |   5 +-
> > > >  drivers/gpu/drm/i915/i915_gem_gtt.c | 127 ++++++++++++++++++++++++++++--------
> > > >  2 files changed, 103 insertions(+), 29 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > index 2ebad96..d9a6327 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > @@ -691,6 +691,7 @@ struct i915_gtt {
> > > >  };
> > > >  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> > > >  
> > > > +#define GEN8_LEGACY_PDPS 4
> > > >  struct i915_hw_ppgtt {
> > > >  	struct i915_address_space base;
> > > >  	struct kref ref;
> > > > @@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
> > > >  	unsigned num_pd_entries;
> > > >  	union {
> > > >  		struct page **pt_pages;
> > > > -		struct page *gen8_pt_pages;
> > > > +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
> > > >  	};
> > > >  	struct page *pd_pages;
> > > >  	int num_pd_pages;
> > > >  	int num_pt_pages;
> > > >  	union {
> > > >  		uint32_t pd_offset;
> > > > -		dma_addr_t pd_dma_addr[4];
> > > > +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> > > >  	};
> > > >  	union {
> > > >  		dma_addr_t *pt_dma_addr;
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > > index 5bfc6ff..5299acc 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > > @@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
> > > >  
> > > >  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
> > > >  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> > > > -#define GEN8_LEGACY_PDPS		4
> > > > +
> > > > +/* GEN8 legacy style address is defined as a 3 level page table:
> > > > + * 31:30 | 29:21 | 20:12 |  11:0
> > > > + * PDPE  |  PDE  |  PTE  | offset
> > > > + * The difference compared to a normal x86 3-level page table is that the
> > > > + * PDPEs are programmed via register.
> > > > + */
> > > > +#define GEN8_PDPE_SHIFT			30
> > > > +#define GEN8_PDPE_MASK			0x3
> > > > +#define GEN8_PDE_SHIFT			21
> > > > +#define GEN8_PDE_MASK			0x1ff
> > > > +#define GEN8_PTE_SHIFT			12
> > > > +#define GEN8_PTE_MASK			0x1ff
> > > >  
> > > >  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
> > > >  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> > > > @@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> > > >  	struct i915_hw_ppgtt *ppgtt =
> > > >  		container_of(vm, struct i915_hw_ppgtt, base);
> > > >  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> > > > -	unsigned first_entry = start >> PAGE_SHIFT;
> > > > +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > > > +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> > > > +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> > > >  	unsigned num_entries = length >> PAGE_SHIFT;
> > > > -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> > > > -	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
> > > >  	unsigned last_pte, i;
> > > >  
> > > >  	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
> > > >  				      I915_CACHE_LLC, use_scratch);
> > > >  
> > > >  	while (num_entries) {
> > > > -		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
> > > > +		struct page *page_table = ppgtt->gen8_pt_pages[which_pdpe][which_pde];
> > > >  
> > > > -		last_pte = first_pte + num_entries;
> > > > +		last_pte = which_pte + num_entries;
> > > >  		if (last_pte > GEN8_PTES_PER_PAGE)
> > > >  			last_pte = GEN8_PTES_PER_PAGE;
> > > >  
> > > >  		pt_vaddr = kmap_atomic(page_table);
> > > >  
> > > > -		for (i = first_pte; i < last_pte; i++)
> > > > +		for (i = which_pte; i < last_pte; i++) {
> > > >  			pt_vaddr[i] = scratch_pte;
> > > > +			num_entries--;
> > > > +			BUG_ON(num_entries < 0);
> > > 
> > > num_entries is unsigned.
> > 
> > This was already changed per Chris' request.
> > 
> > > 
> > > > +		}
> > > >  
> > > >  		kunmap_atomic(pt_vaddr);
> > > >  
> > > > -		num_entries -= last_pte - first_pte;
> > > > -		first_pte = 0;
> > > > -		act_pt++;
> > > > +		which_pte = 0;
> > > > +		if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > > > +			which_pdpe++;
> > > > +		which_pde = (which_pde + 1) & GEN8_PDE_MASK;
> > > >  	}
> > > >  }
> > > >  
> > > > @@ -298,39 +314,57 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> > > >  	struct i915_hw_ppgtt *ppgtt =
> > > >  		container_of(vm, struct i915_hw_ppgtt, base);
> > > >  	gen8_gtt_pte_t *pt_vaddr;
> > > > -	unsigned first_entry = start >> PAGE_SHIFT;
> > > > -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> > > > -	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> > > > +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > > > +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> > > > +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> > > >  	struct sg_page_iter sg_iter;
> > > >  
> > > >  	pt_vaddr = NULL;
> > > > +
> > > >  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> > > > +		if (WARN_ON(which_pdpe >= GEN8_LEGACY_PDPS))
> > > > +			break;
> > > > +
> > > >  		if (pt_vaddr == NULL)
> > > > -			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
> > > > +			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[which_pdpe][which_pde]);
> > > >  
> > > > -		pt_vaddr[act_pte] =
> > > > +		pt_vaddr[which_pte] =
> > > >  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
> > > >  					cache_level, true);
> > > > -		if (++act_pte == GEN8_PTES_PER_PAGE) {
> > > > +		if (++which_pte == GEN8_PTES_PER_PAGE) {
> > > >  			kunmap_atomic(pt_vaddr);
> > > >  			pt_vaddr = NULL;
> > > > -			act_pt++;
> > > > -			act_pte = 0;
> > > > +			if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> > > > +				which_pdpe++;
> > > 
> > > Afaics which_pde = (which_pde + 1) & GEN8_PDE_MASK; is missing here.
> > > 
> > 
> > This was already changed per Chris' request.
> > 
> > > > +			which_pte = 0;
> > > 
> > > 
> > > 
> > > >  		}
> > > >  	}
> > > >  	if (pt_vaddr)
> > > >  		kunmap_atomic(pt_vaddr);
> > > >  }
> > > >  
> > > > -static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> > > > +static void gen8_free_page_tables(struct page **pt_pages)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	if (pt_pages == NULL)
> > > > +		return;
> > > > +
> > > > +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> > > > +		if (pt_pages[i])
> > > > +			__free_pages(pt_pages[i], 0);
> > > > +}
> > > > +
> > > > +static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
> > > >  {
> > > >  	int i;
> > > >  
> > > > -	for (i = 0; i < ppgtt->num_pd_pages ; i++)
> > > > +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> > > > +		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> > > >  		kfree(ppgtt->gen8_pt_dma_addr[i]);
> > > > +	}
> > > >  
> > > >  	kfree(ppgtt->gen8_pt_dma_addr);
> > > > -	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
> > > >  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
> > > >  }
> > > >  
> > > > @@ -369,20 +403,59 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> > > >  	gen8_ppgtt_free(ppgtt);
> > > >  }
> > > >  
> > > > +static struct page **__gen8_alloc_page_tables(void)
> > > > +{
> > > > +	struct page **pt_pages;
> > > > +	int i;
> > > > +
> > > > +	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> > > > +	if (!pt_pages)
> > > > +		return ERR_PTR(-ENOMEM);
> > > > +
> > > > +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> > > > +		pt_pages[i] = alloc_page(GFP_KERNEL);
> > > > +		if (!pt_pages[i])
> > > > +			goto bail;
> > > > +	}
> > > > +
> > > > +	return pt_pages;
> > > > +
> > > > +bail:
> > > > +	gen8_free_page_tables(pt_pages);
> > > > +	kfree(pt_pages);
> > > > +	return ERR_PTR(-ENOMEM);
> > > > +}
> > > > +
> > > >  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
> > > >  					   const int max_pdp)
> > > >  {
> > > > -	struct page *pt_pages;
> > > > +	struct page **pt_pages[GEN8_LEGACY_PDPS];
> > > >  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> > > > +	int i, ret;
> > > >  
> > > > -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> > > > -	if (!pt_pages)
> > > > -		return -ENOMEM;
> > > > +	for (i = 0; i < max_pdp; i++) {
> > > > +		pt_pages[i] = __gen8_alloc_page_tables();
> > > > +		if (IS_ERR(pt_pages[i])) {
> > > > +			ret = PTR_ERR(pt_pages[i]);
> > > > +			goto unwind_out;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
> > > > +	 * "atomic" - for cleanup purposes.
> > > > +	 */
> > > > +	for (i = 0; i < max_pdp; i++)
> > > > +		ppgtt->gen8_pt_pages[i] = pt_pages[i];
> > > >  
> > > > -	ppgtt->gen8_pt_pages = pt_pages;
> > > >  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> > > >  
> > > >  	return 0;
> > > > +
> > > > +unwind_out:
> > > > +	while (i--)
> > > > +		gen8_free_page_tables(pt_pages[i]);
> > > 
> > > I guess Ville commented on this issue, but pt_pages would be leaked
> > > here.
> > 
> > I think Ville was referring to a different issue. The PPGTT struct
> > itself isn't freed (if I understood his point correctly, which was
> > indeed an issue). Forgive my ignorance here but I don't see where the
> > leak is. __gen8_alloc_page_tables() appears to always free if it failed,
> 
> Yes, that cleanup inside __gen8_alloc_page_tables is ok.
> 
> > and unwind_out should work backwards. Can you show me what I've missed?
> 
> As I understand, after
> 
> pt_pages[i] = __gen8_alloc_page_tables();
> 
> we have a kcalloc()'d buffer in pt_pages[i]. Each entry of this buffer
> points to an alloc_page()'d page.
> 
> Then on error, at unwind_out, for 0..i-1 we call gen8_free_page_tables(),
> which will only do a __free_pages() on each of the above alloc_page()'d
> entries. But then we still have the kcalloc()'d buffer, which I can't see
> being freed anywhere.
> 
> --Imre
> 

Ah, gotcha. I was confused by where you placed your comment. It does
appear gen8_pt_pages is leaked more generally. Thanks.
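
That is, gen8_ppgtt_free() would also need to release each
per-directory pointer array, roughly (editorial sketch):

	for (i = 0; i < ppgtt->num_pd_pages; i++) {
		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
		kfree(ppgtt->gen8_pt_pages[i]);	/* the kcalloc()'d array itself */
		kfree(ppgtt->gen8_pt_dma_addr[i]);
	}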

> > 
> > > 
> > > > +
> > > > +	return ret;
> > > >  }
> > > >  
> > > >  static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> > > > @@ -475,7 +548,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
> > > >  	struct page *p;
> > > >  	int ret;
> > > >  
> > > > -	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> > > > +	p = ppgtt->gen8_pt_pages[pd][pt];
> > > >  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> > > >  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> > > >  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> > > 
> > 
> > 
> > 
> 
> 

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 0/9] [v2] BDW 4G GGTT + PPGTT cleanups
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (9 preceding siblings ...)
  2014-02-13 11:47   ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ville Syrjälä
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-21 21:06     ` [PATCH 10/9] drm/i915/bdw: Kill ppgtt->num_pt_pages Ben Widawsky
  2014-03-04 14:50     ` [PATCH 0/9] [v2] BDW 4G GGTT + PPGTT cleanups Daniel Vetter
  2014-02-20  6:05   ` [PATCH 1/9] drm/i915/bdw: Free PPGTT struct Ben Widawsky
                     ` (8 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

Thanks to Imre's very detailed review, and Ville's observation of a
missed free (earlier bug), I think the series is finally starting to
shape up. I am having some unrelated problems on my BDW platform at the
moment, so the patches are not well tested.

Many patches are way past v2, but for the series it's the second
iteration.

Ben Widawsky (9):
  drm/i915/bdw: Free PPGTT struct
  drm/i915/bdw: Reorganize PPGTT init
  drm/i915/bdw: Split ppgtt initialization up
  drm/i915: Make clear/insert vfuncs args absolute
  drm/i915/bdw: Reorganize PT allocations
  Revert "drm/i915/bdw: Limit GTT to 2GB"
  drm/i915: Update i915_gem_gtt.c copyright
  drm/i915: Split GEN6 PPGTT cleanup
  drm/i915: Split GEN6 PPGTT initialization up

 drivers/gpu/drm/i915/i915_drv.h     |  11 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 582 ++++++++++++++++++++++++------------
 2 files changed, 405 insertions(+), 188 deletions(-)

-- 
1.9.0

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 1/9] drm/i915/bdw: Free PPGTT struct
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (10 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 0/9] [v2] " Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-20  9:31     ` Imre Deak
  2014-02-20 19:47     ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Ben Widawsky
  2014-02-20  6:05   ` [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init Ben Widawsky
                     ` (7 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

GEN8 never freed the PPGTT struct. As GEN8 doesn't use full PPGTT, the
leak is small and only found on a module reload, i.e. I don't think this
needs to go to stable.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 69a88d4..e414d7e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -328,6 +328,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
 	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
+	kfree(ppgtt);
 }
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-- 
1.9.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (11 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 1/9] drm/i915/bdw: Free PPGTT struct Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-20  6:05   ` [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up Ben Widawsky
                     ` (6 subsequent siblings)
  19 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Create 3 clear stages in PPGTT init. This will help make upcoming
changes more readable. The 3 stages are: allocation, DMA mapping, and
writing the P[DT]Es.

One nice benefit of the patch is that it creates 2 very clear error
points, allocation and mapping, and avoids having to do any error
handling after writing the PTEs (something which was likely buggy
before). I suspect this simplified error handling will be helpful when
we move to deferred/dynamic page table allocation and mapping.

The patch also attempts to break up some of the steps into more
logical, reviewable chunks, particularly when we free.
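
As an outline of the structure this gives gen8_ppgtt_init() (an
editorial sketch; the details are in the diff below):

	/*
	 * gen8_ppgtt_init(ppgtt, size):
	 *   1. allocate pd_pages, pt_pages and the gen8_pt_dma_addr[] arrays;
	 *   2. pci_map_page() each page directory page, then each page table
	 *      page, recording the DMA addresses;
	 *   3. kmap_atomic() each page directory and write its 512 PDEs.
	 * A failure in stage 1 or 2 unwinds via gen8_ppgtt_unmap_pages() and
	 * gen8_ppgtt_free(); nothing can fail once the PDE writes begin.
	 */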

v2: Don't call cleanup on the error path since that takes down the
drm_mm and list entry, which aren't setup at this point.

v3: Fixes addressing Imre's comments from:
<1392821989.19792.13.camel@intelbox>

Don't do dynamic allocation for the page table DMA addresses. I can't
remember why I did it in the first place. This addresses one of Imre's
other issues.

Fix error path leak of page tables.

v4: Fix the fix of the error path leak. Original fix still leaked page
tables. (Imre)

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 110 ++++++++++++++++++++----------------
 1 file changed, 60 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e414d7e..7956659 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -333,6 +333,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
+	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
 	int i, j;
 
 	for (i = 0; i < ppgtt->num_pd_pages; i++) {
@@ -341,18 +342,14 @@ static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 		if (!ppgtt->pd_dma_addr[i])
 			continue;
 
-		pci_unmap_page(ppgtt->base.dev->pdev,
-			       ppgtt->pd_dma_addr[i],
-			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+		pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+			       PCI_DMA_BIDIRECTIONAL);
 
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
 			if (addr)
-				pci_unmap_page(ppgtt->base.dev->pdev,
-				       addr,
-				       PAGE_SIZE,
-				       PCI_DMA_BIDIRECTIONAL);
-
+				pci_unmap_page(hwdev, addr, PAGE_SIZE,
+					       PCI_DMA_BIDIRECTIONAL);
 		}
 	}
 }
@@ -370,27 +367,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 }
 
 /**
- * GEN8 legacy ppgtt programming is accomplished through 4 PDP registers with a
- * net effect resembling a 2-level page table in normal x86 terms. Each PDP
- * represents 1GB of memory
- * 4 * 512 * 512 * 4096 = 4GB legacy 32b address space.
+ * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
+ * with a net effect resembling a 2-level page table in normal x86 terms. Each
+ * PDP represents 1GB of memory; 4 * 512 * 512 * 4096 = 4GB legacy 32b address
+ * space.
  *
+ * FIXME: split allocation into smaller pieces. For now we only ever do this
+ * once, but with full PPGTT, the multiple contiguous allocations will be bad.
  * TODO: Do something with the size parameter
- **/
+ */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
 	struct page *pt_pages;
-	int i, j, ret = -ENOMEM;
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
 	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+	int i, j, ret;
 
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
-	/* FIXME: split allocation into smaller pieces. For now we only ever do
-	 * this once, but with full PPGTT, the multiple contiguous allocations
-	 * will be bad.
-	 */
+	/* 1. Do all our allocations for page directories and page tables */
 	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
 	if (!ppgtt->pd_pages)
 		return -ENOMEM;
@@ -405,52 +402,56 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
 	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
 	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
-	ppgtt->enable = gen8_ppgtt_enable;
-	ppgtt->switch_mm = gen8_mm_switch;
-	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
-
 	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
 
+	for (i = 0; i < max_pdp; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i]) {
+			ret = -ENOMEM;
+			goto bail;
+		}
+	}
+
 	/*
-	 * - Create a mapping for the page directories.
-	 * - For each page directory:
-	 *      allocate space for page table mappings.
-	 *      map each page table
+	 * 2. Create all the DMA mappings for the page directories and page
+	 * tables
 	 */
 	for (i = 0; i < max_pdp; i++) {
-		dma_addr_t temp;
-		temp = pci_map_page(ppgtt->base.dev->pdev,
-				    &ppgtt->pd_pages[i], 0,
-				    PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-		if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
-			goto err_out;
+		dma_addr_t pd_addr, pt_addr;
 
-		ppgtt->pd_dma_addr[i] = temp;
+		/* Get the page directory mappings */
+		pd_addr = pci_map_page(hwdev, &ppgtt->pd_pages[i], 0,
+				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+		if (ret)
+			goto bail;
 
-		ppgtt->gen8_pt_dma_addr[i] = kmalloc(sizeof(dma_addr_t) * GEN8_PDES_PER_PAGE, GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i])
-			goto err_out;
+		ppgtt->pd_dma_addr[i] = pd_addr;
 
+		/* And the page table mappings per page directory */
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
 			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
-			temp = pci_map_page(ppgtt->base.dev->pdev,
-					    p, 0, PAGE_SIZE,
-					    PCI_DMA_BIDIRECTIONAL);
 
-			if (pci_dma_mapping_error(ppgtt->base.dev->pdev, temp))
-				goto err_out;
+			pt_addr = pci_map_page(hwdev, p, 0, PAGE_SIZE,
+					       PCI_DMA_BIDIRECTIONAL);
+			ret = pci_dma_mapping_error(hwdev, pt_addr);
+			if (ret)
+				goto bail;
 
-			ppgtt->gen8_pt_dma_addr[i][j] = temp;
+			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
 		}
 	}
 
-	/* For now, the PPGTT helper functions all require that the PDEs are
+	/*
+	 * 3. Map all the page directory entries to point to the page tables
+	 * we've allocated.
+	 *
+	 * For now, the PPGTT helper functions all require that the PDEs are
 	 * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-	 * will never need to touch the PDEs again */
+	 * will never need to touch the PDEs again.
+	 */
 	for (i = 0; i < max_pdp; i++) {
 		gen8_ppgtt_pde_t *pd_vaddr;
 		pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
@@ -462,6 +463,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 		kunmap_atomic(pd_vaddr);
 	}
 
+	ppgtt->enable = gen8_ppgtt_enable;
+	ppgtt->switch_mm = gen8_mm_switch;
+	ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+	ppgtt->base.start = 0;
+	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
+
 	ppgtt->base.clear_range(&ppgtt->base, 0,
 				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
 				true);
@@ -474,8 +483,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			 size % (1<<30));
 	return 0;
 
-err_out:
-	ppgtt->base.cleanup(&ppgtt->base);
+bail:
+	gen8_ppgtt_unmap_pages(ppgtt);
+	gen8_ppgtt_free(ppgtt);
 	return ret;
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (12 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-20 13:10     ` Imre Deak
  2014-02-20  6:05   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
                     ` (5 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

As with cleanup in an earlier patch, the code becomes much more readable
and easier to extend if we extract helper functions for the various
stages of init.

Note that with this patch it becomes really simple, and tempting, to
begin using the 'goto out' idiom with explicit free/fini semantics. I've
kept the error path as similar as possible to the cleanup() function to
make sure cleanup is as robust as possible.
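
The resulting call structure, as an editorial sketch (names taken from
the diff below):

	gen8_ppgtt_init()
	    gen8_ppgtt_alloc()
	        gen8_ppgtt_allocate_page_directories()
	        gen8_ppgtt_allocate_page_tables()
	        gen8_ppgtt_allocate_dma()
	    gen8_ppgtt_setup_page_directories()	/* once per page directory */
	    gen8_ppgtt_setup_page_tables()	/* per directory, per table */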

v2: Remove comment "NB:From here on, ppgtt->base.cleanup() should
function properly"
Update commit message to reflect above

v3: Rebased on top of bugfixes found in the previous patch by Imre
Moved number of pd pages assertion to the proper place (Imre)

v4:
Allocate dma address space for num_pd_pages, not num_pd_entries (Ben)
Don't use gen8_pt_dma_addr after free on error path (Imre)
With new fix from v4 of the previous patch.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 164 +++++++++++++++++++++++++-----------
 1 file changed, 116 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7956659..0af3587 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -366,6 +366,113 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
+					   const int max_pdp)
+{
+	struct page *pt_pages;
+	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+
+	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
+	if (!pt_pages)
+		return -ENOMEM;
+
+	ppgtt->gen8_pt_pages = pt_pages;
+	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
+
+	return 0;
+}
+
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+						     sizeof(dma_addr_t),
+						     GFP_KERNEL);
+		if (!ppgtt->gen8_pt_dma_addr[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+						const int max_pdp)
+{
+	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
+	if (!ppgtt->pd_pages)
+		return -ENOMEM;
+
+	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+
+	return 0;
+}
+
+static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
+			    const int max_pdp)
+{
+	int ret;
+
+	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+	if (ret)
+		return ret;
+
+	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
+	if (ret) {
+		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
+		return ret;
+	}
+
+	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+
+	ret = gen8_ppgtt_allocate_dma(ppgtt);
+	if (ret)
+		gen8_ppgtt_free(ppgtt);
+
+	return ret;
+}
+
+static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
+					     const int pd)
+{
+	dma_addr_t pd_addr;
+	int ret;
+
+	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
+			       &ppgtt->pd_pages[pd], 0,
+			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+
+	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+	if (ret)
+		return ret;
+
+	ppgtt->pd_dma_addr[pd] = pd_addr;
+
+	return 0;
+}
+
+static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
+					const int pd,
+					const int pt)
+{
+	dma_addr_t pt_addr;
+	struct page *p;
+	int ret;
+
+	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
+	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
+			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
+	if (ret)
+		return ret;
+
+	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+
+	return 0;
+}
+
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -378,69 +485,30 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	struct page *pt_pages;
 	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
-	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, j, ret;
 
 	if (size % (1<<30))
 		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
 
-	/* 1. Do all our allocations for page directories and page tables */
-	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
-	if (!ppgtt->pd_pages)
-		return -ENOMEM;
-
-	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
-	if (!pt_pages) {
-		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
-		return -ENOMEM;
-	}
-
-	ppgtt->gen8_pt_pages = pt_pages;
-	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
-	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
-	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
-	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
-
-	for (i = 0; i < max_pdp; i++) {
-		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
-						     sizeof(dma_addr_t),
-						     GFP_KERNEL);
-		if (!ppgtt->gen8_pt_dma_addr[i]) {
-			ret = -ENOMEM;
-			goto bail;
-		}
-	}
+	/* 1. Do all our allocations for page directories and page tables. */
+	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+	if (ret)
+		return ret;
 
 	/*
-	 * 2. Create all the DMA mappings for the page directories and page
-	 * tables
+	 * 2. Create DMA mappings for the page directories and page tables.
 	 */
 	for (i = 0; i < max_pdp; i++) {
-		dma_addr_t pd_addr, pt_addr;
-
-		/* Get the page directory mappings */
-		pd_addr = pci_map_page(hwdev, &ppgtt->pd_pages[i], 0,
-				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
 		if (ret)
 			goto bail;
 
-		ppgtt->pd_dma_addr[i] = pd_addr;
-
-		/* And the page table mappings per page directory */
 		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
-			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
-
-			pt_addr = pci_map_page(hwdev, p, 0, PAGE_SIZE,
-					       PCI_DMA_BIDIRECTIONAL);
-			ret = pci_dma_mapping_error(hwdev, pt_addr);
+			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
 			if (ret)
 				goto bail;
-
-			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
 		}
 	}
 
@@ -479,7 +547,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
 	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
 			 ppgtt->num_pt_pages,
-			 (ppgtt->num_pt_pages - num_pt_pages) +
+			 (ppgtt->num_pt_pages - min_pt_pages) +
 			 size % (1<<30));
 	return 0;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (13 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-20 10:37     ` Imre Deak
  2014-02-20 19:50     ` [PATCH 4/9] [v3] " Ben Widawsky
  2014-02-20  6:05   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
                     ` (4 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This patch converts insert_entries and clear_range, both functions which
are specific to the VM. These functions tend to encapsulate the
gen-specific PTE writes. Passing absolute addresses to insert_entries
and clear_range will make the logic within the functions clearer as to
what's going on. Currently, all callers simply do the appropriate page
shift, which, IMO, ends up looking weird with an upcoming change to the
gen8 page table allocations.

Up until now, the PPGTT was a funky 2-level page table. GEN8 changes
this to look more like a 3-level page table, and to that extent we need
significantly more memory simply for the page tables. To address this,
the allocations will be split up into finer-grained amounts.
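
Concretely, a caller such as ppgtt_unbind_vma() goes from passing
pre-shifted PTE indices to passing byte addresses (paraphrasing the
diff below):

	/* before: callers do the page shift themselves */
	vma->vm->clear_range(vma->vm,
			     vma->node.start >> PAGE_SHIFT,
			     vma->obj->base.size >> PAGE_SHIFT,
			     true);

	/* after: callers pass absolute byte addresses; the vfunc shifts */
	vma->vm->clear_range(vma->vm,
			     vma->node.start,
			     vma->obj->base.size,
			     true);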

v2: Replace size_t with uint64_t (Chris, Imre)

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |  6 +--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++----------------
 2 files changed, 49 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8c64831..f3379ea 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -652,12 +652,12 @@ struct i915_address_space {
 				     enum i915_cache_level level,
 				     bool valid); /* Create a valid PTE */
 	void (*clear_range)(struct i915_address_space *vm,
-			    unsigned int first_entry,
-			    unsigned int num_entries,
+			    uint64_t start,
+			    uint64_t length,
 			    bool use_scratch);
 	void (*insert_entries)(struct i915_address_space *vm,
 			       struct sg_table *st,
-			       unsigned int first_entry,
+			       uint64_t start,
 			       enum i915_cache_level cache_level);
 	void (*cleanup)(struct i915_address_space *vm);
 };
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0af3587..ef5e90c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -254,13 +254,15 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 }
 
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   unsigned first_entry,
-				   unsigned num_entries,
+				   uint64_t start,
+				   uint64_t length,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
 	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
 	unsigned last_pte, i;
@@ -290,12 +292,13 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 
 static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 				      struct sg_table *pages,
-				      unsigned first_entry,
+				      uint64_t start,
 				      enum i915_cache_level cache_level)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
 	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
 	struct sg_page_iter sg_iter;
@@ -855,13 +858,15 @@ static int gen6_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 
 /* PPGTT support for Sandybridge/Gen6 and later */
 static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
-				   unsigned first_entry,
-				   unsigned num_entries,
+				   uint64_t start,
+				   uint64_t length,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
 	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
 	unsigned last_pte, i;
@@ -888,12 +893,13 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 				      struct sg_table *pages,
-				      unsigned first_entry,
+				      uint64_t start,
 				      enum i915_cache_level cache_level)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
 	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
 	struct sg_page_iter sg_iter;
@@ -1026,8 +1032,7 @@ alloc:
 		ppgtt->pt_dma_addr[i] = pt_addr;
 	}
 
-	ppgtt->base.clear_range(&ppgtt->base, 0,
-				ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES, true);
+	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1091,20 +1096,17 @@ ppgtt_bind_vma(struct i915_vma *vma,
 	       enum i915_cache_level cache_level,
 	       u32 flags)
 {
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
-
 	WARN_ON(flags);
 
-	vma->vm->insert_entries(vma->vm, vma->obj->pages, entry, cache_level);
+	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
+				cache_level);
 }
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
-
 	vma->vm->clear_range(vma->vm,
-			     entry,
-			     vma->obj->base.size >> PAGE_SHIFT,
+			     vma->node.start,
+			     vma->obj->base.size,
 			     true);
 }
 
@@ -1265,10 +1267,11 @@ static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte)
 
 static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct sg_table *st,
-				     unsigned int first_entry,
+				     uint64_t start,
 				     enum i915_cache_level level)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	gen8_gtt_pte_t __iomem *gtt_entries =
 		(gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
 	int i = 0;
@@ -1310,10 +1313,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
  */
 static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct sg_table *st,
-				     unsigned int first_entry,
+				     uint64_t start,
 				     enum i915_cache_level level)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	gen6_gtt_pte_t __iomem *gtt_entries =
 		(gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
 	int i = 0;
@@ -1345,11 +1349,13 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 }
 
 static void gen8_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  uint64_t length,
 				  bool use_scratch)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
 		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
 	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
@@ -1369,11 +1375,13 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 }
 
 static void gen6_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  uint64_t length,
 				  bool use_scratch)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
 		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
 	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
@@ -1406,10 +1414,12 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 }
 
 static void i915_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  uint64_t length,
 				  bool unused)
 {
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	intel_gtt_clear_range(first_entry, num_entries);
 }
 
@@ -1430,7 +1440,6 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = vma->obj;
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 
 	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
 	 * or we have a global mapping already but the cacheability flags have
@@ -1446,7 +1455,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
 		if (!obj->has_global_gtt_mapping ||
 		    (cache_level != obj->cache_level)) {
-			vma->vm->insert_entries(vma->vm, obj->pages, entry,
+			vma->vm->insert_entries(vma->vm, obj->pages,
+						vma->node.start,
 						cache_level);
 			obj->has_global_gtt_mapping = 1;
 		}
@@ -1457,7 +1467,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	     (cache_level != obj->cache_level))) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 		appgtt->base.insert_entries(&appgtt->base,
-					    vma->obj->pages, entry, cache_level);
+					    vma->obj->pages,
+					    vma->node.start,
+					    cache_level);
 		vma->obj->has_aliasing_ppgtt_mapping = 1;
 	}
 }
@@ -1467,11 +1479,11 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = vma->obj;
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 
 	if (obj->has_global_gtt_mapping) {
-		vma->vm->clear_range(vma->vm, entry,
-				     vma->obj->base.size >> PAGE_SHIFT,
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     obj->base.size,
 				     true);
 		obj->has_global_gtt_mapping = 0;
 	}
@@ -1479,8 +1491,8 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
 	if (obj->has_aliasing_ppgtt_mapping) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 		appgtt->base.clear_range(&appgtt->base,
-					 entry,
-					 obj->base.size >> PAGE_SHIFT,
+					 vma->node.start,
+					 obj->base.size,
 					 true);
 		obj->has_aliasing_ppgtt_mapping = 0;
 	}
@@ -1565,14 +1577,14 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
 
 	/* Clear any non-preallocated blocks */
 	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
-		const unsigned long count = (hole_end - hole_start) / PAGE_SIZE;
 		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
 			      hole_start, hole_end);
-		ggtt_vm->clear_range(ggtt_vm, hole_start / PAGE_SIZE, count, true);
+		ggtt_vm->clear_range(ggtt_vm, hole_start,
+				     hole_end - hole_start, true);
 	}
 
 	/* And finally clear the reserved guard page */
-	ggtt_vm->clear_range(ggtt_vm, end / PAGE_SIZE - 1, 1, true);
+	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
 }
 
 void i915_gem_init_global_gtt(struct drm_device *dev)
-- 
1.9.0
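
(As a sanity check on the byte-address conversion above -- e.g. the guard
page call -- here is a standalone sketch; 4K pages assumed, the GGTT end
value is made up:)

#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

int main(void)
{
	uint64_t end = 256ULL << 20;	/* hypothetical GGTT end: 256MB */

	/* old interface: (first_entry, num_entries) */
	uint64_t old_first = end / PAGE_SIZE - 1, old_num = 1;
	/* new interface: (start, length), both in bytes */
	uint64_t start = end - PAGE_SIZE, length = PAGE_SIZE;

	assert(start >> PAGE_SHIFT == old_first);	/* same PTE index */
	assert(length >> PAGE_SHIFT == old_num);	/* same entry count */
	return 0;
}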

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (14 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-20 11:28     ` Imre Deak
  2014-02-20 19:51     ` [PATCH 5/9] [v5] " Ben Widawsky
  2014-02-20  6:05   ` [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB" Ben Widawsky
                     ` (3 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

The previous allocation mechanism would get 2 contiguous allocations,
one for the page directories, and one for the page tables. As each page
table is 1 page, and there are 512 of these per page directory, this
goes to 2MB. An unfriendly request at best. Worse still, our HW now
supports 4 page directories, and a 2MB allocation is not allowed.
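
(A quick back-of-the-envelope check of that 2MB figure, as a standalone
sketch; 8 byte PDEs/PTEs and 4K pages assumed:)

#include <stdio.h>

int main(void)
{
	unsigned pdes_per_page = 4096 / 8;	/* 512 PDEs, one PT page each */

	/* one page directory -> 512 page tables -> 512 * 4KB = 2MB */
	printf("page tables per PD: %uMB\n",
	       pdes_per_page * 4096 / (1024 * 1024));
	return 0;
}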

In order to fix this, this patch attempts to split up each page table
allocation into a single, discrete allocation. There is nothing really
fancy about the patch itself, it just has to manage an extra pointer
indirection, and have a fancier bit of logic to free up the pages.

To accommodate some of the added complexity, two new helpers are
introduced to allocate, and free the page table pages.

NOTE: I really wanted to split the way we do allocations, and the way in
which we identify the page table/page directory being used. I found
splitting this functionality up to be too unwieldy. I apologize in
advance to the reviewer. I'd recommend looking at the result, rather
than the diff.
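
For reference, the new start-address decomposition works out as in this
standalone sketch (illustration only, reusing the shift/mask values the
patch adds below; the address itself is made up):

#include <stdint.h>
#include <stdio.h>

#define GEN8_PDPE_SHIFT		30
#define GEN8_PDPE_MASK		0x3
#define GEN8_PDE_SHIFT		21
#define GEN8_PDE_MASK		0x1ff
#define GEN8_PTE_SHIFT		12
#define GEN8_PTE_MASK		0x1ff

int main(void)
{
	uint64_t start = 0x4321f000ULL;	/* arbitrary GPU virtual address */

	printf("pdpe=%llu pde=%llu pte=%llu\n",
	       (unsigned long long)(start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK),
	       (unsigned long long)(start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK),
	       (unsigned long long)(start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK));
	/* prints: pdpe=1 pde=25 pte=31 */
	return 0;
}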

v2/NOTE2: This patch predated commit:
6f1cc993518462ccf039e195fabd47e7aa5bfd13
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 31 15:50:31 2013 +0000

    drm/i915: Avoid dereference past end of page arr

It fixed the same issue as that patch, but because of the limbo state of
PPGTT, Chris' patch was merged instead. The excess churn is a result of
my using my original patch, which has my preferred naming. Primarily
act_* is changed to which_*, but it's mostly the same otherwise. I've
kept the convention Chris used for the pte wrap (I had something
slightly different, and broken - but fixable)

v3: Rename which_p[..]e to drop which_ (Chris)
Remove BUG_ON in inner loop (Chris)
Redo the pde/pdpe wrap logic (Chris)

v4: s/1MB/2MB in commit message (Imre)
Plug leaking gen8_pt_pages in both the error path, as well as general
free case (Imre)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |   5 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 130 ++++++++++++++++++++++++++++--------
 2 files changed, 106 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f3379ea..2dbdd34 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -691,6 +691,7 @@ struct i915_gtt {
 };
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
+#define GEN8_LEGACY_PDPS 4
 struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
@@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	union {
 		struct page **pt_pages;
-		struct page *gen8_pt_pages;
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
 	};
 	struct page *pd_pages;
 	int num_pd_pages;
 	int num_pt_pages;
 	union {
 		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[4];
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ef5e90c..fcde3c7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
-#define GEN8_LEGACY_PDPS		4
+
+/* GEN8 legacy style address is defined as a 3 level page table:
+ * 31:30 | 29:21 | 20:12 |  11:0
+ * PDPE  |  PDE  |  PTE  | offset
+ * The difference compared to a normal x86 3 level page table is that the
+ * PDPEs are programmed via register.
+ */
+#define GEN8_PDPE_SHIFT			30
+#define GEN8_PDPE_MASK			0x3
+#define GEN8_PDE_SHIFT			21
+#define GEN8_PDE_MASK			0x1ff
+#define GEN8_PTE_SHIFT			12
+#define GEN8_PTE_MASK			0x1ff
 
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
@@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
-	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
+	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
+	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	unsigned num_entries = length >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
-	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
 	unsigned last_pte, i;
 
 	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
+		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
 
-		last_pte = first_pte + num_entries;
+		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
 			last_pte = GEN8_PTES_PER_PAGE;
 
 		pt_vaddr = kmap_atomic(page_table);
 
-		for (i = first_pte; i < last_pte; i++)
+		for (i = pte; i < last_pte; i++) {
 			pt_vaddr[i] = scratch_pte;
+			num_entries--;
+		}
 
 		kunmap_atomic(pt_vaddr);
 
-		num_entries -= last_pte - first_pte;
-		first_pte = 0;
-		act_pt++;
+		pte = 0;
+		if (++pde == GEN8_PDES_PER_PAGE) {
+			pdpe++;
+			pde = 0;
+		}
 	}
 }
 
@@ -298,38 +314,57 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr;
-	unsigned first_entry = start >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
-	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
+	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
+	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
+	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
+
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+		if (WARN_ON(which_pdpe >= GEN8_LEGACY_PDPS))
+			break;
+
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[which_pdpe][which_pde]);
 
-		pt_vaddr[act_pte] =
+		pt_vaddr[which_pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
-		if (++act_pte == GEN8_PTES_PER_PAGE) {
+		if (++which_pte == GEN8_PTES_PER_PAGE) {
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
-			act_pt++;
-			act_pte = 0;
+			if (which_pde + 1 == GEN8_PDES_PER_PAGE)
+				which_pdpe++;
+			which_pte = 0;
 		}
 	}
 	if (pt_vaddr)
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_tables(struct page **pt_pages)
+{
+	int i;
+
+	if (pt_pages == NULL)
+		return;
+
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
+		if (pt_pages[i])
+			__free_pages(pt_pages[i], 0);
+}
+
+static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages ; i++)
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
+		kfree(ppgtt->gen8_pt_pages[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
+	}
 
-	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
 	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 	kfree(ppgtt);
 }
@@ -369,20 +404,61 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
+static struct page **__gen8_alloc_page_tables(void)
+{
+	struct page **pt_pages;
+	int i;
+
+	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
+	if (!pt_pages)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		pt_pages[i] = alloc_page(GFP_KERNEL);
+		if (!pt_pages[i])
+			goto bail;
+	}
+
+	return pt_pages;
+
+bail:
+	gen8_free_page_tables(pt_pages);
+	kfree(pt_pages);
+	return ERR_PTR(-ENOMEM);
+}
+
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
-	struct page *pt_pages;
+	struct page **pt_pages[GEN8_LEGACY_PDPS];
 	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	int i, ret;
 
-	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
-	if (!pt_pages)
-		return -ENOMEM;
+	for (i = 0; i < max_pdp; i++) {
+		pt_pages[i] = __gen8_alloc_page_tables();
+		if (IS_ERR(pt_pages[i])) {
+			ret = PTR_ERR(pt_pages[i]);
+			goto unwind_out;
+		}
+	}
+
+	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
+	 * "atomic" - for cleanup purposes.
+	 */
+	for (i = 0; i < max_pdp; i++)
+		ppgtt->gen8_pt_pages[i] = pt_pages[i];
 
-	ppgtt->gen8_pt_pages = pt_pages;
 	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		gen8_free_page_tables(pt_pages[i]);
+		kfree(pt_pages[i]);
+	}
+
+	return ret;
 }
 
 static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
@@ -464,7 +540,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
+	p = ppgtt->gen8_pt_pages[pd][pt];
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB"
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (15 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-20  6:05   ` [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright Ben Widawsky
                     ` (2 subsequent siblings)
  19 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This reverts commit 3a2ffb65eec6dbda2fd8151894f51c18b42c8d41.

Now that the code is fixed to use smaller allocations, it should be safe
to let the full GGTT be used on BDW.

The testcase for this is anything which uses more than half of the GTT,
thus eclipsing the old limit.
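
For the curious, the size math works out roughly as below (a standalone
sketch; the 8 byte gen8 PTE and 4K pages are real, the GGMS field value is
made up):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	unsigned bdw_gmch_ctl = 3;	/* hypothetical GGMS field value */

	/* gen8_get_total_gtt_size() after the revert: */
	uint64_t gtt_size = (uint64_t)(1 << bdw_gmch_ctl) << 20;	/* 8MB of PTEs */
	uint64_t ggtt = gtt_size / 8 << 12;	/* 8B per PTE, each maps 4KB */

	/* 8MB of PTEs -> 4GB of GGTT; the reverted check capped this at
	 * 4MB of PTEs, i.e. the old 2GB limit. */
	printf("%lluMB of PTEs -> %lluGB GGTT\n",
	       (unsigned long long)(gtt_size >> 20),
	       (unsigned long long)(ggtt >> 30));
	return 0;
}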

With pre-requisite patches fixed/merged:
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fcde3c7..7245166 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1725,11 +1725,6 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
 	bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
 	if (bdw_gmch_ctl)
 		bdw_gmch_ctl = 1 << bdw_gmch_ctl;
-	if (bdw_gmch_ctl > 4) {
-		WARN_ON(!i915.preliminary_hw_support);
-		return 4<<20;
-	}
-
 	return bdw_gmch_ctl << 20;
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (16 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB" Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-20  6:05   ` [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup Ben Widawsky
  2014-02-20  6:05   ` [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up Ben Widawsky
  19 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

I keep meaning to do this... by now almost the entire file has been
written by an Intel employee (including Daniel post-2010).

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7245166..8b8ba38 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1,5 +1,6 @@
 /*
  * Copyright © 2010 Daniel Vetter
+ * Copyright © 2011-2014 Intel Corporation
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
-- 
1.9.0



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (17 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  2014-02-20  6:05   ` [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up Ben Widawsky
  19 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This cleanup is similar to the GEN8 cleanup (though less necessary).
Having everything split will make cleaning up the initialization error
paths easier to understand.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8b8ba38..3f2b8e8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1000,22 +1000,21 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-	struct i915_hw_ppgtt *ppgtt =
-		container_of(vm, struct i915_hw_ppgtt, base);
 	int i;
 
-	list_del(&vm->global_link);
-	drm_mm_takedown(&ppgtt->base.mm);
-	drm_mm_remove_node(&ppgtt->node);
-
 	if (ppgtt->pt_dma_addr) {
 		for (i = 0; i < ppgtt->num_pd_entries; i++)
 			pci_unmap_page(ppgtt->base.dev->pdev,
 				       ppgtt->pt_dma_addr[i],
 				       4096, PCI_DMA_BIDIRECTIONAL);
 	}
+}
+
+static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
 
 	kfree(ppgtt->pt_dma_addr);
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
@@ -1024,6 +1023,19 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	kfree(ppgtt);
 }
 
+static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(vm, struct i915_hw_ppgtt, base);
+
+	list_del(&vm->global_link);
+	drm_mm_takedown(&ppgtt->base.mm);
+	drm_mm_remove_node(&ppgtt->node);
+
+	gen6_ppgtt_unmap_pages(ppgtt);
+	gen6_ppgtt_free(ppgtt);
+}
+
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 {
 #define GEN6_PD_ALIGN (PAGE_SIZE * 16)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up
  2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
                     ` (18 preceding siblings ...)
  2014-02-20  6:05   ` [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup Ben Widawsky
@ 2014-02-20  6:05   ` Ben Widawsky
  19 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20  6:05 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Simply to match the GEN8 style of PPGTT initialization, split up the
allocations and mappings. Unlike GEN8, we skip a separate dma_addr_t
allocation function, as it is much simpler pre-gen8.

With this code it would be easy to make a more general PPGTT
initialization function with per GEN alloc/map/etc. or use a common
helper, similar to the ringbuffer code. I don't see a benefit to doing
this just yet, but who knows...
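
The resulting flow is alloc -> setup -> hook up the vm, with each step
unwinding only what already succeeded. A minimal standalone sketch of that
shape (stub stages, not the driver's real functions):

#include <stdio.h>

static int alloc_stage(void) { return 0; }	/* gen6_ppgtt_alloc() analogue */
static int setup_stage(void) { return -1; }	/* pretend a dma mapping failed */
static void free_stage(void) { puts("unwinding allocations"); }

static int init(void)
{
	int ret = alloc_stage();
	if (ret)
		return ret;	/* nothing to unwind yet */

	ret = setup_stage();
	if (ret) {
		free_stage();	/* undo only the earlier stage */
		return ret;
	}

	return 0;
}

int main(void)
{
	return init() ? 1 : 0;
}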

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 141 +++++++++++++++++++++++-------------
 1 file changed, 91 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3f2b8e8..6630598 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1036,14 +1036,14 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	gen6_ppgtt_free(ppgtt);
 }
 
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 {
 #define GEN6_PD_ALIGN (PAGE_SIZE * 16)
 #define GEN6_PD_SIZE (GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	bool retried = false;
-	int i, ret;
+	int ret;
 
 	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
 	 * allocator works in address space sizes, so it's multiplied by page
@@ -1070,42 +1070,60 @@ alloc:
 	if (ppgtt->node.start < dev_priv->gtt.mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->base.pte_encode = dev_priv->gtt.base.pte_encode;
 	ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-	if (IS_GEN6(dev)) {
-		ppgtt->enable = gen6_ppgtt_enable;
-		ppgtt->switch_mm = gen6_mm_switch;
-	} else if (IS_HASWELL(dev)) {
-		ppgtt->enable = gen7_ppgtt_enable;
-		ppgtt->switch_mm = hsw_mm_switch;
-	} else if (IS_GEN7(dev)) {
-		ppgtt->enable = gen7_ppgtt_enable;
-		ppgtt->switch_mm = gen7_mm_switch;
-	} else
-		BUG();
-	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
-	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
-	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
-	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
-	ppgtt->base.start = 0;
-	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+	return ret;
+}
+
+static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
 	ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
 				  GFP_KERNEL);
-	if (!ppgtt->pt_pages) {
-		drm_mm_remove_node(&ppgtt->node);
+
+	if (!ppgtt->pt_pages)
 		return -ENOMEM;
-	}
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
-		if (!ppgtt->pt_pages[i])
-			goto err_pt_alloc;
+		if (!ppgtt->pt_pages[i]) {
+			gen6_ppgtt_free(ppgtt);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+{
+	int ret;
+
+	ret = gen6_ppgtt_allocate_page_directories(ppgtt);
+	if (ret)
+		return ret;
+
+	ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+	if (ret) {
+		drm_mm_remove_node(&ppgtt->node);
+		return ret;
 	}
 
 	ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
 				     GFP_KERNEL);
-	if (!ppgtt->pt_dma_addr)
-		goto err_pt_alloc;
+	if (!ppgtt->pt_dma_addr) {
+		drm_mm_remove_node(&ppgtt->node);
+		gen6_ppgtt_free(ppgtt);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	int i;
 
 	for (i = 0; i < ppgtt->num_pd_entries; i++) {
 		dma_addr_t pt_addr;
@@ -1114,40 +1132,63 @@ alloc:
 				       PCI_DMA_BIDIRECTIONAL);
 
 		if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-			ret = -EIO;
-			goto err_pd_pin;
-
+			gen6_ppgtt_unmap_pages(ppgtt);
+			return -EIO;
 		}
+
 		ppgtt->pt_dma_addr[i] = pt_addr;
 	}
 
-	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
+	return 0;
+}
+
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ppgtt->base.pte_encode = dev_priv->gtt.base.pte_encode;
+	if (IS_GEN6(dev)) {
+		ppgtt->enable = gen6_ppgtt_enable;
+		ppgtt->switch_mm = gen6_mm_switch;
+	} else if (IS_HASWELL(dev)) {
+		ppgtt->enable = gen7_ppgtt_enable;
+		ppgtt->switch_mm = hsw_mm_switch;
+	} else if (IS_GEN7(dev)) {
+		ppgtt->enable = gen7_ppgtt_enable;
+		ppgtt->switch_mm = gen7_mm_switch;
+	} else
+		BUG();
+
+	ret = gen6_ppgtt_alloc(ppgtt);
+	if (ret)
+		return ret;
+
+	ret = gen6_ppgtt_setup_page_tables(ppgtt);
+	if (ret) {
+		gen6_ppgtt_free(ppgtt);
+		return ret;
+	}
+
+	ppgtt->base.clear_range = gen6_ppgtt_clear_range;
+	ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
+	ppgtt->base.cleanup = gen6_ppgtt_cleanup;
+	ppgtt->base.scratch = dev_priv->gtt.base.scratch;
+	ppgtt->base.start = 0;
+	ppgtt->base.total = GEN6_PPGTT_PD_ENTRIES * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
-	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
-			 ppgtt->node.size >> 20,
-			 ppgtt->node.start / PAGE_SIZE);
 	ppgtt->pd_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-	return 0;
+	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
-err_pd_pin:
-	if (ppgtt->pt_dma_addr) {
-		for (i--; i >= 0; i--)
-			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
-				       4096, PCI_DMA_BIDIRECTIONAL);
-	}
-err_pt_alloc:
-	kfree(ppgtt->pt_dma_addr);
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		if (ppgtt->pt_pages[i])
-			__free_page(ppgtt->pt_pages[i]);
-	}
-	kfree(ppgtt->pt_pages);
-	drm_mm_remove_node(&ppgtt->node);
+	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
+			 ppgtt->node.size >> 20,
+			 ppgtt->node.start / PAGE_SIZE);
 
-	return ret;
+	return 0;
 }
 
 int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH 1/9] drm/i915/bdw: Free PPGTT struct
  2014-02-20  6:05   ` [PATCH 1/9] drm/i915/bdw: Free PPGTT struct Ben Widawsky
@ 2014-02-20  9:31     ` Imre Deak
  2014-02-20 19:47     ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Ben Widawsky
  1 sibling, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-20  9:31 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, 2014-02-19 at 22:05 -0800, Ben Widawsky wrote:
> GEN8 never freed the PPGTT struct. As GEN8 doesn't use full PPGTT, the
> leak is small and only found on a module reload. ie. I don't think this
> needs to go to stable.
> 
> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 69a88d4..e414d7e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -328,6 +328,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
>  
>  	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
>  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
> +	kfree(ppgtt);

On error we'd also free ppgtt in create_vm_for_ctx(), so this kfree would
turn into a double free.

>  }
>  
>  static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-20  6:05   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
@ 2014-02-20 10:37     ` Imre Deak
  2014-02-20 19:35       ` Ben Widawsky
  2014-02-20 19:50     ` [PATCH 4/9] [v3] " Ben Widawsky
  1 sibling, 1 reply; 63+ messages in thread
From: Imre Deak @ 2014-02-20 10:37 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, 2014-02-19 at 22:05 -0800, Ben Widawsky wrote:
> This patch converts insert_entries and clear_range, both functions which
> are specific to the VM. These functions tend to encapsulate the gen
> specific PTE writes. Passing absolute addresses to the insert_entries,
> and clear_range will help make the logic clearer within the functions as
> to what's going on. Currently, all callers simply do the appropriate
> page shift, which IMO, ends up looking weird with an upcoming change for
> the gen8 page table allocations.
> 
> Up until now, the PPGTT was a funky 2 level page table. GEN8 changes
> this to look more like a 3 level page table, and to that extent we need
> a significant amount more memory simply for the page tables. To address
> this, the allocations will be split up in finer amounts.
> 
> v2: Replace size_t with uint64_t (Chris, Imre)
> 
> Reviewed-by: Imre Deak <imre.deak@intel.com>

One more thing I hadn't noticed earlier:
i915_gem_suspend_gtt_mappings()/i915_gem_restore_gtt_mappings() need a
fixup too.
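
I'd guess it's just a matter of dropping the / PAGE_SIZE from the
clear_range() calls there, i.e. something like (untested sketch, from
memory):

	dev_priv->gtt.base.clear_range(&dev_priv->gtt.base,
				       dev_priv->gtt.base.start,
				       dev_priv->gtt.base.total,
				       true);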

> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>



> ---
>  drivers/gpu/drm/i915/i915_drv.h     |  6 +--
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++----------------
>  2 files changed, 49 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8c64831..f3379ea 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -652,12 +652,12 @@ struct i915_address_space {
>  				     enum i915_cache_level level,
>  				     bool valid); /* Create a valid PTE */
>  	void (*clear_range)(struct i915_address_space *vm,
> -			    unsigned int first_entry,
> -			    unsigned int num_entries,
> +			    uint64_t start,
> +			    uint64_t length,
>  			    bool use_scratch);
>  	void (*insert_entries)(struct i915_address_space *vm,
>  			       struct sg_table *st,
> -			       unsigned int first_entry,
> +			       uint64_t start,
>  			       enum i915_cache_level cache_level);
>  	void (*cleanup)(struct i915_address_space *vm);
>  };
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0af3587..ef5e90c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -254,13 +254,15 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>  }
>  
>  static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> -				   unsigned first_entry,
> -				   unsigned num_entries,
> +				   uint64_t start,
> +				   uint64_t length,
>  				   bool use_scratch)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
>  	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	unsigned last_pte, i;
> @@ -290,12 +292,13 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  
>  static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  				      struct sg_table *pages,
> -				      unsigned first_entry,
> +				      uint64_t start,
>  				      enum i915_cache_level cache_level)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
>  	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	struct sg_page_iter sg_iter;
> @@ -855,13 +858,15 @@ static int gen6_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
>  
>  /* PPGTT support for Sandybdrige/Gen6 and later */
>  static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
> -				   unsigned first_entry,
> -				   unsigned num_entries,
> +				   uint64_t start,
> +				   uint64_t length,
>  				   bool use_scratch)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
>  	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
>  	unsigned last_pte, i;
> @@ -888,12 +893,13 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>  
>  static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  				      struct sg_table *pages,
> -				      unsigned first_entry,
> +				      uint64_t start,
>  				      enum i915_cache_level cache_level)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen6_gtt_pte_t *pt_vaddr;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
>  	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
>  	struct sg_page_iter sg_iter;
> @@ -1026,8 +1032,7 @@ alloc:
>  		ppgtt->pt_dma_addr[i] = pt_addr;
>  	}
>  
> -	ppgtt->base.clear_range(&ppgtt->base, 0,
> -				ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES, true);
> +	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>  	ppgtt->debug_dump = gen6_dump_ppgtt;
>  
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> @@ -1091,20 +1096,17 @@ ppgtt_bind_vma(struct i915_vma *vma,
>  	       enum i915_cache_level cache_level,
>  	       u32 flags)
>  {
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> -
>  	WARN_ON(flags);
>  
> -	vma->vm->insert_entries(vma->vm, vma->obj->pages, entry, cache_level);
> +	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
> +				cache_level);
>  }
>  
>  static void ppgtt_unbind_vma(struct i915_vma *vma)
>  {
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> -
>  	vma->vm->clear_range(vma->vm,
> -			     entry,
> -			     vma->obj->base.size >> PAGE_SHIFT,
> +			     vma->node.start,
> +			     vma->obj->base.size,
>  			     true);
>  }
>  
> @@ -1265,10 +1267,11 @@ static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte)
>  
>  static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>  				     struct sg_table *st,
> -				     unsigned int first_entry,
> +				     uint64_t start,
>  				     enum i915_cache_level level)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	gen8_gtt_pte_t __iomem *gtt_entries =
>  		(gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
>  	int i = 0;
> @@ -1310,10 +1313,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   */
>  static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>  				     struct sg_table *st,
> -				     unsigned int first_entry,
> +				     uint64_t start,
>  				     enum i915_cache_level level)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	gen6_gtt_pte_t __iomem *gtt_entries =
>  		(gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
>  	int i = 0;
> @@ -1345,11 +1349,13 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>  }
>  
>  static void gen8_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  uint64_t length,
>  				  bool use_scratch)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
>  		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
>  	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> @@ -1369,11 +1375,13 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>  }
>  
>  static void gen6_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  uint64_t length,
>  				  bool use_scratch)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
>  		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
>  	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> @@ -1406,10 +1414,12 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
>  }
>  
>  static void i915_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  uint64_t length,
>  				  bool unused)
>  {
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	intel_gtt_clear_range(first_entry, num_entries);
>  }
>  
> @@ -1430,7 +1440,6 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	struct drm_device *dev = vma->vm->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj = vma->obj;
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
>  
>  	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
>  	 * or we have a global mapping already but the cacheability flags have
> @@ -1446,7 +1455,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
>  		if (!obj->has_global_gtt_mapping ||
>  		    (cache_level != obj->cache_level)) {
> -			vma->vm->insert_entries(vma->vm, obj->pages, entry,
> +			vma->vm->insert_entries(vma->vm, obj->pages,
> +						vma->node.start,
>  						cache_level);
>  			obj->has_global_gtt_mapping = 1;
>  		}
> @@ -1457,7 +1467,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	     (cache_level != obj->cache_level))) {
>  		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
>  		appgtt->base.insert_entries(&appgtt->base,
> -					    vma->obj->pages, entry, cache_level);
> +					    vma->obj->pages,
> +					    vma->node.start,
> +					    cache_level);
>  		vma->obj->has_aliasing_ppgtt_mapping = 1;
>  	}
>  }
> @@ -1467,11 +1479,11 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
>  	struct drm_device *dev = vma->vm->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj = vma->obj;
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
>  
>  	if (obj->has_global_gtt_mapping) {
> -		vma->vm->clear_range(vma->vm, entry,
> -				     vma->obj->base.size >> PAGE_SHIFT,
> +		vma->vm->clear_range(vma->vm,
> +				     vma->node.start,
> +				     obj->base.size,
>  				     true);
>  		obj->has_global_gtt_mapping = 0;
>  	}
> @@ -1479,8 +1491,8 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
>  	if (obj->has_aliasing_ppgtt_mapping) {
>  		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
>  		appgtt->base.clear_range(&appgtt->base,
> -					 entry,
> -					 obj->base.size >> PAGE_SHIFT,
> +					 vma->node.start,
> +					 obj->base.size,
>  					 true);
>  		obj->has_aliasing_ppgtt_mapping = 0;
>  	}
> @@ -1565,14 +1577,14 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
>  
>  	/* Clear any non-preallocated blocks */
>  	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
> -		const unsigned long count = (hole_end - hole_start) / PAGE_SIZE;
>  		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
>  			      hole_start, hole_end);
> -		ggtt_vm->clear_range(ggtt_vm, hole_start / PAGE_SIZE, count, true);
> +		ggtt_vm->clear_range(ggtt_vm, hole_start,
> +				     hole_end - hole_start, true);
>  	}
>  
>  	/* And finally clear the reserved guard page */
> -	ggtt_vm->clear_range(ggtt_vm, end / PAGE_SIZE - 1, 1, true);
> +	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
>  }
>  
>  void i915_gem_init_global_gtt(struct drm_device *dev)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations
  2014-02-20  6:05   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
@ 2014-02-20 11:28     ` Imre Deak
  2014-02-20 19:51     ` [PATCH 5/9] [v5] " Ben Widawsky
  1 sibling, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-20 11:28 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, 2014-02-19 at 22:05 -0800, Ben Widawsky wrote:
> The previous allocation mechanism would get 2 contiguous allocations,
> one for the page directories, and one for the page tables. As each page
> table is 1 page, and there are 512 of these per page directory, this
> goes to 2MB. An unfriendly request at best. Worse still, our HW now
> supports 4 page directories, and a 2MB allocation is not allowed.
> 
> In order to fix this, this patch attempts to split up each page table
> allocation into a single, discrete allocation. There is nothing really
> fancy about the patch itself, it just has to manage an extra pointer
> indirection, and have a fancier bit of logic to free up the pages.
> 
> To accommodate some of the added complexity, two new helpers are
> introduced to allocate, and free the page table pages.
> 
> NOTE: I really wanted to split the way we do allocations, and the way in
> which we identify the page table/page directory being used. I found
> splitting this functionality up to be too unwieldy. I apologize in
> advance to the reviewer. I'd recommend looking at the result, rather
> than the diff.
> 
> v2/NOTE2: This patch predated commit:
> 6f1cc993518462ccf039e195fabd47e7aa5bfd13
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Dec 31 15:50:31 2013 +0000
> 
>     drm/i915: Avoid dereference past end of page arr
> 
> It fixed the same issue as that patch, but because of the limbo state of
> PPGTT, Chris' patch was merged instead. The excess churn is a result of
> my using my original patch, which has my preferred naming. Primarily
> act_* is changed to which_*, but it's mostly the same otherwise. I've
> kept the convention Chris used for the pte wrap (I had something
> slightly different, and broken - but fixable)
> 
> v3: Rename which_p[..]e to drop which_ (Chris)
> Remove BUG_ON in inner loop (Chris)
> Redo the pde/pdpe wrap logic (Chris)
> 
> v4: s/1MB/2MB in commit message (Imre)
> Plug leaking gen8_pt_pages in both the error path, as well as general
> free case (Imre)
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h     |   5 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 130 ++++++++++++++++++++++++++++--------
>  2 files changed, 106 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index f3379ea..2dbdd34 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -691,6 +691,7 @@ struct i915_gtt {
>  };
>  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
>  
> +#define GEN8_LEGACY_PDPS 4
>  struct i915_hw_ppgtt {
>  	struct i915_address_space base;
>  	struct kref ref;
> @@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_entries;
>  	union {
>  		struct page **pt_pages;
> -		struct page *gen8_pt_pages;
> +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
>  	};
>  	struct page *pd_pages;
>  	int num_pd_pages;
>  	int num_pt_pages;
>  	union {
>  		uint32_t pd_offset;
> -		dma_addr_t pd_dma_addr[4];
> +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
>  	};
>  	union {
>  		dma_addr_t *pt_dma_addr;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index ef5e90c..fcde3c7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  
>  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
>  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> -#define GEN8_LEGACY_PDPS		4
> +
> +/* GEN8 legacy style address is defined as a 3 level page table:
> + * 31:30 | 29:21 | 20:12 |  11:0
> + * PDPE  |  PDE  |  PTE  | offset
> + * The difference compared to a normal x86 3 level page table is that the
> + * PDPEs are programmed via register.
> + */
> +#define GEN8_PDPE_SHIFT			30
> +#define GEN8_PDPE_MASK			0x3
> +#define GEN8_PDE_SHIFT			21
> +#define GEN8_PDE_MASK			0x1ff
> +#define GEN8_PTE_SHIFT			12
> +#define GEN8_PTE_MASK			0x1ff
>  
>  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
>  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> @@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> -	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> +	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> +	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>  	unsigned num_entries = length >> PAGE_SHIFT;
> -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> -	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	unsigned last_pte, i;
>  
>  	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
>  				      I915_CACHE_LLC, use_scratch);
>  
>  	while (num_entries) {
> -		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
> +		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
>  
> -		last_pte = first_pte + num_entries;
> +		last_pte = pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
>  			last_pte = GEN8_PTES_PER_PAGE;
>  
>  		pt_vaddr = kmap_atomic(page_table);
>  
> -		for (i = first_pte; i < last_pte; i++)
> +		for (i = pte; i < last_pte; i++) {
>  			pt_vaddr[i] = scratch_pte;
> +			num_entries--;
> +		}
>  
>  		kunmap_atomic(pt_vaddr);
>  
> -		num_entries -= last_pte - first_pte;
> -		first_pte = 0;
> -		act_pt++;
> +		pte = 0;
> +		if (++pde == GEN8_PDES_PER_PAGE) {
> +			pdpe++;
> +			pde = 0;
> +		}
>  	}
>  }
>  
> @@ -298,38 +314,57 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr;
> -	unsigned first_entry = start >> PAGE_SHIFT;
> -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> -	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> +	unsigned which_pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> +	unsigned which_pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> +	unsigned which_pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>  	struct sg_page_iter sg_iter;
>  
>  	pt_vaddr = NULL;
> +
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> +		if (WARN_ON(which_pdpe >= GEN8_LEGACY_PDPS))
> +			break;
> +
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
> +			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[which_pdpe][which_pde]);
>  
> -		pt_vaddr[act_pte] =
> +		pt_vaddr[which_pte] =
>  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
>  					cache_level, true);
> -		if (++act_pte == GEN8_PTES_PER_PAGE) {
> +		if (++which_pte == GEN8_PTES_PER_PAGE) {
>  			kunmap_atomic(pt_vaddr);
>  			pt_vaddr = NULL;
> -			act_pt++;
> -			act_pte = 0;
> +			if (which_pde + 1 == GEN8_PDES_PER_PAGE)
> +				which_pdpe++;
> +			which_pte = 0;

These need the same s/which_//, pde increment cleanup that you did in
gen8_ppgtt_clear_range (the pde increment cleanup is also a bug fix
here).
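
I.e. mirroring what clear_range does now, something along these lines
(untested sketch):

	if (++pte == GEN8_PTES_PER_PAGE) {
		kunmap_atomic(pt_vaddr);
		pt_vaddr = NULL;
		if (++pde == GEN8_PDES_PER_PAGE) {
			pdpe++;
			pde = 0;
		}
		pte = 0;
	}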

>  		}
>  	}
>  	if (pt_vaddr)
>  		kunmap_atomic(pt_vaddr);
>  }
>  
> -static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> +static void gen8_free_page_tables(struct page **pt_pages)
> +{
> +	int i;
> +
> +	if (pt_pages == NULL)
> +		return;
> +
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> +		if (pt_pages[i])
> +			__free_pages(pt_pages[i], 0);
> +}
> +
> +static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
>  {
>  	int i;
>  
> -	for (i = 0; i < ppgtt->num_pd_pages ; i++)
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> +		kfree(ppgtt->gen8_pt_pages[i]);
>  		kfree(ppgtt->gen8_pt_dma_addr[i]);
> +	}
>  
> -	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
>  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
>  	kfree(ppgtt);

I couldn't find anything else, so with the double kfree for the above
ppgtt fixed in 1/9, this looks good to me:
Reviewed-by: Imre Deak <imre.deak@intel.com>

>  }
> @@ -369,20 +404,61 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> +static struct page **__gen8_alloc_page_tables(void)
> +{
> +	struct page **pt_pages;
> +	int i;
> +
> +	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> +	if (!pt_pages)
> +		return ERR_PTR(-ENOMEM);
> +
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> +		pt_pages[i] = alloc_page(GFP_KERNEL);
> +		if (!pt_pages[i])
> +			goto bail;
> +	}
> +
> +	return pt_pages;
> +
> +bail:
> +	gen8_free_page_tables(pt_pages);
> +	kfree(pt_pages);
> +	return ERR_PTR(-ENOMEM);
> +}
> +
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					   const int max_pdp)
>  {
> -	struct page *pt_pages;
> +	struct page **pt_pages[GEN8_LEGACY_PDPS];
>  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> +	int i, ret;
>  
> -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> -	if (!pt_pages)
> -		return -ENOMEM;
> +	for (i = 0; i < max_pdp; i++) {
> +		pt_pages[i] = __gen8_alloc_page_tables();
> +		if (IS_ERR(pt_pages[i])) {
> +			ret = PTR_ERR(pt_pages[i]);
> +			goto unwind_out;
> +		}
> +	}
> +
> +	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
> +	 * "atomic" - for cleanup purposes.
> +	 */
> +	for (i = 0; i < max_pdp; i++)
> +		ppgtt->gen8_pt_pages[i] = pt_pages[i];
>  
> -	ppgtt->gen8_pt_pages = pt_pages;
>  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
>  
>  	return 0;
> +
> +unwind_out:
> +	while (i--) {
> +		gen8_free_page_tables(pt_pages[i]);
> +		kfree(pt_pages[i]);
> +	}
> +
> +	return ret;
>  }
>  
>  static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> @@ -464,7 +540,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  	struct page *p;
>  	int ret;
>  
> -	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> +	p = ppgtt->gen8_pt_pages[pd][pt];
>  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
>  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up
  2014-02-20  6:05   ` [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up Ben Widawsky
@ 2014-02-20 13:10     ` Imre Deak
  0 siblings, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-20 13:10 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky



On Wed, 2014-02-19 at 22:05 -0800, Ben Widawsky wrote:
> Like cleanup in an earlier patch, the code becomes much more readable,
> and easier to extend if we extract out helper functions for the various
> stages of init.
> 
> Note that with this patch it becomes really simple, and tempting to begin
> using the 'goto out' idiom with explicit free/fini semantics. I've
> kept the error path as similar as possible to the cleanup() function to
> make sure cleanup is as robust as possible
> 
> v2: Remove comment "NB:From here on, ppgtt->base.cleanup() should
> function properly"
> Update commit message to reflect above
> 
> v3: Rebased on top of bugfixes found in the previous patch by Imre
> Moved number of pd pages assertion to the proper place (Imre)
> 
> v4:
> Allocate dma address space for num_pd_pages, not num_pd_entries (Ben)
> Don't use gen8_pt_dma_addr after free on error path (Imre)
> With new fix from v4 of the previous patch.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Looks ok to me:
Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 164 +++++++++++++++++++++++++-----------
>  1 file changed, 116 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7956659..0af3587 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -366,6 +366,113 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> +static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
> +					   const int max_pdp)
> +{
> +	struct page *pt_pages;
> +	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> +
> +	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> +	if (!pt_pages)
> +		return -ENOMEM;
> +
> +	ppgtt->gen8_pt_pages = pt_pages;
> +	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> +
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> +{
> +	int i;
> +
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> +						     sizeof(dma_addr_t),
> +						     GFP_KERNEL);
> +		if (!ppgtt->gen8_pt_dma_addr[i])
> +			return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
> +						const int max_pdp)
> +{
> +	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
> +	if (!ppgtt->pd_pages)
> +		return -ENOMEM;
> +
> +	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
> +	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
> +
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
> +			    const int max_pdp)
> +{
> +	int ret;
> +
> +	ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
> +	if (ret)
> +		return ret;
> +
> +	ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
> +	if (ret) {
> +		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
> +		return ret;
> +	}
> +
> +	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
> +
> +	ret = gen8_ppgtt_allocate_dma(ppgtt);
> +	if (ret)
> +		gen8_ppgtt_free(ppgtt);
> +
> +	return ret;
> +}
> +
> +static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
> +					     const int pd)
> +{
> +	dma_addr_t pd_addr;
> +	int ret;
> +
> +	pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> +			       &ppgtt->pd_pages[pd], 0,
> +			       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> +
> +	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> +	if (ret)
> +		return ret;
> +
> +	ppgtt->pd_dma_addr[pd] = pd_addr;
> +
> +	return 0;
> +}
> +
> +static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
> +					const int pd,
> +					const int pt)
> +{
> +	dma_addr_t pt_addr;
> +	struct page *p;
> +	int ret;
> +
> +	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> +	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> +			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> +	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> +	if (ret)
> +		return ret;
> +
> +	ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
> +
> +	return 0;
> +}
> +
>  /**
>   * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
>   * with a net effect resembling a 2-level page table in normal x86 terms. Each
> @@ -378,69 +485,30 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>   */
>  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  {
> -	struct page *pt_pages;
>  	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
> -	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> -	struct pci_dev *hwdev = ppgtt->base.dev->pdev;
> +	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
>  	int i, j, ret;
>  
>  	if (size % (1<<30))
>  		DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
>  
> -	/* 1. Do all our allocations for page directories and page tables */
> -	ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
> -	if (!ppgtt->pd_pages)
> -		return -ENOMEM;
> -
> -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> -	if (!pt_pages) {
> -		__free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
> -		return -ENOMEM;
> -	}
> -
> -	ppgtt->gen8_pt_pages = pt_pages;
> -	ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
> -	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> -	ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
> -	BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
> -
> -	for (i = 0; i < max_pdp; i++) {
> -		ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
> -						     sizeof(dma_addr_t),
> -						     GFP_KERNEL);
> -		if (!ppgtt->gen8_pt_dma_addr[i]) {
> -			ret = -ENOMEM;
> -			goto bail;
> -		}
> -	}
> +	/* 1. Do all our allocations for page directories and page tables. */
> +	ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
> +	if (ret)
> +		return ret;
>  
>  	/*
> -	 * 2. Create all the DMA mappings for the page directories and page
> -	 * tables
> +	 * 2. Create DMA mappings for the page directories and page tables.
>  	 */
>  	for (i = 0; i < max_pdp; i++) {
> -		dma_addr_t pd_addr, pt_addr;
> -
> -		/* Get the page directory mappings */
> -		pd_addr = pci_map_page(hwdev, &ppgtt->pd_pages[i], 0,
> -				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -		ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> +		ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
>  		if (ret)
>  			goto bail;
>  
> -		ppgtt->pd_dma_addr[i] = pd_addr;
> -
> -		/* And the page table mappings per page directory */
>  		for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
> -			struct page *p = &pt_pages[i * GEN8_PDES_PER_PAGE + j];
> -
> -			pt_addr = pci_map_page(hwdev, p, 0, PAGE_SIZE,
> -					       PCI_DMA_BIDIRECTIONAL);
> -			ret = pci_dma_mapping_error(hwdev, pt_addr);
> +			ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
>  			if (ret)
>  				goto bail;
> -
> -			ppgtt->gen8_pt_dma_addr[i][j] = pt_addr;
>  		}
>  	}
>  
> @@ -479,7 +547,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
>  	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
>  			 ppgtt->num_pt_pages,
> -			 (ppgtt->num_pt_pages - num_pt_pages) +
> +			 (ppgtt->num_pt_pages - min_pt_pages) +
>  			 size % (1<<30));
>  	return 0;
>  



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-20 10:37     ` Imre Deak
@ 2014-02-20 19:35       ` Ben Widawsky
  0 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20 19:35 UTC (permalink / raw)
  To: Imre Deak; +Cc: Intel GFX, Ben Widawsky

On Thu, Feb 20, 2014 at 12:37:19PM +0200, Imre Deak wrote:
> On Wed, 2014-02-19 at 22:05 -0800, Ben Widawsky wrote:
> > This patch converts insert_entries and clear_range, both functions which
> > are specific to the VM. These functions tend to encapsulate the gen
> > specific PTE writes. Passing absolute addresses to the insert_entries,
> > and clear_range will help make the logic clearer within the functions as
> > to what's going on. Currently, all callers simply do the appropriate
> > page shift, which IMO, ends up looking weird with an upcoming change for
> > the gen8 page table allocations.
> > 
> > Up until now, the PPGTT was a funky 2 level page table. GEN8 changes
> > this to look more like a 3 level page table, and to that extent we need
> > a significant amount more memory simply for the page tables. To address
> > this, the allocations will be split up in finer amounts.
> > 
> > v2: Replace size_t with uint64_t (Chris, Imre)
> > 
> > Reviewed-by: Imre Deak <imre.deak@intel.com>
> 
> One more thing I haven't noticed,
> i915_gem_suspend_gtt_mappings()/i915_gem_restore_gtt_mappings() needs a
> fixup too.
> 

Thanks for spotting this. gen8_ppgtt_init() is also broken. I don't see
any others:

@@ -542,7 +542,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
        ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
        ppgtt->base.clear_range(&ppgtt->base, 0,
-                               ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
+                               ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE,
                                true);
 
        DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
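
To see the magnitude of the mismatch, here is a tiny standalone sketch
(userspace C, gen8 constants hardcoded; not driver code): once
clear_range() treats its second argument as a byte length, passing the
old entry count under-clears by a factor of PAGE_SIZE:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT		12
#define GEN8_PDES_PER_PAGE	512
#define GEN8_PTES_PER_PAGE	512

int main(void)
{
	uint64_t num_pd_entries = 4 * GEN8_PDES_PER_PAGE;
	uint64_t intended = num_pd_entries * GEN8_PTES_PER_PAGE;

	/* clear_range() now derives its entry count from a byte
	 * length, so the stale argument is misread 4096-fold. */
	printf("intended %llu entries, misread as %llu\n",
	       (unsigned long long)intended,
	       (unsigned long long)(intended >> PAGE_SHIFT));
	return 0;
}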

> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> 
> 
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h     |  6 +--
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++----------------
> >  2 files changed, 49 insertions(+), 37 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 8c64831..f3379ea 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -652,12 +652,12 @@ struct i915_address_space {
> >  				     enum i915_cache_level level,
> >  				     bool valid); /* Create a valid PTE */
> >  	void (*clear_range)(struct i915_address_space *vm,
> > -			    unsigned int first_entry,
> > -			    unsigned int num_entries,
> > +			    uint64_t start,
> > +			    uint64_t length,
> >  			    bool use_scratch);
> >  	void (*insert_entries)(struct i915_address_space *vm,
> >  			       struct sg_table *st,
> > -			       unsigned int first_entry,
> > +			       uint64_t start,
> >  			       enum i915_cache_level cache_level);
> >  	void (*cleanup)(struct i915_address_space *vm);
> >  };
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 0af3587..ef5e90c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -254,13 +254,15 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
> >  }
> >  
> >  static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> > -				   unsigned first_entry,
> > -				   unsigned num_entries,
> > +				   uint64_t start,
> > +				   uint64_t length,
> >  				   bool use_scratch)
> >  {
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(vm, struct i915_hw_ppgtt, base);
> >  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> > +	unsigned num_entries = length >> PAGE_SHIFT;
> >  	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> >  	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
> >  	unsigned last_pte, i;
> > @@ -290,12 +292,13 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> >  
> >  static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> >  				      struct sg_table *pages,
> > -				      unsigned first_entry,
> > +				      uint64_t start,
> >  				      enum i915_cache_level cache_level)
> >  {
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(vm, struct i915_hw_ppgtt, base);
> >  	gen8_gtt_pte_t *pt_vaddr;
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> >  	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> >  	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> >  	struct sg_page_iter sg_iter;
> > @@ -855,13 +858,15 @@ static int gen6_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
> >  
> >  /* PPGTT support for Sandybridge/Gen6 and later */
> >  static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
> > -				   unsigned first_entry,
> > -				   unsigned num_entries,
> > +				   uint64_t start,
> > +				   uint64_t length,
> >  				   bool use_scratch)
> >  {
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(vm, struct i915_hw_ppgtt, base);
> >  	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> > +	unsigned num_entries = length >> PAGE_SHIFT;
> >  	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
> >  	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
> >  	unsigned last_pte, i;
> > @@ -888,12 +893,13 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
> >  
> >  static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
> >  				      struct sg_table *pages,
> > -				      unsigned first_entry,
> > +				      uint64_t start,
> >  				      enum i915_cache_level cache_level)
> >  {
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(vm, struct i915_hw_ppgtt, base);
> >  	gen6_gtt_pte_t *pt_vaddr;
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> >  	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
> >  	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
> >  	struct sg_page_iter sg_iter;
> > @@ -1026,8 +1032,7 @@ alloc:
> >  		ppgtt->pt_dma_addr[i] = pt_addr;
> >  	}
> >  
> > -	ppgtt->base.clear_range(&ppgtt->base, 0,
> > -				ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES, true);
> > +	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
> >  	ppgtt->debug_dump = gen6_dump_ppgtt;
> >  
> >  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> > @@ -1091,20 +1096,17 @@ ppgtt_bind_vma(struct i915_vma *vma,
> >  	       enum i915_cache_level cache_level,
> >  	       u32 flags)
> >  {
> > -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> > -
> >  	WARN_ON(flags);
> >  
> > -	vma->vm->insert_entries(vma->vm, vma->obj->pages, entry, cache_level);
> > +	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
> > +				cache_level);
> >  }
> >  
> >  static void ppgtt_unbind_vma(struct i915_vma *vma)
> >  {
> > -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> > -
> >  	vma->vm->clear_range(vma->vm,
> > -			     entry,
> > -			     vma->obj->base.size >> PAGE_SHIFT,
> > +			     vma->node.start,
> > +			     vma->obj->base.size,
> >  			     true);
> >  }
> >  
> > @@ -1265,10 +1267,11 @@ static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte)
> >  
> >  static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
> >  				     struct sg_table *st,
> > -				     unsigned int first_entry,
> > +				     uint64_t start,
> >  				     enum i915_cache_level level)
> >  {
> >  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> >  	gen8_gtt_pte_t __iomem *gtt_entries =
> >  		(gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
> >  	int i = 0;
> > @@ -1310,10 +1313,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
> >   */
> >  static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
> >  				     struct sg_table *st,
> > -				     unsigned int first_entry,
> > +				     uint64_t start,
> >  				     enum i915_cache_level level)
> >  {
> >  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> >  	gen6_gtt_pte_t __iomem *gtt_entries =
> >  		(gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
> >  	int i = 0;
> > @@ -1345,11 +1349,13 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
> >  }
> >  
> >  static void gen8_ggtt_clear_range(struct i915_address_space *vm,
> > -				  unsigned int first_entry,
> > -				  unsigned int num_entries,
> > +				  uint64_t start,
> > +				  uint64_t length,
> >  				  bool use_scratch)
> >  {
> >  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> > +	unsigned num_entries = length >> PAGE_SHIFT;
> >  	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
> >  		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
> >  	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> > @@ -1369,11 +1375,13 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
> >  }
> >  
> >  static void gen6_ggtt_clear_range(struct i915_address_space *vm,
> > -				  unsigned int first_entry,
> > -				  unsigned int num_entries,
> > +				  uint64_t start,
> > +				  uint64_t length,
> >  				  bool use_scratch)
> >  {
> >  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> > +	unsigned num_entries = length >> PAGE_SHIFT;
> >  	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
> >  		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
> >  	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> > @@ -1406,10 +1414,12 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
> >  }
> >  
> >  static void i915_ggtt_clear_range(struct i915_address_space *vm,
> > -				  unsigned int first_entry,
> > -				  unsigned int num_entries,
> > +				  uint64_t start,
> > +				  uint64_t length,
> >  				  bool unused)
> >  {
> > +	unsigned first_entry = start >> PAGE_SHIFT;
> > +	unsigned num_entries = length >> PAGE_SHIFT;
> >  	intel_gtt_clear_range(first_entry, num_entries);
> >  }
> >  
> > @@ -1430,7 +1440,6 @@ static void ggtt_bind_vma(struct i915_vma *vma,
> >  	struct drm_device *dev = vma->vm->dev;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct drm_i915_gem_object *obj = vma->obj;
> > -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> >  
> >  	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
> >  	 * or we have a global mapping already but the cacheability flags have
> > @@ -1446,7 +1455,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
> >  	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
> >  		if (!obj->has_global_gtt_mapping ||
> >  		    (cache_level != obj->cache_level)) {
> > -			vma->vm->insert_entries(vma->vm, obj->pages, entry,
> > +			vma->vm->insert_entries(vma->vm, obj->pages,
> > +						vma->node.start,
> >  						cache_level);
> >  			obj->has_global_gtt_mapping = 1;
> >  		}
> > @@ -1457,7 +1467,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
> >  	     (cache_level != obj->cache_level))) {
> >  		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
> >  		appgtt->base.insert_entries(&appgtt->base,
> > -					    vma->obj->pages, entry, cache_level);
> > +					    vma->obj->pages,
> > +					    vma->node.start,
> > +					    cache_level);
> >  		vma->obj->has_aliasing_ppgtt_mapping = 1;
> >  	}
> >  }
> > @@ -1467,11 +1479,11 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
> >  	struct drm_device *dev = vma->vm->dev;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct drm_i915_gem_object *obj = vma->obj;
> > -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> >  
> >  	if (obj->has_global_gtt_mapping) {
> > -		vma->vm->clear_range(vma->vm, entry,
> > -				     vma->obj->base.size >> PAGE_SHIFT,
> > +		vma->vm->clear_range(vma->vm,
> > +				     vma->node.start,
> > +				     obj->base.size,
> >  				     true);
> >  		obj->has_global_gtt_mapping = 0;
> >  	}
> > @@ -1479,8 +1491,8 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
> >  	if (obj->has_aliasing_ppgtt_mapping) {
> >  		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
> >  		appgtt->base.clear_range(&appgtt->base,
> > -					 entry,
> > -					 obj->base.size >> PAGE_SHIFT,
> > +					 vma->node.start,
> > +					 obj->base.size,
> >  					 true);
> >  		obj->has_aliasing_ppgtt_mapping = 0;
> >  	}
> > @@ -1565,14 +1577,14 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
> >  
> >  	/* Clear any non-preallocated blocks */
> >  	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
> > -		const unsigned long count = (hole_end - hole_start) / PAGE_SIZE;
> >  		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
> >  			      hole_start, hole_end);
> > -		ggtt_vm->clear_range(ggtt_vm, hole_start / PAGE_SIZE, count, true);
> > +		ggtt_vm->clear_range(ggtt_vm, hole_start,
> > +				     hole_end - hole_start, true);
> >  	}
> >  
> >  	/* And finally clear the reserved guard page */
> > -	ggtt_vm->clear_range(ggtt_vm, end / PAGE_SIZE - 1, 1, true);
> > +	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
> >  }
> >  
> >  void i915_gem_init_global_gtt(struct drm_device *dev)
> 
> 

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH .5/9] drm/i915: Move ppgtt_release out of the header
  2014-02-20  6:05   ` [PATCH 1/9] drm/i915/bdw: Free PPGTT struct Ben Widawsky
  2014-02-20  9:31     ` Imre Deak
@ 2014-02-20 19:47     ` Ben Widawsky
  2014-02-20 19:47       ` [PATCH 1/9] [v2] drm/i915/bdw: Free PPGTT struct Ben Widawsky
                         ` (2 more replies)
  1 sibling, 3 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20 19:47 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

At one time it was expected to be called in multiple places by kref_put.
At the current time, however, it is all contained within
i915_gem_context.c.

This patch makes an upcoming required addition a bit nicer since it too
doesn't need to be defined in a header file.
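
As an aside, the kref pattern in play here can be modelled in a few
lines of standalone userspace C (hypothetical toy_* names, not the
kernel's kref API): the release callback only has to be visible to
whatever performs the final put, which is why a static function in
i915_gem_context.c now suffices:

#include <stdio.h>
#include <stdlib.h>

struct toy_ref { int count; };

struct toy_ppgtt {
	struct toy_ref ref;
	/* page table state would live here */
};

/* File-local release, like ppgtt_release after this patch. */
static void toy_release(struct toy_ppgtt *ppgtt)
{
	printf("releasing %p\n", (void *)ppgtt);
	free(ppgtt);
}

static void toy_put(struct toy_ppgtt *ppgtt)
{
	if (--ppgtt->ref.count == 0)
		toy_release(ppgtt);
}

int main(void)
{
	struct toy_ppgtt *ppgtt = calloc(1, sizeof(*ppgtt));

	if (!ppgtt)
		return 1;
	ppgtt->ref.count = 2;	/* two holders */
	toy_put(ppgtt);		/* still referenced */
	toy_put(ppgtt);		/* final put runs the release */
	return 0;
}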

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h         | 36 ---------------------------------
 drivers/gpu/drm/i915/i915_gem_context.c | 36 +++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8c64831..57556fb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2387,42 +2387,6 @@ static inline bool intel_enable_ppgtt(struct drm_device *dev, bool full)
 		return HAS_ALIASING_PPGTT(dev);
 }
 
-static inline void ppgtt_release(struct kref *kref)
-{
-	struct i915_hw_ppgtt *ppgtt = container_of(kref, struct i915_hw_ppgtt, ref);
-	struct drm_device *dev = ppgtt->base.dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct i915_address_space *vm = &ppgtt->base;
-
-	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
-	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
-		ppgtt->base.cleanup(&ppgtt->base);
-		return;
-	}
-
-	/*
-	 * Make sure vmas are unbound before we take down the drm_mm
-	 *
-	 * FIXME: Proper refcounting should take care of this, this shouldn't be
-	 * needed at all.
-	 */
-	if (!list_empty(&vm->active_list)) {
-		struct i915_vma *vma;
-
-		list_for_each_entry(vma, &vm->active_list, mm_list)
-			if (WARN_ON(list_empty(&vma->vma_link) ||
-				    list_is_singular(&vma->vma_link)))
-				break;
-
-		i915_gem_evict_vm(&ppgtt->base, true);
-	} else {
-		i915_gem_retire_requests(dev);
-		i915_gem_evict_vm(&ppgtt->base, false);
-	}
-
-	ppgtt->base.cleanup(&ppgtt->base);
-}
-
 /* i915_gem_stolen.c */
 int i915_gem_init_stolen(struct drm_device *dev);
 int i915_gem_stolen_setup_compression(struct drm_device *dev, int size);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f8c21a6..171a2ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -99,6 +99,42 @@
 static int do_switch(struct intel_ring_buffer *ring,
 		     struct i915_hw_context *to);
 
+static void ppgtt_release(struct kref *kref)
+{
+	struct i915_hw_ppgtt *ppgtt = container_of(kref, struct i915_hw_ppgtt, ref);
+	struct drm_device *dev = ppgtt->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_address_space *vm = &ppgtt->base;
+
+	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
+	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
+		ppgtt->base.cleanup(&ppgtt->base);
+		return;
+	}
+
+	/*
+	 * Make sure vmas are unbound before we take down the drm_mm
+	 *
+	 * FIXME: Proper refcounting should take care of this, this shouldn't be
+	 * needed at all.
+	 */
+	if (!list_empty(&vm->active_list)) {
+		struct i915_vma *vma;
+
+		list_for_each_entry(vma, &vm->active_list, mm_list)
+			if (WARN_ON(list_empty(&vma->vma_link) ||
+				    list_is_singular(&vma->vma_link)))
+				break;
+
+		i915_gem_evict_vm(&ppgtt->base, true);
+	} else {
+		i915_gem_retire_requests(dev);
+		i915_gem_evict_vm(&ppgtt->base, false);
+	}
+
+	ppgtt->base.cleanup(&ppgtt->base);
+}
+
 static size_t get_context_alignment(struct drm_device *dev)
 {
 	if (IS_GEN6(dev))
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 1/9] [v2] drm/i915/bdw: Free PPGTT struct
  2014-02-20 19:47     ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Ben Widawsky
@ 2014-02-20 19:47       ` Ben Widawsky
  2014-02-24 16:43         ` Imre Deak
  2014-02-24 16:18       ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Imre Deak
  2014-03-04 14:53       ` Daniel Vetter
  2 siblings, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20 19:47 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

GEN8 never freed the PPGTT struct. As GEN8 doesn't use full PPGTT, the
leak is small and only found on a module reload. ie. I don't think this
needs to go to stable.

v2: The very naive kfree in the gen8 ppgtt cleanup is subject to a
double free on PPGTT initialization failure (spotted by Imre). Instead,
this patch pulls the freeing of the ppgtt struct out of the cleanup and
leaves it to the allocators/callers, or to whoever does the last
kref_put, as is standard convention.
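
The v2 ownership rule can be sketched in isolation (standalone
userspace C with hypothetical toy_* names; malloc/free stand in for
the kernel allocators): cleanup() tears down only what init() built,
and the struct itself is freed exactly once, either by the failing
caller or by the final release:

#include <stdlib.h>

struct toy_ppgtt { void *pages; };

static int toy_init(struct toy_ppgtt *ppgtt)
{
	ppgtt->pages = malloc(4096);
	return ppgtt->pages ? 0 : -1;
}

/* Tears down what init() built, but never frees the struct itself,
 * mirroring the kfree removed from gen6_ppgtt_cleanup. */
static void toy_cleanup(struct toy_ppgtt *ppgtt)
{
	free(ppgtt->pages);
	ppgtt->pages = NULL;
}

int main(void)
{
	struct toy_ppgtt *ppgtt = calloc(1, sizeof(*ppgtt));

	if (!ppgtt)
		return 1;

	if (toy_init(ppgtt)) {
		free(ppgtt);	/* init failed: caller frees, once */
		return 1;
	}

	toy_cleanup(ppgtt);	/* release = cleanup + free, also once */
	free(ppgtt);
	return 0;
}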

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 12 ++++++++++--
 drivers/gpu/drm/i915/i915_gem_gtt.c     |  1 -
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 171a2ef..9096e2a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -99,9 +99,8 @@
 static int do_switch(struct intel_ring_buffer *ring,
 		     struct i915_hw_context *to);
 
-static void ppgtt_release(struct kref *kref)
+static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
 {
-	struct i915_hw_ppgtt *ppgtt = container_of(kref, struct i915_hw_ppgtt, ref);
 	struct drm_device *dev = ppgtt->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_address_space *vm = &ppgtt->base;
@@ -135,6 +134,15 @@ static void ppgtt_release(struct kref *kref)
 	ppgtt->base.cleanup(&ppgtt->base);
 }
 
+static void ppgtt_release(struct kref *kref)
+{
+	struct i915_hw_ppgtt *ppgtt =
+		container_of(kref, struct i915_hw_ppgtt, ref);
+
+	do_ppgtt_cleanup(ppgtt);
+	kfree(ppgtt);
+}
+
 static size_t get_context_alignment(struct drm_device *dev)
 {
 	if (IS_GEN6(dev))
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 69a88d4..49e79fb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -859,7 +859,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 	for (i = 0; i < ppgtt->num_pd_entries; i++)
 		__free_page(ppgtt->pt_pages[i]);
 	kfree(ppgtt->pt_pages);
-	kfree(ppgtt);
 }
 
 static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 4/9] [v3] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-20  6:05   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
  2014-02-20 10:37     ` Imre Deak
@ 2014-02-20 19:50     ` Ben Widawsky
  2014-02-24 16:52       ` Imre Deak
  1 sibling, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20 19:50 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This patch converts insert_entries and clear_range, both functions which
are specific to the VM. These functions tend to encapsulate the gen
specific PTE writes. Passing absolute addresses to the insert_entries,
and clear_range will help make the logic clearer within the functions as
to what's going on. Currently, all callers simply do the appropriate
page shift, which IMO, ends up looking weird with an upcoming change for
the gen8 page table allocations.

Up until now, the PPGTT was a funky 2 level page table. GEN8 changes
this to look more like a 3 level page table, and to that extent we need
a significant amount more memory simply for the page tables. To address
this, the allocations will be split up in finer amounts.

v2: Replace size_t with uint64_t (Chris, Imre)

v3: Fix size in gen8_ppgtt_init (Ben)
Fix size in i915_gem_suspend_gtt_mappings/restore (Imre)
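
The convention is easy to demonstrate standalone (userspace C sketch,
PAGE_SHIFT hardcoded to the usual 4K value rather than taken from the
kernel headers): callers pass byte addresses, and each vfunc derives
its entry indices internally, exactly as the new first lines of the
clear_range/insert_entries implementations do:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1u << PAGE_SHIFT)

int main(void)
{
	/* the caller-visible range is expressed in bytes... */
	uint64_t start  = 3 * PAGE_SIZE;
	uint64_t length = 5 * PAGE_SIZE;

	/* ...and the vfunc converts to entries on entry */
	unsigned first_entry = start >> PAGE_SHIFT;
	unsigned num_entries = length >> PAGE_SHIFT;

	assert(first_entry == 3 && num_entries == 5);
	printf("entries [%u, %u)\n", first_entry,
	       first_entry + num_entries);
	return 0;
}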

Reviewed-by: Imre Deak <imre.deak@intel.com> (v2)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |  6 +--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 90 +++++++++++++++++++++----------------
 2 files changed, 54 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 57556fb..ab23bfd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -652,12 +652,12 @@ struct i915_address_space {
 				     enum i915_cache_level level,
 				     bool valid); /* Create a valid PTE */
 	void (*clear_range)(struct i915_address_space *vm,
-			    unsigned int first_entry,
-			    unsigned int num_entries,
+			    uint64_t start,
+			    uint64_t length,
 			    bool use_scratch);
 	void (*insert_entries)(struct i915_address_space *vm,
 			       struct sg_table *st,
-			       unsigned int first_entry,
+			       uint64_t start,
 			       enum i915_cache_level cache_level);
 	void (*cleanup)(struct i915_address_space *vm);
 };
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index beca571..03a3871 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -254,13 +254,15 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 }
 
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-				   unsigned first_entry,
-				   unsigned num_entries,
+				   uint64_t start,
+				   uint64_t length,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
 	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
 	unsigned last_pte, i;
@@ -290,12 +292,13 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 
 static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 				      struct sg_table *pages,
-				      unsigned first_entry,
+				      uint64_t start,
 				      enum i915_cache_level cache_level)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
 	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
 	struct sg_page_iter sg_iter;
@@ -539,7 +542,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
 	ppgtt->base.clear_range(&ppgtt->base, 0,
-				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
+				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE,
 				true);
 
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
@@ -854,13 +857,15 @@ static int gen6_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
 
 /* PPGTT support for Sandybridge/Gen6 and later */
 static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
-				   unsigned first_entry,
-				   unsigned num_entries,
+				   uint64_t start,
+				   uint64_t length,
 				   bool use_scratch)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
 	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
 	unsigned last_pte, i;
@@ -887,12 +892,13 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 
 static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 				      struct sg_table *pages,
-				      unsigned first_entry,
+				      uint64_t start,
 				      enum i915_cache_level cache_level)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen6_gtt_pte_t *pt_vaddr;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
 	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
 	struct sg_page_iter sg_iter;
@@ -1024,8 +1030,7 @@ alloc:
 		ppgtt->pt_dma_addr[i] = pt_addr;
 	}
 
-	ppgtt->base.clear_range(&ppgtt->base, 0,
-				ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES, true);
+	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 	ppgtt->debug_dump = gen6_dump_ppgtt;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1089,20 +1094,17 @@ ppgtt_bind_vma(struct i915_vma *vma,
 	       enum i915_cache_level cache_level,
 	       u32 flags)
 {
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
-
 	WARN_ON(flags);
 
-	vma->vm->insert_entries(vma->vm, vma->obj->pages, entry, cache_level);
+	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
+				cache_level);
 }
 
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
-
 	vma->vm->clear_range(vma->vm,
-			     entry,
-			     vma->obj->base.size >> PAGE_SHIFT,
+			     vma->node.start,
+			     vma->obj->base.size,
 			     true);
 }
 
@@ -1186,8 +1188,8 @@ void i915_gem_suspend_gtt_mappings(struct drm_device *dev)
 	i915_check_and_clear_faults(dev);
 
 	dev_priv->gtt.base.clear_range(&dev_priv->gtt.base,
-				       dev_priv->gtt.base.start / PAGE_SIZE,
-				       dev_priv->gtt.base.total / PAGE_SIZE,
+				       dev_priv->gtt.base.start,
+				       dev_priv->gtt.base.total,
 				       false);
 }
 
@@ -1201,8 +1203,8 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 
 	/* First fill our portion of the GTT with scratch pages */
 	dev_priv->gtt.base.clear_range(&dev_priv->gtt.base,
-				       dev_priv->gtt.base.start / PAGE_SIZE,
-				       dev_priv->gtt.base.total / PAGE_SIZE,
+				       dev_priv->gtt.base.start,
+				       dev_priv->gtt.base.total,
 				       true);
 
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
@@ -1263,10 +1265,11 @@ static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte)
 
 static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct sg_table *st,
-				     unsigned int first_entry,
+				     uint64_t start,
 				     enum i915_cache_level level)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	gen8_gtt_pte_t __iomem *gtt_entries =
 		(gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
 	int i = 0;
@@ -1308,10 +1311,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
  */
 static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct sg_table *st,
-				     unsigned int first_entry,
+				     uint64_t start,
 				     enum i915_cache_level level)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
 	gen6_gtt_pte_t __iomem *gtt_entries =
 		(gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
 	int i = 0;
@@ -1343,11 +1347,13 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 }
 
 static void gen8_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  uint64_t length,
 				  bool use_scratch)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
 		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
 	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
@@ -1367,11 +1373,13 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
 }
 
 static void gen6_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  uint64_t length,
 				  bool use_scratch)
 {
 	struct drm_i915_private *dev_priv = vm->dev->dev_private;
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
 		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
 	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
@@ -1404,10 +1412,12 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
 }
 
 static void i915_ggtt_clear_range(struct i915_address_space *vm,
-				  unsigned int first_entry,
-				  unsigned int num_entries,
+				  uint64_t start,
+				  uint64_t length,
 				  bool unused)
 {
+	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned num_entries = length >> PAGE_SHIFT;
 	intel_gtt_clear_range(first_entry, num_entries);
 }
 
@@ -1428,7 +1438,6 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = vma->obj;
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 
 	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
 	 * or we have a global mapping already but the cacheability flags have
@@ -1444,7 +1453,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
 		if (!obj->has_global_gtt_mapping ||
 		    (cache_level != obj->cache_level)) {
-			vma->vm->insert_entries(vma->vm, obj->pages, entry,
+			vma->vm->insert_entries(vma->vm, obj->pages,
+						vma->node.start,
 						cache_level);
 			obj->has_global_gtt_mapping = 1;
 		}
@@ -1455,7 +1465,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
 	     (cache_level != obj->cache_level))) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 		appgtt->base.insert_entries(&appgtt->base,
-					    vma->obj->pages, entry, cache_level);
+					    vma->obj->pages,
+					    vma->node.start,
+					    cache_level);
 		vma->obj->has_aliasing_ppgtt_mapping = 1;
 	}
 }
@@ -1465,11 +1477,11 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
 	struct drm_device *dev = vma->vm->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj = vma->obj;
-	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
 
 	if (obj->has_global_gtt_mapping) {
-		vma->vm->clear_range(vma->vm, entry,
-				     vma->obj->base.size >> PAGE_SHIFT,
+		vma->vm->clear_range(vma->vm,
+				     vma->node.start,
+				     obj->base.size,
 				     true);
 		obj->has_global_gtt_mapping = 0;
 	}
@@ -1477,8 +1489,8 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
 	if (obj->has_aliasing_ppgtt_mapping) {
 		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
 		appgtt->base.clear_range(&appgtt->base,
-					 entry,
-					 obj->base.size >> PAGE_SHIFT,
+					 vma->node.start,
+					 obj->base.size,
 					 true);
 		obj->has_aliasing_ppgtt_mapping = 0;
 	}
@@ -1563,14 +1575,14 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
 
 	/* Clear any non-preallocated blocks */
 	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
-		const unsigned long count = (hole_end - hole_start) / PAGE_SIZE;
 		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
 			      hole_start, hole_end);
-		ggtt_vm->clear_range(ggtt_vm, hole_start / PAGE_SIZE, count, true);
+		ggtt_vm->clear_range(ggtt_vm, hole_start,
+				     hole_end - hole_start, true);
 	}
 
 	/* And finally clear the reserved guard page */
-	ggtt_vm->clear_range(ggtt_vm, end / PAGE_SIZE - 1, 1, true);
+	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
 }
 
 void i915_gem_init_global_gtt(struct drm_device *dev)
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 5/9] [v5] drm/i915/bdw: Reorganize PT allocations
  2014-02-20  6:05   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
  2014-02-20 11:28     ` Imre Deak
@ 2014-02-20 19:51     ` Ben Widawsky
  2014-02-24 17:03       ` Imre Deak
  1 sibling, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-20 19:51 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

The previous allocation mechanism would get 2 contiguous allocations,
one for the page directories, and one for the page tables. As each page
table is 1 page, and there are 512 of these per page directory, this
goes to 2MB. An unfriendly request at best. Worse still, our HW now
supports 4 page directories, and a 2MB allocation is not allowed.

In order to fix this, this patch attempts to split up each page table
allocation into a single, discrete allocation. There is nothing really
fancy about the patch itself, it just has to manage an extra pointer
indirection, and have a fancier bit of logic to free up the pages.

To accommodate some of the added complexity, two new helpers are
introduced to allocate and free the page table pages.
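
Reduced to a standalone sketch (userspace C, hypothetical names,
malloc standing in for alloc_page()), the new scheme is one discrete
allocation per page table plus an array of pointers, instead of a
single contiguous PDES_PER_PAGE * PAGE_SZ (2MB) block:

#include <stdlib.h>

#define PDES_PER_PAGE	512	/* page tables per page directory */
#define PAGE_SZ		4096

static void **alloc_page_tables(void)
{
	void **pt;
	int i;

	pt = calloc(PDES_PER_PAGE, sizeof(*pt));
	if (!pt)
		return NULL;

	for (i = 0; i < PDES_PER_PAGE; i++) {
		pt[i] = malloc(PAGE_SZ);
		if (!pt[i])
			goto bail;
	}
	return pt;

bail:
	/* unwind the partial allocation, as gen8_free_page_tables does */
	while (i--)
		free(pt[i]);
	free(pt);
	return NULL;
}

static void free_page_tables(void **pt)
{
	int i;

	if (!pt)
		return;
	for (i = 0; i < PDES_PER_PAGE; i++)
		free(pt[i]);
	free(pt);
}

int main(void)
{
	void **pt = alloc_page_tables();

	free_page_tables(pt);
	return 0;
}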

NOTE: I really wanted to split the way we do allocations, and the way in
which we identify the page table/page directory being used. I found
splitting this functionality up to be too unwieldy. I apologize in
advance to the reviewer. I'd recommend looking at the result, rather
than the diff.

v2/NOTE2: This patch predated commit:
6f1cc993518462ccf039e195fabd47e7aa5bfd13
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Dec 31 15:50:31 2013 +0000

    drm/i915: Avoid dereference past end of page arr

It fixed the same issue as that patch, but because of the limbo state of
PPGTT, Chris' patch was merged instead. The excess churn is a result of
my using my original patch, which has my preferred naming. Primarily
act_* is changed to which_*, but it's mostly the same otherwise. I've
kept the convention Chris used for the pte wrap (I had something
slightly different, and broken - but fixable)

v3: Rename which_p[..]e to drop which_ (Chris)
Remove BUG_ON in inner loop (Chris)
Redo the pde/pdpe wrap logic (Chris)

v4: s/1MB/2MB in commit message (Imre)
Plug leaking gen8_pt_pages in both the error path, as well as general
free case (Imre)

v5: Rename leftover "which_" variables (Imre)
Add the pde = 0 wrap that was missed from v3 (Imre)
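
To make the new 3-level layout concrete, a minimal standalone
decomposition (userspace C; the shifts/masks are the ones this patch
adds, the example address is arbitrary):

#include <stdint.h>
#include <stdio.h>

#define GEN8_PDPE_SHIFT	30
#define GEN8_PDPE_MASK	0x3
#define GEN8_PDE_SHIFT	21
#define GEN8_PDE_MASK	0x1ff
#define GEN8_PTE_SHIFT	12
#define GEN8_PTE_MASK	0x1ff

int main(void)
{
	uint64_t addr = 0x8f6d5abc;	/* arbitrary example address */
	unsigned pdpe = (addr >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
	unsigned pde  = (addr >> GEN8_PDE_SHIFT) & GEN8_PDE_MASK;
	unsigned pte  = (addr >> GEN8_PTE_SHIFT) & GEN8_PTE_MASK;
	unsigned off  = addr & 0xfff;	/* byte within the page */

	printf("pdpe=%u pde=%u pte=%u offset=0x%03x\n",
	       pdpe, pde, pte, off);
	return 0;
}

For the address above this prints pdpe=2 pde=123 pte=213 offset=0xabc.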

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h     |   5 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 132 ++++++++++++++++++++++++++++--------
 2 files changed, 108 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ab23bfd..465ebf4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -691,6 +691,7 @@ struct i915_gtt {
 };
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
+#define GEN8_LEGACY_PDPS 4
 struct i915_hw_ppgtt {
 	struct i915_address_space base;
 	struct kref ref;
@@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
 	unsigned num_pd_entries;
 	union {
 		struct page **pt_pages;
-		struct page *gen8_pt_pages;
+		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
 	};
 	struct page *pd_pages;
 	int num_pd_pages;
 	int num_pt_pages;
 	union {
 		uint32_t pd_offset;
-		dma_addr_t pd_dma_addr[4];
+		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
 	};
 	union {
 		dma_addr_t *pt_dma_addr;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 03a3871..46adb0d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 
 #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
-#define GEN8_LEGACY_PDPS		4
+
+/* GEN8 legacy style address is defined as a 3 level page table:
+ * 31:30 | 29:21 | 20:12 |  11:0
+ * PDPE  |  PDE  |  PTE  | offset
+ * The difference compared to a normal x86 3 level page table is that the
+ * PDPEs are programmed via register.
+ */
+#define GEN8_PDPE_SHIFT			30
+#define GEN8_PDPE_MASK			0x3
+#define GEN8_PDE_SHIFT			21
+#define GEN8_PDE_MASK			0x1ff
+#define GEN8_PTE_SHIFT			12
+#define GEN8_PTE_MASK			0x1ff
 
 #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
 #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
@@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
-	unsigned first_entry = start >> PAGE_SHIFT;
+	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
+	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
+	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	unsigned num_entries = length >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
-	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
 	unsigned last_pte, i;
 
 	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
 				      I915_CACHE_LLC, use_scratch);
 
 	while (num_entries) {
-		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
+		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
 
-		last_pte = first_pte + num_entries;
+		last_pte = pte + num_entries;
 		if (last_pte > GEN8_PTES_PER_PAGE)
 			last_pte = GEN8_PTES_PER_PAGE;
 
 		pt_vaddr = kmap_atomic(page_table);
 
-		for (i = first_pte; i < last_pte; i++)
+		for (i = pte; i < last_pte; i++) {
 			pt_vaddr[i] = scratch_pte;
+			num_entries--;
+		}
 
 		kunmap_atomic(pt_vaddr);
 
-		num_entries -= last_pte - first_pte;
-		first_pte = 0;
-		act_pt++;
+		pte = 0;
+		if (++pde == GEN8_PDES_PER_PAGE) {
+			pdpe++;
+			pde = 0;
+		}
 	}
 }
 
@@ -298,38 +314,59 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
 	gen8_gtt_pte_t *pt_vaddr;
-	unsigned first_entry = start >> PAGE_SHIFT;
-	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
-	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
+	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
+	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
+	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
 	struct sg_page_iter sg_iter;
 
 	pt_vaddr = NULL;
+
 	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
+			break;
+
 		if (pt_vaddr == NULL)
-			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
+			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
 
-		pt_vaddr[act_pte] =
+		pt_vaddr[pte] =
 			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
 					cache_level, true);
-		if (++act_pte == GEN8_PTES_PER_PAGE) {
+		if (++pte == GEN8_PTES_PER_PAGE) {
 			kunmap_atomic(pt_vaddr);
 			pt_vaddr = NULL;
-			act_pt++;
-			act_pte = 0;
+			if (pde + 1 == GEN8_PDES_PER_PAGE) {
+				pdpe++;
+				pde = 0;
+			}
+			pte = 0;
 		}
 	}
 	if (pt_vaddr)
 		kunmap_atomic(pt_vaddr);
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+static void gen8_free_page_tables(struct page **pt_pages)
 {
 	int i;
 
-	for (i = 0; i < ppgtt->num_pd_pages ; i++)
+	if (pt_pages == NULL)
+		return;
+
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
+		if (pt_pages[i])
+			__free_pages(pt_pages[i], 0);
+}
+
+static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+{
+	int i;
+
+	for (i = 0; i < ppgtt->num_pd_pages; i++) {
+		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
+		kfree(ppgtt->gen8_pt_pages[i]);
 		kfree(ppgtt->gen8_pt_dma_addr[i]);
+	}
 
-	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
 	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
@@ -368,20 +405,61 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
 	gen8_ppgtt_free(ppgtt);
 }
 
+static struct page **__gen8_alloc_page_tables(void)
+{
+	struct page **pt_pages;
+	int i;
+
+	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
+	if (!pt_pages)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+		pt_pages[i] = alloc_page(GFP_KERNEL);
+		if (!pt_pages[i])
+			goto bail;
+	}
+
+	return pt_pages;
+
+bail:
+	gen8_free_page_tables(pt_pages);
+	kfree(pt_pages);
+	return ERR_PTR(-ENOMEM);
+}
+
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
-	struct page *pt_pages;
+	struct page **pt_pages[GEN8_LEGACY_PDPS];
 	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+	int i, ret;
 
-	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
-	if (!pt_pages)
-		return -ENOMEM;
+	for (i = 0; i < max_pdp; i++) {
+		pt_pages[i] = __gen8_alloc_page_tables();
+		if (IS_ERR(pt_pages[i])) {
+			ret = PTR_ERR(pt_pages[i]);
+			goto unwind_out;
+		}
+	}
+
+	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation
+	 * "atomic" for cleanup purposes.
+	 */
+	for (i = 0; i < max_pdp; i++)
+		ppgtt->gen8_pt_pages[i] = pt_pages[i];
 
-	ppgtt->gen8_pt_pages = pt_pages;
 	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
 
 	return 0;
+
+unwind_out:
+	while (i--) {
+		gen8_free_page_tables(pt_pages[i]);
+		kfree(pt_pages[i]);
+	}
+
+	return ret;
 }
 
 static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
@@ -463,7 +541,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
 	struct page *p;
 	int ret;
 
-	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
+	p = ppgtt->gen8_pt_pages[pd][pt];
 	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
 			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
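
The pdpe/pde/pte carry logic is easier to follow outside of diff form;
here is a minimal standalone model of the same wrap (userspace C,
simplified from the clear_range loop above):

#include <stdio.h>

#define PTES_PER_PAGE	512
#define PDES_PER_PAGE	512

int main(void)
{
	/* start near the end of the last table of a directory so the
	 * walk is forced to carry into pde and then pdpe */
	unsigned pdpe = 0, pde = PDES_PER_PAGE - 1, pte = 510;
	unsigned num_entries = 4;

	while (num_entries) {
		unsigned last_pte = pte + num_entries;

		if (last_pte > PTES_PER_PAGE)
			last_pte = PTES_PER_PAGE;

		printf("table[%u][%u]: ptes %u..%u\n",
		       pdpe, pde, pte, last_pte - 1);

		num_entries -= last_pte - pte;
		pte = 0;
		if (++pde == PDES_PER_PAGE) {
			pdpe++;
			pde = 0;
		}
	}
	return 0;
}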
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 10/9] drm/i915/bdw: Kill ppgtt->num_pt_pages
  2014-02-20  6:05   ` [PATCH 0/9] [v2] " Ben Widawsky
@ 2014-02-21 21:06     ` Ben Widawsky
  2014-02-24 17:17       ` Imre Deak
  2014-03-04 14:50     ` [PATCH 0/9] [v2] BDW 4G GGTT + PPGTT cleanups Daniel Vetter
  1 sibling, 1 reply; 63+ messages in thread
From: Ben Widawsky @ 2014-02-21 21:06 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

With the original PPGTT implementation, if the number of PDPs was not a
power of two, the number of pages for the page tables would end up being
rounded up. The code actually had a bug here AFAICT, but it is only a
theoretical bug as I don't believe this can actually occur with the
current code/HW.

With the rework of the page table allocations, there is no longer a
distinction between the number of page table pages and the number of
page directory entries. To avoid confusion, kill the redundant (and newer)
struct member.
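
The arithmetic behind the redundancy, as a standalone sketch (gen8
constants hardcoded): each PDE points at exactly one page table, so
the PDE count already is the page table page count that num_pt_pages
duplicated:

#include <stdint.h>
#include <stdio.h>

#define GEN8_PDES_PER_PAGE	512
#define GEN8_PTES_PER_PAGE	512
#define PAGE_SZ			4096ULL

int main(void)
{
	unsigned max_pdp = 4;	/* a full 4GB address space */
	unsigned num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
	uint64_t total = (uint64_t)num_pd_entries *
			 GEN8_PTES_PER_PAGE * PAGE_SZ;

	printf("%u page tables, %llu GB mapped\n", num_pd_entries,
	       (unsigned long long)(total >> 30));
	return 0;
}

With max_pdp = 4 this is 2048 page tables mapping 4GB, matching
ppgtt->base.total as computed in gen8_ppgtt_init above.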

Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_drv.h     |  3 +--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 14 ++++----------
 3 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 509e2e1..e0c42a6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1757,7 +1757,7 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 		return;
 
 	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-	seq_printf(m, "Page tables: %d\n", ppgtt->num_pt_pages);
+	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
 	for_each_ring(ring, dev_priv, unused) {
 		seq_printf(m, "%s\n", ring->name);
 		for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2f29558..a9f1cae 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -698,13 +698,12 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 	struct drm_mm_node node;
 	unsigned num_pd_entries;
+	unsigned num_pd_pages; /* gen8+ */
 	union {
 		struct page **pt_pages;
 		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
 	};
 	struct page *pd_pages;
-	int num_pd_pages;
-	int num_pt_pages;
 	union {
 		uint32_t pd_offset;
 		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6c03929..bd815d7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -433,7 +433,6 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 					   const int max_pdp)
 {
 	struct page **pt_pages[GEN8_LEGACY_PDPS];
-	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, ret;
 
 	for (i = 0; i < max_pdp; i++) {
@@ -450,8 +449,6 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
 	for (i = 0; i < max_pdp; i++)
 		ppgtt->gen8_pt_pages[i] = pt_pages[i];
 
-	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
-
 	return 0;
 
 unwind_out:
@@ -618,18 +615,15 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->base.start = 0;
-	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
+	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
-	ppgtt->base.clear_range(&ppgtt->base, 0,
-				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE,
-				true);
+	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
 	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
 			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
 	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-			 ppgtt->num_pt_pages,
-			 (ppgtt->num_pt_pages - min_pt_pages) +
-			 size % (1<<30));
+			 ppgtt->num_pd_entries,
+			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
 	return 0;
 
 bail:
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH .5/9] drm/i915: Move ppgtt_release out of the header
  2014-02-20 19:47     ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Ben Widawsky
  2014-02-20 19:47       ` [PATCH 1/9] [v2] drm/i915/bdw: Free PPGTT struct Ben Widawsky
@ 2014-02-24 16:18       ` Imre Deak
  2014-03-04 14:53       ` Daniel Vetter
  2 siblings, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-24 16:18 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky


On Thu, 2014-02-20 at 11:47 -0800, Ben Widawsky wrote:
> At one time it was expected to be called in multiple places by kref_put.
> At the current time, however, it is all contained within
> i915_gem_context.c.
> 
> This patch makes an upcoming required addition a bit nicer since it too
> doesn't need to be defined in a header file.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 36 ---------------------------------
>  drivers/gpu/drm/i915/i915_gem_context.c | 36 +++++++++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8c64831..57556fb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2387,42 +2387,6 @@ static inline bool intel_enable_ppgtt(struct drm_device *dev, bool full)
>  		return HAS_ALIASING_PPGTT(dev);
>  }
>  
> -static inline void ppgtt_release(struct kref *kref)
> -{
> -	struct i915_hw_ppgtt *ppgtt = container_of(kref, struct i915_hw_ppgtt, ref);
> -	struct drm_device *dev = ppgtt->base.dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct i915_address_space *vm = &ppgtt->base;
> -
> -	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
> -	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
> -		ppgtt->base.cleanup(&ppgtt->base);
> -		return;
> -	}
> -
> -	/*
> -	 * Make sure vmas are unbound before we take down the drm_mm
> -	 *
> -	 * FIXME: Proper refcounting should take care of this, this shouldn't be
> -	 * needed at all.
> -	 */
> -	if (!list_empty(&vm->active_list)) {
> -		struct i915_vma *vma;
> -
> -		list_for_each_entry(vma, &vm->active_list, mm_list)
> -			if (WARN_ON(list_empty(&vma->vma_link) ||
> -				    list_is_singular(&vma->vma_link)))
> -				break;
> -
> -		i915_gem_evict_vm(&ppgtt->base, true);
> -	} else {
> -		i915_gem_retire_requests(dev);
> -		i915_gem_evict_vm(&ppgtt->base, false);
> -	}
> -
> -	ppgtt->base.cleanup(&ppgtt->base);
> -}
> -
>  /* i915_gem_stolen.c */
>  int i915_gem_init_stolen(struct drm_device *dev);
>  int i915_gem_stolen_setup_compression(struct drm_device *dev, int size);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index f8c21a6..171a2ef 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -99,6 +99,42 @@
>  static int do_switch(struct intel_ring_buffer *ring,
>  		     struct i915_hw_context *to);
>  
> +static void ppgtt_release(struct kref *kref)
> +{
> +	struct i915_hw_ppgtt *ppgtt = container_of(kref, struct i915_hw_ppgtt, ref);
> +	struct drm_device *dev = ppgtt->base.dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_address_space *vm = &ppgtt->base;
> +
> +	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
> +	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
> +		ppgtt->base.cleanup(&ppgtt->base);
> +		return;
> +	}
> +
> +	/*
> +	 * Make sure vmas are unbound before we take down the drm_mm
> +	 *
> +	 * FIXME: Proper refcounting should take care of this, this shouldn't be
> +	 * needed at all.
> +	 */
> +	if (!list_empty(&vm->active_list)) {
> +		struct i915_vma *vma;
> +
> +		list_for_each_entry(vma, &vm->active_list, mm_list)
> +			if (WARN_ON(list_empty(&vma->vma_link) ||
> +				    list_is_singular(&vma->vma_link)))
> +				break;
> +
> +		i915_gem_evict_vm(&ppgtt->base, true);
> +	} else {
> +		i915_gem_retire_requests(dev);
> +		i915_gem_evict_vm(&ppgtt->base, false);
> +	}
> +
> +	ppgtt->base.cleanup(&ppgtt->base);
> +}
> +
>  static size_t get_context_alignment(struct drm_device *dev)
>  {
>  	if (IS_GEN6(dev))



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 1/9] [v2] drm/i915/bdw: Free PPGTT struct
  2014-02-20 19:47       ` [PATCH 1/9] [v2] drm/i915/bdw: Free PPGTT struct Ben Widawsky
@ 2014-02-24 16:43         ` Imre Deak
  0 siblings, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-24 16:43 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky


On Thu, 2014-02-20 at 11:47 -0800, Ben Widawsky wrote:
> GEN8 never freed the PPGTT struct. As GEN8 doesn't use full PPGTT, the
> leak is small and only found on a module reload, i.e. I don't think this
> needs to go to stable.
> 
> v2: The very naive kfree in the gen8 ppgtt cleanup is subject to a
> double free on PPGTT initialization failure (spotted by Imre). Instead,
> this patch pulls the freeing of the ppgtt struct out of the cleanup and
> leaves it to the allocators/callers, or to whoever does the last
> kref_put, as is standard convention.
> 
> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 12 ++++++++++--
>  drivers/gpu/drm/i915/i915_gem_gtt.c     |  1 -
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 171a2ef..9096e2a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -99,9 +99,8 @@
>  static int do_switch(struct intel_ring_buffer *ring,
>  		     struct i915_hw_context *to);
>  
> -static void ppgtt_release(struct kref *kref)
> +static void do_ppgtt_cleanup(struct i915_hw_ppgtt *ppgtt)
>  {
> -	struct i915_hw_ppgtt *ppgtt = container_of(kref, struct i915_hw_ppgtt, ref);
>  	struct drm_device *dev = ppgtt->base.dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct i915_address_space *vm = &ppgtt->base;
> @@ -135,6 +134,15 @@ static void ppgtt_release(struct kref *kref)
>  	ppgtt->base.cleanup(&ppgtt->base);
>  }
>  
> +static void ppgtt_release(struct kref *kref)
> +{
> +	struct i915_hw_ppgtt *ppgtt =
> +		container_of(kref, struct i915_hw_ppgtt, ref);
> +
> +	do_ppgtt_cleanup(ppgtt);
> +	kfree(ppgtt);
> +}
> +
>  static size_t get_context_alignment(struct drm_device *dev)
>  {
>  	if (IS_GEN6(dev))
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 69a88d4..49e79fb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -859,7 +859,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>  	for (i = 0; i < ppgtt->num_pd_entries; i++)
>  		__free_page(ppgtt->pt_pages[i]);
>  	kfree(ppgtt->pt_pages);
> -	kfree(ppgtt);
>  }
>  
>  static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 4/9] [v3] drm/i915: Make clear/insert vfuncs args absolute
  2014-02-20 19:50     ` [PATCH 4/9] [v3] " Ben Widawsky
@ 2014-02-24 16:52       ` Imre Deak
  0 siblings, 0 replies; 63+ messages in thread
From: Imre Deak @ 2014-02-24 16:52 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky


On Thu, 2014-02-20 at 11:50 -0800, Ben Widawsky wrote:
> This patch converts insert_entries and clear_range, both functions which
> are specific to the VM. These functions tend to encapsulate the gen
> specific PTE writes. Passing absolute addresses to insert_entries and
> clear_range will help make the logic within the functions clearer as to
> what's going on. Currently, all callers simply do the appropriate page
> shift, which, IMO, ends up looking weird with an upcoming change for the
> gen8 page table allocations.
> 
> Up until now, the PPGTT was a funky 2 level page table. GEN8 changes
> this to look more like a 3 level page table, and to that end we need
> significantly more memory simply for the page tables. To address this,
> the allocations will be split up into finer amounts.
> 
> v2: Replace size_t with uint64_t (Chris, Imre)
> 
> v3: Fix size in gen8_ppgtt_init (Ben)
> Fix size in i915_gem_suspend_gtt_mappings/restore (Imre)
> 
> Reviewed-by: Imre Deak <imre.deak@intel.com> (v2)
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h     |  6 +--
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 90 +++++++++++++++++++++----------------
>  2 files changed, 54 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 57556fb..ab23bfd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -652,12 +652,12 @@ struct i915_address_space {
>  				     enum i915_cache_level level,
>  				     bool valid); /* Create a valid PTE */
>  	void (*clear_range)(struct i915_address_space *vm,
> -			    unsigned int first_entry,
> -			    unsigned int num_entries,
> +			    uint64_t start,
> +			    uint64_t length,
>  			    bool use_scratch);
>  	void (*insert_entries)(struct i915_address_space *vm,
>  			       struct sg_table *st,
> -			       unsigned int first_entry,
> +			       uint64_t start,
>  			       enum i915_cache_level cache_level);
>  	void (*cleanup)(struct i915_address_space *vm);
>  };
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index beca571..03a3871 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -254,13 +254,15 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>  }
>  
>  static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> -				   unsigned first_entry,
> -				   unsigned num_entries,
> +				   uint64_t start,
> +				   uint64_t length,
>  				   bool use_scratch)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
>  	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	unsigned last_pte, i;
> @@ -290,12 +292,13 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  
>  static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  				      struct sg_table *pages,
> -				      unsigned first_entry,
> +				      uint64_t start,
>  				      enum i915_cache_level cache_level)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
>  	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	struct sg_page_iter sg_iter;
> @@ -539,7 +542,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
>  
>  	ppgtt->base.clear_range(&ppgtt->base, 0,
> -				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE,
> +				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE,
>  				true);
>  
>  	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
> @@ -854,13 +857,15 @@ static int gen6_ppgtt_enable(struct i915_hw_ppgtt *ppgtt)
>  
>  /* PPGTT support for Sandybridge/Gen6 and later */
>  static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
> -				   unsigned first_entry,
> -				   unsigned num_entries,
> +				   uint64_t start,
> +				   uint64_t length,
>  				   bool use_scratch)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen6_gtt_pte_t *pt_vaddr, scratch_pte;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
>  	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
>  	unsigned last_pte, i;
> @@ -887,12 +892,13 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>  
>  static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>  				      struct sg_table *pages,
> -				      unsigned first_entry,
> +				      uint64_t start,
>  				      enum i915_cache_level cache_level)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen6_gtt_pte_t *pt_vaddr;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
>  	unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
>  	struct sg_page_iter sg_iter;
> @@ -1024,8 +1030,7 @@ alloc:
>  		ppgtt->pt_dma_addr[i] = pt_addr;
>  	}
>  
> -	ppgtt->base.clear_range(&ppgtt->base, 0,
> -				ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES, true);
> +	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>  	ppgtt->debug_dump = gen6_dump_ppgtt;
>  
>  	DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
> @@ -1089,20 +1094,17 @@ ppgtt_bind_vma(struct i915_vma *vma,
>  	       enum i915_cache_level cache_level,
>  	       u32 flags)
>  {
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> -
>  	WARN_ON(flags);
>  
> -	vma->vm->insert_entries(vma->vm, vma->obj->pages, entry, cache_level);
> +	vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
> +				cache_level);
>  }
>  
>  static void ppgtt_unbind_vma(struct i915_vma *vma)
>  {
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
> -
>  	vma->vm->clear_range(vma->vm,
> -			     entry,
> -			     vma->obj->base.size >> PAGE_SHIFT,
> +			     vma->node.start,
> +			     vma->obj->base.size,
>  			     true);
>  }
>  
> @@ -1186,8 +1188,8 @@ void i915_gem_suspend_gtt_mappings(struct drm_device *dev)
>  	i915_check_and_clear_faults(dev);
>  
>  	dev_priv->gtt.base.clear_range(&dev_priv->gtt.base,
> -				       dev_priv->gtt.base.start / PAGE_SIZE,
> -				       dev_priv->gtt.base.total / PAGE_SIZE,
> +				       dev_priv->gtt.base.start,
> +				       dev_priv->gtt.base.total,
>  				       false);
>  }
>  
> @@ -1201,8 +1203,8 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>  
>  	/* First fill our portion of the GTT with scratch pages */
>  	dev_priv->gtt.base.clear_range(&dev_priv->gtt.base,
> -				       dev_priv->gtt.base.start / PAGE_SIZE,
> -				       dev_priv->gtt.base.total / PAGE_SIZE,
> +				       dev_priv->gtt.base.start,
> +				       dev_priv->gtt.base.total,
>  				       true);
>  
>  	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> @@ -1263,10 +1265,11 @@ static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte)
>  
>  static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>  				     struct sg_table *st,
> -				     unsigned int first_entry,
> +				     uint64_t start,
>  				     enum i915_cache_level level)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	gen8_gtt_pte_t __iomem *gtt_entries =
>  		(gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
>  	int i = 0;
> @@ -1308,10 +1311,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   */
>  static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>  				     struct sg_table *st,
> -				     unsigned int first_entry,
> +				     uint64_t start,
>  				     enum i915_cache_level level)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
>  	gen6_gtt_pte_t __iomem *gtt_entries =
>  		(gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm + first_entry;
>  	int i = 0;
> @@ -1343,11 +1347,13 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>  }
>  
>  static void gen8_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  uint64_t length,
>  				  bool use_scratch)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
>  		(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
>  	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> @@ -1367,11 +1373,13 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
>  }
>  
>  static void gen6_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  uint64_t length,
>  				  bool use_scratch)
>  {
>  	struct drm_i915_private *dev_priv = vm->dev->dev_private;
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
>  		(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
>  	const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> @@ -1404,10 +1412,12 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
>  }
>  
>  static void i915_ggtt_clear_range(struct i915_address_space *vm,
> -				  unsigned int first_entry,
> -				  unsigned int num_entries,
> +				  uint64_t start,
> +				  uint64_t length,
>  				  bool unused)
>  {
> +	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned num_entries = length >> PAGE_SHIFT;
>  	intel_gtt_clear_range(first_entry, num_entries);
>  }
>  
> @@ -1428,7 +1438,6 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	struct drm_device *dev = vma->vm->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj = vma->obj;
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
>  
>  	/* If there is no aliasing PPGTT, or the caller needs a global mapping,
>  	 * or we have a global mapping already but the cacheability flags have
> @@ -1444,7 +1453,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	if (!dev_priv->mm.aliasing_ppgtt || flags & GLOBAL_BIND) {
>  		if (!obj->has_global_gtt_mapping ||
>  		    (cache_level != obj->cache_level)) {
> -			vma->vm->insert_entries(vma->vm, obj->pages, entry,
> +			vma->vm->insert_entries(vma->vm, obj->pages,
> +						vma->node.start,
>  						cache_level);
>  			obj->has_global_gtt_mapping = 1;
>  		}
> @@ -1455,7 +1465,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
>  	     (cache_level != obj->cache_level))) {
>  		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
>  		appgtt->base.insert_entries(&appgtt->base,
> -					    vma->obj->pages, entry, cache_level);
> +					    vma->obj->pages,
> +					    vma->node.start,
> +					    cache_level);
>  		vma->obj->has_aliasing_ppgtt_mapping = 1;
>  	}
>  }
> @@ -1465,11 +1477,11 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
>  	struct drm_device *dev = vma->vm->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_i915_gem_object *obj = vma->obj;
> -	const unsigned long entry = vma->node.start >> PAGE_SHIFT;
>  
>  	if (obj->has_global_gtt_mapping) {
> -		vma->vm->clear_range(vma->vm, entry,
> -				     vma->obj->base.size >> PAGE_SHIFT,
> +		vma->vm->clear_range(vma->vm,
> +				     vma->node.start,
> +				     obj->base.size,
>  				     true);
>  		obj->has_global_gtt_mapping = 0;
>  	}
> @@ -1477,8 +1489,8 @@ static void ggtt_unbind_vma(struct i915_vma *vma)
>  	if (obj->has_aliasing_ppgtt_mapping) {
>  		struct i915_hw_ppgtt *appgtt = dev_priv->mm.aliasing_ppgtt;
>  		appgtt->base.clear_range(&appgtt->base,
> -					 entry,
> -					 obj->base.size >> PAGE_SHIFT,
> +					 vma->node.start,
> +					 obj->base.size,
>  					 true);
>  		obj->has_aliasing_ppgtt_mapping = 0;
>  	}
> @@ -1563,14 +1575,14 @@ void i915_gem_setup_global_gtt(struct drm_device *dev,
>  
>  	/* Clear any non-preallocated blocks */
>  	drm_mm_for_each_hole(entry, &ggtt_vm->mm, hole_start, hole_end) {
> -		const unsigned long count = (hole_end - hole_start) / PAGE_SIZE;
>  		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
>  			      hole_start, hole_end);
> -		ggtt_vm->clear_range(ggtt_vm, hole_start / PAGE_SIZE, count, true);
> +		ggtt_vm->clear_range(ggtt_vm, hole_start,
> +				     hole_end - hole_start, true);
>  	}
>  
>  	/* And finally clear the reserved guard page */
> -	ggtt_vm->clear_range(ggtt_vm, end / PAGE_SIZE - 1, 1, true);
> +	ggtt_vm->clear_range(ggtt_vm, end - PAGE_SIZE, PAGE_SIZE, true);
>  }
>  
>  void i915_gem_init_global_gtt(struct drm_device *dev)


^ permalink raw reply	[flat|nested] 63+ messages in thread
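
Distilled from the hunks above, the calling convention changes like this
(a sketch; vm, vma and obj stand in for the driver structures used in the
patch):

        /* before: callers pre-shifted down to PTE indices */
        vm->clear_range(vm, vma->node.start >> PAGE_SHIFT,
                        obj->base.size >> PAGE_SHIFT, true);

        /* after: callers pass absolute GPU addresses and byte lengths, and
         * each implementation derives its own entry indices internally via
         * first_entry = start >> PAGE_SHIFT; num_entries = length >> PAGE_SHIFT;
         */
        vm->clear_range(vm, vma->node.start, obj->base.size, true);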

* Re: [PATCH 5/9] [v5] drm/i915/bdw: Reorganize PT allocations
  2014-02-20 19:51     ` [PATCH 5/9] [v5] " Ben Widawsky
@ 2014-02-24 17:03       ` Imre Deak
  2014-02-24 23:38         ` Ben Widawsky
  0 siblings, 1 reply; 63+ messages in thread
From: Imre Deak @ 2014-02-24 17:03 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky


On Thu, 2014-02-20 at 11:51 -0800, Ben Widawsky wrote:
> The previous allocation mechanism made 2 contiguous allocations, one
> for the page directories, and one for the page tables. As each page
> table is 1 page, and there are 512 of these per page directory, that
> comes to 2MB - an unfriendly request at best. Worse still, our HW now
> supports 4 page directories, and a 2MB allocation is not allowed.
> 
> In order to fix this, this patch splits each page table into its own
> single, discrete allocation. There is nothing really fancy about the
> patch itself; it just has to manage an extra pointer indirection, and
> have a fancier bit of logic to free up the pages.
> 
> To accommodate some of the added complexity, two new helpers are
> introduced to allocate, and free the page table pages.
> 
> NOTE: I really wanted to split the way we do allocations, and the way in
> which we identify the page table/page directory being used. I found
> splitting this functionality up to be too unwieldy. I apologize in
> advance to the reviewer. I'd recommend looking at the result, rather
> than the diff.
> 
> v2/NOTE2: This patch predated commit:
> 6f1cc993518462ccf039e195fabd47e7aa5bfd13
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Dec 31 15:50:31 2013 +0000
> 
>     drm/i915: Avoid dereference past end of page arr
> 
> It fixed the same issue as that patch, but because of the limbo state of
> PPGTT, Chris' patch was merged instead. The excess churn is a result of
> my using my original patch, which has my preferred naming. Primarily
> act_* is changed to which_*, but it's mostly the same otherwise. I've
> kept the convention Chris used for the pte wrap (I had something
> slightly different, and broken - but fixable)
> 
> v3: Rename which_p[..]e to drop which_ (Chris)
> Remove BUG_ON in inner loop (Chris)
> Redo the pde/pdpe wrap logic (Chris)
> 
> v4: s/1MB/2MB in commit message (Imre)
> Plug leaking gen8_pt_pages in both the error path, as well as general
> free case (Imre)
> 
> v5: Rename leftover "which_" variables (Imre)
> Add the pde = 0 wrap that was missed from v3 (Imre)
> 
> Reviewed-by: Imre Deak <imre.deak@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h     |   5 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 132 ++++++++++++++++++++++++++++--------
>  2 files changed, 108 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ab23bfd..465ebf4 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -691,6 +691,7 @@ struct i915_gtt {
>  };
>  #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
>  
> +#define GEN8_LEGACY_PDPS 4
>  struct i915_hw_ppgtt {
>  	struct i915_address_space base;
>  	struct kref ref;
> @@ -698,14 +699,14 @@ struct i915_hw_ppgtt {
>  	unsigned num_pd_entries;
>  	union {
>  		struct page **pt_pages;
> -		struct page *gen8_pt_pages;
> +		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
>  	};
>  	struct page *pd_pages;
>  	int num_pd_pages;
>  	int num_pt_pages;
>  	union {
>  		uint32_t pd_offset;
> -		dma_addr_t pd_dma_addr[4];
> +		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
>  	};
>  	union {
>  		dma_addr_t *pt_dma_addr;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 03a3871..46adb0d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -64,7 +64,19 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>  
>  #define GEN8_PTES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_gtt_pte_t))
>  #define GEN8_PDES_PER_PAGE		(PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
> -#define GEN8_LEGACY_PDPS		4
> +
> +/* GEN8 legacy style address is defined as a 3 level page table:
> + * 31:30 | 29:21 | 20:12 |  11:0
> + * PDPE  |  PDE  |  PTE  | offset
> + * The difference compared to a normal x86 3 level page table is that the
> + * PDPEs are programmed via register.
> + */
> +#define GEN8_PDPE_SHIFT			30
> +#define GEN8_PDPE_MASK			0x3
> +#define GEN8_PDE_SHIFT			21
> +#define GEN8_PDE_MASK			0x1ff
> +#define GEN8_PTE_SHIFT			12
> +#define GEN8_PTE_MASK			0x1ff
>  
>  #define PPAT_UNCACHED_INDEX		(_PAGE_PWT | _PAGE_PCD)
>  #define PPAT_CACHED_PDE_INDEX		0 /* WB LLC */
> @@ -261,32 +273,36 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr, scratch_pte;
> -	unsigned first_entry = start >> PAGE_SHIFT;
> +	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> +	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> +	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>  	unsigned num_entries = length >> PAGE_SHIFT;
> -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> -	unsigned first_pte = first_entry % GEN8_PTES_PER_PAGE;
>  	unsigned last_pte, i;
>  
>  	scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
>  				      I915_CACHE_LLC, use_scratch);
>  
>  	while (num_entries) {
> -		struct page *page_table = &ppgtt->gen8_pt_pages[act_pt];
> +		struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
>  
> -		last_pte = first_pte + num_entries;
> +		last_pte = pte + num_entries;
>  		if (last_pte > GEN8_PTES_PER_PAGE)
>  			last_pte = GEN8_PTES_PER_PAGE;
>  
>  		pt_vaddr = kmap_atomic(page_table);
>  
> -		for (i = first_pte; i < last_pte; i++)
> +		for (i = pte; i < last_pte; i++) {
>  			pt_vaddr[i] = scratch_pte;
> +			num_entries--;
> +		}
>  
>  		kunmap_atomic(pt_vaddr);
>  
> -		num_entries -= last_pte - first_pte;
> -		first_pte = 0;
> -		act_pt++;
> +		pte = 0;
> +		if (++pde == GEN8_PDES_PER_PAGE) {
> +			pdpe++;
> +			pde = 0;
> +		}
>  	}
>  }
>  
> @@ -298,38 +314,59 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(vm, struct i915_hw_ppgtt, base);
>  	gen8_gtt_pte_t *pt_vaddr;
> -	unsigned first_entry = start >> PAGE_SHIFT;
> -	unsigned act_pt = first_entry / GEN8_PTES_PER_PAGE;
> -	unsigned act_pte = first_entry % GEN8_PTES_PER_PAGE;
> +	unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> +	unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> +	unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>  	struct sg_page_iter sg_iter;
>  
>  	pt_vaddr = NULL;
> +
>  	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> +		if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
> +			break;
> +
>  		if (pt_vaddr == NULL)
> -			pt_vaddr = kmap_atomic(&ppgtt->gen8_pt_pages[act_pt]);
> +			pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
>  
> -		pt_vaddr[act_pte] =
> +		pt_vaddr[pte] =
>  			gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
>  					cache_level, true);
> -		if (++act_pte == GEN8_PTES_PER_PAGE) {
> +		if (++pte == GEN8_PTES_PER_PAGE) {
>  			kunmap_atomic(pt_vaddr);
>  			pt_vaddr = NULL;
> -			act_pt++;
> -			act_pte = 0;
> +			if (pde + 1 == GEN8_PDES_PER_PAGE) {
> +				pdpe++;
> +				pde = 0;
> +			}
> +			pte = 0;
>  		}
>  	}
>  	if (pt_vaddr)
>  		kunmap_atomic(pt_vaddr);
>  }
>  
> -static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
> +static void gen8_free_page_tables(struct page **pt_pages)
>  {
>  	int i;
>  
> -	for (i = 0; i < ppgtt->num_pd_pages ; i++)
> +	if (pt_pages == NULL)
> +		return;
> +
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
> +		if (pt_pages[i])
> +			__free_pages(pt_pages[i], 0);
> +}
> +
> +static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
> +{
> +	int i;
> +
> +	for (i = 0; i < ppgtt->num_pd_pages; i++) {
> +		gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
> +		kfree(ppgtt->gen8_pt_pages[i]);
>  		kfree(ppgtt->gen8_pt_dma_addr[i]);
> +	}
>  
> -	__free_pages(ppgtt->gen8_pt_pages, get_order(ppgtt->num_pt_pages << PAGE_SHIFT));
>  	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
>  }
>  
> @@ -368,20 +405,61 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
>  	gen8_ppgtt_free(ppgtt);
>  }
>  
> +static struct page **__gen8_alloc_page_tables(void)
> +{
> +	struct page **pt_pages;
> +	int i;
> +
> +	pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
> +	if (!pt_pages)
> +		return ERR_PTR(-ENOMEM);
> +
> +	for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
> +		pt_pages[i] = alloc_page(GFP_KERNEL);
> +		if (!pt_pages[i])
> +			goto bail;
> +	}
> +
> +	return pt_pages;
> +
> +bail:
> +	gen8_free_page_tables(pt_pages);
> +	kfree(pt_pages);
> +	return ERR_PTR(-ENOMEM);
> +}
> +
>  static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					   const int max_pdp)
>  {
> -	struct page *pt_pages;
> +	struct page **pt_pages[GEN8_LEGACY_PDPS];
>  	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> +	int i, ret;
>  
> -	pt_pages = alloc_pages(GFP_KERNEL, get_order(num_pt_pages << PAGE_SHIFT));
> -	if (!pt_pages)
> -		return -ENOMEM;
> +	for (i = 0; i < max_pdp; i++) {
> +		pt_pages[i] = __gen8_alloc_page_tables();
> +		if (IS_ERR(pt_pages[i])) {
> +			ret = PTR_ERR(pt_pages[i]);
> +			goto unwind_out;
> +		}
> +	}
> +
> +	/* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
> +	 * "atomic" - for cleanup purposes.
> +	 */
> +	for (i = 0; i < max_pdp; i++)
> +		ppgtt->gen8_pt_pages[i] = pt_pages[i];
>  
> -	ppgtt->gen8_pt_pages = pt_pages;
>  	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
>  
>  	return 0;
> +
> +unwind_out:
> +	while (i--) {
> +		gen8_free_page_tables(pt_pages[i]);
> +		kfree(pt_pages[i]);
> +	}
> +
> +	return ret;
>  }
>  
>  static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
> @@ -463,7 +541,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>  	struct page *p;
>  	int ret;
>  
> -	p = &ppgtt->gen8_pt_pages[pd * GEN8_PDES_PER_PAGE + pt];
> +	p = ppgtt->gen8_pt_pages[pd][pt];
>  	pt_addr = pci_map_page(ppgtt->base.dev->pdev,
>  			       p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);


^ permalink raw reply	[flat|nested] 63+ messages in thread
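
For reference, the new macros decompose a GEN8 virtual address like this
(a sketch; all constants are from the diff above):

        /* 31:30 | 29:21 | 20:12 |  11:0
         * PDPE  |  PDE  |  PTE  | offset
         */
        unsigned pdpe = (start >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK; /* >> 30, & 0x3   */
        unsigned pde  = (start >> GEN8_PDE_SHIFT)  & GEN8_PDE_MASK;  /* >> 21, & 0x1ff */
        unsigned pte  = (start >> GEN8_PTE_SHIFT)  & GEN8_PTE_MASK;  /* >> 12, & 0x1ff */

        /* The allocation arithmetic motivating the rework: each page
         * directory covers GEN8_PDES_PER_PAGE (512) page tables of one page
         * each, so the old contiguous scheme needed 512 * 4KB = 2MB per PD.
         * The new scheme makes 512 discrete order-0 allocations instead,
         * reached through the gen8_pt_pages[pdpe][pde] indirection.
         */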

* Re: [PATCH 10/9] drm/i915/bdw: Kill ppgtt->num_pt_pages
  2014-02-21 21:06     ` [PATCH 10/9] drm/i915/bdw: Kill ppgtt->num_pt_pages Ben Widawsky
@ 2014-02-24 17:17       ` Imre Deak
  2014-03-04 15:42         ` Daniel Vetter
  0 siblings, 1 reply; 63+ messages in thread
From: Imre Deak @ 2014-02-24 17:17 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky


On Fri, 2014-02-21 at 13:06 -0800, Ben Widawsky wrote:
> With the original PPGTT implementation, if the number of PDPs was not a
> power of two, the number of pages for the page tables would end up being
> rounded up. The code actually had a bug here afaict, but it is only a
> theoretical one, as I don't believe this can actually occur with the
> current code/HW.
> 
> With the rework of the page table allocations, there is no longer a
> distinction between number of page table pages, and number of page
> directory entries. To avoid confusion, kill the redundant (and newer)
> struct member.
> 
> Cc: Imre Deak <imre.deak@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Nitpick: keeping num_pt_pages instead would make the code more
understandable to me and symmetric with num_pd_pages, but that would've
been much more churn. In any case nice simplification,

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
>  drivers/gpu/drm/i915/i915_drv.h     |  3 +--
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 14 ++++----------
>  3 files changed, 6 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 509e2e1..e0c42a6 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1757,7 +1757,7 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
>  		return;
>  
>  	seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
> -	seq_printf(m, "Page tables: %d\n", ppgtt->num_pt_pages);
> +	seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
>  	for_each_ring(ring, dev_priv, unused) {
>  		seq_printf(m, "%s\n", ring->name);
>  		for (i = 0; i < 4; i++) {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2f29558..a9f1cae 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -698,13 +698,12 @@ struct i915_hw_ppgtt {
>  	struct kref ref;
>  	struct drm_mm_node node;
>  	unsigned num_pd_entries;
> +	unsigned num_pd_pages; /* gen8+ */
>  	union {
>  		struct page **pt_pages;
>  		struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
>  	};
>  	struct page *pd_pages;
> -	int num_pd_pages;
> -	int num_pt_pages;
>  	union {
>  		uint32_t pd_offset;
>  		dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 6c03929..bd815d7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -433,7 +433,6 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>  					   const int max_pdp)
>  {
>  	struct page **pt_pages[GEN8_LEGACY_PDPS];
> -	const int num_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
>  	int i, ret;
>  
>  	for (i = 0; i < max_pdp; i++) {
> @@ -450,8 +449,6 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
>  	for (i = 0; i < max_pdp; i++)
>  		ppgtt->gen8_pt_pages[i] = pt_pages[i];
>  
> -	ppgtt->num_pt_pages = 1 << get_order(num_pt_pages << PAGE_SHIFT);
> -
>  	return 0;
>  
>  unwind_out:
> @@ -618,18 +615,15 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  	ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
>  	ppgtt->base.cleanup = gen8_ppgtt_cleanup;
>  	ppgtt->base.start = 0;
> -	ppgtt->base.total = ppgtt->num_pt_pages * GEN8_PTES_PER_PAGE * PAGE_SIZE;
> +	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
>  
> -	ppgtt->base.clear_range(&ppgtt->base, 0,
> -				ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE,
> -				true);
> +	ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
>  
>  	DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
>  			 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
>  	DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
> -			 ppgtt->num_pt_pages,
> -			 (ppgtt->num_pt_pages - min_pt_pages) +
> -			 size % (1<<30));
> +			 ppgtt->num_pd_entries,
> +			 (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
>  	return 0;
>  
>  bail:


^ permalink raw reply	[flat|nested] 63+ messages in thread
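
In arithmetic terms, the redundancy being killed here (a sketch; the
num_pd_entries assignment is assumed from gen8_ppgtt_init, and the total
formula appears in the diff above):

        /* After the PT allocation rework every PDE maps exactly one
         * page-sized page table, so on GEN8:
         *
         *      num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
         *      total = num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
         *
         * e.g. max_pdp = 4: 4 * 512 = 2048 page tables, and
         * 2048 * 512 PTEs * 4KB = 4GB of address space - hence a separate
         * num_pt_pages count carried no extra information.
         */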

* Re: [PATCH 5/9] [v5] drm/i915/bdw: Reorganize PT allocations
  2014-02-24 17:03       ` Imre Deak
@ 2014-02-24 23:38         ` Ben Widawsky
  0 siblings, 0 replies; 63+ messages in thread
From: Ben Widawsky @ 2014-02-24 23:38 UTC (permalink / raw)
  To: Imre Deak; +Cc: Intel GFX, Ben Widawsky

On Mon, Feb 24, 2014 at 07:03:12PM +0200, Imre Deak wrote:
> On Thu, 2014-02-20 at 11:51 -0800, Ben Widawsky wrote:
> > The previous allocation mechanism would get 2 contiguous allocations,
> > one for the page directories, and one for the page tables. As each page
> > table is 1 page, and there are 512 of these per page directory, this
> > goes to 2MB. An unfriendly request at best. Worse still, our HW now
> > supports 4 page directories, and a 2MB allocation is not allowed.
> > 
> > In order to fix this, this patch attempts to split up each page table
> > allocation into a single, discrete allocation. There is nothing really
> > fancy about the patch itself, it just has to manage an extra pointer
> > indirection, and have a fancier bit of logic to free up the pages.
> > 
> > To accommodate some of the added complexity, two new helpers are
> > introduced to allocate, and free the page table pages.
> > 
> > NOTE: I really wanted to split the way we do allocations, and the way in
> > which we identify the page table/page directory being used. I found
> > splitting this functionality up to be too unwieldy. I apologize in
> > advance to the reviewer. I'd recommend looking at the result, rather
> > than the diff.
> > 
> > v2/NOTE2: This patch predated commit:
> > 6f1cc993518462ccf039e195fabd47e7aa5bfd13
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Dec 31 15:50:31 2013 +0000
> > 
> >     drm/i915: Avoid dereference past end of page arr
> > 
> > It fixed the same issue as that patch, but because of the limbo state of
> > PPGTT, Chris patch was merged instead. The excess churn is a result of
> > my using my original patch, which has my preferred naming. Primarily
> > act_* is changed to which_*, but it's mostly the same otherwise. I've
> > kept the convention Chris used for the pte wrap (I had something
> > slightly different, and broken - but fixable)
> > 
> > v3: Rename which_p[..]e to drop which_ (Chris)
> > Remove BUG_ON in inner loop (Chris)
> > Redo the pde/pdpe wrap logic (Chris)
> > 
> > v4: s/1MB/2MB in commit message (Imre)
> > Plug leaking gen8_pt_pages in both the error path, as well as general
> > free case (Imre)
> > 
> > v5: Rename leftover "which_" variables (Imre)
> > Add the pde = 0 wrap that was missed from v3 (Imre)
> > 
> > Reviewed-by: Imre Deak <imre.deak@intel.com>
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> Reviewed-by: Imre Deak <imre.deak@intel.com>
> 

Thanks very much for your thorough review of the series.


-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 0/9] [v2] BDW 4G GGTT + PPGTT cleanups
  2014-02-20  6:05   ` [PATCH 0/9] [v2] " Ben Widawsky
  2014-02-21 21:06     ` [PATCH 10/9] drm/i915/bdw: Kill ppgtt->num_pt_pages Ben Widawsky
@ 2014-03-04 14:50     ` Daniel Vetter
  1 sibling, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2014-03-04 14:50 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Wed, Feb 19, 2014 at 10:05:40PM -0800, Ben Widawsky wrote:
> Thanks to Imre's very detailed review, and Ville's observation of a
> missed free (earlier bug), I think the series is finally starting to
> shape up. I am having some unrelated problems on my BDW platform at the
> moment, so they are not well tested.
> 
> Many patches are way past v2, but for the series it's the second
> iteration.

fyi threading here is funky - if you want to send a complete resend in
reply to an old series, please make only the cover letter a direct reply,
and make the individual patches replies to the cover letter. I'm prone to
make a giant mess of things otherwise ;-)

But in general, if you resend everything, a new thread is in order -
in-reply resends are just for individual patches, to keep an ongoing
review discussion tightly grouped together.
-Daniel

> 
> Ben Widawsky (9):
>   drm/i915/bdw: Free PPGTT struct
>   drm/i915/bdw: Reorganize PPGTT init
>   drm/i915/bdw: Split ppgtt initialization up
>   drm/i915: Make clear/insert vfuncs args absolute
>   drm/i915/bdw: Reorganize PT allocations
>   Revert "drm/i915/bdw: Limit GTT to 2GB"
>   drm/i915: Update i915_gem_gtt.c copyright
>   drm/i915: Split GEN6 PPGTT cleanup
>   drm/i915: Split GEN6 PPGTT initialization up
> 
>  drivers/gpu/drm/i915/i915_drv.h     |  11 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 582 ++++++++++++++++++++++++------------
>  2 files changed, 405 insertions(+), 188 deletions(-)
> 
> -- 
> 1.9.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH .5/9] drm/i915: Move ppgtt_release out of the header
  2014-02-20 19:47     ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Ben Widawsky
  2014-02-20 19:47       ` [PATCH 1/9] [v2] drm/i915/bdw: Free PPGTT struct Ben Widawsky
  2014-02-24 16:18       ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Imre Deak
@ 2014-03-04 14:53       ` Daniel Vetter
  2 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2014-03-04 14:53 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Thu, Feb 20, 2014 at 11:47:06AM -0800, Ben Widawsky wrote:
> At one time it was expected to be called in multiple places by kref_put.
> At the current time, however, it is all contained within
> i915_gem_context.c.
> 
> This patch makes an upcoming required addition a bit nicer since it too
> doesn't need to be defined in a header file.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

I'm ridiculously happy to see this get out of the header file; it should
never have been there.

Really.

*original reply heavily redacted*

;-)

Cheers, Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 36 ---------------------------------
>  drivers/gpu/drm/i915/i915_gem_context.c | 36 +++++++++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8c64831..57556fb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2387,42 +2387,6 @@ static inline bool intel_enable_ppgtt(struct drm_device *dev, bool full)
>  		return HAS_ALIASING_PPGTT(dev);
>  }
>  
> -static inline void ppgtt_release(struct kref *kref)
> -{
> -	struct i915_hw_ppgtt *ppgtt = container_of(kref, struct i915_hw_ppgtt, ref);
> -	struct drm_device *dev = ppgtt->base.dev;
> -	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct i915_address_space *vm = &ppgtt->base;
> -
> -	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
> -	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
> -		ppgtt->base.cleanup(&ppgtt->base);
> -		return;
> -	}
> -
> -	/*
> -	 * Make sure vmas are unbound before we take down the drm_mm
> -	 *
> -	 * FIXME: Proper refcounting should take care of this, this shouldn't be
> -	 * needed at all.
> -	 */
> -	if (!list_empty(&vm->active_list)) {
> -		struct i915_vma *vma;
> -
> -		list_for_each_entry(vma, &vm->active_list, mm_list)
> -			if (WARN_ON(list_empty(&vma->vma_link) ||
> -				    list_is_singular(&vma->vma_link)))
> -				break;
> -
> -		i915_gem_evict_vm(&ppgtt->base, true);
> -	} else {
> -		i915_gem_retire_requests(dev);
> -		i915_gem_evict_vm(&ppgtt->base, false);
> -	}
> -
> -	ppgtt->base.cleanup(&ppgtt->base);
> -}
> -
>  /* i915_gem_stolen.c */
>  int i915_gem_init_stolen(struct drm_device *dev);
>  int i915_gem_stolen_setup_compression(struct drm_device *dev, int size);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index f8c21a6..171a2ef 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -99,6 +99,42 @@
>  static int do_switch(struct intel_ring_buffer *ring,
>  		     struct i915_hw_context *to);
>  
> +static void ppgtt_release(struct kref *kref)
> +{
> +	struct i915_hw_ppgtt *ppgtt = container_of(kref, struct i915_hw_ppgtt, ref);
> +	struct drm_device *dev = ppgtt->base.dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct i915_address_space *vm = &ppgtt->base;
> +
> +	if (ppgtt == dev_priv->mm.aliasing_ppgtt ||
> +	    (list_empty(&vm->active_list) && list_empty(&vm->inactive_list))) {
> +		ppgtt->base.cleanup(&ppgtt->base);
> +		return;
> +	}
> +
> +	/*
> +	 * Make sure vmas are unbound before we take down the drm_mm
> +	 *
> +	 * FIXME: Proper refcounting should take care of this, this shouldn't be
> +	 * needed at all.
> +	 */
> +	if (!list_empty(&vm->active_list)) {
> +		struct i915_vma *vma;
> +
> +		list_for_each_entry(vma, &vm->active_list, mm_list)
> +			if (WARN_ON(list_empty(&vma->vma_link) ||
> +				    list_is_singular(&vma->vma_link)))
> +				break;
> +
> +		i915_gem_evict_vm(&ppgtt->base, true);
> +	} else {
> +		i915_gem_retire_requests(dev);
> +		i915_gem_evict_vm(&ppgtt->base, false);
> +	}
> +
> +	ppgtt->base.cleanup(&ppgtt->base);
> +}
> +
>  static size_t get_context_alignment(struct drm_device *dev)
>  {
>  	if (IS_GEN6(dev))
> -- 
> 1.9.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/9] drm/i915/bdw: Kill ppgtt->num_pt_pages
  2014-02-24 17:17       ` Imre Deak
@ 2014-03-04 15:42         ` Daniel Vetter
  0 siblings, 0 replies; 63+ messages in thread
From: Daniel Vetter @ 2014-03-04 15:42 UTC (permalink / raw)
  To: Imre Deak; +Cc: Intel GFX, Ben Widawsky, Ben Widawsky

On Mon, Feb 24, 2014 at 07:17:02PM +0200, Imre Deak wrote:
> On Fri, 2014-02-21 at 13:06 -0800, Ben Widawsky wrote:
> > With the original PPGTT implementation if the number of PDPs was not a
> > power of two, the number of pages for the page tables would end up being
> > rounded up. The code actually had a bug here afaict, but this is a
> > theoretical bug as I don't believe this can actually occur with the
> > current code/HW..
> > 
> > With the rework of the page table allocations, there is no longer a
> > distinction between number of page table pages, and number of page
> > directory entries. To avoid confusion, kill the redundant (and newer)
> > struct member.
> > 
> > Cc: Imre Deak <imre.deak@intel.com>
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> Nitpick: keeping num_pt_pages instead would make the code more
> understandable to me and symmetric with num_pd_pages, but that would've
> been much more churn. In any case nice simplification,
> 
> Reviewed-by: Imre Deak <imre.deak@intel.com>

All merged to dinq, thanks for patches&review.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2014-03-04 15:42 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <to=1387921357-22942-1-git-send-email-benjamin.widawsky@intel.com>
2014-02-12 22:28 ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ben Widawsky
2014-02-12 22:28   ` [PATCH 1/9] drm/i915/bdw: Split up PPGTT cleanup Ben Widawsky
2014-02-13 10:40     ` Chris Wilson
2014-02-12 22:28   ` [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init Ben Widawsky
2014-02-19 14:59     ` Imre Deak
2014-02-19 20:06       ` [PATCH] [v3] " Ben Widawsky
2014-02-19 21:00         ` Imre Deak
2014-02-19 21:18           ` Ben Widawsky
2014-02-12 22:28   ` [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up Ben Widawsky
2014-02-19 17:03     ` Imre Deak
2014-02-12 22:28   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
2014-02-13  0:14     ` Chris Wilson
2014-02-13  0:34       ` Ben Widawsky
2014-02-19 17:26     ` Imre Deak
2014-02-12 22:28   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
2014-02-12 23:45     ` Chris Wilson
2014-02-12 23:52       ` Ben Widawsky
2014-02-19 19:11     ` Imre Deak
2014-02-19 19:25       ` Imre Deak
2014-02-19 21:06       ` Ben Widawsky
2014-02-19 21:20         ` Imre Deak
2014-02-19 21:31           ` Ben Widawsky
2014-02-12 22:28   ` [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB" Ben Widawsky
2014-02-19 19:14     ` Imre Deak
2014-02-12 22:28   ` [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright Ben Widawsky
2014-02-12 23:19     ` Damien Lespiau
2014-02-12 23:22       ` Ben Widawsky
2014-02-19 19:20     ` Imre Deak
2014-02-12 22:28   ` [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup Ben Widawsky
2014-02-13 10:29     ` Chris Wilson
2014-02-12 22:28   ` [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up Ben Widawsky
2014-02-13 10:33     ` Chris Wilson
2014-02-13 11:47   ` [PATCH 0/9] [REPOST] BDW 4G GGTT + PPGTT cleanups Ville Syrjälä
2014-02-19 17:17     ` Ben Widawsky
2014-02-20  6:05   ` [PATCH 0/9] [v2] " Ben Widawsky
2014-02-21 21:06     ` [PATCH 10/9] drm/i915/bdw: Kill ppgtt->num_pt_pages Ben Widawsky
2014-02-24 17:17       ` Imre Deak
2014-03-04 15:42         ` Daniel Vetter
2014-03-04 14:50     ` [PATCH 0/9] [v2] BDW 4G GGTT + PPGTT cleanups Daniel Vetter
2014-02-20  6:05   ` [PATCH 1/9] drm/i915/bdw: Free PPGTT struct Ben Widawsky
2014-02-20  9:31     ` Imre Deak
2014-02-20 19:47     ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Ben Widawsky
2014-02-20 19:47       ` [PATCH 1/9] [v2] drm/i915/bdw: Free PPGTT struct Ben Widawsky
2014-02-24 16:43         ` Imre Deak
2014-02-24 16:18       ` [PATCH .5/9] drm/i915: Move ppgtt_release out of the header Imre Deak
2014-03-04 14:53       ` Daniel Vetter
2014-02-20  6:05   ` [PATCH 2/9] drm/i915/bdw: Reorganize PPGTT init Ben Widawsky
2014-02-20  6:05   ` [PATCH 3/9] drm/i915/bdw: Split ppgtt initialization up Ben Widawsky
2014-02-20 13:10     ` Imre Deak
2014-02-20  6:05   ` [PATCH 4/9] drm/i915: Make clear/insert vfuncs args absolute Ben Widawsky
2014-02-20 10:37     ` Imre Deak
2014-02-20 19:35       ` Ben Widawsky
2014-02-20 19:50     ` [PATCH 4/9] [v3] " Ben Widawsky
2014-02-24 16:52       ` Imre Deak
2014-02-20  6:05   ` [PATCH 5/9] drm/i915/bdw: Reorganize PT allocations Ben Widawsky
2014-02-20 11:28     ` Imre Deak
2014-02-20 19:51     ` [PATCH 5/9] [v5] " Ben Widawsky
2014-02-24 17:03       ` Imre Deak
2014-02-24 23:38         ` Ben Widawsky
2014-02-20  6:05   ` [PATCH 6/9] Revert "drm/i915/bdw: Limit GTT to 2GB" Ben Widawsky
2014-02-20  6:05   ` [PATCH 7/9] drm/i915: Update i915_gem_gtt.c copyright Ben Widawsky
2014-02-20  6:05   ` [PATCH 8/9] drm/i915: Split GEN6 PPGTT cleanup Ben Widawsky
2014-02-20  6:05   ` [PATCH 9/9] drm/i915: Split GEN6 PPGTT initialization up Ben Widawsky
