All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
@ 2015-08-31 16:27 Michał Winiarski
  2015-08-31 16:40 ` Chris Wilson
  2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski
  0 siblings, 2 replies; 15+ messages in thread
From: Michał Winiarski @ 2015-08-31 16:27 UTC (permalink / raw)
  To: intel-gfx

On each call to gen8_alloc_va_range_3lvl we're allocating temporary
bitmaps needed for error handling. Unfortunately, when we increase
address space size (48b ppgtt) we do additional (512 - 4) calls to
kcalloc, increasing latency between exec and actual start of execution
on the GPU. Let's just do a single kcalloc and setup proper offsets in
an array, we can also drop the size from free_gen8_temp_bitmaps since
it's no longer needed.

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4a76807..fd5545a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1126,13 +1126,9 @@ unwind_out:
 }
 
 static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
-		       uint32_t pdpes)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
 {
-	int i;
-
-	for (i = 0; i < pdpes; i++)
-		kfree(new_pts[i]);
+	kfree(*new_pts);
 	kfree(new_pts);
 	kfree(new_pds);
 }
@@ -1154,17 +1150,16 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 		return -ENOMEM;
 
 	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
-	if (!pts) {
-		kfree(pds);
-		return -ENOMEM;
-	}
+	if (!pts)
+		goto err_out;
 
-	for (i = 0; i < pdpes; i++) {
-		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
-				 sizeof(unsigned long), GFP_KERNEL);
-		if (!pts[i])
-			goto err_out;
-	}
+	*pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
+			sizeof(unsigned long), GFP_KERNEL);
+	if (!*pts)
+		goto err_out;
+
+	for (i = 0; i < pdpes; i++)
+		pts[i] = *pts + i * BITS_TO_LONGS(I915_PDES);
 
 	*new_pds = pds;
 	*new_pts = pts;
@@ -1172,7 +1167,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts, pdpes);
+	free_gen8_temp_bitmaps(pds, pts);
 	return -ENOMEM;
 }
 
@@ -1220,7 +1215,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
 						new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 		return ret;
 	}
 
@@ -1278,7 +1273,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 		gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return 0;
 
@@ -1291,7 +1286,7 @@ err_out:
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		free_pd(dev, pdp->page_directory[pdpe]);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
-- 
2.4.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-08-31 16:27 [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps Michał Winiarski
@ 2015-08-31 16:40 ` Chris Wilson
  2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski
  1 sibling, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2015-08-31 16:40 UTC (permalink / raw)
  To: Michał Winiarski; +Cc: intel-gfx

On Mon, Aug 31, 2015 at 06:27:40PM +0200, Michał Winiarski wrote:
> On each call to gen8_alloc_va_range_3lvl we're allocating temporary
> bitmaps needed for error handling. Unfortunately, when we increase
> address space size (48b ppgtt) we do additional (512 - 4) calls to
> kcalloc, increasing latency between exec and actual start of execution
> on the GPU. Let's just do a single kcalloc and setup proper offsets in
> an array, we can also drop the size from free_gen8_temp_bitmaps since
> it's no longer needed.
> 
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

Whilst you are here, mind using GFP_TEMPORARY as well?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-08-31 16:27 [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps Michał Winiarski
  2015-08-31 16:40 ` Chris Wilson
@ 2015-08-31 16:59 ` Michał Winiarski
  2015-08-31 18:42   ` Chris Wilson
  2015-09-01  9:06   ` [PATCH v3] " Michał Winiarski
  1 sibling, 2 replies; 15+ messages in thread
From: Michał Winiarski @ 2015-08-31 16:59 UTC (permalink / raw)
  To: intel-gfx

On each call to gen8_alloc_va_range_3lvl we're allocating temporary
bitmaps needed for error handling. Unfortunately, when we increase
address space size (48b ppgtt) we do additional (512 - 4) calls to
kcalloc, increasing latency between exec and actual start of execution
on the GPU. Let's just do a single kcalloc and setup proper offsets in
an array, we can also drop the size from free_gen8_temp_bitmaps since
it's no longer needed.

v2: Use GFP_TEMPORARY to make the allocations reclaimable.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 39 ++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4a76807..9270c47 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1126,13 +1126,9 @@ unwind_out:
 }
 
 static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
-		       uint32_t pdpes)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
 {
-	int i;
-
-	for (i = 0; i < pdpes; i++)
-		kfree(new_pts[i]);
+	kfree(*new_pts);
 	kfree(new_pts);
 	kfree(new_pds);
 }
@@ -1149,22 +1145,21 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	unsigned long *pds;
 	unsigned long **pts;
 
-	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
+	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY);
 	if (!pds)
 		return -ENOMEM;
 
-	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
-	if (!pts) {
-		kfree(pds);
-		return -ENOMEM;
-	}
+	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_TEMPORARY);
+	if (!pts)
+		goto err_out;
 
-	for (i = 0; i < pdpes; i++) {
-		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
-				 sizeof(unsigned long), GFP_KERNEL);
-		if (!pts[i])
-			goto err_out;
-	}
+	*pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
+			sizeof(unsigned long), GFP_TEMPORARY);
+	if (!*pts)
+		goto err_out;
+
+	for (i = 0; i < pdpes; i++)
+		pts[i] = *pts + i * BITS_TO_LONGS(I915_PDES);
 
 	*new_pds = pds;
 	*new_pts = pts;
@@ -1172,7 +1167,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts, pdpes);
+	free_gen8_temp_bitmaps(pds, pts);
 	return -ENOMEM;
 }
 
@@ -1220,7 +1215,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
 						new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 		return ret;
 	}
 
@@ -1278,7 +1273,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 		gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return 0;
 
@@ -1291,7 +1286,7 @@ err_out:
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		free_pd(dev, pdp->page_directory[pdpe]);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
-- 
2.4.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski
@ 2015-08-31 18:42   ` Chris Wilson
  2015-09-01  9:03     ` Michał Winiarski
  2015-09-01  9:06   ` [PATCH v3] " Michał Winiarski
  1 sibling, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2015-08-31 18:42 UTC (permalink / raw)
  To: Michał Winiarski; +Cc: intel-gfx

On Mon, Aug 31, 2015 at 06:59:40PM +0200, Michał Winiarski wrote:
> On each call to gen8_alloc_va_range_3lvl we're allocating temporary
> bitmaps needed for error handling. Unfortunately, when we increase
> address space size (48b ppgtt) we do additional (512 - 4) calls to
> kcalloc, increasing latency between exec and actual start of execution
> on the GPU. Let's just do a single kcalloc and setup proper offsets in
> an array, we can also drop the size from free_gen8_temp_bitmaps since
> it's no longer needed.
> 
> v2: Use GFP_TEMPORARY to make the allocations reclaimable.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

Is there any reason why it remains a 2D array though? Looks like it can
be reduced to a single block. And why do we allocate a bitmap for the
whole address space rather than just the range being allocated?

While you may just answer the questions posed, this looks clumsy:

> +	*pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
> +			sizeof(unsigned long), GFP_TEMPORARY);
> +	if (!*pts)
> +		goto err_out;
> +
> +	for (i = 0; i < pdpes; i++)
> +		pts[i] = *pts + i * BITS_TO_LONGS(I915_PDES);

i..e

for (i = 1; i < pdpes; i++)
	pts[i] = pts[0] + i* BITS_TO_LONGS(I915_PDES);

raises fewer questions.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-08-31 18:42   ` Chris Wilson
@ 2015-09-01  9:03     ` Michał Winiarski
  0 siblings, 0 replies; 15+ messages in thread
From: Michał Winiarski @ 2015-09-01  9:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Mika Kuoppala, Michel Thierry

On Mon, Aug 31, 2015 at 07:42:34PM +0100, Chris Wilson wrote:
> On Mon, Aug 31, 2015 at 06:59:40PM +0200, Michał Winiarski wrote:
> > On each call to gen8_alloc_va_range_3lvl we're allocating temporary
> > bitmaps needed for error handling. Unfortunately, when we increase
> > address space size (48b ppgtt) we do additional (512 - 4) calls to
> > kcalloc, increasing latency between exec and actual start of execution
> > on the GPU. Let's just do a single kcalloc and setup proper offsets in
> > an array, we can also drop the size from free_gen8_temp_bitmaps since
> > it's no longer needed.
> > 
> > v2: Use GFP_TEMPORARY to make the allocations reclaimable.
> > 
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> 
> Is there any reason why it remains a 2D array though? Looks like it can
> be reduced to a single block. And why do we allocate a bitmap for the
> whole address space rather than just the range being allocated?

No particular reason, let's try that.
It's not the whole address space... Quite possibly just 1/512 of the whole
address space, as the PUD is handled in gen8_alloc_va_range_4lvl. I guess it was
done to simplify... something, although I can't tell what at this point.
 
> While you may just answer the questions posed, this looks clumsy:

Yup - missed that, I was using temp variable before sending the patch.
-Michał

> 
> > +	*pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
> > +			sizeof(unsigned long), GFP_TEMPORARY);
> > +	if (!*pts)
> > +		goto err_out;
> > +
> > +	for (i = 0; i < pdpes; i++)
> > +		pts[i] = *pts + i * BITS_TO_LONGS(I915_PDES);
> 
> i..e
> 
> for (i = 1; i < pdpes; i++)
> 	pts[i] = pts[0] + i* BITS_TO_LONGS(I915_PDES);
> 
> raises fewer questions.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski
  2015-08-31 18:42   ` Chris Wilson
@ 2015-09-01  9:06   ` Michał Winiarski
  2015-09-02 13:40     ` Michel Thierry
  1 sibling, 1 reply; 15+ messages in thread
From: Michał Winiarski @ 2015-09-01  9:06 UTC (permalink / raw)
  To: intel-gfx

On each call to gen8_alloc_va_range_3lvl we're allocating temporary
bitmaps needed for error handling. Unfortunately, when we increase
address space size (48b ppgtt) we do additional (512 - 4) calls to
kcalloc, increasing latency between exec and actual start of execution
on the GPU. Let's just do a single kcalloc, we can also drop the size
from free_gen8_temp_bitmaps since it's no longer used.

v2: Use GFP_TEMPORARY to make the allocations reclaimable.
v3: Drop the 2D array, just allocate a single block.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 45 +++++++++++++------------------------
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4a76807..f810762 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1126,13 +1126,8 @@ unwind_out:
 }
 
 static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
-		       uint32_t pdpes)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long *new_pts)
 {
-	int i;
-
-	for (i = 0; i < pdpes; i++)
-		kfree(new_pts[i]);
 	kfree(new_pts);
 	kfree(new_pds);
 }
@@ -1142,29 +1137,20 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
  */
 static
 int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
-					 unsigned long ***new_pts,
+					 unsigned long **new_pts,
 					 uint32_t pdpes)
 {
-	int i;
 	unsigned long *pds;
-	unsigned long **pts;
+	unsigned long *pts;
 
-	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
+	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY);
 	if (!pds)
 		return -ENOMEM;
 
-	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
-	if (!pts) {
-		kfree(pds);
-		return -ENOMEM;
-	}
-
-	for (i = 0; i < pdpes; i++) {
-		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
-				 sizeof(unsigned long), GFP_KERNEL);
-		if (!pts[i])
-			goto err_out;
-	}
+	pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
+			sizeof(unsigned long), GFP_TEMPORARY);
+	if (!pts)
+		goto err_out;
 
 	*new_pds = pds;
 	*new_pts = pts;
@@ -1172,7 +1158,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts, pdpes);
+	free_gen8_temp_bitmaps(pds, pts);
 	return -ENOMEM;
 }
 
@@ -1193,7 +1179,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	unsigned long *new_page_dirs, **new_page_tables;
+	unsigned long *new_page_dirs, *new_page_tables;
 	struct drm_device *dev = vm->dev;
 	struct i915_page_directory *pd;
 	const uint64_t orig_start = start;
@@ -1220,14 +1206,14 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
 						new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 		return ret;
 	}
 
 	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
-						new_page_tables[pdpe]);
+						new_page_tables + pdpe * BITS_TO_LONGS(I915_PDES));
 		if (ret)
 			goto err_out;
 	}
@@ -1278,20 +1264,21 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 		gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return 0;
 
 err_out:
 	while (pdpe--) {
-		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
+		for_each_set_bit(temp, new_page_tables + pdpe *
+				BITS_TO_LONGS(I915_PDES), I915_PDES)
 			free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		free_pd(dev, pdp->page_directory[pdpe]);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
-- 
2.4.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-01  9:06   ` [PATCH v3] " Michał Winiarski
@ 2015-09-02 13:40     ` Michel Thierry
  2015-09-02 13:46       ` Chris Wilson
  0 siblings, 1 reply; 15+ messages in thread
From: Michel Thierry @ 2015-09-02 13:40 UTC (permalink / raw)
  To: Michał Winiarski, intel-gfx

On 9/1/2015 10:06 AM, Michał Winiarski wrote:
> On each call to gen8_alloc_va_range_3lvl we're allocating temporary
> bitmaps needed for error handling. Unfortunately, when we increase
> address space size (48b ppgtt) we do additional (512 - 4) calls to
> kcalloc, increasing latency between exec and actual start of execution
> on the GPU. Let's just do a single kcalloc, we can also drop the size
> from free_gen8_temp_bitmaps since it's no longer used.
>
> v2: Use GFP_TEMPORARY to make the allocations reclaimable.
> v3: Drop the 2D array, just allocate a single block.
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>

Unless Chris thinks otherwise, I see Michał already addressed his comments.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 45 +++++++++++++------------------------
>   1 file changed, 16 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 4a76807..f810762 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1126,13 +1126,8 @@ unwind_out:
>   }
>
>   static void
> -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
> -		       uint32_t pdpes)
> +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long *new_pts)
>   {
> -	int i;
> -
> -	for (i = 0; i < pdpes; i++)
> -		kfree(new_pts[i]);
>   	kfree(new_pts);
>   	kfree(new_pds);
>   }
> @@ -1142,29 +1137,20 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
>    */
>   static
>   int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
> -					 unsigned long ***new_pts,
> +					 unsigned long **new_pts,
>   					 uint32_t pdpes)
>   {
> -	int i;
>   	unsigned long *pds;
> -	unsigned long **pts;
> +	unsigned long *pts;
>
> -	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
> +	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY);
>   	if (!pds)
>   		return -ENOMEM;
>
> -	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
> -	if (!pts) {
> -		kfree(pds);
> -		return -ENOMEM;
> -	}
> -
> -	for (i = 0; i < pdpes; i++) {
> -		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
> -				 sizeof(unsigned long), GFP_KERNEL);
> -		if (!pts[i])
> -			goto err_out;
> -	}
> +	pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
> +			sizeof(unsigned long), GFP_TEMPORARY);
> +	if (!pts)
> +		goto err_out;
>
>   	*new_pds = pds;
>   	*new_pts = pts;
> @@ -1172,7 +1158,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
>   	return 0;
>
>   err_out:
> -	free_gen8_temp_bitmaps(pds, pts, pdpes);
> +	free_gen8_temp_bitmaps(pds, pts);
>   	return -ENOMEM;
>   }
>
> @@ -1193,7 +1179,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
>   {
>   	struct i915_hw_ppgtt *ppgtt =
>   		container_of(vm, struct i915_hw_ppgtt, base);
> -	unsigned long *new_page_dirs, **new_page_tables;
> +	unsigned long *new_page_dirs, *new_page_tables;
>   	struct drm_device *dev = vm->dev;
>   	struct i915_page_directory *pd;
>   	const uint64_t orig_start = start;
> @@ -1220,14 +1206,14 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
>   	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
>   						new_page_dirs);
>   	if (ret) {
> -		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> +		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
>   		return ret;
>   	}
>
>   	/* For every page directory referenced, allocate page tables */
>   	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
>   		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
> -						new_page_tables[pdpe]);
> +						new_page_tables + pdpe * BITS_TO_LONGS(I915_PDES));
>   		if (ret)
>   			goto err_out;
>   	}
> @@ -1278,20 +1264,21 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
>   		gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
>   	}
>
> -	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> +	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
>   	mark_tlbs_dirty(ppgtt);
>   	return 0;
>
>   err_out:
>   	while (pdpe--) {
> -		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
> +		for_each_set_bit(temp, new_page_tables + pdpe *
> +				BITS_TO_LONGS(I915_PDES), I915_PDES)
>   			free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
>   	}
>
>   	for_each_set_bit(pdpe, new_page_dirs, pdpes)
>   		free_pd(dev, pdp->page_directory[pdpe]);
>
> -	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> +	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
>   	mark_tlbs_dirty(ppgtt);
>   	return ret;
>   }
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-02 13:40     ` Michel Thierry
@ 2015-09-02 13:46       ` Chris Wilson
  2015-09-02 15:13         ` Daniel Vetter
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2015-09-02 13:46 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

On Wed, Sep 02, 2015 at 02:40:03PM +0100, Michel Thierry wrote:
> On 9/1/2015 10:06 AM, Michał Winiarski wrote:
> >On each call to gen8_alloc_va_range_3lvl we're allocating temporary
> >bitmaps needed for error handling. Unfortunately, when we increase
> >address space size (48b ppgtt) we do additional (512 - 4) calls to
> >kcalloc, increasing latency between exec and actual start of execution
> >on the GPU. Let's just do a single kcalloc, we can also drop the size
> >from free_gen8_temp_bitmaps since it's no longer used.
> >
> >v2: Use GFP_TEMPORARY to make the allocations reclaimable.
> >v3: Drop the 2D array, just allocate a single block.
> >
> >Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >Cc: Michel Thierry <michel.thierry@intel.com>
> >Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> 
> Unless Chris thinks otherwise, I see Michał already addressed his comments.

I think I spotted a misaligned bracket... ;)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chri

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-02 13:46       ` Chris Wilson
@ 2015-09-02 15:13         ` Daniel Vetter
  2015-09-02 15:26           ` Daniel Vetter
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Vetter @ 2015-09-02 15:13 UTC (permalink / raw)
  To: Chris Wilson, Michel Thierry, Michał Winiarski, intel-gfx,
	Mika Kuoppala

On Wed, Sep 02, 2015 at 02:46:41PM +0100, Chris Wilson wrote:
> On Wed, Sep 02, 2015 at 02:40:03PM +0100, Michel Thierry wrote:
> > On 9/1/2015 10:06 AM, Michał Winiarski wrote:
> > >On each call to gen8_alloc_va_range_3lvl we're allocating temporary
> > >bitmaps needed for error handling. Unfortunately, when we increase
> > >address space size (48b ppgtt) we do additional (512 - 4) calls to
> > >kcalloc, increasing latency between exec and actual start of execution
> > >on the GPU. Let's just do a single kcalloc, we can also drop the size
> > >from free_gen8_temp_bitmaps since it's no longer used.
> > >
> > >v2: Use GFP_TEMPORARY to make the allocations reclaimable.
> > >v3: Drop the 2D array, just allocate a single block.
> > >
> > >Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > >Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > >Cc: Michel Thierry <michel.thierry@intel.com>
> > >Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> > 
> > Unless Chris thinks otherwise, I see Michał already addressed his comments.
> 
> I think I spotted a misaligned bracket... ;)

checkpatch didn't spot it and neither me ...

> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-02 15:13         ` Daniel Vetter
@ 2015-09-02 15:26           ` Daniel Vetter
  2015-09-02 15:46             ` [PATCH v4] " Michał Winiarski
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Vetter @ 2015-09-02 15:26 UTC (permalink / raw)
  To: Chris Wilson, Michel Thierry, Michał Winiarski, intel-gfx,
	Mika Kuoppala

On Wed, Sep 02, 2015 at 05:13:29PM +0200, Daniel Vetter wrote:
> On Wed, Sep 02, 2015 at 02:46:41PM +0100, Chris Wilson wrote:
> > On Wed, Sep 02, 2015 at 02:40:03PM +0100, Michel Thierry wrote:
> > > On 9/1/2015 10:06 AM, Michał Winiarski wrote:
> > > >On each call to gen8_alloc_va_range_3lvl we're allocating temporary
> > > >bitmaps needed for error handling. Unfortunately, when we increase
> > > >address space size (48b ppgtt) we do additional (512 - 4) calls to
> > > >kcalloc, increasing latency between exec and actual start of execution
> > > >on the GPU. Let's just do a single kcalloc, we can also drop the size
> > > >from free_gen8_temp_bitmaps since it's no longer used.
> > > >
> > > >v2: Use GFP_TEMPORARY to make the allocations reclaimable.
> > > >v3: Drop the 2D array, just allocate a single block.
> > > >
> > > >Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > >Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > >Cc: Michel Thierry <michel.thierry@intel.com>
> > > >Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
> > > 
> > > Unless Chris thinks otherwise, I see Michał already addressed his comments.
> > 
> > I think I spotted a misaligned bracket... ;)
> 
> checkpatch didn't spot it and neither me ...
> 
> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Queued for -next, thanks for the patch.

Strike that, patch doesn't compile too well on plain dinq. What's going on
here? And there doesn't seem to be anything in -nightly which isn't in
dinq ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-02 15:26           ` Daniel Vetter
@ 2015-09-02 15:46             ` Michał Winiarski
  2015-09-02 16:05               ` Chris Wilson
  2015-09-03 17:22               ` [PATCH v5] " Michał Winiarski
  0 siblings, 2 replies; 15+ messages in thread
From: Michał Winiarski @ 2015-09-02 15:46 UTC (permalink / raw)
  To: intel-gfx, Daniel Vetter

On each call to gen8_alloc_va_range_3lvl we're allocating temporary
bitmaps needed for error handling. Unfortunately, when we increase
address space size (48b ppgtt) we do additional (512 - 4) calls to
kcalloc, increasing latency between exec and actual start of execution
on the GPU. Let's just do a single kcalloc, we can also drop the size
from free_gen8_temp_bitmaps since it's no longer used.

v2: Use GFP_TEMPORARY to make the allocations reclaimable.
v3: Drop the 2D array, just allocate a single block.
v4: Rebase to handle gen8_preallocate_top_level_pdps.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 49 ++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bdb7adc..8885ceb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1164,13 +1164,8 @@ unwind_out:
 }
 
 static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
-		       uint32_t pdpes)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long *new_pts)
 {
-	int i;
-
-	for (i = 0; i < pdpes; i++)
-		kfree(new_pts[i]);
 	kfree(new_pts);
 	kfree(new_pds);
 }
@@ -1180,29 +1175,20 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
  */
 static
 int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
-					 unsigned long ***new_pts,
+					 unsigned long **new_pts,
 					 uint32_t pdpes)
 {
-	int i;
 	unsigned long *pds;
-	unsigned long **pts;
+	unsigned long *pts;
 
-	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
+	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY);
 	if (!pds)
 		return -ENOMEM;
 
-	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
-	if (!pts) {
-		kfree(pds);
-		return -ENOMEM;
-	}
-
-	for (i = 0; i < pdpes; i++) {
-		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
-				 sizeof(unsigned long), GFP_KERNEL);
-		if (!pts[i])
-			goto err_out;
-	}
+	pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
+			sizeof(unsigned long), GFP_TEMPORARY);
+	if (!pts)
+		goto err_out;
 
 	*new_pds = pds;
 	*new_pts = pts;
@@ -1210,7 +1196,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts, pdpes);
+	free_gen8_temp_bitmaps(pds, pts);
 	return -ENOMEM;
 }
 
@@ -1231,7 +1217,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	unsigned long *new_page_dirs, **new_page_tables;
+	unsigned long *new_page_dirs, *new_page_tables;
 	struct drm_device *dev = vm->dev;
 	struct i915_page_directory *pd;
 	const uint64_t orig_start = start;
@@ -1258,14 +1244,14 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
 						new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 		return ret;
 	}
 
 	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
-						new_page_tables[pdpe]);
+						new_page_tables + pdpe * BITS_TO_LONGS(I915_PDES));
 		if (ret)
 			goto err_out;
 	}
@@ -1316,20 +1302,21 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 		gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return 0;
 
 err_out:
 	while (pdpe--) {
-		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
+		for_each_set_bit(temp, new_page_tables + pdpe *
+				BITS_TO_LONGS(I915_PDES), I915_PDES)
 			free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		free_pd(dev, pdp->page_directory[pdpe]);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
@@ -1481,7 +1468,7 @@ static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 
 static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
 {
-	unsigned long *new_page_dirs, **new_page_tables;
+	unsigned long *new_page_dirs, *new_page_tables;
 	uint32_t pdpes = I915_PDPES_PER_PDP(dev);
 	int ret;
 
@@ -1501,7 +1488,7 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
 	if (!ret)
 		*ppgtt->pdp.used_pdpes = *new_page_dirs;
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 
 	return ret;
 }
-- 
2.4.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-02 15:46             ` [PATCH v4] " Michał Winiarski
@ 2015-09-02 16:05               ` Chris Wilson
  2015-09-03 17:22               ` [PATCH v5] " Michał Winiarski
  1 sibling, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2015-09-02 16:05 UTC (permalink / raw)
  To: Michał Winiarski; +Cc: intel-gfx

On Wed, Sep 02, 2015 at 05:46:38PM +0200, Michał Winiarski wrote:
> +	pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
> +			sizeof(unsigned long), GFP_TEMPORARY);
> +	if (!pts)
> +		goto err_out;

This is the oddly aligned bracket!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v5] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-02 15:46             ` [PATCH v4] " Michał Winiarski
  2015-09-02 16:05               ` Chris Wilson
@ 2015-09-03 17:22               ` Michał Winiarski
  2015-09-03 20:48                 ` Chris Wilson
  1 sibling, 1 reply; 15+ messages in thread
From: Michał Winiarski @ 2015-09-03 17:22 UTC (permalink / raw)
  To: intel-gfx, Daniel Vetter

On each call to gen8_alloc_va_range_3lvl we're allocating temporary
bitmaps needed for error handling. Unfortunately, when we increase
address space size (48b ppgtt) we do additional (512 - 4) calls to
kcalloc, increasing latency between exec and actual start of execution
on the GPU. Let's just do a single kcalloc, we can also drop the size
from free_gen8_temp_bitmaps since it's no longer used.

v2: Use GFP_TEMPORARY to make the allocations reclaimable.
v3: Drop the 2D array, just allocate a single block.
v4: Rebase to handle gen8_preallocate_top_level_pdps.
v5: Align misaligned bracket.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 49 ++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bdb7adc..b40a9cb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1164,13 +1164,8 @@ unwind_out:
 }
 
 static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
-		       uint32_t pdpes)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long *new_pts)
 {
-	int i;
-
-	for (i = 0; i < pdpes; i++)
-		kfree(new_pts[i]);
 	kfree(new_pts);
 	kfree(new_pds);
 }
@@ -1180,29 +1175,20 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
  */
 static
 int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
-					 unsigned long ***new_pts,
+					 unsigned long **new_pts,
 					 uint32_t pdpes)
 {
-	int i;
 	unsigned long *pds;
-	unsigned long **pts;
+	unsigned long *pts;
 
-	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
+	pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_TEMPORARY);
 	if (!pds)
 		return -ENOMEM;
 
-	pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
-	if (!pts) {
-		kfree(pds);
-		return -ENOMEM;
-	}
-
-	for (i = 0; i < pdpes; i++) {
-		pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
-				 sizeof(unsigned long), GFP_KERNEL);
-		if (!pts[i])
-			goto err_out;
-	}
+	pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
+		      sizeof(unsigned long), GFP_TEMPORARY);
+	if (!pts)
+		goto err_out;
 
 	*new_pds = pds;
 	*new_pts = pts;
@@ -1210,7 +1196,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
 	return 0;
 
 err_out:
-	free_gen8_temp_bitmaps(pds, pts, pdpes);
+	free_gen8_temp_bitmaps(pds, pts);
 	return -ENOMEM;
 }
 
@@ -1231,7 +1217,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(vm, struct i915_hw_ppgtt, base);
-	unsigned long *new_page_dirs, **new_page_tables;
+	unsigned long *new_page_dirs, *new_page_tables;
 	struct drm_device *dev = vm->dev;
 	struct i915_page_directory *pd;
 	const uint64_t orig_start = start;
@@ -1258,14 +1244,14 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 	ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
 						new_page_dirs);
 	if (ret) {
-		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+		free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 		return ret;
 	}
 
 	/* For every page directory referenced, allocate page tables */
 	gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
 		ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
-						new_page_tables[pdpe]);
+						new_page_tables + pdpe * BITS_TO_LONGS(I915_PDES));
 		if (ret)
 			goto err_out;
 	}
@@ -1316,20 +1302,21 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 		gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
 	}
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return 0;
 
 err_out:
 	while (pdpe--) {
-		for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
+		for_each_set_bit(temp, new_page_tables + pdpe *
+				BITS_TO_LONGS(I915_PDES), I915_PDES)
 			free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
 	}
 
 	for_each_set_bit(pdpe, new_page_dirs, pdpes)
 		free_pd(dev, pdp->page_directory[pdpe]);
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 	mark_tlbs_dirty(ppgtt);
 	return ret;
 }
@@ -1481,7 +1468,7 @@ static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 
 static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
 {
-	unsigned long *new_page_dirs, **new_page_tables;
+	unsigned long *new_page_dirs, *new_page_tables;
 	uint32_t pdpes = I915_PDPES_PER_PDP(dev);
 	int ret;
 
@@ -1501,7 +1488,7 @@ static int gen8_preallocate_top_level_pdps(struct i915_hw_ppgtt *ppgtt)
 	if (!ret)
 		*ppgtt->pdp.used_pdpes = *new_page_dirs;
 
-	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
+	free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
 
 	return ret;
 }
-- 
2.4.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v5] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-03 17:22               ` [PATCH v5] " Michał Winiarski
@ 2015-09-03 20:48                 ` Chris Wilson
  2015-09-04  7:53                   ` Daniel Vetter
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2015-09-03 20:48 UTC (permalink / raw)
  To: Michał Winiarski; +Cc: intel-gfx

On Thu, Sep 03, 2015 at 07:22:18PM +0200, Michał Winiarski wrote:
> +	pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
> +		      sizeof(unsigned long), GFP_TEMPORARY);

Something to remember is that kcalloc is written presuming that the size
argument (the second) is constant.

pts = kcalloc(pdpes,
	      BITS_TO_LONGS(I915_PDES) * sizeof(unsigned long),
	      GFP_TEMPORARY);

should be infinitesimally more efficient.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps
  2015-09-03 20:48                 ` Chris Wilson
@ 2015-09-04  7:53                   ` Daniel Vetter
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2015-09-04  7:53 UTC (permalink / raw)
  To: Chris Wilson, Michał Winiarski, intel-gfx, Daniel Vetter,
	Mika Kuoppala, Michel Thierry

On Thu, Sep 03, 2015 at 09:48:03PM +0100, Chris Wilson wrote:
> On Thu, Sep 03, 2015 at 07:22:18PM +0200, Michał Winiarski wrote:
> > +	pts = kcalloc(pdpes * BITS_TO_LONGS(I915_PDES),
> > +		      sizeof(unsigned long), GFP_TEMPORARY);
> 
> Something to remember is that kcalloc is written presuming that the size
> argument (the second) is constant.
> 
> pts = kcalloc(pdpes,
> 	      BITS_TO_LONGS(I915_PDES) * sizeof(unsigned long),
> 	      GFP_TEMPORARY);
> 
> should be infinitesimally more efficient.

That's also better style from a security pov sinc kcalloc checks for
overflows. Which means if you multiply yourself things might overflow and
go boom. Luckily pdpdes is guaranteed to be small enough here, but still
secure coding best practices means you better not multiply the variable.
Hence I changed this.

> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-09-04  7:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-31 16:27 [PATCH] drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps Michał Winiarski
2015-08-31 16:40 ` Chris Wilson
2015-08-31 16:59 ` [PATCH v2] " Michał Winiarski
2015-08-31 18:42   ` Chris Wilson
2015-09-01  9:03     ` Michał Winiarski
2015-09-01  9:06   ` [PATCH v3] " Michał Winiarski
2015-09-02 13:40     ` Michel Thierry
2015-09-02 13:46       ` Chris Wilson
2015-09-02 15:13         ` Daniel Vetter
2015-09-02 15:26           ` Daniel Vetter
2015-09-02 15:46             ` [PATCH v4] " Michał Winiarski
2015-09-02 16:05               ` Chris Wilson
2015-09-03 17:22               ` [PATCH v5] " Michał Winiarski
2015-09-03 20:48                 ` Chris Wilson
2015-09-04  7:53                   ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.