All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Trim the object sg table
@ 2016-11-09 14:30 Tvrtko Ursulin
  2016-11-09 14:44 ` Chris Wilson
  2016-11-09 16:45 ` ✗ Fi.CI.BAT: warning for drm/i915: Trim the object sg table (rev2) Patchwork
  0 siblings, 2 replies; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-11-09 14:30 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

At the moment we allocate enough sg table entries assuming we
will not be able to do any coallescing. But since in practice
we most often can, and more so very effectively, this ends up
wasting a lot of memory.

A simple and effective way of trimming the over-allocated
entries is to copy the table over to a new one allocated to the
exact size.

Experiment on my freshly logged and idle desktop (KDE) showed
that by doing this we can save approximately 1 MiB of RAM, or
when running a typical benchmark like gl_manhattan I have
even seen a 6 MiB saving.

v2:
 * Update commit message.
 * Use temporary sg_table on stack. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d2ad73d0b5b9..411aae535abe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2232,6 +2232,28 @@ static unsigned int swiotlb_max_size(void)
 #endif
 }
 
+static void i915_sg_trim(struct sg_table *orig_st)
+{
+	struct sg_table new_st;
+	struct scatterlist *sg, *new_sg;
+	unsigned int i;
+
+	if (orig_st->nents == orig_st->orig_nents)
+		return;
+
+	if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL))
+		return;
+
+	new_sg = new_st.sgl;
+	for_each_sg(orig_st->sgl, sg, orig_st->nents, i) {
+		sg_set_page(new_sg, sg_page(sg), sg->length, 0);
+		new_sg = sg_next(new_sg);
+	}
+
+	sg_free_table(orig_st);
+	memcpy(orig_st, &new_st, sizeof(*orig_st));
+}
+
 static struct sg_table *
 i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 {
@@ -2317,6 +2339,9 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	if (sg) /* loop terminated early; short sg table */
 		sg_mark_end(sg);
 
+	/* Trim unused sg entries to avoid wasting memory. */
+	i915_sg_trim(st);
+
 	ret = i915_gem_gtt_prepare_pages(obj, st);
 	if (ret)
 		goto err_pages;
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Trim the object sg table
  2016-11-09 14:30 [PATCH] drm/i915: Trim the object sg table Tvrtko Ursulin
@ 2016-11-09 14:44 ` Chris Wilson
  2016-11-09 15:07   ` Tvrtko Ursulin
  2016-11-09 15:13   ` [PATCH v3] " Tvrtko Ursulin
  2016-11-09 16:45 ` ✗ Fi.CI.BAT: warning for drm/i915: Trim the object sg table (rev2) Patchwork
  1 sibling, 2 replies; 7+ messages in thread
From: Chris Wilson @ 2016-11-09 14:44 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Wed, Nov 09, 2016 at 02:30:02PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> At the moment we allocate enough sg table entries assuming we
> will not be able to do any coallescing. But since in practice
> we most often can, and more so very effectively, this ends up
> wasting a lot of memory.
> 
> A simple and effective way of trimming the over-allocated
> entries is to copy the table over to a new one allocated to the
> exact size.
> 
> Experiment on my freshly logged and idle desktop (KDE) showed
Experiments
> that by doing this we can save approximately 1 MiB of RAM, or
> when running a typical benchmark like gl_manhattan I have
> even seen a 6 MiB saving.

More complicated techniques such as only copying the last used page and
freeing the rest are left to the reader.

> v2:
>  * Update commit message.
>  * Use temporary sg_table on stack. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d2ad73d0b5b9..411aae535abe 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2232,6 +2232,28 @@ static unsigned int swiotlb_max_size(void)
>  #endif
>  }
>  
> +static void i915_sg_trim(struct sg_table *orig_st)
> +{
> +	struct sg_table new_st;
> +	struct scatterlist *sg, *new_sg;
> +	unsigned int i;
> +
> +	if (orig_st->nents == orig_st->orig_nents)
> +		return;
> +
> +	if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL))
> +		return;
> +
> +	new_sg = new_st.sgl;
> +	for_each_sg(orig_st->sgl, sg, orig_st->nents, i) {
> +		sg_set_page(new_sg, sg_page(sg), sg->length, 0);
> +		new_sg = sg_next(new_sg);

Worth a
	/* called before being DMA mapped, no need to copy sg->dma_* */
?

> +	}
> +
> +	sg_free_table(orig_st);
> +	memcpy(orig_st, &new_st, sizeof(*orig_st));

I would have used *orig_st = new;

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
           ^ I remembered it this time!
Took a couple of attempts to spell my name right though.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Trim the object sg table
  2016-11-09 14:44 ` Chris Wilson
@ 2016-11-09 15:07   ` Tvrtko Ursulin
  2016-11-09 15:13     ` Chris Wilson
  2016-11-09 15:13   ` [PATCH v3] " Tvrtko Ursulin
  1 sibling, 1 reply; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-11-09 15:07 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx, Tvrtko Ursulin


On 09/11/2016 14:44, Chris Wilson wrote:
> On Wed, Nov 09, 2016 at 02:30:02PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> At the moment we allocate enough sg table entries assuming we
>> will not be able to do any coallescing. But since in practice
>> we most often can, and more so very effectively, this ends up
>> wasting a lot of memory.
>>
>> A simple and effective way of trimming the over-allocated
>> entries is to copy the table over to a new one allocated to the
>> exact size.
>>
>> Experiment on my freshly logged and idle desktop (KDE) showed
> Experiments
>> that by doing this we can save approximately 1 MiB of RAM, or
>> when running a typical benchmark like gl_manhattan I have
>> even seen a 6 MiB saving.
>
> More complicated techniques such as only copying the last used page and
> freeing the rest are left to the reader.

Yes that would need to go into the core kernel since it needs access to 
static alloc/free functions and performance benefit might be quite small 
for that. Typically I see coalescing working really well so the delta in 
saved allocations and frees would be quite small. Perhaps I need to 
attempt to fragment my memory a lot to see what happens then.

>> v2:
>>  * Update commit message.
>>  * Use temporary sg_table on stack. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>  drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index d2ad73d0b5b9..411aae535abe 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2232,6 +2232,28 @@ static unsigned int swiotlb_max_size(void)
>>  #endif
>>  }
>>
>> +static void i915_sg_trim(struct sg_table *orig_st)
>> +{
>> +	struct sg_table new_st;
>> +	struct scatterlist *sg, *new_sg;
>> +	unsigned int i;
>> +
>> +	if (orig_st->nents == orig_st->orig_nents)
>> +		return;
>> +
>> +	if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL))
>> +		return;
>> +
>> +	new_sg = new_st.sgl;
>> +	for_each_sg(orig_st->sgl, sg, orig_st->nents, i) {
>> +		sg_set_page(new_sg, sg_page(sg), sg->length, 0);
>> +		new_sg = sg_next(new_sg);
>
> Worth a
> 	/* called before being DMA mapped, no need to copy sg->dma_* */
> ?

Hm, or something safer than a comment. Unfortunately entries are not 
zeroed by default to enable a GEM_BUG_ON here. Unless CONFIG_GEM_DEBUG 
could mean GFP_ZERO added to some our allocations. :)

Yeah I think comment is the best option as long as this function is 
static only. Will add.

>
>> +	}
>> +
>> +	sg_free_table(orig_st);
>> +	memcpy(orig_st, &new_st, sizeof(*orig_st));
>
> I would have used *orig_st = new;

It is more readable, agreed.

>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>            ^ I remembered it this time!
> Took a couple of attempts to spell my name right though.

Thanks! I assume I can keep it for the above little changes.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Trim the object sg table
  2016-11-09 15:07   ` Tvrtko Ursulin
@ 2016-11-09 15:13     ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2016-11-09 15:13 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Wed, Nov 09, 2016 at 03:07:38PM +0000, Tvrtko Ursulin wrote:
> 
> On 09/11/2016 14:44, Chris Wilson wrote:
> >Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >           ^ I remembered it this time!
> >Took a couple of attempts to spell my name right though.
> 
> Thanks! I assume I can keep it for the above little changes.

Yes.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v3] drm/i915: Trim the object sg table
  2016-11-09 14:44 ` Chris Wilson
  2016-11-09 15:07   ` Tvrtko Ursulin
@ 2016-11-09 15:13   ` Tvrtko Ursulin
  1 sibling, 0 replies; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-11-09 15:13 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

At the moment we allocate enough sg table entries assuming we
will not be able to do any coalescing. But since in practice
we most often can, and more so very effectively, this ends up
wasting a lot of memory.

A simple and effective way of trimming the over-allocated
entries is to copy the table over to a new one allocated to the
exact size.

Experiments on my freshly logged and idle desktop (KDE) showed
that by doing this we can save approximately 1 MiB of RAM, or
when running a typical benchmark like gl_manhattan I have
even seen a 6 MiB saving.

More complicated techniques such as only copying the last used
page and freeing the rest are left to the reader.

v2:
 * Update commit message.
 * Use temporary sg_table on stack. (Chris Wilson)

v3:
 * Commit message update.
 * Comment added.
 * Replace memcpy with copy assignment.
   (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d2ad73d0b5b9..1c20edba7f2a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2232,6 +2232,30 @@ static unsigned int swiotlb_max_size(void)
 #endif
 }
 
+static void i915_sg_trim(struct sg_table *orig_st)
+{
+	struct sg_table new_st;
+	struct scatterlist *sg, *new_sg;
+	unsigned int i;
+
+	if (orig_st->nents == orig_st->orig_nents)
+		return;
+
+	if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL))
+		return;
+
+	new_sg = new_st.sgl;
+	for_each_sg(orig_st->sgl, sg, orig_st->nents, i) {
+		sg_set_page(new_sg, sg_page(sg), sg->length, 0);
+		/* called before being DMA mapped, no need to copy sg->dma_* */
+		new_sg = sg_next(new_sg);
+	}
+
+	sg_free_table(orig_st);
+
+	*orig_st = new_st;
+}
+
 static struct sg_table *
 i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 {
@@ -2317,6 +2341,9 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	if (sg) /* loop terminated early; short sg table */
 		sg_mark_end(sg);
 
+	/* Trim unused sg entries to avoid wasting memory. */
+	i915_sg_trim(st);
+
 	ret = i915_gem_gtt_prepare_pages(obj, st);
 	if (ret)
 		goto err_pages;
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* ✗ Fi.CI.BAT: warning for drm/i915: Trim the object sg table (rev2)
  2016-11-09 14:30 [PATCH] drm/i915: Trim the object sg table Tvrtko Ursulin
  2016-11-09 14:44 ` Chris Wilson
@ 2016-11-09 16:45 ` Patchwork
  2016-11-10  9:30   ` Tvrtko Ursulin
  1 sibling, 1 reply; 7+ messages in thread
From: Patchwork @ 2016-11-09 16:45 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Trim the object sg table (rev2)
URL   : https://patchwork.freedesktop.org/series/15036/
State : warning

== Summary ==

Series 15036v2 drm/i915: Trim the object sg table
https://patchwork.freedesktop.org/api/1.0/series/15036/revisions/2/mbox/

Test drv_module_reload_basic:
                pass       -> DMESG-WARN (fi-bsw-n3050)

fi-bdw-5557u     total:244  pass:229  dwarn:0   dfail:0   fail:0   skip:15 
fi-bsw-n3050     total:244  pass:203  dwarn:1   dfail:0   fail:0   skip:40 
fi-bxt-t5700     total:244  pass:216  dwarn:0   dfail:0   fail:0   skip:28 
fi-byt-j1900     total:244  pass:216  dwarn:0   dfail:0   fail:0   skip:28 
fi-byt-n2820     total:244  pass:212  dwarn:0   dfail:0   fail:0   skip:32 
fi-hsw-4770      total:244  pass:224  dwarn:0   dfail:0   fail:0   skip:20 
fi-hsw-4770r     total:244  pass:224  dwarn:0   dfail:0   fail:0   skip:20 
fi-ilk-650       total:244  pass:191  dwarn:0   dfail:0   fail:0   skip:53 
fi-ivb-3520m     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22 
fi-ivb-3770      total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22 
fi-kbl-7200u     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22 
fi-skl-6260u     total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14 
fi-skl-6700hq    total:244  pass:223  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6700k     total:244  pass:222  dwarn:1   dfail:0   fail:0   skip:21 
fi-skl-6770hq    total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14 
fi-snb-2520m     total:244  pass:212  dwarn:0   dfail:0   fail:0   skip:32 
fi-snb-2600      total:244  pass:211  dwarn:0   dfail:0   fail:0   skip:33 

74d13d4fab710f664d5eeb15fd3de821a7f46818 drm-intel-nightly: 2016y-11m-09d-15h-02m-46s UTC integration manifest
2af81e6 drm/i915: Trim the object sg table

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_2944/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ✗ Fi.CI.BAT: warning for drm/i915: Trim the object sg table (rev2)
  2016-11-09 16:45 ` ✗ Fi.CI.BAT: warning for drm/i915: Trim the object sg table (rev2) Patchwork
@ 2016-11-10  9:30   ` Tvrtko Ursulin
  0 siblings, 0 replies; 7+ messages in thread
From: Tvrtko Ursulin @ 2016-11-10  9:30 UTC (permalink / raw)
  To: intel-gfx, Tvrtko Ursulin


On 09/11/2016 16:45, Patchwork wrote:
> == Series Details ==
>
> Series: drm/i915: Trim the object sg table (rev2)
> URL   : https://patchwork.freedesktop.org/series/15036/
> State : warning
>
> == Summary ==
>
> Series 15036v2 drm/i915: Trim the object sg table
> https://patchwork.freedesktop.org/api/1.0/series/15036/revisions/2/mbox/
>
> Test drv_module_reload_basic:
>                 pass       -> DMESG-WARN (fi-bsw-n3050)

Unrelated, raised new BZ at 
https://bugs.freedesktop.org/show_bug.cgi?id=98670.

>
> fi-bdw-5557u     total:244  pass:229  dwarn:0   dfail:0   fail:0   skip:15
> fi-bsw-n3050     total:244  pass:203  dwarn:1   dfail:0   fail:0   skip:40
> fi-bxt-t5700     total:244  pass:216  dwarn:0   dfail:0   fail:0   skip:28
> fi-byt-j1900     total:244  pass:216  dwarn:0   dfail:0   fail:0   skip:28
> fi-byt-n2820     total:244  pass:212  dwarn:0   dfail:0   fail:0   skip:32
> fi-hsw-4770      total:244  pass:224  dwarn:0   dfail:0   fail:0   skip:20
> fi-hsw-4770r     total:244  pass:224  dwarn:0   dfail:0   fail:0   skip:20
> fi-ilk-650       total:244  pass:191  dwarn:0   dfail:0   fail:0   skip:53
> fi-ivb-3520m     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22
> fi-ivb-3770      total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22
> fi-kbl-7200u     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22
> fi-skl-6260u     total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14
> fi-skl-6700hq    total:244  pass:223  dwarn:0   dfail:0   fail:0   skip:21
> fi-skl-6700k     total:244  pass:222  dwarn:1   dfail:0   fail:0   skip:21
> fi-skl-6770hq    total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14
> fi-snb-2520m     total:244  pass:212  dwarn:0   dfail:0   fail:0   skip:32
> fi-snb-2600      total:244  pass:211  dwarn:0   dfail:0   fail:0   skip:33
>
> 74d13d4fab710f664d5eeb15fd3de821a7f46818 drm-intel-nightly: 2016y-11m-09d-15h-02m-46s UTC integration manifest
> 2af81e6 drm/i915: Trim the object sg table

Merged to dinq, thanks for the review!

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-11-10  9:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-09 14:30 [PATCH] drm/i915: Trim the object sg table Tvrtko Ursulin
2016-11-09 14:44 ` Chris Wilson
2016-11-09 15:07   ` Tvrtko Ursulin
2016-11-09 15:13     ` Chris Wilson
2016-11-09 15:13   ` [PATCH v3] " Tvrtko Ursulin
2016-11-09 16:45 ` ✗ Fi.CI.BAT: warning for drm/i915: Trim the object sg table (rev2) Patchwork
2016-11-10  9:30   ` Tvrtko Ursulin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.