All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>, Intel-gfx@lists.freedesktop.org
Cc: Eero Tamminen <eero.t.tamminen@intel.com>,
	dri-devel@lists.freedesktop.org,
	Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Subject: Re: [PATCH 1/2] drm/i915: Enable THP on Icelake and beyond
Date: Mon, 9 May 2022 14:07:11 +0100	[thread overview]
Message-ID: <2deb2d24-034c-e2e7-4b2c-7bf501529a8c@linux.intel.com> (raw)
In-Reply-To: <5aea48fb-8b80-4873-5e37-64bec9562e46@intel.com>


On 09/05/2022 11:49, Matthew Auld wrote:
> On 29/04/2022 11:04, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We have a statement from HW designers that the GPU read regression when
>> using 2M pages was fixed from Icelake onwards, which was also confirmed
>> by bencharking Eero did last year:
>>
>> """
>> When IOMMU is disabled, enabling THP causes following perf changes on
>> TGL-H (GT1):
>>
>>      10-15% SynMark Batch[0-3]
>>      5-10% MemBW GPU texture, SynMark ShMapVsm
>>      3-5% SynMark TerrainFly* + Geom* + Fill* + CSCloth + Batch4
>>      1-3% GpuTest Triangle, SynMark TexMem* + DeferredAA + Batch[5-7]
>>            + few others
>>      -7% MemBW GPU blend
>>
>> In the above 3D benchmark names, * means all the variants of tests with
>> the same prefix. For example "SynMark TexMem*", means both TexMem128 &
>> TexMem512 tests in the synthetic (Intel internal) SynMark test suite.
>>
>> In the (public, but proprietary) GfxBench & GLB(enchmark) test suites,
>> there are both onscreen and offscreen variants of each test. Unless
>> explicitly stated otherwise, numbers are for both variants.
>>
>> All tests are run with FullHD monitor. All tests are fullscreen except
>> for GLB and GpuTest ones, which are run in 1/2 screen window (GpuTest
>> triangle is run both in fullscreen and 1/2 screen window).
>> """
>>
>> Since the only regression is MemBW GPU blend, against many more gains,
>> it sounds it is time to enable THP on Gen11+.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> 
> fwiw, for the series,
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Thanks! With a statement from hw arch, benchmark results from Eero and a 
r-b from you, I think it is justified to push this so I have. Lets see 
if someone notices an improvement.

Regards,

Tvrtko

> 
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gemfs.c | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c 
>> b/drivers/gpu/drm/i915/gem/i915_gemfs.c
>> index ee87874e59dc..c5a6bbc842fc 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
>> @@ -28,12 +28,14 @@ int i915_gemfs_init(struct drm_i915_private *i915)
>>        *
>>        * One example, although it is probably better with a per-file
>>        * control, is selecting huge page allocations 
>> ("huge=within_size").
>> -     * However, we only do so to offset the overhead of iommu lookups
>> -     * due to bandwidth issues (slow reads) on Broadwell+.
>> +     * However, we only do so on platforms which benefit from it, or to
>> +     * offset the overhead of iommu lookups, where with latter it is 
>> a net
>> +     * win even on platforms which would otherwise see some performance
>> +     * regressions such a slow reads issue on Broadwell and Skylake.
>>        */
>>       opts = NULL;
>> -    if (i915_vtd_active(i915)) {
>> +    if (GRAPHICS_VER(i915) >= 11 || i915_vtd_active(i915)) {
>>           if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>>               opts = huge_opt;
>>               drm_info(&i915->drm,
>> @@ -41,7 +43,10 @@ int i915_gemfs_init(struct drm_i915_private *i915)
>>                    opts);
>>           } else {
>>               drm_notice(&i915->drm,
>> -                   "Transparent Hugepage support is recommended for 
>> optimal performance when IOMMU is enabled!\n");
>> +                   "Transparent Hugepage support is recommended for 
>> optimal performance%s\n",
>> +                   GRAPHICS_VER(i915) >= 11 ?
>> +                   " on this platform!" :
>> +                   " when IOMMU is enabled!");
>>           }
>>       }

WARNING: multiple messages have this Message-ID (diff)
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>, Intel-gfx@lists.freedesktop.org
Cc: Eero Tamminen <eero.t.tamminen@intel.com>,
	dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 1/2] drm/i915: Enable THP on Icelake and beyond
Date: Mon, 9 May 2022 14:07:11 +0100	[thread overview]
Message-ID: <2deb2d24-034c-e2e7-4b2c-7bf501529a8c@linux.intel.com> (raw)
In-Reply-To: <5aea48fb-8b80-4873-5e37-64bec9562e46@intel.com>


On 09/05/2022 11:49, Matthew Auld wrote:
> On 29/04/2022 11:04, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We have a statement from HW designers that the GPU read regression when
>> using 2M pages was fixed from Icelake onwards, which was also confirmed
>> by bencharking Eero did last year:
>>
>> """
>> When IOMMU is disabled, enabling THP causes following perf changes on
>> TGL-H (GT1):
>>
>>      10-15% SynMark Batch[0-3]
>>      5-10% MemBW GPU texture, SynMark ShMapVsm
>>      3-5% SynMark TerrainFly* + Geom* + Fill* + CSCloth + Batch4
>>      1-3% GpuTest Triangle, SynMark TexMem* + DeferredAA + Batch[5-7]
>>            + few others
>>      -7% MemBW GPU blend
>>
>> In the above 3D benchmark names, * means all the variants of tests with
>> the same prefix. For example "SynMark TexMem*", means both TexMem128 &
>> TexMem512 tests in the synthetic (Intel internal) SynMark test suite.
>>
>> In the (public, but proprietary) GfxBench & GLB(enchmark) test suites,
>> there are both onscreen and offscreen variants of each test. Unless
>> explicitly stated otherwise, numbers are for both variants.
>>
>> All tests are run with FullHD monitor. All tests are fullscreen except
>> for GLB and GpuTest ones, which are run in 1/2 screen window (GpuTest
>> triangle is run both in fullscreen and 1/2 screen window).
>> """
>>
>> Since the only regression is MemBW GPU blend, against many more gains,
>> it sounds it is time to enable THP on Gen11+.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> 
> fwiw, for the series,
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Thanks! With a statement from hw arch, benchmark results from Eero and a 
r-b from you, I think it is justified to push this so I have. Lets see 
if someone notices an improvement.

Regards,

Tvrtko

> 
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gemfs.c | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c 
>> b/drivers/gpu/drm/i915/gem/i915_gemfs.c
>> index ee87874e59dc..c5a6bbc842fc 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
>> @@ -28,12 +28,14 @@ int i915_gemfs_init(struct drm_i915_private *i915)
>>        *
>>        * One example, although it is probably better with a per-file
>>        * control, is selecting huge page allocations 
>> ("huge=within_size").
>> -     * However, we only do so to offset the overhead of iommu lookups
>> -     * due to bandwidth issues (slow reads) on Broadwell+.
>> +     * However, we only do so on platforms which benefit from it, or to
>> +     * offset the overhead of iommu lookups, where with latter it is 
>> a net
>> +     * win even on platforms which would otherwise see some performance
>> +     * regressions such a slow reads issue on Broadwell and Skylake.
>>        */
>>       opts = NULL;
>> -    if (i915_vtd_active(i915)) {
>> +    if (GRAPHICS_VER(i915) >= 11 || i915_vtd_active(i915)) {
>>           if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>>               opts = huge_opt;
>>               drm_info(&i915->drm,
>> @@ -41,7 +43,10 @@ int i915_gemfs_init(struct drm_i915_private *i915)
>>                    opts);
>>           } else {
>>               drm_notice(&i915->drm,
>> -                   "Transparent Hugepage support is recommended for 
>> optimal performance when IOMMU is enabled!\n");
>> +                   "Transparent Hugepage support is recommended for 
>> optimal performance%s\n",
>> +                   GRAPHICS_VER(i915) >= 11 ?
>> +                   " on this platform!" :
>> +                   " when IOMMU is enabled!");
>>           }
>>       }

  reply	other threads:[~2022-05-09 13:07 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-29 10:04 [PATCH 1/2] drm/i915: Enable THP on Icelake and beyond Tvrtko Ursulin
2022-04-29 10:04 ` [Intel-gfx] " Tvrtko Ursulin
2022-04-29 10:04 ` [PATCH 2/2] drm/i915: Only setup private tmpfs mount when needed and fix logging Tvrtko Ursulin
2022-04-29 10:04   ` [Intel-gfx] " Tvrtko Ursulin
2022-04-29 13:19 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/2] drm/i915: Enable THP on Icelake and beyond Patchwork
2022-04-29 13:52 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-05-03 11:00 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/2] drm/i915: Enable THP on Icelake and beyond (rev2) Patchwork
2022-05-03 11:24 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-05-04  9:08 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-05-04  9:52 ` [Intel-gfx] ✓ Fi.CI.IGT: success " Patchwork
2022-05-09 10:49 ` [PATCH 1/2] drm/i915: Enable THP on Icelake and beyond Matthew Auld
2022-05-09 10:49   ` [Intel-gfx] " Matthew Auld
2022-05-09 13:07   ` Tvrtko Ursulin [this message]
2022-05-09 13:07     ` Tvrtko Ursulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2deb2d24-034c-e2e7-4b2c-7bf501529a8c@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=Intel-gfx@lists.freedesktop.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=eero.t.tamminen@intel.com \
    --cc=matthew.auld@intel.com \
    --cc=tvrtko.ursulin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.