From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Intel-gfx@lists.freedesktop.org
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>,
	Eero Tamminen <eero.t.tamminen@intel.com>,
	dri-devel@lists.freedesktop.org,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Matthew Auld <matthew.auld@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: [PATCH 2/2] drm/i915: Use Transparent Hugepages when IOMMU is enabled
Date: Thu, 29 Jul 2021 12:18:48 +0100
Message-ID: <20210729111848.729888-2-tvrtko.ursulin@linux.intel.com>
In-Reply-To: <20210729111848.729888-1-tvrtko.ursulin@linux.intel.com>

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Usage of Transparent Hugepages was disabled in 9987da4b5dcf
("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it
appears that the majority of performance regressions reported with an
enabled IOMMU can be almost eliminated by turning them on, let's do that
by adding a couple of Kconfig options.

To err on the side of safety we keep the current default in cases where
IOMMU is not active, and default to the "huge=within_size" mode only when
it is. Although there would probably be wins from enabling them throughout,
more extensive testing across benchmarks and platforms would need to be
done first.

With the patch and IOMMU enabled, my local testing on a small Skylake part
shows the OglVSTangent regression reduced from ~14% to ~2%.

v2:
 * Add Kconfig dependency to transparent hugepages and some help text.
 * Move to helper for easier handling of kernel build options.

References: b901bb89324a ("drm/i915/gemfs: enable THP")
References: 9987da4b5dcf ("drm/i915: Disable THP until we have a GPU read BW W/A")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
Co-developed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> # v1
---
 drivers/gpu/drm/i915/Kconfig.profile  | 73 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gemfs.c | 27 ++++++++--
 2 files changed, 97 insertions(+), 3 deletions(-)
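
For readers skimming the diff, the selection logic described in the commit
message can be sketched as a small userspace analogue. This is only an
illustration under stated assumptions: pick_mount_opts(), THP_NATIVE and
THP_IOMMU are hypothetical stand-ins for the patch's helper and for the
CONFIG_DRM_I915_THP_* strings, and the boolean replaces the
intel_vtd_active() check. The values mirror the defaults proposed above.

```c
#include <stdbool.h>

/* Assumed stand-ins for the Kconfig strings, using the proposed defaults. */
#define THP_NATIVE "never"
#define THP_IOMMU  "within_size"

/*
 * Hypothetical userspace analogue of the helper in the patch: build the
 * "huge=<mode>" option string at compile time via string literal
 * concatenation and pick one depending on whether the IOMMU is active.
 */
static const char *pick_mount_opts(bool iommu_active)
{
	static const char thp_native[] = "huge=" THP_NATIVE;
	static const char thp_iommu[] = "huge=" THP_IOMMU;

	return iommu_active ? thp_iommu : thp_native;
}
```

In the patch itself the chosen string is passed as the data argument of
vfs_kern_mount(), so the private shmemfs instance parses it like a normal
tmpfs mount option.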

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 39328567c200..d49ee794732f 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -119,3 +119,76 @@ config DRM_I915_TIMESLICE_DURATION
 	  /sys/class/drm/card?/engine/*/timeslice_duration_ms
 
 	  May be 0 to disable timeslicing.
+
+choice
+	prompt "Transparent Hugepage Support (native)"
+	default DRM_I915_THP_NATIVE_NEVER
+	depends on TRANSPARENT_HUGEPAGE
+	help
+	  Select the preferred method for allocating from Transparent Hugepages
+	  when IOMMU is not enabled.
+
+	config DRM_I915_THP_NATIVE_NEVER
+	bool "Never"
+	help
+	  Disable using THP for system memory allocations, individually
+	  allocating each 4K chunk as a separate page. It is unlikely that such
+	  individual allocations will return contiguous memory.
+
+	config DRM_I915_THP_NATIVE_WITHIN
+	bool "Within size"
+	help
+	  Allocate whole 2M superpages while those chunks do not exceed the
+	  object size. The remainder of the object will be allocated from 4K
+	  pages. No overallocation.
+
+	config DRM_I915_THP_NATIVE_ALWAYS
+	bool "Always"
+	help
+	  Allocate the whole object using 2M superpages, even if the object does
+	  not require an exact number of superpages.
+
+endchoice
+
+config DRM_I915_THP_NATIVE
+	string
+	default "always" if DRM_I915_THP_NATIVE_ALWAYS
+	default "within_size" if DRM_I915_THP_NATIVE_WITHIN
+	default "never" if DRM_I915_THP_NATIVE_NEVER
+
+choice
+	prompt "Transparent Hugepage Support (IOMMU)"
+	default DRM_I915_THP_IOMMU_WITHIN if TRANSPARENT_HUGEPAGE=y
+	default DRM_I915_THP_IOMMU_NEVER if TRANSPARENT_HUGEPAGE=n
+	depends on TRANSPARENT_HUGEPAGE
+	help
+	  Select the preferred method for allocating from Transparent Hugepages
+	  with IOMMU active.
+
+	config DRM_I915_THP_IOMMU_NEVER
+	bool "Never"
+	help
+	  Disable using THP for system memory allocations, individually
+	  allocating each 4K chunk as a separate page. It is unlikely that such
+	  individual allocations will return contiguous memory.
+
+	config DRM_I915_THP_IOMMU_WITHIN
+	bool "Within size"
+	help
+	  Allocate whole 2M superpages while those chunks do not exceed the
+	  object size. The remainder of the object will be allocated from 4K
+	  pages. No overallocation.
+
+	config DRM_I915_THP_IOMMU_ALWAYS
+	bool "Always"
+	help
+	  Allocate the whole object using 2M superpages, even if the object does
+	  not require an exact number of superpages.
+
+endchoice
+
+config DRM_I915_THP_IOMMU
+	string
+	default "always" if DRM_I915_THP_IOMMU_ALWAYS
+	default "within_size" if DRM_I915_THP_IOMMU_WITHIN
+	default "never" if DRM_I915_THP_IOMMU_NEVER
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
index 5e6e8c91ab38..871cbfb02fdf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
+++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
@@ -11,6 +11,26 @@
 #include "i915_drv.h"
 #include "i915_gemfs.h"
 
+#if defined(CONFIG_DRM_I915_THP_NATIVE) && defined(CONFIG_DRM_I915_THP_IOMMU)
+static char *gemfd_mount_opts(struct drm_i915_private *i915)
+{
+	static char thp_native[] = "huge=" CONFIG_DRM_I915_THP_NATIVE;
+	static char thp_iommu[] = "huge=" CONFIG_DRM_I915_THP_IOMMU;
+	char *opts;
+
+	opts = intel_vtd_active() ? thp_iommu : thp_native;
+	drm_info(&i915->drm, "Transparent Hugepage mode '%s'\n", opts);
+
+	return opts;
+}
+#else
+static char *gemfd_mount_opts(struct drm_i915_private *i915)
+{
+	return NULL;
+}
+#endif
+
+
 int i915_gemfs_init(struct drm_i915_private *i915)
 {
 	struct file_system_type *type;
@@ -26,10 +46,11 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 	 *
 	 * One example, although it is probably better with a per-file
 	 * control, is selecting huge page allocations ("huge=within_size").
-	 * Currently unused due to bandwidth issues (slow reads) on Broadwell+.
+	 * However, we only do so with the IOMMU active (to offset its lookup
+	 * overhead), due to bandwidth issues (slow reads) on Broadwell+.
 	 */
-
-	gemfs = kern_mount(type);
+	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name,
+			       gemfd_mount_opts(i915));
 	if (IS_ERR(gemfs))
 		return PTR_ERR(gemfs);
 
-- 
2.30.2
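
The three modes described in the Kconfig help text can be illustrated with a
little page arithmetic. The sketch below is an idealised model, not what
shmem actually does: it assumes every 2M allocation succeeds and ignores
alignment and fallback to 4K pages. plan_pages(), struct page_plan and
enum thp_mode are hypothetical names introduced only for this example.

```c
#include <stddef.h>

#define SZ_4K (4UL * 1024)
#define SZ_2M (2UL * 1024 * 1024)

enum thp_mode { THP_NEVER, THP_WITHIN_SIZE, THP_ALWAYS };

struct page_plan {
	size_t huge_pages;	/* number of 2M superpages */
	size_t small_pages;	/* number of 4K pages */
};

/* Idealised page count for an object of 'size' bytes in each mode. */
static struct page_plan plan_pages(size_t size, enum thp_mode mode)
{
	struct page_plan plan = { 0, 0 };
	size_t rest = size;

	switch (mode) {
	case THP_ALWAYS:
		/* Round up to whole superpages, overallocating the tail. */
		plan.huge_pages = (size + SZ_2M - 1) / SZ_2M;
		rest = 0;
		break;
	case THP_WITHIN_SIZE:
		/* Only superpages that fit entirely inside the object. */
		plan.huge_pages = size / SZ_2M;
		rest = size % SZ_2M;
		break;
	case THP_NEVER:
		break;
	}

	/* Whatever is left is backed by 4K pages. */
	plan.small_pages = (rest + SZ_4K - 1) / SZ_4K;
	return plan;
}
```

For a 5M object this model gives 1280 small pages for "never", two
superpages plus 256 small pages for "within_size", and three superpages
(1M of overallocation) for "always".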




Thread overview: 18+ messages
2021-07-29 11:18 [PATCH 1/2] drm/i915/selftests: fixup igt_shrink_thp Tvrtko Ursulin
2021-07-29 11:18 ` [Intel-gfx] " Tvrtko Ursulin
2021-07-29 11:18 ` Tvrtko Ursulin [this message]
2021-07-29 11:18   ` [Intel-gfx] [PATCH 2/2] drm/i915: Use Transparent Hugepages when IOMMU is enabled Tvrtko Ursulin
2021-07-29 12:07   ` Daniel Vetter
2021-07-29 12:07     ` [Intel-gfx] " Daniel Vetter
2021-07-29 12:21     ` Tvrtko Ursulin
2021-07-29 12:21       ` [Intel-gfx] " Tvrtko Ursulin
2021-07-29 12:28       ` Daniel Vetter
2021-07-29 12:28         ` [Intel-gfx] " Daniel Vetter
2021-07-29 11:48 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/selftests: fixup igt_shrink_thp Patchwork
2021-07-29 12:16 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-07-29 20:53 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-07-29 13:34 [PATCH 1/2] " Tvrtko Ursulin
2021-07-29 13:34 ` [PATCH 2/2] drm/i915: Use Transparent Hugepages when IOMMU is enabled Tvrtko Ursulin
2021-07-29 14:06   ` Daniel Vetter
2021-09-03 12:47     ` Tvrtko Ursulin
2021-09-07  8:42       ` Daniel Vetter
2021-09-07  9:34         ` Tvrtko Ursulin
