* [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages
@ 2024-04-28 12:40 Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 1/8] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

This series introduces support for big and super pages in V3D. The V3D MMU
supports 64KB and 1MB pages, called big pages and super pages respectively,
which are currently not used. This patchset enables them. The advantage of big
and super pages is that if any entry for a page within a big/super page is
cached in the MMU, it will be used for the translation of all virtual addresses
in the range of that big/super page, without requiring any other entries to be
fetched.

Big/Super pages essentially mean slightly better performance for users,
especially in applications with high memory requirements (e.g. applications
that use multiple large BOs).

Using a Raspberry Pi 4 (with a PAGE_SIZE=4KB downstream kernel), when running
traces from multiple applications, we were able to see the following
improvements:

fps_avg  helped:  warzone2100.70secs.1024x768.trace:                    1.98 -> 2.42     (22.19%) (v1: 2.56)
fps_avg  helped:  warzone2100.30secs.1024x768.trace:                    2 -> 2.45        (21.96%) (v1: 2.39)
fps_avg  helped:  supertuxkart-menus_1024x768.trace:                    123.12 -> 128.39 (4.28%)  (v1: 125.50)
fps_avg  helped:  vkQuake_capture_frames_1_through_1200_1280x720.gfxr:  61.26 -> 62.89   (2.67%)  (v1: 61.36)
fps_avg  helped:  quake2-gles3-1280x720.trace:                          63.42 -> 64.86   (2.27%)  (v1: 64.29)
fps_avg  helped:  ue4_sun_temple_640x480.gfxr:                          25.15 -> 25.63   (1.89%)  (v1: 24.94)
fps_avg  helped:  ue4_shooter_game_high_quality_640x480.gfxr:           19.28 -> 19.61   (1.72%)  (v1: 19.02)
fps_avg  helped:  ue4_shooter_game_shooting_high_quality_640x480.gfxr:  23.58 -> 23.91   (1.39%)  (v1: 23.34)
fps_avg  helped:  quake3e_capture_frames_1_through_1800_1920x1080.gfxr: 61.40 -> 62.00   (0.96%)  (v1: -)
fps_avg  helped:  serious_sam_trace02_1280x720.gfxr:                    49.41 -> 49.84   (0.86%)  (v1: 47.74)
fps_avg  helped:  supertuxkart-racing_1024x768.trace:                   8.69 -> 8.74     (0.56%)  (v1: -)
fps_avg  helped:  sponza_demo01_800x600.gfxr:                           17.55 -> 17.64   (0.50%)  (v1: -)
fps_avg  helped:  quake2-gl1.4-1280x720.trace:                          36.43 -> 36.58   (0.43%)  (v1: 36.57)
fps_avg  helped:  quake3e-1280x720.trace:                               94.49 -> 94.87   (0.40%)  (v1: -)
fps_avg  helped:  sponza_demo02_800x600.gfxr:                           19.79 -> 19.87   (0.39%)  (v1: -)

Using a Raspberry Pi 5 (with a PAGE_SIZE=16KB downstream kernel), when running
traces from multiple applications, we were able to see the following
improvements:

fps_avg  helped:  warzone2100.30secs.1024x768.trace:                       4.40 -> 4.95     (12.58%) (v1: 4.49)
fps_avg  helped:  vkQuake_capture_frames_1_through_1200_1280x720.gfxr:     135.05 -> 141.21 (4.56%)  (v1: 139.45)
fps_avg  helped:  supertuxkart-menus_1024x768.trace:                       291.73 -> 302.05 (3.54%)  (v1: 303.80)
fps_avg  helped:  quake2-gles3-1280x720.trace:                             148.90 -> 153.57 (3.13%)  (v1: 156.13)
fps_avg  helped:  quake2-gl1.4-1280x720.trace:                             82.60 -> 84.46   (2.25%)  (v1: -)
fps_avg  helped:  ue4_shooter_game_shooting_low_quality_640x480.gfxr:      49.59 -> 50.54   (1.92%)  (v1: 47.30)
fps_avg  helped:  ue4_shooter_game_shooting_high_quality_640x480.gfxr:     51.03 -> 51.93   (1.76%)  (v1: 50.46)
fps_avg  helped:  ue4_shooter_game_high_quality_640x480.gfxr:              31.15 -> 31.64   (1.60%)  (v1: 31.05)
fps_avg  helped:  ue4_sun_temple_640x480.gfxr:                             40.26 -> 40.88   (1.54%)  (v1: 40.23)
fps_avg  helped:  sponza_demo01_800x600.gfxr:                              43.23 -> 43.84   (1.40%)  (v1: 43.68)
fps_avg  helped:  warzone2100.70secs.1024x768.trace:                       4.36 -> 4.42     (1.39%)  (v1: -)
fps_avg  helped:  sponza_demo02_800x600.gfxr:                              48.94 -> 49.51   (1.17%)  (v1: 49.34)
fps_avg  helped:  quake3e_capture_frames_1_through_1800_1920x1080.gfxr:    162.11 -> 163.13 (0.63%)  (v1: 165.71)
fps_avg  helped:  quake3e-1280x720.trace:                                  229.82 -> 231.00 (0.51%)  (v1: 234.51)
fps_avg  helped:  supertuxkart-racing_1024x768.trace:                      20.42 -> 20.48   (0.30%)  (v1: 20.59)
fps_avg  helped:  quake3e_capture_frames_1800_through_2400_1920x1080.gfxr: 157.45 -> 157.61 (0.10%)  (v1: 160.35)
fps_avg  helped:  serious_sam_trace02_1280x720.gfxr:                       59.99 -> 60.02   (0.05%)  (v1: -)

When glancing at the percentage improvement, one might initially perceive v2
as causing a performance downgrade. However, that assumption doesn't hold true.
In the case of v1, we were using downstream kernel version 6.1, whereas now,
with the kernel updated to 6.6 for v2, there's a small uptick in performance.
This indicates an enhancement in the baseline scenario, rather than any detriment
caused by v2. Additionally, I've included stats from v1 in the comparisons. Upon
scrutinizing the average FPS of v2 in contrast to v1, it becomes evident that v2
not only maintains the improvements but may even surpass them.

This version provides a much safer way to iterate through memory and doesn't
have the same limitations as v1. For example, v1 had a hard-coded hack that
only allowed a huge page to be created if the BO was bigger than 2MB. These
limitations are gone now.

This series also introduces changes in the GEM helpers, in order to enable V3D
to have a separate mount point for shmfs GEM objects. Any feedback from the
community about the changes in the GEM helpers is welcome!

v1 -> v2: https://lore.kernel.org/dri-devel/20240311100959.205545-1-mcanal@igalia.com/

* [1/6] Add Iago's R-b to PATCH 1/5 (Iago Toral)
* [2/6] Create a new function `drm_gem_object_init_with_mnt()` to define the
        shmfs mountpoint. Now, we don't touch a bunch of drivers, as
        `drm_gem_object_init()` preserves its signature (Tvrtko Ursulin)
* [3/6] Use `huge=within_size` instead of `huge=always`, in order to avoid OOM.
        This also allows us to move away from the 2MB hack. (Tvrtko Ursulin)
* [3/6] Add Iago's R-b to PATCH 3/5 (Iago Toral)
* [5/6] Create a separate patch to reduce the alignment of the node allocation
        (Iago Toral)
* [6/6] Complete refactoring to the way that we iterate through the memory
        (Tvrtko Ursulin)
* [6/6] Don't use drm_prime_get_contiguous_size(), as it could give us misleading
        data (Tvrtko Ursulin)
* [6/6] Use both Big Pages (64K) and Super Pages (1MB)

v2 -> v3: https://lore.kernel.org/dri-devel/20240405201753.1176914-1-mcanal@igalia.com/T/

* [2/8] Add Tvrtko's R-b to PATCH 2/8 (Tvrtko Ursulin)
* [4/8] Add Tvrtko's R-b to PATCH 4/8 (Tvrtko Ursulin)
* [6/8] Now, PATCH 6/8 regards supporting big/super pages when writing out PTEs
        (Tvrtko Ursulin)
* [6/8] s/page_address/pfn (Tvrtko Ursulin)
* [6/8] As `sg_dma_len()` returns `unsigned int`, then `len` must be `unsigned int`
        too (Tvrtko Ursulin)
* [6/8] `i` and `page_size` are `unsigned int` as well (Tvrtko Ursulin)
* [6/8] Move `i`, `page_prot` and `page_size` to the inner scope (Tvrtko Ursulin)
* [6/8] s/pte/page_address/ (Tvrtko Ursulin)
* [7/8] New patch: use gemfs/THP in BO creation if available
* [8/8] New patch: add modparam for turning off Big/Super Pages
* [8/8] Don't expose the modparam `super_pages` unless CONFIG_TRANSPARENT_HUGEPAGE
        is enabled (Tvrtko Ursulin)
* [8/8] Use `v3d->gemfs` to check if the user disabled Super Pages support
        (Tvrtko Ursulin)

v3 -> v4: https://lore.kernel.org/dri-devel/20240421215309.660018-1-mcanal@igalia.com/T/

* [5/8] Add Iago's R-b to PATCH 5/8 (Iago Toral)
* [6/8] Add Tvrtko's R-b to PATCH 6/8 (Tvrtko Ursulin)
* [7/8] Add Tvrtko's R-b to PATCH 7/8 (Tvrtko Ursulin)
* [8/8] Move `bool super_pages` to the guard (Tvrtko Ursulin)

Best Regards,
- Maíra

Maíra Canal (8):
  drm/v3d: Fix return if scheduler initialization fails
  drm/gem: Create a drm_gem_object_init_with_mnt() function
  drm/v3d: Introduce gemfs
  drm/gem: Create shmem GEM object in a given mountpoint
  drm/v3d: Reduce the alignment of the node allocation
  drm/v3d: Support Big/Super Pages when writing out PTEs
  drm/v3d: Use gemfs/THP in BO creation if available
  drm/v3d: Add modparam for turning off Big/Super Pages

 drivers/gpu/drm/drm_gem.c              | 34 +++++++++++++++--
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 +++++++++++++--
 drivers/gpu/drm/v3d/Makefile           |  3 +-
 drivers/gpu/drm/v3d/v3d_bo.c           | 21 ++++++++++-
 drivers/gpu/drm/v3d/v3d_drv.c          |  7 ++++
 drivers/gpu/drm/v3d/v3d_drv.h          | 12 +++++-
 drivers/gpu/drm/v3d/v3d_gem.c          |  6 ++-
 drivers/gpu/drm/v3d/v3d_gemfs.c        | 51 +++++++++++++++++++++++++
 drivers/gpu/drm/v3d/v3d_mmu.c          | 52 +++++++++++++++++++-------
 include/drm/drm_gem.h                  |  3 ++
 include/drm/drm_gem_shmem_helper.h     |  3 ++
 11 files changed, 195 insertions(+), 27 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

-- 
2.44.0



* [PATCH v4 1/8] drm/v3d: Fix return if scheduler initialization fails
  2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
@ 2024-04-28 12:40 ` Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function Maíra Canal
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

If the scheduler initialization fails, GEM initialization must fail as
well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
allocated and return the error value in `v3d_gem_init()`.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index da8faf3b9011..b3b76332f2c5 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -291,8 +291,9 @@ v3d_gem_init(struct drm_device *dev)
 	ret = v3d_sched_init(v3d);
 	if (ret) {
 		drm_mm_takedown(&v3d->mm);
-		dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
+		dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt,
 				  v3d->pt_paddr);
+		return ret;
 	}
 
 	return 0;
-- 
2.44.0



* [PATCH v4 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function
  2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 1/8] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
@ 2024-04-28 12:40 ` Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 3/8] drm/v3d: Introduce gemfs Maíra Canal
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal, Tvrtko Ursulin

For some applications, such as applications that use huge pages, we might
want to have a different mountpoint, for which we pass mount flags that
better match our usecase.

Therefore, create a new function `drm_gem_object_init_with_mnt()` that
allows us to define the tmpfs mountpoint where the GEM object will be
created. If this parameter is NULL, then we fall back to `shmem_file_setup()`.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/drm_gem.c | 34 ++++++++++++++++++++++++++++++----
 include/drm/drm_gem.h     |  3 +++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d4bbc5d109c8..74ebe68e3d61 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -114,22 +114,32 @@ drm_gem_init(struct drm_device *dev)
 }
 
 /**
- * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * drm_gem_object_init_with_mnt - initialize an allocated shmem-backed GEM
+ * object in a given shmfs mountpoint
+ *
  * @dev: drm_device the object should be initialized for
  * @obj: drm_gem_object to initialize
  * @size: object size
+ * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
+ * the usual tmpfs mountpoint (`shm_mnt`).
  *
  * Initialize an already allocated GEM object of the specified size with
  * shmfs backing store.
  */
-int drm_gem_object_init(struct drm_device *dev,
-			struct drm_gem_object *obj, size_t size)
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+				 struct drm_gem_object *obj, size_t size,
+				 struct vfsmount *gemfs)
 {
 	struct file *filp;
 
 	drm_gem_private_object_init(dev, obj, size);
 
-	filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+	if (gemfs)
+		filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
+						 VM_NORESERVE);
+	else
+		filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+
 	if (IS_ERR(filp))
 		return PTR_ERR(filp);
 
@@ -137,6 +147,22 @@ int drm_gem_object_init(struct drm_device *dev,
 
 	return 0;
 }
+EXPORT_SYMBOL(drm_gem_object_init_with_mnt);
+
+/**
+ * drm_gem_object_init - initialize an allocated shmem-backed GEM object
+ * @dev: drm_device the object should be initialized for
+ * @obj: drm_gem_object to initialize
+ * @size: object size
+ *
+ * Initialize an already allocated GEM object of the specified size with
+ * shmfs backing store.
+ */
+int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj,
+			size_t size)
+{
+	return drm_gem_object_init_with_mnt(dev, obj, size, NULL);
+}
 EXPORT_SYMBOL(drm_gem_object_init);
 
 /**
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index bae4865b2101..2ebf6e10cc44 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -472,6 +472,9 @@ void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
 			struct drm_gem_object *obj, size_t size);
+int drm_gem_object_init_with_mnt(struct drm_device *dev,
+				 struct drm_gem_object *obj, size_t size,
+				 struct vfsmount *gemfs);
 void drm_gem_private_object_init(struct drm_device *dev,
 				 struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_fini(struct drm_gem_object *obj);
-- 
2.44.0



* [PATCH v4 3/8] drm/v3d: Introduce gemfs
  2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 1/8] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function Maíra Canal
@ 2024-04-28 12:40 ` Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 4/8] drm/gem: Create shmem GEM object in a given mountpoint Maíra Canal
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and gives the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
---
 drivers/gpu/drm/v3d/Makefile    |  3 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++++++
 drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index b7d673f1153b..fcf710926057 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -13,7 +13,8 @@ v3d-y := \
 	v3d_trace_points.o \
 	v3d_sched.o \
 	v3d_sysfs.o \
-	v3d_submit.o
+	v3d_submit.o \
+	v3d_gemfs.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a2c516fe6d79..cef2f82b7a75 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -131,6 +131,11 @@ struct v3d_dev {
 	struct drm_mm mm;
 	spinlock_t mm_lock;
 
+	/*
+	 * tmpfs instance used for shmem backed objects
+	 */
+	struct vfsmount *gemfs;
+
 	struct work_struct overflow_mem_work;
 
 	struct v3d_bin_job *bin_job;
@@ -532,6 +537,10 @@ void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_clean_caches(struct v3d_dev *v3d);
 
+/* v3d_gemfs.c */
+void v3d_gemfs_init(struct v3d_dev *v3d);
+void v3d_gemfs_fini(struct v3d_dev *v3d);
+
 /* v3d_submit.c */
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index b3b76332f2c5..b1e681630ded 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -288,6 +288,8 @@ v3d_gem_init(struct drm_device *dev)
 	v3d_init_hw_state(v3d);
 	v3d_mmu_set_page_table(v3d);
 
+	v3d_gemfs_init(v3d);
+
 	ret = v3d_sched_init(v3d);
 	if (ret) {
 		drm_mm_takedown(&v3d->mm);
@@ -305,6 +307,7 @@ v3d_gem_destroy(struct drm_device *dev)
 	struct v3d_dev *v3d = to_v3d_dev(dev);
 
 	v3d_sched_fini(v3d);
+	v3d_gemfs_fini(v3d);
 
 	/* Waiting for jobs to finish would need to be done before
 	 * unregistering V3D.
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
new file mode 100644
index 000000000000..31cf5bd11e39
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (C) 2024 Raspberry Pi */
+
+#include <linux/fs.h>
+#include <linux/mount.h>
+
+#include "v3d_drv.h"
+
+void v3d_gemfs_init(struct v3d_dev *v3d)
+{
+	char huge_opt[] = "huge=within_size";
+	struct file_system_type *type;
+	struct vfsmount *gemfs;
+
+	/*
+	 * By creating our own shmemfs mountpoint, we can pass in
+	 * mount flags that better match our usecase. However, we
+	 * only do so on platforms which benefit from it.
+	 */
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		goto err;
+
+	type = get_fs_type("tmpfs");
+	if (!type)
+		goto err;
+
+	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
+	if (IS_ERR(gemfs))
+		goto err;
+
+	v3d->gemfs = gemfs;
+	drm_info(&v3d->drm, "Using Transparent Hugepages\n");
+
+	return;
+
+err:
+	v3d->gemfs = NULL;
+	drm_notice(&v3d->drm,
+		   "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
+}
+
+void v3d_gemfs_fini(struct v3d_dev *v3d)
+{
+	if (v3d->gemfs)
+		kern_unmount(v3d->gemfs);
+}
-- 
2.44.0



* [PATCH v4 4/8] drm/gem: Create shmem GEM object in a given mountpoint
  2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
                   ` (2 preceding siblings ...)
  2024-04-28 12:40 ` [PATCH v4 3/8] drm/v3d: Introduce gemfs Maíra Canal
@ 2024-04-28 12:40 ` Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 5/8] drm/v3d: Reduce the alignment of the node allocation Maíra Canal
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal, Tvrtko Ursulin

Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that has a mountpoint as an argument. This
function will create a shmem GEM object in a given tmpfs mountpoint.

This function will be useful for drivers that have a special mountpoint
with flags enabled.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++++++++++++++++++++++----
 include/drm/drm_gem_shmem_helper.h     |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 177773bcdbfd..10b7c4c769a3 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -49,7 +49,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs = {
 };
 
 static struct drm_gem_shmem_object *
-__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
+__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private,
+		       struct vfsmount *gemfs)
 {
 	struct drm_gem_shmem_object *shmem;
 	struct drm_gem_object *obj;
@@ -76,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
 		drm_gem_private_object_init(dev, obj, size);
 		shmem->map_wc = false; /* dma-buf mappings use always writecombine */
 	} else {
-		ret = drm_gem_object_init(dev, obj, size);
+		ret = drm_gem_object_init_with_mnt(dev, obj, size, gemfs);
 	}
 	if (ret) {
 		drm_gem_private_object_fini(obj);
@@ -123,10 +124,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
  */
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size)
 {
-	return __drm_gem_shmem_create(dev, size, false);
+	return __drm_gem_shmem_create(dev, size, false, NULL);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
 
+/**
+ * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a
+ * given mountpoint
+ * @dev: DRM device
+ * @size: Size of the object to allocate
+ * @gemfs: tmpfs mount where the GEM object will be created
+ *
+ * This function creates a shmem GEM object in a given tmpfs mountpoint.
+ *
+ * Returns:
+ * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev,
+							   size_t size,
+							   struct vfsmount *gemfs)
+{
+	return __drm_gem_shmem_create(dev, size, false, gemfs);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt);
+
 /**
  * drm_gem_shmem_free - Free resources associated with a shmem GEM object
  * @shmem: shmem GEM object to free
@@ -760,7 +782,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
 	size_t size = PAGE_ALIGN(attach->dmabuf->size);
 	struct drm_gem_shmem_object *shmem;
 
-	shmem = __drm_gem_shmem_create(dev, size, true);
+	shmem = __drm_gem_shmem_create(dev, size, true, NULL);
 	if (IS_ERR(shmem))
 		return ERR_CAST(shmem);
 
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index efbc9f27312b..d22e3fb53631 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -97,6 +97,9 @@ struct drm_gem_shmem_object {
 	container_of(obj, struct drm_gem_shmem_object, base)
 
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size);
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev,
+							   size_t size,
+							   struct vfsmount *gemfs);
 void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem);
 
 void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem);
-- 
2.44.0



* [PATCH v4 5/8] drm/v3d: Reduce the alignment of the node allocation
  2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
                   ` (3 preceding siblings ...)
  2024-04-28 12:40 ` [PATCH v4 4/8] drm/gem: Create shmem GEM object in a given mountpoint Maíra Canal
@ 2024-04-28 12:40 ` Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs Maíra Canal
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

Currently, we are using an alignment of 128 kB to insert a node, which
ends up wasting memory as we perform plenty of small BO allocations
(<= 4 kB). As allocations must be aligned to 128 kB, any allocation
smaller than that wastes the difference.

This implies that we cannot effectively use the whole 4 GB address space
available for the GPU in the RPi 4. Currently, we can allocate up to
32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1.3 GB). This can be
quite limiting for applications that have a high memory requirement, such
as vkoverhead [1].

By reducing the page alignment to 4 kB, we can allocate up to 1000000 BOs
of 4 kB (~4 GB) and 10000 BOs of 400 kB (~4 GB). Moreover, by performing
benchmarks, we were able to attest that reducing the page alignment to
4 kB can provide a general performance improvement in OpenGL
applications (e.g. glmark2).
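
As a rough cross-check of the 4 kB figures above, the upper bound imposed by
the address space alone can be sketched in userspace C. This is only an
illustration, not driver code: `max_small_bos()` is a hypothetical helper, and
it assumes every BO is no larger than the alignment, so that each BO occupies
exactly one alignment-sized slot of the 4 GB space.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the driver): upper bound on the number
 * of BOs that fit in a GPU virtual address space of va_size bytes when the
 * start of every node must be aligned to align bytes and no BO is larger
 * than align.
 */
static uint64_t max_small_bos(uint64_t va_size, uint64_t align)
{
	assert(align != 0);

	return va_size / align;
}
```

Under these assumptions, `max_small_bos(4ULL << 30, 128 * 1024)` gives 32768
and `max_small_bos(4ULL << 30, 4 * 1024)` gives 1048576, in line with the
~32000 and ~1000000 BOs of 4 kB quoted above.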

Therefore, this patch reduces the alignment of the node allocation to 4
kB, which will allow RPi users to explore the whole 4GB virtual
address space provided by the hardware. Also, this patch allows users to
fully run vkoverhead in the RPi 4/5, solving the issue reported in [1].

[1] https://github.com/zmike/vkoverhead/issues/14

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 2 +-
 drivers/gpu/drm/v3d/v3d_drv.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..79e31c5299b1 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -110,7 +110,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	 */
 	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 					 obj->size >> V3D_MMU_PAGE_SHIFT,
-					 GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0);
+					 SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
 	spin_unlock(&v3d->mm_lock);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index cef2f82b7a75..e1f291db68de 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,8 +17,6 @@ struct clk;
 struct platform_device;
 struct reset_control;
 
-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
-- 
2.44.0



* [PATCH v4 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs
  2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
                   ` (4 preceding siblings ...)
  2024-04-28 12:40 ` [PATCH v4 5/8] drm/v3d: Reduce the alignment of the node allocation Maíra Canal
@ 2024-04-28 12:40 ` Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available Maíra Canal
  2024-04-28 12:40 ` [PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages Maíra Canal
  7 siblings, 0 replies; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal, Tvrtko Ursulin

The V3D MMU also supports 64KB and 1MB pages, called big and super pages,
respectively. In order to set a 64KB page or 1MB page in the MMU, we need
to make sure that the page table entries for all 4KB pages within a
big/super page are correctly configured.

In order to create a big/super page, we need a contiguous memory region.
That's why we use a separate mountpoint with THP enabled. In order to
place the page table entries in the MMU, we iterate over the 16 4KB pages
(for big pages) or 256 4KB pages (for super pages) and insert the PTE.
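
The page size selection described above can be sketched in userspace C as
follows. This is a simplified illustration rather than the driver code: the
helper names (`is_aligned()`, `pick_page_size()`, `map_chunk()`) and the plain
`pt[]` array are assumptions made for the example, while the PTE bit positions
mirror the `V3D_PTE_*` definitions in v3d_mmu.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MMU_PAGE_SHIFT 12		/* the MMU works in 4KB units */
#define PTE_SUPERPAGE  (1u << 31)
#define PTE_BIGPAGE    (1u << 30)
#define PTE_WRITEABLE  (1u << 29)
#define PTE_VALID      (1u << 28)

/*
 * Both the GPU virtual page index and the physical frame number must be
 * aligned to the candidate page size, expressed in 4KB units.
 */
static int is_aligned(uint32_t page, uint32_t pfn, size_t size)
{
	uint32_t mask = (uint32_t)(size >> MMU_PAGE_SHIFT) - 1;

	return ((page | pfn) & mask) == 0;
}

/*
 * Hypothetical helper: return the largest page size usable for the next
 * mapping; *flags receives the matching PTE bits.
 */
static size_t pick_page_size(uint32_t page, uint32_t pfn, size_t len,
			     uint32_t *flags)
{
	*flags = PTE_WRITEABLE | PTE_VALID;

	if (len >= (1u << 20) && is_aligned(page, pfn, 1u << 20)) {
		*flags |= PTE_SUPERPAGE;
		return 1u << 20;
	}
	if (len >= (64u << 10) && is_aligned(page, pfn, 64u << 10)) {
		*flags |= PTE_BIGPAGE;
		return 64u << 10;
	}
	return 4u << 10;
}

/*
 * Fill pt[] for one contiguous chunk of len bytes (assumed to be a
 * multiple of 4KB). Every 4KB slot inside a big/super page gets its own
 * PTE carrying the same size flag. Returns the number of PTEs written.
 */
static size_t map_chunk(uint32_t *pt, uint32_t page, uint32_t pfn, size_t len)
{
	size_t written = 0;

	assert(len % (4u << 10) == 0);

	while (len > 0) {
		uint32_t flags;
		size_t size = pick_page_size(page, pfn, len, &flags);
		size_t n = size >> MMU_PAGE_SHIFT;

		for (size_t i = 0; i < n; i++)
			pt[page++] = flags | pfn++;

		written += n;
		len -= size;
	}

	return written;
}
```

In this sketch, a 1MB + 64KB chunk that starts at an aligned virtual page and
an aligned physical frame is written as 256 super page entries followed by 16
big page entries. If either address is misaligned, the same chunk falls back
to smaller pages until alignment is reached.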

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h |  1 +
 drivers/gpu/drm/v3d/v3d_mmu.c | 52 ++++++++++++++++++++++++++---------
 2 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index e1f291db68de..3276eef280ef 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -18,6 +18,7 @@ struct platform_device;
 struct reset_control;
 
 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)
 
 #define V3D_MAX_QUEUES (V3D_CPU + 1)
 
diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 14f3af40d6f6..2e0b31e373b2 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -25,9 +25,16 @@
  * superpage bit set.
  */
 #define V3D_PTE_SUPERPAGE BIT(31)
+#define V3D_PTE_BIGPAGE BIT(30)
 #define V3D_PTE_WRITEABLE BIT(29)
 #define V3D_PTE_VALID BIT(28)
 
+static bool v3d_mmu_is_aligned(u32 page, u32 page_address, size_t alignment)
+{
+	return IS_ALIGNED(page, alignment >> V3D_MMU_PAGE_SHIFT) &&
+		IS_ALIGNED(page_address, alignment >> V3D_MMU_PAGE_SHIFT);
+}
+
 static int v3d_mmu_flush_all(struct v3d_dev *v3d)
 {
 	int ret;
@@ -87,19 +94,38 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
 	struct drm_gem_shmem_object *shmem_obj = &bo->base;
 	struct v3d_dev *v3d = to_v3d_dev(shmem_obj->base.dev);
 	u32 page = bo->node.start;
-	u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
-	struct sg_dma_page_iter dma_iter;
-
-	for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
-		dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
-		u32 page_address = dma_addr >> V3D_MMU_PAGE_SHIFT;
-		u32 pte = page_prot | page_address;
-		u32 i;
-
-		BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
-		       BIT(24));
-		for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
-			v3d->pt[page++] = pte + i;
+	struct scatterlist *sgl;
+	unsigned int count;
+
+	for_each_sgtable_dma_sg(shmem_obj->sgt, sgl, count) {
+		dma_addr_t dma_addr = sg_dma_address(sgl);
+		u32 pfn = dma_addr >> V3D_MMU_PAGE_SHIFT;
+		unsigned int len = sg_dma_len(sgl);
+
+		while (len > 0) {
+			u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
+			u32 page_address = page_prot | pfn;
+			unsigned int i, page_size;
+
+			BUG_ON(pfn + V3D_PAGE_FACTOR >= BIT(24));
+
+			if (len >= SZ_1M && v3d_mmu_is_aligned(page, page_address, SZ_1M)) {
+				page_size = SZ_1M;
+				page_address |= V3D_PTE_SUPERPAGE;
+			} else if (len >= SZ_64K && v3d_mmu_is_aligned(page, page_address, SZ_64K)) {
+				page_size = SZ_64K;
+				page_address |= V3D_PTE_BIGPAGE;
+			} else {
+				page_size = SZ_4K;
+			}
+
+			for (i = 0; i < page_size >> V3D_MMU_PAGE_SHIFT; i++) {
+				v3d->pt[page++] = page_address + i;
+				pfn++;
+			}
+
+			len -= page_size;
+		}
 	}
 
 	WARN_ON_ONCE(page - bo->node.start !=
-- 
2.44.0
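
The page-size selection in the loop above can be modelled in user space. What follows is a minimal sketch, not the driver code: the names `both_aligned`, `pick_page_size`, `walk_segment` and the `SKETCH_*` macros are hypothetical, and it assumes that every segment length is a multiple of 4KB. It only shows how a contiguous segment is split into 1MB, 64KB and 4KB chunks depending on the alignment of both the GPU page index and the physical frame number.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u          /* 4KB MMU pages, like V3D_MMU_PAGE_SHIFT */
#define SKETCH_SZ_4K  (1u << 12)
#define SKETCH_SZ_64K (1u << 16)
#define SKETCH_SZ_1M  (1u << 20)

/* Both the GPU page index and the physical frame number must sit on a
 * boundary of the candidate page size, expressed in 4KB pages. */
static bool both_aligned(uint32_t gpu_page, uint32_t pfn, uint32_t size)
{
	uint32_t mask = (size >> SKETCH_PAGE_SHIFT) - 1;

	return (gpu_page & mask) == 0 && (pfn & mask) == 0;
}

/* Largest page size usable for the next chunk of a contiguous segment. */
static uint32_t pick_page_size(uint32_t gpu_page, uint32_t pfn, size_t len)
{
	if (len >= SKETCH_SZ_1M && both_aligned(gpu_page, pfn, SKETCH_SZ_1M))
		return SKETCH_SZ_1M;
	if (len >= SKETCH_SZ_64K && both_aligned(gpu_page, pfn, SKETCH_SZ_64K))
		return SKETCH_SZ_64K;
	return SKETCH_SZ_4K;
}

/* Hypothetical bookkeeping: how many chunks of each size were chosen. */
struct chunk_counts {
	unsigned int super_pages;
	unsigned int big_pages;
	unsigned int small_pages;
};

/* Walk one contiguous segment, assuming len is a multiple of 4KB. */
static struct chunk_counts walk_segment(uint32_t gpu_page, uint32_t pfn,
					size_t len)
{
	struct chunk_counts counts = { 0, 0, 0 };

	while (len >= SKETCH_SZ_4K) {
		uint32_t size = pick_page_size(gpu_page, pfn, len);

		if (size == SKETCH_SZ_1M)
			counts.super_pages++;
		else if (size == SKETCH_SZ_64K)
			counts.big_pages++;
		else
			counts.small_pages++;

		/* Advance by the number of 4KB entries the chunk covers. */
		gpu_page += size >> SKETCH_PAGE_SHIFT;
		pfn += size >> SKETCH_PAGE_SHIFT;
		len -= size;
	}

	return counts;
}
```

In this model, a 1MB segment that starts at a 64KB-aligned but not 1MB-aligned position is covered by sixteen big pages rather than one super page, which illustrates why the alignment of the GPU virtual address node matters as much as physical contiguity.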



* [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available
  2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
                   ` (5 preceding siblings ...)
  2024-04-28 12:40 ` [PATCH v4 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs Maíra Canal
@ 2024-04-28 12:40 ` Maíra Canal
  2024-04-29  5:22   ` Iago Toral
  2024-04-28 12:40 ` [PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages Maíra Canal
  7 siblings, 1 reply; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal, Tvrtko Ursulin

Although Big/Super Pages could appear naturally, it is quite unlikely
that 1MB or 64KB of memory would be allocated contiguously by chance.
We can instead force the creation of large, contiguously allocated pages
by using a mountpoint with "huge=within_size" enabled.

As V3D has a mountpoint with "huge=within_size" (if the user has THP
enabled), use this mountpoint for BO creation when it is available. This
allows us to create large, contiguously allocated pages and make use of
Big/Super Pages.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_bo.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 79e31c5299b1..16ac26c31c6b 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	struct v3d_dev *v3d = to_v3d_dev(obj->dev);
 	struct v3d_bo *bo = to_v3d_bo(obj);
 	struct sg_table *sgt;
+	u64 align;
 	int ret;
 
 	/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,15 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	if (IS_ERR(sgt))
 		return PTR_ERR(sgt);
 
+	if (!v3d->gemfs)
+		align = SZ_4K;
+	else if (obj->size >= SZ_1M)
+		align = SZ_1M;
+	else if (obj->size >= SZ_64K)
+		align = SZ_64K;
+	else
+		align = SZ_4K;
+
 	spin_lock(&v3d->mm_lock);
 	/* Allocate the object's space in the GPU's page tables.
 	 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +120,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	 */
 	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 					 obj->size >> V3D_MMU_PAGE_SHIFT,
-					 SZ_4K >> V3D_MMU_PAGE_SHIFT, 0, 0);
+					 align >> V3D_MMU_PAGE_SHIFT, 0, 0);
 	spin_unlock(&v3d->mm_lock);
 	if (ret)
 		return ret;
@@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
 			     size_t unaligned_size)
 {
 	struct drm_gem_shmem_object *shmem_obj;
+	struct v3d_dev *v3d = to_v3d_dev(dev);
 	struct v3d_bo *bo;
 	int ret;
 
-	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+	/* Let the user opt out of allocating the BOs with THP */
+	if (v3d->gemfs)
+		shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
+							  v3d->gemfs);
+	else
+		shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+
 	if (IS_ERR(shmem_obj))
 		return ERR_CAST(shmem_obj);
 	bo = to_v3d_bo(&shmem_obj->base);
-- 
2.44.0
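
The alignment choice made in v3d_bo_create_finish() can be written as a small pure function. This is only a sketch for illustration: `bo_va_alignment`, `bo_va_alignment_pages` and the `BO_ALIGN_*` macros are hypothetical names, and the `have_gemfs` flag stands in for the `v3d->gemfs` check in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define BO_ALIGN_PAGE_SHIFT 12u        /* 4KB MMU pages, like V3D_MMU_PAGE_SHIFT */
#define BO_ALIGN_4K  (UINT64_C(1) << 12)
#define BO_ALIGN_64K (UINT64_C(1) << 16)
#define BO_ALIGN_1M  (UINT64_C(1) << 20)

/* Alignment, in bytes, requested for the BO's node in the GPU address
 * space. Without the huge-page mountpoint there is nothing to gain from
 * a larger alignment, so 4KB is used. */
static uint64_t bo_va_alignment(uint64_t bo_size, bool have_gemfs)
{
	if (!have_gemfs)
		return BO_ALIGN_4K;
	if (bo_size >= BO_ALIGN_1M)
		return BO_ALIGN_1M;
	if (bo_size >= BO_ALIGN_64K)
		return BO_ALIGN_64K;
	return BO_ALIGN_4K;
}

/* The same alignment expressed in 4KB MMU pages, which is the unit the
 * patch passes to drm_mm_insert_node_generic(). */
static uint64_t bo_va_alignment_pages(uint64_t bo_size, bool have_gemfs)
{
	return bo_va_alignment(bo_size, have_gemfs) >> BO_ALIGN_PAGE_SHIFT;
}
```

Aligning the node to the largest page size the BO could use is what allows the PTE-writing code to find both the GPU page index and the physical address aligned at the same time.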



* [PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages
  2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
                   ` (6 preceding siblings ...)
  2024-04-28 12:40 ` [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available Maíra Canal
@ 2024-04-28 12:40 ` Maíra Canal
  2024-04-29  8:24   ` Tvrtko Ursulin
  7 siblings, 1 reply; 12+ messages in thread
From: Maíra Canal @ 2024-04-28 12:40 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

Add a modparam for turning off Big/Super Pages, so that a user who
doesn't want Big/Super Pages enabled can disable them by setting the
modparam to false.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.c   | 7 +++++++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 5 +++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 28b7ddce7747..1a6e01235df6 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,13 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
+/* Only expose the `super_pages` modparam if THP is enabled. */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+bool super_pages = true;
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support.");
+#endif
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file_priv)
 {
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 31cf5bd11e39..0ade02bb7209 100644
--- a/drivers/gpu/drm/v3d/v3d_gemfs.c
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -11,6 +11,7 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
 	char huge_opt[] = "huge=within_size";
 	struct file_system_type *type;
 	struct vfsmount *gemfs;
+	extern bool super_pages;
 
 	/*
 	 * By creating our own shmemfs mountpoint, we can pass in
@@ -20,6 +21,10 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		goto err;
 
+	/* The user doesn't want to enable Super Pages */
+	if (!super_pages)
+		goto err;
+
 	type = get_fs_type("tmpfs");
 	if (!type)
 		goto err;
-- 
2.44.0
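
Since the parameter is registered with mode 0400, it should be readable by root through sysfs once the module is loaded. The sketch below is a hypothetical user-space helper, `read_bool_param`, for checking its value. It assumes the module is loaded under the name `v3d`, so that the file is /sys/module/v3d/parameters/super_pages, and it relies on the kernel printing boolean parameters as `Y` or `N`.

```c
#include <stdio.h>

/* Returns 1 if the parameter file starts with 'Y', 0 if it starts with
 * 'N', and -1 if the file cannot be opened or holds anything else. */
static int read_bool_param(const char *path)
{
	FILE *file = fopen(path, "r");
	int first;

	if (!file)
		return -1;

	first = fgetc(file);
	fclose(file);

	if (first == 'Y')
		return 1;
	if (first == 'N')
		return 0;
	return -1;
}
```

Calling `read_bool_param("/sys/module/v3d/parameters/super_pages")` as root would then report whether Super Pages were left enabled. A result of -1 can also mean that the kernel was built without CONFIG_TRANSPARENT_HUGEPAGE, in which case the patch does not expose the parameter at all. As the mode is read-only, the value has to be chosen at load time, for example with `super_pages=0` as a module option.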



* Re: [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available
  2024-04-28 12:40 ` [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available Maíra Canal
@ 2024-04-29  5:22   ` Iago Toral
  2024-04-29 16:52     ` Maíra Canal
  0 siblings, 1 reply; 12+ messages in thread
From: Iago Toral @ 2024-04-29  5:22 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Tvrtko Ursulin

Hi Maíra, 

a question below:

El dom, 28-04-2024 a las 09:40 -0300, Maíra Canal escribió:
> Although Big/Super Pages could appear naturally, it is quite unlikely
> that 1MB or 64KB of memory would be allocated contiguously by chance.
> We can instead force the creation of large, contiguously allocated
> pages by using a mountpoint with "huge=within_size" enabled.
> 
> As V3D has a mountpoint with "huge=within_size" (if the user has THP
> enabled), use this mountpoint for BO creation when it is available.
> This allows us to create large, contiguously allocated pages and make
> use of Big/Super Pages.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> ---
> 

(...)

> @@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
>  			     size_t unaligned_size)
>  {
>  	struct drm_gem_shmem_object *shmem_obj;
> +	struct v3d_dev *v3d = to_v3d_dev(dev);
>  	struct v3d_bo *bo;
>  	int ret;
>  
> -	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
> +	/* Let the user opt out of allocating the BOs with THP */
> +	if (v3d->gemfs)
> +		shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
> +							  v3d->gemfs);
> +	else
> +		shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
> +
>  	if (IS_ERR(shmem_obj))
>  		return ERR_CAST(shmem_obj);
>  	bo = to_v3d_bo(&shmem_obj->base);


If I read this correctly, when we have THP we always allocate with it,
even for objects that are smaller than 64KB. I was wondering if there is
any benefit/downside to this, or if the behavior for small allocations
is the same as we had without the new mount point.

Iago


* Re: [PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages
  2024-04-28 12:40 ` [PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages Maíra Canal
@ 2024-04-29  8:24   ` Tvrtko Ursulin
  0 siblings, 0 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2024-04-29  8:24 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Iago Toral, Tvrtko Ursulin,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev


On 28/04/2024 13:40, Maíra Canal wrote:
> Add a modparam for turning off Big/Super Pages, so that a user who
> doesn't want Big/Super Pages enabled can disable them by setting the
> modparam to false.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>   drivers/gpu/drm/v3d/v3d_drv.c   | 7 +++++++
>   drivers/gpu/drm/v3d/v3d_gemfs.c | 5 +++++
>   2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
> index 28b7ddce7747..1a6e01235df6 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -36,6 +36,13 @@
>   #define DRIVER_MINOR 0
>   #define DRIVER_PATCHLEVEL 0
>   
> +/* Only expose the `super_pages` modparam if THP is enabled. */
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +bool super_pages = true;
> +module_param_named(super_pages, super_pages, bool, 0400);
> +MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support.");
> +#endif
> +
>   static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
>   			       struct drm_file *file_priv)
>   {
> diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
> index 31cf5bd11e39..0ade02bb7209 100644
> --- a/drivers/gpu/drm/v3d/v3d_gemfs.c
> +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
> @@ -11,6 +11,7 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
>   	char huge_opt[] = "huge=within_size";
>   	struct file_system_type *type;
>   	struct vfsmount *gemfs;
> +	extern bool super_pages;
>   
>   	/*
>   	 * By creating our own shmemfs mountpoint, we can pass in
> @@ -20,6 +21,10 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
>   	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>   		goto err;
>   
> +	/* The user doesn't want to enable Super Pages */
> +	if (!super_pages)
> +		goto err;
> +
>   	type = get_fs_type("tmpfs");
>   	if (!type)
>   		goto err;

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Regards,

Tvrtko


* Re: [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available
  2024-04-29  5:22   ` Iago Toral
@ 2024-04-29 16:52     ` Maíra Canal
  0 siblings, 0 replies; 12+ messages in thread
From: Maíra Canal @ 2024-04-29 16:52 UTC (permalink / raw)
  To: Iago Toral, Melissa Wen, Tvrtko Ursulin, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Tvrtko Ursulin

Hi Iago,

On 4/29/24 02:22, Iago Toral wrote:
> Hi Maíra,
> 
> a question below:
> 
> El dom, 28-04-2024 a las 09:40 -0300, Maíra Canal escribió:
>> Although Big/Super Pages could appear naturally, it is quite unlikely
>> that 1MB or 64KB of memory would be allocated contiguously by chance.
>> We can instead force the creation of large, contiguously allocated
>> pages by using a mountpoint with "huge=within_size" enabled.
>>
>> As V3D has a mountpoint with "huge=within_size" (if the user has THP
>> enabled), use this mountpoint for BO creation when it is available.
>> This allows us to create large, contiguously allocated pages and make
>> use of Big/Super Pages.
>>
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> ---
>>
> 
> (...)
> 
>> @@ -130,10 +140,17 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
>>   			     size_t unaligned_size)
>>   {
>>   	struct drm_gem_shmem_object *shmem_obj;
>> +	struct v3d_dev *v3d = to_v3d_dev(dev);
>>   	struct v3d_bo *bo;
>>   	int ret;
>>   
>> -	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
>> +	/* Let the user opt out of allocating the BOs with THP */
>> +	if (v3d->gemfs)
>> +		shmem_obj = drm_gem_shmem_create_with_mnt(dev, unaligned_size,
>> +							  v3d->gemfs);
>> +	else
>> +		shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
>> +
>>   	if (IS_ERR(shmem_obj))
>>   		return ERR_CAST(shmem_obj);
>>   	bo = to_v3d_bo(&shmem_obj->base);
> 
> 
> If I read this correctly, when we have THP we always allocate with it,
> even for objects that are smaller than 64KB. I was wondering if there is
> any benefit/downside to this, or if the behavior for small allocations
> is the same as we had without the new mount point.

I'm assuming that your concern is related to memory pressure and memory
fragmentation.

As we are using `huge=within_size`, a huge page is only allocated if it
fits entirely within `i_size`. This means that an object too small to
hold a whole huge page is backed by regular pages, as it was without the
new mountpoint, while larger objects can still benefit from huge pages.
I don't understand `huge=within_size` in full detail, but it can be
verified that it keeps the system (even the RPi) from going OOM.
Although it is slightly less performant than `huge=always` (used in v1),
I believe it is a better fit for a system such as the RPi, given its
memory constraints.

Best Regards,
- Maíra

> 
> Iago
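
The `huge=within_size` rule described in this reply can be illustrated with a simplified user-space model. This is not the kernel's implementation: `within_size_allows_huge`, `round_up_to` and the `MODEL_*` constants are hypothetical, and the 2MB huge page size is an assumption (PMD-sized THP with 4KB base pages). The model only encodes the stated policy that a huge page is used when it lies fully within the file size.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE UINT64_C(4096)
#define MODEL_HUGE_SIZE (UINT64_C(2) << 20)  /* assumption: 2MB huge pages */

/* Round value up to the next multiple of align (align must be non-zero). */
static uint64_t round_up_to(uint64_t value, uint64_t align)
{
	return (value + align - 1) / align * align;
}

/* Would a huge page covering `offset` fit entirely inside a file of
 * `file_size` bytes? The huge page is the naturally aligned
 * MODEL_HUGE_SIZE range that contains the offset. */
static bool within_size_allows_huge(uint64_t file_size, uint64_t offset)
{
	uint64_t huge_start = offset - (offset % MODEL_HUGE_SIZE);
	uint64_t huge_end = huge_start + MODEL_HUGE_SIZE;

	return huge_end <= round_up_to(file_size, MODEL_PAGE_SIZE);
}
```

Under these assumptions, a 32KB BO never gets a huge page, while a 3MB BO gets one for its first 2MB and regular pages for the remaining 1MB, which is how the policy limits memory overhead for small allocations.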


Thread overview: 12+ messages
2024-04-28 12:40 [PATCH v4 0/8] drm/v3d: Enable Big and Super Pages Maíra Canal
2024-04-28 12:40 ` [PATCH v4 1/8] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
2024-04-28 12:40 ` [PATCH v4 2/8] drm/gem: Create a drm_gem_object_init_with_mnt() function Maíra Canal
2024-04-28 12:40 ` [PATCH v4 3/8] drm/v3d: Introduce gemfs Maíra Canal
2024-04-28 12:40 ` [PATCH v4 4/8] drm/gem: Create shmem GEM object in a given mountpoint Maíra Canal
2024-04-28 12:40 ` [PATCH v4 5/8] drm/v3d: Reduce the alignment of the node allocation Maíra Canal
2024-04-28 12:40 ` [PATCH v4 6/8] drm/v3d: Support Big/Super Pages when writing out PTEs Maíra Canal
2024-04-28 12:40 ` [PATCH v4 7/8] drm/v3d: Use gemfs/THP in BO creation if available Maíra Canal
2024-04-29  5:22   ` Iago Toral
2024-04-29 16:52     ` Maíra Canal
2024-04-28 12:40 ` [PATCH v4 8/8] drm/v3d: Add modparam for turning off Big/Super Pages Maíra Canal
2024-04-29  8:24   ` Tvrtko Ursulin
