* [PATCH 0/5] drm/v3d: Enable Super Pages
@ 2024-03-11 10:05 Maíra Canal
  2024-03-11 10:05 ` [PATCH 1/5] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Maíra Canal @ 2024-03-11 10:05 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

This series introduces support for super pages in V3D. The V3D MMU supports
1MB pages, called super pages, which are currently not used. Therefore, this
patchset intends to enable super pages in V3D. The advantage of enabling super
pages is that if any entry for a page within a super page is cached in the MMU,
it will be used for translation of all virtual addresses in the range of that
super page without fetching any other entries. In other words, with 4KB pages a
cached entry covers only 4KB of a BO, while with super pages a single cached
entry covers a full 1MB of it.

Super pages essentially mean slightly better performance for users, especially
in applications with high memory requirements (e.g. applications that use
multiple large BOs).

Using a Raspberry Pi 4 (with a PAGE_SIZE=4KB downstream kernel), when running
traces from multiple applications, we were able to see the following
improvements:

fps_avg  helped:  warzone2100.70secs.1024x768.trace:                       1.81 -> 2.56 (41.82%)
fps_avg  helped:  warzone2100.30secs.1024x768.trace:                       2.00 -> 2.39 (19.62%)
fps_avg  helped:  quake2-gl1.4-1280x720.trace:                             35.01 -> 36.57 (4.47%)
fps_avg  helped:  supertuxkart-menus_1024x768.trace:                       120.75 -> 125.50 (3.93%)
fps_avg  helped:  quake2-gles3-1280x720.trace:                             62.69 -> 64.29 (2.55%)
fps_avg  helped:  ue4_shooter_game_shooting_low_quality_640x480.gfxr:      26.13 -> 26.75 (2.39%)
fps_avg  helped:  vkQuake_capture_frames_1_through_1200_1280x720.gfxr:     60.35 -> 61.36 (1.67%)
fps_avg  helped:  ue4_sun_temple_640x480.gfxr:                             24.60 -> 24.94 (1.40%)
fps_avg  helped:  ue4_shooter_game_shooting_high_quality_640x480.gfxr:     23.07 -> 23.34 (1.15%)
fps_avg  helped:  serious_sam_trace02_1280x720.gfxr:                       47.44 -> 47.74 (0.63%)
fps_avg  helped:  ue4_shooter_game_high_quality_640x480.gfxr:              18.91 -> 19.02 (0.59%)

Using a Raspberry Pi 5 (with a PAGE_SIZE=16KB downstream kernel), when running
traces from multiple applications, we were able to see the following
improvements:

fps_avg  helped:  warzone2100.30secs.1024x768.trace:                       3.60 -> 4.49 (24.72%)
fps_avg  helped:  sponza_demo02_800x600.gfxr:                              46.33 -> 49.34 (6.49%)
fps_avg  helped:  quake3e_capture_frames_1_through_1800_1920x1080.gfxr:    155.70 -> 165.71 (6.43%)
fps_avg  helped:  gl-117-1024x768.trace:                                   31.82 -> 33.85 (6.41%)
fps_avg  helped:  supertuxkart-menus_1024x768.trace:                       287.80 -> 303.80 (5.56%)
fps_avg  helped:  ue4_shooter_game_shooting_low_quality_640x480.gfxr:      45.27 -> 47.30 (4.49%)
fps_avg  helped:  sponza_demo01_800x600.gfxr:                              42.05 -> 43.68 (3.89%)
fps_avg  helped:  supertuxkart-racing_1024x768.trace:                      19.94 -> 20.59 (3.26%)
fps_avg  helped:  vkQuake_capture_frames_1_through_1200_1280x720.gfxr:     135.19 -> 139.45 (3.15%)
fps_avg  helped:  quake2-gles3-1280x720.trace:                             151.71 -> 156.13 (2.92%)
fps_avg  helped:  ue4_shooter_game_high_quality_640x480.gfxr:              30.28 -> 31.05 (2.54%)
fps_avg  helped:  rbdoom-3-bfg_640x480.gfxr:                               31.52 -> 32.30 (2.49%)
fps_avg  helped:  quake3e_capture_frames_1800_through_2400_1920x1080.gfxr: 157.29 -> 160.35 (1.94%)
fps_avg  helped:  quake3e-1280x720.trace:                                  230.48 -> 234.51 (1.75%)
fps_avg  helped:  ue4_shooter_game_shooting_high_quality_640x480.gfxr:     49.67 -> 50.46 (1.60%)
fps_avg  helped:  ue4_sun_temple_640x480.gfxr:                             39.70 -> 40.23 (1.34%)

This series also introduces changes in the GEM helpers, in order to allow V3D
to have a separate mountpoint for shmem GEM objects. Any feedback from the
community about the changes in the GEM helpers is welcome!

Best Regards,
- Maíra

Maíra Canal (5):
  drm/v3d: Fix return if scheduler initialization fails
  drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  drm/v3d: Introduce gemfs
  drm/gem: Create shmem GEM object in a given mountpoint
  drm/v3d: Enable super pages

 drivers/gpu/drm/armada/armada_gem.c           |  2 +-
 drivers/gpu/drm/drm_gem.c                     | 12 ++++-
 drivers/gpu/drm/drm_gem_dma_helper.c          |  2 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c        | 30 +++++++++--
 drivers/gpu/drm/drm_gem_vram_helper.c         |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c         |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c       |  2 +-
 drivers/gpu/drm/gma500/gem.c                  |  2 +-
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  2 +-
 drivers/gpu/drm/mediatek/mtk_drm_gem.c        |  2 +-
 drivers/gpu/drm/msm/msm_gem.c                 |  2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c         |  2 +-
 drivers/gpu/drm/nouveau/nouveau_prime.c       |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c            |  2 +-
 drivers/gpu/drm/qxl/qxl_object.c              |  2 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  2 +-
 drivers/gpu/drm/tegra/gem.c                   |  2 +-
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |  2 +-
 drivers/gpu/drm/v3d/Makefile                  |  3 +-
 drivers/gpu/drm/v3d/v3d_bo.c                  | 19 ++++++-
 drivers/gpu/drm/v3d/v3d_drv.c                 |  7 +++
 drivers/gpu/drm/v3d/v3d_drv.h                 | 15 +++++-
 drivers/gpu/drm/v3d/v3d_gem.c                 |  6 ++-
 drivers/gpu/drm/v3d/v3d_gemfs.c               | 52 +++++++++++++++++++
 drivers/gpu/drm/v3d/v3d_mmu.c                 | 24 ++++++++-
 drivers/gpu/drm/xen/xen_drm_front_gem.c       |  2 +-
 include/drm/drm_gem.h                         |  3 +-
 include/drm/drm_gem_shmem_helper.h            |  3 ++
 28 files changed, 176 insertions(+), 32 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

--
2.43.0



* [PATCH 1/5] drm/v3d: Fix return if scheduler initialization fails
  2024-03-11 10:05 [PATCH 0/5] drm/v3d: Enable Super Pages Maíra Canal
@ 2024-03-11 10:05 ` Maíra Canal
  2024-03-12  8:35   ` Iago Toral
  2024-03-11 10:05 ` [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init() Maíra Canal
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Maíra Canal @ 2024-03-11 10:05 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

If the scheduler initialization fails, GEM initialization must fail as
well. Therefore, if `v3d_sched_init()` fails, free the allocated DMA memory
and return the error value from `v3d_gem_init()`.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index afc565078c78..66f4b78a6b2e 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -290,8 +290,9 @@ v3d_gem_init(struct drm_device *dev)
 	ret = v3d_sched_init(v3d);
 	if (ret) {
 		drm_mm_takedown(&v3d->mm);
-		dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
+		dma_free_coherent(v3d->drm.dev, pt_size, (void *)v3d->pt,
 				  v3d->pt_paddr);
+		return ret;
 	}

 	return 0;
--
2.43.0



* [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-11 10:05 [PATCH 0/5] drm/v3d: Enable Super Pages Maíra Canal
  2024-03-11 10:05 ` [PATCH 1/5] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
@ 2024-03-11 10:05 ` Maíra Canal
  2024-03-12  8:51   ` Tvrtko Ursulin
  2024-03-11 10:06 ` [PATCH 3/5] drm/v3d: Introduce gemfs Maíra Canal
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Maíra Canal @ 2024-03-11 10:05 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal, Russell King,
	Lucas Stach, Christian Gmeiner, Inki Dae, Seung-Woo Kim,
	Kyungmin Park, Krzysztof Kozlowski, Alim Akhtar,
	Patrik Jakobsson, Sui Jingfeng, Chun-Kuang Hu, Philipp Zabel,
	Matthias Brugger, AngeloGioacchino Del Regno, Rob Clark,
	Abhinav Kumar, Dmitry Baryshkov, Sean Paul, Marijn Suijten,
	Karol Herbst, Lyude Paul, Danilo Krummrich, Tomi Valkeinen,
	Gerd Hoffmann, Sandy Huang, Heiko Stübner, Andy Yan,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian König, Huang Rui, Oleksandr Andrushchenko,
	Karolina Stolarek, Andi Shyti

For some applications, such as those using huge pages, we might want to have a
different mountpoint, for which we pass in mount flags that better match
our usecase.

Therefore, add a new parameter to drm_gem_object_init() that allows us to
define the tmpfs mountpoint where the GEM object will be created. If
this parameter is NULL, we fall back to shmem_file_setup().
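
A minimal usage sketch (hypothetical caller, not part of this patch): a driver
that has created its own tmpfs mount can hand it to the helper, while passing
NULL keeps the current behaviour:

	struct vfsmount *mnt = my_dev->gemfs;	/* assumed driver-owned mount, may be NULL */
	int ret;

	ret = drm_gem_object_init(dev, obj, size, mnt);
	if (ret)
		return ret;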

Cc: Russell King <linux@armlinux.org.uk>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Alim Akhtar <alim.akhtar@samsung.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: Sui Jingfeng <suijingfeng@loongson.cn>
Cc: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Sean Paul <sean@poorly.run>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Sandy Huang <hjc@rock-chips.com>
Cc: "Heiko Stübner" <heiko@sntech.de>
Cc: Andy Yan <andy.yan@rock-chips.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Mikko Perttunen <mperttunen@nvidia.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Cc: Karolina Stolarek <karolina.stolarek@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/armada/armada_gem.c           |  2 +-
 drivers/gpu/drm/drm_gem.c                     | 12 ++++++++++--
 drivers/gpu/drm/drm_gem_dma_helper.c          |  2 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c        |  2 +-
 drivers/gpu/drm/drm_gem_vram_helper.c         |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c         |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c       |  2 +-
 drivers/gpu/drm/gma500/gem.c                  |  2 +-
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  2 +-
 drivers/gpu/drm/mediatek/mtk_drm_gem.c        |  2 +-
 drivers/gpu/drm/msm/msm_gem.c                 |  2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c         |  2 +-
 drivers/gpu/drm/nouveau/nouveau_prime.c       |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c            |  2 +-
 drivers/gpu/drm/qxl/qxl_object.c              |  2 +-
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  2 +-
 drivers/gpu/drm/tegra/gem.c                   |  2 +-
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |  2 +-
 drivers/gpu/drm/xen/xen_drm_front_gem.c       |  2 +-
 include/drm/drm_gem.h                         |  3 ++-
 20 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
index 26d10065d534..36a25e667341 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -226,7 +226,7 @@ static struct armada_gem_object *armada_gem_alloc_object(struct drm_device *dev,

 	obj->obj.funcs = &armada_gem_object_funcs;

-	if (drm_gem_object_init(dev, &obj->obj, size)) {
+	if (drm_gem_object_init(dev, &obj->obj, size, NULL)) {
 		kfree(obj);
 		return NULL;
 	}
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 44a948b80ee1..ddd8777fcda5 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -118,18 +118,26 @@ drm_gem_init(struct drm_device *dev)
  * @dev: drm_device the object should be initialized for
  * @obj: drm_gem_object to initialize
  * @size: object size
+ * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
+ * the usual tmpfs mountpoint (`shm_mnt`).
  *
  * Initialize an already allocated GEM object of the specified size with
  * shmfs backing store.
  */
 int drm_gem_object_init(struct drm_device *dev,
-			struct drm_gem_object *obj, size_t size)
+			struct drm_gem_object *obj, size_t size,
+			struct vfsmount *gemfs)
 {
 	struct file *filp;

 	drm_gem_private_object_init(dev, obj, size);

-	filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+	if (gemfs)
+		filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
+						 VM_NORESERVE);
+	else
+		filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
+
 	if (IS_ERR(filp))
 		return PTR_ERR(filp);

diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c b/drivers/gpu/drm/drm_gem_dma_helper.c
index 870b90b78bc4..9ada5ac85dd6 100644
--- a/drivers/gpu/drm/drm_gem_dma_helper.c
+++ b/drivers/gpu/drm/drm_gem_dma_helper.c
@@ -95,7 +95,7 @@ __drm_gem_dma_create(struct drm_device *drm, size_t size, bool private)
 		/* Always use writecombine for dma-buf mappings */
 		dma_obj->map_noncoherent = false;
 	} else {
-		ret = drm_gem_object_init(drm, gem_obj, size);
+		ret = drm_gem_object_init(drm, gem_obj, size, NULL);
 	}
 	if (ret)
 		goto error;
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index e435f986cd13..15635b330ca8 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -77,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
 		drm_gem_private_object_init(dev, obj, size);
 		shmem->map_wc = false; /* dma-buf mappings use always writecombine */
 	} else {
-		ret = drm_gem_object_init(dev, obj, size);
+		ret = drm_gem_object_init(dev, obj, size, NULL);
 	}
 	if (ret) {
 		drm_gem_private_object_fini(obj);
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index 75f2eaf0d5b6..90649899dbef 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -210,7 +210,7 @@ struct drm_gem_vram_object *drm_gem_vram_create(struct drm_device *dev,
 	if (!gem->funcs)
 		gem->funcs = &drm_gem_vram_object_funcs;

-	ret = drm_gem_object_init(dev, gem, size);
+	ret = drm_gem_object_init(dev, gem, size, NULL);
 	if (ret) {
 		kfree(gbo);
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 71a6d2b1c80f..aa4b61c48b7f 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -596,7 +596,7 @@ int etnaviv_gem_new_handle(struct drm_device *dev, struct drm_file *file,

 	lockdep_set_class(&to_etnaviv_bo(obj)->lock, &etnaviv_shm_lock_class);

-	ret = drm_gem_object_init(dev, obj, size);
+	ret = drm_gem_object_init(dev, obj, size, NULL);
 	if (ret)
 		goto fail;

diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index 638ca96830e9..c50c0d12246e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -160,7 +160,7 @@ static struct exynos_drm_gem *exynos_drm_gem_init(struct drm_device *dev,

 	obj->funcs = &exynos_drm_gem_object_funcs;

-	ret = drm_gem_object_init(dev, obj, size);
+	ret = drm_gem_object_init(dev, obj, size, NULL);
 	if (ret < 0) {
 		DRM_DEV_ERROR(dev->dev, "failed to initialize gem object\n");
 		kfree(exynos_gem);
diff --git a/drivers/gpu/drm/gma500/gem.c b/drivers/gpu/drm/gma500/gem.c
index 4b7627a72637..315e085dc9ee 100644
--- a/drivers/gpu/drm/gma500/gem.c
+++ b/drivers/gpu/drm/gma500/gem.c
@@ -169,7 +169,7 @@ psb_gem_create(struct drm_device *dev, u64 size, const char *name, bool stolen,
 	if (stolen) {
 		drm_gem_private_object_init(dev, obj, size);
 	} else {
-		ret = drm_gem_object_init(dev, obj, size);
+		ret = drm_gem_object_init(dev, obj, size, NULL);
 		if (ret)
 			goto err_release_resource;

diff --git a/drivers/gpu/drm/loongson/lsdc_ttm.c b/drivers/gpu/drm/loongson/lsdc_ttm.c
index 465f622ac05d..d392ea66d72e 100644
--- a/drivers/gpu/drm/loongson/lsdc_ttm.c
+++ b/drivers/gpu/drm/loongson/lsdc_ttm.c
@@ -458,7 +458,7 @@ struct lsdc_bo *lsdc_bo_create(struct drm_device *ddev,

 	size = ALIGN(size, PAGE_SIZE);

-	ret = drm_gem_object_init(ddev, &tbo->base, size);
+	ret = drm_gem_object_init(ddev, &tbo->base, size, NULL);
 	if (ret) {
 		kfree(lbo);
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
index 4f2e3feabc0f..261d386921dc 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -44,7 +44,7 @@ static struct mtk_drm_gem_obj *mtk_drm_gem_init(struct drm_device *dev,

 	mtk_gem_obj->base.funcs = &mtk_drm_gem_object_funcs;

-	ret = drm_gem_object_init(dev, &mtk_gem_obj->base, size);
+	ret = drm_gem_object_init(dev, &mtk_gem_obj->base, size, NULL);
 	if (ret < 0) {
 		DRM_ERROR("failed to initialize gem object\n");
 		kfree(mtk_gem_obj);
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 175ee4ab8a6f..6fe17cf28ef6 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -1222,7 +1222,7 @@ struct drm_gem_object *msm_gem_new(struct drm_device *dev, uint32_t size, uint32

 		vma->iova = physaddr(obj);
 	} else {
-		ret = drm_gem_object_init(dev, obj, size);
+		ret = drm_gem_object_init(dev, obj, size, NULL);
 		if (ret)
 			goto fail;
 		/*
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 49c2bcbef129..434325fa8752 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -262,7 +262,7 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,

 	/* Initialize the embedded gem-object. We return a single gem-reference
 	 * to the caller, instead of a normal nouveau_bo ttm reference. */
-	ret = drm_gem_object_init(drm->dev, &nvbo->bo.base, size);
+	ret = drm_gem_object_init(drm->dev, &nvbo->bo.base, size, NULL);
 	if (ret) {
 		drm_gem_object_release(&nvbo->bo.base);
 		kfree(nvbo);
diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c
index 1b2ff0c40fc1..c9b3572df555 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -62,7 +62,7 @@ struct drm_gem_object *nouveau_gem_prime_import_sg_table(struct drm_device *dev,

 	/* Initialize the embedded gem-object. We return a single gem-reference
 	 * to the caller, instead of a normal nouveau_bo ttm reference. */
-	ret = drm_gem_object_init(dev, &nvbo->bo.base, size);
+	ret = drm_gem_object_init(dev, &nvbo->bo.base, size, NULL);
 	if (ret) {
 		nouveau_bo_ref(NULL, &nvbo);
 		obj = ERR_PTR(-ENOMEM);
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index 3421e8389222..53b4ec64c7b0 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -1352,7 +1352,7 @@ struct drm_gem_object *omap_gem_new(struct drm_device *dev,
 	if (!(flags & OMAP_BO_MEM_SHMEM)) {
 		drm_gem_private_object_init(dev, obj, size);
 	} else {
-		ret = drm_gem_object_init(dev, obj, size);
+		ret = drm_gem_object_init(dev, obj, size, NULL);
 		if (ret)
 			goto err_free;

diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
index 1e46b0a6e478..45d7abe26ebd 100644
--- a/drivers/gpu/drm/qxl/qxl_object.c
+++ b/drivers/gpu/drm/qxl/qxl_object.c
@@ -123,7 +123,7 @@ int qxl_bo_create(struct qxl_device *qdev, unsigned long size,
 	if (bo == NULL)
 		return -ENOMEM;
 	size = roundup(size, PAGE_SIZE);
-	r = drm_gem_object_init(&qdev->ddev, &bo->tbo.base, size);
+	r = drm_gem_object_init(&qdev->ddev, &bo->tbo.base, size, NULL);
 	if (unlikely(r)) {
 		kfree(bo);
 		return r;
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index 93ed841f5dce..daba285bd78f 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -295,7 +295,7 @@ static struct rockchip_gem_object *

 	obj->funcs = &rockchip_gem_object_funcs;

-	drm_gem_object_init(drm, obj, size);
+	drm_gem_object_init(drm, obj, size, NULL);

 	return rk_obj;
 }
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index b4eb030ea961..63f10d5a57ba 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -311,7 +311,7 @@ static struct tegra_bo *tegra_bo_alloc_object(struct drm_device *drm,
 	host1x_bo_init(&bo->base, &tegra_bo_ops);
 	size = round_up(size, PAGE_SIZE);

-	err = drm_gem_object_init(drm, &bo->gem, size);
+	err = drm_gem_object_init(drm, &bo->gem, size, NULL);
 	if (err < 0)
 		goto free;

diff --git a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
index 7b7c1fa805fc..a9bf7d5a887c 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
@@ -61,7 +61,7 @@ struct ttm_buffer_object *ttm_bo_kunit_init(struct kunit *test,
 	KUNIT_ASSERT_NOT_NULL(test, bo);

 	bo->base = gem_obj;
-	err = drm_gem_object_init(devs->drm, &bo->base, size);
+	err = drm_gem_object_init(devs->drm, &bo->base, size, NULL);
 	KUNIT_ASSERT_EQ(test, err, 0);

 	bo->bdev = devs->ttm_dev;
diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index 3ad2b4cfd1f0..1b36c958340b 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -122,7 +122,7 @@ static struct xen_gem_object *gem_create_obj(struct drm_device *dev,

 	xen_obj->base.funcs = &xen_drm_front_gem_object_funcs;

-	ret = drm_gem_object_init(dev, &xen_obj->base, size);
+	ret = drm_gem_object_init(dev, &xen_obj->base, size, NULL);
 	if (ret < 0) {
 		kfree(xen_obj);
 		return ERR_PTR(ret);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 2ebec3984cd4..c75611ae8f93 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -471,7 +471,8 @@ struct drm_gem_object {
 void drm_gem_object_release(struct drm_gem_object *obj);
 void drm_gem_object_free(struct kref *kref);
 int drm_gem_object_init(struct drm_device *dev,
-			struct drm_gem_object *obj, size_t size);
+			struct drm_gem_object *obj, size_t size,
+			struct vfsmount *gemfs);
 void drm_gem_private_object_init(struct drm_device *dev,
 				 struct drm_gem_object *obj, size_t size);
 void drm_gem_private_object_fini(struct drm_gem_object *obj);
--
2.43.0



* [PATCH 3/5] drm/v3d: Introduce gemfs
  2024-03-11 10:05 [PATCH 0/5] drm/v3d: Enable Super Pages Maíra Canal
  2024-03-11 10:05 ` [PATCH 1/5] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
  2024-03-11 10:05 ` [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init() Maíra Canal
@ 2024-03-11 10:06 ` Maíra Canal
  2024-03-12  8:35   ` Iago Toral
  2024-03-12  8:55   ` Tvrtko Ursulin
  2024-03-11 10:06 ` [PATCH 4/5] drm/gem: Create shmem GEM object in a given mountpoint Maíra Canal
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Maíra Canal @ 2024-03-11 10:06 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

Create a separate "tmpfs" kernel mount for V3D. This will allow us to
move away from the shmemfs `shm_mnt` and gives us the flexibility to do
things like set our own mount options. Here, the interest is to use
"huge=", which should allow us to enable the use of THP for our
shmem-backed objects.
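
For context, a condensed sketch of the flow this enables once combined with
the rest of the series (simplified from the actual diffs, error handling
omitted):

	char huge_opt[] = "huge=always";

	/* driver init: private tmpfs mount with huge pages enabled */
	v3d->gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);

	/* BO creation: shmem file backed by that mount instead of shm_mnt */
	filp = shmem_file_setup_with_mnt(v3d->gemfs, "drm mm object", size,
					 VM_NORESERVE);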

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/Makefile    |  3 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++++++
 drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
 drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index b7d673f1153b..fcf710926057 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -13,7 +13,8 @@ v3d-y := \
 	v3d_trace_points.o \
 	v3d_sched.o \
 	v3d_sysfs.o \
-	v3d_submit.o
+	v3d_submit.o \
+	v3d_gemfs.o

 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 1950c723dde1..d2ce8222771a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -119,6 +119,11 @@ struct v3d_dev {
 	struct drm_mm mm;
 	spinlock_t mm_lock;

+	/*
+	 * tmpfs instance used for shmem backed objects
+	 */
+	struct vfsmount *gemfs;
+
 	struct work_struct overflow_mem_work;

 	struct v3d_bin_job *bin_job;
@@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_clean_caches(struct v3d_dev *v3d);

+/* v3d_gemfs.c */
+void v3d_gemfs_init(struct v3d_dev *v3d);
+void v3d_gemfs_fini(struct v3d_dev *v3d);
+
 /* v3d_submit.c */
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 66f4b78a6b2e..faefbe497e8d 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev)
 	v3d_init_hw_state(v3d);
 	v3d_mmu_set_page_table(v3d);

+	v3d_gemfs_init(v3d);
+
 	ret = v3d_sched_init(v3d);
 	if (ret) {
 		drm_mm_takedown(&v3d->mm);
@@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev)
 	struct v3d_dev *v3d = to_v3d_dev(dev);

 	v3d_sched_fini(v3d);
+	v3d_gemfs_fini(v3d);

 	/* Waiting for jobs to finish would need to be done before
 	 * unregistering V3D.
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
new file mode 100644
index 000000000000..8518b7da6f73
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright (C) 2024 Raspberry Pi */
+
+#include <linux/fs.h>
+#include <linux/mount.h>
+
+#include "v3d_drv.h"
+
+void v3d_gemfs_init(struct v3d_dev *v3d)
+{
+	char huge_opt[] = "huge=always";
+	struct file_system_type *type;
+	struct vfsmount *gemfs;
+
+	/*
+	 * By creating our own shmemfs mountpoint, we can pass in
+	 * mount flags that better match our usecase. However, we
+	 * only do so on platforms which benefit from it.
+	 */
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		goto err;
+
+	type = get_fs_type("tmpfs");
+	if (!type)
+		goto err;
+
+	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
+	if (IS_ERR(gemfs))
+		goto err;
+
+	v3d->gemfs = gemfs;
+	drm_info(&v3d->drm, "Using Transparent Hugepages\n");
+
+	return;
+
+err:
+	v3d->gemfs = NULL;
+	drm_notice(&v3d->drm,
+		   "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
+}
+
+void v3d_gemfs_fini(struct v3d_dev *v3d)
+{
+	if (v3d->gemfs)
+		kern_unmount(v3d->gemfs);
+}
--
2.43.0



* [PATCH 4/5] drm/gem: Create shmem GEM object in a given mountpoint
  2024-03-11 10:05 [PATCH 0/5] drm/v3d: Enable Super Pages Maíra Canal
                   ` (2 preceding siblings ...)
  2024-03-11 10:06 ` [PATCH 3/5] drm/v3d: Introduce gemfs Maíra Canal
@ 2024-03-11 10:06 ` Maíra Canal
  2024-03-11 10:06 ` [PATCH 5/5] drm/v3d: Enable super pages Maíra Canal
  2024-03-12  8:37 ` [PATCH 0/5] drm/v3d: Enable Super Pages Iago Toral
  5 siblings, 0 replies; 28+ messages in thread
From: Maíra Canal @ 2024-03-11 10:06 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

Create a function `drm_gem_shmem_create_with_mnt()`, similar to
`drm_gem_shmem_create()`, that takes a mountpoint as an argument. This
function will create a shmem GEM object in a given tmpfs mountpoint.

This function will be useful for drivers that have a dedicated mountpoint
created with specific mount flags.
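
A minimal usage sketch (hypothetical caller, not part of this patch), assuming
the driver already created its own tmpfs mount:

	struct drm_gem_shmem_object *shmem;

	/* my_dev->gemfs is an assumed driver-owned vfsmount */
	shmem = drm_gem_shmem_create_with_mnt(dev, size, my_dev->gemfs);
	if (IS_ERR(shmem))
		return PTR_ERR(shmem);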

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 30 ++++++++++++++++++++++----
 include/drm/drm_gem_shmem_helper.h     |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 15635b330ca8..1097def870a2 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -50,7 +50,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs = {
 };

 static struct drm_gem_shmem_object *
-__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
+__drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private,
+		       struct vfsmount *gemfs)
 {
 	struct drm_gem_shmem_object *shmem;
 	struct drm_gem_object *obj;
@@ -77,7 +78,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
 		drm_gem_private_object_init(dev, obj, size);
 		shmem->map_wc = false; /* dma-buf mappings use always writecombine */
 	} else {
-		ret = drm_gem_object_init(dev, obj, size, NULL);
+		ret = drm_gem_object_init(dev, obj, size, gemfs);
 	}
 	if (ret) {
 		drm_gem_private_object_fini(obj);
@@ -124,10 +125,31 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
  */
 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size)
 {
-	return __drm_gem_shmem_create(dev, size, false);
+	return __drm_gem_shmem_create(dev, size, false, NULL);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);

+/**
+ * drm_gem_shmem_create_with_mnt - Allocate an object with the given size in a
+ * given mountpoint
+ * @dev: DRM device
+ * @size: Size of the object to allocate
+ * @gemfs: tmpfs mount where the GEM object will be created
+ *
+ * This function creates a shmem GEM object in a given tmpfs mountpoint.
+ *
+ * Returns:
+ * A struct drm_gem_shmem_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev,
+							   size_t size,
+							   struct vfsmount *gemfs)
+{
+	return __drm_gem_shmem_create(dev, size, false, gemfs);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_create_with_mnt);
+
 /**
  * drm_gem_shmem_free - Free resources associated with a shmem GEM object
  * @shmem: shmem GEM object to free
@@ -759,7 +781,7 @@ drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
 	size_t size = PAGE_ALIGN(attach->dmabuf->size);
 	struct drm_gem_shmem_object *shmem;

-	shmem = __drm_gem_shmem_create(dev, size, true);
+	shmem = __drm_gem_shmem_create(dev, size, true, NULL);
 	if (IS_ERR(shmem))
 		return ERR_CAST(shmem);

diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index bf0c31aa8fbe..ad5e32d01892 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -97,6 +97,9 @@ struct drm_gem_shmem_object {
 	container_of(obj, struct drm_gem_shmem_object, base)

 struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t size);
+struct drm_gem_shmem_object *drm_gem_shmem_create_with_mnt(struct drm_device *dev,
+							   size_t size,
+							   struct vfsmount *gemfs);
 void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem);

 void drm_gem_shmem_put_pages(struct drm_gem_shmem_object *shmem);
--
2.43.0



* [PATCH 5/5] drm/v3d: Enable super pages
  2024-03-11 10:05 [PATCH 0/5] drm/v3d: Enable Super Pages Maíra Canal
                   ` (3 preceding siblings ...)
  2024-03-11 10:06 ` [PATCH 4/5] drm/gem: Create shmem GEM object in a given mountpoint Maíra Canal
@ 2024-03-11 10:06 ` Maíra Canal
  2024-03-12  8:34   ` Iago Toral
  2024-03-12 13:41   ` Tvrtko Ursulin
  2024-03-12  8:37 ` [PATCH 0/5] drm/v3d: Enable Super Pages Iago Toral
  5 siblings, 2 replies; 28+ messages in thread
From: Maíra Canal @ 2024-03-11 10:06 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Maíra Canal

The V3D MMU also supports 1MB pages, called super pages. In order to
set a 1MB page in the MMU, we need to make sure that the page table entries
for all 4KB pages within a super page are correctly configured.

Therefore, if the BO is larger than 2MB, we allocate it in a separate
mountpoint that uses THP. This will allow us to get a contiguous
memory region with which we can create our super pages. In order to place the
page table entries in the MMU, we iterate over the 256 4KB pages and insert
the PTEs.
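
A rough sketch of the intent (simplified, the exact loop is in v3d_mmu.c
below): every 4KB entry inside a 1MB super page still gets written, but all of
them carry the super-page bit and point into the same contiguous 1MB region,
so the MMU can service the whole range from a single cached entry:

	/* illustrative only; dma_addr is the 1MB-aligned start of the region */
	u32 pte = V3D_PTE_VALID | V3D_PTE_WRITEABLE | V3D_PTE_SUPERPAGE |
		  (dma_addr >> V3D_MMU_PAGE_SHIFT);
	int i;

	for (i = 0; i < 256; i++)	/* 256 x 4KB = 1MB */
		v3d->pt[page++] = pte + i;

The super_pages module parameter added below defaults to true and lets users
disable this behaviour at module load time.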

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_bo.c    | 19 +++++++++++++++++--
 drivers/gpu/drm/v3d/v3d_drv.c   |  7 +++++++
 drivers/gpu/drm/v3d/v3d_drv.h   |  6 ++++--
 drivers/gpu/drm/v3d/v3d_gemfs.c |  6 ++++++
 drivers/gpu/drm/v3d/v3d_mmu.c   | 24 ++++++++++++++++++++++--
 5 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index a07ede668cc1..cb8e49a33be7 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	struct v3d_dev *v3d = to_v3d_dev(obj->dev);
 	struct v3d_bo *bo = to_v3d_bo(obj);
 	struct sg_table *sgt;
+	u64 align;
 	int ret;

 	/* So far we pin the BO in the MMU for its lifetime, so use
@@ -103,6 +104,9 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	if (IS_ERR(sgt))
 		return PTR_ERR(sgt);

+	bo->huge_pages = (obj->size >= SZ_2M && v3d->super_pages);
+	align = bo->huge_pages ? SZ_1M : SZ_4K;
+
 	spin_lock(&v3d->mm_lock);
 	/* Allocate the object's space in the GPU's page tables.
 	 * Inserting PTEs will happen later, but the offset is for the
@@ -110,7 +114,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
 	 */
 	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
 					 obj->size >> V3D_MMU_PAGE_SHIFT,
-					 GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0);
+					 align >> V3D_MMU_PAGE_SHIFT, 0, 0);
 	spin_unlock(&v3d->mm_lock);
 	if (ret)
 		return ret;
@@ -130,10 +134,21 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
 			     size_t unaligned_size)
 {
 	struct drm_gem_shmem_object *shmem_obj;
+	struct v3d_dev *v3d = to_v3d_dev(dev);
 	struct v3d_bo *bo;
+	size_t size;
 	int ret;

-	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
+	size = PAGE_ALIGN(unaligned_size);
+
+	/* To avoid memory fragmentation, we only use THP if the BO is bigger
+	 * than two Super Pages (1MB).
+	 */
+	if (size >= SZ_2M && v3d->super_pages)
+		shmem_obj = drm_gem_shmem_create_with_mnt(dev, size, v3d->gemfs);
+	else
+		shmem_obj = drm_gem_shmem_create(dev, size);
+
 	if (IS_ERR(shmem_obj))
 		return ERR_CAST(shmem_obj);
 	bo = to_v3d_bo(&shmem_obj->base);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3debf37e7d9b..96f4d8227407 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -36,6 +36,11 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0

+static bool super_pages = true;
+module_param_named(super_pages, super_pages, bool, 0400);
+MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support. Note: \
+			       To enable Super Pages, you need support to THP.");
+
 static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file_priv)
 {
@@ -308,6 +313,8 @@ static int v3d_platform_drm_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}

+	v3d->super_pages = super_pages;
+
 	ret = v3d_gem_init(drm);
 	if (ret)
 		goto dma_free;
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index d2ce8222771a..795087663739 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -17,9 +17,8 @@ struct clk;
 struct platform_device;
 struct reset_control;

-#define GMP_GRANULARITY (128 * 1024)
-
 #define V3D_MMU_PAGE_SHIFT 12
+#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)

 #define V3D_MAX_QUEUES (V3D_CPU + 1)

@@ -123,6 +122,7 @@ struct v3d_dev {
 	 * tmpfs instance used for shmem backed objects
 	 */
 	struct vfsmount *gemfs;
+	bool super_pages;

 	struct work_struct overflow_mem_work;

@@ -211,6 +211,8 @@ struct v3d_bo {
 	struct list_head unref_head;

 	void *vaddr;
+
+	bool huge_pages;
 };

 static inline struct v3d_bo *
diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
index 8518b7da6f73..bcde3138f555 100644
--- a/drivers/gpu/drm/v3d/v3d_gemfs.c
+++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
@@ -12,6 +12,10 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
 	struct file_system_type *type;
 	struct vfsmount *gemfs;

+	/* The user doesn't want support for Super Pages */
+	if (!v3d->super_pages)
+		goto err;
+
 	/*
 	 * By creating our own shmemfs mountpoint, we can pass in
 	 * mount flags that better match our usecase. However, we
@@ -35,6 +39,8 @@ void v3d_gemfs_init(struct v3d_dev *v3d)

 err:
 	v3d->gemfs = NULL;
+	v3d->super_pages = false;
+
 	drm_notice(&v3d->drm,
 		   "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
 }
diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 14f3af40d6f6..2f368dc2c0ca 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -89,6 +89,9 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
 	u32 page = bo->node.start;
 	u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
 	struct sg_dma_page_iter dma_iter;
+	int ctg_size = drm_prime_get_contiguous_size(shmem_obj->sgt);
+	u32 page_size = 0;
+	u32 npages = 0;

 	for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
 		dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
@@ -96,10 +99,27 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
 		u32 pte = page_prot | page_address;
 		u32 i;

-		BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
+		if (npages == 0) {
+			if (ctg_size >= SZ_1M && bo->huge_pages) {
+				page_size = SZ_1M;
+				npages = 256;
+			} else {
+				page_size = SZ_4K;
+				npages = V3D_PAGE_FACTOR;
+			}
+
+			ctg_size -= npages * SZ_4K;
+		}
+
+		if (page_size == SZ_1M)
+			pte |= V3D_PTE_SUPERPAGE;
+
+		BUG_ON(page_address + V3D_PAGE_FACTOR >=
 		       BIT(24));
-		for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
+		for (i = 0; i < V3D_PAGE_FACTOR; i++)
 			v3d->pt[page++] = pte + i;
+
+		npages -= V3D_PAGE_FACTOR;
 	}

 	WARN_ON_ONCE(page - bo->node.start !=
--
2.43.0



* Re: [PATCH 5/5] drm/v3d: Enable super pages
  2024-03-11 10:06 ` [PATCH 5/5] drm/v3d: Enable super pages Maíra Canal
@ 2024-03-12  8:34   ` Iago Toral
  2024-03-12 13:41   ` Tvrtko Ursulin
  1 sibling, 0 replies; 28+ messages in thread
From: Iago Toral @ 2024-03-12  8:34 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev

On Mon, 11-03-2024 at 07:06 -0300, Maíra Canal wrote:
> The V3D MMU also supports 1MB pages, called super pages. In order to
> set a 1MB page in the MMU, we need to make sure that page table
> entries
> for all 4KB pages within a super page must be correctly configured.
> 
> Therefore, if the BO is larger than 2MB, we allocate it in a separate
> mountpoint that uses THP. This will allow us to create a contiguous
> memory region to create our super pages. In order to place the page
> table entries in the MMU, we iterate over the 256 4KB pages and
> insert
> the PTE.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_bo.c    | 19 +++++++++++++++++--
>  drivers/gpu/drm/v3d/v3d_drv.c   |  7 +++++++
>  drivers/gpu/drm/v3d/v3d_drv.h   |  6 ++++--
>  drivers/gpu/drm/v3d/v3d_gemfs.c |  6 ++++++
>  drivers/gpu/drm/v3d/v3d_mmu.c   | 24 ++++++++++++++++++++++--
>  5 files changed, 56 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_bo.c
> b/drivers/gpu/drm/v3d/v3d_bo.c
> index a07ede668cc1..cb8e49a33be7 100644
> --- a/drivers/gpu/drm/v3d/v3d_bo.c
> +++ b/drivers/gpu/drm/v3d/v3d_bo.c
> @@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>  	struct v3d_dev *v3d = to_v3d_dev(obj->dev);
>  	struct v3d_bo *bo = to_v3d_bo(obj);
>  	struct sg_table *sgt;
> +	u64 align;
>  	int ret;
> 
>  	/* So far we pin the BO in the MMU for its lifetime, so use
> @@ -103,6 +104,9 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>  	if (IS_ERR(sgt))
>  		return PTR_ERR(sgt);
> 
> +	bo->huge_pages = (obj->size >= SZ_2M && v3d->super_pages);

We have this check for detecting huge pages replicated here and in
v3d_bo_create, but I think we can just do this check once in
v3d_bo_create and assign this field there as well so we don't have to
repeat the check in both functions?

> +	align = bo->huge_pages ? SZ_1M : SZ_4K;
> +
>  	spin_lock(&v3d->mm_lock);
>  	/* Allocate the object's space in the GPU's page tables.
>  	 * Inserting PTEs will happen later, but the offset is for
> the
> @@ -110,7 +114,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>  	 */
>  	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
>  					 obj->size >>
> V3D_MMU_PAGE_SHIFT,
> -					 GMP_GRANULARITY >>
> V3D_MMU_PAGE_SHIFT, 0, 0);
> +					 align >>
> V3D_MMU_PAGE_SHIFT, 0, 0);

This is making another change, dropping the page alignment for the
regular case from GMP_GRANULARITY to 4KB. I think this change is
relevant enough that it would probably deserve a separate commit
explaining the rationale for it. What do you think?

>  	spin_unlock(&v3d->mm_lock);
>  	if (ret)
>  		return ret;
> @@ -130,10 +134,21 @@ struct v3d_bo *v3d_bo_create(struct drm_device
> *dev, struct drm_file *file_priv,
>  			     size_t unaligned_size)
>  {
>  	struct drm_gem_shmem_object *shmem_obj;
> +	struct v3d_dev *v3d = to_v3d_dev(dev);
>  	struct v3d_bo *bo;
> +	size_t size;
>  	int ret;
> 
> -	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
> +	size = PAGE_ALIGN(unaligned_size);
> +
> +	/* To avoid memory fragmentation, we only use THP if the BO
> is bigger
> +	 * than two Super Pages (1MB).
> +	 */
> +	if (size >= SZ_2M && v3d->super_pages)
> +		shmem_obj = drm_gem_shmem_create_with_mnt(dev, size,
> v3d->gemfs);
> +	else
> +		shmem_obj = drm_gem_shmem_create(dev, size);
> +
>  	if (IS_ERR(shmem_obj))
>  		return ERR_CAST(shmem_obj);
>  	bo = to_v3d_bo(&shmem_obj->base);
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 3debf37e7d9b..96f4d8227407 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -36,6 +36,11 @@
>  #define DRIVER_MINOR 0
>  #define DRIVER_PATCHLEVEL 0
> 
> +static bool super_pages = true;
> +module_param_named(super_pages, super_pages, bool, 0400);
> +MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support.
> Note: \
> +			       To enable Super Pages, you need
> support to THP.");

I guess you meant to say '(...) support for THP'?

> +
>  static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
>  			       struct drm_file *file_priv)
>  {
> @@ -308,6 +313,8 @@ static int v3d_platform_drm_probe(struct
> platform_device *pdev)
>  		return -ENOMEM;
>  	}
> 
> +	v3d->super_pages = super_pages;
> +
>  	ret = v3d_gem_init(drm);
>  	if (ret)
>  		goto dma_free;
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index d2ce8222771a..795087663739 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -17,9 +17,8 @@ struct clk;
>  struct platform_device;
>  struct reset_control;
> 
> -#define GMP_GRANULARITY (128 * 1024)
> -
>  #define V3D_MMU_PAGE_SHIFT 12
> +#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)
> 
>  #define V3D_MAX_QUEUES (V3D_CPU + 1)
> 
> @@ -123,6 +122,7 @@ struct v3d_dev {
>  	 * tmpfs instance used for shmem backed objects
>  	 */
>  	struct vfsmount *gemfs;
> +	bool super_pages;
> 
>  	struct work_struct overflow_mem_work;
> 
> @@ -211,6 +211,8 @@ struct v3d_bo {
>  	struct list_head unref_head;
> 
>  	void *vaddr;
> +
> +	bool huge_pages;
>  };
> 
>  static inline struct v3d_bo *
> diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c
> b/drivers/gpu/drm/v3d/v3d_gemfs.c
> index 8518b7da6f73..bcde3138f555 100644
> --- a/drivers/gpu/drm/v3d/v3d_gemfs.c
> +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
> @@ -12,6 +12,10 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
>  	struct file_system_type *type;
>  	struct vfsmount *gemfs;
> 
> +	/* The user doesn't want support for Super Pages */
> +	if (!v3d->super_pages)
> +		goto err;
> +
>  	/*
>  	 * By creating our own shmemfs mountpoint, we can pass in
>  	 * mount flags that better match our usecase. However, we
> @@ -35,6 +39,8 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
> 
>  err:
>  	v3d->gemfs = NULL;
> +	v3d->super_pages = false;
> +
>  	drm_notice(&v3d->drm,
>  		   "Transparent Hugepage support is recommended for
> optimal performance on this platform!\n");
>  }
> diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c
> b/drivers/gpu/drm/v3d/v3d_mmu.c
> index 14f3af40d6f6..2f368dc2c0ca 100644
> --- a/drivers/gpu/drm/v3d/v3d_mmu.c
> +++ b/drivers/gpu/drm/v3d/v3d_mmu.c
> @@ -89,6 +89,9 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
>  	u32 page = bo->node.start;
>  	u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
>  	struct sg_dma_page_iter dma_iter;
> +	int ctg_size = drm_prime_get_contiguous_size(shmem_obj-
> >sgt);
> +	u32 page_size = 0;
> +	u32 npages = 0;

Maybe call this npages_4kb so it is more explicit about its purpose?

> 
>  	for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
>  		dma_addr_t dma_addr =
> sg_page_iter_dma_address(&dma_iter);
> @@ -96,10 +99,27 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
>  		u32 pte = page_prot | page_address;
>  		u32 i;
> 
> -		BUG_ON(page_address + (PAGE_SIZE >>
> V3D_MMU_PAGE_SHIFT) >=
> +		if (npages == 0) {
> +			if (ctg_size >= SZ_1M && bo->huge_pages) {
> +				page_size = SZ_1M;
> +				npages = 256;
> +			} else {
> +				page_size = SZ_4K;
> +				npages = V3D_PAGE_FACTOR;

Does it make sense to make this relative to V3D_PAGE_FACTOR when we are
hardcoding the page size to 4KB? And if it does, should we not also base
npages for the huge_pages case on the page factor, for consistency?

> +			}
> +
> +			ctg_size -= npages * SZ_4K;
> +		}
> +
> +		if (page_size == SZ_1M)
> +			pte |= V3D_PTE_SUPERPAGE;
> +
> +		BUG_ON(page_address + V3D_PAGE_FACTOR >=
>  		       BIT(24));
> -		for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT;
> i++)
> +		for (i = 0; i < V3D_PAGE_FACTOR; i++)
>  			v3d->pt[page++] = pte + i;
> +
> +		npages -= V3D_PAGE_FACTOR;

So just to be sure I get this right: if we allocate a super page, here
we are only assigning the first 4KB of its contiguous address space in
the page table, and then we continue iterating over the remaining DMA
pages fitting them into the super page until we assign all the pages in
the object or we exhaust the super page (in which case we go back to
npages == 0 and make a decision again for the remaining size). Is that
correct?
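
For instance, my reading for a fully contiguous 2MB BO with PAGE_SIZE=4KB
(so V3D_PAGE_FACTOR == 1) would be roughly:

	/* illustrative trace, not code from the patch */
	/* 1st DMA page:   npages == 0, ctg_size = 2MB -> super page, npages = 256 */
	/* next 255 pages: npages > 0, keep V3D_PTE_SUPERPAGE, one PTE each        */
	/* 257th DMA page: npages == 0 again, ctg_size = 1MB -> second super page  */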


>  	}
> 
>  	WARN_ON_ONCE(page - bo->node.start !=
> --
> 2.43.0
> 
> 



* Re: [PATCH 1/5] drm/v3d: Fix return if scheduler initialization fails
  2024-03-11 10:05 ` [PATCH 1/5] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
@ 2024-03-12  8:35   ` Iago Toral
  0 siblings, 0 replies; 28+ messages in thread
From: Iago Toral @ 2024-03-12  8:35 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev

This patch is: Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Iago

On Mon, 11-03-2024 at 07:05 -0300, Maíra Canal wrote:
> If the scheduler initialization fails, GEM initialization must fail
> as
> well. Therefore, if `v3d_sched_init()` fails, free the DMA memory
> allocated and return the error value in `v3d_gem_init()`.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index afc565078c78..66f4b78a6b2e 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -290,8 +290,9 @@ v3d_gem_init(struct drm_device *dev)
>  	ret = v3d_sched_init(v3d);
>  	if (ret) {
>  		drm_mm_takedown(&v3d->mm);
> -		dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void
> *)v3d->pt,
> +		dma_free_coherent(v3d->drm.dev, pt_size, (void
> *)v3d->pt,
>  				  v3d->pt_paddr);
> +		return ret;
>  	}
> 
>  	return 0;
> --
> 2.43.0
> 
> 



* Re: [PATCH 3/5] drm/v3d: Introduce gemfs
  2024-03-11 10:06 ` [PATCH 3/5] drm/v3d: Introduce gemfs Maíra Canal
@ 2024-03-12  8:35   ` Iago Toral
  2024-03-12  8:55   ` Tvrtko Ursulin
  1 sibling, 0 replies; 28+ messages in thread
From: Iago Toral @ 2024-03-12  8:35 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev

This patch is: Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Iago

On Mon, 11-03-2024 at 07:06 -0300, Maíra Canal wrote:
> Create a separate "tmpfs" kernel mount for V3D. This will allow us to
> move away from the shmemfs `shm_mnt` and gives the flexibility to do
> things like set our own mount options. Here, the interest is to use
> "huge=", which should allow us to enable the use of THP for our
> shmem-backed objects.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/Makefile    |  3 ++-
>  drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++++++
>  drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
>  drivers/gpu/drm/v3d/v3d_gemfs.c | 46
> +++++++++++++++++++++++++++++++++
>  4 files changed, 60 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c
> 
> diff --git a/drivers/gpu/drm/v3d/Makefile
> b/drivers/gpu/drm/v3d/Makefile
> index b7d673f1153b..fcf710926057 100644
> --- a/drivers/gpu/drm/v3d/Makefile
> +++ b/drivers/gpu/drm/v3d/Makefile
> @@ -13,7 +13,8 @@ v3d-y := \
>  	v3d_trace_points.o \
>  	v3d_sched.o \
>  	v3d_sysfs.o \
> -	v3d_submit.o
> +	v3d_submit.o \
> +	v3d_gemfs.o
> 
>  v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 1950c723dde1..d2ce8222771a 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -119,6 +119,11 @@ struct v3d_dev {
>  	struct drm_mm mm;
>  	spinlock_t mm_lock;
> 
> +	/*
> +	 * tmpfs instance used for shmem backed objects
> +	 */
> +	struct vfsmount *gemfs;
> +
>  	struct work_struct overflow_mem_work;
> 
>  	struct v3d_bin_job *bin_job;
> @@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d);
>  void v3d_invalidate_caches(struct v3d_dev *v3d);
>  void v3d_clean_caches(struct v3d_dev *v3d);
> 
> +/* v3d_gemfs.c */
> +void v3d_gemfs_init(struct v3d_dev *v3d);
> +void v3d_gemfs_fini(struct v3d_dev *v3d);
> +
>  /* v3d_submit.c */
>  void v3d_job_cleanup(struct v3d_job *job);
>  void v3d_job_put(struct v3d_job *job);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 66f4b78a6b2e..faefbe497e8d 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev)
>  	v3d_init_hw_state(v3d);
>  	v3d_mmu_set_page_table(v3d);
> 
> +	v3d_gemfs_init(v3d);
> +
>  	ret = v3d_sched_init(v3d);
>  	if (ret) {
>  		drm_mm_takedown(&v3d->mm);
> @@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev)
>  	struct v3d_dev *v3d = to_v3d_dev(dev);
> 
>  	v3d_sched_fini(v3d);
> +	v3d_gemfs_fini(v3d);
> 
>  	/* Waiting for jobs to finish would need to be done before
>  	 * unregistering V3D.
> diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c
> b/drivers/gpu/drm/v3d/v3d_gemfs.c
> new file mode 100644
> index 000000000000..8518b7da6f73
> --- /dev/null
> +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/* Copyright (C) 2024 Raspberry Pi */
> +
> +#include <linux/fs.h>
> +#include <linux/mount.h>
> +
> +#include "v3d_drv.h"
> +
> +void v3d_gemfs_init(struct v3d_dev *v3d)
> +{
> +	char huge_opt[] = "huge=always";
> +	struct file_system_type *type;
> +	struct vfsmount *gemfs;
> +
> +	/*
> +	 * By creating our own shmemfs mountpoint, we can pass in
> +	 * mount flags that better match our usecase. However, we
> +	 * only do so on platforms which benefit from it.
> +	 */
> +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> +		goto err;
> +
> +	type = get_fs_type("tmpfs");
> +	if (!type)
> +		goto err;
> +
> +	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name,
> huge_opt);
> +	if (IS_ERR(gemfs))
> +		goto err;
> +
> +	v3d->gemfs = gemfs;
> +	drm_info(&v3d->drm, "Using Transparent Hugepages\n");
> +
> +	return;
> +
> +err:
> +	v3d->gemfs = NULL;
> +	drm_notice(&v3d->drm,
> +		   "Transparent Hugepage support is recommended for
> optimal performance on this platform!\n");
> +}
> +
> +void v3d_gemfs_fini(struct v3d_dev *v3d)
> +{
> +	if (v3d->gemfs)
> +		kern_unmount(v3d->gemfs);
> +}
> --
> 2.43.0
> 
> 



* Re: [PATCH 0/5] drm/v3d: Enable Super Pages
  2024-03-11 10:05 [PATCH 0/5] drm/v3d: Enable Super Pages Maíra Canal
                   ` (4 preceding siblings ...)
  2024-03-11 10:06 ` [PATCH 5/5] drm/v3d: Enable super pages Maíra Canal
@ 2024-03-12  8:37 ` Iago Toral
  5 siblings, 0 replies; 28+ messages in thread
From: Iago Toral @ 2024-03-12  8:37 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev

Hi Maíra,

On Mon, 11-03-2024 at 07:05 -0300, Maíra Canal wrote:
> This series introduces support for super pages in V3D. The V3D MMU
> has support
> for 1MB pages, called super pages, which is currently not used.
> Therefore,
> this patchset has the intention to enable super pages in V3D. The
> advantage of
> enabling super pages size is that if any entry for a page within a
> super page
> is cached in the MMU, it will be used for translation of all virtual
> addresses
> in the range of that super pages without requiring fetching any other
> entries.
> 
> Super pages essentially means a slightly better performance for
> users,
> especially in applications with high memory requirements (e.g.
> applications
> that uses multiple large BOs).
> 
> Using a Raspberry Pi 4 (with a PAGE_SIZE=4KB downstream kernel), when
> running
> traces from multiple applications, we were able to see the following
> improvements:
> 
> fps_avg  helped: 
> warzone2100.70secs.1024x768.trace:                       1.81 -> 2.56
> (41.82%)
> fps_avg  helped: 
> warzone2100.30secs.1024x768.trace:                       2.00 -> 2.39
> (19.62%)
> fps_avg  helped:  quake2-gl1.4-
> 1280x720.trace:                             35.01 -> 36.57 (4.47%)
> fps_avg  helped:  supertuxkart-
> menus_1024x768.trace:                       120.75 -> 125.50 (3.93%)
> fps_avg  helped:  quake2-gles3-
> 1280x720.trace:                             62.69 -> 64.29 (2.55%)
> fps_avg  helped: 
> ue4_shooter_game_shooting_low_quality_640x480.gfxr:      26.13 ->
> 26.75 (2.39%)
> fps_avg  helped: 
> vkQuake_capture_frames_1_through_1200_1280x720.gfxr:     60.35 ->
> 61.36 (1.67%)
> fps_avg  helped: 
> ue4_sun_temple_640x480.gfxr:                             24.60 ->
> 24.94 (1.40%)
> fps_avg  helped: 
> ue4_shooter_game_shooting_high_quality_640x480.gfxr:     23.07 ->
> 23.34 (1.15%)
> fps_avg  helped: 
> serious_sam_trace02_1280x720.gfxr:                       47.44 ->
> 47.74 (0.63%)
> fps_avg  helped: 
> ue4_shooter_game_high_quality_640x480.gfxr:              18.91 ->
> 19.02 (0.59%)
> 
> Using a Raspberry Pi 5 (with a PAGE_SIZE=16KB downstream kernel),
> when running
> traces from multiple applications, we were able to see the following
> improvements:
> 
> fps_avg  helped: 
> warzone2100.30secs.1024x768.trace:                       3.60 -> 4.49
> (24.72%)
> fps_avg  helped: 
> sponza_demo02_800x600.gfxr:                              46.33 ->
> 49.34 (6.49%)
> fps_avg  helped: 
> quake3e_capture_frames_1_through_1800_1920x1080.gfxr:    155.70 ->
> 165.71 (6.43%)
> fps_avg  helped:  gl-117-
> 1024x768.trace:                                   31.82 -> 33.85
> (6.41%)
> fps_avg  helped:  supertuxkart-
> menus_1024x768.trace:                       287.80 -> 303.80 (5.56%)
> fps_avg  helped: 
> ue4_shooter_game_shooting_low_quality_640x480.gfxr:      45.27 ->
> 47.30 (4.49%)
> fps_avg  helped: 
> sponza_demo01_800x600.gfxr:                              42.05 ->
> 43.68 (3.89%)
> fps_avg  helped:  supertuxkart-
> racing_1024x768.trace:                      19.94 -> 20.59 (3.26%)
> fps_avg  helped: 
> vkQuake_capture_frames_1_through_1200_1280x720.gfxr:     135.19 ->
> 139.45 (3.15%)
> fps_avg  helped:  quake2-gles3-
> 1280x720.trace:                             151.71 -> 156.13 (2.92%)
> fps_avg  helped: 
> ue4_shooter_game_high_quality_640x480.gfxr:              30.28 ->
> 31.05 (2.54%)
> fps_avg  helped:  rbdoom-3-
> bfg_640x480.gfxr:                               31.52 -> 32.30
> (2.49%)
> fps_avg  helped: 
> quake3e_capture_frames_1800_through_2400_1920x1080.gfxr: 157.29 ->
> 160.35 (1.94%)
> fps_avg  helped:  quake3e-
> 1280x720.trace:                                  230.48 -> 234.51
> (1.75%)
> fps_avg  helped: 
> ue4_shooter_game_shooting_high_quality_640x480.gfxr:     49.67 ->
> 50.46 (1.60%)
> fps_avg  helped: 
> ue4_sun_temple_640x480.gfxr:                             39.70 ->
> 40.23 (1.34%)
> 
> This series also introduces changes in the GEM helpers, in order to enable
> V3D to have a separate mountpoint for shmem GEM objects. Any feedback from
> the community about the changes in the GEM helpers is welcome!
> 
> Best Regards,
> - Maíra
> 
> Maíra Canal (5):
>   drm/v3d: Fix return if scheduler initialization fails
>   drm/gem: Add a mountpoint parameter to drm_gem_object_init()
>   drm/v3d: Introduce gemfs
>   drm/gem: Create shmem GEM object in a given mountpoint
>   drm/v3d: Enable super pages
> 

I reviewed the three v3d patches in the series, gave my R-B for the first
two and made a couple of comments on the last one. For the drm/gem patches
I think you will want someone more qualified to review them.

Iago

>  drivers/gpu/drm/armada/armada_gem.c           |  2 +-
>  drivers/gpu/drm/drm_gem.c                     | 12 ++++-
>  drivers/gpu/drm/drm_gem_dma_helper.c          |  2 +-
>  drivers/gpu/drm/drm_gem_shmem_helper.c        | 30 +++++++++--
>  drivers/gpu/drm/drm_gem_vram_helper.c         |  2 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c         |  2 +-
>  drivers/gpu/drm/exynos/exynos_drm_gem.c       |  2 +-
>  drivers/gpu/drm/gma500/gem.c                  |  2 +-
>  drivers/gpu/drm/loongson/lsdc_ttm.c           |  2 +-
>  drivers/gpu/drm/mediatek/mtk_drm_gem.c        |  2 +-
>  drivers/gpu/drm/msm/msm_gem.c                 |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_gem.c         |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_prime.c       |  2 +-
>  drivers/gpu/drm/omapdrm/omap_gem.c            |  2 +-
>  drivers/gpu/drm/qxl/qxl_object.c              |  2 +-
>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  2 +-
>  drivers/gpu/drm/tegra/gem.c                   |  2 +-
>  drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |  2 +-
>  drivers/gpu/drm/v3d/Makefile                  |  3 +-
>  drivers/gpu/drm/v3d/v3d_bo.c                  | 19 ++++++-
>  drivers/gpu/drm/v3d/v3d_drv.c                 |  7 +++
>  drivers/gpu/drm/v3d/v3d_drv.h                 | 15 +++++-
>  drivers/gpu/drm/v3d/v3d_gem.c                 |  6 ++-
>  drivers/gpu/drm/v3d/v3d_gemfs.c               | 52 +++++++++++++++++++
>  drivers/gpu/drm/v3d/v3d_mmu.c                 | 24 ++++++++-
>  drivers/gpu/drm/xen/xen_drm_front_gem.c       |  2 +-
>  include/drm/drm_gem.h                         |  3 +-
>  include/drm/drm_gem_shmem_helper.h            |  3 ++
>  28 files changed, 176 insertions(+), 32 deletions(-)
>  create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c
> 
> --
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-11 10:05 ` [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init() Maíra Canal
@ 2024-03-12  8:51   ` Tvrtko Ursulin
  2024-03-12  8:59     ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2024-03-12  8:51 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Christian König,
	Huang Rui, Oleksandr Andrushchenko, Karolina Stolarek,
	Andi Shyti


Hi Maira,

On 11/03/2024 10:05, Maíra Canal wrote:
> For some applications, such as using huge pages, we might want to have a
> different mountpoint, for which we pass in mount flags that better match
> our usecase.
> 
> Therefore, add a new parameter to drm_gem_object_init() that allow us to
> define the tmpfs mountpoint where the GEM object will be created. If
> this parameter is NULL, then we fallback to shmem_file_setup().

One strategy for reducing churn, and so the number of drivers this patch 
touches, could be to add a lower-level drm_gem_object_init() that takes a 
vfsmount (call it __drm_gem_object_init() or drm_gem_object_init_mnt()), 
and make drm_gem_object_init() call that one with a NULL argument.
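
Roughly along these lines, as an untested sketch (the __drm_gem_object_init()
name is only a placeholder, and the body is just a reconstruction of the
current helper):

int __drm_gem_object_init(struct drm_device *dev,
			  struct drm_gem_object *obj, size_t size,
			  struct vfsmount *gemfs)
{
	struct file *filp;

	drm_gem_private_object_init(dev, obj, size);

	/* Use the driver-provided tmpfs mount when there is one */
	if (gemfs)
		filp = shmem_file_setup_with_mnt(gemfs, "drm mm object",
						 size, VM_NORESERVE);
	else
		filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);

	if (IS_ERR(filp))
		return PTR_ERR(filp);

	obj->filp = filp;

	return 0;
}

int drm_gem_object_init(struct drm_device *dev,
			struct drm_gem_object *obj, size_t size)
{
	/* Existing callers keep the default shm_mnt backing */
	return __drm_gem_object_init(dev, obj, size, NULL);
}

That way only v3d (or any other driver that cares) would need to call the
new variant directly.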

Regards,

Tvrtko

> 
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Lucas Stach <l.stach@pengutronix.de>
> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
> Cc: Inki Dae <inki.dae@samsung.com>
> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> Cc: Alim Akhtar <alim.akhtar@samsung.com>
> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
> Cc: Sui Jingfeng <suijingfeng@loongson.cn>
> Cc: Chun-Kuang Hu <chunkuang.hu@kernel.org>
> Cc: Philipp Zabel <p.zabel@pengutronix.de>
> Cc: Matthias Brugger <matthias.bgg@gmail.com>
> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
> Cc: Rob Clark <robdclark@gmail.com>
> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> Cc: Sean Paul <sean@poorly.run>
> Cc: Marijn Suijten <marijn.suijten@somainline.org>
> Cc: Karol Herbst <kherbst@redhat.com>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Danilo Krummrich <dakr@redhat.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Sandy Huang <hjc@rock-chips.com>
> Cc: "Heiko Stübner" <heiko@sntech.de>
> Cc: Andy Yan <andy.yan@rock-chips.com>
> Cc: Thierry Reding <thierry.reding@gmail.com>
> Cc: Mikko Perttunen <mperttunen@nvidia.com>
> Cc: Jonathan Hunter <jonathanh@nvidia.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Cc: Karolina Stolarek <karolina.stolarek@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>   drivers/gpu/drm/armada/armada_gem.c           |  2 +-
>   drivers/gpu/drm/drm_gem.c                     | 12 ++++++++++--
>   drivers/gpu/drm/drm_gem_dma_helper.c          |  2 +-
>   drivers/gpu/drm/drm_gem_shmem_helper.c        |  2 +-
>   drivers/gpu/drm/drm_gem_vram_helper.c         |  2 +-
>   drivers/gpu/drm/etnaviv/etnaviv_gem.c         |  2 +-
>   drivers/gpu/drm/exynos/exynos_drm_gem.c       |  2 +-
>   drivers/gpu/drm/gma500/gem.c                  |  2 +-
>   drivers/gpu/drm/loongson/lsdc_ttm.c           |  2 +-
>   drivers/gpu/drm/mediatek/mtk_drm_gem.c        |  2 +-
>   drivers/gpu/drm/msm/msm_gem.c                 |  2 +-
>   drivers/gpu/drm/nouveau/nouveau_gem.c         |  2 +-
>   drivers/gpu/drm/nouveau/nouveau_prime.c       |  2 +-
>   drivers/gpu/drm/omapdrm/omap_gem.c            |  2 +-
>   drivers/gpu/drm/qxl/qxl_object.c              |  2 +-
>   drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  2 +-
>   drivers/gpu/drm/tegra/gem.c                   |  2 +-
>   drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |  2 +-
>   drivers/gpu/drm/xen/xen_drm_front_gem.c       |  2 +-
>   include/drm/drm_gem.h                         |  3 ++-
>   20 files changed, 30 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
> index 26d10065d534..36a25e667341 100644
> --- a/drivers/gpu/drm/armada/armada_gem.c
> +++ b/drivers/gpu/drm/armada/armada_gem.c
> @@ -226,7 +226,7 @@ static struct armada_gem_object *armada_gem_alloc_object(struct drm_device *dev,
> 
>   	obj->obj.funcs = &armada_gem_object_funcs;
> 
> -	if (drm_gem_object_init(dev, &obj->obj, size)) {
> +	if (drm_gem_object_init(dev, &obj->obj, size, NULL)) {
>   		kfree(obj);
>   		return NULL;
>   	}
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 44a948b80ee1..ddd8777fcda5 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -118,18 +118,26 @@ drm_gem_init(struct drm_device *dev)
>    * @dev: drm_device the object should be initialized for
>    * @obj: drm_gem_object to initialize
>    * @size: object size
> + * @gemfs: tmpfs mount where the GEM object will be created. If NULL, use
> + * the usual tmpfs mountpoint (`shm_mnt`).
>    *
>    * Initialize an already allocated GEM object of the specified size with
>    * shmfs backing store.
>    */
>   int drm_gem_object_init(struct drm_device *dev,
> -			struct drm_gem_object *obj, size_t size)
> +			struct drm_gem_object *obj, size_t size,
> +			struct vfsmount *gemfs)
>   {
>   	struct file *filp;
> 
>   	drm_gem_private_object_init(dev, obj, size);
> 
> -	filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
> +	if (gemfs)
> +		filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
> +						 VM_NORESERVE);
> +	else
> +		filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
> +
>   	if (IS_ERR(filp))
>   		return PTR_ERR(filp);
> 
> diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c b/drivers/gpu/drm/drm_gem_dma_helper.c
> index 870b90b78bc4..9ada5ac85dd6 100644
> --- a/drivers/gpu/drm/drm_gem_dma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_dma_helper.c
> @@ -95,7 +95,7 @@ __drm_gem_dma_create(struct drm_device *drm, size_t size, bool private)
>   		/* Always use writecombine for dma-buf mappings */
>   		dma_obj->map_noncoherent = false;
>   	} else {
> -		ret = drm_gem_object_init(drm, gem_obj, size);
> +		ret = drm_gem_object_init(drm, gem_obj, size, NULL);
>   	}
>   	if (ret)
>   		goto error;
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index e435f986cd13..15635b330ca8 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -77,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, size_t size, bool private)
>   		drm_gem_private_object_init(dev, obj, size);
>   		shmem->map_wc = false; /* dma-buf mappings use always writecombine */
>   	} else {
> -		ret = drm_gem_object_init(dev, obj, size);
> +		ret = drm_gem_object_init(dev, obj, size, NULL);
>   	}
>   	if (ret) {
>   		drm_gem_private_object_fini(obj);
> diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
> index 75f2eaf0d5b6..90649899dbef 100644
> --- a/drivers/gpu/drm/drm_gem_vram_helper.c
> +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
> @@ -210,7 +210,7 @@ struct drm_gem_vram_object *drm_gem_vram_create(struct drm_device *dev,
>   	if (!gem->funcs)
>   		gem->funcs = &drm_gem_vram_object_funcs;
> 
> -	ret = drm_gem_object_init(dev, gem, size);
> +	ret = drm_gem_object_init(dev, gem, size, NULL);
>   	if (ret) {
>   		kfree(gbo);
>   		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index 71a6d2b1c80f..aa4b61c48b7f 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -596,7 +596,7 @@ int etnaviv_gem_new_handle(struct drm_device *dev, struct drm_file *file,
> 
>   	lockdep_set_class(&to_etnaviv_bo(obj)->lock, &etnaviv_shm_lock_class);
> 
> -	ret = drm_gem_object_init(dev, obj, size);
> +	ret = drm_gem_object_init(dev, obj, size, NULL);
>   	if (ret)
>   		goto fail;
> 
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
> index 638ca96830e9..c50c0d12246e 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
> @@ -160,7 +160,7 @@ static struct exynos_drm_gem *exynos_drm_gem_init(struct drm_device *dev,
> 
>   	obj->funcs = &exynos_drm_gem_object_funcs;
> 
> -	ret = drm_gem_object_init(dev, obj, size);
> +	ret = drm_gem_object_init(dev, obj, size, NULL);
>   	if (ret < 0) {
>   		DRM_DEV_ERROR(dev->dev, "failed to initialize gem object\n");
>   		kfree(exynos_gem);
> diff --git a/drivers/gpu/drm/gma500/gem.c b/drivers/gpu/drm/gma500/gem.c
> index 4b7627a72637..315e085dc9ee 100644
> --- a/drivers/gpu/drm/gma500/gem.c
> +++ b/drivers/gpu/drm/gma500/gem.c
> @@ -169,7 +169,7 @@ psb_gem_create(struct drm_device *dev, u64 size, const char *name, bool stolen,
>   	if (stolen) {
>   		drm_gem_private_object_init(dev, obj, size);
>   	} else {
> -		ret = drm_gem_object_init(dev, obj, size);
> +		ret = drm_gem_object_init(dev, obj, size, NULL);
>   		if (ret)
>   			goto err_release_resource;
> 
> diff --git a/drivers/gpu/drm/loongson/lsdc_ttm.c b/drivers/gpu/drm/loongson/lsdc_ttm.c
> index 465f622ac05d..d392ea66d72e 100644
> --- a/drivers/gpu/drm/loongson/lsdc_ttm.c
> +++ b/drivers/gpu/drm/loongson/lsdc_ttm.c
> @@ -458,7 +458,7 @@ struct lsdc_bo *lsdc_bo_create(struct drm_device *ddev,
> 
>   	size = ALIGN(size, PAGE_SIZE);
> 
> -	ret = drm_gem_object_init(ddev, &tbo->base, size);
> +	ret = drm_gem_object_init(ddev, &tbo->base, size, NULL);
>   	if (ret) {
>   		kfree(lbo);
>   		return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
> index 4f2e3feabc0f..261d386921dc 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
> @@ -44,7 +44,7 @@ static struct mtk_drm_gem_obj *mtk_drm_gem_init(struct drm_device *dev,
> 
>   	mtk_gem_obj->base.funcs = &mtk_drm_gem_object_funcs;
> 
> -	ret = drm_gem_object_init(dev, &mtk_gem_obj->base, size);
> +	ret = drm_gem_object_init(dev, &mtk_gem_obj->base, size, NULL);
>   	if (ret < 0) {
>   		DRM_ERROR("failed to initialize gem object\n");
>   		kfree(mtk_gem_obj);
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index 175ee4ab8a6f..6fe17cf28ef6 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -1222,7 +1222,7 @@ struct drm_gem_object *msm_gem_new(struct drm_device *dev, uint32_t size, uint32
> 
>   		vma->iova = physaddr(obj);
>   	} else {
> -		ret = drm_gem_object_init(dev, obj, size);
> +		ret = drm_gem_object_init(dev, obj, size, NULL);
>   		if (ret)
>   			goto fail;
>   		/*
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index 49c2bcbef129..434325fa8752 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -262,7 +262,7 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 size, int align, uint32_t domain,
> 
>   	/* Initialize the embedded gem-object. We return a single gem-reference
>   	 * to the caller, instead of a normal nouveau_bo ttm reference. */
> -	ret = drm_gem_object_init(drm->dev, &nvbo->bo.base, size);
> +	ret = drm_gem_object_init(drm->dev, &nvbo->bo.base, size, NULL);
>   	if (ret) {
>   		drm_gem_object_release(&nvbo->bo.base);
>   		kfree(nvbo);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c
> index 1b2ff0c40fc1..c9b3572df555 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_prime.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
> @@ -62,7 +62,7 @@ struct drm_gem_object *nouveau_gem_prime_import_sg_table(struct drm_device *dev,
> 
>   	/* Initialize the embedded gem-object. We return a single gem-reference
>   	 * to the caller, instead of a normal nouveau_bo ttm reference. */
> -	ret = drm_gem_object_init(dev, &nvbo->bo.base, size);
> +	ret = drm_gem_object_init(dev, &nvbo->bo.base, size, NULL);
>   	if (ret) {
>   		nouveau_bo_ref(NULL, &nvbo);
>   		obj = ERR_PTR(-ENOMEM);
> diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
> index 3421e8389222..53b4ec64c7b0 100644
> --- a/drivers/gpu/drm/omapdrm/omap_gem.c
> +++ b/drivers/gpu/drm/omapdrm/omap_gem.c
> @@ -1352,7 +1352,7 @@ struct drm_gem_object *omap_gem_new(struct drm_device *dev,
>   	if (!(flags & OMAP_BO_MEM_SHMEM)) {
>   		drm_gem_private_object_init(dev, obj, size);
>   	} else {
> -		ret = drm_gem_object_init(dev, obj, size);
> +		ret = drm_gem_object_init(dev, obj, size, NULL);
>   		if (ret)
>   			goto err_free;
> 
> diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
> index 1e46b0a6e478..45d7abe26ebd 100644
> --- a/drivers/gpu/drm/qxl/qxl_object.c
> +++ b/drivers/gpu/drm/qxl/qxl_object.c
> @@ -123,7 +123,7 @@ int qxl_bo_create(struct qxl_device *qdev, unsigned long size,
>   	if (bo == NULL)
>   		return -ENOMEM;
>   	size = roundup(size, PAGE_SIZE);
> -	r = drm_gem_object_init(&qdev->ddev, &bo->tbo.base, size);
> +	r = drm_gem_object_init(&qdev->ddev, &bo->tbo.base, size, NULL);
>   	if (unlikely(r)) {
>   		kfree(bo);
>   		return r;
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
> index 93ed841f5dce..daba285bd78f 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
> @@ -295,7 +295,7 @@ static struct rockchip_gem_object *
> 
>   	obj->funcs = &rockchip_gem_object_funcs;
> 
> -	drm_gem_object_init(drm, obj, size);
> +	drm_gem_object_init(drm, obj, size, NULL);
> 
>   	return rk_obj;
>   }
> diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
> index b4eb030ea961..63f10d5a57ba 100644
> --- a/drivers/gpu/drm/tegra/gem.c
> +++ b/drivers/gpu/drm/tegra/gem.c
> @@ -311,7 +311,7 @@ static struct tegra_bo *tegra_bo_alloc_object(struct drm_device *drm,
>   	host1x_bo_init(&bo->base, &tegra_bo_ops);
>   	size = round_up(size, PAGE_SIZE);
> 
> -	err = drm_gem_object_init(drm, &bo->gem, size);
> +	err = drm_gem_object_init(drm, &bo->gem, size, NULL);
>   	if (err < 0)
>   		goto free;
> 
> diff --git a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
> index 7b7c1fa805fc..a9bf7d5a887c 100644
> --- a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
> +++ b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
> @@ -61,7 +61,7 @@ struct ttm_buffer_object *ttm_bo_kunit_init(struct kunit *test,
>   	KUNIT_ASSERT_NOT_NULL(test, bo);
> 
>   	bo->base = gem_obj;
> -	err = drm_gem_object_init(devs->drm, &bo->base, size);
> +	err = drm_gem_object_init(devs->drm, &bo->base, size, NULL);
>   	KUNIT_ASSERT_EQ(test, err, 0);
> 
>   	bo->bdev = devs->ttm_dev;
> diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c b/drivers/gpu/drm/xen/xen_drm_front_gem.c
> index 3ad2b4cfd1f0..1b36c958340b 100644
> --- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
> +++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
> @@ -122,7 +122,7 @@ static struct xen_gem_object *gem_create_obj(struct drm_device *dev,
> 
>   	xen_obj->base.funcs = &xen_drm_front_gem_object_funcs;
> 
> -	ret = drm_gem_object_init(dev, &xen_obj->base, size);
> +	ret = drm_gem_object_init(dev, &xen_obj->base, size, NULL);
>   	if (ret < 0) {
>   		kfree(xen_obj);
>   		return ERR_PTR(ret);
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 2ebec3984cd4..c75611ae8f93 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -471,7 +471,8 @@ struct drm_gem_object {
>   void drm_gem_object_release(struct drm_gem_object *obj);
>   void drm_gem_object_free(struct kref *kref);
>   int drm_gem_object_init(struct drm_device *dev,
> -			struct drm_gem_object *obj, size_t size);
> +			struct drm_gem_object *obj, size_t size,
> +			struct vfsmount *gemfs);
>   void drm_gem_private_object_init(struct drm_device *dev,
>   				 struct drm_gem_object *obj, size_t size);
>   void drm_gem_private_object_fini(struct drm_gem_object *obj);
> --
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] drm/v3d: Introduce gemfs
  2024-03-11 10:06 ` [PATCH 3/5] drm/v3d: Introduce gemfs Maíra Canal
  2024-03-12  8:35   ` Iago Toral
@ 2024-03-12  8:55   ` Tvrtko Ursulin
  1 sibling, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2024-03-12  8:55 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev


Hi,

On 11/03/2024 10:06, Maíra Canal wrote:
> Create a separate "tmpfs" kernel mount for V3D. This will allow us to
> move away from the shmemfs `shm_mnt` and gives the flexibility to do
> things like set our own mount options. Here, the interest is to use
> "huge=", which should allow us to enable the use of THP for our
> shmem-backed objects.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>   drivers/gpu/drm/v3d/Makefile    |  3 ++-
>   drivers/gpu/drm/v3d/v3d_drv.h   |  9 +++++++
>   drivers/gpu/drm/v3d/v3d_gem.c   |  3 +++
>   drivers/gpu/drm/v3d/v3d_gemfs.c | 46 +++++++++++++++++++++++++++++++++
>   4 files changed, 60 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/gpu/drm/v3d/v3d_gemfs.c
> 
> diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
> index b7d673f1153b..fcf710926057 100644
> --- a/drivers/gpu/drm/v3d/Makefile
> +++ b/drivers/gpu/drm/v3d/Makefile
> @@ -13,7 +13,8 @@ v3d-y := \
>   	v3d_trace_points.o \
>   	v3d_sched.o \
>   	v3d_sysfs.o \
> -	v3d_submit.o
> +	v3d_submit.o \
> +	v3d_gemfs.o
> 
>   v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index 1950c723dde1..d2ce8222771a 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -119,6 +119,11 @@ struct v3d_dev {
>   	struct drm_mm mm;
>   	spinlock_t mm_lock;
> 
> +	/*
> +	 * tmpfs instance used for shmem backed objects
> +	 */
> +	struct vfsmount *gemfs;
> +
>   	struct work_struct overflow_mem_work;
> 
>   	struct v3d_bin_job *bin_job;
> @@ -519,6 +524,10 @@ void v3d_reset(struct v3d_dev *v3d);
>   void v3d_invalidate_caches(struct v3d_dev *v3d);
>   void v3d_clean_caches(struct v3d_dev *v3d);
> 
> +/* v3d_gemfs.c */
> +void v3d_gemfs_init(struct v3d_dev *v3d);
> +void v3d_gemfs_fini(struct v3d_dev *v3d);
> +
>   /* v3d_submit.c */
>   void v3d_job_cleanup(struct v3d_job *job);
>   void v3d_job_put(struct v3d_job *job);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> index 66f4b78a6b2e..faefbe497e8d 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -287,6 +287,8 @@ v3d_gem_init(struct drm_device *dev)
>   	v3d_init_hw_state(v3d);
>   	v3d_mmu_set_page_table(v3d);
> 
> +	v3d_gemfs_init(v3d);
> +
>   	ret = v3d_sched_init(v3d);
>   	if (ret) {
>   		drm_mm_takedown(&v3d->mm);
> @@ -304,6 +306,7 @@ v3d_gem_destroy(struct drm_device *dev)
>   	struct v3d_dev *v3d = to_v3d_dev(dev);
> 
>   	v3d_sched_fini(v3d);
> +	v3d_gemfs_fini(v3d);
> 
>   	/* Waiting for jobs to finish would need to be done before
>   	 * unregistering V3D.
> diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
> new file mode 100644
> index 000000000000..8518b7da6f73
> --- /dev/null
> +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/* Copyright (C) 2024 Raspberry Pi */
> +
> +#include <linux/fs.h>
> +#include <linux/mount.h>
> +
> +#include "v3d_drv.h"
> +
> +void v3d_gemfs_init(struct v3d_dev *v3d)
> +{
> +	char huge_opt[] = "huge=always";

Is using 'always' rather than 'within_size' deliberate? It can waste memory, 
but it could indeed be best for performance. I am just asking, and perhaps I 
missed some prior discussion on this.
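
If within_size turned out to be the better trade-off, the change would
presumably be as small as swapping the option string here (untested):

	char huge_opt[] = "huge=within_size";

with the rest of v3d_gemfs_init() unchanged.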

Regards,

Tvrtko

> +	struct file_system_type *type;
> +	struct vfsmount *gemfs;
> +
> +	/*
> +	 * By creating our own shmemfs mountpoint, we can pass in
> +	 * mount flags that better match our usecase. However, we
> +	 * only do so on platforms which benefit from it.
> +	 */
> +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> +		goto err;
> +
> +	type = get_fs_type("tmpfs");
> +	if (!type)
> +		goto err;
> +
> +	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
> +	if (IS_ERR(gemfs))
> +		goto err;
> +
> +	v3d->gemfs = gemfs;
> +	drm_info(&v3d->drm, "Using Transparent Hugepages\n");
> +
> +	return;
> +
> +err:
> +	v3d->gemfs = NULL;
> +	drm_notice(&v3d->drm,
> +		   "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
> +}
> +
> +void v3d_gemfs_fini(struct v3d_dev *v3d)
> +{
> +	if (v3d->gemfs)
> +		kern_unmount(v3d->gemfs);
> +}
> --
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-12  8:51   ` Tvrtko Ursulin
@ 2024-03-12  8:59     ` Christian König
  2024-03-12  9:30       ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2024-03-12  8:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>
> Hi Maira,
>
> On 11/03/2024 10:05, Maíra Canal wrote:
>> For some applications, such as using huge pages, we might want to have a
>> different mountpoint, for which we pass in mount flags that better match
>> our usecase.
>>
>> Therefore, add a new parameter to drm_gem_object_init() that allow us to
>> define the tmpfs mountpoint where the GEM object will be created. If
>> this parameter is NULL, then we fallback to shmem_file_setup().
>
> One strategy for reducing churn, and so the number of drivers this 
> patch touches, could be to add a lower level drm_gem_object_init() 
> (which takes vfsmount, call it __drm_gem_object_init(), or 
> drm__gem_object_init_mnt(), and make drm_gem_object_init() call that 
> one with a NULL argument.

I would even go a step further in the other direction. The shmem-backed 
GEM object is just some special handling as far as I can see.

So I would rather suggest renaming all drm_gem_* functions which only 
deal with the shmem-backed GEM object to drm_gem_shmem_*.

Also, the explanation of why a different mount point helps isn't very 
satisfying.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>>
>> Cc: Russell King <linux@armlinux.org.uk>
>> Cc: Lucas Stach <l.stach@pengutronix.de>
>> Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
>> Cc: Inki Dae <inki.dae@samsung.com>
>> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
>> Cc: Kyungmin Park <kyungmin.park@samsung.com>
>> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>> Cc: Alim Akhtar <alim.akhtar@samsung.com>
>> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
>> Cc: Sui Jingfeng <suijingfeng@loongson.cn>
>> Cc: Chun-Kuang Hu <chunkuang.hu@kernel.org>
>> Cc: Philipp Zabel <p.zabel@pengutronix.de>
>> Cc: Matthias Brugger <matthias.bgg@gmail.com>
>> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
>> Cc: Rob Clark <robdclark@gmail.com>
>> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
>> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>> Cc: Sean Paul <sean@poorly.run>
>> Cc: Marijn Suijten <marijn.suijten@somainline.org>
>> Cc: Karol Herbst <kherbst@redhat.com>
>> Cc: Lyude Paul <lyude@redhat.com>
>> Cc: Danilo Krummrich <dakr@redhat.com>
>> Cc: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
>> Cc: Gerd Hoffmann <kraxel@redhat.com>
>> Cc: Sandy Huang <hjc@rock-chips.com>
>> Cc: "Heiko Stübner" <heiko@sntech.de>
>> Cc: Andy Yan <andy.yan@rock-chips.com>
>> Cc: Thierry Reding <thierry.reding@gmail.com>
>> Cc: Mikko Perttunen <mperttunen@nvidia.com>
>> Cc: Jonathan Hunter <jonathanh@nvidia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Huang Rui <ray.huang@amd.com>
>> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Cc: Karolina Stolarek <karolina.stolarek@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>>   drivers/gpu/drm/armada/armada_gem.c           |  2 +-
>>   drivers/gpu/drm/drm_gem.c                     | 12 ++++++++++--
>>   drivers/gpu/drm/drm_gem_dma_helper.c          |  2 +-
>>   drivers/gpu/drm/drm_gem_shmem_helper.c        |  2 +-
>>   drivers/gpu/drm/drm_gem_vram_helper.c         |  2 +-
>>   drivers/gpu/drm/etnaviv/etnaviv_gem.c         |  2 +-
>>   drivers/gpu/drm/exynos/exynos_drm_gem.c       |  2 +-
>>   drivers/gpu/drm/gma500/gem.c                  |  2 +-
>>   drivers/gpu/drm/loongson/lsdc_ttm.c           |  2 +-
>>   drivers/gpu/drm/mediatek/mtk_drm_gem.c        |  2 +-
>>   drivers/gpu/drm/msm/msm_gem.c                 |  2 +-
>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |  2 +-
>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |  2 +-
>>   drivers/gpu/drm/omapdrm/omap_gem.c            |  2 +-
>>   drivers/gpu/drm/qxl/qxl_object.c              |  2 +-
>>   drivers/gpu/drm/rockchip/rockchip_drm_gem.c   |  2 +-
>>   drivers/gpu/drm/tegra/gem.c                   |  2 +-
>>   drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |  2 +-
>>   drivers/gpu/drm/xen/xen_drm_front_gem.c       |  2 +-
>>   include/drm/drm_gem.h                         |  3 ++-
>>   20 files changed, 30 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/armada/armada_gem.c 
>> b/drivers/gpu/drm/armada/armada_gem.c
>> index 26d10065d534..36a25e667341 100644
>> --- a/drivers/gpu/drm/armada/armada_gem.c
>> +++ b/drivers/gpu/drm/armada/armada_gem.c
>> @@ -226,7 +226,7 @@ static struct armada_gem_object 
>> *armada_gem_alloc_object(struct drm_device *dev,
>>
>>       obj->obj.funcs = &armada_gem_object_funcs;
>>
>> -    if (drm_gem_object_init(dev, &obj->obj, size)) {
>> +    if (drm_gem_object_init(dev, &obj->obj, size, NULL)) {
>>           kfree(obj);
>>           return NULL;
>>       }
>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>> index 44a948b80ee1..ddd8777fcda5 100644
>> --- a/drivers/gpu/drm/drm_gem.c
>> +++ b/drivers/gpu/drm/drm_gem.c
>> @@ -118,18 +118,26 @@ drm_gem_init(struct drm_device *dev)
>>    * @dev: drm_device the object should be initialized for
>>    * @obj: drm_gem_object to initialize
>>    * @size: object size
>> + * @gemfs: tmpfs mount where the GEM object will be created. If 
>> NULL, use
>> + * the usual tmpfs mountpoint (`shm_mnt`).
>>    *
>>    * Initialize an already allocated GEM object of the specified size 
>> with
>>    * shmfs backing store.
>>    */
>>   int drm_gem_object_init(struct drm_device *dev,
>> -            struct drm_gem_object *obj, size_t size)
>> +            struct drm_gem_object *obj, size_t size,
>> +            struct vfsmount *gemfs)
>>   {
>>       struct file *filp;
>>
>>       drm_gem_private_object_init(dev, obj, size);
>>
>> -    filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
>> +    if (gemfs)
>> +        filp = shmem_file_setup_with_mnt(gemfs, "drm mm object", size,
>> +                         VM_NORESERVE);
>> +    else
>> +        filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
>> +
>>       if (IS_ERR(filp))
>>           return PTR_ERR(filp);
>>
>> diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c 
>> b/drivers/gpu/drm/drm_gem_dma_helper.c
>> index 870b90b78bc4..9ada5ac85dd6 100644
>> --- a/drivers/gpu/drm/drm_gem_dma_helper.c
>> +++ b/drivers/gpu/drm/drm_gem_dma_helper.c
>> @@ -95,7 +95,7 @@ __drm_gem_dma_create(struct drm_device *drm, size_t 
>> size, bool private)
>>           /* Always use writecombine for dma-buf mappings */
>>           dma_obj->map_noncoherent = false;
>>       } else {
>> -        ret = drm_gem_object_init(drm, gem_obj, size);
>> +        ret = drm_gem_object_init(drm, gem_obj, size, NULL);
>>       }
>>       if (ret)
>>           goto error;
>> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
>> b/drivers/gpu/drm/drm_gem_shmem_helper.c
>> index e435f986cd13..15635b330ca8 100644
>> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
>> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
>> @@ -77,7 +77,7 @@ __drm_gem_shmem_create(struct drm_device *dev, 
>> size_t size, bool private)
>>           drm_gem_private_object_init(dev, obj, size);
>>           shmem->map_wc = false; /* dma-buf mappings use always 
>> writecombine */
>>       } else {
>> -        ret = drm_gem_object_init(dev, obj, size);
>> +        ret = drm_gem_object_init(dev, obj, size, NULL);
>>       }
>>       if (ret) {
>>           drm_gem_private_object_fini(obj);
>> diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c 
>> b/drivers/gpu/drm/drm_gem_vram_helper.c
>> index 75f2eaf0d5b6..90649899dbef 100644
>> --- a/drivers/gpu/drm/drm_gem_vram_helper.c
>> +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
>> @@ -210,7 +210,7 @@ struct drm_gem_vram_object 
>> *drm_gem_vram_create(struct drm_device *dev,
>>       if (!gem->funcs)
>>           gem->funcs = &drm_gem_vram_object_funcs;
>>
>> -    ret = drm_gem_object_init(dev, gem, size);
>> +    ret = drm_gem_object_init(dev, gem, size, NULL);
>>       if (ret) {
>>           kfree(gbo);
>>           return ERR_PTR(ret);
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
>> b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> index 71a6d2b1c80f..aa4b61c48b7f 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
>> @@ -596,7 +596,7 @@ int etnaviv_gem_new_handle(struct drm_device 
>> *dev, struct drm_file *file,
>>
>>       lockdep_set_class(&to_etnaviv_bo(obj)->lock, 
>> &etnaviv_shm_lock_class);
>>
>> -    ret = drm_gem_object_init(dev, obj, size);
>> +    ret = drm_gem_object_init(dev, obj, size, NULL);
>>       if (ret)
>>           goto fail;
>>
>> diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c 
>> b/drivers/gpu/drm/exynos/exynos_drm_gem.c
>> index 638ca96830e9..c50c0d12246e 100644
>> --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
>> +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
>> @@ -160,7 +160,7 @@ static struct exynos_drm_gem 
>> *exynos_drm_gem_init(struct drm_device *dev,
>>
>>       obj->funcs = &exynos_drm_gem_object_funcs;
>>
>> -    ret = drm_gem_object_init(dev, obj, size);
>> +    ret = drm_gem_object_init(dev, obj, size, NULL);
>>       if (ret < 0) {
>>           DRM_DEV_ERROR(dev->dev, "failed to initialize gem object\n");
>>           kfree(exynos_gem);
>> diff --git a/drivers/gpu/drm/gma500/gem.c b/drivers/gpu/drm/gma500/gem.c
>> index 4b7627a72637..315e085dc9ee 100644
>> --- a/drivers/gpu/drm/gma500/gem.c
>> +++ b/drivers/gpu/drm/gma500/gem.c
>> @@ -169,7 +169,7 @@ psb_gem_create(struct drm_device *dev, u64 size, 
>> const char *name, bool stolen,
>>       if (stolen) {
>>           drm_gem_private_object_init(dev, obj, size);
>>       } else {
>> -        ret = drm_gem_object_init(dev, obj, size);
>> +        ret = drm_gem_object_init(dev, obj, size, NULL);
>>           if (ret)
>>               goto err_release_resource;
>>
>> diff --git a/drivers/gpu/drm/loongson/lsdc_ttm.c 
>> b/drivers/gpu/drm/loongson/lsdc_ttm.c
>> index 465f622ac05d..d392ea66d72e 100644
>> --- a/drivers/gpu/drm/loongson/lsdc_ttm.c
>> +++ b/drivers/gpu/drm/loongson/lsdc_ttm.c
>> @@ -458,7 +458,7 @@ struct lsdc_bo *lsdc_bo_create(struct drm_device 
>> *ddev,
>>
>>       size = ALIGN(size, PAGE_SIZE);
>>
>> -    ret = drm_gem_object_init(ddev, &tbo->base, size);
>> +    ret = drm_gem_object_init(ddev, &tbo->base, size, NULL);
>>       if (ret) {
>>           kfree(lbo);
>>           return ERR_PTR(ret);
>> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c 
>> b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
>> index 4f2e3feabc0f..261d386921dc 100644
>> --- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
>> +++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
>> @@ -44,7 +44,7 @@ static struct mtk_drm_gem_obj 
>> *mtk_drm_gem_init(struct drm_device *dev,
>>
>>       mtk_gem_obj->base.funcs = &mtk_drm_gem_object_funcs;
>>
>> -    ret = drm_gem_object_init(dev, &mtk_gem_obj->base, size);
>> +    ret = drm_gem_object_init(dev, &mtk_gem_obj->base, size, NULL);
>>       if (ret < 0) {
>>           DRM_ERROR("failed to initialize gem object\n");
>>           kfree(mtk_gem_obj);
>> diff --git a/drivers/gpu/drm/msm/msm_gem.c 
>> b/drivers/gpu/drm/msm/msm_gem.c
>> index 175ee4ab8a6f..6fe17cf28ef6 100644
>> --- a/drivers/gpu/drm/msm/msm_gem.c
>> +++ b/drivers/gpu/drm/msm/msm_gem.c
>> @@ -1222,7 +1222,7 @@ struct drm_gem_object *msm_gem_new(struct 
>> drm_device *dev, uint32_t size, uint32
>>
>>           vma->iova = physaddr(obj);
>>       } else {
>> -        ret = drm_gem_object_init(dev, obj, size);
>> +        ret = drm_gem_object_init(dev, obj, size, NULL);
>>           if (ret)
>>               goto fail;
>>           /*
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
>> b/drivers/gpu/drm/nouveau/nouveau_gem.c
>> index 49c2bcbef129..434325fa8752 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
>> @@ -262,7 +262,7 @@ nouveau_gem_new(struct nouveau_cli *cli, u64 
>> size, int align, uint32_t domain,
>>
>>       /* Initialize the embedded gem-object. We return a single 
>> gem-reference
>>        * to the caller, instead of a normal nouveau_bo ttm reference. */
>> -    ret = drm_gem_object_init(drm->dev, &nvbo->bo.base, size);
>> +    ret = drm_gem_object_init(drm->dev, &nvbo->bo.base, size, NULL);
>>       if (ret) {
>>           drm_gem_object_release(&nvbo->bo.base);
>>           kfree(nvbo);
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c 
>> b/drivers/gpu/drm/nouveau/nouveau_prime.c
>> index 1b2ff0c40fc1..c9b3572df555 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_prime.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
>> @@ -62,7 +62,7 @@ struct drm_gem_object 
>> *nouveau_gem_prime_import_sg_table(struct drm_device *dev,
>>
>>       /* Initialize the embedded gem-object. We return a single 
>> gem-reference
>>        * to the caller, instead of a normal nouveau_bo ttm reference. */
>> -    ret = drm_gem_object_init(dev, &nvbo->bo.base, size);
>> +    ret = drm_gem_object_init(dev, &nvbo->bo.base, size, NULL);
>>       if (ret) {
>>           nouveau_bo_ref(NULL, &nvbo);
>>           obj = ERR_PTR(-ENOMEM);
>> diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c 
>> b/drivers/gpu/drm/omapdrm/omap_gem.c
>> index 3421e8389222..53b4ec64c7b0 100644
>> --- a/drivers/gpu/drm/omapdrm/omap_gem.c
>> +++ b/drivers/gpu/drm/omapdrm/omap_gem.c
>> @@ -1352,7 +1352,7 @@ struct drm_gem_object *omap_gem_new(struct 
>> drm_device *dev,
>>       if (!(flags & OMAP_BO_MEM_SHMEM)) {
>>           drm_gem_private_object_init(dev, obj, size);
>>       } else {
>> -        ret = drm_gem_object_init(dev, obj, size);
>> +        ret = drm_gem_object_init(dev, obj, size, NULL);
>>           if (ret)
>>               goto err_free;
>>
>> diff --git a/drivers/gpu/drm/qxl/qxl_object.c 
>> b/drivers/gpu/drm/qxl/qxl_object.c
>> index 1e46b0a6e478..45d7abe26ebd 100644
>> --- a/drivers/gpu/drm/qxl/qxl_object.c
>> +++ b/drivers/gpu/drm/qxl/qxl_object.c
>> @@ -123,7 +123,7 @@ int qxl_bo_create(struct qxl_device *qdev, 
>> unsigned long size,
>>       if (bo == NULL)
>>           return -ENOMEM;
>>       size = roundup(size, PAGE_SIZE);
>> -    r = drm_gem_object_init(&qdev->ddev, &bo->tbo.base, size);
>> +    r = drm_gem_object_init(&qdev->ddev, &bo->tbo.base, size, NULL);
>>       if (unlikely(r)) {
>>           kfree(bo);
>>           return r;
>> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c 
>> b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
>> index 93ed841f5dce..daba285bd78f 100644
>> --- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
>> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
>> @@ -295,7 +295,7 @@ static struct rockchip_gem_object *
>>
>>       obj->funcs = &rockchip_gem_object_funcs;
>>
>> -    drm_gem_object_init(drm, obj, size);
>> +    drm_gem_object_init(drm, obj, size, NULL);
>>
>>       return rk_obj;
>>   }
>> diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
>> index b4eb030ea961..63f10d5a57ba 100644
>> --- a/drivers/gpu/drm/tegra/gem.c
>> +++ b/drivers/gpu/drm/tegra/gem.c
>> @@ -311,7 +311,7 @@ static struct tegra_bo 
>> *tegra_bo_alloc_object(struct drm_device *drm,
>>       host1x_bo_init(&bo->base, &tegra_bo_ops);
>>       size = round_up(size, PAGE_SIZE);
>>
>> -    err = drm_gem_object_init(drm, &bo->gem, size);
>> +    err = drm_gem_object_init(drm, &bo->gem, size, NULL);
>>       if (err < 0)
>>           goto free;
>>
>> diff --git a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c 
>> b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
>> index 7b7c1fa805fc..a9bf7d5a887c 100644
>> --- a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
>> +++ b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
>> @@ -61,7 +61,7 @@ struct ttm_buffer_object *ttm_bo_kunit_init(struct 
>> kunit *test,
>>       KUNIT_ASSERT_NOT_NULL(test, bo);
>>
>>       bo->base = gem_obj;
>> -    err = drm_gem_object_init(devs->drm, &bo->base, size);
>> +    err = drm_gem_object_init(devs->drm, &bo->base, size, NULL);
>>       KUNIT_ASSERT_EQ(test, err, 0);
>>
>>       bo->bdev = devs->ttm_dev;
>> diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c 
>> b/drivers/gpu/drm/xen/xen_drm_front_gem.c
>> index 3ad2b4cfd1f0..1b36c958340b 100644
>> --- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
>> +++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
>> @@ -122,7 +122,7 @@ static struct xen_gem_object 
>> *gem_create_obj(struct drm_device *dev,
>>
>>       xen_obj->base.funcs = &xen_drm_front_gem_object_funcs;
>>
>> -    ret = drm_gem_object_init(dev, &xen_obj->base, size);
>> +    ret = drm_gem_object_init(dev, &xen_obj->base, size, NULL);
>>       if (ret < 0) {
>>           kfree(xen_obj);
>>           return ERR_PTR(ret);
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 2ebec3984cd4..c75611ae8f93 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -471,7 +471,8 @@ struct drm_gem_object {
>>   void drm_gem_object_release(struct drm_gem_object *obj);
>>   void drm_gem_object_free(struct kref *kref);
>>   int drm_gem_object_init(struct drm_device *dev,
>> -            struct drm_gem_object *obj, size_t size);
>> +            struct drm_gem_object *obj, size_t size,
>> +            struct vfsmount *gemfs);
>>   void drm_gem_private_object_init(struct drm_device *dev,
>>                    struct drm_gem_object *obj, size_t size);
>>   void drm_gem_private_object_fini(struct drm_gem_object *obj);
>> -- 
>> 2.43.0
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-12  8:59     ` Christian König
@ 2024-03-12  9:30       ` Tvrtko Ursulin
  2024-03-12 10:23         ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2024-03-12  9:30 UTC (permalink / raw)
  To: Christian König, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti


On 12/03/2024 08:59, Christian König wrote:
> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>
>> Hi Maira,
>>
>> On 11/03/2024 10:05, Maíra Canal wrote:
>>> For some applications, such as using huge pages, we might want to have a
>>> different mountpoint, for which we pass in mount flags that better match
>>> our usecase.
>>>
>>> Therefore, add a new parameter to drm_gem_object_init() that allow us to
>>> define the tmpfs mountpoint where the GEM object will be created. If
>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>
>> One strategy for reducing churn, and so the number of drivers this 
>> patch touches, could be to add a lower level drm_gem_object_init() 
>> (which takes vfsmount, call it __drm_gem_object_init(), or 
>> drm__gem_object_init_mnt(), and make drm_gem_object_init() call that 
>> one with a NULL argument.
> 
> I would even go a step further into the other direction. The shmem 
> backed GEM object is just some special handling as far as I can see.
> 
> So I would rather suggest to rename all drm_gem_* function which only 
> deal with the shmem backed GEM object into drm_gem_shmem_*.

That makes sense although it would be very churny. I at least would be 
on the fence regarding the cost vs benefit.

> Also the explanation why a different mount point helps with something 
> isn't very satisfying.

Not satisfying as in you think it is not detailed enough to say the driver 
wants to use huge pages for performance? Or not satisfying as in you 
question why huge pages would help?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-12  9:30       ` Tvrtko Ursulin
@ 2024-03-12 10:23         ` Christian König
  2024-03-12 10:31           ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2024-03-12 10:23 UTC (permalink / raw)
  To: Tvrtko Ursulin, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>
> On 12/03/2024 08:59, Christian König wrote:
>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>
>>> Hi Maira,
>>>
>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>> For some applications, such as using huge pages, we might want to 
>>>> have a
>>>> different mountpoint, for which we pass in mount flags that better 
>>>> match
>>>> our usecase.
>>>>
>>>> Therefore, add a new parameter to drm_gem_object_init() that allow 
>>>> us to
>>>> define the tmpfs mountpoint where the GEM object will be created. If
>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>
>>> One strategy for reducing churn, and so the number of drivers this 
>>> patch touches, could be to add a lower level drm_gem_object_init() 
>>> (which takes vfsmount, call it __drm_gem_object_init(), or 
>>> drm__gem_object_init_mnt(), and make drm_gem_object_init() call that 
>>> one with a NULL argument.
>>
>> I would even go a step further into the other direction. The shmem 
>> backed GEM object is just some special handling as far as I can see.
>>
>> So I would rather suggest to rename all drm_gem_* function which only 
>> deal with the shmem backed GEM object into drm_gem_shmem_*.
>
> That makes sense although it would be very churny. I at least would be 
> on the fence regarding the cost vs benefit.

Yeah, it should clearly not be part of this patch here.

>
>> Also the explanation why a different mount point helps with something 
>> isn't very satisfying.
>
> Not satisfying as you think it is not detailed enough to say driver 
> wants to use huge pages for performance? Or not satisying as you 
> question why huge pages would help?

That huge pages are beneficial is clear to me, but I'm missing the 
connection why a different mount point helps with using huge pages.

Regards,
Christian.

>
> Regards,
>
> Tvrtko


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-12 10:23         ` Christian König
@ 2024-03-12 10:31           ` Tvrtko Ursulin
  2024-03-12 10:37             ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2024-03-12 10:31 UTC (permalink / raw)
  To: Christian König, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti


On 12/03/2024 10:23, Christian König wrote:
> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>
>> On 12/03/2024 08:59, Christian König wrote:
>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>
>>>> Hi Maira,
>>>>
>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>> For some applications, such as using huge pages, we might want to 
>>>>> have a
>>>>> different mountpoint, for which we pass in mount flags that better 
>>>>> match
>>>>> our usecase.
>>>>>
>>>>> Therefore, add a new parameter to drm_gem_object_init() that allow 
>>>>> us to
>>>>> define the tmpfs mountpoint where the GEM object will be created. If
>>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>>
>>>> One strategy for reducing churn, and so the number of drivers this 
>>>> patch touches, could be to add a lower level drm_gem_object_init() 
>>>> (which takes vfsmount, call it __drm_gem_object_init(), or 
>>>> drm__gem_object_init_mnt(), and make drm_gem_object_init() call that 
>>>> one with a NULL argument.
>>>
>>> I would even go a step further into the other direction. The shmem 
>>> backed GEM object is just some special handling as far as I can see.
>>>
>>> So I would rather suggest to rename all drm_gem_* function which only 
>>> deal with the shmem backed GEM object into drm_gem_shmem_*.
>>
>> That makes sense although it would be very churny. I at least would be 
>> on the fence regarding the cost vs benefit.
> 
> Yeah, it should clearly not be part of this patch here.
> 
>>
>>> Also the explanation why a different mount point helps with something 
>>> isn't very satisfying.
>>
>> Not satisfying as you think it is not detailed enough to say driver 
>> wants to use huge pages for performance? Or not satisying as you 
>> question why huge pages would help?
> 
> That huge pages are beneficial is clear to me, but I'm missing the 
> connection why a different mount point helps with using huge pages.

Ah right, same as in i915: one needs to mount a tmpfs instance passing the 
huge=within_size or huge=always option. The default is 'never'; see man 5 tmpfs.
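
For illustration, the userspace equivalent would be something along these
lines (the mount point path is made up):

  mount -t tmpfs -o huge=within_size tmpfs /mnt/gem

and the kernel-side vfs_kern_mount() in this series passes the same "huge="
option string, which is why a dedicated mount point is needed in the first
place.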

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-12 10:31           ` Tvrtko Ursulin
@ 2024-03-12 10:37             ` Christian König
  2024-03-12 13:09               ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2024-03-12 10:37 UTC (permalink / raw)
  To: Tvrtko Ursulin, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>
> On 12/03/2024 10:23, Christian König wrote:
>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>
>>> On 12/03/2024 08:59, Christian König wrote:
>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>
>>>>> Hi Maira,
>>>>>
>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>> For some applications, such as using huge pages, we might want to 
>>>>>> have a
>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>> better match
>>>>>> our usecase.
>>>>>>
>>>>>> Therefore, add a new parameter to drm_gem_object_init() that 
>>>>>> allow us to
>>>>>> define the tmpfs mountpoint where the GEM object will be created. If
>>>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>>>
>>>>> One strategy for reducing churn, and so the number of drivers this 
>>>>> patch touches, could be to add a lower level drm_gem_object_init() 
>>>>> (which takes vfsmount, call it __drm_gem_object_init(), or 
>>>>> drm__gem_object_init_mnt(), and make drm_gem_object_init() call 
>>>>> that one with a NULL argument.
>>>>
>>>> I would even go a step further into the other direction. The shmem 
>>>> backed GEM object is just some special handling as far as I can see.
>>>>
>>>> So I would rather suggest to rename all drm_gem_* function which 
>>>> only deal with the shmem backed GEM object into drm_gem_shmem_*.
>>>
>>> That makes sense although it would be very churny. I at least would 
>>> be on the fence regarding the cost vs benefit.
>>
>> Yeah, it should clearly not be part of this patch here.
>>
>>>
>>>> Also the explanation why a different mount point helps with 
>>>> something isn't very satisfying.
>>>
>>> Not satisfying as you think it is not detailed enough to say driver 
>>> wants to use huge pages for performance? Or not satisying as you 
>>> question why huge pages would help?
>>
>> That huge pages are beneficial is clear to me, but I'm missing the 
>> connection why a different mount point helps with using huge pages.
>
> Ah right, same as in i915, one needs to mount a tmpfs instance passing 
> huge=within_size or huge=always option. Default is 'never', see man 5 
> tmpfs.

Thanks for the explanation, I wasn't aware of that.

Mhm, shouldn't we always use huge pages? Is there a reason for a DRM 
device to not use huge pages with the shmem backend?

I mean it would make this patch here even smaller.

Regards,
Christian.

>
>
> Regards,
>
> Tvrtko


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-12 10:37             ` Christian König
@ 2024-03-12 13:09               ` Tvrtko Ursulin
  2024-03-12 13:48                 ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Tvrtko Ursulin @ 2024-03-12 13:09 UTC (permalink / raw)
  To: Christian König, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti


On 12/03/2024 10:37, Christian König wrote:
> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>
>> On 12/03/2024 10:23, Christian König wrote:
>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>
>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>
>>>>>> Hi Maira,
>>>>>>
>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>> For some applications, such as using huge pages, we might want to 
>>>>>>> have a
>>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>>> better match
>>>>>>> our usecase.
>>>>>>>
>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that 
>>>>>>> allow us to
>>>>>>> define the tmpfs mountpoint where the GEM object will be created. If
>>>>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>>>>
>>>>>> One strategy for reducing churn, and so the number of drivers this 
>>>>>> patch touches, could be to add a lower level drm_gem_object_init() 
>>>>>> (which takes vfsmount, call it __drm_gem_object_init(), or 
>>>>>> drm__gem_object_init_mnt(), and make drm_gem_object_init() call 
>>>>>> that one with a NULL argument.
>>>>>
>>>>> I would even go a step further into the other direction. The shmem 
>>>>> backed GEM object is just some special handling as far as I can see.
>>>>>
>>>>> So I would rather suggest to rename all drm_gem_* function which 
>>>>> only deal with the shmem backed GEM object into drm_gem_shmem_*.
>>>>
>>>> That makes sense although it would be very churny. I at least would 
>>>> be on the fence regarding the cost vs benefit.
>>>
>>> Yeah, it should clearly not be part of this patch here.
>>>
>>>>
>>>>> Also the explanation why a different mount point helps with 
>>>>> something isn't very satisfying.
>>>>
>>>> Not satisfying as you think it is not detailed enough to say driver 
>>>> wants to use huge pages for performance? Or not satisying as you 
>>>> question why huge pages would help?
>>>
>>> That huge pages are beneficial is clear to me, but I'm missing the 
>>> connection why a different mount point helps with using huge pages.
>>
>> Ah right, same as in i915, one needs to mount a tmpfs instance passing 
>> huge=within_size or huge=always option. Default is 'never', see man 5 
>> tmpfs.
> 
> Thanks for the explanation, I wasn't aware of that.
> 
> Mhm, shouldn't we always use huge pages? Is there a reason for a DRM 
> device to not use huge pages with the shmem backend?

AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), back 
then the understanding was that within_size may overallocate, meaning 
there would be some space wastage, until memory pressure makes the THP 
code split the trailing huge page. I haven't checked if that still applies.

Other than that I don't know if some drivers/platforms could have 
problems if they have some limitations or hardcoded assumptions when 
they iterate the sg list.

The Cc is plenty large so perhaps someone else will have additional 
information. :)

Regards,

Tvrtko

> 
> I mean it would make this patch here even smaller.
> 
> Regards,
> Christian.
> 
>>
>>
>> Regards,
>>
>> Tvrtko
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] drm/v3d: Enable super pages
  2024-03-11 10:06 ` [PATCH 5/5] drm/v3d: Enable super pages Maíra Canal
  2024-03-12  8:34   ` Iago Toral
@ 2024-03-12 13:41   ` Tvrtko Ursulin
  1 sibling, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2024-03-12 13:41 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev


Hi Maira,

On 11/03/2024 10:06, Maíra Canal wrote:
> The V3D MMU also supports 1MB pages, called super pages. In order to
> set a 1MB page in the MMU, we need to make sure that page table entries
> for all 4KB pages within a super page must be correctly configured.
> 
> Therefore, if the BO is larger than 2MB, we allocate it in a separate
> mountpoint that uses THP. This will allow us to create a contiguous
> memory region to create our super pages. In order to place the page
> table entries in the MMU, we iterate over the 256 4KB pages and insert
> the PTE.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>   drivers/gpu/drm/v3d/v3d_bo.c    | 19 +++++++++++++++++--
>   drivers/gpu/drm/v3d/v3d_drv.c   |  7 +++++++
>   drivers/gpu/drm/v3d/v3d_drv.h   |  6 ++++--
>   drivers/gpu/drm/v3d/v3d_gemfs.c |  6 ++++++
>   drivers/gpu/drm/v3d/v3d_mmu.c   | 24 ++++++++++++++++++++++--
>   5 files changed, 56 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
> index a07ede668cc1..cb8e49a33be7 100644
> --- a/drivers/gpu/drm/v3d/v3d_bo.c
> +++ b/drivers/gpu/drm/v3d/v3d_bo.c
> @@ -94,6 +94,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>   	struct v3d_dev *v3d = to_v3d_dev(obj->dev);
>   	struct v3d_bo *bo = to_v3d_bo(obj);
>   	struct sg_table *sgt;
> +	u64 align;
>   	int ret;
> 
>   	/* So far we pin the BO in the MMU for its lifetime, so use
> @@ -103,6 +104,9 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>   	if (IS_ERR(sgt))
>   		return PTR_ERR(sgt);
> 
> +	bo->huge_pages = (obj->size >= SZ_2M && v3d->super_pages);
> +	align = bo->huge_pages ? SZ_1M : SZ_4K;
> +
>   	spin_lock(&v3d->mm_lock);
>   	/* Allocate the object's space in the GPU's page tables.
>   	 * Inserting PTEs will happen later, but the offset is for the
> @@ -110,7 +114,7 @@ v3d_bo_create_finish(struct drm_gem_object *obj)
>   	 */
>   	ret = drm_mm_insert_node_generic(&v3d->mm, &bo->node,
>   					 obj->size >> V3D_MMU_PAGE_SHIFT,
> -					 GMP_GRANULARITY >> V3D_MMU_PAGE_SHIFT, 0, 0);
> +					 align >> V3D_MMU_PAGE_SHIFT, 0, 0);
>   	spin_unlock(&v3d->mm_lock);
>   	if (ret)
>   		return ret;
> @@ -130,10 +134,21 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
>   			     size_t unaligned_size)
>   {
>   	struct drm_gem_shmem_object *shmem_obj;
> +	struct v3d_dev *v3d = to_v3d_dev(dev);
>   	struct v3d_bo *bo;
> +	size_t size;
>   	int ret;
> 
> -	shmem_obj = drm_gem_shmem_create(dev, unaligned_size);
> +	size = PAGE_ALIGN(unaligned_size);
> +
> +	/* To avoid memory fragmentation, we only use THP if the BO is bigger
> +	 * than two Super Pages (1MB).
> +	 */
> +	if (size >= SZ_2M && v3d->super_pages)
> +		shmem_obj = drm_gem_shmem_create_with_mnt(dev, size, v3d->gemfs);
> +	else
> +		shmem_obj = drm_gem_shmem_create(dev, size);
> +
>   	if (IS_ERR(shmem_obj))
>   		return ERR_CAST(shmem_obj);
>   	bo = to_v3d_bo(&shmem_obj->base);
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
> index 3debf37e7d9b..96f4d8227407 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -36,6 +36,11 @@
>   #define DRIVER_MINOR 0
>   #define DRIVER_PATCHLEVEL 0
> 
> +static bool super_pages = true;
> +module_param_named(super_pages, super_pages, bool, 0400);
> +MODULE_PARM_DESC(super_pages, "Enable/Disable Super Pages support. Note: \
> +			       To enable Super Pages, you need support to THP.");
> +
>   static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
>   			       struct drm_file *file_priv)
>   {
> @@ -308,6 +313,8 @@ static int v3d_platform_drm_probe(struct platform_device *pdev)
>   		return -ENOMEM;
>   	}
> 
> +	v3d->super_pages = super_pages;
> +
>   	ret = v3d_gem_init(drm);
>   	if (ret)
>   		goto dma_free;
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index d2ce8222771a..795087663739 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -17,9 +17,8 @@ struct clk;
>   struct platform_device;
>   struct reset_control;
> 
> -#define GMP_GRANULARITY (128 * 1024)
> -
>   #define V3D_MMU_PAGE_SHIFT 12
> +#define V3D_PAGE_FACTOR (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT)
> 
>   #define V3D_MAX_QUEUES (V3D_CPU + 1)
> 
> @@ -123,6 +122,7 @@ struct v3d_dev {
>   	 * tmpfs instance used for shmem backed objects
>   	 */
>   	struct vfsmount *gemfs;
> +	bool super_pages;

One not very important comment just in passing: does v3d->super_pages == 
!!v3d->gemfs always hold at runtime? Wondering whether you really need to 
add v3d->super_pages, or could just infer it from v3d->gemfs, maybe via a 
wrapper or whatever pattern is used in v3d.
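
Purely illustrative, something like (made-up name):

static inline bool v3d_using_super_pages(const struct v3d_dev *v3d)
{
	/* The THP-enabled gemfs mount is only set up for super pages. */
	return v3d->gemfs != NULL;
}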

> 
>   	struct work_struct overflow_mem_work;
> 
> @@ -211,6 +211,8 @@ struct v3d_bo {
>   	struct list_head unref_head;
> 
>   	void *vaddr;
> +
> +	bool huge_pages;
>   };
> 
>   static inline struct v3d_bo *
> diff --git a/drivers/gpu/drm/v3d/v3d_gemfs.c b/drivers/gpu/drm/v3d/v3d_gemfs.c
> index 8518b7da6f73..bcde3138f555 100644
> --- a/drivers/gpu/drm/v3d/v3d_gemfs.c
> +++ b/drivers/gpu/drm/v3d/v3d_gemfs.c
> @@ -12,6 +12,10 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
>   	struct file_system_type *type;
>   	struct vfsmount *gemfs;
> 
> +	/* The user doesn't want support for Super Pages */
> +	if (!v3d->super_pages)
> +		goto err;
> +
>   	/*
>   	 * By creating our own shmemfs mountpoint, we can pass in
>   	 * mount flags that better match our usecase. However, we
> @@ -35,6 +39,8 @@ void v3d_gemfs_init(struct v3d_dev *v3d)
> 
>   err:
>   	v3d->gemfs = NULL;
> +	v3d->super_pages = false;
> +
>   	drm_notice(&v3d->drm,
>   		   "Transparent Hugepage support is recommended for optimal performance on this platform!\n");
>   }
> diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
> index 14f3af40d6f6..2f368dc2c0ca 100644
> --- a/drivers/gpu/drm/v3d/v3d_mmu.c
> +++ b/drivers/gpu/drm/v3d/v3d_mmu.c
> @@ -89,6 +89,9 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
>   	u32 page = bo->node.start;
>   	u32 page_prot = V3D_PTE_WRITEABLE | V3D_PTE_VALID;
>   	struct sg_dma_page_iter dma_iter;
> +	int ctg_size = drm_prime_get_contiguous_size(shmem_obj->sgt);

Coming from the i915 background, this call looked suspicious to me. First 
of all the helper seems to be meant for prime import, and secondly I don't 
know if v3d even supports non-contiguous DMA addresses? Like, can the 
IOMMU on the respective platform create such mappings? If it cannot and 
does not, then the return value is just equal to the object size.

Hmm no.. it cannot be mapping non-contiguously, because then 
drm_prime_get_contiguous_size() could return a value smaller than the 
total object size and the loop below would underflow ctg_size.

So it looks like it would do the right thing, it's just a bit misleading.

I also wonder if even before using THP there was no chance on this 
platform to get accidentally coalesced pages?

[Goes and looks.]

Doesn't seem so. V3d uses drm_gem_get_pages(), right? And there it 
doesn't do page coalescing but just assigns them 1:1.

Maybe there is some scope to save some memory for all drivers which use 
the common helpers. See 0c40ce130e38 ("drm/i915: Trim the object sg 
table") for reference, and also the code which compares PFNs as it builds 
the sg list in i915's shmem_sg_alloc_table. Although I suspect that, to 
move this approach into core DRM, I should really figure out a way to do 
it without i915_sg_trim and allow building the sg list incrementally with 
some new scatterlist.h|c APIs.
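
To illustrate the idea only (this is not the actual i915 code): while 
building the table you grow the current sg entry whenever the next page 
happens to be physically adjacent, instead of emitting one entry per 
page. Assuming pages[]/n_pages from drm_gem_get_pages() and an sg_table 
pre-allocated with n_pages entries, roughly:

	struct scatterlist *sg = NULL;
	unsigned long last_pfn = 0;
	unsigned int i;

	for (i = 0; i < n_pages; i++) {
		if (sg && page_to_pfn(pages[i]) == last_pfn + 1) {
			/* Physically adjacent, extend the current chunk. */
			sg->length += PAGE_SIZE;
		} else {
			/* Start a new chunk. */
			sg = sg ? sg_next(sg) : sgt->sgl;
			sg_set_page(sg, pages[i], PAGE_SIZE, 0);
		}
		last_pfn = page_to_pfn(pages[i]);
	}
	if (sg)
		sg_mark_end(sg);

The unused tail of the over-allocated table is then what i915_sg_trim 
reclaims, hence the wish for incremental building instead.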

Presumably when dma_map_sg() is called on v3d platforms it also maps 1:1 
even if pages happen to be neighbouring by chance? The DMA API allows 
coalescing them, but maybe the platform does not do it?

A long digression.. to go back to the original point: do you think you 
can just replace drm_prime_get_contiguous_size() with bo->base.size and 
everything keeps working as is, or am I missing something?

Regards,

Tvrtko

> +	u32 page_size = 0;
> +	u32 npages = 0;
> 
>   	for_each_sgtable_dma_page(shmem_obj->sgt, &dma_iter, 0) {
>   		dma_addr_t dma_addr = sg_page_iter_dma_address(&dma_iter);
> @@ -96,10 +99,27 @@ void v3d_mmu_insert_ptes(struct v3d_bo *bo)
>   		u32 pte = page_prot | page_address;
>   		u32 i;
> 
> -		BUG_ON(page_address + (PAGE_SIZE >> V3D_MMU_PAGE_SHIFT) >=
> +		if (npages == 0) {
> +			if (ctg_size >= SZ_1M && bo->huge_pages) {
> +				page_size = SZ_1M;
> +				npages = 256;
> +			} else {
> +				page_size = SZ_4K;
> +				npages = V3D_PAGE_FACTOR;
> +			}
> +
> +			ctg_size -= npages * SZ_4K;
> +		}
> +
> +		if (page_size == SZ_1M)
> +			pte |= V3D_PTE_SUPERPAGE;
> +
> +		BUG_ON(page_address + V3D_PAGE_FACTOR >=
>   		       BIT(24));
> -		for (i = 0; i < PAGE_SIZE >> V3D_MMU_PAGE_SHIFT; i++)
> +		for (i = 0; i < V3D_PAGE_FACTOR; i++)
>   			v3d->pt[page++] = pte + i;
> +
> +		npages -= V3D_PAGE_FACTOR;
>   	}
> 
>   	WARN_ON_ONCE(page - bo->node.start !=
> --
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-12 13:09               ` Tvrtko Ursulin
@ 2024-03-12 13:48                 ` Christian König
  2024-03-18 12:42                   ` Maíra Canal
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2024-03-12 13:48 UTC (permalink / raw)
  To: Tvrtko Ursulin, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>
> On 12/03/2024 10:37, Christian König wrote:
>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>
>>> On 12/03/2024 10:23, Christian König wrote:
>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>
>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>
>>>>>>> Hi Maira,
>>>>>>>
>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>> For some applications, such as using huge pages, we might want 
>>>>>>>> to have a
>>>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>>>> better match
>>>>>>>> our usecase.
>>>>>>>>
>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that 
>>>>>>>> allow us to
>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>> created. If
>>>>>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>>>>>
>>>>>>> One strategy for reducing churn, and so the number of drivers 
>>>>>>> this patch touches, could be to add a lower level 
>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), and make 
>>>>>>> drm_gem_object_init() call that one with a NULL argument.
>>>>>>
>>>>>> I would even go a step further into the other direction. The 
>>>>>> shmem backed GEM object is just some special handling as far as I 
>>>>>> can see.
>>>>>>
>>>>>> So I would rather suggest to rename all drm_gem_* function which 
>>>>>> only deal with the shmem backed GEM object into drm_gem_shmem_*.
>>>>>
>>>>> That makes sense although it would be very churny. I at least 
>>>>> would be on the fence regarding the cost vs benefit.
>>>>
>>>> Yeah, it should clearly not be part of this patch here.
>>>>
>>>>>
>>>>>> Also the explanation why a different mount point helps with 
>>>>>> something isn't very satisfying.
>>>>>
>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>> driver wants to use huge pages for performance? Or not satisying 
>>>>> as you question why huge pages would help?
>>>>
>>>> That huge pages are beneficial is clear to me, but I'm missing the 
>>>> connection why a different mount point helps with using huge pages.
>>>
>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>> passing huge=within_size or huge=always option. Default is 'never', 
>>> see man 5 tmpfs.
>>
>> Thanks for the explanation, I wasn't aware of that.
>>
>> Mhm, shouldn't we always use huge pages? Is there a reason for a DRM 
>> device to not use huge pages with the shmem backend?
>
> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), back 
> then the understanding was within_size may overallocate, meaning there 
> would be some space wastage, until the memory pressure makes the thp 
> code split the trailing huge page. I haven't checked if that still 
> applies.
>
> Other than that I don't know if some drivers/platforms could have 
> problems if they have some limitations or hardcoded assumptions when 
> they iterate the sg list.

Yeah, that was the whole point behind my question. As far as I can see 
this isn't driver specific, but platform specific.

I might be wrong here, but I think we should then probably not have that 
handling in each individual driver, but rather have it centralized in the 
DRM code.

Regards,
Christian.


>
> Te Cc is plenty large so perhaps someone else will have additional 
> information. :)
>
> Regards,
>
> Tvrtko
>
>>
>> I mean it would make this patch here even smaller.
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-12 13:48                 ` Christian König
@ 2024-03-18 12:42                   ` Maíra Canal
  2024-03-18 13:10                     ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Maíra Canal @ 2024-03-18 12:42 UTC (permalink / raw)
  To: Christian König, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

Hi Christian,

On 3/12/24 10:48, Christian König wrote:
> Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>>
>> On 12/03/2024 10:37, Christian König wrote:
>>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>>
>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>>
>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>>
>>>>>>>> Hi Maira,
>>>>>>>>
>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>> For some applications, such as using huge pages, we might want 
>>>>>>>>> to have a
>>>>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>>>>> better match
>>>>>>>>> our usecase.
>>>>>>>>>
>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that 
>>>>>>>>> allow us to
>>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>>> created. If
>>>>>>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>>>>>>
>>>>>>>> One strategy for reducing churn, and so the number of drivers 
>>>>>>>> this patch touches, could be to add a lower level 
>>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), and make 
>>>>>>>> drm_gem_object_init() call that one with a NULL argument.
>>>>>>>
>>>>>>> I would even go a step further into the other direction. The 
>>>>>>> shmem backed GEM object is just some special handling as far as I 
>>>>>>> can see.
>>>>>>>
>>>>>>> So I would rather suggest to rename all drm_gem_* function which 
>>>>>>> only deal with the shmem backed GEM object into drm_gem_shmem_*.
>>>>>>
>>>>>> That makes sense although it would be very churny. I at least 
>>>>>> would be on the fence regarding the cost vs benefit.
>>>>>
>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>
>>>>>>
>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>> something isn't very satisfying.
>>>>>>
>>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>>> driver wants to use huge pages for performance? Or not satisying 
>>>>>> as you question why huge pages would help?
>>>>>
>>>>> That huge pages are beneficial is clear to me, but I'm missing the 
>>>>> connection why a different mount point helps with using huge pages.
>>>>
>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>> passing huge=within_size or huge=always option. Default is 'never', 
>>>> see man 5 tmpfs.
>>>
>>> Thanks for the explanation, I wasn't aware of that.
>>>
>>> Mhm, shouldn't we always use huge pages? Is there a reason for a DRM 
>>> device to not use huge pages with the shmem backend?
>>
>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), back 
>> then the understanding was within_size may overallocate, meaning there 
>> would be some space wastage, until the memory pressure makes the thp 
>> code split the trailing huge page. I haven't checked if that still 
>> applies.
>>
>> Other than that I don't know if some drivers/platforms could have 
>> problems if they have some limitations or hardcoded assumptions when 
>> they iterate the sg list.
> 
> Yeah, that was the whole point behind my question. As far as I can see 
> this isn't driver specific, but platform specific.
> 
> I might be wrong here, but I think we should then probably not have that 
> handling in each individual driver, but rather centralized in the DRM code.

I don't see a point in enabling THP for all shmem drivers. A huge page
is only useful if the driver is going to use it. On V3D, for example,
I only need huge pages because I need the memory contiguously allocated
to implement Super Pages. Otherwise, if we don't have the Super Pages
support implemented in the driver, I would be creating memory pressure
without any performance gain.

Best Regards,
- Maíra

> 
> Regards,
> Christian.
> 
> 
>>
>> Te Cc is plenty large so perhaps someone else will have additional 
>> information. :)
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> I mean it would make this patch here even smaller.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-18 12:42                   ` Maíra Canal
@ 2024-03-18 13:10                     ` Christian König
  2024-03-18 13:28                       ` Maíra Canal
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2024-03-18 13:10 UTC (permalink / raw)
  To: Maíra Canal, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

Am 18.03.24 um 13:42 schrieb Maíra Canal:
> Hi Christian,
>
> On 3/12/24 10:48, Christian König wrote:
>> Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>>>
>>> On 12/03/2024 10:37, Christian König wrote:
>>>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>>>
>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>>>
>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>>>
>>>>>>>>> Hi Maira,
>>>>>>>>>
>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>> For some applications, such as using huge pages, we might 
>>>>>>>>>> want to have a
>>>>>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>>>>>> better match
>>>>>>>>>> our usecase.
>>>>>>>>>>
>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that 
>>>>>>>>>> allow us to
>>>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>>>> created. If
>>>>>>>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>>>>>>>
>>>>>>>>> One strategy for reducing churn, and so the number of drivers 
>>>>>>>>> this patch touches, could be to add a lower level 
>>>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), and 
>>>>>>>>> make drm_gem_object_init() call that one with a NULL argument.
>>>>>>>>
>>>>>>>> I would even go a step further into the other direction. The 
>>>>>>>> shmem backed GEM object is just some special handling as far as 
>>>>>>>> I can see.
>>>>>>>>
>>>>>>>> So I would rather suggest to rename all drm_gem_* function 
>>>>>>>> which only deal with the shmem backed GEM object into 
>>>>>>>> drm_gem_shmem_*.
>>>>>>>
>>>>>>> That makes sense although it would be very churny. I at least 
>>>>>>> would be on the fence regarding the cost vs benefit.
>>>>>>
>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>
>>>>>>>
>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>> something isn't very satisfying.
>>>>>>>
>>>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>>>> driver wants to use huge pages for performance? Or not satisying 
>>>>>>> as you question why huge pages would help?
>>>>>>
>>>>>> That huge pages are beneficial is clear to me, but I'm missing 
>>>>>> the connection why a different mount point helps with using huge 
>>>>>> pages.
>>>>>
>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>> passing huge=within_size or huge=always option. Default is 
>>>>> 'never', see man 5 tmpfs.
>>>>
>>>> Thanks for the explanation, I wasn't aware of that.
>>>>
>>>> Mhm, shouldn't we always use huge pages? Is there a reason for a 
>>>> DRM device to not use huge pages with the shmem backend?
>>>
>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>> back then the understanding was within_size may overallocate, 
>>> meaning there would be some space wastage, until the memory pressure 
>>> makes the thp code split the trailing huge page. I haven't checked 
>>> if that still applies.
>>>
>>> Other than that I don't know if some drivers/platforms could have 
>>> problems if they have some limitations or hardcoded assumptions when 
>>> they iterate the sg list.
>>
>> Yeah, that was the whole point behind my question. As far as I can 
>> see this isn't driver specific, but platform specific.
>>
>> I might be wrong here, but I think we should then probably not have 
>> that handling in each individual driver, but rather centralized in 
>> the DRM code.
>
> I don't see a point in enabling THP for all shmem drivers. A huge page
> is only useful if the driver is going to use it. On V3D, for example,
> I only need huge pages because I need the memory contiguously allocated
> to implement Super Pages. Otherwise, if we don't have the Super Pages
> support implemented in the driver, I would be creating memory pressure
> without any performance gain.

Well that's the point I'm disagreeing with. THP doesn't seem to create 
much extra memory pressure for this use case.

As far as I can see the background for the option is that files in tmpfs 
usually have a varying size, so it usually isn't beneficial to allocate a 
huge page just to find out that the shmem file is much smaller than 
what's needed.

But GEM objects have a fixed size. So we know up front whether we need 
4KiB or 1GiB and can therefore directly allocate huge pages if they are 
available and the object is large enough to be backed by them.

If the memory pressure is so high that we don't have huge pages 
available the shmem code falls back to standard pages anyway.

So THP is almost always beneficial for GEM even if the driver doesn't 
actually need it. The only potential case I can think of which might not 
be handled gracefully is the tail pages, e.g. huge + 4KiB.

But that is trivial to optimize in the shmem code when the final size of 
the file is known beforehand.

Regards,
Christian.

>
> Best Regards,
> - Maíra
>
>>
>> Regards,
>> Christian.
>>
>>
>>>
>>> Te Cc is plenty large so perhaps someone else will have additional 
>>> information. :)
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>
>>>> I mean it would make this patch here even smaller.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
>>>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-18 13:10                     ` Christian König
@ 2024-03-18 13:28                       ` Maíra Canal
  2024-03-18 14:01                         ` Maíra Canal
  2024-03-18 14:04                         ` Christian König
  0 siblings, 2 replies; 28+ messages in thread
From: Maíra Canal @ 2024-03-18 13:28 UTC (permalink / raw)
  To: Christian König, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

Hi Christian,

On 3/18/24 10:10, Christian König wrote:
> Am 18.03.24 um 13:42 schrieb Maíra Canal:
>> Hi Christian,
>>
>> On 3/12/24 10:48, Christian König wrote:
>>> Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>>>>
>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>>>>
>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>>>>
>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>>>>
>>>>>>>>>> Hi Maira,
>>>>>>>>>>
>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>> For some applications, such as using huge pages, we might 
>>>>>>>>>>> want to have a
>>>>>>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>>>>>>> better match
>>>>>>>>>>> our usecase.
>>>>>>>>>>>
>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that 
>>>>>>>>>>> allow us to
>>>>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>>>>> created. If
>>>>>>>>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>>>>>>>>
>>>>>>>>>> One strategy for reducing churn, and so the number of drivers 
>>>>>>>>>> this patch touches, could be to add a lower level 
>>>>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), and 
>>>>>>>>>> make drm_gem_object_init() call that one with a NULL argument.
>>>>>>>>>
>>>>>>>>> I would even go a step further into the other direction. The 
>>>>>>>>> shmem backed GEM object is just some special handling as far as 
>>>>>>>>> I can see.
>>>>>>>>>
>>>>>>>>> So I would rather suggest to rename all drm_gem_* function 
>>>>>>>>> which only deal with the shmem backed GEM object into 
>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>
>>>>>>>> That makes sense although it would be very churny. I at least 
>>>>>>>> would be on the fence regarding the cost vs benefit.
>>>>>>>
>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>
>>>>>>>>
>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>> something isn't very satisfying.
>>>>>>>>
>>>>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>>>>> driver wants to use huge pages for performance? Or not satisying 
>>>>>>>> as you question why huge pages would help?
>>>>>>>
>>>>>>> That huge pages are beneficial is clear to me, but I'm missing 
>>>>>>> the connection why a different mount point helps with using huge 
>>>>>>> pages.
>>>>>>
>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>> 'never', see man 5 tmpfs.
>>>>>
>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>
>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for a 
>>>>> DRM device to not use huge pages with the shmem backend?
>>>>
>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>> back then the understanding was within_size may overallocate, 
>>>> meaning there would be some space wastage, until the memory pressure 
>>>> makes the thp code split the trailing huge page. I haven't checked 
>>>> if that still applies.
>>>>
>>>> Other than that I don't know if some drivers/platforms could have 
>>>> problems if they have some limitations or hardcoded assumptions when 
>>>> they iterate the sg list.
>>>
>>> Yeah, that was the whole point behind my question. As far as I can 
>>> see this isn't driver specific, but platform specific.
>>>
>>> I might be wrong here, but I think we should then probably not have 
>>> that handling in each individual driver, but rather centralized in 
>>> the DRM code.
>>
>> I don't see a point in enabling THP for all shmem drivers. A huge page
>> is only useful if the driver is going to use it. On V3D, for example,
>> I only need huge pages because I need the memory contiguously allocated
>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>> support implemented in the driver, I would be creating memory pressure
>> without any performance gain.
> 
> Well that's the point I'm disagreeing with. THP doesn't seem to create 
> much extra memory pressure for this use case.
> 
> As far as I can see background for the option is that files in tmpfs 
> usually have a varying size, so it usually isn't beneficial to allocate 
> a huge page just to find that the shmem file is much smaller than what's 
> needed.
> 
> But GEM objects have a fixed size. So we of hand knew if we need 4KiB or 
> 1GiB and can therefore directly allocate huge pages if they are 
> available and object large enough to back them with.
> 
> If the memory pressure is so high that we don't have huge pages 
> available the shmem code falls back to standard pages anyway.

The question is: how do we define the point where the memory pressure is 
high? For example, notice that in this implementation of Super Pages
for the V3D driver, I only use a Super Page if the BO is bigger than 
2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
available for the GPU. If I created huge pages for every BO allocation 
(and initially, I tried that), I would end up with hangs in some 
applications.

At least, for V3D, I wouldn't like to see THP being used for all the 
allocations. But, we have maintainers of other drivers in the CC.

Best Regards,
- Maíra

> 
> So THP is almost always beneficial for GEM even if the driver doesn't 
> actually need it. The only potential case I can think of which might not 
> be handled gracefully is the tail pages, e.g. huge + 4kib.
> 
> But that is trivial to optimize in the shmem code when the final size of 
> the file is known beforehand.
> 
> Regards,
> Christian.
> 
>>
>> Best Regards,
>> - Maíra
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>>
>>>> Te Cc is plenty large so perhaps someone else will have additional 
>>>> information. :)
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>>>
>>>>> I mean it would make this patch here even smaller.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tvrtko
>>>>>
>>>
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-18 13:28                       ` Maíra Canal
@ 2024-03-18 14:01                         ` Maíra Canal
  2024-03-18 14:04                         ` Christian König
  1 sibling, 0 replies; 28+ messages in thread
From: Maíra Canal @ 2024-03-18 14:01 UTC (permalink / raw)
  To: Christian König, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

On 3/18/24 10:28, Maíra Canal wrote:
> Hi Christian,
> 
> On 3/18/24 10:10, Christian König wrote:
>> Am 18.03.24 um 13:42 schrieb Maíra Canal:
>>> Hi Christian,
>>>
>>> On 3/12/24 10:48, Christian König wrote:
>>>> Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>>>>>
>>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>>>>>
>>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>>>>>
>>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>>>>>
>>>>>>>>>>> Hi Maira,
>>>>>>>>>>>
>>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>>> For some applications, such as using huge pages, we might 
>>>>>>>>>>>> want to have a
>>>>>>>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>>>>>>>> better match
>>>>>>>>>>>> our usecase.
>>>>>>>>>>>>
>>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() that 
>>>>>>>>>>>> allow us to
>>>>>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>>>>>> created. If
>>>>>>>>>>>> this parameter is NULL, then we fallback to shmem_file_setup().
>>>>>>>>>>>
>>>>>>>>>>> One strategy for reducing churn, and so the number of drivers 
>>>>>>>>>>> this patch touches, could be to add a lower level 
>>>>>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), and 
>>>>>>>>>>> make drm_gem_object_init() call that one with a NULL argument.
>>>>>>>>>>
>>>>>>>>>> I would even go a step further into the other direction. The 
>>>>>>>>>> shmem backed GEM object is just some special handling as far 
>>>>>>>>>> as I can see.
>>>>>>>>>>
>>>>>>>>>> So I would rather suggest to rename all drm_gem_* function 
>>>>>>>>>> which only deal with the shmem backed GEM object into 
>>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>>
>>>>>>>>> That makes sense although it would be very churny. I at least 
>>>>>>>>> would be on the fence regarding the cost vs benefit.
>>>>>>>>
>>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>>> something isn't very satisfying.
>>>>>>>>>
>>>>>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>>>>>> driver wants to use huge pages for performance? Or not 
>>>>>>>>> satisying as you question why huge pages would help?
>>>>>>>>
>>>>>>>> That huge pages are beneficial is clear to me, but I'm missing 
>>>>>>>> the connection why a different mount point helps with using huge 
>>>>>>>> pages.
>>>>>>>
>>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>>> 'never', see man 5 tmpfs.
>>>>>>
>>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>>
>>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for a 
>>>>>> DRM device to not use huge pages with the shmem backend?
>>>>>
>>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>>> back then the understanding was within_size may overallocate, 
>>>>> meaning there would be some space wastage, until the memory 
>>>>> pressure makes the thp code split the trailing huge page. I haven't 
>>>>> checked if that still applies.
>>>>>
>>>>> Other than that I don't know if some drivers/platforms could have 
>>>>> problems if they have some limitations or hardcoded assumptions 
>>>>> when they iterate the sg list.
>>>>
>>>> Yeah, that was the whole point behind my question. As far as I can 
>>>> see this isn't driver specific, but platform specific.
>>>>
>>>> I might be wrong here, but I think we should then probably not have 
>>>> that handling in each individual driver, but rather centralized in 
>>>> the DRM code.
>>>
>>> I don't see a point in enabling THP for all shmem drivers. A huge page
>>> is only useful if the driver is going to use it. On V3D, for example,
>>> I only need huge pages because I need the memory contiguously allocated
>>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>>> support implemented in the driver, I would be creating memory pressure
>>> without any performance gain.
>>
>> Well that's the point I'm disagreeing with. THP doesn't seem to create 
>> much extra memory pressure for this use case.
>>
>> As far as I can see background for the option is that files in tmpfs 
>> usually have a varying size, so it usually isn't beneficial to 
>> allocate a huge page just to find that the shmem file is much smaller 
>> than what's needed.
>>
>> But GEM objects have a fixed size. So we of hand knew if we need 4KiB 
>> or 1GiB and can therefore directly allocate huge pages if they are 
>> available and object large enough to back them with.
>>
>> If the memory pressure is so high that we don't have huge pages 
>> available the shmem code falls back to standard pages anyway.
> 
> The matter is: how do we define the point where the memory pressure is 
> high? For example, notice that in this implementation of Super Pages
> for the V3D driver, I only use a Super Page if the BO is bigger than 
> 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
> available for the GPU. If I created huge pages for every BO allocation 
> (and initially, I tried that), I would end up with hangs in some 
> applications.
> 
> At least, for V3D, I wouldn't like to see THP being used for all the 
> allocations. But, we have maintainers of other drivers in the CC.

Okay, I'm thinking about a compromise. What if we create a gemfs 
mountpoint in the DRM core and, every time we init an object, we can 
choose whether to use huge pages or not? Therefore, 
drm_gem_shmem_create() would have a new parameter called huge_pages, 
which can be true or false.

This way each driver would have the opportunity to use its own
heuristics to create huge pages.
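
Something along these lines (rough sketch, purely to illustrate the 
proposed interface):

	struct drm_gem_shmem_object *
	drm_gem_shmem_create(struct drm_device *dev, size_t size,
			     bool huge_pages);

v3d would then keep its own heuristic at the call site:

	shmem_obj = drm_gem_shmem_create(dev, size,
					 size >= SZ_2M && v3d->super_pages);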

What do you think?

Best Regards,
- Maíra

> 
> Best Regards,
> - Maíra
> 
>>
>> So THP is almost always beneficial for GEM even if the driver doesn't 
>> actually need it. The only potential case I can think of which might 
>> not be handled gracefully is the tail pages, e.g. huge + 4kib.
>>
>> But that is trivial to optimize in the shmem code when the final size 
>> of the file is known beforehand.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Best Regards,
>>> - Maíra
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>>
>>>>> Te Cc is plenty large so perhaps someone else will have additional 
>>>>> information. :)
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
>>>>>
>>>>>>
>>>>>> I mean it would make this patch here even smaller.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Tvrtko
>>>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-18 13:28                       ` Maíra Canal
  2024-03-18 14:01                         ` Maíra Canal
@ 2024-03-18 14:04                         ` Christian König
  2024-03-18 14:24                           ` Maíra Canal
  1 sibling, 1 reply; 28+ messages in thread
From: Christian König @ 2024-03-18 14:04 UTC (permalink / raw)
  To: Maíra Canal, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti

Am 18.03.24 um 14:28 schrieb Maíra Canal:
> Hi Christian,
>
> On 3/18/24 10:10, Christian König wrote:
>> Am 18.03.24 um 13:42 schrieb Maíra Canal:
>>> Hi Christian,
>>>
>>> On 3/12/24 10:48, Christian König wrote:
>>>> Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>>>>>
>>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>>>>>
>>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>>>>>
>>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>>>>>
>>>>>>>>>>> Hi Maira,
>>>>>>>>>>>
>>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>>> For some applications, such as using huge pages, we might 
>>>>>>>>>>>> want to have a
>>>>>>>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>>>>>>>> better match
>>>>>>>>>>>> our usecase.
>>>>>>>>>>>>
>>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() 
>>>>>>>>>>>> that allow us to
>>>>>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>>>>>> created. If
>>>>>>>>>>>> this parameter is NULL, then we fallback to 
>>>>>>>>>>>> shmem_file_setup().
>>>>>>>>>>>
>>>>>>>>>>> One strategy for reducing churn, and so the number of 
>>>>>>>>>>> drivers this patch touches, could be to add a lower level 
>>>>>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), and 
>>>>>>>>>>> make drm_gem_object_init() call that one with a NULL argument.
>>>>>>>>>>
>>>>>>>>>> I would even go a step further into the other direction. The 
>>>>>>>>>> shmem backed GEM object is just some special handling as far 
>>>>>>>>>> as I can see.
>>>>>>>>>>
>>>>>>>>>> So I would rather suggest to rename all drm_gem_* function 
>>>>>>>>>> which only deal with the shmem backed GEM object into 
>>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>>
>>>>>>>>> That makes sense although it would be very churny. I at least 
>>>>>>>>> would be on the fence regarding the cost vs benefit.
>>>>>>>>
>>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>>> something isn't very satisfying.
>>>>>>>>>
>>>>>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>>>>>> driver wants to use huge pages for performance? Or not 
>>>>>>>>> satisying as you question why huge pages would help?
>>>>>>>>
>>>>>>>> That huge pages are beneficial is clear to me, but I'm missing 
>>>>>>>> the connection why a different mount point helps with using 
>>>>>>>> huge pages.
>>>>>>>
>>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>>> 'never', see man 5 tmpfs.
>>>>>>
>>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>>
>>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for a 
>>>>>> DRM device to not use huge pages with the shmem backend?
>>>>>
>>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>>> back then the understanding was within_size may overallocate, 
>>>>> meaning there would be some space wastage, until the memory 
>>>>> pressure makes the thp code split the trailing huge page. I 
>>>>> haven't checked if that still applies.
>>>>>
>>>>> Other than that I don't know if some drivers/platforms could have 
>>>>> problems if they have some limitations or hardcoded assumptions 
>>>>> when they iterate the sg list.
>>>>
>>>> Yeah, that was the whole point behind my question. As far as I can 
>>>> see this isn't driver specific, but platform specific.
>>>>
>>>> I might be wrong here, but I think we should then probably not have 
>>>> that handling in each individual driver, but rather centralized in 
>>>> the DRM code.
>>>
>>> I don't see a point in enabling THP for all shmem drivers. A huge page
>>> is only useful if the driver is going to use it. On V3D, for example,
>>> I only need huge pages because I need the memory contiguously allocated
>>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>>> support implemented in the driver, I would be creating memory pressure
>>> without any performance gain.
>>
>> Well that's the point I'm disagreeing with. THP doesn't seem to 
>> create much extra memory pressure for this use case.
>>
>> As far as I can see background for the option is that files in tmpfs 
>> usually have a varying size, so it usually isn't beneficial to 
>> allocate a huge page just to find that the shmem file is much smaller 
>> than what's needed.
>>
>> But GEM objects have a fixed size. So we of hand knew if we need 4KiB 
>> or 1GiB and can therefore directly allocate huge pages if they are 
>> available and object large enough to back them with.
>>
>> If the memory pressure is so high that we don't have huge pages 
>> available the shmem code falls back to standard pages anyway.
>
> The matter is: how do we define the point where the memory pressure is 
> high?

Well as driver developers/maintainers we simply don't do that. This is 
the job of the shmem code.

> For example, notice that in this implementation of Super Pages
> for the V3D driver, I only use a Super Page if the BO is bigger than 
> 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
> available for the GPU. If I created huge pages for every BO allocation 
> (and initially, I tried that), I would end up with hangs in some 
> applications.

Yeah, that is what I meant with the trivial optimisation to the shmem 
code. Essentially when you have 2MiB+4KiB as the BO size then the shmem 
code should use a 2MiB and a 4KiB page to back it, but what it currently 
does is use two 2MiB pages and then split up the second page when it 
finds that it isn't needed.

That is wasteful and leads to fragmentation, but as soon as we stop 
doing that we should be fine to enable it unconditionally for all drivers.

TTM does essentially the same thing for years.

Regards,
Christian.

>
>
> At least, for V3D, I wouldn't like to see THP being used for all the 
> allocations. But, we have maintainers of other drivers in the CC.
>
> Best Regards,
> - Maíra
>
>>
>> So THP is almost always beneficial for GEM even if the driver doesn't 
>> actually need it. The only potential case I can think of which might 
>> not be handled gracefully is the tail pages, e.g. huge + 4kib.
>>
>> But that is trivial to optimize in the shmem code when the final size 
>> of the file is known beforehand.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Best Regards,
>>> - Maíra
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>>
>>>>> Te Cc is plenty large so perhaps someone else will have additional 
>>>>> information. :)
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
>>>>>
>>>>>>
>>>>>> I mean it would make this patch here even smaller.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Tvrtko
>>>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-18 14:04                         ` Christian König
@ 2024-03-18 14:24                           ` Maíra Canal
  2024-03-18 15:05                             ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Maíra Canal @ 2024-03-18 14:24 UTC (permalink / raw)
  To: Christian König, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti,
	Hugh Dickins, linux-mm

Not that the CC list wasn't big enough, but I'm adding MM folks
in the CC list.

On 3/18/24 11:04, Christian König wrote:
> Am 18.03.24 um 14:28 schrieb Maíra Canal:
>> Hi Christian,
>>
>> On 3/18/24 10:10, Christian König wrote:
>>> Am 18.03.24 um 13:42 schrieb Maíra Canal:
>>>> Hi Christian,
>>>>
>>>> On 3/12/24 10:48, Christian König wrote:
>>>>> Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>>>>>>
>>>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>>>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>>>>>>
>>>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>>>>>>
>>>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Maira,
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>>>> For some applications, such as using huge pages, we might 
>>>>>>>>>>>>> want to have a
>>>>>>>>>>>>> different mountpoint, for which we pass in mount flags that 
>>>>>>>>>>>>> better match
>>>>>>>>>>>>> our usecase.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() 
>>>>>>>>>>>>> that allow us to
>>>>>>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>>>>>>> created. If
>>>>>>>>>>>>> this parameter is NULL, then we fallback to 
>>>>>>>>>>>>> shmem_file_setup().
>>>>>>>>>>>>
>>>>>>>>>>>> One strategy for reducing churn, and so the number of 
>>>>>>>>>>>> drivers this patch touches, could be to add a lower level 
>>>>>>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), and 
>>>>>>>>>>>> make drm_gem_object_init() call that one with a NULL argument.
>>>>>>>>>>>
>>>>>>>>>>> I would even go a step further into the other direction. The 
>>>>>>>>>>> shmem backed GEM object is just some special handling as far 
>>>>>>>>>>> as I can see.
>>>>>>>>>>>
>>>>>>>>>>> So I would rather suggest to rename all drm_gem_* function 
>>>>>>>>>>> which only deal with the shmem backed GEM object into 
>>>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>>>
>>>>>>>>>> That makes sense although it would be very churny. I at least 
>>>>>>>>>> would be on the fence regarding the cost vs benefit.
>>>>>>>>>
>>>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>>>> something isn't very satisfying.
>>>>>>>>>>
>>>>>>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>>>>>>> driver wants to use huge pages for performance? Or not 
>>>>>>>>>> satisfying as you question why huge pages would help?
>>>>>>>>>
>>>>>>>>> That huge pages are beneficial is clear to me, but I'm missing 
>>>>>>>>> the connection why a different mount point helps with using 
>>>>>>>>> huge pages.
>>>>>>>>
>>>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>>>> 'never', see man 5 tmpfs.
>>>>>>>
>>>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>>>
>>>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for a 
>>>>>>> DRM device to not use huge pages with the shmem backend?
>>>>>>
>>>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>>>> back then the understanding was within_size may overallocate, 
>>>>>> meaning there would be some space wastage, until the memory 
>>>>>> pressure makes the thp code split the trailing huge page. I 
>>>>>> haven't checked if that still applies.
>>>>>>
>>>>>> Other than that I don't know if some drivers/platforms could have 
>>>>>> problems if they have some limitations or hardcoded assumptions 
>>>>>> when they iterate the sg list.
>>>>>
>>>>> Yeah, that was the whole point behind my question. As far as I can 
>>>>> see this isn't driver specific, but platform specific.
>>>>>
>>>>> I might be wrong here, but I think we should then probably not have 
>>>>> that handling in each individual driver, but rather centralized in 
>>>>> the DRM code.
>>>>
>>>> I don't see a point in enabling THP for all shmem drivers. A huge page
>>>> is only useful if the driver is going to use it. On V3D, for example,
>>>> I only need huge pages because I need the memory contiguously allocated
>>>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>>>> support implemented in the driver, I would be creating memory pressure
>>>> without any performance gain.
>>>
>>> Well that's the point I'm disagreeing with. THP doesn't seem to 
>>> create much extra memory pressure for this use case.
>>>
>>> As far as I can see background for the option is that files in tmpfs 
>>> usually have a varying size, so it usually isn't beneficial to 
>>> allocate a huge page just to find that the shmem file is much smaller 
>>> than what's needed.
>>>
>>> But GEM objects have a fixed size. So we know up front whether we need 
>>> 4KiB or 1GiB and can therefore directly allocate huge pages if they are 
>>> available and the object is large enough to be backed by them.
>>>
>>> If the memory pressure is so high that we don't have huge pages 
>>> available the shmem code falls back to standard pages anyway.
>>
>> The matter is: how do we define the point where the memory pressure is 
>> high?
> 
> Well as driver developers/maintainers we simply don't do that. This is 
> the job of the shmem code.
> 
>> For example, notice that in this implementation of Super Pages
>> for the V3D driver, I only use a Super Page if the BO is bigger than 
>> 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
>> available for the GPU. If I created huge pages for every BO allocation 
>> (and initially, I tried that), I would end up with hangs in some 
>> applications.
> 
> Yeah, that is what I meant with the trivial optimisation to the shmem 
> code. Essentially when you have 2MiB+4KiB as BO size then the shmem code 
> should use a 2MiB and a 4KiB page to back them, but what it currently 
> does is to use two 2MiB pages and then split up the second page when it 
> finds that it isn't needed.
> 
> That is wasteful and leads to fragmentation, but as soon as we stop 
> doing that we should be fine to enable it unconditionally for all drivers.

I see your point, but I believe that it would be tangential to the goal of
this series. As you mentioned, we currently get a lot of memory
fragmentation when using THP, and until that is solved (at the tmpfs
level), I believe we could move on with the current implementation (with
the improvements proposed by Tvrtko).
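
Just to illustrate the kind of reduced-churn shape Tvrtko suggested, a
rough sketch (not the actual patch; the exact helper name and plumbing
here are only assumptions, but shmem_file_setup_with_mnt() is the
existing shmem API that takes a vfsmount) could be:

int drm_gem_object_init_mnt(struct drm_device *dev,
			    struct drm_gem_object *obj, size_t size,
			    struct vfsmount *gemfs)
{
	struct file *filp;

	drm_gem_private_object_init(dev, obj, size);

	/* A NULL mount keeps today's behaviour: plain shmem_file_setup(). */
	if (gemfs)
		filp = shmem_file_setup_with_mnt(gemfs, "drm mm object",
						 size, VM_NORESERVE);
	else
		filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
	if (IS_ERR(filp))
		return PTR_ERR(filp);

	obj->filp = filp;

	return 0;
}

int drm_gem_object_init(struct drm_device *dev, struct drm_gem_object *obj,
			size_t size)
{
	/* Existing callers keep this signature, so most drivers see no churn. */
	return drm_gem_object_init_mnt(dev, obj, size, NULL);
}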

Best Regards,
- Maíra

> 
> TTM does essentially the same thing for years.
> 
> Regards,
> Christian.
> 
>>
>>
>> At least, for V3D, I wouldn't like to see THP being used for all the 
>> allocations. But, we have maintainers of other drivers in the CC.
>>
>> Best Regards,
>> - Maíra
>>
>>>
>>> So THP is almost always beneficial for GEM even if the driver doesn't 
>>> actually need it. The only potential case I can think of which might 
>>> not be handled gracefully is the tail pages, e.g. huge + 4KiB.
>>>
>>> But that is trivial to optimize in the shmem code when the final size 
>>> of the file is known beforehand.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Best Regards,
>>>> - Maíra
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>
>>>>>>
>>>>>> The Cc is plenty large so perhaps someone else will have additional 
>>>>>> information. :)
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tvrtko
>>>>>>
>>>>>>>
>>>>>>> I mean it would make this patch here even smaller.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Tvrtko
>>>>>>>
>>>>>
>>>
> 



* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-18 14:24                           ` Maíra Canal
@ 2024-03-18 15:05                             ` Christian König
  2024-03-18 15:27                               ` Tvrtko Ursulin
  0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2024-03-18 15:05 UTC (permalink / raw)
  To: Maíra Canal, Tvrtko Ursulin, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti,
	Hugh Dickins, linux-mm

Am 18.03.24 um 15:24 schrieb Maíra Canal:
> Not that the CC list wasn't big enough, but I'm adding MM folks
> in the CC list.
>
> On 3/18/24 11:04, Christian König wrote:
>> Am 18.03.24 um 14:28 schrieb Maíra Canal:
>>> Hi Christian,
>>>
>>> On 3/18/24 10:10, Christian König wrote:
>>>> Am 18.03.24 um 13:42 schrieb Maíra Canal:
>>>>> Hi Christian,
>>>>>
>>>>> On 3/12/24 10:48, Christian König wrote:
>>>>>> Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>>>>>>>
>>>>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>>>>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>>>>>>>
>>>>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>>>>>>>
>>>>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maira,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>>>>> For some applications, such as using huge pages, we might 
>>>>>>>>>>>>>> want to have a
>>>>>>>>>>>>>> different mountpoint, for which we pass in mount flags 
>>>>>>>>>>>>>> that better match
>>>>>>>>>>>>>> our usecase.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() 
>>>>>>>>>>>>>> that allow us to
>>>>>>>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>>>>>>>> created. If
>>>>>>>>>>>>>> this parameter is NULL, then we fallback to 
>>>>>>>>>>>>>> shmem_file_setup().
>>>>>>>>>>>>>
>>>>>>>>>>>>> One strategy for reducing churn, and so the number of 
>>>>>>>>>>>>> drivers this patch touches, could be to add a lower level 
>>>>>>>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), 
>>>>>>>>>>>>> and make drm_gem_object_init() call that one with a NULL 
>>>>>>>>>>>>> argument.
>>>>>>>>>>>>
>>>>>>>>>>>> I would even go a step further into the other direction. 
>>>>>>>>>>>> The shmem backed GEM object is just some special handling 
>>>>>>>>>>>> as far as I can see.
>>>>>>>>>>>>
>>>>>>>>>>>> So I would rather suggest to rename all drm_gem_* function 
>>>>>>>>>>>> which only deal with the shmem backed GEM object into 
>>>>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>>>>
>>>>>>>>>>> That makes sense although it would be very churny. I at 
>>>>>>>>>>> least would be on the fence regarding the cost vs benefit.
>>>>>>>>>>
>>>>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>>>>> something isn't very satisfying.
>>>>>>>>>>>
>>>>>>>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>>>>>>>> driver wants to use huge pages for performance? Or not 
>>>>>>>>>>> satisfying as you question why huge pages would help?
>>>>>>>>>>
>>>>>>>>>> That huge pages are beneficial is clear to me, but I'm 
>>>>>>>>>> missing the connection why a different mount point helps with 
>>>>>>>>>> using huge pages.
>>>>>>>>>
>>>>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>>>>> 'never', see man 5 tmpfs.
>>>>>>>>
>>>>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>>>>
>>>>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for 
>>>>>>>> a DRM device to not use huge pages with the shmem backend?
>>>>>>>
>>>>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>>>>> back then the understanding was within_size may overallocate, 
>>>>>>> meaning there would be some space wastage, until the memory 
>>>>>>> pressure makes the thp code split the trailing huge page. I 
>>>>>>> haven't checked if that still applies.
>>>>>>>
>>>>>>> Other than that I don't know if some drivers/platforms could 
>>>>>>> have problems if they have some limitations or hardcoded 
>>>>>>> assumptions when they iterate the sg list.
>>>>>>
>>>>>> Yeah, that was the whole point behind my question. As far as I 
>>>>>> can see this isn't driver specific, but platform specific.
>>>>>>
>>>>>> I might be wrong here, but I think we should then probably not 
>>>>>> have that handling in each individual driver, but rather 
>>>>>> centralized in the DRM code.
>>>>>
>>>>> I don't see a point in enabling THP for all shmem drivers. A huge 
>>>>> page
>>>>> is only useful if the driver is going to use it. On V3D, for example,
>>>>> I only need huge pages because I need the memory contiguously 
>>>>> allocated
>>>>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>>>>> support implemented in the driver, I would be creating memory 
>>>>> pressure
>>>>> without any performance gain.
>>>>
>>>> Well that's the point I'm disagreeing with. THP doesn't seem to 
>>>> create much extra memory pressure for this use case.
>>>>
>>>> As far as I can see background for the option is that files in 
>>>> tmpfs usually have a varying size, so it usually isn't beneficial 
>>>> to allocate a huge page just to find that the shmem file is much 
>>>> smaller than what's needed.
>>>>
>>>> But GEM objects have a fixed size. So we know up front whether we need 
>>>> 4KiB or 1GiB and can therefore directly allocate huge pages if they are 
>>>> available and the object is large enough to be backed by them.
>>>>
>>>> If the memory pressure is so high that we don't have huge pages 
>>>> available the shmem code falls back to standard pages anyway.
>>>
>>> The matter is: how do we define the point where the memory pressure 
>>> is high?
>>
>> Well as driver developers/maintainers we simply don't do that. This 
>> is the job of the shmem code.
>>
>>> For example, notice that in this implementation of Super Pages
>>> for the V3D driver, I only use a Super Page if the BO is bigger than 
>>> 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
>>> available for the GPU. If I created huge pages for every BO 
>>> allocation (and initially, I tried that), I would end up with hangs 
>>> in some applications.
>>
>> Yeah, that is what I meant with the trivial optimisation to the shmem 
>> code. Essentially when you have 2MiB+4KiB as BO size then the shmem 
>> code should use a 2MiB and a 4KiB page to back them, but what it 
>> currently does is to use two 2MiB pages and then split up the second 
>> page when it finds that it isn't needed.
>>
>> That is wasteful and leads to fragmentation, but as soon as we stop 
>> doing that we should be fine to enable it unconditionally for all 
>> drivers.
>
> I see your point, but I believe that it would be tangent to the goal of
> this series. As you mentioned, currently, we have a lot of memory
> fragmentation when using THP and while we don't solve that (at the tmpfs
> level), I believe we could move on with the current implementation (with
> improvements proposed by Tvrtko).

Oh, I seriously don't want to block this patch set here. Just asking if 
it's the right approach.

The point is that we might need to revert the driver changes again once THP 
is further optimized and the options aren't needed any more.

But if and how that should happen is perfectly up to Tvrtko.

Regards,
Christian.

>
> Best Regards,
> - Maíra
>
>>
>> TTM does essentially the same thing for years.
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>> At least, for V3D, I wouldn't like to see THP being used for all the 
>>> allocations. But, we have maintainers of other drivers in the CC.
>>>
>>> Best Regards,
>>> - Maíra
>>>
>>>>
>>>> So THP is almost always beneficial for GEM even if the driver 
>>>> doesn't actually need it. The only potential case I can think of 
>>>> which might not be handled gracefully is the tail pages, e.g. huge 
>>>> + 4KiB.
>>>>
>>>> But that is trivial to optimize in the shmem code when the final 
>>>> size of the file is known beforehand.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Best Regards,
>>>>> - Maíra
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> The Cc is plenty large so perhaps someone else will have 
>>>>>>> additional information. :)
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Tvrtko
>>>>>>>
>>>>>>>>
>>>>>>>> I mean it would make this patch here even smaller.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Tvrtko
>>>>>>>>
>>>>>>
>>>>
>>




* Re: [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init()
  2024-03-18 15:05                             ` Christian König
@ 2024-03-18 15:27                               ` Tvrtko Ursulin
  0 siblings, 0 replies; 28+ messages in thread
From: Tvrtko Ursulin @ 2024-03-18 15:27 UTC (permalink / raw)
  To: Christian König, Maíra Canal, Melissa Wen, Iago Toral,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: dri-devel, kernel-dev, Russell King, Lucas Stach,
	Christian Gmeiner, Inki Dae, Seung-Woo Kim, Kyungmin Park,
	Krzysztof Kozlowski, Alim Akhtar, Patrik Jakobsson, Sui Jingfeng,
	Chun-Kuang Hu, Philipp Zabel, Matthias Brugger,
	AngeloGioacchino Del Regno, Rob Clark, Abhinav Kumar,
	Dmitry Baryshkov, Sean Paul, Marijn Suijten, Karol Herbst,
	Lyude Paul, Danilo Krummrich, Tomi Valkeinen, Gerd Hoffmann,
	Sandy Huang, Heiko Stübner, Andy Yan, Thierry Reding,
	Mikko Perttunen, Jonathan Hunter, Huang Rui,
	Oleksandr Andrushchenko, Karolina Stolarek, Andi Shyti,
	Hugh Dickins, linux-mm


On 18/03/2024 15:05, Christian König wrote:
> Am 18.03.24 um 15:24 schrieb Maíra Canal:
>> Not that the CC list wasn't big enough, but I'm adding MM folks
>> in the CC list.
>>
>> On 3/18/24 11:04, Christian König wrote:
>>> Am 18.03.24 um 14:28 schrieb Maíra Canal:
>>>> Hi Christian,
>>>>
>>>> On 3/18/24 10:10, Christian König wrote:
>>>>> Am 18.03.24 um 13:42 schrieb Maíra Canal:
>>>>>> Hi Christian,
>>>>>>
>>>>>> On 3/12/24 10:48, Christian König wrote:
>>>>>>> Am 12.03.24 um 14:09 schrieb Tvrtko Ursulin:
>>>>>>>>
>>>>>>>> On 12/03/2024 10:37, Christian König wrote:
>>>>>>>>> Am 12.03.24 um 11:31 schrieb Tvrtko Ursulin:
>>>>>>>>>>
>>>>>>>>>> On 12/03/2024 10:23, Christian König wrote:
>>>>>>>>>>> Am 12.03.24 um 10:30 schrieb Tvrtko Ursulin:
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/03/2024 08:59, Christian König wrote:
>>>>>>>>>>>>> Am 12.03.24 um 09:51 schrieb Tvrtko Ursulin:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Maira,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/03/2024 10:05, Maíra Canal wrote:
>>>>>>>>>>>>>>> For some applications, such as using huge pages, we might 
>>>>>>>>>>>>>>> want to have a
>>>>>>>>>>>>>>> different mountpoint, for which we pass in mount flags 
>>>>>>>>>>>>>>> that better match
>>>>>>>>>>>>>>> our usecase.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Therefore, add a new parameter to drm_gem_object_init() 
>>>>>>>>>>>>>>> that allow us to
>>>>>>>>>>>>>>> define the tmpfs mountpoint where the GEM object will be 
>>>>>>>>>>>>>>> created. If
>>>>>>>>>>>>>>> this parameter is NULL, then we fallback to 
>>>>>>>>>>>>>>> shmem_file_setup().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One strategy for reducing churn, and so the number of 
>>>>>>>>>>>>>> drivers this patch touches, could be to add a lower level 
>>>>>>>>>>>>>> drm_gem_object_init() (which takes vfsmount, call it 
>>>>>>>>>>>>>> __drm_gem_object_init(), or drm__gem_object_init_mnt(), 
>>>>>>>>>>>>>> and make drm_gem_object_init() call that one with a NULL 
>>>>>>>>>>>>>> argument.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would even go a step further into the other direction. 
>>>>>>>>>>>>> The shmem backed GEM object is just some special handling 
>>>>>>>>>>>>> as far as I can see.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I would rather suggest to rename all drm_gem_* function 
>>>>>>>>>>>>> which only deal with the shmem backed GEM object into 
>>>>>>>>>>>>> drm_gem_shmem_*.
>>>>>>>>>>>>
>>>>>>>>>>>> That makes sense although it would be very churny. I at 
>>>>>>>>>>>> least would be on the fence regarding the cost vs benefit.
>>>>>>>>>>>
>>>>>>>>>>> Yeah, it should clearly not be part of this patch here.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Also the explanation why a different mount point helps with 
>>>>>>>>>>>>> something isn't very satisfying.
>>>>>>>>>>>>
>>>>>>>>>>>> Not satisfying as you think it is not detailed enough to say 
>>>>>>>>>>>> driver wants to use huge pages for performance? Or not 
>>>>>>>>>>>> satisfying as you question why huge pages would help?
>>>>>>>>>>>
>>>>>>>>>>> That huge pages are beneficial is clear to me, but I'm 
>>>>>>>>>>> missing the connection why a different mount point helps with 
>>>>>>>>>>> using huge pages.
>>>>>>>>>>
>>>>>>>>>> Ah right, same as in i915, one needs to mount a tmpfs instance 
>>>>>>>>>> passing huge=within_size or huge=always option. Default is 
>>>>>>>>>> 'never', see man 5 tmpfs.
>>>>>>>>>
>>>>>>>>> Thanks for the explanation, I wasn't aware of that.
>>>>>>>>>
>>>>>>>>> Mhm, shouldn't we always use huge pages? Is there a reason for 
>>>>>>>>> a DRM device to not use huge pages with the shmem backend?
>>>>>>>>
>>>>>>>> AFAIU, according to b901bb89324a ("drm/i915/gemfs: enable THP"), 
>>>>>>>> back then the understanding was within_size may overallocate, 
>>>>>>>> meaning there would be some space wastage, until the memory 
>>>>>>>> pressure makes the thp code split the trailing huge page. I 
>>>>>>>> haven't checked if that still applies.
>>>>>>>>
>>>>>>>> Other than that I don't know if some drivers/platforms could 
>>>>>>>> have problems if they have some limitations or hardcoded 
>>>>>>>> assumptions when they iterate the sg list.
>>>>>>>
>>>>>>> Yeah, that was the whole point behind my question. As far as I 
>>>>>>> can see this isn't driver specific, but platform specific.
>>>>>>>
>>>>>>> I might be wrong here, but I think we should then probably not 
>>>>>>> have that handling in each individual driver, but rather 
>>>>>>> centralized in the DRM code.
>>>>>>
>>>>>> I don't see a point in enabling THP for all shmem drivers. A huge 
>>>>>> page
>>>>>> is only useful if the driver is going to use it. On V3D, for example,
>>>>>> I only need huge pages because I need the memory contiguously 
>>>>>> allocated
>>>>>> to implement Super Pages. Otherwise, if we don't have the Super Pages
>>>>>> support implemented in the driver, I would be creating memory 
>>>>>> pressure
>>>>>> without any performance gain.
>>>>>
>>>>> Well that's the point I'm disagreeing with. THP doesn't seem to 
>>>>> create much extra memory pressure for this use case.
>>>>>
>>>>> As far as I can see background for the option is that files in 
>>>>> tmpfs usually have a varying size, so it usually isn't beneficial 
>>>>> to allocate a huge page just to find that the shmem file is much 
>>>>> smaller than what's needed.
>>>>>
>>>>> But GEM objects have a fixed size. So we know up front whether we need 
>>>>> 4KiB or 1GiB and can therefore directly allocate huge pages if they are 
>>>>> available and the object is large enough to be backed by them.
>>>>>
>>>>> If the memory pressure is so high that we don't have huge pages 
>>>>> available the shmem code falls back to standard pages anyway.
>>>>
>>>> The matter is: how do we define the point where the memory pressure 
>>>> is high?
>>>
>>> Well as driver developers/maintainers we simply don't do that. This 
>>> is the job of the shmem code.
>>>
>>>> For example, notice that in this implementation of Super Pages
>>>> for the V3D driver, I only use a Super Page if the BO is bigger than 
>>>> 2MB. I'm doing that because the Raspberry Pi only has 4GB of RAM 
>>>> available for the GPU. If I created huge pages for every BO 
>>>> allocation (and initially, I tried that), I would end up with hangs 
>>>> in some applications.
>>>
>>> Yeah, that is what I meant with the trivial optimisation to the shmem 
>>> code. Essentially when you have 2MiB+4KiB as BO size then the shmem 
>>> code should use a 2MiB and a 4KiB page to back them, but what it 
>>> currently does is to use two 2MiB pages and then split up the second 
>>> page when it finds that it isn't needed.
>>>
>>> That is wasteful and leads to fragmentation, but as soon as we stop 
>>> doing that we should be fine to enable it unconditionally for all 
>>> drivers.
>>
>> I see your point, but I believe that it would be tangent to the goal of
>> this series. As you mentioned, currently, we have a lot of memory
>> fragmentation when using THP and while we don't solve that (at the tmpfs
>> level), I believe we could move on with the current implementation (with
>> improvements proposed by Tvrtko).
> 
> Oh, I seriously don't want to block this patch set here. Just asking if 
> it's the right approach.
> 
> Point is we might need to revert the driver changes again when THP is 
> further optimized and the options aren't needed any more.
> 
> But if and how that should happen is perfectly up to Tvrtko.

Seems I got some unintended voting powers here. :)

What I think would work best is if Maíra solves the v3d part first, and 
then she can propose/float the wider DRM/GEM change to always use THP, 
because that one will require some wider testing and acks.

My concern there will be the behaviour of within_size mode. I was reading 
465c403cb508 ("drm/i915: introduce simple gemfs") the other day, which made 
me think that even then THP would initially over-allocate. Or maybe the 
comment added in that commit wasn't fully accurate. If within_size does not 
have that kind of impact, then a GEM-wide change might be feasible, with or 
without some opt-out facility; I don't know yet.

In any case, as long as the v3d work can be done without too much churn to 
the common code, I think separating the wider change out as a follow-up 
should not cause a lot of extra work. It should all be hidden in the 
implementation details if all goes to plan.
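
On the v3d side the implementation detail could then be as small as
picking the mount at BO creation time. Something like the sketch below,
where the 2 MiB cut-off and the field names are only my assumptions based
on what Maíra described earlier in the thread:

static struct vfsmount *v3d_bo_mnt(struct v3d_dev *v3d, size_t size)
{
	/*
	 * Small BOs stay on the default shmem mount so we don't create
	 * memory pressure without any super page benefit; only BOs large
	 * enough to be backed by huge pages go to the private mount.
	 */
	if (size < SZ_2M)
		return NULL;

	return v3d->gemfs;
}

The BO creation path would then pass the returned mount (or NULL) into
whatever init variant ends up taking a vfsmount.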

Regards,

Tvrtko



end of thread

Thread overview: 28+ messages
2024-03-11 10:05 [PATCH 0/5] drm/v3d: Enable Super Pages Maíra Canal
2024-03-11 10:05 ` [PATCH 1/5] drm/v3d: Fix return if scheduler initialization fails Maíra Canal
2024-03-12  8:35   ` Iago Toral
2024-03-11 10:05 ` [PATCH 2/5] drm/gem: Add a mountpoint parameter to drm_gem_object_init() Maíra Canal
2024-03-12  8:51   ` Tvrtko Ursulin
2024-03-12  8:59     ` Christian König
2024-03-12  9:30       ` Tvrtko Ursulin
2024-03-12 10:23         ` Christian König
2024-03-12 10:31           ` Tvrtko Ursulin
2024-03-12 10:37             ` Christian König
2024-03-12 13:09               ` Tvrtko Ursulin
2024-03-12 13:48                 ` Christian König
2024-03-18 12:42                   ` Maíra Canal
2024-03-18 13:10                     ` Christian König
2024-03-18 13:28                       ` Maíra Canal
2024-03-18 14:01                         ` Maíra Canal
2024-03-18 14:04                         ` Christian König
2024-03-18 14:24                           ` Maíra Canal
2024-03-18 15:05                             ` Christian König
2024-03-18 15:27                               ` Tvrtko Ursulin
2024-03-11 10:06 ` [PATCH 3/5] drm/v3d: Introduce gemfs Maíra Canal
2024-03-12  8:35   ` Iago Toral
2024-03-12  8:55   ` Tvrtko Ursulin
2024-03-11 10:06 ` [PATCH 4/5] drm/gem: Create shmem GEM object in a given mountpoint Maíra Canal
2024-03-11 10:06 ` [PATCH 5/5] drm/v3d: Enable super pages Maíra Canal
2024-03-12  8:34   ` Iago Toral
2024-03-12 13:41   ` Tvrtko Ursulin
2024-03-12  8:37 ` [PATCH 0/5] drm/v3d: Enable Super Pages Iago Toral
