All of lore.kernel.org
* [PATCH v3 0/7] drm: fdinfo memory stats
@ 2023-04-11 22:56 ` Rob Clark
  0 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-11 22:56 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Tvrtko Ursulin,
	Christopher Healy, Emil Velikov, Rob Clark, Alex Deucher,
	open list:RADEON and AMDGPU DRM DRIVERS,
	Arunpravin Paneer Selvam, Christian Gmeiner,
	Christian König,
	moderated list:DRM DRIVERS FOR VIVANTE GPU IP, Evan Quan,
	Guchun Chen, Hawking Zhang, intel-gfx, open list:DOCUMENTATION,
	open list, Mario Limonciello, Michel Dänzer, Russell King,
	Sean Paul, Shashank Sharma, Tvrtko Ursulin, YiPeng Chai

From: Rob Clark <robdclark@chromium.org>

The motivation is similar to another recent attempt[1], but this series
tries to provide some shared core code for it, as well as documentation.

It is probably a bit UMA-centric; devices with VRAM might want some
placement stats as well.  But this seems like a reasonable start.

Basic gputop support: https://patchwork.freedesktop.org/series/116236/
nvtop support already exists: https://github.com/Syllo/nvtop/pull/204

[1] https://patchwork.freedesktop.org/series/112397/
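
To give an idea of what this looks like, fdinfo for an msm device with the
whole series applied should end up roughly as below; the values are made up,
only the keys and their ordering come from the patches:

  # cat /proc/<pid>/fdinfo/<fd>
  drm-driver:	msm
  drm-client-id:	5
  drm-engine-gpu:	123456789 ns
  drm-cycles-gpu:	12345678
  drm-maxfreq-gpu:	800000000 Hz
  drm-shared-memory:	16 MiB
  drm-private-memory:	100 MiB
  drm-active-memory:	32 MiB
  drm-resident-memory:	108 MiB
  drm-purgeable-memory:	8 MiB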

Rob Clark (7):
  drm: Add common fdinfo helper
  drm/msm: Switch to fdinfo helper
  drm/amdgpu: Switch to fdinfo helper
  drm/i915: Switch to fdinfo helper
  drm/etnaviv: Switch to fdinfo helper
  drm: Add fdinfo memory stats
  drm/msm: Add memory stats to fdinfo

 Documentation/gpu/drm-usage-stats.rst      |  21 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  16 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |   2 +-
 drivers/gpu/drm/drm_file.c                 | 115 +++++++++++++++++++++
 drivers/gpu/drm/etnaviv/etnaviv_drv.c      |  10 +-
 drivers/gpu/drm/i915/i915_driver.c         |   3 +-
 drivers/gpu/drm/i915/i915_drm_client.c     |  18 +---
 drivers/gpu/drm/i915/i915_drm_client.h     |   2 +-
 drivers/gpu/drm/msm/msm_drv.c              |  11 +-
 drivers/gpu/drm/msm/msm_gem.c              |  15 +++
 drivers/gpu/drm/msm/msm_gpu.c              |   2 -
 include/drm/drm_drv.h                      |   7 ++
 include/drm/drm_file.h                     |   5 +
 include/drm/drm_gem.h                      |  19 ++++
 15 files changed, 208 insertions(+), 41 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH v3 1/7] drm: Add common fdinfo helper
  2023-04-11 22:56 ` Rob Clark
@ 2023-04-11 22:56   ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-11 22:56 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Tvrtko Ursulin,
	Christopher Healy, Emil Velikov, Rob Clark, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter,
	open list

From: Rob Clark <robdclark@chromium.org>

Handle a bit of the boilerplate in a single place, and make it easier to
add some core-tracked stats.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/drm_file.c | 39 ++++++++++++++++++++++++++++++++++++++
 include/drm/drm_drv.h      |  7 +++++++
 include/drm/drm_file.h     |  4 ++++
 3 files changed, 50 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index a51ff8cee049..37dfaa6be560 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -148,6 +148,7 @@ bool drm_dev_needs_global_mutex(struct drm_device *dev)
  */
 struct drm_file *drm_file_alloc(struct drm_minor *minor)
 {
+	static atomic_t ident = ATOMIC_INIT(0);
 	struct drm_device *dev = minor->dev;
 	struct drm_file *file;
 	int ret;
@@ -156,6 +157,8 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
 	if (!file)
 		return ERR_PTR(-ENOMEM);
 
+	/* Get a unique identifier for fdinfo: */
+	file->client_id = atomic_inc_return(&ident) - 1;
 	file->pid = get_pid(task_pid(current));
 	file->minor = minor;
 
@@ -868,6 +871,42 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
 }
 EXPORT_SYMBOL(drm_send_event);
 
+/**
+ * drm_fop_show_fdinfo - helper for drm file fops
+ * @m: output stream
+ * @f: the device file instance
+ *
+ * Helper to implement fdinfo, for userspace to query usage stats, etc., of a
+ * process using the GPU.
+ */
+void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct drm_file *file = f->private_data;
+	struct drm_device *dev = file->minor->dev;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	/*
+	 * ******************************************************************
+	 * For text output format description please see drm-usage-stats.rst!
+	 * ******************************************************************
+	 */
+
+	drm_printf(&p, "drm-driver:\t%s\n", dev->driver->name);
+	drm_printf(&p, "drm-client-id:\t%u\n", file->client_id);
+
+	if (dev_is_pci(dev->dev)) {
+		struct pci_dev *pdev = to_pci_dev(dev->dev);
+
+		drm_printf(&p, "drm-pdev:\t%04x:%02x:%02x.%d\n",
+			   pci_domain_nr(pdev->bus), pdev->bus->number,
+			   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+	}
+
+	if (dev->driver->show_fdinfo)
+		dev->driver->show_fdinfo(&p, file);
+}
+EXPORT_SYMBOL(drm_fop_show_fdinfo);
+
 /**
  * mock_drm_getfile - Create a new struct file for the drm device
  * @minor: drm minor to wrap (e.g. #drm_device.primary)
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 5b86bb7603e7..a883c6d3bcdf 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -401,6 +401,13 @@ struct drm_driver {
 			       struct drm_device *dev, uint32_t handle,
 			       uint64_t *offset);
 
+	/**
+	 * @show_fdinfo:
+	 *
+	 * Print device-specific fdinfo.  See drm-usage-stats.rst.
+	 */
+	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
+
 	/** @major: driver major number */
 	int major;
 	/** @minor: driver minor number */
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 0d1f853092ab..dfa995b787e1 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -258,6 +258,9 @@ struct drm_file {
 	/** @pid: Process that opened this file. */
 	struct pid *pid;
 
+	/** @client_id: A unique id for fdinfo */
+	u32 client_id;
+
 	/** @magic: Authentication magic, see @authenticated. */
 	drm_magic_t magic;
 
@@ -437,6 +440,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e);
 void drm_send_event_timestamp_locked(struct drm_device *dev,
 				     struct drm_pending_event *e,
 				     ktime_t timestamp);
+void drm_fop_show_fdinfo(struct seq_file *m, struct file *f);
 
 struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
 
-- 
2.39.2
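
For reference, a driver conversion to the new helper is expected to look
roughly like the sketch below (the foo_* names and foo_busy_ns() are
placeholders, see the following patches for the real conversions):

static void foo_show_fdinfo(struct drm_printer *p, struct drm_file *file)
{
	/*
	 * Only driver-specific keys are printed here; drm-driver,
	 * drm-client-id and (for PCI devices) drm-pdev now come from the
	 * common helper.
	 */
	drm_printf(p, "drm-engine-gpu:\t%llu ns\n", foo_busy_ns(file));
}

static const struct file_operations foo_fops = {
	.owner = THIS_MODULE,
	DRM_GEM_FOPS,
	.show_fdinfo = drm_fop_show_fdinfo,	/* common fop entry point */
};

static const struct drm_driver foo_driver = {
	/* ... other ops ... */
	.show_fdinfo = foo_show_fdinfo,		/* called back by the helper */
	.fops = &foo_fops,
};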


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 2/7] drm/msm: Switch to fdinfo helper
  2023-04-11 22:56 ` Rob Clark
@ 2023-04-11 22:56   ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-11 22:56 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Tvrtko Ursulin,
	Christopher Healy, Emil Velikov, Rob Clark, Rob Clark,
	Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, open list

From: Rob Clark <robdclark@chromium.org>

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 11 +++++------
 drivers/gpu/drm/msm/msm_gpu.c |  2 --
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 5a10d28de9dd..e516a3544505 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -1043,23 +1043,21 @@ static const struct drm_ioctl_desc msm_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(MSM_SUBMITQUEUE_QUERY, msm_ioctl_submitqueue_query, DRM_RENDER_ALLOW),
 };
 
-static void msm_fop_show_fdinfo(struct seq_file *m, struct file *f)
+static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
-	struct drm_file *file = f->private_data;
 	struct drm_device *dev = file->minor->dev;
 	struct msm_drm_private *priv = dev->dev_private;
-	struct drm_printer p = drm_seq_file_printer(m);
 
 	if (!priv->gpu)
 		return;
 
-	msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, &p);
+	msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
 }
 
 static const struct file_operations fops = {
 	.owner = THIS_MODULE,
 	DRM_GEM_FOPS,
-	.show_fdinfo = msm_fop_show_fdinfo,
+	.show_fdinfo = drm_fop_show_fdinfo,
 };
 
 static const struct drm_driver msm_driver = {
@@ -1070,7 +1068,7 @@ static const struct drm_driver msm_driver = {
 				DRIVER_SYNCOBJ_TIMELINE |
 				DRIVER_SYNCOBJ,
 	.open               = msm_open,
-	.postclose           = msm_postclose,
+	.postclose          = msm_postclose,
 	.lastclose          = drm_fb_helper_lastclose,
 	.dumb_create        = msm_gem_dumb_create,
 	.dumb_map_offset    = msm_gem_dumb_map_offset,
@@ -1081,6 +1079,7 @@ static const struct drm_driver msm_driver = {
 #ifdef CONFIG_DEBUG_FS
 	.debugfs_init       = msm_debugfs_init,
 #endif
+	.show_fdinfo        = msm_show_fdinfo,
 	.ioctls             = msm_ioctls,
 	.num_ioctls         = ARRAY_SIZE(msm_ioctls),
 	.fops               = &fops,
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 26ebda40be4f..c403912d13ab 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -151,8 +151,6 @@ int msm_gpu_pm_suspend(struct msm_gpu *gpu)
 void msm_gpu_show_fdinfo(struct msm_gpu *gpu, struct msm_file_private *ctx,
 			 struct drm_printer *p)
 {
-	drm_printf(p, "drm-driver:\t%s\n", gpu->dev->driver->name);
-	drm_printf(p, "drm-client-id:\t%u\n", ctx->seqno);
 	drm_printf(p, "drm-engine-gpu:\t%llu ns\n", ctx->elapsed_ns);
 	drm_printf(p, "drm-cycles-gpu:\t%llu\n", ctx->cycles);
 	drm_printf(p, "drm-maxfreq-gpu:\t%u Hz\n", gpu->fast_rate);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 3/7] drm/amdgpu: Switch to fdinfo helper
  2023-04-11 22:56 ` Rob Clark
  (?)
@ 2023-04-11 22:56   ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-11 22:56 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Tvrtko Ursulin,
	Christopher Healy, Emil Velikov, Rob Clark, Alex Deucher,
	Christian König, Pan, Xinhui, David Airlie, Daniel Vetter,
	Hawking Zhang, Evan Quan, Mario Limonciello, Guchun Chen,
	YiPeng Chai, Michel Dänzer, Shashank Sharma, Tvrtko Ursulin,
	Arunpravin Paneer Selvam,
	open list:RADEON and AMDGPU DRM DRIVERS, open list

From: Rob Clark <robdclark@chromium.org>

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 16 ++++++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |  2 +-
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f5ffca24def4..3611cfd5f076 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2752,7 +2752,7 @@ static const struct file_operations amdgpu_driver_kms_fops = {
 	.compat_ioctl = amdgpu_kms_compat_ioctl,
 #endif
 #ifdef CONFIG_PROC_FS
-	.show_fdinfo = amdgpu_show_fdinfo
+	.show_fdinfo = drm_fop_show_fdinfo,
 #endif
 };
 
@@ -2807,6 +2807,7 @@ static const struct drm_driver amdgpu_kms_driver = {
 	.dumb_map_offset = amdgpu_mode_dumb_mmap,
 	.fops = &amdgpu_driver_kms_fops,
 	.release = &amdgpu_driver_release_kms,
+	.show_fdinfo = amdgpu_show_fdinfo,
 
 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 99a7855ab1bc..c2fdd5e448d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -53,9 +53,8 @@ static const char *amdgpu_ip_name[AMDGPU_HW_IP_NUM] = {
 	[AMDGPU_HW_IP_VCN_JPEG]	=	"jpeg",
 };
 
-void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
+void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
-	struct drm_file *file = f->private_data;
 	struct amdgpu_device *adev = drm_to_adev(file->minor->dev);
 	struct amdgpu_fpriv *fpriv = file->driver_priv;
 	struct amdgpu_vm *vm = &fpriv->vm;
@@ -86,18 +85,15 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
 	 * ******************************************************************
 	 */
 
-	seq_printf(m, "pasid:\t%u\n", fpriv->vm.pasid);
-	seq_printf(m, "drm-driver:\t%s\n", file->minor->dev->driver->name);
-	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n", domain, bus, dev, fn);
-	seq_printf(m, "drm-client-id:\t%Lu\n", vm->immediate.fence_context);
-	seq_printf(m, "drm-memory-vram:\t%llu KiB\n", vram_mem/1024UL);
-	seq_printf(m, "drm-memory-gtt: \t%llu KiB\n", gtt_mem/1024UL);
-	seq_printf(m, "drm-memory-cpu: \t%llu KiB\n", cpu_mem/1024UL);
+	drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid);
+	drm_printf(p, "drm-memory-vram:\t%llu KiB\n", vram_mem/1024UL);
+	drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", gtt_mem/1024UL);
+	drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", cpu_mem/1024UL);
 	for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
 		if (!usage[hw_ip])
 			continue;
 
-		seq_printf(m, "drm-engine-%s:\t%Ld ns\n", amdgpu_ip_name[hw_ip],
+		drm_printf(p, "drm-engine-%s:\t%Ld ns\n", amdgpu_ip_name[hw_ip],
 			   ktime_to_ns(usage[hw_ip]));
 	}
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h
index e86834bfea1d..0398f5a159ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h
@@ -37,6 +37,6 @@
 #include "amdgpu_ids.h"
 
 uint32_t amdgpu_get_ip_count(struct amdgpu_device *adev, int id);
-void amdgpu_show_fdinfo(struct seq_file *m, struct file *f);
+void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file);
 
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 4/7] drm/i915: Switch to fdinfo helper
  2023-04-11 22:56 ` Rob Clark
  (?)
@ 2023-04-11 22:56   ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-11 22:56 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Tvrtko Ursulin,
	Christopher Healy, Emil Velikov, Rob Clark, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, Daniel Vetter,
	intel-gfx, open list

From: Rob Clark <robdclark@chromium.org>

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/i915/i915_driver.c     |  3 ++-
 drivers/gpu/drm/i915/i915_drm_client.c | 18 +++++-------------
 drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index db7a86def7e2..37eacaa3064b 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
 	.compat_ioctl = i915_ioc32_compat_ioctl,
 	.llseek = noop_llseek,
 #ifdef CONFIG_PROC_FS
-	.show_fdinfo = i915_drm_client_fdinfo,
+	.show_fdinfo = drm_fop_show_fdinfo,
 #endif
 };
 
@@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
 	.open = i915_driver_open,
 	.lastclose = i915_driver_lastclose,
 	.postclose = i915_driver_postclose,
+	.show_fdinfo = i915_drm_client_fdinfo,
 
 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index b09d1d386574..4a77e5e47f79 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
 }
 
 static void
-show_client_class(struct seq_file *m,
+show_client_class(struct drm_printer *p,
 		  struct i915_drm_client *client,
 		  unsigned int class)
 {
@@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
 	rcu_read_unlock();
 
 	if (capacity)
-		seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
 			   uabi_class_names[class], total);
 
 	if (capacity > 1)
-		seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
+		drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
 			   uabi_class_names[class],
 			   capacity);
 }
 
-void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
+void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
-	struct drm_file *file = f->private_data;
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct drm_i915_private *i915 = file_priv->dev_priv;
 	struct i915_drm_client *client = file_priv->client;
-	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
 	unsigned int i;
 
 	/*
@@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
 	 * ******************************************************************
 	 */
 
-	seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
-	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
-		   pci_domain_nr(pdev->bus), pdev->bus->number,
-		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
-	seq_printf(m, "drm-client-id:\t%u\n", client->id);
-
 	/*
 	 * Temporarily skip showing client engine information with GuC submission till
 	 * fetching engine busyness is implemented in the GuC submission backend
@@ -155,6 +147,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
 		return;
 
 	for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
-		show_client_class(m, client, i);
+		show_client_class(p, client, i);
 }
 #endif
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index 69496af996d9..ef85fef45de5 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -60,7 +60,7 @@ static inline void i915_drm_client_put(struct i915_drm_client *client)
 struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);
 
 #ifdef CONFIG_PROC_FS
-void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
+void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file);
 #endif
 
 void i915_drm_clients_fini(struct i915_drm_clients *clients);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 5/7] drm/etnaviv: Switch to fdinfo helper
  2023-04-11 22:56 ` Rob Clark
@ 2023-04-11 22:56   ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-11 22:56 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Tvrtko Ursulin,
	Christopher Healy, Emil Velikov, Rob Clark, Lucas Stach,
	Russell King, Christian Gmeiner, David Airlie, Daniel Vetter,
	moderated list:DRM DRIVERS FOR VIVANTE GPU IP, open list

From: Rob Clark <robdclark@chromium.org>

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/etnaviv/etnaviv_drv.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index 44ca803237a5..170000d6af94 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -476,9 +476,8 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = {
 	ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW),
 };
 
-static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
+static void etnaviv_fop_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
-	struct drm_file *file = f->private_data;
 	struct drm_device *dev = file->minor->dev;
 	struct etnaviv_drm_private *priv = dev->dev_private;
 	struct etnaviv_file_private *ctx = file->driver_priv;
@@ -487,8 +486,6 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
 	 * For a description of the text output format used here, see
 	 * Documentation/gpu/drm-usage-stats.rst.
 	 */
-	seq_printf(m, "drm-driver:\t%s\n", dev->driver->name);
-	seq_printf(m, "drm-client-id:\t%u\n", ctx->id);
 
 	for (int i = 0; i < ETNA_MAX_PIPES; i++) {
 		struct etnaviv_gpu *gpu = priv->gpu[i];
@@ -507,7 +504,7 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
 			cur = snprintf(engine + cur, sizeof(engine) - cur,
 				       "%sNN", cur ? "/" : "");
 
-		seq_printf(m, "drm-engine-%s:\t%llu ns\n", engine,
+		drm_printf(p, "drm-engine-%s:\t%llu ns\n", engine,
 			   ctx->sched_entity[i].elapsed_ns);
 	}
 }
@@ -515,7 +512,7 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
 static const struct file_operations fops = {
 	.owner = THIS_MODULE,
 	DRM_GEM_FOPS,
-	.show_fdinfo = etnaviv_fop_show_fdinfo,
+	.show_fdinfo = drm_fop_show_fdinfo,
 };
 
 static const struct drm_driver etnaviv_drm_driver = {
@@ -529,6 +526,7 @@ static const struct drm_driver etnaviv_drm_driver = {
 #ifdef CONFIG_DEBUG_FS
 	.debugfs_init       = etnaviv_debugfs_init,
 #endif
+	.show_fdinfo        = etnaviv_fop_show_fdinfo,
 	.ioctls             = etnaviv_ioctls,
 	.num_ioctls         = DRM_ETNAVIV_NUM_IOCTLS,
 	.fops               = &fops,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-11 22:56 ` Rob Clark
@ 2023-04-11 22:56   ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-11 22:56 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Tvrtko Ursulin, Thomas Zimmermann, Jonathan Corbet,
	linux-arm-msm, open list:DOCUMENTATION, Emil Velikov,
	Christopher Healy, open list, Boris Brezillon, freedreno

From: Rob Clark <robdclark@chromium.org>

Add support to dump GEM stats to fdinfo.

v2: Fix typos, change size units to match docs, use div_u64
v3: Do it in core

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
---
 Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
 drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
 include/drm/drm_file.h                |  1 +
 include/drm/drm_gem.h                 | 19 +++++++
 4 files changed, 117 insertions(+)
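
The resident/purgeable sizes depend on the driver implementing the new
status hook in drm_gem_object_funcs (the msm patch later in the series does
this).  A minimal sketch of such a hook, using hypothetical foo_* driver
state rather than code from any driver in this series, would be along these
lines:

static enum drm_gem_object_status foo_gem_status(struct drm_gem_object *obj)
{
	struct foo_gem_object *bo = to_foo_bo(obj);	/* hypothetical wrapper */
	enum drm_gem_object_status status = 0;

	if (bo->pages)					/* backed by system pages */
		status |= DRM_GEM_OBJECT_RESIDENT;

	if (bo->madv == FOO_MADV_DONTNEED)		/* marked purgeable by userspace */
		status |= DRM_GEM_OBJECT_PURGEABLE;

	return status;
}

static const struct drm_gem_object_funcs foo_gem_funcs = {
	/* ... free, mmap, vmap, ... */
	.status = foo_gem_status,
};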

diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
index b46327356e80..b5e7802532ed 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
 Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
 indicating kibi- or mebi-bytes.
 
+- drm-shared-memory: <uint> [KiB|MiB]
+
+The total size of buffers that are shared with another file (i.e. have more
+than a single handle).
+
+- drm-private-memory: <uint> [KiB|MiB]
+
+The total size of buffers that are not shared with another file.
+
+- drm-resident-memory: <uint> [KiB|MiB]
+
+The total size of buffers that are resident in system memory.
+
+- drm-purgeable-memory: <uint> [KiB|MiB]
+
+The total size of buffers that are purgeable.
+
+- drm-active-memory: <uint> [KiB|MiB]
+
+The total size of buffers that are active on one or more rings.
+
 - drm-cycles-<str> <uint>
 
 Engine identifier string must be the same as the one specified in the
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 37dfaa6be560..46fdd843bb3a 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -42,6 +42,7 @@
 #include <drm/drm_client.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_file.h>
+#include <drm/drm_gem.h>
 #include <drm/drm_print.h>
 
 #include "drm_crtc_internal.h"
@@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
 }
 EXPORT_SYMBOL(drm_send_event);
 
+static void print_size(struct drm_printer *p, const char *stat, size_t sz)
+{
+	const char *units[] = {"", " KiB", " MiB"};
+	unsigned u;
+
+	for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
+		if (sz < SZ_1K)
+			break;
+		sz = div_u64(sz, SZ_1K);
+	}
+
+	drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
+}
+
+static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
+{
+	struct drm_gem_object *obj;
+	struct {
+		size_t shared;
+		size_t private;
+		size_t resident;
+		size_t purgeable;
+		size_t active;
+	} size = {0};
+	bool has_status = false;
+	int id;
+
+	spin_lock(&file->table_lock);
+	idr_for_each_entry (&file->object_idr, obj, id) {
+		enum drm_gem_object_status s = 0;
+
+		if (obj->funcs && obj->funcs->status) {
+			s = obj->funcs->status(obj);
+			has_status = true;
+		}
+
+		if (obj->handle_count > 1) {
+			size.shared += obj->size;
+		} else {
+			size.private += obj->size;
+		}
+
+		if (s & DRM_GEM_OBJECT_RESIDENT) {
+			size.resident += obj->size;
+		} else {
+			/* If already purged or not yet backed by pages, don't
+			 * count it as purgeable:
+			 */
+			s &= ~DRM_GEM_OBJECT_PURGEABLE;
+		}
+
+		if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
+			size.active += obj->size;
+
+			/* If still active, don't count as purgeable: */
+			s &= ~DRM_GEM_OBJECT_PURGEABLE;
+		}
+
+		if (s & DRM_GEM_OBJECT_PURGEABLE)
+			size.purgeable += obj->size;
+	}
+	spin_unlock(&file->table_lock);
+
+	print_size(p, "drm-shared-memory", size.shared);
+	print_size(p, "drm-private-memory", size.private);
+	print_size(p, "drm-active-memory", size.active);
+
+	if (has_status) {
+		print_size(p, "drm-resident-memory", size.resident);
+		print_size(p, "drm-purgeable-memory", size.purgeable);
+	}
+}
+
 /**
  * drm_fop_show_fdinfo - helper for drm file fops
  * @seq_file: output stream
@@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
 
 	if (dev->driver->show_fdinfo)
 		dev->driver->show_fdinfo(&p, file);
+
+	print_memory_stats(&p, file);
 }
 EXPORT_SYMBOL(drm_fop_show_fdinfo);
 
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index dfa995b787e1..e5b40084538f 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -41,6 +41,7 @@
 struct dma_fence;
 struct drm_file;
 struct drm_device;
+struct drm_printer;
 struct device;
 struct file;
 
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 189fd618ca65..213917bb6b11 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -42,6 +42,14 @@
 struct iosys_map;
 struct drm_gem_object;
 
+/**
+ * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
+ */
+enum drm_gem_object_status {
+	DRM_GEM_OBJECT_RESIDENT  = BIT(0),
+	DRM_GEM_OBJECT_PURGEABLE = BIT(1),
+};
+
 /**
  * struct drm_gem_object_funcs - GEM object functions
  */
@@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
 	 */
 	int (*evict)(struct drm_gem_object *obj);
 
+	/**
+	 * @status:
+	 *
+	 * The optional status callback can return additional object state
+	 * which determines which stats the object is counted against.  The
+	 * callback is called under table_lock.  Racing against object status
+	 * change is "harmless", and the callback can expect to not race
+	 * against object destruction.
+	 */
+	enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
+
 	/**
 	 * @vm_ops:
 	 *
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 94+ messages in thread
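
For illustration, the keys added above are consumed by scanning /proc/<pid>/fdinfo/<fd>.
A minimal userspace sketch of such a parser (not part of this series; the helper name and
error handling are illustrative only):

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Return the value of one drm-*-memory key, scaled back to bytes. */
static uint64_t read_drm_memory_stat(const char *fdinfo_path, const char *key)
{
	FILE *f = fopen(fdinfo_path, "r");
	char line[256];
	uint64_t val = 0;

	if (!f)
		return 0;

	while (fgets(line, sizeof(line), f)) {
		char unit[8] = "";

		if (strncmp(line, key, strlen(key)) != 0)
			continue;

		/* Lines look like "drm-resident-memory:\t108 MiB" */
		sscanf(line + strlen(key), ":\t%" SCNu64 " %7s", &val, unit);
		if (!strcmp(unit, "KiB"))
			val *= 1024;
		else if (!strcmp(unit, "MiB"))
			val *= 1024 * 1024;
		break;
	}

	fclose(f);
	return val;
}

E.g. read_drm_memory_stat("/proc/1234/fdinfo/5", "drm-resident-memory") would return that
client's resident footprint in bytes. The gputop/nvtop support linked in the cover letter
parses these same lines, just with more robust handling.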

* [PATCH v3 7/7] drm/msm: Add memory stats to fdinfo
  2023-04-11 22:56 ` Rob Clark
@ 2023-04-11 22:56   ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-11 22:56 UTC (permalink / raw)
  To: dri-devel
  Cc: Rob Clark, Tvrtko Ursulin, linux-arm-msm, Emil Velikov,
	Christopher Healy, Abhinav Kumar, Sean Paul, Boris Brezillon,
	Dmitry Baryshkov, freedreno, open list

From: Rob Clark <robdclark@chromium.org>

Use the new helper to export stats about memory usage.

v2: Drop unintended hunk
v3: Rebase

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
---
 drivers/gpu/drm/msm/msm_gem.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index db6c4e281d75..c32264234ea1 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -1096,6 +1096,20 @@ int msm_gem_new_handle(struct drm_device *dev, struct drm_file *file,
 	return ret;
 }
 
+static enum drm_gem_object_status msm_gem_status(struct drm_gem_object *obj)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	enum drm_gem_object_status status = 0;
+
+	if (msm_obj->pages)
+		status |= DRM_GEM_OBJECT_RESIDENT;
+
+	if (msm_obj->madv == MSM_MADV_DONTNEED)
+		status |= DRM_GEM_OBJECT_PURGEABLE;
+
+	return status;
+}
+
 static const struct vm_operations_struct vm_ops = {
 	.fault = msm_gem_fault,
 	.open = drm_gem_vm_open,
@@ -1110,6 +1124,7 @@ static const struct drm_gem_object_funcs msm_gem_object_funcs = {
 	.vmap = msm_gem_prime_vmap,
 	.vunmap = msm_gem_prime_vunmap,
 	.mmap = msm_gem_object_mmap,
+	.status = msm_gem_status,
 	.vm_ops = &vm_ops,
 };
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 94+ messages in thread
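
For comparison with the msm implementation above, a driver built on the GEM shmem helpers
could provide the same optional callback. A hypothetical sketch (not part of this series;
no such shmem helper is added here, the field checks simply mirror what msm_gem_status()
does):

#include <drm/drm_gem_shmem_helper.h>

/* Hypothetical: residency/purgeability for a shmem-backed BO. */
static enum drm_gem_object_status example_shmem_status(struct drm_gem_object *obj)
{
	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
	enum drm_gem_object_status s = 0;

	if (shmem->pages)
		s |= DRM_GEM_OBJECT_RESIDENT;

	if (shmem->madv > 0)	/* userspace marked the BO purgeable */
		s |= DRM_GEM_OBJECT_PURGEABLE;

	return s;
}

which would then be hooked up via .status = example_shmem_status in the driver's
drm_gem_object_funcs, exactly as the msm patch does.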

* Re: [PATCH v3 1/7] drm: Add common fdinfo helper
  2023-04-11 22:56   ` Rob Clark
@ 2023-04-12  7:55     ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12  7:55 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Tvrtko Ursulin, Christopher Healy, Emil Velikov, Rob Clark,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, open list

On Tue, Apr 11, 2023 at 03:56:06PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Handle a bit of the boiler-plate in a single case, and make it easier to
> add some core tracked stats.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>

Thanks a lot for kicking this off. A few polish comments below, with those
addressed:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/drm_file.c | 39 ++++++++++++++++++++++++++++++++++++++
>  include/drm/drm_drv.h      |  7 +++++++
>  include/drm/drm_file.h     |  4 ++++
>  3 files changed, 50 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index a51ff8cee049..37dfaa6be560 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -148,6 +148,7 @@ bool drm_dev_needs_global_mutex(struct drm_device *dev)
>   */
>  struct drm_file *drm_file_alloc(struct drm_minor *minor)
>  {
> +	static atomic_t ident = ATOMIC_INIT(0);

Maybe make this atomic64_t just to be sure?

>  	struct drm_device *dev = minor->dev;
>  	struct drm_file *file;
>  	int ret;
> @@ -156,6 +157,8 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>  	if (!file)
>  		return ERR_PTR(-ENOMEM);
>  
> +	/* Get a unique identifier for fdinfo: */
> +	file->client_id = atomic_inc_return(&ident) - 1;
>  	file->pid = get_pid(task_pid(current));
>  	file->minor = minor;
>  
> @@ -868,6 +871,42 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>  }
>  EXPORT_SYMBOL(drm_send_event);
>  
> +/**
> + * drm_fop_show_fdinfo - helper for drm file fops
> + * @seq_file: output stream
> + * @f: the device file instance
> + *
> + * Helper to implement fdinfo, for userspace to query usage stats, etc, of a
> + * process using the GPU.

Please mention drm_driver.show_fdinfo here too.

> + */
> +void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> +{
> +	struct drm_file *file = f->private_data;
> +	struct drm_device *dev = file->minor->dev;
> +	struct drm_printer p = drm_seq_file_printer(m);
> +
> +	/*
> +	 * ******************************************************************
> +	 * For text output format description please see drm-usage-stats.rst!
> +	 * ******************************************************************

Maybe move this into the kerneldoc comment above (perhaps with an
IMPORTANT: tag or something, and make it an actual link)?

Also in drm-usage-stats.rst please put a link to this function and note
that it must be used for implementing fdinfo.

> +	 */
> +
> +	drm_printf(&p, "drm-driver:\t%s\n", dev->driver->name);
> +	drm_printf(&p, "drm-client-id:\t%u\n", file->client_id);
> +
> +	if (dev_is_pci(dev->dev)) {
> +		struct pci_dev *pdev = to_pci_dev(dev->dev);
> +
> +		drm_printf(&p, "drm-pdev:\t%04x:%02x:%02x.%d\n",
> +			   pci_domain_nr(pdev->bus), pdev->bus->number,
> +			   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> +	}
> +
> +	if (dev->driver->show_fdinfo)
> +		dev->driver->show_fdinfo(&p, file);
> +}
> +EXPORT_SYMBOL(drm_fop_show_fdinfo);

Bit of a bikeshed, but for consistency drop the _fop_? We don't have it for
any of the other drm fops and git grep doesn't show a naming conflict.

> +
>  /**
>   * mock_drm_getfile - Create a new struct file for the drm device
>   * @minor: drm minor to wrap (e.g. #drm_device.primary)
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 5b86bb7603e7..a883c6d3bcdf 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -401,6 +401,13 @@ struct drm_driver {
>  			       struct drm_device *dev, uint32_t handle,
>  			       uint64_t *offset);
>  
> +	/**
> +	 * @fdinfo:
> +	 *
> +	 * Print device specific fdinfo.  See drm-usage-stats.rst.

Please make this a link. I like links in kerneldoc :-)

> +	 */
> +	void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
> +
>  	/** @major: driver major number */
>  	int major;
>  	/** @minor: driver minor number */
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 0d1f853092ab..dfa995b787e1 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -258,6 +258,9 @@ struct drm_file {
>  	/** @pid: Process that opened this file. */
>  	struct pid *pid;
>  
> +	/** @client_id: A unique id for fdinfo */
> +	u32 client_id;
> +
>  	/** @magic: Authentication magic, see @authenticated. */
>  	drm_magic_t magic;
>  
> @@ -437,6 +440,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e);
>  void drm_send_event_timestamp_locked(struct drm_device *dev,
>  				     struct drm_pending_event *e,
>  				     ktime_t timestamp);
> +void drm_fop_show_fdinfo(struct seq_file *m, struct file *f);
>  
>  struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
>  
> -- 
> 2.39.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread
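
For reference, the two concrete code suggestions above (a 64-bit client id, and dropping
the _fop_ infix) might look roughly like the following in a respin; this is only a sketch,
and the helper name drm_next_client_id() is made up here:

#include <linux/atomic.h>

/* Sketch: 64-bit client id for fdinfo, effectively never wrapping. */
static u64 drm_next_client_id(void)
{
	static atomic64_t ident = ATOMIC64_INIT(0);

	return atomic64_inc_return(&ident);
}

with drm_file_alloc() doing file->client_id = drm_next_client_id(), client_id widened to
u64 in struct drm_file, and the helper exported as drm_show_fdinfo() instead of
drm_fop_show_fdinfo().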

* Re: [PATCH v3 3/7] drm/amdgpu: Switch to fdinfo helper
  2023-04-11 22:56   ` Rob Clark
  (?)
@ 2023-04-12  7:58     ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12  7:58 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Tvrtko Ursulin, Christopher Healy, Emil Velikov, Rob Clark,
	Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, Hawking Zhang, Evan Quan, Mario Limonciello,
	Guchun Chen, YiPeng Chai, Michel Dänzer, Shashank Sharma,
	Tvrtko Ursulin, Arunpravin Paneer Selvam,
	open list:RADEON and AMDGPU DRM DRIVERS, open list

On Tue, Apr 11, 2023 at 03:56:08PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  3 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 16 ++++++----------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |  2 +-
>  3 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index f5ffca24def4..3611cfd5f076 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2752,7 +2752,7 @@ static const struct file_operations amdgpu_driver_kms_fops = {
>  	.compat_ioctl = amdgpu_kms_compat_ioctl,
>  #endif
>  #ifdef CONFIG_PROC_FS
> -	.show_fdinfo = amdgpu_show_fdinfo
> +	.show_fdinfo = drm_fop_show_fdinfo,
>  #endif
>  };
>  
> @@ -2807,6 +2807,7 @@ static const struct drm_driver amdgpu_kms_driver = {
>  	.dumb_map_offset = amdgpu_mode_dumb_mmap,
>  	.fops = &amdgpu_driver_kms_fops,
>  	.release = &amdgpu_driver_release_kms,
> +	.show_fdinfo = amdgpu_show_fdinfo,
>  
>  	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>  	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> index 99a7855ab1bc..c2fdd5e448d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
> @@ -53,9 +53,8 @@ static const char *amdgpu_ip_name[AMDGPU_HW_IP_NUM] = {
>  	[AMDGPU_HW_IP_VCN_JPEG]	=	"jpeg",
>  };
>  
> -void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
> +void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>  {
> -	struct drm_file *file = f->private_data;
>  	struct amdgpu_device *adev = drm_to_adev(file->minor->dev);
>  	struct amdgpu_fpriv *fpriv = file->driver_priv;
>  	struct amdgpu_vm *vm = &fpriv->vm;
> @@ -86,18 +85,15 @@ void amdgpu_show_fdinfo(struct seq_file *m, struct file *f)
>  	 * ******************************************************************
>  	 */
>  
> -	seq_printf(m, "pasid:\t%u\n", fpriv->vm.pasid);
> -	seq_printf(m, "drm-driver:\t%s\n", file->minor->dev->driver->name);
> -	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n", domain, bus, dev, fn);
> -	seq_printf(m, "drm-client-id:\t%Lu\n", vm->immediate.fence_context);
> -	seq_printf(m, "drm-memory-vram:\t%llu KiB\n", vram_mem/1024UL);
> -	seq_printf(m, "drm-memory-gtt: \t%llu KiB\n", gtt_mem/1024UL);
> -	seq_printf(m, "drm-memory-cpu: \t%llu KiB\n", cpu_mem/1024UL);
> +	drm_printf(p, "pasid:\t%u\n", fpriv->vm.pasid);
> +	drm_printf(p, "drm-memory-vram:\t%llu KiB\n", vram_mem/1024UL);
> +	drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", gtt_mem/1024UL);
> +	drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", cpu_mem/1024UL);

Random aside, but we're not super consistent here; some of these have an
additional ' ' space.

I guess a next step would be a drm_fdinfo_printf(drm_printer *p, const
char *name, const char *printf, ...) and maybe some specialized ones that
do the right thing for specific parameters, like drm_fdinfo_llu().

But that's for next one I guess :-)
-Daniel


>  	for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
>  		if (!usage[hw_ip])
>  			continue;
>  
> -		seq_printf(m, "drm-engine-%s:\t%Ld ns\n", amdgpu_ip_name[hw_ip],
> +		drm_printf(p, "drm-engine-%s:\t%Ld ns\n", amdgpu_ip_name[hw_ip],
>  			   ktime_to_ns(usage[hw_ip]));
>  	}
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h
> index e86834bfea1d..0398f5a159ef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h
> @@ -37,6 +37,6 @@
>  #include "amdgpu_ids.h"
>  
>  uint32_t amdgpu_get_ip_count(struct amdgpu_device *adev, int id);
> -void amdgpu_show_fdinfo(struct seq_file *m, struct file *f);
> +void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file);
>  
>  #endif
> -- 
> 2.39.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread
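
To make the helper idea above concrete, such wrappers might look something like this
sketch (hypothetical names and signatures, not an existing API):

#include <drm/drm_print.h>

/* Hypothetical: print one "key:\tvalue" fdinfo line with consistent formatting. */
static void drm_fdinfo_printf(struct drm_printer *p, const char *key,
			      const char *fmt, ...)
{
	va_list args;

	drm_printf(p, "%s:\t", key);
	va_start(args, fmt);
	drm_vprintf(p, fmt, &args);
	va_end(args);
}

/* Hypothetical convenience wrapper for plain 64-bit counters. */
static void drm_fdinfo_llu(struct drm_printer *p, const char *key,
			   unsigned long long val)
{
	drm_fdinfo_printf(p, key, "%llu\n", val);
}

Centralizing the "<key>:\t<value>" formatting in one place would also take care of the
stray-space inconsistency noted above.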

* Re: [PATCH v3 5/7] drm/etnaviv: Switch to fdinfo helper
  2023-04-11 22:56   ` Rob Clark
@ 2023-04-12  7:59     ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12  7:59 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Tvrtko Ursulin, Christopher Healy, Emil Velikov, Rob Clark,
	Lucas Stach, Russell King, Christian Gmeiner, David Airlie,
	Daniel Vetter, moderated list:DRM DRIVERS FOR VIVANTE GPU IP,
	open list

On Tue, Apr 11, 2023 at 03:56:10PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>

You're on an old tree, this got reverted. But I'm kinda wondering whether
another patch on top shouldn't just include drm_show_fdinfo in the
DRM_GEM_FOPS macro ... There's really no good reason for drivers to not
have this, I think?
-Daniel

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_drv.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> index 44ca803237a5..170000d6af94 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> @@ -476,9 +476,8 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = {
>  	ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW),
>  };
>  
> -static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
> +static void etnaviv_fop_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>  {
> -	struct drm_file *file = f->private_data;
>  	struct drm_device *dev = file->minor->dev;
>  	struct etnaviv_drm_private *priv = dev->dev_private;
>  	struct etnaviv_file_private *ctx = file->driver_priv;
> @@ -487,8 +486,6 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
>  	 * For a description of the text output format used here, see
>  	 * Documentation/gpu/drm-usage-stats.rst.
>  	 */
> -	seq_printf(m, "drm-driver:\t%s\n", dev->driver->name);
> -	seq_printf(m, "drm-client-id:\t%u\n", ctx->id);
>  
>  	for (int i = 0; i < ETNA_MAX_PIPES; i++) {
>  		struct etnaviv_gpu *gpu = priv->gpu[i];
> @@ -507,7 +504,7 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
>  			cur = snprintf(engine + cur, sizeof(engine) - cur,
>  				       "%sNN", cur ? "/" : "");
>  
> -		seq_printf(m, "drm-engine-%s:\t%llu ns\n", engine,
> +		drm_printf(p, "drm-engine-%s:\t%llu ns\n", engine,
>  			   ctx->sched_entity[i].elapsed_ns);
>  	}
>  }
> @@ -515,7 +512,7 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
>  static const struct file_operations fops = {
>  	.owner = THIS_MODULE,
>  	DRM_GEM_FOPS,
> -	.show_fdinfo = etnaviv_fop_show_fdinfo,
> +	.show_fdinfo = drm_fop_show_fdinfo,
>  };
>  
>  static const struct drm_driver etnaviv_drm_driver = {
> @@ -529,6 +526,7 @@ static const struct drm_driver etnaviv_drm_driver = {
>  #ifdef CONFIG_DEBUG_FS
>  	.debugfs_init       = etnaviv_debugfs_init,
>  #endif
> +	.show_fdinfo        = etnaviv_fop_show_fdinfo,
>  	.ioctls             = etnaviv_ioctls,
>  	.num_ioctls         = DRM_ETNAVIV_NUM_IOCTLS,
>  	.fops               = &fops,
> -- 
> 2.39.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread
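
For context, the suggestion above would amount to roughly the following (a sketch against
the 6.3-era macro, assuming the drm_show_fdinfo name from the 1/7 review; whether the
assignment wants a CONFIG_PROC_FS guard is left open here):

#define DRM_GEM_FOPS \
	.open		= drm_open,\
	.release	= drm_release,\
	.unlocked_ioctl	= drm_ioctl,\
	.compat_ioctl	= drm_compat_ioctl,\
	.poll		= drm_poll,\
	.read		= drm_read,\
	.llseek		= noop_llseek,\
	.mmap		= drm_gem_mmap,\
	.show_fdinfo	= drm_show_fdinfo

Drivers using DRM_GEM_FOPS (or DEFINE_DRM_GEM_FOPS) would then get the common fdinfo
output without touching their fops at all.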

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-11 22:56   ` Rob Clark
@ 2023-04-12  8:01     ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12  8:01 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Tvrtko Ursulin, Christopher Healy, Emil Velikov, Rob Clark,
	David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, open list:DOCUMENTATION,
	open list

On Tue, Apr 11, 2023 at 03:56:11PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Add support to dump GEM stats to fdinfo.
> 
> v2: Fix typos, change size units to match docs, use div_u64
> v3: Do it in core
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> ---
>  Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>  drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>  include/drm/drm_file.h                |  1 +
>  include/drm/drm_gem.h                 | 19 +++++++
>  4 files changed, 117 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> index b46327356e80..b5e7802532ed 100644
> --- a/Documentation/gpu/drm-usage-stats.rst
> +++ b/Documentation/gpu/drm-usage-stats.rst
> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>  Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>  indicating kibi- or mebi-bytes.
>  
> +- drm-shared-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are shared with another file (ie. have more
> +than a single handle).
> +
> +- drm-private-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are not shared with another file.
> +
> +- drm-resident-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are resident in system memory.
> +
> +- drm-purgeable-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are purgeable.
> +
> +- drm-active-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are active on one or more rings.
> +
>  - drm-cycles-<str> <uint>
>  
>  Engine identifier string must be the same as the one specified in the
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 37dfaa6be560..46fdd843bb3a 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -42,6 +42,7 @@
>  #include <drm/drm_client.h>
>  #include <drm/drm_drv.h>
>  #include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
>  #include <drm/drm_print.h>
>  
>  #include "drm_crtc_internal.h"
> @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>  }
>  EXPORT_SYMBOL(drm_send_event);
>  
> +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> +{
> +	const char *units[] = {"", " KiB", " MiB"};
> +	unsigned u;
> +
> +	for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> +		if (sz < SZ_1K)
> +			break;
> +		sz = div_u64(sz, SZ_1K);
> +	}
> +
> +	drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> +}
> +
> +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> +{
> +	struct drm_gem_object *obj;
> +	struct {
> +		size_t shared;
> +		size_t private;
> +		size_t resident;
> +		size_t purgeable;
> +		size_t active;
> +	} size = {0};
> +	bool has_status = false;
> +	int id;
> +
> +	spin_lock(&file->table_lock);
> +	idr_for_each_entry (&file->object_idr, obj, id) {
> +		enum drm_gem_object_status s = 0;
> +
> +		if (obj->funcs && obj->funcs->status) {
> +			s = obj->funcs->status(obj);
> +			has_status = true;
> +		}
> +
> +		if (obj->handle_count > 1) {
> +			size.shared += obj->size;
> +		} else {
> +			size.private += obj->size;
> +		}
> +
> +		if (s & DRM_GEM_OBJECT_RESIDENT) {
> +			size.resident += obj->size;
> +		} else {
> +			/* If already purged or not yet backed by pages, don't
> +			 * count it as purgeable:
> +			 */
> +			s &= ~DRM_GEM_OBJECT_PURGEABLE;
> +		}
> +
> +		if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> +			size.active += obj->size;
> +
> +			/* If still active, don't count as purgeable: */

Maybe mention this in the kerneldoc for DRM_GEM_OBJECT_PURGEABLE?

Otherwise looks tidy:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +			s &= ~DRM_GEM_OBJECT_PURGEABLE;
> +		}
> +
> +		if (s & DRM_GEM_OBJECT_PURGEABLE)
> +			size.purgeable += obj->size;
> +	}
> +	spin_unlock(&file->table_lock);
> +
> +	print_size(p, "drm-shared-memory", size.shared);
> +	print_size(p, "drm-private-memory", size.private);
> +	print_size(p, "drm-active-memory", size.active);
> +
> +	if (has_status) {
> +		print_size(p, "drm-resident-memory", size.resident);
> +		print_size(p, "drm-purgeable-memory", size.purgeable);
> +	}
> +}
> +
>  /**
>   * drm_fop_show_fdinfo - helper for drm file fops
>   * @seq_file: output stream
> @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
>  
>  	if (dev->driver->show_fdinfo)
>  		dev->driver->show_fdinfo(&p, file);
> +
> +	print_memory_stats(&p, file);
>  }
>  EXPORT_SYMBOL(drm_fop_show_fdinfo);
>  
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index dfa995b787e1..e5b40084538f 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -41,6 +41,7 @@
>  struct dma_fence;
>  struct drm_file;
>  struct drm_device;
> +struct drm_printer;
>  struct device;
>  struct file;
>  
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 189fd618ca65..213917bb6b11 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -42,6 +42,14 @@
>  struct iosys_map;
>  struct drm_gem_object;
>  
> +/**
> + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> + */
> +enum drm_gem_object_status {
> +	DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> +	DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> +};
> +
>  /**
>   * struct drm_gem_object_funcs - GEM object functions
>   */
> @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
>  	 */
>  	int (*evict)(struct drm_gem_object *obj);
>  
> +	/**
> +	 * @status:
> +	 *
> +	 * The optional status callback can return additional object state
> +	 * which determines which stats the object is counted against.  The
> +	 * callback is called under table_lock.  Racing against object status
> +	 * change is "harmless", and the callback can expect to not race
> +	 * against object destruction.
> +	 */
> +	enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> +
>  	/**
>  	 * @vm_ops:
>  	 *
> -- 
> 2.39.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread
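
Acting on the kerneldoc comment above, the enum documentation could spell out the
purgeable caveat; one possible wording (a sketch, grounded in what print_memory_stats()
actually does, not necessarily what a v4 would use):

/**
 * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
 * @DRM_GEM_OBJECT_RESIDENT: object is currently backed by pages/memory
 * @DRM_GEM_OBJECT_PURGEABLE: object is marked as purgeable by userspace
 *
 * Note that an object may report both RESIDENT and PURGEABLE; the core only
 * accounts it under drm-purgeable-memory if it is resident and not currently
 * active on any ring.
 */
enum drm_gem_object_status {
	DRM_GEM_OBJECT_RESIDENT  = BIT(0),
	DRM_GEM_OBJECT_PURGEABLE = BIT(1),
};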

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
@ 2023-04-12  8:01     ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12  8:01 UTC (permalink / raw)
  To: Rob Clark
  Cc: Rob Clark, Tvrtko Ursulin, Thomas Zimmermann, Jonathan Corbet,
	linux-arm-msm, open list:DOCUMENTATION, Emil Velikov,
	Christopher Healy, dri-devel, open list, Boris Brezillon,
	freedreno

On Tue, Apr 11, 2023 at 03:56:11PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Add support to dump GEM stats to fdinfo.
> 
> v2: Fix typos, change size units to match docs, use div_u64
> v3: Do it in core
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> ---
>  Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>  drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>  include/drm/drm_file.h                |  1 +
>  include/drm/drm_gem.h                 | 19 +++++++
>  4 files changed, 117 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> index b46327356e80..b5e7802532ed 100644
> --- a/Documentation/gpu/drm-usage-stats.rst
> +++ b/Documentation/gpu/drm-usage-stats.rst
> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>  Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>  indicating kibi- or mebi-bytes.
>  
> +- drm-shared-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are shared with another file (ie. have more
> +than a single handle).
> +
> +- drm-private-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are not shared with another file.
> +
> +- drm-resident-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are resident in system memory.
> +
> +- drm-purgeable-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are purgeable.
> +
> +- drm-active-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are active on one or more rings.
> +
>  - drm-cycles-<str> <uint>
>  
>  Engine identifier string must be the same as the one specified in the
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 37dfaa6be560..46fdd843bb3a 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -42,6 +42,7 @@
>  #include <drm/drm_client.h>
>  #include <drm/drm_drv.h>
>  #include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
>  #include <drm/drm_print.h>
>  
>  #include "drm_crtc_internal.h"
> @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>  }
>  EXPORT_SYMBOL(drm_send_event);
>  
> +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> +{
> +	const char *units[] = {"", " KiB", " MiB"};
> +	unsigned u;
> +
> +	for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> +		if (sz < SZ_1K)
> +			break;
> +		sz = div_u64(sz, SZ_1K);
> +	}
> +
> +	drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> +}
> +
> +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> +{
> +	struct drm_gem_object *obj;
> +	struct {
> +		size_t shared;
> +		size_t private;
> +		size_t resident;
> +		size_t purgeable;
> +		size_t active;
> +	} size = {0};
> +	bool has_status = false;
> +	int id;
> +
> +	spin_lock(&file->table_lock);
> +	idr_for_each_entry (&file->object_idr, obj, id) {
> +		enum drm_gem_object_status s = 0;
> +
> +		if (obj->funcs && obj->funcs->status) {
> +			s = obj->funcs->status(obj);
> +			has_status = true;
> +		}
> +
> +		if (obj->handle_count > 1) {
> +			size.shared += obj->size;
> +		} else {
> +			size.private += obj->size;
> +		}
> +
> +		if (s & DRM_GEM_OBJECT_RESIDENT) {
> +			size.resident += obj->size;
> +		} else {
> +			/* If already purged or not yet backed by pages, don't
> +			 * count it as purgeable:
> +			 */
> +			s &= ~DRM_GEM_OBJECT_PURGEABLE;
> +		}
> +
> +		if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> +			size.active += obj->size;
> +
> +			/* If still active, don't count as purgeable: */

Maybe mention this in the kerneldoc for DRM_GEM_OBJECT_PURGEABLE?

Otherwise looks tidy:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +			s &= ~DRM_GEM_OBJECT_PURGEABLE;
> +		}
> +
> +		if (s & DRM_GEM_OBJECT_PURGEABLE)
> +			size.purgeable += obj->size;
> +	}
> +	spin_unlock(&file->table_lock);
> +
> +	print_size(p, "drm-shared-memory", size.shared);
> +	print_size(p, "drm-private-memory", size.private);
> +	print_size(p, "drm-active-memory", size.active);
> +
> +	if (has_status) {
> +		print_size(p, "drm-resident-memory", size.resident);
> +		print_size(p, "drm-purgeable-memory", size.purgeable);
> +	}
> +}
> +
>  /**
>   * drm_fop_show_fdinfo - helper for drm file fops
>   * @seq_file: output stream
> @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
>  
>  	if (dev->driver->show_fdinfo)
>  		dev->driver->show_fdinfo(&p, file);
> +
> +	print_memory_stats(&p, file);
>  }
>  EXPORT_SYMBOL(drm_fop_show_fdinfo);
>  
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index dfa995b787e1..e5b40084538f 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -41,6 +41,7 @@
>  struct dma_fence;
>  struct drm_file;
>  struct drm_device;
> +struct drm_printer;
>  struct device;
>  struct file;
>  
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 189fd618ca65..213917bb6b11 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -42,6 +42,14 @@
>  struct iosys_map;
>  struct drm_gem_object;
>  
> +/**
> + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> + */
> +enum drm_gem_object_status {
> +	DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> +	DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> +};
> +
>  /**
>   * struct drm_gem_object_funcs - GEM object functions
>   */
> @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
>  	 */
>  	int (*evict)(struct drm_gem_object *obj);
>  
> +	/**
> +	 * @status:
> +	 *
> +	 * The optional status callback can return additional object state
> +	 * which determines which stats the object is counted against.  The
> +	 * callback is called under table_lock.  Racing against object status
> +	 * change is "harmless", and the callback can expect to not race
> +	 * against object destruction.
> +	 */
> +	enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> +
>  	/**
>  	 * @vm_ops:
>  	 *
> -- 
> 2.39.2
> 
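
For readers following along, a hypothetical driver-side implementation of
the new @status hook could look roughly like this (sketch only; the struct
and madvise names below are made up and not taken from any driver in this
series):

static enum drm_gem_object_status my_gem_status(struct drm_gem_object *obj)
{
	struct my_gem_object *bo = to_my_gem_object(obj);  /* hypothetical */
	enum drm_gem_object_status status = 0;

	if (bo->pages)                     /* backed by system memory pages */
		status |= DRM_GEM_OBJECT_RESIDENT;

	if (bo->madv == MY_MADV_DONTNEED)  /* userspace marked it purgeable */
		status |= DRM_GEM_OBJECT_PURGEABLE;

	return status;
}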

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 0/7] drm: fdinfo memory stats
  2023-04-11 22:56 ` Rob Clark
  (?)
  (?)
@ 2023-04-12  9:34   ` Christian König
  -1 siblings, 0 replies; 94+ messages in thread
From: Christian König @ 2023-04-12  9:34 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Tvrtko Ursulin,
	Christopher Healy, Emil Velikov, Rob Clark, Alex Deucher,
	open list:RADEON and AMDGPU DRM DRIVERS,
	Arunpravin Paneer Selvam, Christian Gmeiner,
	moderated list:DRM DRIVERS FOR VIVANTE GPU IP, Evan Quan,
	Guchun Chen, Hawking Zhang, intel-gfx, open list:DOCUMENTATION,
	open list, Mario Limonciello, Michel Dänzer, Russell King,
	Sean Paul, Shashank Sharma, Tvrtko Ursulin, YiPeng Chai

On 12.04.23 at 00:56, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> Similar motivation to other similar recent attempt[1].  But with an
> attempt to have some shared code for this.  As well as documentation.
>
> It is probably a bit UMA-centric, I guess devices with VRAM might want
> some placement stats as well.  But this seems like a reasonable start.
>
> Basic gputop support: https://patchwork.freedesktop.org/series/116236/
> And already nvtop support: https://github.com/Syllo/nvtop/pull/204
>
> [1] https://patchwork.freedesktop.org/series/112397/

I think the extra client id looks a bit superfluous, since the ino of the 
file should already be unique and IIRC we have already been using that one.

Apart from that looks good to me,
Christian.

PS: For some reason only the two patches I was CCed on ended up in my 
inbox; dri-devel swallowed all the rest and hasn't spit it out yet. I had 
to dig up the rest from patchwork.


>
> Rob Clark (7):
>    drm: Add common fdinfo helper
>    drm/msm: Switch to fdinfo helper
>    drm/amdgpu: Switch to fdinfo helper
>    drm/i915: Switch to fdinfo helper
>    drm/etnaviv: Switch to fdinfo helper
>    drm: Add fdinfo memory stats
>    drm/msm: Add memory stats to fdinfo
>
>   Documentation/gpu/drm-usage-stats.rst      |  21 ++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  16 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |   2 +-
>   drivers/gpu/drm/drm_file.c                 | 115 +++++++++++++++++++++
>   drivers/gpu/drm/etnaviv/etnaviv_drv.c      |  10 +-
>   drivers/gpu/drm/i915/i915_driver.c         |   3 +-
>   drivers/gpu/drm/i915/i915_drm_client.c     |  18 +---
>   drivers/gpu/drm/i915/i915_drm_client.h     |   2 +-
>   drivers/gpu/drm/msm/msm_drv.c              |  11 +-
>   drivers/gpu/drm/msm/msm_gem.c              |  15 +++
>   drivers/gpu/drm/msm/msm_gpu.c              |   2 -
>   include/drm/drm_drv.h                      |   7 ++
>   include/drm/drm_file.h                     |   5 +
>   include/drm/drm_gem.h                      |  19 ++++
>   15 files changed, 208 insertions(+), 41 deletions(-)
>


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 0/7] drm: fdinfo memory stats
  2023-04-12  9:34   ` Christian König
  (?)
  (?)
@ 2023-04-12 12:10     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-12 12:10 UTC (permalink / raw)
  To: Christian König, Rob Clark, dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Christopher Healy,
	Emil Velikov, Rob Clark, Alex Deucher,
	open list:RADEON and AMDGPU DRM DRIVERS,
	Arunpravin Paneer Selvam, Christian Gmeiner,
	moderated list:DRM DRIVERS FOR VIVANTE GPU IP, Evan Quan,
	Guchun Chen, Hawking Zhang, intel-gfx, open list:DOCUMENTATION,
	open list, Mario Limonciello, Michel Dänzer, Russell King,
	Sean Paul, Shashank Sharma, Tvrtko Ursulin, YiPeng Chai


On 12/04/2023 10:34, Christian König wrote:
> On 12.04.23 at 00:56, Rob Clark wrote:
>> From: Rob Clark <robdclark@chromium.org>
>>
>> Similar motivation to other similar recent attempt[1].  But with an
>> attempt to have some shared code for this.  As well as documentation.
>>
>> It is probably a bit UMA-centric, I guess devices with VRAM might want
>> some placement stats as well.  But this seems like a reasonable start.
>>
>> Basic gputop support: https://patchwork.freedesktop.org/series/116236/
>> And already nvtop support: https://github.com/Syllo/nvtop/pull/204
>>
>> [1] https://patchwork.freedesktop.org/series/112397/
> 
> I think the extra client id looks a bit superfluous since the ino of the 
> file should already be unique and IIRC we have been already using that one.

Do you mean file_inode(struct drm_file->filp)->i_ino? That one would be 
the same number for all clients which open the same device node, so it 
wouldn't work.

I also don't think the atomic_add_return for client id works either, 
since it can alias on overflow.

In i915 I use an xarray and __xa_alloc_cyclic.
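
Roughly along these lines, for anyone curious - a simplified sketch with
made-up names, using the unlocked xa_alloc_cyclic() variant rather than
__xa_alloc_cyclic(), and assuming the xarray was initialised with
xa_init_flags(&clients->xa, XA_FLAGS_ALLOC):

struct clients {
	struct xarray xa;
	u32 next_id;
};

static int client_id_alloc(struct clients *clients, void *client, u32 *id)
{
	int ret;

	/* Resumes searching from next_id, so a just-released id is only
	 * handed out again after the whole id space has been cycled
	 * through.
	 */
	ret = xa_alloc_cyclic(&clients->xa, id, client, xa_limit_32b,
			      &clients->next_id, GFP_KERNEL);

	/* xa_alloc_cyclic() returns 1 (not an error) when the id space
	 * has wrapped around.
	 */
	return ret < 0 ? ret : 0;
}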

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 0/7] drm: fdinfo memory stats
  2023-04-12 12:10     ` Tvrtko Ursulin
  (?)
  (?)
@ 2023-04-12 12:22       ` Christian König
  -1 siblings, 0 replies; 94+ messages in thread
From: Christian König @ 2023-04-12 12:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, Rob Clark, dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Christopher Healy,
	Emil Velikov, Rob Clark, Alex Deucher,
	open list:RADEON and AMDGPU DRM DRIVERS,
	Arunpravin Paneer Selvam, Christian Gmeiner,
	moderated list:DRM DRIVERS FOR VIVANTE GPU IP, Evan Quan,
	Guchun Chen, Hawking Zhang, intel-gfx, open list:DOCUMENTATION,
	open list, Mario Limonciello, Michel Dänzer, Russell King,
	Sean Paul, Shashank Sharma, Tvrtko Ursulin, YiPeng Chai

On 12.04.23 at 14:10, Tvrtko Ursulin wrote:
>
> On 12/04/2023 10:34, Christian König wrote:
>> On 12.04.23 at 00:56, Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Similar motivation to other similar recent attempt[1].  But with an
>>> attempt to have some shared code for this.  As well as documentation.
>>>
>>> It is probably a bit UMA-centric, I guess devices with VRAM might want
>>> some placement stats as well.  But this seems like a reasonable start.
>>>
>>> Basic gputop support: https://patchwork.freedesktop.org/series/116236/
>>> And already nvtop support: https://github.com/Syllo/nvtop/pull/204
>>>
>>> [1] https://patchwork.freedesktop.org/series/112397/
>>
>> I think the extra client id looks a bit superfluous since the ino of 
>> the file should already be unique and IIRC we have been already using 
>> that one.
>
> Do you mean file_inode(struct drm_file->filp)->i_ino ? That one would 
> be the same number for all clients which open the same device node so 
> wouldn't work.

Ah, right. DMA-buf used a separate ino per buffer, but we don't do that 
for the drm_file.

>
> I also don't think the atomic_add_return for client id works either, 
> since it can alias on overflow.

Yeah, if we need one at all we might want to use a 64-bit number here.
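
Just to illustrate (sketch only, not part of the series) - a 64-bit
counter is simple enough and will not realistically wrap:

static atomic64_t drm_client_id_counter;   /* hypothetical name */

static u64 drm_client_id_alloc(void)
{
	/* Never reused for the lifetime of the system; at one allocation
	 * per nanosecond it would still take centuries to wrap.
	 */
	return atomic64_inc_return(&drm_client_id_counter);
}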

Christian.

>
> In i915 I use an xarray and __xa_alloc_cyclic.
>
> Regards,
>
> Tvrtko


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Switch to fdinfo helper
  2023-04-11 22:56   ` Rob Clark
  (?)
@ 2023-04-12 12:32     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-12 12:32 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Christopher Healy,
	Emil Velikov, Rob Clark, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, open list


On 11/04/2023 23:56, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   drivers/gpu/drm/i915/i915_driver.c     |  3 ++-
>   drivers/gpu/drm/i915/i915_drm_client.c | 18 +++++-------------
>   drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
>   3 files changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index db7a86def7e2..37eacaa3064b 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
>   	.compat_ioctl = i915_ioc32_compat_ioctl,
>   	.llseek = noop_llseek,
>   #ifdef CONFIG_PROC_FS
> -	.show_fdinfo = i915_drm_client_fdinfo,
> +	.show_fdinfo = drm_fop_show_fdinfo,
>   #endif
>   };
>   
> @@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
>   	.open = i915_driver_open,
>   	.lastclose = i915_driver_lastclose,
>   	.postclose = i915_driver_postclose,
> +	.show_fdinfo = i915_drm_client_fdinfo,
>   
>   	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>   	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
> index b09d1d386574..4a77e5e47f79 100644
> --- a/drivers/gpu/drm/i915/i915_drm_client.c
> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> @@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
>   }
>   
>   static void
> -show_client_class(struct seq_file *m,
> +show_client_class(struct drm_printer *p,
>   		  struct i915_drm_client *client,
>   		  unsigned int class)
>   {
> @@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
>   	rcu_read_unlock();
>   
>   	if (capacity)
> -		seq_printf(m, "drm-engine-%s:\t%llu ns\n",
> +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
>   			   uabi_class_names[class], total);
>   
>   	if (capacity > 1)
> -		seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
> +		drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
>   			   uabi_class_names[class],
>   			   capacity);
>   }
>   
> -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
>   {
> -	struct drm_file *file = f->private_data;
>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>   	struct drm_i915_private *i915 = file_priv->dev_priv;
>   	struct i915_drm_client *client = file_priv->client;
> -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
>   	unsigned int i;
>   
>   	/*
> @@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
>   	 * ******************************************************************
>   	 */
>   
> -	seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
> -	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
> -		   pci_domain_nr(pdev->bus), pdev->bus->number,
> -		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> -	seq_printf(m, "drm-client-id:\t%u\n", client->id);

As mentioned in my reply to the cover letter, I think the i915 
implementation is the right one. At least the semantics of it.

Granted it is a superset of the minimum required as documented by 
drm-usage-stats.rst - not only is it 1:1 with current instances of struct 
file, it also avoids immediate id recycling.

The former could perhaps be achieved with a simple pointer hash, but the 
latter helps userspace detect when a client has exited and its id has been 
re-allocated to a new client within a single scanning period.

Without this I don't think userspace can implement a fail-safe method of 
detecting which clients are new ones, and so it wouldn't be able to track 
history correctly.

I think we should rather extend the documented contract to include the 
cyclical property than settle for a weaker common implementation.
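
To illustrate what userspace relies on (hypothetical sketch, not actual
gputop/nvtop code): per-client history is keyed purely on drm-client-id,
and an id that was not present in the previous scan is treated as a brand
new client, which is only safe if ids cannot be recycled within one scan
period:

#include <stdint.h>
#include <stddef.h>

struct client_history {
	uint64_t client_id;      /* "drm-client-id" parsed from fdinfo */
	uint64_t prev_engine_ns; /* previous "drm-engine-*" sample */
};

static struct client_history *find_client(struct client_history *tab,
					  size_t n, uint64_t id)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].client_id == id)
			return &tab[i];   /* same client as last scan */
	return NULL;                      /* new client, start fresh */
}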

Regards,

Tvrtko

> -
>   	/*
>   	 * Temporarily skip showing client engine information with GuC submission till
>   	 * fetching engine busyness is implemented in the GuC submission backend
> @@ -155,6 +147,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
>   		return;
>   
>   	for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> -		show_client_class(m, client, i);
> +		show_client_class(p, client, i);
>   }
>   #endif
> diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
> index 69496af996d9..ef85fef45de5 100644
> --- a/drivers/gpu/drm/i915/i915_drm_client.h
> +++ b/drivers/gpu/drm/i915/i915_drm_client.h
> @@ -60,7 +60,7 @@ static inline void i915_drm_client_put(struct i915_drm_client *client)
>   struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);
>   
>   #ifdef CONFIG_PROC_FS
> -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
> +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file);
>   #endif
>   
>   void i915_drm_clients_fini(struct i915_drm_clients *clients);

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Switch to fdinfo helper
  2023-04-12 12:32     ` Tvrtko Ursulin
  (?)
@ 2023-04-12 13:51       ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12 13:51 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, Daniel Vetter,
	intel-gfx, open list

On Wed, Apr 12, 2023 at 01:32:43PM +0100, Tvrtko Ursulin wrote:
> 
> On 11/04/2023 23:56, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> > 
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/gpu/drm/i915/i915_driver.c     |  3 ++-
> >   drivers/gpu/drm/i915/i915_drm_client.c | 18 +++++-------------
> >   drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
> >   3 files changed, 8 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > index db7a86def7e2..37eacaa3064b 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
> >   	.compat_ioctl = i915_ioc32_compat_ioctl,
> >   	.llseek = noop_llseek,
> >   #ifdef CONFIG_PROC_FS
> > -	.show_fdinfo = i915_drm_client_fdinfo,
> > +	.show_fdinfo = drm_fop_show_fdinfo,
> >   #endif
> >   };
> > @@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
> >   	.open = i915_driver_open,
> >   	.lastclose = i915_driver_lastclose,
> >   	.postclose = i915_driver_postclose,
> > +	.show_fdinfo = i915_drm_client_fdinfo,
> >   	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> >   	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
> > index b09d1d386574..4a77e5e47f79 100644
> > --- a/drivers/gpu/drm/i915/i915_drm_client.c
> > +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> > @@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
> >   }
> >   static void
> > -show_client_class(struct seq_file *m,
> > +show_client_class(struct drm_printer *p,
> >   		  struct i915_drm_client *client,
> >   		  unsigned int class)
> >   {
> > @@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
> >   	rcu_read_unlock();
> >   	if (capacity)
> > -		seq_printf(m, "drm-engine-%s:\t%llu ns\n",
> > +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
> >   			   uabi_class_names[class], total);
> >   	if (capacity > 1)
> > -		seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
> > +		drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
> >   			   uabi_class_names[class],
> >   			   capacity);
> >   }
> > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
> >   {
> > -	struct drm_file *file = f->private_data;
> >   	struct drm_i915_file_private *file_priv = file->driver_priv;
> >   	struct drm_i915_private *i915 = file_priv->dev_priv;
> >   	struct i915_drm_client *client = file_priv->client;
> > -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> >   	unsigned int i;
> >   	/*
> > @@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> >   	 * ******************************************************************
> >   	 */
> > -	seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
> > -	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
> > -		   pci_domain_nr(pdev->bus), pdev->bus->number,
> > -		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> > -	seq_printf(m, "drm-client-id:\t%u\n", client->id);
> 
> As mentioned in my reply to the cover letter, I think the i915
> implementation is the right one. At least the semantics of it.
> 
> Granted it is a super set of the minimum required as documented by
> drm-usage-stats.rst - not only 1:1 to current instances of struct file, but
> also avoids immediate id recycling.
> 
> Former could perhaps be achieved with a simple pointer hash, but latter
> helps userspace detect when a client has exited and id re-allocated to a new
> client within a single scanning period.
> 
> Without this I don't think userspace can implement a fail safe method of
> detecting which clients are new ones and so wouldn't be able to track
> history correctly.
> 
> I think we should rather extend the documented contract to include the
> cyclical property than settle for a weaker common implementation.

atomic64_t never wraps, so you don't have any recycling issues?

The other piece, and imo much more important, is that I really don't want
the i915_drm_client design to spread; it conceptually makes no sense.
drm_file is the uapi object, and once that's gone userspace will never be
able to look at anything, so having a separate free-standing object that's
essentially always dead is backwards.

I went a bit more in-depth in a different thread on scheduler fdinfo
stats, but essentially fdinfo needs to pull stats; you should never push
stats towards the drm_file (or i915_drm_client). That avoids all the
refcounting issues, the RCU needs and everything else like that.
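
In other words something like the below (hypothetical driver with made-up
names, not the i915 code): everything is computed at fdinfo read time from
state that already hangs off the drm_file:

static void my_show_fdinfo(struct drm_printer *p, struct drm_file *file)
{
	struct my_file_private *fpriv = file->driver_priv;
	struct my_context *ctx;
	u64 total_ns = 0;

	/* Pull: walk the contexts still reachable from the drm_file and
	 * sum their busy time when fdinfo is read, instead of pushing
	 * stats into a refcounted client object from the retire path.
	 */
	mutex_lock(&fpriv->ctx_lock);
	list_for_each_entry(ctx, &fpriv->ctx_list, node)
		total_ns += my_context_busy_ns(ctx);   /* hypothetical */
	mutex_unlock(&fpriv->ctx_lock);

	drm_printf(p, "drm-engine-render:\t%llu ns\n", total_ns);
}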

Maybe you want to jump into that thread:
https://lore.kernel.org/dri-devel/CAKMK7uE=m3sSTQrLCeDg0vG8viODOecUsYDK1oC++f5pQi0e8Q@mail.gmail.com/

So retiring the i915_drm_client infrastructure is the right direction, I think.
-Daniel

> Regards,
> 
> Tvrtko
> 
> > -
> >   	/*
> >   	 * Temporarily skip showing client engine information with GuC submission till
> >   	 * fetching engine busyness is implemented in the GuC submission backend
> > @@ -155,6 +147,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> >   		return;
> >   	for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> > -		show_client_class(m, client, i);
> > +		show_client_class(p, client, i);
> >   }
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
> > index 69496af996d9..ef85fef45de5 100644
> > --- a/drivers/gpu/drm/i915/i915_drm_client.h
> > +++ b/drivers/gpu/drm/i915/i915_drm_client.h
> > @@ -60,7 +60,7 @@ static inline void i915_drm_client_put(struct i915_drm_client *client)
> >   struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);
> >   #ifdef CONFIG_PROC_FS
> > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
> > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file);
> >   #endif
> >   void i915_drm_clients_fini(struct i915_drm_clients *clients);

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Switch to fdinfo helper
@ 2023-04-12 13:51       ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12 13:51 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, linux-arm-msm, intel-gfx, Emil Velikov,
	Christopher Healy, dri-devel, open list, Boris Brezillon,
	Rodrigo Vivi, freedreno

On Wed, Apr 12, 2023 at 01:32:43PM +0100, Tvrtko Ursulin wrote:
> 
> On 11/04/2023 23:56, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> > 
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/gpu/drm/i915/i915_driver.c     |  3 ++-
> >   drivers/gpu/drm/i915/i915_drm_client.c | 18 +++++-------------
> >   drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
> >   3 files changed, 8 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > index db7a86def7e2..37eacaa3064b 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
> >   	.compat_ioctl = i915_ioc32_compat_ioctl,
> >   	.llseek = noop_llseek,
> >   #ifdef CONFIG_PROC_FS
> > -	.show_fdinfo = i915_drm_client_fdinfo,
> > +	.show_fdinfo = drm_fop_show_fdinfo,
> >   #endif
> >   };
> > @@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
> >   	.open = i915_driver_open,
> >   	.lastclose = i915_driver_lastclose,
> >   	.postclose = i915_driver_postclose,
> > +	.show_fdinfo = i915_drm_client_fdinfo,
> >   	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> >   	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
> > index b09d1d386574..4a77e5e47f79 100644
> > --- a/drivers/gpu/drm/i915/i915_drm_client.c
> > +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> > @@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
> >   }
> >   static void
> > -show_client_class(struct seq_file *m,
> > +show_client_class(struct drm_printer *p,
> >   		  struct i915_drm_client *client,
> >   		  unsigned int class)
> >   {
> > @@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
> >   	rcu_read_unlock();
> >   	if (capacity)
> > -		seq_printf(m, "drm-engine-%s:\t%llu ns\n",
> > +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
> >   			   uabi_class_names[class], total);
> >   	if (capacity > 1)
> > -		seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
> > +		drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
> >   			   uabi_class_names[class],
> >   			   capacity);
> >   }
> > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
> >   {
> > -	struct drm_file *file = f->private_data;
> >   	struct drm_i915_file_private *file_priv = file->driver_priv;
> >   	struct drm_i915_private *i915 = file_priv->dev_priv;
> >   	struct i915_drm_client *client = file_priv->client;
> > -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> >   	unsigned int i;
> >   	/*
> > @@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> >   	 * ******************************************************************
> >   	 */
> > -	seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
> > -	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
> > -		   pci_domain_nr(pdev->bus), pdev->bus->number,
> > -		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> > -	seq_printf(m, "drm-client-id:\t%u\n", client->id);
> 
> As mentioned in my reply to the cover letter, I think the i915
> implementation is the right one. At least the semantics of it.
> 
> Granted it is a super set of the minimum required as documented by
> drm-usage-stats.rst - not only 1:1 to current instances of struct file, but
> also avoids immediate id recycling.
> 
> Former could perhaps be achieved with a simple pointer hash, but latter
> helps userspace detect when a client has exited and id re-allocated to a new
> client within a single scanning period.
> 
> Without this I don't think userspace can implement a fail safe method of
> detecting which clients are new ones and so wouldn't be able to track
> history correctly.
> 
> I think we should rather extend the documented contract to include the
> cyclical property than settle for a weaker common implementation.

atomic64_t never wraps, so you don't have any recycling issues?
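
(Purely as an illustration of that point - assuming the common code hands
out client ids from a monotonically increasing 64-bit counter, roughly:

        static atomic64_t ident;                                /* hypothetical */

        /* at drm_file open time */
        file_priv->client_id = atomic64_inc_return(&ident);    /* hypothetical */

with 64 bits an id is never handed out twice in practice, so there is no
recycling for userspace to detect.)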

The other piece and imo much more important is that I really don't want
the i915_drm_client design to spread, it conceptually makes no sense.
drm_file is the uapi object, once that's gone userspace will never be able
to look at anything, having a separate free-standing object that's
essentially always dead is backwards.

I went a bit more in-depth in a different thread on scheduler fd_info
stats, but essentially fd_info needs to pull stats, you should never push
stats towards the drm_file (or i915_drm_client). That avoids all the
refcounting issues and rcu needs and everything else like that.
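
(A minimal sketch of what pulling means here - not actual i915 code, all
names below are invented:

        static void example_show_fdinfo(struct drm_printer *p,
                                        struct drm_file *file)
        {
                struct example_fpriv *fpriv = file->driver_priv;  /* hypothetical */

                /* stats are read from existing driver state only when
                 * fdinfo is queried; nothing is pushed into drm_file
                 * from other paths
                 */
                drm_printf(p, "drm-engine-gfx:\t%llu ns\n",
                           example_read_busy_ns(fpriv));         /* hypothetical */
        }

so no separate refcounted object has to outlive the file just to hold
stats.)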

Maybe you want to jump into that thread:
https://lore.kernel.org/dri-devel/CAKMK7uE=m3sSTQrLCeDg0vG8viODOecUsYDK1oC++f5pQi0e8Q@mail.gmail.com/

So retiring i915_drm_client infrastructure is the right direction I think.
-Daniel

> Regards,
> 
> Tvrtko
> 
> > -
> >   	/*
> >   	 * Temporarily skip showing client engine information with GuC submission till
> >   	 * fetching engine busyness is implemented in the GuC submission backend
> > @@ -155,6 +147,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> >   		return;
> >   	for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> > -		show_client_class(m, client, i);
> > +		show_client_class(p, client, i);
> >   }
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
> > index 69496af996d9..ef85fef45de5 100644
> > --- a/drivers/gpu/drm/i915/i915_drm_client.h
> > +++ b/drivers/gpu/drm/i915/i915_drm_client.h
> > @@ -60,7 +60,7 @@ static inline void i915_drm_client_put(struct i915_drm_client *client)
> >   struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);
> >   #ifdef CONFIG_PROC_FS
> > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
> > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file);
> >   #endif
> >   void i915_drm_clients_fini(struct i915_drm_clients *clients);

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Intel-gfx] [PATCH v3 4/7] drm/i915: Switch to fdinfo helper
@ 2023-04-12 13:51       ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12 13:51 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, linux-arm-msm, intel-gfx, Christopher Healy,
	dri-devel, open list, Daniel Vetter, Rodrigo Vivi, David Airlie,
	freedreno

On Wed, Apr 12, 2023 at 01:32:43PM +0100, Tvrtko Ursulin wrote:
> 
> On 11/04/2023 23:56, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> > 
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/gpu/drm/i915/i915_driver.c     |  3 ++-
> >   drivers/gpu/drm/i915/i915_drm_client.c | 18 +++++-------------
> >   drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
> >   3 files changed, 8 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > index db7a86def7e2..37eacaa3064b 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
> >   	.compat_ioctl = i915_ioc32_compat_ioctl,
> >   	.llseek = noop_llseek,
> >   #ifdef CONFIG_PROC_FS
> > -	.show_fdinfo = i915_drm_client_fdinfo,
> > +	.show_fdinfo = drm_fop_show_fdinfo,
> >   #endif
> >   };
> > @@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
> >   	.open = i915_driver_open,
> >   	.lastclose = i915_driver_lastclose,
> >   	.postclose = i915_driver_postclose,
> > +	.show_fdinfo = i915_drm_client_fdinfo,
> >   	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> >   	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
> > index b09d1d386574..4a77e5e47f79 100644
> > --- a/drivers/gpu/drm/i915/i915_drm_client.c
> > +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> > @@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
> >   }
> >   static void
> > -show_client_class(struct seq_file *m,
> > +show_client_class(struct drm_printer *p,
> >   		  struct i915_drm_client *client,
> >   		  unsigned int class)
> >   {
> > @@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
> >   	rcu_read_unlock();
> >   	if (capacity)
> > -		seq_printf(m, "drm-engine-%s:\t%llu ns\n",
> > +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
> >   			   uabi_class_names[class], total);
> >   	if (capacity > 1)
> > -		seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
> > +		drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
> >   			   uabi_class_names[class],
> >   			   capacity);
> >   }
> > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
> >   {
> > -	struct drm_file *file = f->private_data;
> >   	struct drm_i915_file_private *file_priv = file->driver_priv;
> >   	struct drm_i915_private *i915 = file_priv->dev_priv;
> >   	struct i915_drm_client *client = file_priv->client;
> > -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> >   	unsigned int i;
> >   	/*
> > @@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> >   	 * ******************************************************************
> >   	 */
> > -	seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
> > -	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
> > -		   pci_domain_nr(pdev->bus), pdev->bus->number,
> > -		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> > -	seq_printf(m, "drm-client-id:\t%u\n", client->id);
> 
> As mentioned in my reply to the cover letter, I think the i915
> implementation is the right one. At least the semantics of it.
> 
> Granted it is a super set of the minimum required as documented by
> drm-usage-stats.rst - not only 1:1 to current instances of struct file, but
> also avoids immediate id recycling.
> 
> Former could perhaps be achieved with a simple pointer hash, but latter
> helps userspace detect when a client has exited and id re-allocated to a new
> client within a single scanning period.
> 
> Without this I don't think userspace can implement a fail safe method of
> detecting which clients are new ones and so wouldn't be able to track
> history correctly.
> 
> I think we should rather extend the documented contract to include the
> cyclical property than settle for a weaker common implementation.

atomic64_t never wraps, so you don't have any recycling issues?

The other piece and imo much more important is that I really don't want
the i915_drm_client design to spread, it conceptually makes no sense.
drm_file is the uapi object, once that's gone userspace will never be able
to look at anything, having a separate free-standing object that's
essentially always dead is backwards.

I went a bit more in-depth in a different thread on scheduler fd_info
stats, but essentially fd_info needs to pull stats, you should never push
stats towards the drm_file (or i915_drm_client). That avoids all the
refcounting issues and rcu needs and everything else like that.

Maybe you want to jump into that thread:
https://lore.kernel.org/dri-devel/CAKMK7uE=m3sSTQrLCeDg0vG8viODOecUsYDK1oC++f5pQi0e8Q@mail.gmail.com/

So retiring i915_drm_client infrastructure is the right direction I think.
-Daniel

> Regards,
> 
> Tvrtko
> 
> > -
> >   	/*
> >   	 * Temporarily skip showing client engine information with GuC submission till
> >   	 * fetching engine busyness is implemented in the GuC submission backend
> > @@ -155,6 +147,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> >   		return;
> >   	for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> > -		show_client_class(m, client, i);
> > +		show_client_class(p, client, i);
> >   }
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
> > index 69496af996d9..ef85fef45de5 100644
> > --- a/drivers/gpu/drm/i915/i915_drm_client.h
> > +++ b/drivers/gpu/drm/i915/i915_drm_client.h
> > @@ -60,7 +60,7 @@ static inline void i915_drm_client_put(struct i915_drm_client *client)
> >   struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);
> >   #ifdef CONFIG_PROC_FS
> > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
> > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file);
> >   #endif
> >   void i915_drm_clients_fini(struct i915_drm_clients *clients);

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-11 22:56   ` Rob Clark
@ 2023-04-12 14:42     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-12 14:42 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: linux-arm-msm, freedreno, Boris Brezillon, Christopher Healy,
	Emil Velikov, Rob Clark, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list


On 11/04/2023 23:56, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Add support to dump GEM stats to fdinfo.
> 
> v2: Fix typos, change size units to match docs, use div_u64
> v3: Do it in core
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> ---
>   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>   include/drm/drm_file.h                |  1 +
>   include/drm/drm_gem.h                 | 19 +++++++
>   4 files changed, 117 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> index b46327356e80..b5e7802532ed 100644
> --- a/Documentation/gpu/drm-usage-stats.rst
> +++ b/Documentation/gpu/drm-usage-stats.rst
> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>   indicating kibi- or mebi-bytes.
>   
> +- drm-shared-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are shared with another file (ie. have more
> +than a single handle).
> +
> +- drm-private-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are not shared with another file.
> +
> +- drm-resident-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are resident in system memory.

I think this naming maybe does not work best with the existing 
drm-memory-<region> keys.

How about introduce the concept of a memory region from the start and 
use naming similar like we do for engines?

drm-memory-$CATEGORY-$REGION: ...

Then we document a bunch of categories and their semantics, for instance:

'size' - All reachable objects
'shared' - Subset of 'size' with handle_count > 1
'resident' - Objects with backing store
'active' - Objects in use, subset of resident
'purgeable' - Or inactive? Subset of resident.

We keep the same semantics as with process memory accounting (if I got 
it right) which could be desirable for a simplified mental model.

(AMD needs to remind me of their 'drm-memory-...' keys semantics. If we 
correctly captured this in the first round it should be equivalent to 
'resident' above. In any case we can document which category, if any, it
is equal to, and that at most one of the two must be output.)

Region names we at most partially standardize. Like we could say 
'system' is to be used where backing store is system RAM and others are 
driver defined.

Then discrete GPUs could emit N sets of key-values, one for each memory 
region they support.

I think this all also works for objects which can be migrated between 
memory regions. 'Size' accounts them against all regions while for 
'resident' they only appear in the region of their current placement, etc.

Userspace can aggregate if it wishes to do so but kernel side should not.
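
To make that concrete, output under such a scheme could look something like
this (categories and region names purely illustrative, nothing standardised
yet):

        drm-memory-size-system:         8192 KiB
        drm-memory-shared-system:       2048 KiB
        drm-memory-resident-system:     4096 KiB
        drm-memory-purgeable-system:    1024 KiB
        drm-memory-size-vram0:          65536 KiB
        drm-memory-resident-vram0:      32768 KiB
        drm-memory-active-vram0:        16384 KiB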

> +
> +- drm-purgeable-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are purgeable.
> +
> +- drm-active-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are active on one or more rings.
> +
>   - drm-cycles-<str> <uint>
>   
>   Engine identifier string must be the same as the one specified in the
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 37dfaa6be560..46fdd843bb3a 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -42,6 +42,7 @@
>   #include <drm/drm_client.h>
>   #include <drm/drm_drv.h>
>   #include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
>   #include <drm/drm_print.h>
>   
>   #include "drm_crtc_internal.h"
> @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>   }
>   EXPORT_SYMBOL(drm_send_event);
>   
> +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> +{
> +	const char *units[] = {"", " KiB", " MiB"};
> +	unsigned u;
> +
> +	for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> +		if (sz < SZ_1K)
> +			break;
> +		sz = div_u64(sz, SZ_1K);
> +	}
> +
> +	drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> +}
> +
> +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> +{
> +	struct drm_gem_object *obj;
> +	struct {
> +		size_t shared;
> +		size_t private;
> +		size_t resident;
> +		size_t purgeable;
> +		size_t active;
> +	} size = {0};
> +	bool has_status = false;
> +	int id;
> +
> +	spin_lock(&file->table_lock);
> +	idr_for_each_entry (&file->object_idr, obj, id) {
> +		enum drm_gem_object_status s = 0;
> +
> +		if (obj->funcs && obj->funcs->status) {
> +			s = obj->funcs->status(obj);
> +			has_status = true;
> +		}
> +
> +		if (obj->handle_count > 1) {
> +			size.shared += obj->size;
> +		} else {
> +			size.private += obj->size;
> +		}
> +
> +		if (s & DRM_GEM_OBJECT_RESIDENT) {
> +			size.resident += obj->size;
> +		} else {
> +			/* If already purged or not yet backed by pages, don't
> +			 * count it as purgeable:
> +			 */
> +			s &= ~DRM_GEM_OBJECT_PURGEABLE;

Side question - why couldn't resident buffers be purgeable? Did you mean 
for the if branch check to be active here? But then it wouldn't make 
sense for a driver to report active _and_ purgeable..

> +		}
> +
> +		if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> +			size.active += obj->size;
> +
> +			/* If still active, don't count as purgeable: */
> +			s &= ~DRM_GEM_OBJECT_PURGEABLE;

Another side question - I guess this tidies a race in reporting? If so 
not sure it matters given the stats are all rather approximate.

> +		}
> +
> +		if (s & DRM_GEM_OBJECT_PURGEABLE)
> +			size.purgeable += obj->size;
> +	}

One concern I have here is that it is all based on obj->size. That is, 
there is no provision for drivers to implement page level granularity. 
So correct reporting in use cases such as VM BIND in the future wouldn't 
work unless there was a driver hook to get almost all of the info above. At
which point common code is just a loop. TBF I don't know if any drivers
do sub obj->size backing store granularity today, but I think it is
something to be sure of before proceeding.

Second concern is what I touched upon in the first reply block - if the 
common code blindly loops over all objects then on discrete GPUs it 
seems we get an 'aggregate' value here which is not what I think we 
want. We rather want to have the ability for drivers to list stats per 
individual memory region.
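
(The kind of driver-level hook that would imply, purely as a sketch - none
of these names exist today:

        struct drm_region_memory_stats {        /* hypothetical */
                u64 size;
                u64 shared;
                u64 resident;
                u64 purgeable;
                u64 active;
        };

        /* called by the core once per memory region the driver exposes */
        void (*memory_stats)(struct drm_file *file, unsigned int region,
                             struct drm_region_memory_stats *stats);  /* hypothetical */

with the core reduced to printing whatever the driver reports per region.)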

> +	spin_unlock(&file->table_lock);
> +
> +	print_size(p, "drm-shared-memory", size.shared);
> +	print_size(p, "drm-private-memory", size.private);
> +	print_size(p, "drm-active-memory", size.active);
> +
> +	if (has_status) {
> +		print_size(p, "drm-resident-memory", size.resident);
> +		print_size(p, "drm-purgeable-memory", size.purgeable);
> +	}
> +}
> +
>   /**
>    * drm_fop_show_fdinfo - helper for drm file fops
>    * @seq_file: output stream
> @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
>   
>   	if (dev->driver->show_fdinfo)
>   		dev->driver->show_fdinfo(&p, file);
> +
> +	print_memory_stats(&p, file);
>   }
>   EXPORT_SYMBOL(drm_fop_show_fdinfo);
>   
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index dfa995b787e1..e5b40084538f 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -41,6 +41,7 @@
>   struct dma_fence;
>   struct drm_file;
>   struct drm_device;
> +struct drm_printer;
>   struct device;
>   struct file;
>   
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 189fd618ca65..213917bb6b11 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -42,6 +42,14 @@
>   struct iosys_map;
>   struct drm_gem_object;
>   
> +/**
> + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> + */
> +enum drm_gem_object_status {
> +	DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> +	DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> +};
> +
>   /**
>    * struct drm_gem_object_funcs - GEM object functions
>    */
> @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
>   	 */
>   	int (*evict)(struct drm_gem_object *obj);
>   
> +	/**
> +	 * @status:
> +	 *
> +	 * The optional status callback can return additional object state
> +	 * which determines which stats the object is counted against.  The
> +	 * callback is called under table_lock.  Racing against object status
> +	 * change is "harmless", and the callback can expect to not race
> +	 * against object destruction.
> +	 */
> +	enum drm_gem_object_status (*status)(struct drm_gem_object *obj);

Does this needs to be in object funcs and couldn't be consolidated to 
driver level?

Regards,

Tvrtko

> +
>   	/**
>   	 * @vm_ops:
>   	 *

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
@ 2023-04-12 14:42     ` Tvrtko Ursulin
  0 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-12 14:42 UTC (permalink / raw)
  To: Rob Clark, dri-devel
  Cc: Rob Clark, Thomas Zimmermann, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	open list, Boris Brezillon, freedreno


On 11/04/2023 23:56, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Add support to dump GEM stats to fdinfo.
> 
> v2: Fix typos, change size units to match docs, use div_u64
> v3: Do it in core
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> ---
>   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>   include/drm/drm_file.h                |  1 +
>   include/drm/drm_gem.h                 | 19 +++++++
>   4 files changed, 117 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> index b46327356e80..b5e7802532ed 100644
> --- a/Documentation/gpu/drm-usage-stats.rst
> +++ b/Documentation/gpu/drm-usage-stats.rst
> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>   indicating kibi- or mebi-bytes.
>   
> +- drm-shared-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are shared with another file (ie. have more
> +than a single handle).
> +
> +- drm-private-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are not shared with another file.
> +
> +- drm-resident-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are resident in system memory.

I think this naming maybe does not work best with the existing 
drm-memory-<region> keys.

How about introduce the concept of a memory region from the start and 
use naming similar like we do for engines?

drm-memory-$CATEGORY-$REGION: ...

Then we document a bunch of categories and their semantics, for instance:

'size' - All reachable objects
'shared' - Subset of 'size' with handle_count > 1
'resident' - Objects with backing store
'active' - Objects in use, subset of resident
'purgeable' - Or inactive? Subset of resident.

We keep the same semantics as with process memory accounting (if I got 
it right) which could be desirable for a simplified mental model.

(AMD needs to remind me of their 'drm-memory-...' keys semantics. If we 
correctly captured this in the first round it should be equivalent to 
'resident' above. In any case we can document which category, if any, it
is equal to, and that at most one of the two must be output.)

Region names we at most partially standardize. Like we could say 
'system' is to be used where backing store is system RAM and others are 
driver defined.

Then discrete GPUs could emit N sets of key-values, one for each memory 
region they support.

I think this all also works for objects which can be migrated between 
memory regions. 'Size' accounts them against all regions while for 
'resident' they only appear in the region of their current placement, etc.

Userspace can aggregate if it wishes to do so but kernel side should not.

> +
> +- drm-purgeable-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are purgeable.
> +
> +- drm-active-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are active on one or more rings.
> +
>   - drm-cycles-<str> <uint>
>   
>   Engine identifier string must be the same as the one specified in the
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 37dfaa6be560..46fdd843bb3a 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -42,6 +42,7 @@
>   #include <drm/drm_client.h>
>   #include <drm/drm_drv.h>
>   #include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
>   #include <drm/drm_print.h>
>   
>   #include "drm_crtc_internal.h"
> @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>   }
>   EXPORT_SYMBOL(drm_send_event);
>   
> +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> +{
> +	const char *units[] = {"", " KiB", " MiB"};
> +	unsigned u;
> +
> +	for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> +		if (sz < SZ_1K)
> +			break;
> +		sz = div_u64(sz, SZ_1K);
> +	}
> +
> +	drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> +}
> +
> +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> +{
> +	struct drm_gem_object *obj;
> +	struct {
> +		size_t shared;
> +		size_t private;
> +		size_t resident;
> +		size_t purgeable;
> +		size_t active;
> +	} size = {0};
> +	bool has_status = false;
> +	int id;
> +
> +	spin_lock(&file->table_lock);
> +	idr_for_each_entry (&file->object_idr, obj, id) {
> +		enum drm_gem_object_status s = 0;
> +
> +		if (obj->funcs && obj->funcs->status) {
> +			s = obj->funcs->status(obj);
> +			has_status = true;
> +		}
> +
> +		if (obj->handle_count > 1) {
> +			size.shared += obj->size;
> +		} else {
> +			size.private += obj->size;
> +		}
> +
> +		if (s & DRM_GEM_OBJECT_RESIDENT) {
> +			size.resident += obj->size;
> +		} else {
> +			/* If already purged or not yet backed by pages, don't
> +			 * count it as purgeable:
> +			 */
> +			s &= ~DRM_GEM_OBJECT_PURGEABLE;

Side question - why couldn't resident buffers be purgeable? Did you mean 
for the if branch check to be active here? But then it wouldn't make 
sense for a driver to report active _and_ purgeable..

> +		}
> +
> +		if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> +			size.active += obj->size;
> +
> +			/* If still active, don't count as purgeable: */
> +			s &= ~DRM_GEM_OBJECT_PURGEABLE;

Another side question - I guess this tidies a race in reporting? If so 
not sure it matters given the stats are all rather approximate.

> +		}
> +
> +		if (s & DRM_GEM_OBJECT_PURGEABLE)
> +			size.purgeable += obj->size;
> +	}

One concern I have here is that it is all based on obj->size. That is, 
there is no provision for drivers to implement page level granularity. 
So correct reporting in use cases such as VM BIND in the future wouldn't 
work unless there was a driver hook to get almost all of the info above. At
which point common code is just a loop. TBF I don't know if any drivers
do sub obj->size backing store granularity today, but I think it is
something to be sure of before proceeding.

Second concern is what I touched upon in the first reply block - if the 
common code blindly loops over all objects then on discrete GPUs it 
seems we get an 'aggregate' value here which is not what I think we 
want. We rather want to have the ability for drivers to list stats per 
individual memory region.

> +	spin_unlock(&file->table_lock);
> +
> +	print_size(p, "drm-shared-memory", size.shared);
> +	print_size(p, "drm-private-memory", size.private);
> +	print_size(p, "drm-active-memory", size.active);
> +
> +	if (has_status) {
> +		print_size(p, "drm-resident-memory", size.resident);
> +		print_size(p, "drm-purgeable-memory", size.purgeable);
> +	}
> +}
> +
>   /**
>    * drm_fop_show_fdinfo - helper for drm file fops
>    * @seq_file: output stream
> @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
>   
>   	if (dev->driver->show_fdinfo)
>   		dev->driver->show_fdinfo(&p, file);
> +
> +	print_memory_stats(&p, file);
>   }
>   EXPORT_SYMBOL(drm_fop_show_fdinfo);
>   
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index dfa995b787e1..e5b40084538f 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -41,6 +41,7 @@
>   struct dma_fence;
>   struct drm_file;
>   struct drm_device;
> +struct drm_printer;
>   struct device;
>   struct file;
>   
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 189fd618ca65..213917bb6b11 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -42,6 +42,14 @@
>   struct iosys_map;
>   struct drm_gem_object;
>   
> +/**
> + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> + */
> +enum drm_gem_object_status {
> +	DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> +	DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> +};
> +
>   /**
>    * struct drm_gem_object_funcs - GEM object functions
>    */
> @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
>   	 */
>   	int (*evict)(struct drm_gem_object *obj);
>   
> +	/**
> +	 * @status:
> +	 *
> +	 * The optional status callback can return additional object state
> +	 * which determines which stats the object is counted against.  The
> +	 * callback is called under table_lock.  Racing against object status
> +	 * change is "harmless", and the callback can expect to not race
> +	 * against object destruction.
> +	 */
> +	enum drm_gem_object_status (*status)(struct drm_gem_object *obj);

Does this needs to be in object funcs and couldn't be consolidated to 
driver level?

Regards,

Tvrtko

> +
>   	/**
>   	 * @vm_ops:
>   	 *

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Switch to fdinfo helper
  2023-04-12 13:51       ` Daniel Vetter
@ 2023-04-12 15:12         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-12 15:12 UTC (permalink / raw)
  To: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, intel-gfx,
	open list


On 12/04/2023 14:51, Daniel Vetter wrote:
> On Wed, Apr 12, 2023 at 01:32:43PM +0100, Tvrtko Ursulin wrote:
>>
>> On 11/04/2023 23:56, Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>> ---
>>>    drivers/gpu/drm/i915/i915_driver.c     |  3 ++-
>>>    drivers/gpu/drm/i915/i915_drm_client.c | 18 +++++-------------
>>>    drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
>>>    3 files changed, 8 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>>> index db7a86def7e2..37eacaa3064b 100644
>>> --- a/drivers/gpu/drm/i915/i915_driver.c
>>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>>> @@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
>>>    	.compat_ioctl = i915_ioc32_compat_ioctl,
>>>    	.llseek = noop_llseek,
>>>    #ifdef CONFIG_PROC_FS
>>> -	.show_fdinfo = i915_drm_client_fdinfo,
>>> +	.show_fdinfo = drm_fop_show_fdinfo,
>>>    #endif
>>>    };
>>> @@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
>>>    	.open = i915_driver_open,
>>>    	.lastclose = i915_driver_lastclose,
>>>    	.postclose = i915_driver_postclose,
>>> +	.show_fdinfo = i915_drm_client_fdinfo,
>>>    	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>>>    	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
>>> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
>>> index b09d1d386574..4a77e5e47f79 100644
>>> --- a/drivers/gpu/drm/i915/i915_drm_client.c
>>> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
>>> @@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
>>>    }
>>>    static void
>>> -show_client_class(struct seq_file *m,
>>> +show_client_class(struct drm_printer *p,
>>>    		  struct i915_drm_client *client,
>>>    		  unsigned int class)
>>>    {
>>> @@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
>>>    	rcu_read_unlock();
>>>    	if (capacity)
>>> -		seq_printf(m, "drm-engine-%s:\t%llu ns\n",
>>> +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
>>>    			   uabi_class_names[class], total);
>>>    	if (capacity > 1)
>>> -		seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
>>> +		drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
>>>    			   uabi_class_names[class],
>>>    			   capacity);
>>>    }
>>> -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
>>> +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
>>>    {
>>> -	struct drm_file *file = f->private_data;
>>>    	struct drm_i915_file_private *file_priv = file->driver_priv;
>>>    	struct drm_i915_private *i915 = file_priv->dev_priv;
>>>    	struct i915_drm_client *client = file_priv->client;
>>> -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
>>>    	unsigned int i;
>>>    	/*
>>> @@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
>>>    	 * ******************************************************************
>>>    	 */
>>> -	seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
>>> -	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
>>> -		   pci_domain_nr(pdev->bus), pdev->bus->number,
>>> -		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
>>> -	seq_printf(m, "drm-client-id:\t%u\n", client->id);
>>
>> As mentioned in my reply to the cover letter, I think the i915
>> implementation is the right one. At least the semantics of it.
>>
>> Granted it is a super set of the minimum required as documented by
>> drm-usage-stats.rst - not only 1:1 to current instances of struct file, but
>> also avoids immediate id recycling.
>>
>> Former could perhaps be achieved with a simple pointer hash, but latter
>> helps userspace detect when a client has exited and id re-allocated to a new
>> client within a single scanning period.
>>
>> Without this I don't think userspace can implement a fail safe method of
>> detecting which clients are new ones and so wouldn't be able to track
>> history correctly.
>>
>> I think we should rather extend the documented contract to include the
>> cyclical property than settle for a weaker common implementation.
> 
> atomic64_t never wraps, so you don't have any recycling issues?

Okay yes, with 64 bits there aren't any practical recycling issues.

> The other piece and imo much more important is that I really don't want
> the i915_drm_client design to spread, it conceptually makes no sense.
> drm_file is the uapi object, once that's gone userspace will never be able
> to look at anything, having a separate free-standing object that's
> essentially always dead is backwards.
> 
> I went a bit more in-depth in a different thread on scheduler fd_info
> stats, but essentially fd_info needs to pull stats, you should never push
> stats towards the drm_file (or i915_drm_client). That avoids all the
> refcounting issues and rcu needs and everything else like that.
> 
> Maybe you want to jump into that thread:
> https://lore.kernel.org/dri-devel/CAKMK7uE=m3sSTQrLCeDg0vG8viODOecUsYDK1oC++f5pQi0e8Q@mail.gmail.com/
> 
> So retiring i915_drm_client infrastructure is the right direction I think.

Hmmm.. it is a _mostly_ pull model that we have in i915 ie. data is 
pulled on fdinfo queries.

_Mostly_ because it cannot be fully pull based when you look at some 
internal flows. We have to save some data at runtime at times not driven 
by the fdinfo queries.

For instance context close needs to record the GPU utilisation against 
the client so that it is not lost. Also in the execlists backend we must 
transfer the hardware tracked runtime into the software state when hw 
contexts are switched out.
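
(Roughly this kind of flow, with all names invented for illustration:

        /* execlists backend, on hw context switch-out */
        static void example_ctx_sched_out(struct example_ctx *ce)
        {
                u64 now = example_read_hw_runtime(ce);          /* hypothetical */

                /* fold the hw-tracked delta into the sw total which
                 * later fdinfo queries read
                 */
                ce->total_runtime_ns += now - ce->runtime_at_sched_in;
        }

i.e. some recording has to happen at event time rather than at query time.)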

The fact i915_drm_client is detached from file_priv is a consequence of 
the fact i915 GEM contexts can outlive drm_file, and that when such 
contexts are closed, we need to record their runtimes.

So I think there are three options: how it is now, fully krefed 
drm_file, or prohibit persistent contexts. The last one I don't think we
can do due to ABI, and the second felt heavy handed, so I chose the
lightweight i915_drm_client option.

Maybe there is a fourth option of somehow detecting during context 
destruction that drm_file is gone and skipping the runtime recording, but
avoiding races and all did not make me want to entertain it much. Is 
this actually what you are proposing?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Intel-gfx] [PATCH v3 4/7] drm/i915: Switch to fdinfo helper
@ 2023-04-12 15:12         ` Tvrtko Ursulin
  0 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-12 15:12 UTC (permalink / raw)
  To: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, intel-gfx,
	open list


On 12/04/2023 14:51, Daniel Vetter wrote:
> On Wed, Apr 12, 2023 at 01:32:43PM +0100, Tvrtko Ursulin wrote:
>>
>> On 11/04/2023 23:56, Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>> ---
>>>    drivers/gpu/drm/i915/i915_driver.c     |  3 ++-
>>>    drivers/gpu/drm/i915/i915_drm_client.c | 18 +++++-------------
>>>    drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
>>>    3 files changed, 8 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>>> index db7a86def7e2..37eacaa3064b 100644
>>> --- a/drivers/gpu/drm/i915/i915_driver.c
>>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>>> @@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
>>>    	.compat_ioctl = i915_ioc32_compat_ioctl,
>>>    	.llseek = noop_llseek,
>>>    #ifdef CONFIG_PROC_FS
>>> -	.show_fdinfo = i915_drm_client_fdinfo,
>>> +	.show_fdinfo = drm_fop_show_fdinfo,
>>>    #endif
>>>    };
>>> @@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
>>>    	.open = i915_driver_open,
>>>    	.lastclose = i915_driver_lastclose,
>>>    	.postclose = i915_driver_postclose,
>>> +	.show_fdinfo = i915_drm_client_fdinfo,
>>>    	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>>>    	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
>>> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
>>> index b09d1d386574..4a77e5e47f79 100644
>>> --- a/drivers/gpu/drm/i915/i915_drm_client.c
>>> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
>>> @@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
>>>    }
>>>    static void
>>> -show_client_class(struct seq_file *m,
>>> +show_client_class(struct drm_printer *p,
>>>    		  struct i915_drm_client *client,
>>>    		  unsigned int class)
>>>    {
>>> @@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
>>>    	rcu_read_unlock();
>>>    	if (capacity)
>>> -		seq_printf(m, "drm-engine-%s:\t%llu ns\n",
>>> +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
>>>    			   uabi_class_names[class], total);
>>>    	if (capacity > 1)
>>> -		seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
>>> +		drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
>>>    			   uabi_class_names[class],
>>>    			   capacity);
>>>    }
>>> -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
>>> +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
>>>    {
>>> -	struct drm_file *file = f->private_data;
>>>    	struct drm_i915_file_private *file_priv = file->driver_priv;
>>>    	struct drm_i915_private *i915 = file_priv->dev_priv;
>>>    	struct i915_drm_client *client = file_priv->client;
>>> -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
>>>    	unsigned int i;
>>>    	/*
>>> @@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
>>>    	 * ******************************************************************
>>>    	 */
>>> -	seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
>>> -	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
>>> -		   pci_domain_nr(pdev->bus), pdev->bus->number,
>>> -		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
>>> -	seq_printf(m, "drm-client-id:\t%u\n", client->id);
>>
>> As mentioned in my reply to the cover letter, I think the i915
>> implementation is the right one. At least the semantics of it.
>>
>> Granted it is a super set of the minimum required as documented by
>> drm-usage-stats.rst - not only 1:1 to current instances of struct file, but
>> also avoids immediate id recycling.
>>
>> Former could perhaps be achieved with a simple pointer hash, but latter
>> helps userspace detect when a client has exited and id re-allocated to a new
>> client within a single scanning period.
>>
>> Without this I don't think userspace can implement a fail safe method of
>> detecting which clients are new ones and so wouldn't be able to track
>> history correctly.
>>
>> I think we should rather extend the documented contract to include the
>> cyclical property than settle for a weaker common implementation.
> 
> atomic64_t never wraps, so you don't have any recycling issues?

Okay yes, with 64 bits there aren't any practical recycling issues.

> The other piece and imo much more important is that I really don't want
> the i915_drm_client design to spread, it conceptually makes no sense.
> drm_file is the uapi object, once that's gone userspace will never be able
> to look at anything, having a separate free-standing object that's
> essentially always dead is backwards.
> 
> I went a bit more in-depth in a different thread on scheduler fd_info
> stats, but essentially fd_info needs to pull stats, you should never push
> stats towards the drm_file (or i915_drm_client). That avoids all the
> refcounting issues and rcu needs and everything else like that.
> 
> Maybe you want to jump into that thread:
> https://lore.kernel.org/dri-devel/CAKMK7uE=m3sSTQrLCeDg0vG8viODOecUsYDK1oC++f5pQi0e8Q@mail.gmail.com/
> 
> So retiring i915_drm_client infrastructure is the right direction I think.

Hmmm.. it is a _mostly_ pull model that we have in i915 ie. data is 
pulled on fdinfo queries.

_Mostly_ because it cannot be fully pull based when you look at some 
internal flows. We have to save some data at runtime at times not driven 
by the fdinfo queries.

For instance context close needs to record the GPU utilisation against 
the client so that it is not lost. Also in the execlists backend we must 
transfer the hardware tracked runtime into the software state when hw 
contexts are switched out.

The fact i915_drm_client is detached from file_priv is a consequence of 
the fact i915 GEM contexts can outlive drm_file, and that when such 
contexts are closed, we need to record their runtimes.

So I think there are three options: how it is now, fully krefed 
drm_file, or prohibit persistent contexts. The last one I don't think we
can do due to ABI, and the second felt heavy handed, so I chose the
lightweight i915_drm_client option.

Maybe there is a fourth option of somehow detecting during context 
destruction that drm_file is gone and skipping the runtime recording, but
avoiding races and all did not make me want to entertain it much. Is 
this actually what you are proposing?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-12 14:42     ` Tvrtko Ursulin
@ 2023-04-12 17:59       ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-12 17:59 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, open list:DOCUMENTATION,
	open list

On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 11/04/2023 23:56, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Add support to dump GEM stats to fdinfo.
> >
> > v2: Fix typos, change size units to match docs, use div_u64
> > v3: Do it in core
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > ---
> >   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >   include/drm/drm_file.h                |  1 +
> >   include/drm/drm_gem.h                 | 19 +++++++
> >   4 files changed, 117 insertions(+)
> >
> > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > index b46327356e80..b5e7802532ed 100644
> > --- a/Documentation/gpu/drm-usage-stats.rst
> > +++ b/Documentation/gpu/drm-usage-stats.rst
> > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >   indicating kibi- or mebi-bytes.
> >
> > +- drm-shared-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are shared with another file (ie. have more
> > +than a single handle).
> > +
> > +- drm-private-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are not shared with another file.
> > +
> > +- drm-resident-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are resident in system memory.
>
> I think this naming maybe does not work best with the existing
> drm-memory-<region> keys.

Actually, it was very deliberate not to conflict with the existing
drm-memory-<region> keys ;-)

I would have preferred drm-memory-{active,resident,...} but it
could be mis-parsed by existing userspace so my hands were a bit tied.
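
(For example a parser that already does something along the lines of

        if (!strncmp(key, "drm-memory-", 11))
                region = key + 11;      /* rest of the key is the region name */

would happily report a bogus region called "resident" if the new key were
spelled drm-memory-resident, hence the drm-<category>-memory spelling.)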

> How about introduce the concept of a memory region from the start and
> use naming similar like we do for engines?
>
> drm-memory-$CATEGORY-$REGION: ...
>
> Then we document a bunch of categories and their semantics, for instance:
>
> 'size' - All reachable objects
> 'shared' - Subset of 'size' with handle_count > 1
> 'resident' - Objects with backing store
> 'active' - Objects in use, subset of resident
> 'purgeable' - Or inactive? Subset of resident.
>
> We keep the same semantics as with process memory accounting (if I got
> it right) which could be desirable for a simplified mental model.
>
> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> correctly captured this in the first round it should be equivalent to
> 'resident' above. In any case we can document which category, if any, it
> is equal to, and that at most one of the two must be output.)
>
> Region names we at most partially standardize. Like we could say
> 'system' is to be used where backing store is system RAM and others are
> driver defined.
>
> Then discrete GPUs could emit N sets of key-values, one for each memory
> region they support.
>
> I think this all also works for objects which can be migrated between
> memory regions. 'Size' accounts them against all regions while for
> 'resident' they only appear in the region of their current placement, etc.

I'm not too sure how to reconcile different memory regions with this,
since drm core doesn't really know about the driver's memory regions.
Perhaps we can go back to this being a helper and drivers with vram
just don't use the helper?  Or??

BR,
-R

> Userspace can aggregate if it wishes to do so but kernel side should not.
>
> > +
> > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are purgeable.
> > +
> > +- drm-active-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are active on one or more rings.
> > +
> >   - drm-cycles-<str> <uint>
> >
> >   Engine identifier string must be the same as the one specified in the
> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > index 37dfaa6be560..46fdd843bb3a 100644
> > --- a/drivers/gpu/drm/drm_file.c
> > +++ b/drivers/gpu/drm/drm_file.c
> > @@ -42,6 +42,7 @@
> >   #include <drm/drm_client.h>
> >   #include <drm/drm_drv.h>
> >   #include <drm/drm_file.h>
> > +#include <drm/drm_gem.h>
> >   #include <drm/drm_print.h>
> >
> >   #include "drm_crtc_internal.h"
> > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> >   }
> >   EXPORT_SYMBOL(drm_send_event);
> >
> > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > +{
> > +     const char *units[] = {"", " KiB", " MiB"};
> > +     unsigned u;
> > +
> > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > +             if (sz < SZ_1K)
> > +                     break;
> > +             sz = div_u64(sz, SZ_1K);
> > +     }
> > +
> > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > +}
> > +
> > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > +{
> > +     struct drm_gem_object *obj;
> > +     struct {
> > +             size_t shared;
> > +             size_t private;
> > +             size_t resident;
> > +             size_t purgeable;
> > +             size_t active;
> > +     } size = {0};
> > +     bool has_status = false;
> > +     int id;
> > +
> > +     spin_lock(&file->table_lock);
> > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > +             enum drm_gem_object_status s = 0;
> > +
> > +             if (obj->funcs && obj->funcs->status) {
> > +                     s = obj->funcs->status(obj);
> > +                     has_status = true;
> > +             }
> > +
> > +             if (obj->handle_count > 1) {
> > +                     size.shared += obj->size;
> > +             } else {
> > +                     size.private += obj->size;
> > +             }
> > +
> > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > +                     size.resident += obj->size;
> > +             } else {
> > +                     /* If already purged or not yet backed by pages, don't
> > +                      * count it as purgeable:
> > +                      */
> > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>
> Side question - why couldn't resident buffers be purgeable? Did you mean
> for the if branch check to be active here? But then it wouldn't make
> sense for a driver to report active _and_ purgeable..
>
> > +             }
> > +
> > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > +                     size.active += obj->size;
> > +
> > +                     /* If still active, don't count as purgeable: */
> > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>
> Another side question - I guess this tidies a race in reporting? If so
> not sure it matters given the stats are all rather approximate.
>
> > +             }
> > +
> > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > +                     size.purgeable += obj->size;
> > +     }
>
> One concern I have here is that it is all based on obj->size. That is,
> there is no provision for drivers to implement page level granularity.
> So correct reporting in use cases such as VM BIND in the future wouldn't
> work unless there was a driver hook to get almost all of the info above. At
> which point common code is just a loop. TBF I don't know if any drivers
> do sub obj->size backing store granularity today, but I think it is
> something to be sure of before proceeding.
>
> Second concern is what I touched upon in the first reply block - if the
> common code blindly loops over all objects then on discrete GPUs it
> seems we get an 'aggregate' value here which is not what I think we
> want. We rather want to have the ability for drivers to list stats per
> individual memory region.
>
> > +     spin_unlock(&file->table_lock);
> > +
> > +     print_size(p, "drm-shared-memory", size.shared);
> > +     print_size(p, "drm-private-memory", size.private);
> > +     print_size(p, "drm-active-memory", size.active);
> > +
> > +     if (has_status) {
> > +             print_size(p, "drm-resident-memory", size.resident);
> > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > +     }
> > +}
> > +
> >   /**
> >    * drm_fop_show_fdinfo - helper for drm file fops
> >    * @seq_file: output stream
> > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >
> >       if (dev->driver->show_fdinfo)
> >               dev->driver->show_fdinfo(&p, file);
> > +
> > +     print_memory_stats(&p, file);
> >   }
> >   EXPORT_SYMBOL(drm_fop_show_fdinfo);
> >
> > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > index dfa995b787e1..e5b40084538f 100644
> > --- a/include/drm/drm_file.h
> > +++ b/include/drm/drm_file.h
> > @@ -41,6 +41,7 @@
> >   struct dma_fence;
> >   struct drm_file;
> >   struct drm_device;
> > +struct drm_printer;
> >   struct device;
> >   struct file;
> >
> > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > index 189fd618ca65..213917bb6b11 100644
> > --- a/include/drm/drm_gem.h
> > +++ b/include/drm/drm_gem.h
> > @@ -42,6 +42,14 @@
> >   struct iosys_map;
> >   struct drm_gem_object;
> >
> > +/**
> > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > + */
> > +enum drm_gem_object_status {
> > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > +};
> > +
> >   /**
> >    * struct drm_gem_object_funcs - GEM object functions
> >    */
> > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> >        */
> >       int (*evict)(struct drm_gem_object *obj);
> >
> > +     /**
> > +      * @status:
> > +      *
> > +      * The optional status callback can return additional object state
> > +      * which determines which stats the object is counted against.  The
> > +      * callback is called under table_lock.  Racing against object status
> > +      * change is "harmless", and the callback can expect to not race
> > +      * against object destruction.
> > +      */
> > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
>
> Does this needs to be in object funcs and couldn't be consolidated to
> driver level?
>
> Regards,
>
> Tvrtko
>
> > +
> >       /**
> >        * @vm_ops:
> >        *

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
@ 2023-04-12 17:59       ` Rob Clark
  0 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-12 17:59 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, Thomas Zimmermann, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, freedreno

On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 11/04/2023 23:56, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Add support to dump GEM stats to fdinfo.
> >
> > v2: Fix typos, change size units to match docs, use div_u64
> > v3: Do it in core
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > ---
> >   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >   include/drm/drm_file.h                |  1 +
> >   include/drm/drm_gem.h                 | 19 +++++++
> >   4 files changed, 117 insertions(+)
> >
> > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > index b46327356e80..b5e7802532ed 100644
> > --- a/Documentation/gpu/drm-usage-stats.rst
> > +++ b/Documentation/gpu/drm-usage-stats.rst
> > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >   indicating kibi- or mebi-bytes.
> >
> > +- drm-shared-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are shared with another file (ie. have more
> > +than a single handle).
> > +
> > +- drm-private-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are not shared with another file.
> > +
> > +- drm-resident-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are resident in system memory.
>
> I think this naming maybe does not work best with the existing
> drm-memory-<region> keys.

Actually, it was very deliberate not to conflict with the existing
drm-memory-<region> keys ;-)

I would have preferred drm-memory-{active,resident,...} but it
could be mis-parsed by existing userspace so my hands were a bit tied.

> How about introduce the concept of a memory region from the start and
> use naming similar like we do for engines?
>
> drm-memory-$CATEGORY-$REGION: ...
>
> Then we document a bunch of categories and their semantics, for instance:
>
> 'size' - All reachable objects
> 'shared' - Subset of 'size' with handle_count > 1
> 'resident' - Objects with backing store
> 'active' - Objects in use, subset of resident
> 'purgeable' - Or inactive? Subset of resident.
>
> We keep the same semantics as with process memory accounting (if I got
> it right) which could be desirable for a simplified mental model.
>
> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> correctly captured this in the first round it should be equivalent to
> 'resident' above. In any case we can document which category, if any, it
> is equal to, and that at most one of the two must be output.)
>
> Region names we at most partially standardize. Like we could say
> 'system' is to be used where backing store is system RAM and others are
> driver defined.
>
> Then discrete GPUs could emit N sets of key-values, one for each memory
> region they support.
>
> I think this all also works for objects which can be migrated between
> memory regions. 'Size' accounts them against all regions while for
> 'resident' they only appear in the region of their current placement, etc.

I'm not too sure how to reconcile different memory regions with this,
since drm core doesn't really know about the driver's memory regions.
Perhaps we can go back to this being a helper and drivers with vram
just don't use the helper?  Or??

BR,
-R

> Userspace can aggregate if it wishes to do so but kernel side should not.
>
> > +
> > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are purgeable.
> > +
> > +- drm-active-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are active on one or more rings.
> > +
> >   - drm-cycles-<str> <uint>
> >
> >   Engine identifier string must be the same as the one specified in the
> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > index 37dfaa6be560..46fdd843bb3a 100644
> > --- a/drivers/gpu/drm/drm_file.c
> > +++ b/drivers/gpu/drm/drm_file.c
> > @@ -42,6 +42,7 @@
> >   #include <drm/drm_client.h>
> >   #include <drm/drm_drv.h>
> >   #include <drm/drm_file.h>
> > +#include <drm/drm_gem.h>
> >   #include <drm/drm_print.h>
> >
> >   #include "drm_crtc_internal.h"
> > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> >   }
> >   EXPORT_SYMBOL(drm_send_event);
> >
> > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > +{
> > +     const char *units[] = {"", " KiB", " MiB"};
> > +     unsigned u;
> > +
> > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > +             if (sz < SZ_1K)
> > +                     break;
> > +             sz = div_u64(sz, SZ_1K);
> > +     }
> > +
> > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > +}
> > +
> > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > +{
> > +     struct drm_gem_object *obj;
> > +     struct {
> > +             size_t shared;
> > +             size_t private;
> > +             size_t resident;
> > +             size_t purgeable;
> > +             size_t active;
> > +     } size = {0};
> > +     bool has_status = false;
> > +     int id;
> > +
> > +     spin_lock(&file->table_lock);
> > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > +             enum drm_gem_object_status s = 0;
> > +
> > +             if (obj->funcs && obj->funcs->status) {
> > +                     s = obj->funcs->status(obj);
> > +                     has_status = true;
> > +             }
> > +
> > +             if (obj->handle_count > 1) {
> > +                     size.shared += obj->size;
> > +             } else {
> > +                     size.private += obj->size;
> > +             }
> > +
> > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > +                     size.resident += obj->size;
> > +             } else {
> > +                     /* If already purged or not yet backed by pages, don't
> > +                      * count it as purgeable:
> > +                      */
> > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>
> Side question - why couldn't resident buffers be purgeable? Did you mean
> for the if branch check to be active here? But then it wouldn't make
> sense for a driver to report active _and_ purgeable..
>
> > +             }
> > +
> > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > +                     size.active += obj->size;
> > +
> > +                     /* If still active, don't count as purgeable: */
> > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>
> Another side question - I guess this tidies a race in reporting? If so
> not sure it matters given the stats are all rather approximate.
>
> > +             }
> > +
> > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > +                     size.purgeable += obj->size;
> > +     }
>
> One concern I have here is that it is all based on obj->size. That is,
> there is no provision for drivers to implement page level granularity.
> So correct reporting in use cases such as VM BIND in the future wouldn't
> work unless it was a driver hook to get almost all of the info above. At
> which point common code is just a loop. TBF I don't know if any drivers
> do sub obj->size backing store granularity today, but I think it is
> something to be sure of before proceeding.
>
> Second concern is what I touched upon in the first reply block - if the
> common code blindly loops over all objects then on discrete GPUs it
> seems we get an 'aggregate' value here which is not what I think we
> want. We rather want to have the ability for drivers to list stats per
> individual memory region.
>
> > +     spin_unlock(&file->table_lock);
> > +
> > +     print_size(p, "drm-shared-memory", size.shared);
> > +     print_size(p, "drm-private-memory", size.private);
> > +     print_size(p, "drm-active-memory", size.active);
> > +
> > +     if (has_status) {
> > +             print_size(p, "drm-resident-memory", size.resident);
> > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > +     }
> > +}
> > +
> >   /**
> >    * drm_fop_show_fdinfo - helper for drm file fops
> >    * @seq_file: output stream
> > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >
> >       if (dev->driver->show_fdinfo)
> >               dev->driver->show_fdinfo(&p, file);
> > +
> > +     print_memory_stats(&p, file);
> >   }
> >   EXPORT_SYMBOL(drm_fop_show_fdinfo);
> >
> > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > index dfa995b787e1..e5b40084538f 100644
> > --- a/include/drm/drm_file.h
> > +++ b/include/drm/drm_file.h
> > @@ -41,6 +41,7 @@
> >   struct dma_fence;
> >   struct drm_file;
> >   struct drm_device;
> > +struct drm_printer;
> >   struct device;
> >   struct file;
> >
> > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > index 189fd618ca65..213917bb6b11 100644
> > --- a/include/drm/drm_gem.h
> > +++ b/include/drm/drm_gem.h
> > @@ -42,6 +42,14 @@
> >   struct iosys_map;
> >   struct drm_gem_object;
> >
> > +/**
> > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > + */
> > +enum drm_gem_object_status {
> > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > +};
> > +
> >   /**
> >    * struct drm_gem_object_funcs - GEM object functions
> >    */
> > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> >        */
> >       int (*evict)(struct drm_gem_object *obj);
> >
> > +     /**
> > +      * @status:
> > +      *
> > +      * The optional status callback can return additional object state
> > +      * which determines which stats the object is counted against.  The
> > +      * callback is called under table_lock.  Racing against object status
> > +      * change is "harmless", and the callback can expect to not race
> > +      * against object destruction.
> > +      */
> > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
>
> Does this needs to be in object funcs and couldn't be consolidated to
> driver level?
>
> Regards,
>
> Tvrtko
>
> > +
> >       /**
> >        * @vm_ops:
> >        *

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Switch to fdinfo helper
  2023-04-12 15:12         ` [Intel-gfx] " Tvrtko Ursulin
@ 2023-04-12 18:13           ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12 18:13 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, intel-gfx,
	open list

On Wed, Apr 12, 2023 at 04:12:41PM +0100, Tvrtko Ursulin wrote:
> 
> On 12/04/2023 14:51, Daniel Vetter wrote:
> > On Wed, Apr 12, 2023 at 01:32:43PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 11/04/2023 23:56, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > > 
> > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > ---
> > > >    drivers/gpu/drm/i915/i915_driver.c     |  3 ++-
> > > >    drivers/gpu/drm/i915/i915_drm_client.c | 18 +++++-------------
> > > >    drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
> > > >    3 files changed, 8 insertions(+), 15 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > > > index db7a86def7e2..37eacaa3064b 100644
> > > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > > @@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
> > > >    	.compat_ioctl = i915_ioc32_compat_ioctl,
> > > >    	.llseek = noop_llseek,
> > > >    #ifdef CONFIG_PROC_FS
> > > > -	.show_fdinfo = i915_drm_client_fdinfo,
> > > > +	.show_fdinfo = drm_fop_show_fdinfo,
> > > >    #endif
> > > >    };
> > > > @@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
> > > >    	.open = i915_driver_open,
> > > >    	.lastclose = i915_driver_lastclose,
> > > >    	.postclose = i915_driver_postclose,
> > > > +	.show_fdinfo = i915_drm_client_fdinfo,
> > > >    	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> > > >    	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > > > diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
> > > > index b09d1d386574..4a77e5e47f79 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drm_client.c
> > > > +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> > > > @@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
> > > >    }
> > > >    static void
> > > > -show_client_class(struct seq_file *m,
> > > > +show_client_class(struct drm_printer *p,
> > > >    		  struct i915_drm_client *client,
> > > >    		  unsigned int class)
> > > >    {
> > > > @@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
> > > >    	rcu_read_unlock();
> > > >    	if (capacity)
> > > > -		seq_printf(m, "drm-engine-%s:\t%llu ns\n",
> > > > +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
> > > >    			   uabi_class_names[class], total);
> > > >    	if (capacity > 1)
> > > > -		seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
> > > > +		drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
> > > >    			   uabi_class_names[class],
> > > >    			   capacity);
> > > >    }
> > > > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> > > > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
> > > >    {
> > > > -	struct drm_file *file = f->private_data;
> > > >    	struct drm_i915_file_private *file_priv = file->driver_priv;
> > > >    	struct drm_i915_private *i915 = file_priv->dev_priv;
> > > >    	struct i915_drm_client *client = file_priv->client;
> > > > -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> > > >    	unsigned int i;
> > > >    	/*
> > > > @@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> > > >    	 * ******************************************************************
> > > >    	 */
> > > > -	seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
> > > > -	seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
> > > > -		   pci_domain_nr(pdev->bus), pdev->bus->number,
> > > > -		   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> > > > -	seq_printf(m, "drm-client-id:\t%u\n", client->id);
> > > 
> > > As mentioned in my reply to the cover letter, I think the i915
> > > implementation is the right one. At least the semantics of it.
> > > 
> > > Granted it is a super set of the minimum required as documented by
> > > drm-usage-stats.rst - not only 1:1 to current instances of struct file, but
> > > also avoids immediate id recycling.
> > > 
> > > The former could perhaps be achieved with a simple pointer hash, but the
> > > latter helps userspace detect when a client has exited and its id has been
> > > re-allocated to a new client within a single scanning period.
> > > 
> > > Without this I don't think userspace can implement a fail safe method of
> > > detecting which clients are new ones and so wouldn't be able to track
> > > history correctly.
> > > 
> > > I think we should rather extend the documented contract to include the
> > > cyclical property than settle for a weaker common implementation.
> > 
> > atomic64_t never wraps, so you don't have any recycling issues?
> 
> Okay yes, with 64 bits there aren't any practical recycling issues.
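
(Ie. something as simple as the below would be enough; purely illustrative,
the client_id field is made up:)

  static atomic64_t client_id_gen = ATOMIC64_INIT(0);

  /* one id per drm_file open; a 64-bit counter won't wrap in practice,
   * so ids are never recycled
   */
  file_priv->client_id = atomic64_inc_return(&client_id_gen);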
> 
> > The other piece and imo much more important is that I really don't want
> > the i915_drm_client design to spread, it conceptually makes no sense.
> > drm_file is the uapi object, once that's gone userspace will never be able
> > to look at anything, having a separate free-standing object that's
> > essentially always dead is backwards.
> > 
> > I went a bit more in-depth in a different thread on scheduler fd_info
> > stats, but essentially fd_info needs to pull stats, you should never push
> > stats towards the drm_file (or i915_drm_client). That avoids all the
> > refcounting issues and rcu needs and everything else like that.
> > 
> > Maybe you want to jump into that thread:
> > https://lore.kernel.org/dri-devel/CAKMK7uE=m3sSTQrLCeDg0vG8viODOecUsYDK1oC++f5pQi0e8Q@mail.gmail.com/
> > 
> > So retiring i915_drm_client infrastructure is the right direction I think.
> 
> Hmmm.. it is a _mostly_ pull model that we have in i915 ie. data is pulled
> on fdinfo queries.
> 
> _Mostly_ because it cannot be fully pull based when you look at some
> internal flows. We have to save some data at runtime at times not driven by
> the fdinfo queries.
> 
> For instance context close needs to record the GPU utilisation against the
> client so that it is not lost. Also in the execlists backend we must
> transfer the hardware tracked runtime into the software state when hw
> contexts are switched out.
> 
> The fact i915_drm_client is detached from file_priv is a consequence of the
> fact i915 GEM contexts can outlive drm_file, and that when such contexts are
> closed, we need to record their runtimes.
> 
> So I think there are three options: how it is now, fully krefed drm_file, or
> prohibit persistent contexts. The last one I don't think we can do due to
> ABI, and the 2nd felt heavy handed, so I chose a lightweight i915_drm_client
> option.
> 
> Maybe there is a fourth option of somehow detecting during context
> destruction that drm_file is gone and skip the runtime recording, but
> avoiding races and all did not make me want to entertain it much. Is this
> actually what you are proposing?

Hm right, persistent context, the annoying thing I missed again. From a
quick look amdgpu gets away with that by shooting all contexts
synchronously on drmfd close, which is the thing i915 can't do because of uapi.

The other part of the trick is to ... not care :-) See
amdgpu_ctx_fence_time(). I guess what would work a bit better is a
drm_file context list under a spinlock (which would need to be per
drm_device probably), which is cleaned up both when the final context ref
drops and when the drmfd closes, and you push back the final tally just
under that spinlock. But that's not how drm_sched_entity works right now,
that disappears before the final in-flight jobs have finished.

But yeah unless we just shrug and accept an accounting hole some minimal
push-back (at least while the drm_file is still alive) is needed to add
back the final tally when a context is destroyed.
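
Very roughly something like this (nothing below exists as-is, all the
struct/field/lock names are made up, and it glosses over the
drm_sched_entity lifetime issue mentioned above):

  struct ctx_stats {
          struct list_head node;  /* on the drm_file's list, under dev lock */
          struct drm_file *fpriv; /* cleared when the drmfd closes */
          u64 runtime_ns;
  };

  static void ctx_stats_release(struct drm_device *dev, struct ctx_stats *ctx)
  {
          spin_lock(&dev->clients_lock);  /* hypothetical per-device lock */
          if (ctx->fpriv) {
                  /* push the final tally back while the drm_file is alive */
                  ctx->fpriv->closed_runtime_ns += ctx->runtime_ns;
                  list_del(&ctx->node);
          }
          spin_unlock(&dev->clients_lock);
          kfree(ctx);
  }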

Anyway I think it'd be good if you can follow that sched fd_info thread a
bit, to make sure it's not too silly :-) i915 won't use it, but xe will
eventually.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-12 17:59       ` Rob Clark
@ 2023-04-12 18:17         ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12 18:17 UTC (permalink / raw)
  To: Rob Clark
  Cc: Tvrtko Ursulin, dri-devel, linux-arm-msm, freedreno,
	Boris Brezillon, Christopher Healy, Emil Velikov, Rob Clark,
	David Airlie, Daniel Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, open list:DOCUMENTATION,
	open list

On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > On 11/04/2023 23:56, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > Add support to dump GEM stats to fdinfo.
> > >
> > > v2: Fix typos, change size units to match docs, use div_u64
> > > v3: Do it in core
> > >
> > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > > ---
> > >   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> > >   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> > >   include/drm/drm_file.h                |  1 +
> > >   include/drm/drm_gem.h                 | 19 +++++++
> > >   4 files changed, 117 insertions(+)
> > >
> > > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > > index b46327356e80..b5e7802532ed 100644
> > > --- a/Documentation/gpu/drm-usage-stats.rst
> > > +++ b/Documentation/gpu/drm-usage-stats.rst
> > > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > >   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > >   indicating kibi- or mebi-bytes.
> > >
> > > +- drm-shared-memory: <uint> [KiB|MiB]
> > > +
> > > +The total size of buffers that are shared with another file (ie. have more
> > > +than a single handle).
> > > +
> > > +- drm-private-memory: <uint> [KiB|MiB]
> > > +
> > > +The total size of buffers that are not shared with another file.
> > > +
> > > +- drm-resident-memory: <uint> [KiB|MiB]
> > > +
> > > +The total size of buffers that are resident in system memory.
> >
> > I think this naming maybe does not work best with the existing
> > drm-memory-<region> keys.
> 
> Actually, it was very deliberate not to conflict with the existing
> drm-memory-<region> keys ;-)
> 
> I would have preferred drm-memory-{active,resident,...} but it
> could be mis-parsed by existing userspace so my hands were a bit tied.
> 
> > How about introduce the concept of a memory region from the start and
> > use naming similar like we do for engines?
> >
> > drm-memory-$CATEGORY-$REGION: ...
> >
> > Then we document a bunch of categories and their semantics, for instance:
> >
> > 'size' - All reachable objects
> > 'shared' - Subset of 'size' with handle_count > 1
> > 'resident' - Objects with backing store
> > 'active' - Objects in use, subset of resident
> > 'purgeable' - Or inactive? Subset of resident.
> >
> > We keep the same semantics as with process memory accounting (if I got
> > it right) which could be desirable for a simplified mental model.
> >
> > (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > correctly captured this in the first round it should be equivalent to
> > 'resident' above. In any case we can document which category, if any, the
> > existing key is equal to, and that at most one of the two must be output.)
> >
> > Region names we at most partially standardize. Like we could say
> > 'system' is to be used where backing store is system RAM and others are
> > driver defined.
> >
> > Then discrete GPUs could emit N sets of key-values, one for each memory
> > region they support.
> >
> > I think this all also works for objects which can be migrated between
> > memory regions. 'Size' accounts them against all regions while for
> > 'resident' they only appear in the region of their current placement, etc.
> 
> I'm not too sure how to reconcile different memory regions with this,
> since drm core doesn't really know about the driver's memory regions.
> Perhaps we can go back to this being a helper and drivers with vram
> just don't use the helper?  Or??

I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
all works out reasonably consistently?

And ttm could/should perhaps provide a helper to dump the region specific
version of this. Or we lift the concept of regions out of ttm a bit
higher, that's kinda needed for cgroups eventually anyway I think.
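
Ie. something like the below, with made-up values and region names, where
single-region devices just omit the region suffix:

  drm-resident-memory:	        4096 KiB	<- UMA, no region suffix
  drm-resident-memory-vram:	131072 KiB	<- discrete, one line per region
  drm-resident-memory-system:	  8192 KiB
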
-Daniel

> 
> BR,
> -R
> 
> > Userspace can aggregate if it wishes to do so but kernel side should not.
> >
> > > +
> > > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > > +
> > > +The total size of buffers that are purgeable.
> > > +
> > > +- drm-active-memory: <uint> [KiB|MiB]
> > > +
> > > +The total size of buffers that are active on one or more rings.
> > > +
> > >   - drm-cycles-<str> <uint>
> > >
> > >   Engine identifier string must be the same as the one specified in the
> > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > index 37dfaa6be560..46fdd843bb3a 100644
> > > --- a/drivers/gpu/drm/drm_file.c
> > > +++ b/drivers/gpu/drm/drm_file.c
> > > @@ -42,6 +42,7 @@
> > >   #include <drm/drm_client.h>
> > >   #include <drm/drm_drv.h>
> > >   #include <drm/drm_file.h>
> > > +#include <drm/drm_gem.h>
> > >   #include <drm/drm_print.h>
> > >
> > >   #include "drm_crtc_internal.h"
> > > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> > >   }
> > >   EXPORT_SYMBOL(drm_send_event);
> > >
> > > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > > +{
> > > +     const char *units[] = {"", " KiB", " MiB"};
> > > +     unsigned u;
> > > +
> > > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > > +             if (sz < SZ_1K)
> > > +                     break;
> > > +             sz = div_u64(sz, SZ_1K);
> > > +     }
> > > +
> > > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > > +}
> > > +
> > > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > > +{
> > > +     struct drm_gem_object *obj;
> > > +     struct {
> > > +             size_t shared;
> > > +             size_t private;
> > > +             size_t resident;
> > > +             size_t purgeable;
> > > +             size_t active;
> > > +     } size = {0};
> > > +     bool has_status = false;
> > > +     int id;
> > > +
> > > +     spin_lock(&file->table_lock);
> > > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > > +             enum drm_gem_object_status s = 0;
> > > +
> > > +             if (obj->funcs && obj->funcs->status) {
> > > +                     s = obj->funcs->status(obj);
> > > +                     has_status = true;
> > > +             }
> > > +
> > > +             if (obj->handle_count > 1) {
> > > +                     size.shared += obj->size;
> > > +             } else {
> > > +                     size.private += obj->size;
> > > +             }
> > > +
> > > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > > +                     size.resident += obj->size;
> > > +             } else {
> > > +                     /* If already purged or not yet backed by pages, don't
> > > +                      * count it as purgeable:
> > > +                      */
> > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> >
> > Side question - why couldn't resident buffers be purgeable? Did you mean
> > for the if branch check to be active here? But then it wouldn't make
> > sense for a driver to report active _and_ purgeable..
> >
> > > +             }
> > > +
> > > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > > +                     size.active += obj->size;
> > > +
> > > +                     /* If still active, don't count as purgeable: */
> > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> >
> > Another side question - I guess this tidies a race in reporting? If so
> > not sure it matters given the stats are all rather approximate.
> >
> > > +             }
> > > +
> > > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > > +                     size.purgeable += obj->size;
> > > +     }
> >
> > One concern I have here is that it is all based on obj->size. That is,
> > there is no provision for drivers to implement page level granularity.
> > So correct reporting in use cases such as VM BIND in the future wouldn't
> > work unless it was a driver hook to get almost all of the info above. At
> > which point common code is just a loop. TBF I don't know if any drivers
> > do sub obj->size backing store granularity today, but I think it is
> > something to be sure of before proceeding.
> >
> > Second concern is what I touched upon in the first reply block - if the
> > common code blindly loops over all objects then on discrete GPUs it
> > seems we get an 'aggregate' value here which is not what I think we
> > want. We rather want to have the ability for drivers to list stats per
> > individual memory region.
> >
> > > +     spin_unlock(&file->table_lock);
> > > +
> > > +     print_size(p, "drm-shared-memory", size.shared);
> > > +     print_size(p, "drm-private-memory", size.private);
> > > +     print_size(p, "drm-active-memory", size.active);
> > > +
> > > +     if (has_status) {
> > > +             print_size(p, "drm-resident-memory", size.resident);
> > > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > > +     }
> > > +}
> > > +
> > >   /**
> > >    * drm_fop_show_fdinfo - helper for drm file fops
> > >    * @seq_file: output stream
> > > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> > >
> > >       if (dev->driver->show_fdinfo)
> > >               dev->driver->show_fdinfo(&p, file);
> > > +
> > > +     print_memory_stats(&p, file);
> > >   }
> > >   EXPORT_SYMBOL(drm_fop_show_fdinfo);
> > >
> > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > > index dfa995b787e1..e5b40084538f 100644
> > > --- a/include/drm/drm_file.h
> > > +++ b/include/drm/drm_file.h
> > > @@ -41,6 +41,7 @@
> > >   struct dma_fence;
> > >   struct drm_file;
> > >   struct drm_device;
> > > +struct drm_printer;
> > >   struct device;
> > >   struct file;
> > >
> > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > index 189fd618ca65..213917bb6b11 100644
> > > --- a/include/drm/drm_gem.h
> > > +++ b/include/drm/drm_gem.h
> > > @@ -42,6 +42,14 @@
> > >   struct iosys_map;
> > >   struct drm_gem_object;
> > >
> > > +/**
> > > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > > + */
> > > +enum drm_gem_object_status {
> > > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > > +};
> > > +
> > >   /**
> > >    * struct drm_gem_object_funcs - GEM object functions
> > >    */
> > > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> > >        */
> > >       int (*evict)(struct drm_gem_object *obj);
> > >
> > > +     /**
> > > +      * @status:
> > > +      *
> > > +      * The optional status callback can return additional object state
> > > +      * which determines which stats the object is counted against.  The
> > > +      * callback is called under table_lock.  Racing against object status
> > > +      * change is "harmless", and the callback can expect to not race
> > > +      * against object destruction.
> > > +      */
> > > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> >
> > Does this needs to be in object funcs and couldn't be consolidated to
> > driver level?
> >
> > Regards,
> >
> > Tvrtko
> >
> > > +
> > >       /**
> > >        * @vm_ops:
> > >        *

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-12 18:17         ` Daniel Vetter
@ 2023-04-12 18:42           ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-12 18:42 UTC (permalink / raw)
  To: Rob Clark, Tvrtko Ursulin, dri-devel, linux-arm-msm, freedreno,
	Boris Brezillon, Christopher Healy, Emil Velikov, Rob Clark,
	David Airlie, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, open list:DOCUMENTATION,
	open list
  Cc: Daniel Vetter

On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> > On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> > >
> > >
> > > On 11/04/2023 23:56, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > Add support to dump GEM stats to fdinfo.
> > > >
> > > > v2: Fix typos, change size units to match docs, use div_u64
> > > > v3: Do it in core
> > > >
> > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > > > ---
> > > >   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> > > >   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> > > >   include/drm/drm_file.h                |  1 +
> > > >   include/drm/drm_gem.h                 | 19 +++++++
> > > >   4 files changed, 117 insertions(+)
> > > >
> > > > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > > > index b46327356e80..b5e7802532ed 100644
> > > > --- a/Documentation/gpu/drm-usage-stats.rst
> > > > +++ b/Documentation/gpu/drm-usage-stats.rst
> > > > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > > >   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > > >   indicating kibi- or mebi-bytes.
> > > >
> > > > +- drm-shared-memory: <uint> [KiB|MiB]
> > > > +
> > > > +The total size of buffers that are shared with another file (ie. have more
> > > > +than a single handle).
> > > > +
> > > > +- drm-private-memory: <uint> [KiB|MiB]
> > > > +
> > > > +The total size of buffers that are not shared with another file.
> > > > +
> > > > +- drm-resident-memory: <uint> [KiB|MiB]
> > > > +
> > > > +The total size of buffers that are resident in system memory.
> > >
> > > I think this naming maybe does not work best with the existing
> > > drm-memory-<region> keys.
> >
> > Actually, it was very deliberate not to conflict with the existing
> > drm-memory-<region> keys ;-)
> >
> > I would have preferred drm-memory-{active,resident,...} but it
> > could be mis-parsed by existing userspace, so my hands were a bit tied.
> >
> > > How about introducing the concept of a memory region from the start and
> > > using naming similar to what we do for engines?
> > >
> > > drm-memory-$CATEGORY-$REGION: ...
> > >
> > > Then we document a bunch of categories and their semantics, for instance:
> > >
> > > 'size' - All reachable objects
> > > 'shared' - Subset of 'size' with handle_count > 1
> > > 'resident' - Objects with backing store
> > > 'active' - Objects in use, subset of resident
> > > 'purgeable' - Or inactive? Subset of resident.
> > >
> > > We keep the same semantics as with process memory accounting (if I got
> > > it right) which could be desirable for a simplified mental model.
> > >
> > > (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > > correctly captured this in the first round it should be equivalent to
> > > 'resident' above. In any case we can document which category is equal to
> > > which, and at most one of the two must be output.)
> > >
> > > Region names we at most partially standardize. Like we could say
> > > 'system' is to be used where backing store is system RAM and others are
> > > driver defined.
> > >
> > > Then discrete GPUs could emit N sets of key-values, one for each memory
> > > region they support.
> > >
> > > I think this all also works for objects which can be migrated between
> > > memory regions. 'Size' accounts them against all regions while for
> > > 'resident' they only appear in the region of their current placement, etc.
> >
> > I'm not too sure how to reconcile different memory regions with this,
> > since drm core doesn't really know about the driver's memory regions.
> > Perhaps we can go back to this being a helper and drivers with vram
> > just don't use the helper?  Or??
>
> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> all works out reasonably consistently?

That is basically what we have now.  I could append -system to each to
make things easier to add vram/etc (from a uabi standpoint)..
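
Ie. roughly something like this (made-up numbers, and "-system" as the
region suffix is just a strawman at this point):

  drm-shared-memory-system:	512 KiB
  drm-private-memory-system:	8064 KiB
  drm-resident-memory-system:	8576 KiB
  drm-purgeable-memory-system:	128 KiB
  drm-active-memory-system:	1024 KiB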

BR,
-R

> And ttm could/should perhaps provide a helper to dump the region specific
> version of this. Or we lift the concept of regions out of ttm a bit
> higher, that's kinda needed for cgroups eventually anyway I think.
> -Daniel
>
> >
> > BR,
> > -R
> >
> > > Userspace can aggregate if it wishes to do so but kernel side should not.
> > >
> > > > +
> > > > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > > > +
> > > > +The total size of buffers that are purgeable.
> > > > +
> > > > +- drm-active-memory: <uint> [KiB|MiB]
> > > > +
> > > > +The total size of buffers that are active on one or more rings.
> > > > +
> > > >   - drm-cycles-<str> <uint>
> > > >
> > > >   Engine identifier string must be the same as the one specified in the
> > > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > > index 37dfaa6be560..46fdd843bb3a 100644
> > > > --- a/drivers/gpu/drm/drm_file.c
> > > > +++ b/drivers/gpu/drm/drm_file.c
> > > > @@ -42,6 +42,7 @@
> > > >   #include <drm/drm_client.h>
> > > >   #include <drm/drm_drv.h>
> > > >   #include <drm/drm_file.h>
> > > > +#include <drm/drm_gem.h>
> > > >   #include <drm/drm_print.h>
> > > >
> > > >   #include "drm_crtc_internal.h"
> > > > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> > > >   }
> > > >   EXPORT_SYMBOL(drm_send_event);
> > > >
> > > > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > > > +{
> > > > +     const char *units[] = {"", " KiB", " MiB"};
> > > > +     unsigned u;
> > > > +
> > > > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > > > +             if (sz < SZ_1K)
> > > > +                     break;
> > > > +             sz = div_u64(sz, SZ_1K);
> > > > +     }
> > > > +
> > > > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > > > +}
> > > > +
> > > > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > > > +{
> > > > +     struct drm_gem_object *obj;
> > > > +     struct {
> > > > +             size_t shared;
> > > > +             size_t private;
> > > > +             size_t resident;
> > > > +             size_t purgeable;
> > > > +             size_t active;
> > > > +     } size = {0};
> > > > +     bool has_status = false;
> > > > +     int id;
> > > > +
> > > > +     spin_lock(&file->table_lock);
> > > > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > > > +             enum drm_gem_object_status s = 0;
> > > > +
> > > > +             if (obj->funcs && obj->funcs->status) {
> > > > +                     s = obj->funcs->status(obj);
> > > > +                     has_status = true;
> > > > +             }
> > > > +
> > > > +             if (obj->handle_count > 1) {
> > > > +                     size.shared += obj->size;
> > > > +             } else {
> > > > +                     size.private += obj->size;
> > > > +             }
> > > > +
> > > > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > > > +                     size.resident += obj->size;
> > > > +             } else {
> > > > +                     /* If already purged or not yet backed by pages, don't
> > > > +                      * count it as purgeable:
> > > > +                      */
> > > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> > >
> > > Side question - why couldn't resident buffers be purgeable? Did you mean
> > > for the if branch check to be active here? But then it wouldn't make
> > > sense for a driver to report active _and_ purgeable..
> > >
> > > > +             }
> > > > +
> > > > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > > > +                     size.active += obj->size;
> > > > +
> > > > +                     /* If still active, don't count as purgeable: */
> > > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> > >
> > > Another side question - I guess this tidies a race in reporting? If so
> > > not sure it matters given the stats are all rather approximate.
> > >
> > > > +             }
> > > > +
> > > > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > > > +                     size.purgeable += obj->size;
> > > > +     }
> > >
> > > One concern I have here is that it is all based on obj->size. That is,
> > > there is no provision for drivers to implement page level granularity.
> > > So correct reporting in use cases such as VM BIND in the future wouldn't
> > > work unless it was a driver hook to get almost all of the info above. At
> > > which point common code is just a loop. TBF I don't know if any drivers
> > > do sub obj->size backing store granularity today, but I think it is
> > > something to be sure of before proceeding.
> > >
> > > Second concern is what I touched upon in the first reply block - if the
> > > common code blindly loops over all objects then on discrete GPUs it
> > > seems we get an 'aggregate' value here which is not what I think we
> > > want. We rather want to have the ability for drivers to list stats per
> > > individual memory region.
> > >
> > > > +     spin_unlock(&file->table_lock);
> > > > +
> > > > +     print_size(p, "drm-shared-memory", size.shared);
> > > > +     print_size(p, "drm-private-memory", size.private);
> > > > +     print_size(p, "drm-active-memory", size.active);
> > > > +
> > > > +     if (has_status) {
> > > > +             print_size(p, "drm-resident-memory", size.resident);
> > > > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > > > +     }
> > > > +}
> > > > +
> > > >   /**
> > > >    * drm_fop_show_fdinfo - helper for drm file fops
> > > >    * @seq_file: output stream
> > > > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> > > >
> > > >       if (dev->driver->show_fdinfo)
> > > >               dev->driver->show_fdinfo(&p, file);
> > > > +
> > > > +     print_memory_stats(&p, file);
> > > >   }
> > > >   EXPORT_SYMBOL(drm_fop_show_fdinfo);
> > > >
> > > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > > > index dfa995b787e1..e5b40084538f 100644
> > > > --- a/include/drm/drm_file.h
> > > > +++ b/include/drm/drm_file.h
> > > > @@ -41,6 +41,7 @@
> > > >   struct dma_fence;
> > > >   struct drm_file;
> > > >   struct drm_device;
> > > > +struct drm_printer;
> > > >   struct device;
> > > >   struct file;
> > > >
> > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > index 189fd618ca65..213917bb6b11 100644
> > > > --- a/include/drm/drm_gem.h
> > > > +++ b/include/drm/drm_gem.h
> > > > @@ -42,6 +42,14 @@
> > > >   struct iosys_map;
> > > >   struct drm_gem_object;
> > > >
> > > > +/**
> > > > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > > > + */
> > > > +enum drm_gem_object_status {
> > > > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > > > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > > > +};
> > > > +
> > > >   /**
> > > >    * struct drm_gem_object_funcs - GEM object functions
> > > >    */
> > > > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> > > >        */
> > > >       int (*evict)(struct drm_gem_object *obj);
> > > >
> > > > +     /**
> > > > +      * @status:
> > > > +      *
> > > > +      * The optional status callback can return additional object state
> > > > +      * which determines which stats the object is counted against.  The
> > > > +      * callback is called under table_lock.  Racing against object status
> > > > +      * change is "harmless", and the callback can expect to not race
> > > > +      * against object destruction.
> > > > +      */
> > > > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> > >
> > > Does this need to be in object funcs, or could it be consolidated to
> > > driver level?
> > >
> > > Regards,
> > >
> > > Tvrtko
> > >
> > > > +
> > > >       /**
> > > >        * @vm_ops:
> > > >        *
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-12 18:42           ` Rob Clark
@ 2023-04-12 19:18             ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-12 19:18 UTC (permalink / raw)
  To: Rob Clark
  Cc: Tvrtko Ursulin, dri-devel, linux-arm-msm, freedreno,
	Boris Brezillon, Christopher Healy, Emil Velikov, Rob Clark,
	David Airlie, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, open list:DOCUMENTATION,
	open list, Daniel Vetter

On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> > > On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> > > <tvrtko.ursulin@linux.intel.com> wrote:
> > > >
> > > >
> > > > On 11/04/2023 23:56, Rob Clark wrote:
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > Add support to dump GEM stats to fdinfo.
> > > > >
> > > > > v2: Fix typos, change size units to match docs, use div_u64
> > > > > v3: Do it in core
> > > > >
> > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > > > > ---
> > > > >   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> > > > >   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> > > > >   include/drm/drm_file.h                |  1 +
> > > > >   include/drm/drm_gem.h                 | 19 +++++++
> > > > >   4 files changed, 117 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > > > > index b46327356e80..b5e7802532ed 100644
> > > > > --- a/Documentation/gpu/drm-usage-stats.rst
> > > > > +++ b/Documentation/gpu/drm-usage-stats.rst
> > > > > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > > > >   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > > > >   indicating kibi- or mebi-bytes.
> > > > >
> > > > > +- drm-shared-memory: <uint> [KiB|MiB]
> > > > > +
> > > > > +The total size of buffers that are shared with another file (ie. have more
> > > > > +than a single handle).
> > > > > +
> > > > > +- drm-private-memory: <uint> [KiB|MiB]
> > > > > +
> > > > > +The total size of buffers that are not shared with another file.
> > > > > +
> > > > > +- drm-resident-memory: <uint> [KiB|MiB]
> > > > > +
> > > > > +The total size of buffers that are resident in system memory.
> > > >
> > > > I think this naming maybe does not work best with the existing
> > > > drm-memory-<region> keys.
> > >
> > > Actually, it was very deliberate not to conflict with the existing
> > > drm-memory-<region> keys ;-)
> > >
> > > I would have preferred drm-memory-{active,resident,...} but it
> > > could be mis-parsed by existing userspace, so my hands were a bit tied.
> > >
> > > > How about introducing the concept of a memory region from the start and
> > > > using naming similar to what we do for engines?
> > > >
> > > > drm-memory-$CATEGORY-$REGION: ...
> > > >
> > > > Then we document a bunch of categories and their semantics, for instance:
> > > >
> > > > 'size' - All reachable objects
> > > > 'shared' - Subset of 'size' with handle_count > 1
> > > > 'resident' - Objects with backing store
> > > > 'active' - Objects in use, subset of resident
> > > > 'purgeable' - Or inactive? Subset of resident.
> > > >
> > > > We keep the same semantics as with process memory accounting (if I got
> > > > it right) which could be desirable for a simplified mental model.
> > > >
> > > > (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > > > correctly captured this in the first round it should be equivalent to
> > > > 'resident' above. In any case we can document which category is equal to
> > > > which, and at most one of the two must be output.)
> > > >
> > > > Region names we at most partially standardize. Like we could say
> > > > 'system' is to be used where backing store is system RAM and others are
> > > > driver defined.
> > > >
> > > > Then discrete GPUs could emit N sets of key-values, one for each memory
> > > > region they support.
> > > >
> > > > I think this all also works for objects which can be migrated between
> > > > memory regions. 'Size' accounts them against all regions while for
> > > > 'resident' they only appear in the region of their current placement, etc.
> > >
> > > I'm not too sure how to reconcile different memory regions with this,
> > > since drm core doesn't really know about the driver's memory regions.
> > > Perhaps we can go back to this being a helper and drivers with vram
> > > just don't use the helper?  Or??
> >
> > I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> > all works out reasonably consistently?
> 
> That is basically what we have now.  I could append -system to each to
> make things easier to add vram/etc (from a uabi standpoint)..

What you have isn't really -system, but everything. So it doesn't really
make sense to me to mark this -system; it's only really true for integrated
(if they don't have stolen memory or something like that).
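
On a discrete part what you have today is really the total across all
regions; the split would only come with the -$REGION variants, e.g.
(hypothetical region names, made-up numbers):

  drm-resident-memory:		1048576 KiB
  drm-resident-memory-vram:	786432 KiB
  drm-resident-memory-system:	262144 KiB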

Also my comment was more in reply to Tvrtko's suggestion.
-Daniel


> 
> BR,
> -R
> 
> > And ttm could/should perhaps provide a helper to dump the region specific
> > version of this. Or we lift the concept of regions out of ttm a bit
> > higher, that's kinda needed for cgroups eventually anyway I think.
> > -Daniel
> >
> > >
> > > BR,
> > > -R
> > >
> > > > Userspace can aggregate if it wishes to do so but kernel side should not.
> > > >
> > > > > +
> > > > > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > > > > +
> > > > > +The total size of buffers that are purgeable.
> > > > > +
> > > > > +- drm-active-memory: <uint> [KiB|MiB]
> > > > > +
> > > > > +The total size of buffers that are active on one or more rings.
> > > > > +
> > > > >   - drm-cycles-<str> <uint>
> > > > >
> > > > >   Engine identifier string must be the same as the one specified in the
> > > > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > > > index 37dfaa6be560..46fdd843bb3a 100644
> > > > > --- a/drivers/gpu/drm/drm_file.c
> > > > > +++ b/drivers/gpu/drm/drm_file.c
> > > > > @@ -42,6 +42,7 @@
> > > > >   #include <drm/drm_client.h>
> > > > >   #include <drm/drm_drv.h>
> > > > >   #include <drm/drm_file.h>
> > > > > +#include <drm/drm_gem.h>
> > > > >   #include <drm/drm_print.h>
> > > > >
> > > > >   #include "drm_crtc_internal.h"
> > > > > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> > > > >   }
> > > > >   EXPORT_SYMBOL(drm_send_event);
> > > > >
> > > > > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > > > > +{
> > > > > +     const char *units[] = {"", " KiB", " MiB"};
> > > > > +     unsigned u;
> > > > > +
> > > > > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > > > > +             if (sz < SZ_1K)
> > > > > +                     break;
> > > > > +             sz = div_u64(sz, SZ_1K);
> > > > > +     }
> > > > > +
> > > > > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > > > > +}
> > > > > +
> > > > > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > > > > +{
> > > > > +     struct drm_gem_object *obj;
> > > > > +     struct {
> > > > > +             size_t shared;
> > > > > +             size_t private;
> > > > > +             size_t resident;
> > > > > +             size_t purgeable;
> > > > > +             size_t active;
> > > > > +     } size = {0};
> > > > > +     bool has_status = false;
> > > > > +     int id;
> > > > > +
> > > > > +     spin_lock(&file->table_lock);
> > > > > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > > > > +             enum drm_gem_object_status s = 0;
> > > > > +
> > > > > +             if (obj->funcs && obj->funcs->status) {
> > > > > +                     s = obj->funcs->status(obj);
> > > > > +                     has_status = true;
> > > > > +             }
> > > > > +
> > > > > +             if (obj->handle_count > 1) {
> > > > > +                     size.shared += obj->size;
> > > > > +             } else {
> > > > > +                     size.private += obj->size;
> > > > > +             }
> > > > > +
> > > > > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > > > > +                     size.resident += obj->size;
> > > > > +             } else {
> > > > > +                     /* If already purged or not yet backed by pages, don't
> > > > > +                      * count it as purgeable:
> > > > > +                      */
> > > > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> > > >
> > > > Side question - why couldn't resident buffers be purgeable? Did you mean
> > > > for the if branch check to be active here? But then it wouldn't make
> > > > sense for a driver to report active _and_ purgeable..
> > > >
> > > > > +             }
> > > > > +
> > > > > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > > > > +                     size.active += obj->size;
> > > > > +
> > > > > +                     /* If still active, don't count as purgeable: */
> > > > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> > > >
> > > > Another side question - I guess this tidies a race in reporting? If so
> > > > not sure it matters given the stats are all rather approximate.
> > > >
> > > > > +             }
> > > > > +
> > > > > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > > > > +                     size.purgeable += obj->size;
> > > > > +     }
> > > >
> > > > One concern I have here is that it is all based on obj->size. That is,
> > > > there is no provision for drivers to implement page level granularity.
> > > > So correct reporting in use cases such as VM BIND in the future wouldn't
> > > > work unless it was a driver hook to get almost all of the info above. At
> > > > which point common code is just a loop. TBF I don't know if any drivers
> > > > do sub obj->size backing store granularity today, but I think it is
> > > > something to be sure of before proceeding.
> > > >
> > > > Second concern is what I touched upon in the first reply block - if the
> > > > common code blindly loops over all objects then on discrete GPUs it
> > > > seems we get an 'aggregate' value here which is not what I think we
> > > > want. We rather want to have the ability for drivers to list stats per
> > > > individual memory region.
> > > >
> > > > > +     spin_unlock(&file->table_lock);
> > > > > +
> > > > > +     print_size(p, "drm-shared-memory", size.shared);
> > > > > +     print_size(p, "drm-private-memory", size.private);
> > > > > +     print_size(p, "drm-active-memory", size.active);
> > > > > +
> > > > > +     if (has_status) {
> > > > > +             print_size(p, "drm-resident-memory", size.resident);
> > > > > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > > > > +     }
> > > > > +}
> > > > > +
> > > > >   /**
> > > > >    * drm_fop_show_fdinfo - helper for drm file fops
> > > > >    * @seq_file: output stream
> > > > > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> > > > >
> > > > >       if (dev->driver->show_fdinfo)
> > > > >               dev->driver->show_fdinfo(&p, file);
> > > > > +
> > > > > +     print_memory_stats(&p, file);
> > > > >   }
> > > > >   EXPORT_SYMBOL(drm_fop_show_fdinfo);
> > > > >
> > > > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > > > > index dfa995b787e1..e5b40084538f 100644
> > > > > --- a/include/drm/drm_file.h
> > > > > +++ b/include/drm/drm_file.h
> > > > > @@ -41,6 +41,7 @@
> > > > >   struct dma_fence;
> > > > >   struct drm_file;
> > > > >   struct drm_device;
> > > > > +struct drm_printer;
> > > > >   struct device;
> > > > >   struct file;
> > > > >
> > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > index 189fd618ca65..213917bb6b11 100644
> > > > > --- a/include/drm/drm_gem.h
> > > > > +++ b/include/drm/drm_gem.h
> > > > > @@ -42,6 +42,14 @@
> > > > >   struct iosys_map;
> > > > >   struct drm_gem_object;
> > > > >
> > > > > +/**
> > > > > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > > > > + */
> > > > > +enum drm_gem_object_status {
> > > > > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > > > > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > > > > +};
> > > > > +
> > > > >   /**
> > > > >    * struct drm_gem_object_funcs - GEM object functions
> > > > >    */
> > > > > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> > > > >        */
> > > > >       int (*evict)(struct drm_gem_object *obj);
> > > > >
> > > > > +     /**
> > > > > +      * @status:
> > > > > +      *
> > > > > +      * The optional status callback can return additional object state
> > > > > +      * which determines which stats the object is counted against.  The
> > > > > +      * callback is called under table_lock.  Racing against object status
> > > > > +      * change is "harmless", and the callback can expect to not race
> > > > > +      * against object destruction.
> > > > > +      */
> > > > > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> > > >
> > > > Does this need to be in object funcs, or could it be consolidated to
> > > > driver level?
> > > >
> > > > Regards,
> > > >
> > > > Tvrtko
> > > >
> > > > > +
> > > > >       /**
> > > > >        * @vm_ops:
> > > > >        *
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 5/7] drm/etnaviv: Switch to fdinfo helper
  2023-04-12  7:59     ` Daniel Vetter
@ 2023-04-12 22:18       ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-12 22:18 UTC (permalink / raw)
  To: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Tvrtko Ursulin, Christopher Healy, Emil Velikov, Rob Clark,
	Lucas Stach, Russell King, Christian Gmeiner, David Airlie,
	moderated list:DRM DRIVERS FOR VIVANTE GPU IP, open list
  Cc: Daniel Vetter

On Wed, Apr 12, 2023 at 12:59 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Apr 11, 2023 at 03:56:10PM -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
>
> You're on an old tree, this got reverted. But I'm kinda wondering whether
> another patch on top shouldn't just include the drm_show_fdinfo in
> DRM_GEM_FOPS macro ... There's really no good reason for drivers to not
> have this I think?

oh, I'm roughly on msm-next, so I didn't see the revert.. I'll drop this
one.  But with things in flux, that is why I decided against adding it
to DRM_GEM_FOPS.  Ie. we should do that as a followup cleanup step
once everyone is moved over to the new helpers, to avoid conflicts or
build breaks when merging things via different driver trees.
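
The eventual cleanup could look something like this (rough sketch only,
using the helper name from this series, and assuming the existing
DRM_GEM_FOPS entries stay as they are today):

  #define DRM_GEM_FOPS \
  	.open		= drm_open,\
  	.release	= drm_release,\
  	.unlocked_ioctl	= drm_ioctl,\
  	.compat_ioctl	= drm_compat_ioctl,\
  	.poll		= drm_poll,\
  	.read		= drm_read,\
  	.llseek		= noop_llseek,\
  	.mmap		= drm_gem_mmap,\
  	.show_fdinfo	= drm_fop_show_fdinfo

  /* ..and then drivers just use the macro and drop their own hook: */
  static const struct file_operations fops = {
  	.owner = THIS_MODULE,
  	DRM_GEM_FOPS,
  };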

BR,
-R

> -Daniel
>
> > ---
> >  drivers/gpu/drm/etnaviv/etnaviv_drv.c | 10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > index 44ca803237a5..170000d6af94 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > @@ -476,9 +476,8 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = {
> >       ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW),
> >  };
> >
> > -static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
> > +static void etnaviv_fop_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> >  {
> > -     struct drm_file *file = f->private_data;
> >       struct drm_device *dev = file->minor->dev;
> >       struct etnaviv_drm_private *priv = dev->dev_private;
> >       struct etnaviv_file_private *ctx = file->driver_priv;
> > @@ -487,8 +486,6 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >        * For a description of the text output format used here, see
> >        * Documentation/gpu/drm-usage-stats.rst.
> >        */
> > -     seq_printf(m, "drm-driver:\t%s\n", dev->driver->name);
> > -     seq_printf(m, "drm-client-id:\t%u\n", ctx->id);
> >
> >       for (int i = 0; i < ETNA_MAX_PIPES; i++) {
> >               struct etnaviv_gpu *gpu = priv->gpu[i];
> > @@ -507,7 +504,7 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >                       cur = snprintf(engine + cur, sizeof(engine) - cur,
> >                                      "%sNN", cur ? "/" : "");
> >
> > -             seq_printf(m, "drm-engine-%s:\t%llu ns\n", engine,
> > +             drm_printf(p, "drm-engine-%s:\t%llu ns\n", engine,
> >                          ctx->sched_entity[i].elapsed_ns);
> >       }
> >  }
> > @@ -515,7 +512,7 @@ static void etnaviv_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >  static const struct file_operations fops = {
> >       .owner = THIS_MODULE,
> >       DRM_GEM_FOPS,
> > -     .show_fdinfo = etnaviv_fop_show_fdinfo,
> > +     .show_fdinfo = drm_fop_show_fdinfo,
> >  };
> >
> >  static const struct drm_driver etnaviv_drm_driver = {
> > @@ -529,6 +526,7 @@ static const struct drm_driver etnaviv_drm_driver = {
> >  #ifdef CONFIG_DEBUG_FS
> >       .debugfs_init       = etnaviv_debugfs_init,
> >  #endif
> > +     .show_fdinfo        = etnaviv_fop_show_fdinfo,
> >       .ioctls             = etnaviv_ioctls,
> >       .num_ioctls         = DRM_ETNAVIV_NUM_IOCTLS,
> >       .fops               = &fops,
> > --
> > 2.39.2
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-12 19:18             ` Daniel Vetter
  (?)
@ 2023-04-13 12:58             ` Tvrtko Ursulin
  2023-04-13 13:27                 ` Daniel Vetter
  2023-04-13 15:47                 ` Rob Clark
  -1 siblings, 2 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-13 12:58 UTC (permalink / raw)
  To: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list


On 12/04/2023 20:18, Daniel Vetter wrote:
> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>
>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>
>>>>>
>>>>> On 11/04/2023 23:56, Rob Clark wrote:
>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>
>>>>>> Add support to dump GEM stats to fdinfo.
>>>>>>
>>>>>> v2: Fix typos, change size units to match docs, use div_u64
>>>>>> v3: Do it in core
>>>>>>
>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
>>>>>> ---
>>>>>>    Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>>>>>>    drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>>>>>>    include/drm/drm_file.h                |  1 +
>>>>>>    include/drm/drm_gem.h                 | 19 +++++++
>>>>>>    4 files changed, 117 insertions(+)
>>>>>>
>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
>>>>>> index b46327356e80..b5e7802532ed 100644
>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>>>>>>    Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>>>>>>    indicating kibi- or mebi-bytes.
>>>>>>
>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
>>>>>> +
>>>>>> +The total size of buffers that are shared with another file (ie. have more
>>>>>> +than a single handle).
>>>>>> +
>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
>>>>>> +
>>>>>> +The total size of buffers that are not shared with another file.
>>>>>> +
>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
>>>>>> +
>>>>>> +The total size of buffers that are resident in system memory.
>>>>>
>>>>> I think this naming maybe does not work best with the existing
>>>>> drm-memory-<region> keys.
>>>>
>>>> Actually, it was very deliberate not to conflict with the existing
>>>> drm-memory-<region> keys ;-)
>>>>
>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
>>>>
>>>>> How about introduce the concept of a memory region from the start and
>>>>> use naming similar like we do for engines?
>>>>>
>>>>> drm-memory-$CATEGORY-$REGION: ...
>>>>>
>>>>> Then we document a bunch of categories and their semantics, for instance:
>>>>>
>>>>> 'size' - All reachable objects
>>>>> 'shared' - Subset of 'size' with handle_count > 1
>>>>> 'resident' - Objects with backing store
>>>>> 'active' - Objects in use, subset of resident
>>>>> 'purgeable' - Or inactive? Subset of resident.
>>>>>
>>>>> We keep the same semantics as with process memory accounting (if I got
>>>>> it right) which could be desirable for a simplified mental model.
>>>>>
>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
>>>>> correctly captured this in the first round it should be equivalent to
>>>>> 'resident' above. In any case we can document no category is equal to
>>>>> which category, and at most one of the two must be output.)
>>>>>
>>>>> Region names we at most partially standardize. Like we could say
>>>>> 'system' is to be used where backing store is system RAM and others are
>>>>> driver defined.
>>>>>
>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
>>>>> region they support.
>>>>>
>>>>> I think this all also works for objects which can be migrated between
>>>>> memory regions. 'Size' accounts them against all regions while for
>>>>> 'resident' they only appear in the region of their current placement, etc.
>>>>
>>>> I'm not too sure how to rectify different memory regions with this,
>>>> since drm core doesn't really know about the driver's memory regions.
>>>> Perhaps we can go back to this being a helper and drivers with vram
>>>> just don't use the helper?  Or??
>>>
>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
>>> all works out reasonably consistently?
>>
>> That is basically what we have now.  I could append -system to each to
>> make things easier to add vram/etc (from a uabi standpoint)..
> 
> What you have isn't really -system, but everything. So doesn't really make
> sense to me to mark this -system, it's only really true for integrated (if
> they don't have stolen or something like that).
> 
> Also my comment was more in reply to Tvrtko's suggestion.

Right, so my proposal was drm-memory-$CATEGORY-$REGION, which I think 
aligns with the current drm-memory-$REGION by extending it, rather than 
creating confusion with a different order of key name components.

AMD currently has (among others) drm-memory-vram, which we could define 
in the spec maps to category X, if category component is not present.

Some examples:

drm-memory-resident-system:
drm-memory-size-lmem0:
drm-memory-active-vram:

Etc.. I think it creates a consistent story.
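
For illustration only, a driver supporting multiple regions could then
emit one block of keys per region, roughly along the lines of the
sketch below.  The struct and its fields are made up for the sketch;
only drm_printf() is a real API, and the exact category list is of
course still up for discussion:

#include <linux/sizes.h>
#include <drm/drm_print.h>

/* Hypothetical per-region stats container, invented for this sketch */
struct fdinfo_region {
        const char *name;       /* e.g. "system", "vram", "lmem0" */
        size_t size, shared, resident, purgeable, active;
};

static void print_region_stats(struct drm_printer *p,
                               const struct fdinfo_region *r)
{
        drm_printf(p, "drm-memory-size-%s:\t%zu KiB\n", r->name, r->size / SZ_1K);
        drm_printf(p, "drm-memory-shared-%s:\t%zu KiB\n", r->name, r->shared / SZ_1K);
        drm_printf(p, "drm-memory-resident-%s:\t%zu KiB\n", r->name, r->resident / SZ_1K);
        drm_printf(p, "drm-memory-purgeable-%s:\t%zu KiB\n", r->name, r->purgeable / SZ_1K);
        drm_printf(p, "drm-memory-active-%s:\t%zu KiB\n", r->name, r->active / SZ_1K);
}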

Other than this, my two opens which I think are significant, and which 
haven't been addressed yet, are:

1)

Why do we want totals (not per region) when userspace can trivially 
aggregate them if it wants to? What is the use case?

2)

The current proposal limits the accounting to whole objects and cements 
that by having it in the common code. If/when some driver is able to 
support sub-BO granularity it will need to opt out of the common 
printer, at which point it may be less churn to start with a helper 
rather than a mid-layer. Or maybe some drivers already support this, I 
don't know. Given how important VM BIND is I wouldn't be surprised.
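
(To spell out the helper-vs-mid-layer distinction with a sketch: the
alternative shape would be roughly the below, with the common GEM walk
exported under some helper name, invented here, and each driver opting
in from its own show_fdinfo callback, so a driver which later grows
sub-BO accounting simply stops calling it.)

static void foo_show_fdinfo(struct drm_printer *p, struct drm_file *file)
{
        /* ... driver-specific keys printed here first ... */

        /* Opt in to the common whole-object GEM accounting; a driver
         * that later grows sub-BO granularity just drops this call and
         * prints its own numbers instead.
         */
        drm_show_gem_memory_stats(p, file);  /* hypothetical helper name */
}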

Regards,

Tvrtko

>>> And ttm could/should perhaps provide a helper to dump the region specific
>>> version of this. Or we lift the concept of regions out of ttm a bit
>>> higher, that's kinda needed for cgroups eventually anyway I think.
>>> -Daniel
>>>
>>>>
>>>> BR,
>>>> -R
>>>>
>>>>> Userspace can aggregate if it wishes to do so but kernel side should not.
>>>>>
>>>>>> +
>>>>>> +- drm-purgeable-memory: <uint> [KiB|MiB]
>>>>>> +
>>>>>> +The total size of buffers that are purgeable.
>>>>>> +
>>>>>> +- drm-active-memory: <uint> [KiB|MiB]
>>>>>> +
>>>>>> +The total size of buffers that are active on one or more rings.
>>>>>> +
>>>>>>    - drm-cycles-<str> <uint>
>>>>>>
>>>>>>    Engine identifier string must be the same as the one specified in the
>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>> index 37dfaa6be560..46fdd843bb3a 100644
>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>> @@ -42,6 +42,7 @@
>>>>>>    #include <drm/drm_client.h>
>>>>>>    #include <drm/drm_drv.h>
>>>>>>    #include <drm/drm_file.h>
>>>>>> +#include <drm/drm_gem.h>
>>>>>>    #include <drm/drm_print.h>
>>>>>>
>>>>>>    #include "drm_crtc_internal.h"
>>>>>> @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>>>>>>    }
>>>>>>    EXPORT_SYMBOL(drm_send_event);
>>>>>>
>>>>>> +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
>>>>>> +{
>>>>>> +     const char *units[] = {"", " KiB", " MiB"};
>>>>>> +     unsigned u;
>>>>>> +
>>>>>> +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
>>>>>> +             if (sz < SZ_1K)
>>>>>> +                     break;
>>>>>> +             sz = div_u64(sz, SZ_1K);
>>>>>> +     }
>>>>>> +
>>>>>> +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
>>>>>> +}
>>>>>> +
>>>>>> +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
>>>>>> +{
>>>>>> +     struct drm_gem_object *obj;
>>>>>> +     struct {
>>>>>> +             size_t shared;
>>>>>> +             size_t private;
>>>>>> +             size_t resident;
>>>>>> +             size_t purgeable;
>>>>>> +             size_t active;
>>>>>> +     } size = {0};
>>>>>> +     bool has_status = false;
>>>>>> +     int id;
>>>>>> +
>>>>>> +     spin_lock(&file->table_lock);
>>>>>> +     idr_for_each_entry (&file->object_idr, obj, id) {
>>>>>> +             enum drm_gem_object_status s = 0;
>>>>>> +
>>>>>> +             if (obj->funcs && obj->funcs->status) {
>>>>>> +                     s = obj->funcs->status(obj);
>>>>>> +                     has_status = true;
>>>>>> +             }
>>>>>> +
>>>>>> +             if (obj->handle_count > 1) {
>>>>>> +                     size.shared += obj->size;
>>>>>> +             } else {
>>>>>> +                     size.private += obj->size;
>>>>>> +             }
>>>>>> +
>>>>>> +             if (s & DRM_GEM_OBJECT_RESIDENT) {
>>>>>> +                     size.resident += obj->size;
>>>>>> +             } else {
>>>>>> +                     /* If already purged or not yet backed by pages, don't
>>>>>> +                      * count it as purgeable:
>>>>>> +                      */
>>>>>> +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>>>>>
>>>>> Side question - why couldn't resident buffers be purgeable? Did you mean
>>>>> for the if branch check to be active here? But then it wouldn't make
>>>>> sense for a driver to report active _and_ purgeable..
>>>>>
>>>>>> +             }
>>>>>> +
>>>>>> +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
>>>>>> +                     size.active += obj->size;
>>>>>> +
>>>>>> +                     /* If still active, don't count as purgeable: */
>>>>>> +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>>>>>
>>>>> Another side question - I guess this tidies a race in reporting? If so
>>>>> not sure it matters given the stats are all rather approximate.
>>>>>
>>>>>> +             }
>>>>>> +
>>>>>> +             if (s & DRM_GEM_OBJECT_PURGEABLE)
>>>>>> +                     size.purgeable += obj->size;
>>>>>> +     }
>>>>>
>>>>> One concern I have here is that it is all based on obj->size. That is,
>>>>> there is no provision for drivers to implement page level granularity.
>>>>> So correct reporting in use cases such as VM BIND in the future wouldn't
>>>>> work unless it was a driver hook to get almost all of the info above. At
>>>>> which point common code is just a loop. TBF I don't know if any drivers
>>>>> do sub obj->size backing store granularity today, but I think it is
>>>>> sometimes to be sure of before proceeding.
>>>>>
>>>>> Second concern is what I touched upon in the first reply block - if the
>>>>> common code blindly loops over all objects then on discrete GPUs it
>>>>> seems we get an 'aggregate' value here which is not what I think we
>>>>> want. We rather want to have the ability for drivers to list stats per
>>>>> individual memory region.
>>>>>
>>>>>> +     spin_unlock(&file->table_lock);
>>>>>> +
>>>>>> +     print_size(p, "drm-shared-memory", size.shared);
>>>>>> +     print_size(p, "drm-private-memory", size.private);
>>>>>> +     print_size(p, "drm-active-memory", size.active);
>>>>>> +
>>>>>> +     if (has_status) {
>>>>>> +             print_size(p, "drm-resident-memory", size.resident);
>>>>>> +             print_size(p, "drm-purgeable-memory", size.purgeable);
>>>>>> +     }
>>>>>> +}
>>>>>> +
>>>>>>    /**
>>>>>>     * drm_fop_show_fdinfo - helper for drm file fops
>>>>>>     * @seq_file: output stream
>>>>>> @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
>>>>>>
>>>>>>        if (dev->driver->show_fdinfo)
>>>>>>                dev->driver->show_fdinfo(&p, file);
>>>>>> +
>>>>>> +     print_memory_stats(&p, file);
>>>>>>    }
>>>>>>    EXPORT_SYMBOL(drm_fop_show_fdinfo);
>>>>>>
>>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>>> index dfa995b787e1..e5b40084538f 100644
>>>>>> --- a/include/drm/drm_file.h
>>>>>> +++ b/include/drm/drm_file.h
>>>>>> @@ -41,6 +41,7 @@
>>>>>>    struct dma_fence;
>>>>>>    struct drm_file;
>>>>>>    struct drm_device;
>>>>>> +struct drm_printer;
>>>>>>    struct device;
>>>>>>    struct file;
>>>>>>
>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>>> index 189fd618ca65..213917bb6b11 100644
>>>>>> --- a/include/drm/drm_gem.h
>>>>>> +++ b/include/drm/drm_gem.h
>>>>>> @@ -42,6 +42,14 @@
>>>>>>    struct iosys_map;
>>>>>>    struct drm_gem_object;
>>>>>>
>>>>>> +/**
>>>>>> + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
>>>>>> + */
>>>>>> +enum drm_gem_object_status {
>>>>>> +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
>>>>>> +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
>>>>>> +};
>>>>>> +
>>>>>>    /**
>>>>>>     * struct drm_gem_object_funcs - GEM object functions
>>>>>>     */
>>>>>> @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
>>>>>>         */
>>>>>>        int (*evict)(struct drm_gem_object *obj);
>>>>>>
>>>>>> +     /**
>>>>>> +      * @status:
>>>>>> +      *
>>>>>> +      * The optional status callback can return additional object state
>>>>>> +      * which determines which stats the object is counted against.  The
>>>>>> +      * callback is called under table_lock.  Racing against object status
>>>>>> +      * change is "harmless", and the callback can expect to not race
>>>>>> +      * against object destruction.
>>>>>> +      */
>>>>>> +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
>>>>>
>>>>> Does this needs to be in object funcs and couldn't be consolidated to
>>>>> driver level?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
>>>>>
>>>>>> +
>>>>>>        /**
>>>>>>         * @vm_ops:
>>>>>>         *
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-13 12:58             ` Tvrtko Ursulin
@ 2023-04-13 13:27                 ` Daniel Vetter
  2023-04-13 15:47                 ` Rob Clark
  1 sibling, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-13 13:27 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list

On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> 
> On 12/04/2023 20:18, Daniel Vetter wrote:
> > On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> > > On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > 
> > > > On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> > > > > On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> > > > > <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > On 11/04/2023 23:56, Rob Clark wrote:
> > > > > > > From: Rob Clark <robdclark@chromium.org>
> > > > > > > 
> > > > > > > Add support to dump GEM stats to fdinfo.
> > > > > > > 
> > > > > > > v2: Fix typos, change size units to match docs, use div_u64
> > > > > > > v3: Do it in core
> > > > > > > 
> > > > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > > > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > > > > > > ---
> > > > > > >    Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> > > > > > >    drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> > > > > > >    include/drm/drm_file.h                |  1 +
> > > > > > >    include/drm/drm_gem.h                 | 19 +++++++
> > > > > > >    4 files changed, 117 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > > > > > > index b46327356e80..b5e7802532ed 100644
> > > > > > > --- a/Documentation/gpu/drm-usage-stats.rst
> > > > > > > +++ b/Documentation/gpu/drm-usage-stats.rst
> > > > > > > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > > > > > >    Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > > > > > >    indicating kibi- or mebi-bytes.
> > > > > > > 
> > > > > > > +- drm-shared-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are shared with another file (ie. have more
> > > > > > > +than a single handle).
> > > > > > > +
> > > > > > > +- drm-private-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are not shared with another file.
> > > > > > > +
> > > > > > > +- drm-resident-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are resident in system memory.
> > > > > > 
> > > > > > I think this naming maybe does not work best with the existing
> > > > > > drm-memory-<region> keys.
> > > > > 
> > > > > Actually, it was very deliberate not to conflict with the existing
> > > > > drm-memory-<region> keys ;-)
> > > > > 
> > > > > I wouldn't have preferred drm-memory-{active,resident,...} but it
> > > > > could be mis-parsed by existing userspace so my hands were a bit tied.
> > > > > 
> > > > > > How about introduce the concept of a memory region from the start and
> > > > > > use naming similar like we do for engines?
> > > > > > 
> > > > > > drm-memory-$CATEGORY-$REGION: ...
> > > > > > 
> > > > > > Then we document a bunch of categories and their semantics, for instance:
> > > > > > 
> > > > > > 'size' - All reachable objects
> > > > > > 'shared' - Subset of 'size' with handle_count > 1
> > > > > > 'resident' - Objects with backing store
> > > > > > 'active' - Objects in use, subset of resident
> > > > > > 'purgeable' - Or inactive? Subset of resident.
> > > > > > 
> > > > > > We keep the same semantics as with process memory accounting (if I got
> > > > > > it right) which could be desirable for a simplified mental model.
> > > > > > 
> > > > > > (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > > > > > correctly captured this in the first round it should be equivalent to
> > > > > > 'resident' above. In any case we can document no category is equal to
> > > > > > which category, and at most one of the two must be output.)
> > > > > > 
> > > > > > Region names we at most partially standardize. Like we could say
> > > > > > 'system' is to be used where backing store is system RAM and others are
> > > > > > driver defined.
> > > > > > 
> > > > > > Then discrete GPUs could emit N sets of key-values, one for each memory
> > > > > > region they support.
> > > > > > 
> > > > > > I think this all also works for objects which can be migrated between
> > > > > > memory regions. 'Size' accounts them against all regions while for
> > > > > > 'resident' they only appear in the region of their current placement, etc.
> > > > > 
> > > > > I'm not too sure how to rectify different memory regions with this,
> > > > > since drm core doesn't really know about the driver's memory regions.
> > > > > Perhaps we can go back to this being a helper and drivers with vram
> > > > > just don't use the helper?  Or??
> > > > 
> > > > I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> > > > all works out reasonably consistently?
> > > 
> > > That is basically what we have now.  I could append -system to each to
> > > make things easier to add vram/etc (from a uabi standpoint)..
> > 
> > What you have isn't really -system, but everything. So doesn't really make
> > sense to me to mark this -system, it's only really true for integrated (if
> > they don't have stolen or something like that).
> > 
> > Also my comment was more in reply to Tvrtko's suggestion.
> 
> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> with the current drm-memory-$REGION by extending, rather than creating
> confusion with different order of key name components.

Oh my comment was pretty much just bikeshed, in case someone creates a
$REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
So $CATEGORY before the -memory.

Otoh I don't think that'll happen, so I guess we can go with whatever more
folks like :-) I don't really care much personally.

> AMD currently has (among others) drm-memory-vram, which we could define in
> the spec maps to category X, if category component is not present.
> 
> Some examples:
> 
> drm-memory-resident-system:
> drm-memory-size-lmem0:
> drm-memory-active-vram:
> 
> Etc.. I think it creates a consistent story.
> 
> Other than this, my two I think significant opens which haven't been
> addressed yet are:
> 
> 1)
> 
> Why do we want totals (not per region) when userspace can trivially
> aggregate if they want. What is the use case?
> 
> 2)
> 
> Current proposal limits the value to whole objects and fixates that by
> having it in the common code. If/when some driver is able to support sub-BO
> granularity they will need to opt out of the common printer at which point
> it may be less churn to start with a helper rather than mid-layer. Or maybe
> some drivers already support this, I don't know. Given how important VM BIND
> is I wouldn't be surprised.

I feel like for drivers using ttm we want a ttm helper which takes care of
the region printing in hopefully a standard way. And that could then also
take care of all kinds of partial binding and funny rules (like maybe
we want a standard vram region that adds up all the lmem regions on
intel, so that all dgpu have a common vram bucket that generic tools
understand?).

It does mean we walk the bo list twice, but *shrug*. People have been
complaining about procutils for decades, they're still horrible, I think
walking bo lists twice internally in the ttm case is going to be ok. If
not, it's internals, we can change them again.

Also I'd lean a lot more towards making ttm a helper and not putting that
into core, exactly because it's pretty clear we'll need more flexibility
when it comes to accurate stats for multi-region drivers.

But for a first "how much gpu space does this app use" across everything I
think this is a good enough starting point.
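
To make that shape a bit more concrete, a very rough sketch of such a
helper follows.  This is not an existing API; it assumes a TTM-based
GEM driver (GEM object embedded in the ttm_buffer_object), and the
system/vram bucketing plus the key names are illustrative only, given
the naming is still being debated in this thread:

#include <linux/sizes.h>
#include <drm/drm_file.h>
#include <drm/drm_gem.h>
#include <drm/drm_print.h>
#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_placement.h>

static void ttm_show_fdinfo_memory(struct drm_printer *p, struct drm_file *file)
{
        size_t sysmem = 0, vram = 0;
        struct drm_gem_object *obj;
        int id;

        spin_lock(&file->table_lock);
        idr_for_each_entry(&file->object_idr, obj, id) {
                struct ttm_buffer_object *bo =
                        container_of(obj, struct ttm_buffer_object, base);

                if (!bo->resource)
                        continue;       /* no backing store, nothing resident */

                switch (bo->resource->mem_type) {
                case TTM_PL_SYSTEM:
                case TTM_PL_TT:
                        sysmem += obj->size;
                        break;
                case TTM_PL_VRAM:
                        vram += obj->size;
                        break;
                }
        }
        spin_unlock(&file->table_lock);

        drm_printf(p, "drm-resident-memory-system:\t%zu KiB\n", sysmem / SZ_1K);
        drm_printf(p, "drm-resident-memory-vram:\t%zu KiB\n", vram / SZ_1K);
}
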
-Daniel

> 
> Regards,
> 
> Tvrtko
> 
> > > > And ttm could/should perhaps provide a helper to dump the region specific
> > > > version of this. Or we lift the concept of regions out of ttm a bit
> > > > higher, that's kinda needed for cgroups eventually anyway I think.
> > > > -Daniel
> > > > 
> > > > > 
> > > > > BR,
> > > > > -R
> > > > > 
> > > > > > Userspace can aggregate if it wishes to do so but kernel side should not.
> > > > > > 
> > > > > > > +
> > > > > > > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are purgeable.
> > > > > > > +
> > > > > > > +- drm-active-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are active on one or more rings.
> > > > > > > +
> > > > > > >    - drm-cycles-<str> <uint>
> > > > > > > 
> > > > > > >    Engine identifier string must be the same as the one specified in the
> > > > > > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > > > > > index 37dfaa6be560..46fdd843bb3a 100644
> > > > > > > --- a/drivers/gpu/drm/drm_file.c
> > > > > > > +++ b/drivers/gpu/drm/drm_file.c
> > > > > > > @@ -42,6 +42,7 @@
> > > > > > >    #include <drm/drm_client.h>
> > > > > > >    #include <drm/drm_drv.h>
> > > > > > >    #include <drm/drm_file.h>
> > > > > > > +#include <drm/drm_gem.h>
> > > > > > >    #include <drm/drm_print.h>
> > > > > > > 
> > > > > > >    #include "drm_crtc_internal.h"
> > > > > > > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> > > > > > >    }
> > > > > > >    EXPORT_SYMBOL(drm_send_event);
> > > > > > > 
> > > > > > > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > > > > > > +{
> > > > > > > +     const char *units[] = {"", " KiB", " MiB"};
> > > > > > > +     unsigned u;
> > > > > > > +
> > > > > > > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > > > > > > +             if (sz < SZ_1K)
> > > > > > > +                     break;
> > > > > > > +             sz = div_u64(sz, SZ_1K);
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > > > > > > +{
> > > > > > > +     struct drm_gem_object *obj;
> > > > > > > +     struct {
> > > > > > > +             size_t shared;
> > > > > > > +             size_t private;
> > > > > > > +             size_t resident;
> > > > > > > +             size_t purgeable;
> > > > > > > +             size_t active;
> > > > > > > +     } size = {0};
> > > > > > > +     bool has_status = false;
> > > > > > > +     int id;
> > > > > > > +
> > > > > > > +     spin_lock(&file->table_lock);
> > > > > > > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > > > > > > +             enum drm_gem_object_status s = 0;
> > > > > > > +
> > > > > > > +             if (obj->funcs && obj->funcs->status) {
> > > > > > > +                     s = obj->funcs->status(obj);
> > > > > > > +                     has_status = true;
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             if (obj->handle_count > 1) {
> > > > > > > +                     size.shared += obj->size;
> > > > > > > +             } else {
> > > > > > > +                     size.private += obj->size;
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > > > > > > +                     size.resident += obj->size;
> > > > > > > +             } else {
> > > > > > > +                     /* If already purged or not yet backed by pages, don't
> > > > > > > +                      * count it as purgeable:
> > > > > > > +                      */
> > > > > > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> > > > > > 
> > > > > > Side question - why couldn't resident buffers be purgeable? Did you mean
> > > > > > for the if branch check to be active here? But then it wouldn't make
> > > > > > sense for a driver to report active _and_ purgeable..
> > > > > > 
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > > > > > > +                     size.active += obj->size;
> > > > > > > +
> > > > > > > +                     /* If still active, don't count as purgeable: */
> > > > > > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> > > > > > 
> > > > > > Another side question - I guess this tidies a race in reporting? If so
> > > > > > not sure it matters given the stats are all rather approximate.
> > > > > > 
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > > > > > > +                     size.purgeable += obj->size;
> > > > > > > +     }
> > > > > > 
> > > > > > One concern I have here is that it is all based on obj->size. That is,
> > > > > > there is no provision for drivers to implement page level granularity.
> > > > > > So correct reporting in use cases such as VM BIND in the future wouldn't
> > > > > > work unless it was a driver hook to get almost all of the info above. At
> > > > > > which point common code is just a loop. TBF I don't know if any drivers
> > > > > > do sub obj->size backing store granularity today, but I think it is
> > > > > > sometimes to be sure of before proceeding.
> > > > > > 
> > > > > > Second concern is what I touched upon in the first reply block - if the
> > > > > > common code blindly loops over all objects then on discrete GPUs it
> > > > > > seems we get an 'aggregate' value here which is not what I think we
> > > > > > want. We rather want to have the ability for drivers to list stats per
> > > > > > individual memory region.
> > > > > > 
> > > > > > > +     spin_unlock(&file->table_lock);
> > > > > > > +
> > > > > > > +     print_size(p, "drm-shared-memory", size.shared);
> > > > > > > +     print_size(p, "drm-private-memory", size.private);
> > > > > > > +     print_size(p, "drm-active-memory", size.active);
> > > > > > > +
> > > > > > > +     if (has_status) {
> > > > > > > +             print_size(p, "drm-resident-memory", size.resident);
> > > > > > > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > > > > > > +     }
> > > > > > > +}
> > > > > > > +
> > > > > > >    /**
> > > > > > >     * drm_fop_show_fdinfo - helper for drm file fops
> > > > > > >     * @seq_file: output stream
> > > > > > > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> > > > > > > 
> > > > > > >        if (dev->driver->show_fdinfo)
> > > > > > >                dev->driver->show_fdinfo(&p, file);
> > > > > > > +
> > > > > > > +     print_memory_stats(&p, file);
> > > > > > >    }
> > > > > > >    EXPORT_SYMBOL(drm_fop_show_fdinfo);
> > > > > > > 
> > > > > > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > > > > > > index dfa995b787e1..e5b40084538f 100644
> > > > > > > --- a/include/drm/drm_file.h
> > > > > > > +++ b/include/drm/drm_file.h
> > > > > > > @@ -41,6 +41,7 @@
> > > > > > >    struct dma_fence;
> > > > > > >    struct drm_file;
> > > > > > >    struct drm_device;
> > > > > > > +struct drm_printer;
> > > > > > >    struct device;
> > > > > > >    struct file;
> > > > > > > 
> > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > > index 189fd618ca65..213917bb6b11 100644
> > > > > > > --- a/include/drm/drm_gem.h
> > > > > > > +++ b/include/drm/drm_gem.h
> > > > > > > @@ -42,6 +42,14 @@
> > > > > > >    struct iosys_map;
> > > > > > >    struct drm_gem_object;
> > > > > > > 
> > > > > > > +/**
> > > > > > > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > > > > > > + */
> > > > > > > +enum drm_gem_object_status {
> > > > > > > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > > > > > > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > > > > > > +};
> > > > > > > +
> > > > > > >    /**
> > > > > > >     * struct drm_gem_object_funcs - GEM object functions
> > > > > > >     */
> > > > > > > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> > > > > > >         */
> > > > > > >        int (*evict)(struct drm_gem_object *obj);
> > > > > > > 
> > > > > > > +     /**
> > > > > > > +      * @status:
> > > > > > > +      *
> > > > > > > +      * The optional status callback can return additional object state
> > > > > > > +      * which determines which stats the object is counted against.  The
> > > > > > > +      * callback is called under table_lock.  Racing against object status
> > > > > > > +      * change is "harmless", and the callback can expect to not race
> > > > > > > +      * against object destruction.
> > > > > > > +      */
> > > > > > > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> > > > > > 
> > > > > > Does this needs to be in object funcs and couldn't be consolidated to
> > > > > > driver level?
> > > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Tvrtko
> > > > > > 
> > > > > > > +
> > > > > > >        /**
> > > > > > >         * @vm_ops:
> > > > > > >         *
> > > > 
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
@ 2023-04-13 13:27                 ` Daniel Vetter
  0 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-13 13:27 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno

On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> 
> On 12/04/2023 20:18, Daniel Vetter wrote:
> > On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> > > On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > 
> > > > On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> > > > > On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> > > > > <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > 
> > > > > > 
> > > > > > On 11/04/2023 23:56, Rob Clark wrote:
> > > > > > > From: Rob Clark <robdclark@chromium.org>
> > > > > > > 
> > > > > > > Add support to dump GEM stats to fdinfo.
> > > > > > > 
> > > > > > > v2: Fix typos, change size units to match docs, use div_u64
> > > > > > > v3: Do it in core
> > > > > > > 
> > > > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > > > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > > > > > > ---
> > > > > > >    Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> > > > > > >    drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> > > > > > >    include/drm/drm_file.h                |  1 +
> > > > > > >    include/drm/drm_gem.h                 | 19 +++++++
> > > > > > >    4 files changed, 117 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > > > > > > index b46327356e80..b5e7802532ed 100644
> > > > > > > --- a/Documentation/gpu/drm-usage-stats.rst
> > > > > > > +++ b/Documentation/gpu/drm-usage-stats.rst
> > > > > > > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > > > > > >    Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > > > > > >    indicating kibi- or mebi-bytes.
> > > > > > > 
> > > > > > > +- drm-shared-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are shared with another file (ie. have more
> > > > > > > +than a single handle).
> > > > > > > +
> > > > > > > +- drm-private-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are not shared with another file.
> > > > > > > +
> > > > > > > +- drm-resident-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are resident in system memory.
> > > > > > 
> > > > > > I think this naming maybe does not work best with the existing
> > > > > > drm-memory-<region> keys.
> > > > > 
> > > > > Actually, it was very deliberate not to conflict with the existing
> > > > > drm-memory-<region> keys ;-)
> > > > > 
> > > > > I wouldn't have preferred drm-memory-{active,resident,...} but it
> > > > > could be mis-parsed by existing userspace so my hands were a bit tied.
> > > > > 
> > > > > > How about introduce the concept of a memory region from the start and
> > > > > > use naming similar like we do for engines?
> > > > > > 
> > > > > > drm-memory-$CATEGORY-$REGION: ...
> > > > > > 
> > > > > > Then we document a bunch of categories and their semantics, for instance:
> > > > > > 
> > > > > > 'size' - All reachable objects
> > > > > > 'shared' - Subset of 'size' with handle_count > 1
> > > > > > 'resident' - Objects with backing store
> > > > > > 'active' - Objects in use, subset of resident
> > > > > > 'purgeable' - Or inactive? Subset of resident.
> > > > > > 
> > > > > > We keep the same semantics as with process memory accounting (if I got
> > > > > > it right) which could be desirable for a simplified mental model.
> > > > > > 
> > > > > > (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > > > > > correctly captured this in the first round it should be equivalent to
> > > > > > 'resident' above. In any case we can document no category is equal to
> > > > > > which category, and at most one of the two must be output.)
> > > > > > 
> > > > > > Region names we at most partially standardize. Like we could say
> > > > > > 'system' is to be used where backing store is system RAM and others are
> > > > > > driver defined.
> > > > > > 
> > > > > > Then discrete GPUs could emit N sets of key-values, one for each memory
> > > > > > region they support.
> > > > > > 
> > > > > > I think this all also works for objects which can be migrated between
> > > > > > memory regions. 'Size' accounts them against all regions while for
> > > > > > 'resident' they only appear in the region of their current placement, etc.
> > > > > 
> > > > > I'm not too sure how to rectify different memory regions with this,
> > > > > since drm core doesn't really know about the driver's memory regions.
> > > > > Perhaps we can go back to this being a helper and drivers with vram
> > > > > just don't use the helper?  Or??
> > > > 
> > > > I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> > > > all works out reasonably consistently?
> > > 
> > > That is basically what we have now.  I could append -system to each to
> > > make things easier to add vram/etc (from a uabi standpoint)..
> > 
> > What you have isn't really -system, but everything. So doesn't really make
> > sense to me to mark this -system, it's only really true for integrated (if
> > they don't have stolen or something like that).
> > 
> > Also my comment was more in reply to Tvrtko's suggestion.
> 
> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> with the current drm-memory-$REGION by extending, rather than creating
> confusion with different order of key name components.

Oh my comment was pretty much just bikeshed, in case someone creates a
$REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
So $CATEGORY before the -memory.

Otoh I don't think that'll happen, so I guess we can go with whatever more
folks like :-) I don't really care much personally.

> AMD currently has (among others) drm-memory-vram, which we could define in
> the spec maps to category X, if category component is not present.
> 
> Some examples:
> 
> drm-memory-resident-system:
> drm-memory-size-lmem0:
> drm-memory-active-vram:
> 
> Etc.. I think it creates a consistent story.
> 
> Other than this, my two I think significant opens which haven't been
> addressed yet are:
> 
> 1)
> 
> Why do we want totals (not per region) when userspace can trivially
> aggregate if they want. What is the use case?
> 
> 2)
> 
> Current proposal limits the value to whole objects and fixates that by
> having it in the common code. If/when some driver is able to support sub-BO
> granularity they will need to opt out of the common printer at which point
> it may be less churn to start with a helper rather than mid-layer. Or maybe
> some drivers already support this, I don't know. Given how important VM BIND
> is I wouldn't be surprised.

I feel like for drivers using ttm we want a ttm helper which takes care of
the region printing in hopefully a standard way. And that could then also
take care of all kinds of partial binding and funny rules (like maybe
we want a standard vram region that adds up all the lmem regions on
intel, so that all dgpu have a common vram bucket that generic tools
understand?).

It does mean we walk the bo list twice, but *shrug*. People have been
complaining about procutils for decades, they're still horrible, I think
walking bo lists twice internally in the ttm case is going to be ok. If
not, it's internals, we can change them again.

Also I'd lean a lot more towards making ttm a helper and not putting that
into core, exactly because it's pretty clear we'll need more flexibility
when it comes to accurate stats for multi-region drivers.

But for a first "how much gpu space does this app use" across everything I
think this is a good enough starting point.
-Daniel

> 
> Regards,
> 
> Tvrtko
> 
> > > > And ttm could/should perhaps provide a helper to dump the region specific
> > > > version of this. Or we lift the concept of regions out of ttm a bit
> > > > higher, that's kinda needed for cgroups eventually anyway I think.
> > > > -Daniel
> > > > 
> > > > > 
> > > > > BR,
> > > > > -R
> > > > > 
> > > > > > Userspace can aggregate if it wishes to do so but kernel side should not.
> > > > > > 
> > > > > > > +
> > > > > > > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are purgeable.
> > > > > > > +
> > > > > > > +- drm-active-memory: <uint> [KiB|MiB]
> > > > > > > +
> > > > > > > +The total size of buffers that are active on one or more rings.
> > > > > > > +
> > > > > > >    - drm-cycles-<str> <uint>
> > > > > > > 
> > > > > > >    Engine identifier string must be the same as the one specified in the
> > > > > > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > > > > > index 37dfaa6be560..46fdd843bb3a 100644
> > > > > > > --- a/drivers/gpu/drm/drm_file.c
> > > > > > > +++ b/drivers/gpu/drm/drm_file.c
> > > > > > > @@ -42,6 +42,7 @@
> > > > > > >    #include <drm/drm_client.h>
> > > > > > >    #include <drm/drm_drv.h>
> > > > > > >    #include <drm/drm_file.h>
> > > > > > > +#include <drm/drm_gem.h>
> > > > > > >    #include <drm/drm_print.h>
> > > > > > > 
> > > > > > >    #include "drm_crtc_internal.h"
> > > > > > > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> > > > > > >    }
> > > > > > >    EXPORT_SYMBOL(drm_send_event);
> > > > > > > 
> > > > > > > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > > > > > > +{
> > > > > > > +     const char *units[] = {"", " KiB", " MiB"};
> > > > > > > +     unsigned u;
> > > > > > > +
> > > > > > > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > > > > > > +             if (sz < SZ_1K)
> > > > > > > +                     break;
> > > > > > > +             sz = div_u64(sz, SZ_1K);
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > > > > > > +{
> > > > > > > +     struct drm_gem_object *obj;
> > > > > > > +     struct {
> > > > > > > +             size_t shared;
> > > > > > > +             size_t private;
> > > > > > > +             size_t resident;
> > > > > > > +             size_t purgeable;
> > > > > > > +             size_t active;
> > > > > > > +     } size = {0};
> > > > > > > +     bool has_status = false;
> > > > > > > +     int id;
> > > > > > > +
> > > > > > > +     spin_lock(&file->table_lock);
> > > > > > > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > > > > > > +             enum drm_gem_object_status s = 0;
> > > > > > > +
> > > > > > > +             if (obj->funcs && obj->funcs->status) {
> > > > > > > +                     s = obj->funcs->status(obj);
> > > > > > > +                     has_status = true;
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             if (obj->handle_count > 1) {
> > > > > > > +                     size.shared += obj->size;
> > > > > > > +             } else {
> > > > > > > +                     size.private += obj->size;
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > > > > > > +                     size.resident += obj->size;
> > > > > > > +             } else {
> > > > > > > +                     /* If already purged or not yet backed by pages, don't
> > > > > > > +                      * count it as purgeable:
> > > > > > > +                      */
> > > > > > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> > > > > > 
> > > > > > Side question - why couldn't resident buffers be purgeable? Did you mean
> > > > > > for the if branch check to be active here? But then it wouldn't make
> > > > > > sense for a driver to report active _and_ purgeable..
> > > > > > 
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > > > > > > +                     size.active += obj->size;
> > > > > > > +
> > > > > > > +                     /* If still active, don't count as purgeable: */
> > > > > > > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> > > > > > 
> > > > > > Another side question - I guess this tidies a race in reporting? If so
> > > > > > not sure it matters given the stats are all rather approximate.
> > > > > > 
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > > > > > > +                     size.purgeable += obj->size;
> > > > > > > +     }
> > > > > > 
> > > > > > One concern I have here is that it is all based on obj->size. That is,
> > > > > > there is no provision for drivers to implement page level granularity.
> > > > > > So correct reporting in use cases such as VM BIND in the future wouldn't
> > > > > > work unless it was a driver hook to get almost all of the info above. At
> > > > > > which point common code is just a loop. TBF I don't know if any drivers
> > > > > > do sub obj->size backing store granularity today, but I think it is
> > > > > > sometimes to be sure of before proceeding.
> > > > > > 
> > > > > > Second concern is what I touched upon in the first reply block - if the
> > > > > > common code blindly loops over all objects then on discrete GPUs it
> > > > > > seems we get an 'aggregate' value here which is not what I think we
> > > > > > want. We rather want to have the ability for drivers to list stats per
> > > > > > individual memory region.
> > > > > > 
> > > > > > > +     spin_unlock(&file->table_lock);
> > > > > > > +
> > > > > > > +     print_size(p, "drm-shared-memory", size.shared);
> > > > > > > +     print_size(p, "drm-private-memory", size.private);
> > > > > > > +     print_size(p, "drm-active-memory", size.active);
> > > > > > > +
> > > > > > > +     if (has_status) {
> > > > > > > +             print_size(p, "drm-resident-memory", size.resident);
> > > > > > > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > > > > > > +     }
> > > > > > > +}
> > > > > > > +
> > > > > > >    /**
> > > > > > >     * drm_fop_show_fdinfo - helper for drm file fops
> > > > > > >     * @seq_file: output stream
> > > > > > > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> > > > > > > 
> > > > > > >        if (dev->driver->show_fdinfo)
> > > > > > >                dev->driver->show_fdinfo(&p, file);
> > > > > > > +
> > > > > > > +     print_memory_stats(&p, file);
> > > > > > >    }
> > > > > > >    EXPORT_SYMBOL(drm_fop_show_fdinfo);
> > > > > > > 
> > > > > > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > > > > > > index dfa995b787e1..e5b40084538f 100644
> > > > > > > --- a/include/drm/drm_file.h
> > > > > > > +++ b/include/drm/drm_file.h
> > > > > > > @@ -41,6 +41,7 @@
> > > > > > >    struct dma_fence;
> > > > > > >    struct drm_file;
> > > > > > >    struct drm_device;
> > > > > > > +struct drm_printer;
> > > > > > >    struct device;
> > > > > > >    struct file;
> > > > > > > 
> > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > > index 189fd618ca65..213917bb6b11 100644
> > > > > > > --- a/include/drm/drm_gem.h
> > > > > > > +++ b/include/drm/drm_gem.h
> > > > > > > @@ -42,6 +42,14 @@
> > > > > > >    struct iosys_map;
> > > > > > >    struct drm_gem_object;
> > > > > > > 
> > > > > > > +/**
> > > > > > > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > > > > > > + */
> > > > > > > +enum drm_gem_object_status {
> > > > > > > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > > > > > > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > > > > > > +};
> > > > > > > +
> > > > > > >    /**
> > > > > > >     * struct drm_gem_object_funcs - GEM object functions
> > > > > > >     */
> > > > > > > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> > > > > > >         */
> > > > > > >        int (*evict)(struct drm_gem_object *obj);
> > > > > > > 
> > > > > > > +     /**
> > > > > > > +      * @status:
> > > > > > > +      *
> > > > > > > +      * The optional status callback can return additional object state
> > > > > > > +      * which determines which stats the object is counted against.  The
> > > > > > > +      * callback is called under table_lock.  Racing against object status
> > > > > > > +      * change is "harmless", and the callback can expect to not race
> > > > > > > +      * against object destruction.
> > > > > > > +      */
> > > > > > > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> > > > > > 
> > > > > > Does this needs to be in object funcs and couldn't be consolidated to
> > > > > > driver level?
> > > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Tvrtko
> > > > > > 
> > > > > > > +
> > > > > > >        /**
> > > > > > >         * @vm_ops:
> > > > > > >         *
> > > > 
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-13 12:58             ` Tvrtko Ursulin
@ 2023-04-13 15:47                 ` Rob Clark
  2023-04-13 15:47                 ` Rob Clark
  1 sibling, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-13 15:47 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list

On Thu, Apr 13, 2023 at 5:58 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 12/04/2023 20:18, Daniel Vetter wrote:
> > On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>
> >>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> >>>> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>
> >>>>>
> >>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>
> >>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>
> >>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>> v3: Do it in core
> >>>>>>
> >>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> >>>>>> ---
> >>>>>>    Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>    drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >>>>>>    include/drm/drm_file.h                |  1 +
> >>>>>>    include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>    4 files changed, 117 insertions(+)
> >>>>>>
> >>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>    Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>    indicating kibi- or mebi-bytes.
> >>>>>>
> >>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>> +than a single handle).
> >>>>>> +
> >>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are not shared with another file.
> >>>>>> +
> >>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are resident in system memory.
> >>>>>
> >>>>> I think this naming maybe does not work best with the existing
> >>>>> drm-memory-<region> keys.
> >>>>
> >>>> Actually, it was very deliberate not to conflict with the existing
> >>>> drm-memory-<region> keys ;-)
> >>>>
> >>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
> >>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> >>>>
> >>>>> How about introduce the concept of a memory region from the start and
> >>>>> use naming similar like we do for engines?
> >>>>>
> >>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>
> >>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>
> >>>>> 'size' - All reachable objects
> >>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>> 'resident' - Objects with backing store
> >>>>> 'active' - Objects in use, subset of resident
> >>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>
> >>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>> it right) which could be desirable for a simplified mental model.
> >>>>>
> >>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>> correctly captured this in the first round it should be equivalent to
> >>>>> 'resident' above. In any case we can document no category is equal to
> >>>>> which category, and at most one of the two must be output.)
> >>>>>
> >>>>> Region names we at most partially standardize. Like we could say
> >>>>> 'system' is to be used where backing store is system RAM and others are
> >>>>> driver defined.
> >>>>>
> >>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>> region they support.
> >>>>>
> >>>>> I think this all also works for objects which can be migrated between
> >>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>
> >>>> I'm not too sure how to rectify different memory regions with this,
> >>>> since drm core doesn't really know about the driver's memory regions.
> >>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>> just don't use the helper?  Or??
> >>>
> >>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>> all works out reasonably consistently?
> >>
> >> That is basically what we have now.  I could append -system to each to
> >> make things easier to add vram/etc (from a uabi standpoint)..
> >
> > What you have isn't really -system, but everything. So doesn't really make
> > sense to me to mark this -system, it's only really true for integrated (if
> > they don't have stolen or something like that).
> >
> > Also my comment was more in reply to Tvrtko's suggestion.
>
> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think
> aligns with the current drm-memory-$REGION by extending, rather than
> creating confusion with different order of key name components.
>
> AMD currently has (among others) drm-memory-vram, which we could define
> in the spec maps to category X, if category component is not present.
>
> Some examples:
>
> drm-memory-resident-system:
> drm-memory-size-lmem0:
> drm-memory-active-vram:
>
> Etc.. I think it creates a consistent story.

It does read more naturally.. but there is a problem here (and the
reason I didn't take this route),

```
- drm-memory-<str>: <uint> [KiB|MiB]

Each possible memory type which can be used to store buffer objects by the
GPU in question shall be given a stable and unique name to be returned as the
string here.
```

so, drm-memory-resident-system gets parsed as the "resident-system"
memory type by existing userspace :-(

This is why we are forced to use drm-$CATEGORY-memory...
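
To spell out the parsing problem, a parser written against the wording
above ends up doing roughly this (just a sketch, not any particular
tool's code):

```
/* Sketch only: everything after "drm-memory-" is taken as the region name. */
#include <stdio.h>
#include <string.h>

static void parse_key(const char *key)
{
	const char *prefix = "drm-memory-";

	if (!strncmp(key, prefix, strlen(prefix)))
		printf("region: %s\n", key + strlen(prefix));
}

int main(void)
{
	parse_key("drm-memory-system");           /* region: system */
	parse_key("drm-memory-resident-system");  /* region: resident-system, oops */
	return 0;
}
```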

BR,
-R

> Other than this, my two I think significant opens which haven't been
> addressed yet are:
>
> 1)
>
> Why do we want totals (not per region) when userspace can trivially
> aggregate if they want. What is the use case?
>
> 2)
>
> Current proposal limits the value to whole objects and fixates that by
> having it in the common code. If/when some driver is able to support
> sub-BO granularity they will need to opt out of the common printer at
> which point it may be less churn to start with a helper rather than
> mid-layer. Or maybe some drivers already support this, I don't know.
> Given how important VM BIND is I wouldn't be surprised.
>
> Regards,
>
> Tvrtko
>
> >>> And ttm could/should perhaps provide a helper to dump the region specific
> >>> version of this. Or we lift the concept of regions out of ttm a bit
> >>> higher, that's kinda needed for cgroups eventually anyway I think.
> >>> -Daniel
> >>>
> >>>>
> >>>> BR,
> >>>> -R
> >>>>
> >>>>> Userspace can aggregate if it wishes to do so but kernel side should not.
> >>>>>
> >>>>>> +
> >>>>>> +- drm-purgeable-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are purgeable.
> >>>>>> +
> >>>>>> +- drm-active-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are active on one or more rings.
> >>>>>> +
> >>>>>>    - drm-cycles-<str> <uint>
> >>>>>>
> >>>>>>    Engine identifier string must be the same as the one specified in the
> >>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >>>>>> index 37dfaa6be560..46fdd843bb3a 100644
> >>>>>> --- a/drivers/gpu/drm/drm_file.c
> >>>>>> +++ b/drivers/gpu/drm/drm_file.c
> >>>>>> @@ -42,6 +42,7 @@
> >>>>>>    #include <drm/drm_client.h>
> >>>>>>    #include <drm/drm_drv.h>
> >>>>>>    #include <drm/drm_file.h>
> >>>>>> +#include <drm/drm_gem.h>
> >>>>>>    #include <drm/drm_print.h>
> >>>>>>
> >>>>>>    #include "drm_crtc_internal.h"
> >>>>>> @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> >>>>>>    }
> >>>>>>    EXPORT_SYMBOL(drm_send_event);
> >>>>>>
> >>>>>> +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> >>>>>> +{
> >>>>>> +     const char *units[] = {"", " KiB", " MiB"};
> >>>>>> +     unsigned u;
> >>>>>> +
> >>>>>> +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> >>>>>> +             if (sz < SZ_1K)
> >>>>>> +                     break;
> >>>>>> +             sz = div_u64(sz, SZ_1K);
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> >>>>>> +}
> >>>>>> +
> >>>>>> +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> >>>>>> +{
> >>>>>> +     struct drm_gem_object *obj;
> >>>>>> +     struct {
> >>>>>> +             size_t shared;
> >>>>>> +             size_t private;
> >>>>>> +             size_t resident;
> >>>>>> +             size_t purgeable;
> >>>>>> +             size_t active;
> >>>>>> +     } size = {0};
> >>>>>> +     bool has_status = false;
> >>>>>> +     int id;
> >>>>>> +
> >>>>>> +     spin_lock(&file->table_lock);
> >>>>>> +     idr_for_each_entry (&file->object_idr, obj, id) {
> >>>>>> +             enum drm_gem_object_status s = 0;
> >>>>>> +
> >>>>>> +             if (obj->funcs && obj->funcs->status) {
> >>>>>> +                     s = obj->funcs->status(obj);
> >>>>>> +                     has_status = true;
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             if (obj->handle_count > 1) {
> >>>>>> +                     size.shared += obj->size;
> >>>>>> +             } else {
> >>>>>> +                     size.private += obj->size;
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> >>>>>> +                     size.resident += obj->size;
> >>>>>> +             } else {
> >>>>>> +                     /* If already purged or not yet backed by pages, don't
> >>>>>> +                      * count it as purgeable:
> >>>>>> +                      */
> >>>>>> +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> >>>>>
> >>>>> Side question - why couldn't resident buffers be purgeable? Did you mean
> >>>>> for the if branch check to be active here? But then it wouldn't make
> >>>>> sense for a driver to report active _and_ purgeable..
> >>>>>
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> >>>>>> +                     size.active += obj->size;
> >>>>>> +
> >>>>>> +                     /* If still active, don't count as purgeable: */
> >>>>>> +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> >>>>>
> >>>>> Another side question - I guess this tidies a race in reporting? If so
> >>>>> not sure it matters given the stats are all rather approximate.
> >>>>>
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> >>>>>> +                     size.purgeable += obj->size;
> >>>>>> +     }
> >>>>>
> >>>>> One concern I have here is that it is all based on obj->size. That is,
> >>>>> there is no provision for drivers to implement page level granularity.
> >>>>> So correct reporting in use cases such as VM BIND in the future wouldn't
> >>>>> work unless it was a driver hook to get almost all of the info above. At
> >>>>> which point common code is just a loop. TBF I don't know if any drivers
> >>>>> do sub obj->size backing store granularity today, but I think it is
> >>>>> something to be sure of before proceeding.
> >>>>>
> >>>>> Second concern is what I touched upon in the first reply block - if the
> >>>>> common code blindly loops over all objects then on discrete GPUs it
> >>>>> seems we get an 'aggregate' value here which is not what I think we
> >>>>> want. We rather want to have the ability for drivers to list stats per
> >>>>> individual memory region.
> >>>>>
> >>>>>> +     spin_unlock(&file->table_lock);
> >>>>>> +
> >>>>>> +     print_size(p, "drm-shared-memory", size.shared);
> >>>>>> +     print_size(p, "drm-private-memory", size.private);
> >>>>>> +     print_size(p, "drm-active-memory", size.active);
> >>>>>> +
> >>>>>> +     if (has_status) {
> >>>>>> +             print_size(p, "drm-resident-memory", size.resident);
> >>>>>> +             print_size(p, "drm-purgeable-memory", size.purgeable);
> >>>>>> +     }
> >>>>>> +}
> >>>>>> +
> >>>>>>    /**
> >>>>>>     * drm_fop_show_fdinfo - helper for drm file fops
> >>>>>>     * @seq_file: output stream
> >>>>>> @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >>>>>>
> >>>>>>        if (dev->driver->show_fdinfo)
> >>>>>>                dev->driver->show_fdinfo(&p, file);
> >>>>>> +
> >>>>>> +     print_memory_stats(&p, file);
> >>>>>>    }
> >>>>>>    EXPORT_SYMBOL(drm_fop_show_fdinfo);
> >>>>>>
> >>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> >>>>>> index dfa995b787e1..e5b40084538f 100644
> >>>>>> --- a/include/drm/drm_file.h
> >>>>>> +++ b/include/drm/drm_file.h
> >>>>>> @@ -41,6 +41,7 @@
> >>>>>>    struct dma_fence;
> >>>>>>    struct drm_file;
> >>>>>>    struct drm_device;
> >>>>>> +struct drm_printer;
> >>>>>>    struct device;
> >>>>>>    struct file;
> >>>>>>
> >>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> >>>>>> index 189fd618ca65..213917bb6b11 100644
> >>>>>> --- a/include/drm/drm_gem.h
> >>>>>> +++ b/include/drm/drm_gem.h
> >>>>>> @@ -42,6 +42,14 @@
> >>>>>>    struct iosys_map;
> >>>>>>    struct drm_gem_object;
> >>>>>>
> >>>>>> +/**
> >>>>>> + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> >>>>>> + */
> >>>>>> +enum drm_gem_object_status {
> >>>>>> +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> >>>>>> +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> >>>>>> +};
> >>>>>> +
> >>>>>>    /**
> >>>>>>     * struct drm_gem_object_funcs - GEM object functions
> >>>>>>     */
> >>>>>> @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> >>>>>>         */
> >>>>>>        int (*evict)(struct drm_gem_object *obj);
> >>>>>>
> >>>>>> +     /**
> >>>>>> +      * @status:
> >>>>>> +      *
> >>>>>> +      * The optional status callback can return additional object state
> >>>>>> +      * which determines which stats the object is counted against.  The
> >>>>>> +      * callback is called under table_lock.  Racing against object status
> >>>>>> +      * change is "harmless", and the callback can expect to not race
> >>>>>> +      * against object destruction.
> >>>>>> +      */
> >>>>>> +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> >>>>>
> >>>>> Does this needs to be in object funcs and couldn't be consolidated to
> >>>>> driver level?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Tvrtko
> >>>>>
> >>>>>> +
> >>>>>>        /**
> >>>>>>         * @vm_ops:
> >>>>>>         *
> >>>
> >>> --
> >>> Daniel Vetter
> >>> Software Engineer, Intel Corporation
> >>> http://blog.ffwll.ch
> >

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
@ 2023-04-13 15:47                 ` Rob Clark
  0 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-13 15:47 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno

On Thu, Apr 13, 2023 at 5:58 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 12/04/2023 20:18, Daniel Vetter wrote:
> > On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>
> >>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> >>>> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>
> >>>>>
> >>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>
> >>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>
> >>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>> v3: Do it in core
> >>>>>>
> >>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> >>>>>> ---
> >>>>>>    Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>    drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >>>>>>    include/drm/drm_file.h                |  1 +
> >>>>>>    include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>    4 files changed, 117 insertions(+)
> >>>>>>
> >>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>    Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>    indicating kibi- or mebi-bytes.
> >>>>>>
> >>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>> +than a single handle).
> >>>>>> +
> >>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are not shared with another file.
> >>>>>> +
> >>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are resident in system memory.
> >>>>>
> >>>>> I think this naming maybe does not work best with the existing
> >>>>> drm-memory-<region> keys.
> >>>>
> >>>> Actually, it was very deliberate not to conflict with the existing
> >>>> drm-memory-<region> keys ;-)
> >>>>
> >>>> I would have preferred drm-memory-{active,resident,...} but it
> >>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> >>>>
> >>>>> How about introduce the concept of a memory region from the start and
> >>>>> use naming similar like we do for engines?
> >>>>>
> >>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>
> >>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>
> >>>>> 'size' - All reachable objects
> >>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>> 'resident' - Objects with backing store
> >>>>> 'active' - Objects in use, subset of resident
> >>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>
> >>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>> it right) which could be desirable for a simplified mental model.
> >>>>>
> >>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>> correctly captured this in the first round it should be equivalent to
> >>>>> 'resident' above. In any case we can document no category is equal to
> >>>>> which category, and at most one of the two must be output.)
> >>>>>
> >>>>> Region names we at most partially standardize. Like we could say
> >>>>> 'system' is to be used where backing store is system RAM and others are
> >>>>> driver defined.
> >>>>>
> >>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>> region they support.
> >>>>>
> >>>>> I think this all also works for objects which can be migrated between
> >>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>
> >>>> I'm not too sure how to rectify different memory regions with this,
> >>>> since drm core doesn't really know about the driver's memory regions.
> >>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>> just don't use the helper?  Or??
> >>>
> >>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>> all works out reasonably consistently?
> >>
> >> That is basically what we have now.  I could append -system to each to
> >> make things easier to add vram/etc (from a uabi standpoint)..
> >
> > What you have isn't really -system, but everything. So doesn't really make
> > sense to me to mark this -system, it's only really true for integrated (if
> > they don't have stolen or something like that).
> >
> > Also my comment was more in reply to Tvrtko's suggestion.
>
> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think
> aligns with the current drm-memory-$REGION by extending, rather than
> creating confusion with different order of key name components.
>
> AMD currently has (among others) drm-memory-vram, which we could define
> in the spec maps to category X, if category component is not present.
>
> Some examples:
>
> drm-memory-resident-system:
> drm-memory-size-lmem0:
> drm-memory-active-vram:
>
> Etc.. I think it creates a consistent story.

It does read more naturally.. but there is a problem here (and the
reason I didn't take this route),

```
- drm-memory-<str>: <uint> [KiB|MiB]

Each possible memory type which can be used to store buffer objects by the
GPU in question shall be given a stable and unique name to be returned as the
string here.
```

so, drm-memory-resident-system gets parsed as the "resident-system"
memory type by existing userspace :-(

This is why we are forced to use drm-$CATEGORY-memory...

BR,
-R

> Other than this, my two I think significant opens which haven't been
> addressed yet are:
>
> 1)
>
> Why do we want totals (not per region) when userspace can trivially
> aggregate if they want. What is the use case?
>
> 2)
>
> Current proposal limits the value to whole objects and fixates that by
> having it in the common code. If/when some driver is able to support
> sub-BO granularity they will need to opt out of the common printer at
> which point it may be less churn to start with a helper rather than
> mid-layer. Or maybe some drivers already support this, I don't know.
> Given how important VM BIND is I wouldn't be surprised.
>
> Regards,
>
> Tvrtko
>
> >>> And ttm could/should perhaps provide a helper to dump the region specific
> >>> version of this. Or we lift the concept of regions out of ttm a bit
> >>> higher, that's kinda needed for cgroups eventually anyway I think.
> >>> -Daniel
> >>>
> >>>>
> >>>> BR,
> >>>> -R
> >>>>
> >>>>> Userspace can aggregate if it wishes to do so but kernel side should not.
> >>>>>
> >>>>>> +
> >>>>>> +- drm-purgeable-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are purgeable.
> >>>>>> +
> >>>>>> +- drm-active-memory: <uint> [KiB|MiB]
> >>>>>> +
> >>>>>> +The total size of buffers that are active on one or more rings.
> >>>>>> +
> >>>>>>    - drm-cycles-<str> <uint>
> >>>>>>
> >>>>>>    Engine identifier string must be the same as the one specified in the
> >>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >>>>>> index 37dfaa6be560..46fdd843bb3a 100644
> >>>>>> --- a/drivers/gpu/drm/drm_file.c
> >>>>>> +++ b/drivers/gpu/drm/drm_file.c
> >>>>>> @@ -42,6 +42,7 @@
> >>>>>>    #include <drm/drm_client.h>
> >>>>>>    #include <drm/drm_drv.h>
> >>>>>>    #include <drm/drm_file.h>
> >>>>>> +#include <drm/drm_gem.h>
> >>>>>>    #include <drm/drm_print.h>
> >>>>>>
> >>>>>>    #include "drm_crtc_internal.h"
> >>>>>> @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> >>>>>>    }
> >>>>>>    EXPORT_SYMBOL(drm_send_event);
> >>>>>>
> >>>>>> +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> >>>>>> +{
> >>>>>> +     const char *units[] = {"", " KiB", " MiB"};
> >>>>>> +     unsigned u;
> >>>>>> +
> >>>>>> +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> >>>>>> +             if (sz < SZ_1K)
> >>>>>> +                     break;
> >>>>>> +             sz = div_u64(sz, SZ_1K);
> >>>>>> +     }
> >>>>>> +
> >>>>>> +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> >>>>>> +}
> >>>>>> +
> >>>>>> +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> >>>>>> +{
> >>>>>> +     struct drm_gem_object *obj;
> >>>>>> +     struct {
> >>>>>> +             size_t shared;
> >>>>>> +             size_t private;
> >>>>>> +             size_t resident;
> >>>>>> +             size_t purgeable;
> >>>>>> +             size_t active;
> >>>>>> +     } size = {0};
> >>>>>> +     bool has_status = false;
> >>>>>> +     int id;
> >>>>>> +
> >>>>>> +     spin_lock(&file->table_lock);
> >>>>>> +     idr_for_each_entry (&file->object_idr, obj, id) {
> >>>>>> +             enum drm_gem_object_status s = 0;
> >>>>>> +
> >>>>>> +             if (obj->funcs && obj->funcs->status) {
> >>>>>> +                     s = obj->funcs->status(obj);
> >>>>>> +                     has_status = true;
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             if (obj->handle_count > 1) {
> >>>>>> +                     size.shared += obj->size;
> >>>>>> +             } else {
> >>>>>> +                     size.private += obj->size;
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> >>>>>> +                     size.resident += obj->size;
> >>>>>> +             } else {
> >>>>>> +                     /* If already purged or not yet backed by pages, don't
> >>>>>> +                      * count it as purgeable:
> >>>>>> +                      */
> >>>>>> +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> >>>>>
> >>>>> Side question - why couldn't resident buffers be purgeable? Did you mean
> >>>>> for the if branch check to be active here? But then it wouldn't make
> >>>>> sense for a driver to report active _and_ purgeable..
> >>>>>
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> >>>>>> +                     size.active += obj->size;
> >>>>>> +
> >>>>>> +                     /* If still active, don't count as purgeable: */
> >>>>>> +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
> >>>>>
> >>>>> Another side question - I guess this tidies a race in reporting? If so
> >>>>> not sure it matters given the stats are all rather approximate.
> >>>>>
> >>>>>> +             }
> >>>>>> +
> >>>>>> +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> >>>>>> +                     size.purgeable += obj->size;
> >>>>>> +     }
> >>>>>
> >>>>> One concern I have here is that it is all based on obj->size. That is,
> >>>>> there is no provision for drivers to implement page level granularity.
> >>>>> So correct reporting in use cases such as VM BIND in the future wouldn't
> >>>>> work unless it was a driver hook to get almost all of the info above. At
> >>>>> which point common code is just a loop. TBF I don't know if any drivers
> >>>>> do sub obj->size backing store granularity today, but I think it is
> >>>>> something to be sure of before proceeding.
> >>>>>
> >>>>> Second concern is what I touched upon in the first reply block - if the
> >>>>> common code blindly loops over all objects then on discrete GPUs it
> >>>>> seems we get an 'aggregate' value here which is not what I think we
> >>>>> want. We rather want to have the ability for drivers to list stats per
> >>>>> individual memory region.
> >>>>>
> >>>>>> +     spin_unlock(&file->table_lock);
> >>>>>> +
> >>>>>> +     print_size(p, "drm-shared-memory", size.shared);
> >>>>>> +     print_size(p, "drm-private-memory", size.private);
> >>>>>> +     print_size(p, "drm-active-memory", size.active);
> >>>>>> +
> >>>>>> +     if (has_status) {
> >>>>>> +             print_size(p, "drm-resident-memory", size.resident);
> >>>>>> +             print_size(p, "drm-purgeable-memory", size.purgeable);
> >>>>>> +     }
> >>>>>> +}
> >>>>>> +
> >>>>>>    /**
> >>>>>>     * drm_fop_show_fdinfo - helper for drm file fops
> >>>>>>     * @seq_file: output stream
> >>>>>> @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >>>>>>
> >>>>>>        if (dev->driver->show_fdinfo)
> >>>>>>                dev->driver->show_fdinfo(&p, file);
> >>>>>> +
> >>>>>> +     print_memory_stats(&p, file);
> >>>>>>    }
> >>>>>>    EXPORT_SYMBOL(drm_fop_show_fdinfo);
> >>>>>>
> >>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> >>>>>> index dfa995b787e1..e5b40084538f 100644
> >>>>>> --- a/include/drm/drm_file.h
> >>>>>> +++ b/include/drm/drm_file.h
> >>>>>> @@ -41,6 +41,7 @@
> >>>>>>    struct dma_fence;
> >>>>>>    struct drm_file;
> >>>>>>    struct drm_device;
> >>>>>> +struct drm_printer;
> >>>>>>    struct device;
> >>>>>>    struct file;
> >>>>>>
> >>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> >>>>>> index 189fd618ca65..213917bb6b11 100644
> >>>>>> --- a/include/drm/drm_gem.h
> >>>>>> +++ b/include/drm/drm_gem.h
> >>>>>> @@ -42,6 +42,14 @@
> >>>>>>    struct iosys_map;
> >>>>>>    struct drm_gem_object;
> >>>>>>
> >>>>>> +/**
> >>>>>> + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> >>>>>> + */
> >>>>>> +enum drm_gem_object_status {
> >>>>>> +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> >>>>>> +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> >>>>>> +};
> >>>>>> +
> >>>>>>    /**
> >>>>>>     * struct drm_gem_object_funcs - GEM object functions
> >>>>>>     */
> >>>>>> @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> >>>>>>         */
> >>>>>>        int (*evict)(struct drm_gem_object *obj);
> >>>>>>
> >>>>>> +     /**
> >>>>>> +      * @status:
> >>>>>> +      *
> >>>>>> +      * The optional status callback can return additional object state
> >>>>>> +      * which determines which stats the object is counted against.  The
> >>>>>> +      * callback is called under table_lock.  Racing against object status
> >>>>>> +      * change is "harmless", and the callback can expect to not race
> >>>>>> +      * against object destruction.
> >>>>>> +      */
> >>>>>> +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
> >>>>>
> >>>>> Does this needs to be in object funcs and couldn't be consolidated to
> >>>>> driver level?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Tvrtko
> >>>>>
> >>>>>> +
> >>>>>>        /**
> >>>>>>         * @vm_ops:
> >>>>>>         *
> >>>
> >>> --
> >>> Daniel Vetter
> >>> Software Engineer, Intel Corporation
> >>> http://blog.ffwll.ch
> >

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-13 13:27                 ` Daniel Vetter
  (?)
@ 2023-04-13 16:40                 ` Tvrtko Ursulin
  2023-04-13 18:24                     ` Rob Clark
  2023-04-13 20:05                     ` Daniel Vetter
  -1 siblings, 2 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-13 16:40 UTC (permalink / raw)
  To: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list


On 13/04/2023 14:27, Daniel Vetter wrote:
> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
>>
>> On 12/04/2023 20:18, Daniel Vetter wrote:
>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>
>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
>>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>>
>>>>>>>> Add support to dump GEM stats to fdinfo.
>>>>>>>>
>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
>>>>>>>> v3: Do it in core
>>>>>>>>
>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
>>>>>>>> ---
>>>>>>>>     Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>>>>>>>>     drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>>>>>>>>     include/drm/drm_file.h                |  1 +
>>>>>>>>     include/drm/drm_gem.h                 | 19 +++++++
>>>>>>>>     4 files changed, 117 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>> index b46327356e80..b5e7802532ed 100644
>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>>>>>>>>     Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>>>>>>>>     indicating kibi- or mebi-bytes.
>>>>>>>>
>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
>>>>>>>> +
>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
>>>>>>>> +than a single handle).
>>>>>>>> +
>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
>>>>>>>> +
>>>>>>>> +The total size of buffers that are not shared with another file.
>>>>>>>> +
>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
>>>>>>>> +
>>>>>>>> +The total size of buffers that are resident in system memory.
>>>>>>>
>>>>>>> I think this naming maybe does not work best with the existing
>>>>>>> drm-memory-<region> keys.
>>>>>>
>>>>>> Actually, it was very deliberate not to conflict with the existing
>>>>>> drm-memory-<region> keys ;-)
>>>>>>
>>>>>> I would have preferred drm-memory-{active,resident,...} but it
>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
>>>>>>
>>>>>>> How about introduce the concept of a memory region from the start and
>>>>>>> use naming similar like we do for engines?
>>>>>>>
>>>>>>> drm-memory-$CATEGORY-$REGION: ...
>>>>>>>
>>>>>>> Then we document a bunch of categories and their semantics, for instance:
>>>>>>>
>>>>>>> 'size' - All reachable objects
>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
>>>>>>> 'resident' - Objects with backing store
>>>>>>> 'active' - Objects in use, subset of resident
>>>>>>> 'purgeable' - Or inactive? Subset of resident.
>>>>>>>
>>>>>>> We keep the same semantics as with process memory accounting (if I got
>>>>>>> it right) which could be desirable for a simplified mental model.
>>>>>>>
>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
>>>>>>> correctly captured this in the first round it should be equivalent to
>>>>>>> 'resident' above. In any case we can document no category is equal to
>>>>>>> which category, and at most one of the two must be output.)
>>>>>>>
>>>>>>> Region names we at most partially standardize. Like we could say
>>>>>>> 'system' is to be used where backing store is system RAM and others are
>>>>>>> driver defined.
>>>>>>>
>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
>>>>>>> region they support.
>>>>>>>
>>>>>>> I think this all also works for objects which can be migrated between
>>>>>>> memory regions. 'Size' accounts them against all regions while for
>>>>>>> 'resident' they only appear in the region of their current placement, etc.
>>>>>>
>>>>>> I'm not too sure how to rectify different memory regions with this,
>>>>>> since drm core doesn't really know about the driver's memory regions.
>>>>>> Perhaps we can go back to this being a helper and drivers with vram
>>>>>> just don't use the helper?  Or??
>>>>>
>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
>>>>> all works out reasonably consistently?
>>>>
>>>> That is basically what we have now.  I could append -system to each to
>>>> make things easier to add vram/etc (from a uabi standpoint)..
>>>
>>> What you have isn't really -system, but everything. So doesn't really make
>>> sense to me to mark this -system, it's only really true for integrated (if
>>> they don't have stolen or something like that).
>>>
>>> Also my comment was more in reply to Tvrtko's suggestion.
>>
>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
>> with the current drm-memory-$REGION by extending, rather than creating
>> confusion with different order of key name components.
> 
> Oh my comment was pretty much just bikeshed, in case someone creates a
> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> So $CATEGORY before the -memory.
> 
> Otoh I don't think that'll happen, so I guess we can go with whatever more
> folks like :-) I don't really care much personally.

Okay I missed the parsing problem.

>> AMD currently has (among others) drm-memory-vram, which we could define in
>> the spec maps to category X, if category component is not present.
>>
>> Some examples:
>>
>> drm-memory-resident-system:
>> drm-memory-size-lmem0:
>> drm-memory-active-vram:
>>
>> Etc.. I think it creates a consistent story.
>>
>> Other than this, my two I think significant opens which haven't been
>> addressed yet are:
>>
>> 1)
>>
>> Why do we want totals (not per region) when userspace can trivially
>> aggregate if they want. What is the use case?
>>
>> 2)
>>
>> Current proposal limits the value to whole objects and fixates that by
>> having it in the common code. If/when some driver is able to support sub-BO
>> granularity they will need to opt out of the common printer at which point
>> it may be less churn to start with a helper rather than mid-layer. Or maybe
>> some drivers already support this, I don't know. Given how important VM BIND
>> is I wouldn't be surprised.
> 
> I feel like for drivers using ttm we want a ttm helper which takes care of
> the region printing in hopefully a standard way. And that could then also
> take care of all kinds of partial binding and funny rules (like maybe
> we want a standard vram region that adds up all the lmem regions on
> intel, so that all dgpu have a common vram bucket that generic tools
> understand?).

First part yes, but for the second I would think we want to avoid any
aggregation in the kernel which can be done in userspace just as well.
Such a total vram bucket would be pretty useless even on Intel, since
userspace needs to be region aware to make use of all resources. It
could even be counterproductive I think - "why am I running out of
memory when half of my vram is unused!?".

> It does mean we walk the bo list twice, but *shrug*. People have been
> complaining about procutils for decades, they're still horrible, I think
> walking bo lists twice internally in the ttm case is going to be ok. If
> not, it's internals, we can change them again.
> 
> Also I'd lean a lot more towards making ttm a helper and not putting that
> into core, exactly because it's pretty clear we'll need more flexibility
> when it comes to accurate stats for multi-region drivers.

Exactly.

> But for a first "how much gpu space does this app use" across everything I
> think this is a good enough starting point.

Okay so we agree this would be better as a helper and not in the core.

On the point of whether the keys/semantics are good enough as a starting
point, I am still not convinced the kernel should aggregate; instead we
should start from day one by appending -system (or something) to Rob's
proposed keys.
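
For illustration, roughly the sort of per-region printing I have in mind,
with made up helper/struct names (ie. nothing from this series, just a
sketch of appending the region to Rob's keys):

```
/* Sketch only: hypothetical names, not code from this series. */
#include <drm/drm_print.h>
#include <linux/types.h>

struct example_memory_stats {	/* per-client totals for one memory region */
	u64 shared;
	u64 private;
	u64 resident;
	u64 purgeable;
	u64 active;
};

static void example_print_region(struct drm_printer *p,
				 const struct example_memory_stats *s,
				 const char *region)
{
	/* category first so existing "drm-memory-<str>" parsers are not confused */
	drm_printf(p, "drm-shared-memory-%s:\t%llu KiB\n", region, s->shared >> 10);
	drm_printf(p, "drm-private-memory-%s:\t%llu KiB\n", region, s->private >> 10);
	drm_printf(p, "drm-resident-memory-%s:\t%llu KiB\n", region, s->resident >> 10);
	drm_printf(p, "drm-purgeable-memory-%s:\t%llu KiB\n", region, s->purgeable >> 10);
	drm_printf(p, "drm-active-memory-%s:\t%llu KiB\n", region, s->active >> 10);
}
```

A driver (or a future ttm helper) would call something like that once per
region it supports, and userspace which does want a total can simply sum
the per-region lines.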

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-12 14:42     ` Tvrtko Ursulin
@ 2023-04-13 16:45       ` Alex Deucher
  -1 siblings, 0 replies; 94+ messages in thread
From: Alex Deucher @ 2023-04-13 16:45 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, dri-devel, Rob Clark, Thomas Zimmermann,
	Jonathan Corbet, linux-arm-msm, open list:DOCUMENTATION,
	Emil Velikov, Christopher Healy, open list, Boris Brezillon,
	freedreno

On Wed, Apr 12, 2023 at 10:42 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 11/04/2023 23:56, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Add support to dump GEM stats to fdinfo.
> >
> > v2: Fix typos, change size units to match docs, use div_u64
> > v3: Do it in core
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > ---
> >   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >   include/drm/drm_file.h                |  1 +
> >   include/drm/drm_gem.h                 | 19 +++++++
> >   4 files changed, 117 insertions(+)
> >
> > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > index b46327356e80..b5e7802532ed 100644
> > --- a/Documentation/gpu/drm-usage-stats.rst
> > +++ b/Documentation/gpu/drm-usage-stats.rst
> > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >   indicating kibi- or mebi-bytes.
> >
> > +- drm-shared-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are shared with another file (ie. have more
> > +than a single handle).
> > +
> > +- drm-private-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are not shared with another file.
> > +
> > +- drm-resident-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are resident in system memory.
>
> I think this naming maybe does not work best with the existing
> drm-memory-<region> keys.
>
> How about introduce the concept of a memory region from the start and
> use naming similar like we do for engines?
>
> drm-memory-$CATEGORY-$REGION: ...
>
> Then we document a bunch of categories and their semantics, for instance:
>
> 'size' - All reachable objects
> 'shared' - Subset of 'size' with handle_count > 1
> 'resident' - Objects with backing store
> 'active' - Objects in use, subset of resident
> 'purgeable' - Or inactive? Subset of resident.
>
> We keep the same semantics as with process memory accounting (if I got
> it right) which could be desirable for a simplified mental model.
>
> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> correctly captured this in the first round it should be equivalent to
> 'resident' above. In any case we can document no category is equal to
> which category, and at most one of the two must be output.)

We've had the standard TTM pools for a while:
drm-memory-vram
drm-memory-gtt
drm-memory-cpu

And we recently added the following, mainly for profiling for mesa:
amd-memory-visible-vram
amd-evicted-vram
amd-evicted-visible-vram
amd-requested-vram
amd-requested-visible-vram
amd-requested-gtt

amd-memory-visible-vram is a subset of drm-memory-vram, not a separate pool.
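
For context, all of these are just plain key/value lines emitted from the
driver's show_fdinfo callback, along the lines of the sketch below
(illustrative only, not the actual amdgpu code, values are placeholders):

```
/* Illustrative sketch, not the actual amdgpu implementation. */
#include <drm/drm_file.h>
#include <drm/drm_print.h>
#include <linux/types.h>

static void example_show_fdinfo(struct drm_printer *p, struct drm_file *file)
{
	/* per-client totals, tracked by the driver (placeholders here) */
	u64 vram_bytes = 0, gtt_bytes = 0, cpu_bytes = 0;

	drm_printf(p, "drm-memory-vram:\t%llu KiB\n", vram_bytes >> 10);
	drm_printf(p, "drm-memory-gtt:\t%llu KiB\n", gtt_bytes >> 10);
	drm_printf(p, "drm-memory-cpu:\t%llu KiB\n", cpu_bytes >> 10);
}
```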

Alex

>
> Region names we at most partially standardize. Like we could say
> 'system' is to be used where backing store is system RAM and others are
> driver defined.
>
> Then discrete GPUs could emit N sets of key-values, one for each memory
> region they support.
>
> I think this all also works for objects which can be migrated between
> memory regions. 'Size' accounts them against all regions while for
> 'resident' they only appear in the region of their current placement, etc.
>
> Userspace can aggregate if it wishes to do so but kernel side should not.
>
> > +
> > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are purgeable.
> > +
> > +- drm-active-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are active on one or more rings.
> > +
> >   - drm-cycles-<str> <uint>
> >
> >   Engine identifier string must be the same as the one specified in the
> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > index 37dfaa6be560..46fdd843bb3a 100644
> > --- a/drivers/gpu/drm/drm_file.c
> > +++ b/drivers/gpu/drm/drm_file.c
> > @@ -42,6 +42,7 @@
> >   #include <drm/drm_client.h>
> >   #include <drm/drm_drv.h>
> >   #include <drm/drm_file.h>
> > +#include <drm/drm_gem.h>
> >   #include <drm/drm_print.h>
> >
> >   #include "drm_crtc_internal.h"
> > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> >   }
> >   EXPORT_SYMBOL(drm_send_event);
> >
> > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > +{
> > +     const char *units[] = {"", " KiB", " MiB"};
> > +     unsigned u;
> > +
> > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > +             if (sz < SZ_1K)
> > +                     break;
> > +             sz = div_u64(sz, SZ_1K);
> > +     }
> > +
> > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > +}
> > +
> > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > +{
> > +     struct drm_gem_object *obj;
> > +     struct {
> > +             size_t shared;
> > +             size_t private;
> > +             size_t resident;
> > +             size_t purgeable;
> > +             size_t active;
> > +     } size = {0};
> > +     bool has_status = false;
> > +     int id;
> > +
> > +     spin_lock(&file->table_lock);
> > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > +             enum drm_gem_object_status s = 0;
> > +
> > +             if (obj->funcs && obj->funcs->status) {
> > +                     s = obj->funcs->status(obj);
> > +                     has_status = true;
> > +             }
> > +
> > +             if (obj->handle_count > 1) {
> > +                     size.shared += obj->size;
> > +             } else {
> > +                     size.private += obj->size;
> > +             }
> > +
> > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > +                     size.resident += obj->size;
> > +             } else {
> > +                     /* If already purged or not yet backed by pages, don't
> > +                      * count it as purgeable:
> > +                      */
> > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>
> Side question - why couldn't resident buffers be purgeable? Did you mean
> for the if branch check to be active here? But then it wouldn't make
> sense for a driver to report active _and_ purgeable..
>
> > +             }
> > +
> > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > +                     size.active += obj->size;
> > +
> > +                     /* If still active, don't count as purgeable: */
> > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>
> Another side question - I guess this tidies a race in reporting? If so
> not sure it matters given the stats are all rather approximate.
>
> > +             }
> > +
> > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > +                     size.purgeable += obj->size;
> > +     }
>
> One concern I have here is that it is all based on obj->size. That is,
> there is no provision for drivers to implement page level granularity.
> So correct reporting in use cases such as VM BIND in the future wouldn't
> work unless it was a driver hook to get almost all of the info above. At
> which point common code is just a loop. TBF I don't know if any drivers
> do sub obj->size backing store granularity today, but I think it is
> something to be sure of before proceeding.
>
> Second concern is what I touched upon in the first reply block - if the
> common code blindly loops over all objects then on discrete GPUs it
> seems we get an 'aggregate' value here which is not what I think we
> want. We rather want to have the ability for drivers to list stats per
> individual memory region.
>
> > +     spin_unlock(&file->table_lock);
> > +
> > +     print_size(p, "drm-shared-memory", size.shared);
> > +     print_size(p, "drm-private-memory", size.private);
> > +     print_size(p, "drm-active-memory", size.active);
> > +
> > +     if (has_status) {
> > +             print_size(p, "drm-resident-memory", size.resident);
> > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > +     }
> > +}
> > +
> >   /**
> >    * drm_fop_show_fdinfo - helper for drm file fops
> >    * @seq_file: output stream
> > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >
> >       if (dev->driver->show_fdinfo)
> >               dev->driver->show_fdinfo(&p, file);
> > +
> > +     print_memory_stats(&p, file);
> >   }
> >   EXPORT_SYMBOL(drm_fop_show_fdinfo);
> >
> > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > index dfa995b787e1..e5b40084538f 100644
> > --- a/include/drm/drm_file.h
> > +++ b/include/drm/drm_file.h
> > @@ -41,6 +41,7 @@
> >   struct dma_fence;
> >   struct drm_file;
> >   struct drm_device;
> > +struct drm_printer;
> >   struct device;
> >   struct file;
> >
> > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > index 189fd618ca65..213917bb6b11 100644
> > --- a/include/drm/drm_gem.h
> > +++ b/include/drm/drm_gem.h
> > @@ -42,6 +42,14 @@
> >   struct iosys_map;
> >   struct drm_gem_object;
> >
> > +/**
> > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > + */
> > +enum drm_gem_object_status {
> > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > +};
> > +
> >   /**
> >    * struct drm_gem_object_funcs - GEM object functions
> >    */
> > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> >        */
> >       int (*evict)(struct drm_gem_object *obj);
> >
> > +     /**
> > +      * @status:
> > +      *
> > +      * The optional status callback can return additional object state
> > +      * which determines which stats the object is counted against.  The
> > +      * callback is called under table_lock.  Racing against object status
> > +      * change is "harmless", and the callback can expect to not race
> > +      * against object destruction.
> > +      */
> > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
>
> Does this needs to be in object funcs and couldn't be consolidated to
> driver level?
>
> Regards,
>
> Tvrtko
>
> > +
> >       /**
> >        * @vm_ops:
> >        *

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
@ 2023-04-13 16:45       ` Alex Deucher
  0 siblings, 0 replies; 94+ messages in thread
From: Alex Deucher @ 2023-04-13 16:45 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno

On Wed, Apr 12, 2023 at 10:42 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 11/04/2023 23:56, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Add support to dump GEM stats to fdinfo.
> >
> > v2: Fix typos, change size units to match docs, use div_u64
> > v3: Do it in core
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > ---
> >   Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >   drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >   include/drm/drm_file.h                |  1 +
> >   include/drm/drm_gem.h                 | 19 +++++++
> >   4 files changed, 117 insertions(+)
> >
> > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > index b46327356e80..b5e7802532ed 100644
> > --- a/Documentation/gpu/drm-usage-stats.rst
> > +++ b/Documentation/gpu/drm-usage-stats.rst
> > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >   Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >   indicating kibi- or mebi-bytes.
> >
> > +- drm-shared-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are shared with another file (ie. have more
> > +than a single handle).
> > +
> > +- drm-private-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are not shared with another file.
> > +
> > +- drm-resident-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are resident in system memory.
>
> I think this naming maybe does not work best with the existing
> drm-memory-<region> keys.
>
> How about introduce the concept of a memory region from the start and
> use naming similar like we do for engines?
>
> drm-memory-$CATEGORY-$REGION: ...
>
> Then we document a bunch of categories and their semantics, for instance:
>
> 'size' - All reachable objects
> 'shared' - Subset of 'size' with handle_count > 1
> 'resident' - Objects with backing store
> 'active' - Objects in use, subset of resident
> 'purgeable' - Or inactive? Subset of resident.
>
> We keep the same semantics as with process memory accounting (if I got
> it right) which could be desirable for a simplified mental model.
>
> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> correctly captured this in the first round it should be equivalent to
> 'resident' above. In any case we can document no category is equal to
> which category, and at most one of the two must be output.)

We've had the standard TTM pools for a while:
drm-memory-vram
drm-memory-gtt
drm-memory-cpu

And we recently added the following, mainly for profiling for mesa:
amd-memory-visible-vram
amd-evicted-vram
amd-evicted-visible-vram
amd-requested-vram
amd-requested-visible-vram
amd-requested-gtt

amd-memory-visible-vram is a subset of drm-memory-vram, not a separate pool.

Alex

>
> Region names we at most partially standardize. Like we could say
> 'system' is to be used where backing store is system RAM and others are
> driver defined.
>
> Then discrete GPUs could emit N sets of key-values, one for each memory
> region they support.
>
> I think this all also works for objects which can be migrated between
> memory regions. 'Size' accounts them against all regions while for
> 'resident' they only appear in the region of their current placement, etc.
>
> Userspace can aggregate if it wishes to do so but kernel side should not.
>
> > +
> > +- drm-purgeable-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are purgeable.
> > +
> > +- drm-active-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are active on one or more rings.
> > +
> >   - drm-cycles-<str> <uint>
> >
> >   Engine identifier string must be the same as the one specified in the
> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > index 37dfaa6be560..46fdd843bb3a 100644
> > --- a/drivers/gpu/drm/drm_file.c
> > +++ b/drivers/gpu/drm/drm_file.c
> > @@ -42,6 +42,7 @@
> >   #include <drm/drm_client.h>
> >   #include <drm/drm_drv.h>
> >   #include <drm/drm_file.h>
> > +#include <drm/drm_gem.h>
> >   #include <drm/drm_print.h>
> >
> >   #include "drm_crtc_internal.h"
> > @@ -871,6 +872,79 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> >   }
> >   EXPORT_SYMBOL(drm_send_event);
> >
> > +static void print_size(struct drm_printer *p, const char *stat, size_t sz)
> > +{
> > +     const char *units[] = {"", " KiB", " MiB"};
> > +     unsigned u;
> > +
> > +     for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > +             if (sz < SZ_1K)
> > +                     break;
> > +             sz = div_u64(sz, SZ_1K);
> > +     }
> > +
> > +     drm_printf(p, "%s:\t%zu%s\n", stat, sz, units[u]);
> > +}
> > +
> > +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> > +{
> > +     struct drm_gem_object *obj;
> > +     struct {
> > +             size_t shared;
> > +             size_t private;
> > +             size_t resident;
> > +             size_t purgeable;
> > +             size_t active;
> > +     } size = {0};
> > +     bool has_status = false;
> > +     int id;
> > +
> > +     spin_lock(&file->table_lock);
> > +     idr_for_each_entry (&file->object_idr, obj, id) {
> > +             enum drm_gem_object_status s = 0;
> > +
> > +             if (obj->funcs && obj->funcs->status) {
> > +                     s = obj->funcs->status(obj);
> > +                     has_status = true;
> > +             }
> > +
> > +             if (obj->handle_count > 1) {
> > +                     size.shared += obj->size;
> > +             } else {
> > +                     size.private += obj->size;
> > +             }
> > +
> > +             if (s & DRM_GEM_OBJECT_RESIDENT) {
> > +                     size.resident += obj->size;
> > +             } else {
> > +                     /* If already purged or not yet backed by pages, don't
> > +                      * count it as purgeable:
> > +                      */
> > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>
> Side question - why couldn't resident buffers be purgeable? Did you mean
> for the if branch check to be active here? But then it wouldn't make
> sense for a driver to report active _and_ purgeable..
>
> > +             }
> > +
> > +             if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
> > +                     size.active += obj->size;
> > +
> > +                     /* If still active, don't count as purgeable: */
> > +                     s &= ~DRM_GEM_OBJECT_PURGEABLE;
>
> Another side question - I guess this tidies a race in reporting? If so
> not sure it matters given the stats are all rather approximate.
>
> > +             }
> > +
> > +             if (s & DRM_GEM_OBJECT_PURGEABLE)
> > +                     size.purgeable += obj->size;
> > +     }
>
> One concern I have here is that it is all based on obj->size. That is,
> there is no provision for drivers to implement page level granularity.
> So correct reporting in use cases such as VM BIND in the future wouldn't
> work unless it was a driver hook to get almost all of the info above. At
> which point common code is just a loop. TBF I don't know if any drivers
> do sub obj->size backing store granularity today, but I think it is
> something to be sure of before proceeding.
>
> Second concern is what I touched upon in the first reply block - if the
> common code blindly loops over all objects then on discrete GPUs it
> seems we get an 'aggregate' value here which is not what I think we
> want. We rather want to have the ability for drivers to list stats per
> individual memory region.
>
> > +     spin_unlock(&file->table_lock);
> > +
> > +     print_size(p, "drm-shared-memory", size.shared);
> > +     print_size(p, "drm-private-memory", size.private);
> > +     print_size(p, "drm-active-memory", size.active);
> > +
> > +     if (has_status) {
> > +             print_size(p, "drm-resident-memory", size.resident);
> > +             print_size(p, "drm-purgeable-memory", size.purgeable);
> > +     }
> > +}
> > +
> >   /**
> >    * drm_fop_show_fdinfo - helper for drm file fops
> >    * @seq_file: output stream
> > @@ -904,6 +978,8 @@ void drm_fop_show_fdinfo(struct seq_file *m, struct file *f)
> >
> >       if (dev->driver->show_fdinfo)
> >               dev->driver->show_fdinfo(&p, file);
> > +
> > +     print_memory_stats(&p, file);
> >   }
> >   EXPORT_SYMBOL(drm_fop_show_fdinfo);
> >
> > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > index dfa995b787e1..e5b40084538f 100644
> > --- a/include/drm/drm_file.h
> > +++ b/include/drm/drm_file.h
> > @@ -41,6 +41,7 @@
> >   struct dma_fence;
> >   struct drm_file;
> >   struct drm_device;
> > +struct drm_printer;
> >   struct device;
> >   struct file;
> >
> > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > index 189fd618ca65..213917bb6b11 100644
> > --- a/include/drm/drm_gem.h
> > +++ b/include/drm/drm_gem.h
> > @@ -42,6 +42,14 @@
> >   struct iosys_map;
> >   struct drm_gem_object;
> >
> > +/**
> > + * enum drm_gem_object_status - bitmask of object state for fdinfo reporting
> > + */
> > +enum drm_gem_object_status {
> > +     DRM_GEM_OBJECT_RESIDENT  = BIT(0),
> > +     DRM_GEM_OBJECT_PURGEABLE = BIT(1),
> > +};
> > +
> >   /**
> >    * struct drm_gem_object_funcs - GEM object functions
> >    */
> > @@ -174,6 +182,17 @@ struct drm_gem_object_funcs {
> >        */
> >       int (*evict)(struct drm_gem_object *obj);
> >
> > +     /**
> > +      * @status:
> > +      *
> > +      * The optional status callback can return additional object state
> > +      * which determines which stats the object is counted against.  The
> > +      * callback is called under table_lock.  Racing against object status
> > +      * change is "harmless", and the callback can expect to not race
> > +      * against object destruction.
> > +      */
> > +     enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
>
> Does this needs to be in object funcs and couldn't be consolidated to
> driver level?
>
> Regards,
>
> Tvrtko
>
> > +
> >       /**
> >        * @vm_ops:
> >        *

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-13 16:40                 ` Tvrtko Ursulin
@ 2023-04-13 18:24                     ` Rob Clark
  2023-04-13 20:05                     ` Daniel Vetter
  1 sibling, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-13 18:24 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list

On Thu, Apr 13, 2023 at 9:40 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 13/04/2023 14:27, Daniel Vetter wrote:
> > On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 12/04/2023 20:18, Daniel Vetter wrote:
> >>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>>>
> >>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> >>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>>>
> >>>>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>>>
> >>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>>>> v3: Do it in core
> >>>>>>>>
> >>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> >>>>>>>> ---
> >>>>>>>>     Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>>>     drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >>>>>>>>     include/drm/drm_file.h                |  1 +
> >>>>>>>>     include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>>>     4 files changed, 117 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>>>     Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>>>     indicating kibi- or mebi-bytes.
> >>>>>>>>
> >>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> >>>>>>>> +
> >>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>>>> +than a single handle).
> >>>>>>>> +
> >>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> >>>>>>>> +
> >>>>>>>> +The total size of buffers that are not shared with another file.
> >>>>>>>> +
> >>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> >>>>>>>> +
> >>>>>>>> +The total size of buffers that are resident in system memory.
> >>>>>>>
> >>>>>>> I think this naming maybe does not work best with the existing
> >>>>>>> drm-memory-<region> keys.
> >>>>>>
> >>>>>> Actually, it was very deliberate not to conflict with the existing
> >>>>>> drm-memory-<region> keys ;-)
> >>>>>>
> >>>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
> >>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> >>>>>>
> >>>>>>> How about introduce the concept of a memory region from the start and
> >>>>>>> use naming similar like we do for engines?
> >>>>>>>
> >>>>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>>>
> >>>>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>>>
> >>>>>>> 'size' - All reachable objects
> >>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>>>> 'resident' - Objects with backing store
> >>>>>>> 'active' - Objects in use, subset of resident
> >>>>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>>>
> >>>>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>>>> it right) which could be desirable for a simplified mental model.
> >>>>>>>
> >>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>>>> correctly captured this in the first round it should be equivalent to
> >>>>>>> 'resident' above. In any case we can document no category is equal to
> >>>>>>> which category, and at most one of the two must be output.)
> >>>>>>>
> >>>>>>> Region names we at most partially standardize. Like we could say
> >>>>>>> 'system' is to be used where backing store is system RAM and others are
> >>>>>>> driver defined.
> >>>>>>>
> >>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>>>> region they support.
> >>>>>>>
> >>>>>>> I think this all also works for objects which can be migrated between
> >>>>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>>>
> >>>>>> I'm not too sure how to rectify different memory regions with this,
> >>>>>> since drm core doesn't really know about the driver's memory regions.
> >>>>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>>>> just don't use the helper?  Or??
> >>>>>
> >>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>>>> all works out reasonably consistently?
> >>>>
> >>>> That is basically what we have now.  I could append -system to each to
> >>>> make things easier to add vram/etc (from a uabi standpoint)..
> >>>
> >>> What you have isn't really -system, but everything. So doesn't really make
> >>> sense to me to mark this -system, it's only really true for integrated (if
> >>> they don't have stolen or something like that).
> >>>
> >>> Also my comment was more in reply to Tvrtko's suggestion.
> >>
> >> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> >> with the current drm-memory-$REGION by extending, rather than creating
> >> confusion with different order of key name components.
> >
> > Oh my comment was pretty much just bikeshed, in case someone creates a
> > $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> > So $CATEGORY before the -memory.
> >
> > Otoh I don't think that'll happen, so I guess we can go with whatever more
> > folks like :-) I don't really care much personally.
>
> Okay I missed the parsing problem.
>
> >> AMD currently has (among others) drm-memory-vram, which we could define in
> >> the spec maps to category X, if category component is not present.
> >>
> >> Some examples:
> >>
> >> drm-memory-resident-system:
> >> drm-memory-size-lmem0:
> >> drm-memory-active-vram:
> >>
> >> Etc.. I think it creates a consistent story.
> >>
> >> Other than this, my two I think significant opens which haven't been
> >> addressed yet are:
> >>
> >> 1)
> >>
> >> Why do we want totals (not per region) when userspace can trivially
> >> aggregate if they want. What is the use case?
> >>
> >> 2)
> >>
> >> Current proposal limits the value to whole objects and fixates that by
> >> having it in the common code. If/when some driver is able to support sub-BO
> >> granularity they will need to opt out of the common printer at which point
> >> it may be less churn to start with a helper rather than mid-layer. Or maybe
> >> some drivers already support this, I don't know. Given how important VM BIND
> >> is I wouldn't be surprised.
> >
> > I feel like for drivers using ttm we want a ttm helper which takes care of
> > the region printing in hopefully a standard way. And that could then also
> > take care of all kinds of partial binding and funny rules (like maybe
> > we want a standard vram region that adds up all the lmem regions on
> > intel, so that all dgpu have a common vram bucket that generic tools
> > understand?).
>
> First part yes, but for the second I would think we want to avoid any
> aggregation in the kernel which can be done in userspace just as well.
> Such total vram bucket would be pretty useless on Intel even since
> userspace needs to be region aware to make use of all resources. It
> could even be counter productive I think - "why am I getting out of
> memory when half of my vram is unused!?".
>
> > It does mean we walk the bo list twice, but *shrug*. People have been
> > complaining about procutils for decades, they're still horrible, I think
> > walking bo lists twice internally in the ttm case is going to be ok. If
> > not, it's internals, we can change them again.
> >
> > Also I'd lean a lot more towards making ttm a helper and not putting that
> > into core, exactly because it's pretty clear we'll need more flexibility
> > when it comes to accurate stats for multi-region drivers.
>
> Exactly.

It could also be that the gem->status() fxn is extended to return
_which_ pool that object is in.. but either way, we aren't painting
ourselves into a corner
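
E.g. something roughly like this -- purely a hypothetical sketch, not
part of this series -- where the callback also reports a driver-defined
region index alongside the status bits:

	/* Hypothetical sketch only, not in this series: let the status
	 * callback also say which driver-defined memory region the
	 * object currently lives in, so a common printer could bucket
	 * per region without knowing region names itself.
	 */
	struct drm_gem_object_state {
		enum drm_gem_object_status status;	/* RESIDENT/PURGEABLE bits */
		unsigned int region;			/* index into a driver region table */
	};

	/* and in struct drm_gem_object_funcs, extending @status: */
	void (*state)(struct drm_gem_object *obj,
		      struct drm_gem_object_state *out);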

> > But for a first "how much gpu space does this app use" across everything I
> > think this is a good enough starting point.
>
> Okay so we agree this would be better as a helper and not in the core.
>
> On the point are keys/semantics good enough as a starting point I am
> still not convinced kernel should aggregate and that instead we should
> start from day one by appending -system (or something) to Rob's proposed
> keys.

I mean, if addition were expensive I might agree about not aggregating ;-)

BR,
-R

> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-13 16:40                 ` Tvrtko Ursulin
@ 2023-04-13 20:05                     ` Daniel Vetter
  2023-04-13 20:05                     ` Daniel Vetter
  1 sibling, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-13 20:05 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list

On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> 
> On 13/04/2023 14:27, Daniel Vetter wrote:
> > On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 12/04/2023 20:18, Daniel Vetter wrote:
> > > > On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> > > > > On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > 
> > > > > > On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> > > > > > > On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> > > > > > > <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On 11/04/2023 23:56, Rob Clark wrote:
> > > > > > > > > From: Rob Clark <robdclark@chromium.org>
> > > > > > > > > 
> > > > > > > > > Add support to dump GEM stats to fdinfo.
> > > > > > > > > 
> > > > > > > > > v2: Fix typos, change size units to match docs, use div_u64
> > > > > > > > > v3: Do it in core
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > > > > > Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > > > > > > > > ---
> > > > > > > > >     Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> > > > > > > > >     drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> > > > > > > > >     include/drm/drm_file.h                |  1 +
> > > > > > > > >     include/drm/drm_gem.h                 | 19 +++++++
> > > > > > > > >     4 files changed, 117 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > > > > > > > > index b46327356e80..b5e7802532ed 100644
> > > > > > > > > --- a/Documentation/gpu/drm-usage-stats.rst
> > > > > > > > > +++ b/Documentation/gpu/drm-usage-stats.rst
> > > > > > > > > @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > > > > > > > >     Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > > > > > > > >     indicating kibi- or mebi-bytes.
> > > > > > > > > 
> > > > > > > > > +- drm-shared-memory: <uint> [KiB|MiB]
> > > > > > > > > +
> > > > > > > > > +The total size of buffers that are shared with another file (ie. have more
> > > > > > > > > +than a single handle).
> > > > > > > > > +
> > > > > > > > > +- drm-private-memory: <uint> [KiB|MiB]
> > > > > > > > > +
> > > > > > > > > +The total size of buffers that are not shared with another file.
> > > > > > > > > +
> > > > > > > > > +- drm-resident-memory: <uint> [KiB|MiB]
> > > > > > > > > +
> > > > > > > > > +The total size of buffers that are resident in system memory.
> > > > > > > > 
> > > > > > > > I think this naming maybe does not work best with the existing
> > > > > > > > drm-memory-<region> keys.
> > > > > > > 
> > > > > > > Actually, it was very deliberate not to conflict with the existing
> > > > > > > drm-memory-<region> keys ;-)
> > > > > > > 
> > > > > > > I wouldn't have preferred drm-memory-{active,resident,...} but it
> > > > > > > could be mis-parsed by existing userspace so my hands were a bit tied.
> > > > > > > 
> > > > > > > > How about introduce the concept of a memory region from the start and
> > > > > > > > use naming similar like we do for engines?
> > > > > > > > 
> > > > > > > > drm-memory-$CATEGORY-$REGION: ...
> > > > > > > > 
> > > > > > > > Then we document a bunch of categories and their semantics, for instance:
> > > > > > > > 
> > > > > > > > 'size' - All reachable objects
> > > > > > > > 'shared' - Subset of 'size' with handle_count > 1
> > > > > > > > 'resident' - Objects with backing store
> > > > > > > > 'active' - Objects in use, subset of resident
> > > > > > > > 'purgeable' - Or inactive? Subset of resident.
> > > > > > > > 
> > > > > > > > We keep the same semantics as with process memory accounting (if I got
> > > > > > > > it right) which could be desirable for a simplified mental model.
> > > > > > > > 
> > > > > > > > (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > > > > > > > correctly captured this in the first round it should be equivalent to
> > > > > > > > 'resident' above. In any case we can document no category is equal to
> > > > > > > > which category, and at most one of the two must be output.)
> > > > > > > > 
> > > > > > > > Region names we at most partially standardize. Like we could say
> > > > > > > > 'system' is to be used where backing store is system RAM and others are
> > > > > > > > driver defined.
> > > > > > > > 
> > > > > > > > Then discrete GPUs could emit N sets of key-values, one for each memory
> > > > > > > > region they support.
> > > > > > > > 
> > > > > > > > I think this all also works for objects which can be migrated between
> > > > > > > > memory regions. 'Size' accounts them against all regions while for
> > > > > > > > 'resident' they only appear in the region of their current placement, etc.
> > > > > > > 
> > > > > > > I'm not too sure how to rectify different memory regions with this,
> > > > > > > since drm core doesn't really know about the driver's memory regions.
> > > > > > > Perhaps we can go back to this being a helper and drivers with vram
> > > > > > > just don't use the helper?  Or??
> > > > > > 
> > > > > > I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> > > > > > all works out reasonably consistently?
> > > > > 
> > > > > That is basically what we have now.  I could append -system to each to
> > > > > make things easier to add vram/etc (from a uabi standpoint)..
> > > > 
> > > > What you have isn't really -system, but everything. So doesn't really make
> > > > sense to me to mark this -system, it's only really true for integrated (if
> > > > they don't have stolen or something like that).
> > > > 
> > > > Also my comment was more in reply to Tvrtko's suggestion.
> > > 
> > > Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> > > with the current drm-memory-$REGION by extending, rather than creating
> > > confusion with different order of key name components.
> > 
> > Oh my comment was pretty much just bikeshed, in case someone creates a
> > $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> > So $CATEGORY before the -memory.
> > 
> > Otoh I don't think that'll happen, so I guess we can go with whatever more
> > folks like :-) I don't really care much personally.
> 
> Okay I missed the parsing problem.
> 
> > > AMD currently has (among others) drm-memory-vram, which we could define in
> > > the spec maps to category X, if category component is not present.
> > > 
> > > Some examples:
> > > 
> > > drm-memory-resident-system:
> > > drm-memory-size-lmem0:
> > > drm-memory-active-vram:
> > > 
> > > Etc.. I think it creates a consistent story.
> > > 
> > > Other than this, my two I think significant opens which haven't been
> > > addressed yet are:
> > > 
> > > 1)
> > > 
> > > Why do we want totals (not per region) when userspace can trivially
> > > aggregate if they want. What is the use case?
> > > 
> > > 2)
> > > 
> > > Current proposal limits the value to whole objects and fixates that by
> > > having it in the common code. If/when some driver is able to support sub-BO
> > > granularity they will need to opt out of the common printer at which point
> > > it may be less churn to start with a helper rather than mid-layer. Or maybe
> > > some drivers already support this, I don't know. Given how important VM BIND
> > > is I wouldn't be surprised.
> > 
> > I feel like for drivers using ttm we want a ttm helper which takes care of
> > the region printing in hopefully a standard way. And that could then also
> > take care of all kinds of partial binding and funny rules (like maybe
> > we want a standard vram region that adds up all the lmem regions on
> > intel, so that all dgpu have a common vram bucket that generic tools
> > understand?).
> 
> First part yes, but for the second I would think we want to avoid any
> aggregation in the kernel which can be done in userspace just as well. Such
> total vram bucket would be pretty useless on Intel even since userspace
> needs to be region aware to make use of all resources. It could even be
> counter productive I think - "why am I getting out of memory when half of my
> vram is unused!?".

This is not for intel-aware userspace. This is for fairly generic "gputop"
style userspace, which might simply have no clue or interest in what lmemX
means, but would understand vram.

Aggregating makes sense.

> > It does mean we walk the bo list twice, but *shrug*. People have been
> > complaining about procutils for decades, they're still horrible, I think
> > walking bo lists twice internally in the ttm case is going to be ok. If
> > not, it's internals, we can change them again.
> > 
> > Also I'd lean a lot more towards making ttm a helper and not putting that
> > into core, exactly because it's pretty clear we'll need more flexibility
> > when it comes to accurate stats for multi-region drivers.
> 
> Exactly.
> 
> > But for a first "how much gpu space does this app use" across everything I
> > think this is a good enough starting point.
> 
> Okay so we agree this would be better as a helper and not in the core.

Nope, if you mean with this = Rob's patch. I was talking about a
hypothetical region-aware extension for ttm-using drivers.

> On the point are keys/semantics good enough as a starting point I am still
> not convinced kernel should aggregate and that instead we should start from
> day one by appending -system (or something) to Rob's proposed keys.

It should imo. Inflicting driver knowledge on generic userspace makes not
much sense, we should start with the more generally useful stuff imo.
That's why there's the drm fdinfo spec and all that so it's not a
free-for-all.

Also Rob's stuff is _not_ system. Check on a i915 dgpu if you want :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-13 20:05                     ` Daniel Vetter
  (?)
@ 2023-04-14  8:57                     ` Tvrtko Ursulin
  2023-04-14  9:07                         ` Daniel Vetter
  2023-04-14 13:40                         ` Rob Clark
  -1 siblings, 2 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-14  8:57 UTC (permalink / raw)
  To: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list


On 13/04/2023 21:05, Daniel Vetter wrote:
> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
>>
>> On 13/04/2023 14:27, Daniel Vetter wrote:
>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>>>
>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
>>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>>>>
>>>>>>>>>> Add support to dump GEM stats to fdinfo.
>>>>>>>>>>
>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
>>>>>>>>>> v3: Do it in core
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
>>>>>>>>>> ---
>>>>>>>>>>      Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>>>>>>>>>>      drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>>>>>>>>>>      include/drm/drm_file.h                |  1 +
>>>>>>>>>>      include/drm/drm_gem.h                 | 19 +++++++
>>>>>>>>>>      4 files changed, 117 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>>>>>>>>>>      Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>>>>>>>>>>      indicating kibi- or mebi-bytes.
>>>>>>>>>>
>>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
>>>>>>>>>> +
>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
>>>>>>>>>> +than a single handle).
>>>>>>>>>> +
>>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
>>>>>>>>>> +
>>>>>>>>>> +The total size of buffers that are not shared with another file.
>>>>>>>>>> +
>>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
>>>>>>>>>> +
>>>>>>>>>> +The total size of buffers that are resident in system memory.
>>>>>>>>>
>>>>>>>>> I think this naming maybe does not work best with the existing
>>>>>>>>> drm-memory-<region> keys.
>>>>>>>>
>>>>>>>> Actually, it was very deliberate not to conflict with the existing
>>>>>>>> drm-memory-<region> keys ;-)
>>>>>>>>
>>>>>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
>>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
>>>>>>>>
>>>>>>>>> How about introduce the concept of a memory region from the start and
>>>>>>>>> use naming similar like we do for engines?
>>>>>>>>>
>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
>>>>>>>>>
>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
>>>>>>>>>
>>>>>>>>> 'size' - All reachable objects
>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
>>>>>>>>> 'resident' - Objects with backing store
>>>>>>>>> 'active' - Objects in use, subset of resident
>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
>>>>>>>>>
>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
>>>>>>>>> it right) which could be desirable for a simplified mental model.
>>>>>>>>>
>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
>>>>>>>>> correctly captured this in the first round it should be equivalent to
>>>>>>>>> 'resident' above. In any case we can document no category is equal to
>>>>>>>>> which category, and at most one of the two must be output.)
>>>>>>>>>
>>>>>>>>> Region names we at most partially standardize. Like we could say
>>>>>>>>> 'system' is to be used where backing store is system RAM and others are
>>>>>>>>> driver defined.
>>>>>>>>>
>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
>>>>>>>>> region they support.
>>>>>>>>>
>>>>>>>>> I think this all also works for objects which can be migrated between
>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
>>>>>>>>
>>>>>>>> I'm not too sure how to rectify different memory regions with this,
>>>>>>>> since drm core doesn't really know about the driver's memory regions.
>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
>>>>>>>> just don't use the helper?  Or??
>>>>>>>
>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
>>>>>>> all works out reasonably consistently?
>>>>>>
>>>>>> That is basically what we have now.  I could append -system to each to
>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
>>>>>
>>>>> What you have isn't really -system, but everything. So doesn't really make
>>>>> sense to me to mark this -system, it's only really true for integrated (if
>>>>> they don't have stolen or something like that).
>>>>>
>>>>> Also my comment was more in reply to Tvrtko's suggestion.
>>>>
>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
>>>> with the current drm-memory-$REGION by extending, rather than creating
>>>> confusion with different order of key name components.
>>>
>>> Oh my comment was pretty much just bikeshed, in case someone creates a
>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
>>> So $CATEGORY before the -memory.
>>>
>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
>>> folks like :-) I don't really care much personally.
>>
>> Okay I missed the parsing problem.
>>
>>>> AMD currently has (among others) drm-memory-vram, which we could define in
>>>> the spec maps to category X, if category component is not present.
>>>>
>>>> Some examples:
>>>>
>>>> drm-memory-resident-system:
>>>> drm-memory-size-lmem0:
>>>> drm-memory-active-vram:
>>>>
>>>> Etc.. I think it creates a consistent story.
>>>>
>>>> Other than this, my two I think significant opens which haven't been
>>>> addressed yet are:
>>>>
>>>> 1)
>>>>
>>>> Why do we want totals (not per region) when userspace can trivially
>>>> aggregate if they want. What is the use case?
>>>>
>>>> 2)
>>>>
>>>> Current proposal limits the value to whole objects and fixates that by
>>>> having it in the common code. If/when some driver is able to support sub-BO
>>>> granularity they will need to opt out of the common printer at which point
>>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
>>>> some drivers already support this, I don't know. Given how important VM BIND
>>>> is I wouldn't be surprised.
>>>
>>> I feel like for drivers using ttm we want a ttm helper which takes care of
>>> the region printing in hopefully a standard way. And that could then also
>>> take care of all kinds of partial binding and funny rules (like maybe
>>> we want a standard vram region that adds up all the lmem regions on
>>> intel, so that all dgpu have a common vram bucket that generic tools
>>> understand?).
>>
>> First part yes, but for the second I would think we want to avoid any
>> aggregation in the kernel which can be done in userspace just as well. Such
>> total vram bucket would be pretty useless on Intel even since userspace
>> needs to be region aware to make use of all resources. It could even be
>> counter productive I think - "why am I getting out of memory when half of my
>> vram is unused!?".
> 
> This is not for intel-aware userspace. This is for fairly generic "gputop"
> style userspace, which might simply have no clue or interest in what lmemX
> means, but would understand vram.
> 
> Aggregating makes sense.

Lmem vs vram is now an argument not about aggregation but about 
standardizing region names.

Another detail is a change in philosophy compared to engine stats, where 
engine names are not centrally prescribed and userspace is expected to 
handle things generically, with some vendor specific knowledge.

Like in my gputop patches. It doesn't need to understand what is what, 
it just finds what's there and presents it to the user.
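
As a rough illustration of that generic approach (assuming the
drm-<category>-memory-<region> key shape being discussed here; none of
this is settled uABI), summing whatever regions are present is a few
lines of userspace code:

	/* Sketch: scan one fdinfo file for "drm-resident-memory-<region>"
	 * keys, print each region that is present and derive the total,
	 * to show that aggregation is trivial to do in userspace.
	 */
	#include <stdio.h>
	#include <string.h>

	static void scan_fdinfo(FILE *f)
	{
		char line[256], region[64], unit[8];
		unsigned long long val, total = 0;

		while (fgets(line, sizeof(line), f)) {
			unit[0] = '\0';
			if (sscanf(line, "drm-resident-memory-%63[^:]: %llu %7s",
				   region, &val, unit) < 2)
				continue;
			if (!strcmp(unit, "KiB"))
				val <<= 10;	/* KiB -> bytes */
			else if (!strcmp(unit, "MiB"))
				val <<= 20;	/* MiB -> bytes */
			printf("resident in %s: %llu bytes\n", region, val);
			total += val;
		}
		printf("resident total: %llu bytes\n", total);
	}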

Come some accel driver with local memory it wouldn't be vram any more. 
Or even a headless data center GPU. So I really don't think it is good 
to hardcode 'vram' in the spec, or midlayer, or helpers.

And for aggregation.. again, userspace can do it just as well. If we do 
it in kernel then immediately we have multiple sets of keys to output 
for any driver which wants to show the region view. IMO it is just 
pointless work in the kernel and more code in the kernel, when userspace 
can do it.

Proposal A (on a discrete gpu, one category only):

drm-resident-memory: x KiB
drm-resident-memory-system: x KiB
drm-resident-memory-vram: x KiB

Two loops in the kernel, more parsing in userspace.

Proposal B:

drm-resident-memory-system: x KiB
drm-resident-memory-vram: x KiB

Can be one loop, one helper, less text for userspace to parse and it can 
still trivially show the total if so desired.

For instance a helper (or two) with a common struct containing region 
names and totals, where a callback into the driver tallies under each 
region, as the drm helper is walking objects.
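
Roughly something of this shape -- all names below are made up, it is
just to illustrate the idea, not a concrete proposal:

	/* Hypothetical helper sketch: the driver describes its regions
	 * once, a per-object callback tallies each object into the
	 * region(s) it currently occupies, and a single walk of the
	 * handle table fills the table which the helper then prints as
	 * per-region keys.
	 */
	struct drm_fdinfo_region {
		const char *name;	/* "system", "vram0", ... */
		u64 size;
		u64 shared;
		u64 resident;
		u64 purgeable;
		u64 active;
	};

	struct drm_fdinfo_memory_stats {
		struct drm_fdinfo_region *regions;
		unsigned int num_regions;

		/* called under table_lock for each object */
		void (*tally)(struct drm_gem_object *obj,
			      struct drm_fdinfo_region *regions,
			      unsigned int num_regions);
	};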

>>> It does mean we walk the bo list twice, but *shrug*. People have been
>>> complaining about procutils for decades, they're still horrible, I think
>>> walking bo lists twice internally in the ttm case is going to be ok. If
>>> not, it's internals, we can change them again.
>>>
>>> Also I'd lean a lot more towards making ttm a helper and not putting that
>>> into core, exactly because it's pretty clear we'll need more flexibility
>>> when it comes to accurate stats for multi-region drivers.
>>
>> Exactly.
>>
>>> But for a first "how much gpu space does this app use" across everything I
>>> think this is a good enough starting point.
>>
>> Okay so we agree this would be better as a helper and not in the core.
> 
> Nope, if you mean with this = Rob's patch. I was talking about a
> hypothetical region-aware extension for ttm-using drivers.
> 
>> On the point are keys/semantics good enough as a starting point I am still
>> not convinced kernel should aggregate and that instead we should start from
>> day one by appending -system (or something) to Rob's proposed keys.
> 
> It should imo. Inflicting driver knowledge on generic userspace makes not
> much sense, we should start with the more generally useful stuff imo.
> That's why there's the drm fdinfo spec and all that so it's not a
> free-for-all.
> 
> Also Rob's stuff is _not_ system. Check on a i915 dgpu if you want :-)

I am well aware it adds up everything, that is beside the point.

Drm-usage-stats.rst text needs to be more precise across all keys at least:

+- drm-resident-memory: <uint> [KiB|MiB]
+
+The total size of buffers that are resident in system memory.

But as said, I don't see the point in providing aggregated values.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-14  8:57                     ` Tvrtko Ursulin
@ 2023-04-14  9:07                         ` Daniel Vetter
  2023-04-14 13:40                         ` Rob Clark
  1 sibling, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-14  9:07 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list

On Fri, 14 Apr 2023 at 10:57, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
> On 13/04/2023 21:05, Daniel Vetter wrote:
> > On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 13/04/2023 14:27, Daniel Vetter wrote:
> >>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>> On 12/04/2023 20:18, Daniel Vetter wrote:
> >>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>>>>>
> >>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> >>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>>
> >>>>>>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>>>>>
> >>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>>>>>> v3: Do it in core
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> >>>>>>>>>> ---
> >>>>>>>>>>      Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>>>>>      drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >>>>>>>>>>      include/drm/drm_file.h                |  1 +
> >>>>>>>>>>      include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>>>>>      4 files changed, 117 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>>>>>      Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>>>>>      indicating kibi- or mebi-bytes.
> >>>>>>>>>>
> >>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> >>>>>>>>>> +
> >>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>>>>>> +than a single handle).
> >>>>>>>>>> +
> >>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> >>>>>>>>>> +
> >>>>>>>>>> +The total size of buffers that are not shared with another file.
> >>>>>>>>>> +
> >>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> >>>>>>>>>> +
> >>>>>>>>>> +The total size of buffers that are resident in system memory.
> >>>>>>>>>
> >>>>>>>>> I think this naming maybe does not work best with the existing
> >>>>>>>>> drm-memory-<region> keys.
> >>>>>>>>
> >>>>>>>> Actually, it was very deliberate not to conflict with the existing
> >>>>>>>> drm-memory-<region> keys ;-)
> >>>>>>>>
> >>>>>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
> >>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> >>>>>>>>
> >>>>>>>>> How about introduce the concept of a memory region from the start and
> >>>>>>>>> use naming similar like we do for engines?
> >>>>>>>>>
> >>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>>>>>
> >>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>>>>>
> >>>>>>>>> 'size' - All reachable objects
> >>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>>>>>> 'resident' - Objects with backing store
> >>>>>>>>> 'active' - Objects in use, subset of resident
> >>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>>>>>
> >>>>>>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>>>>>> it right) which could be desirable for a simplified mental model.
> >>>>>>>>>
> >>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>>>>>> correctly captured this in the first round it should be equivalent to
> >>>>>>>>> 'resident' above. In any case we can document no category is equal to
> >>>>>>>>> which category, and at most one of the two must be output.)
> >>>>>>>>>
> >>>>>>>>> Region names we at most partially standardize. Like we could say
> >>>>>>>>> 'system' is to be used where backing store is system RAM and others are
> >>>>>>>>> driver defined.
> >>>>>>>>>
> >>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>>>>>> region they support.
> >>>>>>>>>
> >>>>>>>>> I think this all also works for objects which can be migrated between
> >>>>>>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>>>>>
> >>>>>>>> I'm not too sure how to rectify different memory regions with this,
> >>>>>>>> since drm core doesn't really know about the driver's memory regions.
> >>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>>>>>> just don't use the helper?  Or??
> >>>>>>>
> >>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>>>>>> all works out reasonably consistently?
> >>>>>>
> >>>>>> That is basically what we have now.  I could append -system to each to
> >>>>>> make things easier to add vram/etc (from a uabi standpoint)..
> >>>>>
> >>>>> What you have isn't really -system, but everything. So doesn't really make
> >>>>> sense to me to mark this -system, it's only really true for integrated (if
> >>>>> they don't have stolen or something like that).
> >>>>>
> >>>>> Also my comment was more in reply to Tvrtko's suggestion.
> >>>>
> >>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> >>>> with the current drm-memory-$REGION by extending, rather than creating
> >>>> confusion with different order of key name components.
> >>>
> >>> Oh my comment was pretty much just bikeshed, in case someone creates a
> >>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> >>> So $CATEGORY before the -memory.
> >>>
> >>> Otoh I don't think that'll happen, so I guess we can go with whatever more
> >>> folks like :-) I don't really care much personally.
> >>
> >> Okay I missed the parsing problem.
> >>
> >>>> AMD currently has (among others) drm-memory-vram, which we could define in
> >>>> the spec maps to category X, if category component is not present.
> >>>>
> >>>> Some examples:
> >>>>
> >>>> drm-memory-resident-system:
> >>>> drm-memory-size-lmem0:
> >>>> drm-memory-active-vram:
> >>>>
> >>>> Etc.. I think it creates a consistent story.
> >>>>
> >>>> Other than this, my two I think significant opens which haven't been
> >>>> addressed yet are:
> >>>>
> >>>> 1)
> >>>>
> >>>> Why do we want totals (not per region) when userspace can trivially
> >>>> aggregate if they want. What is the use case?
> >>>>
> >>>> 2)
> >>>>
> >>>> Current proposal limits the value to whole objects and fixates that by
> >>>> having it in the common code. If/when some driver is able to support sub-BO
> >>>> granularity they will need to opt out of the common printer at which point
> >>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
> >>>> some drivers already support this, I don't know. Given how important VM BIND
> >>>> is I wouldn't be surprised.
> >>>
> >>> I feel like for drivers using ttm we want a ttm helper which takes care of
> >>> the region printing in hopefully a standard way. And that could then also
> >>> take care of all kinds of partial binding and funny rules (like maybe
> >>> we want a standard vram region that adds up all the lmem regions on
> >>> intel, so that all dgpu have a common vram bucket that generic tools
> >>> understand?).
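A rough sketch of that "generic vram bucket" idea, purely to illustrate
the aggregation step (the struct and its fields are hypothetical, not an
existing TTM or DRM API; only drm_printf/struct drm_printer are real):

    struct fdinfo_region_stats {
            const char *name;       /* "system", "lmem0", "lmem1", ... */
            bool device_local;      /* counts towards the vram bucket */
            u64 resident;           /* bytes */
    };

    static void print_vram_bucket(struct drm_printer *p,
                                  const struct fdinfo_region_stats *r,
                                  unsigned int num)
    {
            u64 vram = 0;
            unsigned int i;

            /* Fold all device-local regions into one generic total that
             * tools like gputop/nvtop can show without driver knowledge. */
            for (i = 0; i < num; i++)
                    if (r[i].device_local)
                            vram += r[i].resident;

            drm_printf(p, "drm-memory-vram:\t%llu KiB\n", vram >> 10);
    }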
> >>
> >> First part yes, but for the second I would think we want to avoid any
> >> aggregation in the kernel which can be done in userspace just as well. Such
> >> total vram bucket would be pretty useless on Intel even since userspace
> >> needs to be region aware to make use of all resources. It could even be
> >> counter productive I think - "why am I getting out of memory when half of my
> >> vram is unused!?".
> >
> > This is not for intel-aware userspace. This is for fairly generic "gputop"
> > style userspace, which might simply have no clue or interest in what lmemX
> > means, but would understand vram.
> >
> > Aggregating makes sense.
>
> Lmem vs vram is now an argument not about aggregation but about
> standardizing region names.
>
> One detail also is a change in philosophy compared to engine stats where
> engine names are not centrally prescribed and it was expected userspace
> will have to handle things generically and with some vendor specific
> knowledge.
>
> Like in my gputop patches. It doesn't need to understand what is what,
> it just finds what's there and presents it to the user.
>
> Come some accel driver with local memory it wouldn't be vram any more.
> Or even a headless data center GPU. So I really don't think it is good
> to hardcode 'vram' in the spec, or midlayer, or helpers.
>
> And for aggregation.. again, userspace can do it just as well. If we do
> it in kernel then immediately we have multiple sets of keys to output
> for any driver which wants to show the region view. IMO it is just
> pointless work in the kernel and more code in the kernel, when userspace
> can do it.
>
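A minimal userspace sketch of that aggregation (plain C, minimal error
handling, and it assumes the per-region keys are emitted in KiB as in
the proposals below):

    #include <stdio.h>
    #include <string.h>

    /* Sum every drm-resident-memory-<region> key in one fdinfo file. */
    static unsigned long long total_resident_kib(const char *fdinfo_path)
    {
            char line[256];
            unsigned long long val, total = 0;
            FILE *f = fopen(fdinfo_path, "r");

            if (!f)
                    return 0;

            while (fgets(line, sizeof(line), f)) {
                    char *colon;

                    if (strncmp(line, "drm-resident-memory-", 20))
                            continue;
                    colon = strchr(line, ':');
                    if (colon && sscanf(colon + 1, " %llu", &val) == 1)
                            total += val;
            }
            fclose(f);
            return total;
    }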
> Proposal A (on a discrete gpu, one category only):
>
> drm-resident-memory: x KiB
> drm-resident-memory-system: x KiB
> drm-resident-memory-vram: x KiB
>
> Two loops in the kernel, more parsing in userspace.
>
> Proposal B:
>
> drm-resident-memory-system: x KiB
> drm-resident-memory-vram: x KiB
>
> Can be one loop, one helper, less text for userspace to parse and it can
> still trivially show the total if so desired.
>
> For instance a helper (or two) with a common struct containing region
> names and totals, where a callback into the driver tallies under each
> region, as the drm helper is walking objects.
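A rough sketch of the helper shape described above (all names below are
hypothetical, sketching the idea rather than an existing DRM API; only
the 'resident' category is printed to keep the example short):

    struct fdinfo_mem_region {
            const char *name;       /* "system", "lmem0", ... */
            u64 resident;           /* bytes, filled in by the driver */
    };

    void drm_fdinfo_print_regions(struct drm_printer *p,
                                  struct drm_file *file,
                                  struct fdinfo_mem_region *regions,
                                  unsigned int num_regions,
                                  void (*tally)(struct drm_gem_object *obj,
                                                struct fdinfo_mem_region *regions,
                                                unsigned int num_regions))
    {
            struct drm_gem_object *obj;
            unsigned int i;
            int id;

            /* One walk over the file's handle table; the driver callback
             * decides which region each object counts towards. */
            spin_lock(&file->table_lock);
            idr_for_each_entry(&file->object_idr, obj, id)
                    tally(obj, regions, num_regions);
            spin_unlock(&file->table_lock);

            for (i = 0; i < num_regions; i++)
                    drm_printf(p, "drm-resident-memory-%s:\t%llu KiB\n",
                               regions[i].name, regions[i].resident >> 10);
    }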

The difference is that Rob's patches exist, and consistently roll this
out across all drm drivers.

Your patches don't exist, and encourage further fragmentation. And
my take here is that "the good enough, and real" wins above "perfect,
but maybe in a few years and inconsistently across drivers".

No one is stopping you from writing a ton of patches to get towards
the perfect state, and we still want to get there. But I don't see the
point in rejecting the good enough for now for that.

It's kinda the same idea with scheduler stats, but the other way
round: Sure it'd have been great if we could have this consistently
across all drivers, but right now the scheduler situation just isn't
there to support that. I'm pushing a bit, but it's definitely years
away. So the pragmatic option there was to just roll things out
driver-by-driver, to get things going. It's not perfect at all, and it
would have been easy to nuke that entire fdinfo effort on those
grounds.

If you want, maybe add a todo.rst entry to cover this discussion and make
sure we record the rough consensus on where we eventually want to
end up?
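Something like this, perhaps (the wording is only a suggestion of what
such a todo.rst entry could roughly say):

    Per-region fdinfo memory stats
    ------------------------------

    The drm-*-memory fdinfo keys currently expose aggregated totals
    only.  Once drivers (TTM-based ones in particular) can report
    per-region and sub-BO granularity, extend the fdinfo helpers to
    emit per-region variants (e.g. drm-resident-memory-<region>)
    without breaking the existing aggregated keys.

    Contact: dri-devel

    Level: Intermediate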

> >>> It does mean we walk the bo list twice, but *shrug*. People have been
> >>> complaining about procutils for decades, they're still horrible, I think
> >>> walking bo lists twice internally in the ttm case is going to be ok. If
> >>> not, it's internals, we can change them again.
> >>>
> >>> Also I'd lean a lot more towards making ttm a helper and not putting that
> >>> into core, exactly because it's pretty clear we'll need more flexibility
> >>> when it comes to accurate stats for multi-region drivers.
> >>
> >> Exactly.
> >>
> >>> But for a first "how much gpu space does this app use" across everything I
> >>> think this is a good enough starting point.
> >>
> >> Okay so we agree this would be better as a helper and not in the core.
> >
> > Nope, if you mean with this = Rob's patch. I was talking about a
> > hypothetical region-aware extension for ttm-using drivers.
> >
> >> On the point are keys/semantics good enough as a starting point I am still
> >> not convinced kernel should aggregate and that instead we should start from
> >> day one by appending -system (or something) to Rob's proposed keys.
> >
> > It should imo. Inflicting driver knowledge on generic userspace makes not
> > much sense, we should start with the more generally useful stuff imo.
> > That's why there's the drm fdinfo spec and all that so it's not a
> > free-for-all.
> >
> > Also Rob's stuff is _not_ system. Check on an i915 dgpu if you want :-)
>
> I am well aware it adds up everything, that is beside the point.
>
> Drm-usage-stats.rst text needs to be more precise across all keys at least:
>
> +- drm-resident-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are resident in system memory.
>
> But as said, I don't see the point in providing aggregated values.

The choice isn't between aggregated values and split values.

The choice is between no values (for most drivers) and split values on
some drivers, vs aggregated values for everyone (and still split
values for some).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-14  9:07                         ` Daniel Vetter
@ 2023-04-14 10:12                           ` Tvrtko Ursulin
  -1 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-14 10:12 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Clark, dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list


On 14/04/2023 10:07, Daniel Vetter wrote:
> On Fri, 14 Apr 2023 at 10:57, Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>> On 13/04/2023 21:05, Daniel Vetter wrote:
>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 13/04/2023 14:27, Daniel Vetter wrote:
>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
>>>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
>>>>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
>>>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>>>>>>
>>>>>>>>>>>> Add support to dump GEM stats to fdinfo.
>>>>>>>>>>>>
>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
>>>>>>>>>>>> v3: Do it in core
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>       Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>>>>>>>>>>>>       drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>>>>>>>>>>>>       include/drm/drm_file.h                |  1 +
>>>>>>>>>>>>       include/drm/drm_gem.h                 | 19 +++++++
>>>>>>>>>>>>       4 files changed, 117 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>>>>>>>>>>>>       Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>>>>>>>>>>>>       indicating kibi- or mebi-bytes.
>>>>>>>>>>>>
>>>>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
>>>>>>>>>>>> +
>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
>>>>>>>>>>>> +than a single handle).
>>>>>>>>>>>> +
>>>>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
>>>>>>>>>>>> +
>>>>>>>>>>>> +The total size of buffers that are not shared with another file.
>>>>>>>>>>>> +
>>>>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
>>>>>>>>>>>> +
>>>>>>>>>>>> +The total size of buffers that are resident in system memory.
>>>>>>>>>>>
>>>>>>>>>>> I think this naming maybe does not work best with the existing
>>>>>>>>>>> drm-memory-<region> keys.
>>>>>>>>>>
>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing
>>>>>>>>>> drm-memory-<region> keys ;-)
>>>>>>>>>>
>>>>>>>>>> I would have preferred drm-memory-{active,resident,...} but it
>>>>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
>>>>>>>>>>
>>>>>>>>>>> How about introduce the concept of a memory region from the start and
>>>>>>>>>>> use naming similar like we do for engines?
>>>>>>>>>>>
>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
>>>>>>>>>>>
>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
>>>>>>>>>>>
>>>>>>>>>>> 'size' - All reachable objects
>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
>>>>>>>>>>> 'resident' - Objects with backing store
>>>>>>>>>>> 'active' - Objects in use, subset of resident
>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
>>>>>>>>>>>
>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
>>>>>>>>>>> it right) which could be desirable for a simplified mental model.
>>>>>>>>>>>
>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
>>>>>>>>>>> correctly captured this in the first round it should be equivalent to
>>>>>>>>>>> 'resident' above. In any case we can document no category is equal to
>>>>>>>>>>> which category, and at most one of the two must be output.)
>>>>>>>>>>>
>>>>>>>>>>> Region names we at most partially standardize. Like we could say
>>>>>>>>>>> 'system' is to be used where backing store is system RAM and others are
>>>>>>>>>>> driver defined.
>>>>>>>>>>>
>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
>>>>>>>>>>> region they support.
>>>>>>>>>>>
>>>>>>>>>>> I think this all also works for objects which can be migrated between
>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
>>>>>>>>>>
>>>>>>>>>> I'm not too sure how to rectify different memory regions with this,
>>>>>>>>>> since drm core doesn't really know about the driver's memory regions.
>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
>>>>>>>>>> just don't use the helper?  Or??
>>>>>>>>>
>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
>>>>>>>>> all works out reasonably consistently?
>>>>>>>>
>>>>>>>> That is basically what we have now.  I could append -system to each to
>>>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
>>>>>>>
>>>>>>> What you have isn't really -system, but everything. So doesn't really make
>>>>>>> sense to me to mark this -system, it's only really true for integrated (if
>>>>>>> they don't have stolen or something like that).
>>>>>>>
>>>>>>> Also my comment was more in reply to Tvrtko's suggestion.
>>>>>>
>>>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
>>>>>> with the current drm-memory-$REGION by extending, rather than creating
>>>>>> confusion with different order of key name components.
>>>>>
>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a
>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
>>>>> So $CATEGORY before the -memory.
>>>>>
>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
>>>>> folks like :-) I don't really care much personally.
>>>>
>>>> Okay I missed the parsing problem.
>>>>
>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in
>>>>>> the spec maps to category X, if category component is not present.
>>>>>>
>>>>>> Some examples:
>>>>>>
>>>>>> drm-memory-resident-system:
>>>>>> drm-memory-size-lmem0:
>>>>>> drm-memory-active-vram:
>>>>>>
>>>>>> Etc.. I think it creates a consistent story.
>>>>>>
>>>>>> Other than this, my two I think significant opens which haven't been
>>>>>> addressed yet are:
>>>>>>
>>>>>> 1)
>>>>>>
>>>>>> Why do we want totals (not per region) when userspace can trivially
>>>>>> aggregate if they want. What is the use case?
>>>>>>
>>>>>> 2)
>>>>>>
>>>>>> Current proposal limits the value to whole objects and fixates that by
>>>>>> having it in the common code. If/when some driver is able to support sub-BO
>>>>>> granularity they will need to opt out of the common printer at which point
>>>>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
>>>>>> some drivers already support this, I don't know. Given how important VM BIND
>>>>>> is I wouldn't be surprised.
>>>>>
>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of
>>>>> the region printing in hopefully a standard way. And that could then also
>>>>> take care of all kinds of partial binding and funny rules (like maybe
>>>>> we want a standard vram region that adds up all the lmem regions on
>>>>> intel, so that all dgpu have a common vram bucket that generic tools
>>>>> understand?).
>>>>
>>>> First part yes, but for the second I would think we want to avoid any
>>>> aggregation in the kernel which can be done in userspace just as well. Such
>>>> total vram bucket would be pretty useless on Intel even since userspace
>>>> needs to be region aware to make use of all resources. It could even be
>>>> counter productive I think - "why am I getting out of memory when half of my
>>>> vram is unused!?".
>>>
>>> This is not for intel-aware userspace. This is for fairly generic "gputop"
>>> style userspace, which might simply have no clue or interest in what lmemX
>>> means, but would understand vram.
>>>
>>> Aggregating makes sense.
>>
>> Lmem vs vram is now an argument not about aggregation but about
>> standardizing region names.
>>
>> One detail also is a change in philosophy compared to engine stats where
>> engine names are not centrally prescribed and it was expected userspace
>> will have to handle things generically and with some vendor specific
>> knowledge.
>>
>> Like in my gputop patches. It doesn't need to understand what is what,
>> it just finds what's there and presents it to the user.
>>
>> Come some accel driver with local memory it wouldn't be vram any more.
>> Or even a headless data center GPU. So I really don't think it is good
>> to hardcode 'vram' in the spec, or midlayer, or helpers.
>>
>> And for aggregation.. again, userspace can do it just as well. If we do
>> it in kernel then immediately we have multiple sets of keys to output
>> for any driver which wants to show the region view. IMO it is just
>> pointless work in the kernel and more code in the kernel, when userspace
>> can do it.
>>
>> Proposal A (on a discrete gpu, one category only):
>>
>> drm-resident-memory: x KiB
>> drm-resident-memory-system: x KiB
>> drm-resident-memory-vram: x KiB
>>
>> Two loops in the kernel, more parsing in userspace.
>>
>> Proposal B:
>>
>> drm-resident-memory-system: x KiB
>> drm-resident-memory-vram: x KiB
>>
>> Can be one loop, one helper, less text for userspace to parse and it can
>> still trivially show the total if so desired.
>>
>> For instance a helper (or two) with a common struct containing region
>> names and totals, where a callback into the driver tallies under each
>> region, as the drm helper is walking objects.
> 
> The difference is that Rob's patches exist, and consistently roll this
> out across all drm drivers.
>
> Your patches don't exist, and encourage further fragmentation. And
> my take here is that "the good enough, and real" wins above "perfect,
> but maybe in a few years and inconsistently across drivers".

There is fragmentation in this series already since two categories 
depend on drivers implementing them. Resident is even IMO one of the 
more interesting ones.
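For illustration, the kind of per-object hook those categories depend on
could look roughly like this (driver, struct and flag names here are
hypothetical, not necessarily what the series uses):

    static unsigned int foo_gem_status(struct drm_gem_object *obj)
    {
            struct foo_gem_object *bo = to_foo_bo(obj);  /* hypothetical */
            unsigned int status = 0;

            if (bo->pages)                          /* has backing store */
                    status |= FOO_GEM_STATUS_RESIDENT;
            if (bo->madv == FOO_MADV_DONTNEED)      /* reclaimable */
                    status |= FOO_GEM_STATUS_PURGEABLE;

            return status;
    }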

> No one is stopping you from writing a ton of patches to get towards
> the perfect state, and we still want to get there. But I don't see the
> point in rejecting the good enough for now for that.

I have argued a few times already about what I see as problems and
discussed the pros and cons, but I can write patches too.

Regards,

Tvrtko

> It's kinda the same idea with scheduler stats, but the other way
> round: Sure it'd have been great if we could have this consistently
> across all drivers, but right now the scheduler situation just isn't
> there to support that. I'm pushing a bit, but it's definitely years
> away. So the pragmatic option there was to just roll things out
> driver-by-driver, to get things going. It's not perfect at all, and it
> would have been easy to nuke that entire fdinfo effort on those
> grounds.
> 
> If you want, maybe add a todo.rst entry to cover this discussion and make
> sure we record the rough consensus on where we eventually want to
> end up?
> 
>>>>> It does mean we walk the bo list twice, but *shrug*. People have been
>>>>> complaining about procutils for decades, they're still horrible, I think
>>>>> walking bo lists twice internally in the ttm case is going to be ok. If
>>>>> not, it's internals, we can change them again.
>>>>>
>>>>> Also I'd lean a lot more towards making ttm a helper and not putting that
>>>>> into core, exactly because it's pretty clear we'll need more flexibility
>>>>> when it comes to accurate stats for multi-region drivers.
>>>>
>>>> Exactly.
>>>>
>>>>> But for a first "how much gpu space does this app use" across everything I
>>>>> think this is a good enough starting point.
>>>>
>>>> Okay so we agree this would be better as a helper and not in the core.
>>>
>>> Nope, if you mean with this = Rob's patch. I was talking about a
>>> hypothetical region-aware extension for ttm-using drivers.
>>>
>>>> On the point are keys/semantics good enough as a starting point I am still
>>>> not convinced kernel should aggregate and that instead we should start from
>>>> day one by appending -system (or something) to Rob's proposed keys.
>>>
>>> It should imo. Inflicting driver knowledge on generic userspace makes not
>>> much sense, we should start with the more generally useful stuff imo.
>>> That's why there's the drm fdinfo spec and all that so it's not a
>>> free-for-all.
>>>
>>> Also Rob's stuff is _not_ system. Check on an i915 dgpu if you want :-)
>>
>> I am well aware it adds up everything, that is beside the point.
>>
>> Drm-usage-stats.rst text needs to be more precise across all keys at least:
>>
>> +- drm-resident-memory: <uint> [KiB|MiB]
>> +
>> +The total size of buffers that are resident in system memory.
>>
>> But as said, I don't see the point in providing aggregated values.
> 
> The choice isn't between aggregated values and split values.
> 
> The choice is between no values (for most drivers) and split values on
> some drivers, vs aggregated values for everyone (and still split
> values for some).
> -Daniel

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-14  8:57                     ` Tvrtko Ursulin
@ 2023-04-14 13:40                         ` Rob Clark
  2023-04-14 13:40                         ` Rob Clark
  1 sibling, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-14 13:40 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: dri-devel, linux-arm-msm, freedreno, Boris Brezillon,
	Christopher Healy, Emil Velikov, Rob Clark, David Airlie,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Jonathan Corbet, open list:DOCUMENTATION, open list

On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 13/04/2023 21:05, Daniel Vetter wrote:
> > On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 13/04/2023 14:27, Daniel Vetter wrote:
> >>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>> On 12/04/2023 20:18, Daniel Vetter wrote:
> >>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>>>>>
> >>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> >>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>>
> >>>>>>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>>>>>
> >>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>>>>>> v3: Do it in core
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> >>>>>>>>>> ---
> >>>>>>>>>>      Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>>>>>      drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >>>>>>>>>>      include/drm/drm_file.h                |  1 +
> >>>>>>>>>>      include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>>>>>      4 files changed, 117 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>>>>>      Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>>>>>      indicating kibi- or mebi-bytes.
> >>>>>>>>>>
> >>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> >>>>>>>>>> +
> >>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>>>>>> +than a single handle).
> >>>>>>>>>> +
> >>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> >>>>>>>>>> +
> >>>>>>>>>> +The total size of buffers that are not shared with another file.
> >>>>>>>>>> +
> >>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> >>>>>>>>>> +
> >>>>>>>>>> +The total size of buffers that are resident in system memory.
> >>>>>>>>>
> >>>>>>>>> I think this naming maybe does not work best with the existing
> >>>>>>>>> drm-memory-<region> keys.
> >>>>>>>>
> >>>>>>>> Actually, it was very deliberate not to conflict with the existing
> >>>>>>>> drm-memory-<region> keys ;-)
> >>>>>>>>
> >>>>>>>> I would have preferred drm-memory-{active,resident,...} but it
> >>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> >>>>>>>>
> >>>>>>>>> How about introduce the concept of a memory region from the start and
> >>>>>>>>> use naming similar like we do for engines?
> >>>>>>>>>
> >>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>>>>>
> >>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>>>>>
> >>>>>>>>> 'size' - All reachable objects
> >>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>>>>>> 'resident' - Objects with backing store
> >>>>>>>>> 'active' - Objects in use, subset of resident
> >>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>>>>>
> >>>>>>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>>>>>> it right) which could be desirable for a simplified mental model.
> >>>>>>>>>
> >>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>>>>>> correctly captured this in the first round it should be equivalent to
> >>>>>>>>> 'resident' above. In any case we can document no category is equal to
> >>>>>>>>> which category, and at most one of the two must be output.)
> >>>>>>>>>
> >>>>>>>>> Region names we at most partially standardize. Like we could say
> >>>>>>>>> 'system' is to be used where backing store is system RAM and others are
> >>>>>>>>> driver defined.
> >>>>>>>>>
> >>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>>>>>> region they support.
> >>>>>>>>>
> >>>>>>>>> I think this all also works for objects which can be migrated between
> >>>>>>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>>>>>
> >>>>>>>> I'm not too sure how to rectify different memory regions with this,
> >>>>>>>> since drm core doesn't really know about the driver's memory regions.
> >>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>>>>>> just don't use the helper?  Or??
> >>>>>>>
> >>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>>>>>> all works out reasonably consistently?
> >>>>>>
> >>>>>> That is basically what we have now.  I could append -system to each to
> >>>>>> make things easier to add vram/etc (from a uabi standpoint)..
> >>>>>
> >>>>> What you have isn't really -system, but everything. So doesn't really make
> >>>>> sense to me to mark this -system, it's only really true for integrated (if
> >>>>> they don't have stolen or something like that).
> >>>>>
> >>>>> Also my comment was more in reply to Tvrtko's suggestion.
> >>>>
> >>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> >>>> with the current drm-memory-$REGION by extending, rather than creating
> >>>> confusion with different order of key name components.
> >>>
> >>> Oh my comment was pretty much just bikeshed, in case someone creates a
> >>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> >>> So $CATEGORY before the -memory.
> >>>
> >>> Otoh I don't think that'll happen, so I guess we can go with whatever more
> >>> folks like :-) I don't really care much personally.
> >>
> >> Okay I missed the parsing problem.
> >>
> >>>> AMD currently has (among others) drm-memory-vram, which we could define in
> >>>> the spec maps to category X, if category component is not present.
> >>>>
> >>>> Some examples:
> >>>>
> >>>> drm-memory-resident-system:
> >>>> drm-memory-size-lmem0:
> >>>> drm-memory-active-vram:
> >>>>
> >>>> Etc.. I think it creates a consistent story.
> >>>>
> >>>> Other than this, my two I think significant opens which haven't been
> >>>> addressed yet are:
> >>>>
> >>>> 1)
> >>>>
> >>>> Why do we want totals (not per region) when userspace can trivially
> >>>> aggregate if they want. What is the use case?
> >>>>
> >>>> 2)
> >>>>
> >>>> Current proposal limits the value to whole objects and fixates that by
> >>>> having it in the common code. If/when some driver is able to support sub-BO
> >>>> granularity they will need to opt out of the common printer at which point
> >>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
> >>>> some drivers already support this, I don't know. Given how important VM BIND
> >>>> is I wouldn't be surprised.
> >>>
> >>> I feel like for drivers using ttm we want a ttm helper which takes care of
> >>> the region printing in hopefully a standard way. And that could then also
> >>> take care of all kinds of partial binding and funny rules (like maybe
> >>> we want a standard vram region that adds up all the lmem regions on
> >>> intel, so that all dgpu have a common vram bucket that generic tools
> >>> understand?).
> >>
> >> First part yes, but for the second I would think we want to avoid any
> >> aggregation in the kernel which can be done in userspace just as well. Such
> >> total vram bucket would be pretty useless on Intel even since userspace
> >> needs to be region aware to make use of all resources. It could even be
> >> counter productive I think - "why am I getting out of memory when half of my
> >> vram is unused!?".
> >
> > This is not for intel-aware userspace. This is for fairly generic "gputop"
> > style userspace, which might simply have no clue or interest in what lmemX
> > means, but would understand vram.
> >
> > Aggregating makes sense.
>
> Lmem vs vram is now an argument not about aggregation but about
> standardizing region names.
>
> One detail also is a change in philosophy compared to engine stats where
> engine names are not centrally prescribed and it was expected userspace
> will have to handle things generically and with some vendor specific
> knowledge.
>
> Like in my gputop patches. It doesn't need to understand what is what,
> it just finds what's there and presents it to the user.
>
> Come some accel driver with local memory it wouldn't be vram any more.
> Or even a headless data center GPU. So I really don't think it is good
> to hardcode 'vram' in the spec, or midlayer, or helpers.
>
> And for aggregation.. again, userspace can do it just as well. If we do
> it in kernel then immediately we have multiple sets of keys to output
> for any driver which wants to show the region view. IMO it is just
> pointless work in the kernel and more code in the kernel, when userspace
> can do it.
>
> Proposal A (on a discrete gpu, one category only):
>
> drm-resident-memory: x KiB
> drm-resident-memory-system: x KiB
> drm-resident-memory-vram: x KiB
>
> Two loops in the kernel, more parsing in userspace.

why would it be more than one loop, i.e.:

    mem.resident += size;
    mem.category[cat].resident += size;

At the end of the day, there is limited real estate to show a million
different columns of information.  Even the gputop patches I posted
don't show everything that is currently there.  And nvtop only shows
the top-level resident stat.  So I think the "everything" stat is
going to be what most tools use.
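
As an illustration (hypothetical struct and field names, not the actual
helper in this series), the single-pass accounting could look roughly
like:

    #include <stddef.h>
    #include <stdint.h>

    #define MAX_REGIONS 4   /* hypothetical bound, just for the sketch */

    struct mem_totals {
        uint64_t resident;                      /* aggregate across regions */
        uint64_t region_resident[MAX_REGIONS];  /* per-region breakdown */
    };

    /* called once per object while walking the client's object list */
    static void account_obj(struct mem_totals *t, size_t size,
                            unsigned int region)
    {
        t->resident += size;
        t->region_resident[region] += size;
    }

Emitting both the aggregate key and the per-region keys from those totals
is then just formatting, with no second walk over the objects.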

BR,
-R

> Proposal B:
>
> drm-resident-memory-system: x KiB
> drm-resident-memory-vram: x KiB
>
> Can be one loop, one helper, less text for userspace to parse and it can
> still trivially show the total if so desired.
>
> For instance a helper (or two) with a common struct containing region
> names and totals, where a callback into the driver tallies under each
> region, as the drm helper is walking objects.
>
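
A rough sketch of the shape such a helper could take (names invented here,
purely illustrative, not an actual API from this series):

    #include <stddef.h>
    #include <stdint.h>

    struct drm_region_stats {
        const char *name;       /* e.g. "system", "lmem0", driver defined */
        uint64_t resident;
    };

    struct drm_memory_stats {
        struct drm_region_stats *regions;
        unsigned int num_regions;
        /* driver callback: which region is this object tallied under? */
        unsigned int (*object_region)(void *obj);
    };

    /* the common helper would call this for each object it walks */
    static void tally_object(struct drm_memory_stats *st, void *obj,
                             size_t size)
    {
        st->regions[st->object_region(obj)].resident += size;
    }
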
> >>> It does mean we walk the bo list twice, but *shrug*. People have been
> >>> complaining about procutils for decades, they're still horrible, I think
> >>> walking bo lists twice internally in the ttm case is going to be ok. If
> >>> not, it's internals, we can change them again.
> >>>
> >>> Also I'd lean a lot more towards making ttm a helper and not putting that
> >>> into core, exactly because it's pretty clear we'll need more flexibility
> >>> when it comes to accurate stats for multi-region drivers.
> >>
> >> Exactly.
> >>
> >>> But for a first "how much gpu space does this app use" across everything I
> >>> think this is a good enough starting point.
> >>
> >> Okay so we agree this would be better as a helper and not in the core.
> >
> > Nope, if you mean with this = Rob's patch. I was talking about a
> > hypothetical region-aware extension for ttm-using drivers.
> >
> >> On the point are keys/semantics good enough as a starting point I am still
> >> not convinced kernel should aggregate and that instead we should start from
> >> day one by appending -system (or something) to Rob's proposed keys.
> >
> > It should imo. Inflicting driver knowledge on generic userspace makes not
> > much sense, we should start with the more generally useful stuff imo.
> > That's why there's the drm fdinfo spec and all that so it's not a
> > free-for-all.
> >
> > Also Rob's stuff is _not_ system. Check on a i915 dgpu if you want :-)
>
> I am well aware it adds up everything, that is beside the point.
>
> Drm-usage-stats.rst text needs to be more precise across all keys at least:
>
> +- drm-resident-memory: <uint> [KiB|MiB]
> +
> +The total size of buffers that are resident in system memory.
>
> But as said, I don't see the point in providing aggregated values.
>
> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-14 13:40                         ` Rob Clark
@ 2023-04-16  7:48                           ` Daniel Vetter
  -1 siblings, 0 replies; 94+ messages in thread
From: Daniel Vetter @ 2023-04-16  7:48 UTC (permalink / raw)
  To: Rob Clark
  Cc: Tvrtko Ursulin, Rob Clark, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno

On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote:
> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > On 13/04/2023 21:05, Daniel Vetter wrote:
> > > On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> > >>
> > >> On 13/04/2023 14:27, Daniel Vetter wrote:
> > >>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> > >>>>
> > >>>> On 12/04/2023 20:18, Daniel Vetter wrote:
> > >>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> > >>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> > >>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> > >>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> > >>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> > >>>>>>>>>>
> > >>>>>>>>>> Add support to dump GEM stats to fdinfo.
> > >>>>>>>>>>
> > >>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> > >>>>>>>>>> v3: Do it in core
> > >>>>>>>>>>
> > >>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > >>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > >>>>>>>>>> ---
> > >>>>>>>>>>      Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> > >>>>>>>>>>      drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> > >>>>>>>>>>      include/drm/drm_file.h                |  1 +
> > >>>>>>>>>>      include/drm/drm_gem.h                 | 19 +++++++
> > >>>>>>>>>>      4 files changed, 117 insertions(+)
> > >>>>>>>>>>
> > >>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>> index b46327356e80..b5e7802532ed 100644
> > >>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > >>>>>>>>>>      Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > >>>>>>>>>>      indicating kibi- or mebi-bytes.
> > >>>>>>>>>>
> > >>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> > >>>>>>>>>> +
> > >>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> > >>>>>>>>>> +than a single handle).
> > >>>>>>>>>> +
> > >>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> > >>>>>>>>>> +
> > >>>>>>>>>> +The total size of buffers that are not shared with another file.
> > >>>>>>>>>> +
> > >>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> > >>>>>>>>>> +
> > >>>>>>>>>> +The total size of buffers that are resident in system memory.
> > >>>>>>>>>
> > >>>>>>>>> I think this naming maybe does not work best with the existing
> > >>>>>>>>> drm-memory-<region> keys.
> > >>>>>>>>
> > >>>>>>>> Actually, it was very deliberate not to conflict with the existing
> > >>>>>>>> drm-memory-<region> keys ;-)
> > >>>>>>>>
> > >>>>>>>> I would have preferred drm-memory-{active,resident,...} but it
> > >>>>>>>> could be mis-parsed by existing userspace, so my hands were a bit tied.
> > >>>>>>>>
> > >>>>>>>>> How about introducing the concept of a memory region from the start and
> > >>>>>>>>> using naming similar to what we do for engines?
> > >>>>>>>>>
> > >>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
> > >>>>>>>>>
> > >>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
> > >>>>>>>>>
> > >>>>>>>>> 'size' - All reachable objects
> > >>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> > >>>>>>>>> 'resident' - Objects with backing store
> > >>>>>>>>> 'active' - Objects in use, subset of resident
> > >>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
> > >>>>>>>>>
> > >>>>>>>>> We keep the same semantics as with process memory accounting (if I got
> > >>>>>>>>> it right) which could be desirable for a simplified mental model.
> > >>>>>>>>>
> > >>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > >>>>>>>>> correctly captured this in the first round it should be equivalent to
> > >>>>>>>>> 'resident' above. In any case we can document no category is equal to
> > >>>>>>>>> which category, and at most one of the two must be output.)
> > >>>>>>>>>
> > >>>>>>>>> Region names we at most partially standardize. Like we could say
> > >>>>>>>>> 'system' is to be used where backing store is system RAM and others are
> > >>>>>>>>> driver defined.
> > >>>>>>>>>
> > >>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> > >>>>>>>>> region they support.
> > >>>>>>>>>
> > >>>>>>>>> I think this all also works for objects which can be migrated between
> > >>>>>>>>> memory regions. 'Size' accounts them against all regions while for
> > >>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
> > >>>>>>>>
> > >>>>>>>> I'm not too sure how to rectify different memory regions with this,
> > >>>>>>>> since drm core doesn't really know about the driver's memory regions.
> > >>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
> > >>>>>>>> just don't use the helper?  Or??
> > >>>>>>>
> > >>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> > >>>>>>> all works out reasonably consistently?
> > >>>>>>
> > >>>>>> That is basically what we have now.  I could append -system to each to
> > >>>>>> make things easier to add vram/etc (from a uabi standpoint)..
> > >>>>>
> > >>>>> What you have isn't really -system, but everything. So doesn't really make
> > >>>>> sense to me to mark this -system, it's only really true for integrated (if
> > >>>>> they don't have stolen or something like that).
> > >>>>>
> > >>>>> Also my comment was more in reply to Tvrtko's suggestion.
> > >>>>
> > >>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> > >>>> with the current drm-memory-$REGION by extending, rather than creating
> > >>>> confusion with different order of key name components.
> > >>>
> > >>> Oh my comment was pretty much just bikeshed, in case someone creates a
> > >>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> > >>> So $CATEGORY before the -memory.
> > >>>
> > >>> Otoh I don't think that'll happen, so I guess we can go with whatever more
> > >>> folks like :-) I don't really care much personally.
> > >>
> > >> Okay I missed the parsing problem.
> > >>
> > >>>> AMD currently has (among others) drm-memory-vram, which we could define in
> > >>>> the spec maps to category X, if category component is not present.
> > >>>>
> > >>>> Some examples:
> > >>>>
> > >>>> drm-memory-resident-system:
> > >>>> drm-memory-size-lmem0:
> > >>>> drm-memory-active-vram:
> > >>>>
> > >>>> Etc.. I think it creates a consistent story.
> > >>>>
> > >>>> Other than this, my two I think significant opens which haven't been
> > >>>> addressed yet are:
> > >>>>
> > >>>> 1)
> > >>>>
> > >>>> Why do we want totals (not per region) when userspace can trivially
> > >>>> aggregate if they want. What is the use case?
> > >>>>
> > >>>> 2)
> > >>>>
> > >>>> Current proposal limits the value to whole objects and fixates that by
> > >>>> having it in the common code. If/when some driver is able to support sub-BO
> > >>>> granularity they will need to opt out of the common printer at which point
> > >>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
> > >>>> some drivers already support this, I don't know. Given how important VM BIND
> > >>>> is I wouldn't be surprised.
> > >>>
> > >>> I feel like for drivers using ttm we want a ttm helper which takes care of
> > >>> the region printing in hopefully a standard way. And that could then also
> > >>> take care of all kinds of partial binding and funny rules (like maybe
> > >>> we want a standard vram region that adds up all the lmem regions on
> > >>> intel, so that all dgpu have a common vram bucket that generic tools
> > >>> understand?).
> > >>
> > >> First part yes, but for the second I would think we want to avoid any
> > >> aggregation in the kernel which can be done in userspace just as well. Such
> > >> total vram bucket would be pretty useless on Intel even since userspace
> > >> needs to be region aware to make use of all resources. It could even be
> > >> counter productive I think - "why am I getting out of memory when half of my
> > >> vram is unused!?".
> > >
> > > This is not for intel-aware userspace. This is for fairly generic "gputop"
> > > style userspace, which might simply have no clue or interest in what lmemX
> > > means, but would understand vram.
> > >
> > > Aggregating makes sense.
> >
> > Lmem vs vram is now an argument not about aggregation but about
> > standardizing region names.
> >
> > One detail also is a change in philosophy compared to engine stats where
> > engine names are not centrally prescribed and it was expected userspace
> > will have to handle things generically and with some vendor specific
> > knowledge.
> >
> > Like in my gputop patches. It doesn't need to understand what is what,
> > it just finds what's there and presents it to the user.
> >
> > Come some accel driver with local memory it wouldn't be vram any more.
> > Or even a headless data center GPU. So I really don't think it is good
> > to hardcode 'vram' in the spec, or midlayer, or helpers.
> >
> > And for aggregation.. again, userspace can do it just as well. If we do
> > it in kernel then immediately we have multiple sets of keys to output
> > for any driver which wants to show the region view. IMO it is just
> > pointless work in the kernel and more code in the kernel, when userspace
> > can do it.
> >
> > Proposal A (on a discrete gpu, one category only):
> >
> > drm-resident-memory: x KiB
> > drm-resident-memory-system: x KiB
> > drm-resident-memory-vram: x KiB
> >
> > Two loops in the kernel, more parsing in userspace.
> 
> why would it be more than one loop, i.e.:
> 
>     mem.resident += size;
>     mem.category[cat].resident += size;
> 
> At the end of the day, there is limited real estate to show a million
> different columns of information.  Even the gputop patches I posted
> don't show everything that is currently there.  And nvtop only shows
> the top-level resident stat.  So I think the "everything" stat is
> going to be what most tools use.

Yeah, with enough finesse the double loop isn't needed; it's just the
simplest possible approach.

Also, this is fdinfo; I _really_ want perf data showing that it's a
real-world problem when we conjecture about algorithmic complexity.
procutils have been algorithmically garbage for decades, after all :-)

Cheers, Daniel

> 
> BR,
> -R
> 
> > Proposal B:
> >
> > drm-resident-memory-system: x KiB
> > drm-resident-memory-vram: x KiB
> >
> > Can be one loop, one helper, less text for userspace to parse and it can
> > still trivially show the total if so desired.
> >
> > For instance a helper (or two) with a common struct containing region
> > names and totals, where a callback into the driver tallies under each
> > region, as the drm helper is walking objects.
> >
> > >>> It does mean we walk the bo list twice, but *shrug*. People have been
> > >>> complaining about procutils for decades, they're still horrible, I think
> > >>> walking bo lists twice internally in the ttm case is going to be ok. If
> > >>> not, it's internals, we can change them again.
> > >>>
> > >>> Also I'd lean a lot more towards making ttm a helper and not putting that
> > >>> into core, exactly because it's pretty clear we'll need more flexibility
> > >>> when it comes to accurate stats for multi-region drivers.
> > >>
> > >> Exactly.
> > >>
> > >>> But for a first "how much gpu space does this app use" across everything I
> > >>> think this is a good enough starting point.
> > >>
> > >> Okay so we agree this would be better as a helper and not in the core.
> > >
> > > Nope, if you mean with this = Rob's patch. I was talking about a
> > > hypothetical region-aware extension for ttm-using drivers.
> > >
> > >> On the point are keys/semantics good enough as a starting point I am still
> > >> not convinced kernel should aggregate and that instead we should start from
> > >> day one by appending -system (or something) to Rob's proposed keys.
> > >
> > > It should imo. Inflicting driver knowledge on generic userspace makes not
> > > much sense, we should start with the more generally useful stuff imo.
> > > That's why there's the drm fdinfo spec and all that so it's not a
> > > free-for-all.
> > >
> > > Also Rob's stuff is _not_ system. Check on a i915 dgpu if you want :-)
> >
> > I am well aware it adds up everything, that is beside the point.
> >
> > Drm-usage-stats.rst text needs to be more precise across all keys at least:
> >
> > +- drm-resident-memory: <uint> [KiB|MiB]
> > +
> > +The total size of buffers that are resident in system memory.
> >
> > But as said, I don't see the point in providing aggregated values.
> >
> > Regards,
> >
> > Tvrtko

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-16  7:48                           ` Daniel Vetter
  (?)
@ 2023-04-17 11:10                           ` Tvrtko Ursulin
  2023-04-17 13:42                               ` Rob Clark
  -1 siblings, 1 reply; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-17 11:10 UTC (permalink / raw)
  To: Rob Clark, Rob Clark, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno


On 16/04/2023 08:48, Daniel Vetter wrote:
> On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote:
>> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>
>>>
>>> On 13/04/2023 21:05, Daniel Vetter wrote:
>>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 13/04/2023 14:27, Daniel Vetter wrote:
>>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
>>>>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
>>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
>>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
>>>>>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
>>>>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Add support to dump GEM stats to fdinfo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
>>>>>>>>>>>>> v3: Do it in core
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>>>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>       Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>>>>>>>>>>>>>       drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>>>>>>>>>>>>>       include/drm/drm_file.h                |  1 +
>>>>>>>>>>>>>       include/drm/drm_gem.h                 | 19 +++++++
>>>>>>>>>>>>>       4 files changed, 117 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
>>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>>>>>>>>>>>>>       Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>>>>>>>>>>>>>       indicating kibi- or mebi-bytes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
>>>>>>>>>>>>> +than a single handle).
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +The total size of buffers that are not shared with another file.
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +The total size of buffers that are resident in system memory.
>>>>>>>>>>>>
>>>>>>>>>>>> I think this naming maybe does not work best with the existing
>>>>>>>>>>>> drm-memory-<region> keys.
>>>>>>>>>>>
>>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing
>>>>>>>>>>> drm-memory-<region> keys ;-)
>>>>>>>>>>>
>>>>>>>>>>> I would have preferred drm-memory-{active,resident,...} but it
>>>>>>>>>>> could be mis-parsed by existing userspace, so my hands were a bit tied.
>>>>>>>>>>>
>>>>>>>>>>>> How about introducing the concept of a memory region from the start and
>>>>>>>>>>>> using naming similar to what we do for engines?
>>>>>>>>>>>>
>>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
>>>>>>>>>>>>
>>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
>>>>>>>>>>>>
>>>>>>>>>>>> 'size' - All reachable objects
>>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
>>>>>>>>>>>> 'resident' - Objects with backing store
>>>>>>>>>>>> 'active' - Objects in use, subset of resident
>>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
>>>>>>>>>>>>
>>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
>>>>>>>>>>>> it right) which could be desirable for a simplified mental model.
>>>>>>>>>>>>
>>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
>>>>>>>>>>>> correctly captured this in the first round it should be equivalent to
>>>>>>>>>>>> 'resident' above. In any case we can document no category is equal to
>>>>>>>>>>>> which category, and at most one of the two must be output.)
>>>>>>>>>>>>
>>>>>>>>>>>> Region names we at most partially standardize. Like we could say
>>>>>>>>>>>> 'system' is to be used where backing store is system RAM and others are
>>>>>>>>>>>> driver defined.
>>>>>>>>>>>>
>>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
>>>>>>>>>>>> region they support.
>>>>>>>>>>>>
>>>>>>>>>>>> I think this all also works for objects which can be migrated between
>>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
>>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
>>>>>>>>>>>
>>>>>>>>>>> I'm not too sure how to rectify different memory regions with this,
>>>>>>>>>>> since drm core doesn't really know about the driver's memory regions.
>>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
>>>>>>>>>>> just don't use the helper?  Or??
>>>>>>>>>>
>>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
>>>>>>>>>> all works out reasonably consistently?
>>>>>>>>>
>>>>>>>>> That is basically what we have now.  I could append -system to each to
>>>>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
>>>>>>>>
>>>>>>>> What you have isn't really -system, but everything. So doesn't really make
>>>>>>>> sense to me to mark this -system, it's only really true for integrated (if
>>>>>>>> they don't have stolen or something like that).
>>>>>>>>
>>>>>>>> Also my comment was more in reply to Tvrtko's suggestion.
>>>>>>>
>>>>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
>>>>>>> with the current drm-memory-$REGION by extending, rather than creating
>>>>>>> confusion with different order of key name components.
>>>>>>
>>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a
>>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
>>>>>> So $CATEGORY before the -memory.
>>>>>>
>>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
>>>>>> folks like :-) I don't really care much personally.
>>>>>
>>>>> Okay I missed the parsing problem.
>>>>>
>>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in
>>>>>>> the spec maps to category X, if category component is not present.
>>>>>>>
>>>>>>> Some examples:
>>>>>>>
>>>>>>> drm-memory-resident-system:
>>>>>>> drm-memory-size-lmem0:
>>>>>>> drm-memory-active-vram:
>>>>>>>
>>>>>>> Etc.. I think it creates a consistent story.
>>>>>>>
>>>>>>> Other than this, my two I think significant opens which haven't been
>>>>>>> addressed yet are:
>>>>>>>
>>>>>>> 1)
>>>>>>>
>>>>>>> Why do we want totals (not per region) when userspace can trivially
>>>>>>> aggregate if they want. What is the use case?
>>>>>>>
>>>>>>> 2)
>>>>>>>
>>>>>>> Current proposal limits the value to whole objects and fixates that by
>>>>>>> having it in the common code. If/when some driver is able to support sub-BO
>>>>>>> granularity they will need to opt out of the common printer at which point
>>>>>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
>>>>>>> some drivers already support this, I don't know. Given how important VM BIND
>>>>>>> is I wouldn't be surprised.
>>>>>>
>>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of
>>>>>> the region printing in hopefully a standard way. And that could then also
>>>>>> take care of all kinds of partial binding and funny rules (like maybe
>>>>>> we want a standard vram region that adds up all the lmem regions on
>>>>>> intel, so that all dgpu have a common vram bucket that generic tools
>>>>>> understand?).
>>>>>
>>>>> First part yes, but for the second I would think we want to avoid any
>>>>> aggregation in the kernel which can be done in userspace just as well. Such
>>>>> total vram bucket would be pretty useless on Intel even since userspace
>>>>> needs to be region aware to make use of all resources. It could even be
>>>>> counter productive I think - "why am I getting out of memory when half of my
>>>>> vram is unused!?".
>>>>
>>>> This is not for intel-aware userspace. This is for fairly generic "gputop"
>>>> style userspace, which might simply have no clue or interest in what lmemX
>>>> means, but would understand vram.
>>>>
>>>> Aggregating makes sense.
>>>
>>> Lmem vs vram is now an argument not about aggregation but about
>>> standardizing region names.
>>>
>>> One detail also is a change in philosophy compared to engine stats where
>>> engine names are not centrally prescribed and it was expected userspace
>>> will have to handle things generically and with some vendor specific
>>> knowledge.
>>>
>>> Like in my gputop patches. It doesn't need to understand what is what,
>>> it just finds what's there and presents it to the user.
>>>
>>> Come some accel driver with local memory it wouldn't be vram any more.
>>> Or even a headless data center GPU. So I really don't think it is good
>>> to hardcode 'vram' in the spec, or midlayer, or helpers.
>>>
>>> And for aggregation.. again, userspace can do it just as well. If we do
>>> it in kernel then immediately we have multiple sets of keys to output
>>> for any driver which wants to show the region view. IMO it is just
>>> pointless work in the kernel and more code in the kernel, when userspace
>>> can do it.
>>>
>>> Proposal A (on a discrete gpu, one category only):
>>>
>>> drm-resident-memory: x KiB
>>> drm-resident-memory-system: x KiB
>>> drm-resident-memory-vram: x KiB
>>>
>>> Two loops in the kernel, more parsing in userspace.
>>
>> why would it be more than one loop, i.e.:
>>
>>      mem.resident += size;
>>      mem.category[cat].resident += size;
>>
>> At the end of the day, there is limited real estate to show a million
>> different columns of information.  Even the gputop patches I posted
>> don't show everything that is currently there.  And nvtop only shows
>> the top-level resident stat.  So I think the "everything" stat is
>> going to be what most tools use.
> 
> Yeah with enough finesse the double-loop isn't needed, it's just the
> simplest possible approach.
> 
> Also this is fdinfo, I _really_ want perf data showing that it's a
> real-world problem when we conjecture about algorithmic complexity.
> procutils have been algorithmically garbage since decades after all :-)

Just run it. :)

Algorithmic complexity is quite obvious and not a conjecture - to find 
DRM clients you have to walk _all_ pids and _all_ fds under them. So the 
amount of work can grow very quickly, and it scales with the total 
number of pids and fds rather than with the number of DRM clients.

It's not too bad on my desktop setup but it is significantly more CPU 
intensive than top(1).

It would be possible to optimise the current code some more by not 
parsing the full fdinfo (which may become more important as the number 
of keys grows), but that's only relevant when the number of drm fds is 
large. It doesn't solve the basic pids * open fds search, for which 
we'd need a way to walk the list of pids with drm fds directly.
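
To give a sense of the shape of that work, a minimal sketch of the
pids * fds walk in plain C could look roughly like the following. It
assumes nothing beyond the standard procfs layout and the documented
drm-client-id fdinfo key, and it is only an illustration of the scan
being described here, not any of the tools mentioned in this thread.

/*
 * Illustration only: walk every pid, then every fd under it, and pick
 * out the ones whose fdinfo carries a drm-client-id key.  Tools like
 * the gputop patches mentioned above do roughly this, and additionally
 * de-duplicate fds that report the same drm-client-id.
 */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static int is_number(const char *s)
{
	for (; *s; s++)
		if (!isdigit((unsigned char)*s))
			return 0;
	return 1;
}

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *p;

	if (!proc)
		return 1;

	while ((p = readdir(proc))) {		/* every pid ... */
		char fdinfo_dir[256];
		DIR *fdinfo;
		struct dirent *f;

		if (!is_number(p->d_name))
			continue;

		snprintf(fdinfo_dir, sizeof(fdinfo_dir),
			 "/proc/%s/fdinfo", p->d_name);
		fdinfo = opendir(fdinfo_dir);
		if (!fdinfo)
			continue;

		while ((f = readdir(fdinfo))) {	/* ... times every fd */
			char path[512], line[256];
			FILE *fp;

			if (!is_number(f->d_name))
				continue;

			snprintf(path, sizeof(path), "%s/%s",
				 fdinfo_dir, f->d_name);
			fp = fopen(path, "r");
			if (!fp)
				continue;

			while (fgets(line, sizeof(line), fp)) {
				if (!strncmp(line, "drm-client-id:", 14)) {
					printf("pid %s fd %s is a DRM client:%s",
					       p->d_name, f->d_name, line + 14);
					break;
				}
			}
			fclose(fp);
		}
		closedir(fdinfo);
	}
	closedir(proc);
	return 0;
}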

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-17 11:10                           ` Tvrtko Ursulin
@ 2023-04-17 13:42                               ` Rob Clark
  0 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-17 13:42 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno

On Mon, Apr 17, 2023 at 4:10 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 16/04/2023 08:48, Daniel Vetter wrote:
> > On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote:
> >> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin
> >> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>
> >>>
> >>> On 13/04/2023 21:05, Daniel Vetter wrote:
> >>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> >>>>>
> >>>>> On 13/04/2023 14:27, Daniel Vetter wrote:
> >>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> >>>>>>>
> >>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> >>>>>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>>>>>>>>> v3: Do it in core
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>>       Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>>>>>>>>       drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >>>>>>>>>>>>>       include/drm/drm_file.h                |  1 +
> >>>>>>>>>>>>>       include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>>>>>>>>       4 files changed, 117 insertions(+)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>>>>>>>>       Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>>>>>>>>       indicating kibi- or mebi-bytes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>>>>>>>>> +than a single handle).
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +The total size of buffers that are not shared with another file.
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> >>>>>>>>>>>>> +
> >>>>>>>>>>>>> +The total size of buffers that are resident in system memory.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think this naming maybe does not work best with the existing
> >>>>>>>>>>>> drm-memory-<region> keys.
> >>>>>>>>>>>
> >>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing
> >>>>>>>>>>> drm-memory-<region> keys ;-)
> >>>>>>>>>>>
> >>>>>>>>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
> >>>>>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> >>>>>>>>>>>
> >>>>>>>>>>>> How about introduce the concept of a memory region from the start and
> >>>>>>>>>>>> use naming similar like we do for engines?
> >>>>>>>>>>>>
> >>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 'size' - All reachable objects
> >>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>>>>>>>>> 'resident' - Objects with backing store
> >>>>>>>>>>>> 'active' - Objects in use, subset of resident
> >>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>>>>>>>>> it right) which could be desirable for a simplified mental model.
> >>>>>>>>>>>>
> >>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>>>>>>>>> correctly captured this in the first round it should be equivalent to
> >>>>>>>>>>>> 'resident' above. In any case we can document no category is equal to
> >>>>>>>>>>>> which category, and at most one of the two must be output.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Region names we at most partially standardize. Like we could say
> >>>>>>>>>>>> 'system' is to be used where backing store is system RAM and others are
> >>>>>>>>>>>> driver defined.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>>>>>>>>> region they support.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think this all also works for objects which can be migrated between
> >>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm not too sure how to rectify different memory regions with this,
> >>>>>>>>>>> since drm core doesn't really know about the driver's memory regions.
> >>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>>>>>>>>> just don't use the helper?  Or??
> >>>>>>>>>>
> >>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>>>>>>>>> all works out reasonably consistently?
> >>>>>>>>>
> >>>>>>>>> That is basically what we have now.  I could append -system to each to
> >>>>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
> >>>>>>>>
> >>>>>>>> What you have isn't really -system, but everything. So doesn't really make
> >>>>>>>> sense to me to mark this -system, it's only really true for integrated (if
> >>>>>>>> they don't have stolen or something like that).
> >>>>>>>>
> >>>>>>>> Also my comment was more in reply to Tvrtko's suggestion.
> >>>>>>>
> >>>>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> >>>>>>> with the current drm-memory-$REGION by extending, rather than creating
> >>>>>>> confusion with different order of key name components.
> >>>>>>
> >>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a
> >>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> >>>>>> So $CATEGORY before the -memory.
> >>>>>>
> >>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
> >>>>>> folks like :-) I don't really care much personally.
> >>>>>
> >>>>> Okay I missed the parsing problem.
> >>>>>
> >>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in
> >>>>>>> the spec maps to category X, if category component is not present.
> >>>>>>>
> >>>>>>> Some examples:
> >>>>>>>
> >>>>>>> drm-memory-resident-system:
> >>>>>>> drm-memory-size-lmem0:
> >>>>>>> drm-memory-active-vram:
> >>>>>>>
> >>>>>>> Etc.. I think it creates a consistent story.
> >>>>>>>
> >>>>>>> Other than this, my two I think significant opens which haven't been
> >>>>>>> addressed yet are:
> >>>>>>>
> >>>>>>> 1)
> >>>>>>>
> >>>>>>> Why do we want totals (not per region) when userspace can trivially
> >>>>>>> aggregate if they want. What is the use case?
> >>>>>>>
> >>>>>>> 2)
> >>>>>>>
> >>>>>>> Current proposal limits the value to whole objects and fixates that by
> >>>>>>> having it in the common code. If/when some driver is able to support sub-BO
> >>>>>>> granularity they will need to opt out of the common printer at which point
> >>>>>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
> >>>>>>> some drivers already support this, I don't know. Given how important VM BIND
> >>>>>>> is I wouldn't be surprised.
> >>>>>>
> >>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of
> >>>>>> the region printing in hopefully a standard way. And that could then also
> >>>>>> take care of all kinds of of partial binding and funny rules (like maybe
> >>>>>> we want a standard vram region that addds up all the lmem regions on
> >>>>>> intel, so that all dgpu have a common vram bucket that generic tools
> >>>>>> understand?).
> >>>>>
> >>>>> First part yes, but for the second I would think we want to avoid any
> >>>>> aggregation in the kernel which can be done in userspace just as well. Such
> >>>>> total vram bucket would be pretty useless on Intel even since userspace
> >>>>> needs to be region aware to make use of all resources. It could even be
> >>>>> counter productive I think - "why am I getting out of memory when half of my
> >>>>> vram is unused!?".
> >>>>
> >>>> This is not for intel-aware userspace. This is for fairly generic "gputop"
> >>>> style userspace, which might simply have no clue or interest in what lmemX
> >>>> means, but would understand vram.
> >>>>
> >>>> Aggregating makes sense.
> >>>
> >>> Lmem vs vram is now an argument not about aggregation but about
> >>> standardizing regions names.
> >>>
> >>> One detail also is a change in philosophy compared to engine stats where
> >>> engine names are not centrally prescribed and it was expected userspace
> >>> will have to handle things generically and with some vendor specific
> >>> knowledge.
> >>>
> >>> Like in my gputop patches. It doesn't need to understand what is what,
> >>> it just finds what's there and presents it to the user.
> >>>
> >>> Come some accel driver with local memory it wouldn't be vram any more.
> >>> Or even a headless data center GPU. So I really don't think it is good
> >>> to hardcode 'vram' in the spec, or midlayer, or helpers.
> >>>
> >>> And for aggregation.. again, userspace can do it just as well. If we do
> >>> it in kernel then immediately we have multiple sets of keys to output
> >>> for any driver which wants to show the region view. IMO it is just
> >>> pointless work in the kernel and more code in the kernel, when userspace
> >>> can do it.
> >>>
> >>> Proposal A (one a discrete gpu, one category only):
> >>>
> >>> drm-resident-memory: x KiB
> >>> drm-resident-memory-system: x KiB
> >>> drm-resident-memory-vram: x KiB
> >>>
> >>> Two loops in the kernel, more parsing in userspace.
> >>
> >> why would it be more than one loop, ie.
> >>
> >>      mem.resident += size;
> >>      mem.category[cat].resident += size;
> >>
> >> At the end of the day, there is limited real-estate to show a million
> >> different columns of information.  Even the gputop patches I posted
> >> don't show everything of what is currently there.  And nvtop only
> >> shows toplevel resident stat.  So I think the "everything" stat is
> >> going to be what most tools use.
> >
> > Yeah with enough finesse the double-loop isn't needed, it's just the
> > simplest possible approach.
> >
> > Also this is fdinfo, I _really_ want perf data showing that it's a
> > real-world problem when we conjecture about algorithmic complexity.
> > procutils have been algorithmically garbage since decades after all :-)
>
> Just run it. :)
>
> Algorithmic complexity is quite obvious and not a conjecture - to find
> DRM clients you have to walk _all_ pids and _all_ fds under them. So
> amount of work can scale very quickly and even _not_ with the number of
> DRM clients.
>
> It's not too bad on my desktop setup but it is significantly more CPU
> intensive than top(1).
>
> It would be possible to optimise the current code some more by not
> parsing full fdinfo (may become more important as number of keys grow),
> but that's only relevant when number of drm fds is large. It doesn't
> solve the basic pids * open fds search for which we'd need a way to walk
> the list of pids with drm fds directly.

All of which has (almost[1]) nothing to do with one loop or two
(ignoring for a moment that I already pointed out a single loop is all
that is needed).  If CPU overhead is a problem, we could perhaps come
up with some sysfs which has one file per drm_file and side-step
crawling all of the proc * fd entries.  I'll play around with it some,
but I'm pretty sure you are trying to optimize the wrong thing.

BR,
-R

[1] generally a single process using drm has multiple fds pointing at
the same drm_file, which makes the current approach of having to read
fdinfo to find the client-id sub-optimal.  But the total number of
proc * fd entries is still much larger.
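
To make the one-loop point concrete, a minimal sketch (with made-up
types and sample data, not the actual drm_file.c helper) can feed both
the total and the per-region bucket from a single pass, printing the
key names from the quoted Proposal A:

/*
 * Illustration only: the single-loop accumulation referred to above.
 * The types and sample data are invented for the example; the point is
 * that one pass over the objects updates both the total and the
 * per-region view.
 */
#include <stddef.h>
#include <stdio.h>

enum region { REGION_SYSTEM, REGION_VRAM, REGION_COUNT };

struct obj {
	size_t size;
	enum region region;
	int resident;
};

struct stats {
	size_t resident_total;
	size_t resident_region[REGION_COUNT];
};

int main(void)
{
	static const struct obj objs[] = {
		{ .size = 4096,    .region = REGION_SYSTEM, .resident = 1 },
		{ .size = 1 << 20, .region = REGION_VRAM,   .resident = 1 },
		{ .size = 8192,    .region = REGION_VRAM,   .resident = 0 },
	};
	struct stats s = { 0 };
	size_t i;

	for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++) {
		if (!objs[i].resident)
			continue;
		/* one loop, two views of the same data */
		s.resident_total += objs[i].size;
		s.resident_region[objs[i].region] += objs[i].size;
	}

	printf("drm-resident-memory:        %zu KiB\n",
	       s.resident_total / 1024);
	printf("drm-resident-memory-system: %zu KiB\n",
	       s.resident_region[REGION_SYSTEM] / 1024);
	printf("drm-resident-memory-vram:   %zu KiB\n",
	       s.resident_region[REGION_VRAM] / 1024);
	return 0;
}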

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-17 13:42                               ` Rob Clark
@ 2023-04-17 14:04                                 ` Alex Deucher
  -1 siblings, 0 replies; 94+ messages in thread
From: Alex Deucher @ 2023-04-17 14:04 UTC (permalink / raw)
  To: Rob Clark
  Cc: Tvrtko Ursulin, Rob Clark, open list:DOCUMENTATION,
	linux-arm-msm, Jonathan Corbet, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno

On Mon, Apr 17, 2023 at 9:43 AM Rob Clark <robdclark@gmail.com> wrote:
>
> On Mon, Apr 17, 2023 at 4:10 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > On 16/04/2023 08:48, Daniel Vetter wrote:
> > > On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote:
> > >> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin
> > >> <tvrtko.ursulin@linux.intel.com> wrote:
> > >>>
> > >>>
> > >>> On 13/04/2023 21:05, Daniel Vetter wrote:
> > >>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> > >>>>>
> > >>>>> On 13/04/2023 14:27, Daniel Vetter wrote:
> > >>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> > >>>>>>>
> > >>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
> > >>>>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> > >>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> > >>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> > >>>>>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> > >>>>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Add support to dump GEM stats to fdinfo.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> > >>>>>>>>>>>>> v3: Do it in core
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > >>>>>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> > >>>>>>>>>>>>> ---
> > >>>>>>>>>>>>>       Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> > >>>>>>>>>>>>>       drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> > >>>>>>>>>>>>>       include/drm/drm_file.h                |  1 +
> > >>>>>>>>>>>>>       include/drm/drm_gem.h                 | 19 +++++++
> > >>>>>>>>>>>>>       4 files changed, 117 insertions(+)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
> > >>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> > >>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> > >>>>>>>>>>>>>       Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> > >>>>>>>>>>>>>       indicating kibi- or mebi-bytes.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> > >>>>>>>>>>>>> +than a single handle).
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +The total size of buffers that are not shared with another file.
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> > >>>>>>>>>>>>> +
> > >>>>>>>>>>>>> +The total size of buffers that are resident in system memory.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think this naming maybe does not work best with the existing
> > >>>>>>>>>>>> drm-memory-<region> keys.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing
> > >>>>>>>>>>> drm-memory-<region> keys ;-)
> > >>>>>>>>>>>
> > >>>>>>>>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
> > >>>>>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> > >>>>>>>>>>>
> > >>>>>>>>>>>> How about introduce the concept of a memory region from the start and
> > >>>>>>>>>>>> use naming similar like we do for engines?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 'size' - All reachable objects
> > >>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> > >>>>>>>>>>>> 'resident' - Objects with backing store
> > >>>>>>>>>>>> 'active' - Objects in use, subset of resident
> > >>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
> > >>>>>>>>>>>> it right) which could be desirable for a simplified mental model.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> > >>>>>>>>>>>> correctly captured this in the first round it should be equivalent to
> > >>>>>>>>>>>> 'resident' above. In any case we can document no category is equal to
> > >>>>>>>>>>>> which category, and at most one of the two must be output.)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Region names we at most partially standardize. Like we could say
> > >>>>>>>>>>>> 'system' is to be used where backing store is system RAM and others are
> > >>>>>>>>>>>> driver defined.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> > >>>>>>>>>>>> region they support.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think this all also works for objects which can be migrated between
> > >>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
> > >>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I'm not too sure how to rectify different memory regions with this,
> > >>>>>>>>>>> since drm core doesn't really know about the driver's memory regions.
> > >>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
> > >>>>>>>>>>> just don't use the helper?  Or??
> > >>>>>>>>>>
> > >>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> > >>>>>>>>>> all works out reasonably consistently?
> > >>>>>>>>>
> > >>>>>>>>> That is basically what we have now.  I could append -system to each to
> > >>>>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
> > >>>>>>>>
> > >>>>>>>> What you have isn't really -system, but everything. So doesn't really make
> > >>>>>>>> sense to me to mark this -system, it's only really true for integrated (if
> > >>>>>>>> they don't have stolen or something like that).
> > >>>>>>>>
> > >>>>>>>> Also my comment was more in reply to Tvrtko's suggestion.
> > >>>>>>>
> > >>>>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> > >>>>>>> with the current drm-memory-$REGION by extending, rather than creating
> > >>>>>>> confusion with different order of key name components.
> > >>>>>>
> > >>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a
> > >>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> > >>>>>> So $CATEGORY before the -memory.
> > >>>>>>
> > >>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
> > >>>>>> folks like :-) I don't really care much personally.
> > >>>>>
> > >>>>> Okay I missed the parsing problem.
> > >>>>>
> > >>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in
> > >>>>>>> the spec maps to category X, if category component is not present.
> > >>>>>>>
> > >>>>>>> Some examples:
> > >>>>>>>
> > >>>>>>> drm-memory-resident-system:
> > >>>>>>> drm-memory-size-lmem0:
> > >>>>>>> drm-memory-active-vram:
> > >>>>>>>
> > >>>>>>> Etc.. I think it creates a consistent story.
> > >>>>>>>
> > >>>>>>> Other than this, my two I think significant opens which haven't been
> > >>>>>>> addressed yet are:
> > >>>>>>>
> > >>>>>>> 1)
> > >>>>>>>
> > >>>>>>> Why do we want totals (not per region) when userspace can trivially
> > >>>>>>> aggregate if they want. What is the use case?
> > >>>>>>>
> > >>>>>>> 2)
> > >>>>>>>
> > >>>>>>> Current proposal limits the value to whole objects and fixates that by
> > >>>>>>> having it in the common code. If/when some driver is able to support sub-BO
> > >>>>>>> granularity they will need to opt out of the common printer at which point
> > >>>>>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
> > >>>>>>> some drivers already support this, I don't know. Given how important VM BIND
> > >>>>>>> is I wouldn't be surprised.
> > >>>>>>
> > >>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of
> > >>>>>> the region printing in hopefully a standard way. And that could then also
> > >>>>>> take care of all kinds of of partial binding and funny rules (like maybe
> > >>>>>> we want a standard vram region that addds up all the lmem regions on
> > >>>>>> intel, so that all dgpu have a common vram bucket that generic tools
> > >>>>>> understand?).
> > >>>>>
> > >>>>> First part yes, but for the second I would think we want to avoid any
> > >>>>> aggregation in the kernel which can be done in userspace just as well. Such
> > >>>>> total vram bucket would be pretty useless on Intel even since userspace
> > >>>>> needs to be region aware to make use of all resources. It could even be
> > >>>>> counter productive I think - "why am I getting out of memory when half of my
> > >>>>> vram is unused!?".
> > >>>>
> > >>>> This is not for intel-aware userspace. This is for fairly generic "gputop"
> > >>>> style userspace, which might simply have no clue or interest in what lmemX
> > >>>> means, but would understand vram.
> > >>>>
> > >>>> Aggregating makes sense.
> > >>>
> > >>> Lmem vs vram is now an argument not about aggregation but about
> > >>> standardizing regions names.
> > >>>
> > >>> One detail also is a change in philosophy compared to engine stats where
> > >>> engine names are not centrally prescribed and it was expected userspace
> > >>> will have to handle things generically and with some vendor specific
> > >>> knowledge.
> > >>>
> > >>> Like in my gputop patches. It doesn't need to understand what is what,
> > >>> it just finds what's there and presents it to the user.
> > >>>
> > >>> Come some accel driver with local memory it wouldn't be vram any more.
> > >>> Or even a headless data center GPU. So I really don't think it is good
> > >>> to hardcode 'vram' in the spec, or midlayer, or helpers.
> > >>>
> > >>> And for aggregation.. again, userspace can do it just as well. If we do
> > >>> it in kernel then immediately we have multiple sets of keys to output
> > >>> for any driver which wants to show the region view. IMO it is just
> > >>> pointless work in the kernel and more code in the kernel, when userspace
> > >>> can do it.
> > >>>
> > >>> Proposal A (one a discrete gpu, one category only):
> > >>>
> > >>> drm-resident-memory: x KiB
> > >>> drm-resident-memory-system: x KiB
> > >>> drm-resident-memory-vram: x KiB
> > >>>
> > >>> Two loops in the kernel, more parsing in userspace.
> > >>
> > >> why would it be more than one loop, ie.
> > >>
> > >>      mem.resident += size;
> > >>      mem.category[cat].resident += size;
> > >>
> > >> At the end of the day, there is limited real-estate to show a million
> > >> different columns of information.  Even the gputop patches I posted
> > >> don't show everything of what is currently there.  And nvtop only
> > >> shows toplevel resident stat.  So I think the "everything" stat is
> > >> going to be what most tools use.
> > >
> > > Yeah with enough finesse the double-loop isn't needed, it's just the
> > > simplest possible approach.
> > >
> > > Also this is fdinfo, I _really_ want perf data showing that it's a
> > > real-world problem when we conjecture about algorithmic complexity.
> > > procutils have been algorithmically garbage since decades after all :-)
> >
> > Just run it. :)
> >
> > Algorithmic complexity is quite obvious and not a conjecture - to find
> > DRM clients you have to walk _all_ pids and _all_ fds under them. So
> > amount of work can scale very quickly and even _not_ with the number of
> > DRM clients.
> >
> > It's not too bad on my desktop setup but it is significantly more CPU
> > intensive than top(1).
> >
> > It would be possible to optimise the current code some more by not
> > parsing full fdinfo (may become more important as number of keys grow),
> > but that's only relevant when number of drm fds is large. It doesn't
> > solve the basic pids * open fds search for which we'd need a way to walk
> > the list of pids with drm fds directly.
>
> All of which has (almost[1]) nothing to do with one loop or two
> (ignoring for a moment that I already pointed out a single loop is all
> that is needed).  If CPU overhead is a problem, we could perhaps come
> up some sysfs which has one file per drm_file and side-step crawling
> of all of the proc * fd.  I'll play around with it some but I'm pretty
> sure you are trying to optimize the wrong thing.

Yeah, we have customers that would like a single interface (IOCTL or
sysfs) to get all of this info rather than having to walk a ton of
files and do effectively two syscalls to accumulate all of this data
for all of the processes on the system.
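
Purely as an illustration, with made-up types and sample values:
however the per-client data ends up being exposed (an fdinfo walk,
sysfs, or an ioctl), rolling it up into that system-wide view is a
small accumulation on the userspace side.

/*
 * Illustration only: aggregate per-client, per-region resident sizes
 * into system-wide totals.  The struct and sample values are invented
 * for the example; only the accumulation pattern is the point.
 */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct client_sample {
	unsigned int client_id;
	const char *region;		/* e.g. "system", "vram" */
	size_t resident_kib;
};

int main(void)
{
	static const struct client_sample samples[] = {
		{ 1, "system",  5120 },
		{ 1, "vram",   65536 },
		{ 2, "system",  2048 },
	};
	size_t system_kib = 0, vram_kib = 0, total_kib = 0;
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		total_kib += samples[i].resident_kib;
		if (!strcmp(samples[i].region, "system"))
			system_kib += samples[i].resident_kib;
		else if (!strcmp(samples[i].region, "vram"))
			vram_kib += samples[i].resident_kib;
	}

	printf("all clients, resident, system: %zu KiB\n", system_kib);
	printf("all clients, resident, vram:   %zu KiB\n", vram_kib);
	printf("all clients, resident, total:  %zu KiB\n", total_kib);
	return 0;
}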

Alex

>
> BR,
> -R
>
> [1] generally a single process using drm has multiple fd's pointing at
> the same drm_file.. which makes the current approach of having to read
> fdinfo to find the client-id sub-optimal.  But still the total # of
> proc * fd is much larger

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-17 13:42                               ` Rob Clark
@ 2023-04-17 14:20                                 ` Tvrtko Ursulin
  -1 siblings, 0 replies; 94+ messages in thread
From: Tvrtko Ursulin @ 2023-04-17 14:20 UTC (permalink / raw)
  To: Rob Clark
  Cc: Rob Clark, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno


On 17/04/2023 14:42, Rob Clark wrote:
> On Mon, Apr 17, 2023 at 4:10 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>>
>> On 16/04/2023 08:48, Daniel Vetter wrote:
>>> On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote:
>>>> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin
>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>
>>>>>
>>>>> On 13/04/2023 21:05, Daniel Vetter wrote:
>>>>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 13/04/2023 14:27, Daniel Vetter wrote:
>>>>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
>>>>>>>>>
>>>>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
>>>>>>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
>>>>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
>>>>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
>>>>>>>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
>>>>>>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Add support to dump GEM stats to fdinfo.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
>>>>>>>>>>>>>>> v3: Do it in core
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>>>>>>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>        Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
>>>>>>>>>>>>>>>        drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
>>>>>>>>>>>>>>>        include/drm/drm_file.h                |  1 +
>>>>>>>>>>>>>>>        include/drm/drm_gem.h                 | 19 +++++++
>>>>>>>>>>>>>>>        4 files changed, 117 insertions(+)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
>>>>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
>>>>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
>>>>>>>>>>>>>>>        Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>>>>>>>>>>>>>>>        indicating kibi- or mebi-bytes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
>>>>>>>>>>>>>>> +than a single handle).
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +The total size of buffers that are not shared with another file.
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +The total size of buffers that are resident in system memory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this naming maybe does not work best with the existing
>>>>>>>>>>>>>> drm-memory-<region> keys.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing
>>>>>>>>>>>>> drm-memory-<region> keys ;-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
>>>>>>>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> How about introduce the concept of a memory region from the start and
>>>>>>>>>>>>>> use naming similar like we do for engines?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 'size' - All reachable objects
>>>>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
>>>>>>>>>>>>>> 'resident' - Objects with backing store
>>>>>>>>>>>>>> 'active' - Objects in use, subset of resident
>>>>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
>>>>>>>>>>>>>> it right) which could be desirable for a simplified mental model.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
>>>>>>>>>>>>>> correctly captured this in the first round it should be equivalent to
>>>>>>>>>>>>>> 'resident' above. In any case we can document no category is equal to
>>>>>>>>>>>>>> which category, and at most one of the two must be output.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Region names we at most partially standardize. Like we could say
>>>>>>>>>>>>>> 'system' is to be used where backing store is system RAM and others are
>>>>>>>>>>>>>> driver defined.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
>>>>>>>>>>>>>> region they support.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this all also works for objects which can be migrated between
>>>>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
>>>>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not too sure how to rectify different memory regions with this,
>>>>>>>>>>>>> since drm core doesn't really know about the driver's memory regions.
>>>>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
>>>>>>>>>>>>> just don't use the helper?  Or??
>>>>>>>>>>>>
>>>>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
>>>>>>>>>>>> all works out reasonably consistently?
>>>>>>>>>>>
>>>>>>>>>>> That is basically what we have now.  I could append -system to each to
>>>>>>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
>>>>>>>>>>
>>>>>>>>>> What you have isn't really -system, but everything. So doesn't really make
>>>>>>>>>> sense to me to mark this -system, it's only really true for integrated (if
>>>>>>>>>> they don't have stolen or something like that).
>>>>>>>>>>
>>>>>>>>>> Also my comment was more in reply to Tvrtko's suggestion.
>>>>>>>>>
>>>>>>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
>>>>>>>>> with the current drm-memory-$REGION by extending, rather than creating
>>>>>>>>> confusion with different order of key name components.
>>>>>>>>
>>>>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a
>>>>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
>>>>>>>> So $CATEGORY before the -memory.
>>>>>>>>
>>>>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
>>>>>>>> folks like :-) I don't really care much personally.
>>>>>>>
>>>>>>> Okay I missed the parsing problem.
>>>>>>>
>>>>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in
>>>>>>>>> the spec maps to category X, if category component is not present.
>>>>>>>>>
>>>>>>>>> Some examples:
>>>>>>>>>
>>>>>>>>> drm-memory-resident-system:
>>>>>>>>> drm-memory-size-lmem0:
>>>>>>>>> drm-memory-active-vram:
>>>>>>>>>
>>>>>>>>> Etc.. I think it creates a consistent story.
>>>>>>>>>
>>>>>>>>> Other than this, my two I think significant opens which haven't been
>>>>>>>>> addressed yet are:
>>>>>>>>>
>>>>>>>>> 1)
>>>>>>>>>
>>>>>>>>> Why do we want totals (not per region) when userspace can trivially
>>>>>>>>> aggregate if they want. What is the use case?
>>>>>>>>>
>>>>>>>>> 2)
>>>>>>>>>
>>>>>>>>> Current proposal limits the value to whole objects and fixates that by
>>>>>>>>> having it in the common code. If/when some driver is able to support sub-BO
>>>>>>>>> granularity they will need to opt out of the common printer at which point
>>>>>>>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
>>>>>>>>> some drivers already support this, I don't know. Given how important VM BIND
>>>>>>>>> is I wouldn't be surprised.
>>>>>>>>
>>>>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of
>>>>>>>> the region printing in hopefully a standard way. And that could then also
>>>>>>>> take care of all kinds of partial binding and funny rules (like maybe
>>>>>>>> we want a standard vram region that adds up all the lmem regions on
>>>>>>>> intel, so that all dgpu have a common vram bucket that generic tools
>>>>>>>> understand?).
>>>>>>>
>>>>>>> First part yes, but for the second I would think we want to avoid any
>>>>>>> aggregation in the kernel which can be done in userspace just as well. Such
>>>>>>> a total vram bucket would be pretty useless even on Intel, since userspace
>>>>>>> needs to be region aware to make use of all resources. It could even be
>>>>>>> counter productive I think - "why am I getting out of memory when half of my
>>>>>>> vram is unused!?".
>>>>>>
>>>>>> This is not for intel-aware userspace. This is for fairly generic "gputop"
>>>>>> style userspace, which might simply have no clue or interest in what lmemX
>>>>>> means, but would understand vram.
>>>>>>
>>>>>> Aggregating makes sense.
>>>>>
>>>>> Lmem vs vram is now an argument not about aggregation but about
>>>>> standardizing regions names.
>>>>>
>>>>> One detail also is a change in philosophy compared to engine stats where
>>>>> engine names are not centrally prescribed and it was expected userspace
>>>>> will have to handle things generically and with some vendor specific
>>>>> knowledge.
>>>>>
>>>>> Like in my gputop patches. It doesn't need to understand what is what,
>>>>> it just finds what's there and presents it to the user.
>>>>>
>>>>> Come some accel driver with local memory it wouldn't be vram any more.
>>>>> Or even a headless data center GPU. So I really don't think it is good
>>>>> to hardcode 'vram' in the spec, or midlayer, or helpers.
>>>>>
>>>>> And for aggregation.. again, userspace can do it just as well. If we do
>>>>> it in kernel then immediately we have multiple sets of keys to output
>>>>> for any driver which wants to show the region view. IMO it is just
>>>>> pointless work in the kernel and more code in the kernel, when userspace
>>>>> can do it.
>>>>>
>>>>> Proposal A (on a discrete gpu, one category only):
>>>>>
>>>>> drm-resident-memory: x KiB
>>>>> drm-resident-memory-system: x KiB
>>>>> drm-resident-memory-vram: x KiB
>>>>>
>>>>> Two loops in the kernel, more parsing in userspace.
>>>>
>>>> why would it be more than one loop, ie.
>>>>
>>>>       mem.resident += size;
>>>>       mem.category[cat].resident += size;
>>>>
>>>> At the end of the day, there is limited real-estate to show a million
>>>> different columns of information.  Even the gputop patches I posted
>>>> don't show everything of what is currently there.  And nvtop only
>>>> shows toplevel resident stat.  So I think the "everything" stat is
>>>> going to be what most tools use.
>>>
>>> Yeah with enough finesse the double-loop isn't needed, it's just the
>>> simplest possible approach.
>>>
>>> Also this is fdinfo, I _really_ want perf data showing that it's a
>>> real-world problem when we conjecture about algorithmic complexity.
>>> procutils have been algorithmically garbage since decades after all :-)
>>
>> Just run it. :)
>>
>> Algorithmic complexity is quite obvious and not a conjecture - to find
>> DRM clients you have to walk _all_ pids and _all_ fds under them. So
>> amount of work can scale very quickly and even _not_ with the number of
>> DRM clients.
>>
>> It's not too bad on my desktop setup but it is significantly more CPU
>> intensive than top(1).
>>
>> It would be possible to optimise the current code some more by not
>> parsing full fdinfo (may become more important as number of keys grow),
>> but that's only relevant when number of drm fds is large. It doesn't
>> solve the basic pids * open fds search for which we'd need a way to walk
>> the list of pids with drm fds directly.
> 
> All of which has (almost[1]) nothing to do with one loop or two

Correct, this was just a side discussion where I understood Daniel was
asking about the wider performance story. Perhaps I misunderstood.

> (ignoring for a moment that I already pointed out a single loop is all
> that is needed).  If CPU overhead is a problem, we could perhaps come
> up with some sysfs which has one file per drm_file and side-step crawling
> of all of the proc * fd.  I'll play around with it some but I'm pretty
> sure you are trying to optimize the wrong thing.

Yes, that's what I meant too in "a way to walk the list of pids with drm 
fds directly".

Regards,

Tvrtko

> 
> BR,
> -R
> 
> [1] generally a single process using drm has multiple fd's pointing at
> the same drm_file.. which makes the current approach of having to read
> fdinfo to find the client-id sub-optimal.  But still the total # of
> proc * fd is much larger

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
  2023-04-17 14:20                                 ` Tvrtko Ursulin
@ 2023-04-17 16:12                                   ` Rob Clark
  -1 siblings, 0 replies; 94+ messages in thread
From: Rob Clark @ 2023-04-17 16:12 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Rob Clark, Jonathan Corbet, linux-arm-msm,
	open list:DOCUMENTATION, Emil Velikov, Christopher Healy,
	dri-devel, open list, Boris Brezillon, Thomas Zimmermann,
	freedreno

On Mon, Apr 17, 2023 at 7:20 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 17/04/2023 14:42, Rob Clark wrote:
> > On Mon, Apr 17, 2023 at 4:10 AM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> >>
> >>
> >> On 16/04/2023 08:48, Daniel Vetter wrote:
> >>> On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote:
> >>>> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin
> >>>> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>
> >>>>>
> >>>>> On 13/04/2023 21:05, Daniel Vetter wrote:
> >>>>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote:
> >>>>>>>
> >>>>>>> On 13/04/2023 14:27, Daniel Vetter wrote:
> >>>>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote:
> >>>>>>>>>
> >>>>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote:
> >>>>>>>>>> On Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote:
> >>>>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote:
> >>>>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin
> >>>>>>>>>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote:
> >>>>>>>>>>>>>>> From: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64
> >>>>>>>>>>>>>>> v3: Do it in core
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>>>>>>>>>>>>>> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
> >>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>>        Documentation/gpu/drm-usage-stats.rst | 21 ++++++++
> >>>>>>>>>>>>>>>        drivers/gpu/drm/drm_file.c            | 76 +++++++++++++++++++++++++++
> >>>>>>>>>>>>>>>        include/drm/drm_file.h                |  1 +
> >>>>>>>>>>>>>>>        include/drm/drm_gem.h                 | 19 +++++++
> >>>>>>>>>>>>>>>        4 files changed, 117 insertions(+)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644
> >>>>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region.
> >>>>>>>>>>>>>>>        Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
> >>>>>>>>>>>>>>>        indicating kibi- or mebi-bytes.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> +- drm-shared-memory: <uint> [KiB|MiB]
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more
> >>>>>>>>>>>>>>> +than a single handle).
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +- drm-private-memory: <uint> [KiB|MiB]
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +The total size of buffers that are not shared with another file.
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +- drm-resident-memory: <uint> [KiB|MiB]
> >>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>> +The total size of buffers that are resident in system memory.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think this naming maybe does not work best with the existing
> >>>>>>>>>>>>>> drm-memory-<region> keys.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing
> >>>>>>>>>>>>> drm-memory-<region> keys ;-)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it
> >>>>>>>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> How about introduce the concept of a memory region from the start and
> >>>>>>>>>>>>>> use naming similar like we do for engines?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 'size' - All reachable objects
> >>>>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1
> >>>>>>>>>>>>>> 'resident' - Objects with backing store
> >>>>>>>>>>>>>> 'active' - Objects in use, subset of resident
> >>>>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got
> >>>>>>>>>>>>>> it right) which could be desirable for a simplified mental model.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. If we
> >>>>>>>>>>>>>> correctly captured this in the first round it should be equivalent to
> >>>>>>>>>>>>>> 'resident' above. In any case we can document no category is equal to
> >>>>>>>>>>>>>> which category, and at most one of the two must be output.)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Region names we at most partially standardize. Like we could say
> >>>>>>>>>>>>>> 'system' is to be used where backing store is system RAM and others are
> >>>>>>>>>>>>>> driver defined.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory
> >>>>>>>>>>>>>> region they support.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think this all also works for objects which can be migrated between
> >>>>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for
> >>>>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm not too sure how to rectify different memory regions with this,
> >>>>>>>>>>>>> since drm core doesn't really know about the driver's memory regions.
> >>>>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram
> >>>>>>>>>>>>> just don't use the helper?  Or??
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it
> >>>>>>>>>>>> all works out reasonably consistently?
> >>>>>>>>>>>
> >>>>>>>>>>> That is basically what we have now.  I could append -system to each to
> >>>>>>>>>>> make things easier to add vram/etc (from a uabi standpoint)..
> >>>>>>>>>>
> >>>>>>>>>> What you have isn't really -system, but everything. So doesn't really make
> >>>>>>>>>> sense to me to mark this -system, it's only really true for integrated (if
> >>>>>>>>>> they don't have stolen or something like that).
> >>>>>>>>>>
> >>>>>>>>>> Also my comment was more in reply to Tvrtko's suggestion.
> >>>>>>>>>
> >>>>>>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns
> >>>>>>>>> with the current drm-memory-$REGION by extending, rather than creating
> >>>>>>>>> confusion with different order of key name components.
> >>>>>>>>
> >>>>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a
> >>>>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point.
> >>>>>>>> So $CATEGORY before the -memory.
> >>>>>>>>
> >>>>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more
> >>>>>>>> folks like :-) I don't really care much personally.
> >>>>>>>
> >>>>>>> Okay I missed the parsing problem.
> >>>>>>>
> >>>>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in
> >>>>>>>>> the spec maps to category X, if category component is not present.
> >>>>>>>>>
> >>>>>>>>> Some examples:
> >>>>>>>>>
> >>>>>>>>> drm-memory-resident-system:
> >>>>>>>>> drm-memory-size-lmem0:
> >>>>>>>>> drm-memory-active-vram:
> >>>>>>>>>
> >>>>>>>>> Etc.. I think it creates a consistent story.
> >>>>>>>>>
> >>>>>>>>> Other than this, my two I think significant opens which haven't been
> >>>>>>>>> addressed yet are:
> >>>>>>>>>
> >>>>>>>>> 1)
> >>>>>>>>>
> >>>>>>>>> Why do we want totals (not per region) when userspace can trivially
> >>>>>>>>> aggregate if they want. What is the use case?
> >>>>>>>>>
> >>>>>>>>> 2)
> >>>>>>>>>
> >>>>>>>>> Current proposal limits the value to whole objects and fixates that by
> >>>>>>>>> having it in the common code. If/when some driver is able to support sub-BO
> >>>>>>>>> granularity they will need to opt out of the common printer at which point
> >>>>>>>>> it may be less churn to start with a helper rather than mid-layer. Or maybe
> >>>>>>>>> some drivers already support this, I don't know. Given how important VM BIND
> >>>>>>>>> is I wouldn't be surprised.
> >>>>>>>>
> >>>>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of
> >>>>>>>> the region printing in hopefully a standard way. And that could then also
> >>>>>>>> take care of all kinds of partial binding and funny rules (like maybe
> >>>>>>>> we want a standard vram region that adds up all the lmem regions on
> >>>>>>>> intel, so that all dgpu have a common vram bucket that generic tools
> >>>>>>>> understand?).
> >>>>>>>
> >>>>>>> First part yes, but for the second I would think we want to avoid any
> >>>>>>> aggregation in the kernel which can be done in userspace just as well. Such
> >>>>>>> a total vram bucket would be pretty useless even on Intel, since userspace
> >>>>>>> needs to be region aware to make use of all resources. It could even be
> >>>>>>> counter productive I think - "why am I getting out of memory when half of my
> >>>>>>> vram is unused!?".
> >>>>>>
> >>>>>> This is not for intel-aware userspace. This is for fairly generic "gputop"
> >>>>>> style userspace, which might simply have no clue or interest in what lmemX
> >>>>>> means, but would understand vram.
> >>>>>>
> >>>>>> Aggregating makes sense.
> >>>>>
> >>>>> Lmem vs vram is now an argument not about aggregation but about
> >>>>> standardizing regions names.
> >>>>>
> >>>>> One detail also is a change in philosophy compared to engine stats where
> >>>>> engine names are not centrally prescribed and it was expected userspace
> >>>>> will have to handle things generically and with some vendor specific
> >>>>> knowledge.
> >>>>>
> >>>>> Like in my gputop patches. It doesn't need to understand what is what,
> >>>>> it just finds what's there and presents it to the user.
> >>>>>
> >>>>> Come some accel driver with local memory it wouldn't be vram any more.
> >>>>> Or even a headless data center GPU. So I really don't think it is good
> >>>>> to hardcode 'vram' in the spec, or midlayer, or helpers.
> >>>>>
> >>>>> And for aggregation.. again, userspace can do it just as well. If we do
> >>>>> it in kernel then immediately we have multiple sets of keys to output
> >>>>> for any driver which wants to show the region view. IMO it is just
> >>>>> pointless work in the kernel and more code in the kernel, when userspace
> >>>>> can do it.
> >>>>>
> >>>>> Proposal A (on a discrete gpu, one category only):
> >>>>>
> >>>>> drm-resident-memory: x KiB
> >>>>> drm-resident-memory-system: x KiB
> >>>>> drm-resident-memory-vram: x KiB
> >>>>>
> >>>>> Two loops in the kernel, more parsing in userspace.
> >>>>
> >>>> why would it be more than one loop, ie.
> >>>>
> >>>>       mem.resident += size;
> >>>>       mem.category[cat].resident += size;
> >>>>
> >>>> At the end of the day, there is limited real-estate to show a million
> >>>> different columns of information.  Even the gputop patches I posted
> >>>> don't show everything of what is currently there.  And nvtop only
> >>>> shows toplevel resident stat.  So I think the "everything" stat is
> >>>> going to be what most tools use.
> >>>
> >>> Yeah with enough finesse the double-loop isn't needed, it's just the
> >>> simplest possible approach.
> >>>
> >>> Also this is fdinfo, I _really_ want perf data showing that it's a
> >>> real-world problem when we conjecture about algorithmic complexity.
> >>> procutils have been algorithmically garbage since decades after all :-)
> >>
> >> Just run it. :)
> >>
> >> Algorithmic complexity is quite obvious and not a conjecture - to find
> >> DRM clients you have to walk _all_ pids and _all_ fds under them. So
> >> amount of work can scale very quickly and even _not_ with the number of
> >> DRM clients.
> >>
> >> It's not too bad on my desktop setup but it is significantly more CPU
> >> intensive than top(1).
> >>
> >> It would be possible to optimise the current code some more by not
> >> parsing full fdinfo (may become more important as number of keys grow),
> >> but that's only relevant when number of drm fds is large. It doesn't
> >> solve the basic pids * open fds search for which we'd need a way to walk
> >> the list of pids with drm fds directly.
> >
> > All of which has (almost[1]) nothing to do with one loop or two
>
> Correct, this was just a side discussion where I understood Daniel was
> asking about the wider performance story. Perhaps I misunderstood.
>
> > (ignoring for a moment that I already pointed out a single loop is all
> > that is needed).  If CPU overhead is a problem, we could perhaps come
> > up with some sysfs which has one file per drm_file and side-step crawling
> > of all of the proc * fd.  I'll play around with it some but I'm pretty
> > sure you are trying to optimize the wrong thing.
>
> Yes, that's what I meant too in "a way to walk the list of pids with drm
> fds directly".

Just to follow up, I did a quick hack to loop and print the mem
stats.  With 5x loops I couldn't really measure any increase in gputop
CPU utilization.  At 50x loops I could measure a small increase.  Without
additional looping to artificially increase the cost, nothing drm
related shows up in a perf-record of gputop.

What could be an easy optimization, if debugfs can be accessed, is to
parse /sys/kernel/debug/dri/<n>/clients to get the list of pids of
processes with the drm device open.  This would cut down the # of pids
to examine quite a bit.
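
As a rough illustration of that idea (not something from this series), a
userspace sketch could look like the below.  It assumes debugfs is mounted
and readable, and that the clients file is a whitespace-separated table
whose second column is the pid; both the path and the column layout are
assumptions that can differ between kernel versions.

/*
 * Sketch: collect candidate pids from debugfs instead of walking every
 * entry in /proc, then print the drm-* fdinfo keys for each of them.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static void dump_drm_fdinfo(long pid)
{
	char dirpath[64], fdpath[384], line[256];
	struct dirent *de;
	DIR *dir;

	snprintf(dirpath, sizeof(dirpath), "/proc/%ld/fdinfo", pid);
	dir = opendir(dirpath);
	if (!dir)
		return;

	while ((de = readdir(dir))) {
		FILE *f;

		if (de->d_name[0] == '.')
			continue;

		snprintf(fdpath, sizeof(fdpath), "%s/%s", dirpath, de->d_name);
		f = fopen(fdpath, "r");
		if (!f)
			continue;

		/* only the drm-* keys are interesting here */
		while (fgets(line, sizeof(line), f)) {
			if (!strncmp(line, "drm-", 4))
				printf("%ld: %s", pid, line);
		}
		fclose(f);
	}
	closedir(dir);
}

int main(void)
{
	char line[256];
	FILE *f = fopen("/sys/kernel/debug/dri/0/clients", "r"); /* assumed path */

	if (!f) {
		perror("clients");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		char comm[64];
		long pid;

		/* header and malformed lines fail the conversion and are skipped */
		if (sscanf(line, "%63s %ld", comm, &pid) == 2 && pid > 0)
			dump_drm_fdinfo(pid);
	}
	fclose(f);
	return 0;
}

A real tool would still want to de-duplicate by drm-client-id, since, as
noted in [1] below, a single process typically has several fds pointing at
the same drm_file.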

BR,
-R

> Regards,
>
> Tvrtko
>
> >
> > BR,
> > -R
> >
> > [1] generally a single process using drm has multiple fd's pointing at
> > the same drm_file.. which makes the current approach of having to read
> > fdinfo to find the client-id sub-optimal.  But still the total # of
> > proc * fd is much larger

^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2023-04-17 16:12 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-11 22:56 [PATCH v3 0/7] drm: fdinfo memory stats Rob Clark
2023-04-11 22:56 ` Rob Clark
2023-04-11 22:56 ` [Intel-gfx] " Rob Clark
2023-04-11 22:56 ` Rob Clark
2023-04-11 22:56 ` [PATCH v3 1/7] drm: Add common fdinfo helper Rob Clark
2023-04-11 22:56   ` Rob Clark
2023-04-12  7:55   ` Daniel Vetter
2023-04-12  7:55     ` Daniel Vetter
2023-04-11 22:56 ` [PATCH v3 2/7] drm/msm: Switch to " Rob Clark
2023-04-11 22:56   ` Rob Clark
2023-04-11 22:56 ` [PATCH v3 3/7] drm/amdgpu: " Rob Clark
2023-04-11 22:56   ` Rob Clark
2023-04-11 22:56   ` Rob Clark
2023-04-12  7:58   ` Daniel Vetter
2023-04-12  7:58     ` Daniel Vetter
2023-04-12  7:58     ` Daniel Vetter
2023-04-11 22:56 ` [PATCH v3 4/7] drm/i915: " Rob Clark
2023-04-11 22:56   ` [Intel-gfx] " Rob Clark
2023-04-11 22:56   ` Rob Clark
2023-04-12 12:32   ` Tvrtko Ursulin
2023-04-12 12:32     ` [Intel-gfx] " Tvrtko Ursulin
2023-04-12 12:32     ` Tvrtko Ursulin
2023-04-12 13:51     ` Daniel Vetter
2023-04-12 13:51       ` [Intel-gfx] " Daniel Vetter
2023-04-12 13:51       ` Daniel Vetter
2023-04-12 15:12       ` Tvrtko Ursulin
2023-04-12 15:12         ` [Intel-gfx] " Tvrtko Ursulin
2023-04-12 18:13         ` Daniel Vetter
2023-04-12 18:13           ` [Intel-gfx] " Daniel Vetter
2023-04-12 18:13           ` Daniel Vetter
2023-04-11 22:56 ` [PATCH v3 5/7] drm/etnaviv: " Rob Clark
2023-04-11 22:56   ` Rob Clark
2023-04-12  7:59   ` Daniel Vetter
2023-04-12  7:59     ` Daniel Vetter
2023-04-12 22:18     ` Rob Clark
2023-04-12 22:18       ` Rob Clark
2023-04-11 22:56 ` [PATCH v3 6/7] drm: Add fdinfo memory stats Rob Clark
2023-04-11 22:56   ` Rob Clark
2023-04-12  8:01   ` Daniel Vetter
2023-04-12  8:01     ` Daniel Vetter
2023-04-12 14:42   ` Tvrtko Ursulin
2023-04-12 14:42     ` Tvrtko Ursulin
2023-04-12 17:59     ` Rob Clark
2023-04-12 17:59       ` Rob Clark
2023-04-12 18:17       ` Daniel Vetter
2023-04-12 18:17         ` Daniel Vetter
2023-04-12 18:42         ` Rob Clark
2023-04-12 18:42           ` Rob Clark
2023-04-12 19:18           ` Daniel Vetter
2023-04-12 19:18             ` Daniel Vetter
2023-04-13 12:58             ` Tvrtko Ursulin
2023-04-13 13:27               ` Daniel Vetter
2023-04-13 13:27                 ` Daniel Vetter
2023-04-13 16:40                 ` Tvrtko Ursulin
2023-04-13 18:24                   ` Rob Clark
2023-04-13 18:24                     ` Rob Clark
2023-04-13 20:05                   ` Daniel Vetter
2023-04-13 20:05                     ` Daniel Vetter
2023-04-14  8:57                     ` Tvrtko Ursulin
2023-04-14  9:07                       ` Daniel Vetter
2023-04-14  9:07                         ` Daniel Vetter
2023-04-14 10:12                         ` Tvrtko Ursulin
2023-04-14 10:12                           ` Tvrtko Ursulin
2023-04-14 13:40                       ` Rob Clark
2023-04-14 13:40                         ` Rob Clark
2023-04-16  7:48                         ` Daniel Vetter
2023-04-16  7:48                           ` Daniel Vetter
2023-04-17 11:10                           ` Tvrtko Ursulin
2023-04-17 13:42                             ` Rob Clark
2023-04-17 13:42                               ` Rob Clark
2023-04-17 14:04                               ` Alex Deucher
2023-04-17 14:04                                 ` Alex Deucher
2023-04-17 14:20                               ` Tvrtko Ursulin
2023-04-17 14:20                                 ` Tvrtko Ursulin
2023-04-17 16:12                                 ` Rob Clark
2023-04-17 16:12                                   ` Rob Clark
2023-04-13 15:47               ` Rob Clark
2023-04-13 15:47                 ` Rob Clark
2023-04-13 16:45     ` Alex Deucher
2023-04-13 16:45       ` Alex Deucher
2023-04-11 22:56 ` [PATCH v3 7/7] drm/msm: Add memory stats to fdinfo Rob Clark
2023-04-11 22:56   ` Rob Clark
2023-04-12  9:34 ` [PATCH v3 0/7] drm: fdinfo memory stats Christian König
2023-04-12  9:34   ` Christian König
2023-04-12  9:34   ` [Intel-gfx] " Christian König
2023-04-12  9:34   ` Christian König
2023-04-12 12:10   ` Tvrtko Ursulin
2023-04-12 12:10     ` Tvrtko Ursulin
2023-04-12 12:10     ` [Intel-gfx] " Tvrtko Ursulin
2023-04-12 12:10     ` Tvrtko Ursulin
2023-04-12 12:22     ` Christian König
2023-04-12 12:22       ` Christian König
2023-04-12 12:22       ` [Intel-gfx] " Christian König
2023-04-12 12:22       ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.