All of lore.kernel.org
* [PATCH v2 0/6] Add fdinfo support to Panfrost
@ 2023-08-24  1:34 ` Adrián Larumbe
  0 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-24  1:34 UTC (permalink / raw)
  To: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, linux-kernel, dri-devel, healych,
	kernel, freedreno

This patch series adds fdinfo support to the Panfrost DRM driver. It will
display a series of key:value pairs under /proc/pid/fdinfo/fd for render
processes that open the Panfrost DRM file.

The pairs contain basic DRM GPU engine and memory region information that
can either be read directly by a privileged user or accessed with IGT's
gputop utility.
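For illustration, the per-client fdinfo output produced by this series might
look roughly as follows (key names taken from the tags added in these patches
and from the common drm_show_fdinfo/drm_show_memory_stats helpers; all values
are made up):

```
drm-driver:          panfrost
drm-engine-frg:      1234567890 ns
drm-cycles-frg:      987654321
drm-maxfreq-frg:     800000000 Hz
drm-curfreq-frg:     500000000 Hz
drm-engine-vtx:      234567890 ns
drm-cycles-vtx:      87654321
drm-maxfreq-vtx:     800000000 Hz
drm-curfreq-vtx:     500000000 Hz
drm-total-memory:    16384 KiB
drm-resident-memory: 8192 KiB
```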

Changelog:

v2:
 - Changed the way GPU cycles and engine time are calculated, using GPU
 registers and taking into account potential resets.
 - Split render engine values into fragment and vertex/tiler ones.
 - Added more fine-grained calculation of RSS size for BOs.
 - Implemented selection of drm-memory region size units.
 - Removed locking of the shrinker's mutex in the GEM object status function.

Adrián Larumbe (6):
  drm/panfrost: Add cycle count GPU register definitions
  drm/panfrost: Add fdinfo support GPU load metrics
  drm/panfrost: Add fdinfo support for memory stats
  drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
  drm/panfrost: Implement generic DRM object RSS reporting function
  drm/drm-file: Allow size unit selection in drm_show_memory_stats

 drivers/gpu/drm/drm_file.c                  | 27 +++++++----
 drivers/gpu/drm/msm/msm_drv.c               |  2 +-
 drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 +++
 drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 13 +++++
 drivers/gpu/drm/panfrost/panfrost_drv.c     | 54 +++++++++++++++++++--
 drivers/gpu/drm/panfrost/panfrost_gem.c     | 34 +++++++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h     |  6 +++
 drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++
 drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c     | 16 ++++--
 drivers/gpu/drm/panfrost/panfrost_regs.h    |  5 ++
 include/drm/drm_file.h                      |  5 +-
 include/drm/drm_gem.h                       |  9 ++++
 14 files changed, 195 insertions(+), 21 deletions(-)

-- 
2.42.0



* [PATCH v2 1/6] drm/panfrost: Add cycle count GPU register definitions
  2023-08-24  1:34 ` Adrián Larumbe
@ 2023-08-24  1:34   ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-24  1:34 UTC (permalink / raw)
  To: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, linux-kernel, dri-devel, healych,
	kernel, freedreno

These GPU registers will be used when programming the cycle counter, which
we need for providing accurate fdinfo drm-cycles values to user space.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_regs.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 919f44ac853d..55ec807550b3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -46,6 +46,8 @@
 #define   GPU_CMD_SOFT_RESET		0x01
 #define   GPU_CMD_PERFCNT_CLEAR		0x03
 #define   GPU_CMD_PERFCNT_SAMPLE	0x04
+#define   GPU_CMD_CYCLE_COUNT_START	0x05
+#define   GPU_CMD_CYCLE_COUNT_STOP	0x06
 #define   GPU_CMD_CLEAN_CACHES		0x07
 #define   GPU_CMD_CLEAN_INV_CACHES	0x08
 #define GPU_STATUS			0x34
@@ -73,6 +75,9 @@
 #define GPU_PRFCNT_TILER_EN		0x74
 #define GPU_PRFCNT_MMU_L2_EN		0x7c
 
+#define GPU_CYCLE_COUNT_LO		0x90
+#define GPU_CYCLE_COUNT_HI		0x94
+
 #define GPU_THREAD_MAX_THREADS		0x0A0	/* (RO) Maximum number of threads per core */
 #define GPU_THREAD_MAX_WORKGROUP_SIZE	0x0A4	/* (RO) Maximum workgroup size */
 #define GPU_THREAD_MAX_BARRIER_SIZE	0x0A8	/* (RO) Maximum threads waiting at a barrier */
-- 
2.42.0
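The new GPU_CYCLE_COUNT_LO/HI registers expose one 64-bit counter as two
32-bit halves. Patch 2 in this series reads HI then LO back to back; a common
defensive pattern for split counters also re-reads the high word, so a
low-word wraparound between the two reads cannot yield a value off by ~2^32.
The sketch below uses hypothetical accessors (a fake counter standing in for
gpu_read()) purely to show the idea:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for gpu_read(pfdev, GPU_CYCLE_COUNT_{LO,HI}); they just
 * sample a fake 64-bit counter so the sketch is self-contained. */
static uint64_t fake_counter;
static uint32_t read_lo(void) { return (uint32_t)fake_counter; }
static uint32_t read_hi(void) { return (uint32_t)(fake_counter >> 32); }

/* Combine the two halves, re-reading HI until it is stable so that a
 * low-word rollover between the reads cannot corrupt the result. */
static uint64_t read_cycles64(void)
{
	uint32_t hi, lo;

	do {
		hi = read_hi();
		lo = read_lo();
	} while (hi != read_hi());

	return ((uint64_t)hi << 32) | lo;
}
```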



* [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
  2023-08-24  1:34 ` Adrián Larumbe
@ 2023-08-24  1:34   ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-24  1:34 UTC (permalink / raw)
  To: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, linux-kernel, dri-devel, healych,
	kernel, freedreno

The drm-stats fdinfo tags made available to user space are drm-engine,
drm-cycles, drm-maxfreq and drm-curfreq, one per job slot.

This deviates from standard practice in other DRM drivers, where a single
set of key:value pairs is provided for the whole render engine. However,
Panfrost has separate queues for fragment and vertex/tiler jobs, so a
decision was made to report cycles and engine time separately for each.

Maximum operating frequency is calculated at devfreq initialisation time.
Current frequency is made available to user space because nvtop uses it
when performing engine usage calculations.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
 drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
 drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
 drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
 6 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
index 58dfb15a8757..28caffc689e2 100644
--- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
+++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
@@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
 	spin_lock_irqsave(&pfdevfreq->lock, irqflags);
 
 	panfrost_devfreq_update_utilization(pfdevfreq);
+	pfdevfreq->current_frequency = status->current_frequency;
 
 	status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time,
 						   pfdevfreq->idle_time));
@@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
 	struct devfreq *devfreq;
 	struct thermal_cooling_device *cooling;
 	struct panfrost_devfreq *pfdevfreq = &pfdev->pfdevfreq;
+	unsigned long freq = ULONG_MAX;
 
 	if (pfdev->comp->num_supplies > 1) {
 		/*
@@ -172,6 +174,12 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
 		return ret;
 	}
 
+	/* Find the fastest defined rate  */
+	opp = dev_pm_opp_find_freq_floor(dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+	pfdevfreq->fast_rate = freq;
+
 	dev_pm_opp_put(opp);
 
 	/*
diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.h b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
index 1514c1f9d91c..48dbe185f206 100644
--- a/drivers/gpu/drm/panfrost/panfrost_devfreq.h
+++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
@@ -19,6 +19,9 @@ struct panfrost_devfreq {
 	struct devfreq_simple_ondemand_data gov_data;
 	bool opp_of_table_added;
 
+	unsigned long current_frequency;
+	unsigned long fast_rate;
+
 	ktime_t busy_time;
 	ktime_t idle_time;
 	ktime_t time_last_update;
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index b0126b9fbadc..680f298fd1a9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -24,6 +24,7 @@ struct panfrost_perfcnt;
 
 #define NUM_JOB_SLOTS 3
 #define MAX_PM_DOMAINS 5
+#define MAX_SLOT_NAME_LEN 10
 
 struct panfrost_features {
 	u16 id;
@@ -135,12 +136,24 @@ struct panfrost_mmu {
 	struct list_head list;
 };
 
+struct drm_info_gpu {
+	unsigned int maxfreq;
+
+	struct engine_info {
+		unsigned long long elapsed_ns;
+		unsigned long long cycles;
+		char name[MAX_SLOT_NAME_LEN];
+	} engines[NUM_JOB_SLOTS];
+};
+
 struct panfrost_file_priv {
 	struct panfrost_device *pfdev;
 
 	struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
 
 	struct panfrost_mmu *mmu;
+
+	struct drm_info_gpu fdinfo;
 };
 
 static inline struct panfrost_device *to_panfrost_device(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index a2ab99698ca8..3fd372301019 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 	job->requirements = args->requirements;
 	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
 	job->mmu = file_priv->mmu;
+	job->priv = file_priv;
 
 	slot = panfrost_job_get_slot(job);
 
@@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
 		goto err_free;
 	}
 
+	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
+	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
+#if 0
+	/* Add compute engine in the future */
+	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
+#endif
+	panfrost_priv->fdinfo.maxfreq = pfdev->pfdevfreq.fast_rate;
+
 	ret = panfrost_job_open(panfrost_priv);
 	if (ret)
 		goto err_job;
@@ -523,7 +532,40 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
 	PANFROST_IOCTL(MADVISE,		madvise,	DRM_RENDER_ALLOW),
 };
 
-DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
+
+static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
+				     struct panfrost_file_priv *panfrost_priv,
+				     struct drm_printer *p)
+{
+	int i;
+
+	for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
+		struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
+
+		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
+			   ei->name, ei->elapsed_ns);
+		drm_printf(p, "drm-cycles-%s:\t%llu\n",
+			   ei->name, ei->cycles);
+		drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
+			   ei->name, panfrost_priv->fdinfo.maxfreq);
+		drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
+			   ei->name, pfdev->pfdevfreq.current_frequency);
+	}
+}
+
+static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
+{
+	struct drm_device *dev = file->minor->dev;
+	struct panfrost_device *pfdev = dev->dev_private;
+
+	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
+}
+
+static const struct file_operations panfrost_drm_driver_fops = {
+	.owner = THIS_MODULE,
+	DRM_GEM_FOPS,
+	.show_fdinfo = drm_show_fdinfo,
+};
 
 /*
  * Panfrost driver version:
@@ -535,6 +577,7 @@ static const struct drm_driver panfrost_drm_driver = {
 	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
 	.open			= panfrost_open,
 	.postclose		= panfrost_postclose,
+	.show_fdinfo		= panfrost_show_fdinfo,
 	.ioctls			= panfrost_drm_driver_ioctls,
 	.num_ioctls		= ARRAY_SIZE(panfrost_drm_driver_ioctls),
 	.fops			= &panfrost_drm_driver_fops,
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index dbc597ab46fb..a847e183b5d0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -153,10 +153,31 @@ panfrost_get_job_chain_flag(const struct panfrost_job *job)
 	return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
 }
 
+static inline unsigned long long read_cycles(struct panfrost_device *pfdev)
+{
+	u64 address = (u64) gpu_read(pfdev, GPU_CYCLE_COUNT_HI) << 32;
+
+	address |= gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
+
+	return address;
+}
+
 static struct panfrost_job *
 panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
 {
 	struct panfrost_job *job = pfdev->jobs[slot][0];
+	struct engine_info *engine_info = &job->priv->fdinfo.engines[slot];
+
+	engine_info->elapsed_ns +=
+		ktime_to_ns(ktime_sub(ktime_get(), job->start_time));
+	engine_info->cycles +=
+		read_cycles(pfdev) - job->start_cycles;
+
+	/* Reset in case the job has to be requeued */
+	job->start_time = 0;
+	/* A GPU reset puts the Cycle Counter register back to 0 */
+	job->start_cycles = atomic_read(&pfdev->reset.pending) ?
+		0 : read_cycles(pfdev);
 
 	WARN_ON(!job);
 	pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
@@ -233,6 +254,9 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
 	subslot = panfrost_enqueue_job(pfdev, js, job);
 	/* Don't queue the job if a reset is in progress */
 	if (!atomic_read(&pfdev->reset.pending)) {
+		job->start_time = ktime_get();
+		job->start_cycles = read_cycles(pfdev);
+
 		job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
 		dev_dbg(pfdev->dev,
 			"JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d",
@@ -297,6 +321,9 @@ int panfrost_job_push(struct panfrost_job *job)
 
 	kref_get(&job->refcount); /* put by scheduler job completion */
 
+	if (panfrost_job_is_idle(pfdev))
+		gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
+
 	drm_sched_entity_push_job(&job->base);
 
 	mutex_unlock(&pfdev->sched_lock);
@@ -351,6 +378,9 @@ static void panfrost_job_free(struct drm_sched_job *sched_job)
 
 	drm_sched_job_cleanup(sched_job);
 
+	if (panfrost_job_is_idle(job->pfdev))
+		gpu_write(job->pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
+
 	panfrost_job_put(job);
 }
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 8becc1ba0eb9..038171c39dd8 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -32,6 +32,10 @@ struct panfrost_job {
 
 	/* Fence to be signaled by drm-sched once its done with the job */
 	struct dma_fence *render_done_fence;
+
+	struct panfrost_file_priv *priv;
+	ktime_t start_time;
+	u64 start_cycles;
 };
 
 int panfrost_job_init(struct panfrost_device *pfdev);
-- 
2.42.0
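As the commit message notes, tools like nvtop and gputop derive engine usage
from these tags by sampling fdinfo twice and comparing deltas. A minimal
sketch of that consumer-side calculation (hypothetical struct and field
names, not any tool's actual code): utilisation is the share of elapsed wall
time that drm-engine-<name> advanced by.

```c
#include <assert.h>
#include <stdint.h>

/* One fdinfo sample for an engine: the drm-engine-<name> busy time in
 * nanoseconds, plus the wall-clock time at which it was read. */
struct engine_sample {
	uint64_t engine_ns;	/* e.g. drm-engine-frg or drm-engine-vtx */
	uint64_t wall_ns;	/* CLOCK_MONOTONIC at sampling time */
};

/* Percentage of the interval the engine spent on this client's jobs. */
static unsigned int engine_busy_percent(const struct engine_sample *a,
					const struct engine_sample *b)
{
	uint64_t busy = b->engine_ns - a->engine_ns;
	uint64_t wall = b->wall_ns - a->wall_ns;

	if (!wall)
		return 0;
	return (unsigned int)(busy * 100 / wall);
}
```

The drm-cycles and drm-curfreq pairs allow an analogous estimate in the
frequency domain (cycles delta divided by curfreq times elapsed time), which
is why the current frequency is exported at all.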



* [PATCH v2 3/6] drm/panfrost: Add fdinfo support for memory stats
  2023-08-24  1:34 ` Adrián Larumbe
@ 2023-08-24  1:34   ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-24  1:34 UTC (permalink / raw)
  To: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, linux-kernel, dri-devel, healych,
	kernel, freedreno

A new DRM GEM object function is added so that drm_show_memory_stats can
provide more accurate memory usage numbers.

Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked
after locking the driver's shrinker mutex, but drm_show_memory_stats
already holds the DRM file's object handle database spinlock, so there is
potential for a race condition here.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +++++++--
 drivers/gpu/drm/panfrost/panfrost_gem.c | 12 ++++++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 3fd372301019..93d5f5538c0b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -440,11 +440,14 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
 	args->retained = drm_gem_shmem_madvise(&bo->base, args->madv);
 
 	if (args->retained) {
-		if (args->madv == PANFROST_MADV_DONTNEED)
+		if (args->madv == PANFROST_MADV_DONTNEED) {
 			list_move_tail(&bo->base.madv_list,
 				       &pfdev->shrinker_list);
-		else if (args->madv == PANFROST_MADV_WILLNEED)
+			bo->is_purgable = true;
+		} else if (args->madv == PANFROST_MADV_WILLNEED) {
 			list_del_init(&bo->base.madv_list);
+			bo->is_purgable = false;
+		}
 	}
 
 out_unlock_mappings:
@@ -559,6 +562,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 	struct panfrost_device *pfdev = dev->dev_private;
 
 	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
+
+	drm_show_memory_stats(p, file);
 }
 
 static const struct file_operations panfrost_drm_driver_fops = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 3c812fbd126f..aea16b0e4dda 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -195,6 +195,17 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
 	return drm_gem_shmem_pin(&bo->base);
 }
 
+static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)
+{
+	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+	enum drm_gem_object_status res = 0;
+
+	res |= (bo->is_purgable) ? DRM_GEM_OBJECT_PURGEABLE : 0;
+
+	res |= (bo->base.pages) ? DRM_GEM_OBJECT_RESIDENT : 0;
+
+	return res;
+}
 static const struct drm_gem_object_funcs panfrost_gem_funcs = {
 	.free = panfrost_gem_free_object,
 	.open = panfrost_gem_open,
@@ -206,6 +217,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
 	.vmap = drm_gem_shmem_object_vmap,
 	.vunmap = drm_gem_shmem_object_vunmap,
 	.mmap = drm_gem_shmem_object_mmap,
+	.status = panfrost_gem_status,
 	.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index ad2877eeeccd..e06f7ceb8f73 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -38,6 +38,7 @@ struct panfrost_gem_object {
 
 	bool noexec		:1;
 	bool is_heap		:1;
+	bool is_purgable	:1;
 };
 
 struct panfrost_gem_mapping {
-- 
2.42.0
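The status callback added above lets the generic fdinfo helper bucket each
BO's size by its flags. A simplified model of that accounting (hypothetical
flag and struct names, not the exact drm_show_memory_stats() logic): every
object counts toward the total, and resident or purgeable objects
additionally count toward those buckets.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for DRM_GEM_OBJECT_{RESIDENT,PURGEABLE}. */
#define OBJ_RESIDENT	(1u << 0)
#define OBJ_PURGEABLE	(1u << 1)

struct bo {
	uint64_t size;
	unsigned int status;	/* what the .status callback would return */
};

struct mem_stats {
	uint64_t total, resident, purgeable;
};

/* Accumulate one object's size into the per-client buckets. */
static void account_bo(struct mem_stats *s, const struct bo *bo)
{
	s->total += bo->size;
	if (bo->status & OBJ_RESIDENT)
		s->resident += bo->size;
	if (bo->status & OBJ_PURGEABLE)
		s->purgeable += bo->size;
}
```

In the patch, panfrost_gem_status() supplies exactly these two bits: the new
is_purgable flag (set on PANFROST_MADV_DONTNEED) and whether the shmem pages
are currently allocated.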


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 3/6] drm/panfrost: Add fdinfo support for memory stats
@ 2023-08-24  1:34   ` Adrián Larumbe
  0 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-24  1:34 UTC (permalink / raw)
  To: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price
  Cc: adrian.larumbe, dri-devel, linux-kernel, linux-arm-msm,
	freedreno, healych, kernel

A new DRM GEM object function is added so that drm_show_memory_stats can
provide more accurate memory usage numbers.

Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked
after locking the driver's shrinker mutex, but drm_show_memory_stats
already holds the drm file's object handle database spinlock, so there is
potential for a race condition here.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +++++++--
 drivers/gpu/drm/panfrost/panfrost_gem.c | 12 ++++++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 3fd372301019..93d5f5538c0b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -440,11 +440,14 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
 	args->retained = drm_gem_shmem_madvise(&bo->base, args->madv);
 
 	if (args->retained) {
-		if (args->madv == PANFROST_MADV_DONTNEED)
+		if (args->madv == PANFROST_MADV_DONTNEED) {
 			list_move_tail(&bo->base.madv_list,
 				       &pfdev->shrinker_list);
-		else if (args->madv == PANFROST_MADV_WILLNEED)
+			bo->is_purgable = true;
+		} else if (args->madv == PANFROST_MADV_WILLNEED) {
 			list_del_init(&bo->base.madv_list);
+			bo->is_purgable = false;
+		}
 	}
 
 out_unlock_mappings:
@@ -559,6 +562,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 	struct panfrost_device *pfdev = dev->dev_private;
 
 	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
+
+	drm_show_memory_stats(p, file);
 }
 
 static const struct file_operations panfrost_drm_driver_fops = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 3c812fbd126f..aea16b0e4dda 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -195,6 +195,17 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
 	return drm_gem_shmem_pin(&bo->base);
 }
 
+static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)
+{
+	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+	enum drm_gem_object_status res = 0;
+
+	res |= (bo->is_purgable) ? DRM_GEM_OBJECT_PURGEABLE : 0;
+
+	res |= (bo->base.pages) ? DRM_GEM_OBJECT_RESIDENT : 0;
+
+	return res;
+}
 static const struct drm_gem_object_funcs panfrost_gem_funcs = {
 	.free = panfrost_gem_free_object,
 	.open = panfrost_gem_open,
@@ -206,6 +217,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
 	.vmap = drm_gem_shmem_object_vmap,
 	.vunmap = drm_gem_shmem_object_vunmap,
 	.mmap = drm_gem_shmem_object_mmap,
+	.status = panfrost_gem_status,
 	.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index ad2877eeeccd..e06f7ceb8f73 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -38,6 +38,7 @@ struct panfrost_gem_object {
 
 	bool noexec		:1;
 	bool is_heap		:1;
+	bool is_purgable	:1;
 };
 
 struct panfrost_gem_mapping {
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
  2023-08-24  1:34 ` Adrián Larumbe
@ 2023-08-24  1:34   ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-24  1:34 UTC (permalink / raw)
  To: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, linux-kernel, dri-devel, healych,
	kernel, freedreno

Some BOs might be mapped onto physical memory chunk-wise and on demand,
like Panfrost's tiler heap. In this case, even though the
drm_gem_shmem_object page array might already be allocated, only a very
small fraction of the BO is currently backed by system memory, yet
drm_show_memory_stats will still add its entire virtual size to the
file's total resident size.

This led to very unrealistic RSS figures for Panfrost, where the tiler
heap buffer is initially allocated with a virtual size of 128 MiB, but
only a small part of it will eventually be backed by system memory after
successive GPU page faults.

Provide a new generic DRM object function that lets drivers return a
more accurate RSS size for their BOs.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/drm_file.c | 5 ++++-
 include/drm/drm_gem.h      | 9 +++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 883d83bc0e3d..762965e3d503 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
 		}
 
 		if (s & DRM_GEM_OBJECT_RESIDENT) {
-			status.resident += obj->size;
+			if (obj->funcs && obj->funcs->rss)
+				status.resident += obj->funcs->rss(obj);
+			else
+				status.resident += obj->size;
 		} else {
 			/* If already purged or not yet backed by pages, don't
 			 * count it as purgeable:
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index c0b13c43b459..78ed9fab6044 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -208,6 +208,15 @@ struct drm_gem_object_funcs {
 	 */
 	enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
 
+	/**
+	 * @rss:
+	 *
+	 * Return resident size of the object in physical memory.
+	 *
+	 * Called by drm_show_memory_stats().
+	 */
+	size_t (*rss)(struct drm_gem_object *obj);
+
 	/**
 	 * @vm_ops:
 	 *
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
  2023-08-24  1:34 ` Adrián Larumbe
@ 2023-08-24  1:34   ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-24  1:34 UTC (permalink / raw)
  To: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, linux-kernel, dri-devel, healych,
	kernel, freedreno

A BO's RSS is updated every time new pages are allocated and mapped for
the object, either in its entirety at creation time for non-heap buffers,
or on demand for heap buffers in the GPU page fault IRQ handler.

The same calculation has to be done for imported PRIME objects, since
their backing storage might have already been allocated by the exporting
driver.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_gem.c | 22 ++++++++++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  5 +++++
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 16 +++++++++++-----
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index aea16b0e4dda..c6bd1f16a6d4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -206,6 +206,17 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj
 
 	return res;
 }
+
+size_t panfrost_gem_rss(struct drm_gem_object *obj)
+{
+	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+
+	if (!bo->base.pages)
+		return 0;
+
+	return bo->rss_size;
+}
+
 static const struct drm_gem_object_funcs panfrost_gem_funcs = {
 	.free = panfrost_gem_free_object,
 	.open = panfrost_gem_open,
@@ -218,6 +229,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
 	.vunmap = drm_gem_shmem_object_vunmap,
 	.mmap = drm_gem_shmem_object_mmap,
 	.status = panfrost_gem_status,
+	.rss = panfrost_gem_rss,
 	.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
@@ -274,13 +286,23 @@ panfrost_gem_prime_import_sg_table(struct drm_device *dev,
 {
 	struct drm_gem_object *obj;
 	struct panfrost_gem_object *bo;
+	struct scatterlist *sgl;
+	unsigned int count;
+	size_t total = 0;
 
 	obj = drm_gem_shmem_prime_import_sg_table(dev, attach, sgt);
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
+	for_each_sgtable_dma_sg(sgt, sgl, count) {
+		size_t len = sg_dma_len(sgl);
+
+		total += len;
+	}
+
 	bo = to_panfrost_bo(obj);
 	bo->noexec = true;
+	bo->rss_size = total;
 
 	return obj;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index e06f7ceb8f73..e2a7c46403c7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -36,6 +36,11 @@ struct panfrost_gem_object {
 	 */
 	atomic_t gpu_usecount;
 
+	/*
+	 * Object chunk size currently mapped onto physical memory
+	 */
+	size_t rss_size;
+
 	bool noexec		:1;
 	bool is_heap		:1;
 	bool is_purgable	:1;
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index c0123d09f699..e03a5a9da06f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -285,17 +285,19 @@ static void panfrost_mmu_flush_range(struct panfrost_device *pfdev,
 	pm_runtime_put_autosuspend(pfdev->dev);
 }
 
-static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
+static size_t mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
 		      u64 iova, int prot, struct sg_table *sgt)
 {
 	unsigned int count;
 	struct scatterlist *sgl;
 	struct io_pgtable_ops *ops = mmu->pgtbl_ops;
 	u64 start_iova = iova;
+	size_t total = 0;
 
 	for_each_sgtable_dma_sg(sgt, sgl, count) {
 		unsigned long paddr = sg_dma_address(sgl);
 		size_t len = sg_dma_len(sgl);
+		total += len;
 
 		dev_dbg(pfdev->dev, "map: as=%d, iova=%llx, paddr=%lx, len=%zx", mmu->as, iova, paddr, len);
 
@@ -315,7 +317,7 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
 
 	panfrost_mmu_flush_range(pfdev, mmu, start_iova, iova - start_iova);
 
-	return 0;
+	return total;
 }
 
 int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
@@ -326,6 +328,7 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
 	struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
 	struct sg_table *sgt;
 	int prot = IOMMU_READ | IOMMU_WRITE;
+	size_t mapped_size;
 
 	if (WARN_ON(mapping->active))
 		return 0;
@@ -337,9 +340,10 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
 	if (WARN_ON(IS_ERR(sgt)))
 		return PTR_ERR(sgt);
 
-	mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << PAGE_SHIFT,
+	mapped_size = mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << PAGE_SHIFT,
 		   prot, sgt);
 	mapping->active = true;
+	bo->rss_size += mapped_size;
 
 	return 0;
 }
@@ -447,6 +451,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 	pgoff_t page_offset;
 	struct sg_table *sgt;
 	struct page **pages;
+	size_t mapped_size;
 
 	bomapping = addr_to_mapping(pfdev, as, addr);
 	if (!bomapping)
@@ -518,10 +523,11 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 	if (ret)
 		goto err_map;
 
-	mmu_map_sg(pfdev, bomapping->mmu, addr,
-		   IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
+	mapped_size = mmu_map_sg(pfdev, bomapping->mmu, addr,
+				 IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
 
 	bomapping->active = true;
+	bo->rss_size += mapped_size;
 
 	dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
  2023-08-24  1:34 ` Adrián Larumbe
@ 2023-08-24  1:34   ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-24  1:34 UTC (permalink / raw)
  To: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, linux-kernel, dri-devel, healych,
	kernel, freedreno

The current implementation always picks the highest available unit. This
is rather inflexible, and letting drivers display BO size statistics
through fdinfo in units of their choice might be desirable.

The new argument to drm_show_memory_stats is interpreted as the exponent
of a power of 1024 (2^10), so 1 gives sizes in KiB and 2 in MiB. If the
drm-file functions should pick the highest unit, 0 should be passed.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/drm_file.c              | 22 +++++++++++++---------
 drivers/gpu/drm/msm/msm_drv.c           |  2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c |  2 +-
 include/drm/drm_file.h                  |  5 +++--
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 762965e3d503..517e1fb8072a 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -873,7 +873,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
 EXPORT_SYMBOL(drm_send_event);
 
 static void print_size(struct drm_printer *p, const char *stat,
-		       const char *region, u64 sz)
+		       const char *region, u64 sz, unsigned int unit)
 {
 	const char *units[] = {"", " KiB", " MiB"};
 	unsigned u;
@@ -881,6 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
 	for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
 		if (sz < SZ_1K)
 			break;
+		if (unit > 0 && unit == u)
+			break;
 		sz = div_u64(sz, SZ_1K);
 	}
 
@@ -898,17 +900,18 @@ static void print_size(struct drm_printer *p, const char *stat,
 void drm_print_memory_stats(struct drm_printer *p,
 			    const struct drm_memory_stats *stats,
 			    enum drm_gem_object_status supported_status,
-			    const char *region)
+			    const char *region,
+			    unsigned int unit)
 {
-	print_size(p, "total", region, stats->private + stats->shared);
-	print_size(p, "shared", region, stats->shared);
-	print_size(p, "active", region, stats->active);
+	print_size(p, "total", region, stats->private + stats->shared, unit);
+	print_size(p, "shared", region, stats->shared, unit);
+	print_size(p, "active", region, stats->active, unit);
 
 	if (supported_status & DRM_GEM_OBJECT_RESIDENT)
-		print_size(p, "resident", region, stats->resident);
+		print_size(p, "resident", region, stats->resident, unit);
 
 	if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
-		print_size(p, "purgeable", region, stats->purgeable);
+		print_size(p, "purgeable", region, stats->purgeable, unit);
 }
 EXPORT_SYMBOL(drm_print_memory_stats);
 
@@ -916,11 +919,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
  * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats
  * @p: the printer to print output to
  * @file: the DRM file
+ * @unit: power-of-1024 exponent of the desired size unit (0 picks the highest)
  *
  * Helper to iterate over GEM objects with a handle allocated in the specified
  * file.
  */
-void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
+void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit)
 {
 	struct drm_gem_object *obj;
 	struct drm_memory_stats status = {};
@@ -967,7 +971,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
 	}
 	spin_unlock(&file->table_lock);
 
-	drm_print_memory_stats(p, &status, supported_status, "memory");
+	drm_print_memory_stats(p, &status, supported_status, "memory", unit);
 }
 EXPORT_SYMBOL(drm_show_memory_stats);
 
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 2a0e3529598b..cd1198151744 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -1067,7 +1067,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 
 	msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
 
-	drm_show_memory_stats(p, file);
+	drm_show_memory_stats(p, file, 0);
 }
 
 static const struct file_operations fops = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 93d5f5538c0b..79c08cee3e9d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -563,7 +563,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
 
 	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
 
-	drm_show_memory_stats(p, file);
+	drm_show_memory_stats(p, file, 1);
 }
 
 static const struct file_operations panfrost_drm_driver_fops = {
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 010239392adf..21a3b022dd63 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -466,9 +466,10 @@ enum drm_gem_object_status;
 void drm_print_memory_stats(struct drm_printer *p,
 			    const struct drm_memory_stats *stats,
 			    enum drm_gem_object_status supported_status,
-			    const char *region);
+			    const char *region,
+			    unsigned int unit);
 
-void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file);
+void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit);
 void drm_show_fdinfo(struct seq_file *m, struct file *f);
 
 struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-24  4:12     ` kernel test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kernel test robot @ 2023-08-24  4:12 UTC (permalink / raw)
  To: Adrián Larumbe, maarten.lankhorst, mripard, tzimmermann,
	airlied, daniel, robdclark, quic_abhinavk, dmitry.baryshkov,
	sean, marijn.suijten, robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, healych, dri-devel, linux-kernel,
	oe-kbuild-all, kernel, freedreno

Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5-rc7 next-20230823]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230824013604.466224-3-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20230824/202308241240.ngAywBMr-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230824/202308241240.ngAywBMr-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308241240.ngAywBMr-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/panfrost/panfrost_drv.c: In function 'panfrost_gpu_show_fdinfo':
>> drivers/gpu/drm/panfrost/panfrost_drv.c:551:50: warning: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'long unsigned int' [-Wformat=]
     551 |                 drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
         |                                                 ~^
         |                                                  |
         |                                                  unsigned int
         |                                                 %lu
     552 |                            ei->name, pfdev->pfdevfreq.current_frequency);
         |                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                                                      |
         |                                                      long unsigned int


vim +551 drivers/gpu/drm/panfrost/panfrost_drv.c

   534	
   535	
   536	static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
   537					     struct panfrost_file_priv *panfrost_priv,
   538					     struct drm_printer *p)
   539	{
   540		int i;
   541	
   542		for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
   543			struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
   544	
   545			drm_printf(p, "drm-engine-%s:\t%llu ns\n",
   546				   ei->name, ei->elapsed_ns);
   547			drm_printf(p, "drm-cycles-%s:\t%llu\n",
   548				   ei->name, ei->cycles);
   549			drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
   550				   ei->name, panfrost_priv->fdinfo.maxfreq);
 > 551			drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
   552				   ei->name, pfdev->pfdevfreq.current_frequency);
   553		}
   554	}
   555	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-24  6:49     ` kernel test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kernel test robot @ 2023-08-24  6:49 UTC (permalink / raw)
  To: Adrián Larumbe, maarten.lankhorst, mripard, tzimmermann,
	airlied, daniel, robdclark, quic_abhinavk, dmitry.baryshkov,
	sean, marijn.suijten, robh, steven.price
  Cc: linux-arm-msm, adrian.larumbe, healych, dri-devel, linux-kernel,
	oe-kbuild-all, kernel, freedreno

Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5-rc7 next-20230823]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230824013604.466224-7-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230824/202308241401.Hr6gvevs-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230824/202308241401.Hr6gvevs-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308241401.Hr6gvevs-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/drm_file.c:905: warning: Function parameter or member 'unit' not described in 'drm_print_memory_stats'


vim +905 drivers/gpu/drm/drm_file.c

686b21b5f6ca2f Rob Clark      2023-05-24  891  
686b21b5f6ca2f Rob Clark      2023-05-24  892  /**
686b21b5f6ca2f Rob Clark      2023-05-24  893   * drm_print_memory_stats - A helper to print memory stats
686b21b5f6ca2f Rob Clark      2023-05-24  894   * @p: The printer to print output to
686b21b5f6ca2f Rob Clark      2023-05-24  895   * @stats: The collected memory stats
686b21b5f6ca2f Rob Clark      2023-05-24  896   * @supported_status: Bitmask of optional stats which are available
686b21b5f6ca2f Rob Clark      2023-05-24  897   * @region: The memory region
686b21b5f6ca2f Rob Clark      2023-05-24  898   *
686b21b5f6ca2f Rob Clark      2023-05-24  899   */
686b21b5f6ca2f Rob Clark      2023-05-24  900  void drm_print_memory_stats(struct drm_printer *p,
686b21b5f6ca2f Rob Clark      2023-05-24  901  			    const struct drm_memory_stats *stats,
686b21b5f6ca2f Rob Clark      2023-05-24  902  			    enum drm_gem_object_status supported_status,
cccad8cb432637 Adrián Larumbe 2023-08-24  903  			    const char *region,
cccad8cb432637 Adrián Larumbe 2023-08-24  904  			    unsigned int unit)
686b21b5f6ca2f Rob Clark      2023-05-24 @905  {
cccad8cb432637 Adrián Larumbe 2023-08-24  906  	print_size(p, "total", region, stats->private + stats->shared, unit);
cccad8cb432637 Adrián Larumbe 2023-08-24  907  	print_size(p, "shared", region, stats->shared, unit);
cccad8cb432637 Adrián Larumbe 2023-08-24  908  	print_size(p, "active", region, stats->active, unit);
686b21b5f6ca2f Rob Clark      2023-05-24  909  
686b21b5f6ca2f Rob Clark      2023-05-24  910  	if (supported_status & DRM_GEM_OBJECT_RESIDENT)
cccad8cb432637 Adrián Larumbe 2023-08-24  911  		print_size(p, "resident", region, stats->resident, unit);
686b21b5f6ca2f Rob Clark      2023-05-24  912  
686b21b5f6ca2f Rob Clark      2023-05-24  913  	if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
cccad8cb432637 Adrián Larumbe 2023-08-24  914  		print_size(p, "purgeable", region, stats->purgeable, unit);
686b21b5f6ca2f Rob Clark      2023-05-24  915  }
686b21b5f6ca2f Rob Clark      2023-05-24  916  EXPORT_SYMBOL(drm_print_memory_stats);
686b21b5f6ca2f Rob Clark      2023-05-24  917  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-24 11:13     ` kernel test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kernel test robot @ 2023-08-24 11:13 UTC (permalink / raw)
  To: Adrián Larumbe, maarten.lankhorst, mripard, tzimmermann,
	airlied, daniel, robdclark, quic_abhinavk, dmitry.baryshkov,
	sean, marijn.suijten, robh, steven.price
  Cc: oe-kbuild-all, linux-arm-msm, adrian.larumbe, linux-kernel,
	dri-devel, healych, kernel, freedreno

Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5-rc7 next-20230824]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230824013604.466224-6-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20230824/202308241850.UjqyDaGz-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230824/202308241850.UjqyDaGz-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308241850.UjqyDaGz-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/panfrost/panfrost_gem.c:210:8: warning: no previous prototype for 'panfrost_gem_rss' [-Wmissing-prototypes]
     210 | size_t panfrost_gem_rss(struct drm_gem_object *obj)
         |        ^~~~~~~~~~~~~~~~


vim +/panfrost_gem_rss +210 drivers/gpu/drm/panfrost/panfrost_gem.c

   209	
 > 210	size_t panfrost_gem_rss(struct drm_gem_object *obj)
   211	{
   212		struct panfrost_gem_object *bo = to_panfrost_bo(obj);
   213	
   214		if (!bo->base.pages)
   215			return 0;
   216	
   217		return bo->rss_size;
   218	}
   219	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-28 15:00     ` Rob Clark
  -1 siblings, 0 replies; 56+ messages in thread
From: Rob Clark @ 2023-08-28 15:00 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten, robh,
	steven.price, dri-devel, linux-kernel, linux-arm-msm, freedreno,
	healych, kernel, Tvrtko Ursulin

On Wed, Aug 23, 2023 at 6:36 PM Adrián Larumbe
<adrian.larumbe@collabora.com> wrote:
>
> The current implementation will try to pick the highest available
> unit. This is rather inflexible, and allowing drivers to display BO size
> statistics through fdinfo in units of their choice might be desirable.
>
> The new argument to drm_show_memory_stats is interpreted as an index
> into the table of power-of-1024 units, so 1 gives sizes in KiB and 2
> in MiB. Passing 0 keeps the existing behaviour of letting the drm-file
> functions pick the highest unit that fits.
>
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/drm_file.c              | 22 +++++++++++++---------
>  drivers/gpu/drm/msm/msm_drv.c           |  2 +-
>  drivers/gpu/drm/panfrost/panfrost_drv.c |  2 +-
>  include/drm/drm_file.h                  |  5 +++--
>  4 files changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 762965e3d503..517e1fb8072a 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -873,7 +873,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>  EXPORT_SYMBOL(drm_send_event);
>
>  static void print_size(struct drm_printer *p, const char *stat,
> -                      const char *region, u64 sz)
> +                      const char *region, u64 sz, unsigned int unit)
>  {
>         const char *units[] = {"", " KiB", " MiB"};
>         unsigned u;
> @@ -881,6 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
>         for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
>                 if (sz < SZ_1K)
>                         break;
> +               if (unit > 0 && unit == u)
> +                       break;
>                 sz = div_u64(sz, SZ_1K);
>         }
>
> @@ -898,17 +900,18 @@ static void print_size(struct drm_printer *p, const char *stat,
>  void drm_print_memory_stats(struct drm_printer *p,
>                             const struct drm_memory_stats *stats,
>                             enum drm_gem_object_status supported_status,
> -                           const char *region)
> +                           const char *region,
> +                           unsigned int unit)

I'm not really averse to changing what units we use, or perhaps to
raising the threshold for moving to a higher unit to 10000x or 100000x
of the previous unit.  But I'm less excited about having different
drivers using different units.

BR,
-R


>  {
> -       print_size(p, "total", region, stats->private + stats->shared);
> -       print_size(p, "shared", region, stats->shared);
> -       print_size(p, "active", region, stats->active);
> +       print_size(p, "total", region, stats->private + stats->shared, unit);
> +       print_size(p, "shared", region, stats->shared, unit);
> +       print_size(p, "active", region, stats->active, unit);
>
>         if (supported_status & DRM_GEM_OBJECT_RESIDENT)
> -               print_size(p, "resident", region, stats->resident);
> +               print_size(p, "resident", region, stats->resident, unit);
>
>         if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
> -               print_size(p, "purgeable", region, stats->purgeable);
> +               print_size(p, "purgeable", region, stats->purgeable, unit);
>  }
>  EXPORT_SYMBOL(drm_print_memory_stats);
>
> @@ -916,11 +919,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
>   * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats
>   * @p: the printer to print output to
>   * @file: the DRM file
> + * @unit: multiplier of the power-of-two exponent of the desired unit
>   *
>   * Helper to iterate over GEM objects with a handle allocated in the specified
>   * file.
>   */
> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit)
>  {
>         struct drm_gem_object *obj;
>         struct drm_memory_stats status = {};
> @@ -967,7 +971,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
>         }
>         spin_unlock(&file->table_lock);
>
> -       drm_print_memory_stats(p, &status, supported_status, "memory");
> +       drm_print_memory_stats(p, &status, supported_status, "memory", unit);
>  }
>  EXPORT_SYMBOL(drm_show_memory_stats);
>
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 2a0e3529598b..cd1198151744 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -1067,7 +1067,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>
>         msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
>
> -       drm_show_memory_stats(p, file);
> +       drm_show_memory_stats(p, file, 0);
>  }
>
>  static const struct file_operations fops = {
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 93d5f5538c0b..79c08cee3e9d 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -563,7 +563,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>
>         panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
>
> -       drm_show_memory_stats(p, file);
> +       drm_show_memory_stats(p, file, 1);
>  }
>
>  static const struct file_operations panfrost_drm_driver_fops = {
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 010239392adf..21a3b022dd63 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -466,9 +466,10 @@ enum drm_gem_object_status;
>  void drm_print_memory_stats(struct drm_printer *p,
>                             const struct drm_memory_stats *stats,
>                             enum drm_gem_object_status supported_status,
> -                           const char *region);
> +                           const char *region,
> +                           unsigned int unit);
>
> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file);
> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit);
>  void drm_show_fdinfo(struct seq_file *m, struct file *f);
>
>  struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
> --
> 2.42.0
>


* Re: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
@ 2023-08-28 15:00     ` Rob Clark
  0 siblings, 0 replies; 56+ messages in thread
From: Rob Clark @ 2023-08-28 15:00 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: kernel, tzimmermann, Tvrtko Ursulin, sean, quic_abhinavk,
	mripard, steven.price, healych, dri-devel, linux-arm-msm,
	dmitry.baryshkov, marijn.suijten, freedreno, linux-kernel

On Wed, Aug 23, 2023 at 6:36 PM Adrián Larumbe
<adrian.larumbe@collabora.com> wrote:
>
> The current implementation will try to pick the highest available
> unit. This is rather unflexible, and allowing drivers to display BO size
> statistics through fdinfo in units of their choice might be desirable.
>
> The new argument to drm_show_memory_stats is to be interpreted as the
> integer multiplier of a 10-power of 2, so 1 would give us size in Kib and 2
> in Mib. If we want drm-file functions to pick the highest unit, then 0
> should be passed.
>
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/drm_file.c              | 22 +++++++++++++---------
>  drivers/gpu/drm/msm/msm_drv.c           |  2 +-
>  drivers/gpu/drm/panfrost/panfrost_drv.c |  2 +-
>  include/drm/drm_file.h                  |  5 +++--
>  4 files changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 762965e3d503..517e1fb8072a 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -873,7 +873,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>  EXPORT_SYMBOL(drm_send_event);
>
>  static void print_size(struct drm_printer *p, const char *stat,
> -                      const char *region, u64 sz)
> +                      const char *region, u64 sz, unsigned int unit)
>  {
>         const char *units[] = {"", " KiB", " MiB"};
>         unsigned u;
> @@ -881,6 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
>         for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
>                 if (sz < SZ_1K)
>                         break;
> +               if (unit > 0 && unit == u)
> +                       break;
>                 sz = div_u64(sz, SZ_1K);
>         }
>
> @@ -898,17 +900,18 @@ static void print_size(struct drm_printer *p, const char *stat,
>  void drm_print_memory_stats(struct drm_printer *p,
>                             const struct drm_memory_stats *stats,
>                             enum drm_gem_object_status supported_status,
> -                           const char *region)
> +                           const char *region,
> +                           unsigned int unit)

I'm not really adverse to changing what units we use.. or perhaps
changing the threshold to go to higher units to be 10000x or 100000x
of the previous unit.  But I'm less excited about having different
drivers using different units.

BR,
-R


>  {
> -       print_size(p, "total", region, stats->private + stats->shared);
> -       print_size(p, "shared", region, stats->shared);
> -       print_size(p, "active", region, stats->active);
> +       print_size(p, "total", region, stats->private + stats->shared, unit);
> +       print_size(p, "shared", region, stats->shared, unit);
> +       print_size(p, "active", region, stats->active, unit);
>
>         if (supported_status & DRM_GEM_OBJECT_RESIDENT)
> -               print_size(p, "resident", region, stats->resident);
> +               print_size(p, "resident", region, stats->resident, unit);
>
>         if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
> -               print_size(p, "purgeable", region, stats->purgeable);
> +               print_size(p, "purgeable", region, stats->purgeable, unit);
>  }
>  EXPORT_SYMBOL(drm_print_memory_stats);
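For reference, the unit-clamping loop the patch adds to print_size() can be
modelled in userspace. The sketch below is illustrative only (the helper name
and buffer handling are not part of the patch): unit == 0 keeps the existing
auto-scaling behaviour, while a non-zero unit stops scaling once that unit
index is reached.

```c
#include <stdint.h>
#include <stdio.h>

#define SZ_1K 1024ULL

/*
 * Userspace model of the patched print_size() loop: scale sz down by
 * 1024 until it drops below 1 KiB, but stop early once the caller's
 * requested unit index is reached (unit == 0 means "pick automatically").
 */
static void format_size(char *buf, size_t len, uint64_t sz, unsigned int unit)
{
	static const char *units[] = {"", " KiB", " MiB"};
	unsigned int u;

	for (u = 0; u < sizeof(units) / sizeof(units[0]) - 1; u++) {
		if (sz < SZ_1K)
			break;
		if (unit > 0 && unit == u)
			break;
		sz /= SZ_1K;
	}

	snprintf(buf, len, "%llu%s", (unsigned long long)sz, units[u]);
}
```

With unit = 1, a 4 MiB size comes out as "4096 KiB" instead of "4 MiB", which
is the behaviour the Panfrost hunk in this series asks for by passing unit = 1.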
>
> @@ -916,11 +919,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
>   * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats
>   * @p: the printer to print output to
>   * @file: the DRM file
> + * @unit: multiplier of the power-of-two exponent of the desired unit
>   *
>   * Helper to iterate over GEM objects with a handle allocated in the specified
>   * file.
>   */
> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit)
>  {
>         struct drm_gem_object *obj;
>         struct drm_memory_stats status = {};
> @@ -967,7 +971,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
>         }
>         spin_unlock(&file->table_lock);
>
> -       drm_print_memory_stats(p, &status, supported_status, "memory");
> +       drm_print_memory_stats(p, &status, supported_status, "memory", unit);
>  }
>  EXPORT_SYMBOL(drm_show_memory_stats);
>
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 2a0e3529598b..cd1198151744 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -1067,7 +1067,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>
>         msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
>
> -       drm_show_memory_stats(p, file);
> +       drm_show_memory_stats(p, file, 0);
>  }
>
>  static const struct file_operations fops = {
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 93d5f5538c0b..79c08cee3e9d 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -563,7 +563,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>
>         panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
>
> -       drm_show_memory_stats(p, file);
> +       drm_show_memory_stats(p, file, 1);
>  }
>
>  static const struct file_operations panfrost_drm_driver_fops = {
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 010239392adf..21a3b022dd63 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -466,9 +466,10 @@ enum drm_gem_object_status;
>  void drm_print_memory_stats(struct drm_printer *p,
>                             const struct drm_memory_stats *stats,
>                             enum drm_gem_object_status supported_status,
> -                           const char *region);
> +                           const char *region,
> +                           unsigned int unit);
>
> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file);
> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit);
>  void drm_show_fdinfo(struct seq_file *m, struct file *f);
>
>  struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
> --
> 2.42.0
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-30 10:17     ` Boris Brezillon
  -1 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2023-08-30 10:17 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: tzimmermann, sean, quic_abhinavk, mripard, steven.price,
	freedreno, healych, dri-devel, linux-arm-msm, dmitry.baryshkov,
	marijn.suijten, kernel, linux-kernel

On Thu, 24 Aug 2023 02:34:45 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> The drm-stats fdinfo tags made available to user space are drm-engine,
> drm-cycles, drm-maxfreq and drm-curfreq, one per job slot.

Pretty sure this has already been discussed, but it's probably worth
mentioning that drm-cycles is not accurate; it just gives you a rough
idea of how many GPU cycles were dedicated to a context (just as
drm-engine elapsed-ns gives you an approximation of the
GPU utilization). This comes from two factors:

1. We're dependent on the time the kernel/CPU takes to process the GPU
interrupt.
2. The pipelining done by the Job Manager (2 job slots per engine)
implies that you can't really know how much time each job spent on the
GPU. When these jobs come from the same context, that's not a
problem, but when they don't, it's impossible to have a clear split.

I'd really like to have that mentioned somewhere in the code and commit
message to lower users' expectations.

> 
> This deviates from standard practice in other DRM drivers, where a single
> set of key:value pairs is provided for the whole render engine. However,
> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
> decision was made to calculate bus cycles and workload times separately.
> 
> Maximum operating frequency is calculated at devfreq initialisation time.
> Current frequency is made available to user space because nvtop uses it
> when performing engine usage calculations.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>  6 files changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> index 58dfb15a8757..28caffc689e2 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
> @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
>  	spin_lock_irqsave(&pfdevfreq->lock, irqflags);
>  
>  	panfrost_devfreq_update_utilization(pfdevfreq);
> +	pfdevfreq->current_frequency = status->current_frequency;
>  
>  	status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time,
>  						   pfdevfreq->idle_time));
> @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
>  	struct devfreq *devfreq;
>  	struct thermal_cooling_device *cooling;
>  	struct panfrost_devfreq *pfdevfreq = &pfdev->pfdevfreq;
> +	unsigned long freq = ULONG_MAX;
>  
>  	if (pfdev->comp->num_supplies > 1) {
>  		/*
> @@ -172,6 +174,12 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
>  		return ret;
>  	}
>  
> +	/* Find the fastest defined rate  */
> +	opp = dev_pm_opp_find_freq_floor(dev, &freq);
> +	if (IS_ERR(opp))
> +		return PTR_ERR(opp);
> +	pfdevfreq->fast_rate = freq;
> +
>  	dev_pm_opp_put(opp);
>  
>  	/*
> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.h b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
> index 1514c1f9d91c..48dbe185f206 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
> @@ -19,6 +19,9 @@ struct panfrost_devfreq {
>  	struct devfreq_simple_ondemand_data gov_data;
>  	bool opp_of_table_added;
>  
> +	unsigned long current_frequency;
> +	unsigned long fast_rate;
> +
>  	ktime_t busy_time;
>  	ktime_t idle_time;
>  	ktime_t time_last_update;
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> index b0126b9fbadc..680f298fd1a9 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -24,6 +24,7 @@ struct panfrost_perfcnt;
>  
>  #define NUM_JOB_SLOTS 3
>  #define MAX_PM_DOMAINS 5
> +#define MAX_SLOT_NAME_LEN 10
>  
>  struct panfrost_features {
>  	u16 id;
> @@ -135,12 +136,24 @@ struct panfrost_mmu {
>  	struct list_head list;
>  };
>  
> +struct drm_info_gpu {
> +	unsigned int maxfreq;
> +
> +	struct engine_info {
> +		unsigned long long elapsed_ns;
> +		unsigned long long cycles;
> +		char name[MAX_SLOT_NAME_LEN];
> +	} engines[NUM_JOB_SLOTS];
> +};
> +
>  struct panfrost_file_priv {
>  	struct panfrost_device *pfdev;
>  
>  	struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
>  
>  	struct panfrost_mmu *mmu;
> +
> +	struct drm_info_gpu fdinfo;
>  };
>  
>  static inline struct panfrost_device *to_panfrost_device(struct drm_device *ddev)
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index a2ab99698ca8..3fd372301019 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>  	job->requirements = args->requirements;
>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>  	job->mmu = file_priv->mmu;
> +	job->priv = file_priv;

Uh, I'm not comfortable passing the file context here, unless you reset
it to NULL in panfrost_job_close() and have code that's robust to
job->priv being NULL. We've had cases in the past where jobs outlived
the file context itself.

>  
>  	slot = panfrost_job_get_slot(job);
>  
> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>  		goto err_free;
>  	}
>  
> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
> +#if 0
> +	/* Add compute engine in the future */
> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
> +#endif
> +	panfrost_priv->fdinfo.maxfreq = pfdev->pfdevfreq.fast_rate;
> +
>  	ret = panfrost_job_open(panfrost_priv);
>  	if (ret)
>  		goto err_job;
> @@ -523,7 +532,40 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
>  	PANFROST_IOCTL(MADVISE,		madvise,	DRM_RENDER_ALLOW),
>  };
>  
> -DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
> +
> +static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
> +				     struct panfrost_file_priv *panfrost_priv,
> +				     struct drm_printer *p)
> +{
> +	int i;
> +
> +	for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
> +		struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
> +
> +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
> +			   ei->name, ei->elapsed_ns);
> +		drm_printf(p, "drm-cycles-%s:\t%llu\n",
> +			   ei->name, ei->cycles);
> +		drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
> +			   ei->name, panfrost_priv->fdinfo.maxfreq);
> +		drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
> +			   ei->name, pfdev->pfdevfreq.current_frequency);
> +	}
> +}
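For context, a consumer such as gputop or nvtop reads these key:value pairs
back from /proc/<pid>/fdinfo/<fd>. A minimal, hypothetical parsing helper for
the drm-engine-<name> lines emitted above (the helper and its error handling
are illustrative, not any tool's actual API):

```c
#include <stdio.h>

/*
 * Sketch of how a monitoring tool might parse one per-slot fdinfo key.
 * Expects lines shaped like "drm-engine-frg:\t365438295 ns", matching
 * the drm_printf() call in panfrost_gpu_show_fdinfo().
 */
static int parse_drm_engine(const char *line, char *name, size_t name_len,
			    unsigned long long *ns)
{
	char key[32];

	if (sscanf(line, "drm-engine-%31[^:]:\t%llu ns", key, ns) != 2)
		return -1;

	snprintf(name, name_len, "%s", key);
	return 0;
}
```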
> +
> +static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> +{
> +	struct drm_device *dev = file->minor->dev;
> +	struct panfrost_device *pfdev = dev->dev_private;
> +
> +	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
> +}
> +
> +static const struct file_operations panfrost_drm_driver_fops = {
> +	.owner = THIS_MODULE,
> +	DRM_GEM_FOPS,
> +	.show_fdinfo = drm_show_fdinfo,
> +};
>  
>  /*
>   * Panfrost driver version:
> @@ -535,6 +577,7 @@ static const struct drm_driver panfrost_drm_driver = {
>  	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
>  	.open			= panfrost_open,
>  	.postclose		= panfrost_postclose,
> +	.show_fdinfo		= panfrost_show_fdinfo,
>  	.ioctls			= panfrost_drm_driver_ioctls,
>  	.num_ioctls		= ARRAY_SIZE(panfrost_drm_driver_ioctls),
>  	.fops			= &panfrost_drm_driver_fops,
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index dbc597ab46fb..a847e183b5d0 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -153,10 +153,31 @@ panfrost_get_job_chain_flag(const struct panfrost_job *job)
>  	return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
>  }
>  
> +static inline unsigned long long read_cycles(struct panfrost_device *pfdev)
> +{
> +	u64 address = (u64) gpu_read(pfdev, GPU_CYCLE_COUNT_HI) << 32;
> +
> +	address |= gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
> +

We probably want to handle the 32-bit overflow case with something like:

	u32 hi, lo;

	do {
		hi = gpu_read(pfdev, GPU_CYCLE_COUNT_HI);
		lo = gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
	} while (hi != gpu_read(pfdev, GPU_CYCLE_COUNT_HI));

	return ((u64)hi << 32) | lo;

> +	return address;
> +}
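The suggested reread loop can be checked in userspace with a simulated split
counter. In the sketch below the fake "registers" deliberately advance on every
access so that the low word wraps between the HI and LO reads; the constants
and register model are artificial, but the loop body matches the suggestion
above:

```c
#include <stdint.h>

/*
 * The 64-bit cycle counter is exposed as two 32-bit registers, so LO
 * can wrap between the HI and LO reads. Rereading HI detects the carry.
 * The simulated registers below advance by 2^30 on every read to force
 * a wrap mid-sequence.
 */
static uint64_t counter;	/* pretend hardware counter */

static uint32_t reg_read_hi(void)
{
	counter += 1u << 30;
	return counter >> 32;
}

static uint32_t reg_read_lo(void)
{
	counter += 1u << 30;
	return (uint32_t)counter;
}

static uint64_t read_cycles_stable(void)
{
	uint32_t hi, lo;

	do {
		hi = reg_read_hi();
		lo = reg_read_lo();
	} while (hi != reg_read_hi());

	return ((uint64_t)hi << 32) | lo;
}
```

A single hi/lo read pair taken across the wrap boundary drops the carry and
under-reports by close to 2^32 cycles; the reread loop retries until HI is
stable across the pair.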
> +
>  static struct panfrost_job *
>  panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
>  {
>  	struct panfrost_job *job = pfdev->jobs[slot][0];
> +	struct engine_info *engine_info = &job->priv->fdinfo.engines[slot];
> +
> +	engine_info->elapsed_ns +=
> +		ktime_to_ns(ktime_sub(ktime_get(), job->start_time));
> +	engine_info->cycles +=
> +		read_cycles(pfdev) - job->start_cycles;
> +
> +	/* Reset in case the job has to be requeued */
> +	job->start_time = 0;
> +	/* A GPU reset puts the Cycle Counter register back to 0 */
> +	job->start_cycles = atomic_read(&pfdev->reset.pending) ?
> +		0 : read_cycles(pfdev);

Do we need to reset these values? If the jobs are re-submitted, those
fields will be re-assigned, and if the job is done, I don't see where
we're using it after that point (might have missed something).

>  
>  	WARN_ON(!job);
>  	pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
> @@ -233,6 +254,9 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
>  	subslot = panfrost_enqueue_job(pfdev, js, job);
>  	/* Don't queue the job if a reset is in progress */
>  	if (!atomic_read(&pfdev->reset.pending)) {
> +		job->start_time = ktime_get();
> +		job->start_cycles = read_cycles(pfdev);
> +
>  		job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
>  		dev_dbg(pfdev->dev,
>  			"JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d",
> @@ -297,6 +321,9 @@ int panfrost_job_push(struct panfrost_job *job)
>  
>  	kref_get(&job->refcount); /* put by scheduler job completion */
>  
> +	if (panfrost_job_is_idle(pfdev))
> +		gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
> +
>  	drm_sched_entity_push_job(&job->base);
>  
>  	mutex_unlock(&pfdev->sched_lock);
> @@ -351,6 +378,9 @@ static void panfrost_job_free(struct drm_sched_job *sched_job)
>  
>  	drm_sched_job_cleanup(sched_job);
>  
> +	if (panfrost_job_is_idle(job->pfdev))
> +		gpu_write(job->pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
> +
>  	panfrost_job_put(job);
>  }
>  
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
> index 8becc1ba0eb9..038171c39dd8 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.h
> @@ -32,6 +32,10 @@ struct panfrost_job {
>  
>  	/* Fence to be signaled by drm-sched once its done with the job */
>  	struct dma_fence *render_done_fence;
> +
> +	struct panfrost_file_priv *priv;
> +	ktime_t start_time;
> +	u64 start_cycles;
>  };
>  
>  int panfrost_job_init(struct panfrost_device *pfdev);


^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [PATCH v2 3/6] drm/panfrost: Add fdinfo support for memory stats
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-30 10:31     ` Boris Brezillon
  -1 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2023-08-30 10:31 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: tzimmermann, sean, quic_abhinavk, mripard, steven.price,
	freedreno, healych, dri-devel, linux-arm-msm, dmitry.baryshkov,
	marijn.suijten, kernel, linux-kernel

On Thu, 24 Aug 2023 02:34:46 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> A new DRM GEM object function is added so that drm_show_memory_stats can
> provider more accurate memory usage numbers.

  s/provider/provide/

> 
> Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked
> after locking the driver's shrinker mutex, but drm_show_memory_stats takes
> over the drm file's object handle database spinlock, so there's potential
> for a race condition here.

Yeah, I don't think it matters much if we report a BO as non-purgeable
and this BO becomes purgeable in the meantime; you'd have the same
problem the other way around.

> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +++++++--
>  drivers/gpu/drm/panfrost/panfrost_gem.c | 12 ++++++++++++
>  drivers/gpu/drm/panfrost/panfrost_gem.h |  1 +
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 3fd372301019..93d5f5538c0b 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -440,11 +440,14 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
>  	args->retained = drm_gem_shmem_madvise(&bo->base, args->madv);
>  
>  	if (args->retained) {
> -		if (args->madv == PANFROST_MADV_DONTNEED)
> +		if (args->madv == PANFROST_MADV_DONTNEED) {
>  			list_move_tail(&bo->base.madv_list,
>  				       &pfdev->shrinker_list);
> -		else if (args->madv == PANFROST_MADV_WILLNEED)
> +			bo->is_purgable = true;
> +		} else if (args->madv == PANFROST_MADV_WILLNEED) {
>  			list_del_init(&bo->base.madv_list);
> +			bo->is_purgable = false;

Should we really flag the BO as purgeable if it's already been evicted
(args->retained == false)?

> +		}
>  	}
>  
>  out_unlock_mappings:
> @@ -559,6 +562,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>  	struct panfrost_device *pfdev = dev->dev_private;
>  
>  	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
> +
> +	drm_show_memory_stats(p, file);
>  }
>  
>  static const struct file_operations panfrost_drm_driver_fops = {
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 3c812fbd126f..aea16b0e4dda 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -195,6 +195,17 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
>  	return drm_gem_shmem_pin(&bo->base);
>  }
>  
> +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)
> +{
> +	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
> +	enum drm_gem_object_status res = 0;
> +
> +	res |= (bo->is_purgable) ? DRM_GEM_OBJECT_PURGEABLE : 0;

Why not checking bo->base.madv here instead of adding an is_purgeable
field?

> +
> +	res |= (bo->base.pages) ? DRM_GEM_OBJECT_RESIDENT : 0;

Does it make sense to have DRM_GEM_OBJECT_PURGEABLE set when
DRM_GEM_OBJECT_RESIDENT is not?

> +
> +	return res;
> +}
>  static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>  	.free = panfrost_gem_free_object,
>  	.open = panfrost_gem_open,
> @@ -206,6 +217,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>  	.vmap = drm_gem_shmem_object_vmap,
>  	.vunmap = drm_gem_shmem_object_vunmap,
>  	.mmap = drm_gem_shmem_object_mmap,
> +	.status = panfrost_gem_status,
>  	.vm_ops = &drm_gem_shmem_vm_ops,
>  };
>  
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
> index ad2877eeeccd..e06f7ceb8f73 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
> @@ -38,6 +38,7 @@ struct panfrost_gem_object {
>  
>  	bool noexec		:1;
>  	bool is_heap		:1;
> +	bool is_purgable	:1;
>  };
>  
>  struct panfrost_gem_mapping {


^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [PATCH v2 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-30 10:34     ` Boris Brezillon
  -1 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2023-08-30 10:34 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: tzimmermann, sean, quic_abhinavk, mripard, steven.price,
	freedreno, healych, dri-devel, linux-arm-msm, dmitry.baryshkov,
	marijn.suijten, kernel, linux-kernel

On Thu, 24 Aug 2023 02:34:47 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> Some BOs might be mapped onto physical memory chunkwise and on demand,
> like Panfrost's tiler heap. In this case, even though the
> drm_gem_shmem_object page array might already be allocated, only a very
> small fraction of the BO is currently backed by system memory, but
> drm_show_memory_stats will then proceed to add its entire virtual size to
> the file's total resident size regardless.
> 
> This led to very unrealistic RSS sizes being reported for Panfrost, where
> said tiler heap buffer is initially allocated with a virtual size of 128
> MiB, but only a small part of it will eventually be backed by system memory
> after successive GPU page faults.
> 
> Provide a new generic DRM object function that allows drivers to return
> a more accurate RSS size for their BOs.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/gpu/drm/drm_file.c | 5 ++++-
>  include/drm/drm_gem.h      | 9 +++++++++
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 883d83bc0e3d..762965e3d503 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
>  		}
>  
>  		if (s & DRM_GEM_OBJECT_RESIDENT) {
> -			status.resident += obj->size;
> +			if (obj->funcs && obj->funcs->rss)
> +				status.resident += obj->funcs->rss(obj);
> +			else
> +				status.resident += obj->size;
>  		} else {
>  			/* If already purged or not yet backed by pages, don't
>  			 * count it as purgeable:
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index c0b13c43b459..78ed9fab6044 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -208,6 +208,15 @@ struct drm_gem_object_funcs {
>  	 */
>  	enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
>  
> +	/**
> +	 * @rss:
> +	 *
> +	 * Return resident size of the object in physical memory.
> +	 *
> +	 * Called by drm_show_memory_stats().
> +	 */
> +	size_t (*rss)(struct drm_gem_object *obj);
> +
>  	/**
>  	 * @vm_ops:
>  	 *


^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [PATCH v2 1/6] drm/panfrost: Add cycle count GPU register definitions
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-30 10:35     ` Boris Brezillon
  -1 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2023-08-30 10:35 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: tzimmermann, sean, quic_abhinavk, mripard, steven.price,
	freedreno, healych, dri-devel, linux-arm-msm, dmitry.baryshkov,
	marijn.suijten, kernel, linux-kernel

On Thu, 24 Aug 2023 02:34:44 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> These GPU registers will be used when programming the cycle counter, which
> we need for providing accurate fdinfo drm-cycles values to user space.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> ---
>  drivers/gpu/drm/panfrost/panfrost_regs.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index 919f44ac853d..55ec807550b3 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -46,6 +46,8 @@
>  #define   GPU_CMD_SOFT_RESET		0x01
>  #define   GPU_CMD_PERFCNT_CLEAR		0x03
>  #define   GPU_CMD_PERFCNT_SAMPLE	0x04
> +#define   GPU_CMD_CYCLE_COUNT_START	0x05
> +#define   GPU_CMD_CYCLE_COUNT_STOP	0x06
>  #define   GPU_CMD_CLEAN_CACHES		0x07
>  #define   GPU_CMD_CLEAN_INV_CACHES	0x08
>  #define GPU_STATUS			0x34
> @@ -73,6 +75,9 @@
>  #define GPU_PRFCNT_TILER_EN		0x74
>  #define GPU_PRFCNT_MMU_L2_EN		0x7c
>  
> +#define GPU_CYCLE_COUNT_LO		0x90
> +#define GPU_CYCLE_COUNT_HI		0x94
> +
>  #define GPU_THREAD_MAX_THREADS		0x0A0	/* (RO) Maximum number of threads per core */
>  #define GPU_THREAD_MAX_WORKGROUP_SIZE	0x0A4	/* (RO) Maximum workgroup size */
>  #define GPU_THREAD_MAX_BARRIER_SIZE	0x0A8	/* (RO) Maximum threads waiting at a barrier */


^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [PATCH v2 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-30 10:52     ` Boris Brezillon
  -1 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2023-08-30 10:52 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: tzimmermann, sean, quic_abhinavk, mripard, steven.price,
	freedreno, healych, dri-devel, linux-arm-msm, dmitry.baryshkov,
	marijn.suijten, kernel, linux-kernel

On Thu, 24 Aug 2023 02:34:48 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> A BO's RSS is updated every time new pages are allocated and mapped for
> the object: either in its entirety at creation time for non-heap buffers,
> or on demand in the GPU page fault IRQ handler for heap buffers.
> 
> The same calculation must be done for imported PRIME objects, since their
> backing storage might have already been allocated by the exporting driver.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_gem.c | 22 ++++++++++++++++++++++
>  drivers/gpu/drm/panfrost/panfrost_gem.h |  5 +++++
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 16 +++++++++++-----
>  3 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index aea16b0e4dda..c6bd1f16a6d4 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -206,6 +206,17 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj
>  
>  	return res;
>  }
> +
> +size_t panfrost_gem_rss(struct drm_gem_object *obj)
> +{
> +	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
> +
> +	if (!bo->base.pages)
> +		return 0;
> +
> +	return bo->rss_size;
> +}
> +
>  static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>  	.free = panfrost_gem_free_object,
>  	.open = panfrost_gem_open,
> @@ -218,6 +229,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>  	.vunmap = drm_gem_shmem_object_vunmap,
>  	.mmap = drm_gem_shmem_object_mmap,
>  	.status = panfrost_gem_status,
> +	.rss = panfrost_gem_rss,
>  	.vm_ops = &drm_gem_shmem_vm_ops,
>  };
>  
> @@ -274,13 +286,23 @@ panfrost_gem_prime_import_sg_table(struct drm_device *dev,
>  {
>  	struct drm_gem_object *obj;
>  	struct panfrost_gem_object *bo;
> +	struct scatterlist *sgl;
> +	unsigned int count;
> +	size_t total = 0;
>  
>  	obj = drm_gem_shmem_prime_import_sg_table(dev, attach, sgt);
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>  
> +	for_each_sgtable_dma_sg(sgt, sgl, count) {
> +		size_t len = sg_dma_len(sgl);
> +
> +		total += len;
> +	}

Why not simply have bo->rss_size = obj->size here? Not sure I see a
reason to not trust dma_buf?

> +
>  	bo = to_panfrost_bo(obj);
>  	bo->noexec = true;
> +	bo->rss_size = total;
>  
>  	return obj;
>  }
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
> index e06f7ceb8f73..e2a7c46403c7 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
> @@ -36,6 +36,11 @@ struct panfrost_gem_object {
>  	 */
>  	atomic_t gpu_usecount;
>  
> +	/*
> +	 * Object chunk size currently mapped onto physical memory
> +	 */
> +	size_t rss_size;
> +
>  	bool noexec		:1;
>  	bool is_heap		:1;
>  	bool is_purgable	:1;
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index c0123d09f699..e03a5a9da06f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -285,17 +285,19 @@ static void panfrost_mmu_flush_range(struct panfrost_device *pfdev,
>  	pm_runtime_put_autosuspend(pfdev->dev);
>  }
>  
> -static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
> +static size_t mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
>  		      u64 iova, int prot, struct sg_table *sgt)
>  {
>  	unsigned int count;
>  	struct scatterlist *sgl;
>  	struct io_pgtable_ops *ops = mmu->pgtbl_ops;
>  	u64 start_iova = iova;
> +	size_t total = 0;
>  
>  	for_each_sgtable_dma_sg(sgt, sgl, count) {
>  		unsigned long paddr = sg_dma_address(sgl);
>  		size_t len = sg_dma_len(sgl);
> +		total += len;
>  
>  		dev_dbg(pfdev->dev, "map: as=%d, iova=%llx, paddr=%lx, len=%zx", mmu->as, iova, paddr, len);
>  
> @@ -315,7 +317,7 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
>  
>  	panfrost_mmu_flush_range(pfdev, mmu, start_iova, iova - start_iova);
>  
> -	return 0;
> +	return total;
>  }
>  
>  int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
> @@ -326,6 +328,7 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
>  	struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
>  	struct sg_table *sgt;
>  	int prot = IOMMU_READ | IOMMU_WRITE;
> +	size_t mapped_size;
>  
>  	if (WARN_ON(mapping->active))
>  		return 0;
> @@ -337,9 +340,10 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
>  	if (WARN_ON(IS_ERR(sgt)))
>  		return PTR_ERR(sgt);
>  
> -	mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << PAGE_SHIFT,
> +	mapped_size = mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << PAGE_SHIFT,
>  		   prot, sgt);
>  	mapping->active = true;
> +	bo->rss_size += mapped_size;

Actually, the GEM might be resident even before panfrost_mmu_map() is
called: as soon as drm_gem_shmem_get_pages[_locked]() is called, it's
resident (might get evicted after that point though). That means any
mmap coming from userspace will make the buffer resident too. I know
we're automatically mapping GEMs to the GPU VM in panfrost_gem_open(),
so it makes no difference, but I think I'd prefer if we keep ->rss_size
for heap BOs only (we probably want to rename it heap_rss_size) and
then have


	if (bo->is_heap)
		return bo->heap_rss_size;
	else if (bo->base.pages)
		return bo->base.base.size;
	else
		return 0;

in panfrost_gem_rss().

>  
>  	return 0;
>  }
> @@ -447,6 +451,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
>  	pgoff_t page_offset;
>  	struct sg_table *sgt;
>  	struct page **pages;
> +	size_t mapped_size;
>  
>  	bomapping = addr_to_mapping(pfdev, as, addr);
>  	if (!bomapping)
> @@ -518,10 +523,11 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
>  	if (ret)
>  		goto err_map;
>  
> -	mmu_map_sg(pfdev, bomapping->mmu, addr,
> -		   IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
> +	mapped_size = mmu_map_sg(pfdev, bomapping->mmu, addr,
> +				 IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
>  
>  	bomapping->active = true;
> +	bo->rss_size += mapped_size;
>  
>  	dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);
>  


^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
  2023-08-28 15:00     ` Rob Clark
@ 2023-08-30 15:51       ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-30 15:51 UTC (permalink / raw)
  To: Rob Clark
  Cc: kernel, tzimmermann, Tvrtko Ursulin, sean, quic_abhinavk,
	mripard, steven.price, healych, dri-devel, linux-arm-msm,
	dmitry.baryshkov, marijn.suijten, freedreno, linux-kernel

>> The current implementation will try to pick the highest available
>> unit. This is rather inflexible, and allowing drivers to display BO size
>> statistics through fdinfo in units of their choice might be desirable.
>>
>> The new argument to drm_show_memory_stats is to be interpreted as the
>> integer multiplier of a power-of-two exponent in steps of 10 (i.e. the
>> unit is 2^(10*n) bytes), so 1 would give us sizes in KiB and 2 in MiB.
>> If we want drm-file functions to pick the highest unit, then 0 should
>> be passed.
>>
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/drm_file.c              | 22 +++++++++++++---------
>>  drivers/gpu/drm/msm/msm_drv.c           |  2 +-
>>  drivers/gpu/drm/panfrost/panfrost_drv.c |  2 +-
>>  include/drm/drm_file.h                  |  5 +++--
>>  4 files changed, 18 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index 762965e3d503..517e1fb8072a 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -873,7 +873,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>>  EXPORT_SYMBOL(drm_send_event);
>>
>>  static void print_size(struct drm_printer *p, const char *stat,
>> -                      const char *region, u64 sz)
>> +                      const char *region, u64 sz, unsigned int unit)
>>  {
>>         const char *units[] = {"", " KiB", " MiB"};
>>         unsigned u;
>> @@ -881,6 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
>>         for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
>>                 if (sz < SZ_1K)
>>                         break;
>> +               if (unit > 0 && unit == u)
>> +                       break;
>>                 sz = div_u64(sz, SZ_1K);
>>         }
>>
>> @@ -898,17 +900,18 @@ static void print_size(struct drm_printer *p, const char *stat,
>>  void drm_print_memory_stats(struct drm_printer *p,
>>                             const struct drm_memory_stats *stats,
>>                             enum drm_gem_object_status supported_status,
>> -                           const char *region)
>> +                           const char *region,
>> +                           unsigned int unit)
>
>I'm not really averse to changing what units we use... or perhaps
>changing the threshold to go to higher units to be 10000x or 100000x
>of the previous unit.  But I'm less excited about having different
>drivers using different units.
>
>BR,
>-R

Would it be alright if I left it set to the default unit and allowed
changing it at runtime through a debugfs file?

>>  {
>> -       print_size(p, "total", region, stats->private + stats->shared);
>> -       print_size(p, "shared", region, stats->shared);
>> -       print_size(p, "active", region, stats->active);
>> +       print_size(p, "total", region, stats->private + stats->shared, unit);
>> +       print_size(p, "shared", region, stats->shared, unit);
>> +       print_size(p, "active", region, stats->active, unit);
>>
>>         if (supported_status & DRM_GEM_OBJECT_RESIDENT)
>> -               print_size(p, "resident", region, stats->resident);
>> +               print_size(p, "resident", region, stats->resident, unit);
>>
>>         if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
>> -               print_size(p, "purgeable", region, stats->purgeable);
>> +               print_size(p, "purgeable", region, stats->purgeable, unit);
>>  }
>>  EXPORT_SYMBOL(drm_print_memory_stats);
>>
>> @@ -916,11 +919,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
>>   * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats
>>   * @p: the printer to print output to
>>   * @file: the DRM file
>> + * @unit: multiplier of the power-of-two exponent of the desired unit
>>   *
>>   * Helper to iterate over GEM objects with a handle allocated in the specified
>>   * file.
>>   */
>> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
>> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit)
>>  {
>>         struct drm_gem_object *obj;
>>         struct drm_memory_stats status = {};
>> @@ -967,7 +971,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
>>         }
>>         spin_unlock(&file->table_lock);
>>
>> -       drm_print_memory_stats(p, &status, supported_status, "memory");
>> +       drm_print_memory_stats(p, &status, supported_status, "memory", unit);
>>  }
>>  EXPORT_SYMBOL(drm_show_memory_stats);
>>
>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
>> index 2a0e3529598b..cd1198151744 100644
>> --- a/drivers/gpu/drm/msm/msm_drv.c
>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>> @@ -1067,7 +1067,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>>
>>         msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
>>
>> -       drm_show_memory_stats(p, file);
>> +       drm_show_memory_stats(p, file, 0);
>>  }
>>
>>  static const struct file_operations fops = {
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> index 93d5f5538c0b..79c08cee3e9d 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> @@ -563,7 +563,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>>
>>         panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
>>
>> -       drm_show_memory_stats(p, file);
>> +       drm_show_memory_stats(p, file, 1);
>>  }
>>
>>  static const struct file_operations panfrost_drm_driver_fops = {
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 010239392adf..21a3b022dd63 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -466,9 +466,10 @@ enum drm_gem_object_status;
>>  void drm_print_memory_stats(struct drm_printer *p,
>>                             const struct drm_memory_stats *stats,
>>                             enum drm_gem_object_status supported_status,
>> -                           const char *region);
>> +                           const char *region,
>> +                           unsigned int unit);
>>
>> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file);
>> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit);
>>  void drm_show_fdinfo(struct seq_file *m, struct file *f);
>>
>>  struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
>> --
>> 2.42.0
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
@ 2023-08-30 15:51       ` Adrián Larumbe
  0 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-30 15:51 UTC (permalink / raw)
  To: Rob Clark
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten, robh,
	steven.price, dri-devel, linux-kernel, linux-arm-msm, freedreno,
	healych, kernel, Tvrtko Ursulin

>> The current implementation will try to pick the highest available
>> unit. This is rather unflexible, and allowing drivers to display BO size
>> statistics through fdinfo in units of their choice might be desirable.
>>
>> The new argument to drm_show_memory_stats is to be interpreted as the
>> integer multiplier of a 10-power of 2, so 1 would give us size in Kib and 2
>> in Mib. If we want drm-file functions to pick the highest unit, then 0
>> should be passed.
>>
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/drm_file.c              | 22 +++++++++++++---------
>>  drivers/gpu/drm/msm/msm_drv.c           |  2 +-
>>  drivers/gpu/drm/panfrost/panfrost_drv.c |  2 +-
>>  include/drm/drm_file.h                  |  5 +++--
>>  4 files changed, 18 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index 762965e3d503..517e1fb8072a 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -873,7 +873,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>>  EXPORT_SYMBOL(drm_send_event);
>>
>>  static void print_size(struct drm_printer *p, const char *stat,
>> -                      const char *region, u64 sz)
>> +                      const char *region, u64 sz, unsigned int unit)
>>  {
>>         const char *units[] = {"", " KiB", " MiB"};
>>         unsigned u;
>> @@ -881,6 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
>>         for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
>>                 if (sz < SZ_1K)
>>                         break;
>> +               if (unit > 0 && unit == u)
>> +                       break;
>>                 sz = div_u64(sz, SZ_1K);
>>         }
>>
>> @@ -898,17 +900,18 @@ static void print_size(struct drm_printer *p, const char *stat,
>>  void drm_print_memory_stats(struct drm_printer *p,
>>                             const struct drm_memory_stats *stats,
>>                             enum drm_gem_object_status supported_status,
>> -                           const char *region)
>> +                           const char *region,
>> +                           unsigned int unit)
>
>I'm not really adverse to changing what units we use.. or perhaps
>changing the threshold to go to higher units to be 10000x or 100000x
>of the previous unit.  But I'm less excited about having different
>drivers using different units.
>
>BR,
>-R

Would it be alright if I left it set to the default unit, and allowed changing it
at runtime with a debugfs file?

>>  {
>> -       print_size(p, "total", region, stats->private + stats->shared);
>> -       print_size(p, "shared", region, stats->shared);
>> -       print_size(p, "active", region, stats->active);
>> +       print_size(p, "total", region, stats->private + stats->shared, unit);
>> +       print_size(p, "shared", region, stats->shared, unit);
>> +       print_size(p, "active", region, stats->active, unit);
>>
>>         if (supported_status & DRM_GEM_OBJECT_RESIDENT)
>> -               print_size(p, "resident", region, stats->resident);
>> +               print_size(p, "resident", region, stats->resident, unit);
>>
>>         if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
>> -               print_size(p, "purgeable", region, stats->purgeable);
>> +               print_size(p, "purgeable", region, stats->purgeable, unit);
>>  }
>>  EXPORT_SYMBOL(drm_print_memory_stats);
>>
>> @@ -916,11 +919,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
>>   * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats
>>   * @p: the printer to print output to
>>   * @file: the DRM file
>> + * @unit: multipliyer of power of two exponent of desired unit
>>   *
>>   * Helper to iterate over GEM objects with a handle allocated in the specified
>>   * file.
>>   */
>> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
>> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit)
>>  {
>>         struct drm_gem_object *obj;
>>         struct drm_memory_stats status = {};
>> @@ -967,7 +971,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
>>         }
>>         spin_unlock(&file->table_lock);
>>
>> -       drm_print_memory_stats(p, &status, supported_status, "memory");
>> +       drm_print_memory_stats(p, &status, supported_status, "memory", unit);
>>  }
>>  EXPORT_SYMBOL(drm_show_memory_stats);
>>
>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
>> index 2a0e3529598b..cd1198151744 100644
>> --- a/drivers/gpu/drm/msm/msm_drv.c
>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>> @@ -1067,7 +1067,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>>
>>         msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
>>
>> -       drm_show_memory_stats(p, file);
>> +       drm_show_memory_stats(p, file, 0);
>>  }
>>
>>  static const struct file_operations fops = {
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> index 93d5f5538c0b..79c08cee3e9d 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> @@ -563,7 +563,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>>
>>         panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
>>
>> -       drm_show_memory_stats(p, file);
>> +       drm_show_memory_stats(p, file, 1);
>>  }
>>
>>  static const struct file_operations panfrost_drm_driver_fops = {
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 010239392adf..21a3b022dd63 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -466,9 +466,10 @@ enum drm_gem_object_status;
>>  void drm_print_memory_stats(struct drm_printer *p,
>>                             const struct drm_memory_stats *stats,
>>                             enum drm_gem_object_status supported_status,
>> -                           const char *region);
>> +                           const char *region,
>> +                           unsigned int unit);
>>
>> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file);
>> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit);
>>  void drm_show_fdinfo(struct seq_file *m, struct file *f);
>>
>>  struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
>> --
>> 2.42.0
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/6] drm/panfrost: Add cycle count GPU register definitions
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-31 15:54     ` Steven Price
  -1 siblings, 0 replies; 56+ messages in thread
From: Steven Price @ 2023-08-31 15:54 UTC (permalink / raw)
  To: Adrián Larumbe, maarten.lankhorst, mripard, tzimmermann,
	airlied, daniel, robdclark, quic_abhinavk, dmitry.baryshkov,
	sean, marijn.suijten, robh
  Cc: dri-devel, linux-kernel, linux-arm-msm, freedreno, healych, kernel

On 24/08/2023 02:34, Adrián Larumbe wrote:
> These GPU registers will be used when programming the cycle counter, which
> we need for providing accurate fdinfo drm-cycles values to user space.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panfrost/panfrost_regs.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index 919f44ac853d..55ec807550b3 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -46,6 +46,8 @@
>  #define   GPU_CMD_SOFT_RESET		0x01
>  #define   GPU_CMD_PERFCNT_CLEAR		0x03
>  #define   GPU_CMD_PERFCNT_SAMPLE	0x04
> +#define   GPU_CMD_CYCLE_COUNT_START	0x05
> +#define   GPU_CMD_CYCLE_COUNT_STOP	0x06
>  #define   GPU_CMD_CLEAN_CACHES		0x07
>  #define   GPU_CMD_CLEAN_INV_CACHES	0x08
>  #define GPU_STATUS			0x34
> @@ -73,6 +75,9 @@
>  #define GPU_PRFCNT_TILER_EN		0x74
>  #define GPU_PRFCNT_MMU_L2_EN		0x7c
>  
> +#define GPU_CYCLE_COUNT_LO		0x90
> +#define GPU_CYCLE_COUNT_HI		0x94
> +
>  #define GPU_THREAD_MAX_THREADS		0x0A0	/* (RO) Maximum number of threads per core */
>  #define GPU_THREAD_MAX_WORKGROUP_SIZE	0x0A4	/* (RO) Maximum workgroup size */
>  #define GPU_THREAD_MAX_BARRIER_SIZE	0x0A8	/* (RO) Maximum threads waiting at a barrier */


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-08-31 15:54     ` Steven Price
  -1 siblings, 0 replies; 56+ messages in thread
From: Steven Price @ 2023-08-31 15:54 UTC (permalink / raw)
  To: Adrián Larumbe, maarten.lankhorst, mripard, tzimmermann,
	airlied, daniel, robdclark, quic_abhinavk, dmitry.baryshkov,
	sean, marijn.suijten, robh
  Cc: dri-devel, linux-kernel, linux-arm-msm, freedreno, healych, kernel

On 24/08/2023 02:34, Adrián Larumbe wrote:
> The drm-stats fdinfo tags made available to user space are drm-engine,
> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
> 
> This deviates from standard practice in other DRM drivers, where a single
> set of key:value pairs is provided for the whole render engine. However,
> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
> decision was made to calculate bus cycles and workload times separately.
> 
> Maximum operating frequency is calculated at devfreq initialisation time.
> Current frequency is made available to user space because nvtop uses it
> when performing engine usage calculations.
> 
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>  6 files changed, 102 insertions(+), 1 deletion(-)
> 

[...]

> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index a2ab99698ca8..3fd372301019 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>  	job->requirements = args->requirements;
>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>  	job->mmu = file_priv->mmu;
> +	job->priv = file_priv;
>  
>  	slot = panfrost_job_get_slot(job);
>  
> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>  		goto err_free;
>  	}
>  
> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
> +#if 0
> +	/* Add compute engine in the future */
> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
> +#endif

I'm not sure what names are best, but slot 2 isn't actually a compute slot.

Slot 0 is fragment, that name is fine.

Slot 1 and 2 are actually the same (from a hardware perspective) but the
core affinity of the two slots cannot overlap which means you need to
divide the GPU in two to usefully use both slots. The only GPU that this
actually makes sense for is the T628[1] as it has two (non-coherent)
core groups.

The upshot is that slot 1 is used for all of vertex, tiling and compute.
Slot 2 is currently never used, but kbase will use it only for compute
(and only on the two core group GPUs).

Personally I'd be tempted to call them "slot 0", "slot 1" and "slot 2" -
but I appreciate that's not very helpful to people who aren't intimately
familiar with the hardware ;)

Steve

[1] And technically the T608 but that's even rarer and the T60x isn't
(yet) supported by Panfrost.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
  2023-08-31 15:54     ` Steven Price
@ 2023-08-31 21:34       ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-31 21:34 UTC (permalink / raw)
  To: Steven Price
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, dri-devel, linux-kernel, linux-arm-msm, freedreno, healych,
	kernel

On 31.08.2023 16:54, Steven Price wrote:
>On 24/08/2023 02:34, Adrián Larumbe wrote:
>> The drm-stats fdinfo tags made available to user space are drm-engine,
>> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
>> 
>> This deviates from standard practice in other DRM drivers, where a single
>> set of key:value pairs is provided for the whole render engine. However,
>> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
>> decision was made to calculate bus cycles and workload times separately.
>> 
>> Maximum operating frequency is calculated at devfreq initialisation time.
>> Current frequency is made available to user space because nvtop uses it
>> when performing engine usage calculations.
>> 
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>>  6 files changed, 102 insertions(+), 1 deletion(-)
>> 
>
>[...]
>
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> index a2ab99698ca8..3fd372301019 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>>  	job->requirements = args->requirements;
>>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>>  	job->mmu = file_priv->mmu;
>> +	job->priv = file_priv;
>>  
>>  	slot = panfrost_job_get_slot(job);
>>  
>> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>>  		goto err_free;
>>  	}
>>  
>> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
>> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
>> +#if 0
>> +	/* Add compute engine in the future */
>> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
>> +#endif
>
>I'm not sure what names are best, but slot 2 isn't actually a compute slot.
>
>Slot 0 is fragment, that name is fine.
>
>Slot 1 and 2 are actually the same (from a hardware perspective) but the
>core affinity of the two slots cannot overlap which means you need to
>divide the GPU in two to usefully use both slots. The only GPU that this
>actually makes sense for is the T628[1] as it has two (non-coherent)
>core groups.
>
>The upshot is that slot 1 is used for all of vertex, tiling and compute.
>Slot 2 is currently never used, but kbase will use it only for compute
>(and only on the two core group GPUs).

I think I may have rushed to draw inspiration for this from a comment in panfrost_job.c:

int panfrost_job_get_slot(struct panfrost_job *job)
{
	/* JS0: fragment jobs.
	 * JS1: vertex/tiler jobs
	 * JS2: compute jobs
	 */
         [...]
}

Maybe I could rename the engine names to "fragment", "vertex-tiler" and "compute-only"?
There's no reason why I would skimp on engine name length, and anything more
descriptive would be just as good.

>Personally I'd be tempted to call them "slot 0", "slot 1" and "slot 2" -
>but I appreciate that's not very helpful to people who aren't intimately
>familiar with the hardware ;)

The downside of this is that both IGT's fdinfo library and nvtop will use the
engine name for display, and like you said these numbers might mean nothing to
someone who isn't acquainted with the hardware.

>Steve
>
>[1] And technically the T608 but that's even rarer and the T60x isn't
>(yet) supported by Panfrost.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 3/6] drm/panfrost: Add fdinfo support for memory stats
  2023-08-30 10:31     ` Boris Brezillon
@ 2023-08-31 23:07       ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-31 23:07 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price, linux-arm-msm, linux-kernel, dri-devel,
	healych, kernel, freedreno

On 30.08.2023 12:31, Boris Brezillon wrote:
>On Thu, 24 Aug 2023 02:34:46 +0100
>Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
>
>> A new DRM GEM object function is added so that drm_show_memory_stats can
>> provider more accurate memory usage numbers.
>
>  s/provider/provide/
>
>> 
>> Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked
>> after locking the driver's shrinker mutex, but drm_show_memory_stats takes
>> over the drm file's object handle database spinlock, so there's potential
>> for a race condition here.
>
>Yeah, I don't think it matters much if we report a BO non-purgeable,
>and this BO becomes purgeable in the meantime. You'd have the same
>problem
>
>> 
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +++++++--
>>  drivers/gpu/drm/panfrost/panfrost_gem.c | 12 ++++++++++++
>>  drivers/gpu/drm/panfrost/panfrost_gem.h |  1 +
>>  3 files changed, 20 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> index 3fd372301019..93d5f5538c0b 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> @@ -440,11 +440,14 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
>>  	args->retained = drm_gem_shmem_madvise(&bo->base, args->madv);
>>  
>>  	if (args->retained) {
>> -		if (args->madv == PANFROST_MADV_DONTNEED)
>> +		if (args->madv == PANFROST_MADV_DONTNEED) {
>>  			list_move_tail(&bo->base.madv_list,
>>  				       &pfdev->shrinker_list);
>> -		else if (args->madv == PANFROST_MADV_WILLNEED)
>> +			bo->is_purgable = true;
>> +		} else if (args->madv == PANFROST_MADV_WILLNEED) {
>>  			list_del_init(&bo->base.madv_list);
>> +			bo->is_purgable = false;
>
>Should we really flag the BO as purgeable if it's already been evicted
>(args->retained == false)?

I checked what msm is doing, and I guess it shouldn't be marked as purgeable after eviction.
I didn't catch this at first because Freedreno isn't using drm_gem_shmem_madvise, but instead
tracks whether a BO was already purged through an additional MADV state.

>> +		}
>>  	}
>>  
>>  out_unlock_mappings:
>> @@ -559,6 +562,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>>  	struct panfrost_device *pfdev = dev->dev_private;
>>  
>>  	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
>> +
>> +	drm_show_memory_stats(p, file);
>>  }
>>  
>>  static const struct file_operations panfrost_drm_driver_fops = {
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
>> index 3c812fbd126f..aea16b0e4dda 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
>> @@ -195,6 +195,17 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
>>  	return drm_gem_shmem_pin(&bo->base);
>>  }
>>  
>> +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)
>> +{
>> +	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
>> +	enum drm_gem_object_status res = 0;
>> +
>> +	res |= (bo->is_purgable) ? DRM_GEM_OBJECT_PURGEABLE : 0;
>
>Why not checking bo->base.madv here instead of adding an is_purgeable
>field?

I thought it would make the meaning more clear, but I guess there's no point in
duplicating information.

>> +
>> +	res |= (bo->base.pages) ? DRM_GEM_OBJECT_RESIDENT : 0;
>
>Does it make sense to have DRM_GEM_OBJECT_PURGEABLE set when
>DRM_GEM_OBJECT_RESIDENT is not?

Freedreno's msm_gem_status seems not to care about this because drm_show_memory_stats is already
handling this situation:

	if (s & DRM_GEM_OBJECT_RESIDENT) {
		if (obj->funcs && obj->funcs->rss)
			status.resident += obj->funcs->rss(obj);
		else
			status.resident += obj->size;
	} else {
		/* If already purged or not yet backed by pages, don't
		 * count it as purgeable:
		 */
		s &= ~DRM_GEM_OBJECT_PURGEABLE;
	}

>> +
>> +	return res;
>> +}
>>  static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>>  	.free = panfrost_gem_free_object,
>>  	.open = panfrost_gem_open,
>> @@ -206,6 +217,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>>  	.vmap = drm_gem_shmem_object_vmap,
>>  	.vunmap = drm_gem_shmem_object_vunmap,
>>  	.mmap = drm_gem_shmem_object_mmap,
>> +	.status = panfrost_gem_status,
>>  	.vm_ops = &drm_gem_shmem_vm_ops,
>>  };
>>  
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
>> index ad2877eeeccd..e06f7ceb8f73 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
>> @@ -38,6 +38,7 @@ struct panfrost_gem_object {
>>  
>>  	bool noexec		:1;
>>  	bool is_heap		:1;
>> +	bool is_purgable	:1;
>>  };
>>  
>>  struct panfrost_gem_mapping {

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 3/6] drm/panfrost: Add fdinfo support for memory stats
@ 2023-08-31 23:07       ` Adrián Larumbe
  0 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-31 23:07 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: tzimmermann, sean, quic_abhinavk, mripard, steven.price,
	freedreno, healych, dri-devel, linux-arm-msm, dmitry.baryshkov,
	marijn.suijten, kernel, linux-kernel

On 30.08.2023 12:31, Boris Brezillon wrote:
>On Thu, 24 Aug 2023 02:34:46 +0100
>Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
>
>> A new DRM GEM object function is added so that drm_show_memory_stats can
>> provider more accurate memory usage numbers.
>
>  s/provider/provide/
>
>> 
>> Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked
>> after locking the driver's shrinker mutex, but drm_show_memory_stats takes
>> over the drm file's object handle database spinlock, so there's potential
>> for a race condition here.
>
>Yeah, I don't think it matters much if we report a BO non-purgeable,
>and this BO becomes purgeable in the meantime. You'd have the same
>problem
>
>> 
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +++++++--
>>  drivers/gpu/drm/panfrost/panfrost_gem.c | 12 ++++++++++++
>>  drivers/gpu/drm/panfrost/panfrost_gem.h |  1 +
>>  3 files changed, 20 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> index 3fd372301019..93d5f5538c0b 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> @@ -440,11 +440,14 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
>>  	args->retained = drm_gem_shmem_madvise(&bo->base, args->madv);
>>  
>>  	if (args->retained) {
>> -		if (args->madv == PANFROST_MADV_DONTNEED)
>> +		if (args->madv == PANFROST_MADV_DONTNEED) {
>>  			list_move_tail(&bo->base.madv_list,
>>  				       &pfdev->shrinker_list);
>> -		else if (args->madv == PANFROST_MADV_WILLNEED)
>> +			bo->is_purgable = true;
>> +		} else if (args->madv == PANFROST_MADV_WILLNEED) {
>>  			list_del_init(&bo->base.madv_list);
>> +			bo->is_purgable = false;
>
>Should we really flag the BO as purgeable if it's already been evicted
>(args->retained == false)?

I checked what msm is doing, and I agree the BO shouldn't be marked as purgeable
after eviction. I didn't catch this at first because Freedreno isn't using
drm_gem_shmem_madvise; instead it apparently tracks whether a BO was already
purged through an additional MADV state.

>> +		}
>>  	}
>>  
>>  out_unlock_mappings:
>> @@ -559,6 +562,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>>  	struct panfrost_device *pfdev = dev->dev_private;
>>  
>>  	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
>> +
>> +	drm_show_memory_stats(p, file);
>>  }
>>  
>>  static const struct file_operations panfrost_drm_driver_fops = {
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
>> index 3c812fbd126f..aea16b0e4dda 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
>> @@ -195,6 +195,17 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
>>  	return drm_gem_shmem_pin(&bo->base);
>>  }
>>  
>> +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)
>> +{
>> +	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
>> +	enum drm_gem_object_status res = 0;
>> +
>> +	res |= (bo->is_purgable) ? DRM_GEM_OBJECT_PURGEABLE : 0;
>
>Why not checking bo->base.madv here instead of adding an is_purgeable
>field?

I thought it would make the meaning more clear, but I guess there's no point in
duplicating information.

>> +
>> +	res |= (bo->base.pages) ? DRM_GEM_OBJECT_RESIDENT : 0;
>
>Does it make sense to have DRM_GEM_OBJECT_PURGEABLE set when
>DRM_GEM_OBJECT_RESIDENT is not?

Freedreno's msm_gem_status seems not to care about this because drm_show_memory_stats is already
handling this situation:

	if (s & DRM_GEM_OBJECT_RESIDENT) {
		if (obj->funcs && obj->funcs->rss)
			status.resident += obj->funcs->rss(obj);
		else
			status.resident += obj->size;
	} else {
		/* If already purged or not yet backed by pages, don't
		 * count it as purgeable:
		 */
		s &= ~DRM_GEM_OBJECT_PURGEABLE;
	}

>> +
>> +	return res;
>> +}
>>  static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>>  	.free = panfrost_gem_free_object,
>>  	.open = panfrost_gem_open,
>> @@ -206,6 +217,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>>  	.vmap = drm_gem_shmem_object_vmap,
>>  	.vunmap = drm_gem_shmem_object_vunmap,
>>  	.mmap = drm_gem_shmem_object_mmap,
>> +	.status = panfrost_gem_status,
>>  	.vm_ops = &drm_gem_shmem_vm_ops,
>>  };
>>  
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
>> index ad2877eeeccd..e06f7ceb8f73 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
>> @@ -38,6 +38,7 @@ struct panfrost_gem_object {
>>  
>>  	bool noexec		:1;
>>  	bool is_heap		:1;
>> +	bool is_purgable	:1;
>>  };
>>  
>>  struct panfrost_gem_mapping {

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
  2023-08-30 10:17     ` Boris Brezillon
@ 2023-08-31 23:23       ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-08-31 23:23 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price, linux-arm-msm, linux-kernel, dri-devel,
	healych, kernel, freedreno

On 30.08.2023 12:17, Boris Brezillon wrote:
>On Thu, 24 Aug 2023 02:34:45 +0100
>Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
>
>> The drm-stats fdinfo tags made available to user space are drm-engine,
>> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
>
>Pretty sure this has already been discussed, but it's probably worth
>mentioning that drm-cycles is not accurate, it just gives you a rough
>idea of how many GPU cycles were dedicated to a context (just like
>drm-engine elapsed-ns is giving you an approximation of the
>GPU utilization). This comes from 2 factors:
>
>1. We're dependent on the time the kernel/CPU takes to process the GPU
>interrupt.
>2. The pipelining done by the Job Manager (2 job slots per engine)
>implies that you can't really know how much time each job spent on the
>GPU. When these jobs are coming from the same context, that's not a
>problem, but when they don't, it's impossible to have a clear split.
>
>I'd really like to have that mentioned somewhere in the code+commit
>message to lower users' expectations.
>
>> 
>> This deviates from standard practice in other DRM drivers, where a single
>> set of key:value pairs is provided for the whole render engine. However,
>> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
>> decision was made to calculate bus cycles and workload times separately.
>> 
>> Maximum operating frequency is calculated at devfreq initialisation time.
>> Current frequency is made available to user space because nvtop uses it
>> when performing engine usage calculations.
>> 
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>>  6 files changed, 102 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> index 58dfb15a8757..28caffc689e2 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
>> @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
>>  	spin_lock_irqsave(&pfdevfreq->lock, irqflags);
>>  
>>  	panfrost_devfreq_update_utilization(pfdevfreq);
>> +	pfdevfreq->current_frequency = status->current_frequency;
>>  
>>  	status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time,
>>  						   pfdevfreq->idle_time));
>> @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
>>  	struct devfreq *devfreq;
>>  	struct thermal_cooling_device *cooling;
>>  	struct panfrost_devfreq *pfdevfreq = &pfdev->pfdevfreq;
>> +	unsigned long freq = ULONG_MAX;
>>  
>>  	if (pfdev->comp->num_supplies > 1) {
>>  		/*
>> @@ -172,6 +174,12 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
>>  		return ret;
>>  	}
>>  
>> +	/* Find the fastest defined rate  */
>> +	opp = dev_pm_opp_find_freq_floor(dev, &freq);
>> +	if (IS_ERR(opp))
>> +		return PTR_ERR(opp);
>> +	pfdevfreq->fast_rate = freq;
>> +
>>  	dev_pm_opp_put(opp);
>>  
>>  	/*
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.h b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
>> index 1514c1f9d91c..48dbe185f206 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
>> @@ -19,6 +19,9 @@ struct panfrost_devfreq {
>>  	struct devfreq_simple_ondemand_data gov_data;
>>  	bool opp_of_table_added;
>>  
>> +	unsigned long current_frequency;
>> +	unsigned long fast_rate;
>> +
>>  	ktime_t busy_time;
>>  	ktime_t idle_time;
>>  	ktime_t time_last_update;
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
>> index b0126b9fbadc..680f298fd1a9 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
>> @@ -24,6 +24,7 @@ struct panfrost_perfcnt;
>>  
>>  #define NUM_JOB_SLOTS 3
>>  #define MAX_PM_DOMAINS 5
>> +#define MAX_SLOT_NAME_LEN 10
>>  
>>  struct panfrost_features {
>>  	u16 id;
>> @@ -135,12 +136,24 @@ struct panfrost_mmu {
>>  	struct list_head list;
>>  };
>>  
>> +struct drm_info_gpu {
>> +	unsigned int maxfreq;
>> +
>> +	struct engine_info {
>> +		unsigned long long elapsed_ns;
>> +		unsigned long long cycles;
>> +		char name[MAX_SLOT_NAME_LEN];
>> +	} engines[NUM_JOB_SLOTS];
>> +};
>> +
>>  struct panfrost_file_priv {
>>  	struct panfrost_device *pfdev;
>>  
>>  	struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
>>  
>>  	struct panfrost_mmu *mmu;
>> +
>> +	struct drm_info_gpu fdinfo;
>>  };
>>  
>>  static inline struct panfrost_device *to_panfrost_device(struct drm_device *ddev)
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> index a2ab99698ca8..3fd372301019 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>>  	job->requirements = args->requirements;
>>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>>  	job->mmu = file_priv->mmu;
>> +	job->priv = file_priv;
>
>Uh, I'm not comfortable passing the file context here, unless you reset
>it to NULL in panfrost_job_close() and have code that's robust to
>job->priv being NULL. We've had cases in the past where jobs outlived
>the file context itself.
>
>>  
>>  	slot = panfrost_job_get_slot(job);
>>  
>> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>>  		goto err_free;
>>  	}
>>  
>> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
>> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
>> +#if 0
>> +	/* Add compute engine in the future */
>> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
>> +#endif
>> +	panfrost_priv->fdinfo.maxfreq = pfdev->pfdevfreq.fast_rate;
>> +
>>  	ret = panfrost_job_open(panfrost_priv);
>>  	if (ret)
>>  		goto err_job;
>> @@ -523,7 +532,40 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
>>  	PANFROST_IOCTL(MADVISE,		madvise,	DRM_RENDER_ALLOW),
>>  };
>>  
>> -DEFINE_DRM_GEM_FOPS(panfrost_drm_driver_fops);
>> +
>> +static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
>> +				     struct panfrost_file_priv *panfrost_priv,
>> +				     struct drm_printer *p)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
>> +		struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
>> +
>> +		drm_printf(p, "drm-engine-%s:\t%llu ns\n",
>> +			   ei->name, ei->elapsed_ns);
>> +		drm_printf(p, "drm-cycles-%s:\t%llu\n",
>> +			   ei->name, ei->cycles);
>> +		drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
>> +			   ei->name, panfrost_priv->fdinfo.maxfreq);
>> +		drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
>> +			   ei->name, pfdev->pfdevfreq.current_frequency);
>> +	}
>> +}
>> +
>> +static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
>> +{
>> +	struct drm_device *dev = file->minor->dev;
>> +	struct panfrost_device *pfdev = dev->dev_private;
>> +
>> +	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
>> +}
>> +
>> +static const struct file_operations panfrost_drm_driver_fops = {
>> +	.owner = THIS_MODULE,
>> +	DRM_GEM_FOPS,
>> +	.show_fdinfo = drm_show_fdinfo,
>> +};
>>  
>>  /*
>>   * Panfrost driver version:
>> @@ -535,6 +577,7 @@ static const struct drm_driver panfrost_drm_driver = {
>>  	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
>>  	.open			= panfrost_open,
>>  	.postclose		= panfrost_postclose,
>> +	.show_fdinfo		= panfrost_show_fdinfo,
>>  	.ioctls			= panfrost_drm_driver_ioctls,
>>  	.num_ioctls		= ARRAY_SIZE(panfrost_drm_driver_ioctls),
>>  	.fops			= &panfrost_drm_driver_fops,
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index dbc597ab46fb..a847e183b5d0 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -153,10 +153,31 @@ panfrost_get_job_chain_flag(const struct panfrost_job *job)
>>  	return (f->seqno & 1) ? JS_CONFIG_JOB_CHAIN_FLAG : 0;
>>  }
>>  
>> +static inline unsigned long long read_cycles(struct panfrost_device *pfdev)
>> +{
>> +	u64 address = (u64) gpu_read(pfdev, GPU_CYCLE_COUNT_HI) << 32;
>> +
>> +	address |= gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
>> +
>
>We probably want to handle the 32-bit overflow case with something like:
>
>	u32 hi, lo;
>
>	do {
>		hi = gpu_read(pfdev, GPU_CYCLE_COUNT_HI);
>		lo = gpu_read(pfdev, GPU_CYCLE_COUNT_LO);
>	} while (hi != gpu_read(pfdev, GPU_CYCLE_COUNT_HI));
>
>	return ((u64)hi << 32) | lo;
>
>> +	return address;
>> +}
>> +
>>  static struct panfrost_job *
>>  panfrost_dequeue_job(struct panfrost_device *pfdev, int slot)
>>  {
>>  	struct panfrost_job *job = pfdev->jobs[slot][0];
>> +	struct engine_info *engine_info = &job->priv->fdinfo.engines[slot];
>> +
>> +	engine_info->elapsed_ns +=
>> +		ktime_to_ns(ktime_sub(ktime_get(), job->start_time));
>> +	engine_info->cycles +=
>> +		read_cycles(pfdev) - job->start_cycles;
>> +
>> +	/* Reset in case the job has to be requeued */
>> +	job->start_time = 0;
>> +	/* A GPU reset puts the Cycle Counter register back to 0 */
>> +	job->start_cycles = atomic_read(&pfdev->reset.pending) ?
>> +		0 : read_cycles(pfdev);
>
>Do we need to reset these values? If the jobs are re-submitted, those
>fields will be re-assigned, and if the job is done, I don't see where
>we're using it after that point (might have missed something).

I did this because of the third loop in panfrost_job_handle_irq: my impression
was that when the job in the second slot is stopped after the one in the first
slot fails, it gets requeued and restarted immediately without involving the
drm scheduler, in which case panfrost_job_hw_submit wouldn't be called. At the
moment the initial sampling of cycles and time is done in that function.

>>  
>>  	WARN_ON(!job);
>>  	pfdev->jobs[slot][0] = pfdev->jobs[slot][1];
>> @@ -233,6 +254,9 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
>>  	subslot = panfrost_enqueue_job(pfdev, js, job);
>>  	/* Don't queue the job if a reset is in progress */
>>  	if (!atomic_read(&pfdev->reset.pending)) {
>> +		job->start_time = ktime_get();
>> +		job->start_cycles = read_cycles(pfdev);
>> +
>>  		job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
>>  		dev_dbg(pfdev->dev,
>>  			"JS: Submitting atom %p to js[%d][%d] with head=0x%llx AS %d",
>> @@ -297,6 +321,9 @@ int panfrost_job_push(struct panfrost_job *job)
>>  
>>  	kref_get(&job->refcount); /* put by scheduler job completion */
>>  
>> +	if (panfrost_job_is_idle(pfdev))
>> +		gpu_write(pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_START);
>> +
>>  	drm_sched_entity_push_job(&job->base);
>>  
>>  	mutex_unlock(&pfdev->sched_lock);
>> @@ -351,6 +378,9 @@ static void panfrost_job_free(struct drm_sched_job *sched_job)
>>  
>>  	drm_sched_job_cleanup(sched_job);
>>  
>> +	if (panfrost_job_is_idle(job->pfdev))
>> +		gpu_write(job->pfdev, GPU_CMD, GPU_CMD_CYCLE_COUNT_STOP);
>> +
>>  	panfrost_job_put(job);
>>  }
>>  
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
>> index 8becc1ba0eb9..038171c39dd8 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.h
>> @@ -32,6 +32,10 @@ struct panfrost_job {
>>  
>>  	/* Fence to be signaled by drm-sched once its done with the job */
>>  	struct dma_fence *render_done_fence;
>> +
>> +	struct panfrost_file_priv *priv;
>> +	ktime_t start_time;
>> +	u64 start_cycles;
>>  };
>>  
>>  int panfrost_job_init(struct panfrost_device *pfdev);

^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [PATCH v2 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
  2023-08-30 10:52     ` Boris Brezillon
@ 2023-09-01  0:03       ` Adrián Larumbe
  -1 siblings, 0 replies; 56+ messages in thread
From: Adrián Larumbe @ 2023-09-01  0:03 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price, linux-arm-msm, linux-kernel, dri-devel,
	healych, kernel, freedreno

On 30.08.2023 12:52, Boris Brezillon wrote:
>On Thu, 24 Aug 2023 02:34:48 +0100
>Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
>
>> BO's RSS is updated every time new pages are allocated and mapped for the
>> object, either in its entirety at creation time for non-heap buffers, or
>> else on demand for heap buffers at GPU page fault's IRQ handler.
>> 
>> Same calculations had to be done for imported PRIME objects, since backing
>> storage for it might have already been allocated by the exporting driver.
>> 
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_gem.c | 22 ++++++++++++++++++++++
>>  drivers/gpu/drm/panfrost/panfrost_gem.h |  5 +++++
>>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 16 +++++++++++-----
>>  3 files changed, 38 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
>> index aea16b0e4dda..c6bd1f16a6d4 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
>> @@ -206,6 +206,17 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj
>>  
>>  	return res;
>>  }
>> +
>> +size_t panfrost_gem_rss(struct drm_gem_object *obj)
>> +{
>> +	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
>> +
>> +	if (!bo->base.pages)
>> +		return 0;
>> +
>> +	return bo->rss_size;
>> +}
>> +
>>  static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>>  	.free = panfrost_gem_free_object,
>>  	.open = panfrost_gem_open,
>> @@ -218,6 +229,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
>>  	.vunmap = drm_gem_shmem_object_vunmap,
>>  	.mmap = drm_gem_shmem_object_mmap,
>>  	.status = panfrost_gem_status,
>> +	.rss = panfrost_gem_rss,
>>  	.vm_ops = &drm_gem_shmem_vm_ops,
>>  };
>>  
>> @@ -274,13 +286,23 @@ panfrost_gem_prime_import_sg_table(struct drm_device *dev,
>>  {
>>  	struct drm_gem_object *obj;
>>  	struct panfrost_gem_object *bo;
>> +	struct scatterlist *sgl;
>> +	unsigned int count;
>> +	size_t total = 0;
>>  
>>  	obj = drm_gem_shmem_prime_import_sg_table(dev, attach, sgt);
>>  	if (IS_ERR(obj))
>>  		return ERR_CAST(obj);
>>  
>> +	for_each_sgtable_dma_sg(sgt, sgl, count) {
>> +		size_t len = sg_dma_len(sgl);
>> +
>> +		total += len;
>> +	}
>
>Why not simply have bo->rss_size = obj->size here? Not sure I see a
>reason to not trust dma_buf?

Can PRIME-imported BOs ever be heap objects?

>> +
>>  	bo = to_panfrost_bo(obj);
>>  	bo->noexec = true;
>> +	bo->rss_size = total;
>>  
>>  	return obj;
>>  }
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
>> index e06f7ceb8f73..e2a7c46403c7 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
>> @@ -36,6 +36,11 @@ struct panfrost_gem_object {
>>  	 */
>>  	atomic_t gpu_usecount;
>>  
>> +	/*
>> +	 * Object chunk size currently mapped onto physical memory
>> +	 */
>> +	size_t rss_size;
>> +
>>  	bool noexec		:1;
>>  	bool is_heap		:1;
>>  	bool is_purgable	:1;
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> index c0123d09f699..e03a5a9da06f 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> @@ -285,17 +285,19 @@ static void panfrost_mmu_flush_range(struct panfrost_device *pfdev,
>>  	pm_runtime_put_autosuspend(pfdev->dev);
>>  }
>>  
>> -static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
>> +static size_t mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
>>  		      u64 iova, int prot, struct sg_table *sgt)
>>  {
>>  	unsigned int count;
>>  	struct scatterlist *sgl;
>>  	struct io_pgtable_ops *ops = mmu->pgtbl_ops;
>>  	u64 start_iova = iova;
>> +	size_t total = 0;
>>  
>>  	for_each_sgtable_dma_sg(sgt, sgl, count) {
>>  		unsigned long paddr = sg_dma_address(sgl);
>>  		size_t len = sg_dma_len(sgl);
>> +		total += len;
>>  
>>  		dev_dbg(pfdev->dev, "map: as=%d, iova=%llx, paddr=%lx, len=%zx", mmu->as, iova, paddr, len);
>>  
>> @@ -315,7 +317,7 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
>>  
>>  	panfrost_mmu_flush_range(pfdev, mmu, start_iova, iova - start_iova);
>>  
>> -	return 0;
>> +	return total;
>>  }
>>  
>>  int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
>> @@ -326,6 +328,7 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
>>  	struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
>>  	struct sg_table *sgt;
>>  	int prot = IOMMU_READ | IOMMU_WRITE;
>> +	size_t mapped_size;
>>  
>>  	if (WARN_ON(mapping->active))
>>  		return 0;
>> @@ -337,9 +340,10 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
>>  	if (WARN_ON(IS_ERR(sgt)))
>>  		return PTR_ERR(sgt);
>>  
>> -	mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << PAGE_SHIFT,
>> +	mapped_size = mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << PAGE_SHIFT,
>>  		   prot, sgt);
>>  	mapping->active = true;
>> +	bo->rss_size += mapped_size;
>
>Actually, the GEM might be resident even before panfrost_mmu_map() is
>called: as soon as drm_gem_shmem_get_pages[_locked]() is called, it's
>resident (might get evicted after that point though). That means any
>mmap coming from userspace will make the buffer resident too. I know
>we're automatically mapping GEMs to the GPU VM in panfrost_gem_open(),
>so it makes no difference, but I think I'd prefer if we keep ->rss_size
>for heap BOs only (we probably want to rename it heap_rss_size) and
>then have
>
>
>	if (bo->is_heap)
>		return bo->heap_rss_size;
>	else if (bo->base.pages)
>		return bo->base.base.size;
>	else
>		return 0;
>
>in panfrost_gem_rss().
>
>>  
>>  	return 0;
>>  }
>> @@ -447,6 +451,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
>>  	pgoff_t page_offset;
>>  	struct sg_table *sgt;
>>  	struct page **pages;
>> +	size_t mapped_size;
>>  
>>  	bomapping = addr_to_mapping(pfdev, as, addr);
>>  	if (!bomapping)
>> @@ -518,10 +523,11 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
>>  	if (ret)
>>  		goto err_map;
>>  
>> -	mmu_map_sg(pfdev, bomapping->mmu, addr,
>> -		   IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
>> +	mapped_size = mmu_map_sg(pfdev, bomapping->mmu, addr,
>> +				 IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt);
>>  
>>  	bomapping->active = true;
>> +	bo->rss_size += mapped_size;
>>  
>>  	dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
  2023-09-01  0:03       ` Adrián Larumbe
@ 2023-09-01  6:44         ` Boris Brezillon
  -1 siblings, 0 replies; 56+ messages in thread
From: Boris Brezillon @ 2023-09-01  6:44 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, steven.price, linux-arm-msm, linux-kernel, dri-devel,
	healych, kernel, freedreno

On Fri, 1 Sep 2023 01:03:23 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> >> @@ -274,13 +286,23 @@ panfrost_gem_prime_import_sg_table(struct drm_device *dev,
> >>  {
> >>  	struct drm_gem_object *obj;
> >>  	struct panfrost_gem_object *bo;
> >> +	struct scatterlist *sgl;
> >> +	unsigned int count;
> >> +	size_t total = 0;
> >>  
> >>  	obj = drm_gem_shmem_prime_import_sg_table(dev, attach, sgt);
> >>  	if (IS_ERR(obj))
> >>  		return ERR_CAST(obj);
> >>  
> >> +	for_each_sgtable_dma_sg(sgt, sgl, count) {
> >> +		size_t len = sg_dma_len(sgl);
> >> +
> >> +		total += len;
> >> +	}  
> >
> >Why not simply have bo->rss_size = obj->size here? Not sure I see a
> >reason to not trust dma_buf?  
> 
> Can PRIME-imported BOs ever be heap objects?

Nope, heap BOs can't be exported, and if they can, that's probably a
bug we need to fix.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-09-01 22:18     ` kernel test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kernel test robot @ 2023-09-01 22:18 UTC (permalink / raw)
  To: Adrián Larumbe, maarten.lankhorst, mripard, tzimmermann,
	airlied, daniel, robdclark, quic_abhinavk, dmitry.baryshkov,
	sean, marijn.suijten, robh, steven.price
  Cc: llvm, oe-kbuild-all, linux-arm-msm, adrian.larumbe, linux-kernel,
	dri-devel, healych, kernel, freedreno

Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5 next-20230831]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230824013604.466224-7-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
config: x86_64-randconfig-002-20230902 (https://download.01.org/0day-ci/archive/20230902/202309020634.fwC7KBk6-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230902/202309020634.fwC7KBk6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309020634.fwC7KBk6-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/drm_file.c:905: warning: Function parameter or member 'unit' not described in 'drm_print_memory_stats'


vim +905 drivers/gpu/drm/drm_file.c

686b21b5f6ca2f Rob Clark      2023-05-24  891  
686b21b5f6ca2f Rob Clark      2023-05-24  892  /**
686b21b5f6ca2f Rob Clark      2023-05-24  893   * drm_print_memory_stats - A helper to print memory stats
686b21b5f6ca2f Rob Clark      2023-05-24  894   * @p: The printer to print output to
686b21b5f6ca2f Rob Clark      2023-05-24  895   * @stats: The collected memory stats
686b21b5f6ca2f Rob Clark      2023-05-24  896   * @supported_status: Bitmask of optional stats which are available
686b21b5f6ca2f Rob Clark      2023-05-24  897   * @region: The memory region
686b21b5f6ca2f Rob Clark      2023-05-24  898   *
686b21b5f6ca2f Rob Clark      2023-05-24  899   */
686b21b5f6ca2f Rob Clark      2023-05-24  900  void drm_print_memory_stats(struct drm_printer *p,
686b21b5f6ca2f Rob Clark      2023-05-24  901  			    const struct drm_memory_stats *stats,
686b21b5f6ca2f Rob Clark      2023-05-24  902  			    enum drm_gem_object_status supported_status,
cccad8cb432637 Adrián Larumbe 2023-08-24  903  			    const char *region,
cccad8cb432637 Adrián Larumbe 2023-08-24  904  			    unsigned int unit)
686b21b5f6ca2f Rob Clark      2023-05-24 @905  {
cccad8cb432637 Adrián Larumbe 2023-08-24  906  	print_size(p, "total", region, stats->private + stats->shared, unit);
cccad8cb432637 Adrián Larumbe 2023-08-24  907  	print_size(p, "shared", region, stats->shared, unit);
cccad8cb432637 Adrián Larumbe 2023-08-24  908  	print_size(p, "active", region, stats->active, unit);
686b21b5f6ca2f Rob Clark      2023-05-24  909  
686b21b5f6ca2f Rob Clark      2023-05-24  910  	if (supported_status & DRM_GEM_OBJECT_RESIDENT)
cccad8cb432637 Adrián Larumbe 2023-08-24  911  		print_size(p, "resident", region, stats->resident, unit);
686b21b5f6ca2f Rob Clark      2023-05-24  912  
686b21b5f6ca2f Rob Clark      2023-05-24  913  	if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
cccad8cb432637 Adrián Larumbe 2023-08-24  914  		print_size(p, "purgeable", region, stats->purgeable, unit);
686b21b5f6ca2f Rob Clark      2023-05-24  915  }
686b21b5f6ca2f Rob Clark      2023-05-24  916  EXPORT_SYMBOL(drm_print_memory_stats);
686b21b5f6ca2f Rob Clark      2023-05-24  917  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
  2023-08-24  1:34   ` Adrián Larumbe
@ 2023-09-02  3:20     ` kernel test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kernel test robot @ 2023-09-02  3:20 UTC (permalink / raw)
  To: Adrián Larumbe, maarten.lankhorst, mripard, tzimmermann,
	airlied, daniel, robdclark, quic_abhinavk, dmitry.baryshkov,
	sean, marijn.suijten, robh, steven.price
  Cc: llvm, oe-kbuild-all, linux-arm-msm, adrian.larumbe, linux-kernel,
	dri-devel, healych, kernel, freedreno

Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5 next-20230831]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230824013604.466224-3-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
config: s390-randconfig-001-20230902 (https://download.01.org/0day-ci/archive/20230902/202309021155.i3NPUDJi-lkp@intel.com/config)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230902/202309021155.i3NPUDJi-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309021155.i3NPUDJi-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/panfrost/panfrost_drv.c:17:
   In file included from drivers/gpu/drm/panfrost/panfrost_device.h:9:
   In file included from include/linux/io-pgtable.h:6:
   In file included from include/linux/iommu.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from drivers/gpu/drm/panfrost/panfrost_drv.c:17:
   In file included from drivers/gpu/drm/panfrost/panfrost_device.h:9:
   In file included from include/linux/io-pgtable.h:6:
   In file included from include/linux/iommu.h:10:
* Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
@ 2023-09-02  3:20     ` kernel test robot
  0 siblings, 0 replies; 56+ messages in thread
From: kernel test robot @ 2023-09-02  3:20 UTC (permalink / raw)
  To: Adrián Larumbe, maarten.lankhorst, mripard, tzimmermann,
	airlied, daniel, robdclark, quic_abhinavk, dmitry.baryshkov,
	sean, marijn.suijten, robh, steven.price
  Cc: adrian.larumbe, linux-arm-msm, llvm, linux-kernel, dri-devel,
	healych, oe-kbuild-all, kernel, freedreno

Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5 next-20230831]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230824013604.466224-3-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
config: s390-randconfig-001-20230902 (https://download.01.org/0day-ci/archive/20230902/202309021155.i3NPUDJi-lkp@intel.com/config)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230902/202309021155.i3NPUDJi-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309021155.i3NPUDJi-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/panfrost/panfrost_drv.c:17:
   In file included from drivers/gpu/drm/panfrost/panfrost_device.h:9:
   In file included from include/linux/io-pgtable.h:6:
   In file included from include/linux/iommu.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from drivers/gpu/drm/panfrost/panfrost_drv.c:17:
   In file included from drivers/gpu/drm/panfrost/panfrost_device.h:9:
   In file included from include/linux/io-pgtable.h:6:
   In file included from include/linux/iommu.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from drivers/gpu/drm/panfrost/panfrost_drv.c:17:
   In file included from drivers/gpu/drm/panfrost/panfrost_device.h:9:
   In file included from include/linux/io-pgtable.h:6:
   In file included from include/linux/iommu.h:10:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> drivers/gpu/drm/panfrost/panfrost_drv.c:552:17: warning: format specifies type 'unsigned int' but the argument has type 'unsigned long' [-Wformat]
                              ei->name, pfdev->pfdevfreq.current_frequency);
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   13 warnings generated.


vim +552 drivers/gpu/drm/panfrost/panfrost_drv.c

   534	
   535	
   536	static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
   537					     struct panfrost_file_priv *panfrost_priv,
   538					     struct drm_printer *p)
   539	{
   540		int i;
   541	
   542		for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
   543			struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
   544	
   545			drm_printf(p, "drm-engine-%s:\t%llu ns\n",
   546				   ei->name, ei->elapsed_ns);
   547			drm_printf(p, "drm-cycles-%s:\t%llu\n",
   548				   ei->name, ei->cycles);
   549			drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
   550				   ei->name, panfrost_priv->fdinfo.maxfreq);
   551			drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
 > 552				   ei->name, pfdev->pfdevfreq.current_frequency);
   553		}
   554	}
   555	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
  2023-08-31 21:34       ` Adrián Larumbe
@ 2023-09-04  8:22         ` Steven Price
  -1 siblings, 0 replies; 56+ messages in thread
From: Steven Price @ 2023-09-04  8:22 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	robdclark, quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten,
	robh, dri-devel, linux-kernel, linux-arm-msm, freedreno, healych,
	kernel

On 31/08/2023 22:34, Adrián Larumbe wrote:
> On 31.08.2023 16:54, Steven Price wrote:
>> On 24/08/2023 02:34, Adrián Larumbe wrote:
>>> The drm-stats fdinfo tags made available to user space are drm-engine,
>>> drm-cycles, drm-max-freq and drm-curfreq, one per job slot.
>>>
>>> This deviates from standard practice in other DRM drivers, where a single
>>> set of key:value pairs is provided for the whole render engine. However,
>>> Panfrost has separate queues for fragment and vertex/tiler jobs, so a
>>> decision was made to calculate bus cycles and workload times separately.
>>>
>>> Maximum operating frequency is calculated at devfreq initialisation time.
>>> Current frequency is made available to user space because nvtop uses it
>>> when performing engine usage calculations.
>>>
>>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>>> ---
>>>  drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 ++++
>>>  drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
>>>  drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++++++
>>>  drivers/gpu/drm/panfrost/panfrost_drv.c     | 45 ++++++++++++++++++++-
>>>  drivers/gpu/drm/panfrost/panfrost_job.c     | 30 ++++++++++++++
>>>  drivers/gpu/drm/panfrost/panfrost_job.h     |  4 ++
>>>  6 files changed, 102 insertions(+), 1 deletion(-)
>>>
>>
>> [...]
>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> index a2ab99698ca8..3fd372301019 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> @@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
>>>  	job->requirements = args->requirements;
>>>  	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
>>>  	job->mmu = file_priv->mmu;
>>> +	job->priv = file_priv;
>>>  
>>>  	slot = panfrost_job_get_slot(job);
>>>  
>>> @@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file *file)
>>>  		goto err_free;
>>>  	}
>>>  
>>> +	snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, "frg");
>>> +	snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, "vtx");
>>> +#if 0
>>> +	/* Add compute engine in the future */
>>> +	snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, "cmp");
>>> +#endif
>>
>> I'm not sure what names are best, but slot 2 isn't actually a compute slot.
>>
>> Slot 0 is fragment, that name is fine.
>>
>> Slots 1 and 2 are actually the same (from a hardware perspective), but the
>> core affinity of the two slots cannot overlap which means you need to
>> divide the GPU in two to usefully use both slots. The only GPU that this
>> actually makes sense for is the T628[1] as it has two (non-coherent)
>> core groups.
>>
>> The upshot is that slot 1 is used for all of vertex, tiling and compute.
>> Slot 2 is currently never used, but kbase will use it only for compute
>> (and only on the two core group GPUs).
> 
> I think I might've been too quick to draw inspiration for this from a comment in panfrost_job.c:
> 
> int panfrost_job_get_slot(struct panfrost_job *job)
> {
> 	/* JS0: fragment jobs.
> 	 * JS1: vertex/tiler jobs
> 	 * JS2: compute jobs
> 	 */
>          [...]
> }
> 
> Maybe I could rename the engines to "fragment", "vertex-tiler" and "compute-only"?
> There's no reason why I would skimp on engine name length, and anything more
> descriptive would be just as good.

Yeah, those names are probably the best we're going to get. And I
certainly prefer the longer names.

>> Personally I'd be tempted to call them "slot 0", "slot 1" and "slot 2" -
>> but I appreciate that's not very helpful to people who aren't intimately
>> familiar with the hardware ;)
> 
> The downside of this is that both IGT's fdinfo library and nvtop will use the
> engine name for display, and like you said these numbers might mean nothing to
> someone who isn't acquainted with the hardware.

Indeed - I've spent way too much time with the hardware and there are
many subtleties, so I tend to avoid calling them anything other
than "slot x" (especially when talking to hardware engineers). For
example a test that submits NULL jobs can submit them to any slot.
However, when you get beyond artificial tests then it is quite
consistent that slot 0=fragment, slot 1=vertex-tiler (and compute), slot
2=never used (except for compute on dual core groups).

Steve

>> Steve
>>
>> [1] And technically the T608 but that's even rarer and the T60x isn't
>> (yet) supported by Panfrost.


^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
  2023-08-30 15:51       ` Adrián Larumbe
@ 2023-09-05 22:23         ` Rob Clark
  -1 siblings, 0 replies; 56+ messages in thread
From: Rob Clark @ 2023-09-05 22:23 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: maarten.lankhorst, mripard, tzimmermann, airlied, daniel,
	quic_abhinavk, dmitry.baryshkov, sean, marijn.suijten, robh,
	steven.price, dri-devel, linux-kernel, linux-arm-msm, freedreno,
	healych, kernel, Tvrtko Ursulin, Rob Clark

On Wed, Aug 30, 2023 at 8:51 AM Adrián Larumbe
<adrian.larumbe@collabora.com> wrote:
>
> >> The current implementation will try to pick the highest available
> >> unit. This is rather inflexible, and allowing drivers to display BO size
> >> statistics through fdinfo in units of their choice might be desirable.
> >>
> >> The new argument to drm_show_memory_stats is to be interpreted as the
> >> power-of-1024 exponent of the size unit, so 1 would give us sizes in KiB and 2
> >> in MiB. If we want drm-file functions to pick the highest unit, then 0
> >> should be passed.
> >>
> >> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> >> ---
> >>  drivers/gpu/drm/drm_file.c              | 22 +++++++++++++---------
> >>  drivers/gpu/drm/msm/msm_drv.c           |  2 +-
> >>  drivers/gpu/drm/panfrost/panfrost_drv.c |  2 +-
> >>  include/drm/drm_file.h                  |  5 +++--
> >>  4 files changed, 18 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >> index 762965e3d503..517e1fb8072a 100644
> >> --- a/drivers/gpu/drm/drm_file.c
> >> +++ b/drivers/gpu/drm/drm_file.c
> >> @@ -873,7 +873,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> >>  EXPORT_SYMBOL(drm_send_event);
> >>
> >>  static void print_size(struct drm_printer *p, const char *stat,
> >> -                      const char *region, u64 sz)
> >> +                      const char *region, u64 sz, unsigned int unit)
> >>  {
> >>         const char *units[] = {"", " KiB", " MiB"};
> >>         unsigned u;
> >> @@ -881,6 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
> >>         for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> >>                 if (sz < SZ_1K)
> >>                         break;
> >> +               if (unit > 0 && unit == u)
> >> +                       break;
> >>                 sz = div_u64(sz, SZ_1K);
> >>         }
> >>
> >> @@ -898,17 +900,18 @@ static void print_size(struct drm_printer *p, const char *stat,
> >>  void drm_print_memory_stats(struct drm_printer *p,
> >>                             const struct drm_memory_stats *stats,
> >>                             enum drm_gem_object_status supported_status,
> >> -                           const char *region)
> >> +                           const char *region,
> >> +                           unsigned int unit)
> >
> >I'm not really averse to changing what units we use.. or perhaps
> >changing the threshold to go to higher units to be 10000x or 100000x
> >of the previous unit.  But I'm less excited about having different
> >drivers using different units.
> >
> >BR,
> >-R
>
> Would it be alright if I left it set to the default unit, and allowed changing it
> at runtime with a debugfs file?

I suppose we could, but it does seem a bit like overkill.  OTOH I
think it would make sense to increase the threshold, i.e. switch to MiB
after 10MiB instead of 1MiB.. at that point the fractional component
is less significant..

BR,
-R

> >>  {
> >> -       print_size(p, "total", region, stats->private + stats->shared);
> >> -       print_size(p, "shared", region, stats->shared);
> >> -       print_size(p, "active", region, stats->active);
> >> +       print_size(p, "total", region, stats->private + stats->shared, unit);
> >> +       print_size(p, "shared", region, stats->shared, unit);
> >> +       print_size(p, "active", region, stats->active, unit);
> >>
> >>         if (supported_status & DRM_GEM_OBJECT_RESIDENT)
> >> -               print_size(p, "resident", region, stats->resident);
> >> +               print_size(p, "resident", region, stats->resident, unit);
> >>
> >>         if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
> >> -               print_size(p, "purgeable", region, stats->purgeable);
> >> +               print_size(p, "purgeable", region, stats->purgeable, unit);
> >>  }
> >>  EXPORT_SYMBOL(drm_print_memory_stats);
> >>
> >> @@ -916,11 +919,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
> >>   * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats
> >>   * @p: the printer to print output to
> >>   * @file: the DRM file
> >> + * @unit: multiplier of the power-of-two exponent of the desired unit
> >>   *
> >>   * Helper to iterate over GEM objects with a handle allocated in the specified
> >>   * file.
> >>   */
> >> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
> >> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit)
> >>  {
> >>         struct drm_gem_object *obj;
> >>         struct drm_memory_stats status = {};
> >> @@ -967,7 +971,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
> >>         }
> >>         spin_unlock(&file->table_lock);
> >>
> >> -       drm_print_memory_stats(p, &status, supported_status, "memory");
> >> +       drm_print_memory_stats(p, &status, supported_status, "memory", unit);
> >>  }
> >>  EXPORT_SYMBOL(drm_show_memory_stats);
> >>
> >> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> >> index 2a0e3529598b..cd1198151744 100644
> >> --- a/drivers/gpu/drm/msm/msm_drv.c
> >> +++ b/drivers/gpu/drm/msm/msm_drv.c
> >> @@ -1067,7 +1067,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> >>
> >>         msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
> >>
> >> -       drm_show_memory_stats(p, file);
> >> +       drm_show_memory_stats(p, file, 0);
> >>  }
> >>
> >>  static const struct file_operations fops = {
> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> >> index 93d5f5538c0b..79c08cee3e9d 100644
> >> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> >> @@ -563,7 +563,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> >>
> >>         panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
> >>
> >> -       drm_show_memory_stats(p, file);
> >> +       drm_show_memory_stats(p, file, 1);
> >>  }
> >>
> >>  static const struct file_operations panfrost_drm_driver_fops = {
> >> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> >> index 010239392adf..21a3b022dd63 100644
> >> --- a/include/drm/drm_file.h
> >> +++ b/include/drm/drm_file.h
> >> @@ -466,9 +466,10 @@ enum drm_gem_object_status;
> >>  void drm_print_memory_stats(struct drm_printer *p,
> >>                             const struct drm_memory_stats *stats,
> >>                             enum drm_gem_object_status supported_status,
> >> -                           const char *region);
> >> +                           const char *region,
> >> +                           unsigned int unit);
> >>
> >> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file);
> >> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit);
> >>  void drm_show_fdinfo(struct seq_file *m, struct file *f);
> >>
> >>  struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
> >> --
> >> 2.42.0
> >>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats
@ 2023-09-05 22:23         ` Rob Clark
  0 siblings, 0 replies; 56+ messages in thread
From: Rob Clark @ 2023-09-05 22:23 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Rob Clark, kernel, tzimmermann, Tvrtko Ursulin, sean,
	quic_abhinavk, mripard, steven.price, healych, dri-devel,
	linux-arm-msm, dmitry.baryshkov, marijn.suijten, freedreno,
	linux-kernel

On Wed, Aug 30, 2023 at 8:51 AM Adrián Larumbe
<adrian.larumbe@collabora.com> wrote:
>
> >> The current implementation will try to pick the highest available
> >> unit. This is rather unflexible, and allowing drivers to display BO size
> >> statistics through fdinfo in units of their choice might be desirable.
> >>
> >> The new argument to drm_show_memory_stats is to be interpreted as the
> >> integer multiplier of a 10-power of 2, so 1 would give us size in Kib and 2
> >> in Mib. If we want drm-file functions to pick the highest unit, then 0
> >> should be passed.
> >>
> >> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> >> ---
> >>  drivers/gpu/drm/drm_file.c              | 22 +++++++++++++---------
> >>  drivers/gpu/drm/msm/msm_drv.c           |  2 +-
> >>  drivers/gpu/drm/panfrost/panfrost_drv.c |  2 +-
> >>  include/drm/drm_file.h                  |  5 +++--
> >>  4 files changed, 18 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >> index 762965e3d503..517e1fb8072a 100644
> >> --- a/drivers/gpu/drm/drm_file.c
> >> +++ b/drivers/gpu/drm/drm_file.c
> >> @@ -873,7 +873,7 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> >>  EXPORT_SYMBOL(drm_send_event);
> >>
> >>  static void print_size(struct drm_printer *p, const char *stat,
> >> -                      const char *region, u64 sz)
> >> +                      const char *region, u64 sz, unsigned int unit)
> >>  {
> >>         const char *units[] = {"", " KiB", " MiB"};
> >>         unsigned u;
> >> @@ -881,6 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
> >>         for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> >>                 if (sz < SZ_1K)
> >>                         break;
> >> +               if (unit > 0 && unit == u)
> >> +                       break;
> >>                 sz = div_u64(sz, SZ_1K);
> >>         }
> >>
> >> @@ -898,17 +900,18 @@ static void print_size(struct drm_printer *p, const char *stat,
> >>  void drm_print_memory_stats(struct drm_printer *p,
> >>                             const struct drm_memory_stats *stats,
> >>                             enum drm_gem_object_status supported_status,
> >> -                           const char *region)
> >> +                           const char *region,
> >> +                           unsigned int unit)
> >
> >I'm not really adverse to changing what units we use.. or perhaps
> >changing the threshold to go to higher units to be 10000x or 100000x
> >of the previous unit.  But I'm less excited about having different
> >drivers using different units.
> >
> >BR,
> >-R
>
> Would it be alright if I left it set to the default unit, and allow changing it
> at runtime with a debugfs file?

I suppose we could, but it does seem a bit like overkill.  OTOH I
think it would make sense to increase the threshold, i.e. switch to
MiB only after 10 MiB instead of 1 MiB.. at that point the fractional
component is less significant..

BR,
-R

> >>  {
> >> -       print_size(p, "total", region, stats->private + stats->shared);
> >> -       print_size(p, "shared", region, stats->shared);
> >> -       print_size(p, "active", region, stats->active);
> >> +       print_size(p, "total", region, stats->private + stats->shared, unit);
> >> +       print_size(p, "shared", region, stats->shared, unit);
> >> +       print_size(p, "active", region, stats->active, unit);
> >>
> >>         if (supported_status & DRM_GEM_OBJECT_RESIDENT)
> >> -               print_size(p, "resident", region, stats->resident);
> >> +               print_size(p, "resident", region, stats->resident, unit);
> >>
> >>         if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
> >> -               print_size(p, "purgeable", region, stats->purgeable);
> >> +               print_size(p, "purgeable", region, stats->purgeable, unit);
> >>  }
> >>  EXPORT_SYMBOL(drm_print_memory_stats);
> >>
> >> @@ -916,11 +919,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
> >>   * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats
> >>   * @p: the printer to print output to
> >>   * @file: the DRM file
> >> + * @unit: index of the fixed size unit to use (0 selects automatic scaling)
> >>   *
> >>   * Helper to iterate over GEM objects with a handle allocated in the specified
> >>   * file.
> >>   */
> >> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
> >> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit)
> >>  {
> >>         struct drm_gem_object *obj;
> >>         struct drm_memory_stats status = {};
> >> @@ -967,7 +971,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
> >>         }
> >>         spin_unlock(&file->table_lock);
> >>
> >> -       drm_print_memory_stats(p, &status, supported_status, "memory");
> >> +       drm_print_memory_stats(p, &status, supported_status, "memory", unit);
> >>  }
> >>  EXPORT_SYMBOL(drm_show_memory_stats);
> >>
> >> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> >> index 2a0e3529598b..cd1198151744 100644
> >> --- a/drivers/gpu/drm/msm/msm_drv.c
> >> +++ b/drivers/gpu/drm/msm/msm_drv.c
> >> @@ -1067,7 +1067,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> >>
> >>         msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
> >>
> >> -       drm_show_memory_stats(p, file);
> >> +       drm_show_memory_stats(p, file, 0);
> >>  }
> >>
> >>  static const struct file_operations fops = {
> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> >> index 93d5f5538c0b..79c08cee3e9d 100644
> >> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> >> @@ -563,7 +563,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file)
> >>
> >>         panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
> >>
> >> -       drm_show_memory_stats(p, file);
> >> +       drm_show_memory_stats(p, file, 1);
> >>  }
> >>
> >>  static const struct file_operations panfrost_drm_driver_fops = {
> >> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> >> index 010239392adf..21a3b022dd63 100644
> >> --- a/include/drm/drm_file.h
> >> +++ b/include/drm/drm_file.h
> >> @@ -466,9 +466,10 @@ enum drm_gem_object_status;
> >>  void drm_print_memory_stats(struct drm_printer *p,
> >>                             const struct drm_memory_stats *stats,
> >>                             enum drm_gem_object_status supported_status,
> >> -                           const char *region);
> >> +                           const char *region,
> >> +                           unsigned int unit);
> >>
> >> -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file);
> >> +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, unsigned int unit);
> >>  void drm_show_fdinfo(struct seq_file *m, struct file *f);
> >>
> >>  struct file *mock_drm_getfile(struct drm_minor *minor, unsigned int flags);
> >> --
> >> 2.42.0
> >>
