* [PATCH v3 00/10] Engine Busyness
@ 2023-12-14 11:31 Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 01/10] drm/xe/pmu: Remove PMU from Xe till uapi is finalized Riana Tauro
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

The first two patches are included only so the series compiles; they are
part of https://patchwork.freedesktop.org/series/127664/

GuC provides engine busyness as a 64-bit counter measured in GT clock
ticks. These counters are maintained in a shared memory buffer and
updated internally on a continuous basis.

GuC also periodically provides the total number of ticks the GT has been
active for. This counter is exposed to the user so that busyness can be
calculated as a percentage:

busyness % = (engine busy ticks / total active ticks) * 100

The events can be listed with

sudo ./perf list
     xe_0000_03_00.0/total-active-ticks-gt0/            [Kernel PMU event]
     xe_0000_03_00.0/bcs0-busy-ticks-gt0/               [Kernel PMU event]
     xe_0000_03_00.0/ccs0-busy-ticks-gt0/               [Kernel PMU event]
     xe_0000_03_00.0/rcs0-busy-ticks-gt0/               [Kernel PMU event]
     xe_0000_03_00.0/vcs0-busy-ticks-gt0/               [Kernel PMU event]
     xe_0000_03_00.0/vecs0-busy-ticks-gt0/              [Kernel PMU event]

and read with

        sudo ./perf stat -e xe_0000_03_00.0/bcs0-busy-ticks-gt0/,xe_0000_03_00.0/total-active-ticks-gt0/  -I 1000
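For reference, a minimal userspace sketch of turning two samples of these
counters into the percentage above (the helper name and sample values are
illustrative only, not part of this series; with the perf stat invocation
above, the deltas are simply the per-interval counts perf prints for the
two events):

```c
/*
 * Illustrative only, not part of this series: derive a busyness
 * percentage from two samples of an engine's busy-ticks counter and
 * the total-active-ticks counter described above.
 */
#include <stdint.h>

static double busyness_pct(uint64_t busy_start, uint64_t busy_end,
			   uint64_t total_start, uint64_t total_end)
{
	uint64_t busy_delta = busy_end - busy_start;
	uint64_t total_delta = total_end - total_start;

	/* GT idle between samples: no active ticks, report 0%. */
	if (!total_delta)
		return 0.0;

	return 100.0 * (double)busy_delta / (double)total_delta;
}
```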

v2: rebase
    fix review comments

v3: rebase
    add dropping of group busyness changes
    add pmu base implementation

Aravind Iddamsetty (1):
  RFC drm/xe/pmu: Enable PMU interface and add engine busyness counter

Ashutosh Dixit (2):
  drm/xe/pmu: Remove PMU from Xe till uapi is finalized
  fixup! drm/xe/uapi: Reject bo creation of unaligned size

Riana Tauro (7):
  drm/xe: Move user engine class mappings to functions
  RFC drm/xe/guc: Add interface for engine busyness ticks
  RFC drm/xe/uapi: Add configs for Engine busyness
  RFC drm/xe/guc: Add PMU counter for total active ticks
  RFC drm/xe/guc: Expose engine busyness only for supported GuC version
  RFC drm/xe/guc: Dynamically enable/disable engine busyness stats
  RFC drm/xe/guc: Handle runtime suspend issues for engine busyness

 drivers/gpu/drm/xe/Makefile                 |   5 +-
 drivers/gpu/drm/xe/abi/guc_actions_abi.h    |   1 +
 drivers/gpu/drm/xe/regs/xe_gt_regs.h        |   5 -
 drivers/gpu/drm/xe/tests/xe_dma_buf.c       |   2 +
 drivers/gpu/drm/xe/xe_device.c              |   4 +-
 drivers/gpu/drm/xe/xe_device_types.h        |   6 +-
 drivers/gpu/drm/xe/xe_exec_queue.c          |  19 +-
 drivers/gpu/drm/xe/xe_gt.c                  |  41 ++-
 drivers/gpu/drm/xe/xe_gt.h                  |   3 +
 drivers/gpu/drm/xe/xe_guc.c                 |   7 +
 drivers/gpu/drm/xe/xe_guc_engine_busyness.c | 375 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_engine_busyness.h |  22 ++
 drivers/gpu/drm/xe/xe_guc_fwif.h            |  15 +
 drivers/gpu/drm/xe/xe_guc_types.h           |  25 ++
 drivers/gpu/drm/xe/xe_hw_engine.c           |  50 +++
 drivers/gpu/drm/xe/xe_hw_engine.h           |   3 +
 drivers/gpu/drm/xe/xe_pmu.c                 | 326 ++++++++---------
 drivers/gpu/drm/xe/xe_pmu.h                 |   2 +
 drivers/gpu/drm/xe/xe_pmu_types.h           |  19 -
 drivers/gpu/drm/xe/xe_query.c               |  23 +-
 include/uapi/drm/xe_drm.h                   |  50 ++-
 21 files changed, 756 insertions(+), 247 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_guc_engine_busyness.c
 create mode 100644 drivers/gpu/drm/xe/xe_guc_engine_busyness.h

-- 
2.40.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 01/10] drm/xe/pmu: Remove PMU from Xe till uapi is finalized
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-15  3:51   ` Aravind Iddamsetty
  2023-12-14 11:31 ` [PATCH v3 02/10] fixup! drm/xe/uapi: Reject bo creation of unaligned size Riana Tauro
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

From: Ashutosh Dixit <ashutosh.dixit@intel.com>

The PMU uapi is likely to change in the future. Remove PMU from Xe till
the uapi is finalized; it can be re-added afterwards.

v2: Include xe_drm.h in xe/tests/xe_dma_buf.c (Francois)

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/xe/Makefile          |   2 -
 drivers/gpu/drm/xe/regs/xe_gt_regs.h |   5 -
 drivers/gpu/drm/xe/xe_device.c       |   2 -
 drivers/gpu/drm/xe/xe_device_types.h |   4 -
 drivers/gpu/drm/xe/xe_gt.c           |   2 -
 drivers/gpu/drm/xe/xe_module.c       |   5 -
 drivers/gpu/drm/xe/xe_pmu.c          | 645 ---------------------------
 drivers/gpu/drm/xe/xe_pmu.h          |  25 --
 drivers/gpu/drm/xe/xe_pmu_types.h    |  68 ---
 include/uapi/drm/xe_drm.h            |  40 --
 10 files changed, 798 deletions(-)
 delete mode 100644 drivers/gpu/drm/xe/xe_pmu.c
 delete mode 100644 drivers/gpu/drm/xe/xe_pmu.h
 delete mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index f4ae063a7005..b0cb6a9a390e 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -267,8 +267,6 @@ xe-$(CONFIG_DRM_XE_DISPLAY) += \
 	i915-display/skl_universal_plane.o \
 	i915-display/skl_watermark.o
 
-xe-$(CONFIG_PERF_EVENTS) += xe_pmu.o
-
 ifeq ($(CONFIG_ACPI),y)
 	xe-$(CONFIG_DRM_XE_DISPLAY) += \
 		i915-display/intel_acpi.o \
diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index f5bf4c6d1761..3c3977c388f5 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -333,11 +333,6 @@
 #define   INVALIDATION_BROADCAST_MODE_DIS	REG_BIT(12)
 #define   GLOBAL_INVALIDATION_MODE		REG_BIT(2)
 
-#define XE_OAG_RC0_ANY_ENGINE_BUSY_FREE		XE_REG(0xdb80)
-#define XE_OAG_ANY_MEDIA_FF_BUSY_FREE		XE_REG(0xdba0)
-#define XE_OAG_BLT_BUSY_FREE			XE_REG(0xdbbc)
-#define XE_OAG_RENDER_BUSY_FREE			XE_REG(0xdbdc)
-
 #define HALF_SLICE_CHICKEN5			XE_REG_MCR(0xe188, XE_REG_OPTION_MASKED)
 #define   DISABLE_SAMPLE_G_PERFORMANCE		REG_BIT(0)
 
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 221e87584352..d9ae77fe7382 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -529,8 +529,6 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_debugfs_register(xe);
 
-	xe_pmu_register(&xe->pmu);
-
 	xe_hwmon_register(xe);
 
 	err = drmm_add_action_or_reset(&xe->drm, xe_device_sanitize, xe);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index d1a48456e9a3..c45ef17b3473 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -18,7 +18,6 @@
 #include "xe_lmtt_types.h"
 #include "xe_platform_types.h"
 #include "xe_pt_types.h"
-#include "xe_pmu.h"
 #include "xe_sriov_types.h"
 #include "xe_step_types.h"
 
@@ -427,9 +426,6 @@ struct xe_device {
 	 */
 	struct task_struct *pm_callback_task;
 
-	/** @pmu: performance monitoring unit */
-	struct xe_pmu pmu;
-
 	/** @hwmon: hwmon subsystem integration */
 	struct xe_hwmon *hwmon;
 
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index dfd9cf01a5d5..f5d18e98f8b6 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -709,8 +709,6 @@ int xe_gt_suspend(struct xe_gt *gt)
 	if (err)
 		goto err_msg;
 
-	xe_pmu_suspend(gt);
-
 	err = xe_uc_suspend(&gt->uc);
 	if (err)
 		goto err_force_wake;
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 51bf69b7ab22..110b69864656 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,7 +11,6 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
-#include "xe_pmu.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -63,10 +62,6 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_sched_job_module_init,
 		.exit = xe_sched_job_module_exit,
 	},
-	{
-		.init = xe_pmu_init,
-		.exit = xe_pmu_exit,
-	},
 	{
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
deleted file mode 100644
index 9d0b7887cfc4..000000000000
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ /dev/null
@@ -1,645 +0,0 @@
-// SPDX-License-Identifier: MIT
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#include <drm/drm_drv.h>
-#include <drm/drm_managed.h>
-#include <drm/xe_drm.h>
-
-#include "regs/xe_gt_regs.h"
-#include "xe_device.h"
-#include "xe_gt_clock.h"
-#include "xe_mmio.h"
-
-static cpumask_t xe_pmu_cpumask;
-static unsigned int xe_pmu_target_cpu = -1;
-
-static unsigned int config_gt_id(const u64 config)
-{
-	return config >> __DRM_XE_PMU_GT_SHIFT;
-}
-
-static u64 config_counter(const u64 config)
-{
-	return config & ~(~0ULL << __DRM_XE_PMU_GT_SHIFT);
-}
-
-static void xe_pmu_event_destroy(struct perf_event *event)
-{
-	struct xe_device *xe =
-		container_of(event->pmu, typeof(*xe), pmu.base);
-
-	drm_WARN_ON(&xe->drm, event->parent);
-
-	drm_dev_put(&xe->drm);
-}
-
-static u64 __engine_group_busyness_read(struct xe_gt *gt, int sample_type)
-{
-	u64 val;
-
-	switch (sample_type) {
-	case __XE_SAMPLE_RENDER_GROUP_BUSY:
-		val = xe_mmio_read32(gt, XE_OAG_RENDER_BUSY_FREE);
-		break;
-	case __XE_SAMPLE_COPY_GROUP_BUSY:
-		val = xe_mmio_read32(gt, XE_OAG_BLT_BUSY_FREE);
-		break;
-	case __XE_SAMPLE_MEDIA_GROUP_BUSY:
-		val = xe_mmio_read32(gt, XE_OAG_ANY_MEDIA_FF_BUSY_FREE);
-		break;
-	case __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY:
-		val = xe_mmio_read32(gt, XE_OAG_RC0_ANY_ENGINE_BUSY_FREE);
-		break;
-	default:
-		drm_warn(&gt->tile->xe->drm, "unknown pmu event\n");
-	}
-
-	return xe_gt_clock_cycles_to_ns(gt, val * 16);
-}
-
-static u64 engine_group_busyness_read(struct xe_gt *gt, u64 config)
-{
-	int sample_type = config_counter(config);
-	const unsigned int gt_id = gt->info.id;
-	struct xe_device *xe = gt->tile->xe;
-	struct xe_pmu *pmu = &xe->pmu;
-	unsigned long flags;
-	bool device_awake;
-	u64 val;
-
-	device_awake = xe_device_mem_access_get_if_ongoing(xe);
-	if (device_awake) {
-		XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT));
-		val = __engine_group_busyness_read(gt, sample_type);
-		XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
-		xe_device_mem_access_put(xe);
-	}
-
-	spin_lock_irqsave(&pmu->lock, flags);
-
-	if (device_awake)
-		pmu->sample[gt_id][sample_type] = val;
-	else
-		val = pmu->sample[gt_id][sample_type];
-
-	spin_unlock_irqrestore(&pmu->lock, flags);
-
-	return val;
-}
-
-static void engine_group_busyness_store(struct xe_gt *gt)
-{
-	struct xe_pmu *pmu = &gt->tile->xe->pmu;
-	unsigned int gt_id = gt->info.id;
-	unsigned long flags;
-	int i;
-
-	spin_lock_irqsave(&pmu->lock, flags);
-
-	for (i = __XE_SAMPLE_RENDER_GROUP_BUSY; i <= __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY; i++)
-		pmu->sample[gt_id][i] = __engine_group_busyness_read(gt, i);
-
-	spin_unlock_irqrestore(&pmu->lock, flags);
-}
-
-static int
-config_status(struct xe_device *xe, u64 config)
-{
-	unsigned int gt_id = config_gt_id(config);
-	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
-
-	if (gt_id >= XE_PMU_MAX_GT)
-		return -ENOENT;
-
-	switch (config_counter(config)) {
-	case DRM_XE_PMU_RENDER_GROUP_BUSY(0):
-	case DRM_XE_PMU_COPY_GROUP_BUSY(0):
-	case DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(0):
-		if (gt->info.type == XE_GT_TYPE_MEDIA)
-			return -ENOENT;
-		break;
-	case DRM_XE_PMU_MEDIA_GROUP_BUSY(0):
-		if (!(gt->info.engine_mask & (BIT(XE_HW_ENGINE_VCS0) | BIT(XE_HW_ENGINE_VECS0))))
-			return -ENOENT;
-		break;
-	default:
-		return -ENOENT;
-	}
-
-	return 0;
-}
-
-static int xe_pmu_event_init(struct perf_event *event)
-{
-	struct xe_device *xe =
-		container_of(event->pmu, typeof(*xe), pmu.base);
-	struct xe_pmu *pmu = &xe->pmu;
-	int ret;
-
-	if (pmu->closed)
-		return -ENODEV;
-
-	if (event->attr.type != event->pmu->type)
-		return -ENOENT;
-
-	/* unsupported modes and filters */
-	if (event->attr.sample_period) /* no sampling */
-		return -EINVAL;
-
-	if (has_branch_stack(event))
-		return -EOPNOTSUPP;
-
-	if (event->cpu < 0)
-		return -EINVAL;
-
-	/* only allow running on one cpu at a time */
-	if (!cpumask_test_cpu(event->cpu, &xe_pmu_cpumask))
-		return -EINVAL;
-
-	ret = config_status(xe, event->attr.config);
-	if (ret)
-		return ret;
-
-	if (!event->parent) {
-		drm_dev_get(&xe->drm);
-		event->destroy = xe_pmu_event_destroy;
-	}
-
-	return 0;
-}
-
-static u64 __xe_pmu_event_read(struct perf_event *event)
-{
-	struct xe_device *xe =
-		container_of(event->pmu, typeof(*xe), pmu.base);
-	const unsigned int gt_id = config_gt_id(event->attr.config);
-	const u64 config = event->attr.config;
-	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
-	u64 val;
-
-	switch (config_counter(config)) {
-	case DRM_XE_PMU_RENDER_GROUP_BUSY(0):
-	case DRM_XE_PMU_COPY_GROUP_BUSY(0):
-	case DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(0):
-	case DRM_XE_PMU_MEDIA_GROUP_BUSY(0):
-		val = engine_group_busyness_read(gt, config);
-		break;
-	default:
-		drm_warn(&gt->tile->xe->drm, "unknown pmu event\n");
-	}
-
-	return val;
-}
-
-static void xe_pmu_event_read(struct perf_event *event)
-{
-	struct xe_device *xe =
-		container_of(event->pmu, typeof(*xe), pmu.base);
-	struct hw_perf_event *hwc = &event->hw;
-	struct xe_pmu *pmu = &xe->pmu;
-	u64 prev, new;
-
-	if (pmu->closed) {
-		event->hw.state = PERF_HES_STOPPED;
-		return;
-	}
-again:
-	prev = local64_read(&hwc->prev_count);
-	new = __xe_pmu_event_read(event);
-
-	if (local64_cmpxchg(&hwc->prev_count, prev, new) != prev)
-		goto again;
-
-	local64_add(new - prev, &event->count);
-}
-
-static void xe_pmu_enable(struct perf_event *event)
-{
-	/*
-	 * Store the current counter value so we can report the correct delta
-	 * for all listeners. Even when the event was already enabled and has
-	 * an existing non-zero value.
-	 */
-	local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
-}
-
-static void xe_pmu_event_start(struct perf_event *event, int flags)
-{
-	struct xe_device *xe =
-		container_of(event->pmu, typeof(*xe), pmu.base);
-	struct xe_pmu *pmu = &xe->pmu;
-
-	if (pmu->closed)
-		return;
-
-	xe_pmu_enable(event);
-	event->hw.state = 0;
-}
-
-static void xe_pmu_event_stop(struct perf_event *event, int flags)
-{
-	if (flags & PERF_EF_UPDATE)
-		xe_pmu_event_read(event);
-
-	event->hw.state = PERF_HES_STOPPED;
-}
-
-static int xe_pmu_event_add(struct perf_event *event, int flags)
-{
-	struct xe_device *xe =
-		container_of(event->pmu, typeof(*xe), pmu.base);
-	struct xe_pmu *pmu = &xe->pmu;
-
-	if (pmu->closed)
-		return -ENODEV;
-
-	if (flags & PERF_EF_START)
-		xe_pmu_event_start(event, flags);
-
-	return 0;
-}
-
-static void xe_pmu_event_del(struct perf_event *event, int flags)
-{
-	xe_pmu_event_stop(event, PERF_EF_UPDATE);
-}
-
-static int xe_pmu_event_event_idx(struct perf_event *event)
-{
-	return 0;
-}
-
-struct xe_ext_attribute {
-	struct device_attribute attr;
-	unsigned long val;
-};
-
-static ssize_t xe_pmu_event_show(struct device *dev,
-				 struct device_attribute *attr, char *buf)
-{
-	struct xe_ext_attribute *eattr;
-
-	eattr = container_of(attr, struct xe_ext_attribute, attr);
-	return sprintf(buf, "config=0x%lx\n", eattr->val);
-}
-
-static ssize_t cpumask_show(struct device *dev,
-			    struct device_attribute *attr, char *buf)
-{
-	return cpumap_print_to_pagebuf(true, buf, &xe_pmu_cpumask);
-}
-
-static DEVICE_ATTR_RO(cpumask);
-
-static struct attribute *xe_cpumask_attrs[] = {
-	&dev_attr_cpumask.attr,
-	NULL,
-};
-
-static const struct attribute_group xe_pmu_cpumask_attr_group = {
-	.attrs = xe_cpumask_attrs,
-};
-
-#define __event(__counter, __name, __unit) \
-{ \
-	.counter = (__counter), \
-	.name = (__name), \
-	.unit = (__unit), \
-	.global = false, \
-}
-
-#define __global_event(__counter, __name, __unit) \
-{ \
-	.counter = (__counter), \
-	.name = (__name), \
-	.unit = (__unit), \
-	.global = true, \
-}
-
-static struct xe_ext_attribute *
-add_xe_attr(struct xe_ext_attribute *attr, const char *name, u64 config)
-{
-	sysfs_attr_init(&attr->attr.attr);
-	attr->attr.attr.name = name;
-	attr->attr.attr.mode = 0444;
-	attr->attr.show = xe_pmu_event_show;
-	attr->val = config;
-
-	return ++attr;
-}
-
-static struct perf_pmu_events_attr *
-add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
-	     const char *str)
-{
-	sysfs_attr_init(&attr->attr.attr);
-	attr->attr.attr.name = name;
-	attr->attr.attr.mode = 0444;
-	attr->attr.show = perf_event_sysfs_show;
-	attr->event_str = str;
-
-	return ++attr;
-}
-
-static struct attribute **
-create_event_attributes(struct xe_pmu *pmu)
-{
-	struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
-	static const struct {
-		unsigned int counter;
-		const char *name;
-		const char *unit;
-		bool global;
-	} events[] = {
-		__event(0, "render-group-busy", "ns"),
-		__event(1, "copy-group-busy", "ns"),
-		__event(2, "media-group-busy", "ns"),
-		__event(3, "any-engine-group-busy", "ns"),
-	};
-
-	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
-	struct xe_ext_attribute *xe_attr = NULL, *xe_iter;
-	struct attribute **attr = NULL, **attr_iter;
-	unsigned int count = 0;
-	unsigned int i, j;
-	struct xe_gt *gt;
-
-	/* Count how many counters we will be exposing. */
-	for_each_gt(gt, xe, j) {
-		for (i = 0; i < ARRAY_SIZE(events); i++) {
-			u64 config = ___DRM_XE_PMU_OTHER(j, events[i].counter);
-
-			if (!config_status(xe, config))
-				count++;
-		}
-	}
-
-	/* Allocate attribute objects and table. */
-	xe_attr = kcalloc(count, sizeof(*xe_attr), GFP_KERNEL);
-	if (!xe_attr)
-		goto err_alloc;
-
-	pmu_attr = kcalloc(count, sizeof(*pmu_attr), GFP_KERNEL);
-	if (!pmu_attr)
-		goto err_alloc;
-
-	/* Max one pointer of each attribute type plus a termination entry. */
-	attr = kcalloc(count * 2 + 1, sizeof(*attr), GFP_KERNEL);
-	if (!attr)
-		goto err_alloc;
-
-	xe_iter = xe_attr;
-	pmu_iter = pmu_attr;
-	attr_iter = attr;
-
-	for_each_gt(gt, xe, j) {
-		for (i = 0; i < ARRAY_SIZE(events); i++) {
-			u64 config = ___DRM_XE_PMU_OTHER(j, events[i].counter);
-			char *str;
-
-			if (config_status(xe, config))
-				continue;
-
-			if (events[i].global)
-				str = kstrdup(events[i].name, GFP_KERNEL);
-			else
-				str = kasprintf(GFP_KERNEL, "%s-gt%u",
-						events[i].name, j);
-			if (!str)
-				goto err;
-
-			*attr_iter++ = &xe_iter->attr.attr;
-			xe_iter = add_xe_attr(xe_iter, str, config);
-
-			if (events[i].unit) {
-				if (events[i].global)
-					str = kasprintf(GFP_KERNEL, "%s.unit",
-							events[i].name);
-				else
-					str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
-							events[i].name, j);
-				if (!str)
-					goto err;
-
-				*attr_iter++ = &pmu_iter->attr.attr;
-				pmu_iter = add_pmu_attr(pmu_iter, str,
-							events[i].unit);
-			}
-		}
-	}
-
-	pmu->xe_attr = xe_attr;
-	pmu->pmu_attr = pmu_attr;
-
-	return attr;
-
-err:
-	for (attr_iter = attr; *attr_iter; attr_iter++)
-		kfree((*attr_iter)->name);
-
-err_alloc:
-	kfree(attr);
-	kfree(xe_attr);
-	kfree(pmu_attr);
-
-	return NULL;
-}
-
-static void free_event_attributes(struct xe_pmu *pmu)
-{
-	struct attribute **attr_iter = pmu->events_attr_group.attrs;
-
-	for (; *attr_iter; attr_iter++)
-		kfree((*attr_iter)->name);
-
-	kfree(pmu->events_attr_group.attrs);
-	kfree(pmu->xe_attr);
-	kfree(pmu->pmu_attr);
-
-	pmu->events_attr_group.attrs = NULL;
-	pmu->xe_attr = NULL;
-	pmu->pmu_attr = NULL;
-}
-
-static int xe_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
-{
-	struct xe_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-
-	/* Select the first online CPU as a designated reader. */
-	if (cpumask_empty(&xe_pmu_cpumask))
-		cpumask_set_cpu(cpu, &xe_pmu_cpumask);
-
-	return 0;
-}
-
-static int xe_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
-{
-	struct xe_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-	unsigned int target = xe_pmu_target_cpu;
-
-	/*
-	 * Unregistering an instance generates a CPU offline event which we must
-	 * ignore to avoid incorrectly modifying the shared xe_pmu_cpumask.
-	 */
-	if (pmu->closed)
-		return 0;
-
-	if (cpumask_test_and_clear_cpu(cpu, &xe_pmu_cpumask)) {
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
-
-		/* Migrate events if there is a valid target */
-		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &xe_pmu_cpumask);
-			xe_pmu_target_cpu = target;
-		}
-	}
-
-	if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
-		perf_pmu_migrate_context(&pmu->base, cpu, target);
-		pmu->cpuhp.cpu = target;
-	}
-
-	return 0;
-}
-
-static enum cpuhp_state cpuhp_slot = CPUHP_INVALID;
-
-int xe_pmu_init(void)
-{
-	int ret;
-
-	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
-				      "perf/x86/intel/xe:online",
-				      xe_pmu_cpu_online,
-				      xe_pmu_cpu_offline);
-	if (ret < 0)
-		pr_notice("Failed to setup cpuhp state for xe PMU! (%d)\n",
-			  ret);
-	else
-		cpuhp_slot = ret;
-
-	return 0;
-}
-
-void xe_pmu_exit(void)
-{
-	if (cpuhp_slot != CPUHP_INVALID)
-		cpuhp_remove_multi_state(cpuhp_slot);
-}
-
-static int xe_pmu_register_cpuhp_state(struct xe_pmu *pmu)
-{
-	if (cpuhp_slot == CPUHP_INVALID)
-		return -EINVAL;
-
-	return cpuhp_state_add_instance(cpuhp_slot, &pmu->cpuhp.node);
-}
-
-static void xe_pmu_unregister_cpuhp_state(struct xe_pmu *pmu)
-{
-	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
-}
-
-void xe_pmu_suspend(struct xe_gt *gt)
-{
-	engine_group_busyness_store(gt);
-}
-
-static void xe_pmu_unregister(struct drm_device *device, void *arg)
-{
-	struct xe_pmu *pmu = arg;
-
-	if (!pmu->base.event_init)
-		return;
-
-	/*
-	 * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
-	 * ensures all currently executing ones will have exited before we
-	 * proceed with unregistration.
-	 */
-	pmu->closed = true;
-	synchronize_rcu();
-
-	xe_pmu_unregister_cpuhp_state(pmu);
-
-	perf_pmu_unregister(&pmu->base);
-	pmu->base.event_init = NULL;
-	kfree(pmu->base.attr_groups);
-	kfree(pmu->name);
-	free_event_attributes(pmu);
-}
-
-void xe_pmu_register(struct xe_pmu *pmu)
-{
-	struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
-	const struct attribute_group *attr_groups[] = {
-		&pmu->events_attr_group,
-		&xe_pmu_cpumask_attr_group,
-		NULL
-	};
-
-	int ret = -ENOMEM;
-
-	spin_lock_init(&pmu->lock);
-	pmu->cpuhp.cpu = -1;
-
-	pmu->name = kasprintf(GFP_KERNEL,
-			      "xe_%s",
-			      dev_name(xe->drm.dev));
-	if (pmu->name)
-		/* tools/perf reserves colons as special. */
-		strreplace((char *)pmu->name, ':', '_');
-
-	if (!pmu->name)
-		goto err;
-
-	pmu->events_attr_group.name = "events";
-	pmu->events_attr_group.attrs = create_event_attributes(pmu);
-	if (!pmu->events_attr_group.attrs)
-		goto err_name;
-
-	pmu->base.attr_groups = kmemdup(attr_groups, sizeof(attr_groups),
-					GFP_KERNEL);
-	if (!pmu->base.attr_groups)
-		goto err_attr;
-
-	pmu->base.module	= THIS_MODULE;
-	pmu->base.task_ctx_nr	= perf_invalid_context;
-	pmu->base.event_init	= xe_pmu_event_init;
-	pmu->base.add		= xe_pmu_event_add;
-	pmu->base.del		= xe_pmu_event_del;
-	pmu->base.start		= xe_pmu_event_start;
-	pmu->base.stop		= xe_pmu_event_stop;
-	pmu->base.read		= xe_pmu_event_read;
-	pmu->base.event_idx	= xe_pmu_event_event_idx;
-
-	ret = perf_pmu_register(&pmu->base, pmu->name, -1);
-	if (ret)
-		goto err_groups;
-
-	ret = xe_pmu_register_cpuhp_state(pmu);
-	if (ret)
-		goto err_unreg;
-
-	ret = drmm_add_action_or_reset(&xe->drm, xe_pmu_unregister, pmu);
-	if (ret)
-		goto err_cpuhp;
-
-	return;
-
-err_cpuhp:
-	xe_pmu_unregister_cpuhp_state(pmu);
-err_unreg:
-	perf_pmu_unregister(&pmu->base);
-err_groups:
-	kfree(pmu->base.attr_groups);
-err_attr:
-	pmu->base.event_init = NULL;
-	free_event_attributes(pmu);
-err_name:
-	kfree(pmu->name);
-err:
-	drm_notice(&xe->drm, "Failed to register PMU!\n");
-}
diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
deleted file mode 100644
index a99d4ddd023e..000000000000
--- a/drivers/gpu/drm/xe/xe_pmu.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#ifndef _XE_PMU_H_
-#define _XE_PMU_H_
-
-#include "xe_gt_types.h"
-#include "xe_pmu_types.h"
-
-#if IS_ENABLED(CONFIG_PERF_EVENTS)
-int xe_pmu_init(void);
-void xe_pmu_exit(void);
-void xe_pmu_register(struct xe_pmu *pmu);
-void xe_pmu_suspend(struct xe_gt *gt);
-#else
-static inline int xe_pmu_init(void) { return 0; }
-static inline void xe_pmu_exit(void) {}
-static inline void xe_pmu_register(struct xe_pmu *pmu) {}
-static inline void xe_pmu_suspend(struct xe_gt *gt) {}
-#endif
-
-#endif
-
diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
deleted file mode 100644
index 9cadbd243f57..000000000000
--- a/drivers/gpu/drm/xe/xe_pmu_types.h
+++ /dev/null
@@ -1,68 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#ifndef _XE_PMU_TYPES_H_
-#define _XE_PMU_TYPES_H_
-
-#include <linux/perf_event.h>
-#include <linux/spinlock_types.h>
-#include <uapi/drm/xe_drm.h>
-
-enum {
-	__XE_SAMPLE_RENDER_GROUP_BUSY,
-	__XE_SAMPLE_COPY_GROUP_BUSY,
-	__XE_SAMPLE_MEDIA_GROUP_BUSY,
-	__XE_SAMPLE_ANY_ENGINE_GROUP_BUSY,
-	__XE_NUM_PMU_SAMPLERS
-};
-
-#define XE_PMU_MAX_GT 2
-
-struct xe_pmu {
-	/**
-	 * @cpuhp: Struct used for CPU hotplug handling.
-	 */
-	struct {
-		struct hlist_node node;
-		unsigned int cpu;
-	} cpuhp;
-	/**
-	 * @base: PMU base.
-	 */
-	struct pmu base;
-	/**
-	 * @closed: xe is unregistering.
-	 */
-	bool closed;
-	/**
-	 * @name: Name as registered with perf core.
-	 */
-	const char *name;
-	/**
-	 * @lock: Lock protecting enable mask and ref count handling.
-	 */
-	spinlock_t lock;
-	/**
-	 * @sample: Current and previous (raw) counters.
-	 *
-	 * These counters are updated when the device is awake.
-	 *
-	 */
-	u64 sample[XE_PMU_MAX_GT][__XE_NUM_PMU_SAMPLERS];
-	/**
-	 * @events_attr_group: Device events attribute group.
-	 */
-	struct attribute_group events_attr_group;
-	/**
-	 * @xe_attr: Memory block holding device attributes.
-	 */
-	void *xe_attr;
-	/**
-	 * @pmu_attr: Memory block holding device attributes.
-	 */
-	void *pmu_attr;
-};
-
-#endif
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 0895e4d2a981..5ba412007270 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1080,46 +1080,6 @@ struct drm_xe_wait_user_fence {
 	/** @reserved: Reserved */
 	__u64 reserved[2];
 };
-
-/**
- * DOC: XE PMU event config IDs
- *
- * Check 'man perf_event_open' to use the ID's DRM_XE_PMU_XXXX listed in xe_drm.h
- * in 'struct perf_event_attr' as part of perf_event_open syscall to read a
- * particular event.
- *
- * For example to open the DRMXE_PMU_RENDER_GROUP_BUSY(0):
- *
- * .. code-block:: C
- *
- *	struct perf_event_attr attr;
- *	long long count;
- *	int cpu = 0;
- *	int fd;
- *
- *	memset(&attr, 0, sizeof(struct perf_event_attr));
- *	attr.type = type; // eg: /sys/bus/event_source/devices/xe_0000_56_00.0/type
- *	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
- *	attr.use_clockid = 1;
- *	attr.clockid = CLOCK_MONOTONIC;
- *	attr.config = DRM_XE_PMU_RENDER_GROUP_BUSY(0);
- *
- *	fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
- */
-
-/*
- * Top bits of every counter are GT id.
- */
-#define __DRM_XE_PMU_GT_SHIFT (56)
-
-#define ___DRM_XE_PMU_OTHER(gt, x) \
-	(((__u64)(x)) | ((__u64)(gt) << __DRM_XE_PMU_GT_SHIFT))
-
-#define DRM_XE_PMU_RENDER_GROUP_BUSY(gt)	___DRM_XE_PMU_OTHER(gt, 0)
-#define DRM_XE_PMU_COPY_GROUP_BUSY(gt)		___DRM_XE_PMU_OTHER(gt, 1)
-#define DRM_XE_PMU_MEDIA_GROUP_BUSY(gt)		___DRM_XE_PMU_OTHER(gt, 2)
-#define DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(gt)	___DRM_XE_PMU_OTHER(gt, 3)
-
 #if defined(__cplusplus)
 }
 #endif
-- 
2.40.0



* [PATCH v3 02/10] fixup! drm/xe/uapi: Reject bo creation of unaligned size
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 01/10] drm/xe/pmu: Remove PMU from Xe till uapi is finalized Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-14 13:31   ` Rodrigo Vivi
  2023-12-14 11:31 ` [PATCH v3 03/10] drm/xe: Move user engine class mappings to functions Riana Tauro
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

From: Ashutosh Dixit <ashutosh.dixit@intel.com>

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/xe/tests/xe_dma_buf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index bb6f6424e06f..9f6d571d7fa9 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -3,6 +3,8 @@
  * Copyright © 2022 Intel Corporation
  */
 
+#include <drm/xe_drm.h>
+
 #include <kunit/test.h>
 #include <kunit/visibility.h>
 
-- 
2.40.0



* [PATCH v3 03/10] drm/xe: Move user engine class mappings to functions
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 01/10] drm/xe/pmu: Remove PMU from Xe till uapi is finalized Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 02/10] fixup! drm/xe/uapi: Reject bo creation of unaligned size Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 04/10] RFC drm/xe/guc: Add interface for engine busyness ticks Riana Tauro
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

Move the user engine class <-> hw engine class mappings into functions
so they can be used from different files.

No functional changes.

v2: change array to function

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_exec_queue.c | 19 ++----------
 drivers/gpu/drm/xe/xe_hw_engine.c  | 50 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_hw_engine.h  |  3 ++
 drivers/gpu/drm/xe/xe_query.c      | 23 ++------------
 4 files changed, 57 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index eeb9605dd45f..98b1da77f8be 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -482,31 +482,16 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
 	return 0;
 }
 
-static const enum xe_engine_class user_to_xe_engine_class[] = {
-	[DRM_XE_ENGINE_CLASS_RENDER] = XE_ENGINE_CLASS_RENDER,
-	[DRM_XE_ENGINE_CLASS_COPY] = XE_ENGINE_CLASS_COPY,
-	[DRM_XE_ENGINE_CLASS_VIDEO_DECODE] = XE_ENGINE_CLASS_VIDEO_DECODE,
-	[DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE] = XE_ENGINE_CLASS_VIDEO_ENHANCE,
-	[DRM_XE_ENGINE_CLASS_COMPUTE] = XE_ENGINE_CLASS_COMPUTE,
-};
-
 static struct xe_hw_engine *
 find_hw_engine(struct xe_device *xe,
 	       struct drm_xe_engine_class_instance eci)
 {
-	u32 idx;
-
-	if (eci.engine_class > ARRAY_SIZE(user_to_xe_engine_class))
-		return NULL;
 
 	if (eci.gt_id >= xe->info.gt_count)
 		return NULL;
 
-	idx = array_index_nospec(eci.engine_class,
-				 ARRAY_SIZE(user_to_xe_engine_class));
-
 	return xe_gt_hw_engine(xe_device_get_gt(xe, eci.gt_id),
-			       user_to_xe_engine_class[idx],
+			       xe_hw_engine_from_user_class(eci.engine_class),
 			       eci.engine_instance, true);
 }
 
@@ -532,7 +517,7 @@ static u32 bind_exec_queue_logical_mask(struct xe_device *xe, struct xe_gt *gt,
 			continue;
 
 		if (hwe->class ==
-		    user_to_xe_engine_class[DRM_XE_ENGINE_CLASS_COPY])
+		    xe_hw_engine_from_user_class(DRM_XE_ENGINE_CLASS_COPY))
 			logical_mask |= BIT(hwe->logical_instance);
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index 86b863b99065..5b529324c89f 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -264,6 +264,56 @@ static u32 hw_engine_mmio_read32(struct xe_hw_engine *hwe, struct xe_reg reg)
 	return xe_mmio_read32(hwe->gt, reg);
 }
 
+/**
+ * xe_hw_engine_to_user_class - convert a hw engine class to a user engine class
+ * @engine_class: hw engine class
+ *
+ * Returns: user engine class on success, (u16)-1 on error
+ */
+u16 xe_hw_engine_to_user_class(enum xe_engine_class engine_class)
+{
+	switch (engine_class) {
+	case XE_ENGINE_CLASS_RENDER:
+		return DRM_XE_ENGINE_CLASS_RENDER;
+	case XE_ENGINE_CLASS_COPY:
+		return DRM_XE_ENGINE_CLASS_COPY;
+	case XE_ENGINE_CLASS_VIDEO_DECODE:
+		return DRM_XE_ENGINE_CLASS_VIDEO_DECODE;
+	case XE_ENGINE_CLASS_VIDEO_ENHANCE:
+		return DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE;
+	case XE_ENGINE_CLASS_COMPUTE:
+		return DRM_XE_ENGINE_CLASS_COMPUTE;
+	default:
+		XE_WARN_ON(engine_class);
+		return -1;
+	}
+}
+
+/**
+ * xe_hw_engine_from_user_class - convert a user engine class to a hw engine class
+ * @engine_class: user engine class
+ *
+ * Returns: hw engine class on success, XE_ENGINE_CLASS_MAX on error
+ */
+enum xe_engine_class xe_hw_engine_from_user_class(u16 engine_class)
+{
+	switch (engine_class) {
+	case DRM_XE_ENGINE_CLASS_RENDER:
+		return XE_ENGINE_CLASS_RENDER;
+	case DRM_XE_ENGINE_CLASS_COPY:
+		return XE_ENGINE_CLASS_COPY;
+	case DRM_XE_ENGINE_CLASS_VIDEO_DECODE:
+		return XE_ENGINE_CLASS_VIDEO_DECODE;
+	case DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE:
+		return XE_ENGINE_CLASS_VIDEO_ENHANCE;
+	case DRM_XE_ENGINE_CLASS_COMPUTE:
+		return XE_ENGINE_CLASS_COMPUTE;
+	default:
+		XE_WARN_ON(engine_class);
+		return XE_ENGINE_CLASS_MAX;
+	}
+}
+
 void xe_hw_engine_enable_ring(struct xe_hw_engine *hwe)
 {
 	u32 ccs_mask =
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.h b/drivers/gpu/drm/xe/xe_hw_engine.h
index 71968ee2f600..89ca96063644 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.h
+++ b/drivers/gpu/drm/xe/xe_hw_engine.h
@@ -62,6 +62,9 @@ void xe_hw_engine_print(struct xe_hw_engine *hwe, struct drm_printer *p);
 void xe_hw_engine_setup_default_lrc_state(struct xe_hw_engine *hwe);
 
 bool xe_hw_engine_is_reserved(struct xe_hw_engine *hwe);
+enum xe_engine_class xe_hw_engine_from_user_class(u16 engine_class);
+u16 xe_hw_engine_to_user_class(enum xe_engine_class engine_class);
+
 static inline bool xe_hw_engine_is_valid(struct xe_hw_engine *hwe)
 {
 	return hwe->name;
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 56d61bf596b2..8b28cc376fff 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -22,22 +22,6 @@
 #include "xe_mmio.h"
 #include "xe_ttm_vram_mgr.h"
 
-static const u16 xe_to_user_engine_class[] = {
-	[XE_ENGINE_CLASS_RENDER] = DRM_XE_ENGINE_CLASS_RENDER,
-	[XE_ENGINE_CLASS_COPY] = DRM_XE_ENGINE_CLASS_COPY,
-	[XE_ENGINE_CLASS_VIDEO_DECODE] = DRM_XE_ENGINE_CLASS_VIDEO_DECODE,
-	[XE_ENGINE_CLASS_VIDEO_ENHANCE] = DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE,
-	[XE_ENGINE_CLASS_COMPUTE] = DRM_XE_ENGINE_CLASS_COMPUTE,
-};
-
-static const enum xe_engine_class user_to_xe_engine_class[] = {
-	[DRM_XE_ENGINE_CLASS_RENDER] = XE_ENGINE_CLASS_RENDER,
-	[DRM_XE_ENGINE_CLASS_COPY] = XE_ENGINE_CLASS_COPY,
-	[DRM_XE_ENGINE_CLASS_VIDEO_DECODE] = XE_ENGINE_CLASS_VIDEO_DECODE,
-	[DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE] = XE_ENGINE_CLASS_VIDEO_ENHANCE,
-	[DRM_XE_ENGINE_CLASS_COMPUTE] = XE_ENGINE_CLASS_COMPUTE,
-};
-
 static size_t calc_hw_engine_info_size(struct xe_device *xe)
 {
 	struct xe_hw_engine *hwe;
@@ -139,10 +123,7 @@ query_engine_cycles(struct xe_device *xe,
 	if (!gt)
 		return -EINVAL;
 
-	if (eci->engine_class >= ARRAY_SIZE(user_to_xe_engine_class))
-		return -EINVAL;
-
-	hwe = xe_gt_hw_engine(gt, user_to_xe_engine_class[eci->engine_class],
+	hwe = xe_gt_hw_engine(gt, xe_hw_engine_from_user_class(eci->engine_class),
 			      eci->engine_instance, true);
 	if (!hwe)
 		return -EINVAL;
@@ -208,7 +189,7 @@ static int query_engines(struct xe_device *xe,
 				continue;
 
 			engines->engines[i].instance.engine_class =
-				xe_to_user_engine_class[hwe->class];
+				xe_hw_engine_to_user_class(hwe->class);
 			engines->engines[i].instance.engine_instance =
 				hwe->logical_instance;
 			engines->engines[i].instance.gt_id = gt->info.id;
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 04/10] RFC drm/xe/guc: Add interface for engine busyness ticks
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (2 preceding siblings ...)
  2023-12-14 11:31 ` [PATCH v3 03/10] drm/xe: Move user engine class mappings to functions Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-15  2:48   ` Nilawar, Badal
  2023-12-14 11:31 ` [PATCH v3 05/10] RFC drm/xe/uapi: Add configs for Engine busyness Riana Tauro
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

GuC provides engine busyness as a per-engine 64 bit counter of clock
ticks. These counters are maintained in a shared memory buffer and
updated on a continuous basis.

Add functions that initialize engine busyness and return
the current accumulated busyness.

Co-developed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/Makefile                 |   1 +
 drivers/gpu/drm/xe/abi/guc_actions_abi.h    |   1 +
 drivers/gpu/drm/xe/xe_gt.c                  |  13 ++
 drivers/gpu/drm/xe/xe_gt.h                  |   2 +
 drivers/gpu/drm/xe/xe_guc.c                 |   7 +
 drivers/gpu/drm/xe/xe_guc_engine_busyness.c | 153 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_engine_busyness.h |  17 +++
 drivers/gpu/drm/xe/xe_guc_fwif.h            |  15 ++
 drivers/gpu/drm/xe/xe_guc_types.h           |   6 +
 9 files changed, 215 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_guc_engine_busyness.c
 create mode 100644 drivers/gpu/drm/xe/xe_guc_engine_busyness.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index b0cb6a9a390e..2523dc96986e 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -85,6 +85,7 @@ xe-y += xe_bb.o \
 	xe_guc_ads.o \
 	xe_guc_ct.o \
 	xe_guc_debugfs.o \
+	xe_guc_engine_busyness.o \
 	xe_guc_hwconfig.o \
 	xe_guc_log.o \
 	xe_guc_pc.o \
diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
index 3062e0e0d467..d87681ca89bc 100644
--- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
@@ -139,6 +139,7 @@ enum xe_guc_action {
 	XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
 	XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
 	XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
+	XE_GUC_ACTION_SET_DEVICE_ENGINE_UTILIZATION = 0x550C,
 	XE_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR = 0x6000,
 	XE_GUC_ACTION_REPORT_PAGE_FAULT_REQ_DESC = 0x6002,
 	XE_GUC_ACTION_PAGE_FAULT_RES_DESC = 0x6003,
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index f5d18e98f8b6..9c84afb00f7b 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -32,6 +32,7 @@
 #include "xe_gt_sysfs.h"
 #include "xe_gt_tlb_invalidation.h"
 #include "xe_gt_topology.h"
+#include "xe_guc_engine_busyness.h"
 #include "xe_guc_exec_queue_types.h"
 #include "xe_guc_pc.h"
 #include "xe_hw_fence.h"
@@ -794,3 +795,15 @@ struct xe_hw_engine *xe_gt_any_hw_engine_by_reset_domain(struct xe_gt *gt,
 
 	return NULL;
 }
+
+/**
+ * xe_gt_engine_busy_ticks - Return current accumulated engine busyness ticks
+ * @gt: GT structure
+ * @hwe: Xe HW engine to report on
+ *
+ * Returns: accumulated ticks @hwe was busy since engine stats were enabled.
+ */
+u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe)
+{
+	return xe_guc_engine_busyness_ticks(&gt->uc.guc, hwe);
+}
diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index f3c780bd266d..5b4309310126 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -42,6 +42,8 @@ int xe_gt_resume(struct xe_gt *gt);
 void xe_gt_reset_async(struct xe_gt *gt);
 void xe_gt_sanitize(struct xe_gt *gt);
 
+u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe);
+
 /**
  * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the
  * first that matches the same reset domain as @class
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index 482cb0df9f15..6116aaea936f 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -18,6 +18,7 @@
 #include "xe_gt.h"
 #include "xe_guc_ads.h"
 #include "xe_guc_ct.h"
+#include "xe_guc_engine_busyness.h"
 #include "xe_guc_hwconfig.h"
 #include "xe_guc_log.h"
 #include "xe_guc_pc.h"
@@ -306,9 +307,15 @@ int xe_guc_init_post_hwconfig(struct xe_guc *guc)
 
 int xe_guc_post_load_init(struct xe_guc *guc)
 {
+	int err;
+
 	xe_guc_ads_populate_post_load(&guc->ads);
 	guc->submission_state.enabled = true;
 
+	err = xe_guc_engine_busyness_init(guc);
+	if (err)
+		return err;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
new file mode 100644
index 000000000000..287429e31e6c
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+#include "xe_guc_engine_busyness.h"
+
+#include <drm/drm_managed.h>
+
+#include "abi/guc_actions_abi.h"
+#include "xe_bo.h"
+#include "xe_guc.h"
+#include "xe_guc_ct.h"
+
+/**
+ * DOC: Xe GuC Engine Busyness
+ *
+ * GuC >= 70.11.1 maintains busyness counters in a shared memory buffer for each
+ * engine on a continuous basis. The counters are all 64 bits and count in clock
+ * ticks. The values are updated on context switch events and periodically on a
+ * timer internal to GuC. The update rate is guaranteed to be at least 2 Hz
+ * (though this is not real time, best effort only).
+ *
+ * engine busyness ticks (ticks_engine) : clock ticks for which the engine was active
+ */
+
+static void guc_engine_busyness_usage_map(struct xe_guc *guc,
+					  struct xe_hw_engine *hwe,
+					  struct iosys_map *engine_map)
+{
+	struct iosys_map *map;
+	size_t offset;
+	u32 instance;
+	u8 guc_class;
+
+	guc_class = xe_engine_class_to_guc_class(hwe->class);
+	instance = hwe->logical_instance;
+
+	map = &guc->busy.bo->vmap;
+
+	offset = offsetof(struct guc_engine_observation_data,
+			  engine_data[guc_class][instance]);
+
+	*engine_map = IOSYS_MAP_INIT_OFFSET(map, offset);
+}
+
+static void guc_engine_busyness_get_usage(struct xe_guc *guc,
+					  struct xe_hw_engine *hwe,
+					  u64 *_ticks_engine)
+{
+	struct iosys_map engine_map;
+	u64 ticks_engine = 0;
+	int i = 0;
+
+	guc_engine_busyness_usage_map(guc, hwe, &engine_map);
+
+#define read_engine_usage(map_, field_) \
+	iosys_map_rd_field(map_, 0, struct guc_engine_data, field_)
+
+	do {
+		ticks_engine = read_engine_usage(&engine_map, total_execution_ticks);
+
+		if (read_engine_usage(&engine_map, total_execution_ticks) == ticks_engine)
+			break;
+	} while (++i < 6);
+
+#undef read_engine_usage
+
+	if (_ticks_engine)
+		*_ticks_engine = ticks_engine;
+}
+
+static void guc_engine_busyness_enable_stats(struct xe_guc *guc)
+{
+	u32 ggtt_addr = xe_bo_ggtt_addr(guc->busy.bo);
+	u32 action[] = {
+		XE_GUC_ACTION_SET_DEVICE_ENGINE_UTILIZATION,
+		ggtt_addr,
+		0,
+	};
+	struct xe_device *xe = guc_to_xe(guc);
+	int ret;
+
+	ret = xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
+	if (ret)
+		drm_err(&xe->drm, "Failed to enable usage stats %pe", ERR_PTR(ret));
+}
+
+static void guc_engine_busyness_fini(struct drm_device *drm, void *arg)
+{
+	struct xe_guc *guc = arg;
+
+	xe_bo_unpin_map_no_vm(guc->busy.bo);
+}
+
+/**
+ * xe_guc_engine_busyness_ticks - Get current accumulated
+ *				  engine busyness ticks
+ * @guc: The GuC object
+ * @hwe: Xe HW Engine
+ *
+ * Returns: accumulated ticks @hwe was busy since engine stats were enabled.
+ */
+u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
+{
+	u64 ticks_engine;
+
+	guc_engine_busyness_get_usage(guc, hwe, &ticks_engine);
+
+	return ticks_engine;
+}
+
+/**
+ * xe_guc_engine_busyness_init - Initializes the GuC Engine Busyness
+ * @guc: The GuC object
+ *
+ * Initialize GuC engine busyness, only called once during driver load
+ * Supported only on GuC >= 70.11.1
+ *
+ * Return: 0 on success, negative error code on error.
+ */
+int xe_guc_engine_busyness_init(struct xe_guc *guc)
+{
+	struct xe_device *xe = guc_to_xe(guc);
+	struct xe_gt *gt = guc_to_gt(guc);
+	struct xe_tile *tile = gt_to_tile(gt);
+	struct xe_bo *bo;
+	u32 size;
+	int err;
+
+	/* Initialization already done */
+	if (guc->busy.bo)
+		return 0;
+
+	size = PAGE_ALIGN(sizeof(struct guc_engine_observation_data));
+
+	bo = xe_bo_create_pin_map(xe, tile, NULL, size,
+				  ttm_bo_type_kernel,
+				  XE_BO_CREATE_VRAM_IF_DGFX(tile) |
+				  XE_BO_CREATE_GGTT_BIT);
+
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	guc->busy.bo = bo;
+
+	guc_engine_busyness_enable_stats(guc);
+
+	err = drmm_add_action_or_reset(&xe->drm, guc_engine_busyness_fini, guc);
+	if (err)
+		return err;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
new file mode 100644
index 000000000000..d70f06209896
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_GUC_ENGINE_BUSYNESS_H_
+#define _XE_GUC_ENGINE_BUSYNESS_H_
+
+#include <linux/types.h>
+
+struct xe_hw_engine;
+struct xe_guc;
+
+int xe_guc_engine_busyness_init(struct xe_guc *guc);
+u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe);
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
index 4dd5a88a7826..c8ca5fe97614 100644
--- a/drivers/gpu/drm/xe/xe_guc_fwif.h
+++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
@@ -37,6 +37,7 @@
 #define GUC_COMPUTE_CLASS		4
 #define GUC_GSC_OTHER_CLASS		5
 #define GUC_LAST_ENGINE_CLASS		GUC_GSC_OTHER_CLASS
+#define GUC_MAX_OAG_COUNTERS		8
 #define GUC_MAX_ENGINE_CLASSES		16
 #define GUC_MAX_INSTANCES_PER_CLASS	32
 
@@ -222,6 +223,20 @@ struct guc_engine_usage {
 	struct guc_engine_usage_record engines[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
 } __packed;
 
+/* Engine busyness stats */
+struct guc_engine_data {
+	u64 total_execution_ticks;
+	u64 reserved;
+} __packed;
+
+struct guc_engine_observation_data {
+	struct guc_engine_data engine_data[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
+	u64 oag_busy_data[GUC_MAX_OAG_COUNTERS];
+	u64 total_active_ticks;
+	u64 gt_timestamp;
+	u64 reserved1;
+} __packed;
+
 /* This action will be programmed in C1BC - SOFT_SCRATCH_15_REG */
 enum xe_guc_recv_message {
 	XE_GUC_RECV_MSG_CRASH_DUMP_POSTED = BIT(1),
diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
index cd80802e8918..4e9602301aed 100644
--- a/drivers/gpu/drm/xe/xe_guc_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_types.h
@@ -70,6 +70,12 @@ struct xe_guc {
 		u32 size;
 	} hwconfig;
 
+	/** @busy: Engine busyness */
+	struct {
+		/** @bo: GGTT buffer object holding engine busyness data, shared with GuC */
+		struct xe_bo *bo;
+	} busy;
+
 	/**
 	 * @notify_reg: Register which is written to notify GuC of H2G messages
 	 */
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 05/10] RFC drm/xe/uapi: Add configs for Engine busyness
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (3 preceding siblings ...)
  2023-12-14 11:31 ` [PATCH v3 04/10] RFC drm/xe/guc: Add interface for engine busyness ticks Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 06/10] RFC drm/xe/pmu: Enable PMU interface and add engine busyness counter Riana Tauro
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

GuC provides engine busyness as a per-engine 64 bit counter of clock
ticks.

Add configs to the uapi to expose Engine busyness via PMU.

v2: add "__" prefix for internal helpers
    add a simple helper for application usage (Aravind)

Cc: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 include/uapi/drm/xe_drm.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 5ba412007270..58ab3e414c87 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1080,6 +1080,40 @@ struct drm_xe_wait_user_fence {
 	/** @reserved: Reserved */
 	__u64 reserved[2];
 };
+
+/**
+ * DOC: XE PMU event config IDs
+ *
+ * See 'man perf_event_open'. Use the DRM_XE_PMU_XXXX IDs listed in xe_drm.h
+ * as the config value of 'struct perf_event_attr' in the perf_event_open
+ * syscall to read a particular event.
+ *
+ */
+enum drm_xe_pmu_engine_sample {
+	DRM_XE_PMU_SAMPLE_BUSY_TICKS = 0,
+};
+
+/*
+ * Top bits of every config value are the GT id.
+ */
+#define __DRM_XE_PMU_GT_SHIFT (56)
+
+#define __DRM_XE_PMU_SAMPLE_BITS (4)
+#define __DRM_XE_PMU_SAMPLE_INSTANCE_BITS (8)
+#define __DRM_XE_PMU_CLASS_SHIFT \
+	(__DRM_XE_PMU_SAMPLE_BITS + __DRM_XE_PMU_SAMPLE_INSTANCE_BITS)
+
+#define __DRM_XE_PMU_ENGINE(gt, class, instance, sample) \
+	(((class) << __DRM_XE_PMU_CLASS_SHIFT | \
+	(instance) << __DRM_XE_PMU_SAMPLE_BITS | \
+	(sample)) | ((__u64)(gt) << __DRM_XE_PMU_GT_SHIFT))
+
+#define __DRM_XE_PMU_OTHER(gt, x) \
+	((__u64)__DRM_XE_PMU_ENGINE(gt, 0xff, 0xff, 0xf) + 1 + (x))
+
+#define DRM_XE_PMU_ENGINE_BUSY_TICKS(gt, class, instance) \
+	__DRM_XE_PMU_ENGINE(gt, class, instance, DRM_XE_PMU_SAMPLE_BUSY_TICKS)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 06/10] RFC drm/xe/pmu: Enable PMU interface and add engine busyness counter
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (4 preceding siblings ...)
  2023-12-14 11:31 ` [PATCH v3 05/10] RFC drm/xe/uapi: Add configs for Engine busyness Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 07/10] RFC drm/xe/guc: Add PMU counter for total active ticks Riana Tauro
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>

This patch adds the PMU base implementation along with engine busyness
counters.

GuC provides engine busyness as a per-engine 64 bit counter of clock
ticks. These counters are maintained in a shared memory buffer and
internally updated on a continuous basis.

This is listed by perf tool as

  sudo ./perf list
     xe_0000_03_00.0/bcs0-busy-ticks-gt0/               [Kernel PMU event]
     xe_0000_03_00.0/ccs0-busy-ticks-gt0/               [Kernel PMU event]
     xe_0000_03_00.0/rcs0-busy-ticks-gt0/               [Kernel PMU event]
     xe_0000_03_00.0/vcs0-busy-ticks-gt0/               [Kernel PMU event]
     xe_0000_03_00.0/vecs0-busy-ticks-gt0/              [Kernel PMU event]

and read as

  sudo ./perf stat -e xe_0000_03_00.0/bcs0-busy-ticks-gt0/  -I 1000
           time       counts unit       events
       1.000674178     2052       xe_0000_03_00.0/bcs0-busy-ticks-gt0/
       2.006626312     2033       xe_0000_03_00.0/bcs0-busy-ticks-gt0/
       3.009499300    40067       xe_0000_03_00.0/bcs0-busy-ticks-gt0/
       4.010521486     8491       xe_0000_03_00.0/bcs0-busy-ticks-gt0/

The pmu base implementation is taken from i915.

v2: rebase

v3: add engine busyness

Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Co-developed-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Co-developed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
---
 drivers/gpu/drm/xe/Makefile          |   2 +
 drivers/gpu/drm/xe/xe_device.c       |   2 +
 drivers/gpu/drm/xe/xe_device_types.h |   4 +
 drivers/gpu/drm/xe/xe_module.c       |   5 +
 drivers/gpu/drm/xe/xe_pmu.c          | 543 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_pmu.h          |  23 ++
 drivers/gpu/drm/xe/xe_pmu_types.h    |  49 +++
 7 files changed, 628 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
 create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
 create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 2523dc96986e..862266efd20a 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -281,6 +281,8 @@ endif
 obj-$(CONFIG_DRM_XE) += xe.o
 obj-$(CONFIG_DRM_XE_KUNIT_TEST) += tests/
 
+xe-$(CONFIG_PERF_EVENTS) += xe_pmu.o
+
 # header test
 hdrtest_find_args := -not -path xe_rtp_helpers.h
 ifneq ($(CONFIG_DRM_XE_DISPLAY),y)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index d9ae77fe7382..436a25a4b8a7 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -531,6 +531,8 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_hwmon_register(xe);
 
+	xe_pmu_register(&xe->pmu);
+
 	err = drmm_add_action_or_reset(&xe->drm, xe_device_sanitize, xe);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index c45ef17b3473..54d81ae8f1cb 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -18,6 +18,7 @@
 #include "xe_lmtt_types.h"
 #include "xe_platform_types.h"
 #include "xe_pt_types.h"
+#include "xe_pmu.h"
 #include "xe_sriov_types.h"
 #include "xe_step_types.h"
 
@@ -474,6 +475,9 @@ struct xe_device {
 	/* To shut up runtime pm macros.. */
 	struct xe_runtime_pm {} runtime_pm;
 
+	/** @pmu: performance monitoring unit */
+	struct xe_pmu pmu;
+
 	/* For pcode */
 	struct mutex sb_lock;
 
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 110b69864656..51bf69b7ab22 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -11,6 +11,7 @@
 #include "xe_drv.h"
 #include "xe_hw_fence.h"
 #include "xe_pci.h"
+#include "xe_pmu.h"
 #include "xe_sched_job.h"
 
 struct xe_modparam xe_modparam = {
@@ -62,6 +63,10 @@ static const struct init_funcs init_funcs[] = {
 		.init = xe_sched_job_module_init,
 		.exit = xe_sched_job_module_exit,
 	},
+	{
+		.init = xe_pmu_init,
+		.exit = xe_pmu_exit,
+	},
 	{
 		.init = xe_register_pci_driver,
 		.exit = xe_unregister_pci_driver,
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
new file mode 100644
index 000000000000..5e2ad4badaed
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -0,0 +1,543 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
+#include <drm/xe_drm.h>
+
+#include "xe_device.h"
+#include "xe_gt.h"
+
+#define XE_ENGINE_SAMPLE_COUNT (DRM_XE_PMU_SAMPLE_BUSY_TICKS + 1)
+
+static cpumask_t xe_pmu_cpumask;
+static unsigned int xe_pmu_target_cpu = -1;
+
+static unsigned int config_gt_id(const u64 config)
+{
+	return config >> __DRM_XE_PMU_GT_SHIFT;
+}
+
+static u64 config_counter(const u64 config)
+{
+	return config & ~(~0ULL << __DRM_XE_PMU_GT_SHIFT);
+}
+
+static u8 engine_event_sample(struct perf_event *event)
+{
+	u64 config = event->attr.config;
+
+	return config_counter(config) & 0xf;
+}
+
+static u8 engine_event_class(struct perf_event *event)
+{
+	u64 config = event->attr.config;
+
+	return (config_counter(config) >> __DRM_XE_PMU_CLASS_SHIFT) & 0xff;
+}
+
+static u8 engine_event_instance(struct perf_event *event)
+{
+	u64 config = event->attr.config;
+
+	return (config_counter(config) >> __DRM_XE_PMU_SAMPLE_BITS) & 0xff;
+}
+
+static bool is_engine_event(struct perf_event *event)
+{
+	return config_counter(event->attr.config) < __DRM_XE_PMU_OTHER(0, 0);
+}
+
+static int engine_event_status(struct xe_hw_engine *hwe,
+			       enum drm_xe_pmu_engine_sample sample)
+{
+	if (!hwe)
+		return -ENODEV;
+
+	/* Other engine events will be added later; XE_ENGINE_SAMPLE_COUNT will change accordingly */
+	return (sample >= DRM_XE_PMU_SAMPLE_BUSY_TICKS && sample < XE_ENGINE_SAMPLE_COUNT)
+		? 0 : -ENOENT;
+}
+
+static int engine_event_init(struct perf_event *event)
+{
+	struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base);
+	const u64 config = event->attr.config;
+	const unsigned int gt_id = config_gt_id(config);
+	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
+	struct xe_hw_engine *hwe;
+
+	hwe = xe_gt_hw_engine(gt, xe_hw_engine_from_user_class(engine_event_class(event)),
+			      engine_event_instance(event), true);
+
+	return engine_event_status(hwe, engine_event_sample(event));
+}
+
+static void xe_pmu_event_destroy(struct perf_event *event)
+{
+	struct xe_device *xe =
+		container_of(event->pmu, typeof(*xe), pmu.base);
+
+	drm_WARN_ON(&xe->drm, event->parent);
+
+	drm_dev_put(&xe->drm);
+}
+
+static int xe_pmu_event_init(struct perf_event *event)
+{
+	struct xe_device *xe =
+		container_of(event->pmu, typeof(*xe), pmu.base);
+	struct xe_pmu *pmu = &xe->pmu;
+	int ret;
+
+	if (pmu->closed)
+		return -ENODEV;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* unsupported modes and filters */
+	if (event->attr.sample_period) /* no sampling */
+		return -EINVAL;
+
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	if (event->cpu < 0)
+		return -EINVAL;
+
+	/* only allow running on one cpu at a time */
+	if (!cpumask_test_cpu(event->cpu, &xe_pmu_cpumask))
+		return -EINVAL;
+
+	if (is_engine_event(event)) {
+		ret = engine_event_init(event);
+		if (ret)
+			return ret;
+	}
+
+	if (!event->parent) {
+		drm_dev_get(&xe->drm);
+		event->destroy = xe_pmu_event_destroy;
+	}
+
+	return 0;
+}
+
+static u64 __xe_pmu_event_read(struct perf_event *event)
+{
+	struct xe_device *xe =
+		container_of(event->pmu, typeof(*xe), pmu.base);
+	const unsigned int gt_id = config_gt_id(event->attr.config);
+	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
+	u64 val = 0;
+
+	if (is_engine_event(event)) {
+		u8 sample = engine_event_sample(event);
+		struct xe_hw_engine *hwe;
+
+		hwe = xe_gt_hw_engine(gt, xe_hw_engine_from_user_class(engine_event_class(event)),
+				      engine_event_instance(event), true);
+		if (!hwe)
+			drm_warn_once(&xe->drm, "unknown engine\n");
+		else if (sample == DRM_XE_PMU_SAMPLE_BUSY_TICKS)
+			val = xe_gt_engine_busy_ticks(gt, hwe);
+		else
+			drm_warn(&xe->drm, "unknown pmu engine event\n");
+	}
+
+	return val;
+}
+
+static void xe_pmu_event_read(struct perf_event *event)
+{
+	struct xe_device *xe =
+		container_of(event->pmu, typeof(*xe), pmu.base);
+	struct hw_perf_event *hwc = &event->hw;
+	struct xe_pmu *pmu = &xe->pmu;
+	u64 prev, new;
+
+	if (pmu->closed) {
+		event->hw.state = PERF_HES_STOPPED;
+		return;
+	}
+again:
+	prev = local64_read(&hwc->prev_count);
+	new = __xe_pmu_event_read(event);
+
+	if (local64_cmpxchg(&hwc->prev_count, prev, new) != prev)
+		goto again;
+
+	local64_add(new - prev, &event->count);
+}
+
+static void xe_pmu_enable(struct perf_event *event)
+{
+	/*
+	 * Store the current counter value so we can report the correct delta
+	 * for all listeners. Even when the event was already enabled and has
+	 * an existing non-zero value.
+	 */
+	local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
+}
+
+static void xe_pmu_event_start(struct perf_event *event, int flags)
+{
+	struct xe_device *xe =
+		container_of(event->pmu, typeof(*xe), pmu.base);
+	struct xe_pmu *pmu = &xe->pmu;
+
+	if (pmu->closed)
+		return;
+
+	xe_pmu_enable(event);
+	event->hw.state = 0;
+}
+
+static void xe_pmu_event_stop(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_UPDATE)
+		xe_pmu_event_read(event);
+
+	event->hw.state = PERF_HES_STOPPED;
+}
+
+static int xe_pmu_event_add(struct perf_event *event, int flags)
+{
+	struct xe_device *xe =
+		container_of(event->pmu, typeof(*xe), pmu.base);
+	struct xe_pmu *pmu = &xe->pmu;
+
+	if (pmu->closed)
+		return -ENODEV;
+
+	if (flags & PERF_EF_START)
+		xe_pmu_event_start(event, flags);
+
+	return 0;
+}
+
+static void xe_pmu_event_del(struct perf_event *event, int flags)
+{
+	xe_pmu_event_stop(event, PERF_EF_UPDATE);
+}
+
+struct xe_ext_attribute {
+	struct device_attribute attr;
+	unsigned long val;
+};
+
+static ssize_t xe_pmu_event_show(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	struct xe_ext_attribute *eattr;
+
+	eattr = container_of(attr, struct xe_ext_attribute, attr);
+	return sprintf(buf, "config=0x%lx\n", eattr->val);
+}
+
+static ssize_t cpumask_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	return cpumap_print_to_pagebuf(true, buf, &xe_pmu_cpumask);
+}
+
+static DEVICE_ATTR_RO(cpumask);
+
+static struct attribute *xe_cpumask_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL,
+};
+
+static const struct attribute_group xe_pmu_cpumask_attr_group = {
+	.attrs = xe_cpumask_attrs,
+};
+
+#define __engine_event(__sample, __name) \
+{ \
+	.sample = (__sample), \
+	.name = (__name), \
+}
+
+static struct xe_ext_attribute *
+add_xe_attr(struct xe_ext_attribute *attr, const char *name, u64 config)
+{
+	sysfs_attr_init(&attr->attr.attr);
+	attr->attr.attr.name = name;
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = xe_pmu_event_show;
+	attr->val = config;
+
+	return ++attr;
+}
+
+static struct attribute **
+create_event_attributes(struct xe_pmu *pmu)
+{
+	struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
+	struct xe_ext_attribute *xe_attr = NULL, *xe_iter;
+	struct attribute **attr = NULL, **attr_iter;
+	unsigned int count = 0;
+	enum xe_hw_engine_id id;
+	unsigned int i, j;
+	struct xe_hw_engine *hwe;
+	struct xe_gt *gt;
+
+	static const struct {
+		enum drm_xe_pmu_engine_sample sample;
+		char *name;
+	} engine_events[] = {
+		__engine_event(DRM_XE_PMU_SAMPLE_BUSY_TICKS, "busy-ticks"),
+	};
+
+	for_each_gt(gt, xe, j) {
+		for_each_hw_engine(hwe, gt, id) {
+			for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
+				if (!engine_event_status(hwe, engine_events[i].sample))
+					count++;
+			}
+		}
+	}
+
+	/* Allocate attribute objects and table. */
+	xe_attr = kcalloc(count, sizeof(*xe_attr), GFP_KERNEL);
+	if (!xe_attr)
+		goto err_alloc;
+
+	/* Max one pointer of each attribute type plus a termination entry. */
+	attr = kcalloc(count + 1, sizeof(*attr), GFP_KERNEL);
+	if (!attr)
+		goto err_alloc;
+
+	xe_iter = xe_attr;
+	attr_iter = attr;
+
+	/* Initialize supported engine counters */
+	for_each_gt(gt, xe, j) {
+		for_each_hw_engine(hwe, gt, id) {
+			for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
+				char *str;
+
+				if (engine_event_status(hwe, engine_events[i].sample))
+					continue;
+
+				str = kasprintf(GFP_KERNEL, "%s-%s-gt%u",
+						hwe->name, engine_events[i].name, j);
+
+				if (!str)
+					goto err;
+
+				*attr_iter++ = &xe_iter->attr.attr;
+				xe_iter = add_xe_attr(xe_iter, str,
+						      __DRM_XE_PMU_ENGINE(j, xe_hw_engine_to_user_class(hwe->class),
+									  hwe->logical_instance,
+									  engine_events[i].sample));
+			}
+		}
+	}
+
+	pmu->xe_attr = xe_attr;
+	return attr;
+
+err:
+	for (attr_iter = attr; *attr_iter; attr_iter++)
+		kfree((*attr_iter)->name);
+
+err_alloc:
+	kfree(attr);
+	kfree(xe_attr);
+
+	return NULL;
+}
+
+static void free_event_attributes(struct xe_pmu *pmu)
+{
+	struct attribute **attr_iter = pmu->events_attr_group.attrs;
+
+	for (; *attr_iter; attr_iter++)
+		kfree((*attr_iter)->name);
+
+	kfree(pmu->events_attr_group.attrs);
+	kfree(pmu->xe_attr);
+
+	pmu->events_attr_group.attrs = NULL;
+	pmu->xe_attr = NULL;
+}
+
+static int xe_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct xe_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
+
+	/* Select the first online CPU as a designated reader. */
+	if (cpumask_empty(&xe_pmu_cpumask))
+		cpumask_set_cpu(cpu, &xe_pmu_cpumask);
+
+	return 0;
+}
+
+static int xe_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+	struct xe_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
+	unsigned int target = xe_pmu_target_cpu;
+
+	/*
+	 * Unregistering an instance generates a CPU offline event which we must
+	 * ignore to avoid incorrectly modifying the shared xe_pmu_cpumask.
+	 */
+	if (pmu->closed)
+		return 0;
+
+	if (cpumask_test_and_clear_cpu(cpu, &xe_pmu_cpumask)) {
+		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+
+		/* Migrate events if there is a valid target */
+		if (target < nr_cpu_ids) {
+			cpumask_set_cpu(target, &xe_pmu_cpumask);
+			xe_pmu_target_cpu = target;
+		}
+	}
+
+	if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
+		perf_pmu_migrate_context(&pmu->base, cpu, target);
+		pmu->cpuhp.cpu = target;
+	}
+
+	return 0;
+}
+
+static enum cpuhp_state cpuhp_slot = CPUHP_INVALID;
+
+int xe_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+				      "perf/x86/intel/xe:online",
+				      xe_pmu_cpu_online,
+				      xe_pmu_cpu_offline);
+	if (ret < 0)
+		pr_notice("Failed to setup cpuhp state for xe PMU! (%d)\n",
+			  ret);
+	else
+		cpuhp_slot = ret;
+
+	return 0;
+}
+
+void xe_pmu_exit(void)
+{
+	if (cpuhp_slot != CPUHP_INVALID)
+		cpuhp_remove_multi_state(cpuhp_slot);
+}
+
+static int xe_pmu_register_cpuhp_state(struct xe_pmu *pmu)
+{
+	if (cpuhp_slot == CPUHP_INVALID)
+		return -EINVAL;
+
+	return cpuhp_state_add_instance(cpuhp_slot, &pmu->cpuhp.node);
+}
+
+static void xe_pmu_unregister_cpuhp_state(struct xe_pmu *pmu)
+{
+	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
+}
+
+static void xe_pmu_unregister(struct drm_device *device, void *arg)
+{
+	struct xe_pmu *pmu = arg;
+
+	if (!pmu->base.event_init)
+		return;
+
+	/*
+	 * "Disconnect" the PMU callbacks - since all are atomic,
+	 * synchronize_rcu() ensures all currently executing callbacks
+	 * will have exited before we proceed with unregistration.
+	 */
+	pmu->closed = true;
+	synchronize_rcu();
+
+	xe_pmu_unregister_cpuhp_state(pmu);
+
+	perf_pmu_unregister(&pmu->base);
+	pmu->base.event_init = NULL;
+	kfree(pmu->base.attr_groups);
+	kfree(pmu->name);
+	free_event_attributes(pmu);
+}
+
+void xe_pmu_register(struct xe_pmu *pmu)
+{
+	struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
+	const struct attribute_group *attr_groups[] = {
+		&pmu->events_attr_group,
+		&xe_pmu_cpumask_attr_group,
+		NULL
+	};
+
+	int ret = -ENOMEM;
+
+	spin_lock_init(&pmu->lock);
+	pmu->cpuhp.cpu = -1;
+
+	pmu->name = kasprintf(GFP_KERNEL,
+			      "xe_%s",
+			      dev_name(xe->drm.dev));
+	if (!pmu->name)
+		goto err;
+
+	/* tools/perf reserves colons as special. */
+	strreplace((char *)pmu->name, ':', '_');
+
+
+	pmu->events_attr_group.name = "events";
+	pmu->events_attr_group.attrs = create_event_attributes(pmu);
+	if (!pmu->events_attr_group.attrs)
+		goto err_name;
+
+	pmu->base.attr_groups = kmemdup(attr_groups, sizeof(attr_groups),
+					GFP_KERNEL);
+	if (!pmu->base.attr_groups)
+		goto err_attr;
+
+	pmu->base.module	= THIS_MODULE;
+	pmu->base.task_ctx_nr	= perf_invalid_context;
+	pmu->base.event_init	= xe_pmu_event_init;
+	pmu->base.add		= xe_pmu_event_add;
+	pmu->base.del		= xe_pmu_event_del;
+	pmu->base.start		= xe_pmu_event_start;
+	pmu->base.stop		= xe_pmu_event_stop;
+	pmu->base.read		= xe_pmu_event_read;
+
+	ret = perf_pmu_register(&pmu->base, pmu->name, -1);
+	if (ret)
+		goto err_groups;
+
+	ret = xe_pmu_register_cpuhp_state(pmu);
+	if (ret)
+		goto err_unreg;
+
+	ret = drmm_add_action_or_reset(&xe->drm, xe_pmu_unregister, pmu);
+	if (ret)
+		goto err_cpuhp;
+
+	return;
+
+err_cpuhp:
+	xe_pmu_unregister_cpuhp_state(pmu);
+err_unreg:
+	perf_pmu_unregister(&pmu->base);
+err_groups:
+	kfree(pmu->base.attr_groups);
+err_attr:
+	pmu->base.event_init = NULL;
+	free_event_attributes(pmu);
+err_name:
+	kfree(pmu->name);
+err:
+	drm_notice(&xe->drm, "Failed to register PMU!\n");
+}
diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
new file mode 100644
index 000000000000..d6fca18466f4
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pmu.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_PMU_H_
+#define _XE_PMU_H_
+
+#include "xe_gt_types.h"
+#include "xe_pmu_types.h"
+
+#if IS_ENABLED(CONFIG_PERF_EVENTS)
+int xe_pmu_init(void);
+void xe_pmu_exit(void);
+void xe_pmu_register(struct xe_pmu *pmu);
+#else
+static inline int xe_pmu_init(void) { return 0; }
+static inline void xe_pmu_exit(void) {}
+static inline void xe_pmu_register(struct xe_pmu *pmu) {}
+#endif
+
+#endif
+
diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
new file mode 100644
index 000000000000..d38b24d27cfd
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_pmu_types.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_PMU_TYPES_H_
+#define _XE_PMU_TYPES_H_
+
+#include <linux/perf_event.h>
+#include <linux/spinlock_types.h>
+#include <uapi/drm/xe_drm.h>
+
+#define XE_PMU_MAX_GT 2
+
+struct xe_pmu {
+	/**
+	 * @cpuhp: Struct used for CPU hotplug handling.
+	 */
+	struct {
+		struct hlist_node node;
+		unsigned int cpu;
+	} cpuhp;
+	/**
+	 * @base: PMU base.
+	 */
+	struct pmu base;
+	/**
+	 * @closed: xe is unregistering.
+	 */
+	bool closed;
+	/**
+	 * @name: Name as registered with perf core.
+	 */
+	const char *name;
+	/**
+	 * @lock: Lock protecting enable mask and ref count handling.
+	 */
+	spinlock_t lock;
+	/**
+	 * @events_attr_group: Device events attribute group.
+	 */
+	struct attribute_group events_attr_group;
+	/**
+	 * @xe_attr: Memory block holding device attributes.
+	 */
+	void *xe_attr;
+};
+
+#endif
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 07/10] RFC drm/xe/guc: Add PMU counter for total active ticks
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (5 preceding siblings ...)
  2023-12-14 11:31 ` [PATCH v3 06/10] RFC drm/xe/pmu: Enable PMU interface and add engine busyness counter Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 08/10] RFC drm/xe/guc: Expose engine busyness only for supported GuC version Riana Tauro
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

GuC provides engine busyness as a 64 bit counter of clock ticks.
These counters are maintained in a shared memory buffer and
internally updated on a continuous basis.

GuC also provides a periodically updated count of the total ticks
the GT has been active for. This counter is exposed to the user so
that busyness can be calculated as a percentage using

busyness % = (engine active ticks/total active ticks) * 100.

This patch provides a pmu counter for total active ticks.

This is listed by perf tool as

sudo ./perf list
	  xe_0000_03_00.0/total-active-ticks-gt0/            [Kernel PMU event]

and can be read using

sudo ./perf stat -e xe_0000_03_00.0/total-active-ticks-gt0/ -I 1000
           time             counts unit events
    1.001332764           58942964      xe_0000_03_00.0/total-active-ticks-gt0/
    2.011421147           21191869      xe_0000_03_00.0/total-active-ticks-gt0/
    3.013223865           19269012      xe_0000_03_00.0/total-active-ticks-gt0/

Co-developed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.c                  | 11 +++
 drivers/gpu/drm/xe/xe_gt.h                  |  2 +-
 drivers/gpu/drm/xe/xe_guc_engine_busyness.c | 71 ++++++++++++++++----
 drivers/gpu/drm/xe/xe_guc_engine_busyness.h |  1 +
 drivers/gpu/drm/xe/xe_pmu.c                 | 74 +++++++++++++++++++--
 include/uapi/drm/xe_drm.h                   | 20 ++++++
 6 files changed, 160 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 9c84afb00f7b..0e131b699a54 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -807,3 +807,14 @@ u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe)
 {
 	return xe_guc_engine_busyness_ticks(&gt->uc.guc, hwe);
 }
+
+/**
+ * xe_gt_total_active_ticks - Return total active ticks
+ * @gt: GT structure
+ *
+ * Returns the total number of clock ticks the GT has been active for.
+ */
+u64 xe_gt_total_active_ticks(struct xe_gt *gt)
+{
+	return xe_guc_engine_busyness_active_ticks(&gt->uc.guc);
+}
diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index 5b4309310126..7e7828b12acd 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -43,7 +43,7 @@ void xe_gt_reset_async(struct xe_gt *gt);
 void xe_gt_sanitize(struct xe_gt *gt);
 
 u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe);
-
+u64 xe_gt_total_active_ticks(struct xe_gt *gt);
 /**
  * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the
  * first that matches the same reset domain as @class
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
index 287429e31e6c..4c24f51f2fa3 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
@@ -20,53 +20,83 @@
  * timer internal to GuC. The update rate is guaranteed to be at least 2Hz (but with
  * a caveat that is not real time, best effort only).
  *
+ * In addition to the engine busyness ticks, there is also a total time count which
+ * is a free running GT timestamp counter.
+ *
+ * Note that counters should be used as ratios of each other for calculating a
+ * percentage.
+ *
  * engine busyness ticks (ticks_engine) : clock ticks for which engine was active
+ * total active ticks (ticks_gt)	: total clock ticks
+ *
+ * engine busyness % = (ticks_engine / ticks_gt) * 100
  */
 
 static void guc_engine_busyness_usage_map(struct xe_guc *guc,
 					  struct xe_hw_engine *hwe,
-					  struct iosys_map *engine_map)
+					  struct iosys_map *engine_map,
+					  struct iosys_map *global_map)
 {
 	struct iosys_map *map;
 	size_t offset;
 	u32 instance;
 	u8 guc_class;
 
-	guc_class = xe_engine_class_to_guc_class(hwe->class);
-	instance = hwe->logical_instance;
+	if (hwe) {
+		guc_class = xe_engine_class_to_guc_class(hwe->class);
+		instance = hwe->logical_instance;
+	}
 
 	map = &guc->busy.bo->vmap;
 
-	offset = offsetof(struct guc_engine_observation_data,
-			  engine_data[guc_class][instance]);
+	if (hwe) {
+		offset = offsetof(struct guc_engine_observation_data,
+				  engine_data[guc_class][instance]);
+		*engine_map = IOSYS_MAP_INIT_OFFSET(map, offset);
+	}
 
-	*engine_map = IOSYS_MAP_INIT_OFFSET(map, offset);
+	*global_map = IOSYS_MAP_INIT_OFFSET(map, 0);
 }
 
 static void guc_engine_busyness_get_usage(struct xe_guc *guc,
 					  struct xe_hw_engine *hwe,
-					  u64 *_ticks_engine)
+					  u64 *_ticks_engine,
+					  u64 *_ticks_gt)
 {
-	struct iosys_map engine_map;
-	u64 ticks_engine = 0;
+	struct iosys_map engine_map, global_map;
+	u64 ticks_engine = 0, ticks_gt = 0;
 	int i = 0;
 
-	guc_engine_busyness_usage_map(guc, hwe, &engine_map);
+	guc_engine_busyness_usage_map(guc, hwe, &engine_map, &global_map);
 
 #define read_engine_usage(map_, field_) \
 	iosys_map_rd_field(map_, 0, struct guc_engine_data, field_)
 
+#define read_global_field(map_, field_) \
+	iosys_map_rd_field(map_, 0, struct guc_engine_observation_data, field_)
+
 	do {
-		ticks_engine = read_engine_usage(&engine_map, total_execution_ticks);
+		if (hwe)
+			ticks_engine = read_engine_usage(&engine_map, total_execution_ticks);
+
+		ticks_gt = read_global_field(&global_map, gt_timestamp);
 
-		if (read_engine_usage(&engine_map, total_execution_ticks) == ticks_engine)
+		if (hwe && read_engine_usage(&engine_map, total_execution_ticks) != ticks_engine)
+			continue;
+
+		if (read_global_field(&global_map, gt_timestamp) == ticks_gt)
 			break;
+
 	} while (++i < 6);
 
 #undef read_engine_usage
+#undef read_global_field
 
 	if (_ticks_engine)
 		*_ticks_engine = ticks_engine;
+
+	if (_ticks_gt)
+		*_ticks_gt = ticks_gt;
 }
 
 static void guc_engine_busyness_enable_stats(struct xe_guc *guc)
@@ -92,6 +122,21 @@ static void guc_engine_busyness_fini(struct drm_device *drm, void *arg)
 	xe_bo_unpin_map_no_vm(guc->busy.bo);
 }
 
+/*
+ * xe_guc_engine_busyness_active_ticks - Gets the total active ticks
+ * @guc: The GuC object
+ *
+ * Returns total active ticks that the GT has been running for.
+ */
+u64 xe_guc_engine_busyness_active_ticks(struct xe_guc *guc)
+{
+	u64 ticks_gt;
+
+	guc_engine_busyness_get_usage(guc, NULL, NULL, &ticks_gt);
+
+	return ticks_gt;
+}
+
 /*
  * xe_guc_engine_busyness_ticks - Gets current accumulated
  *				  engine busyness ticks
@@ -104,7 +149,7 @@ u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
 {
 	u64 ticks_engine;
 
-	guc_engine_busyness_get_usage(guc, hwe, &ticks_engine);
+	guc_engine_busyness_get_usage(guc, hwe, &ticks_engine, NULL);
 
 	return ticks_engine;
 }
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
index d70f06209896..57325910ebc4 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
@@ -12,6 +12,7 @@ struct xe_hw_engine;
 struct xe_guc;
 
 int xe_guc_engine_busyness_init(struct xe_guc *guc);
+u64 xe_guc_engine_busyness_active_ticks(struct xe_guc *guc);
 u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index 5e2ad4badaed..74212d8c5434 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -51,6 +51,20 @@ static bool is_engine_event(struct perf_event *event)
 	return config_counter(event->attr.config) < __DRM_XE_PMU_OTHER(0, 0);
 }
 
+static int
+config_status(struct xe_device *xe, u64 config)
+{
+	unsigned int gt_id = config_gt_id(config);
+
+	if (gt_id >= XE_PMU_MAX_GT)
+		return -ENOENT;
+
+	if (config_counter(config) == DRM_XE_PMU_TOTAL_ACTIVE_TICKS(0))
+		return 0;
+
+	return -ENOENT;
+}
+
 static int engine_event_status(struct xe_hw_engine *hwe,
 			       enum drm_xe_pmu_engine_sample sample)
 {
@@ -113,11 +127,13 @@ static int xe_pmu_event_init(struct perf_event *event)
 	if (!cpumask_test_cpu(event->cpu, &xe_pmu_cpumask))
 		return -EINVAL;
 
-	if (is_engine_event(event)) {
+	if (is_engine_event(event))
 		ret = engine_event_init(event);
-		if (ret)
-			return ret;
-	}
+	else
+		ret = config_status(xe, event->attr.config);
+
+	if (ret)
+		return ret;
 
 	if (!event->parent) {
 		drm_dev_get(&xe->drm);
@@ -131,7 +147,8 @@ static u64 __xe_pmu_event_read(struct perf_event *event)
 {
 	struct xe_device *xe =
 		container_of(event->pmu, typeof(*xe), pmu.base);
-	const unsigned int gt_id = config_gt_id(event->attr.config);
+	u64 config = event->attr.config;
+	const unsigned int gt_id = config_gt_id(config);
 	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
 	u64 val;
 
@@ -147,6 +164,11 @@ static u64 __xe_pmu_event_read(struct perf_event *event)
 			val = xe_gt_engine_busy_ticks(gt, hwe);
 		else
 			drm_warn(&xe->drm, "unknown pmu engine event\n");
+	} else {
+		if (config_counter(config) == DRM_XE_PMU_TOTAL_ACTIVE_TICKS(0))
+			val = xe_gt_total_active_ticks(gt);
+		else
+			drm_warn(&xe->drm, "unknown pmu event\n");
 	}
 
 	return val;
@@ -256,6 +278,12 @@ static const struct attribute_group xe_pmu_cpumask_attr_group = {
 	.attrs = xe_cpumask_attrs,
 };
 
+#define __event(__counter, __name) \
+{ \
+	.counter = (__counter), \
+	.name = (__name), \
+}
+
 #define __engine_event(__sample, __name) \
 { \
 	.sample = (__sample), \
@@ -293,6 +321,23 @@ create_event_attributes(struct xe_pmu *pmu)
 		__engine_event(DRM_XE_PMU_SAMPLE_BUSY_TICKS, "busy-ticks"),
 	};
 
+	static const struct {
+		unsigned int counter;
+		const char *name;
+	} events[] = {
+		__event(0, "total-active-ticks"),
+	};
+
+	/* Count how many counters we will be exposing. */
+	for_each_gt(gt, xe, j) {
+		for (i = 0; i < ARRAY_SIZE(events); i++) {
+			u64 config = __DRM_XE_PMU_OTHER(j, events[i].counter);
+
+			if (!config_status(xe, config))
+				count++;
+		}
+	}
+
 	for_each_gt(gt, xe, j) {
 		for_each_hw_engine(hwe, gt, id) {
 			for (i = 0; i < ARRAY_SIZE(engine_events); i++) {
@@ -315,6 +360,25 @@ create_event_attributes(struct xe_pmu *pmu)
 	xe_iter = xe_attr;
 	attr_iter = attr;
 
+	/* Initialize supported non-engine counters */
+	for_each_gt(gt, xe, j) {
+		for (i = 0; i < ARRAY_SIZE(events); i++) {
+			u64 config = __DRM_XE_PMU_OTHER(j, events[i].counter);
+			char *str;
+
+			if (config_status(xe, config))
+				continue;
+
+			str = kasprintf(GFP_KERNEL, "%s-gt%u",
+					events[i].name, j);
+			if (!str)
+				goto err;
+
+			*attr_iter++ = &xe_iter->attr.attr;
+			xe_iter = add_xe_attr(xe_iter, str, config);
+		}
+	}
+
 	/* Initialize supported engine counters */
 	for_each_gt(gt, xe, j) {
 		for_each_hw_engine(hwe, gt, id) {
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 58ab3e414c87..cee89ebdafe9 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1088,7 +1088,25 @@ struct drm_xe_wait_user_fence {
  * in 'struct perf_event_attr' as part of perf_event_open syscall to read a
  * particular event.
  *
+ * For example to open the DRM_XE_PMU_TOTAL_ACTIVE_TICKS(0):
+ *
+ * .. code-block:: C
+ *
+ *     struct perf_event_attr attr;
+ *     long long count;
+ *     int cpu = 0;
+ *     int fd;
+ *
+ *     memset(&attr, 0, sizeof(struct perf_event_attr));
+ *     attr.type = type; // eg: /sys/bus/event_source/devices/xe_0000_03_00.0/type
+ *     attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
+ *     attr.use_clockid = 1;
+ *     attr.clockid = CLOCK_MONOTONIC;
+ *     attr.config = DRM_XE_PMU_TOTAL_ACTIVE_TICKS(0);
+ *
+ *     fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
  */
+
 enum drm_xe_pmu_engine_sample {
 	DRM_XE_PMU_SAMPLE_BUSY_TICKS = 0,
 };
@@ -1114,6 +1132,8 @@ enum drm_xe_pmu_engine_sample {
 #define DRM_XE_PMU_ENGINE_BUSY_TICKS(gt, class, instance) \
 	__DRM_XE_PMU_ENGINE(gt, class, instance, DRM_XE_PMU_SAMPLE_BUSY_TICKS)
 
+#define DRM_XE_PMU_TOTAL_ACTIVE_TICKS(gt)	__DRM_XE_PMU_OTHER(gt, 0)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.40.0


* [PATCH v3 08/10] RFC drm/xe/guc: Expose engine busyness only for supported GuC version
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (6 preceding siblings ...)
  2023-12-14 11:31 ` [PATCH v3 07/10] RFC drm/xe/guc: Add PMU counter for total active ticks Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 09/10] RFC drm/xe/guc: Dynamically enable/disable engine busyness stats Riana Tauro
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

GuC version number components are only 8 bits each, so pack them
into a 32-bit 8.8.8 value to allow version comparisons. Use the
compatibility version for this.

Engine busyness is supported only on GuC versions >= 70.11.1.
Allow enabling/reading engine busyness only on supported
GuC versions. Warn once if not supported.

v2: rebase
    fix guc comparison error (Matthew Brost)
    add a macro for guc version comparison

v3: do not show pmu counters if guc engine busyness
    is not supported

Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.c                  | 11 ++++++
 drivers/gpu/drm/xe/xe_gt.h                  |  1 +
 drivers/gpu/drm/xe/xe_guc_engine_busyness.c | 41 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_engine_busyness.h |  2 +-
 drivers/gpu/drm/xe/xe_pmu.c                 | 12 ++++--
 5 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 0e131b699a54..d946e51a3a06 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -818,3 +818,14 @@ u64 xe_gt_total_active_ticks(struct xe_gt *gt)
 {
 	return xe_guc_engine_busyness_active_ticks(&gt->uc.guc);
 }
+
+/**
+ * xe_gt_engine_busyness_supported - Checks support for engine busyness
+ * @gt: GT structure
+ *
+ * Returns true if engine busyness is supported, false otherwise.
+ */
+bool xe_gt_engine_busyness_supported(struct xe_gt *gt)
+{
+	return xe_guc_engine_busyness_supported(&gt->uc.guc);
+}
diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index 7e7828b12acd..41ed52c8b704 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -42,6 +42,7 @@ int xe_gt_resume(struct xe_gt *gt);
 void xe_gt_reset_async(struct xe_gt *gt);
 void xe_gt_sanitize(struct xe_gt *gt);
 
+bool xe_gt_engine_busyness_supported(struct xe_gt *gt);
 u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe);
 u64 xe_gt_total_active_ticks(struct xe_gt *gt);
 /**
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
index 4c24f51f2fa3..c40625f41ae5 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
@@ -32,6 +32,9 @@
  * engine busyness % = (ticks_engine / ticks_gt) * 100
  */
 
+/* GuC version number components are only 8-bit, so converting to a 32bit 8.8.8 */
+#define GUC_VER(maj, min, pat)	(((maj) << 16) | ((min) << 8) | (pat))
+
 static void guc_engine_busyness_usage_map(struct xe_guc *guc,
 					  struct xe_hw_engine *hwe,
 					  struct iosys_map *engine_map,
@@ -110,6 +113,10 @@ static void guc_engine_busyness_enable_stats(struct xe_guc *guc)
 	struct xe_device *xe = guc_to_xe(guc);
 	int ret;
 
+	/* Engine busyness supported only on GuC >= 70.11.1 */
+	if (!xe_guc_engine_busyness_supported(guc))
+		return;
+
 	ret = xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
 	if (ret)
 		drm_err(&xe->drm, "Failed to enable usage stats %pe", ERR_PTR(ret));
@@ -122,6 +129,28 @@ static void guc_engine_busyness_fini(struct drm_device *drm, void *arg)
 	xe_bo_unpin_map_no_vm(guc->busy.bo);
 }
 
+/*
+ * xe_guc_engine_busyness_supported - check if engine busyness is supported
+ * @guc: The GuC object
+ *
+ * Engine busyness is supported only on GuC >= 70.11.1
+ *
+ * Returns true if supported, false otherwise
+ */
+bool xe_guc_engine_busyness_supported(struct xe_guc *guc)
+{
+	struct xe_uc_fw *uc_fw = &guc->fw;
+	struct xe_uc_fw_version *version = &uc_fw->versions.found[XE_UC_FW_VER_COMPATIBILITY];
+
+	if (GUC_VER(version->major, version->minor, version->patch) >= GUC_VER(1, 3, 1))
+		return true;
+
+	drm_WARN_ONCE(&guc_to_xe(guc)->drm, 1,
+		      "Engine busyness not supported in this GuC version\n");
+
+	return false;
+}
+
 /*
  * xe_guc_engine_busyness_active_ticks - Gets the total active ticks
  * @guc: The GuC object
@@ -132,6 +161,10 @@ u64 xe_guc_engine_busyness_active_ticks(struct xe_guc *guc)
 {
 	u64 ticks_gt;
 
+	/* Engine busyness supported only on GuC >= 70.11.1 */
+	if (!xe_guc_engine_busyness_supported(guc))
+		return 0;
+
 	guc_engine_busyness_get_usage(guc, NULL, NULL, &ticks_gt);
 
 	return ticks_gt;
@@ -149,6 +182,10 @@ u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
 {
 	u64 ticks_engine;
 
+	/* Engine busyness supported only on GuC >= 70.11.1 */
+	if (!xe_guc_engine_busyness_supported(guc))
+		return 0;
+
 	guc_engine_busyness_get_usage(guc, hwe, &ticks_engine, NULL);
 
 	return ticks_engine;
@@ -172,6 +209,10 @@ int xe_guc_engine_busyness_init(struct xe_guc *guc)
 	u32 size;
 	int err;
 
+	/* Engine busyness supported only on GuC >= 70.11.1 */
+	if (!xe_guc_engine_busyness_supported(guc))
+		return 0;
+
 	/* Initialization already done */
 	if (guc->busy.bo)
 		return 0;
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
index 57325910ebc4..e3c74e0236af 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
@@ -14,5 +14,5 @@ struct xe_guc;
 int xe_guc_engine_busyness_init(struct xe_guc *guc);
 u64 xe_guc_engine_busyness_active_ticks(struct xe_guc *guc);
 u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe);
-
+bool xe_guc_engine_busyness_supported(struct xe_guc *guc);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index 74212d8c5434..9c8591d59b54 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -55,14 +55,15 @@ static int
 config_status(struct xe_device *xe, u64 config)
 {
 	unsigned int gt_id = config_gt_id(config);

 	if (gt_id >= XE_PMU_MAX_GT)
 		return -ENOENT;

-	if (config_counter(config) == DRM_XE_PMU_TOTAL_ACTIVE_TICKS(0))
-		return 0;
+	if (config_counter(config) == DRM_XE_PMU_TOTAL_ACTIVE_TICKS(0) &&
+	    !xe_gt_engine_busyness_supported(xe_device_get_gt(xe, gt_id)))
+		return -ENOENT;
 
-	return -ENOENT;
+	return 0;
 }
 
 static int engine_event_status(struct xe_hw_engine *hwe,
@@ -71,6 +73,10 @@ static int engine_event_status(struct xe_hw_engine *hwe,
 	if (!hwe)
 		return -ENODEV;
 
+	if (sample == DRM_XE_PMU_SAMPLE_BUSY_TICKS &&
+	    !xe_gt_engine_busyness_supported(hwe->gt))
+		return -ENOENT;
+
 	/* Other engine events will be added, XE_ENGINE_SAMPLE_COUNT will be changed */
 	return (sample >= DRM_XE_PMU_SAMPLE_BUSY_TICKS && sample < XE_ENGINE_SAMPLE_COUNT)
 		? 0 : -ENOENT;
-- 
2.40.0


* [PATCH v3 09/10] RFC drm/xe/guc: Dynamically enable/disable engine busyness stats
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (7 preceding siblings ...)
  2023-12-14 11:31 ` [PATCH v3 08/10] RFC drm/xe/guc: Expose engine busyness only for supported GuC version Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-14 11:31 ` [PATCH v3 10/10] RFC drm/xe/guc: Handle runtime suspend issues for engine busyness Riana Tauro
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

Dynamically enable/disable engine busyness stats using a GuC
action when the PMU interface is opened and closed, to avoid
a power penalty.

Co-developed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_engine_busyness.c | 96 ++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_guc_engine_busyness.h |  2 +
 drivers/gpu/drm/xe/xe_guc_types.h           | 14 +++
 drivers/gpu/drm/xe/xe_pmu.c                 | 32 +++++++
 4 files changed, 140 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
index c40625f41ae5..56e3378d856d 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
@@ -8,6 +8,7 @@
 
 #include "abi/guc_actions_abi.h"
 #include "xe_bo.h"
+#include "xe_device.h"
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
 
@@ -102,9 +103,9 @@ static void guc_engine_busyness_get_usage(struct xe_guc *guc,
 		*_ticks_gt = ticks_gt;
 }
 
-static void guc_engine_busyness_enable_stats(struct xe_guc *guc)
+static void guc_engine_busyness_action_usage_stats(struct xe_guc *guc, bool enable)
 {
-	u32 ggtt_addr = xe_bo_ggtt_addr(guc->busy.bo);
+	u32 ggtt_addr = enable ? xe_bo_ggtt_addr(guc->busy.bo) : 0;
 	u32 action[] = {
 		XE_GUC_ACTION_SET_DEVICE_ENGINE_UTILIZATION,
 		ggtt_addr,
@@ -122,6 +123,45 @@ static void guc_engine_busyness_enable_stats(struct xe_guc *guc)
 		drm_err(&xe->drm, "Failed to enable usage stats %pe", ERR_PTR(ret));
 }
 
+static void guc_engine_busyness_enable_stats(struct xe_guc *guc, bool enable)
+{
+	struct xe_device *xe = guc_to_xe(guc);
+	bool skip;
+
+	spin_lock(&guc->busy.enable_lock);
+	skip = enable == guc->busy.enabled;
+	if (!skip)
+		guc->busy.enabled = enable;
+	spin_unlock(&guc->busy.enable_lock);
+
+	if (skip)
+		return;
+
+	xe_device_mem_access_get(xe);
+	guc_engine_busyness_action_usage_stats(guc, enable);
+	xe_device_mem_access_put(xe);
+}
+
+static void guc_engine_busyness_toggle_stats(struct xe_guc *guc)
+{
+	if (!guc->submission_state.enabled)
+		return;
+
+	/* pmu_ref can increase before the worker thread runs this function */
+	if (guc->busy.pmu_ref >= 1)
+		guc_engine_busyness_enable_stats(guc, true);
+	else if (guc->busy.pmu_ref == 0)
+		guc_engine_busyness_enable_stats(guc, false);
+}
+
+static void guc_engine_buysness_worker_func(struct work_struct *w)
+{
+	struct xe_guc *guc = container_of(w, struct xe_guc,
+					  busy.enable_worker);
+
+	guc_engine_busyness_toggle_stats(guc);
+}
+
 static void guc_engine_busyness_fini(struct drm_device *drm, void *arg)
 {
 	struct xe_guc *guc = arg;
@@ -151,6 +191,52 @@ bool xe_guc_engine_busyness_supported(struct xe_guc *guc)
 	return false;
 }
 
+/*
+ * xe_guc_engine_busyness_pin - Dynamically enables engine busyness stats
+ * @guc: The GuC object
+ * @pmu_locked: boolean to indicate pmu event is started, locked by pmu spinlock
+ *
+ * Dynamically enables engine busyness stats, deferring to a worker thread
+ * when GuC submission is not yet enabled or when the PMU spinlock is held.
+ */
+void xe_guc_engine_busyness_pin(struct xe_guc *guc, bool pmu_locked)
+{
+	/* Engine busyness supported only on GuC >= 70.11.1 */
+	if (!xe_guc_engine_busyness_supported(guc))
+		return;
+
+	if (pmu_locked)
+		guc->busy.pmu_ref++;
+
+	if (!guc->submission_state.enabled || pmu_locked)
+		queue_work(system_unbound_wq, &guc->busy.enable_worker);
+	else
+		guc_engine_busyness_enable_stats(guc, true);
+}
+
+/*
+ * xe_guc_engine_busyness_unpin - Dynamically disables engine busyness stats
+ * @guc: The GuC object
+ * @pmu_locked: boolean to indicate pmu event is stopped, locked by pmu spinlock
+ *
+ * Dynamically disables engine busyness stats, deferring to a worker thread
+ * when GuC submission is not yet enabled or when the PMU spinlock is held.
+ */
+void xe_guc_engine_busyness_unpin(struct xe_guc *guc, bool pmu_locked)
+{
+	/* Engine busyness supported only on GuC >= 70.11.1 */
+	if (!xe_guc_engine_busyness_supported(guc))
+		return;
+
+	if (pmu_locked)
+		guc->busy.pmu_ref--;
+
+	if (!guc->submission_state.enabled || pmu_locked)
+		queue_work(system_unbound_wq, &guc->busy.enable_worker);
+	else
+		guc_engine_busyness_toggle_stats(guc);
+}
+
 /*
  * xe_guc_engine_busyness_active_ticks - Gets the total active ticks
  * @guc: The GuC object
@@ -227,9 +313,11 @@ int xe_guc_engine_busyness_init(struct xe_guc *guc)
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
+	spin_lock_init(&guc->busy.enable_lock);
+	INIT_WORK(&guc->busy.enable_worker, guc_engine_buysness_worker_func);
 	guc->busy.bo = bo;
-
-	guc_engine_busyness_enable_stats(guc);
+	guc->busy.enabled = false;
+	guc->busy.pmu_ref = 0;
 
 	err = drmm_add_action_or_reset(&xe->drm, guc_engine_busyness_fini, guc);
 	if (err)
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
index e3c74e0236af..008af1c0838a 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
@@ -15,4 +15,6 @@ int xe_guc_engine_busyness_init(struct xe_guc *guc);
 u64 xe_guc_engine_busyness_active_ticks(struct xe_guc *guc);
 u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe);
 bool xe_guc_engine_busyness_supported(struct xe_guc *guc);
+void xe_guc_engine_busyness_pin(struct xe_guc *guc, bool pmu_locked);
+void xe_guc_engine_busyness_unpin(struct xe_guc *guc, bool pmu_locked);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
index 4e9602301aed..cf87fe75490b 100644
--- a/drivers/gpu/drm/xe/xe_guc_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_types.h
@@ -74,6 +74,20 @@ struct xe_guc {
 	struct {
 		/** @bo: GGTT buffer object of engine busyness that is shared with GuC */
 		struct xe_bo *bo;
+		/** @enabled: state of engine stats */
+		bool enabled;
+		/** @enable_lock: for accessing @enabled */
+		spinlock_t enable_lock;
+		/**
+		 * @enable_worker: Async worker for enabling/disabling
+		 * busyness tracking from PMU
+		 */
+		struct work_struct enable_worker;
+		/**
+		 * @pmu_ref: how many outstanding PMU counters have
+		 * been requested, locked by PMU spinlock
+		 */
+		int pmu_ref;
 	} busy;
 
 	/**
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index 9c8591d59b54..5eeb904acfa2 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -9,6 +9,9 @@
 
 #include "xe_device.h"
 #include "xe_gt.h"
+#include "xe_gt_clock.h"
+#include "xe_guc_engine_busyness.h"
+#include "xe_mmio.h"
 
 #define XE_ENGINE_SAMPLE_COUNT (DRM_XE_PMU_SAMPLE_BUSY_TICKS + 1)
 
@@ -93,6 +96,8 @@ static int engine_event_init(struct perf_event *event)
 	hwe = xe_gt_hw_engine(gt, xe_hw_engine_from_user_class(engine_event_class(event)),
 			      engine_event_instance(event), true);
 
+	xe_guc_engine_busyness_pin(&gt->uc.guc, false);
+
 	return engine_event_status(hwe, engine_event_sample(event));
 }
 
@@ -204,6 +209,19 @@ static void xe_pmu_event_read(struct perf_event *event)
 
 static void xe_pmu_enable(struct perf_event *event)
 {
+	struct xe_device *xe =
+		container_of(event->pmu, typeof(*xe), pmu.base);
+	const int gt_id = config_gt_id(event->attr.config);
+	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
+	struct xe_pmu *pmu = &xe->pmu;
+	unsigned long flags;
+
+	if (is_engine_event(event) ||
+	    config_counter(event->attr.config) == DRM_XE_PMU_TOTAL_ACTIVE_TICKS(0)) {
+		spin_lock_irqsave(&pmu->lock, flags);
+		xe_guc_engine_busyness_pin(&gt->uc.guc, true);
+		spin_unlock_irqrestore(&pmu->lock, flags);
+	}
 	/*
 	 * Store the current counter value so we can report the correct delta
 	 * for all listeners. Even when the event was already enabled and has
@@ -227,9 +245,23 @@ static void xe_pmu_event_start(struct perf_event *event, int flags)
 
 static void xe_pmu_event_stop(struct perf_event *event, int flags)
 {
+	struct xe_device *xe =
+		container_of(event->pmu, typeof(*xe), pmu.base);
+	const int gt_id = config_gt_id(event->attr.config);
+	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
+	struct xe_pmu *pmu = &xe->pmu;
+	unsigned long irqflags;
+
 	if (flags & PERF_EF_UPDATE)
 		xe_pmu_event_read(event);
 
+	if (is_engine_event(event) ||
+	    config_counter(event->attr.config) == DRM_XE_PMU_TOTAL_ACTIVE_TICKS(0)) {
+		spin_lock_irqsave(&pmu->lock, irqflags);
+		xe_guc_engine_busyness_unpin(&gt->uc.guc, true);
+		spin_unlock_irqrestore(&pmu->lock, irqflags);
+	}
+
 	event->hw.state = PERF_HES_STOPPED;
 }
 
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3 10/10] RFC drm/xe/guc: Handle runtime suspend issues for engine busyness
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (8 preceding siblings ...)
  2023-12-14 11:31 ` [PATCH v3 09/10] RFC drm/xe/guc: Dynamically enable/disable engine busyness stats Riana Tauro
@ 2023-12-14 11:31 ` Riana Tauro
  2023-12-14 12:59 ` ✓ CI.Patch_applied: success for Engine Busyness (rev3) Patchwork
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-14 11:31 UTC (permalink / raw)
  To: intel-xe

1) During runtime suspend, when the card enters D3hot, values read
    from the shared memory maintained by GuC return 0xFF.
    Waking the device up for every perf read while it is runtime
    suspended incurs a power penalty.
    Store the last read busy ticks and total active ticks and return
    these cached values while suspended.

 2) When the device is runtime resumed, GuC is loaded again. If the PMU
    interface was opened to collect busyness events, the GuC stats have
    to be re-enabled so that collection resumes after suspend.
    Disable/enable the GuC stats across runtime suspend/resume whenever
    the PMU is open and already collecting busyness events.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.c                  |  4 ++
 drivers/gpu/drm/xe/xe_guc_engine_busyness.c | 52 ++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_guc_engine_busyness.h |  2 +
 drivers/gpu/drm/xe/xe_guc_types.h           |  5 ++
 drivers/gpu/drm/xe/xe_pmu.c                 | 10 ++++
 drivers/gpu/drm/xe/xe_pmu.h                 |  4 ++
 6 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index d946e51a3a06..21eb2ad5cc33 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -714,6 +714,8 @@ int xe_gt_suspend(struct xe_gt *gt)
 	if (err)
 		goto err_force_wake;
 
+	xe_pmu_suspend(gt);
+
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 	xe_device_mem_access_put(gt_to_xe(gt));
 	xe_gt_info(gt, "suspended\n");
@@ -742,6 +744,8 @@ int xe_gt_resume(struct xe_gt *gt)
 	if (err)
 		goto err_force_wake;
 
+	xe_pmu_resume(gt);
+
 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 	xe_device_mem_access_put(gt_to_xe(gt));
 	xe_gt_info(gt, "resumed\n");
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
index 56e3378d856d..8867f5743ea5 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
@@ -96,11 +96,15 @@ static void guc_engine_busyness_get_usage(struct xe_guc *guc,
 #undef read_engine_usage
 #undef read_global_field
 
-	if (_ticks_engine)
+	if (hwe && _ticks_engine) {
 		*_ticks_engine = ticks_engine;
+		guc->busy.prev_busy_ticks[hwe->class][hwe->logical_instance] = ticks_engine;
+	}
 
-	if (_ticks_gt)
+	if (_ticks_gt) {
 		*_ticks_gt = ticks_gt;
+		guc->busy.prev_gt_ticks = ticks_gt;
+	}
 }
 
 static void guc_engine_busyness_action_usage_stats(struct xe_guc *guc, bool enable)
@@ -237,6 +241,36 @@ void xe_guc_engine_busyness_unpin(struct xe_guc *guc, bool pmu_locked)
 		guc_engine_busyness_toggle_stats(guc);
 }
 
+/*
+ * xe_guc_engine_busyness_resume - Helper to resume engine busyness
+ * @guc: The GuC object
+ *
+ * Re-enable engine busyness stats if there were outstanding PMU events
+ * before suspend so that collection resumes. This is necessary because
+ * runtime suspend and system suspend share a common path that reloads
+ * GuC on resume.
+ */
+void xe_guc_engine_busyness_resume(struct xe_guc *guc)
+{
+	if (guc->busy.pmu_ref)
+		guc_engine_busyness_toggle_stats(guc);
+}
+
+/*
+ * xe_guc_engine_busyness_suspend - Helper to suspend engine busyness
+ * @guc: The GuC object
+ *
+ * Mark engine busyness stats as disabled if there are outstanding PMU
+ * events when the device is suspended. This is necessary because runtime
+ * suspend and system suspend share a common path that reloads GuC on
+ * resume.
+ */
+void xe_guc_engine_busyness_suspend(struct xe_guc *guc)
+{
+	if (guc->busy.pmu_ref)
+		guc->busy.enabled = false;
+}
+
 /*
  * xe_guc_engine_busyness_active_ticks - Gets the total active ticks
  * @guc: The GuC object
@@ -245,13 +279,20 @@ void xe_guc_engine_busyness_unpin(struct xe_guc *guc, bool pmu_locked)
  */
 u64 xe_guc_engine_busyness_active_ticks(struct xe_guc *guc)
 {
+	struct xe_device *xe = guc_to_xe(guc);
+	bool device_awake;
 	u64 ticks_gt;
 
 	/* Engine busyness supported only on GuC >= 70.11.1 */
 	if (!xe_guc_engine_busyness_supported(guc))
 		return 0;
 
+	device_awake = xe_device_mem_access_get_if_ongoing(xe);
+	if (!device_awake)
+		return guc->busy.prev_gt_ticks;
+
 	guc_engine_busyness_get_usage(guc, NULL, NULL, &ticks_gt);
+	xe_device_mem_access_put(xe);
 
 	return ticks_gt;
 }
@@ -266,13 +307,20 @@ u64 xe_guc_engine_busyness_active_ticks(struct xe_guc *guc)
  */
 u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
 {
+	struct xe_device *xe = guc_to_xe(guc);
+	bool device_awake;
 	u64 ticks_engine;
 
 	/* Engine busyness supported only on GuC >= 70.11.1 */
 	if (!xe_guc_engine_busyness_supported(guc))
 		return 0;
 
+	device_awake = xe_device_mem_access_get_if_ongoing(xe);
+	if (!device_awake)
+		return guc->busy.prev_busy_ticks[hwe->class][hwe->logical_instance];
+
 	guc_engine_busyness_get_usage(guc, hwe, &ticks_engine, NULL);
+	xe_device_mem_access_put(xe);
 
 	return ticks_engine;
 }
diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
index 008af1c0838a..b33692d77f7d 100644
--- a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
+++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
@@ -17,4 +17,6 @@ u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe);
 bool xe_guc_engine_busyness_supported(struct xe_guc *guc);
 void xe_guc_engine_busyness_pin(struct xe_guc *guc, bool pmu_locked);
 void xe_guc_engine_busyness_unpin(struct xe_guc *guc, bool pmu_locked);
+void xe_guc_engine_busyness_suspend(struct xe_guc *guc);
+void xe_guc_engine_busyness_resume(struct xe_guc *guc);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
index cf87fe75490b..4596a341f09a 100644
--- a/drivers/gpu/drm/xe/xe_guc_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_types.h
@@ -15,6 +15,7 @@
 #include "xe_guc_fwif.h"
 #include "xe_guc_log_types.h"
 #include "xe_guc_pc_types.h"
+#include "xe_hw_engine.h"
 #include "xe_uc_fw_types.h"
 
 /**
@@ -88,6 +89,10 @@ struct xe_guc {
 		 * been requested, locked by PMU spinlock
 		 */
 		int pmu_ref;
+		/** @prev_busy_ticks: array containing last stored busy ticks */
+		u64 prev_busy_ticks[XE_ENGINE_CLASS_MAX][XE_HW_ENGINE_MAX_INSTANCE];
+		/** @prev_gt_ticks: last stored gt ticks */
+		u64 prev_gt_ticks;
 	} busy;
 
 	/**
diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
index 5eeb904acfa2..5aa0fcee175a 100644
--- a/drivers/gpu/drm/xe/xe_pmu.c
+++ b/drivers/gpu/drm/xe/xe_pmu.c
@@ -548,6 +548,16 @@ static void xe_pmu_unregister_cpuhp_state(struct xe_pmu *pmu)
 	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
 }
 
+void xe_pmu_suspend(struct xe_gt *gt)
+{
+	xe_guc_engine_busyness_suspend(&gt->uc.guc);
+}
+
+void xe_pmu_resume(struct xe_gt *gt)
+{
+	xe_guc_engine_busyness_resume(&gt->uc.guc);
+}
+
 static void xe_pmu_unregister(struct drm_device *device, void *arg)
 {
 	struct xe_pmu *pmu = arg;
diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
index d6fca18466f4..568bcf250934 100644
--- a/drivers/gpu/drm/xe/xe_pmu.h
+++ b/drivers/gpu/drm/xe/xe_pmu.h
@@ -13,10 +13,14 @@
 int xe_pmu_init(void);
 void xe_pmu_exit(void);
 void xe_pmu_register(struct xe_pmu *pmu);
+void xe_pmu_suspend(struct xe_gt *gt);
+void xe_pmu_resume(struct xe_gt *gt);
 #else
 static inline int xe_pmu_init(void) { return 0; }
 static inline void xe_pmu_exit(void) {}
 static inline void xe_pmu_register(struct xe_pmu *pmu) {}
+static inline void xe_pmu_suspend(struct xe_gt *gt) {}
+static inline void xe_pmu_resume(struct xe_gt *gt) {}
 #endif
 
 #endif
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* ✓ CI.Patch_applied: success for Engine Busyness (rev3)
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (9 preceding siblings ...)
  2023-12-14 11:31 ` [PATCH v3 10/10] RFC drm/xe/guc: Handle runtime suspend issues for engine busyness Riana Tauro
@ 2023-12-14 12:59 ` Patchwork
  2023-12-14 12:59 ` ✗ CI.checkpatch: warning " Patchwork
  2023-12-14 13:00 ` ✗ CI.KUnit: failure " Patchwork
  12 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2023-12-14 12:59 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

== Series Details ==

Series: Engine Busyness (rev3)
URL   : https://patchwork.freedesktop.org/series/126919/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 5cd189336 drm/xe/xe2: Support flat ccs
=== git am output follows ===
.git/rebase-apply/patch:662: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Applying: drm/xe/pmu: Remove PMU from Xe till uapi is finalized
Applying: fixup! drm/xe/uapi: Reject bo creation of unaligned size
Applying: drm/xe: Move user engine class mappings to functions
Applying: RFC drm/xe/guc: Add interface for engine busyness ticks
Applying: RFC drm/xe/uapi: Add configs for Engine busyness
Applying: RFC drm/xe/pmu: Enable PMU interface and add engine busyness counter
Applying: RFC drm/xe/guc: Add PMU counter for total active ticks
Applying: RFC drm/xe/guc: Expose engine busyness only for supported GuC version
Applying: RFC drm/xe/guc: Dynamically enable/disable engine busyness stats
Applying: RFC drm/xe/guc: Handle runtime suspend issues for engine busyness



^ permalink raw reply	[flat|nested] 20+ messages in thread

* ✗ CI.checkpatch: warning for Engine Busyness (rev3)
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (10 preceding siblings ...)
  2023-12-14 12:59 ` ✓ CI.Patch_applied: success for Engine Busyness (rev3) Patchwork
@ 2023-12-14 12:59 ` Patchwork
  2023-12-14 13:00 ` ✗ CI.KUnit: failure " Patchwork
  12 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2023-12-14 12:59 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

== Series Details ==

Series: Engine Busyness (rev3)
URL   : https://patchwork.freedesktop.org/series/126919/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
6030b24c1386b00de8187b5fb987e283a57b372a
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit d0d03f749f168812909448c5249aa820f8c2dda3
Author: Riana Tauro <riana.tauro@intel.com>
Date:   Thu Dec 14 17:01:44 2023 +0530

    RFC drm/xe/guc: Handle runtime suspend issues for engine busyness
    
    1) During runtime suspend, when card enters D3hot, values read
        from the shared memory maintained by GuC returns 0xFF.
        Waking up for every perf read when
        device is runtime suspended causes power penality.
        Store the last read busy ticks and total active ticks and return
        these values when suspended
    
     2) When the device is runtime resumed, guc is loaded again. If pmu
        interface was opened to collect busyness events, the guc stats
        have to be re-enabled to resume collection after suspend.
        Disable/enable guc stats if pmu is opened and is already collecting
        busyness events and device gets runtime suspended/resumed.
    
    Signed-off-by: Riana Tauro <riana.tauro@intel.com>
+ /mt/dim checkpatch 5cd1893366708380854f4694ae57417192458a6b drm-intel
b8621025e drm/xe/pmu: Remove PMU from Xe till uapi is finalized
-:117: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#117: 
deleted file mode 100644

total: 0 errors, 1 warnings, 0 checks, 114 lines checked
71e8da255 fixup! drm/xe/uapi: Reject bo creation of unaligned size
-:7: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

total: 0 errors, 1 warnings, 0 checks, 8 lines checked
3ed2e2616 drm/xe: Move user engine class mappings to functions
723eca7ed RFC drm/xe/guc: Add interface for engine busyness ticks
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:111: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#111: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 275 lines checked
3c45b8729 RFC drm/xe/uapi: Add configs for Engine busyness
95abe0af1 RFC drm/xe/pmu: Enable PMU interface and add engine busyness counter
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:118: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#118: 
new file mode 100644

-:457: WARNING:LONG_LINE: line length of 116 exceeds 100 columns
#457: FILE: drivers/gpu/drm/xe/xe_pmu.c:335:
+						      __DRM_XE_PMU_ENGINE(j, xe_hw_engine_to_user_class(hwe->class),

total: 0 errors, 2 warnings, 0 checks, 664 lines checked
a847a8ef7 RFC drm/xe/guc: Add PMU counter for total active ticks
bfeb74340 RFC drm/xe/guc: Expose engine busyness only for supported GuC version
159c5db3f RFC drm/xe/guc: Dynamically enable/disable engine busyness stats
d0d03f749 RFC drm/xe/guc: Handle runtime suspend issues for engine busyness



^ permalink raw reply	[flat|nested] 20+ messages in thread

* ✗ CI.KUnit: failure for Engine Busyness (rev3)
  2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
                   ` (11 preceding siblings ...)
  2023-12-14 12:59 ` ✗ CI.checkpatch: warning " Patchwork
@ 2023-12-14 13:00 ` Patchwork
  12 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2023-12-14 13:00 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

== Series Details ==

Series: Engine Busyness (rev3)
URL   : https://patchwork.freedesktop.org/series/126919/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../drivers/gpu/drm/xe/xe_device.c: In function ‘xe_device_probe’:
../drivers/gpu/drm/xe/xe_device.c:534:21: error: ‘struct xe_device’ has no member named ‘pmu’
  534 |  xe_pmu_register(&xe->pmu);
      |                     ^~
make[7]: *** [../scripts/Makefile.build:243: drivers/gpu/drm/xe/xe_device.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:480: drivers/gpu/drm/xe] Error 2
make[6]: *** Waiting for unfinished jobs....
make[5]: *** [../scripts/Makefile.build:480: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:480: drivers/gpu] Error 2
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:480: drivers] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [/kernel/Makefile:1913: .] Error 2
make[1]: *** [/kernel/Makefile:234: __sub-make] Error 2
make: *** [Makefile:234: __sub-make] Error 2

[12:59:45] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[12:59:49] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 02/10] fixup! drm/xe/uapi: Reject bo creation of unaligned size
  2023-12-14 11:31 ` [PATCH v3 02/10] fixup! drm/xe/uapi: Reject bo creation of unaligned size Riana Tauro
@ 2023-12-14 13:31   ` Rodrigo Vivi
  2023-12-18  5:26     ` Riana Tauro
  0 siblings, 1 reply; 20+ messages in thread
From: Rodrigo Vivi @ 2023-12-14 13:31 UTC (permalink / raw)
  To: Riana Tauro; +Cc: intel-xe

On Thu, Dec 14, 2023 at 05:01:36PM +0530, Riana Tauro wrote:
> From: Ashutosh Dixit <ashutosh.dixit@intel.com>

We don't need this anymore since Ashutosh already put this as part of
the PMU removal patch anyway.

But also, please no more !fixup patches. We are about to do our rebase
to drm-next and will need stable hashes for stable 'Fixes:' tags
and commit mentions.

> 
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> ---
>  drivers/gpu/drm/xe/tests/xe_dma_buf.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> index bb6f6424e06f..9f6d571d7fa9 100644
> --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> @@ -3,6 +3,8 @@
>   * Copyright © 2022 Intel Corporation
>   */
>  
> +#include <drm/xe_drm.h>
> +
>  #include <kunit/test.h>
>  #include <kunit/visibility.h>
>  
> -- 
> 2.40.0
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 04/10] RFC drm/xe/guc: Add interface for engine busyness ticks
  2023-12-14 11:31 ` [PATCH v3 04/10] RFC drm/xe/guc: Add interface for engine busyness ticks Riana Tauro
@ 2023-12-15  2:48   ` Nilawar, Badal
  2023-12-18  6:07     ` Riana Tauro
  0 siblings, 1 reply; 20+ messages in thread
From: Nilawar, Badal @ 2023-12-15  2:48 UTC (permalink / raw)
  To: Riana Tauro, intel-xe



On 14-12-2023 17:01, Riana Tauro wrote:
> GuC provides engine busyness ticks as a 64 bit counter that counts
> in clock ticks. These counters are maintained in a
> shared memory buffer and updated on a continuous basis.
> 
> Add functions that initialize Engine busyness and get
> the current accumulated busyness.
> 
> Co-developed-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
> ---
>   drivers/gpu/drm/xe/Makefile                 |   1 +
>   drivers/gpu/drm/xe/abi/guc_actions_abi.h    |   1 +
>   drivers/gpu/drm/xe/xe_gt.c                  |  13 ++
>   drivers/gpu/drm/xe/xe_gt.h                  |   2 +
>   drivers/gpu/drm/xe/xe_guc.c                 |   7 +
>   drivers/gpu/drm/xe/xe_guc_engine_busyness.c | 153 ++++++++++++++++++++
>   drivers/gpu/drm/xe/xe_guc_engine_busyness.h |  17 +++
>   drivers/gpu/drm/xe/xe_guc_fwif.h            |  15 ++
>   drivers/gpu/drm/xe/xe_guc_types.h           |   6 +
>   9 files changed, 215 insertions(+)
>   create mode 100644 drivers/gpu/drm/xe/xe_guc_engine_busyness.c
>   create mode 100644 drivers/gpu/drm/xe/xe_guc_engine_busyness.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index b0cb6a9a390e..2523dc96986e 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -85,6 +85,7 @@ xe-y += xe_bb.o \
>   	xe_guc_ads.o \
>   	xe_guc_ct.o \
>   	xe_guc_debugfs.o \
> +	xe_guc_engine_busyness.o \
>   	xe_guc_hwconfig.o \
>   	xe_guc_log.o \
>   	xe_guc_pc.o \
> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> index 3062e0e0d467..d87681ca89bc 100644
> --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> @@ -139,6 +139,7 @@ enum xe_guc_action {
>   	XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
>   	XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
>   	XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> +	XE_GUC_ACTION_SET_DEVICE_ENGINE_UTILIZATION = 0x550C,
>   	XE_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR = 0x6000,
>   	XE_GUC_ACTION_REPORT_PAGE_FAULT_REQ_DESC = 0x6002,
>   	XE_GUC_ACTION_PAGE_FAULT_RES_DESC = 0x6003,
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index f5d18e98f8b6..9c84afb00f7b 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -32,6 +32,7 @@
>   #include "xe_gt_sysfs.h"
>   #include "xe_gt_tlb_invalidation.h"
>   #include "xe_gt_topology.h"
> +#include "xe_guc_engine_busyness.h"
>   #include "xe_guc_exec_queue_types.h"
>   #include "xe_guc_pc.h"
>   #include "xe_hw_fence.h"
> @@ -794,3 +795,15 @@ struct xe_hw_engine *xe_gt_any_hw_engine_by_reset_domain(struct xe_gt *gt,
>   
>   	return NULL;
>   }
> +
> +/**
> + * xe_gt_engine_busy_ticks - Return current accumulated engine busyness ticks
> + * @gt: GT structure
> + * @hwe: Xe HW engine to report on
> + *
> + * Returns accumulated ticks @hwe was busy since engine stats were enabled.
> + */
> +u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe)
> +{
> +	return xe_guc_engine_busyness_ticks(&gt->uc.guc, hwe);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> index f3c780bd266d..5b4309310126 100644
> --- a/drivers/gpu/drm/xe/xe_gt.h
> +++ b/drivers/gpu/drm/xe/xe_gt.h
> @@ -42,6 +42,8 @@ int xe_gt_resume(struct xe_gt *gt);
>   void xe_gt_reset_async(struct xe_gt *gt);
>   void xe_gt_sanitize(struct xe_gt *gt);
>   
> +u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe);
> +
>   /**
>    * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the
>    * first that matches the same reset domain as @class
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 482cb0df9f15..6116aaea936f 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -18,6 +18,7 @@
>   #include "xe_gt.h"
>   #include "xe_guc_ads.h"
>   #include "xe_guc_ct.h"
> +#include "xe_guc_engine_busyness.h"
>   #include "xe_guc_hwconfig.h"
>   #include "xe_guc_log.h"
>   #include "xe_guc_pc.h"
> @@ -306,9 +307,15 @@ int xe_guc_init_post_hwconfig(struct xe_guc *guc)
>   
>   int xe_guc_post_load_init(struct xe_guc *guc)
>   {
> +	int err;
> +
>   	xe_guc_ads_populate_post_load(&guc->ads);
>   	guc->submission_state.enabled = true;
>   
> +	err = xe_guc_engine_busyness_init(guc);
> +	if (err)
> +		return err;
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
> new file mode 100644
> index 000000000000..287429e31e6c
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
> @@ -0,0 +1,153 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2023 Intel Corporation
> + */
> +#include "xe_guc_engine_busyness.h"
> +
> +#include <drm/drm_managed.h>
> +
> +#include "abi/guc_actions_abi.h"
> +#include "xe_bo.h"
> +#include "xe_guc.h"
> +#include "xe_guc_ct.h"
> +
> +/**
> + * DOC: Xe GuC Engine Busyness
> + *
> + * GuC >= 70.11.1 maintains busyness counters in a shared memory buffer for each
> + * engine on a continuous basis. The counters are all 64 bits and count in clock
> + * ticks. The values are updated on context switch events and periodicaly on a
> + * timer internal to GuC. The update rate is guaranteed to be at least 2Hz (but with
> + * a caveat that is not real time, best effort only).
> + *
> + * engine busyness ticks (ticks_engine) : clock ticks for which engine was active
> + */
> +
> +static void guc_engine_busyness_usage_map(struct xe_guc *guc,
> +					  struct xe_hw_engine *hwe,
> +					  struct iosys_map *engine_map)
> +{
> +	struct iosys_map *map;
> +	size_t offset;
> +	u32 instance;
> +	u8 guc_class;
> +
> +	guc_class = xe_engine_class_to_guc_class(hwe->class);
> +	instance = hwe->logical_instance;
> +
> +	map = &guc->busy.bo->vmap;
> +
> +	offset = offsetof(struct guc_engine_observation_data,
> +			  engine_data[guc_class][instance]);
> +
> +	*engine_map = IOSYS_MAP_INIT_OFFSET(map, offset);
> +}
> +
> +static void guc_engine_busyness_get_usage(struct xe_guc *guc,
> +					  struct xe_hw_engine *hwe,
> +					  u64 *_ticks_engine)
> +{
> +	struct iosys_map engine_map;
> +	u64 ticks_engine = 0;
> +	int i = 0;
> +
> +	guc_engine_busyness_usage_map(guc, hwe, &engine_map);
> +
> +#define read_engine_usage(map_, field_) \
> +	iosys_map_rd_field(map_, 0, struct guc_engine_data, field_)
> +
> +	do {
> +		ticks_engine = read_engine_usage(&engine_map, total_execution_ticks);
> +
> +		if (read_engine_usage(&engine_map, total_execution_ticks) == ticks_engine)
> +			break;
> +	} while (++i < 6);
> +
> +#undef read_engine_usage
> +
> +	if (_ticks_engine)
> +		*_ticks_engine = ticks_engine;
> +}
> +
> +static void guc_engine_busyness_enable_stats(struct xe_guc *guc)
> +{
> +	u32 ggtt_addr = xe_bo_ggtt_addr(guc->busy.bo);
> +	u32 action[] = {
> +		XE_GUC_ACTION_SET_DEVICE_ENGINE_UTILIZATION,
> +		ggtt_addr,
> +		0,
> +	};
> +	struct xe_device *xe = guc_to_xe(guc);
> +	int ret;
> +
> +	ret = xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> +	if (ret)
> +		drm_err(&xe->drm, "Failed to enable usage stats %pe", ERR_PTR(ret));
> +}
> +
> +static void guc_engine_busyness_fini(struct drm_device *drm, void *arg)
> +{
> +	struct xe_guc *guc = arg;
> +
> +	xe_bo_unpin_map_no_vm(guc->busy.bo);
> +}
> +
> +/*
> + * xe_guc_engine_busyness_ticks - Gets current accumulated
> + *				  engine busyness ticks
> + * @guc: The GuC object
> + * @hwe: Xe HW Engine
> + *
> + * Returns the current accumulated ticks @hwe was busy while engine stats are enabled.
> + */
> +u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
> +{
> +	u64 ticks_engine;
> +
> +	guc_engine_busyness_get_usage(guc, hwe, &ticks_engine);
> +
> +	return ticks_engine;
> +}
> +
> +/*
> + * xe_guc_engine_busyness_init - Initializes the GuC Engine Busyness
> + * @guc: The GuC object
> + *
> + * Initialize GuC engine busyness, only called once during driver load
> + * Supported only on GuC >= 70.11.1
> + *
> + * Return: 0 on success, negative error code on error.
> + */
> +int xe_guc_engine_busyness_init(struct xe_guc *guc)
> +{
> +	struct xe_device *xe = guc_to_xe(guc);
> +	struct xe_gt *gt = guc_to_gt(guc);
> +	struct xe_tile *tile = gt_to_tile(gt);
> +	struct xe_bo *bo;
> +	u32 size;
> +	int err;
> +
How about adding the GuC version check here and in other applicable
places? I see that patch 8 handles the version check, but how about
moving the version check function into this patch instead? That would
also align the code with the doc.

Regards,
Badal

> +	/* Initialization already done */
> +	if (guc->busy.bo)
> +		return 0;
> +
> +	size = PAGE_ALIGN(sizeof(struct guc_engine_observation_data));
> +
> +	bo = xe_bo_create_pin_map(xe, tile, NULL, size,
> +				  ttm_bo_type_kernel,
> +				  XE_BO_CREATE_VRAM_IF_DGFX(tile) |
> +				  XE_BO_CREATE_GGTT_BIT);
> +
> +	if (IS_ERR(bo))
> +		return PTR_ERR(bo);
> +
> +	guc->busy.bo = bo;
> +
> +	guc_engine_busyness_enable_stats(guc);
> +
> +	err = drmm_add_action_or_reset(&xe->drm, guc_engine_busyness_fini, guc);
> +	if (err)
> +		return err;
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
> new file mode 100644
> index 000000000000..d70f06209896
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2023 Intel Corporation
> + */
> +
> +#ifndef _XE_GUC_ENGINE_BUSYNESS_H_
> +#define _XE_GUC_ENGINE_BUSYNESS_H_
> +
> +#include <linux/types.h>
> +
> +struct xe_hw_engine;
> +struct xe_guc;
> +
> +int xe_guc_engine_busyness_init(struct xe_guc *guc);
> +u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe);
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
> index 4dd5a88a7826..c8ca5fe97614 100644
> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
> @@ -37,6 +37,7 @@
>   #define GUC_COMPUTE_CLASS		4
>   #define GUC_GSC_OTHER_CLASS		5
>   #define GUC_LAST_ENGINE_CLASS		GUC_GSC_OTHER_CLASS
> +#define GUC_MAX_OAG_COUNTERS		8
>   #define GUC_MAX_ENGINE_CLASSES		16
>   #define GUC_MAX_INSTANCES_PER_CLASS	32
>   
> @@ -222,6 +223,20 @@ struct guc_engine_usage {
>   	struct guc_engine_usage_record engines[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
>   } __packed;
>   
> +/* Engine busyness stats */
> +struct guc_engine_data {
> +	u64 total_execution_ticks;
> +	u64 reserved;
> +} __packed;
> +
> +struct guc_engine_observation_data {
> +	struct guc_engine_data engine_data[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
> +	u64 oag_busy_data[GUC_MAX_OAG_COUNTERS];
> +	u64 total_active_ticks;
> +	u64 gt_timestamp;
> +	u64 reserved1;
> +} __packed;
> +
>   /* This action will be programmed in C1BC - SOFT_SCRATCH_15_REG */
>   enum xe_guc_recv_message {
>   	XE_GUC_RECV_MSG_CRASH_DUMP_POSTED = BIT(1),
> diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
> index cd80802e8918..4e9602301aed 100644
> --- a/drivers/gpu/drm/xe/xe_guc_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_types.h
> @@ -70,6 +70,12 @@ struct xe_guc {
>   		u32 size;
>   	} hwconfig;
>   
> +	/** @busy: Engine busyness */
> +	struct {
> +		/** @bo: GGTT buffer object of engine busyness that is shared with GuC */
> +		struct xe_bo *bo;
> +	} busy;
> +
>   	/**
>   	 * @notify_reg: Register which is written to notify GuC of H2G messages
>   	 */

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 01/10] drm/xe/pmu: Remove PMU from Xe till uapi is finalized
  2023-12-14 11:31 ` [PATCH v3 01/10] drm/xe/pmu: Remove PMU from Xe till uapi is finalized Riana Tauro
@ 2023-12-15  3:51   ` Aravind Iddamsetty
  2023-12-18  5:30     ` Riana Tauro
  0 siblings, 1 reply; 20+ messages in thread
From: Aravind Iddamsetty @ 2023-12-15  3:51 UTC (permalink / raw)
  To: Riana Tauro, intel-xe


On 12/14/23 17:01, Riana Tauro wrote:

Hi Riana,

If you are adding back the PMU infra in a later patch, then it would be
better to just remove the group engine events and drop this patch.

Thanks,
Aravind.
> From: Ashutosh Dixit <ashutosh.dixit@intel.com>
>
> PMU uapi is likely to change in the future. Till the uapi is finalized,
> remove PMU from Xe. PMU can be re-added after uapi is finalized.
>
> v2: Include xe_drm.h in xe/tests/xe_dma_buf.c (Francois)
>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Acked-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile          |   2 -
>  drivers/gpu/drm/xe/regs/xe_gt_regs.h |   5 -
>  drivers/gpu/drm/xe/xe_device.c       |   2 -
>  drivers/gpu/drm/xe/xe_device_types.h |   4 -
>  drivers/gpu/drm/xe/xe_gt.c           |   2 -
>  drivers/gpu/drm/xe/xe_module.c       |   5 -
>  drivers/gpu/drm/xe/xe_pmu.c          | 645 ---------------------------
>  drivers/gpu/drm/xe/xe_pmu.h          |  25 --
>  drivers/gpu/drm/xe/xe_pmu_types.h    |  68 ---
>  include/uapi/drm/xe_drm.h            |  40 --
>  10 files changed, 798 deletions(-)
>  delete mode 100644 drivers/gpu/drm/xe/xe_pmu.c
>  delete mode 100644 drivers/gpu/drm/xe/xe_pmu.h
>  delete mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index f4ae063a7005..b0cb6a9a390e 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -267,8 +267,6 @@ xe-$(CONFIG_DRM_XE_DISPLAY) += \
>  	i915-display/skl_universal_plane.o \
>  	i915-display/skl_watermark.o
>  
> -xe-$(CONFIG_PERF_EVENTS) += xe_pmu.o
> -
>  ifeq ($(CONFIG_ACPI),y)
>  	xe-$(CONFIG_DRM_XE_DISPLAY) += \
>  		i915-display/intel_acpi.o \
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index f5bf4c6d1761..3c3977c388f5 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -333,11 +333,6 @@
>  #define   INVALIDATION_BROADCAST_MODE_DIS	REG_BIT(12)
>  #define   GLOBAL_INVALIDATION_MODE		REG_BIT(2)
>  
> -#define XE_OAG_RC0_ANY_ENGINE_BUSY_FREE		XE_REG(0xdb80)
> -#define XE_OAG_ANY_MEDIA_FF_BUSY_FREE		XE_REG(0xdba0)
> -#define XE_OAG_BLT_BUSY_FREE			XE_REG(0xdbbc)
> -#define XE_OAG_RENDER_BUSY_FREE			XE_REG(0xdbdc)
> -
>  #define HALF_SLICE_CHICKEN5			XE_REG_MCR(0xe188, XE_REG_OPTION_MASKED)
>  #define   DISABLE_SAMPLE_G_PERFORMANCE		REG_BIT(0)
>  
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 221e87584352..d9ae77fe7382 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -529,8 +529,6 @@ int xe_device_probe(struct xe_device *xe)
>  
>  	xe_debugfs_register(xe);
>  
> -	xe_pmu_register(&xe->pmu);
> -
>  	xe_hwmon_register(xe);
>  
>  	err = drmm_add_action_or_reset(&xe->drm, xe_device_sanitize, xe);
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index d1a48456e9a3..c45ef17b3473 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -18,7 +18,6 @@
>  #include "xe_lmtt_types.h"
>  #include "xe_platform_types.h"
>  #include "xe_pt_types.h"
> -#include "xe_pmu.h"
>  #include "xe_sriov_types.h"
>  #include "xe_step_types.h"
>  
> @@ -427,9 +426,6 @@ struct xe_device {
>  	 */
>  	struct task_struct *pm_callback_task;
>  
> -	/** @pmu: performance monitoring unit */
> -	struct xe_pmu pmu;
> -
>  	/** @hwmon: hwmon subsystem integration */
>  	struct xe_hwmon *hwmon;
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index dfd9cf01a5d5..f5d18e98f8b6 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -709,8 +709,6 @@ int xe_gt_suspend(struct xe_gt *gt)
>  	if (err)
>  		goto err_msg;
>  
> -	xe_pmu_suspend(gt);
> -
>  	err = xe_uc_suspend(&gt->uc);
>  	if (err)
>  		goto err_force_wake;
> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> index 51bf69b7ab22..110b69864656 100644
> --- a/drivers/gpu/drm/xe/xe_module.c
> +++ b/drivers/gpu/drm/xe/xe_module.c
> @@ -11,7 +11,6 @@
>  #include "xe_drv.h"
>  #include "xe_hw_fence.h"
>  #include "xe_pci.h"
> -#include "xe_pmu.h"
>  #include "xe_sched_job.h"
>  
>  struct xe_modparam xe_modparam = {
> @@ -63,10 +62,6 @@ static const struct init_funcs init_funcs[] = {
>  		.init = xe_sched_job_module_init,
>  		.exit = xe_sched_job_module_exit,
>  	},
> -	{
> -		.init = xe_pmu_init,
> -		.exit = xe_pmu_exit,
> -	},
>  	{
>  		.init = xe_register_pci_driver,
>  		.exit = xe_unregister_pci_driver,
> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
> deleted file mode 100644
> index 9d0b7887cfc4..000000000000
> --- a/drivers/gpu/drm/xe/xe_pmu.c
> +++ /dev/null
> @@ -1,645 +0,0 @@
> -// SPDX-License-Identifier: MIT
> -/*
> - * Copyright © 2023 Intel Corporation
> - */
> -
> -#include <drm/drm_drv.h>
> -#include <drm/drm_managed.h>
> -#include <drm/xe_drm.h>
> -
> -#include "regs/xe_gt_regs.h"
> -#include "xe_device.h"
> -#include "xe_gt_clock.h"
> -#include "xe_mmio.h"
> -
> -static cpumask_t xe_pmu_cpumask;
> -static unsigned int xe_pmu_target_cpu = -1;
> -
> -static unsigned int config_gt_id(const u64 config)
> -{
> -	return config >> __DRM_XE_PMU_GT_SHIFT;
> -}
> -
> -static u64 config_counter(const u64 config)
> -{
> -	return config & ~(~0ULL << __DRM_XE_PMU_GT_SHIFT);
> -}
> -
> -static void xe_pmu_event_destroy(struct perf_event *event)
> -{
> -	struct xe_device *xe =
> -		container_of(event->pmu, typeof(*xe), pmu.base);
> -
> -	drm_WARN_ON(&xe->drm, event->parent);
> -
> -	drm_dev_put(&xe->drm);
> -}
> -
> -static u64 __engine_group_busyness_read(struct xe_gt *gt, int sample_type)
> -{
> -	u64 val;
> -
> -	switch (sample_type) {
> -	case __XE_SAMPLE_RENDER_GROUP_BUSY:
> -		val = xe_mmio_read32(gt, XE_OAG_RENDER_BUSY_FREE);
> -		break;
> -	case __XE_SAMPLE_COPY_GROUP_BUSY:
> -		val = xe_mmio_read32(gt, XE_OAG_BLT_BUSY_FREE);
> -		break;
> -	case __XE_SAMPLE_MEDIA_GROUP_BUSY:
> -		val = xe_mmio_read32(gt, XE_OAG_ANY_MEDIA_FF_BUSY_FREE);
> -		break;
> -	case __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY:
> -		val = xe_mmio_read32(gt, XE_OAG_RC0_ANY_ENGINE_BUSY_FREE);
> -		break;
> -	default:
> -		drm_warn(&gt->tile->xe->drm, "unknown pmu event\n");
> -	}
> -
> -	return xe_gt_clock_cycles_to_ns(gt, val * 16);
> -}
> -
> -static u64 engine_group_busyness_read(struct xe_gt *gt, u64 config)
> -{
> -	int sample_type = config_counter(config);
> -	const unsigned int gt_id = gt->info.id;
> -	struct xe_device *xe = gt->tile->xe;
> -	struct xe_pmu *pmu = &xe->pmu;
> -	unsigned long flags;
> -	bool device_awake;
> -	u64 val;
> -
> -	device_awake = xe_device_mem_access_get_if_ongoing(xe);
> -	if (device_awake) {
> -		XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT));
> -		val = __engine_group_busyness_read(gt, sample_type);
> -		XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
> -		xe_device_mem_access_put(xe);
> -	}
> -
> -	spin_lock_irqsave(&pmu->lock, flags);
> -
> -	if (device_awake)
> -		pmu->sample[gt_id][sample_type] = val;
> -	else
> -		val = pmu->sample[gt_id][sample_type];
> -
> -	spin_unlock_irqrestore(&pmu->lock, flags);
> -
> -	return val;
> -}
> -
> -static void engine_group_busyness_store(struct xe_gt *gt)
> -{
> -	struct xe_pmu *pmu = &gt->tile->xe->pmu;
> -	unsigned int gt_id = gt->info.id;
> -	unsigned long flags;
> -	int i;
> -
> -	spin_lock_irqsave(&pmu->lock, flags);
> -
> -	for (i = __XE_SAMPLE_RENDER_GROUP_BUSY; i <= __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY; i++)
> -		pmu->sample[gt_id][i] = __engine_group_busyness_read(gt, i);
> -
> -	spin_unlock_irqrestore(&pmu->lock, flags);
> -}
> -
> -static int
> -config_status(struct xe_device *xe, u64 config)
> -{
> -	unsigned int gt_id = config_gt_id(config);
> -	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
> -
> -	if (gt_id >= XE_PMU_MAX_GT)
> -		return -ENOENT;
> -
> -	switch (config_counter(config)) {
> -	case DRM_XE_PMU_RENDER_GROUP_BUSY(0):
> -	case DRM_XE_PMU_COPY_GROUP_BUSY(0):
> -	case DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(0):
> -		if (gt->info.type == XE_GT_TYPE_MEDIA)
> -			return -ENOENT;
> -		break;
> -	case DRM_XE_PMU_MEDIA_GROUP_BUSY(0):
> -		if (!(gt->info.engine_mask & (BIT(XE_HW_ENGINE_VCS0) | BIT(XE_HW_ENGINE_VECS0))))
> -			return -ENOENT;
> -		break;
> -	default:
> -		return -ENOENT;
> -	}
> -
> -	return 0;
> -}
> -
> -static int xe_pmu_event_init(struct perf_event *event)
> -{
> -	struct xe_device *xe =
> -		container_of(event->pmu, typeof(*xe), pmu.base);
> -	struct xe_pmu *pmu = &xe->pmu;
> -	int ret;
> -
> -	if (pmu->closed)
> -		return -ENODEV;
> -
> -	if (event->attr.type != event->pmu->type)
> -		return -ENOENT;
> -
> -	/* unsupported modes and filters */
> -	if (event->attr.sample_period) /* no sampling */
> -		return -EINVAL;
> -
> -	if (has_branch_stack(event))
> -		return -EOPNOTSUPP;
> -
> -	if (event->cpu < 0)
> -		return -EINVAL;
> -
> -	/* only allow running on one cpu at a time */
> -	if (!cpumask_test_cpu(event->cpu, &xe_pmu_cpumask))
> -		return -EINVAL;
> -
> -	ret = config_status(xe, event->attr.config);
> -	if (ret)
> -		return ret;
> -
> -	if (!event->parent) {
> -		drm_dev_get(&xe->drm);
> -		event->destroy = xe_pmu_event_destroy;
> -	}
> -
> -	return 0;
> -}
> -
> -static u64 __xe_pmu_event_read(struct perf_event *event)
> -{
> -	struct xe_device *xe =
> -		container_of(event->pmu, typeof(*xe), pmu.base);
> -	const unsigned int gt_id = config_gt_id(event->attr.config);
> -	const u64 config = event->attr.config;
> -	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
> -	u64 val;
> -
> -	switch (config_counter(config)) {
> -	case DRM_XE_PMU_RENDER_GROUP_BUSY(0):
> -	case DRM_XE_PMU_COPY_GROUP_BUSY(0):
> -	case DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(0):
> -	case DRM_XE_PMU_MEDIA_GROUP_BUSY(0):
> -		val = engine_group_busyness_read(gt, config);
> -		break;
> -	default:
> -		drm_warn(&gt->tile->xe->drm, "unknown pmu event\n");
> -	}
> -
> -	return val;
> -}
> -
> -static void xe_pmu_event_read(struct perf_event *event)
> -{
> -	struct xe_device *xe =
> -		container_of(event->pmu, typeof(*xe), pmu.base);
> -	struct hw_perf_event *hwc = &event->hw;
> -	struct xe_pmu *pmu = &xe->pmu;
> -	u64 prev, new;
> -
> -	if (pmu->closed) {
> -		event->hw.state = PERF_HES_STOPPED;
> -		return;
> -	}
> -again:
> -	prev = local64_read(&hwc->prev_count);
> -	new = __xe_pmu_event_read(event);
> -
> -	if (local64_cmpxchg(&hwc->prev_count, prev, new) != prev)
> -		goto again;
> -
> -	local64_add(new - prev, &event->count);
> -}
> -
> -static void xe_pmu_enable(struct perf_event *event)
> -{
> -	/*
> -	 * Store the current counter value so we can report the correct delta
> -	 * for all listeners. Even when the event was already enabled and has
> -	 * an existing non-zero value.
> -	 */
> -	local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
> -}
> -
> -static void xe_pmu_event_start(struct perf_event *event, int flags)
> -{
> -	struct xe_device *xe =
> -		container_of(event->pmu, typeof(*xe), pmu.base);
> -	struct xe_pmu *pmu = &xe->pmu;
> -
> -	if (pmu->closed)
> -		return;
> -
> -	xe_pmu_enable(event);
> -	event->hw.state = 0;
> -}
> -
> -static void xe_pmu_event_stop(struct perf_event *event, int flags)
> -{
> -	if (flags & PERF_EF_UPDATE)
> -		xe_pmu_event_read(event);
> -
> -	event->hw.state = PERF_HES_STOPPED;
> -}
> -
> -static int xe_pmu_event_add(struct perf_event *event, int flags)
> -{
> -	struct xe_device *xe =
> -		container_of(event->pmu, typeof(*xe), pmu.base);
> -	struct xe_pmu *pmu = &xe->pmu;
> -
> -	if (pmu->closed)
> -		return -ENODEV;
> -
> -	if (flags & PERF_EF_START)
> -		xe_pmu_event_start(event, flags);
> -
> -	return 0;
> -}
> -
> -static void xe_pmu_event_del(struct perf_event *event, int flags)
> -{
> -	xe_pmu_event_stop(event, PERF_EF_UPDATE);
> -}
> -
> -static int xe_pmu_event_event_idx(struct perf_event *event)
> -{
> -	return 0;
> -}
> -
> -struct xe_ext_attribute {
> -	struct device_attribute attr;
> -	unsigned long val;
> -};
> -
> -static ssize_t xe_pmu_event_show(struct device *dev,
> -				 struct device_attribute *attr, char *buf)
> -{
> -	struct xe_ext_attribute *eattr;
> -
> -	eattr = container_of(attr, struct xe_ext_attribute, attr);
> -	return sprintf(buf, "config=0x%lx\n", eattr->val);
> -}
> -
> -static ssize_t cpumask_show(struct device *dev,
> -			    struct device_attribute *attr, char *buf)
> -{
> -	return cpumap_print_to_pagebuf(true, buf, &xe_pmu_cpumask);
> -}
> -
> -static DEVICE_ATTR_RO(cpumask);
> -
> -static struct attribute *xe_cpumask_attrs[] = {
> -	&dev_attr_cpumask.attr,
> -	NULL,
> -};
> -
> -static const struct attribute_group xe_pmu_cpumask_attr_group = {
> -	.attrs = xe_cpumask_attrs,
> -};
> -
> -#define __event(__counter, __name, __unit) \
> -{ \
> -	.counter = (__counter), \
> -	.name = (__name), \
> -	.unit = (__unit), \
> -	.global = false, \
> -}
> -
> -#define __global_event(__counter, __name, __unit) \
> -{ \
> -	.counter = (__counter), \
> -	.name = (__name), \
> -	.unit = (__unit), \
> -	.global = true, \
> -}
> -
> -static struct xe_ext_attribute *
> -add_xe_attr(struct xe_ext_attribute *attr, const char *name, u64 config)
> -{
> -	sysfs_attr_init(&attr->attr.attr);
> -	attr->attr.attr.name = name;
> -	attr->attr.attr.mode = 0444;
> -	attr->attr.show = xe_pmu_event_show;
> -	attr->val = config;
> -
> -	return ++attr;
> -}
> -
> -static struct perf_pmu_events_attr *
> -add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
> -	     const char *str)
> -{
> -	sysfs_attr_init(&attr->attr.attr);
> -	attr->attr.attr.name = name;
> -	attr->attr.attr.mode = 0444;
> -	attr->attr.show = perf_event_sysfs_show;
> -	attr->event_str = str;
> -
> -	return ++attr;
> -}
> -
> -static struct attribute **
> -create_event_attributes(struct xe_pmu *pmu)
> -{
> -	struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> -	static const struct {
> -		unsigned int counter;
> -		const char *name;
> -		const char *unit;
> -		bool global;
> -	} events[] = {
> -		__event(0, "render-group-busy", "ns"),
> -		__event(1, "copy-group-busy", "ns"),
> -		__event(2, "media-group-busy", "ns"),
> -		__event(3, "any-engine-group-busy", "ns"),
> -	};
> -
> -	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
> -	struct xe_ext_attribute *xe_attr = NULL, *xe_iter;
> -	struct attribute **attr = NULL, **attr_iter;
> -	unsigned int count = 0;
> -	unsigned int i, j;
> -	struct xe_gt *gt;
> -
> -	/* Count how many counters we will be exposing. */
> -	for_each_gt(gt, xe, j) {
> -		for (i = 0; i < ARRAY_SIZE(events); i++) {
> -			u64 config = ___DRM_XE_PMU_OTHER(j, events[i].counter);
> -
> -			if (!config_status(xe, config))
> -				count++;
> -		}
> -	}
> -
> -	/* Allocate attribute objects and table. */
> -	xe_attr = kcalloc(count, sizeof(*xe_attr), GFP_KERNEL);
> -	if (!xe_attr)
> -		goto err_alloc;
> -
> -	pmu_attr = kcalloc(count, sizeof(*pmu_attr), GFP_KERNEL);
> -	if (!pmu_attr)
> -		goto err_alloc;
> -
> -	/* Max one pointer of each attribute type plus a termination entry. */
> -	attr = kcalloc(count * 2 + 1, sizeof(*attr), GFP_KERNEL);
> -	if (!attr)
> -		goto err_alloc;
> -
> -	xe_iter = xe_attr;
> -	pmu_iter = pmu_attr;
> -	attr_iter = attr;
> -
> -	for_each_gt(gt, xe, j) {
> -		for (i = 0; i < ARRAY_SIZE(events); i++) {
> -			u64 config = ___DRM_XE_PMU_OTHER(j, events[i].counter);
> -			char *str;
> -
> -			if (config_status(xe, config))
> -				continue;
> -
> -			if (events[i].global)
> -				str = kstrdup(events[i].name, GFP_KERNEL);
> -			else
> -				str = kasprintf(GFP_KERNEL, "%s-gt%u",
> -						events[i].name, j);
> -			if (!str)
> -				goto err;
> -
> -			*attr_iter++ = &xe_iter->attr.attr;
> -			xe_iter = add_xe_attr(xe_iter, str, config);
> -
> -			if (events[i].unit) {
> -				if (events[i].global)
> -					str = kasprintf(GFP_KERNEL, "%s.unit",
> -							events[i].name);
> -				else
> -					str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
> -							events[i].name, j);
> -				if (!str)
> -					goto err;
> -
> -				*attr_iter++ = &pmu_iter->attr.attr;
> -				pmu_iter = add_pmu_attr(pmu_iter, str,
> -							events[i].unit);
> -			}
> -		}
> -	}
> -
> -	pmu->xe_attr = xe_attr;
> -	pmu->pmu_attr = pmu_attr;
> -
> -	return attr;
> -
> -err:
> -	for (attr_iter = attr; *attr_iter; attr_iter++)
> -		kfree((*attr_iter)->name);
> -
> -err_alloc:
> -	kfree(attr);
> -	kfree(xe_attr);
> -	kfree(pmu_attr);
> -
> -	return NULL;
> -}
> -
> -static void free_event_attributes(struct xe_pmu *pmu)
> -{
> -	struct attribute **attr_iter = pmu->events_attr_group.attrs;
> -
> -	for (; *attr_iter; attr_iter++)
> -		kfree((*attr_iter)->name);
> -
> -	kfree(pmu->events_attr_group.attrs);
> -	kfree(pmu->xe_attr);
> -	kfree(pmu->pmu_attr);
> -
> -	pmu->events_attr_group.attrs = NULL;
> -	pmu->xe_attr = NULL;
> -	pmu->pmu_attr = NULL;
> -}
> -
> -static int xe_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> -{
> -	struct xe_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
> -
> -	/* Select the first online CPU as a designated reader. */
> -	if (cpumask_empty(&xe_pmu_cpumask))
> -		cpumask_set_cpu(cpu, &xe_pmu_cpumask);
> -
> -	return 0;
> -}
> -
> -static int xe_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
> -{
> -	struct xe_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
> -	unsigned int target = xe_pmu_target_cpu;
> -
> -	/*
> -	 * Unregistering an instance generates a CPU offline event which we must
> -	 * ignore to avoid incorrectly modifying the shared xe_pmu_cpumask.
> -	 */
> -	if (pmu->closed)
> -		return 0;
> -
> -	if (cpumask_test_and_clear_cpu(cpu, &xe_pmu_cpumask)) {
> -		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
> -
> -		/* Migrate events if there is a valid target */
> -		if (target < nr_cpu_ids) {
> -			cpumask_set_cpu(target, &xe_pmu_cpumask);
> -			xe_pmu_target_cpu = target;
> -		}
> -	}
> -
> -	if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
> -		perf_pmu_migrate_context(&pmu->base, cpu, target);
> -		pmu->cpuhp.cpu = target;
> -	}
> -
> -	return 0;
> -}
> -
> -static enum cpuhp_state cpuhp_slot = CPUHP_INVALID;
> -
> -int xe_pmu_init(void)
> -{
> -	int ret;
> -
> -	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
> -				      "perf/x86/intel/xe:online",
> -				      xe_pmu_cpu_online,
> -				      xe_pmu_cpu_offline);
> -	if (ret < 0)
> -		pr_notice("Failed to setup cpuhp state for xe PMU! (%d)\n",
> -			  ret);
> -	else
> -		cpuhp_slot = ret;
> -
> -	return 0;
> -}
> -
> -void xe_pmu_exit(void)
> -{
> -	if (cpuhp_slot != CPUHP_INVALID)
> -		cpuhp_remove_multi_state(cpuhp_slot);
> -}
> -
> -static int xe_pmu_register_cpuhp_state(struct xe_pmu *pmu)
> -{
> -	if (cpuhp_slot == CPUHP_INVALID)
> -		return -EINVAL;
> -
> -	return cpuhp_state_add_instance(cpuhp_slot, &pmu->cpuhp.node);
> -}
> -
> -static void xe_pmu_unregister_cpuhp_state(struct xe_pmu *pmu)
> -{
> -	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
> -}
> -
> -void xe_pmu_suspend(struct xe_gt *gt)
> -{
> -	engine_group_busyness_store(gt);
> -}
> -
> -static void xe_pmu_unregister(struct drm_device *device, void *arg)
> -{
> -	struct xe_pmu *pmu = arg;
> -
> -	if (!pmu->base.event_init)
> -		return;
> -
> -	/*
> -	 * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
> -	 * ensures all currently executing ones will have exited before we
> -	 * proceed with unregistration.
> -	 */
> -	pmu->closed = true;
> -	synchronize_rcu();
> -
> -	xe_pmu_unregister_cpuhp_state(pmu);
> -
> -	perf_pmu_unregister(&pmu->base);
> -	pmu->base.event_init = NULL;
> -	kfree(pmu->base.attr_groups);
> -	kfree(pmu->name);
> -	free_event_attributes(pmu);
> -}
> -
> -void xe_pmu_register(struct xe_pmu *pmu)
> -{
> -	struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
> -	const struct attribute_group *attr_groups[] = {
> -		&pmu->events_attr_group,
> -		&xe_pmu_cpumask_attr_group,
> -		NULL
> -	};
> -
> -	int ret = -ENOMEM;
> -
> -	spin_lock_init(&pmu->lock);
> -	pmu->cpuhp.cpu = -1;
> -
> -	pmu->name = kasprintf(GFP_KERNEL,
> -			      "xe_%s",
> -			      dev_name(xe->drm.dev));
> -	if (pmu->name)
> -		/* tools/perf reserves colons as special. */
> -		strreplace((char *)pmu->name, ':', '_');
> -
> -	if (!pmu->name)
> -		goto err;
> -
> -	pmu->events_attr_group.name = "events";
> -	pmu->events_attr_group.attrs = create_event_attributes(pmu);
> -	if (!pmu->events_attr_group.attrs)
> -		goto err_name;
> -
> -	pmu->base.attr_groups = kmemdup(attr_groups, sizeof(attr_groups),
> -					GFP_KERNEL);
> -	if (!pmu->base.attr_groups)
> -		goto err_attr;
> -
> -	pmu->base.module	= THIS_MODULE;
> -	pmu->base.task_ctx_nr	= perf_invalid_context;
> -	pmu->base.event_init	= xe_pmu_event_init;
> -	pmu->base.add		= xe_pmu_event_add;
> -	pmu->base.del		= xe_pmu_event_del;
> -	pmu->base.start		= xe_pmu_event_start;
> -	pmu->base.stop		= xe_pmu_event_stop;
> -	pmu->base.read		= xe_pmu_event_read;
> -	pmu->base.event_idx	= xe_pmu_event_event_idx;
> -
> -	ret = perf_pmu_register(&pmu->base, pmu->name, -1);
> -	if (ret)
> -		goto err_groups;
> -
> -	ret = xe_pmu_register_cpuhp_state(pmu);
> -	if (ret)
> -		goto err_unreg;
> -
> -	ret = drmm_add_action_or_reset(&xe->drm, xe_pmu_unregister, pmu);
> -	if (ret)
> -		goto err_cpuhp;
> -
> -	return;
> -
> -err_cpuhp:
> -	xe_pmu_unregister_cpuhp_state(pmu);
> -err_unreg:
> -	perf_pmu_unregister(&pmu->base);
> -err_groups:
> -	kfree(pmu->base.attr_groups);
> -err_attr:
> -	pmu->base.event_init = NULL;
> -	free_event_attributes(pmu);
> -err_name:
> -	kfree(pmu->name);
> -err:
> -	drm_notice(&xe->drm, "Failed to register PMU!\n");
> -}
> diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
> deleted file mode 100644
> index a99d4ddd023e..000000000000
> --- a/drivers/gpu/drm/xe/xe_pmu.h
> +++ /dev/null
> @@ -1,25 +0,0 @@
> -/* SPDX-License-Identifier: MIT */
> -/*
> - * Copyright © 2023 Intel Corporation
> - */
> -
> -#ifndef _XE_PMU_H_
> -#define _XE_PMU_H_
> -
> -#include "xe_gt_types.h"
> -#include "xe_pmu_types.h"
> -
> -#if IS_ENABLED(CONFIG_PERF_EVENTS)
> -int xe_pmu_init(void);
> -void xe_pmu_exit(void);
> -void xe_pmu_register(struct xe_pmu *pmu);
> -void xe_pmu_suspend(struct xe_gt *gt);
> -#else
> -static inline int xe_pmu_init(void) { return 0; }
> -static inline void xe_pmu_exit(void) {}
> -static inline void xe_pmu_register(struct xe_pmu *pmu) {}
> -static inline void xe_pmu_suspend(struct xe_gt *gt) {}
> -#endif
> -
> -#endif
> -
> diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
> deleted file mode 100644
> index 9cadbd243f57..000000000000
> --- a/drivers/gpu/drm/xe/xe_pmu_types.h
> +++ /dev/null
> @@ -1,68 +0,0 @@
> -/* SPDX-License-Identifier: MIT */
> -/*
> - * Copyright © 2023 Intel Corporation
> - */
> -
> -#ifndef _XE_PMU_TYPES_H_
> -#define _XE_PMU_TYPES_H_
> -
> -#include <linux/perf_event.h>
> -#include <linux/spinlock_types.h>
> -#include <uapi/drm/xe_drm.h>
> -
> -enum {
> -	__XE_SAMPLE_RENDER_GROUP_BUSY,
> -	__XE_SAMPLE_COPY_GROUP_BUSY,
> -	__XE_SAMPLE_MEDIA_GROUP_BUSY,
> -	__XE_SAMPLE_ANY_ENGINE_GROUP_BUSY,
> -	__XE_NUM_PMU_SAMPLERS
> -};
> -
> -#define XE_PMU_MAX_GT 2
> -
> -struct xe_pmu {
> -	/**
> -	 * @cpuhp: Struct used for CPU hotplug handling.
> -	 */
> -	struct {
> -		struct hlist_node node;
> -		unsigned int cpu;
> -	} cpuhp;
> -	/**
> -	 * @base: PMU base.
> -	 */
> -	struct pmu base;
> -	/**
> -	 * @closed: xe is unregistering.
> -	 */
> -	bool closed;
> -	/**
> -	 * @name: Name as registered with perf core.
> -	 */
> -	const char *name;
> -	/**
> -	 * @lock: Lock protecting enable mask and ref count handling.
> -	 */
> -	spinlock_t lock;
> -	/**
> -	 * @sample: Current and previous (raw) counters.
> -	 *
> -	 * These counters are updated when the device is awake.
> -	 *
> -	 */
> -	u64 sample[XE_PMU_MAX_GT][__XE_NUM_PMU_SAMPLERS];
> -	/**
> -	 * @events_attr_group: Device events attribute group.
> -	 */
> -	struct attribute_group events_attr_group;
> -	/**
> -	 * @xe_attr: Memory block holding device attributes.
> -	 */
> -	void *xe_attr;
> -	/**
> -	 * @pmu_attr: Memory block holding device attributes.
> -	 */
> -	void *pmu_attr;
> -};
> -
> -#endif
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 0895e4d2a981..5ba412007270 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -1080,46 +1080,6 @@ struct drm_xe_wait_user_fence {
>  	/** @reserved: Reserved */
>  	__u64 reserved[2];
>  };
> -
> -/**
> - * DOC: XE PMU event config IDs
> - *
> - * Check 'man perf_event_open' to use the ID's DRM_XE_PMU_XXXX listed in xe_drm.h
> - * in 'struct perf_event_attr' as part of perf_event_open syscall to read a
> - * particular event.
> - *
> - * For example to open the DRMXE_PMU_RENDER_GROUP_BUSY(0):
> - *
> - * .. code-block:: C
> - *
> - *	struct perf_event_attr attr;
> - *	long long count;
> - *	int cpu = 0;
> - *	int fd;
> - *
> - *	memset(&attr, 0, sizeof(struct perf_event_attr));
> - *	attr.type = type; // eg: /sys/bus/event_source/devices/xe_0000_56_00.0/type
> - *	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
> - *	attr.use_clockid = 1;
> - *	attr.clockid = CLOCK_MONOTONIC;
> - *	attr.config = DRM_XE_PMU_RENDER_GROUP_BUSY(0);
> - *
> - *	fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
> - */
> -
> -/*
> - * Top bits of every counter are GT id.
> - */
> -#define __DRM_XE_PMU_GT_SHIFT (56)
> -
> -#define ___DRM_XE_PMU_OTHER(gt, x) \
> -	(((__u64)(x)) | ((__u64)(gt) << __DRM_XE_PMU_GT_SHIFT))
> -
> -#define DRM_XE_PMU_RENDER_GROUP_BUSY(gt)	___DRM_XE_PMU_OTHER(gt, 0)
> -#define DRM_XE_PMU_COPY_GROUP_BUSY(gt)		___DRM_XE_PMU_OTHER(gt, 1)
> -#define DRM_XE_PMU_MEDIA_GROUP_BUSY(gt)		___DRM_XE_PMU_OTHER(gt, 2)
> -#define DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(gt)	___DRM_XE_PMU_OTHER(gt, 3)
> -
>  #if defined(__cplusplus)
>  }
>  #endif


* Re: [PATCH v3 02/10] fixup! drm/xe/uapi: Reject bo creation of unaligned size
  2023-12-14 13:31   ` Rodrigo Vivi
@ 2023-12-18  5:26     ` Riana Tauro
  0 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-18  5:26 UTC (permalink / raw)
  To: Vivi, Rodrigo; +Cc: intel-xe



On 12/14/2023 7:01 PM, Vivi, Rodrigo wrote:
> On Thu, Dec 14, 2023 at 05:01:36PM +0530, Riana Tauro wrote:
>> From: Ashutosh Dixit <ashutosh.dixit@intel.com>
> 
> We don't need this anymore since Ashutosh already put this as part of
> the PMU removal patch anyway.
> 
> But also, please no more !fixup patches. We are about to do our rebase
> to drm-next and will need stable hashes for stable 'Fixes:' tags
> and commit mentions.
Hi Rodrigo

I added this patch only for compilation, as it's going to be merged.
I wanted to rebase the PMU patches on top of the removal series.

Thanks
Riana

> 
>>
>> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> ---
>>   drivers/gpu/drm/xe/tests/xe_dma_buf.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
>> index bb6f6424e06f..9f6d571d7fa9 100644
>> --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
>> +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
>> @@ -3,6 +3,8 @@
>>    * Copyright © 2022 Intel Corporation
>>    */
>>   
>> +#include <drm/xe_drm.h>
>> +
>>   #include <kunit/test.h>
>>   #include <kunit/visibility.h>
>>   
>> -- 
>> 2.40.0
>>


* Re: [PATCH v3 01/10] drm/xe/pmu: Remove PMU from Xe till uapi is finalized
  2023-12-15  3:51   ` Aravind Iddamsetty
@ 2023-12-18  5:30     ` Riana Tauro
  0 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-18  5:30 UTC (permalink / raw)
  To: Aravind Iddamsetty, intel-xe



On 12/15/2023 9:21 AM, Aravind Iddamsetty wrote:
> 
> On 12/14/23 17:01, Riana Tauro wrote:
> 
> Hi Riana,
> 
> If you are adding back the PMU infra in a later patch, then it would be
> better to just remove the group engine events and drop this patch.
> 
> Thanks,
> Aravind.
Hi Aravind

I added this patch only for compilation, as
https://patchwork.freedesktop.org/series/127664/ is in the process
of being pushed.

I wanted to rebase the patches on top of the removal patch.

Thanks
Riana
>> From: Ashutosh Dixit <ashutosh.dixit@intel.com>
>>
>> PMU uapi is likely to change in the future. Till the uapi is finalized,
>> remove PMU from Xe. PMU can be re-added after uapi is finalized.
>>
>> v2: Include xe_drm.h in xe/tests/xe_dma_buf.c (Francois)
>>
>> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> Acked-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
>> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>>   drivers/gpu/drm/xe/Makefile          |   2 -
>>   drivers/gpu/drm/xe/regs/xe_gt_regs.h |   5 -
>>   drivers/gpu/drm/xe/xe_device.c       |   2 -
>>   drivers/gpu/drm/xe/xe_device_types.h |   4 -
>>   drivers/gpu/drm/xe/xe_gt.c           |   2 -
>>   drivers/gpu/drm/xe/xe_module.c       |   5 -
>>   drivers/gpu/drm/xe/xe_pmu.c          | 645 ---------------------------
>>   drivers/gpu/drm/xe/xe_pmu.h          |  25 --
>>   drivers/gpu/drm/xe/xe_pmu_types.h    |  68 ---
>>   include/uapi/drm/xe_drm.h            |  40 --
>>   10 files changed, 798 deletions(-)
>>   delete mode 100644 drivers/gpu/drm/xe/xe_pmu.c
>>   delete mode 100644 drivers/gpu/drm/xe/xe_pmu.h
>>   delete mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index f4ae063a7005..b0cb6a9a390e 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -267,8 +267,6 @@ xe-$(CONFIG_DRM_XE_DISPLAY) += \
>>   	i915-display/skl_universal_plane.o \
>>   	i915-display/skl_watermark.o
>>   
>> -xe-$(CONFIG_PERF_EVENTS) += xe_pmu.o
>> -
>>   ifeq ($(CONFIG_ACPI),y)
>>   	xe-$(CONFIG_DRM_XE_DISPLAY) += \
>>   		i915-display/intel_acpi.o \
>> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> index f5bf4c6d1761..3c3977c388f5 100644
>> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>> @@ -333,11 +333,6 @@
>>   #define   INVALIDATION_BROADCAST_MODE_DIS	REG_BIT(12)
>>   #define   GLOBAL_INVALIDATION_MODE		REG_BIT(2)
>>   
>> -#define XE_OAG_RC0_ANY_ENGINE_BUSY_FREE		XE_REG(0xdb80)
>> -#define XE_OAG_ANY_MEDIA_FF_BUSY_FREE		XE_REG(0xdba0)
>> -#define XE_OAG_BLT_BUSY_FREE			XE_REG(0xdbbc)
>> -#define XE_OAG_RENDER_BUSY_FREE			XE_REG(0xdbdc)
>> -
>>   #define HALF_SLICE_CHICKEN5			XE_REG_MCR(0xe188, XE_REG_OPTION_MASKED)
>>   #define   DISABLE_SAMPLE_G_PERFORMANCE		REG_BIT(0)
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index 221e87584352..d9ae77fe7382 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -529,8 +529,6 @@ int xe_device_probe(struct xe_device *xe)
>>   
>>   	xe_debugfs_register(xe);
>>   
>> -	xe_pmu_register(&xe->pmu);
>> -
>>   	xe_hwmon_register(xe);
>>   
>>   	err = drmm_add_action_or_reset(&xe->drm, xe_device_sanitize, xe);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index d1a48456e9a3..c45ef17b3473 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -18,7 +18,6 @@
>>   #include "xe_lmtt_types.h"
>>   #include "xe_platform_types.h"
>>   #include "xe_pt_types.h"
>> -#include "xe_pmu.h"
>>   #include "xe_sriov_types.h"
>>   #include "xe_step_types.h"
>>   
>> @@ -427,9 +426,6 @@ struct xe_device {
>>   	 */
>>   	struct task_struct *pm_callback_task;
>>   
>> -	/** @pmu: performance monitoring unit */
>> -	struct xe_pmu pmu;
>> -
>>   	/** @hwmon: hwmon subsystem integration */
>>   	struct xe_hwmon *hwmon;
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>> index dfd9cf01a5d5..f5d18e98f8b6 100644
>> --- a/drivers/gpu/drm/xe/xe_gt.c
>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>> @@ -709,8 +709,6 @@ int xe_gt_suspend(struct xe_gt *gt)
>>   	if (err)
>>   		goto err_msg;
>>   
>> -	xe_pmu_suspend(gt);
>> -
>>   	err = xe_uc_suspend(&gt->uc);
>>   	if (err)
>>   		goto err_force_wake;
>> diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
>> index 51bf69b7ab22..110b69864656 100644
>> --- a/drivers/gpu/drm/xe/xe_module.c
>> +++ b/drivers/gpu/drm/xe/xe_module.c
>> @@ -11,7 +11,6 @@
>>   #include "xe_drv.h"
>>   #include "xe_hw_fence.h"
>>   #include "xe_pci.h"
>> -#include "xe_pmu.h"
>>   #include "xe_sched_job.h"
>>   
>>   struct xe_modparam xe_modparam = {
>> @@ -63,10 +62,6 @@ static const struct init_funcs init_funcs[] = {
>>   		.init = xe_sched_job_module_init,
>>   		.exit = xe_sched_job_module_exit,
>>   	},
>> -	{
>> -		.init = xe_pmu_init,
>> -		.exit = xe_pmu_exit,
>> -	},
>>   	{
>>   		.init = xe_register_pci_driver,
>>   		.exit = xe_unregister_pci_driver,
>> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
>> deleted file mode 100644
>> index 9d0b7887cfc4..000000000000
>> --- a/drivers/gpu/drm/xe/xe_pmu.c
>> +++ /dev/null
>> @@ -1,645 +0,0 @@
>> -// SPDX-License-Identifier: MIT
>> -/*
>> - * Copyright © 2023 Intel Corporation
>> - */
>> -
>> -#include <drm/drm_drv.h>
>> -#include <drm/drm_managed.h>
>> -#include <drm/xe_drm.h>
>> -
>> -#include "regs/xe_gt_regs.h"
>> -#include "xe_device.h"
>> -#include "xe_gt_clock.h"
>> -#include "xe_mmio.h"
>> -
>> -static cpumask_t xe_pmu_cpumask;
>> -static unsigned int xe_pmu_target_cpu = -1;
>> -
>> -static unsigned int config_gt_id(const u64 config)
>> -{
>> -	return config >> __DRM_XE_PMU_GT_SHIFT;
>> -}
>> -
>> -static u64 config_counter(const u64 config)
>> -{
>> -	return config & ~(~0ULL << __DRM_XE_PMU_GT_SHIFT);
>> -}
>> -
>> -static void xe_pmu_event_destroy(struct perf_event *event)
>> -{
>> -	struct xe_device *xe =
>> -		container_of(event->pmu, typeof(*xe), pmu.base);
>> -
>> -	drm_WARN_ON(&xe->drm, event->parent);
>> -
>> -	drm_dev_put(&xe->drm);
>> -}
>> -
>> -static u64 __engine_group_busyness_read(struct xe_gt *gt, int sample_type)
>> -{
>> -	u64 val;
>> -
>> -	switch (sample_type) {
>> -	case __XE_SAMPLE_RENDER_GROUP_BUSY:
>> -		val = xe_mmio_read32(gt, XE_OAG_RENDER_BUSY_FREE);
>> -		break;
>> -	case __XE_SAMPLE_COPY_GROUP_BUSY:
>> -		val = xe_mmio_read32(gt, XE_OAG_BLT_BUSY_FREE);
>> -		break;
>> -	case __XE_SAMPLE_MEDIA_GROUP_BUSY:
>> -		val = xe_mmio_read32(gt, XE_OAG_ANY_MEDIA_FF_BUSY_FREE);
>> -		break;
>> -	case __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY:
>> -		val = xe_mmio_read32(gt, XE_OAG_RC0_ANY_ENGINE_BUSY_FREE);
>> -		break;
>> -	default:
>> -		drm_warn(&gt->tile->xe->drm, "unknown pmu event\n");
>> -	}
>> -
>> -	return xe_gt_clock_cycles_to_ns(gt, val * 16);
>> -}
>> -
>> -static u64 engine_group_busyness_read(struct xe_gt *gt, u64 config)
>> -{
>> -	int sample_type = config_counter(config);
>> -	const unsigned int gt_id = gt->info.id;
>> -	struct xe_device *xe = gt->tile->xe;
>> -	struct xe_pmu *pmu = &xe->pmu;
>> -	unsigned long flags;
>> -	bool device_awake;
>> -	u64 val;
>> -
>> -	device_awake = xe_device_mem_access_get_if_ongoing(xe);
>> -	if (device_awake) {
>> -		XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT));
>> -		val = __engine_group_busyness_read(gt, sample_type);
>> -		XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>> -		xe_device_mem_access_put(xe);
>> -	}
>> -
>> -	spin_lock_irqsave(&pmu->lock, flags);
>> -
>> -	if (device_awake)
>> -		pmu->sample[gt_id][sample_type] = val;
>> -	else
>> -		val = pmu->sample[gt_id][sample_type];
>> -
>> -	spin_unlock_irqrestore(&pmu->lock, flags);
>> -
>> -	return val;
>> -}
>> -
>> -static void engine_group_busyness_store(struct xe_gt *gt)
>> -{
>> -	struct xe_pmu *pmu = &gt->tile->xe->pmu;
>> -	unsigned int gt_id = gt->info.id;
>> -	unsigned long flags;
>> -	int i;
>> -
>> -	spin_lock_irqsave(&pmu->lock, flags);
>> -
>> -	for (i = __XE_SAMPLE_RENDER_GROUP_BUSY; i <= __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY; i++)
>> -		pmu->sample[gt_id][i] = __engine_group_busyness_read(gt, i);
>> -
>> -	spin_unlock_irqrestore(&pmu->lock, flags);
>> -}
>> -
>> -static int
>> -config_status(struct xe_device *xe, u64 config)
>> -{
>> -	unsigned int gt_id = config_gt_id(config);
>> -	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
>> -
>> -	if (gt_id >= XE_PMU_MAX_GT)
>> -		return -ENOENT;
>> -
>> -	switch (config_counter(config)) {
>> -	case DRM_XE_PMU_RENDER_GROUP_BUSY(0):
>> -	case DRM_XE_PMU_COPY_GROUP_BUSY(0):
>> -	case DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(0):
>> -		if (gt->info.type == XE_GT_TYPE_MEDIA)
>> -			return -ENOENT;
>> -		break;
>> -	case DRM_XE_PMU_MEDIA_GROUP_BUSY(0):
>> -		if (!(gt->info.engine_mask & (BIT(XE_HW_ENGINE_VCS0) | BIT(XE_HW_ENGINE_VECS0))))
>> -			return -ENOENT;
>> -		break;
>> -	default:
>> -		return -ENOENT;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>> -static int xe_pmu_event_init(struct perf_event *event)
>> -{
>> -	struct xe_device *xe =
>> -		container_of(event->pmu, typeof(*xe), pmu.base);
>> -	struct xe_pmu *pmu = &xe->pmu;
>> -	int ret;
>> -
>> -	if (pmu->closed)
>> -		return -ENODEV;
>> -
>> -	if (event->attr.type != event->pmu->type)
>> -		return -ENOENT;
>> -
>> -	/* unsupported modes and filters */
>> -	if (event->attr.sample_period) /* no sampling */
>> -		return -EINVAL;
>> -
>> -	if (has_branch_stack(event))
>> -		return -EOPNOTSUPP;
>> -
>> -	if (event->cpu < 0)
>> -		return -EINVAL;
>> -
>> -	/* only allow running on one cpu at a time */
>> -	if (!cpumask_test_cpu(event->cpu, &xe_pmu_cpumask))
>> -		return -EINVAL;
>> -
>> -	ret = config_status(xe, event->attr.config);
>> -	if (ret)
>> -		return ret;
>> -
>> -	if (!event->parent) {
>> -		drm_dev_get(&xe->drm);
>> -		event->destroy = xe_pmu_event_destroy;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>> -static u64 __xe_pmu_event_read(struct perf_event *event)
>> -{
>> -	struct xe_device *xe =
>> -		container_of(event->pmu, typeof(*xe), pmu.base);
>> -	const unsigned int gt_id = config_gt_id(event->attr.config);
>> -	const u64 config = event->attr.config;
>> -	struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
>> -	u64 val;
>> -
>> -	switch (config_counter(config)) {
>> -	case DRM_XE_PMU_RENDER_GROUP_BUSY(0):
>> -	case DRM_XE_PMU_COPY_GROUP_BUSY(0):
>> -	case DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(0):
>> -	case DRM_XE_PMU_MEDIA_GROUP_BUSY(0):
>> -		val = engine_group_busyness_read(gt, config);
>> -		break;
>> -	default:
>> -		drm_warn(&gt->tile->xe->drm, "unknown pmu event\n");
>> -	}
>> -
>> -	return val;
>> -}
>> -
>> -static void xe_pmu_event_read(struct perf_event *event)
>> -{
>> -	struct xe_device *xe =
>> -		container_of(event->pmu, typeof(*xe), pmu.base);
>> -	struct hw_perf_event *hwc = &event->hw;
>> -	struct xe_pmu *pmu = &xe->pmu;
>> -	u64 prev, new;
>> -
>> -	if (pmu->closed) {
>> -		event->hw.state = PERF_HES_STOPPED;
>> -		return;
>> -	}
>> -again:
>> -	prev = local64_read(&hwc->prev_count);
>> -	new = __xe_pmu_event_read(event);
>> -
>> -	if (local64_cmpxchg(&hwc->prev_count, prev, new) != prev)
>> -		goto again;
>> -
>> -	local64_add(new - prev, &event->count);
>> -}
>> -
>> -static void xe_pmu_enable(struct perf_event *event)
>> -{
>> -	/*
>> -	 * Store the current counter value so we can report the correct delta
>> -	 * for all listeners. Even when the event was already enabled and has
>> -	 * an existing non-zero value.
>> -	 */
>> -	local64_set(&event->hw.prev_count, __xe_pmu_event_read(event));
>> -}
>> -
>> -static void xe_pmu_event_start(struct perf_event *event, int flags)
>> -{
>> -	struct xe_device *xe =
>> -		container_of(event->pmu, typeof(*xe), pmu.base);
>> -	struct xe_pmu *pmu = &xe->pmu;
>> -
>> -	if (pmu->closed)
>> -		return;
>> -
>> -	xe_pmu_enable(event);
>> -	event->hw.state = 0;
>> -}
>> -
>> -static void xe_pmu_event_stop(struct perf_event *event, int flags)
>> -{
>> -	if (flags & PERF_EF_UPDATE)
>> -		xe_pmu_event_read(event);
>> -
>> -	event->hw.state = PERF_HES_STOPPED;
>> -}
>> -
>> -static int xe_pmu_event_add(struct perf_event *event, int flags)
>> -{
>> -	struct xe_device *xe =
>> -		container_of(event->pmu, typeof(*xe), pmu.base);
>> -	struct xe_pmu *pmu = &xe->pmu;
>> -
>> -	if (pmu->closed)
>> -		return -ENODEV;
>> -
>> -	if (flags & PERF_EF_START)
>> -		xe_pmu_event_start(event, flags);
>> -
>> -	return 0;
>> -}
>> -
>> -static void xe_pmu_event_del(struct perf_event *event, int flags)
>> -{
>> -	xe_pmu_event_stop(event, PERF_EF_UPDATE);
>> -}
>> -
>> -static int xe_pmu_event_event_idx(struct perf_event *event)
>> -{
>> -	return 0;
>> -}
>> -
>> -struct xe_ext_attribute {
>> -	struct device_attribute attr;
>> -	unsigned long val;
>> -};
>> -
>> -static ssize_t xe_pmu_event_show(struct device *dev,
>> -				 struct device_attribute *attr, char *buf)
>> -{
>> -	struct xe_ext_attribute *eattr;
>> -
>> -	eattr = container_of(attr, struct xe_ext_attribute, attr);
>> -	return sprintf(buf, "config=0x%lx\n", eattr->val);
>> -}
>> -
>> -static ssize_t cpumask_show(struct device *dev,
>> -			    struct device_attribute *attr, char *buf)
>> -{
>> -	return cpumap_print_to_pagebuf(true, buf, &xe_pmu_cpumask);
>> -}
>> -
>> -static DEVICE_ATTR_RO(cpumask);
>> -
>> -static struct attribute *xe_cpumask_attrs[] = {
>> -	&dev_attr_cpumask.attr,
>> -	NULL,
>> -};
>> -
>> -static const struct attribute_group xe_pmu_cpumask_attr_group = {
>> -	.attrs = xe_cpumask_attrs,
>> -};
>> -
>> -#define __event(__counter, __name, __unit) \
>> -{ \
>> -	.counter = (__counter), \
>> -	.name = (__name), \
>> -	.unit = (__unit), \
>> -	.global = false, \
>> -}
>> -
>> -#define __global_event(__counter, __name, __unit) \
>> -{ \
>> -	.counter = (__counter), \
>> -	.name = (__name), \
>> -	.unit = (__unit), \
>> -	.global = true, \
>> -}
>> -
>> -static struct xe_ext_attribute *
>> -add_xe_attr(struct xe_ext_attribute *attr, const char *name, u64 config)
>> -{
>> -	sysfs_attr_init(&attr->attr.attr);
>> -	attr->attr.attr.name = name;
>> -	attr->attr.attr.mode = 0444;
>> -	attr->attr.show = xe_pmu_event_show;
>> -	attr->val = config;
>> -
>> -	return ++attr;
>> -}
>> -
>> -static struct perf_pmu_events_attr *
>> -add_pmu_attr(struct perf_pmu_events_attr *attr, const char *name,
>> -	     const char *str)
>> -{
>> -	sysfs_attr_init(&attr->attr.attr);
>> -	attr->attr.attr.name = name;
>> -	attr->attr.attr.mode = 0444;
>> -	attr->attr.show = perf_event_sysfs_show;
>> -	attr->event_str = str;
>> -
>> -	return ++attr;
>> -}
>> -
>> -static struct attribute **
>> -create_event_attributes(struct xe_pmu *pmu)
>> -{
>> -	struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>> -	static const struct {
>> -		unsigned int counter;
>> -		const char *name;
>> -		const char *unit;
>> -		bool global;
>> -	} events[] = {
>> -		__event(0, "render-group-busy", "ns"),
>> -		__event(1, "copy-group-busy", "ns"),
>> -		__event(2, "media-group-busy", "ns"),
>> -		__event(3, "any-engine-group-busy", "ns"),
>> -	};
>> -
>> -	struct perf_pmu_events_attr *pmu_attr = NULL, *pmu_iter;
>> -	struct xe_ext_attribute *xe_attr = NULL, *xe_iter;
>> -	struct attribute **attr = NULL, **attr_iter;
>> -	unsigned int count = 0;
>> -	unsigned int i, j;
>> -	struct xe_gt *gt;
>> -
>> -	/* Count how many counters we will be exposing. */
>> -	for_each_gt(gt, xe, j) {
>> -		for (i = 0; i < ARRAY_SIZE(events); i++) {
>> -			u64 config = ___DRM_XE_PMU_OTHER(j, events[i].counter);
>> -
>> -			if (!config_status(xe, config))
>> -				count++;
>> -		}
>> -	}
>> -
>> -	/* Allocate attribute objects and table. */
>> -	xe_attr = kcalloc(count, sizeof(*xe_attr), GFP_KERNEL);
>> -	if (!xe_attr)
>> -		goto err_alloc;
>> -
>> -	pmu_attr = kcalloc(count, sizeof(*pmu_attr), GFP_KERNEL);
>> -	if (!pmu_attr)
>> -		goto err_alloc;
>> -
>> -	/* Max one pointer of each attribute type plus a termination entry. */
>> -	attr = kcalloc(count * 2 + 1, sizeof(*attr), GFP_KERNEL);
>> -	if (!attr)
>> -		goto err_alloc;
>> -
>> -	xe_iter = xe_attr;
>> -	pmu_iter = pmu_attr;
>> -	attr_iter = attr;
>> -
>> -	for_each_gt(gt, xe, j) {
>> -		for (i = 0; i < ARRAY_SIZE(events); i++) {
>> -			u64 config = ___DRM_XE_PMU_OTHER(j, events[i].counter);
>> -			char *str;
>> -
>> -			if (config_status(xe, config))
>> -				continue;
>> -
>> -			if (events[i].global)
>> -				str = kstrdup(events[i].name, GFP_KERNEL);
>> -			else
>> -				str = kasprintf(GFP_KERNEL, "%s-gt%u",
>> -						events[i].name, j);
>> -			if (!str)
>> -				goto err;
>> -
>> -			*attr_iter++ = &xe_iter->attr.attr;
>> -			xe_iter = add_xe_attr(xe_iter, str, config);
>> -
>> -			if (events[i].unit) {
>> -				if (events[i].global)
>> -					str = kasprintf(GFP_KERNEL, "%s.unit",
>> -							events[i].name);
>> -				else
>> -					str = kasprintf(GFP_KERNEL, "%s-gt%u.unit",
>> -							events[i].name, j);
>> -				if (!str)
>> -					goto err;
>> -
>> -				*attr_iter++ = &pmu_iter->attr.attr;
>> -				pmu_iter = add_pmu_attr(pmu_iter, str,
>> -							events[i].unit);
>> -			}
>> -		}
>> -	}
>> -
>> -	pmu->xe_attr = xe_attr;
>> -	pmu->pmu_attr = pmu_attr;
>> -
>> -	return attr;
>> -
>> -err:
>> -	for (attr_iter = attr; *attr_iter; attr_iter++)
>> -		kfree((*attr_iter)->name);
>> -
>> -err_alloc:
>> -	kfree(attr);
>> -	kfree(xe_attr);
>> -	kfree(pmu_attr);
>> -
>> -	return NULL;
>> -}
>> -
>> -static void free_event_attributes(struct xe_pmu *pmu)
>> -{
>> -	struct attribute **attr_iter = pmu->events_attr_group.attrs;
>> -
>> -	for (; *attr_iter; attr_iter++)
>> -		kfree((*attr_iter)->name);
>> -
>> -	kfree(pmu->events_attr_group.attrs);
>> -	kfree(pmu->xe_attr);
>> -	kfree(pmu->pmu_attr);
>> -
>> -	pmu->events_attr_group.attrs = NULL;
>> -	pmu->xe_attr = NULL;
>> -	pmu->pmu_attr = NULL;
>> -}
>> -
>> -static int xe_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>> -{
>> -	struct xe_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
>> -
>> -	/* Select the first online CPU as a designated reader. */
>> -	if (cpumask_empty(&xe_pmu_cpumask))
>> -		cpumask_set_cpu(cpu, &xe_pmu_cpumask);
>> -
>> -	return 0;
>> -}
>> -
>> -static int xe_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>> -{
>> -	struct xe_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
>> -	unsigned int target = xe_pmu_target_cpu;
>> -
>> -	/*
>> -	 * Unregistering an instance generates a CPU offline event which we must
>> -	 * ignore to avoid incorrectly modifying the shared xe_pmu_cpumask.
>> -	 */
>> -	if (pmu->closed)
>> -		return 0;
>> -
>> -	if (cpumask_test_and_clear_cpu(cpu, &xe_pmu_cpumask)) {
>> -		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
>> -
>> -		/* Migrate events if there is a valid target */
>> -		if (target < nr_cpu_ids) {
>> -			cpumask_set_cpu(target, &xe_pmu_cpumask);
>> -			xe_pmu_target_cpu = target;
>> -		}
>> -	}
>> -
>> -	if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
>> -		perf_pmu_migrate_context(&pmu->base, cpu, target);
>> -		pmu->cpuhp.cpu = target;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>> -static enum cpuhp_state cpuhp_slot = CPUHP_INVALID;
>> -
>> -int xe_pmu_init(void)
>> -{
>> -	int ret;
>> -
>> -	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
>> -				      "perf/x86/intel/xe:online",
>> -				      xe_pmu_cpu_online,
>> -				      xe_pmu_cpu_offline);
>> -	if (ret < 0)
>> -		pr_notice("Failed to setup cpuhp state for xe PMU! (%d)\n",
>> -			  ret);
>> -	else
>> -		cpuhp_slot = ret;
>> -
>> -	return 0;
>> -}
>> -
>> -void xe_pmu_exit(void)
>> -{
>> -	if (cpuhp_slot != CPUHP_INVALID)
>> -		cpuhp_remove_multi_state(cpuhp_slot);
>> -}
>> -
>> -static int xe_pmu_register_cpuhp_state(struct xe_pmu *pmu)
>> -{
>> -	if (cpuhp_slot == CPUHP_INVALID)
>> -		return -EINVAL;
>> -
>> -	return cpuhp_state_add_instance(cpuhp_slot, &pmu->cpuhp.node);
>> -}
>> -
>> -static void xe_pmu_unregister_cpuhp_state(struct xe_pmu *pmu)
>> -{
>> -	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
>> -}
>> -
>> -void xe_pmu_suspend(struct xe_gt *gt)
>> -{
>> -	engine_group_busyness_store(gt);
>> -}
>> -
>> -static void xe_pmu_unregister(struct drm_device *device, void *arg)
>> -{
>> -	struct xe_pmu *pmu = arg;
>> -
>> -	if (!pmu->base.event_init)
>> -		return;
>> -
>> -	/*
>> -	 * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
>> -	 * ensures all currently executing ones will have exited before we
>> -	 * proceed with unregistration.
>> -	 */
>> -	pmu->closed = true;
>> -	synchronize_rcu();
>> -
>> -	xe_pmu_unregister_cpuhp_state(pmu);
>> -
>> -	perf_pmu_unregister(&pmu->base);
>> -	pmu->base.event_init = NULL;
>> -	kfree(pmu->base.attr_groups);
>> -	kfree(pmu->name);
>> -	free_event_attributes(pmu);
>> -}
>> -
>> -void xe_pmu_register(struct xe_pmu *pmu)
>> -{
>> -	struct xe_device *xe = container_of(pmu, typeof(*xe), pmu);
>> -	const struct attribute_group *attr_groups[] = {
>> -		&pmu->events_attr_group,
>> -		&xe_pmu_cpumask_attr_group,
>> -		NULL
>> -	};
>> -
>> -	int ret = -ENOMEM;
>> -
>> -	spin_lock_init(&pmu->lock);
>> -	pmu->cpuhp.cpu = -1;
>> -
>> -	pmu->name = kasprintf(GFP_KERNEL,
>> -			      "xe_%s",
>> -			      dev_name(xe->drm.dev));
>> -	if (pmu->name)
>> -		/* tools/perf reserves colons as special. */
>> -		strreplace((char *)pmu->name, ':', '_');
>> -
>> -	if (!pmu->name)
>> -		goto err;
>> -
>> -	pmu->events_attr_group.name = "events";
>> -	pmu->events_attr_group.attrs = create_event_attributes(pmu);
>> -	if (!pmu->events_attr_group.attrs)
>> -		goto err_name;
>> -
>> -	pmu->base.attr_groups = kmemdup(attr_groups, sizeof(attr_groups),
>> -					GFP_KERNEL);
>> -	if (!pmu->base.attr_groups)
>> -		goto err_attr;
>> -
>> -	pmu->base.module	= THIS_MODULE;
>> -	pmu->base.task_ctx_nr	= perf_invalid_context;
>> -	pmu->base.event_init	= xe_pmu_event_init;
>> -	pmu->base.add		= xe_pmu_event_add;
>> -	pmu->base.del		= xe_pmu_event_del;
>> -	pmu->base.start		= xe_pmu_event_start;
>> -	pmu->base.stop		= xe_pmu_event_stop;
>> -	pmu->base.read		= xe_pmu_event_read;
>> -	pmu->base.event_idx	= xe_pmu_event_event_idx;
>> -
>> -	ret = perf_pmu_register(&pmu->base, pmu->name, -1);
>> -	if (ret)
>> -		goto err_groups;
>> -
>> -	ret = xe_pmu_register_cpuhp_state(pmu);
>> -	if (ret)
>> -		goto err_unreg;
>> -
>> -	ret = drmm_add_action_or_reset(&xe->drm, xe_pmu_unregister, pmu);
>> -	if (ret)
>> -		goto err_cpuhp;
>> -
>> -	return;
>> -
>> -err_cpuhp:
>> -	xe_pmu_unregister_cpuhp_state(pmu);
>> -err_unreg:
>> -	perf_pmu_unregister(&pmu->base);
>> -err_groups:
>> -	kfree(pmu->base.attr_groups);
>> -err_attr:
>> -	pmu->base.event_init = NULL;
>> -	free_event_attributes(pmu);
>> -err_name:
>> -	kfree(pmu->name);
>> -err:
>> -	drm_notice(&xe->drm, "Failed to register PMU!\n");
>> -}
>> diff --git a/drivers/gpu/drm/xe/xe_pmu.h b/drivers/gpu/drm/xe/xe_pmu.h
>> deleted file mode 100644
>> index a99d4ddd023e..000000000000
>> --- a/drivers/gpu/drm/xe/xe_pmu.h
>> +++ /dev/null
>> @@ -1,25 +0,0 @@
>> -/* SPDX-License-Identifier: MIT */
>> -/*
>> - * Copyright © 2023 Intel Corporation
>> - */
>> -
>> -#ifndef _XE_PMU_H_
>> -#define _XE_PMU_H_
>> -
>> -#include "xe_gt_types.h"
>> -#include "xe_pmu_types.h"
>> -
>> -#if IS_ENABLED(CONFIG_PERF_EVENTS)
>> -int xe_pmu_init(void);
>> -void xe_pmu_exit(void);
>> -void xe_pmu_register(struct xe_pmu *pmu);
>> -void xe_pmu_suspend(struct xe_gt *gt);
>> -#else
>> -static inline int xe_pmu_init(void) { return 0; }
>> -static inline void xe_pmu_exit(void) {}
>> -static inline void xe_pmu_register(struct xe_pmu *pmu) {}
>> -static inline void xe_pmu_suspend(struct xe_gt *gt) {}
>> -#endif
>> -
>> -#endif
>> -
>> diff --git a/drivers/gpu/drm/xe/xe_pmu_types.h b/drivers/gpu/drm/xe/xe_pmu_types.h
>> deleted file mode 100644
>> index 9cadbd243f57..000000000000
>> --- a/drivers/gpu/drm/xe/xe_pmu_types.h
>> +++ /dev/null
>> @@ -1,68 +0,0 @@
>> -/* SPDX-License-Identifier: MIT */
>> -/*
>> - * Copyright © 2023 Intel Corporation
>> - */
>> -
>> -#ifndef _XE_PMU_TYPES_H_
>> -#define _XE_PMU_TYPES_H_
>> -
>> -#include <linux/perf_event.h>
>> -#include <linux/spinlock_types.h>
>> -#include <uapi/drm/xe_drm.h>
>> -
>> -enum {
>> -	__XE_SAMPLE_RENDER_GROUP_BUSY,
>> -	__XE_SAMPLE_COPY_GROUP_BUSY,
>> -	__XE_SAMPLE_MEDIA_GROUP_BUSY,
>> -	__XE_SAMPLE_ANY_ENGINE_GROUP_BUSY,
>> -	__XE_NUM_PMU_SAMPLERS
>> -};
>> -
>> -#define XE_PMU_MAX_GT 2
>> -
>> -struct xe_pmu {
>> -	/**
>> -	 * @cpuhp: Struct used for CPU hotplug handling.
>> -	 */
>> -	struct {
>> -		struct hlist_node node;
>> -		unsigned int cpu;
>> -	} cpuhp;
>> -	/**
>> -	 * @base: PMU base.
>> -	 */
>> -	struct pmu base;
>> -	/**
>> -	 * @closed: xe is unregistering.
>> -	 */
>> -	bool closed;
>> -	/**
>> -	 * @name: Name as registered with perf core.
>> -	 */
>> -	const char *name;
>> -	/**
>> -	 * @lock: Lock protecting enable mask and ref count handling.
>> -	 */
>> -	spinlock_t lock;
>> -	/**
>> -	 * @sample: Current and previous (raw) counters.
>> -	 *
>> -	 * These counters are updated when the device is awake.
>> -	 *
>> -	 */
>> -	u64 sample[XE_PMU_MAX_GT][__XE_NUM_PMU_SAMPLERS];
>> -	/**
>> -	 * @events_attr_group: Device events attribute group.
>> -	 */
>> -	struct attribute_group events_attr_group;
>> -	/**
>> -	 * @xe_attr: Memory block holding device attributes.
>> -	 */
>> -	void *xe_attr;
>> -	/**
>> -	 * @pmu_attr: Memory block holding device attributes.
>> -	 */
>> -	void *pmu_attr;
>> -};
>> -
>> -#endif
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 0895e4d2a981..5ba412007270 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -1080,46 +1080,6 @@ struct drm_xe_wait_user_fence {
>>   	/** @reserved: Reserved */
>>   	__u64 reserved[2];
>>   };
>> -
>> -/**
>> - * DOC: XE PMU event config IDs
>> - *
>> - * Check 'man perf_event_open' to use the ID's DRM_XE_PMU_XXXX listed in xe_drm.h
>> - * in 'struct perf_event_attr' as part of perf_event_open syscall to read a
>> - * particular event.
>> - *
>> - * For example to open the DRMXE_PMU_RENDER_GROUP_BUSY(0):
>> - *
>> - * .. code-block:: C
>> - *
>> - *	struct perf_event_attr attr;
>> - *	long long count;
>> - *	int cpu = 0;
>> - *	int fd;
>> - *
>> - *	memset(&attr, 0, sizeof(struct perf_event_attr));
>> - *	attr.type = type; // eg: /sys/bus/event_source/devices/xe_0000_56_00.0/type
>> - *	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED;
>> - *	attr.use_clockid = 1;
>> - *	attr.clockid = CLOCK_MONOTONIC;
>> - *	attr.config = DRM_XE_PMU_RENDER_GROUP_BUSY(0);
>> - *
>> - *	fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
>> - */
>> -
>> -/*
>> - * Top bits of every counter are GT id.
>> - */
>> -#define __DRM_XE_PMU_GT_SHIFT (56)
>> -
>> -#define ___DRM_XE_PMU_OTHER(gt, x) \
>> -	(((__u64)(x)) | ((__u64)(gt) << __DRM_XE_PMU_GT_SHIFT))
>> -
>> -#define DRM_XE_PMU_RENDER_GROUP_BUSY(gt)	___DRM_XE_PMU_OTHER(gt, 0)
>> -#define DRM_XE_PMU_COPY_GROUP_BUSY(gt)		___DRM_XE_PMU_OTHER(gt, 1)
>> -#define DRM_XE_PMU_MEDIA_GROUP_BUSY(gt)		___DRM_XE_PMU_OTHER(gt, 2)
>> -#define DRM_XE_PMU_ANY_ENGINE_GROUP_BUSY(gt)	___DRM_XE_PMU_OTHER(gt, 3)
>> -
>>   #if defined(__cplusplus)
>>   }
>>   #endif

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 04/10] RFC drm/xe/guc: Add interface for engine busyness ticks
  2023-12-15  2:48   ` Nilawar, Badal
@ 2023-12-18  6:07     ` Riana Tauro
  0 siblings, 0 replies; 20+ messages in thread
From: Riana Tauro @ 2023-12-18  6:07 UTC (permalink / raw)
  To: Nilawar, Badal, intel-xe



On 12/15/2023 8:18 AM, Nilawar, Badal wrote:
> 
> 
> On 14-12-2023 17:01, Riana Tauro wrote:
>> GuC provides engine busyness ticks as 64 bit counters which count
>> in clock ticks. These counters are maintained in a
>> shared memory buffer and updated on a continuous basis.
>>
>> Add functions that initialize Engine busyness and get
>> the current accumulated busyness.
>>
>> Co-developed-by: John Harrison <John.C.Harrison@Intel.com>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>>    drivers/gpu/drm/xe/Makefile                 |   1 +
>>    drivers/gpu/drm/xe/abi/guc_actions_abi.h    |   1 +
>>    drivers/gpu/drm/xe/xe_gt.c                  |  13 ++
>>    drivers/gpu/drm/xe/xe_gt.h                  |   2 +
>>    drivers/gpu/drm/xe/xe_guc.c                 |   7 +
>>    drivers/gpu/drm/xe/xe_guc_engine_busyness.c | 153 ++++++++++++++++++++
>>    drivers/gpu/drm/xe/xe_guc_engine_busyness.h |  17 +++
>>    drivers/gpu/drm/xe/xe_guc_fwif.h            |  15 ++
>>    drivers/gpu/drm/xe/xe_guc_types.h           |   6 +
>>    9 files changed, 215 insertions(+)
>>    create mode 100644 drivers/gpu/drm/xe/xe_guc_engine_busyness.c
>>    create mode 100644 drivers/gpu/drm/xe/xe_guc_engine_busyness.h
>>
>> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> index b0cb6a9a390e..2523dc96986e 100644
>> --- a/drivers/gpu/drm/xe/Makefile
>> +++ b/drivers/gpu/drm/xe/Makefile
>> @@ -85,6 +85,7 @@ xe-y += xe_bb.o \
>>    	xe_guc_ads.o \
>>    	xe_guc_ct.o \
>>    	xe_guc_debugfs.o \
>> +	xe_guc_engine_busyness.o \
>>    	xe_guc_hwconfig.o \
>>    	xe_guc_log.o \
>>    	xe_guc_pc.o \
>> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> index 3062e0e0d467..d87681ca89bc 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> @@ -139,6 +139,7 @@ enum xe_guc_action {
>>    	XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
>>    	XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
>>    	XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
>> +	XE_GUC_ACTION_SET_DEVICE_ENGINE_UTILIZATION = 0x550C,
>>    	XE_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR = 0x6000,
>>    	XE_GUC_ACTION_REPORT_PAGE_FAULT_REQ_DESC = 0x6002,
>>    	XE_GUC_ACTION_PAGE_FAULT_RES_DESC = 0x6003,
>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>> index f5d18e98f8b6..9c84afb00f7b 100644
>> --- a/drivers/gpu/drm/xe/xe_gt.c
>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>> @@ -32,6 +32,7 @@
>>    #include "xe_gt_sysfs.h"
>>    #include "xe_gt_tlb_invalidation.h"
>>    #include "xe_gt_topology.h"
>> +#include "xe_guc_engine_busyness.h"
>>    #include "xe_guc_exec_queue_types.h"
>>    #include "xe_guc_pc.h"
>>    #include "xe_hw_fence.h"
>> @@ -794,3 +795,15 @@ struct xe_hw_engine *xe_gt_any_hw_engine_by_reset_domain(struct xe_gt *gt,
>>    
>>    	return NULL;
>>    }
>> +
>> +/**
>> + * xe_gt_engine_busy_ticks - Return current accumulated engine busyness ticks
>> + * @gt: GT structure
>> + * @hwe: Xe HW engine to report on
>> + *
>> + * Returns accumulated ticks @hwe was busy since engine stats were enabled.
>> + */
>> +u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe)
>> +{
>> +	return xe_guc_engine_busyness_ticks(&gt->uc.guc, hwe);
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
>> index f3c780bd266d..5b4309310126 100644
>> --- a/drivers/gpu/drm/xe/xe_gt.h
>> +++ b/drivers/gpu/drm/xe/xe_gt.h
>> @@ -42,6 +42,8 @@ int xe_gt_resume(struct xe_gt *gt);
>>    void xe_gt_reset_async(struct xe_gt *gt);
>>    void xe_gt_sanitize(struct xe_gt *gt);
>>    
>> +u64 xe_gt_engine_busy_ticks(struct xe_gt *gt, struct xe_hw_engine *hwe);
>> +
>>    /**
>>     * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the
>>     * first that matches the same reset domain as @class
>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>> index 482cb0df9f15..6116aaea936f 100644
>> --- a/drivers/gpu/drm/xe/xe_guc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>> @@ -18,6 +18,7 @@
>>    #include "xe_gt.h"
>>    #include "xe_guc_ads.h"
>>    #include "xe_guc_ct.h"
>> +#include "xe_guc_engine_busyness.h"
>>    #include "xe_guc_hwconfig.h"
>>    #include "xe_guc_log.h"
>>    #include "xe_guc_pc.h"
>> @@ -306,9 +307,15 @@ int xe_guc_init_post_hwconfig(struct xe_guc *guc)
>>    
>>    int xe_guc_post_load_init(struct xe_guc *guc)
>>    {
>> +	int err;
>> +
>>    	xe_guc_ads_populate_post_load(&guc->ads);
>>    	guc->submission_state.enabled = true;
>>    
>> +	err = xe_guc_engine_busyness_init(guc);
>> +	if (err)
>> +		return err;
>> +
>>    	return 0;
>>    }
>>    
>> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.c b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
>> new file mode 100644
>> index 000000000000..287429e31e6c
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.c
>> @@ -0,0 +1,153 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2023 Intel Corporation
>> + */
>> +#include "xe_guc_engine_busyness.h"
>> +
>> +#include <drm/drm_managed.h>
>> +
>> +#include "abi/guc_actions_abi.h"
>> +#include "xe_bo.h"
>> +#include "xe_guc.h"
>> +#include "xe_guc_ct.h"
>> +
>> +/**
>> + * DOC: Xe GuC Engine Busyness
>> + *
>> + * GuC >= 70.11.1 maintains busyness counters in a shared memory buffer for each
>> + * engine on a continuous basis. The counters are all 64 bits and count in clock
>> + * ticks. The values are updated on context switch events and periodically on a
>> + * timer internal to GuC. The update rate is guaranteed to be at least 2 Hz (with
>> + * the caveat that it is best effort only, not real time).
>> + *
>> + * engine busyness ticks (ticks_engine): clock ticks for which the engine was active
>> + */
>> +
>> +static void guc_engine_busyness_usage_map(struct xe_guc *guc,
>> +					  struct xe_hw_engine *hwe,
>> +					  struct iosys_map *engine_map)
>> +{
>> +	struct iosys_map *map;
>> +	size_t offset;
>> +	u32 instance;
>> +	u8 guc_class;
>> +
>> +	guc_class = xe_engine_class_to_guc_class(hwe->class);
>> +	instance = hwe->logical_instance;
>> +
>> +	map = &guc->busy.bo->vmap;
>> +
>> +	offset = offsetof(struct guc_engine_observation_data,
>> +			  engine_data[guc_class][instance]);
>> +
>> +	*engine_map = IOSYS_MAP_INIT_OFFSET(map, offset);
>> +}
>> +
>> +static void guc_engine_busyness_get_usage(struct xe_guc *guc,
>> +					  struct xe_hw_engine *hwe,
>> +					  u64 *_ticks_engine)
>> +{
>> +	struct iosys_map engine_map;
>> +	u64 ticks_engine = 0;
>> +	int i = 0;
>> +
>> +	guc_engine_busyness_usage_map(guc, hwe, &engine_map);
>> +
>> +#define read_engine_usage(map_, field_) \
>> +	iosys_map_rd_field(map_, 0, struct guc_engine_data, field_)
>> +
>> +	do {
>> +		ticks_engine = read_engine_usage(&engine_map, total_execution_ticks);
>> +
>> +		if (read_engine_usage(&engine_map, total_execution_ticks) == ticks_engine)
>> +			break;
>> +	} while (++i < 6);
>> +
>> +#undef read_engine_usage
>> +
>> +	if (_ticks_engine)
>> +		*_ticks_engine = ticks_engine;
>> +}
>> +
>> +static void guc_engine_busyness_enable_stats(struct xe_guc *guc)
>> +{
>> +	u32 ggtt_addr = xe_bo_ggtt_addr(guc->busy.bo);
>> +	u32 action[] = {
>> +		XE_GUC_ACTION_SET_DEVICE_ENGINE_UTILIZATION,
>> +		ggtt_addr,
>> +		0,
>> +	};
>> +	struct xe_device *xe = guc_to_xe(guc);
>> +	int ret;
>> +
>> +	ret = xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
>> +	if (ret)
>> +		drm_err(&xe->drm, "Failed to enable usage stats %pe\n", ERR_PTR(ret));
>> +}
>> +
>> +static void guc_engine_busyness_fini(struct drm_device *drm, void *arg)
>> +{
>> +	struct xe_guc *guc = arg;
>> +
>> +	xe_bo_unpin_map_no_vm(guc->busy.bo);
>> +}
>> +
>> +/*
>> + * xe_guc_engine_busyness_ticks - Get current accumulated
>> + *				  engine busyness ticks
>> + * @guc: The GuC object
>> + * @hwe: Xe HW Engine
>> + *
>> + * Returns the current accumulated ticks @hwe was busy for, when engine stats are enabled.
>> + */
>> +u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
>> +{
>> +	u64 ticks_engine;
>> +
>> +	guc_engine_busyness_get_usage(guc, hwe, &ticks_engine);
>> +
>> +	return ticks_engine;
>> +}
>> +
>> +/*
>> + * xe_guc_engine_busyness_init - Initialize GuC engine busyness
>> + * @guc: The GuC object
>> + *
>> + * Initialize GuC engine busyness; called only once during driver load.
>> + * Supported only on GuC >= 70.11.1.
>> + *
>> + * Return: 0 on success, negative error code on error.
>> + */
>> +int xe_guc_engine_busyness_init(struct xe_guc *guc)
>> +{
>> +	struct xe_device *xe = guc_to_xe(guc);
>> +	struct xe_gt *gt = guc_to_gt(guc);
>> +	struct xe_tile *tile = gt_to_tile(gt);
>> +	struct xe_bo *bo;
>> +	u32 size;
>> +	int err;
>> +
> How about adding the GuC version check here and in the other applicable
> places? I see that patch 8 handles the version check, but how about
> moving the version-check function into this patch instead? That would
> also align the code with the doc.
> 
Hi Badal

I added it separately for ease of review. Since this patch is big, that
was the only part I could separate out. Once that version-check function
is acked, I'll move it here.

Thanks
Riana
> Regards,
> Badal
>> +	/* Initialization already done */
>> +	if (guc->busy.bo)
>> +		return 0;
>> +
>> +	size = PAGE_ALIGN(sizeof(struct guc_engine_observation_data));
>> +
>> +	bo = xe_bo_create_pin_map(xe, tile, NULL, size,
>> +				  ttm_bo_type_kernel,
>> +				  XE_BO_CREATE_VRAM_IF_DGFX(tile) |
>> +				  XE_BO_CREATE_GGTT_BIT);
>> +
>> +	if (IS_ERR(bo))
>> +		return PTR_ERR(bo);
>> +
>> +	guc->busy.bo = bo;
>> +
>> +	guc_engine_busyness_enable_stats(guc);
>> +
>> +	err = drmm_add_action_or_reset(&xe->drm, guc_engine_busyness_fini, guc);
>> +	if (err)
>> +		return err;
>> +
>> +	return 0;
>> +}
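On the retry loop in guc_engine_busyness_get_usage(): since GuC updates the
shared buffer concurrently, the field is read twice until two consecutive
reads agree, bounded at a handful of attempts. A standalone userspace sketch
of the same pattern (the scripted read sequence stands in for the
GuC-updated shared memory; all names here are made up for illustration):

```c
#include <stdint.h>

/* Simulated reads of a counter another agent is updating: the first
 * pair disagrees (update in flight), the next pair is stable. */
static uint64_t fake_counter_reads[] = { 100, 200, 200 };
static int read_idx;

/* Stand-in for iosys_map_rd_field(..., total_execution_ticks). */
static uint64_t read_counter(void)
{
	uint64_t v = fake_counter_reads[read_idx];

	if (read_idx < 2)
		read_idx++;
	return v;
}

/* Same shape as the loop in the patch: read, re-read, retry up to
 * six times until both reads of the 64-bit counter match. */
static uint64_t read_stable_counter(void)
{
	uint64_t ticks = 0;
	int i = 0;

	do {
		ticks = read_counter();
		if (read_counter() == ticks)
			break;
	} while (++i < 6);

	return ticks;
}
```

With the scripted sequence above, the first iteration sees 100 vs 200 and
retries; the second iteration sees a stable 200 and returns it.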
>> diff --git a/drivers/gpu/drm/xe/xe_guc_engine_busyness.h b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
>> new file mode 100644
>> index 000000000000..d70f06209896
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_guc_engine_busyness.h
>> @@ -0,0 +1,17 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2023 Intel Corporation
>> + */
>> +
>> +#ifndef _XE_GUC_ENGINE_BUSYNESS_H_
>> +#define _XE_GUC_ENGINE_BUSYNESS_H_
>> +
>> +#include <linux/types.h>
>> +
>> +struct xe_hw_engine;
>> +struct xe_guc;
>> +
>> +int xe_guc_engine_busyness_init(struct xe_guc *guc);
>> +u64 xe_guc_engine_busyness_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe);
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> index 4dd5a88a7826..c8ca5fe97614 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> @@ -37,6 +37,7 @@
>>    #define GUC_COMPUTE_CLASS		4
>>    #define GUC_GSC_OTHER_CLASS		5
>>    #define GUC_LAST_ENGINE_CLASS		GUC_GSC_OTHER_CLASS
>> +#define GUC_MAX_OAG_COUNTERS		8
>>    #define GUC_MAX_ENGINE_CLASSES		16
>>    #define GUC_MAX_INSTANCES_PER_CLASS	32
>>    
>> @@ -222,6 +223,20 @@ struct guc_engine_usage {
>>    	struct guc_engine_usage_record engines[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
>>    } __packed;
>>    
>> +/* Engine busyness stats */
>> +struct guc_engine_data {
>> +	u64 total_execution_ticks;
>> +	u64 reserved;
>> +} __packed;
>> +
>> +struct guc_engine_observation_data {
>> +	struct guc_engine_data engine_data[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
>> +	u64 oag_busy_data[GUC_MAX_OAG_COUNTERS];
>> +	u64 total_active_ticks;
>> +	u64 gt_timestamp;
>> +	u64 reserved1;
>> +} __packed;
>> +
>>    /* This action will be programmed in C1BC - SOFT_SCRATCH_15_REG */
>>    enum xe_guc_recv_message {
>>    	XE_GUC_RECV_MSG_CRASH_DUMP_POSTED = BIT(1),
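A quick layout check of guc_engine_observation_data, using the constants from
this hunk (GUC_MAX_ENGINE_CLASSES = 16, GUC_MAX_INSTANCES_PER_CLASS = 32,
GUC_MAX_OAG_COUNTERS = 8). This is a userspace mock of the packed structs,
not the kernel header itself:

```c
#include <stdint.h>
#include <stddef.h>

#define GUC_MAX_OAG_COUNTERS		8
#define GUC_MAX_ENGINE_CLASSES		16
#define GUC_MAX_INSTANCES_PER_CLASS	32

struct guc_engine_data {
	uint64_t total_execution_ticks;
	uint64_t reserved;
} __attribute__((packed));

struct guc_engine_observation_data {
	struct guc_engine_data engine_data[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
	uint64_t oag_busy_data[GUC_MAX_OAG_COUNTERS];
	uint64_t total_active_ticks;
	uint64_t gt_timestamp;
	uint64_t reserved1;
} __attribute__((packed));

/* 16 * 32 engine entries of 16 bytes each (8192), plus 8 OAG counters
 * (64) and three u64 fields (24): 8280 bytes in total, which
 * PAGE_ALIGN rounds to 12288 on 4K pages for the pinned BO. */
```

This also shows why guc_engine_busyness_usage_map() can index the buffer with
a plain offsetof(engine_data[guc_class][instance]): each entry is a fixed 16
bytes.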
>> diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
>> index cd80802e8918..4e9602301aed 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_types.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_types.h
>> @@ -70,6 +70,12 @@ struct xe_guc {
>>    		u32 size;
>>    	} hwconfig;
>>    
>> +	/** @busy: Engine busyness */
>> +	struct {
>> +		/** @bo: GGTT buffer object of engine busyness that is shared with GuC */
>> +		struct xe_bo *bo;
>> +	} busy;
>> +
>>    	/**
>>    	 * @notify_reg: Register which is written to notify GuC of H2G messages
>>    	 */
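For completeness, the userspace side of the cover letter's formula
(busyness % = engine active ticks / total active ticks * 100) works on
deltas between two samples of the two counters exposed via perf. A minimal
sketch, with made-up names, purely for illustration:

```c
#include <stdint.h>

struct sample {
	uint64_t engine_ticks;	/* e.g. rcs0-busy-ticks-gt0 */
	uint64_t total_ticks;	/* e.g. total-active-ticks-gt0 */
};

/* Busyness percentage over the interval between two samples:
 * delta of engine busy ticks over delta of total active ticks. */
static double busyness_pct(struct sample prev, struct sample now)
{
	uint64_t busy = now.engine_ticks - prev.engine_ticks;
	uint64_t total = now.total_ticks - prev.total_ticks;

	return total ? 100.0 * (double)busy / (double)total : 0.0;
}
```

An engine that accumulated 50 busy ticks while the GT accumulated 200 active
ticks reads as 25% busy over that interval.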


end of thread, other threads:[~2023-12-18 18:46 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-14 11:31 [PATCH v3 00/10] Engine Busyness Riana Tauro
2023-12-14 11:31 ` [PATCH v3 01/10] drm/xe/pmu: Remove PMU from Xe till uapi is finalized Riana Tauro
2023-12-15  3:51   ` Aravind Iddamsetty
2023-12-18  5:30     ` Riana Tauro
2023-12-14 11:31 ` [PATCH v3 02/10] fixup! drm/xe/uapi: Reject bo creation of unaligned size Riana Tauro
2023-12-14 13:31   ` Rodrigo Vivi
2023-12-18  5:26     ` Riana Tauro
2023-12-14 11:31 ` [PATCH v3 03/10] drm/xe: Move user engine class mappings to functions Riana Tauro
2023-12-14 11:31 ` [PATCH v3 04/10] RFC drm/xe/guc: Add interface for engine busyness ticks Riana Tauro
2023-12-15  2:48   ` Nilawar, Badal
2023-12-18  6:07     ` Riana Tauro
2023-12-14 11:31 ` [PATCH v3 05/10] RFC drm/xe/uapi: Add configs for Engine busyness Riana Tauro
2023-12-14 11:31 ` [PATCH v3 06/10] RFC drm/xe/pmu: Enable PMU interface and add engine busyness counter Riana Tauro
2023-12-14 11:31 ` [PATCH v3 07/10] RFC drm/xe/guc: Add PMU counter for total active ticks Riana Tauro
2023-12-14 11:31 ` [PATCH v3 08/10] RFC drm/xe/guc: Expose engine busyness only for supported GuC version Riana Tauro
2023-12-14 11:31 ` [PATCH v3 09/10] RFC drm/xe/guc: Dynamically enable/disable engine busyness stats Riana Tauro
2023-12-14 11:31 ` [PATCH v3 10/10] RFC drm/xe/guc: Handle runtime suspend issues for engine busyness Riana Tauro
2023-12-14 12:59 ` ✓ CI.Patch_applied: success for Engine Busyness (rev3) Patchwork
2023-12-14 12:59 ` ✗ CI.checkpatch: warning " Patchwork
2023-12-14 13:00 ` ✗ CI.KUnit: failure " Patchwork
